Volleyball Math Problem Need help from Markov Mavens
#1
Posted 2015-February-12, 09:03
I now want to run Monte Carlo simulations of a volleyball matchup to predict the outcome. This would seem to be quite easy - a given point is won by Team 1 with probability r1/(r1+r2) and by Team 2 with probability r2/(r1+r2). Repeat until you get to 25 points 3 times (or to 15 points, or to 2 points more than the opponent... you get the idea).
However, as you probably know, in volleyball, the team that won the previous point serves next. As you may or may not know, the serving team will only win approximately s = 38% of the points.
So for an even matchup it's easy to write a Monte Carlo simulation: the serving team wins the point with probability s and the other team wins with probability 1-s.
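For reference, a minimal sketch of that even-matchup simulation for a single set. The function name, the choice of team 1 as first server, and the seed are illustrative assumptions, not part of the original post.

```python
import random

def simulate_set(s=0.38, target=25, rng=None):
    """Simulate one rally-scoring set between two equal teams.

    s is the chance that the serving team wins the rally.
    Returns (points_team1, points_team2).
    """
    rng = rng or random.Random()
    score = [0, 0]
    server = 0  # assumption: team 1 (index 0) serves first
    while True:
        # The server wins the rally with probability s, otherwise the receiver does.
        winner = server if rng.random() < s else 1 - server
        score[winner] += 1
        server = winner  # the rally winner serves next
        # A set ends at `target` points with a lead of at least two.
        if max(score) >= target and abs(score[0] - score[1]) >= 2:
            return tuple(score)

rng = random.Random(2015)  # fixed seed only so the run is repeatable
results = [simulate_set(rng=rng) for _ in range(10_000)]
team1_share = sum(a > b for a, b in results) / len(results)
print(f"team 1 wins {team1_share:.1%} of simulated sets")
```

Passing target=15 gives a deciding set under the same rules.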
What should the probabilities be for an uneven matchup?
-- Bertrand Russell
#2
Posted 2015-February-12, 09:54
Let's look at the event that team 1 (which serves first) wins the first set during its k'th serving turn. This means that
a ) team 1 makes 15 points in k serving turns while
b ) team 1 does not make 15 points in k-1 serving turns while
c ) team 2 does not make 15 points in k-1 serving turns
That team 1 makes (at least) 15 points in k-1 serving turns is a negative binomial event since it boils down to team 1 winning 15 of their own serves before they lose k-1 of their own serves.
Obviously a and b are not independent. I think the easiest way to solve this is to break it down into the following sub-events:
- team 1 scores exactly 14 points in k-1 serving turns and at least 1 in the k'th turn
- team 1 scores exactly 13 points in k-1 serving turns and at least 2 in the k'th turn
- etc, down to maybe 9 or 10
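The sub-event sum can be sketched numerically as follows. This covers only team 1's own scoring (conditions a and b), ignores condition c about team 2, and assumes each serve is won independently with probability p1. The function names and the values of p1 and k are hypothetical.

```python
from math import comb

def prob_points_in_turns(m, turns, p):
    """P(a team wins exactly m of its own serves before losing `turns` of them).

    Negative binomial: m successes (probability p each) before the
    `turns`-th failure.
    """
    if turns == 0:
        return 1.0 if m == 0 else 0.0
    return comb(m + turns - 1, m) * p**m * (1 - p) ** turns

def prob_at_least_in_one_turn(n, p):
    """P(a team wins at least n serves in a row within one serving turn)."""
    return p**n

p1 = 0.38  # hypothetical serve-win chance for team 1
k = 10     # hypothetical serving turn in which team 1 reaches 15

# Sum over the sub-events: exactly 15 - j points in the first k-1 turns,
# then at least j points in turn k.
total = sum(
    prob_points_in_turns(15 - j, k - 1, p1) * prob_at_least_in_one_turn(j, p1)
    for j in range(1, 16)
)
print(total)
```

Summing all 15 terms is cheap, so there is no need to truncate the series at 9 or 10.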
The above is just a mathematical or computational problem. Then there is the sports modelling problem of expressing p1 and p2 as functions of a single strength ratio parameter. There has been some work on this, for example: http://www.ingentaco...000001/art00017
They model the data at a more detailed level, introducing parameters for successful blocks and aces and such. That may be overkill for your purpose.
#3
Posted 2015-February-12, 11:38
My problem is choosing p1 and p2 in such a manner that they are consistent with my rating system, which predicts that Team 1 will score r1/r2 times as many points as Team 2. So if Team 1 serves s1 times and Team 2 serves s2 times, I want
r2*E[p1*s1 + (1-p2)*s2] = r1*E[p2*s2 + (1-p1)*s1]
to hold true. This is difficult because s1 and s2 are dependent on p1 and p2.
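One way to get a feel for this condition is to ignore set boundaries and look at a very long run of rallies. The server is then a two-state Markov chain, and the long-run points ratio works out to (1-p2)/(1-p1). The sketch below compares that with a simulation; the function names and the example values of p1 and p2 are assumptions for illustration, and end-of-set effects are deliberately left out.

```python
import random

def long_run_point_ratio(p1, p2):
    """Long-run ratio of points won (team 1 : team 2), ignoring set boundaries.

    In the stationary distribution team 1 serves a share
    (1 - p2) / ((1 - p1) + (1 - p2)) of rallies; because the rally winner
    serves next, that is also its share of points.
    """
    return (1 - p2) / (1 - p1)

def simulated_point_ratio(p1, p2, rallies=200_000, seed=0):
    """Estimate the same ratio by playing one long run of rallies."""
    rng = random.Random(seed)
    p_serve = (p1, p2)
    points = [0, 0]
    server = 0
    for _ in range(rallies):
        winner = server if rng.random() < p_serve[server] else 1 - server
        points[winner] += 1
        server = winner
    return points[0] / points[1]

p1, p2 = 0.45, 0.32  # hypothetical serve-win chances
print(long_run_point_ratio(p1, p2), simulated_point_ratio(p1, p2))
```

Under this approximation the consistency condition reduces to (1-p2)/(1-p1) = r1/r2, which leaves one degree of freedom to be fixed by some other requirement.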
(Meanwhile, the paper you linked to seems to be focused on identifying which skills increase your chances of winning ... I don't entirely believe the results given in the abstract, and would note that the same study done for women's volleyball produces different results. But anyway it's not what I'm looking into right now.)
-- Bertrand Russell
#4
Posted 2015-February-12, 19:14
#5
Posted 2015-February-12, 19:30
Fluffy, on 2015-February-12, 19:14, said:
OK so, uh, what would be the first thing you try?
-- Bertrand Russell
#6
Posted 2015-February-13, 05:53
As for expressing the two probabilities, p1 and p2, relating to (team 1 winning when serving) and (team 2 winning when serving), it is a bit similar to the football (soccer) model we made at Buzz Sports where the expected number of goals scored by team 1 and by team 2 were functions of a single strength difference parameter. My first guess at a model for volleyball would be
p1= 1/(1+exp(-z1))
p2= 1/(1+exp(-z2))
where (z1,z2) are bivariate normal distributed with expectation
E(z1) = log(0.38/(1-0.38)) + log(r1/r2)
E(z2) = log(0.38/(1-0.38)) + log(r2/r1)
and then you just need to estimate the covariance matrix from the data.
But this basically assumes that the dominant team has an advantage to the same degree (in some sense) regardless of whether it is serving or not. I can imagine that that isn't always the case. It could for example be that all teams are approximately equally good at serving and that they differ mainly with respect to how well they return the serve.
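A rough sketch of drawing (p1, p2) from this logistic-normal model. The standard deviation sd and correlation rho are placeholders, since the covariance would have to be estimated from data; the function names are also just for illustration.

```python
import math
import random

def logit(p):
    return math.log(p / (1 - p))

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def sample_serve_probs(r1, r2, s=0.38, sd=0.2, rho=0.0, rng=None):
    """Draw (p1, p2) with z1, z2 bivariate normal around the stated means.

    sd (common standard deviation) and rho (correlation) are placeholder
    values, not estimates.
    """
    rng = rng or random.Random()
    mean1 = logit(s) + math.log(r1 / r2)
    mean2 = logit(s) + math.log(r2 / r1)
    # Build correlated normals from two independent standard normals.
    e1 = rng.gauss(0, 1)
    e2 = rng.gauss(0, 1)
    z1 = mean1 + sd * e1
    z2 = mean2 + sd * (rho * e1 + math.sqrt(1 - rho**2) * e2)
    return sigmoid(z1), sigmoid(z2)

# With sd=0 the draw collapses to the serve-win chances at the mean.
print(sample_serve_probs(1.0, 1.0, sd=0.0))  # even matchup
print(sample_serve_probs(1.2, 1.0, sd=0.0))  # team 1 rated 20% higher
```

For an even matchup both values come out at 0.38 up to rounding, which is the sanity check the model was built around.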
#7
Posted 2015-February-13, 06:27
helene_t, on 2015-February-13, 05:53, said:
I actually wanted to develop a rating system where teams would have separate ratings for serving and receiving, but the necessary data for that is not readily available. However, I think it is reasonable to assume that stronger teams are stronger at both serving and receiving, because you can clearly see the game transition into states that could have arisen from either situation.
I've been slowly recording data of my own, primarily to discover which individual players are best at serving / receiving, but perhaps someday I'll have enough data to estimate the spread at a team level...
-- Bertrand Russell
#8
Posted 2015-February-13, 06:35
In our tennis model we modelled individual player characteristics. Things like this are more difficult with team sports.
#9
Posted 2015-February-13, 06:48
Let P(i,j) be the probability that team i will score on a serve against team j. Consider a specific (i,j). If P(i,j)=0.38 then, for this 38% rule to work, we would need P(j,i) to also be 0.38, is that right? And more generally, whatever P(i,j) is, the value of P(j,i) would be forced (never mind the exact formula) by the 38% rule?
Or am I misunderstanding the 38% rule? I realize (or at least I assume) it is just something that has been observed and so is only an approximation but if, for the purpose of a model, we take it as literally true then it seems the free choices are P(i,j) where i<j, with the P(j,i) following from this rule.
I guess one specific question could be: Are we saying that for any specific pair of teams if those teams play many matches against each other then we will observe the 38% rule, or are we saying that if we look at a large number of games played by a large number of teams, that is where the rule applies?
Or am I just misunderstanding the whole thing? I have played volleyball rarely and poorly and long ago.
#10
Posted 2015-February-13, 07:18
George Carlin
#11
Posted 2015-February-13, 07:20
#12
Posted 2015-February-13, 07:33
kenberg, on 2015-February-13, 06:48, said:
I guess one specific question could be: Are we saying that for any specific pair of teams if those teams play many matches against each other then we will observe the 38% rule, or are we saying that if we look at a large number of games played by a large number of teams, that is where the rule applies?
You are right, it is only an observation. The value of s will also depend on the level of play. For instance, if you have young children, it could be that s<0.5 merely due to service errors. Once they learn to serve the ball into the other team's court with some consistency it will probably be right around 0.5, and then once you get into higher levels teams will get better and better at immediately organizing an effective attack and s will go down. The observation s=0.38 is from a so far very small sample of international women's volleyball. For men's volleyball, s is smaller.
Anyway, back to the math, it is indeed probably not true in reality but I was, as you surmised, looking to create a model where
Quote
so that s=0.38 holds true in every individual game.
-- Bertrand Russell
#13
Posted 2015-February-13, 07:38
gwnn, on 2015-February-13, 07:18, said:
I'm using a Bradley-Terry rating system, which as explained here is more or less equivalent to Elo: http://angrystatisti...hology-and.html (however, unlike Elo, the order the matches were played in doesn't matter.)
Anyway as I said I don't have the data for the dual ratings else I would have done that.
-- Bertrand Russell
#14
Posted 2015-February-13, 08:05
I was worried that we would get into trouble, with only the ratings, always trying to satisfy the 38% rule. But I think it is ok. We have ratings r1, r2, ..., rn. We need a formula, arrived at somehow, that does the following. Given (i,j) we apply the formula to ri and rj to compute P(i,j), the probability that i scores when serving against j, but we do this only if the rating ri is lower than the rating rj. We then use the 38% rule to determine the value of P(j,i).
I assume that we need the P(i,j) to run the simulation and I was thinking the 38% rule would cause problems if we were to calculate the P(i,j) solely from the ratings. But we can avoid the problem in the above manner.
#15
Posted 2015-February-13, 09:33
George Carlin
#16
Posted 2015-February-13, 16:28
We assume that p1 and p2 (the total points won by team 1 and team 2) are both very large, so plus or minus 1 point doesn't matter. In that case, p1=s1 and p2=s2. Furthermore, let's say there's an a1 chance that team 1 wins a point on its serve (a1=A for well-matched teams).
We then have:
p1=p1*a1+p2*(1-a2) or
p1/p2=(1-a2)/(1-a1)
But we also know that
(p1*a1+p2*a2)/(p1+p2)=A
Eliminating p1 and p2 for the moment, we get that for a given a2,
a1=(a2(A+1)-2*A)/(2*a2-(A+1)). We can then go back and get the relevant ratio p1/p2 or r1/r2.
The limiting case is a team that always hits the net on its serve (a2=0); the other team then wins 55.07% of its serves (analytically, a0=2*A/(1+A)), and team 1 wins 2.226 times as many points as team 2.
For any higher success rate than 55.07%, it is impossible to satisfy the 38% requirement.
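The relations above are easy to check numerically. A small sketch, with function names chosen here for illustration and A fixed at 0.38:

```python
def a1_from_a2(a2, A=0.38):
    """Serve-win chance of team 1 implied by a2 when the overall
    serve-win rate is pinned at A (large-number approximation)."""
    return (a2 * (A + 1) - 2 * A) / (2 * a2 - (A + 1))

def point_ratio(a1, a2):
    """Ratio p1/p2 of points won, from p1/p2 = (1-a2)/(1-a1)."""
    return (1 - a2) / (1 - a1)

A = 0.38
for a2 in (0.38, 0.30, 0.20, 0.0):
    a1 = a1_from_a2(a2, A)
    print(f"a2={a2:.2f}  a1={a1:.4f}  p1/p2={point_ratio(a1, a2):.3f}")
```

The first row reproduces the even matchup (a1 = a2 = A, ratio 1) and the last row the limiting case a2 = 0.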
George Carlin
#17
Posted 2015-February-13, 16:40
It's obvious that if team A is much stronger than team B, then the percentage of points won by the serving team is higher than normal. Just imagine the Italian national team playing against your group of friends.
#18
Posted 2015-February-13, 16:47
cherdano, on 2015-February-13, 16:40, said:
It's obvious that if team A is much stronger than team B, then the percentage of points won by the serving team is higher than normal. Just imagine the Italian national team playing against your group of friends.
It's obvious for me too, but mgoetze said that he wants to assume that. If we can't assume that, I'd go either for the two-value model or a two-value model with a fixed ratio between the two skills, i.e. a one-value model.
George Carlin
#19
Posted 2015-February-13, 16:50
cherdano, on 2015-February-13, 16:40, said:
It's obvious that if team A is much stronger than team B, then the percentage of points won by the serving team is higher than normal. Just imagine the Italian national team playing against your group of friends.
Well, because I don't have any better data. Really I just want to translate my ratings into a prediction like "there is a 60% chance that team A will win in 4 sets, 20% they win in 3 sets, 10% they win in 5 sets, 10% they lose" via Monte Carlo simulation. The Italian national team has a rating in my system, my friends don't.
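A sketch of what such a Monte Carlo prediction could look like, given serve-win chances for the two teams. The serve-win values, the function names, and the rule that the first serve simply alternates between sets are simplifying assumptions made for this example.

```python
import random
from collections import Counter

def play_set(p_serve, first_server, target, rng):
    """Play one rally-scoring set; return the index (0 or 1) of the winner."""
    score = [0, 0]
    server = first_server
    while not (max(score) >= target and abs(score[0] - score[1]) >= 2):
        winner = server if rng.random() < p_serve[server] else 1 - server
        score[winner] += 1
        server = winner  # the rally winner serves next
    return 0 if score[0] > score[1] else 1

def play_match(p_serve, rng):
    """Best of five sets: sets to 25, deciding set to 15."""
    sets = [0, 0]
    while max(sets) < 3:
        set_no = sum(sets)
        target = 15 if set_no == 4 else 25
        # Assumption: first serve alternates, team A starting set 1.
        winner = play_set(p_serve, set_no % 2, target, rng)
        sets[winner] += 1
    return tuple(sets)

def match_distribution(p_a, p_b, n=5_000, seed=0):
    """Estimated probability of each final set score (sets_A, sets_B)."""
    rng = random.Random(seed)
    counts = Counter(play_match((p_a, p_b), rng) for _ in range(n))
    return {result: c / n for result, c in sorted(counts.items(), reverse=True)}

# Hypothetical serve-win chances for team A and team B.
for result, share in match_distribution(0.40, 0.36).items():
    print(result, f"{share:.1%}")
```

The open question in the thread is how to pick the two serve-win chances from the ratings; the simulation itself is indifferent to where they come from.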
-- Bertrand Russell
#20
Posted 2015-February-13, 16:59
p1/p2=(A+1-2*a2)/(1-A)
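Solving the two conditions from post #16 (p1/p2 = (1-a2)/(1-a1) and an overall serve-win rate of A) for a given points ratio R = p1/p2 gives both serve-win chances directly. A sketch under the same large-number approximation, taking R = r1/r2 from the rating system; the function name is hypothetical.

```python
def serve_probs_from_ratio(R, A=0.38):
    """Serve-win chances (a1, a2) that give points ratio R = p1/p2
    while keeping the overall serve-win rate at A."""
    a1 = (A + 1 - (1 - A) / R) / 2
    a2 = (A + 1 - (1 - A) * R) / 2
    if not (0 <= a1 <= 1 and 0 <= a2 <= 1):
        # Happens when R is outside [(1-A)/(1+A), (1+A)/(1-A)].
        raise ValueError("ratio too lopsided for the fixed serve-win rate")
    return a1, a2

a1, a2 = serve_probs_from_ratio(1.2)
print(a1, a2)
```

For A = 0.38 the admissible range of R is roughly 0.449 to 2.226, matching the limiting case worked out in post #16.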
George Carlin