It’s The Boat Race on Sunday, the historic annual competition between the best VIII of both Cambridge and Oxford, for no prize whatsoever. I took a look at the results on the Wikipedia page to see if there are any good predictors for the result.
TL;DR: the second boat race is usually won by the same university as the first boat race, but that link isn’t statistically significant. If a university has won the last three races, they’re likely to win again, and that is significant.
There’s not a lot of data easily available for previous races. Factors like crew and cox weight might make a difference, or a combination of wind and starting station. One piece of data that is available, though, is the result of the second boats’ race. This happens on the same day, over the same course, just a few hours before the first boats race. Its results are on the same Wikipedia page as the first boat results, and there have been 49 so far.
Including the 2012 race, 29 of the 49 second boat races have been won by the same university that won the first boat race. Of course, we don’t think that victories in the second boat race cause victories in the first boat race, but that both are caused by the same thing: a strong boat club.
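Counting those matches is just a year-by-year comparison of the two sets of winners. Here is a minimal sketch in Python, assuming the results have already been copied from Wikipedia into two lists aligned by year; the names `first_boat`, `second_boat` and `count_matches` are hypothetical, and the data below is made up for illustration, not the real results:

```python
# Hypothetical toy data, NOT the real results: one winner per year,
# "C" = Cambridge, "O" = Oxford, with both lists covering the same years.
first_boat = ["C", "O", "O", "C", "C", "O"]
second_boat = ["C", "O", "C", "C", "O", "O"]

def count_matches(first: list[str], second: list[str]) -> tuple[int, int]:
    """Return (years in which the same university won both races, years compared)."""
    if len(first) != len(second):
        raise ValueError("results must be aligned year by year")
    same = sum(1 for a, b in zip(first, second) if a == b)
    return same, len(first)

print(count_matches(first_boat, second_boat))  # prints (4, 6)
```

Run on the real tables, the same function would give the 29-out-of-49 figure.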
However, if we take as a null hypothesis that there is no connection between the results of the second race and the first race, 29 out of 49 doesn’t look very significant. Using Excel’s BINOM.DIST function (BINOM.DIST(29,49,0.5,TRUE)), which gives the probability of 29 or fewer matches if each year’s match were a coin flip, we get 92%: interesting, but not really significant. If the pattern continues for a few more years, it could tip into significance. If you’re a Bayesian, though, I guess you can make a bet if the bookies don’t change the odds after the second boat race!
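The same calculation can be sketched in Python with the standard library’s `math.comb`. This is a minimal sketch, not the spreadsheet itself: `binom_cdf` is a hypothetical helper that mirrors BINOM.DIST with the cumulative flag set to TRUE, i.e. P(X ≤ k). The one-sided p-value for “29 or more matches” is the upper tail P(X ≥ 29), which is 1 − BINOM.DIST(28,49,0.5,TRUE), so the sketch computes both:

```python
from math import comb

def binom_cdf(k: int, n: int, p: float = 0.5) -> float:
    """P(X <= k) for X ~ Binomial(n, p); mirrors BINOM.DIST(k, n, p, TRUE)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k): the one-sided p-value for seeing k or more matches."""
    return 1.0 - binom_cdf(k - 1, n, p)

matches, races = 29, 49
cdf = binom_cdf(matches, races)        # the cumulative figure Excel reports, about 0.92
p_value = upper_tail(matches, races)   # chance of 29+ matches if results were coin flips
print(f"BINOM.DIST equivalent: {cdf:.3f}")
print(f"P(X >= {matches}): {p_value:.3f}")
```

The upper tail comes out at roughly 13%, well above the usual 5% cut-off, so the conclusion is the same either way.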
Eyeballing the first boat results, one pattern does stand out: if one university has won the last few races, they’re likely to win the next one as well. That is, the results are ‘streaky’. I looked at the results from 1856 onwards, when the race became annual, and only counted a race if it had also been rowed in each of the previous three years. That leaves 141 races. In 65 of those races, the same university had won each of the previous three. In those cases, they won again 43 out of 65 times. Using the same binomial test with the same Excel function (BINOM.DIST(43,65,0.5,TRUE)), we get 99.7%, comfortably past the usual 95% and 99% thresholds. That is, if the results of the last three races were unrelated to the result of this one, there would be only a 0.3% chance of the winner of the last three races winning more than 43 of those races.
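The streak count can be sketched the same way. This is a minimal Python sketch under stated assumptions: results are stored in a dict mapping year to winner, a missing year means no race was rowed, and `streak_outcomes`, `prob_more_than` and the toy data are all hypothetical, with made-up year numbers rather than real results:

```python
from math import comb

def streak_outcomes(results: dict[int, str], run: int = 3) -> tuple[int, int]:
    """Return (races preceded by a `run`-race streak, races the streak holder won).

    `results` maps year -> winner. A race is only considered if a race was
    also rowed in each of the previous `run` years.
    """
    streaks = repeats = 0
    for year, winner in sorted(results.items()):
        prior_years = [year - k for k in range(1, run + 1)]
        if not all(y in results for y in prior_years):
            continue  # a year with no race among the preceding `run` years
        prior_winners = {results[y] for y in prior_years}
        if len(prior_winners) == 1:      # one university won all `run` prior races
            streaks += 1
            if winner in prior_winners:  # ...and it won this one too
                repeats += 1
    return streaks, repeats

def prob_more_than(k: int, n: int, p: float = 0.5) -> float:
    """P(X > k) for X ~ Binomial(n, p), i.e. 1 - BINOM.DIST(k, n, p, TRUE)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1, n + 1))

# Hypothetical toy results keyed by made-up year numbers; NOT real race data.
toy = {1: "O", 2: "O", 3: "O", 4: "O", 5: "O", 6: "C", 7: "C", 8: "C", 9: "O", 11: "C"}
streaks, repeats = streak_outcomes(toy)
print(streaks, repeats, prob_more_than(repeats, streaks))  # prints: 4 2 0.3125
```

Applied to the real results, `prob_more_than(repeats, streaks)` is one minus the Excel figure quoted above.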
That does not mean that the winner of the last three races has a 99.7% chance of winning! The best estimate of their chance of winning after three straight wins is simply 43 / 65, or about 66%.
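The two numbers answer different questions, which a few lines of Python make explicit. This sketch reuses only the counts quoted in the text; the 99.7% is a probability computed under the null hypothesis that each race is a coin flip, while 43 / 65 is the observed win rate after a streak:

```python
from math import comb

wins, streaks = 43, 65  # counts quoted in the text

# Chance of winning given a three-race streak: just the observed proportion.
estimate = wins / streaks

# What the significance figure measures instead: P(X <= 43) if p were 0.5.
cdf_under_null = sum(comb(streaks, i) for i in range(wins + 1)) / 2**streaks

print(f"estimated win probability: {estimate:.1%}")
print(f"BINOM.DIST({wins},{streaks},0.5,TRUE): {cdf_under_null:.1%}")
```

The first line prints 66.2% and the second 99.7%: a high significance figure says the streak effect is probably real, not that it is large.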