Several folks have emailed me (and my own brother took me to task) about my article “Pythagoras and the White Sox” on The Hardball Times. They asked why I didn’t include a measure of how well the Pythagorean method predicted a team’s final number of wins, given the games it had already won. In other words, use the Pythagorean method to predict the number of games a team will win in the remainder of the season (its Pythagorean winning percentage times its remaining games), then add that to the number it has already won. That result could then be compared with the other methods. After all, a team already has those wins in the bank, so in economic terms they are a “sunk cost”. Quite frankly, I missed this entirely, but it makes perfect sense. So I ran the calculations, and here they are:
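Average error (AvgE) and standard deviation (StdDev) of the error in projected final wins, by date (Avg G = average games played):
Date | Avg G | Pyth AvgE | Pyth StdDev | Actual AvgE | Actual StdDev | Pyth+Actual AvgE | Pyth+Actual StdDev
10-Apr | 5 | 18.2 | 11.9 | 25.5 | 18.0 | 18.2 | 12.0
15-Apr | 9 | 15.8 | 14.5 | 17.8 | 14.6 | 15.5 | 14.5
30-Apr | 22 | 11.5 | 9.3 | 12.9 | 7.1 | 11.2 | 9.0
30-May | 49 | 8.0 | 5.6 | 7.3 | 5.6 | 7.5 | 5.0
30-Jun | 76 | 6.7 | 5.0 | 6.5 | 4.8 | 6.3 | 4.5
30-Jul | 102 | 5.1 | 3.3 | 3.8 | 3.0 | 4.0 | 2.7
30-Aug | 130 | 3.6 | 3.0 | 2.3 | 1.8 | 2.2 | 1.8
5-Oct | 162 | 3.1 | 2.8 | 0.0 | 0.0 | 0.0 | 0.0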
This table now includes the Pyth + Actual columns, which show how well the Pythagorean method does when it incorporates the wins a team has already banked. As you can see, through April both the average error and the standard deviation are lower than those for the actual winning percentage method. From then through the end of August the average error basically tracks the actual method, while the standard deviation remains slightly lower (the two are equal at 1.8 by August 30th). I was a bit surprised that even as late as August 30th the Pyth + Actual method was slightly more accurate (an average error of 2.2 versus 2.3 for the actual method). As a result, I wouldn’t hesitate to use it at any point in the season.
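The three projections compared above can be sketched as follows. This is a minimal illustration, not the code behind the tables: it assumes the classic exponent of 2 in the Pythagorean formula (the exponent isn’t stated here), a 162-game season, and that AvgE is the mean absolute difference between projected and final wins, with StdDev taken as the population standard deviation of those differences. The function names and the example team are hypothetical.

```python
from statistics import mean, pstdev

SEASON_GAMES = 162  # assumption: full 162-game schedule


def pythagorean_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Pythagorean winning percentage RS^x / (RS^x + RA^x); exponent 2 is an assumption."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def project_pyth(runs_scored: float, runs_allowed: float) -> float:
    """Pyth: apply the Pythagorean percentage to the whole season."""
    return pythagorean_pct(runs_scored, runs_allowed) * SEASON_GAMES


def project_actual(wins: int, games_played: int) -> float:
    """Actual: extend the current winning percentage over the whole season."""
    return wins / games_played * SEASON_GAMES


def project_pyth_plus_actual(wins: int, games_played: int,
                             runs_scored: float, runs_allowed: float) -> float:
    """Pyth + Actual: banked wins plus Pythagorean wins in the games still to play."""
    remaining = SEASON_GAMES - games_played
    return wins + pythagorean_pct(runs_scored, runs_allowed) * remaining


def error_summary(projected: list[float], final: list[float]) -> tuple[float, float]:
    """Assumed AvgE/StdDev: mean absolute error in wins and its population std dev."""
    if len(projected) != len(final):
        raise ValueError("need one projection per team")
    errors = [abs(p - f) for p, f in zip(projected, final)]
    return mean(errors), pstdev(errors)


# Hypothetical team: 30-22 after 52 games, 250 runs scored, 200 allowed.
print(round(project_pyth(250, 200), 1))                      # 98.8
print(round(project_actual(30, 52), 1))                      # 93.5
print(round(project_pyth_plus_actual(30, 52, 250, 200), 1))  # 97.1
```

Because the banked wins are fixed, the Pyth + Actual projection converges to the true final total as the remaining games shrink to zero, which is why its error column reaches 0.0 on the last row while the pure Pyth column does not.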
I must also admit to an error in the original article. The fourth table I showed, which listed the correlation coefficients of the Pythagorean winning percentage and the actual winning percentage against the final winning percentage, was wrong. Here is the corrected table, along with a third column for Pyth + Actual.
Date | Avg G | PythW% | ActualW% | Pyth+ActualW%
10-Apr | 5 | 0.414 | 0.270 | 0.411
15-Apr | 9 | 0.300 | 0.347 | 0.306
30-Apr | 22 | 0.530 | 0.649 | 0.559
30-May | 49 | 0.719 | 0.771 | 0.757
30-Jun | 76 | 0.780 | 0.797 | 0.812
30-Jul | 102 | 0.890 | 0.930 | 0.934
30-Aug | 130 | 0.937 | 0.975 | 0.976
5-Oct | 162 | 0.951 | 1.000 | 1.000
Of course, the corrected table clears up the surprise I expressed in the original article that the PythW% correlations were so strong (all around .8 or higher). As might be expected, the actual winning percentage does a better job than the Pythagorean winning percentage of predicting a team’s relative ranking, and hence its number of wins, at every checkpoint after April 10th. However, combining the Pythagorean winning percentage with actual wins does slightly better still from the June 30th checkpoint through August 30th, and the two are identical by season’s end. That conclusion confirms the results of the previous table and means that the Pyth + Actual method is a valid way of predicting season outcomes at any point in the season.
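The correlation figures are presumably ordinary Pearson correlation coefficients. Assuming so, each entry compares, across all teams, the projected winning percentage on a given date with the final winning percentage. The sketch below is illustrative only: `pearson_r` and `pyth_plus_actual_pct` are hypothetical helper names, and the Pyth + Actual percentage assumes a 162-game season.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)


def pyth_plus_actual_pct(wins: int, games_played: int, pyth_pct: float,
                         season_games: int = 162) -> float:
    """Projected final W%: banked wins plus Pythagorean wins in the remaining games."""
    return (wins + pyth_pct * (season_games - games_played)) / season_games


# Hypothetical final win totals for four teams. At season's end no games remain,
# so the projection equals the final W% and r is 1, as in the table's last row.
final_wins = [97, 89, 78, 68]
final_pct = [w / 162 for w in final_wins]
projected = [pyth_plus_actual_pct(w, 162, 0.5) for w in final_wins]
print(round(pearson_r(projected, final_pct), 3))  # 1.0
```

Earlier in the season the Pythagorean term dominates the projection, so the Pyth + Actual correlation starts close to the PythW% column and drifts toward the ActualW% column as banked wins accumulate.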