Monday, June 27, 2005


Several folks have emailed me (and my own brother took me to task) about my article Pythagoras and the White Sox on The Hardball Times, asking why I didn’t measure how well the Pythagorean method predicts a team’s final number of wins given the games it has already won. In other words, use the Pythagorean method to predict the number of games a team will win over the remainder of the season and then add that to the number it has already won. That result could then be compared with the other methods. After all, a team already has those wins in the bank, so they are in economic terms a “sunk cost”. Quite frankly, I missed this entirely, but it makes perfect sense. So I did the calculations and here they are:

                  Pyth             Actual           Pyth + Actual
Date     Avg G    AvgE   StdDev    AvgE   StdDev    AvgE   StdDev
10-Apr       5    18.2     11.9    25.5     18.0    18.2     12.0
15-Apr       9    15.8     14.5    17.8     14.6    15.5     14.5
30-Apr      22    11.5      9.3    12.9      7.1    11.2      9.0
30-May      49     8.0      5.6     7.3      5.6     7.5      5.0
30-Jun      76     6.7      5.0     6.5      4.8     6.3      4.5
30-Jul     102     5.1      3.3     3.8      3.0     4.0      2.7
30-Aug     130     3.6      3.0     2.3      1.8     2.2      1.8
5-Oct      162     3.1      2.8     0.0      0.0     0.0      0.0

This table now includes the Pyth + Actual columns, which show how well the Pythagorean method does when it incorporates the current number of wins. As you can see, through April its average error is a bit lower than that of the actual winning percentage (and its standard deviation is lower on the first two dates). After that the average error basically tracks the actuals all the way through the end of August, while the standard deviation stays at or slightly below the actuals. I was a bit surprised that even as late as August 30th the Pythagorean + Actual method was slightly more accurate (an average error of 2.2 versus 2.3). As a result, I wouldn’t hesitate to use it at any point in the season.
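For readers who want to try the Pyth + Actual calculation themselves, here is a minimal sketch of the idea described above: project the remaining games with the Pythagorean winning percentage and add the wins already banked. The function names, the exponent of 2, and the sample team are my assumptions for illustration, and treating AvgE as the mean absolute error (with StdDev taken over those absolute errors) is also a guess at how the table was computed.

```python
from statistics import mean, pstdev


def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean winning percentage; exponent=2 is the classic form (an assumption here)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def project_final_wins(wins, games_played, runs_scored, runs_allowed,
                       season_games=162, exponent=2.0):
    """Pyth + Actual: wins already banked plus Pythagorean wins over the remaining games."""
    remaining = season_games - games_played
    return wins + remaining * pythagorean_pct(runs_scored, runs_allowed, exponent)


def error_summary(projected, actual):
    """Mean and (population) standard deviation of the absolute projection errors."""
    errors = [abs(p - a) for p, a in zip(projected, actual)]
    return mean(errors), pstdev(errors)


# Hypothetical team: 50-26 after 76 games, 380 runs scored, 320 allowed.
print(round(project_final_wins(50, 76, 380, 320), 1))
```

Once all 162 games are played there is nothing left to project, so the function simply returns the actual win total, which is why the Pyth + Actual error on 5-Oct is zero.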

I must also admit to an error in the original article. The fourth table, which showed the correlation coefficients for the Pythagorean winning percentage and the actual winning percentage computed against the final winning percentage, was wrong. Here is the corrected table, along with a third column for Pyth + Actual.

Date     AvgG    PythW%    ActualW%    Pyth+ActualW%
10-Apr      5     0.414       0.270            0.411
15-Apr      9     0.300       0.347            0.306
30-Apr     22     0.530       0.649            0.559
30-May     49     0.719       0.771            0.757
30-Jun     76     0.780       0.797            0.812
30-Jul    102     0.890       0.930            0.934
30-Aug    130     0.937       0.975            0.976
5-Oct     162     0.951       1.000            1.000

Of course, this clears up my surprise in the original article that the PythW% correlations were so strong (all around .8 or higher). As might be expected, the actual winning percentage does a better job than the Pythagorean winning percentage of predicting a team’s relative ranking, and hence its number of wins, at every date after April 10. However, the combination of Pythagorean winning percentage and actual wins does a slightly better job than either from the end of June until the end of the season. That conclusion is consistent with the results of the previous table and means that the Pyth+Actual method is a valid way of predicting season outcomes at any time.
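The correlation coefficients in the table above compare a predictor (for example, each team’s winning percentage on a given date) with its final winning percentage. As a rough sketch of that calculation, assuming the ordinary Pearson correlation was used, something like the following would do; the function name and the sample numbers are made up for illustration.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical winning percentages for four teams: on some date, and at season's end.
on_date = [0.600, 0.550, 0.480, 0.420]
final = [0.580, 0.530, 0.500, 0.440]
print(round(pearson_r(on_date, final), 3))
```

A value of 1.000, as in the last row of the ActualW% column, just means the predictor and the final result are the same numbers.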

1 comment:

Sox Fan said...

Sorry Dan, as a frustrated Sox fan I feel compelled to ask you to check the last three years. The Sox are a lot closer to the Twins in Pythagorean wins than in actual wins, yet the Twins play in the postseason and the Sox watch. The fault of Pythagorean wins is the same as that of the BCS: it rewards teams for 12-2 wins. None of those runs count the next day.

Yes, one- and two-run wins vary from year to year, but every year somebody gets lucky in addition to being good enough. The Marlins, Diamondbacks, Angels, etc. were all surprises.

There are lies, damn lies and statistics. In the end, the only stat that counts is wins. Other stats are useful for helping GMs figure out where they need to improve. Saying "We were number one in the league with Japanese players on 2nd and less than two out in the 3rd through 5th inning" is interesting, but does not get to the root of the problem. The Sox win because they have great starting pitching, good relief pitching, good defense, and an offense that does whatever it needs to do to score.

BTW, home runs are overrated. Would you rather hit a solo homer, where you send one extra guy to the plate and never make the pitcher work, or score off 3 singles, where 3 extra hitters see the pitcher, he pitches out of the stretch, throws 20 extra pitches, thinks more, works harder and leaves the game to be replaced by weak middle relief?