Tuesday, June 28, 2005

More Pythagoras

Bruce Cowgill, a member of the SABR Statistical Analysis Committee, was kind enough to send me another formula for predicting a team's final winning percentage from its actual winning percentage at a given point in the season. I don't have the original site where it was posted (by an analyst with the name or initials DBOZ), but here is the formula:

Final W% = (0.5 x (1 - GamesPlayed / 162)^2.25) + (CurrentWin% x (1 - (1 - GamesPlayed / 162)^2.25))
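For anyone who wants to try the formula on their own numbers, here is a minimal Python sketch of it. The function and parameter names are my own, not DBOZ's, and the 4-1 team in the example is hypothetical. The range check is an addition so that a games-played value outside 0 to 162 fails loudly instead of producing nonsense.

```python
def dboz_final_wpct(current_wpct, games_played, season_games=162, exponent=2.25):
    """Project a final winning percentage with the formula quoted above."""
    if not 0 <= games_played <= season_games:
        raise ValueError("games_played must be between 0 and season_games")
    # Weight still given to a .500 record; it shrinks to 0 as the season plays out.
    mean_weight = (1 - games_played / season_games) ** exponent
    return 0.5 * mean_weight + current_wpct * (1 - mean_weight)


# Hypothetical example: a team that starts 4-1 (.800) after 5 games.
print(round(dboz_final_wpct(0.800, 5), 3))
```

With no games played the projection is exactly .500, and after 162 games it is simply the team's actual winning percentage.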

I applied this formula to my 2004 data and added yet another column to the two tables from my previous post.


                    Pyth           Actual      Pyth + Actual        DBOZ
Date    Avg G    AvgE  StdDev    AvgE  StdDev    AvgE  StdDev    AvgE  StdDev
10-Apr      5    18.2    11.9    25.5    18.0    18.2    12.0    11.1     6.6
15-Apr      9    15.8    14.5    17.8    14.6    15.5    14.5    11.0     6.0
30-Apr     22    11.5     9.3    12.9     7.1    11.2     9.0     8.6     6.2
30-May     49     8.0     5.6     7.3     5.6     7.5     5.0     7.2     5.1
30-Jun     76     6.7     5.0     6.5     4.8     6.3     4.5     6.6     4.8
30-Jul    102     5.1     3.3     3.8     3.0     4.0     2.7     4.1     3.1
30-Aug    130     3.6     3.0     2.3     1.8     2.2     1.8     2.3     1.8
5-Oct     162     3.1     2.8     0.0     0.0     0.0     0.0     0.0     0.0

Because this formula regresses to the mean, much like the Pyth+Actual method, its average error ends at 0 at 162 games. Its average error tracks Pyth+Actual closely from late May on, but it does a much better job very early in the season (11.1 versus 18.2 on April 10). In looking at the raw data, the reason is that early on it lumps every team close to .500, which reduces the standard error but reduces the correlation coefficient as well.
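To see how strongly the formula lumps teams together early on, the sketch below prints the weight it puts on a team's current winning percentage at each of the average games-played figures from the table. The remainder of the weight goes to .500. The function name is my own, and the snippet only restates the formula above.

```python
SEASON_GAMES = 162
EXPONENT = 2.25


def current_record_weight(games_played):
    """Share of the projection that comes from the team's current W%."""
    return 1 - (1 - games_played / SEASON_GAMES) ** EXPONENT


# Average games played at each checkpoint in the tables.
for games in (5, 9, 22, 49, 76, 102, 130, 162):
    weight = current_record_weight(games)
    print(f"{games:>3} games: weight on current W% = {weight:.3f}")
```

At 5 games the weight is only about 0.07, so a team playing .800 ball and a team playing .200 ball are projected at roughly .520 and .480.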

Date    Avg G  PythW%  ActualW%  Pyth+ActualW%   DBOZ
10-Apr      5   0.414     0.270          0.411  0.255
15-Apr      9   0.300     0.347          0.306  0.348
30-Apr     22   0.530     0.649          0.559  0.641
30-May     49   0.719     0.771          0.757  0.771
30-Jun     76   0.780     0.797          0.812  0.799
30-Jul    102   0.890     0.930          0.934  0.929
30-Aug    130   0.937     0.975          0.976  0.976
5-Oct     162   0.951     1.000          1.000  1.000
