Sunday, January 15, 2006

OPS as a Run Estimator

Thanks to everyone who emailed me regarding my article on THT, "Run Estimation for the Masses". A couple of comments.

First, I noted in the article that:

"some analysts have noted, as discussed by Michael Lewis in Moneyball, that each point of On-Base Percentage is more valuable than each point of Slugging Percentage. How much more has been the topic of some discussion over the years, but a multiplier of 1.8 has been suggested. This turns out to be the value that results in the maximum correlation coefficient."

That is indeed the case. The 1.8 weighting produced a correlation coefficient of .959417, higher than BRA and every other estimator except BsR, XRR, RC, and XR. I forgot to include a link to a post I wrote a while back on this topic, titled DePodesta and OPS, that explores some of the recent research that has been done.
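For readers who want to try the weighting themselves, here is a minimal sketch of the idea. It assumes the weighted form is simply weight × OBP + SLG and that the comparison is a Pearson correlation against team runs. The function names and the sample rows are my own placeholders, not the dataset behind the figures above.

```python
from math import sqrt


def weighted_ops(obp, slg, weight=1.8):
    """Return weight * OBP + SLG (weight=1.0 gives ordinary OPS)."""
    return weight * obp + slg


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_for_weight(rows, weight):
    """Correlate weighted OPS with runs; rows are (obp, slg, runs) tuples."""
    estimates = [weighted_ops(obp, slg, weight) for obp, slg, _ in rows]
    runs = [r for _, _, r in rows]
    return pearson_r(estimates, runs)


def best_weight(rows, candidates):
    """Return the candidate weight with the highest correlation to runs."""
    return max(candidates, key=lambda w: correlation_for_weight(rows, w))


# Placeholder team-seasons (OBP, SLG, runs scored). These are NOT real data.
sample_rows = [
    (0.320, 0.400, 700),
    (0.335, 0.420, 770),
    (0.345, 0.450, 850),
    (0.310, 0.390, 650),
    (0.350, 0.430, 830),
]

candidates = [1.0 + 0.1 * i for i in range(21)]  # weights 1.0 through 3.0
print(best_weight(sample_rows, candidates))
```

Swapping in real team-season lines for `sample_rows` is all that is needed to scan for the weight with the maximum correlation.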

It turns out that the November 2005 issue of SABR's By The Numbers also has two more articles on this topic, one by Mark Pankin and the other by Donald A. Coffin and Bruce Cowgill. In his article, Pankin uses two different approaches to calculate the "marginal value ratio" or MVR of OBP to SLUG and finds that both result in a value of around 2.0, depending on the team and league context. Coffin and Cowgill used regression analysis and came up with a value of 1.90 using data from 1987-2004, with a higher value in the NL (2.03) than in the AL (1.72). What was interesting, however, is that their values differed wildly from year to year, ranging from a low of .37 in 1990 to a high of 4.71 in 2000. They chalk this up to small sample sizes for individual seasons.
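To make the regression approach concrete, here is a rough sketch of one way to get an MVR from team data: fit runs = a + b × OBP + c × SLUG by ordinary least squares and take b / c. I am assuming that specification for illustration only. It is not necessarily the model Coffin and Cowgill used, and the rows below are synthetic, built from known coefficients so the mechanics can be checked.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_obp_slg(rows):
    """OLS fit of runs = a + b*OBP + c*SLG; rows are (obp, slg, runs)."""
    design = [[1.0, obp, slg] for obp, slg, _ in rows]
    runs = [r for _, _, r in rows]
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(design, runs)) for i in range(3)]
    return solve_linear(xtx, xty)


def marginal_value_ratio(rows):
    """Ratio of the fitted OBP coefficient to the fitted SLG coefficient."""
    _, b_obp, c_slg = fit_obp_slg(rows)
    return b_obp / c_slg


# Synthetic rows (NOT real data): runs per game built with b = 20, c = 10,
# so the fitted ratio should come out close to 2.0 by construction.
lines = [(0.320, 0.400), (0.335, 0.420), (0.345, 0.450),
         (0.310, 0.390), (0.350, 0.430)]
synthetic = [(obp, slg, -5 + 20 * obp + 10 * slg) for obp, slg in lines]
print(round(marginal_value_ratio(synthetic), 3))
```

With only a handful of rows, or with OBP and SLUG moving closely together, the two coefficients are poorly separated and the ratio can swing a lot, which fits the small-sample explanation for the year-to-year variation.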

Some have argued that perhaps DePodesta used an MVR of 3.0 simply because OBP is calculated on a scale of 1.0 and SLUG on a scale of 4.0, reasoning that a team of players with an OBP of 1.000 would score an infinite number of runs, while a team of players with a SLUG of 1.000 would still make outs and therefore score fewer runs. I doubt this was the case. I think it more likely that DePodesta's conclusion stemmed from a model (perhaps built with regression analysis) based on a smaller sample, perhaps a single season like 2001, where the regression value calculated by Coffin and Cowgill is 3.55.

A second point of interest came from reader Moshe Koppel, who offered another OPS'-type formula that does a better job of measuring run production per out made, rather than per plate appearance. The formula is:

OPS'' = OPS/(1-OBP)

Using this formula against the 2000-2004 dataset yields a correlation coefficient of .959129, a value higher than OPS but just under the OPS' I used in the article.
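As a quick illustration of what the division does, here is a small sketch of OPS''. The function name and the two batting lines are made up for the example. The rough idea, as I read it, is that 1 - OBP approximates outs per plate appearance, so dividing by it rescales OPS from a per-plate-appearance to a per-out basis.

```python
def ops_per_out(obp, slg):
    """OPS'' = OPS / (1 - OBP), i.e. OPS rescaled by the out rate."""
    if obp >= 1.0:
        # An OBP of 1.000 means no outs, so the ratio is undefined.
        raise ValueError("OBP must be below 1.000")
    return (obp + slg) / (1.0 - obp)


# Two hypothetical lines with the same OPS (.770) but different OBP.
print(f"{ops_per_out(0.340, 0.430):.3f}")  # lower-OBP line
print(f"{ops_per_out(0.360, 0.410):.3f}")  # higher-OBP line
```

The two lines have identical OPS, but the higher-OBP line gets the larger OPS'' (about 1.203 versus about 1.167), so the formula gives extra credit to OBP without an explicit multiplier.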
