
Wednesday, November 03, 2004

Actuaries and Sabermetrics

I was alerted to an article titled "Stat of the Art: The Actuarial Game of Baseball," published in Contingencies (May/June 2004), a journal for actuaries. The article itself is just a recap of Moneyball, Billy Beane, and Bill James with the Red Sox, and it is old news to anyone who's been following sabermetric developments. I noticed two factual errors on skimming it:

  1. It says Bill James is from "Lawrenceville," Kansas, rather than Lawrence.
  2. It says Bill James invented on-base percentage. A precursor to OBA was actually developed and then dropped as early as 1879, and OBA only became widely recognized in the 1950s.

More interesting are the eight proposals for "new" statistics from readers of Contingencies that Bill James reviews.

In a nutshell, these are:

  • The Reliever Effectiveness Ratio by Damian Birnstihl. This is simply a recalculation of ERA that charges a pitcher half a run for each runner he put on base who later scores and half a run for each inherited runner he allows to score after coming in as a reliever. It is a simpler version of what Ari Kaplan did a decade ago, as I blogged about earlier.
  • Rearranging the Starters by Aryeh Bak. This study concludes that rearranging your starting pitchers based on the opponent's starters could gain a couple of extra wins during the season. James points out that this question has been studied before by Dallas Adams and Tom Tippett, and that you have to make so many assumptions that in practice this doesn't work. To me, what is more important is getting your best pitchers on the mound more frequently (hence the four-man rotation), not working the matchups.
  • A New Wrinkle by Rod Keefer. This is a "new" stat called "Run Production Index" (RPI) that takes runs produced (R + RBI - HR) and divides it by at bats. James points out that runs produced has been around since the 1950s and then discusses its flaws. Keefer also proposes calculating "bases advanced per at bat" (BAAB), that is, how many bases each offensive event resulted in. To me this makes some sense, but it is awfully dependent on the other hitters in the lineup.
  • The Three-Year-Ago Correlation by Paul Conlin. Conlin said he found that team winning percentages were more highly correlated with results from three years earlier than with results from any other prior year. James responds with a study that refutes this, showing that the average change in wins from one year to the next is around 10 and that it increases steadily as you compare with seasons further back.
  • Starting Pitchers and Relief Pitchers by Mark Seliber. This is a spreadsheet with the same stat as RPI, as well as new stats for starters and relievers. Nothing new or interesting here.
  • The Efficiency Rating by Myron Kraynyk. This is a more interesting attempt that uses the same idea as BAAB but also considers the total bases available in each plate appearance. The bases advanced and the bases available are each summed over all plate appearances, and the first total is divided by the second to yield an Efficiency Rating. No data is provided, however, and the stat suffers from being very context dependent.
  • The Base Advancement Percentage by Richard T. Newell Jr. This stat is essentially the same as the Efficiency Rating.
  • The Ultimate Baseball Statistic by Spencer M. Gluck. This is essentially another version of Player Win Averages (PWA), developed by the Mills brothers in the late 1960s. James isn't sold on these techniques and says that systems like this, which try to take everything into account, end up producing nothing of value because there are too many unknowns that have to be glossed over. Another way of saying it is that a good baseball analyst contributes to the discussion by answering specific, actionable questions. In summary, James says:

"Good analysis never begins with statistics. Good analysis always begins with a specific question, and a question which is of interest to baseball people, whether they are actuaries or artists or aging scouts. Proposing a system which instantly evaluates everything that every player does is analogous to fixing insurance rates for drivers by attaching a box of sensors to the hood of every automobile and keeping track of how often every driver does something dangerous, and calculating exactly how dangerous that was. It’s not the real world. It’s not practical, and it’s not useful. Maybe, in 50 years, it will be practical or it will be useful, but it’s not now. Our general knowledge is limited by our specific knowledge. Our ability to have an impact on the discussion cannot be larger than our ability to find a question which has an actual answer."
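To make the arithmetic behind two of these proposals concrete, here is a minimal Python sketch of Keefer's Run Production Index and of an efficiency-style rating in the spirit of Kraynyk's and Newell's proposals. The RPI part follows the formula as summarized in the list, (R + RBI - HR) / AB. The efficiency part rests on my own assumption, since the summary doesn't spell out how "possible bases" are counted: the batter can gain up to four bases, and a runner on base b can gain up to 4 - b. The class and function names and all the sample numbers are hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Season totals for one hitter (hypothetical container)."""
    runs: int
    rbi: int
    home_runs: int
    at_bats: int


def runs_produced(line: BattingLine) -> int:
    # R + RBI - HR: a home run credits the batter with both a run and an RBI,
    # so it is subtracted once to avoid counting the same run twice.
    return line.runs + line.rbi - line.home_runs


def run_production_index(line: BattingLine) -> float:
    """Runs produced per at bat, as RPI is described in the post."""
    if line.at_bats <= 0:
        raise ValueError("at_bats must be positive")
    return runs_produced(line) / line.at_bats


@dataclass
class PlateAppearance:
    """One plate appearance (hypothetical container)."""
    bases_advanced: int               # bases gained by the batter and all runners
    runners_on: tuple[int, ...] = ()  # bases occupied beforehand, e.g. (1, 3)


def bases_possible(runners_on: tuple[int, ...]) -> int:
    # ASSUMPTION: the batter can gain up to 4 bases and a runner on base b
    # (1, 2 or 3) can gain up to 4 - b. The proposals may count differently.
    return 4 + sum(4 - b for b in runners_on)


def efficiency_rating(pas: list[PlateAppearance]) -> float:
    """Total bases advanced divided by total bases possible over all PAs."""
    advanced = sum(pa.bases_advanced for pa in pas)
    possible = sum(bases_possible(pa.runners_on) for pa in pas)
    if possible == 0:
        raise ValueError("need at least one plate appearance")
    return advanced / possible


if __name__ == "__main__":
    # Made-up season line: 100 + 110 - 30 = 180 runs produced in 550 at bats.
    line = BattingLine(runs=100, rbi=110, home_runs=30, at_bats=550)
    print(round(run_production_index(line), 3))  # 0.327

    # Made-up PAs: solo homer (4 of 4), single moving a runner from
    # first to third (3 of 7), strikeout with a runner on second (0 of 6).
    pas = [
        PlateAppearance(4),
        PlateAppearance(3, (1,)),
        PlateAppearance(0, (2,)),
    ]
    print(round(efficiency_rating(pas), 3))  # 0.412
```

Note that the efficiency calculation inherits the context problem mentioned above: the possible-bases count for each plate appearance depends entirely on who is already on base, which is largely up to the other hitters in the lineup.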

What also interested me about these "new" statistics is, of course, that they're not new, and that even in this small sample four of the eight people had an idea that was the same as or very similar to someone else's. More to the point, these are the same kinds of ideas that baseball analysts have had going back to F.C. Lane in the 1920s, George Lindsey in the 1950s, Earnshaw Cook, the Mills brothers, Pete Palmer, and on and on. Once again, this seems to call out for a place where this kind of information is collected.

1 comment:

Anonymous said...

Did Bill James correlate World Series victories with various stats, in particular team triples?