Tuesday, October 05, 2004

A Brief History of Run Estimation, Part I

One of the axioms of sabermetrics listed in my post Sabermetrics 101 is:

"The goal of a batter is to help his team score runs, the goal of a defensive player is to prevent runs. Therefore statistics that do not directly measure run production (e.g. batting average) or run prevention (pitcher's wins) are less meaningful than those that do."

As a result, sabermetricians have long been on a quest for the perfect run estimation formula, one that adequately measures an offensive player's contribution. In this series of posts I'll detail the various formulas and how they've evolved over the years, including:

  • Runs Created
  • Batting Runs
  • Estimated Runs Produced
  • Extrapolated Runs
  • Base Runs

Along the way I'll cover some of the derivative formulas used to put these results in context and convert them into rates, and I'll discuss the strengths and weaknesses of each.

As a little background, run estimation formulas are first and foremost counting statistics. In other words, they attempt to estimate, or count, the number of runs a player is responsible for. They are therefore not rate statistics, which measure how often something happens per opportunity, such as how often a batter gets a hit (batting average) or how many bases he accumulates per at bat (slugging percentage).
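To make the counting-versus-rate distinction concrete, here is a minimal Python sketch. The stat line is invented purely for illustration, and the function names are my own. Doubling a player's playing time doubles his counting statistics but leaves his rate statistics unchanged.

```python
def batting_average(hits, at_bats):
    """Rate statistic: hits per at bat."""
    return hits / at_bats


def slugging_percentage(total_bases, at_bats):
    """Rate statistic: total bases per at bat."""
    return total_bases / at_bats


# Hypothetical half-season line (made-up numbers, not a real player).
half_season = {"AB": 250, "H": 75, "TB": 120}

# The same production sustained over twice the playing time.
full_season = {stat: 2 * value for stat, value in half_season.items()}

# Counting statistic: hits double with the extra playing time.
print(half_season["H"], full_season["H"])

# Rate statistics: identical for both lines.
print(batting_average(half_season["H"], half_season["AB"]),
      batting_average(full_season["H"], full_season["AB"]))
print(slugging_percentage(half_season["TB"], half_season["AB"]),
      slugging_percentage(full_season["TB"], full_season["AB"]))
```

A run estimator behaves like the hits column here: more playing time at the same level of production means more estimated runs.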

These formulas can be further classified as linear or non-linear. A linear formula like Batting Runs or Extrapolated Runs assigns a fixed weight to each offensive event and adds up the weighted values to estimate the number of runs those events account for. As a result, a linear formula produces a straight line when plotted against changing values in one of the offensive categories. For example, each additional home run adds the same number of estimated runs no matter how many the player has already hit.
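The straight-line behavior can be sketched in a few lines of Python. The weights below are hypothetical placeholders chosen only to show the shape of a linear estimator. They are not the published Batting Runs or Extrapolated Runs coefficients, and the stat line is invented.

```python
# Hypothetical weights for illustration only; NOT the real Batting Runs
# or Extrapolated Runs coefficients.
ILLUSTRATIVE_WEIGHTS = {
    "1B": 0.5, "2B": 0.8, "3B": 1.1, "HR": 1.4, "BB": 0.3, "OUT": -0.1,
}


def linear_runs(events, weights=ILLUSTRATIVE_WEIGHTS):
    """Estimate runs as the sum of each event count times its fixed weight."""
    return sum(weights[name] * count for name, count in events.items())


# Made-up stat line used as the starting point.
base_line = {"1B": 100, "2B": 30, "3B": 5, "HR": 10, "BB": 60, "OUT": 350}


def runs_with_home_runs(home_runs):
    """Estimated runs for the base line with only the HR count changed."""
    return linear_runs({**base_line, "HR": home_runs})


# The gain from one more home run, measured at three different HR totals.
increments = [
    runs_with_home_runs(hr + 1) - runs_with_home_runs(hr)
    for hr in (10, 20, 40)
]
print(increments)
```

Up to floating-point rounding, every increment equals the home run weight. That constant slope is what makes the plot a straight line.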



Non-linear formulas attempt to model the interaction of offensive events, so they essentially multiply events together to produce the estimate. As a result, when graphed the plotted line is a curve rather than a straight line, because the value of each event depends on the other events around it.
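As a rough illustration of the non-linear case, the Python sketch below uses the widely cited basic form of Runs Created, (H + BB) x TB / (AB + BB). I'm assuming that simplest version here; later refinements of the formula add more terms. The stat line and helper names are invented for the example.

```python
def basic_runs_created(hits, walks, total_bases, at_bats):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (hits + walks) * total_bases / (at_bats + walks)


def runs_with_extra_home_runs(extra_hr, hits=145, walks=60,
                              total_bases=215, at_bats=500):
    """Convert `extra_hr` outs of a made-up stat line into home runs.

    Each converted out adds one hit and four total bases; at bats and
    walks stay fixed.
    """
    return basic_runs_created(
        hits + extra_hr, walks, total_bases + 4 * extra_hr, at_bats
    )


# The gain from one more home run, measured at three different starting points.
increments = [
    runs_with_extra_home_runs(n + 1) - runs_with_extra_home_runs(n)
    for n in (0, 10, 30)
]
print(increments)
```

Unlike the linear case, the increments are not equal. In this sketch each additional home run is worth a little more than the one before it, because the on-base and total-base terms multiply each other.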



In my next post I'll start with perhaps the best-known run estimator, Runs Created.

