
Tuesday, April 20, 2004

Sabermetrics 101

Since I've written a variety of posts about sabermetrics (the name derives from the acronym of the Society for American Baseball Research, or SABR, to which I proudly belong), I thought it would be useful for those not already indoctrinated in the concepts to lay out the fundamental principles, axioms, or truths that sabermetrics has come to mean. Bill James broadly defined sabermetrics as "the search for objective knowledge about baseball." If you have others or disagree with these, please email me. You might also be interested in the Sabermetric Manifesto, which explains some of the axioms listed here in more detail.

These include:

  • Because of the limitations and biases of human thought, significant differences in players' contributions toward winning baseball games are difficult, if not impossible, to judge without the use of statistics. Fortunately, baseball events can be sufficiently individualized, counted, aggregated, and analyzed in large enough sample sizes to overcome these limitations.

  • The reverse of the first axiom: decisions in baseball are often made on the basis of sample sizes that are too small to be predictive. Example: an otherwise poor hitter is 3 for 5 against a certain pitcher over the last 3 years, and so the manager elects to pinch hit him.

  • The number of games a team will win can be predicted from the ratio of the runs it scores to the runs it allows. This is known as the Pythagorean formula.

  • Teams that win more games than they were expected to win (see above) generally regress the following season.

  • The goal of a batter is to help his team score runs; the goal of a defensive player is to prevent runs. Therefore statistics that do not directly measure run production (e.g. batting average) or run prevention (e.g. pitcher wins) are less meaningful than those that do. This is why OPS (on-base average plus slugging percentage) is a more accurate way of measuring an offensive player than batting average.

  • The number of runs an offensive player contributes can be measured more accurately with non-traditional techniques, either by assigning values to the various baseball events and adding up the results (Linear Weights) or by multiplying aspects of run production, including the ability to get on base and the ability to move runners (Runs Created).

  • Because baseball is a game of limited opportunity (27 outs per team in a normal game), the value of an offensive player consists not only in how many runs he produces but in how few outs he consumes while doing so.

  • Both hitters and pitchers peak at age 27 and decline more quickly than is commonly thought. This should impact how players are scouted, developed, and paid. For example, many players by the time they reach free agency are already past their peak performance and so can be expected to decline.

  • There is a quantifiable replacement level for major league performance, at which it makes little sense to pay a player more than the league minimum. See Value Over Replacement Player (VORP).

  • The above axiom leads to the corollary that some 90% of the players at the major league level are replaceable by lower-priced and often younger talent. This is because the distribution of players in the minor and major leagues forms the right tail of a normal distribution. Therefore there will always be a large number of players who are able to replace marginal major leaguers.

  • High school pitchers are a very risky investment, primarily because over time injuries tend to winnow the pool of prospects that actually make it to the major leagues. College pitchers are a better investment since they are being selected from an already smaller pool.

  • High strikeout pitchers have a much higher ceiling in terms of future wins than do low strikeout pitchers of the same age and ability. See this post.

  • Errors, and therefore fielding percentage, are an inadequate way of measuring fielders, both because errors are subjective scoring decisions and because they record only failures. They thus fail to take into account the fact that good fielders cover more ground and therefore record more outs. See Range Factor and Defensive Average.

  • When evaluating baseball statistics, the context must be taken into account, including the ballpark in which the player plays and the run environment (meaning the average number of runs scored in the league for that year). Not contextualizing statistics hides weaknesses (see Castillo, Vinny; see also Park Factors).

  • The concept of the closer, as defined by the Save statistic and to the extent that management is influenced by that statistic, is not a true measure of a relief pitcher's worth.

  • For most pitchers, there is little evidence that they can control, to any large degree, the percentage of balls put into play that turn into outs. This is known as Defense Independent Pitching Statistics (DIPS).

  • Defensive ability is generally overrated because the differences in run prevention between the best and worst fielders are not as large as the differences in run production between the best and worst hitters.

  • Hitters differ in their ability to control the strike zone and therefore to get on base via walks. Because of a bias against walks as passive, walks have traditionally been undervalued.

  • The stolen base is a tactical rather than a general-purpose weapon and therefore decreases in value as the run environment expands.

  • To be beneficial, a base stealer must succeed about two-thirds of the time on average, although that percentage decreases as the score tightens and in the later parts of the game.

  • The sacrifice bunt, except when used by the weakest hitters, does not produce positive offensive results.

  • The hit-and-run is akin to the sacrifice bunt but entails more risk for only a slightly higher reward.

  • There is no evidence for a general or sustained ability to hit in the "clutch." The same applies to most other "splits," including home/road (apart from the general home advantage shared by all players), month by month, and turf vs. grass.

  • There is ample evidence that platoon differences are real and very large.

  • Defensive positions can be arranged on a spectrum from least to most demanding, i.e. [ DH - 1B - LF - RF - 3B - CF - 2B - SS - C ]. Players generally move from right to left on this spectrum throughout their careers. Shifts in the other direction are rare and seldom work.

  • The most important pitch is strike 1 and the most important out is out 1. That is, the chance of a hitter getting on base lessens rapidly as the pitcher stays ahead in the count, and both the odds of a team scoring in an inning and the number of runs it scores are dramatically decreased by retiring the first batter of the inning.

  • Performance at the major league level can be predicted by performance at the minor league level and, to a lesser degree, by performance in other leagues, including college and the Japanese and Mexican leagues.

  • Diving into first base is pointless (ok, ok, this is not a sabermetric conclusion but I had to get it in here)
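The Pythagorean formula and the regression axiom from the list above can be sketched in a few lines. This is a minimal illustration, assuming Bill James's original exponent of 2 (expected winning percentage = RS² / (RS² + RA²)); the `exponent` parameter is exposed because later refinements use values a little below 2. The function names and the sample team are hypothetical.

```python
def pythagorean_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Expected winning percentage: RS^x / (RS^x + RA^x), with x = 2 by default."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored: float, runs_allowed: float, games: int = 162) -> float:
    """Expected wins over a season of `games` games."""
    return pythagorean_win_pct(runs_scored, runs_allowed) * games


def wins_above_expectation(actual_wins: int, runs_scored: float, runs_allowed: float,
                           games: int = 162) -> float:
    """Positive values flag teams that outran their run differential
    and, per the axiom above, are candidates to regress next season."""
    return actual_wins - expected_wins(runs_scored, runs_allowed, games)


if __name__ == "__main__":
    # Hypothetical team: 800 runs scored, 700 allowed, 97 actual wins.
    print(round(expected_wins(800, 700), 1))               # 91.8
    print(round(wins_above_expectation(97, 800, 700), 1))  # 5.2
```

A team like this one, winning about five more games than its runs suggest, is the kind the regression axiom expects to slip back the following year.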
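To see why OPS separates hitters that batting average lumps together, here is a small sketch using the standard definitions OBP = (H + BB + HBP) / (AB + BB + HBP + SF) and SLG = TB / AB. The `BattingLine` class and the two sample hitters are hypothetical, chosen so both hit exactly .300.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    ab: int            # at-bats
    h: int             # hits (all types)
    doubles: int
    triples: int
    hr: int
    bb: int            # walks
    hbp: int = 0       # hit by pitch
    sf: int = 0        # sacrifice flies

    @property
    def total_bases(self) -> int:
        singles = self.h - self.doubles - self.triples - self.hr
        return singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr

    @property
    def avg(self) -> float:
        return self.h / self.ab

    @property
    def obp(self) -> float:
        return (self.h + self.bb + self.hbp) / (self.ab + self.bb + self.hbp + self.sf)

    @property
    def slg(self) -> float:
        return self.total_bases / self.ab

    @property
    def ops(self) -> float:
        return self.obp + self.slg


# Two hypothetical .300 hitters: a walk-averse singles hitter and a patient slugger.
singles_hitter = BattingLine(ab=500, h=150, doubles=0, triples=0, hr=0, bb=20)
slugger = BattingLine(ab=500, h=150, doubles=30, triples=0, hr=25, bb=80)

for name, line in (("singles hitter", singles_hitter), ("slugger", slugger)):
    print(f"{name}: AVG {line.avg:.3f}  OBP {line.obp:.3f}  SLG {line.slg:.3f}  OPS {line.ops:.3f}")
```

Batting average rates the two identically; OPS credits the second hitter for his walks and extra-base hits, which is the run-production point the axiom makes.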
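The two run-estimation approaches named in the list, multiplicative (Runs Created) and additive (Linear Weights), can be contrasted in a short sketch. The Runs Created function uses Bill James's basic formula, (H + BB) × TB / (AB + BB). The linear-weight values are an assumption: they are illustrative numbers roughly in the spirit of Pete Palmer's Batting Runs, and real weights vary by league and era. The two methods also sit on different scales, so their outputs should not be compared number for number.

```python
from typing import Mapping


def runs_created_basic(h: int, bb: int, tb: int, ab: int) -> float:
    """Basic Runs Created: on-base factor times advancement factor over opportunities."""
    return (h + bb) * tb / (ab + bb)


# ASSUMPTION: illustrative run values per event, not an official table.
ILLUSTRATIVE_WEIGHTS: Mapping[str, float] = {
    "single": 0.47,
    "double": 0.78,
    "triple": 1.09,
    "hr": 1.40,
    "bb": 0.33,
    "out": -0.25,
}


def linear_weights_runs(events: Mapping[str, int],
                        weights: Mapping[str, float] = ILLUSTRATIVE_WEIGHTS) -> float:
    """Linear Weights: multiply each event count by its run value and add them up."""
    return sum(weights[event] * count for event, count in events.items())


# Hypothetical season: 450 AB, 100 H (70 1B, 20 2B, 2 3B, 8 HR), 50 BB.
events = {"single": 70, "double": 20, "triple": 2, "hr": 8, "bb": 50, "out": 450 - 100}
tb = 70 + 2 * 20 + 3 * 2 + 4 * 8  # total bases = 148

print(round(runs_created_basic(h=100, bb=50, tb=tb, ab=450), 1))
print(round(linear_weights_runs(events), 2))
```

The negative `out` weight in the additive version is one way of expressing the earlier axiom that outs consumed are a cost, not just a neutral absence of production.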
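The base-stealing break-even rate can be derived from a run-expectancy table: the attempt pays off only when p × RE(success) + (1 − p) × RE(caught) ≥ RE(no attempt). The sketch below solves that equation for p. The run-expectancy values are hypothetical placeholders, not measured figures; real tables depend on league and era, and the break-even rate shifts with the base/out state and the game situation.

```python
def break_even_success_rate(re_before: float, re_success: float, re_caught: float) -> float:
    """Success rate p at which a steal attempt leaves expected runs unchanged.

    Solves p * re_success + (1 - p) * re_caught = re_before for p.
    """
    return (re_before - re_caught) / (re_success - re_caught)


# HYPOTHETICAL run expectancies (expected runs for the rest of the inning).
RE_RUNNER_ON_FIRST_NO_OUTS = 0.85   # before the attempt
RE_RUNNER_ON_SECOND_NO_OUTS = 1.10  # successful steal
RE_BASES_EMPTY_ONE_OUT = 0.27       # caught stealing

rate = break_even_success_rate(
    RE_RUNNER_ON_FIRST_NO_OUTS,
    RE_RUNNER_ON_SECOND_NO_OUTS,
    RE_BASES_EMPTY_ONE_OUT,
)
print(f"{rate:.3f}")  # 0.699
```

With these placeholder numbers the runner needs to succeed roughly 70% of the time, in line with the "about two-thirds" rule of thumb above.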
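Park context can be handled with a simple park factor: runs per game (both teams) at home divided by runs per game on the road. The sketch below is a deliberately simplified version. It assumes half of a player's games are at home, so only half the park effect is applied. Real park factors use multiple seasons and further corrections. All team totals here are hypothetical.

```python
def park_factor(home_rs: int, home_ra: int, home_games: int,
                road_rs: int, road_ra: int, road_games: int) -> float:
    """Ratio of total runs per game at home to total runs per game on the road."""
    home_rate = (home_rs + home_ra) / home_games
    road_rate = (road_rs + road_ra) / road_games
    return home_rate / road_rate


def park_adjust(stat: float, pf: float) -> float:
    """Deflate (or inflate) a counting stat, assuming half the games are at home."""
    return stat / ((1 + pf) / 2)


# Hypothetical hitter-friendly park: 1650 total runs at home vs. 1350 on the road.
pf = park_factor(home_rs=850, home_ra=800, home_games=81,
                 road_rs=700, road_ra=650, road_games=81)
print(round(pf, 3))                   # 1.222
print(round(park_adjust(100, pf), 1))  # 90.0
```

A hitter who drives in 100 runs in such a park looks more like a 90-run producer in neutral surroundings, which is the kind of weakness the context axiom warns gets hidden.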