The following is the second article in my series on the sabermetric history of run estimation. This article covers Pete Palmer's Batting Runs, a component of the Linear Weights system.
Batting Runs, a linear run estimator, was developed by Pete Palmer in the 1970s and introduced to the world as part of his Linear Weights (LWTS) system in his and John Thorn's 1984 book The Hidden Game of Baseball, which, like Bill James's Baseball Abstracts, is one of the preeminent documents in the history of sabermetrics. Palmer went on to apply his linear weights system to defense and pitching and derive his Total Player Rating (TPR) system, which was tracked in Total Baseball and continues as Batter-Fielder Wins (BFW) and Pitcher Wins (PW) in the 2004 edition of The Baseball Encyclopedia.
However, the history of Batting Runs and Linear Weights actually goes back much further.
As documented by Alan Schwarz in his excellent book The Numbers Game, F.C. Lane, the editor of Baseball Magazine from 1912 to 1937, was actually the pioneer of linear weights. He observed that batting average was an inadequate way of measuring the contribution individual players make to winning baseball games, remarking in 1916,
"Would a system that placed nickels, dimes, quarters, 50-cent pieces on the same basis be much of a system whereby to compute a man's financial resources? And yet it is precisely such a loose, inaccurate system which obtains in baseball..."
So Lane took it upon himself to correct the situation. He kept track of 1,000 hits and their results in order to assign each type of hit a coefficient in an equation he developed. The simple equation was:
Total Run Value = (1B*a)+(2B*b)+(3B*c)+(HR*d)
The values he assigned for a, b, c, and d were .30, .60, .90, and 1.15. The core of Lane's observations of the 1,000 hits was that hits were valuable not only for the obviously different number of bases gained by each, but also for an advancement component that contributed to run creation. Later Lane also assigned a value of .164 to walks, a value now recognized as too low by half but revolutionary for its time in crediting the batter with any value for a walk at all.
It must also be remembered that Lane's innovation came at a time when batting average, made official way back in 1876, was the only way most people had ever evaluated offensive players. It is true that Henry Chadwick developed a stat in the 1860s he called "Total Bases Per Game", which was slugging percentage with a different denominator, but it didn't really catch on, and slugging percentage was not made official in the National League until 1923 and in the American League until 1946.
Lane used his formula to compare Brooklyn first baseman and former batting champion Jake Daubert to Phillies slugger Gavvy Cravath, who had hit 24 homeruns in 1915. Not surprisingly, Lane's analysis showed that Cravath was the more valuable player with a Total Run Value of 79 versus 62 for Daubert.
Lane went on to adjust his formula and eventually settled on the following:
Total Run Value = (1B*.457)+(2B*.786)+(3B*1.15)+(HR*1.55)+(BB*.164)
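Lane's final formula is simply a weighted sum, which a few lines of Python can make concrete. This is a minimal sketch: the coefficients come from the formula above, while the function name and the stat line are hypothetical and do not represent any real player's season.

```python
# Lane's final coefficients, taken from the formula in the text.
LANE_WEIGHTS = {"1B": 0.457, "2B": 0.786, "3B": 1.15, "HR": 1.55, "BB": 0.164}


def lane_total_run_value(stats):
    """Sum each event count times its Lane coefficient (missing events count as 0)."""
    return sum(weight * stats.get(event, 0) for event, weight in LANE_WEIGHTS.items())


# Hypothetical stat line, used only for illustration.
example_line = {"1B": 100, "2B": 30, "3B": 5, "HR": 20, "BB": 60}
print(round(lane_total_run_value(example_line), 2))
```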
Unfortunately, Lane's pioneering work was all but forgotten soon after.
In the mid-1950s George Lindsey, a military officer, listened to and watched around 400 baseball games, and from what he learned he began submitting articles to the statistical journal Operations Research on various aspects of baseball strategy. With the help of his retired father, their combined scoring efforts produced the 1963 article "An Investigation of Strategies in Baseball", another of sabermetrics' founding documents. In that article Lindsey produced the first Run Expectancy table, a table that showed how many runs were expected to score from any of the 24 base/out combinations (I use a similar table in my Big League Pocket Manager application for the Pocket PC to calculate the break-even probabilities for various strategies).
Outs/Bases  Empty  1     2     3     1,2   1,3   2,3   Full
0           0.46   0.81  1.19  1.39  1.47  1.94  1.96  2.22
1           0.24   0.50  0.67  0.98  0.94  1.12  1.56  1.64
2           0.10   0.22  0.30  0.36  0.40  0.53  0.69  0.82
From here it was a simple step to calculate the run expectancy before an offensive event occurred, the run expectancy after, and along with the typical advancement on singles and doubles and the frequency of the base/out combinations (which Lindsey also tracked), compute the run values or weights for each offensive event. Lindsey came up with .41 for singles, .82 for doubles, 1.06 for triples, and 1.42 for homeruns - very similar to what Lane had done 40 years earlier.
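The "before and after" step can be sketched with Lindsey's table. The code below computes the run value of a single event in a single base/out state as the change in run expectancy plus any runs that scored on the play. It is only the first half of the process: Lindsey's published weights also average over how often each state occurs and how runners typically advance, and that frequency data is not reproduced here. The state labels and function names are my own.

```python
# Lindsey's run expectancy table, keyed by (outs, bases occupied).
BASES = ["empty", "1", "2", "3", "1,2", "1,3", "2,3", "full"]
ROWS = {
    0: [0.46, 0.81, 1.19, 1.39, 1.47, 1.94, 1.96, 2.22],
    1: [0.24, 0.50, 0.67, 0.98, 0.94, 1.12, 1.56, 1.64],
    2: [0.10, 0.22, 0.30, 0.36, 0.40, 0.53, 0.69, 0.82],
}
RUN_EXPECTANCY = {
    (outs, bases): value
    for outs, row in ROWS.items()
    for bases, value in zip(BASES, row)
}


def run_expectancy(outs, bases):
    """Expected runs for the rest of the inning; zero once the third out is made."""
    return 0.0 if outs >= 3 else RUN_EXPECTANCY[(outs, bases)]


def event_run_value(before, after, runs_scored):
    """Run value of one event: change in run expectancy plus runs scored on the play."""
    return run_expectancy(*after) - run_expectancy(*before) + runs_scored


# Leadoff single: bases empty, none out -> runner on first, none out.
leadoff_single = event_run_value((0, "empty"), (0, "1"), 0)
# Solo homerun with none out: the state is unchanged but one run scores.
solo_homer = event_run_value((0, "empty"), (0, "empty"), 1)
# Out made with the bases empty and none out.
leadoff_out = event_run_value((0, "empty"), (1, "empty"), 0)
print(round(leadoff_single, 2), round(solo_homer, 2), round(leadoff_out, 2))
```

Note that the same event is worth different amounts in different states, which is why the frequency of each base/out combination matters when boiling everything down to one weight per event.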
Interestingly, Lindsey, like Lane, then used his system to compare a singles hitter, in this case the Tigers' Harvey Kuenn, who had won the 1959 AL batting championship hitting .353, with a homerun hitter, the Indians' Rocky Colavito, who had hit 42 homeruns. This comparison had a bit more riding on it as the two were traded for each other. Colavito came out on top 114.5 to 112.6.
Using Run Expectancy and advancement tables like those calculated by Lindsey is only one way of calculating run values for various offensive events. And this brings us back to Batting Runs and Pete Palmer.
In 1978 Pete Palmer ran a computer simulation of "all major-league games played since 1901." From that simulation Palmer tabulated the frequencies of the offensive events and by assigning advancement values based on observation of 100 World Series games was able to calculate the expected run values for each event. The formula he devised was:
Batting Runs = (.46*1B)+(.80*2B)+(1.02*3B)+(1.40*HR)+(.33*(BB+HBP))+(.30*SB)+(-.60*CS)+(-.25*(AB-H))-(.50*OOB)
What is interesting about this formula first is that it includes hit by pitch (HBP) and stolen bases, and of course that the weights are similar to those calculated by both Lane and Lindsey. Its real import, however, is that for the first time the number of outs the player is responsible for (AB-H, CS, and OOB or "outs on base") is included and given a coefficient. Like other offensive events, outs have a run value; the value is simply negative, since outs decrease the opportunity for scoring runs by either ending an inning or moving the team in that direction. Typically, OOB is difficult if not impossible to find for individuals without play-by-play data, but for teams it is simple to calculate as OOB = H+BB+HBP-LOB-R-CS.
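As a rough illustration, the 1978 formula and the team-level OOB calculation translate directly into code. The weights are the ones given above; the function names, argument names, and both stat lines are hypothetical. OOB defaults to zero for individuals, reflecting the point that it is usually unavailable without play-by-play data.

```python
def outs_on_base(h, bb, hbp, lob, r, cs):
    """Team outs on base: OOB = H + BB + HBP - LOB - R - CS."""
    return h + bb + hbp - lob - r - cs


def batting_runs(singles, doubles, triples, hr, bb, hbp, sb, cs, ab, oob=0):
    """Palmer's Batting Runs using the weights from The Hidden Game formula above."""
    hits = singles + doubles + triples + hr
    return (
        0.46 * singles
        + 0.80 * doubles
        + 1.02 * triples
        + 1.40 * hr
        + 0.33 * (bb + hbp)
        + 0.30 * sb
        - 0.60 * cs
        - 0.25 * (ab - hits)  # batting outs
        - 0.50 * oob          # outs made on the bases
    )


# Hypothetical batter, not a real season.
example_br = batting_runs(
    singles=100, doubles=30, triples=5, hr=20,
    bb=60, hbp=5, sb=10, cs=4, ab=550,
)
# Hypothetical team totals.
example_oob = outs_on_base(h=1400, bb=500, hbp=50, lob=1100, r=750, cs=40)
print(round(example_br, 2), example_oob)
```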
Stolen bases and caught stealing can also be taken out of the Batting Runs formula and be calculated separately as Stolen Base Runs (SBR) or Base Stealing Runs (BSR) as (.30*SB)-(.60*CS). Originally, the value of the stolen base and caught stealing was set at around .20 and -.35 respectively. However, Palmer was convinced by Dave Smith of Retrosheet to increase both the positive and negative impacts of the stolen base on the basis that they occur in situations where games are more in question. In other words, stolen bases are strategically more important and so have a greater impact on wins and losses. Not many people seem to buy this argument since runs and not wins are what is being calculated. Apparently, Palmer agreed and so in The 2004 Baseball Encyclopedia BSR is simply calculated as (.22*SB)-(.38*CS).
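One way to compare the three sets of stolen base weights is the success rate at which a runner breaks even, that is, where the runs gained from steals exactly offset the runs lost to being caught. The sketch below derives that rate from the weights quoted above; the break-even framing and the function names are my own additions rather than something taken from Palmer.

```python
def stolen_base_runs(sb, cs, sb_value=0.30, cs_cost=0.60):
    """Stolen Base Runs: (sb_value * SB) - (cs_cost * CS)."""
    return sb_value * sb - cs_cost * cs


def break_even_rate(sb_value, cs_cost):
    """Success rate p at which p * sb_value equals (1 - p) * cs_cost."""
    return cs_cost / (sb_value + cs_cost)


# The three weight pairs mentioned in the text.
for label, sb_value, cs_cost in [
    ("original", 0.20, 0.35),
    ("Hidden Game", 0.30, 0.60),
    ("2004 Encyclopedia", 0.22, 0.38),
]:
    print(label, round(break_even_rate(sb_value, cs_cost), 3))
```

Under all three sets of weights a runner has to succeed roughly two times in three just to avoid costing his team runs.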
The most important fact about Batting Runs is that, because of the inclusion of negative values for outs, it is a measure of the "net runs produced above average" in a given offensive context. In other words, a Batting Runs value of 55 means that the batter produced 55 runs above what an average batter would have produced given the same opportunities, which means given the same number of outs consumed. Of course, this also means that a player can be assigned negative Batting Runs, indicating that he performed below average. Batting Runs, therefore, cannot be compared with Runs Created without making an adjustment. That adjustment is to reduce the value of the out from around -.25 to -.10 or -.09. The basis for this is straightforward. The value of an out (or any offensive event for that matter) can be thought of as the sum of the value of the out in moving runners over and the value of ending the inning. Using the run environment of 4.3 runs per game (the average runs per game from 1901-1977), each out is worth -.16 runs in terms of its inning-ending value. Subtracting -.16 from -.25 yields a value of -.09 as the value of the out related to moving runners along. By using -.09 as the value for outs, Batting Runs can be compared to Runs Created.
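The arithmetic behind the out adjustment can be checked in a few lines. I am assuming here that the -.16 inning-ending value comes from spreading the 4.3 runs per game over a team's 27 outs; the text does not spell out that derivation, and the function names are hypothetical.

```python
OUTS_PER_GAME = 27  # assumption: a team's outs in a nine-inning game


def inning_ending_value(runs_per_game):
    """Runs lost per out simply because outs are the currency of an inning."""
    return -runs_per_game / OUTS_PER_GAME


def absolute_out_value(above_average_out_value, runs_per_game):
    """Out value for an absolute (Runs Created style) scale.

    Removes the inning-ending component from the above-average out value.
    """
    return above_average_out_value - inning_ending_value(runs_per_game)


print(round(inning_ending_value(4.3), 2))
print(round(absolute_out_value(-0.25, 4.3), 2))
```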
It's also important to keep in mind that technically the Batting Runs formula shown above is valid only for a given offensive context, namely the 4.3 runs per game of 1901-1977. Palmer and Thorn show in The Hidden Game several sets of weights by period (1901-20, 1921-40, 1941-60, and 1961-77). Fortunately, these values are very similar, something Palmer apparently did not expect, thinking that in the "deadball era" the relative value of a stolen base might be significantly greater and homerun smaller (they were but only very slightly, for example the homerun going from 1.36 in the earliest period to 1.42 in the latest and the stolen base going from .20 to .19).
As a result, Palmer was able to present a single formula and use the value of the out to adjust for differences by era. Some out values for different eras as noted in Curve Ball and The Hidden Game are:
-.24 for 1901-1920
-.30 for 1921-1940
-.27 for 1941-1960
-.25 for 1961-1977
For the modern era Palmer recommends that a value of -.25 be used when pitchers' hitting is included (for example in the NL), while a value of -.27 is recommended when the DH is employed, since making an out is more costly when the run environment expands, as it does when pitchers are not hitting.
Batting Runs has also been adjusted slightly throughout the years using different weights. For example, the formula in the 1989 edition of Total Baseball was:
BR = (.47*1B)+(.78*2B)+(1.09*3B)+(1.40*HR)+(.33*(BB+HBP))+(.30*SB)+(-.60*CS)+(-.25*(AB-H))
And the formula in the 2004 edition of The Baseball Encyclopedia applies the value of a single to all hits, since singles are not counted separately, and reduces the weights of the extra base hits accordingly:
BR = (.47*H)+(.38*2B)+(.55*3B)+(.93*HR)+(.33*(BB+HBP))+(.22*SB)+(-.38*CS)-(ABF*(AB-H))
Also included here is ABF, or the "league batting factor". This is essentially a custom "out" value for the league context to ensure that the average batter's Batting Runs equal zero for the given league and year. It is calculated using league totals as:
For example, the ABF in the NL for 2003 was .28 and in the NL for 1968 was .23, since the increased offensive context of 2003 dictates that an out cost a team more potential runs than it did in 1968.
LGF in the calculation of ABF is the league factor designed to increase the number of Batting Runs for leagues deemed inferior to the typical major league. It equals 1 except in these cases:
Union Association (1884) = .8
Federal League (1914-15) = .9
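The 2004 formula can be sketched the same way as the earlier ones. Because the calculation of ABF is not reproduced here, the sketch simply takes ABF as an argument (for example the .28 quoted for the 2003 NL). The function name, argument names, and stat line are hypothetical. The small dictionary at the top shows that adding the all-hits weight back to each extra base hit recovers weights very close to the earlier per-event ones.

```python
HIT_WEIGHT = 0.47
EXTRA = {"2B": 0.38, "3B": 0.55, "HR": 0.93}

# Implied total weight per hit type once the all-hits weight is added back.
IMPLIED = {event: HIT_WEIGHT + bonus for event, bonus in EXTRA.items()}


def batting_runs_2004(h, doubles, triples, hr, bb, hbp, sb, cs, ab, abf):
    """Batting Runs per the 2004 Baseball Encyclopedia formula quoted above.

    abf is the league batting factor, supplied by the caller.
    """
    return (
        HIT_WEIGHT * h
        + EXTRA["2B"] * doubles
        + EXTRA["3B"] * triples
        + EXTRA["HR"] * hr
        + 0.33 * (bb + hbp)
        + 0.22 * sb
        - 0.38 * cs
        - abf * (ab - h)
    )


# Hypothetical batter with 155 hits, evaluated with an ABF of .25.
example_2004 = batting_runs_2004(
    h=155, doubles=30, triples=5, hr=20,
    bb=60, hbp=5, sb=10, cs=4, ab=550, abf=0.25,
)
print({event: round(weight, 2) for event, weight in IMPLIED.items()})
print(round(example_2004, 2))
```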
In reality, the linear weights associated with Batting Runs differ not only by era but also by league, within each league by team, and within each team by position in the batting order. In other words, in order to calculate how many runs an individual player is responsible for, it would be necessary to calculate weights for each offensive event that were particular to his team and position in the batting order. However, because of the complexity of making such calculations, and because creating custom linear weights at the lower levels reduces their usefulness for comparison across teams, leagues, and eras, most sabermetricians use a single formula and adjust the out value based on era or league. For an interesting discussion of creating custom linear weights see Tangotiger's site.
Another area for refinement in the era of Barry Bonds is separating the value of a regular base on balls from an intentional walk, and for that matter hit-by-pitch. The general consensus is that a regular walk has a weight around .31 while an intentional walk is around .18 and a HBP slightly more than a regular walk (since walks occur disproportionately with two outs and when first base is empty).
From the beginning adjustments have been made to Batting Runs. The most obvious, and one that Palmer and Thorn discuss in The Hidden Game, is to take the batter's home park context into consideration. To do so they first calculate the BPF or Batter's Park Factor. This number is based on the number of runs scored in the park versus the number of runs scored in road games and takes into account the fact that hitters don't have to face their own pitchers. BPF is centered on 1, so an above average hitter's park will have a value slightly above 1, such as 1.04, while a pitcher's park will have a BPF under 1, say .96.
In order then to calculate the Adjusted Batting Runs or ABR the following calculation is used:
ABR = BR-((BPF-1)*RPA*PA/BPF)
Here BR is the unadjusted Batting Runs, RPA is the number of runs per plate appearance for the league, and PA is the plate appearances for the batter. For example, if the player had 55.0 batting runs in 700 plate appearances while playing in a pitcher's park with a BPF of .96 in a league like the 2003 National League where .122 runs were scored per plate appearance, the ABR would be 55-((.96-1)*.122*700/.96) = 58.6.
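The park adjustment is easy to verify in code. The sketch below implements the ABR formula as written and runs it for a 55.0 Batting Runs, 700 plate appearance batter in a league with .122 runs per plate appearance, once in a pitcher's park with a BPF of .96 and once in a hitter's park with a BPF of 1.04. The function and argument names are hypothetical.

```python
def adjusted_batting_runs(br, bpf, rpa, pa):
    """ABR = BR - ((BPF - 1) * RPA * PA / BPF)."""
    return br - ((bpf - 1) * rpa * pa / bpf)


# Pitcher's park: the batter is credited with extra runs.
pitchers_park = adjusted_batting_runs(br=55.0, bpf=0.96, rpa=0.122, pa=700)
# Hitter's park: the same raw production is marked down.
hitters_park = adjusted_batting_runs(br=55.0, bpf=1.04, rpa=0.122, pa=700)
print(round(pitchers_park, 1), round(hitters_park, 1))
```

A neutral park (BPF of 1) leaves Batting Runs unchanged, since the whole adjustment term goes to zero.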
A second derivative is a conversion from Batting Runs to wins, or Batting Wins. This statistic is based on Palmer's empirical observation that on average a win is purchased at the cost of 10 extra runs. In other words, if a player contributed 10 Adjusted Batting Runs, then he was worth 1 extra win to his team. Of course, the number of runs per win varies with the league context and can be calculated as:
RPW = 10*Sqrt(RPI)
RPI or Runs per Inning here is the runs scored by both teams per inning. So for a league in which each team scores 4.5 runs per game, the two teams combined score 9 runs per nine innings, or 1 run per inning, the square root of which is 1, which multiplied by 10 equals 10. As a result, in lean offensive times like the 1968 NL the RPW will be around 8.75, while in good offensive times like the 2003 NL it will be closer to 10.5.
It is then a simple matter to divide the Adjusted Batting Runs by RPW to get the Batting Wins. The formula used in the 2004 edition of The Baseball Encyclopedia is:
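Here is a small sketch of the runs per win calculation. It takes "runs per game" to mean runs per team per game, as the 4.5 example implies, and assumes nine-inning games. The function names are hypothetical.

```python
import math

INNINGS_PER_GAME = 9  # assumption: regulation nine-inning games


def runs_per_inning(runs_per_team_game):
    """Runs scored by both teams combined per inning."""
    return 2 * runs_per_team_game / INNINGS_PER_GAME


def runs_per_win(runs_per_team_game):
    """RPW = 10 * sqrt(RPI)."""
    return 10 * math.sqrt(runs_per_inning(runs_per_team_game))


print(runs_per_win(4.5))
```

Because of the square root, RPW moves more slowly than scoring does: doubling the run environment raises the cost of a win by only about 41 percent.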
BW = ABR/(10*Sqrt(RPI+(ABR/G/9)))
In this case the runs per inning of the player is added to the runs per inning for the league to take into consideration the increased or decreased offensive context that the player contributes.
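To see how much the player's own contribution changes the result, the sketch below computes Batting Wins both ways: with the simple league RPW and with the 2004 formula. I am assuming that G is the number of games the player appeared in, which the text does not state explicitly. The function names and example figures are hypothetical.

```python
import math


def batting_wins_simple(abr, rpi):
    """Batting Wins using the league-only runs per win: ABR / (10 * sqrt(RPI))."""
    return abr / (10 * math.sqrt(rpi))


def batting_wins_2004(abr, rpi, games):
    """BW = ABR / (10 * sqrt(RPI + (ABR / G / 9))).

    games is assumed to be the player's games played.
    """
    player_rpi = abr / games / 9  # the player's own runs above average per inning
    return abr / (10 * math.sqrt(rpi + player_rpi))


# Hypothetical batter: 58.6 ABR in 160 games, league RPI of 1.0.
simple = batting_wins_simple(58.6, 1.0)
adjusted = batting_wins_2004(58.6, 1.0, 160)
print(round(simple, 2), round(adjusted, 2))
```

A strong hitter raises the run environment of the games he plays in, so each of his runs buys slightly less than the league rate and the 2004 figure comes out a little below the simple one. For a below-average hitter the adjustment works in the opposite direction.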