This weekend I received my copy of the Baseball Prospectus 2005, which I'm reading in preparation for my trip to spring training in a couple of weeks with Ron. I was pleased to find that on pages 511-519 there was an article entitled "Station to Station: The Expensive Art of Baserunning" by James Click.
In his article Click uses a methodology very similar to my baserunning framework to quantify the effects of good and bad baserunning. Like my own framework, Click's looks at three scenarios: runner on first and the batter singles, runner on first and the batter doubles, and runner on second and the batter singles. However, I noted that he apparently did not consider the case where the runner on first scores on a single, a play that happened 58 times in 6,754 opportunities in 2004 (a little less than 1% of the time).
Before showing the results he first discusses his methodology and the "Fundamental Adjustments" that need to be considered when performing the calculations including outfield defenders, park, outs, and batter (a particular hitter might allow runners to advance more frequently by displaying a tendency to hit the ball harder or to particular locations on the field).
He concludes that adjustments needn't be made for outfield defenders since individual fielders seldom field more than one or two balls each season for a particular baserunner in one of the three scenarios. Therefore any effect of the fielder will tend to even out over the number of opportunities a runner has. This seemed obvious to me when developing my own framework and so like Click I did not investigate it further.
Click does, however, find that the park needs to be taken into account. I hinted at this in my analysis of 2003 and 2004 since the Rockies, who play in the largest outfield in baseball, led the league in both 2003 and 2004 in the number of extra bases gained per opportunity. Click wrote an article on computing these park factors on the Baseball Prospectus web site. Unfortunately, I'm not a subscriber and so can't comment on the article except to say that like park factors calculated for hitting and pitching he uses a three year average and he uses a methodology that looks both at what the visiting teams do in the park and how the home team differs on the road. This seems to me like a solid approach.
Click also takes into account the number of outs in each scenario. As I documented, the advancement percentage changes dramatically with the number of outs. However, as we'll see in a moment, Click does not base his analysis on incremental bases but instead uses run expectancy tables that include the number of outs. This was one of the two approaches I suggested could be used to quantify baserunning in terms of runs. The other was simply to assign a run value to each incremental base, which I did for ease of calculation.
Finally, Click also discounts the effect of the batter on how many bases baserunners advance by looking for correlations over pairs of seasons. He finds a low correlation and an essentially random distribution of advance rates for baserunners and therefore concludes that batters have little if any influence on the number of bases runners advance on their hits. I was wondering about just this question when developing my framework but came to the same conclusion intuitively. I'm glad to see a more rigorous investigation however.
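To make the season-pair test concrete, here is a minimal sketch of how such a correlation could be computed. It is not Click's code. The function name and the sample advance rates are hypothetical, and I am assuming a plain Pearson correlation, which the article may or may not have used.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)

if __name__ == "__main__":
    # Hypothetical data: for each batter, the share of runners who took an
    # extra base on his hits in two consecutive seasons (same batter order).
    season_one = [0.31, 0.27, 0.35, 0.22, 0.29]
    season_two = [0.26, 0.33, 0.28, 0.30, 0.25]
    print(round(pearson(season_one, season_two), 3))
```

A value near zero across many batters would support the conclusion that the batter has little persistent influence on how far runners advance.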
In my framework I also take into account the handedness of the batter hitting behind the runner as well as the fielder who fielded the ball. When calculating incremental bases this is necessary since a single hit to left field with 1 out, for example, advances a runner from first to third about 14% of the time, while a single hit to right field does so 39% of the time. Click needn't take either of these into account since they are subsumed into the run expectancy table.
One other difference in Click's approach is that he only looks at scenarios where the base in front of the runner is not occupied (which turns out to be about a quarter of the time). In my framework I included both; however, I'm now persuaded that I should have used only the "empty base" scenarios, and so the calculations shown below do so.
To perform the calculation, Click computes the difference in run expectancy between the initial and final states for each runner in each situation and sums them. He then subtracts from this the number of runs that would be expected given the situations in which the runner found himself. He calls the former Equivalent Base Runs (EqBR) and the latter Player Base Runs (PBR). He finds that Matt Holliday of the Rockies led the league with an EqBR of 5.0, while Rafael Furcal was close behind at 4.9. At the bottom of the spectrum, Mike Piazza was at -4.7 and A.J. Pierzynski was close behind at -4.4.
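The calculation just described can be sketched roughly as follows. This is my reading of the method, not Click's actual code. The run expectancy values are illustrative placeholders rather than the table published in the article, the function and situation names are hypothetical, and two details are my own assumptions: the expected value is taken as the average over all runners in the same situation, and a run that scores is credited as one run plus the run expectancy of the new base-out state.

```python
from collections import defaultdict

# Illustrative run expectancy values keyed by (bases occupied, outs).
# Placeholders only, NOT the table published in Click's article.
RUN_EXP = {
    ("1--", 1): 0.52,  # runner on first
    ("-2-", 1): 0.68,  # runner on second
    ("12-", 1): 0.91,  # runners on first and second
    ("1-3", 1): 1.19,  # runners on first and third
}

def play_value(start, end, runs_scored):
    """Change in run expectancy on one play, plus any runs that scored."""
    return runs_scored + RUN_EXP[end] - RUN_EXP[start]

def incremental_base_runs(plays):
    """plays: list of (runner, situation, start_state, end_state, runs_scored).

    Returns each runner's total value above the average value observed
    for all runners in the same situation (an assumed baseline).
    """
    values_by_situation = defaultdict(list)
    for runner, situation, start, end, runs in plays:
        values_by_situation[situation].append(play_value(start, end, runs))
    expected = {s: sum(v) / len(v) for s, v in values_by_situation.items()}

    totals = defaultdict(float)
    for runner, situation, start, end, runs in plays:
        totals[runner] += play_value(start, end, runs) - expected[situation]
    return dict(totals)

if __name__ == "__main__":
    # Two hypothetical runners on first with one out when the batter singles:
    # "A" reaches third, "B" stops at second.
    plays = [
        ("A", "first_single_1out", ("1--", 1), ("1-3", 1), 0),
        ("B", "first_single_1out", ("1--", 1), ("12-", 1), 0),
    ]
    print(incremental_base_runs(plays))
```

Because the baseline is an average over the same plays, the totals across all runners sum to zero, so a positive number means a runner gained more than a typical runner would have in the situations he faced.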
To try to reproduce Click's work, I used the run expectancy table published in the article to calculate the relative differences as described by Click and plugged them into my software for 2004. In my approach I did not credit the runner if he took only the expected number of bases (for example, second base on a single when the runner is on first). I also calculated the expected number of runs for each situation and subtracted this from the actual number of runs. I call the result Incremental Base Runs (IBR). Keep in mind that my calculations do not include an adjustment for park effects. The top ten were:
Reed Johnson TOR 4.97
Vernon Wells TOR 4.92
Johnny Damon BOS 4.45
Rafael Furcal ATL 4.10
Rocco Baldelli TBA 3.71
Brad Wilkerson MON 3.14
Aaron Miles COL 3.08
Scott Rolen SLN 3.07
Edgar Renteria SLN 3.02
Matt Holliday COL 2.99
Of Click's top 10 I have six represented here. On the bottom of the scale I show:
Bill Mueller BOS -5.23
Mike Piazza NYN -4.54
David Ortiz BOS -3.83
Ross Gload CHA -3.79
Manny Ramirez BOS -3.65
Carlos Delgado TOR -3.45
Bill Hall MIL -3.44
Mike Lieberthal PHI -3.40
Jim Thome PHI -3.36
Craig Biggio HOU -3.36
Here I have only three of Click's bottom 10.
Overall, although I wasn't able to fully reproduce Click's data, this does validate that the difference between the best and worst baserunners in these three scenarios over the course of a season is roughly ten runs, or the equivalent of a single win. My failure to reproduce his results is certainly related to his application of park factors and perhaps to a different approach to calculating some of the run expectancy values (for example, if a runner scored I credited him with a run expectancy of one plus the new base-out situation). In addition, I find that my software does not detect quite as many opportunities for some baserunners as Click shows in his table. For example, Matt Holliday is credited with 39 opportunities by Click while I show him with 37, which may be accounted for by the different sources of our data.
Even so, in the end (and contrary to the title of Click's article) even assuming that the gains were perfectly correlated from season to season (which they aren't as I showed) it probably does not benefit a team much to make baserunning a major factor when making personnel decisions. In other words, this kind of analysis is not all that actionable. However, ceteris paribus (all things being equal), there is a small edge to be gained through this knowledge.
On a side note, looking through my software again while writing this article, I found a bug that more than doubled the opportunities credited to players who played for two teams in the same season. For example, in my 2004 post I show Carlos Beltran as having 134 opportunities and garnering 226 bases. In reality, with no runners on in front of him, he had 27 opportunities with the Royals and 24 with the Astros, and totaled 86 bases with an IBR of 1.75.
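As a minimal sketch of one guard against this kind of over-counting, the snippet below counts each (player, team) stint exactly once before summing per player. It is not my actual software, and it does not claim to show the real cause of the bug. The row layout and function name are hypothetical, and the repeated row in the example is an assumed stand-in for whatever produced the inflated totals.

```python
from collections import defaultdict

def total_opportunities(stints):
    """stints: iterable of (player, team, opportunities), one row per stint.

    Each (player, team) pair is counted once, so a row that shows up
    more than once cannot inflate the player's season total.
    """
    seen = set()
    totals = defaultdict(int)
    for player, team, opportunities in stints:
        key = (player, team)
        if key in seen:
            continue  # repeated stint row: already counted
        seen.add(key)
        totals[player] += opportunities
    return dict(totals)

if __name__ == "__main__":
    rows = [
        ("Carlos Beltran", "KCA", 27),
        ("Carlos Beltran", "HOU", 24),
        ("Carlos Beltran", "KCA", 27),  # assumed duplicate row
    ]
    print(total_opportunities(rows))
```

With the Beltran figures above, the two stints add up to 27 + 24 = 51 opportunities, regardless of how many times a stint row is repeated.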