
Wednesday, November 10, 2004

Measuring Baserunning: A Framework

Who says there's an unemployment problem in this country? Just take the five percent unemployed and give them a baseball stat to follow.
--Outfielder Andy Van Slyke

In my previous two posts (here and here) I laid the groundwork for evaluating the baserunning of teams and players using play-by-play data from 2003. In the second post particularly, I showed the percentage of times players take the expected, +1, +2 number of bases in various situations and how often they get thrown out.

The Questions
Now I'm ready to make a first attempt at developing a baserunning framework in order to answer three related questions:

a) What player helped (or hurt) his team the most with his baserunning?
b) What team gained the most from smart or good baserunning in 2003?
c) What is the quantitative difference between good and bad baserunners?


Note that although this is my first attempt, I'm putting it out in public in order to get some feedback, and I certainly don't claim that this is the best method to use. I'm sure there are plenty of holes and problems. The two most pressing are that the sample sizes for a single season may not be large enough to differentiate ability from luck, and that who hits after you has a large say in how many bases you advance. The former may be insurmountable with the limited data set I have, although I'll try to correct for the latter, as you'll see.

The Framework
The foundation for my baserunning framework is the table discussed in my previous post. You'll remember that it showed how often runners advance in various situations. For example, with a runner on first and nobody out when the batter singles to left field, the odds are:

Typ     +1      +2     OA
84.5%   14.1%   0.6%   0.7%

In other words, 84.5% of the time the runner stops at second, 14.1% of the time he advances to third, and 0.6% of the time he scores, while 0.7% of the time he is thrown out on the bases. Using this set of percentages one can calculate the average number of bases advanced in this situation by multiplying the percentages by the bases gained. In this case (.845*1)+(.141*2)+(.006*3)-(.007*1) = 1.14. So when this event occurs a typical runner will advance 1.14 bases. Since this is the average across both leagues (I assumed it wouldn't be necessary to separate the leagues since there is significant overlap with interleague play, but more on that later), I call this Expected Bases (EB). The same calculation can then be done for the other 26 scenarios in the table (I did not use the Runner on 2nd, Batter Doubles scenario in the calculations that follow since only one runner in all of 2003 was thrown out in that situation: the A's Mark Ellis). When this is done it turns out that the highest number of Expected Bases for any scenario is 1.86, which occurs with a runner on 2nd and 2 outs when the batter singles. The lowest number of Expected Bases is 1.14, for both the scenario given above and the same one with 1 out.
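As a minimal sketch of the arithmetic above, here is the Expected Bases calculation in Python. The function name and argument layout are my own illustrative assumptions; only the four percentages come from the table. The base values (1, 2, 3) apply to this runner-on-first single scenario and would differ for doubles.

```python
# Sketch of the Expected Bases (EB) calculation for one scenario.
# Names are illustrative assumptions; only the percentages come from the post.

def expected_bases(p_typical, p_plus1, p_plus2, p_out, typical_bases=1):
    """Average bases gained per opportunity.

    The typical advance is worth `typical_bases`, +1 and +2 are worth one
    and two bases more, and an out on the bases costs one base.
    """
    return (p_typical * typical_bases
            + p_plus1 * (typical_bases + 1)
            + p_plus2 * (typical_bases + 2)
            - p_out * 1)

# Runner on first, nobody out, batter singles to left field.
eb = expected_bases(0.845, 0.141, 0.006, 0.007)
print(round(eb, 2))  # 1.14
```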

It should be noted that for the total calculations below I also included singles fielded by other positions, so the actual number of scenarios is greater than 27. I found that shortstops and second basemen, for example, field a significant number of singles and, to a lesser extent, doubles, and that the typical number of bases advanced is similar to those fielded by outfielders. There were some plays where 0 was recorded as the fielder, and those were not considered.

As you probably anticipated, one can then match up the baserunning situations for individual teams and players in order to compare the actual with the Expected Bases in each scenario. For example, Carlos Beltran of the Royals was at first base 9 times in 2003 when a batter singled to left field with 2 outs. In those situations he advanced to third twice and to second the other seven times. As a result he gained 11 bases. With those 9 opportunities he could have been expected to gain 10.39 bases (roughly 1.15*9) given the league average. As a result, he's credited with a positive .61 bases for this scenario, which I'm calling Incremental Bases (IB). When calculated for all of Beltran's opportunities in all scenarios we get a matrix like so, where R1BD = Runner on 1st, Batter Doubles, Opp is the number of opportunities in each scenario, EB is Expected Bases, and IB is the Incremental Bases gained.

Sit   Outs  Fielded  Opp  Bases  EB     IB
R1BD  0     9        2    4      4.51   -.51
R1BD  1     7        4    11     8.95   2.04
R1BD  1     8        1    2      2.50   -.50
R1BD  2     8        2    6      5.45   .54
R1BS  0     3        1    1      1.02   -.02
R1BS  0     6        1    1      1.07   -.07
R1BS  0     7        1    2      1.13   .86
R1BS  0     8        1    1      1.28   -.28
R1BS  0     9        1    2      1.36   .63
R1BS  1     7        7    9      7.96   1.03
R1BS  1     8        4    5      5.11   -.11
R1BS  1     9        5    5      6.84   -1.8
R1BS  2     7        9    11     10.3   .60
R1BS  2     9        5    9      7.47   1.52
R2BS  0     7        2    3      2.72   .27
R2BS  0     8        1    2      1.63   .36
R2BS  0     9        4    6      5.62   .37
R2BS  1     3        3    3      3.58   -.58
R2BS  1     7        4    8      5.61   2.38
R2BS  1     8        3    6      4.90   1.09
R2BS  1     9        3    4      4.47   -.47
R2BS  2     4        2    2      2.05   -.05
R2BS  2     7        1    2      1.69   .30
R2BS  2     8        4    8      7.47   .52
R2BS  2     9        1    2      1.74   .25

So when all of these are summed we find that Beltran, in 72 opportunities, gained 115 bases. He was expected to gain 106.7, which puts him 8.36 bases above expected. What I like about this method is that it takes into consideration three context dependencies for the runner.
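To make the summation concrete, here is a rough sketch that totals Beltran's rows. The tuple layout and variable names are illustrative assumptions on my part. The EB values are copied from the table, which shows them truncated (I used the 10.39 quoted in the text for the 2-out single to left), so the totals land within a few tenths of the 106.7 and 8.36 quoted above rather than matching them exactly.

```python
# (opportunities, bases gained, Expected Bases) per row of Beltran's table.
# Layout and names are illustrative; EB values are truncated in the post,
# so the summed EB and IB are approximate.
BELTRAN_ROWS = [
    (2, 4, 4.51), (4, 11, 8.95), (1, 2, 2.50), (2, 6, 5.45),   # R1BD
    (1, 1, 1.02), (1, 1, 1.07), (1, 2, 1.13), (1, 1, 1.28),    # R1BS, 0 outs
    (1, 2, 1.36), (7, 9, 7.96), (4, 5, 5.11), (5, 5, 6.84),    # R1BS, 0 and 1 out
    (9, 11, 10.39), (5, 9, 7.47),                              # R1BS, 2 outs
    (2, 3, 2.72), (1, 2, 1.63), (4, 6, 5.62),                  # R2BS, 0 outs
    (3, 3, 3.58), (4, 8, 5.61), (3, 6, 4.90), (3, 4, 4.47),    # R2BS, 1 out
    (2, 2, 2.05), (1, 2, 1.69), (4, 8, 7.47), (1, 2, 1.74),    # R2BS, 2 outs
]

total_opps = sum(opp for opp, _, _ in BELTRAN_ROWS)
total_bases = sum(bases for _, bases, _ in BELTRAN_ROWS)
total_eb = sum(eb for _, _, eb in BELTRAN_ROWS)

# Incremental Bases: actual bases gained minus Expected Bases.
total_ib = total_bases - total_eb

print(total_opps, total_bases)
print(round(total_eb, 1), round(total_ib, 1))
```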

First, the handedness of the batters behind the baserunner is accounted for by looking at the fielder who fielded the hit. So if Mike Sweeney, a right-handed hitter, hits behind Carlos Beltran, one would naturally assume that Beltran will have fewer opportunities to go from first to third because Sweeney is right-handed. Beltran will not be punished in this system since we're comparing the number of bases he gained against the expected bases for the scenarios he was actually involved in. This system does not, however, control for how hard the batter hit the ball (which is possible given that there are codes in the data indicating line drive, fly ball, grounder) or park effects (Fenway Park might tend to decrease advancement to third on singles to left).

Second, this system takes into consideration the number of outs. This is important since we know from the table shown in the previous post that with two outs the probability of being able to advance extra bases often doubles. With this system Beltran does not get additional credit if he happens to be on base a lot with 2 outs.

And most importantly, because each player will get a different number of opportunities, both because of his own ability to get on base and because of the abilities of the batters following him, the sum of the bases gained can be divided by the Expected Bases to yield an Incremental Base Percentage, or IBP. For Beltran that number is 1.08, which ranks him 56th among the 331 players with more than 20 opportunities in 2003, largely vindicating his reputation as an above average baserunner. In other words, Beltran gained 8% more bases than would have been expected given his opportunities.
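The IBP ratio is simple enough to sketch in a few lines. The function name is an illustrative assumption; Beltran's Expected Bases of 106.64 is backed out from his 115 bases and 8.36 Incremental Bases, and Miguel Olivo's figures come from the leaders table later in this post.

```python
# Sketch of Incremental Base Percentage (IBP): bases gained / Expected Bases.
# The function name is an illustrative assumption.

def incremental_base_percentage(bases, expected_bases):
    """Return bases gained as a ratio of Expected Bases (1.00 = league average)."""
    if expected_bases <= 0:
        raise ValueError("expected_bases must be positive")
    return bases / expected_bases

beltran_ibp = incremental_base_percentage(115, 106.64)
olivo_ibp = incremental_base_percentage(43, 35.45)

print(round(beltran_ibp, 2))  # 1.08
print(round(olivo_ibp, 2))    # 1.21
```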

The Results
This calculation can then be run for all players and teams. The leaders in IBP for 2003 (more than 20 opportunities) are (you can find the complete Excel spreadsheet here):
	

Player Opp Bases EB IB IBP OA IBR
Miguel Olivo 26 43 35.45 7.55 1.21 0 2.49
Shane Halter 20 34 28.41 5.59 1.20 0 1.85
Chone Figgins 33 54 45.37 8.63 1.19 0 2.85
G. Matthews Jr. 50 89 75.75 13.25 1.17 0 4.37
Brian Roberts 63 104 89.37 14.63 1.16 0 4.83
Randy Winn 63 109 93.76 15.24 1.16 0 5.03
Denny Hocking 24 40 34.57 5.43 1.16 0 1.79
B. Phillips 31 53 45.85 7.15 1.16 1 2.27
Omar Vizquel 31 54 46.74 7.26 1.16 0 2.39
Rey Sanchez 35 62 53.93 8.07 1.15 0 2.66

While the leaders in total Incremental Bases are:

Player Opp Bases EB IB IBP OA IBR
Raul Ibanez 76 129 113.24 15.76 1.14 0 5.20
Randy Winn 63 109 93.76 15.24 1.16 0 5.03
Brian Roberts 63 104 89.37 14.63 1.16 0 4.83
Marcus Giles 74 122 108.29 13.71 1.13 0 4.53
Orlando Cabrera 67 116 102.61 13.39 1.13 0 4.42
G. Matthews Jr. 50 89 75.75 13.25 1.17 0 4.37
Luis Castillo 92 148 135.55 12.45 1.09 0 4.11
Albert Pujols 68 117 105.54 11.46 1.11 1 3.69
Derek Jeter 65 112 100.61 11.39 1.11 1 3.67
Todd Helton 84 142 131.16 10.84 1.08 0 3.58
Melvin Mora 59 99 88.27 10.73 1.12 0 3.54

In perusing the leaders in IBP and IB (we'll get to IBR in a moment) you do get the impression that these measures make sense. The leaders in both lists tend to be those players we think of as fast and/or good baserunners. Even Larry Walker, not a particularly fast man but often mentioned as a good baserunner, comes in 33rd out of 331, while players perceived as bad baserunners, such as Moises Alou at 277th and Ken Harvey at 289th, or simply slow (John Olerud at 321st and Edgar Martinez at 312th), are near the bottom. Although the leaders in IB also reflect more opportunities, they seem to be pretty indicative of good baserunners, with Raul Ibanez and Randy Winn leading the list.

Of course, I say "tend to" because a pair of catchers, Miguel Olivo and Ben Petrick, are the IBP leaders. This can be explained, however, by the fact that they had 26 and 22 opportunities respectively, very near the cutoff, and in the case of Olivo, he scored twice from first base on singles with two outs. Petrick was simply more consistent overall and scored from second all eight times he was there when a batter singled. Neither one was thrown out. This also suggests that 20 opportunities may be too low a threshold.

Another interesting case is the Tigers' Alex Sanchez, a speedy man who often bunts for hits and who stole 52 bases in 2003. His IBP is only .90, ranking him 286th. A quick look reveals that while he's fast, he also takes lots of chances and was thrown out eight times in 84 opportunities, the most in the league.

So in answer to question (a) above we can say that Raul Ibanez helped his team the most with his baserunning, although Chone Figgins, Randy Winn, Brian Roberts, and Marlon Anderson are all right up there.

On the other side of the coin, Mark Bellhorn is at .66 IBP, good for 331st place. Bellhorn's poor performance was highlighted during his time with the Cubs in 2003 by his getting thrown out three times in twelve opportunities and only garnering 6 bases out of an expected 17. Some of this may be attributed to "Waving" Wendell Kim, as I'll discuss below. With Chicago his IB was -10.92 and his IBP .35. His bad baserunning continued to some degree with the Rockies, where his IB was -1.11 and his IBP .94.

From a team perspective the leaders in IBP and IB were:

Team Opp Bases EB IB IBP OA IBR
COL 572 912 877.75 34.25 1.04 11 10.31
BAL 635 972 944.07 27.93 1.03 9 8.41
OAK 572 902 880.76 21.24 1.02 13 5.84
ANA 597 909 888.03 20.97 1.02 9 6.11
ATL 659 1005 982.46 22.54 1.02 14 6.18
CLE 551 850 831.83 18.17 1.02 15 4.65
MIN 626 959 944.88 14.12 1.01 15 3.31
SDN 594 900 887.49 12.51 1.01 6 3.59
NYN 521 778 769.58 8.42 1.01 13 1.61
KCA 681 1015 1004.02 10.98 1.01 12 2.54

As you can see, Colorado had the highest IB, followed by Baltimore. On the other end, the Cubs had an IBP of .95 and an IB of -42.56. Using this we can tentatively answer question (b) by naming Colorado, Baltimore, Oakland, and Anaheim as good baserunning teams. For question (c), the difference between a great baserunning team and a bad one appears to be on the order of 75 or so bases per season (Colorado's 34.25 against the Cubs' -42.56). It would be interesting to compare the 2004 numbers to see if there is any trend here and if the Cubs were justified in firing Kim.

The issue that this immediately raises is how much of Mark Bellhorn's poor performance can be attributed to his third base coach and how much to himself. As I showed earlier, his performance definitely improved with the Rockies, as did that of Jose Hernandez, whose IBP was .96 with the Cubs and 1.05 and 1.08 with the Rockies and Pirates respectively, although he had only four opportunities with the Cubs, far too few to say anything. So whether there is a team bias at work, and how large it may be, is unknown.

From the team numbers it also appears there may be a league bias. Nine of the bottom ten teams are from the NL while six of the top ten teams are from the AL. I'll have to rerun the numbers to see if the probabilities are significantly different between the AL and NL, but my assumption was that pitchers, while poor hitters, would not be significantly poorer in their baserunning ability. This may be incorrect, or it could be that NL third base coaches are much more cautious with pitchers on the bases, or that they take more chances when pitchers are coming up. Or a combination of all three.
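One quick way to check the league split is to tally leagues over the team ranking. This is only a sketch: the ordering below is the IBR ranking from the full team table later in this post, the league assignments are the 2003 alignments, and the variable and function names are illustrative assumptions.

```python
# Tally AL vs. NL teams in the top and bottom ten of the team ranking.
# Names are illustrative; team codes and order follow the full team table
# in this post (sorted by IBR), leagues follow the 2003 alignment.
AL_TEAMS = {"ANA", "BAL", "BOS", "CHA", "CLE", "DET", "KCA",
            "MIN", "NYA", "OAK", "SEA", "TBA", "TEX", "TOR"}

RANKED_BY_IBR = [
    "COL", "BAL", "ATL", "ANA", "OAK", "CLE", "SDN", "MIN", "KCA", "NYN",
    "SLN", "TEX", "CHA", "NYA", "DET", "TOR", "FLO", "SEA", "PIT", "TBA",
    "CIN", "MON", "BOS", "HOU", "SFN", "LAN", "ARI", "PHI", "MIL", "CHN",
]

def league_counts(teams):
    """Count how many of the given teams are in each league."""
    al = sum(1 for team in teams if team in AL_TEAMS)
    return {"AL": al, "NL": len(teams) - al}

top_ten = league_counts(RANKED_BY_IBR[:10])
bottom_ten = league_counts(RANKED_BY_IBR[-10:])
print("top ten:", top_ten)
print("bottom ten:", bottom_ten)
```

A tally like this only describes the split; it says nothing about whether the difference is larger than chance would produce.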

Next Steps
So where to go next? It seems to me that the next logical step is to translate IB into a number of runs gained or lost by individuals and teams. Two possible ways to do this occur to me.

One way would be to assign weights to the outs and advancements and simply sum them. For example, in the linear weights formula an out is weighted at approximately -.09 runs (see my post on Batting Runs for a discussion of why -.09 instead of -.25) while a base gained from an intentional walk is weighted at .33. Applying .33 to each Incremental Base and -.09 to each out on the bases (OA), one can calculate an IBR (Incremental Base Runs) and see that the Rockies gained 10.31 runs while the Cubs lost 15.84 runs.

Team Opp Bases EB IB IBP OA IBR
COL 572 912 877.75 34.25 1.04 11 10.31
BAL 635 972 944.07 27.93 1.03 9 8.41
ATL 659 1005 982.46 22.54 1.02 14 6.18
ANA 597 909 888.03 20.97 1.02 9 6.11
OAK 572 902 880.76 21.24 1.02 13 5.84
CLE 551 850 831.83 18.17 1.02 15 4.65
SDN 594 900 887.49 12.51 1.01 6 3.59
MIN 626 959 944.88 14.12 1.01 15 3.31
KCA 681 1015 1004.02 10.98 1.01 12 2.54
NYN 521 778 769.58 8.42 1.01 13 1.61
SLN 611 940 931.92 8.08 1.01 14 1.41
TEX 550 825 822.77 2.23 1.00 14 -0.52
CHA 562 865 868.97 -3.97 1.00 8 -2.03
NYA 638 960 960.81 -0.81 1.00 20 -2.07
DET 446 639 642.49 -3.49 0.99 11 -2.14
TOR 656 1004 1007.05 -3.05 1.00 13 -2.18
FLO 560 823 829.39 -6.39 0.99 15 -3.46
SEA 664 976 983.03 -7.03 0.99 13 -3.49
PIT 585 867 877.20 -10.20 0.99 13 -4.54
TBA 603 894 906.19 -12.19 0.99 18 -5.64
CIN 508 754 767.83 -13.83 0.98 14 -5.82
MON 570 853 868.26 -15.26 0.98 14 -6.29
BOS 667 1011 1027.14 -16.14 0.98 12 -6.41
HOU 610 922 937.93 -15.93 0.98 18 -6.88
SFN 579 846 864.84 -18.84 0.98 12 -7.30
LAN 514 757 778.87 -21.87 0.97 10 -8.12
ARI 580 856 880.05 -24.05 0.97 10 -8.84
PHI 626 912 947.43 -35.43 0.96 15 -13.04
MIL 524 750 786.43 -36.43 0.95 17 -13.55
CHN 537 766 808.56 -42.56 0.95 20 -15.84
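The IBR column in the table above can be reproduced with a short sketch. The .33 and -.09 weights are the ones quoted from the linear weights formula, and the 10 runs per win is the rough conversion used in this post; the constant and function names are illustrative assumptions.

```python
# Sketch of Incremental Base Runs (IBR) using linear weights.
# Weights come from the post; names are illustrative assumptions.
RUNS_PER_BASE = 0.33   # value of one base gained
RUNS_PER_OUT = -0.09   # value of one out made on the bases
RUNS_PER_WIN = 10.0    # rough runs-to-wins conversion

def incremental_base_runs(ib, outs_advancing):
    """Convert Incremental Bases and outs on the bases (OA) into runs."""
    return RUNS_PER_BASE * ib + RUNS_PER_OUT * outs_advancing

col = incremental_base_runs(34.25, 11)    # Colorado: IB 34.25, OA 11
chn = incremental_base_runs(-42.56, 20)   # Cubs: IB -42.56, OA 20

print(round(col, 2), round(chn, 2))            # 10.31 -15.84
print(round((col - chn) / RUNS_PER_WIN, 1))    # 2.6 wins between best and worst
```

Note that every out gets the same weight here regardless of the base where it was made.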

From an individual perspective Raul Ibanez leads with 5.20 IBR while Geoff Jenkins is last with -6.48 (he had an IBP of .80 in 53 opportunities). Looking at the spread this analysis indicates that good baserunning teams pick up about a win per year (assuming a win is purchased at the cost of 10 or so runs) over average teams and somewhat less than three wins over poor baserunning teams while an individual may be responsible for somewhat less than an extra win with his baserunning.

A second technique that could be used is to look at the run expectancy value for each situation before and after the play and calculate the difference. To me this makes a good deal of sense since it will tend to weight the outs more properly and give more credit for actually scoring a run than for simply advancing. A weakness of IBR is that an out at second base is treated the same as an out at the plate. I haven't yet run those numbers but may do so in the future. There is an additional problem with doing it this way, however. The presence of runners on the bases ahead of the runner we're analyzing will change the run expectancy, even though the baserunner in no way controls what happens to those runners.

What's Missing?
I'm sure as you've read this you've thought of several things that might be included. Here is what I've identified.

1) This framework only includes three basic situations (runner on first batter singles, runner on second batter singles, and runner on first batter doubles). The situations could be expanded by looking at advancement on groundballs (so called "productive outs").

2) To get a complete view of an individual's baserunning, scoring on sacrifice flies, pickoffs, advancing on sacrifice hits, stolen bases, and even defensive indifference should be taken into account.

3) While some of the context is accounted for here, much else is not. For example, what if four of the eight times Alex Sanchez was thrown out on the base paths he was the tying run with two outs in the bottom of the ninth? Is it reasonable to punish him as severely as a guy who gets thrown out at third base with his team down 3-0 in the third? Obviously not.

4) The framework makes no allowances for the base ahead of the runner being occupied. This particularly affects hitters who are intentionally walked a lot, like Barry Bonds. For Bonds, second was occupied 30 of the 47 times (64%) a batter singled with him on first, against the league average of 29.2%. In these circumstances the runner will find it more difficult to take an extra base, which will artificially hold down his IB and IBP values. The reason I didn't exclude these situations was that it would have further reduced the number of opportunities, but a good case can be made for doing so.

5) It's not clear to me how a team would use this information to make better decisions except at the extremes: telling Alex Sanchez to stop trying to take an extra base every time he's on, firing your third base coach if you're the Phillies or Cubs, and using Gary Matthews Jr. and Shane Halter as your first pinch runners. In other words, while all of this is interesting and provides some quantification of baserunning, it's not very actionable for most teams or players. I realize that wasn't one of my questions when I started this, but really worthwhile research should lead to something actionable.

Conclusion
In summary, I want to reiterate that this is a first pass at analyzing and quantifying baserunning, and for many of you (as for me) I'm sure it has raised more questions than it has answered. I'd appreciate your thoughts in any case.
