Tuesday, November 02, 2004

Measuring Baserunning: A Preliminary Attempt

Another great quote from Rich's post yesterday on the 1984 Baseball Abstract was James' contention that Project Scoresheet was going to definitively answer questions about baserunning.

"Baserunning is perfectly measurable; it can be easily defined and, given properly maintained scoresheets, easily researched. Our lack of knowledge on the subject is attributable entirely to record-keeping decisions that were made a little over a century ago and have never been intelligently or systematically reviewed. We know so much about hitting that we can talk about it forever and measure it with extraordinary precision because a few men, at the beginning of Time, made some very good decisions about how to record and organize information, decisions that are now so natural a part of our thinking about the game that it is difficult even to see that any decision had ever to be made.

For this we applaud them. Their decisions about baserunning and fielding were much less wise. They failed to address many issues, and drew arbitrary lines where they drew them at all, and time has laid waste to their designs."

Rich then goes on to say:

"If this information is known today, it sure isn't widely disseminated. Why don't we know how often (in absolute terms and as a percentage of opportunities) various runners go from first to third on a single, first to home on a double, or second to home on a single? How often does Ichiro Suzuki reach base on an error as opposed to the average batter? Are we limited in recording the data or in distributing the data? Until this information is made available to the public, we will be limited in our ability to fully understand and appreciate all the nuances of the game and its players."

Well, to take up the challenge, I loaded the 2003 play-by-play data (found on the Yahoo group stats_software) into SQL Server last night and wrote a few quick queries. They provide a baseline for answering a couple of Rich's questions and for future work.

Situation
Man on First - Batter Singles
Opportunities: 10430
ToThird: 2841 (27.2%)
Scores: 94 (0.9%)
Out Advancing: 147 (1.4%)

Man on First - Batter Doubles
Opportunities: 2979
Scores: 1315 (44.2%)
Out Advancing: 95 (3.2%)

Man on Second - Batter Singles
Opportunities: 6128
Scores: 3703 (60.4%)
Out Advancing: 217 (3.5%)

I was somewhat surprised by how often a runner scores from first on a double (44.2%) and by how rarely a runner on first is thrown out advancing. I assume that breaking these numbers down by the number of outs (which I plan to do) would also show a difference.
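A tally like the one above can be sketched in a few lines. This is only an illustration of the counting logic, not the queries I actually ran: the `RunnerEvent` layout, its field names, and the toy events are all made up for the example, and the real play-by-play files have their own format.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record layout (an assumption for illustration only).
@dataclass
class RunnerEvent:
    start_base: int  # base the runner held before the hit (1 or 2)
    hit: str         # "1B" for a single, "2B" for a double
    end_base: int    # base reached: 2, 3, or 4 for home; 0 if thrown out


def tally(events, start_base, hit):
    """Count opportunities and outcomes for one runner/hit situation."""
    counts = Counter()
    for e in events:
        if e.start_base != start_base or e.hit != hit:
            continue
        counts["opportunities"] += 1
        if e.end_base == 0:
            counts["out_advancing"] += 1
        elif e.end_base == 4:
            counts["scores"] += 1
        elif e.end_base == 3:
            counts["to_third"] += 1
    return counts


def rate(counts, key):
    """Share of opportunities that ended in the given outcome."""
    opps = counts["opportunities"]
    return counts[key] / opps if opps else 0.0


# Made-up toy events, not real 2003 data.
sample = [
    RunnerEvent(1, "1B", 2),
    RunnerEvent(1, "1B", 3),
    RunnerEvent(1, "1B", 0),
    RunnerEvent(1, "1B", 2),
    RunnerEvent(2, "1B", 4),
    RunnerEvent(1, "2B", 4),
]

first_on_single = tally(sample, start_base=1, hit="1B")
print(first_on_single["opportunities"], rate(first_on_single, "to_third"))
```

Note that every matching event counts as an opportunity here, whether or not the next base was occupied.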

From a team perspective the leaders in these categories were:

Man on First - Batter Singles
ToThird: Colorado and Minnesota (33.3%)
Scores: San Diego (2.0%)
Out Advancing: Houston (3.4%)

Man on First - Batter Doubles
Scores: Montreal (53.8%)
Out Advancing: Cubs (6.4%)

Man on Second - Batter Singles
Scores: Oakland (66.0%)
Out Advancing: Cubs (6.3%)

Cubs fans should probably not be surprised that in 2003 former third base coach "Waving" Wendell Kim contributed to 20 runners being thrown out on the basepaths, tied with the Yankees for the most in baseball (the Padres had only 6).

These metrics get a bit difficult, however, when you consider the personnel the third base coach has to work with and the problem of judging what is and is not an opportunity. In these numbers I've looked at all opportunities, not only those where the next base was unoccupied. There is an argument to be made that if the next base is occupied, the runner may be hindered in their attempt to take the extra base. Overall, the next base was unoccupied around 70% of the time, so I went ahead and included all opportunities.

Overall, a quick way to measure a team's baserunning skills might be to see how many times they took at least one "extra" base. The percentage leaders are:

Colorado 46.6%
Oakland 46.1%
Minnesota 44.1%
Cleveland 44.0%
Baltimore 43.6%

The Cubs were last at 35.5%.
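For reference, the same kind of percentage can be computed from the league totals quoted earlier in this post. The sketch below assumes that an "extra" base means first to third or home on a single, first to home on a double, and second to home on a single. That reading is an assumption for the example, and the dictionary keys are names invented for it.

```python
# League totals from this post: (opportunities, extra bases taken).
# Treating ToThird plus Scores as the extra-base count for a runner on first
# when the batter singles is an assumption about the definition.
LEAGUE_2003 = {
    "first_on_single": (10430, 2841 + 94),
    "first_on_double": (2979, 1315),
    "second_on_single": (6128, 3703),
}


def extra_base_pct(situations):
    """Percentage of all opportunities on which an extra base was taken."""
    opps = sum(o for o, _ in situations.values())
    taken = sum(t for _, t in situations.values())
    return 100.0 * taken / opps


league_pct = extra_base_pct(LEAGUE_2003)
print(f"{league_pct:.1f}%")
```

Under that definition the league as a whole comes out at roughly 40.7%, which sits between Colorado at the top and the Cubs at the bottom.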

As far as individual leaders in these categories go:

Man on First - Batter Singles (more than 20 opp)
ToThird: Jose Guillen (59.1% 26/44)
Scores: Luis Matos (12.9% 4/31)
Out Advancing: Matt Lawton (11.1% 3/27)

Man on First - Batter Doubles (more than 10 opp)
Scores: Miguel Tejada (81.3% 13/16)
Out Advancing: Juan Encarnacion (18.2% 2/11)

Man on Second - Batter Singles (more than 20 opp)
Scores: Jose Guillen (90.9% 20/22)
Out Advancing: Tony Womack (14.3% 6/42)

Interestingly, Juan Encarnacion also ranked second in getting thrown out as the runner on second when the batter singled, at 13.6% (3/22). So even from these small samples it appears that perhaps good and bad baserunners can be identified. Jose Guillen looks pretty good here while Juan Encarnacion does not.
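One way to see how much these small samples can really say is to put an interval around a rate. The sketch below uses a Wilson score interval, which is a standard textbook technique and not something I used to produce the numbers above. The function name and the 95% level (z = 1.96) are choices made for the example.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


# Jose Guillen going first to third on a single: 26 of 44.
low, high = wilson_interval(26, 44)
print(f"{low:.3f} to {high:.3f}")
```

For Guillen's 26 of 44 the interval runs from roughly 44% to 72%. That is wide, but its low end is still well above the league rate of 27.2%, so his number is at least suggestive even in one season.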

My ultimate goal here is to assign weights to the various baserunning plays and come up with a single number that estimates how many runs a player gained or lost for his team on the bases. This methodology has some limitations and doesn't provide a complete picture. Looking only at advancement in this manner does not include advancement on errors, does not measure the impact of holding the runner, and doesn't take into account park effects (the Green Monster likely reduces how often a runner can go from first to third on a single). As I mentioned, the third base coach also has a strong influence that would have to be separated from the individual players. Finally, the sample sizes in a single season for individuals are really very small. I think you'd need 5 to 10 years' worth of data to start to get something meaningful.
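To make the idea of weighting concrete, here is a rough sketch of what such a single number could look like: a player's extra bases and outs are compared with what a league-average runner would have done in the same opportunities, and the differences are multiplied by run values. The weights are placeholders I made up, not estimates from any run-expectancy table, and the player line is hypothetical. Only the league rates (28.1% advancing to third or home, 1.4% out advancing, runner on first with a single) come from this post.

```python
# Placeholder run values: pure assumptions for illustration, NOT measured weights.
HYPOTHETICAL_WEIGHTS = {"extra_base": 0.2, "out_advancing": -0.5}


def baserunning_runs(opps, extra_bases, outs, league_extra_rate,
                     league_out_rate, weights=HYPOTHETICAL_WEIGHTS):
    """Runs above or below a league-average runner given the same opportunities."""
    expected_extra = opps * league_extra_rate
    expected_outs = opps * league_out_rate
    return (weights["extra_base"] * (extra_bases - expected_extra)
            + weights["out_advancing"] * (outs - expected_outs))


# Hypothetical player: 40 chances on first with a single, 15 extra bases, 1 out.
runs = baserunning_runs(40, 15, 1, league_extra_rate=0.281, league_out_rate=0.014)
print(round(runs, 3))
```

A real version would need one set of rates and weights per situation (and probably per number of outs), plus some way to back out the coach and park effects described above.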

All of these limitations and more are what make analyzing baserunning difficult.

1 comment:

Anonymous said...

RS + 2B + 3B + SB -HR - 2*CS/AB

is what I do. But I'm sure some smoothing factor must be considered such as a team's SP perhaps.

Ron Nordquist