Thursday, July 08, 2004

Bottom of the Order Analysis

A week or so ago I was watching an Astros game on ESPN when David Justice, former player and the analyst on the broadcast, began talking about Roger Clemens. In particular he noted that Clemens had recently said that 1) the inclusion of the pitcher in the lineup made pitching much easier in the National League and that 2) pitching to the 7 and 8 hitters in the National League was much easier than pitching to the 8 and 9 hitters in the American League.

Although I wasn't surprised by the former statement, I was by the latter. I was skeptical of Justice's comment since my reasoning had always been that, with the pitcher in the lineup, National League teams would generally prefer better hitting over defense in the remaining 8 positions. I then assumed that Clemens likely perceived the 7 and 8 hitters as weaker since he was able to pitch around them more frequently in order to get to the pitcher. Of course, the other side of the coin is that NL teams play in a more restrictive run environment and so are more prone to choose good defense over good offense at shortstop, second base, and center field.

To try and put together some numbers to answer the question I turned to Retrosheet. In less than an hour over lunch I did the following:

1. Downloaded the event files (play-by-play data files) for the AL and NL for 1992 (the most recent year available)
2. Generated comma-delimited files using the BEVENT.exe utility, also available on the site
3. Loaded the CSV files into a SQL Server database on my laptop
4. Ran a few simple queries to see how various positions in the batting order performed
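
For readers without SQL Server handy, the query step can be approximated in a few lines of Python. This is only a rough sketch of the idea, not the queries I actually ran: it assumes each play-by-play row has already been reduced to a dictionary with hypothetical keys `league`, `slot` (batting order position), `ab`, `h`, `tb`, and `bb`, and the sample rows are made up purely to show the shape of the input.

```python
from collections import defaultdict


def totals_by_slot(rows, slots):
    """Sum counting stats per league for rows whose batting slot is in `slots`.

    The key names (league, slot, ab, h, tb, bb) are hypothetical; rename them
    to match however the comma-delimited files were actually loaded.
    """
    totals = defaultdict(lambda: {"ab": 0, "h": 0, "tb": 0, "bb": 0})
    for row in rows:
        if int(row["slot"]) in slots:
            league_totals = totals[row["league"]]
            for key in league_totals:
                league_totals[key] += int(row[key])
    return dict(totals)


# Made-up rows for illustration only (not Retrosheet data).
sample_rows = [
    {"league": "AL", "slot": "8", "ab": "1", "h": "1", "tb": "2", "bb": "0"},
    {"league": "AL", "slot": "9", "ab": "1", "h": "0", "tb": "0", "bb": "0"},
    {"league": "AL", "slot": "3", "ab": "1", "h": "1", "tb": "4", "bb": "0"},
    {"league": "NL", "slot": "8", "ab": "0", "h": "0", "tb": "0", "bb": "1"},
]

print(totals_by_slot(sample_rows, {8, 9}))
```

Since the comparison pairs different slots in each league (8 and 9 in the AL, 7 and 8 in the NL), the function would be called once per league with the appropriate set of slots.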

What I found is that, from this data (again, only for 1992), it appears that Justice may be on to something, although the differences are not monumental. In fact, since the NL numbers likely include many instances of the 8 hitter in particular being pitched around, it could be argued that the NL numbers are at least on a par with the AL's. On the other hand, the number of walks drawn is not significantly different. It should be noted that on base average (OA) was calculated without the intentional walks (IBB) added in.

This then led me to consider how other positions fared, and I was able to tweak my queries a bit to get the second data set. Once again there is nothing earth shattering here, although the NL 3 and 4 hitters outperformed the AL 3 and 4 hitters, which may indicate that the AL lineups have more balance.

So here it is:

              AB     H    TB    BB  IBB     BA     SA     OA    OPS

AL 8-9     15670  3912  5324  1391   53  0.250  0.340  0.311  0.651
NL 7-8     13774  3292  4624  1075  233  0.239  0.336  0.294  0.630

AL 7-8-9   23934  5975  8366  2131       0.250  0.350  0.311  0.661
NL 6-7-8   21030  5127  7350  1635       0.244  0.350  0.298  0.648

AL 1-2     18531  5028  7124  2031       0.271  0.384  0.343  0.728
NL 1-2     15883  4325  5930  1546       0.272  0.373  0.337  0.710

AL 3-4     17858  4767  7648  1768       0.267  0.428  0.333  0.761
NL 3-4     15050  4141  6601  1602       0.275  0.439  0.345  0.783

AL 5-6     16824  4236  6564  1850       0.252  0.390  0.326  0.716
NL 5-6     14617  3695  5617  1224       0.253  0.384  0.311  0.695
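
For anyone who wants to check the arithmetic, the rate columns can be rebuilt from the counting columns. The sketch below assumes the simplified formulas that appear to reproduce the figures above: BA = H/AB, SA = TB/AB, OA = (H + BB)/(AB + BB) with intentional walks left out, and OPS = SA + OA. The function name `rate_stats` is mine, and this OA is a simplification that ignores hit-by-pitch and sacrifice flies.

```python
def rate_stats(ab, h, tb, bb):
    """Rebuild the rate columns from counting totals.

    Assumes the simplified on base average (H + BB) / (AB + BB), with
    intentional walks not added in, as described in the text.
    """
    ba = h / ab                  # batting average
    sa = tb / ab                 # slugging average
    oa = (h + bb) / (ab + bb)    # simplified on base average
    return {
        "BA": round(ba, 3),
        "SA": round(sa, 3),
        "OA": round(oa, 3),
        "OPS": round(sa + oa, 3),
    }


# Counting totals taken from the first two rows of the table.
al_8_9 = rate_stats(ab=15670, h=3912, tb=5324, bb=1391)
nl_7_8 = rate_stats(ab=13774, h=3292, tb=4624, bb=1075)

print("AL 8-9", al_8_9)
print("NL 7-8", nl_7_8)
```

Run against the AL 8-9 and NL 7-8 totals, this gives the same BA, SA, OA, and OPS values listed in the table, which suggests the assumed formulas match what the queries computed.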

