
Thursday, January 31, 2008

Yin and Yang

I thought I'd finish off January with a couple of links...

  • Lovin' on Bannister. MLB Trade Rumors did a great interview with Royals pitcher Brian Bannister in three parts. Part 3 is where it gets really good as Brian reveals that he does his own statistical analysis (we already knew he was a BP reader) and gives us his take on DIPS theory. Several well-regarded analysts have already started discussing his insights and I'm sure we'll be seeing more in the future. What really encourages me about this is recognizing the value that thoughtful players like Bannister or Jeff Francis can provide, and it makes me wonder how teams are using those resources in partnering with the analytical resources they have.


  • Not so Much Insight. Observations of the anti-Bannister kind were offered by MLB.com reporter Marty Noble back in early January. In a previous article Noble used RBIs per 100 at bats to make a comparison between newly acquired catcher Brian Schneider and departed backstop Paul Lo Duca. In the response I've linked he tries to explain himself, and although his main point (that Schneider and Lo Duca are no longer as different offensively as some people claim) is valid, there's simply no way he can get out of the hole he's dug. He has the right idea, namely that opportunities are important and rate statistics rather than counting stats are key, but of course he fails to select the right kind of opportunities to make the kinds of comparisons he's going for.

    That said, he dropped two little gems that I couldn't pass up:

    Computers have contributed to a current glut of statistics that, to a degree, distort the picture. We have so many now that we lose focus on what is most important. The objective of the game is to win, and to win a team must outscore its opponent. Nothing, therefore, is more important than runs -- both producing and preventing them.

    To what degree and to which statistics is he referring? Actually, I would argue that by translating traditional statistics into the currency of runs, assuming an accurate weighting, the vast majority of the supposed "glut" of statistics (VORP, BaseRuns, Linear Weights, defensive metrics, base running, etc.) have served to paint a more accurate picture of "what is most important" - creating run differential that leads to winning games.

    That Lo Duca might have had a higher on-base percentage or slugging percentage means less to me than the number of runs he produced. The next time a team wins a game because it produced a higher on-base mark and scored fewer runs than its opponent, please alert me.

    Here I think there are two points of confusion.

    First, it turns out that the very combination of metrics he mentions, on-base percentage and slugging percentage (OPS), is a very strong predictor of runs produced since it accounts for the key ingredients (getting on base, moving runners, and avoiding outs), whereas things like RBIs per 100 at bats are problematic because they measure only one part of the equation. Additionally, by neither accounting for context nor understanding how other metrics predict offensive output, Noble ends up inverting the relationship between offensive production and the statistics he discusses.

    Second, in his last sentence he stumbles across the problem of scale. It is tautological to say that run differential is a perfect predictor of wins and losses at the level of an individual game. Therefore RBIs and runs scored (at least for the offense) take on primary significance in that context and at that scale while OBP and SLUG are less predictive. However, once you raise the aggregation level, those counting stats take on less significance in player evaluation because a particular player's role in generating offense is about more than the tallying of the end result (an RBI or run scored). It quickly becomes the case (and well before the level of seasons) that OBP+SLUG and other derivative metrics are more indicative of offensive contribution and therefore wins and losses.

    This confusion of effects at various scales reminds me (not coincidentally because I'm now reading this book) of one of the primary themes in the writing of the late Stephen Jay Gould. He often railed against the position of ultra selectionists or adaptationists who insisted that natural selection was the exclusive driver and shaper of the pattern of life on earth. Gould contended that evolution operated differently at different levels through various mechanisms and that what worked at one level did not necessarily have power at another. For example, he argues that while natural selection works through differential reproductive success to build adaptations at the level of individual organisms (coloring, wings, claws, size, etc.) those adaptations may have little or nothing to do with survival at the higher level of species. In one of his favorite examples he liked to point out that the small size and adaptability of mammals during the age of the dinosaurs was likely the result of the domination by dinosaurs in the niches available to larger animals. However, when the meteor struck it was those "negative" traits that allowed the mammals to survive but doomed the dinosaurs.
  • Outfield Defense Redux

    I've made some changes to the SFR system for outfielders based on excellent feedback from readers and others that I discuss in today's column appropriately titled "Back to the Drawing Board". One of the things you'll notice is that my correlations for right fielders (primarily because of Brian Giles and Juan Encarnacion) with UZR for the 2003 through 2006 period are very poor. Not sure why but I noticed that Sean Smith apparently has similar issues. Overall, I like the system better and it seems to handle Manny Ramirez better and correlates pretty well with UZR overall.

    As a bonus you can now download a spreadsheet with version 1.0 of SFR for the infielders that includes 2007 minor league data.

    Baseball Prospectus Top 100

    At Baseball Prospectus today we published our top 100 prospects list as compiled by Kevin Goldstein. Jay Bruce leads the field and below is the list ordered by team so you can count and debate your favorites. In terms of sheer numbers the A's, Rangers, and Red Sox each have seven prospects in the list while the White Sox, Blue Jays, Indians, Mets, Tigers, and Astros have one apiece.


    Rank/Name Pos Team
    27. Nick Adenhart rhp Angels
    38. Brandon Wood 3b/ss Angels
    59. Jordan Walden rhp Angels
    89. Hank Conger c Angels
    54. J.R. Towles c Astros
    22. Daric Barton 1b Athletics
    26. Carlos Gonzalez of Athletics
    46. Fautino de los San rhp Athletics
    50. Brett Anderson lhp Athletics
    56. Gio Gonzalez lhp Athletics
    98. Trevor Cahill rhp Athletics
    99. Chris Carter 1b Athletics
    7. Travis Snider of Blue Jays
    17. Jordan Schafer of Braves
    36. Jason Heyward of Braves
    63. Brent Lillibridge ss Braves
    70. Brandon Jones of Braves
    83. Gorkys Hernandez of Braves
    86. Jair Jurrjens rhp Braves
    31. Matt LaPorta of Brewers
    42. Manny Parra lhp Brewers
    76. Jeremy Jeffress rhp Brewers
    8. Colby Rasmus of Cardinals
    69. Chris Perez rhp Cardinals
    71. Bryan Anderson c Cardinals
    37. Geovany Soto c Cubs
    45. Josh Vitters 3b Cubs
    40. Jacob McGee lhp Devil Rays
    20. Jarrod Parker rhp Diamondbacks
    64. Gerardo Parra of Diamondbacks
    90. Max Scherzer rhp Diamondbacks
    5. Clayton Kershaw lhp Dodgers
    14. Andy LaRoche 3b Dodgers
    32. Chin-Lung Hu ss Dodgers
    66. Scott Elbert lhp Dodgers
    78. Wes Hodges 3b Georgia Tech
    29. Angel Villalona 3b Giants
    84. Henry Sosa rhp Giants
    52. Adam Miller rhp Indians
    33. Jeff Clement c Mariners
    44. Chris Tillman rhp Mariners
    55. Carlos Triunfel ss Mariners
    93. Wladimir Balentien of Mariners
    10. Cameron Maybin of Marlins
    88. Chris Volstad rhp Marlins
    51. Fernando Martinez of Mets
    28. Chris Marrero of/1b Nationals
    35. Ross Detwiler rhp Nationals
    81. Michael Burgess of Nationals
    12. Matt Wieters c Orioles
    75. Chorye Spoone rhp Orioles
    85. Radhames Liz rhp Orioles
    23. Chase Headley 3b Padres
    39. Matt Antonelli 2b Padres
    61. Matt Latos rhp Padres
    68. Carlos Carrasco rhp Phillies
    96. Joe Savery lhp Phillies
    24. Andrew McCutchen of Pirates
    43. Steven Pearce 1b Pirates
    94. Neil Walker 3b Pirates
    30. Neftali Feliz rhp Rangers
    49. Eric Hurley rhp Rangers
    58. Elvis Andrus ss Rangers
    62. Engel Beltre of Rangers
    73. Michael Main rhp Rangers
    74. Chris Davis 3b Rangers
    77. Taylor Teagarden c Rangers
    3. Evan Longoria 3b Rays
    6. David Price lhp Rays
    15. Wade Davis rhp Rays
    18. Desmond Jennings of Rays
    25. Reid Brignac ss Rays
    2. Clay Buchholz rhp Red Sox
    16. Jacoby Ellsbury of Red Sox
    53. Justin Masterson rhp Red Sox
    57. Jed Lowrie ss Red Sox
    60. Ryan Kalish of Red Sox
    95. Michael Bowden rhp Red Sox
    100. Lars Anderson 1b Red Sox
    1. Jay Bruce of Reds
    9. Homer Bailey rhp Reds
    21. Joey Votto 1b Reds
    41. Johnny Cueto rhp Reds
    13. Franklin Morales lhp Rockies
    80. Chris Nelson ss Rockies
    82. Greg Reynolds rhp Rockies
    91. Casey Weathers rhp Rockies
    92. Dexter Fowler of Rockies
    19. Mike Moustakas ss Royals
    72. Luke Hochevar rhp Royals
    11. Rick Porcello rhp Tigers
    65. Carlos Gomez of Twins
    79. Deolis Guerra rhp Twins
    97. Ben Revere of Twins
    87. Aaron Poreda lhp White Sox
    4. Joba Chamberlain rhp Yankees
    34. Ian Kennedy rhp Yankees
    47. Austin Jackson of Yankees
    48. Jose Tabata of Yankees
    67. Alan Horne rhp Yankees

    Wednesday, January 30, 2008

    Profiling a Ray

    The always fascinating Marc Normandin allowed me to horn in on his regular gig this week and contribute a little PITCHf/x analysis to his player profile of the Rays' (not Devil) James Shields. Unfortunately Shields had just eight of his 31 starts recorded by PITCHf/x, but still the 712 pitches do allow us to see some definite patterns. The article does not require a subscription.

    Monday, January 28, 2008

    The Hot Stove at Altitude


    The Rocky Mountain Chapter of SABR held its annual “Hot Stove” meeting on January 26th at Jackson’s All-American Grill located across the street from Coors Field in Denver. Being the Secretary of the chapter, yours truly took some notes and what follows is the synopsis.

    President Paul Parker called the meeting to order a little past 10AM and after a few brief remarks introduced trivia-master Dave Wallack. Dave led off the festivities by distributing a quiz loaded with Rockies and other trivia to the 32 assembled members and guests. After 10 minutes of contemplation and confused looks the quizzes were scored with the top three finishers receiving their choice of copies of Baseball Between The Numbers or a wall poster of vintage baseball cards from the 1920s. It happened that three members tied for second place and so what turned out to be a not-so-fast “lightning round” moderated by Parker and Wallack (whose questions were a little tough to say the least) was held to determine the two who would take home the remaining door prizes.

    Parker then continued the meeting by reminding members of the upcoming Denver Bears/New York Yankees Reunion fundraiser event now scheduled for May 3rd at the Denver Athletic Club. As of now the event will feature Ralph Terry, Johnny Blanchard, Ryne Duren, and Woodie Held and consist of research presentations, a panel discussion, and an autograph session. Later in the meeting member Matt Repplinger discussed the availability of Rockies/Dodgers tickets for that evening’s game, which will be sold at the event with a part of the proceeds benefiting the chapter. The face value of the ticket will be $38 for an Outfield Box seat down the right field line in section 116. The chapter will be selling the $38 ticket for $28, ten dollars off the face value. The group’s planned summer trips to Albuquerque, Colorado Springs, and Casper, Wyoming were also discussed.

    The meeting then continued with a research presentation in which Neal Williams and I presented our article “The Traffic Directors”, which will appear in the upcoming volume 36 of The Baseball Research Journal. The study focused on the attempt to quantify the contributions of third base coaches and to determine if there is a detectable skill component that can be measured. We used a subset of the base running metrics I developed for Baseball Prospectus, augmented with the additional context of the personnel the coach had to work with. You can read the details in the BRJ or the two-part version online, but we conclude that if there is a skill component (or rather a skill difference between coaches if you prefer), it is too subtle to measure given the combination of play by play data and other influences. Attendees engaged in a short question and answer period before taking a brief break.

    After the break Parker introduced the keynote speaker, Jeff Bridich, the Rockies' Director of Baseball Operations. After giving a brief rundown of his career in baseball and his opportunity to join the Rockies in 2004 as the Director of Minor League Operations, Jeff began by asking the crowd how they would grade the Rockies' off-season moves thus far, which included the signings of Matt Holliday, Willy Taveras, and the record-breaking contract of Troy Tulowitzki. With that opening the attendees grabbed the bull by the horns and peppered Bridich with questions ranging from the arbitration cases of Brian Fuentes, Brad Hawpe, and Garrett Atkins to this spring’s competition at second base involving Jayson Nix and Marcus Giles among others, to the health of pitcher Jason Hirsch and the prospects of catcher Chris Iannetta. As you might imagine, much time was spent dissecting the options at second base for the upcoming season and Bridich provided some interesting background on the development of Nix as he went from an offensive prospect after being drafted in 2001 to the player most likely to hurt the team defensively at second. He also indicated that the loss of Carney Lansford as a minor league hitting coach was a big blow to their organization.

    Being closely involved with the arbitration process, Bridich was able to provide excellent insight into the dynamics of the interaction between the two sides and with the three-member panel in what he characterized as often “not a friendly exchange of information”. Further, he discussed approaches to preparing arbitration cases from the club perspective including their use of some advanced metrics such as Zone Rating for measuring defense. Interestingly, he indicated that while the use of advanced metrics was certainly a part of their strategy, those metrics needed to be published and proven in the industry to the extent that they can show the panel that the metrics have some legs.

    I found particularly interesting his comments on the baserunning of Willy Taveras where he noted that the Rockies are encouraging Taveras to be more liberal in his stolen base attempts, especially of third. Bridich related that when asked how many times he could have stolen third in 2007, Taveras indicated that he could have swiped third 30 or so times. While that's an optimistic assessment even for a competitive player, it's certainly true that Taveras seems to have a fear of stealing third. Overall, in my baserunning framework I have him for 138 events at second base and just 3 at third over the course of his career, and one of those three was actually a pickoff at second base and one came as his only stolen base attempt of 2004 (perhaps that's what instilled the fear?). Now if they could just teach him to bunt towards first base...

    If Bridich had prepared remarks, the steady stream of questions from the attendees and his thoughtful and articulate answers prevented him from getting to them. After over an hour of discussion many in the group, including Bridich, enjoyed lunch at the restaurant while the discussion continued.

    Thanks to all members and guests who participated in this stimulating morning of baseball discussion.

    Saturday, January 26, 2008

    The Moral Hazards of the Hit Batsmen

    This is the final in a series of three columns I wrote for BP on the topic of hit batsmen. You can find the other two on this blog as well. It appeared on May 18, 2006.



    Schrodinger's Bat: The Moral Hazards of the Hit Batsmen
    by Dan Fox

    "The designated hitter rule is like letting someone else take Wilt Chamberlain's free throws."

    --Rick Wise (1974)

    In the previous two weeks, we’ve been looking at historical hit by pitch rates and their trends, and investigating a variety of theories that have tried to explain the fluctuation of those rates. We’ve looked at a wide variety of theories that account for factors such as aluminum bats at the amateur level, changes in the strike zone, the increase in body armor, intimidation, retaliation, and even the win expectancy of the hit batsmen. While individual theories may lack explanatory power for a specific period of time, taken together they do provide insight into the sometimes opposing forces that underlie trends in baseball's complex competitive environment.

    There is one trend, however, that we failed to discuss. So this week we’ll take a look at the difference in league rates of hit batsmen since the introduction of the designated hitter in 1973. This topic has been taken up before, so we’ll start by covering some of the old ground, and then hopefully add something new to the discussion.

    Setting a Baseline
    Before we discuss what the impact of the DH on HBP rates might be, let’s lay out the raw facts that have inspired so much conjecture. The following graph shows how much higher, in percentage terms, the rate of hit batsmen per 1,000 plate appearances has been in the AL than in the NL since the DH was adopted in the American League in 1973. The shaded line is a three-year moving average.



    What this shows is that from 1973 until the mid 1990s the rate of hit batsmen in the AL was anywhere between 3% and 30% higher than in the NL. While that’s a wide range, more typical values are between 10% and 20%, with the average during the period being 17%:


    Year AL vs NL      Year AL vs NL
    1973     9.3%      1990    19.0%
    1974    14.0%      1991    18.6%
    1975     7.8%      1992    20.0%
    1976    17.3%      1993     9.4%
    1977    17.2%      1994    -7.6%
    1978    12.5%      1995    -6.7%
    1979     8.4%      1996     2.5%
    1980    24.2%      1997   -15.9%
    1981    22.8%      1998     4.4%
    1982     3.2%      1999    10.2%
    1983    19.1%      2000   -17.6%
    1984    29.7%      2001     7.0%
    1985    21.5%      2002     7.4%
    1986    26.5%      2003     3.4%
    1987    16.7%      2004     7.7%
    1988    20.9%      2005    -5.6%
    1989    22.9%


    Around 1994, things began to change and in the following dozen years HBP rates in the NL actually surpassed those in the AL five times, including in 2005 where 9.52 batters were hit per 1,000 PA in the AL, against 10.05 in the NL.
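
    As a quick check on those two claims (the roughly 17% average for 1973-1993 and the five NL-led seasons afterward), here is a minimal Python sketch. The yearly values are copied from the table above; the names al_vs_nl, mean_diff, and nl_led_years are made up for illustration and are not part of the original analysis.

```python
# AL rate of hit batsmen relative to the NL, in percent, by season.
# Values are transcribed from the yearly table; positive means the AL was higher.
al_vs_nl = {
    1973: 9.3, 1974: 14.0, 1975: 7.8, 1976: 17.3, 1977: 17.2, 1978: 12.5,
    1979: 8.4, 1980: 24.2, 1981: 22.8, 1982: 3.2, 1983: 19.1, 1984: 29.7,
    1985: 21.5, 1986: 26.5, 1987: 16.7, 1988: 20.9, 1989: 22.9, 1990: 19.0,
    1991: 18.6, 1992: 20.0, 1993: 9.4, 1994: -7.6, 1995: -6.7, 1996: 2.5,
    1997: -15.9, 1998: 4.4, 1999: 10.2, 2000: -17.6, 2001: 7.0, 2002: 7.4,
    2003: 3.4, 2004: 7.7, 2005: -5.6,
}


def mean_diff(first, last):
    """Simple (unweighted) average of the yearly differences, inclusive."""
    vals = [v for year, v in al_vs_nl.items() if first <= year <= last]
    return sum(vals) / len(vals)


def nl_led_years(first, last):
    """Seasons in the range where the NL rate topped the AL rate."""
    return [year for year, v in sorted(al_vs_nl.items())
            if first <= year <= last and v < 0]


print(round(mean_diff(1973, 1993), 1))  # 17.2
print(nl_led_years(1994, 2005))         # [1994, 1995, 1997, 2000, 2005]
```

    Note that this is an unweighted mean of the seasonal figures, which is one plausible reading of "the average during the period."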


    So in fact, there are actually two questions that we can ask about this trend. First, what accounts for the difference in rates of hit batsmen during the twenty-year period following the introduction of the DH (1973-1993), and second, what caused those differences to shrink in the period after 1993?

    A Moral Hazard or More Opportunity?
    As mentioned in the introduction, the topic of league differences in HBP rate has been researched in the past. Most recently, Lee A. Freeman wrote an excellent article titled "The Effect of the Designated Hitter Rule on Hit Batsmen" in Volume 33 of The Baseball Research Journal. In it, Freeman provided a short synopsis of the previous work, citing articles in the journal Economic Inquiry in 1997 and 1998, as well as a follow-up in a 2004 issue of the Journal of Sports Economics.

    Prior to Freeman’s paper the two theories that had been bandied about to explain the difference (at least from 1973 until the mid 1990s) were the "moral hazard theory" and the "lineup composition theory." The former theory argues that because American League pitchers needn’t fear retaliation with the presence of the DH, they are more apt to hit opposing batters since they don’t bear the costs of their actions directly. The latter theory also argues from a cost-benefit basis, although differently--AL pitchers hit more batters because the cost in terms of run scoring when hitting a DH is so much less than hitting a pitcher. This follows from the fact that the designated hitter is much more likely to be an offensive producer than your typical weak-hitting full-time hurler.

    As a variation of the lineup composition theory, Freeman contended that more hit batsmen in the AL can be explained largely (but not totally, as he rightly cautions against single-theory explanations) simply by more "true" hitters coming to bat in the AL. In his words:


    American League pitchers are not given the opportunity during a game to 'ease up' their delivery to the opposing pitcher. As a result, AL pitchers are likely to 'want' or 'need' to pitch inside to more batters during the course of a game, thereby increasing the chances of these batters being hit by a pitch.

    Through an analysis of average HBP per season and per team, both before and after the introduction of the DH, Freeman concludes that there is no statistical significance (at the .001 or .005 levels) to the differences in hit batsmen across the two leagues once you adjust the averages for the fact that in the AL approximately 12.5% more true hitters come to the plate in the DH era.

    What this analysis lacks, as admitted by Freeman himself, is a more granular accounting for the differences in the number of "true" hitters, and instead relies on a quick and dirty approximation. Using Retrosheet data, we can address that weakness in the study.

    The following table shows the percentage of plate appearances consumed by each fielding position, along with the HBP per 1,000 plate appearances for both the AL and NL in the period 1973-1993.


           <----AL---->    <----NL----->
                   HBP /           HBP /
    POS     PAPct   1000    PAPct   1000
    ------------------------------------
    P        0.0%    0.0     6.8%    2.2
    C       10.1%    6.3    10.4%    4.9
    1B      11.1%    4.9    11.3%    4.8
    2B      10.9%    5.0    11.2%    4.7
    3B      10.8%    5.3    11.1%    5.7
    SS      10.3%    4.8    10.8%    3.7
    LF      11.2%    6.2    11.4%    5.0
    CF      11.4%    5.2    11.5%    4.7
    RF      11.0%    5.6    11.3%    4.3
    DH      11.2%    6.1        -      -
    PH       2.0%    4.9     4.2%    3.7

    TOTAL            5.5             4.5


    In total, AL hitters were hit at a rate 20.8% higher than NL hitters.
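
    The TOTAL row can be read as a plate-appearance-weighted average of the position rates. The sketch below recomputes it from the rounded table entries; the dictionary layout and the league_rate helper are illustrative assumptions, not the Retrosheet query actually used for the column.

```python
# (share of plate appearances in %, HBP per 1,000 PA) by lineup slot, 1973-1993.
# Figures are the rounded values printed in the table above.
AL = {"P": (0.0, 0.0), "C": (10.1, 6.3), "1B": (11.1, 4.9), "2B": (10.9, 5.0),
      "3B": (10.8, 5.3), "SS": (10.3, 4.8), "LF": (11.2, 6.2), "CF": (11.4, 5.2),
      "RF": (11.0, 5.6), "DH": (11.2, 6.1), "PH": (2.0, 4.9)}
NL = {"P": (6.8, 2.2), "C": (10.4, 4.9), "1B": (11.3, 4.8), "2B": (11.2, 4.7),
      "3B": (11.1, 5.7), "SS": (10.8, 3.7), "LF": (11.4, 5.0), "CF": (11.5, 4.7),
      "RF": (11.3, 4.3), "PH": (4.2, 3.7)}


def league_rate(table):
    """PA-weighted average HBP rate across all lineup slots."""
    total_share = sum(share for share, _ in table.values())
    weighted = sum(share * rate for share, rate in table.values())
    return weighted / total_share


al_rate = league_rate(AL)
nl_rate = league_rate(NL)
print(round(al_rate, 1), round(nl_rate, 1))  # 5.5 4.5
print(f"AL advantage from rounded inputs: {al_rate / nl_rate - 1:.1%}")
```

    Because the inputs are rounded to one decimal, the advantage computed this way comes out a little above 21% rather than exactly the 20.8% quoted, which presumably rests on the unrounded counts.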

    As you can see, in the AL designated hitters consumed 11.2% of the plate appearances, and were hit at a rate of 6.1 times per 1,000 PA. Both totals are among the highest for AL hitters. So, while the DH might be the equivalent of someone else taking Wilt's free throws, the price the DH pays is some additional pain.

    On the other side of the fence, NL pitchers consumed just 6.8% of the plate appearances, and were hit just 2.2 times per 1,000 PA. Interestingly, although the percentage of plate appearances for AL pitchers is rounded to 0%, they actually came to the plate 79 times, mostly as the result of games where the AL team lost their DH as a result of the DH assuming a defensive position per rule 6.10.

    So, rather than seeing Freeman's 12.5% more "true" hitters in the AL, in actuality AL pitchers see around 7% more true hitters when you subtract the pitchers from the NL totals. However, Freeman also noted that pinch-hitters are often used for pitchers in the NL, and this is borne out by the fact that pinch-hitters came to the plate more than twice as often in the NL (4.2%) as in the AL (2.0%). Freeman also speculated that pinch-hitters are not as likely to get hit since they are often weaker hitters than players in the regular lineup (it should be noted that as reported in The Book, there is also a "pinch-hitting penalty" that drags down performance). The lesser rate of hit batsmen for pinch hitters is verified by the data. So, assuming that the NL rate of pinch-hitting was the same as the AL rate, and throwing the remainder of the NL pinch-hitters (4.2% less 2.0%, or 2.2%) into the bucket of poor hitters with the pitchers (6.8%), we can estimate that AL pitchers see approximately 9% more true hitters than NL pitchers do.

    The difference between Freeman's estimate and the actual numbers lies in the fact that the vast majority of pitchers hit ninth, Dontrelle Willis being the most recent occasional exception. Hitting from the last slot in the order, pitchers therefore come to the plate less frequently than position players.

    To adjust for that, given the data in the above table, we can now make an estimate for the true differences in hit batsmen by controlling for pitcher plate appearances. One simple way to do this is to estimate what would happen if all pitcher and pinch-hitter plate appearances in the NL were consumed by a true hitter whose rate of getting hit was relatively as high as a designated hitter's in the AL. This means that 11% of the NL plate appearances (6.8% + 4.2%) will be assigned a new HBP rate based on the difference between a DH and the rest of the positions in the AL. To do so we'll first calculate the ratio of the DH rate (6.1) to the non-DH rate (5.4) as 1.13. If we assume that true hitters in the NL consuming those plate appearances would have been hit 13% more often than the NL's non-pitchers and non-pinch-hitters (a rate that works out to 5.4 HBP/1000 PA), then the average for the NL would rise by about 0.3, from 4.5 hit batsmen per 1,000 PA to 4.8. As a result, instead of a 20.8% advantage for the AL during the period, the true advantage is around 13.6%.
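
    For readers who want to follow the arithmetic, the adjustment described above can be sketched in Python as follows. It works only from the rounded 1973-1993 table, so treat it as an approximation of the method rather than the original calculation; the rate helper and all variable names are hypothetical.

```python
# (share of plate appearances in %, HBP per 1,000 PA), 1973-1993, rounded table values.
AL = {"P": (0.0, 0.0), "C": (10.1, 6.3), "1B": (11.1, 4.9), "2B": (10.9, 5.0),
      "3B": (10.8, 5.3), "SS": (10.3, 4.8), "LF": (11.2, 6.2), "CF": (11.4, 5.2),
      "RF": (11.0, 5.6), "DH": (11.2, 6.1), "PH": (2.0, 4.9)}
NL = {"P": (6.8, 2.2), "C": (10.4, 4.9), "1B": (11.3, 4.8), "2B": (11.2, 4.7),
      "3B": (11.1, 5.7), "SS": (10.8, 3.7), "LF": (11.4, 5.0), "CF": (11.5, 4.7),
      "RF": (11.3, 4.3), "PH": (4.2, 3.7)}


def rate(table, slots):
    """PA-weighted HBP rate over the given lineup slots."""
    pa = sum(table[s][0] for s in slots)
    return sum(table[s][0] * table[s][1] for s in slots) / pa


# Step 1: how much more often a DH is hit than the rest of the AL lineup.
al_total = rate(AL, list(AL))
dh_ratio = AL["DH"][1] / rate(AL, [s for s in AL if s != "DH"])

# Step 2: NL "regulars" are everyone except pitchers and pinch-hitters.
regulars = [s for s in NL if s not in ("P", "PH")]
regular_share = sum(NL[s][0] for s in regulars)      # about 89% of NL PA
replaced_share = NL["P"][0] + NL["PH"][0]            # the 11% being reassigned
regular_rate = rate(NL, regulars)

# Step 3: give the reassigned PA a DH-like rate and recompute the NL average.
true_hitter_rate = regular_rate * dh_ratio
nl_adjusted = ((regular_share * regular_rate + replaced_share * true_hitter_rate)
               / (regular_share + replaced_share))

advantage = al_total / nl_adjusted - 1
print(round(dh_ratio, 2), round(nl_adjusted, 1))     # 1.13 4.8
print(f"Adjusted AL advantage: {advantage:.1%}")
```

    With rounded inputs the adjusted advantage lands near 14%, in the same neighborhood as the 13.6% figure in the text.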

    So while accounting for a different lineup composition in the AL helps level the playing field, it obviously doesn't account for the entire difference, as Freeman concluded. We're still left with around two-thirds of our original difference between the leagues. Does that mean we're left with the moral hazard theory to explain the remaining difference?

    Readers familiar with this subject will note that this cursory analysis lines up nicely with the fine work done by J.C. Bradbury and Douglas Drinen in a paper titled "Identifying Moral Hazard: A Natural Experiment in Major League Baseball" (warning: .pdf). In that paper, using data from 1989-1992 compared against 1969 plus 1972-1974, the authors conclude that:

    "Controlling for variables that proxy batter quality, pitcher quality, retaliation, and game situation we find that the DH rule increases the likelihood that any batter will be hit during a plate appearance between 11 and 17 percent. This explains approximately 60 to 80 percent of the differential in the hit batsmen rate between leagues."

    But there are also two additional theories to consider.

    If you look back at the previous articles in this series you'll notice that the rate of hit batsmen in the AL actually surpasses that of the NL prior to the introduction of the DH. In fact, beginning in 1967, the rate of AL hit batsmen to NL went as follows:


    1967 11.5%
    1968 18.1%
    1969 -1.5%
    1970 10.1%
    1971 8.3%
    1972 11.0%


    During this six-year period the differences in the AL rate with the pitcher hitting were not much different than those immediately after the introduction of the DH. What this indicates is that hit batsmen were already more frequent in the junior circuit. Perhaps some of this remaining difference lies elsewhere.

    As mentioned last week, one of the factors that may influence hit batsmen is the definition (both written and as interpreted) of the strike zone. There is of course anecdotal evidence that the strike zone varied in the two leagues primarily as the result of AL umpires using the old-style "balloon" chest protector that forced them to stand more upright and therefore call more high strikes. And although by around 1983 AL umpires were also using the inside chest protector popularized by Bill Klem, they may have retained their traditional strike zone for some years. But still, outside of concocting what Stephen Jay Gould would call a "just-so story," there is no clear connection between high strikes and hit batsmen. A related hypothesis might be that the AL, being known as more of a curveball league, induced more hit batsmen since curveballs are inherently more difficult to control than fastballs. But both of these theories are difficult to quantify.

    A more straightforward idea is that one or two individuals skewed the numbers for this time period, accounting for the remaining difference between the leagues. This follows the dictum that when what you're measuring has inherently low frequencies, you should always be aware of a small number of samples having a large influence on the data.

    As most readers have already guessed, when you're talking about hitters and HBPs during this period, Don Baylor and Chet Lemon are two players who immediately spring to mind. Both played their entire careers in the AL, with Baylor suiting up for the Orioles, A's, Angels, Yankees, Red Sox, and Twins from 1970-88, and Lemon for the White Sox and Tigers from 1975-90. Baylor was hit 257 times in 8,888 plate appearances (defined simply as at-bats plus walks plus HBP for this analysis) from 1973 through 1988, for an astounding rate of 28.9 per 1,000 PA--tops during the period and ranking him 15th for players since 1901. Lemon was hit 151 times in 7,768 PA for a rate of 19.4. If these two players' rates are adjusted down to the average for the period, the overall rate for the AL drops from 5.5 to 5.3 and therefore accounts for about 4 percentage points of the remaining difference.
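
    The individual rates quoted above are simply HBP divided by plate appearances, scaled to 1,000. A small Python sketch, with hbp_per_1000 as a hypothetical helper name and the league figure of 5.5 taken from the 1973-1993 table:

```python
def hbp_per_1000(hbp, pa):
    """Hit-by-pitch rate per 1,000 plate appearances."""
    return 1000 * hbp / pa


AL_RATE_1973_1993 = 5.5  # league-wide AL rate from the position table

baylor = hbp_per_1000(257, 8888)
lemon = hbp_per_1000(151, 7768)

print(round(baylor, 1), round(lemon, 1))  # 28.9 19.4
print(f"Baylor was hit about {baylor / AL_RATE_1973_1993:.1f} times as often as the AL average")
```

    Re-deriving the drop from 5.5 to 5.3 would additionally require total AL plate appearances for the period, which are not given here.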

    In summary then, from an initial difference of nearly 21% in the rate of hit batsmen between the two leagues in the 1973-1993 period, just over 7% can be accounted for by the presence of more true hitters in the lineup and another 4% by two hitters who were exceptionally "gifted" at getting plunked. This still leaves ample room for the moral hazard theory, a theory that incorporates differences in the two leagues relating to strike zone or styles of play, or a combination of all of the above to operate.

    Evening the Score
    The second question introduced above is related to the disappearance of the difference in rate of hit batsmen between the two leagues, beginning in 1994. Since that time, the National League has actually topped the American League in five of the twelve years, as shown in the previous table.

    What can account for this dramatic shrinking of differences between the two leagues?

    First, let's take a look at the same table for the years 1994-2005 as we did for the preceding years.


           <----AL---->    <----NL----->
                   HBP /           HBP /
    POS     PAPct   1000    PAPct   1000
    ------------------------------------
    P        0.4%    1.4     5.9%    3.3
    C       10.1%   11.1    10.5%   12.8
    1B      11.1%   10.7    11.2%   10.5
    2B      11.0%   10.6    11.5%   13.1
    3B      10.8%    9.3    11.1%    9.8
    SS      10.9%   10.3    11.1%    8.5
    LF      11.2%    8.9    11.3%   10.5
    CF      11.3%    8.5    11.5%   10.0
    RF      11.0%    9.8    11.3%   10.7
    DH      10.4%   10.2     0.5%   12.2
    PH       1.6%    8.4     4.2%   10.0

    TOTAL            9.9            10.3


    What you'll notice is that the NL has outpaced the AL since 1994 despite leading in a minority of those seasons. This data set now includes interleague games, so a DH is listed in the NL column and pitchers in the AL column. The rate of hit batsmen for NL DHs (12.2) is even higher than the overall NL rate (10.3), and the rate for AL pitchers (1.4) is lower than for NL pitchers (3.3). Of course, the massive increases in both leagues' rates are reflected as well.

    In a follow-up paper (another .pdf), also published in 2004, Bradbury and Drinen conclude that during the entire history of the DH, batters were about 8% more likely to be hit in games where the DH was played, accounting for around half of the difference between the leagues. However, when looking only at 1994-2005 data and breaking down the data into games played with the DH and those without, we find the following:


            <----DH---->  <---NO DH---->
                   HBP /           HBP /
    POS     PAPct   1000    PAPct   1000
    ------------------------------------
    P        0.0%    0.0     6.2%    2.8
    C       10.1%   10.0    10.5%   11.0
    1B      11.1%    9.6    11.2%    9.0
    2B      11.0%    9.6    11.5%   11.2
    3B      10.8%    8.4    11.1%    8.4
    SS      10.8%    9.1    11.1%    7.5
    LF      11.2%    8.1    11.3%    9.0
    CF      11.3%    7.8    11.6%    8.5
    RF      11.0%    8.8    11.3%    9.2
    DH      11.2%    9.0        -      -
    PH       1.4%    7.9     4.3%    8.6

    TOTAL            8.9             8.8


    Here there is only a 1% overall difference. If one were to "correct" the data to account for lineup composition, as we did with the 1973-1993 data, one would find that games in which the DH was not in force produced 8.1% more hit batsmen per 1000 plate appearances than games with the DH. Truly, this is a large shift, for which we can offer three possible explanations.
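
    As a rough check on the table above, each TOTAL can be reproduced as a weighted average of the positional rates, with each position's share of plate appearances as the weight. The sketch below does only that. It is not the correction procedure itself, which isn't spelled out here, and the function name and dictionary layout are my own assumptions.

```python
# Positional data from the 1994-2005 DH / no-DH table above:
# (share of plate appearances in percent, HBP per 1,000 PA).
# The names below are illustrative assumptions, not part of the original study.
DH_GAMES = {
    "P": (0.0, 0.0), "C": (10.1, 10.0), "1B": (11.1, 9.6), "2B": (11.0, 9.6),
    "3B": (10.8, 8.4), "SS": (10.8, 9.1), "LF": (11.2, 8.1), "CF": (11.3, 7.8),
    "RF": (11.0, 8.8), "DH": (11.2, 9.0), "PH": (1.4, 7.9),
}
NO_DH_GAMES = {
    "P": (6.2, 2.8), "C": (10.5, 11.0), "1B": (11.2, 9.0), "2B": (11.5, 11.2),
    "3B": (11.1, 8.4), "SS": (11.1, 7.5), "LF": (11.3, 9.0), "CF": (11.6, 8.5),
    "RF": (11.3, 9.2), "PH": (4.3, 8.6),
}


def overall_rate(table):
    """PA-weighted average HBP rate.

    Dividing by the summed shares absorbs the rounding in the published
    percentages, which do not add up to exactly 100.
    """
    total_share = sum(share for share, _ in table.values())
    weighted = sum(share * rate for share, rate in table.values())
    return weighted / total_share


dh_total = overall_rate(DH_GAMES)
no_dh_total = overall_rate(NO_DH_GAMES)
print(f"DH games:    {dh_total:.1f} HBP per 1,000 PA")
print(f"No-DH games: {no_dh_total:.1f} HBP per 1,000 PA")
```

    The point of the exercise is that the overall rate depends on two things at once: how often each lineup slot comes to the plate and how often that slot gets hit. That is why a near-identical total can hide a real difference once pitchers are taken out of the mix.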

    First, as with the 1973-1993 data, we may be seeing the influence of one or several extreme players. It just so happens that during this period the NL has been blessed with a trio of the most frequently hit batters in the history of baseball in Jason Kendall (except 2005), Craig Biggio, and Fernando Vina (except 1995-1997 and 2004). A clue to their contribution can be seen in the previous table, where the rates for second basemen and catchers are conspicuously high in the NL. Overall, their rates during that time…


                 PA   HBP   HBP/1000
    --------------------------------
    Kendall    5908   197       33.3
    Vina       4633   154       33.2
    Biggio     7930   245       30.9


    Don Baylor has nothing on these guys.
    If we adjust these three players' rates down to the league average for the period it drops the overall NL rate 4.3%, down to 9.8, just under the AL rate. Even so, this doesn't fully account for the fact that, given the lineup composition theory, we should see even fewer hit batsmen in the NL.
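
    For anyone who wants to check the arithmetic, the per-1,000 rates for the three players follow directly from their plate appearances and HBP totals. This is a minimal sketch; the helper name is mine, and the 10.3 comparison figure is the NL TOTAL from the 1994-2005 table earlier in this article.

```python
# PA and HBP totals for the three players, copied from the table above.
players = {"Kendall": (5908, 197), "Vina": (4633, 154), "Biggio": (7930, 245)}

# NL overall rate for 1994-2005, taken from the TOTAL row of the earlier table.
NL_RATE = 10.3


def hbp_per_1000(pa, hbp):
    """Hit-by-pitches per 1,000 plate appearances."""
    return 1000 * hbp / pa


rates = {name: round(hbp_per_1000(pa, hbp), 1) for name, (pa, hbp) in players.items()}
for name, rate in rates.items():
    # Ratio to the league rate shows how extreme each player is.
    print(f"{name:<8}{rate:>6.1f}  ({rate / NL_RATE:.1f}x the NL rate)")
```

    All three come out at roughly three times the league rate, which is why a handful of players can move a league-wide number.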

    A second theory, and one proposed by Bradbury and Drinen in their follow-up paper, targeted the expansions of 1993 and 1998 as possible factors. Although discussed in the first article in this series, this theory does accurately predict a larger increase in HBP in the NL than in the AL in 1993-1994 because of the asymmetrical nature of the expansion draft. In 1993, the HBP rate rose 7.3% in the AL and 21.6% in the NL, and for 1994 the changes were -6.0% and 11.6%. In the years following 1994 the rate increases evened out. But even so, one wouldn't think that NL pitchers would go on hitting more batters even after the effects of expansion were absorbed, as they did in 1997 and 2000.

    The final theory, and one also proposed by Bradbury and Drinen, is that the implementation of the "double-warning rule" (8.02(d)) in the winter of 1993 had an immediate impact. Essentially, this rule raised the costs for teams hitting opposing batters, and placed that cost squarely on the pitcher and manager, both of whom can be immediately ejected from the game. One result is that AL pitchers now have a greater fear of hitting batters in retaliation lest they be ejected, thereby lowering their rate of hit batsmen. At the same time, it could be argued (as Bradbury and Drinen do) that NL pitchers have less fear of retaliation under the double-warning rule, since they know that the opposing team dare not hit them or their teammates or suffer the cost. The combination of more fear by AL pitchers and less fear by NL pitchers could together be responsible for essentially erasing the gap between the leagues.

    Take Your Base
    One of the reasons so many of us love baseball is that while it is seemingly simple, it is also a very human activity with naturally endless complexity. In this series of articles, I hope that we've highlighted some of that complexity in a statistically small but interesting part of the game. But while I for one love big-picture analysis, there's nothing more exciting than getting caught up in the one-on-one confrontations between pitcher and batter that are really the source of our ruminations.

    Friday, January 25, 2008

    Chat Transcript 1/25

    Thanks to everyone who showed up at the chat today. Lots of questions on SFR and defense in general that were very interesting. You can find the transcript here and as always you can use this post for follow-ups.

    Rocky Mountain SABR 2008 Hot Stove Meeting

    If you're in Denver or the surrounding area you'll be interested to learn that the next meeting of the Rocky Mountain chapter of the Society for American Baseball Research will be held Saturday, January 26, 2008. The public is welcome and if you want to know what to expect you can take a look at last year's minutes.

    We'll kick off at 10:00 AM at Jackson's All-American Grill directly across the street from Coors Field at 20th and Blake in downtown Denver.


    Our featured speaker will be Jeff Bridich, the Rockies' Director of Baseball Operations. Jeff is a Harvard alumnus and was a catcher and outfielder on the Harvard baseball team, serving as a tri-captain in his 2000 senior year. He formerly worked in the Office of the Commissioner for Major League Baseball, where he worked closely with teams facilitating, reviewing and approving minor league contracts and transactions. Jeff became the Rockies' Director of Minor League Operations in 2004 before being promoted to his current position in October of 2005.

    In addition, RMSABR member Dave Wallack will lead off with some trivia, and members Dan Fox and Neal Williams will present their article on third base coaches titled "The Traffic Directors" that will appear in the next edition of SABR's Baseball Research Journal, due to be published this month.

    Please join us for what will certainly be a stimulating morning of baseball discussion.

    Thursday, January 24, 2008

    Chat Tomorrow 1/25

    Just a quick note that I'll be chatting at Baseball Prospectus tomorrow at 1:30PM Eastern time for 90 minutes or so. You can submit questions beforehand here. See you then.

    SFR v1.0

    As promised in yesterday's post about Peter Gammons, today's Schrodinger's Bat on BP walks through the official release of version 1.0 of Simple Fielding Runs (for infielders anyway) and its similarities to UZR and the Plus/Minus system. As a bonus you can now download the 2005 through 2007 data in Excel to play with the numbers to your heart's content. Oh, and the 2007 minor league leaders and trailers are also discussed.

    Wednesday, January 23, 2008

    Gammons and Cyberspace

    A nice column yesterday by Peter Gammons on the impact of the Internet on sports as well as political culture (similar to another column he wrote back in 2006). Two quotes in the column in particular caught my eye (other than the mention of this blog, Baseball Prospectus, and The Hardball Times, albeit sadly not in that order) and deserve a few comments.

    First, Gammons says:

    I make no bones about my strong feelings about the human element. Pure numbers cannot do justice to character and drive and energy. They cannot measure the impact Robin Yount had on teammates when he ran down the first-base line at the same breakneck speed (one scout had nearly 90 Yount games in a six- or seven-year period and claimed he never got Yount faster than 3.9 seconds, or slower than 4.0).

    What a wonderful anecdote, and one that relates to what I found when looking at the baserunning exploits of Yount in last week's column. To summarize, Yount was the only player who was a career leader (from 1956-2007 anyway) in more than one of the five baserunning metrics. Overall Yount contributed +54 theoretical runs, ranking him 13th in total runs. However, he was first in advancing on hits (EqHAR) at +39 runs and first in advancing on fly balls (EqAAR) at +17 runs. He did this despite costing his team 7 runs in stolen bases (EqSBR) and a half run in advancing on passed balls, balks, and wild pitches (EqOAR).

    Below you'll find Yount's career baserunning statistics.


    Year Opps EqGAR Opps EqSBR Opps EqAAR Opps EqHAR Opps EqOAR Opps EqRuns
    1974 24 0.8 15 -3.0 19 1.0 38 2.1 203 1.2 299 2.1
    1975 42 -0.9 17 0.1 37 -0.2 42 1.1 317 0.1 455 0.3
    1976 33 0.6 31 -3.9 49 -0.7 42 2.2 311 -0.6 466 -2.2
    1977 39 0.0 24 -0.5 49 0.5 63 0.4 397 -0.1 572 0.4
    1978 27 -0.4 21 0.5 32 0.4 41 0.9 278 -0.7 399 0.7
    1979 30 0.9 23 -1.1 47 2.5 43 2.6 311 2.0 454 6.9
    1980 39 1.2 27 0.7 46 1.7 46 2.5 353 0.0 511 6.0
    1981 26 -0.2 5 0.1 30 0.8 29 1.5 199 0.0 289 2.2
    1982 46 0.9 17 0.9 60 2.5 47 3.5 399 0.3 569 8.0
    1983 29 -0.8 15 -0.4 53 0.9 47 3.3 343 -0.8 487 2.3
    1984 43 -0.7 18 1.1 56 0.9 64 2.6 386 -0.1 567 3.8
    1985 20 -0.2 15 -0.7 24 0.1 51 3.5 252 -0.8 362 1.9
    1986 40 0.7 20 0.6 42 1.8 47 1.8 367 -0.4 516 4.6
    1987 39 0.0 26 -2.2 50 1.0 32 1.3 406 0.9 553 1.1
    1988 25 0.2 24 2.4 49 -0.6 45 1.1 397 0.1 540 3.2
    1989 29 1.5 21 1.6 53 0.5 61 0.4 402 -0.9 566 3.1
    1990 23 0.7 21 -1.6 46 2.1 52 2.6 360 0.6 502 4.4
    1991 16 0.5 9 -1.2 43 0.5 39 2.1 277 -1.2 384 0.7
    1992 32 0.0 20 -0.7 47 0.9 44 2.3 311 -0.6 454 1.9
    1993 16 0.3 11 0.4 30 0.2 48 1.5 258 0.4 363 2.7
    Total 618 5.3 380 -6.8 862 16.7 921 39.4 6527 -0.5 9308 54.1



    Yount managed to turn in a positive run value in EqHAR in each of his 20 seasons - a rare feat to say the least.
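
    That claim is easy to verify from the table. Here is a small sketch using only the EqHAR column; the list is typed in from the rows above and the variable names are mine.

```python
# Yount's season-by-season EqHAR (runs from advancing on hits), 1974-1993,
# copied from the EqHAR column of the table above.
eqhar = [2.1, 1.1, 2.2, 0.4, 0.9, 2.6, 2.5, 1.5, 3.5, 3.3,
         2.6, 3.5, 1.8, 1.3, 1.1, 0.4, 2.6, 2.1, 2.3, 1.5]

seasons = len(eqhar)
all_positive = all(value > 0 for value in eqhar)
career_total = sum(eqhar)

print(f"{seasons} seasons, all positive: {all_positive}")
# Season values are rounded to one decimal, so their sum can differ
# slightly from the 39.4 shown in the career row.
print(f"Sum of rounded season values: {career_total:.1f}")
```

    The rounded season values add up to a tenth of a run less than the career line, which is just rounding in the individual rows.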

    I was also interested by this comment in Gammons' piece.

    Bill James is trying to define clutch, what made George Brett so different, or sets David Ortiz, when healthy, apart in swagger and presence. You can present me with 4,765 pages of anti-Derek Jeter material; it won't work, I watch him too much.

    Although he mentions in the column that he was reading The Hardball Times, apparently he didn't let Tom Tango's excellent piece titled "With or Without Derek Jeter" sink in. In that article Tom uses Retrosheet data to demonstrate without a doubt (at least to me) that Jeter is among the worst fielding shortstops of his generation by showing that when Jeter is on the field, regardless of the other context, which Tom does a great job of neutralizing, fewer batted balls are turned into outs. Period. And one would think that should be the bottom line when evaluating defense.

    In tomorrow's Schrodinger's Bat at Baseball Prospectus I go one more round with the fielding system dubbed Simple Fielding Runs (SFR) that I developed for use with Retrosheet style play by play data. In the article I compare SFR to UZR (Ultimate Zone Rating) as well as John Dewan's Plus/Minus system. Not coincidentally both Plus/Minus and SFR rate Derek Jeter as the worst shortstop in baseball from 2005 through 2007 and of course UZR is no fan either. For my part, here are Jeter's SFR numbers since 2002 (ExR is expected runners, Rn is actual runners, and Balls are the number of balls allocated to Jeter's area of responsibility).


    Year    Balls   ExR    Rn   Diff   SFR
    2002      461    14    10      4     3
    2003      479   119   139    -20   -15
    2004      637   154   151      3     2
    2005      721   183   195    -13    -9
    2006      625   163   174    -11    -8
    2007      615   168   194    -26   -20
    Total    3538   800   863    -64   -47


    So over the course of six seasons Jeter is worth -47 runs by converting 64 fewer balls into outs than would have been expected (that is, by allowing 64 more runners than expected).
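
    To make the columns concrete, here is a sketch that recomputes the totals from the rows as printed. Diff is expected runners minus actual runners, and the last line backs out the runs-per-runner value these rounded figures imply. That ratio is my own inference from the table, not a published SFR constant, and the row layout is just for illustration.

```python
# Jeter's rows from the table above:
# (year, balls in his area, expected runners, actual runners, SFR).
rows = [
    (2002, 461, 14, 10, 3),
    (2003, 479, 119, 139, -15),
    (2004, 637, 154, 151, 2),
    (2005, 721, 183, 195, -9),
    (2006, 625, 163, 174, -8),
    (2007, 615, 168, 194, -20),
]

total_balls = sum(balls for _, balls, _, _, _ in rows)
# Diff = expected runners - actual runners; negative means extra runners allowed.
total_diff = sum(exr - rn for _, _, exr, rn, _ in rows)
total_sfr = sum(sfr for *_, sfr in rows)

# Implied run cost per extra runner, inferred from the rounded table values only.
runs_per_runner = total_sfr / total_diff

print(total_balls, total_diff, total_sfr, round(runs_per_runner, 2))
```

    The recomputed Diff comes out a couple of runners off the -64 in the table because the ExR and Rn columns are rounded to whole numbers; either way the implied cost is roughly three-quarters of a run per extra runner.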

    What I find interesting about Gammons' comment (and his take on Jeter is of course not a rare one and so I'm not just picking on Gammons) is the almost absolute faith in observation over other evidence when the evidence from every analytical tool available concurs as to the quality of Jeter's defense. Perhaps people are simply wired differently with some inherently more skeptical of what they see (or think they see) and therefore more willing to let other kinds of input shape their opinions. I'll admit it's kind of a mystery to me.

    Tuesday, January 22, 2008

    The Catch

    Saw a link to this photo come across the SABR listserv with the author wondering whether this really does depict "The Catch" made by Willie Mays in game one of the 1954 World Series. I hadn't seen this photo before and if anyone has any comments on it I'll pass them along.



    And for those interested in the background of "The Catch", here's a snippet from the Ken Burns documentary featuring George Will and Bob Costas.

    Sunday, January 20, 2008

    Running and Tulo

    Just a heads-up for those interested that I posted responses to a few reader questions on BP's Unfiltered blog in response to last week's Schrodinger's Bat column.

    Also just read about the six-year deal with a club option for 2014 that Troy Tulowitzki signed with the Rockies. At $30M the deal seems like a good one for the Rockies, especially the club option that would take him through his age 28 season and buy out his second year of free agency. Certainly investing in any young player involves risk (and perhaps more so for hitters in Colorado) but in Tulowitzki the upside for the club is substantial because of his contributions on both sides of the ball. As for his fielding I had him at +18.6 runs using my Simple Fielding Runs (SFR) system, ranking second behind only Omar Vizquel.

    Thursday, January 17, 2008

    Willie, Mickey, and Hank

    My column this week on Baseball Prospectus titled "For the Sake of Completeness" ties up some loose ends with the baserunning framework by showing the results from more or less all Retrosheet years (1956-2007). To that end I not only look at the aggregate leaders and trailers and discuss the merits of Lou Brock and Dave Parker but also develop a new rate statistic that incorporates four of the five metrics. This new rate (Equivalent Base Running Rate or EqBRR) is a more "pure" measure of baserunning. Using it I develop an aging curve for baserunning as a whole and by position, and finally examine baserunning as a skill and its persistence across career halves.

    You'll need to read the column to get the details but in researching the article I took a look at more than a few old-timers and so I thought I'd share the baserunning exploits of Willie Mays, Hank Aaron, and Mickey Mantle.

    First, the Say Hey Kid.


    Year Opps EqGAR Opps EqSBR Opps EqAAR Opps EqHAR Opps EqOAR Opps EqBRR
    1956 22 0.3 36 1.3 30 -1.8 21 3.1 189 -1.0 298 2.0
    1957 24 -0.1 46 -1.1 43 -2.5 30 2.6 276 1.1 419 0.0
    1958 29 0.0 41 3.3 51 1.0 35 0.1 341 1.6 497 5.9
    1959 21 -0.3 37 1.1 27 1.0 54 3.3 308 1.1 447 6.2
    1960 20 -0.3 40 -0.7 35 1.9 45 1.6 308 -0.2 448 2.3
    1961 30 0.7 29 -2.2 38 0.4 37 2.5 308 0.4 442 1.7
    1962 26 0.4 25 1.0 44 2.2 53 0.5 313 -0.2 461 3.9
    1963 37 0.5 19 -2.1 34 2.0 50 -0.2 357 1.3 497 1.6
    1964 24 0.2 28 0.5 38 2.0 39 2.7 317 0.0 446 5.4
    1965 22 0.2 16 0.0 20 0.6 36 -0.7 284 1.5 378 1.5
    1966 23 -0.7 8 0.2 24 -0.9 34 -0.6 292 -0.5 381 -2.5
    1967 16 -0.6 7 1.2 19 0.7 38 2.5 235 0.9 315 4.7
    1968 17 0.8 18 -0.3 35 1.7 49 -0.4 269 0.4 388 2.3
    1969 15 0.0 8 -0.6 23 0.3 26 0.3 228 -1.0 300 -1.1
    1970 12 0.1 6 0.1 26 0.7 29 1.9 308 0.6 381 3.3
    1971 26 1.4 24 2.0 34 -0.1 33 0.8 308 0.5 425 4.6
    1972 22 0.3 9 -1.8 30 -0.8 21 0.9 144 0.4 226 -0.9
    1973 12 0.0 1 0.1 10 0.1 18 -1.4 80 0.1 121 -1.1
    Total 398 2.9 398 2.0 561 8.5 648 19.4 4865 7.0 6870 39.9



    Mays finished 23rd in aggregate EqBRR (and of course his 1951-1955 seasons are missing). He led the league in 1958, although he did better in 1959, and did pretty well in 1964 and, somewhat surprisingly, in 1971 thanks to some high-percentage base stealing. In terms of pure baserunning Mays contributed 19% more runs than an average runner, which is a little on the high side for centerfielders. He seemingly was pretty good at advancing on fly balls (EqAAR), fairly average on grounders (EqGAR), and held his own both in advancing on hits (EqHAR) and on balks, passed balls, and wild pitches (EqOAR).

    Next, we have Hammerin' Hank.


    Year Opps EqGAR Opps EqSBR Opps EqAAR Opps EqHAR Opps EqOAR Opps EqBRR
    1956 32 -0.6 3 -1.7 30 0.5 32 0.6 234 0.6 331 -0.7
    1957 28 0.7 3 -1.5 37 -1.6 42 0.3 328 1.1 438 -1.0
    1958 44 1.1 4 -0.2 49 -0.5 48 1.7 321 0.3 466 2.4
    1959 25 -0.1 8 1.4 47 1.3 52 3.2 355 -0.6 487 5.2
    1960 24 1.3 23 -0.6 33 -0.6 31 1.8 228 0.2 339 2.2
    1961 23 -0.7 31 -0.4 32 1.4 37 1.2 274 1.7 397 3.2
    1962 34 -0.6 21 -1.2 34 0.6 39 0.8 296 -0.5 424 -0.9
    1963 21 0.5 39 2.7 51 0.2 45 3.7 359 -0.4 515 6.7
    1964 30 0.7 26 1.6 25 0.1 42 0.7 278 1.6 401 4.6
    1965 12 -0.2 29 -0.1 27 1.0 38 1.5 281 2.0 387 4.3
    1966 25 -0.3 24 1.5 39 1.8 53 1.6 323 0.3 464 4.8
    1967 28 1.6 24 0.1 38 1.6 45 1.3 297 -0.9 432 3.7
    1968 14 0.2 32 1.9 25 -2.7 36 1.6 226 -0.4 333 0.6
    1969 15 -0.5 21 -2.7 29 0.7 38 2.3 263 0.0 366 -0.2
    1970 21 -0.5 11 1.5 28 0.0 43 -0.5 277 -0.7 380 -0.2
    1971 15 -0.5 2 0.0 30 0.1 44 -0.7 262 -0.1 353 -1.2
    1972 18 -0.4 3 0.5 36 0.7 42 -0.1 265 -0.2 364 0.5
    1973 6 0.2 2 -0.4 19 -0.2 31 -0.7 174 -0.2 232 -1.2
    1974 12 0.3 1 0.1 17 0.1 20 -3.1 121 -0.5 171 -3.1
    1975 19 -1.0 1 -0.4 38 0.0 29 0.0 237 -0.5 324 -1.9
    1976 8 -0.1 1 -0.7 11 -0.6 11 -1.1 79 -0.4 110 -2.9
    Total 454 1.1 309 1.4 675 4.1 798 16.1 5478 2.5 7714 25.2



    Hank does pretty well overall and was a well above average runner from 1958 through 1967. He apparently slowed considerably after that, though, which depressed his career total substantially. Had he simply treaded water those final few years he would have been at something like +35. He led the league in 1963 with his +6.7 runs and was consistently effective in advancing on hits (EqHAR).

    Finally, The Mick.


    Year Opps EqGAR Opps EqSBR Opps EqAAR Opps EqHAR Opps EqOAR Opps EqBRR
    1956 24 0.1 13 0.7 44 -0.3 49 1.0 376 1.1 506 2.5
    1957 27 -0.4 20 1.4 49 -0.9 61 1.1 399 0.4 556 1.7
    1958 37 -0.1 23 1.9 49 0.2 57 2.9 375 0.1 541 5.0
    1959 22 -0.3 26 1.9 31 1.0 43 2.2 286 -0.9 408 4.0
    1960 26 -0.3 21 -0.3 46 -0.2 49 1.7 299 1.8 441 2.7
    1961 26 0.1 15 0.5 58 0.7 47 0.6 369 0.9 515 2.8
    1962 23 -0.7 8 1.5 39 0.1 44 0.6 290 0.6 404 2.0
    1963 13 -0.2 4 -0.9 9 -0.1 12 1.0 93 -0.4 131 -0.6
    1964 29 -0.2 9 -1.5 28 1.1 35 1.7 254 -0.9 355 0.1
    1965 12 -0.2 6 -0.2 22 -0.5 26 -0.2 173 -0.3 239 -1.4
    1966 11 -0.3 2 -0.4 20 0.3 20 -1.4 168 0.3 221 -1.4
    1967 20 -0.2 4 -1.3 33 -0.2 27 -1.0 237 -0.4 321 -3.0
    1968 23 -0.6 7 0.1 26 0.7 34 0.1 268 -0.3 358 0.0
    Total 293 -3.3 158 3.5 454 1.9 504 10.3 3587 1.9 4996 14.4



    After 1962 Mantle's knees didn't hold up and that is reflected in his running. Before that, though, he was above average and in line with most centerfielders, contributing plus runs from 1956 through 1962. Interestingly, he always did poorly on advancing on ground outs (EqGAR) but could seemingly take the extra base on hits (EqHAR).

    Wednesday, January 16, 2008

    Wednesday Links

    Just a couple of notes for a Wednesday...

  • Jayson Stark had a nice live blog post on the hearings yesterday for those of us who had to work. Some pretty good perspective I think, and in answering questions he brings up a few other issues worth thinking about.


  • Former BP'er Keith Woolner has a nice bio on the Science Magazine site that goes through his background leading up to his current position as Manager of Baseball Research and Analysis for the Indians. Very cool.


  • Mike Fast writes a wonderful PITCHf/x primer on MVN that explains what you need to know about f/x in seven easy steps.

    Monday, January 14, 2008

    A Podsednik Nugget?

    As many of you know John Dewan publishes a "Stat of the Week" on the ACTA publishing site. These are often insightful but I'm a little perplexed by the stat for last Friday, January 11th, which will appear in the upcoming book from ACTA Sports, The Bill James Gold Mine, available in February 2008.


    In that nugget he notes that...

    "Scott Podsednik has proven quite effective as a leadoff man. Until last year. Every year prior to 2007 Pods' teams have scored significantly more runs when he led off an inning than when others have led off."

    This is accompanied by a table that illustrates how in 2007 the team scored .01 fewer runs per inning when Podsednik led off than when his teammates did so.

    While I haven't done exhaustive research on this I do know that Podsednik went from being a leadoff hitter in previous years to batting further down in the order in 2007. In fact, here are his plate appearances by lineup position since 2003.


    Year/Pos 1 2 3 4 5 6 7 8 9
    2003 279 291 2 27 11 2 1 15
    2004 703 8 2
    2005 564 2 2
    2006 574 2 1 11 3
    2007 89 19 3 2 27 95


    Clearly a large part of the difference is due to the fact that the two and three hitters in the order are guaranteed to bat when Podsednik leads off an inning from the leadoff spot in the order, as opposed to when he's in the 6th or 7th hole. A little follow-up would be to see whether on average when a leadoff hitter leads off an inning it typically raises scoring by at least +0.25 runs, as in the case of Podsednik in previous seasons, or whether his totals actually indicate that he was a hindrance in the leadoff spot (as might be guessed by his career .338 OBP). Lineup balance will also play a role, since it could be the case that the two, three, and four hitters for the White Sox were relatively more potent than the rest of the lineup when compared with other teams.

    In any case this Stat of the Week, while interesting, doesn't really show what it purports.

    Saturday, January 12, 2008

    Strike Zones, Trilobites, and a Vicious Cycle

    Last week I ran the first in a series of three columns I wrote on hit batsmen. Today it's time for the second in the series, originally published in May of 2006. Enjoy.




    May 11, 2006
    Schrodinger's Bat: Strike Zones, Trilobites, and a Vicious Cycle
    by Dan Fox

    "If they knocked two of our guys down, I'd get four. You have to protect your hitters."
    --Don Drysdale

    "I hated to bat against Drysdale. After he hit you he'd come around, look at the bruise on your arm and say, 'Do you want me to sign it?'"
    -- Mickey Mantle

    In our last installment of Schrödinger’s Bat we began an investigation of hit batsmen by looking at the big-picture trends in the rate of hit batsmen since 1901. That exploration led to summarizing various theories that have been proposed over the years to explain the fluctuation of rates, including the physical hazard theory, the offensive context theory, the intimidation theory, the expansion theory, the new strike zone theory, and finally the aluminum theory. From among that group, we can say that the last one seemed to make sense for the recent upward trend that began circa 1985.

    Although I promised that this week we’d scrutinize the differences in hit batsmen rates since the introduction of the designated hitter in 1973, and discuss the theories proposed to explain it, last week’s column generated such a large volume of email that I thought it would be worth spending one more column on the big picture before moving on to the DH era.

    Big Picture Trends Redux
    Let’s start off by addressing a few of the more prevalent reader questions regarding the bevy of big picture trends discussed last week. Indicative of the questions received was this one from reader Marc Stone, where Marc touches on two aspects of HBP trends that the article overlooked.

    Nice job, Dan, but you left out one very useful comparison: how do changes in HBP compare to changes in BB rates and, to a lesser extent, K rates and pitches per PA.

    Reader Ryan Tippetts echoed the second part of that question by noting:

    My immediate thought, specifically regarding recent upward trends, was the modern trend of increased pitches per AB. Might it be as simple as because a batter sees more pitches he has more opportunities to be hit by a pitch?

    Thanks to Ryan and Marc, and to all the other readers who had similar comments. I have to admit that neither looking at walk and strikeout rates nor at pitches per plate appearance in comparison with the rate of hit batsmen had occurred to me. But of course all three suggestions make a lot of sense:


    • If pitchers are walking more batters at the same time they’re hitting more of them, that may be indicative of worse control (the “wildness theory”).

    • If strikeouts are strongly correlated with hit batsmen, then perhaps a more aggressive hitting style (the “free swinger theory”), or the intimidation of the HBP, or even changes in the strike zone are playing a role.

    • If pitchers are throwing more pitches overall, it does indeed provide more opportunity for hitters to get plunked (the “opportunity theory”) which in the end may be all that is required.


    To see whether the wildness or free swinger theories shed any light on the question of changes in HBP rates over time, we can add unintentional walks and strikeouts per 1,000 plate appearances for each league to the graph we showed last week:



    What you’ll notice is that up until around 1970, there appears to be some correlation between walk rate and HBP rate. Unfortunately, the correlation is the inverse of that which the wildness theory would predict. As walk rates increased from around 1920 through the late 1940s the rate of hit batsmen fell. As walk rates declined, the frequency with which batters were hit increased.

    In other words, one might be inclined to conclude that there is a more or less constant rate at which pitchers put batters on for free via the HBP or unintentional walk, at least based on the graph from 1901 through 1970. While that’s an attractive idea, and akin to the offensive context theory discussed last week, you can’t simply add the two rates, since hit batsmen are so much less frequent than walks--as evidenced by the fact that in order to get both on the graph, the scale of HBP is per 1,000 PA while that for walks is per 100 PA. As a result, the number of runners that pitchers put on for free is driven almost entirely by the number of walks.

    In any case, there appears to be no correlation over the past 35 years, as walk rates have been fairly steady, while the number of hit batsmen has increased dramatically.

    On the other hand, the free-swinger theory appears more promising. Strikeout rate does correlate pretty strongly with the HBP rate since around 1950, and in the 1910-1925 period as well. In fact, from 1950 through 2005 the correlation coefficients are a very healthy .72 and .69 for the American and National Leagues respectively, which can be interpreted to mean that strikeout rates explain around 50% (.70² ≈ .49) of the variation in HBP rates (or vice versa).
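
    For readers who want to see where the "around 50%" comes from, the share of variation explained is simply the square of the correlation coefficient. The sketch below squares the two quoted coefficients and also includes a plain Pearson correlation function. The toy rates at the bottom are made up purely to exercise the function and are not real league data.

```python
import math

# Correlation coefficients between strikeout rate and HBP rate, 1950-2005,
# as quoted in the paragraph above.
r_by_league = {"AL": 0.72, "NL": 0.69}

# The share of variance "explained" is the square of the correlation coefficient.
r_squared = {league: r ** 2 for league, r in r_by_league.items()}
for league, value in r_squared.items():
    print(f"{league}: r = {r_by_league[league]:.2f}, r^2 = {value:.2f}")


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical yearly rates (strikeouts and HBP per 1,000 PA), invented for illustration.
toy_k_rate = [120, 135, 150, 160, 170]
toy_hbp_rate = [5.0, 5.4, 6.5, 6.1, 7.2]
toy_r = pearson_r(toy_k_rate, toy_hbp_rate)
print(f"toy r = {toy_r:.2f}, toy r^2 = {toy_r ** 2:.2f}")
```

    Note that the calculation is symmetric in the two series, which is the "or vice versa" above: the number says nothing about which rate is driving the other.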

    But as every statistics professor drums into the heads of his students, correlation is not necessarily causation, and before 1950 the correlation is much weaker--in fact, for the preceding 25 years the two rates were moving in opposite directions. As a result, one might argue that the free-swinger theory holds since 1950 because the normative hitting style became more aggressive, resulting in hitters diving over the plate more frequently, which in turn results in more hit batsmen. Under this interpretation, during the 1970-1984 period, free swinging was less in vogue, and pitchers reacted with fewer brushback pitches, resulting in fewer HBP.

    An alternative theory noted by reader JMHawkins that would fit the same set of facts holds that an expanding strike zone, especially on the outside corner, forces hitters to stand closer to the plate and dive over it more frequently, resulting in more batters being hit. The expanded zone also happens to induce more strikeouts, so strikeout rate and HBP rate aren’t causally related, but both are related to this third factor. There is undisputed evidence that the strike zone expanded in 1963, and anecdotal evidence that the low outside corner became an increasingly rewarding target for pitchers in the last 20 years or so. As umpires reined in the zone after the redefinition in 1969 and the increased scrutiny around 2001, both strikeouts and hit batsmen fell. This “fluctuating strike zone theory” then explains why strikeout and HBP rate seem to mirror each other.

    In either case, we’d still need a theory to account for the preceding 25 years, when strikeouts rose and hit batsmen fell, although under the above theory it appears that those 25 years from 1925 to around 1950 are the exception and not the rule.

    To be honest, I was initially most hopeful about the opportunity theory. It's pretty well known that the number of pitches per plate appearance has been on the rise, so it makes intuitive sense, but when we try to look at this theory, we run into the problem that we don’t have complete play-by-play data--and hence pitch counts--for most of baseball's history. Despite the recent and very welcome additions to the work being done at Retrosheet we are still missing the vast majority of the data required to complete the picture from 1901 through 2005; the 49 seasons that Retrosheet provides are often missing pitch sequence data.

    Some alert readers (aka, the real stat geeks) may also be thinking that perhaps we could use pitch count estimators in order to estimate the number of pitches, and hence the rate at which batters are hit per pitch. Unfortunately, the basic estimators that are in use rely on constant multipliers for strikeouts and walks to estimate the number of pitches, and we’ve already taken those into account in the graph above. More complex estimators rely on estimates of balls-in-play rate (the percentage of pitches on which balls are put into play, which varies by league and year), which we don’t have historically. There are other factors that could also influence the result which models have difficulty capturing.

    However, we can look at data we do have, and that's as far back as 1988. You’ll recall that during the 1988-2005 period HBP rates have more than doubled. What we find, however, is that during that time the number of pitches per plate appearance has risen only around 5%. So it doesn’t look like the opportunity theory explains at least the most recent upward trend.


    Year P/PA
    1988 3.60
    1989 3.63
    1990 3.64
    1991 3.71
    1992 3.68
    1993 3.68
    1994 3.75
    1995 3.75
    1996 3.75
    1997 3.76
    1998 3.70
    2000 3.75
    2001 3.72
    2002 3.73
    2003 3.74
    2004 3.76
    2005 3.73
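
    The size of that increase can be read straight off the table. A quick sketch, using only the values listed above (the table has no 1999 row, so neither does the dictionary; the helper name is mine):

```python
# Pitches per plate appearance by season, copied from the table above.
# There is no 1999 entry in the table, so none is included here.
p_per_pa = {
    1988: 3.60, 1989: 3.63, 1990: 3.64, 1991: 3.71, 1992: 3.68, 1993: 3.68,
    1994: 3.75, 1995: 3.75, 1996: 3.75, 1997: 3.76, 1998: 3.70, 2000: 3.75,
    2001: 3.72, 2002: 3.73, 2003: 3.74, 2004: 3.76, 2005: 3.73,
}


def pct_change(start, end):
    """Percentage change from start to end."""
    return 100 * (end - start) / start


baseline = p_per_pa[1988]
change_to_2005 = pct_change(baseline, p_per_pa[2005])
change_to_peak = pct_change(baseline, max(p_per_pa.values()))

print(f"1988 to 2005: {change_to_2005:+.1f}%")
print(f"1988 to peak: {change_to_peak:+.1f}%")
```

    Whether you measure to 2005 or to the peak seasons, the rise is in the neighborhood of 4%, nowhere near enough to account for HBP rates more than doubling.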


    What do Trilobites and Jason Kendall Have in Common?
    Although the free-swinger and fluctuating strike zone theories (or some combination thereof) provide some insight, and the opportunity and wildness theories perhaps less so, the theory most often cited by readers that was not discussed in last week’s column is the “body armor theory.” A succinct explanation was provided by reader Jeff Bullington:

    This would only affect the recent rise, but what about the increased use of body armor? Would this be the 'contra-intimidation theory'?

    As Jeff noted, this is the polar opposite of the intimidation theory and holds that as hitters began to wear more and more protective gear, they’ve been less afraid of getting hit, allowing them to stand closer to the plate and be more aggressive about hanging in. It follows logically that pitchers would respond by upping the ante in an effort to move batters off the plate, and reclaim their rightful territory.

    This idea is akin to the evolutionary arms race between predator and prey, whereby one species evolves stronger protection in response to selection pressure from predators as has been speculated for trilobites, which in turn leads to selection pressure on predators to evolve accordingly.

    As arguments go, this is a particularly difficult one to measure quantitatively. What we can certainly see is that the use of protective gear--such as hard elbow and shin pads--has increased in the past 20 years. One only has to look at the protection worn by Craig Biggio, or Jason Kendall and consider his recent run-in with John Lackey to understand how that protection might affect the game. It’s probably not a coincidence that coming into 2006, Biggio's 273 HBPs rank second all-time, and Kendall ranks 8th with 197.

    That said, in 2002 Major League Baseball began enforcing rules that limited the use of protective gear to players with medical exemptions, such as the one employed by Barry Bonds, which allows him to wear his elbow armor. The rules also limited the size of the various pads and devices worn.

    Whether coincidentally or not, the recent Kendall incident notwithstanding, the rate of hit batsmen has stabilized since that time. This was also immediately after the rate had reached its apogee in 2001, when the AL set its all-time record in hit batsmen per 1,000 plate appearances and the NL its highest total since 1901.


    Year     AL      NL
    2001  10.67    9.92
    2002   9.90    9.17
    2003  10.21    9.86
    2004  10.40    9.60
    2005   9.52   10.05


    We can also note that although helmets have been mandatory for MLB players since 1956, ear flaps have only been enforced for players who reached the majors after 1983. Ear flaps do coincide with the recent upward trend, and although one can imagine there would be an attendant psychological boost for the hitter, it’s more difficult to believe that this relatively minor change would have had that large of an immediate impact. After all, players already in the league were allowed to use the old-style helmets, so the change was gradually phased in, and the head is the part of the body hit with the least frequency.

    But this does provide the opportunity to sneak in a quick trivia question: Who was the last player to wear a helmet without an earflap in a game and in what year? (Wait for it, we'll get to the answer at the bottom of the column.)

    So, whether or not body armor and the introduction of the ear flap are responsible for the twenty-year upward trend in HBP rates, an argument can be made that the crackdown on body armor has played a role in retarding the arms race.

    A Vicious Circle?
    Finally, reader Jake Slemp wrote to say that whatever the cause of an increasing or decreasing trend in hit batsmen, it would likely be self-sustaining and reinforcing. His reasoning:

    After all, hit batsmen beget more hit batsmen within the same game, which often beget still more in subsequent games between the two teams…which beget more in those games, etc.


    In other words, even a small increase in hit batsmen might form a feedback loop based on retaliation. This situation is often described in economic terms as a virtuous (if the results are favorable) or a vicious (if they are negative) circle, where each cycle continues the trend in the current direction until stopped by some outside force.

    To look at this “vicious circle theory,” we can use play-by-play data for 2001 through 2005 to examine the distribution of games by the number of hit batsmen. We can then compare the actual distribution with what would be expected if the hit batsmen were distributed randomly (in a binomial distribution) given the overall rate of HBP and the average number of plate appearances per game. What we find when we do so is as follows:


    HBP   Games   Expected
    7         1          0
    6         1          1
    5        10         10
    4       118         71
    3       455        394
    2      1626       1610
    1      3980       4325
    0      5953       5732
    1+     6191       6412
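    The random-distribution baseline described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the calculation actually used for the Expected column: the plate appearances per game (`PA_PER_GAME`, set to 76 for both teams combined) is a placeholder, since the figure used is not given here, and the per-plate-appearance HBP rate is backed out of the Games column itself. The function and variable names are made up for the example, and the results should not be expected to match the Expected column exactly.

```python
from math import comb

# Observed number of games by HBP count, 2001-2005, copied from the table above.
observed = {0: 5953, 1: 3980, 2: 1626, 3: 455, 4: 118, 5: 10, 6: 1, 7: 1}

# ASSUMPTION: plate appearances per game, both teams combined. The column does
# not state the value it used; 76 is an illustrative placeholder.
PA_PER_GAME = 76


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_games(observed, pa_per_game):
    """Expected games at each HBP count if HBP were binomially distributed."""
    total_games = sum(observed.values())
    total_hbp = sum(k * games for k, games in observed.items())
    # Per-plate-appearance HBP rate implied by the observed totals.
    p = total_hbp / (total_games * pa_per_game)
    return {
        k: total_games * binomial_pmf(k, pa_per_game, p)
        for k in range(pa_per_game + 1)
    }


expected = expected_games(observed, PA_PER_GAME)
for k in sorted(observed, reverse=True):
    print(f"{k} HBP: actual {observed[k]:5d}, expected {expected[k]:8.1f}")
```

    Changing `PA_PER_GAME` shifts the expected counts only slightly, because at rates this low the binomial is very close to a Poisson distribution with the same mean.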


    As you can see, the numbers of games in which zero through two batters are hit are all pretty much in line with what would be expected. However, we do see that the frequencies of three and especially four batters hit in a game surpass the numbers you'd expect, and there are fewer games with a single batter hit than expected. And of course this list provides the opportunity for a second trivia question: What teams were involved in the lone seven hit batsmen game of the past five years? (Again, answer appears at the bottom.)

    What this confirms is that retaliation is a likely factor in hit batsmen. Games where we would otherwise expect two batters to be hit can quickly turn into games where three or four are hit. We already knew that intuitively, but what we need to know is whether or not increased retaliation is responsible for the increasing number of hit batsmen.

    To look at this, we can calculate the expected number of games with various numbers of hit batsmen over four successive periods, starting in 1985.


    Actual vs. Expected   1985-1989   1990-1994   1995-2000*   2001-2005
    5+                         850%        246%         322%        104%
    3 - 4                      162%        125%         119%        123%
    0 - 2                      100%        100%          99%         99%

    * Does not include 1997-1999.
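
    As a rough check on how these percentages are formed, the sketch below groups the 2001-2005 Games and Expected counts from the earlier distribution table into the same three buckets and divides actual by expected. The dictionary and function names are illustrative only, and the expected counts are the rounded values printed in that table, which is an assumption about the inputs rather than the original calculation.

```python
# Games by HBP count for 2001-2005, copied from the earlier distribution table.
actual = {7: 1, 6: 1, 5: 10, 4: 118, 3: 455, 2: 1626, 1: 3980, 0: 5953}
# ASSUMPTION: the rounded Expected values as printed, not the unrounded
# figures the percentages were presumably computed from.
expected = {7: 0, 6: 1, 5: 10, 4: 71, 3: 394, 2: 1610, 1: 4325, 0: 5732}

# HBP counts that fall into each bucket of the comparison.
buckets = {"5+": range(5, 8), "3 - 4": range(3, 5), "0 - 2": range(0, 3)}


def bucket_ratios(actual, expected, buckets):
    """Return actual/expected games for each bucket of HBP counts."""
    ratios = {}
    for label, counts in buckets.items():
        actual_games = sum(actual[k] for k in counts)
        expected_games = sum(expected[k] for k in counts)
        ratios[label] = actual_games / expected_games
    return ratios


ratios = bucket_ratios(actual, expected, buckets)
for label, ratio in ratios.items():
    print(f"{label}: {ratio:.0%}")
```

    With these rounded inputs the 3 - 4 and 0 - 2 buckets come out at 123% and 99%, matching the 2001-2005 column. The 5+ bucket comes out at 109% rather than 104%, most likely because so few games are involved that rounding the expected counts to whole games moves the ratio noticeably.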

    As we saw with the 2001-2005 period, in all periods there are just about the expected number of games with zero, one, or two HBP. However, there are always more games than expected with three or four batters hit, and lots more with five or more hit.

    While this confirms that retaliation within games is probably a persistent feature of hit batsmen, it doesn’t appear as if blatant retaliation has increased over the past twenty years. Keep in mind, the HBP rate has doubled during that time frame. If anything, it would appear there are slightly fewer beanball wars now than in the past, perhaps as a result of the double-warning rule put into effect in 1994. Note that this conclusion holds even if you assume that the increase in games with three or more hit batsmen is completely due to wildness (after all, it’s certainly true that when a pitcher hits one batter he’s more likely to hit another simply due to control problems).

    What this doesn’t rule out is the idea that teams now employ a more subtle form of retaliation, whereby they will wait to take revenge in a subsequent series, and where the retaliation doesn’t escalate out of control. As a result, it would be possible that retaliation and escalation are to blame for the recent increase in hit batsmen, but it seems unlikely.

    However, even if retaliation is not the cause of the increasing rate of hit batsmen, the body armor theory may provide the starting point for the vicious circle that was interrupted by the new rules, starting in 2002.

    Error on the Side of Caution
    If nothing else, I hope that we’ve highlighted that in an activity as complex as baseball, there are usually many factors that contribute to the big-picture trends that we see. That’s true for hit batsmen as well as the more visible trends, like the offensive upsurge of the last dozen years or so. If there is a lesson to be learned here, it’s probably that we should all be more cautious of simple explanations and easy answers.

    Let’s wrap up with a couple of corrections from last week.

    First, when discussing the expansion theory I noted that expansion would have a tendency to dilute talent in both leagues. While that’s true to some extent, I was reminded by our own Christina Kahrl that actually the 1992 expansion draft was the first time players from both leagues were available in an expansion draft. Prior to that, for example in 1977, the expansion teams could only choose unprotected players from their own league. And in that 1992 draft, AL teams were able to protect more players than NL teams; it was not until the 1997 draft that all teams were able to protect the same number of players.

    Second, I noted last week that Ray Chapman was the only professional player ever fatally injured in a game. Reader Bill Johnson pointed out that Chapman was the only major-leaguer to be fatally injured by a beanball. Several minor leaguers were killed in the 1950s and 1960s including Otis Johnson in 1951.
    ---

    Okay, so you waited; here are a couple of answers. For the first trivia question, Tim Raines never wore an earflap in a 23-year career that spanned from 1979 through 2002. As quoted in an MLB article documenting it, he did not wear one because, being a switch hitter, he didn’t want to carry two helmets.

    The answer to the second question: On June 7, 2001, the A’s visited Anaheim to take on the Angels. In that game Jason Giambi was hit by Scott Schoeneweis following a first-inning home run by Frank Menechino. In the third inning, Schoeneweis then hit Menechino (one wonders if accidentally) and later in the inning also hit Olmedo Saenz. Barry Zito subsequently hit Tim Salmon in the 6th. Almost certainly not coincidentally, Schoeneweis again hit Menechino leading off the 8th. Later in that same inning, Mike Holtz entered the game and promptly plunked Eric Chavez for good measure. And just to round things out, Scott Spiezio was hit by Mark Guthrie in the bottom of the 8th. Ouch.

    Thursday, January 10, 2008

    Pulling for Teddy Ballgame

    My column today at BP deals with the history of defensive shifts and delves into a little data on The Splendid Splinter's propensity to pull as well as an analysis of the most pull-happy modern players. I was inspired to take up the topic after spending several enjoyable hours digesting the essays in The 2008 Hardball Times Annual. As you might imagine I'm partial to the "Analysis" section and although I wasn't particularly impressed with one of the essays, for the thinking fan the material in the ten essays is well worth the cost of the book.

    And while I appreciated Tom Tango's work on catcher defense and an exhaustive look at what can be learned about Derek Jeter's defense from Retrosheet, John Walsh's investigation of platoon splits, David Gassko's new take on the vexing subject of managerial contributions, and John Beamer's walk through a Markov Model for the 2007 season, it was a section of the essay "Of Home Runs and Free Agents" by Greg Rybarczyk of Hit Tracker fame that caught my eye. In that article Greg has a section titled "'Did Anyone Order a Center fielder?' Case Study: All Batted Balls by Torii Hunter and Andruw Jones," in which, as the title implies, he takes an in-depth look at the balls in play for these two players, and in doing so mentions the idea of employing an infield shift against Jones. For those interested Greg has posted data from that article at SOSH.

    Great stuff and once again THT has put together a very fine collection of analytical and historical essays coupled with a look back at 2007.