
Tuesday, January 30, 2007

Plunking Explosion

Steve Treder over at The Hardball Times has a good article today on hit batsmen and how their frequency has changed over time. He reviews many of the arguments that I discussed in my series last summer...

  • Schrodinger's Bat: Beautiful Theories and Ugly Facts

  • Schrodinger's Bat: Strike Zones, Trilobites, and a Vicious Cycle

  • Schrodinger's Bat: The Moral Hazards of the Hit Batsmen


    As a conservative I appreciate Steve's use of the Law of Unintended Consequences in positing that the introduction of batting helmets in the early 1960s and the adoption of the "zero-brushback-tolerance protocol" of the 1990s may, ironically, have contributed to the increasing rate at which hitters are plunked. This argument would also hold for the increased ability of players to wear body armor, thereby leading to a kind of arms race in which hitters stand closer and pitchers try to back them off.

    It struck me, however, in light of my columns the past two weeks, that the increasing size of major league hitters may also play a role all on its own, especially in the past 30 years. Larger hitters would likely be less afraid of being hit and observational evidence tells me that hitters today do less to avoid being hit than did hitters in the past. This struck home to me as I watched the footage from the 1954 World Series the other night.

    I also noted that J.C. Bradbury mentions that he discusses how the distribution of talent in baseball has affected the HBP rate in his new book which should be available in mid-March. I'm looking forward to giving it a read.


    Update: Baseball Musings has a little post on this subject and two interesting graphs. To me, the first illustrates that the rate of HBP has affected both low and high ERA pitchers roughly equally over the course of baseball history although in the last few years it seems to have diverged. The second graph is an illustration of how inferior pitchers now pick up more innings than they did in the past. It should be cautioned, however, that the increasing ERA of the leagues as a whole will cause some of this as the sub 5.00 ERA group shrinks and the +5.00 ERA group grows. Historically the +5.00 group would be very small simply because pitchers with ERAs that high would be so far from the mean.

    Friday, January 26, 2007

    Mid-Course Corrections

    So here is something I don't understand (I know that will come as no surprise to many). In researching the prevalence of in-season managerial changes for teams that made it to the postseason I noticed that for playoff teams and other teams alike there is an interesting curve that looks as follows:



    In-season managerial changes hovered at around 10% of teams in the first three decades (1900s-1920s) of the 20th century before increasing to around 15% over the next three decades (1930s-1950s) and then exploding to over 20% for the next three decades (1960s-1980s). Since then the frequency of changes has dropped again to around 13%.

    Why the upward trend over much of baseball history? I thought at first it might be because it seemed to work, but outside of the 1932 and 1938 Cubs, who changed managers midstream and caught fire on their way to the pennant, there aren't similar cases until the summer of 1978 when Yankees manager Billy Martin resigned and was replaced by Bob Lemon. And then why the downward trend over the past two decades? Is a newfound realization that the influence of managers is circumscribed responsible for the downward trend? Is it the more substantial financial investment in managers by front offices that makes them less willing to pull the plug during the season? Is there something else going on? It's a mystery to me.

    Incidentally, over the course of history 15.6% of teams changed managers during the season - 11.9% for teams that reached the postseason. I would have thought the latter percentage would have been lower, but then again it is sometimes the case that a team going nowhere installs a new manager (the 1989 Blue Jays, who fired Jimy Williams after a 12-24 start and hired Cito Gaston, who led them to a 77-49 finish, come to mind) and takes off for whatever reason.

    Thursday, January 25, 2007

    Triple Play

    My column this week titled "A Triple Redux" on BP focuses on the relationship between triples and body mass index (BMI) over the history of baseball. The topic was suggested to me by several readers who noted that in my analysis of triples titled "Baseball's Trifecta", published back in September, I looked at several of the theories that may be responsible for the decline in triples, including more highly skilled and athletic outfielders, park configurations, risk aversion in a changing run environment, and the aging of the player population.

    Several readers pointed out that I didn't look at the changing bulk of players over time as a possible explanation. The long and the short of it is that controlling for the distribution of the changing BMI in the player population does not substantially lessen the steep curve that tracks the decline of triples. Bigger players hit fewer triples and from the 1980s through 2006 the percentage of players with higher BMIs has risen substantially (weight training and the spectre of steroids), but simply not enough to offset the background decline in the triple rate.

    What we're left with, then, is the likelihood that the decline in triples is a consequence of both the standardization of the game and baseball's ever-increasing level of play.

    On the Merits of Probability

    Some have called the Dodgers/Padres game on September 18, 2006 "the game of the century" for its unparalleled excitement and finish in the midst of a pennant race. Down 9-5 entering the bottom of the ninth inning, the Dodgers' Jeff Kent, J.D. Drew, Russell Martin, and Marlon Anderson hit consecutive homeruns off of first Jon Adkins and then Trevor Hoffman to knot the game at 9-all. In the top of the 10th the Padres once again struck for a run on a Josh Bard single to take the lead. But in the bottom of the 10th with Kenny Lofton aboard and no one out, comeback player of the year Nomar Garciaparra sent the Dodgers faithful home with an 11-10 victory powered by his homerun to left field. Oh, and it put the Dodgers in first place to boot.

    The following graph shows the Win Expectancy (WX) for the Dodgers during the game and highlights some of the key plays along the way. The table that follows includes each play and how that play either increased or decreased the WX for the Dodgers.



    Inning Outs Batter Event Text Start End Diff LA SD
    (Start, End, and Diff are the Dodgers' WX; LA and SD are the score after the play)
    1 0 Dave Roberts 43/G 0.500 0.522 0.022 0 0
    1 1 Brian Giles K 0.522 0.538 0.016 0 0
    1 2 Adrian Gonzalez S8/L 0.538 0.527 -0.012 0 0
    1 2 Mike Piazza D8/L.1-H 0.527 0.419 -0.108 0 1
    1 2 Russell Branyan W 0.419 0.410 -0.009 0 1
    1 2 Mike Cameron T9/L.2-H;1-H 0.410 0.245 -0.165 0 3
    1 2 Geoff Blum S9/L.3-H 0.245 0.187 -0.058 0 4
    1 2 Josh Barfield 8/F 0.187 0.198 0.011 0 4
    1 0 Rafael Furcal S5/BG 0.198 0.227 0.029 0 4
    1 0 Kenny Lofton S8/G.1-2 0.227 0.276 0.049 0 4
    1 0 Nomar Garciaparra 64(1)3/GDP.2-3 0.276 0.185 -0.091 0 4
    1 2 Jeff Kent D8/F.3-H 0.185 0.252 0.066 1 4
    1 2 J.D. Drew K 0.252 0.223 -0.028 1 4
    2 0 Jake Peavy K 0.223 0.237 0.014 1 4
    2 1 Dave Roberts K 0.237 0.248 0.010 1 4
    2 2 Brian Giles S7/G 0.248 0.240 -0.007 1 4
    2 2 Adrian Gonzalez K 0.240 0.254 0.014 1 4
    2 0 Russell Martin 13/G 0.254 0.232 -0.023 1 4
    2 1 Marlon Anderson HR/9/F 0.232 0.317 0.086 2 4
    2 1 Wilson Betemit 43/G 0.317 0.300 -0.017 2 4
    2 2 Brad Penny K 0.300 0.289 -0.011 2 4
    3 0 Mike Piazza 53/G 0.289 0.307 0.018 2 4
    3 1 Russell Branyan K 0.307 0.320 0.013 2 4
    3 2 Mike Cameron S7/G 0.320 0.311 -0.010 2 4
    3 2 Geoff Blum CS2(26) 0.311 0.329 0.018 2 4
    3 0 Rafael Furcal HR/8/F 0.329 0.438 0.109 3 4
    3 0 Kenny Lofton K 0.438 0.410 -0.028 3 4
    3 1 Nomar Garciaparra 9/F 0.410 0.390 -0.020 3 4
    3 2 Jeff Kent D8/L 0.390 0.417 0.027 3 4
    3 2 J.D. Drew DGR/7/L.2-H 0.417 0.538 0.121 4 4
    3 2 Russell Martin 1/L 0.538 0.500 -0.038 4 4
    4 0 Geoff Blum 6/P 0.500 0.528 0.028 4 4
    4 1 Josh Barfield E6/TH/G 0.528 0.498 -0.030 4 4
    4 1 Jake Peavy 4/P 0.498 0.533 0.035 4 4
    4 2 Dave Roberts CS2(26) 0.533 0.561 0.028 4 4
    4 0 Marlon Anderson S9/G 0.561 0.601 0.041 4 4
    4 0 Wilson Betemit K+SB2 0.601 0.583 -0.019 4 4
    4 1 Brad Penny 6/L 0.583 0.541 -0.041 4 4
    4 2 Rafael Furcal 43/G 0.541 0.500 -0.041 4 4
    5 0 Dave Roberts K/B 0.500 0.530 0.030 4 4
    5 1 Brian Giles 9/F 0.530 0.553 0.022 4 4
    5 2 Adrian Gonzalez S6/G 0.553 0.537 -0.016 4 4
    5 2 Mike Piazza W.1-2 0.537 0.509 -0.027 4 4
    5 2 Russell Branyan W.2-3;1-2 0.509 0.472 -0.037 4 4
    5 2 Mike Cameron 9/F 0.472 0.567 0.095 4 4
    5 0 Kenny Lofton K 0.567 0.537 -0.030 4 4
    5 1 Nomar Garciaparra D8/L 0.537 0.592 0.055 4 4
    5 1 Jeff Kent 63/G 0.592 0.547 -0.046 4 4
    5 2 J.D. Drew IW 0.547 0.558 0.011 4 4
    5 2 Russell Martin 5(2)/FO/G.1-2 0.558 0.500 -0.058 4 4
    6 0 Geoff Blum D9/L 0.500 0.409 -0.091 4 4
    6 0 Josh Barfield K 0.409 0.472 0.063 4 4
    6 1 Terrmel Sledge 43/G.2-3 0.472 0.517 0.045 4 4
    6 2 Dave Roberts K 0.517 0.576 0.059 4 4
    6 0 Marlon Anderson S9/L 0.576 0.625 0.049 4 4
    6 0 Wilson Betemit W.1-2 0.625 0.699 0.073 4 4
    6 0 Oscar Robles FC1/SAC/BG.2-3;1-2 0.699 0.787 0.088 4 4
    6 0 Rafael Furcal 42(3)/FO/G.2-3;1-2 0.787 0.706 -0.081 4 4
    6 1 Kenny Lofton 12(3)3/GDP 0.706 0.500 -0.206 4 4
    7 0 Brian Giles E5/G 0.500 0.443 -0.057 4 4
    7 0 Adrian Gonzalez 3/SAC/BG.1-2 0.443 0.466 0.023 4 4
    7 1 Mike Piazza IW 0.466 0.441 -0.025 4 4
    7 1 Josh Bard 54(1)3/GDP 0.441 0.590 0.148 4 4
    7 0 Nomar Garciaparra 7/L 0.589 0.551 -0.039 4 4
    7 1 Jeff Kent S8/L 0.551 0.591 0.040 4 4
    7 1 J.D. Drew 6(1)/FO/G 0.591 0.541 -0.050 4 4
    7 2 Russell Martin 13/G 0.541 0.500 -0.041 4 4
    8 0 Mike Cameron 9/L 0.500 0.548 0.048 4 4
    8 1 Geoff Blum W 0.548 0.499 -0.049 4 4
    8 1 Josh Barfield D9/L.1-H;B-3(TH) 0.499 0.197 -0.301 4 5
    8 1 Todd Walker S8/L.3-H 0.197 0.137 -0.061 4 6
    8 1 Dave Roberts K+SB2 0.137 0.146 0.009 4 6
    8 2 Brian Giles WP.2-3 0.146 0.143 -0.003 4 6
    8 2 Brian Giles 9/F 0.143 0.167 0.024 4 6
    8 0 Marlon Anderson T9/L 0.167 0.300 0.133 4 6
    8 0 Wilson Betemit S8/G.3-H 0.300 0.405 0.105 5 6
    8 0 Olmedo Saenz K 0.405 0.316 -0.089 5 6
    8 1 Rafael Furcal 7/F 0.316 0.234 -0.082 5 6
    8 2 Kenny Lofton D9/L.1-3 0.234 0.334 0.100 5 6
    8 2 Nomar Garciaparra K 0.334 0.166 -0.168 5 6
    9 0 Adrian Gonzalez S7/L 0.166 0.142 -0.024 5 6
    9 0 Manny Alexander 14/SAC/BG.1-2 0.142 0.150 0.008 5 6
    9 1 Josh Bard D8/F.2-3 0.150 0.104 -0.046 5 6
    9 1 Mike Cameron IW 0.104 0.103 -0.001 5 6
    9 1 Geoff Blum WP.3-H;2-3;1-2 0.103 0.047 -0.055 5 7
    9 1 Geoff Blum 8/SF.3-H;2-3 0.047 0.036 -0.011 5 8
    9 2 Josh Barfield S9/L.3-H 0.036 0.018 -0.018 5 9
    9 2 Jack Cust 3/G 0.018 0.019 0.002 5 9
    9 0 Jeff Kent HR/8/F 0.019 0.043 0.023 6 9
    9 0 J.D. Drew HR/9/F 0.043 0.094 0.051 7 9
    9 0 Russell Martin HR/7/F 0.094 0.206 0.112 8 9
    9 0 Marlon Anderson HR/9/F 0.206 0.642 0.436 9 9
    9 0 Julio Lugo 8/F 0.642 0.583 -0.059 9 9
    9 1 Andre Ethier 6/P 0.583 0.536 -0.048 9 9
    9 2 Rafael Furcal 9/F 0.536 0.500 -0.036 9 9
    10 0 Dave Roberts 8/L 0.500 0.560 0.060 9 9
    10 1 Brian Giles D7/L 0.560 0.442 -0.118 9 9
    10 1 Adrian Gonzalez IW 0.442 0.421 -0.021 9 9
    10 1 Paul McAnulty 8/F 0.421 0.523 0.102 9 9
    10 2 Josh Bard S9/L.2-H;1-3 0.523 0.167 -0.356 9 10
    10 2 Mike Cameron W.1-2 0.167 0.155 -0.012 9 10
    10 2 Geoff Blum 9/F 0.155 0.206 0.051 9 10
    10 0 Kenny Lofton W 0.206 0.338 0.132 9 10
    10 0 Nomar Garciaparra HR/7/F.1-H 0.338 1.000 0.662 11 10

    In the aftermath of that game fellow Baseball Prospectus author Will Carroll and I engaged in a dialogue on the merits and usefulness of measures such as Win Probability Added (WPA) and Win Expectancy Added (WXA). And so for your reading pleasure think of this as a primer on the subject as Will raises legitimate criticisms and throws me a few bones along the way...


    [WCarroll] Ok, this WPA thing is beyond me, Dan. I realize that I'm the guy in the group that can't do the complex math, but this reeks of the type of things that statheads get hated for. On the one hand, it reduces "clutch" to mathematical terms, which seems counter to most orthodox analysis, and on the other, it makes timing more important than skill. If we look at the amazing Dodgers game this week, Marlon Anderson comes out as the hero. I realize he went five for five with a pair of homers, but why was his homer - the fourth in sequence - any more important than the first one? Jeff Kent's homer came four runs down and started this thing, but he gets almost no credit. And what about the fact that two of the homeruns came off of a superior pitcher in Trevor Hoffman? How does that work?

    [DFox] Oh Will. Statheads love to get hated for stuff like this and so I doubt they’ll lose much sleep over it. But seriously, all of the objections you cite are perfectly legitimate. First, though, let’s keep in mind what Win Probability Added (or Win Expectancy Added, which is slightly different although those differences are not important at the moment) is trying to do. At its core it is a technique that’s been around for more than 30 years that simply attempts to quantify how far a player’s actions in a particular game push his team toward a win or a loss. The assignment of those probabilities, and this is the core of all the objections, is based on a matrix that indicates just how probable it is that a team will win given a series of specific game states taking into account outs, inning, score and so on. By crediting (or debiting) the player for a change in the game state we assign him a certain amount of WPA or WXA (granted, the simplistic way that most folks do this today is to assign all the credit to the pitcher and batter while leaving out the rest of the defense entirely).

    Now, because the probabilities used take into account the inning and score primarily, it will always be the case that an event that ties the game or puts the team ahead will have a much larger magnitude than the same event that occurs in a different context. That’s just the nature of the technique.

    [WCarroll] So you're admitting this is flawed? The idea that the timing of a play has as much to do with the sequence is flawed to me. Now, while the likelihood of back-to-back-to-back-to-back homers occurring is so low as to be near zero and not worth calculating, it still seems to me that the sequence of events is ignored here. Anderson's homer has reduced value without Kent's or Martin's, and not accounting for that seems to call the technique into question.

    [DFox] In the case of Kent versus Anderson, Kent’s homerun came at a time when the Dodgers had just a 1.9% chance of winning, being down by four runs in the bottom of the ninth as they were. Although he hit the homerun to make it 9-6, the Dodgers still had just a 4.3% chance of winning and therefore we credit Kent with a 2.3% change, or .023 in WXA terms. Anderson, on the other hand, hit his second homerun when the Dodgers had a 20.6% chance of winning (the two intervening homers raising the odds by just over 5 and 11 percent respectively) and pushed them to 64.2%, thereby assigning him a WXA of .436. Of course, Anderson’s homerun wasn’t more important in the big sense of contributing to the win, but it was the event that pushed the Dodgers over the top in terms of their odds of winning the game.
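
    The arithmetic behind those credits is simply the Dodgers' win expectancy after the play minus their win expectancy before it. A minimal Python sketch, using the Start and End values from the table above for the four ninth-inning homeruns (the variable and function names are made up for illustration):

```python
# Start/End win expectancy for the Dodgers, copied from the play-by-play table.
ninth_inning_homers = [
    ("Jeff Kent", 0.019, 0.043),
    ("J.D. Drew", 0.043, 0.094),
    ("Russell Martin", 0.094, 0.206),
    ("Marlon Anderson", 0.206, 0.642),
]


def wxa(start, end):
    """Credit for one play: win expectancy after minus win expectancy before."""
    return end - start


# Round to three places to match the precision of the table.
credits = {
    batter: round(wxa(start, end), 3)
    for batter, start, end in ninth_inning_homers
}

for batter, credit in credits.items():
    print(f"{batter:16s} {credit:+.3f}")

# The four credits telescope to the total swing for the sequence: 0.642 - 0.019.
total_swing = round(sum(credits.values()), 3)
print(f"{'Total':16s} {total_swing:+.3f}")
```

    Because the table's Start and End values are already rounded, Kent's credit comes out as .024 here rather than the .023 shown in the Diff column. Either way, Anderson's single swing accounts for roughly 70% of the .623 the Dodgers gained on the four homers.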

    [WCarroll] I can see where the math is going, but Kent's home run is so necessary to the process that it seems he should get more credit than just making a four run game into a three run game with a swing of the bat. The event driven model doesn't account well for the actual nature of the game. Did J.D. Drew’s home run cause the pitching change and if so, where is that factored in?

    [DFox] I would certainly agree with you that Kent’s homerun was absolutely necessary to the process. However, it can’t really be argued that immediately after the homerun the Dodgers had a greatly improved chance of winning. They didn’t. They still were down three runs in the bottom of the ninth with no runners on base and the technique credits him appropriately. But this, I think, gets to the heart of one of the objections you raised in your initial question regarding two of the homers coming against a tougher pitcher than the other two. The way in which WPA and WXA are calculated does take into account a good portion of the context in which the play was made (and WXA takes in more by including the run environment in a theoretical framework) – but not all of it. While you could conceivably adjust the probability of winning for each batter/pitcher matchup along with a host of other variables including defensive personnel and positioning, weather, tendencies of the manager, and what the batter had for breakfast, the ability to use the technique would drop sharply. In the end these methods provide a model of the game and like all models are imperfect. It’s a bit of a balancing act between greater precision on the one hand and usability on the other.

    As to the question of the pitching change, anyone watching the game could see that the Drew homerun on the back of the Kent dinger “caused” the pitching change. But again, that is not factored into the equation since the models most folks use don’t take into account differences in batter/pitcher matchups or the relative strength of the respective team bullpens. For example, although the Dodgers had a 9.4% chance of winning after Drew’s homerun, one could argue that with Hoffman still available in the pen their chances were actually smaller than that.

    [WCarroll] Everyone's looking for a quantification of clutch and in this analysis, I'm not convinced that the methodology does anything more than make nice graphs and flawed conclusions.

    [DFox] But it does allow us to make pretty graphs and that should count for something shouldn’t it? :)

    Although originally the Mills Brothers who pioneered this concept (albeit in a slightly different form) in a 1970 book titled Player Win Averages: A Computer Guide to Winning Baseball Players had intended for it to be used to measure clutch ability, this doesn’t do it for the simple reason that it doesn’t correct for the quality, or leverage, of a player’s opportunities. Anderson and Kent did not have the same level of opportunity to accumulate WXA in this game. I think of it more as measuring the total contribution of a player towards winning and losing given the situations they were placed in and so your caution against flawed conclusions is well taken. Of course the more games you aggregate the more the quality of the opportunities tends to even out leveling the playing field (for most hitters anyway, for relief pitchers and especially setup men and closers it’s a different story).

    Part of the confusion I think comes from our terms. I’ll admit to having written in the past that we can use WX to quantify "clutch performances" but that is a different thing than “clutch ability”. The former is an acknowledgement that an individual play like Anderson’s second homerun was a very important play and WX can be used to get a feel for how improbable those "clutch" performances are. The latter, however, is about the inherent ability of a player to perform above his normal level when the game is on the line.

    To measure clutch ability, what analysts have done, as you know, is look for differences in performance in situations termed clutch and non-clutch and see if those differences persist across seasons or careers. What they’ve found in analyses like that in The Book is that there may indeed be a small clutch ability but that ability is basically drowned out by the normal variability inherent in the game. In other words, it may be there but it doesn’t matter much. Now, if it had been shown that there is a wide variance in ability between players in clutch situations then WPA and WXA would be much more useful in measuring that ability, all other things (like opportunities given larger sample sizes) being equal. But since that is not the case WXA is obviously not going to be able to capture it.

    [WCarroll] Sure. I'm not going to disagree with the theory and I'm certainly not going to critique the math, but what this amounts to is a grand equation that seems to be Game Winning RBI. Back to back to back to back is improbable, sure, but let's look at the situation. What if the hitter before Anderson gets out? Hits a double? How does that change things from this standpoint? In one case, they win in a different, slightly less unique way (or tie, that is) and in another, Anderson is penalized for something he has absolutely no control over.

    Forget the red herring of clutch. What we have here is a measure of timing, of coincidence, and have disconnected talent from the discussion. Anderson is not a better player than any of the other three. He's been on a hot streak since coming west, but few would argue that he may have been the weakest player on Little's lineup card. For one night, he hits well -- 5 for 5 is nice -- and hits in an interesting way, but he's still the weakest player in the lineup. He just has a better story to tell his grandchildren some day.

    [DFox] I think your intuition about WPA or WXA being a fancy form of GWRBI makes for a good analogy. I’m certainly not saying that using WXA for a single game allows you to make a case that Marlon Anderson or Neifi Perez for that matter are actually better players than J.D. Drew or Brandon Inge. And you’re completely on track that if the hitter before Anderson makes an out, or does anything but hit a homerun, it changes the potential WXA for Anderson’s plate appearance. And yes, Anderson has no control over it and as mentioned previously the performance analysis community in general doesn’t think that Anderson has much control over how he performs (relative to his true talent level) when in the situation dictated by the previous hitters in the inning and in the game.

    In the end WXA is a measure of timing and coincidence and when taken in very small doses (one very exciting game for example), it is largely disconnected from talent. That’s why when taken over the course of a season or career, if you rank players by their WXA Drew will still beat Anderson and Inge will still beat Perez (actually, almost everyone who’s ever played will beat Perez). That said, I believe it’s still an interesting analytical tool, not for projecting or evaluating talent, but for looking at the flow of individual games and crediting players over the long haul. For example, I wouldn’t be worried about using seasonal WXA as a tool for input into the discussion of MVPs, Rookie of the Year, or Cy Young awards. In fact, to me, WXA would provide a good mix of who’s the best player and who was the most valuable (leaving questions of team quality aside since WXA doesn’t account for that).

    Wednesday, January 24, 2007

    SABR and the Humidor

    For those in the Denver area there will be a meeting of the Rocky Mountain SABR chapter on Saturday February 10th from 9:30AM until Noon at the Breckenridge Brewery - 220 Blake Street in downtown Denver near Coors Field. This will be the annual "Hot Stove" meeting but the topic of discussion will be the humidor with speakers Walter Sylvester, an Assistant in the Baseball Operations department for the Rockies and Dave Dresen presenting on the topic of "Physics of Baseball at Altitude". Should be a great time and so if you're interested in perhaps joining SABR feel free to come by.

    If you're interested in the topic of the humidor you might enjoy these...

  • Schrodinger's Bat: Swing and Miss

  • Schrodinger's Bat: More Humidity

  • Of Humidors and Humidity

  • Coors Field Fun Facts

    Saturday, January 20, 2007

    The Power of Squares

    Nice article by Dave Studeman over at Baseball Analysts on Pythagoras, run estimation and Bill James. I especially liked the following:

    "The power of two is everywhere in life. E=MC squared, after all. When you move closer to a light, cutting the distance in half, the light doesn't become twice as bright...So when Bill James discovered that the nature of runs to winning is squared, it seemed as though something essential and fundamental had been discovered."

    Another example of this phenomenon is the inverse-square law of gravitation which Newton published in his Principia but which was first hinted at by Ismaël Bullialdus and known (or guessed at) in some form to the likes of Christopher Wren, Edmond Halley, and Robert Hooke, as told in James Gleick's wonderful biography of Isaac Newton titled Isaac Newton.

    For more thoughts on run estimation see:

    Run Estimation for the Masses
    A Closer Look at Run Estimation
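
    The "squared" relationship Studeman refers to is usually written as Bill James's Pythagorean expectation: winning percentage is approximately runs scored squared divided by the sum of runs scored squared and runs allowed squared. A quick sketch in Python follows; the function name and the 800/700 run totals are invented for illustration and do not describe any actual team.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Estimate winning percentage from runs scored and runs allowed.

    With exponent=2 this is the classic "squared" form; other exponents
    can be passed in to experiment with variations on the idea.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical team: 800 runs scored and 700 allowed over a 162-game season.
pct = pythagorean_win_pct(800, 700)
print(f"Expected winning percentage: {pct:.3f}")
print(f"Expected wins in 162 games: {pct * 162:.1f}")
```

    Note that a team scoring exactly as many runs as it allows comes out at .500 regardless of the exponent, which is a handy sanity check on any variation of the formula.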

    Thursday, January 18, 2007

    Myths and Excellence

    My column this week on Baseball Prospectus, published this morning, is titled "The Myth of the Golden Age" and explores the reasons for, and the evidence of, an increasing level of play over time. In addition to reviewing the arguments the late Stephen Jay Gould used in his 1996 book Full House: The Spread of Excellence from Plato to Darwin, I take a quick look at how the hitting of pitchers relative to position players (inspired by the comments of fellow SABR member Stew Thornley) has changed over the course of time and how it is arguably a demonstration of increasing excellence under an evolutionary model.

    One of the interesting aspects of that discussion involves how the slope of the relative OPS of pitchers (defined very loosely as players who appear in more than one game as a pitcher in a given year) seems to have changed after World War II. The following two graphs illustrate this change using linear trend lines.





    You'll notice that the slope of the line in the first graph is over twice that of the second. This supports nicely the research by Gould on decreasing variation in batting average and Nate Silver's research in Baseball Between the Numbers that shows the game stabilizing after 1940.

    Monday, January 15, 2007

    Umpire Stats

    Also was alerted to this interesting analysis of umpires based on data from BP. This analysis is just for 2006.

    In and of itself this doesn't tell us much since you would expect there to be some spread here due to randomness. Nor is it surprising that, for example, some of the same umpires show up in the high SO/9 quadrant as well as the high percentage of called strikes quadrant since the two are clearly related. It would be interesting to see whether there is any trend that holds over from year to year (for example for Randy Marsh who has a very low % of called balls and a high % of called strikes) and then to quantify the effect if any. The author notes that he is looking into this so stay tuned.

    And while the two measures that are shown are clearly related it could be that some umps are more hesitant to ring batters up or make a ball four call and so that could explain why an umpire appears to be a "hitter's ump" in one graph and a "pitcher's ump" in another (again Randy Marsh is an example). Interesting stuff.

    Teams and Leagues on the Bases

    This week on BP in my column I focused on team baserunning in 2006 where the Angels led the pack with 13.64 theoretical runs picked up on the bases aggregated from EqAAR, EqGAR, EqHAR, and EqSBR. On the opposite end of the spectrum the White Sox were at -21.77 runs and did especially poorly in advancing on hits where almost everybody (save Scott Podsednik and Pablo Ozuna) did terribly.

    I also took a look at the differences between the leagues and it appears the NL is indeed the better baserunning league even after controlling for the poor baserunning of both designated hitters in the AL and pitchers in the NL. This result tends to negate the idea that the AL outfields are simply better at depressing runner advancement since NL runners also do better in advancing on the ground.

    Thursday, January 11, 2007

    The Hook Part II

    In a recent post I provided a list of how frequently teams enjoyed the platoon advantage on defense when changing pitchers. Tangotiger asks...

    "Your list for pitchers is 62%. Of course, when a closer is brought in, the manager is not looking at the platoon advantage. As well, in blowouts, the manager is not looking for platoon advantages.

    Can you break up your list based on whether the pitcher is the "ace" or not, and whether the score is within 4 runs or not?"
    Yes.

    Below is the same table but this time only when the team's closer (defined as the player who had the most saves in 2006) is not the pitcher being brought in and when the run differential is 3 or fewer runs.


    Team Changes Adv Pct
    SEA 259 199 76.8%
    DET 219 167 76.3%
    CHA 217 163 75.1%
    SLN 231 166 71.9%
    CLE 202 144 71.3%
    PIT 332 235 70.8%
    TOR 290 204 70.3%
    BAL 260 182 70.0%
    NYA 264 184 69.7%
    CHN 290 200 69.0%
    COL 296 201 67.9%
    CIN 256 173 67.6%
    KCA 275 183 66.5%
    MIN 217 144 66.4%
    SFN 274 181 66.1%
    HOU 274 180 65.7%
    ATL 317 208 65.6%
    MIL 255 167 65.5%
    TEX 253 164 64.8%
    PHI 288 185 64.2%
    ANA 186 119 64.0%
    OAK 247 156 63.2%
    SDN 295 183 62.0%
    WAS 280 171 61.1%
    FLO 230 139 60.4%
    BOS 245 147 60.0%
    NYN 246 145 58.9%
    ARI 286 164 57.3%
    TBA 291 166 57.0%
    LAN 237 130 54.9%

    Total 7812 5150 65.9%


    As you can see it doesn't change the numbers that greatly and lifts the aggregate from 62% to 66%. As Tangotiger points out, the offense takes the platoon advantage 78% of the time when pinch hitting. That disparity is to be expected since the pitcher must actually pitch to one hitter and because these numbers include instances where the offensive team then used a pinch hitter, which in many cases they do for the express purpose of gaining the advantage. If pinch hitting cases are excluded the overall percentage goes up to 69% with Seattle climbing to 82.3% and the Dodgers still at the bottom at 58.8%.
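
    The Pct column in the team table is just the number of pitching changes that produced the platoon advantage divided by the total number of changes. A small Python sketch that recomputes it from the Changes and Adv columns (the dictionary and function names are mine, chosen only for illustration):

```python
# (pitching changes, changes with the platoon advantage), copied from the table.
platoon = {
    "SEA": (259, 199), "DET": (219, 167), "CHA": (217, 163), "SLN": (231, 166),
    "CLE": (202, 144), "PIT": (332, 235), "TOR": (290, 204), "BAL": (260, 182),
    "NYA": (264, 184), "CHN": (290, 200), "COL": (296, 201), "CIN": (256, 173),
    "KCA": (275, 183), "MIN": (217, 144), "SFN": (274, 181), "HOU": (274, 180),
    "ATL": (317, 208), "MIL": (255, 167), "TEX": (253, 164), "PHI": (288, 185),
    "ANA": (186, 119), "OAK": (247, 156), "SDN": (295, 183), "WAS": (280, 171),
    "FLO": (230, 139), "BOS": (245, 147), "NYN": (246, 145), "ARI": (286, 164),
    "TBA": (291, 166), "LAN": (237, 130),
}


def advantage_pct(changes, adv):
    """Percentage of pitching changes that gained the platoon advantage."""
    return 100.0 * adv / changes


# League totals are sums of the raw counts, not an average of team percentages.
total_changes = sum(changes for changes, _ in platoon.values())
total_adv = sum(adv for _, adv in platoon.values())

# Sort teams from most to least frequent use of the platoon advantage.
ranked = sorted(platoon.items(), key=lambda item: advantage_pct(*item[1]), reverse=True)
for team, (changes, adv) in ranked:
    print(f"{team} {changes:4d} {adv:4d} {advantage_pct(changes, adv):5.1f}%")

print(f"Total {total_changes} {total_adv} {advantage_pct(total_changes, total_adv):.1f}%")
```

    Summing the raw counts before dividing matters because teams made very different numbers of changes (186 for the Angels versus 332 for the Pirates), so a simple average of the 30 team percentages would weight them all equally.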

    When the game is tighter managers are also more likely to try and gain the platoon advantage (non closer).


    Tied 65.5%
    1 Run 67.0%
    2 Run 65.9%
    3 Run 64.6%
    4 Run 64.0%
    5 Run 62.1%
    6 Run 55.3%
    >6 53.3%


    Perhaps counterintuitively, managers do not seem to display the same tendency based on the inning of the pitching change (given 3 runs or fewer differential and non-closer).


    Inning 4 69.4%
    Inning 5 69.0%
    Inning 6 69.4%
    Inning 7 67.9%
    Inning 8 66.6%
    Inning 9 58.4%
    Inning 10+ 58.1%


    The reason for this is probably due in large part to the fact that the manager has only so many pitchers at his disposal and so his first reliever (likely used in the 6th or 7th innings) will be the most likely to enjoy the advantage. The manager will then be forced to use the pitchers that remain as the game goes later and so can't be so choosy about the matchup he'll get.

    Tuesday, January 09, 2007

    Ripken and Gwynn

    Well, the votes have been cast and both Cal Ripken Jr. and Tony Gwynn enter the Hall of Fame (not unanimously as revealed yesterday). And as expected Mark McGwire received less than a third of the vote based on the "wait and see" approach as applied to him specifically, which I applaud. Steve Garvey was in his final year of eligibility and all those below 5% will be removed from the ballot as well.

    Hard to believe someone actually cast votes for Jose Canseco, Ken Caminiti, Dante Bichette et al. but then again...

    Jim Rice (64.8% in 2006), Andre Dawson (61%), and Bert Blyleven (53.3%) all fell in the voting, although Goose Gossage (64.6% in 2006) has gained strength, apparently on the basis of last year's induction of Bruce Sutter.

    For me, of those on this list and not elected I would like to see Blyleven, Gossage, and McGwire (if nothing further develops in a few years) in the Hall but the HOF is not usually something I get too fired up about either way.


    2007 BBWAA Hall of Fame Voting Results
    Name Votes % of Votes
    Cal Ripken Jr. 537 98.5
    Tony Gwynn 532 97.6
    Rich Gossage 388 71.2
    Jim Rice 346 63.5
    Andre Dawson 309 56.7
    Bert Blyleven 260 47.7
    Lee Smith 217 39.8
    Jack Morris 202 37.1
    Mark McGwire 128 23.5
    Tommy John 125 22.9
    Steve Garvey 115 21.1
    D.Concepcion 74 13.6
    Alan Trammell 73 13.4
    Dave Parker 62 11.4
    Don Mattingly 54 9.9
    Dale Murphy 50 9.2
    Harold Baines 29 5.3
    Orel Hershiser 24 4.4
    Albert Belle 19 3.5
    Paul O'Neill 12 2.2
    Bret Saberhagen 7 1.3
    Jose Canseco 6 1.1
    Tony Fernandez 4 0.7
    Dante Bichette 3 0.6
    Eric Davis 3 0.6
    Bobby Bonilla 2 0.4
    Ken Caminiti 2 0.4
    Jay Buhner 1 0.2
    Scott Brosius 0 0.0
    Wally Joyner 0 0.0
    Devon White 0 0.0
    Bobby Witt 0 0.0
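
    For anyone who wants to check the arithmetic behind the table, here is a minimal sketch in Python. The number of ballots is not stated above; it is inferred here from Ripken's line (537 votes at 98.5%), so treat it as an assumption. The 75% election threshold is the usual BBWAA rule rather than something given in this post, and the sketch ignores the separate final-year-of-eligibility case that applies to Garvey. Only a handful of rows are copied in to keep it short.

```python
# A few rows copied from the table above: (name, votes, listed percentage).
RESULTS = [
    ("Cal Ripken Jr.", 537, 98.5),
    ("Tony Gwynn", 532, 97.6),
    ("Rich Gossage", 388, 71.2),
    ("Mark McGwire", 128, 23.5),
    ("Harold Baines", 29, 5.3),
    ("Orel Hershiser", 24, 4.4),
    ("Jay Buhner", 1, 0.2),
]

# Assumption: total ballots inferred from Ripken's row, not stated in the post.
BALLOTS = round(537 / 0.985)

ELECTION_THRESHOLD = 75.0   # assumption: standard BBWAA rule, not from the post
RETENTION_THRESHOLD = 5.0   # from the post: below 5% means off the ballot


def pct(votes, ballots=BALLOTS):
    """Share of ballots naming the player, rounded to one decimal as in the table."""
    return round(100.0 * votes / ballots, 1)


def status(votes):
    """Classify a vote total as elected, dropped, or staying on the ballot."""
    share = pct(votes)
    if share >= ELECTION_THRESHOLD:
        return "elected"
    if share < RETENTION_THRESHOLD:
        return "dropped"
    return "stays on ballot"


if __name__ == "__main__":
    for name, votes, listed in RESULTS:
        print(f"{name:16s} {votes:4d} {pct(votes):5.1f} (listed {listed}) {status(votes)}")
```

    The recomputed percentages can be compared against the listed column as a sanity check on the inferred ballot count.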

    Monday, January 08, 2007

    The Future of Data Collection

    In my post on the Year in Review I noted that I'm looking forward to my third season as a stats stringer for MLB.com. To that piece of info Tangotiger asked whether the stringers would be using stopwatches to record data items like hang time in order to more accurately measure batted balls for purposes of defensive evaluation.

    Before I had a chance to ask, Tango took matters into his own hands and had an interesting email conversation with the Director of Stats at MLBAM.

    One tidbit here, as many have guessed, is that there will likely eventually be a subscription or premium service to get access to this data in a more useable format. In addition, in relation to my column last week on camera angles he had this to say regarding the Enhanced Gameday system used in the 2006 postseason and which I wrote about here.

    "As an aside, what’s been amazing to me about this program is what we’ve learned from the data we captured last season. That is, we found out that what we thought we understood about pitch movement has been, for lack of a better word, wrong. Think about how most fans observe pitches: on TV, through the center field camera. However, think about the challenges of accurately judging the pitch this way: you’re trying to follow a 4-inch wide ball from a distance of 400 or more feet, scaled down onto a 27-inch TV screen or 17-inch computer monitor, or whatever your viewing screen might be. And don’t forget that the camera is offset from center by an unknown amount that varies in each ballpark. This creates massive scaling errors in the human mind… for instance, we discovered that in many cases, a pitch that looks like it just missed the black may actually have been 8 to 10 inches outside."

    This is fundamentally the reason why other camera angles or even enhanced computer images like Gameday would be wonderful to have. While the centerfield angle may give us the most information about the pitch in real time, that information is not very accurate.

    Pitching Change Platoon Advantages

    As some readers may be aware the Cubs had a really really bad season on their way to losing 96 games. They gave up 834 runs, good for second worst in the NL in no small part as a result of 60 starts being given to pitchers who had never seen big league time before. By comparison Houston gave 44 starts to newbies, Florida 40, and Tampa Bay 34 while Cincinnati had none at all.

    While that may bode well for the future, as the Cubs will now have some experienced pitchers at the AAA level to draw from (Sean Marshall, pictured on the left, and Carlos Marmol particularly), in 2006 it resulted in Dusty Baker making 542 pitching changes, easily outpacing the previous record set by the Giants in 2004.

    All of those pitching changes got me to wondering how frequently a manager tries to maintain the platoon advantage when making a pitching change. In other words, while The Bill James Handbook publishes the numbers for how often a manager maintains the platoon advantage when making out his lineups, I've never seen the numbers for pitching changes. I wrote and ran a simple script to count each pitching change and determine when the defense had the platoon advantage. The results are shown in the table below and as you can see the percentage varies from the mid 50s to the low 70s. Obviously, these numbers are heavily influenced by roster construction and the effectiveness of the pitchers the manager has to work with at any given time. It's not surprising to me to see the Cardinals near the top although I would have expected the Rockies to be up there as well as Clint Hurdle seems to like using LOOGYs even when they are manifestly ineffective (Ray King).





    Team Changes PlatoonAdv Pct
    SEA 429 309 72.0%
    CHA 398 282 70.9%
    DET 390 261 66.9%
    CIN 475 317 66.7%
    SLN 468 312 66.7%
    CHN 542 360 66.4%
    NYA 488 314 64.3%
    TEX 489 313 64.0%
    KCA 473 302 63.8%
    BAL 472 300 63.6%
    OAK 444 282 63.5%
    MIL 427 271 63.5%
    PIT 504 319 63.3%
    MIN 421 265 62.9%
    HOU 497 312 62.8%
    SFN 438 271 61.9%
    ANA 380 235 61.8%
    CLE 376 232 61.7%
    ATL 522 321 61.5%
    TOR 481 292 60.7%
    COL 498 302 60.6%
    WAS 515 306 59.4%
    SDN 475 276 58.1%
    BOS 454 262 57.7%
    TBA 444 256 57.7%
    FLO 435 249 57.2%
    PHI 500 286 57.2%
    ARI 461 256 55.5%
    LAN 454 252 55.5%
    NYN 473 252 53.3%
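
    The counting itself is simple. What follows is not the script I ran but a minimal sketch of the idea in Python, under some stated assumptions: each pitching change is reduced to a hypothetical `PitchingChange` record (the team making the change, the new pitcher's throwing hand, and the side the first batter he faces hits from), the defense is credited with the advantage when the hands match, and a switch hitter (coded "B") is assumed to bat opposite the pitcher and so never gives the defense the advantage. The record layout and the function names are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PitchingChange:
    team: str            # team making the pitching change (hypothetical field)
    pitcher_throws: str  # "L" or "R"
    batter_bats: str     # "L", "R", or "B" for a switch hitter


def defense_has_advantage(change):
    """True when the new pitcher throws from the side the batter hits from."""
    if change.batter_bats == "B":
        # Assumption: a switch hitter turns around to face the pitcher.
        return False
    return change.pitcher_throws == change.batter_bats


def platoon_pct(advantages, changes):
    """Percentage of changes with the advantage, to one decimal as in the table."""
    return round(100.0 * advantages / changes, 1)


def platoon_table(changes):
    """Return (team, changes, platoon_adv, pct) rows sorted by pct, highest first."""
    totals = defaultdict(lambda: [0, 0])  # team -> [changes, advantages]
    for change in changes:
        totals[change.team][0] += 1
        if defense_has_advantage(change):
            totals[change.team][1] += 1
    rows = [(team, n, adv, platoon_pct(adv, n)) for team, (n, adv) in totals.items()]
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows
```

    Feeding it one record per pitching change yields rows in the same shape as the table above; for instance `platoon_pct(360, 542)` reproduces the Cubs' 66.4.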

    Thursday, January 04, 2007

    Wish List 2007

    My column today was a wish list for 2007 and beyond that includes some of my pet peeves like the rigid television broadcasting of the game, doubleheaders, interleague play, "this time it counts", "small-ball wins in the post-season", etc.

    For some of the research on baseball and television I used Peter Morris' A Game of Inches: The Stories Behind the Innovations That Shaped Baseball Volume 2: The Game Behind the Scenes, published in 2006. I'll have to admit I wasn't aware of the first volume either until I saw this one while perusing a Barnes & Noble over Christmas break. Any baseball fan will want to add both volumes to their library as they contain loads of interesting tidbits on everything from the evolution (not invention) of the pitcher's mound to the first recorded wave on October 15, 1866.

    MLB 2K6


    I think I mentioned in a blog post earlier this year that I had purchased an XBox 360 and bought a copy of MLB 2K6 as soon as it was available (you can get it now for $19.95). I've now played two full seasons in Franchise mode (using the default settings) as the Cubs and wanted to share the results.

    In season one (2006) I finished second in the NL Central behind the Cardinals. Although my team won 93 games the Cardinals won 105 and I was never really in the race. I was hampered by injuries to Todd Walker (60 games) and Ronny Cedeno and underperformance by Derrek Lee and Aramis Ramirez. I was able to flip Greg Maddux and a couple throw-ins for Ben Sheets and J.J. Hardy although Hardy ended up in the minors. I also upgraded by acquiring Wily Mo Pena to play left field. Carlos Zambrano was the brightest spot as he won the Cy Young award.

    For the first two-thirds of the season I simulated the games and only set the typical lineups, rotation, and playing time and then let the computer manage each game. I did notice that once I started managing the games myself the record improved dramatically, but it was too little too late. The computer manager makes poor decisions frequently. When managing the games, however, there is apparently no way to warm up a pitcher when not on defense and so it makes pinch hitting more difficult on occasion. You also don't have the option of setting the defense in manager mode as you do when actually playing the game, nor of skipping to the end of the game when it gets out of hand. Another problem which manifested itself in both seasons was that although roster expansion on September 1st is a part of the game, the roster screen wouldn't allow me to bring up any minor leaguers and so I was stuck with 25 players through the end of the season. The trade deadline worked as expected and the options for finding and offering players are pretty decent.

    After the season is over you go through a five round draft period and free agent signings before beginning the next season. Players that you haven't extended before the end of the season go into the free agent pool. I've heard from others that there is a bug where the next season will start with a schedule containing only 10 games but I haven't seen that in either of my seasons. The free agent period lasts "10 days" and allows you to make offers and see how the market is progressing. What I've noticed in both trying to re-sign my own players and sign free agents is that when a player says he'll sign with your team for x dollars over y years he's not kidding. I've tried numerous times to offer slightly more money in exchange for fewer years (it seems almost everybody wants a 3-5 year contract) but to no avail. As a result I ended up signing a couple of free agents who would take 2-year contracts. I should have mentioned that before the season you're given a budget with which to work and for the Cubs that was around $75M for the 2006 season. If you reach various milestones you'll receive additional money the following season.


    There's also an interesting player morale system whereby each player has a rating that can be boosted by changing his batting order position or adjusting his playing time. Generally it seems that aside from these two variables the aggressiveness settings of the manager also play into how the player feels as well as whether he'll sign with your team as a free agent. You can call team meetings to try and affect the morale but I haven't messed around with this feature that much.


    One of the features I liked very much is the Inside Edge scouting reports. In Franchise mode you can use some of your budget to purchase reports for entire teams, hitters, or pitchers and of course they give you a slight advantage. I made sure to purchase them for my primary division rivals and of course it's interesting just to look through them since they're based on actual data. Those scouting reports translate into the live action mode as well: when pitching they suggest pitches and location, and when hitting they reveal likely zones for the upcoming pitch.

    When I started my second season things started to go haywire. I was able to trade Ben Sheets and Glendon Rusch for Derek Jeter to shore up my hole at shortstop and pick up Scott Kazmir but otherwise started the season with roughly the same roster as the year before. This time, however, the gaming engine seemed to allow my pitchers to dominate at the same time my hitting took off. I was also able to swing a deal for Jason Bay and Craig Miller at midseason by sending Jacque Jones and Michael Barrett to Pittsburgh. The end result of the pitching dominance was that my team went 120-42 with Zambrano and Mark Prior shutting down the rest of the league (throwing three no-hitters between them) and finishing 1-2 in the Cy Young balloting with Prior first this time. Zambrano went 33-4 with a 1.10 ERA and Prior 29-3, 1.05. Both pitched around 300 innings with Prior striking out 481 batters. The third and fourth starters, Jerome Williams and Scott Kazmir, both won 19 games with ERAs in the low 2s and struck out over 200 batters each. On the offensive side Derrek Lee hit 42 home runs and drove in 137 while hitting .327, and Ramirez hit 27 home runs, Pena 29, and Bay 24. That wasn't the strangest part, however. Juan Pierre, whom I tried to trade in the offseason, hit .342 with 78 walks and stole 172 bases in 198 attempts. As a result he won the MVP with Lee coming in second. Weird.

    In the playoffs I was ousted in the first round by the Mets, losing two extra-inning games, but did pick up $3M to work with in the following season.

    In perusing the other teams' statistics it's clear that the rest of the league must have hit something like .240 while my team hit .285. As I mentioned, I didn't mess around with any of the settings and wanted to see how the game played out of the box. It'll be interesting to see if the trend continues as I move into the '08 season or if I'll need to start adjusting the settings.

    I've not read too many kind things about the game in general (there was a freeze bug that has some workarounds and a patch) but I haven't really been displeased overall. The game did freeze on me initially but after replacing the entire Xbox unit I've not had it lock since. I'll play a live action game occasionally and the game play is decent with the pitching controls and catcher placement being especially well-done. The game does come with some historical teams that can be unlocked and so once I got the cheat codes I was excited to play the 1969 Mets and 1976 Reds. Much to my disappointment the rosters of those teams are populated with no-name players I assume because of licensing issues (the same reason Barry Bonds does not appear in the game). The game could also use more historical stadiums and I've had trouble trying to play in the World Baseball Classic mode which should allow you to play a team all the way through the tournament.

    In the final analysis, yes, the game is a little buggy, but I've certainly enjoyed it.