
Monday, February 28, 2005

Taking the Extra Base

One of the interesting things to look at when you have access to play by play data is how often baserunners advance on various plays. Probably the two most common scenarios that come up are the first to third and second to home scenarios when the batter singles. If you’ve read this blog previously you’ll know that these are two of the scenarios I use when calculating my baserunning framework.

In any case here are the leaders for 2004 in going from first to third on a single (more than 10 opportunities):

Jack Wilson PIT 21 13 .619
Larry Walker COL 26 16 .615
Jose Macias CHN 12 7 .583
Miguel Olivo CHA 14 8 .571
Alfonso Soriano TEX 21 12 .571
Chone Figgins ANA 29 16 .551
Laynce Nix TEX 20 11 .550
Robert Fick TBA 22 12 .545
Torii Hunter MIN 22 12 .545
Tony Womack SLN 35 19 .542

The league average is .272.

In contrast Tino Martinez and Bill Mueller were apparently dragging boat anchors behind them as they were 1 for 33 and 1 for 30 in these situations respectively.

In going from second to home the leaders were:

Luis Castillo FLO 16 15 .938
Rondell White DET 13 12 .923
Cesar Izturis LAN 20 18 .900
Dan Wright NYN 10 9 .900
David Wright NYN 10 9 .900
Coco Crisp CLE 17 15 .882
Hideki Matsui NYA 22 19 .863
Reed Johnson TOR 20 17 .850
Vladimir Guerrero ANA 19 16 .842
Dave Roberts BOS 12 10 .833

The league average here is .598. On the bottom of the pile Sammy Sosa was 1 for 10 and Randall Simon, Mike Piazza, and Shannon Stewart were 2 for 12.

Now of course in both of these lists there is no allowance made for where the ball is hit nor how many outs there were when these opportunities occurred. Both of these are factors, however, that come into play when calculating the baserunning framework.

Sunday, February 27, 2005

The Endless Summer

So I was driving across the Western Interior Seaway searching the AM band for some entertainment when I ran across a program called "Satellite Sisters". It was a talk show hosted by four or five real-life sisters that discussed women's issues. As I tuned in they were heatedly discussing the comments made by Harvard President Larry Summers several weeks back. In short, Summers postulated that part of the reason there are fewer tenured female professors in university science departments is that there are genetic differences between men and women that lead more men to pursue science as a career.

This caused quite an uproar among those who adopt the "blank slate" view of human nature, and it was obvious that the Sisters didn't appreciate Mr. Summers' ruminations and were goading their guests to take shots at him.

When this first occurred unca pointed me to this column by George Will, my favorite line of which is:

"He thought he was speaking in a place that encourages uncircumscribed intellectual explorations. He was not. He was on a university campus."

However, the incident is not merely the familiar story of group think and political correctness run amok. This morning I flipped on a Sunday morning interview show and was treated to pretty much the same analysis. Interestingly, one of the guests did leave the door open by admitting that there were physical differences between how men and women used their brains (for example the activity in women's brains tends to be more diffuse while in men's it is more concentrated). Directly after that admission the guest went on to pose the question that if there are physical differences, then "what can be done about it?"

What can be done about it?

Obviously, there is nothing that can be done about the physical differences. Men and women are what they are. The only action that people can take is to be clear in their thinking. It does no good to deny that group differences exist because it might lead to discrimination. Rather, we should acknowledge the differences in groups and yet understand that the generalizations cannot be applied to any single individual since there is a large overlap in group abilities.

Saturday, February 26, 2005

When to Steal?

Some recent discussion on a list of the SABR statistical analysis committee inspired me to take a quick look at how stolen base attempts are distributed throughout a game. The results for 2004...

Inning HalfInn PerInn SB2 CS2 PCT SB3 CS3 PCT SB4 CS4 PCT
1 4857 0.129 400 151 0.726 60 15 0.800 2 0 1.000
2 4858 0.074 200 118 0.629 26 11 0.703 1 5 0.167
3 4856 0.098 280 133 0.678 38 22 0.633 0 5 0.000
4 4857 0.081 243 101 0.706 27 16 0.628 2 4 0.333
5 4856 0.086 237 119 0.666 42 14 0.750 1 6 0.143
6 4853 0.078 221 106 0.676 38 10 0.792 0 3 0.000
7 4851 0.081 255 80 0.761 43 10 0.811 1 2 0.333
8 4850 0.070 219 79 0.735 26 13 0.667 1 3 0.250
9 3771 0.055 130 50 0.722 24 3 0.889 0 2 0.000
10+ 946 0.096 65 17 0.793 7 1 0.875 0 1 0.000

What first caught my attention was that there were more stolen base attempts per half inning (.129) in the first inning than in any other. As you might imagine, this is mostly the case because the leadoff hitter, who is typically more of a stolen base threat, is guaranteed to bat in the first inning. In fact, in all innings in which the leadoff batter hit first (9,308 half innings) there were .119 attempts per half inning.
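The per-inning figures in the table can be re-derived from the raw counts. What follows is only a minimal Python sketch, not the query actually used for this post; the function names are made up for illustration, and the first-inning counts are copied from the table above.

```python
def attempts_per_half_inning(half_innings, steals, caught):
    """Total stolen base attempts (all bases) divided by half innings played."""
    attempts = sum(steals) + sum(caught)
    return attempts / half_innings


def success_rate(sb, cs):
    """Stolen base percentage; returns None when there were no attempts."""
    attempts = sb + cs
    return sb / attempts if attempts else None


# First inning, 2004: steals and caught stealing of second, third, and home.
first_sb = (400, 60, 2)
first_cs = (151, 15, 0)

rate = attempts_per_half_inning(4857, first_sb, first_cs)
print(f"{rate:.3f}")                     # attempts per half inning
print(f"{success_rate(400, 151):.3f}")   # success rate stealing second
```

Returning None for the no-attempt case is a choice made for this sketch; the table itself shows 0.000 where a base was never stolen successfully.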

What most surprised me, however, is that there was not a general trend towards more attempts in later innings. One would think that later in the game teams would tend to gravitate towards strategies that produced single runs when those runs were more valuable. I then wrote a similar query that looked at the distribution of sacrifice hits that showed largely the same thing with the exception of extra innings where sacrifice hits are employed much more often.

Inning SH Per Inning
1 129 0.027
2 187 0.038
3 225 0.046
4 170 0.035
5 223 0.046
6 172 0.035
7 178 0.037
8 185 0.038
9 139 0.037
10+ 123 0.130

However, on second thought it occurred to me that the spread of the score would tend to increase during the game, which may offset this effect. To check this I ran one more query which showed...

Diff PA Att/PA
0 48766 0.024
1 43536 0.022
2 32214 0.020
3 21834 0.020
4 14750 0.019
5 9807 0.011
6 6573 0.007
7 3996 0.001
8 2711 0.001
9 1890 0.001
10 1071 0.001
11 672 0.000
12 292 0.000
13 162 0.000
14 82 0.000
15 92 0.000
16 48 0.000
17 2 0.000
18 1 0.000
19 4 0.000
20 5 0.000
21 23 0.000
22 8 0.000

Here you can see that as the spread of the score increases the number of stolen base attempts per plate appearance decreases. However, given that the stolen base is a tactical weapon that typically increases the odds of scoring a single run at the cost of the big inning, I'm surprised that the difference in attempts per plate appearance between a one-run game (.022) and a four-run game (.019) is so small (I also checked to see if the frequency was significantly different when, for example, a team was trailing by three runs as opposed to winning by three - there wasn't). Although not shown, the stolen base percentage from 0 to ±4 runs is roughly the same and then jumps to over 80% at ±5.

My conclusion is that this is another situation where major league managers don't really understand the value of stolen bases and therefore employ them too often, especially when down by two or more runs.

Tuesday, February 22, 2005


The catchy, albeit hyperbolic, acronym TINSTAAPP ("there is no such thing as a pitching prospect") is used by the folks at Baseball Prospectus to draw attention to the idea that drafting pitchers is a bit of a crapshoot. This is the case because of the higher probability of injury associated with pitchers - even pitchers drafted out of college. The implication is that organizations can only protect themselves by accumulating a lot of good arms and seeing which ones are not winnowed out. In other words, with young pitchers don't put all your eggs in one basket, and it is sometimes prudent to sacrifice quality for quantity.

This point was borne out by some research Jason Collette over at RotoJunkie did recently. Jason took a look at the Top 100 Prospect lists from Baseball America from 1990 to 2004 and came up with a list of 420 pitchers. His results...

"The overall numbers of the studies show that in the last 15 years of ranking minor league prospects, there have been 420 pitchers on those lists. Out of those 420 pitchers, only 296 have appeared in the majors, which computes to 70%. Therefore, 30% of the pitchers that appear on this list never make the major leagues. Out of the remaining 296 pitchers, only 103 of them have produced what I consider to be roster-worthy statistics; that’s only 25% of the original 420 pitchers [emphasis added]. If only one in four pitching prospects become roster-worthy material, it is important to do your homework on these guys before picking them for your futures roster. I’ve done the research for you, but please understand past performance doesn’t guarantee future performance."

Case in point: Kyle Snyder of the Royals, who is profiled on the Royals website, from which most of the information below was gleaned:


1999: The 6'8" pitcher is a 1st round draft choice from the University of North Carolina. Struck out 102 batters in 96 2/3 innings. Pitched seven games for class A Spokane

2000: Sore elbow after two starts followed by surgery to transpose the ulnar nerve in his right elbow. Then the ligament tore and Tommy John reconstructive surgery was required on Sept. 7.

2001: Out all season

2002: Pitched in 21 minor league games

2003: After a 3-0 start at AAA was promoted to the Royals. For the Royals, he made 15 starts, went 1-6 with a 5.17 ERA and went on the disabled list with a sore shoulder. Worse, he struck out only 39 batters in 85.3 innings. He had arthroscopic surgery

2004: Back in Spring Training but the shoulder immediately started hurting and on February 25th he underwent more extensive surgery by Dr. Craig Morgan of Wilmington, Del., noted for his repair work on Curt Schilling. Out the remainder of the season and was only up to 50 pitches by the end of October

2005: Reported to Spring Training at 27 years old

Monday, February 21, 2005

2003-2004 Baserunning

Using my baserunning framework I ran the numbers combining both 2003 and 2004. Here are the top 20 with 50 or more opportunities:

Opp Bases EB IB IBR OA
Vernon Wells 90 159 139.90 19.10 1.14 2
Jack Wilson 69 118 104.60 13.40 1.13 1
Larry Walker 75 132 117.28 14.72 1.13 0
Mike Cameron 81 131 117.43 13.57 1.12 1
Eric Hinske 84 143 128.26 14.74 1.11 0
Barry Larkin 62 114 102.41 11.59 1.11 0
Chone Figgins 74 121 108.72 12.28 1.11 3
Hee-Seop Choi 59 103 92.57 10.43 1.11 0
Jose Reyes 52 85 76.44 8.56 1.11 2
Scott Podsednik 101 162 145.93 16.07 1.11 1
Aaron Miles 51 82 73.87 8.13 1.11 1
Jeff DaVanon 62 99 89.60 9.40 1.10 0
Alfonso Soriano 78 122 110.70 11.30 1.10 1
Gary Matthews Jr. 66 110 100.02 9.98 1.10 1
Aaron Rowand 60 101 91.86 9.14 1.10 1
Ryan Freel 52 82 74.66 7.34 1.10 1
Miguel Cairo 56 89 81.19 7.81 1.10 0
Rafael Furcal 147 241 220.00 21.00 1.10 0
Carlos Beltran 123 201 183.50 17.50 1.10 1
Brian Roberts 130 207 189.03 17.97 1.10 2

A couple of observations:
  • Chone Figgins made the list while still being thrown out advancing three times in 2003-2004

  • I normally wouldn't have thought of Hee-Seop Choi as a good baserunner. With his 1.11 IBR maybe he's a smart baserunner or maybe he was just lucky

  • The average age of these 20 players is 26 years old

  • The bottom 10 in 2003-2004 were:

    Opp Bases EB IB IBR OA
    Dmitri Young 86 114 130.84 -16.84 0.87 3
    Matt Stairs 59 74 85.49 -11.49 0.87 1
    Damian Miller 52 69 80.02 -11.02 0.86 2
    Edgar Martinez 92 121 140.54 -19.54 0.86 1
    Rafael Palmeiro 104 129 150.27 -21.27 0.86 7
    Mike Piazza 56 74 86.34 -12.34 0.86 2
    Jason LaRue 63 83 97.37 -14.37 0.85 3
    John Olerud 102 129 151.91 -22.91 0.85 3
    Bill Mueller 111 143 170.80 -27.80 0.84 6
    Ben Grieve 55 69 83.20 -14.20 0.83 3

    The average age of these players is 32 years old.

    A quick graph of IBR by age shows a pretty steady downward trend over time, with spikes at ages 37 and 40 where the number of opportunities decreases.

    Sunday, February 20, 2005

    Standard Deviation Through the Years

    Just saw this nice set of graphs from fellow SABR member John Rickert that looks at the same question I addressed in my post Where have the .400 hitters gone?

    Per John these are based on "3.1 plate appearances per game (the modern batting title qualifying level). On each page the first plot is the "regulars" average relative to the league average (regular - league) and the second plot is the standard deviation of the regulars' numbers."

    Sad but True

    Wednesday, February 16, 2005

    2004 Baserunning Framework

    Using the same kind of approach as that taken by David Pinto when he calculates his Probabilistic Model of Range (PMR), I created a baserunning framework (Probabilistic Model of Baserunning?) last year that I explained here. Basically, this methodology uses standard advancement tables to calculate how many extra bases (which I called Incremental Bases or IB) a baserunner gained in the following scenarios:

    1) Runner on first batter singles
    2) Runner on first batter doubles
    3) Runner on second batter singles

    The advancement tables then take into account the number of outs, the handedness of the batter, and which fielder fielded the ball. I also calculate the ratio of actual bases to Expected Bases (EB) to create an Incremental Base Percentage (IBP).
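To make the bookkeeping concrete, here is a minimal Python sketch of the calculation described above. It is not the code behind the framework: the function names are hypothetical, and the advancement values in the lookup table are invented purely for illustration (the real tables also split by batter handedness and by the fielder who fielded the ball). The final check uses Jorge Cantu's 2004 totals of 36 bases against 29.70 expected.

```python
# HYPOTHETICAL advancement table: average bases gained by the runner, keyed by
# (scenario, outs). These numbers are invented and are NOT the real tables.
EXPECTED = {
    ("1st_single", 0): 1.25,
    ("1st_single", 2): 1.40,
    ("2nd_single", 1): 1.60,
}


def ib_and_ibp(bases, expected_bases):
    """Incremental Bases (actual - expected) and IBP (actual / expected)."""
    return bases - expected_bases, bases / expected_bases


def score_runner(plays):
    """plays: list of (scenario, outs, bases_actually_gained) for one runner."""
    bases = sum(gained for _, _, gained in plays)
    eb = sum(EXPECTED[(scenario, outs)] for scenario, outs, _ in plays)
    ib, ibp = ib_and_ibp(bases, eb)
    return bases, eb, ib, ibp


# Three made-up opportunities for one runner.
plays = [("1st_single", 0, 2), ("1st_single", 2, 1), ("2nd_single", 1, 2)]
bases, eb, ib, ibp = score_runner(plays)
print(bases, round(eb, 2), round(ib, 2), round(ibp, 2))

# Published 2004 totals for Jorge Cantu: 36 bases, 29.70 expected bases.
cantu_ib, cantu_ibp = ib_and_ibp(36, 29.70)
print(round(cantu_ib, 2), round(cantu_ibp, 2))
```

Summing expected bases play by play, rather than applying one league-wide rate, is what lets the outs and the location of the hit influence a runner's rating.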

    In any case, thanks to SABR's Mike Emeigh I have now obtained the 2004 play-by-play data and rerun my framework. Drum roll please....the leaders for 2004 (20 or more opportunities) are:

    Opp Bases EB IB IBP OA
    Jorge Cantu 21 36 29.70 6.30 1.21 0
    Rafael Furcal 73 127 106.15 20.85 1.20 0
    Matt Holliday 50 95 79.69 15.31 1.19 0
    Jack Wilson 43 76 64.00 12.00 1.19 1
    Alfonso Soriano 36 64 54.07 9.93 1.18 0
    Chase Utley 22 38 32.16 5.84 1.18 0
    Vernon Wells 47 83 70.56 12.44 1.18 0
    Jose Valentin 43 70 59.74 10.23 1.17 0
    Ryan Freel 49 83 71.30 11.70 1.16 0
    Tony Womack 61 103 88.88 14.13 1.16 1
    Torii Hunter 43 74 63.86 10.14 1.16 1
    Gabe Kapler 38 63 55.03 7.97 1.14 0
    David DeJesus 46 75 66.12 8.88 1.13 0
    Laynce Nix 42 75 66.19 8.81 1.13 1
    Rocco Baldelli 44 74 65.36 8.64 1.13 0
    Reed Johnson 59 101 89.34 11.66 1.13 0
    Johnny Damon 70 125 110.71 14.29 1.13 0
    Joe Crede 38 66 58.47 7.53 1.13 0
    Carlos Beltran 134 226 200.74 25.26 1.13 0
    Willie Bloomquist 31 47 41.79 5.21 1.12 0
    Sean Burroughs 67 109 96.92 12.08 1.12 1
    Mike Cameron 41 68 60.53 7.47 1.12 0
    Cesar Izturis 68 113 100.72 12.28 1.12 0
    Juan Pierre 83 131 116.87 14.13 1.12 1
    Robert Fick 30 50 44.63 5.37 1.12 0

    Just as in 2003 the top performers came out about 21% above average in terms of gaining extra bases. And as you'd imagine most of the top performers seem to be guys you would think might be decent baserunners (especially those with over 50 opportunities). At the bottom of the list of 333 qualifiers were:

    Opp Bases EB IB IBP OA
    Randall Simon 34 32 51.61 -19.61 0.62 4
    Ken Griffey Jr. 24 30 39.84 -9.84 0.75 3
    Ross Gload 32 37 48.83 -11.83 0.76 5
    Ben Molina 31 35 46.08 -11.08 0.76 1
    Mike Piazza 45 49 63.36 -14.36 0.77 2
    Bill Mueller 64 79 102.01 -23.01 0.77 3
    Jason LaRue 44 54 68.30 -14.30 0.79 2
    Kevin Youkilis 30 37 46.72 -9.72 0.79 2
    Chad Moeller 20 24 29.88 -5.88 0.80 2
    Gary Bennett 25 32 39.12 -7.12 0.82 1
    Jacob Cruz 24 33 40.15 -7.15 0.82 2
    Karim Garcia 44 56 68.02 -12.02 0.82 2
    Ben Davis 40 46 55.63 -9.63 0.83 2
    A.J. Pierzynski 49 59 71.25 -12.25 0.83 3
    Doug Mirabelli 22 27 32.44 -5.44 0.83 0

    Once again, nothing too surprising. The range of the best to worst baserunners using this measure is on the order of 30 to 40 bases.

    When I ran the 2003 results against 1992 data I did a quick study of the players who were in both data sets to see if there was any correlation and to see if as players age their IBP would decrease as expected. Indeed I found that 9 of the 13 players had higher IBPs in 1992 than in 2003 and that their cumulative IBP was just a tad higher in 1992.

    Now having the 2004 data gives me an opportunity to do a side by side comparison. In all there were 261 players with 20 or more opportunities that played in both seasons. Their cumulative IBP for 2003 was 1.00205 whereas for 2004 it was 1.00099. This equates to 20 more bases gained in 2003 than in 2004. While not a large difference it is in the right direction on the assumption that baserunning skills decline with age as a player slows down.

    I also ran a regression on the data and calculated a correlation coefficient of .298 for the two years. When I increased the threshold to 50 opportunities in both seasons the correlation rose to .320. Not a particularly strong correlation but a positive one that indicates there is some predictive power here.
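For readers who want to try a year-to-year check like this on their own data, the following is a small Python sketch of a Pearson correlation coefficient. The function name is my own and the paired IBP values are hypothetical placeholders, not the 261 players from the study, so the printed value will not match the .298 reported above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (spread_x * spread_y)


# HYPOTHETICAL paired values: each position is one player's IBP in each season.
ibp_2003 = [1.10, 0.95, 1.02, 0.88, 1.05]
ibp_2004 = [1.04, 0.99, 0.97, 0.93, 1.08]

r = pearson_r(ibp_2003, ibp_2004)
print(f"r = {r:.3f}")
```

Raising the opportunity threshold, as described above, amounts to filtering the paired lists before calling the function.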

    Up next I plan on creating a cumulative set of advancement tables for 2003-2004 and then recalculating the leaders for each season as well as the leaders across both seasons.

    From a team perspective it broke down like this:

    Opp Bases EB IB IBP OA
    COL 616 1008 958.07 49.93 1.05 16
    SLN 626 982 948.87 33.13 1.03 11
    MON 584 884 857.88 26.12 1.03 12
    LAN 651 981 952.85 28.15 1.03 10
    TEX 576 899 873.32 25.68 1.03 9
    CHA 607 925 902.94 22.06 1.02 14
    MIN 587 914 892.81 21.19 1.02 10
    KCA 646 976 959.17 16.83 1.02 11
    ARI 620 942 927.44 14.56 1.02 7
    DET 576 874 865.26 8.74 1.01 13
    SDN 717 1069 1061.14 7.86 1.01 13
    CLE 654 1025 1021.44 3.56 1.00 15
    FLO 636 961 957.72 3.28 1.00 18
    ATL 597 903 901.92 1.08 1.00 9
    NYA 631 952 954.75 -2.75 1.00 13
    CHN 582 874 878.57 -4.57 0.99 15
    TOR 663 986 991.94 -5.94 0.99 12
    ANA 668 990 996.91 -6.91 0.99 19
    BAL 708 1050 1062.57 -12.57 0.99 20
    TBA 554 815 827.36 -12.36 0.99 10
    SFN 698 1032 1051.49 -19.49 0.98 14
    HOU 670 994 1013.12 -19.12 0.98 18
    SEA 740 1073 1095.85 -22.85 0.98 12
    PHI 610 919 939.74 -20.74 0.98 19
    MIL 533 793 814.62 -21.62 0.97 10
    OAK 628 925 953.57 -28.57 0.97 13
    NYN 579 863 893.98 -30.98 0.97 15
    PIT 605 868 905.84 -37.84 0.96 17
    CIN 549 809 847.42 -38.42 0.95 16
    BOS 778 1166 1226.45 -60.45 0.95 19

    Once again the range here is around 75 to 100 bases. It's interesting that Colorado led in both 2003 and 2004 with IBPs of 1.04 and 1.05. This immediately cries out for a ballpark explanation. Off the top of my head the larger outfield and fast surface probably contribute to the ability of baserunners to take extra bases.

    A next step here would be to run these numbers for the various parks.

    Tuesday, February 15, 2005

    Mientkiewicz and Defense

    Here's an interesting story on ESPN that relates to the value of defense and particularly Doug Mientkiewicz and his acquisition by the Mets. Interestingly, the article notes...

    Minaya figures first base is undervalued in the market place and in the minds of the average fan. "People take the position for granted," he said. He looks at a guy like J.T. Snow of the Giants, a smooth, graceful glove who "saves the Giants 10 games a year," and he anticipates something similar for his club with Mientkiewicz.

    Once again, questions like these, while they cannot be answered definitively, can be estimated within a certain range. For example, it is well established that wins are purchased at the cost of 10 to 11 runs. Therefore, Snow would have to save 100 runs or more per season to actually save the Giants ten games. And as Baseball Musings points out, even if Minaya meant that Snow makes ten clutch plays per year that heavily impact the outcome of a game, it is still likely not true given his analysis of play by play data.

    Although defense has historically been difficult to measure, a few systems have cropped up in recent years including Win Shares by Bill James, which calculates win shares for fielders, Ultimate Zone Rating (UZR) by Mitchell Lichtman, Defensive Regression Analysis (DRA) by Michael Humphreys, and David Pinto's Probabilistic Model of Range (PMR). What most of these systems have in common is that they estimate a difference on the order of 15 to 20 runs per season between stellar and average defenders at first base (the Win Shares difference is around 4 win shares which is 1.33 wins or 13 or so runs). PMR is the odd man out and sees a difference of about 40 outs which translates to around 30 runs although there is some debate on this topic. In either case Minaya is likely off by a factor of five or more.
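The arithmetic behind that "factor of five" can be laid out in a few lines of Python. This is just a worked restatement of the numbers in the two paragraphs above; the variable names are mine, and it uses the low end of the 10-to-11 runs-per-win range along with the Win Shares convention of three shares per win.

```python
RUNS_PER_WIN = 10        # low end of the 10-to-11 range cited above
WIN_SHARES_PER_WIN = 3   # Win Shares credits three shares for each team win

# Minaya's claim: Snow "saves the Giants 10 games a year".
claimed_wins = 10
runs_needed = claimed_wins * RUNS_PER_WIN

# Win Shares estimate: about 4 win shares between stellar and average.
win_shares_gap = 4
ws_wins = win_shares_gap / WIN_SHARES_PER_WIN
ws_runs = ws_wins * RUNS_PER_WIN

# Compare the claim with the high end of the 15-to-20 run estimates.
high_estimate_runs = 20
overstatement = runs_needed / high_estimate_runs

print(runs_needed)                          # runs implied by the claim
print(round(ws_wins, 2), round(ws_runs))    # Win Shares gap in wins and runs
print(overstatement)                        # how many times too large
```

Using 11 runs per win or the 15-run end of the range only widens the gap, which is why the text says "five or more".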

    To me, what this illustrates once again is that humans have difficulty accurately measuring a large number of observations, which leads to valuation based on general perception or a few memorable plays.

    Monday, February 14, 2005

    Patterns, Patterns

    Here's an interesting site that catalogs patterns of many kinds.

    Friday, February 11, 2005

    Looking at DIPS for 2005

    One of the most interesting observations in baseball analysis in the last decade was that made by a college student from Chicago named Voros McCracken about five years ago. Essentially, McCracken argued that pitchers have little control over whether balls that are put in play (BIP) (excluding homeruns) turn into hits or outs. What follows is the realization that a pitcher's effectiveness can be directly correlated with his ability to stop hitters from making contact (strikeouts), getting on base for free (walks), and producing runs with a single swing of the bat (hitting homeruns). A corollary to this realization is that the difference in ERA between pitchers of similar ability (in terms of strikeout, walk, and homerun rates) is most influenced by luck and secondly by the defense's ability to convert batted balls into outs. In other words, the number of non-homerun hits a pitcher gives up is not indicative of his skill, as was universally assumed, but rather of luck and defense with luck being by far the larger component.

    Of course, baseball fans have long known that scratch and bloop hits can beat your team just as easily as ringing line drives but the underlying assumption always was that "good pitchers" will be beaten less often by those sorts of events. Well, under McCracken's view of the world that thinking is correct only if by "good pitchers" you mean those that strike out an above average number of hitters, have good control, and stay away from the homerun.

    McCracken then built a system to evaluate pitchers that he called Defense Independent Pitching Statistics or DIPS that he parlayed into a consulting position for the Red Sox. For example, McCracken created a DIPS ERA that more accurately predicts a pitcher's ERA for the coming year than the previous year's ERA. McCracken created versions 1.0, 1.1, and 2.0 of this methodology but as a result of his position with the Red Sox no longer participates in the public discussion of his system.

    Many in baseball's sabermetric community, notably Bill James in the New Historical Baseball Abstract, have commented that McCracken's idea is one of those that is so obvious in retrospect that it's surprising that it hadn't been explored before 1999. I liken it to the wide-spread realization from a decade before that outs are the most precious resource a team possesses and therefore they shouldn't be squandered. In recent years there has been some pushback on DIPS and several analysts have detected an ability of some pitchers, particularly knuckleballers like Phil Niekro and Tim Wakefield but also left-handers, to induce a slightly higher percentage of outs on balls put in play by reducing the number of line drives that are hit. I wrote about this work last year and discussed some of its implications for strategies that can be employed for successful pitchers.

    From a practical perspective one of the conclusions that follows from DIPS is that general managers should be wary of investing long-term in young non-knuckleball pitchers with low strikeout rates. This is because these sorts of pitchers may have simply been the recipients of good luck and because their strikeout rates will inevitably decline as they age, further reducing the wiggle room they have in terms of control and the avoidance of homeruns. The Royals' Jimmy Gobble comes immediately to mind, as I've written about before.

    To illustrate the concept behind DIPS I used the Lahman database to take a look at the 86 pitchers who threw more than 120 innings in both 2003 and 2004. I then calculated various rate statistics including their Batting Average on Balls in Play (BABIP), Walks+Hits per inning pitched (WHIP), K/IP, BB/IP, HR/IP, ERA, Component ERA (ERC), a quick and dirty version of DIPS ERA (DERA), and my own simple version of DIPS ERA (SDERA), which I calculated by substituting the non-homerun hit portion of the Component ERA formula with the number of hits the pitcher would have given up had his BABIP been the .288 average of the two seasons. Component ERA simply attempts to predict a pitcher's ERA given the components of his performance. I did not adjust these statistics by ballpark as called for by the full DIPS methodology.

    Using this data I then calculated the correlation coefficient of each of these stats for 2003 and 2004. For Component ERA and my DIPS ERA I ran the correlation against the 2004 ERA. The assumption is that those statistics that have higher correlation can be attributed to a pitcher's ability while those that have low correlation can be attributed to other factors such as randomness. The results were:

    BABIP .087
    ERA .190
    ERC-ERA .233
    SDERA-ERA .276
    HR/IP .312
    DERA-ERA .322
    WHIP .407
    K/IP .717
    BB/IP .732

    A few observations of the result that illustrate the logic behind DIPS include:

  • There was virtually no correlation between BABIP in 2003 and 2004. Therefore, BABIP can be attributed to randomness

  • Note too that for many of these pitchers the same defense was behind them in both seasons and yet the correlation was non-existent. This indicates that randomness is by far the larger factor in the variation of BABIP

  • Only strikeouts and walks per inning pitched had a correlation coefficient greater than .7 and could be considered strongly correlated. This indicates that pitchers have the most control of these two abilities

  • Just as McCracken discovered, DIPS ERA calculated for 2003 was a better predictor of the pitcher's actual 2004 ERA than was his 2003 ERA, his 2003 Component ERA, or even my simple version using Component ERA as a base. The correlation for this quick and dirty version of DIPS ERA was not as strong, however, as the full version as documented by The Futility Infielder

  • I was surprised that homeruns per inning pitched did not result in a stronger correlation which may indicate that the ability to avoid homeruns is not as much of a skill as some proponents of DIPS might argue and should be factored into revisions of the formula

    So in what other ways can DIPS be used practically?

    First, consider the "leaders" in Batting Average on Balls in Play (BABIP) in 2003:

    2003 2004
    Glendon Rusch 0.381 0.287
    Jeff Weaver 0.343 0.293
    Rodrigo Lopez 0.340 0.277
    Shawn Estes 0.329 0.301
    Jason Jennings 0.327 0.326
    Kelvin Escobar 0.325 0.293
    Mark Hendrickson 0.325 0.296
    Josh Beckett 0.322 0.284
    Jeremy Bonderman 0.318 0.278
    Joe Kennedy 0.318 0.294

    In other words, these were the pitchers who were the unluckiest in terms of batted balls falling for hits. What you'll notice is there is little correlation between their 2003 and 2004 performance and in fact every pitcher had a lower BABIP in 2004. Why is that the case? DIPS says that in 2003 these pitchers were unlucky and so the probability that they would be just as unlucky in 2004 was remote and so they've regressed to the mean. For example, in 2003 opponents hit a whopping .381 on balls in play against Glendon Rusch of the Brewers resulting in his giving up 160 non-homerun hits in 123.3 innings and a painful 6.43 ERA. In 2004 his luck changed and opponents hit a more reasonable .287 (in fact .288 was the average for the two years) and with a bit better control Rusch's ERA dropped to 3.48 for the Cubs while he gave up just 127 hits in 129.7 innings. Rusch's 2003 DIPS ERA was 3.89 which was much closer to his actual 2004 ERA. Seven of the ten pitchers had a better ERA in 2004 than in 2003.

    On the flip side those with the lowest BABIP in 2003 were:

    2003 2004
    Barry Zito 0.239 0.291
    Ryan Franklin 0.245 0.289
    Darrell May 0.249 0.318
    Russ Ortiz 0.250 0.283
    Jason Schmidt 0.253 0.263
    Kip Wells 0.253 0.313
    Tim Hudson 0.253 0.297
    Jake Peavy 0.255 0.300
    Victor Zambrano 0.259 0.266
    Jarrod Washburn 0.259 0.284

    In contrast to the previous list, all of these pitchers had a higher BABIP in 2004 than in 2003. These were the lucky pitchers in 2003. A case in point is the Royals' Darrell May, who in 2003 gave up just 166 non-homerun hits in 210 innings while giving up 31 homeruns. Opponents hit just .249 on balls in play against him. Had he been average in this respect he would have given up 192 non-homerun hits, which would have inflated his fine 3.77 ERA.
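The "192 non-homerun hits" figure can be reproduced with a couple of lines of Python. This is a rough sketch rather than the spreadsheet actually used: the function names are hypothetical, and it backs the number of balls in play out of the rounded BABIP, so the result is approximate.

```python
LEAGUE_BABIP = 0.288   # two-season average quoted earlier in this post


def balls_in_play(non_hr_hits, babip):
    """Approximate balls in play implied by a hit total and a (rounded) BABIP."""
    return non_hr_hits / babip


def expected_non_hr_hits(non_hr_hits, babip, league_babip=LEAGUE_BABIP):
    """Non-homerun hits the pitcher would have allowed at the league BABIP."""
    return balls_in_play(non_hr_hits, babip) * league_babip


# Darrell May, 2003: 166 non-homerun hits with a .249 BABIP.
may_expected = expected_non_hr_hits(166, 0.249)
may_extra_hits = may_expected - 166

print(round(may_expected))     # hits at an average BABIP
print(round(may_extra_hits))   # additional hits compared with what he allowed
```

Swapping this expected hit total into an ERA estimator in place of the actual hits is the substitution behind the simple DIPS ERA (SDERA) described above.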

    Going into 2004 many of us had hoped that May had become one of the rare pitchers that can actually suppress hits on balls in play. Apparently Royals GM Allard Baird had as well, as he signed May to a 2-year $4.95M contract after the 2003 season. Alas, we were all disappointed in 2004 as his BABIP climbed to .318 and he was subsequently dealt to the Padres. His DERA in 2003 was 4.75, much closer to his actual 2004 ERA of 5.23. To be fair, he also gave up more homeruns per inning pitched and walked more batters.

    From this list nine of the ten pitchers (all except Jake Peavy) had worse ERAs in 2004 than in 2003.

    From both of these lists the conclusion that one might come to is that pitchers with high BABIP in one season will likely have better results in the following season and vice versa. Therefore general managers should be looking for pitchers that are under-valued because of their bad luck and avoid pitchers who are over-valued because of their good luck.

    So who might be undervalued coming into 2005? Here are the top ten in highest BABIP for 2004.

    Sidney Ponson 0.327
    Derek Lowe 0.327
    Kevin Millwood 0.327
    Jason Jennings 0.326
    Kyle Lohse 0.321
    Darrell May 0.318
    Kenny Rogers 0.315
    Roy Oswalt 0.314
    Brian Anderson 0.313
    Aaron Sele 0.313

    A few observations:
  • It's interesting that both Darrell May and Brian Anderson made the top 10 and hopefully for the Royals this means that Anderson can expect better results than his awful 2004 campaign.

  • Although the Dodgers were much ridiculed for signing Derek Lowe, that deal, in conjunction with Lowe pitching half his games at pitcher-friendly Dodger Stadium, will likely have many pundits singing a different tune this season

  • The Indians seem to have made a good move in picking up Kevin Millwood, who had the largest differential in DIPS ERA versus actual ERA of this group (3.79 to 4.85)

    And who might be over-valued? Here are the ten pitchers with the lowest BABIP for 2004.

    Al Leiter 0.240
    Johan Santana 0.250
    Kazuhisa Ishii 0.254
    Ted Lilly 0.261
    Tom Glavine 0.261
    Odalis Perez 0.263
    Jason Schmidt 0.263
    Victor Zambrano 0.266
    Jamie Moyer 0.268
    Jerome Williams 0.270

    The Marlins may be in for a surprise with Leiter as may the Twins with Santana. It is interesting that two pitchers made the top 10 in both seasons (Schmidt and Zambrano), which may indicate that these are among the few pitchers who have an ability to suppress BABIP although I'll reserve judgement since neither is a knuckle-baller or left-handed. Jamie Moyer of course may also be in this group.

    Monday, February 07, 2005


    "Part of the contemporary predicament is an old one; it is that we cannot have everything: we cannot live in a society that is materially rich, individualistic, open to all currents of ideas, one that allows and encourages free expression and mobility of every kind, where we can shop around for our favorite religion, experiment with new identities, and sample available options and life styles and at the same time also enjoy the benefits of stable communal ties, sustaining beliefs, taken-for-granted values, and a solid sense of purpose."

    ...Paul Hollander

    Saw this quote on Unca's blog. I thought it was a great description of the tradeoffs inherent in our modern society.

    Inerrancy and 1 Peter

    A while back I talked about the concept of inerrancy and Ephesians 4:8 and Psalm 68. In the recent Bible study I facilitated we ran into a similar example in 1 Peter 4:18. This verse is rendered in the NIV as:

    And, “If it is hard for the righteous to be saved, what will become of the ungodly and the sinner?”

    This verse is a direct quote from Proverbs 11:31. As with Ephesians 4:8 there is some confusion over this verse and its proper translation. It appears that the author of 1 Peter was quoting from the Septuagint, the Greek translation of the Old Testament made around 250 BC in Alexandria. There the passage says basically what Peter quotes - namely that if the righteous are saved by the skin of their teeth, then what chance do sinners have.

    However, in modern Bibles, including the King James, based on the Masoretic text (the Hebrew version, earliest extant copy from 1000 AD) Proverbs 11:31 does not refer to salvation at all. For example, the NLT translates the passage:

    "If the righteous are rewarded here on earth, how much more true that the wicked and the sinner will get what they deserve!"

    While the NASB has:

    "If the righteous will be rewarded in the earth,
    How much more the wicked and the sinner!"

    These appear to be in opposition to 1 Peter’s rendering, focusing instead on how the righteous are rewarded here on earth while sinners will get what they deserve. To me the differences call into question the doctrine of inerrancy and point to the probability that the author of 1 Peter used a faulty translation of Proverbs 11:31.

    Pickering and PECOTA

    Here was an interesting little tidbit on Baseball Prospectus regarding Calvin Pickering:

    "Calvinist Theology: One of the most interesting developments of last season was the play of Calvin Pickering, the much traveled, and monstrous (6'5, 278 lbs.), first baseman/DH. Pickering is something of an enigma; after having collected just 81 professional ABs from 2002-2003, Pickering caught on with Kansas City in '04 and absolutely destroyed the Pacific Coast League while at Omaha. In 299 at-bats, Pickering put up a Bondsian .712 slugging percentage by hitting 35 home runs, and then continued his barrage in Kansas City, hitting .246/.338/.500 with seven homers in 122 ABs. PECOTA thinks Pickering, who will still be just 28 in 2005, is for real. Here's his 2005 projection:

    337 24 .272 .400 .543 36.6

    PECOTA tabs Pickering as the 2005 Royals' offensive MVP. It's doubtful that he will get enough at bats to claim that honor, however. Chronically injured Mike Sweeney, signed to a huge deal, is lodged in the DH slot, and the team is way too high on Ken Harvey at first base. Sweeney, an inferior defender, has been making things difficult for Royals' management recently, demanding that he play first base regularly or be dealt. The situation could be a blessing in disguise for Kansas City, however, for the team would be better off trading Sweeney for a few prospects and playing Pickering every day at DH."

    I couldn't agree more with the sentiment, and this compares favorably to the assessment in the 2005 Bill James Handbook I mentioned a while back. For those who aren't aware, PECOTA is a forecasting system developed by BP which you can read a bit about here.

    On the Royals website Dick Kaegel comments that Pickering will likely start the season as the DH at Omaha. Too bad.

    Sunday, February 06, 2005

    Highly Paid

    After looking at the trend in salaries and payrolls in recent posts I thought it would be interesting to list the top paychecks for the last few years:

    2004 $22,500,000 Manny Ramirez
    2003 $22,000,000 Alex Rodriguez
    2002 $22,000,000 Alex Rodriguez
    2001 $22,000,000 Alex Rodriguez
    2000 $15,714,286 Kevin Brown
    1999 $11,949,794 Albert Belle
    1998 $14,936,667 Gary Sheffield
    1997 $10,000,000 Albert Belle
    1996 $9,237,500 Cecil Fielder
    1995 $9,237,500 Cecil Fielder
    1994 $6,300,000 Bobby Bonilla
    1993 $6,200,000 Bobby Bonilla
    1992 $6,100,000 Bobby Bonilla

    I had forgotten that Bonilla and Fielder were so well paid.

    Saturday, February 05, 2005

    Monarchs SABR Chapter Meeting

    As is the case each year, the local Monarchs chapter of SABR here in Kansas City held its mid-winter meeting this afternoon at the Johnson County Library with around 25 in attendance, including my brother and me. After brief introductions that included good baseball books that members had recently read, John Wathan gave a talk followed by some questions.

    Wathan was of course a catcher for the Royals from 1976 through 1985 and then managed the Royals from 1987 through the start of the 1991 season. He also briefly managed the Angels in May of 1992 in an interim capacity when Buck Rodgers was hospitalized after the team bus had an accident on the New Jersey Turnpike. He is currently a member of the Royals Player Development department with the title Special Assignments for Scouting and Player Development. In addition to his scouting duties he works with minor leaguers on base running and will be in uniform in spring training heading the baserunning work in Surprise.

    I found Wathan a very entertaining speaker and was appreciative of his praise for the baseball intelligence and passion of the members of SABR. In his remarks he related the high points of his career and, knowing his audience, talked a bit about scouting reports and how managers use statistics. For example, he talked about how when he was managing they began to create defensive charts that tracked where opposing hitters hit the ball against Royals pitchers. Each hitter would have his own chart and hits were drawn onto the charts in different color pencils representing the different Royals pitchers – blue for Mark Gubicza, red for Bret Saberhagen, etc. Interestingly, Wathan then noted that he thought their primitive system was better than modern computerized systems from STATS, Inc. and others since it reflected what the opposing hitters had done against Royals pitchers rather than against the league as a whole. He also seemed not to trust the folks who do the data entry in such cases. I’m not sure I buy that reasoning since his charts must have reflected a relatively small sample size. I would think more data would be better although I can see his point.

    And while Wathan was not a manager who was deep into statistics he mentioned that most managers are pretty aware of lefty/righty breakdowns and matchups before the game starts and noted that matchup data often suffers from small sample sizes. In fact, he told an interesting story of how a national reporter once saw him and another coach inputting some of the defensive data into a computer and wrote a column saying that Wathan would never be a good manager since he relied too much on the computer.

    Inevitably, in the question and answer period that followed the question of Moneyball and scouting came up and Wathan was quick to point out how friends of his had been fired because of the impact of the book. He specifically mentioned the Cardinals organization and the questioner mentioned the Blue Jays. Clearly, he was passionate on the subject and showed his disdain for Moneyball kind of thinking to the exclusion of traditional scouting. He was pretty critical of ballclubs hiring people from Harvard that think they have it all figured out without having been “in the trenches”. There were two points in this exchange that were interesting.

    First, he reflected the attitude of other baseball insiders, as I’ve written about before here and here, that statistics reflect only past performance while scouting and baseball knowledge are what enable predictions about the future to be made. In other words, statistics have their uses but those with an insider’s perspective know better. What interested me about this was that several minutes before, when discussing scouting reports, he noted how the scouting reports on himself were woefully inadequate. While still in the minors he badgered a front office guy for a look at them after he was told he wouldn’t make it to the majors. The reports noted that he had a bad arm. When Wathan asked how many games the scout had seen him play, the answer was three, and for two of those games Wathan had played first base. In addition, one of those in attendance was once a scout for the Padres and told Wathan during the question and answer period that he wrote several reports that said Wathan would never be a big leaguer. To me this illustrates the inherent weakness of scouting in baseball – beyond a certain base level of ability the difference between poor and average and average and stellar performers can only be measured over a period of months and years. Wathan did emphasize what I think only a scout can do – try to measure the desire and determination of players before they are signed. As my brother said as we were leaving, both statistics and scouting are effective tools if you understand their proper use.

    The second interesting aspect of the conversation was that Wathan mentioned that the Red Sox had hired a guy and, unbeknownst to Wathan, that guy, Bill James, who lives in Lawrence, was in the audience. After James was pointed out, Wathan asked him a few questions about working for the Red Sox and then mentioned that he was impressed by Red Sox GM Theo Epstein, whom he had sat with at a ball game recently.

    In the question and answer time Wathan talked at length about the famous pine tar game (he was sitting next to George Brett in the dugout and was trying to convince Brett that he was going to be called out), his frustrations as a manager, and how catchers call games. He also gave his opinion on the new MLB steroid policy – not stringent enough – and his view that the MLB Players Association made a mockery of drug policy with its defense of seven-time loser Steve Howe.

    Interestingly, he also made the argument that the current financial situation in baseball is becoming untenable and seemed to support revenue sharing much like in the NFL. In addition to the negative effect on small market teams like the Royals he made the point that journeyman players like himself are getting squeezed out of the game as teams will now bring up young players with minimum salaries in order to devote more resources to super stars. I’ve noted this perspective before but I hadn’t tried to document it. Using the Lahman database I did some quick calculations on salaries shown below.

    Year  Avg         Median    StdDev         Min       Max          Avg/Median  Std/Avg
    2004  $2,491,776  $775,000  $3,543,036.61  $300,000  $22,500,000  3.22        1.42
    2003  $2,573,473  $788,000  $3,479,837.67  $300,000  $22,000,000  3.27        1.35
    2002  $2,392,527  $900,000  $3,068,571.97  $200,000  $22,000,000  2.66        1.28
    2001  $2,279,841  $925,000  $2,906,019.23  $200,000  $22,000,000  2.46        1.27
    2000  $1,995,371  $750,000  $2,516,418.97  $200,000  $15,714,286  2.66        1.26
    1999  $1,504,762  $450,000  $2,060,941.15  $200,000  $11,949,794  3.34        1.37
    1998  $1,300,389  $355,000  $1,830,101.03  $170,000  $14,936,667  3.66        1.41
    1997  $1,233,235  $360,875  $1,728,851.70  $150,000  $10,000,000  3.42        1.40
    1996  $1,040,275  $270,000  $1,554,216.38  $109,000  $9,237,500   3.85        1.49
    1995  $981,909    $225,000  $1,541,064.27  $109,000  $9,237,500   4.36        1.57
    1994  $1,057,966  $350,000  $1,356,082.16  $109,000  $6,300,000   3.02        1.28
    1993  $987,667    $300,000  $1,288,636.02  $109,000  $6,200,000   3.29        1.30
    1992  $1,052,998  $450,000  $1,181,160.47  $109,000  $6,100,000   2.34        1.12

    Using this data I thought that if Wathan’s argument were true you would see a growing disparity between the median player’s salary and the average, and a greater standard deviation as compared to the average. The data shows that there is a bit of a trend in this direction since 2000, but before then the data appears to be fairly inconclusive.
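    For anyone who wants to check the last two columns of the table, here is a minimal Python sketch that recomputes the average-to-median and standard-deviation-to-average ratios from the summary figures. Only three seasons are typed in, the names SALARIES and ratios are made up for illustration, and it works from the table's summary numbers rather than from the raw Lahman salary data.

```python
# Summary figures copied from the salary table: year -> (average, median, std dev).
# Only three seasons are included here to keep the sketch short.
SALARIES = {
    2004: (2_491_776, 775_000, 3_543_036.61),
    2000: (1_995_371, 750_000, 2_516_418.97),
    1992: (1_052_998, 450_000, 1_181_160.47),
}


def ratios(avg, median, std_dev):
    """Return (average / median, std dev / average), rounded to two places."""
    return round(avg / median, 2), round(std_dev / avg, 2)


for year in sorted(SALARIES, reverse=True):
    avg_to_median, std_to_avg = ratios(*SALARIES[year])
    print(year, avg_to_median, std_to_avg)
```

    A rising average-to-median ratio means the top salaries are pulling the average away from what the typical player makes, which is the squeeze Wathan was describing.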

    After Wathan spoke a Kansas City Star reporter named Jeff Spivak talked about his soon to be released book on the 1985 World Champion Royals which provoked much discussion among the members, many of whom attended some or all of the games.

    All in all, it was an afternoon well spent.

    Friday, February 04, 2005

    Final Sosa Numbers

    So the Sammy deal is official. Here's the way it breaks down financially.

    The Cubs will pay $8.15M of Sosa's $17M 2005 salary with the Orioles picking up the remaining $8.85M. The Cubs will also have to pay $3.5M in severance and Sammy's $4.5M 2006 buyout which is now called a "bonus". So all told the Cubs will spend $16.15M on Sosa for 2005, which is just shy of his salary.

    In total, the Cubs would have owed Sosa $17M for this year and the $4.5M buyout for 2006, totalling $21.5M. In other words, the Cubs ended up paying Sosa three quarters of his guaranteed money to not play for them. That's a pretty good deal if you're Sosa.

    But in addition to the $16.15M outlay for Sosa in 2005 they're now paying around $2M for the trio of Hairston, Crouthers, and Fontenot and $4.5M for Jeromy Burnitz (with a $500K buyout for 2006 that I'm betting they'll end up taking). The grand total the Cubs will spend on this deal is about $22.65M in 2005, raising their payroll by over $5M. To put it in perspective their outlay on rightfielders will be $20.65M in 2005.
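    The arithmetic in the last few paragraphs is easy to lose track of, so here is a minimal Python sketch that adds it up. All figures are in millions and come from the post itself; the constant names are made up for illustration, and because the trio is only given as "around $2M" the grand total it prints is approximate.

```python
# All figures in millions of dollars, taken from the post.
SOSA_2005_SALARY = 17.0
CUBS_SHARE_OF_SALARY = 8.15
SEVERANCE = 3.5
BUYOUT_2006 = 4.5
BURNITZ_2005 = 4.5
TRIO_2005 = 2.0  # "around $2M" for Hairston, Crouthers and Fontenot (approximate)

# What the Cubs pay Sosa versus what they originally owed him.
cubs_outlay_on_sosa = CUBS_SHARE_OF_SALARY + SEVERANCE + BUYOUT_2006
originally_owed = SOSA_2005_SALARY + BUYOUT_2006
share_paid = cubs_outlay_on_sosa / originally_owed

# Rightfield spending (Sosa plus Burnitz) and the whole deal including the trio.
rightfield_outlay = cubs_outlay_on_sosa + BURNITZ_2005
total_outlay = rightfield_outlay + TRIO_2005

print(f"Cubs outlay on Sosa: ${cubs_outlay_on_sosa:.2f}M")
print(f"Share of guaranteed money paid: {share_paid:.0%}")
print(f"Outlay on rightfielders: ${rightfield_outlay:.2f}M")
print(f"Approximate total outlay: ${total_outlay:.2f}M")
print(f"Approximate payroll increase: ${total_outlay - SOSA_2005_SALARY:.2f}M")
```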

    And what did they get?

    They replaced the 36 year-old Sosa with the 36 year-old Burnitz and picked up a 29 year-old utility infielder and outfielder in Hairston and a couple of minor leaguers. I didn't mention Burnitz as an option in my previous post because I didn't think he was one. At his age and given his dramatic decline since 2001 (partially masked by playing in Denver last season) I wouldn't have thought he would be considered. His plate discipline seems to have largely disappeared although his power is still there. So here's how his numbers compare to Sosa's since 2001.


    Sosa
    Year        PA   G    OPS   NOPS/PF
    2004        539  126  849   109
    2003        589  137  911   123
    2002        666  150  993   135
    2001        711  160  1174  158

    Burnitz
    Year        PA   G    OPS   NOPS/PF
    2004        606  150  916   110
    2003 (LAN)  246  61   643   89
    2003 (NYN)  259  65   925   124
    2002        550  154  677   94
    2001        651  154  851   111

    In 2003 Burnitz played for both the Mets and the Dodgers (in New York is where he put up the 124 NOPS/PF) so his combined performance works out to 107. Quick aside: NOPS/PF is the normalized OPS taking into account the Batter Park Factor to adjust for the context in which each played.
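    One way to get from the two 2003 lines to a single combined figure is to weight each stint by playing time. The sketch below is only an illustration: it assumes the first number in each 2003 row (246 and 259) is plate appearances, and the function name is made up. This may not be exactly how the 107 was derived, but the weighted average does land on the same number.

```python
# Burnitz's two 2003 stints: (playing time, NOPS/PF).
# Reading the first number in each row as plate appearances is an assumption.
STINTS_2003 = [(246, 89), (259, 124)]


def weighted_nops_pf(stints):
    """Average NOPS/PF across stints, weighted by playing time."""
    total_pa = sum(pa for pa, _ in stints)
    return sum(pa * nops for pa, nops in stints) / total_pa


combined_2003 = weighted_nops_pf(STINTS_2003)
print(round(combined_2003))
```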

    Burnitz and Sosa played at essentially the same level last year although Sosa's decline started from a much higher peak season in 2001 and he put up higher numbers in each of the previous three seasons. Burnitz appears to be the more durable although less consistent player during this time.

    In the end I don't think the acquisition of Burnitz is an upgrade but rather a lateral move with a significant amount of risk. What is clear is that the Cubs and Sosa were desperate to part company and so both gave up lots of money to do so.

    Thursday, February 03, 2005

    If you're in Arkansas...

    On February 10, Jon Box will be in Little Rock, AR, speaking with the likes of Billy Hollis (RD), and some other Microsoft employees (Steve Loethen - DE, Brian Moore – DE, and Brad Nelson). He’ll be speaking on VSTS (very cool stuff), Billy is doing his Smart Client sermons, and the Microsoft guys are doing a SQL 2005 track.

    See the agenda for more details.