
Wednesday, March 30, 2005

Calvin a Free Man

Well, the Royals trimmed their roster to 25 for next Monday's opener in Detroit. Two somewhat surprising developments are the additions of Calvin Pickering and Emil Brown. These come at the expense of 2004 All-Star Ken Harvey, 2003 surprise Aaron Guiel and the self-proclaimed "5-tool" Abraham Nunez. Guiel and Harvey will report to Omaha while Nunez will be placed on waivers.

Readers of this blog know that I've been arguing for Calvin's chance to play since early last season. More recently, I argued that even if Ken Harvey made the team Pickering should as well, since a productive left-handed bat off the bench is worth more than either another utility infielder or the 12th man in the bullpen. So it's nice that despite Pickering's struggles this spring (he hit .208 with 1 homerun) Tony Pena recognized that he can contribute with his plate discipline and power:

"I will use Pickering against right-handed pitchers. The thing we have seen in Pickering is that he is a threat. He worked the count better than anybody, he puts his bat in position to hit, and he supplies power."

Now all of us stat-geeks can keep quiet and hope that Pickering doesn't disappoint.

Emil Brown on the other hand is not a guy anyone would have expected to make the club and certainly not the type of player that sabermetrically-oriented fans fall in love with. There was a nice profile of him in the KC Star a week or so ago. He spent a long time in the Pirates organization and his career .200/.289/.302 line in 442 plate appearances is not a cause for optimism. To me he seems like a classic tweener. He's not particularly fast, he doesn't have much power, he's not patient, and I don't think he's known as a great defender. At ages 27-29 he had solid seasons at AAA. However, at 30 he's likely not going to get any better and projects as a worse Terrence Long without the advantage of being left-handed.

Still, even though Guiel probably has more upside, Brown is a guy you like to root for and he tore the cover off the ball in spring training (the day we saw him play in Surprise he had 3 hits including a homerun) and it seems the Royals were serious that there was a real competition going on for the final outfield spot.

Monday, March 28, 2005

Greinke Redux

Here's an interesting discussion over at the Baseball Think Factory on my previous post about Zack Greinke and "Old Pitcher Skills". My original point was best expounded upon by a poster who commented:

"Greinke is a fine prospect no doubt, after all, he was only 20, and he outperformed Bonderman at the same age, and people were still optimistic about him after 2003.

However, the hype surrounding Greinke is far beyond that. The comps that people are giving for Greinke are guys like Maddux, Saberhagen, and now Schilling. Perhaps we're being a bit over optimistic here?

Yes, he has superhuman control, but that's exactly the problem. If his overall periphs are so mediocre despite the fabulous control, then he's got less room to grow, since K rate generally declines, and his BB rate is already essentially maxed out. (I won't comment on HR rate, since that's inherently flukish.)

This is like the old adage about the two runners who run the 100 meter dash equally fast. One has perfect form, the other has horrible form. Which do you want? The guy with horrible form, because if you can fix that, then he'll be even better.

Greinke, in the stat that's most likely to improve, simply doesn't have much further to go, and realistically probably can't get any better.

In the stat that's most likely to decline however, he's already starting out below league average, and if he declines in a fashion similar to that of most pitchers, then he'll be out of baseball sooner than you think.

Yes, Greinke is starting out with old-pitcher skills, and yes, that's a big problem."


I wish I had said it as well.

I'm a bit less negative, however, and would simply reiterate that, being 20, Greinke may indeed pick up velocity and movement on some of his pitches. I'm cautiously optimistic. A couple of commenters also noted that despite his famed control his vulnerability to the homerun may be related to trying to throw his 87 mph fastball a little too finely. In other words, he may have considerable room to grow in learning his craft as well. Both of these point to a higher upside.

Saturday, March 26, 2005

Playing for One Run

Question of the day: What does a "Moneyball" team do differently strategically than a traditional team?

Answer: Well, one thing they do is take seriously Earl Weaver’s axiom that a team’s 27 outs are its most valuable possessions.

One way this can be seen is by looking at how often American League teams successfully sacrificed in the first three innings of games in 2004.


CHA 25
ANA 15
KCA 15
SEA 14
DET 13
BAL 11
NYA 11
TBA 11
TEX 9
CLE 9
MIN 6
OAK 5
TOR 3
BOS 1

As you’ll note the teams run by sabermetrically-minded Billy Beane, J.P. Ricciardi, and Theo Epstein are at the bottom of this list. Why? Because bunting is inherently a strategy that increases the probability of scoring a single run while decreasing the chances of scoring multiple runs. This is easily seen by looking at simple run expectancy and scoring probability tables. When you do the math to calculate the break-even percentage you find that it is never a good idea to sacrifice when you’re trying to maximize runs. And of course early in the game a team should be attempting to maximize runs, which, incidentally, leads to another of Weaver’s axioms: “Don’t play for one run unless you know that run will win a ballgame.”
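The break-even calculation mentioned above can be sketched in a few lines. This is only an illustration of the idea, not the exact method behind the claim: the function name is made up, and the run expectancy numbers in the usage lines are placeholders rather than values from a real 2004 table. The bunt pays off in expected runs only if p * RE(success) + (1 - p) * RE(failure) is at least RE(before), so the break-even success rate is the p that makes the two sides equal.

```python
def break_even_success_rate(re_before, re_success, re_failure):
    """Minimum bunt success rate at which expected runs do not drop.

    Solves p * re_success + (1 - p) * re_failure = re_before for p.
    A result above 1.0 means no success rate can justify the bunt.
    """
    if re_success <= re_failure:
        raise ValueError("a successful bunt must be worth more than a failed one")
    return (re_before - re_failure) / (re_success - re_failure)


# Placeholder run expectancies (NOT real 2004 values), chosen only to
# show the shape of the calculation for a runner on first with nobody out.
p = break_even_success_rate(re_before=0.90, re_success=0.70, re_failure=0.55)
print(f"break-even success rate: {p:.2f}")
```

Whenever the run expectancy after a successful sacrifice is lower than the run expectancy before it, the result comes out above 1.0, which is another way of saying the bunt can never break even when the goal is to maximize runs.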

Clearly the White Sox under Ozzie Guillen don’t follow this approach, having sacrificed 25 times in the first three innings. When you examine those 25 you find that fully 12 of them were by the first or second spot in the order, although one of those was actually a squeeze play.

When you look at successful sacrifices more generally you find the following for the major leagues as a whole:

By Lineup Pos

1 11%
2 18%
3 2%
4 0%
5 3%
6 4%
7 7%
8 9%
9 46%

By Outs

0 78%
1 22%

By Inning

1 7%
2 11%
3 13%
4 10%
5 13%
6 10%
7 10%
8 11%
9 8%
10 3%
11 1%
12 1%
13 1%
14 <1%
15 <1%
16 <1%
17 <1%

By Score Differential

-8 <1%
-7 <1%
-6 <1%
-5 <1%
-4 1%
-3 2%
-2 6%
-1 14%
0 35%
1 16%
2 11%
3 7%
4 4%
5 2%
6 1%
7 <1%
8 <1%
9 <1%

Thursday, March 24, 2005

The Luck of the Single

Question of the day: Is last season's batting average a good predictor of next year's?

Answer: Intuitively, the answer to this question would seem to be yes. After all, teams tend to pay players based on last year's performance and what is more indicative than batting average?

Well, to answer the question I took a look at the 262 players who had more than 200 plate appearances in both 2003 and 2004. I then calculated batting average, slugging percentage, on base percentage, and OPS (on base plus slugging) and ran a quick regression using Excel. The correlation coefficients were as follows:


SLUG: .569
OBP: .532
OPS: .472
AVG: .337
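
For readers who want to try this without Excel, here is a minimal sketch of a year-to-year correlation. It assumes the coefficients above are ordinary Pearson correlations between a player's 2003 and 2004 values; the function name and the five sample averages are made up for illustration and are not the real data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical batting averages for the same five players in two seasons.
avg_2003 = [0.300, 0.265, 0.280, 0.250, 0.310]
avg_2004 = [0.285, 0.270, 0.260, 0.262, 0.305]
print(round(pearson_r(avg_2003, avg_2004), 3))
```

Running the same function over each statistic for every qualifying player would give one coefficient per statistic, in the spirit of the list above.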

In other words batting average varies from season to season more than either slugging percentage or on base percentage. As a result, when you consider how a player might perform in 2005 it would be better to look at his 2004 slugging percentage than his 2004 batting average.

But why is this the case? Why is there more variation in batting average than in slugging percentage or on base percentage? One reasonable interpretation is that there is more variation because there is more luck involved in batting average than in either of the other two.

First, consider batting average versus slugging percentage. If you think about games that you watch this actually makes perfect sense. After all, slugging percentage is calculated by using total bases rather than simply hits. Therefore singles comprise one-fourth of the components used to compute batting average whereas they comprise only one-tenth of the components used to compute slugging percentage. And when you watch a ballgame which kind of hit is more likely to be the result of a broken-bat flare, a topper, a lucky bounce, a "seeing-eye" grounder, or a Texas-leaguer? A base hit of course. Doubles and homeruns on the other hand are less likely to occur as the result of a lucky bounce or fortunate placement and more often occur because the batter put a good swing on the ball and hit it solidly on a line or a deep fly ball.
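
The difference in how the two statistics weight a single can be made concrete with a small sketch. The season line below is hypothetical and invented purely for illustration; only the two formulas are standard.

```python
def batting_average(singles, doubles, triples, homers, at_bats):
    """Hits divided by at bats; every hit counts the same."""
    return (singles + doubles + triples + homers) / at_bats


def slugging_percentage(singles, doubles, triples, homers, at_bats):
    """Total bases divided by at bats; extra-base hits are weighted 2, 3, 4."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / at_bats


# Hypothetical season line, invented purely for illustration.
line = {"singles": 100, "doubles": 30, "triples": 3, "homers": 20, "at_bats": 550}

hits = line["singles"] + line["doubles"] + line["triples"] + line["homers"]
total_bases = (line["singles"] + 2 * line["doubles"]
               + 3 * line["triples"] + 4 * line["homers"])

print(f"AVG {batting_average(**line):.3f}, SLG {slugging_percentage(**line):.3f}")
print(f"singles are {line['singles'] / hits:.0%} of hits but "
      f"{line['singles'] / total_bases:.0%} of total bases")
```

For a line like this one, singles account for most of the hits but well under half of the total bases, so any luck that rides on singles moves batting average more than it moves slugging percentage.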

This is backed up by looking at play-by-play data for 2004 as summarized in the following tables.

Type        BIP   Pct    Outs  Out Pct
Ground    60234   42%   45433      75%
Line      25654   18%    6712      26%
Fly       47693   33%   37186      78%
Pop       11010    8%   10784      98%
Total    144591        100115

And...

Type     Single  Double  Triple  Homerun
Ground      45%     14%     42%       0%
Line        46%     50%     28%      14%
Fly          8%     35%     30%      86%
Pop          1%      0%      0%       0%

When you summarize this you can see that 69% of the balls put into play were converted into outs.
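
The percentages in the first table, and the 69% figure, can be reproduced directly from the raw counts. This short sketch uses only the balls-in-play and outs totals given above; the variable names are arbitrary.

```python
# Balls in play and outs by batted-ball type, copied from the 2004 table above.
balls_in_play = {"Ground": 60234, "Line": 25654, "Fly": 47693, "Pop": 11010}
outs = {"Ground": 45433, "Line": 6712, "Fly": 37186, "Pop": 10784}

total_bip = sum(balls_in_play.values())
total_outs = sum(outs.values())

for kind, bip in balls_in_play.items():
    share = bip / total_bip          # share of all balls in play
    out_rate = outs[kind] / bip      # how often this type becomes an out
    print(f"{kind:<6} {share:5.0%} of balls in play, {out_rate:5.0%} turned into outs")

overall_out_rate = total_outs / total_bip
print(f"Overall: {overall_out_rate:.0%} of balls in play turned into outs")
```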

You should also notice that only 18% of the balls put in play were line drives and they are converted into outs only a quarter of the time. On the other hand balls hit on the ground are converted into outs fully 75% of the time. When you look at singles as contrasted with doubles you see that 45% of base hits were on the ground while 46% were line drives. For doubles the percentages were 14% and 50% respectively. What this means is that 21.5% of ground balls turn into singles while only 2% of ground balls turn into doubles. Since the odds of turning a ground ball into an out are higher it follows that ground balls are inherently easier to convert into outs and therefore that there is a much larger element of luck in determining which ground balls end up getting through the infield for singles and which are turned into outs. In fact, since 45% of the singles are ground balls and 1% pop-ups, one might conclude that almost half the singles hit are predominately luck.

When looking at doubles we find the opposite. Only 14% are the result of grounders and so a much larger percentage can be attributed to solidly hit balls that are more likely to reflect a batter’s skill. Of course the same argument can be made for homeruns. The larger amount of luck in the accumulation of singles means larger variation in batting average as opposed to slugging percentage and thus the likelihood of less correlation from year to year for a specific batter.

It should be noted that with triples the argument doesn't really hold. Most triples are technically grounders (42%) hit down the line with a smaller number (30%) being fly balls in the gap or in the corners. Still, 28% are line drives. And because triples are also a function of a batter's running speed, it's not really possible to conclude that triples are more a reflection of skill than luck. Since triples make up only 3.2% of all hits while doubles and homeruns make up 32%, they don't play a large role in the conclusion that slugging percentage is more of a reflection of a hitter's skill than batting average is.

Of course, I'm not arguing that there is no ability of some batters to hit grounders that get through the infield. Harder hit grounders will tend to get through more often. However, this is likely offset by the additional time infielders have to throw the runner out when they knock down a grounder and the ability of fast runners to beat out slowly or weakly hit ground balls.

The reasoning for less variability and therefore more skill reflected in on base percentage than batting average is straightforward. Since the publication of the Baseball Abstracts and The Sinister First Baseman in the early 1980s baseball analysts have become increasingly aware that walks are as much a function of the hitter as the pitcher. This view has slowly made its way to the front offices of many major league teams to the point that teams have now begun to consider strike zone judgment in their scouting scheme.

Historically this wasn't the case. Since the batter was the passive actor in the base on balls it was long assumed that walks were purely or at least mostly under the control of the pitcher. To a certain degree this attitude is still prevalent although in a modified form. For example, this quote was taken from an article by a Toronto beat writer in 2003:

"Clearly, the easiest positive statistic for mediocre hitters, one that requires keeping the bat glued to your shoulder, instead of the traditional hand-eye induced ball-whacking (which is far more exciting), is the ability to draw walks."

But throwbacks aside, it is now generally recognized that a batter's ability to control the strike zone is definitely a skill and it makes sense that it would not be as subject to variability as batting average, a statistic heavily dependent on singles which as we've shown are themselves heavily dependent on luck.

For more information on this question check out Jim Albert's paper, Batting Average: Does it Represent Ability or Luck?. Although Albert did not look at slugging percentage specifically, he did show that strikeout, walk, homerun, on base percentage, and in-play average (batting average on balls put in play) are all more strongly correlated from year to year. In his analysis strikeout, walk, and homerun rates all had strong correlations with coefficients of .7 or greater.

Wednesday, March 23, 2005

BRJ and the Fog

While winging my way to unseasonably cool and drizzly Phoenix for four days of spring training with family and friends I had a chance to read the 2005 Baseball Research Journal, which I received as a benefit of SABR membership just last week.

Immediately my attention was turned to Bill James' contribution titled "Underestimating the Fog". This article has also been the topic of a bit of discussion on the SABR-L list since its publication.

In short, James argues that some of the best known negative sabermetric conclusions should not really be viewed as conclusions at all, but rather simply as non-answers to questions under study. In particular he discusses the following conclusions:

  • There is no such thing as a clutch hitter. Deviations of performance in clutch situations are essentially random. Cyril Morong has a nice reference list of studies on clutch performance here.
  • There is no such thing as an "ability to win" in a pitcher. In other words, there is no skill beyond preventing runs from scoring that allows a pitcher to win games. There are no "clutch pitchers" who have the ability to eke out wins and likewise there are no pitchers who are "losers".
  • Winning or losing close games is luck. In other words a team that wins a lot of close games in a season does so by good fortune rather than some collective ability to pull such games out. Such teams then regress to the mean the following year.
  • Catchers have little or no impact on a pitcher's ERA. When a pitcher does well with a certain catcher behind the plate it is luck.
  • A pitcher has little or no control over how many hits he gives up per inning other than through strikeouts and giving up homeruns. This is the Voros McCracken observation I wrote about recently.
  • Base running has no positive impact on runs scored other than through base stealing. In other words, if a team scores more runs than would be predicted by the combination of their hits and walks, it is just luck.
  • Batters have no individual ability to hit well or poorly against left-handed pitching. However, there is a strong group tendency to do so.
  • There is no such thing as a "streaky hitter" in either the positive or the negative senses.
  • There is no "protection effect" for hitters in a lineup. In other words, the quality of hitters before and after a particular hitter has no effect on the performance of the hitter.

James goes on to criticize the common technique employed in the various sabermetric studies that are typically cited to "prove" these conclusions - for example Dick Cramer's famous 1977 Baseball Research Journal article on clutch hitting and James' own look at platoon differentials in the 1988 Baseball Abstract. That technique involves the search for recurrence or persistence of the phenomenon being studied. In other words, in each of the cases above studies were done that attempted to determine if the effect (clutch hitting, catcher's ERA, winning close games etc.) persisted across seasons. In each case repeated studies have shown that it doesn't - therefore the effect is, in the words of James, "transient" and not "persistent". That which is not persistent is then assumed not to be real.

James then argues that in many of these cases the negative conclusion - the phenomenon is not real - is flawed because there is too much instability in the data used to reach it. For example, the conclusion that there is no specific ability to hit well or poorly against left-handed pitching is based on platoon differentials where the number of plate appearances against left-handed pitchers is around 120 in a season. The randomness involved in such a small sample size tends to swamp the differential itself, thereby making the results meaningless. James notes that Cramer's original study of clutch hitting was flawed for the same reason.

Of course, sabermetricians have always cautioned against drawing conclusions based on small sample sizes and so James' cautions here are well taken. But I believe he was also making a more subtle point: Sometimes the magnitude of the phenomenon under study - if it exists - is smaller than the magnitude of the normal variation in the statistics we use to try and study it. In other words, while a skill like clutch hitting may indeed exist in the real world, the noise or fog in the data used to try and measure it will keep us from finding and measuring that skill. These are two different problems and it seems to me that the former is solvable by accumulating more trials while the latter is not.
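
To get a feel for how thick the fog is at 120 plate appearances, here is a rough sketch that treats each at bat as an independent trial (a simple binomial model). That model, the function name, and the assumed .270 true average are illustrative choices, not anything taken from James' article; only the figure of roughly 120 appearances comes from the discussion above.

```python
from math import sqrt


def batting_average_sd(true_avg, at_bats):
    """Standard deviation of an observed average under a binomial model."""
    return sqrt(true_avg * (1 - true_avg) / at_bats)


# 120 is the rough number of appearances against left-handers cited above;
# the .270 "true" average is an assumed, illustrative figure.
sd = batting_average_sd(0.270, 120)
print(f"{sd:.3f}")  # prints 0.041, i.e. roughly 40 points of batting average
```

Under these assumptions, a hitter's observed average against left-handers can swing by 40 points in either direction on chance alone, which is as large as or larger than the kind of individual differential the studies were trying to detect.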

Having said that, I'll take a pass through the nine conclusions above and discuss whether each is in question because of small sample sizes or because the skill's effect is too small to measure. Taken one by one...

  • It would seem to me that larger sample sizes (contrary to Cramer's study), spanning entire careers for example, would help us to understand whether clutch hitting is indeed a skill. In that regard Tangotiger has come to the conclusion that clutch hitting does indeed exist, but that it is rather rare and not that important. From that perspective sabermetric analysis has done a useful service in pointing out that clutch hitting simply cannot be that important since its effects are so hard to measure.
  • While I haven't seen or done any long term study on this subject, this too would seem to need more than a comparison across seasons and would also need to take into account run support and league contexts. In other words this is likely a case of inadequate sample sizes, however, I have little doubt that the conclusion is essentially correct.
  • The premise of this conclusion is based on inherently small sample sizes involved in one-run games in the course of two seasons. As a result, I think James' main point clearly applies to this conclusion. Further, in this scenario there is no way to increase the sample size since team composition can change radically from season to season, thereby making it impossible to make comparisons across a set of seasons.
  • This conclusion too is based on small sample size but has an additional problem that applies to James' second point. While increasing the sample size in an effort to tease out the effect would be useful, so many other factors (other defenders primarily) go into run prevention that ERA may simply be too blunt a tool to use.
  • In this area James concedes that McCracken's observation has the equivalent of a "stable platform" against which to judge, in that historically 70% of the balls put into play result in outs. Coupled with the fact that starting pitchers often face over 1,200 batters per season resulting in fairly stable data, this conclusion appears pretty solid. Even so, a longer term study done by Tom Tippett has shown that there is indeed an element of skill, small though it may be.
  • This conclusion is especially interesting to me. Measured at a team level I would agree that the collection of normal offensive elements that ignore base running beyond stolen bases will accurately predict the number of runs scored. In other words, when taken collectively, base running other than stolen bases tends not to be a factor in predicting the number of runs scored. Of course, the reason could simply be that good and bad base runners on a team tend to cancel each other out coupled with the fact that the effect of good base running is simply not that large. I've done some work in this area and like James feel that "base running can be measured in simple, objective terms". My research and that of Baseball Prospectus indicate that good base runners can pick up as many as five runs per season for their team. The persistence of that base running effect of course is the question and I've found a positive correlation using 2003-2004 data indicating that there is indeed something real here.
  • To me it seems obvious that this effect could be measured simply by increasing the size of the sample. So while I agree that season to season platoon differentials are not very useful, career ones should be. This is buttressed by the recognition that there is a group effect. In other words, everyone agrees there is a differential; it simply cannot be reliably used in the context of a single season because of small sample sizes.
  • The existence of streakiness seems to be unlike the other conclusions in that it does not rely on small sample sizes nor need larger sample sizes in order to study. If I understand the studies that have been done in this area they generally conclude that hitting streaks simply do not historically occur more often than would be predicted by a random model. Having said that, Albert and Bennett in Curve Ball used a model that showed that a few players may indeed be streaky. In a post a couple months back I talked about how the author of A Mathematician at the Ballpark is convinced that streakiness is a persistent phenomenon in sport based on studies of bowling. I'm not convinced these have much to do with baseball, however. At this point I'm willing to concede that streakiness may be a real phenomenon but it falls under the radar of normal variation per James' second point.
  • As far as "protection effect" is concerned I don't have much to offer. However, it would seem that there are probably few plate appearances where the hitter behind or ahead of the batter actually has an influence on the pitcher. And so if there is an effect I would venture that it requires study across multiple seasons and even if present it probably falls into James' second category.

Sunday, March 13, 2005

Parity?

A couple days ago Jayson Stark at ESPN published an article titled "Hope now a widespread reality" in which he says in regard to teams that won't be in contention in 2005...

"It could be as few as four teams. It might be as many as eight. But six is more like it. And that's a long ways from two-thirds, in anybody's math book...To get a clearer picture, we did a recent scout-and-GM survey on this. Shockingly, only four teams – the Rockies, Royals, Pirates and Devil Rays – got a vote from all six of the people we polled."

Setting aside the fact that GMs and scouts may not be overly objective about such a question, Stark contrasts this with the approximately 15 teams that Commissioner Bud Selig was referring to in his November 2000 comment when he said "At the start of spring training, there no longer exists hope and faith for the fans of more than half of our 30 clubs."

Stark cites three lines of evidence to back up his claims:

  • Revenue Sharing. Here Stark notes that the Marlins will receive $27M, the Twins $21M, and the A's $19M to name a few.

  • Sharper General Managers. Particularly Stark calls out strategies of Terry Ryan of the Twins and Billy Beane of the A's along with smarter contracts by other GM's.

  • Luxury Tax. Stark sees the tax as putting downward pressure on salaries.

Is Stark right that there "has never been more hope in spring-training camps everywhere"?

Well, to start we should get right to the bottom line and look at how teams have fared since 2000. Using the same methodology I used in a previous article, I calculated the Normalized Payroll (NPayroll) of each team as its payroll divided by the average payroll for the year. As far as division finishes go here is what we find:


Finish  NPayroll  Teams
1           1.25     31
2           1.09     29
3            .97     30
4            .90     30
5            .78     25
6            .74      5
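
As a minimal sketch of the Normalized Payroll calculation described above: each team's payroll is divided by the league-average payroll for that year, so 1.00 means exactly average. The function name and the three-team "league" below are hypothetical and exist only to show the arithmetic.

```python
def normalized_payrolls(payrolls):
    """Divide each team's payroll by the league-average payroll for the year."""
    league_avg = sum(payrolls.values()) / len(payrolls)
    return {team: amount / league_avg for team, amount in payrolls.items()}


# Hypothetical three-team league, payrolls in millions of dollars.
example = {"Team A": 120.0, "Team B": 60.0, "Team C": 30.0}
npayroll = normalized_payrolls(example)
for team, value in npayroll.items():
    print(f"{team}: {value:.2f}")
```

Normalizing this way lets seasons with different salary levels be pooled, which is what makes it possible to average NPayroll by division finish across several years.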


I'm not much of a mathematician but it appears that there is a strong correlation between the place at which a team finishes and its payroll.

Regarding Stark's three lines of evidence, I fully agree that revenue sharing, smarter management, and the luxury tax are good things. However, we're not there yet....

Revenue Sharing
After 2004 the Yankees had to contribute $60M of their estimated $315M revenue to the revenue sharing plan. Combined with the $25M luxury tax (see below) that means the Yankees still had $230M at their disposal while small market teams still have less than 50% of that total. Clearly, the revenue sharing scheme doesn't go far enough. A fair system would split per game revenue 50-50 between the home and visiting teams - after all, it takes two teams to play a baseball game.

Smarter GM's
The increase in knowledge about performance analysis in the post-Moneyball era can indeed give some teams an edge on others. As I've said before, however, that advantage is fleeting in the big picture since information flows freely and even teams that are fighting and kicking against the new knowledge like the Cubs will eventually succumb as it becomes the new orthodoxy. In short, I don't see this development as any sort of solution to the financial inequities in the game. In fact, the new knowledge may work against the revenue-poor teams as Baseball Prospectus 2005 noted in its comment on the 2005 Pirates and the drafting strategy of GM Dave Littlefield.

"College juniors and seniors don't have a lot of leverage, barring another Varitek-style standoff; yet even then, the Pirates apparently have to settle for less...the draft is a second area of player acquisition handicapped by financial limitations. Littlefield isn't merely acceding to McClatchy's (the Pirate owner) oft-stated preference to college talent or following a Moneyball-minded script; he has to worm his way down to signability-inspired choices."

In other words, teams like the Royals and Pirates have to pass on top level talent like Jered Weaver and Mark Prior and instead sign second tier prospects because they cannot pay the bonuses these top picks demand. With better performance forecasting tools available to all teams it will become less likely that small-revenue teams will steal a pick overlooked by richer teams.

Some might argue that smart GM's will continue to innovate through new sabermetric insights, for example by attempting instead to exploit the undervaluing of defense as the value of on base percentage and OPS begins to be understood as pointed out by Peter Gammons and Ken Rosenthal. Billy Beane also pointed this out in a recent interview.

However, as I've said before I think the inherent structure of the game of baseball ultimately limits the possibilities for progress. Using the Batting Runs formula, for example, the value of a homerun or a stolen base or a walk has changed only slightly since 1900. From the offensive perspective it isn't like someone will suddenly learn that stolen bases indeed are more valuable than power hitting. And even in the big picture sabermetricians pretty well understand the relative values of offense and defense (split into pitching and fielding). As a result sabermetrics will become more specialized and so I wouldn't look for "a host" of new sabermetric breakthroughs. In addition, taking advantage of what is currently undervalued can only get you so far if what is undervalued is not as valuable in an absolute sense. Right now we're in a time when some teams can and do exploit the market but with the spread of sabermetrics this advantage will be lessened.


Luxury Tax
I don't think there is any evidence yet that the luxury tax has inhibited spending. After all, in 2003 only the Yankees paid the luxury tax (paying $11.8M) while in 2004 the Yankees ($25M), the Red Sox ($3.1M), and the Angels ($.9M) all paid the tax. However, since the tax is progressive in that it exacts a higher penalty for consecutive offenses - for example the Yankees paid 30% this year while the Red Sox and Angels only paid 22.5% - it may indeed have that effect eventually; we just haven't seen it yet.


So for 2005, who's out and who's not? My list of teams with no chance at a division championship includes:

Baltimore
Tampa Bay
Toronto
Kansas City
Detroit
Seattle
Texas
Washington
Philadelphia
Milwaukee
Pittsburgh
Cincinnati
Arizona
Colorado
San Diego

That's 15. Of those that have a chance at a wild card berth I would include Philadelphia and San Diego. So, rather than as many as eight teams I would say there are more like 13 that have no hope of post season play. And of course, the good fortune of the White Sox, Cleveland, and Minnesota is that they play in a division without the Yankees and Red Sox. A less fortuitous partitioning would include those teams as well while perhaps adding Baltimore as a team with a small chance of getting in via the Wild Card.

Saturday, March 12, 2005

DIPS for Hitters

Last month I explained the concept behind Defense Independent Pitching Statistics (DIPS) and showed that the year-to-year correlation of a pitcher's batting average on balls in play (BABIP) is near zero, indicating that how often a batter gets a hit when making contact (other than a homerun) is not under the control of the pitcher.

But what about the reverse? Does a batter have any more control over whether a ball put in play becomes a hit? To answer that question I took a look at the 244 players who garnered over 200 at bats in both 2003 and 2004. I then calculated their batting average, BABIP = (H-HR)/(AB-HR-SO), and the difference between the two. I then ran a regression and found the following correlation coefficients:

BA .337
BABIP .253
DIFF .609

STDDEVP(DIFF) = .018
AVG(BABIP) = .304
AVG(BA) = .272
MAX(-DIFF) = Luis Terrero (-.109)
MAX(+DIFF) = Barry Bonds (.048)
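
The BABIP formula given above is easy to check by hand. In this sketch the function names are arbitrary, DIFF is taken to be BA minus BABIP (which matches the sign of the Bonds figure), and the Bonds 2004 inputs (373 at bats, 135 hits, 45 homeruns, 41 strikeouts) are his season line as best I recall it, so treat them as an assumption rather than as data from the study.

```python
def batting_average(h, ab):
    """Hits divided by at bats."""
    return h / ab


def babip(h, hr, ab, so):
    """Batting average on balls in play: (H - HR) / (AB - HR - SO)."""
    return (h - hr) / (ab - hr - so)


# Barry Bonds, 2004 (recalled season line; an assumption, not study data).
h, hr, ab, so = 135, 45, 373, 41

ba = batting_average(h, ab)
bip_avg = babip(h, hr, ab, so)
diff = ba - bip_avg  # assumed sign convention: BA minus BABIP

print(f"BA {ba:.3f}  BABIP {bip_avg:.3f}  DIFF {diff:+.3f}")
```

With those inputs the result lines up with the .362, .314, and +.048 figures quoted for Bonds, which suggests the inputs and the sign convention are the right ones.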

What this indicates is that a hitter's batting average is positively correlated from year to year, a fact common sense and observation would indicate. The positive correlation of BABIP also indicates that hitters have more control over whether balls they put in play become hits than do pitchers. In my previous article I showed the correlation for pitchers was around .087. Further, the averages above show that the difference between BA and BABIP is around 30 points.

What is most interesting, however, is that the difference between the two has the strongest correlation at .609. While this might seem surprising at first it makes sense. Hitters with lots of power will tend to have a higher batting average relative to their BABIP since each homerun raises their batting average while not affecting their BABIP. As a result, hitters whose hits include higher percentages of homeruns and a modicum of strikeouts will tend to have a large difference from year to year thereby increasing the correlation. By the same token hitters who hit few homeruns and don't strike out much will tend to have similar BA and BABIP and consequently will have consistently small differences in the two. At the other end of the spectrum hitters who tend to strike out a lot will consistently have higher BABIP than they do BA since strikeouts make up a larger percentage of their at bats.

So from this we can conclude that it is not purely luck that a player like Barry Bonds has a 48 point difference between his batting average of .362 and his BABIP of .314, nor is it surprising that those with big differences between the two include Albert Pujols, Tony Batista, Aramis Ramirez, Mark Bellhorn, Jason Varitek, and Moises Alou.

Even given that there is some reasonable correlation going on it is possible to find players who seemingly ran into some good or bad luck in terms of getting hits on balls in play in 2004. For example consider these two groups of players:


                    BABIP 2003   BABIP 2004
Erubiel Durazo         .287         .369
Jason Varitek          .306         .373
J.T. Snow              .307         .369
Jack Wilson            .282         .333
Darin Erstad           .285         .335
Travis Hafner          .306         .356
Damian Miller          .290         .329
Ivan Rodriguez         .337         .376
Carlos Guillen         .315         .352

Scott Podsednik        .362         .275
Corey Koskie           .360         .276
Jacque Jones           .356         .283
A.J. Pierzynski        .335         .270
Brad Wilkerson         .352         .294
Tike Redman            .349         .298
Jose Guillen           .362         .312
Bill Mueller           .355         .305
Mike Lieberthal        .335         .287

In the first group the players had roughly average BABIP in 2003 and very good BABIP in 2004. It's no surprise that each of the players in the first list was considered to have a good year in 2004. In the second list are players with high BABIP in 2003 and merely average BABIP in 2004. Likewise, each of these players was considered a bit of a disappointment in 2004. The interpretation of this data is that the players in the first list were a bit lucky in 2004 while the players in the second list were lucky in 2003.

The end result is that if I were a general manager I'd be a bit wary of players with really high BABIP (over .350) when they haven't historically performed at the same level (I would exclude a player like Melvin Mora who had a BABIP of .374 in 2004 almost duplicating his .364 in 2003). The players in the first list above are the ones that stand to come down to earth a bit in 2005 just as those in the second list did in 2004.
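The screen described above can be sketched as a simple filter. The BABIP pairs below are taken from the lists and from the Melvin Mora example; the .350 cutoff is the one mentioned in the text, while the 30-point jump used to decide what counts as "out of character" is purely a hypothetical threshold chosen for illustration.

```python
# (2003 BABIP, 2004 BABIP) taken from the lists above, plus Melvin Mora.
BABIP = {
    "Erubiel Durazo": (.287, .369),
    "Jason Varitek": (.306, .373),
    "Jack Wilson": (.282, .333),
    "Ivan Rodriguez": (.337, .376),
    "Melvin Mora": (.364, .374),
}

HIGH_BABIP = .350   # cutoff mentioned in the text
MIN_JUMP = .030     # hypothetical: how big a rise counts as out of character

def regression_candidates(table, high=HIGH_BABIP, min_jump=MIN_JUMP):
    """Names whose latest BABIP is above `high` and well above the prior year."""
    return sorted(
        name
        for name, (prev, curr) in table.items()
        if curr > high and curr - prev >= min_jump
    )

print(regression_candidates(BABIP))
```

With these inputs Mora is left out because his .374 is in line with his .364 the year before, and Wilson is left out because .333 is under the cutoff. A real screen would want more than one prior season to define a player's established level.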

Friday, March 11, 2005

Old-Pitcher Skills?


Probably the most anticipated aspect of the upcoming baseball season for Royals fans is the performance of sophomore pitcher Zack Greinke. In his rookie season Greinke, at the tender age of 20, was clearly the Royals’ best pitcher. In 24 starts he pitched 145 innings surrendering 143 hits while walking only 26 (1.61 per nine innings) and striking out 100. His only nemesis was the long ball – he gave up 26 – which helped push his ERA to 3.97.

Royals fans have a right to be excited as he’s been compared to a young Greg Maddux or Bret Saberhagen. In sabermetric circles Baseball Prospectus, a publication not known for its positive endorsements, had this to say:

“With apologies to Jon Landau, we have seen the future of pitching, and his name is Zack Greinke…In the last 70 years, only three pitchers as young as Greinke walked fewer than 2.1 men per 9 innings. Two of them were Bert Blyleven and Bret Saberhagen…He has excellent mechanics, has never thrown 110 pitches in a game, and since he rarely throws at maximum velocity, he’s about as low an injury risk as any young pitcher in the game…His profile is so unique that trying to project his future is a fool’s errand.”

The uniqueness of that profile led the PECOTA forecasting system, created by Nate Silver and published in BP, to predict (despite the warning given above) that there is a 23% chance that Greinke’s sophomore season will be a breakout type year, a 63% chance he’ll improve, and a 0% chance he’ll collapse.

Although the future certainly looks bright for Greinke there are some, like myself, who wonder just how high his upside is. Having watched Greinke pitch a number of games last season I think it’s accurate to say that his strengths are his ability to change speeds (he’ll often throw pitches as fast as 94 and as slow as 63 to the same batter with a couple of 70s and 80s thrown in) and his ability to locate his cut and four-seam fastballs. Given his age, these two skills are phenomenal.

And that’s just where my cautious optimism about Greinke comes in.

In sabermetric circles it has been noted that players with “old-hitter skills” don’t tend to age well. In this context old-hitter skills typically include power and control of the strike zone coupled with average or below average speed and defensive ability suitable for the left end of the defensive spectrum. These kinds of players tend to peak early and decline rapidly as their physical skills fade. I’ve done some analysis in the past of how hitters age and have now found this nice piece by David Luciani that covers some of the same ground.

While this is not a perfect analogy by any means (essentially because "old-pitcher skills" are not negative as are some "old-hitter skills") I think what Greinke has in abundance are “old-pitcher skills”. The ability to locate his pitches and change speeds are skills that one normally finds in crafty veteran pitchers who have had to adjust to declining physical skills or injuries. Pitchers like Frank Tanana, who became an off-speed pitcher after being loaded with innings early in his career, come to mind. What Greinke does not possess are skills which include velocity, movement, and an “out” pitch. These are the kinds of skills that often get young pitchers promoted in the hopes that they’ll develop control and “learn how to pitch”. By all accounts Greinke has already learned to pitch to a large degree and so his ceiling is not as high as a pitcher with comparable statistics who got the job done with a nasty slider and a 98 mile per hour fastball.

So in short, I’m not saying that Greinke won’t be even better in 2005. Indeed, his skills should serve to make him a much more consistent pitcher in the long run, a fact that PECOTA captured in its assessment that he has a 0% chance of collapse (along with his few innings at a young age, and no injury history). However, I am speculating that he is closer to his maximum performance at his young age than some people might think. Only time will tell of course.

What would really excite me as a Royals fan is if I saw him add two to four miles per hour on his fastball (which is not out of the question by any means since he is so young) and develop a sharper breaking curve. Added to his skills, these weapons could transform him into a very consistent superstar pitcher. On a side note, this spring Greinke has shown uncharacteristic wildness in his two starts. Today in fact, he walked four batters in an inning and two-thirds.

While thinking about this idea I wanted to get a feel for how different abilities change with age for pitchers. I looked at all pitchers after 1945, which comprised 23,358 seasons, and compared each pitcher’s SO/9, BB/9, ERA, and HR/9 to the league average. I then weighted their rates by innings pitched and grouped by age. What I found bore out the idea that control improves with age. Of the measures I looked at only walks per nine innings decreased with age every year between the ages of 20 and 34, from a high of 117% of league average at age 20 to 88% of league average at age 34. From age 35 on the rate stayed relatively constant. Strikeouts also decreased steadily with age from ages 20 (108%) to 32 (96%) and then held relatively steady through age 39.

Interestingly, homeruns per nine innings stayed almost constant never getting more than 1.5% away from league average.
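The aging comparison above (rate relative to league, weighted by innings, grouped by age) might be sketched like this for walks. It is only an illustration under stated assumptions: the record layout, the field names, and the sample seasons are all made up, and innings are assumed to be true decimals rather than the 145.1 style box-score notation.

```python
from collections import defaultdict

def relative_rate_by_age(seasons):
    """IP-weighted average of (pitcher BB/9 divided by league BB/9) per age.

    `seasons` is an iterable of dicts with keys age, ip, bb, lg_bb9.
    """
    weighted_sum = defaultdict(float)
    innings = defaultdict(float)
    for s in seasons:
        if s["ip"] <= 0:
            continue  # no innings, no rate
        bb9 = 9 * s["bb"] / s["ip"]
        ratio = bb9 / s["lg_bb9"]
        weighted_sum[s["age"]] += ratio * s["ip"]
        innings[s["age"]] += s["ip"]
    return {age: weighted_sum[age] / innings[age] for age in sorted(innings)}

# Made-up seasons purely to show the shape of the input.
sample = [
    {"age": 20, "ip": 100.0, "bb": 40, "lg_bb9": 3.0},
    {"age": 20, "ip": 50.0, "bb": 15, "lg_bb9": 3.0},
    {"age": 34, "ip": 200.0, "bb": 60, "lg_bb9": 3.0},
]
result = relative_rate_by_age(sample)
print(result)
```

A value of 1.17 for an age would correspond to the "117% of league average" figure quoted above. The same function shape works for SO/9, HR/9, or ERA by swapping the counting stat and the league rate.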

Thanks to Ron Hostetter for inspiring this post…only one more week until Ron, his son, brother, my brother and father are soaking up the rays in the Cactus League.

Tuesday, March 08, 2005

Another Big Man

Saw this article about the Orioles' Walter Young over at Baseball Musings. To Royals fans this should sound familiar...

Monday, March 07, 2005

New Ballpark in Kansas City?

While reading the 2005 BP I came across two interesting items in the article on the Milwaukee Brewers that relate to the Kansas City Royals.

First, according to market research done by Mike Jones that takes into account TV ratings, Kansas City is the smallest of the 30 major league markets, coming in behind Birmingham, Norfolk, New Orleans, and Greensboro/Winston-Salem. That doesn't bode well for the future of baseball in Kansas City.

Second, with all the controversy in KC these days regarding the building of a new downtown stadium (a move the Royals have publicly said they don't want) it was interesting to see the following table that shows how recent teams have fared in their new stadiums (I added the Old-1 column).


City              Old-1    Old   New1   New2   New3   New4   New5

Detroit            1998   1999   2000   2001   2002   2003   2004
Payroll ($M)       24.1   35.0   61.7   49.4   55.0   49.2   46.8
Attendance (M)     1.41   2.03   2.44   1.92   1.50   1.36   1.91
Win %              .401   .426   .487   .407   .339   .266   .444

Milwaukee          1999   2000   2001   2002   2003   2004
Payroll ($M)       43.4   35.8   45.1   50.3   40.6   27.5
Attendance (M)     1.71   1.57   2.81   1.97   1.70   2.06
Win %              .459   .451   .420   .346   .407   .416

Pittsburgh         1999   2000   2001   2002   2003   2004
Payroll ($M)       24.7   29.6   57.8   42.3   54.8   32.2
Attendance (M)     1.63   1.75   2.46   1.78   1.64   1.58
Win %              .484   .426   .383   .444   .463   .444

Cincinnati         2001   2002   2003   2004
Payroll ($M)       49.0   45.1   59.4   46.6
Attendance (M)     1.88   1.86   2.36   2.28
Win %              .407   .481   .426   .469

A couple things jump out immediately. First, all the teams increased their payroll significantly in their first year in the new park. However, only the Tigers actually saw any results on the field. Second, all the teams have since reverted to their payroll level before moving into the new park. Third, all these teams were bad before the new park and bad after the new park. Fourth, the attendance spiked in the first year in the new ballpark and decreased thereafter.
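The first observation is easy to check with a little arithmetic. The sketch below uses the Old and New1 columns from the table above (payroll in millions of dollars); the dictionary layout and function name are just illustrative choices.

```python
# Old-park final season vs. first season in the new park, from the table above.
# Payroll in millions of dollars.
TEAMS = {
    "Detroit":    {"payroll": (35.0, 61.7), "win_pct": (.426, .487)},
    "Milwaukee":  {"payroll": (35.8, 45.1), "win_pct": (.451, .420)},
    "Pittsburgh": {"payroll": (29.6, 57.8), "win_pct": (.426, .383)},
    "Cincinnati": {"payroll": (45.1, 59.4), "win_pct": (.481, .426)},
}

def first_year_change(team):
    """Return (payroll % change, win % change in points) for Old -> New1."""
    old_pay, new_pay = team["payroll"]
    old_win, new_win = team["win_pct"]
    pay_change = 100 * (new_pay - old_pay) / old_pay
    win_change = round(1000 * (new_win - old_win))
    return pay_change, win_change

for name, team in TEAMS.items():
    pay, win = first_year_change(team)
    print(f"{name:<11} payroll {pay:+.0f}%  win% {win:+d} points")
```

Every team shows a payroll increase of more than 20% in the first year, and Detroit is the only one whose winning percentage goes up.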

So judging from recent history a new ballpark in downtown Kansas City likely won't reverse the fortunes of the team on the field nor draw more fans resulting in increased revenue.

Which brings me to the main point of this post.

Increasing revenues by simply attracting more fans does not raise enough revenue to help a team be competitive. Far more important are broadcasting rights where the difference between the top and the bottom can be around $50M and luxury boxes where the difference is around $20M. In both of these areas, even with a new stadium, the Royals aren't likely to increase their revenue because of the small population base and the lack of large corporate interests in Kansas City.

Until Major League Baseball comes to its senses and devises a real revenue sharing plan, what you'll continue to see is the trend I documented in my article on salaries where the place a team finishes in the standings is strongly correlated with their relative payroll.

Rank NPayroll Teams
1 1.29 43
2 1.06 41
3 0.98 42
4 0.89 42
5 0.77 35
6 0.76 7

Sunday, March 06, 2005

Bench Woes

A few minutes after I wrote my post on Calvin Pickering complaining about how weak the Cubs bench was/is I read this in the 2005 Baseball Prospectus.

"The Cubs are poised to get above average performance from at least seven out of the eight starting position players. Their bench on the other hand is barren."

So true. If you can refrain from averting your eyes due to the badness I'm about to reveal, here are BP's PECOTA predictions for the 2005 Cubs bench contenders.

OF/R Jason DuBois .262/.344/.487
OF/R David Kelton .247/.309/.433
IF/S Jose Macias .239/.278/.366
IF/S Neifi Perez .241/.276/.322
C/R Henry Blanco .226/.288/.367
OF-2B/R Jerry Hairston Jr. .274/.348/.376

I'm not a general manager but this bunch, with the exception of DuBois, would have a hard time hitting their way out of a paper bag. And keep in mind that DuBois will likely be a starter against left-handers. What you'll also notice is that it gives the Cubs exactly zero productive left-handed bats off the bench when Todd Hollandsworth is in left field. In the National League that's simply unfathomable (last season Todd Walker could come off the bench when not starting). And as I've mentioned before it is dangerous to give Dusty Baker versatile light-hitting utility-type players since he'll actually use them in starting roles and bat them at or near the top of the order.

The best move for the Cubs would be to release either Macias or Perez (I'd prefer to let Macias go since at least Perez can field), let Kelton walk (he's out of options and doesn't seem to be progressing), and pick up two legitimate bats for the bench, one of which must be left-handed and one of which can play a passable right field.

Saturday, March 05, 2005

BP on Baserunning

This weekend I received my copy of the Baseball Prospectus 2005, which I'm reading in preparation for my trip to spring training in a couple of weeks with Ron. I was pleased to find that on pages 511-519 there was an article entitled "Station to Station: The Expensive Art of Baserunning" by James Click.

In his article Click uses a very similar methodology to my baserunning framework to try and quantify the effects of good and bad baserunning. Like my own framework Click looks at three scenarios: runner on first batter singles, runner on first batter doubles, and runner on second batter singles. However, I noted that he apparently did not consider the case where the runner on first scores on a single, a play which happened 58 times in 2004 in 6,754 opportunities (a little less than 1% of the time).

Before showing the results he first discusses his methodology and the "Fundamental Adjustments" that need to be considered when performing the calculations including outfield defenders, park, outs, and batter (a particular hitter might allow runners to advance more frequently by displaying a tendency to hit the ball harder or to particular locations on the field).

He concludes that adjustments needn't be made for outfield defenders since individual fielders seldom field more than one or two balls each season for a particular baserunner in one of the three scenarios. Therefore any effect of the fielder will tend to even out over the number of opportunities a runner has. This seemed obvious to me when developing my own framework and so like Click I did not investigate it further.

Click does, however, find that the park needs to be taken into account. I hinted at this in my analysis of 2003 and 2004 since the Rockies, who play in the largest outfield in baseball, led the league in both 2003 and 2004 in the number of extra bases gained per opportunity. Click wrote an article on computing these park factors on the Baseball Prospectus web site. Unfortunately, I'm not a subscriber and so can't comment on the article except to say that like park factors calculated for hitting and pitching he uses a three year average and he uses a methodology that looks both at what the visiting teams do in the park and how the home team differs on the road. This seems to me like a solid approach.

Click also takes into account the number of outs in each scenario. As I documented the advancement percentage changes dramatically with the number of outs. However, as we'll see in a moment Click does not base his analysis on incremental bases but rather uses run expectancy tables that include the number of outs. This was one of the two approaches I suggested could be used to quantify baserunning in terms of runs. The other was simply to assign a run value to each incremental base, which I did for ease of calculation.

Finally, Click also discounts the effect of the batter on how many bases baserunners advance by looking for correlations over pairs of seasons. He finds a low correlation and an essentially random distribution of advance rates for baserunners and therefore concludes that batters have little if any influence on the number of bases runners advance on their hits. I was wondering about just this question when developing my framework but came to the same conclusion intuitively. I'm glad to see a more rigorous investigation however.

In my framework I also take into account the handedness of the batter hitting behind the runner as well as the fielder who fielded the ball. When calculating incremental bases this is necessary since a single hit to left field with 1 out for example, advances a runner from first to third about 14% of the time, a single hit to right field does so 39% of the time. Click needn't take into account either of these since they are subsumed into the run expectancy table.

One other difference in Click's approach is that he only looks at scenarios where the base in front of the runner is not occupied (which turns out to be about a quarter of the time). In my framework I included both; however, I'm now persuaded that I should have used only the "empty base" scenarios and so the calculations shown below do so.

So to actually perform the calculation Click calculates the difference in the run expectancy in the initial and final states for each runner in each situation and sums them. He then subtracts from this the number of runs that would be expected given the situations in which the runner found himself. He calls the former Equivalent Base Runs (EqBR) and the latter Player Base Runs (PBR). He finds that Matt Holliday of the Rockies led the league with an EqBR of 5.0 while Rafael Furcal was close behind at 4.9. At the bottom of the spectrum Mike Piazza was a negative 4.7 and A.J. Pierzynski was close behind at -4.4.

To try and reproduce Click's work I used the run expectancy table published in the article to calculate the relative differences as described by Click and plugged them into my software for 2004. In my approach I did not credit the runner if they took only the expected number of bases (for example, second base on a single when the runner is on first). I also calculated the expected number of runs for each situation and subtracted this from the actual number of runs. I call the result Incremental Base Runs (IBR) with the following results. Keep in mind that my calculations do not include an adjustment for park effects.


Reed Johnson TOR 4.97
Vernon Wells TOR 4.92
Johnny Damon BOS 4.45
Rafael Furcal ATL 4.10
Rocco Baldelli TBA 3.71
Brad Wilkerson MON 3.14
Aaron Miles COL 3.08
Scott Rolen SLN 3.07
Edgar Renteria SLN 3.02
Matt Holliday COL 2.99

Of Click's top 10 I have six represented here. On the bottom of the scale I show:

Bill Mueller BOS -5.23
Mike Piazza NYN -4.54
David Ortiz BOS -3.83
Ross Gload CHA -3.79
Manny Ramirez BOS -3.65
Carlos Delgado TOR -3.45
Bill Hall MIL -3.44
Mike Lieberthal PHI -3.40
Jim Thome PHI -3.36
Craig Biggio HOU -3.36

Here I have only three of Click's bottom 10.

Overall, although I wasn't able to fully reproduce Click's data, this does validate that the difference between the best and worst baserunners in these three scenarios over the course of a season is roughly ten runs or the equivalent of a single win. My failure to reproduce his results is certainly related to his application of park factors and perhaps a different approach to calculating some of the run expectancy values (for example, if a runner scored I credited him with a run expectancy of one plus the new base out situation). In addition, I find that my software does not detect quite as many opportunities for some baserunners as Click shows in his table. For example, Matt Holliday is credited with 39 opportunities by Click while I show him with 37, which may be accounted for by the different sources of our data.
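To make the run expectancy bookkeeping concrete, here is a minimal sketch of crediting a single play. The run expectancy values are hypothetical placeholders (they are not the ones published in Click's article), the state encoding and function names are invented for illustration, and the only rules taken from the text are that a runner who takes just the expected base gets no credit and that a run scored counts as one plus the expectancy of the new base-out situation.

```python
# Hypothetical run expectancy values keyed by (bases occupied, outs).
# Bases are written as a 3-character string: first, second, third.
RUN_EXPECTANCY = {
    ("12-", 1): 0.91,
    ("1-3", 1): 1.19,
    ("1--", 1): 0.52,
    ("1--", 2): 0.22,
}

def state_value(bases, outs, runs_scored=0):
    """Runs already scored on the play plus the expectancy of the new state."""
    return runs_scored + RUN_EXPECTANCY[(bases, outs)]

def incremental_base_runs(actual, expected):
    """Value of what the runner actually did minus the expected advance.

    Both arguments are (bases, outs, runs_scored) tuples describing the
    state after the play.
    """
    return state_value(*actual) - state_value(*expected)

# Runner on first, one out, batter singles.
expected = ("12-", 1, 0)      # runner stops at second
took_third = ("1-3", 1, 0)    # runner goes first to third
thrown_out = ("1--", 2, 0)    # runner is out at third, batter on first
scored = ("1--", 1, 1)        # runner scores from first

for label, outcome in [("took third", took_third),
                       ("thrown out", thrown_out),
                       ("scored", scored)]:
    print(f"{label}: {incremental_base_runs(outcome, expected):+.2f}")
```

Summing these per-play values over a season gives a runner's total. With real run expectancy values the penalty for being thrown out will typically outweigh the reward for taking the extra base, as it does with these placeholders.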

Even so, in the end (and contrary to the title of Click's article) even assuming that the gains were perfectly correlated from season to season (which they aren't as I showed) it probably does not benefit a team much to make baserunning a major factor when making personnel decisions. In other words, this kind of analysis is not all that actionable. However, ceteris paribus (all things being equal), there is a small edge to be gained through this knowledge.

On a side note, looking through my software again while writing this article I found a bug that was more than double crediting opportunities for players that played on two teams in the same season. For example, in my 2004 post I show Carlos Beltran as having 134 opportunities and garnering 226 bases. In reality with no runners on in front of him he had 27 opportunities with the Royals and 24 with the Astros and totaled 86 bases with an IBR of 1.75.
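The cause of the bug isn't spelled out here, but the safe pattern is to keep one row per player per team and add each row exactly once. A minimal sketch, using the Beltran opportunity counts quoted above; the row layout and team codes are hypothetical illustrations.

```python
from collections import defaultdict

# One row per player per team. Opportunity counts for Beltran are the ones
# quoted above; the row layout itself is a hypothetical illustration.
stints = [
    {"player": "Carlos Beltran", "team": "KCA", "opportunities": 27},
    {"player": "Carlos Beltran", "team": "HOU", "opportunities": 24},
]

def season_opportunities(rows):
    """Sum each stint exactly once per player."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["player"]] += row["opportunities"]
    return dict(totals)

print(season_opportunities(stints))
```

A quick sanity check after any change to the software is to confirm that a traded player's season total equals the sum of his per-team totals.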

Thursday, March 03, 2005

Free Calvin Pickering

Interesting article in the KC Star by Jeff Passan on the ongoing debate between sabermetrics and scouting revolving around the Royals' Calvin Pickering.

It's no secret that I've been one of those "stat geeks" who've long argued that Pickering deserves his chance to play every day - especially in light of what the Royals have there now (Ken Harvey, aka Grimace). Passan does a decent job of explaining both the sabermetricians' and scouts' arguments relating to Pickering and even discusses the PECOTA projections of Baseball Prospectus that I wrote about last month. What is especially interesting is his description of the great shape Pickering is in after a stint at a performance institute in Tempe in the offseason.

"He's 28 now. He's in the best shape of his baseball career, shedding most of the belly. He drinks protein shakes. He still hits behemoth left-handed home runs."

He's stronger and leaner, which answers the major knock against him in the scouting world.

But it occurs to me that this is a discussion that we shouldn't even be having for two reasons.

To help me get through the long winter I bought a copy of the book Weaver on Strategy: The Classic Work on the Art of Managing a Baseball Team by Earl Weaver and Terry Pluto originally published in 1984 but lightly revised in 2002. In the chapter on "The Lineup" Earl says:

"On my bench I emphasized hitting, because the bench guys are around for offense. Often they're not as good defensively as the regulars. What they will do is step into the lineup if some of your nine guys aren't producing offensively."

It seems to me that more recent managers have clearly emphasized defense over offense in the makeup of the bench. One only has to lament the Cubs bench of the last few years that featured such lightweights as Tom Goodwin, Jose Macias, Neifi Perez, Calvin Murray, Paul Bako, Damian Jackson, and Rey Ordonez. But like much of what Weaver says his point makes sense when you give it a moment's thought.

Question: In what ways can a player who doesn't start most affect winning baseball games?

Answer: Obviously, by coming in at a crucial time and getting the big hit that wins a game or starting against a pitcher that gives a particular regular trouble. If the bench player is a good defender he may be used as a defensive replacement but how often have you seen a game won or lost on a play made by a defensive replacement? I would hazard a guess that it's more than 10 times more often that a bench player makes his contribution offensively than defensively.

Given the above it is reasonable for a team to devote at least one spot to a player with primarily or even only offensive skills. After all, the Royals broke camp last season with a pinch runner in Rich Thompson (an experiment that was quickly called off after it was apparent the 2004 Royals had no offense and worse pitching).

The second reason we shouldn't be discussing whether Pickering will get a roster spot is because there should be ample room for hitters if teams would stop carrying so many pitchers. Again, in the chapter on pitching Weaver says:

"I know a lot of teams carry ten pitchers on the roster, but I believe going with eight or nine."

The reasoning he gives is much like that above - the last position player on the team will help you win more ballgames by performing in key situations than will the 10th pitcher who will pitch only in blowouts. Of course, when this book was published ten pitchers was pretty standard because of the four-man rotation; today teams often carry twelve. I would love to see the Royals (or any team for that matter) go back to the four-man rotation and spare themselves 20% of their games being pitched by a pretty bad pitcher in accordance with Weaver's Seventh Law: "It's easier to find four good starters than five."

So what should the Royals' 25-man roster look like? Here's my wish-list:

OF Terrance Long
OF Eli Marrero
OF David DeJesus
OF Abraham Nunez
OF Matt Stairs
3B Mark Teahen
SS Angel Berroa
2B Tony Graffanino
UT Ruben Gotay
UT Chris Truby
1B-DH Calvin Pickering
1B-DH Mike Sweeney
1B-DH Ken Harvey
C John Buck
C Alberto Castillo

SP Zack Greinke
SP Runelvys Hernandez
SP Brian Anderson
SP Jose Lima
RP Jimmy Gobble
RP Jeremy Affeldt
RP Scott Sullivan
RP Mike MacDougal
RP Jaime Cerda
RP Nate Field

New Look at Wrigley

Just received this link of the proposed changes to Wrigleyville.

It seems like the changes most impact left field where there will be 1,790 new seats and where the new multi-purpose building will be (replacing the car wash and donut shop). That should create a nice area for fans although it will likely reduce the number of homeruns that make it out of the park. A new restaurant in centerfield will also be added as well as some much needed parking.

Wednesday, March 02, 2005

Another Oldie

Looks like I might have missed one:

The 1945 Tigers
1B Rudy York, 32
2B Eddie Mayo, 35
3B Bob Maier, 30
SS Skeeter Webb, 35
LF Jimmy Outlaw, 32
CF Doc Cramer, 40
RF Roy Cullenbine, 31
C Bob Swift, 30

I think the reason my query didn't identify the Tigers is Bob Maier didn't turn 30 until September 5, 1945. My cutoff was July 1.
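The cutoff effect is easy to see in a short sketch. The function names are hypothetical and this is not the actual query; the only facts used are the July 1 cutoff and Maier's 30th birthday falling on September 5, 1945, which puts his birth date at September 5, 1915.

```python
from datetime import date

def age_on_cutoff(birth_date, season, month=7, day=1):
    """Age in whole years on the cutoff date (July 1 by default) of a season."""
    cutoff = date(season, month, day)
    years = cutoff.year - birth_date.year
    # Subtract one if the birthday has not yet arrived by the cutoff.
    if (cutoff.month, cutoff.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

def all_starters_30_plus(birth_dates, season):
    """True if every listed starter is at least 30 on the cutoff date."""
    return all(age_on_cutoff(b, season) >= 30 for b in birth_dates)

# Bob Maier turned 30 on September 5, 1945, so he was born September 5, 1915.
maier = date(1915, 9, 5)
print(age_on_cutoff(maier, 1945))                     # July 1 cutoff
print(age_on_cutoff(maier, 1945, month=12, day=31))   # end-of-year cutoff
```

With the July 1 cutoff Maier comes out as 29, so a team that includes him fails the all-30-plus test; with an end-of-season cutoff he would count as 30.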

Tuesday, March 01, 2005

The Giants of Old

There's been a bit of talk this spring about how old the San Francisco Giants are, particularly the fact that all eight positions will be manned by players over 30 years old. They are...

1B J.T. Snow, 37
2B Ray Durham, 33
SS Omar Vizquel, 38
3B Edgardo Alfonzo, 31
LF Barry Bonds, 39
CF Marquis Grissom, 37
RF Moises Alou, 38
C Mike Matheny, 34

While I assumed this was a pretty rare occurrence I found otherwise by doing a quick check using the Lahman database. I wrote a query that determined the 20 oldest teams through 2004. The oldest team was around 34 years old and as you might imagine the list was dominated by teams of the last twenty years. Better conditioning, medical technology, and investment in players have conspired to produce this result.

I then filtered these 20 and found that these six had players at 30+ years of age starting at all eight positions (I calculated ages based on a July 1 cutoff for the season in question).

1983 Angels
1B Rod Carew, 37
2B Bobby Grich, 34
SS Tim Foli, 32
3B Doug DeCinces, 32
CF Fred Lynn, 31
LF Brian Downing, 32
RF-DH Reggie Jackson, 37
C Bob Boone, 35
UT Ron Jackson, 30

1998 Orioles
1B Rafael Palmeiro, 33
2B Roberto Alomar, 30
SS Mike Bordick, 32
3B Cal Ripken, 37
LF B.J. Surhoff, 33
CF Brady Anderson, 34
OF-DH Eric Davis, 36
DH Harold Baines, 39
OF-DH Joe Carter, 38
C Lenny Webster, 33
C Chris Hoiles, 33

2004 Mariners
1B John Olerud 35
2B Bret Boone 35
SS Rich Aurilia 32
1B-3B Scott Spiezio 31
RF Ichiro Suzuki 30
CF Randy Winn 30
LF Raul Ibanez 32
DH Edgar Martinez 41
C Dan Wilson 35
UT Jolbert Cabrera 31

1981 Phillies
1B Pete Rose 40
2B Manny Trillo 30
SS Larry Bowa 35
3B Mike Schmidt 31
LF Gary Matthews 30
CF Garry Maddox 31
RF Bake McBride 32
C Bob Boone 33

1989 Tigers
1B Dave Bergman 36
2B Lou Whitaker 32
SS Alan Trammell 31
DH-IF Keith Moreland 35
OF Gary Pettis 31
OF Chet Lemon 34
OF-DH Fred Lynn 37
C Mike Heath 34
1B-DH-OF Gary Ward 35

2001 Diamondbacks
RF Luis Gonzalez 33
1B Mark Grace 37
CF Steve Finley 36
UT Craig Counsell 30
2B Tony Womack 31
SS Jay Bell 35
LF Reggie Sanders 33
3B Matt Williams 35
C Damian Miller 31