While winging my way to unseasonably cool and drizzly Phoenix for four days of spring training with family and friends, I had a chance to read the 2005 Baseball Research Journal, which I received as a benefit of SABR membership just last week.
Immediately my attention was turned to Bill James' contribution titled "Underestimating the Fog". This article has also been the topic of a bit of discussion on the SABR-L list since its publication.
In short, James argues that some of the best known negative sabermetric conclusions should not really be viewed as conclusions at all, but rather simply as non-answers to questions under study. In particular he discusses the following conclusions:
- There is no such thing as a clutch hitter. Deviations of performance in clutch situations are essentially random. Cyril Morong has a nice reference list of studies on clutch performance here.
- There is no such thing as an "ability to win" in a pitcher. In other words, there is no skill beyond preventing runs from scoring that allows a pitcher to win games. There are no "clutch pitchers" who have the ability to eke out wins and likewise there are no pitchers who are "losers".
- Winning or losing close games is luck. In other words a team that wins a lot of close games in a season does so by good fortune rather than some collective ability to pull such games out. Such teams then regress to the mean the following year.
- Catchers have little or no impact on a pitcher's ERA. When a pitcher does well with a certain catcher behind the plate it is luck.
- A pitcher has little or no control over how many hits he gives up per inning other than through strikeouts and giving up home runs. This is the Voros McCracken observation I wrote about recently.
- Base running has no positive impact on runs scored other than through base stealing. In other words, if a team scores more runs than would be predicted by the combination of their hits and walks, it is just luck.
- Batters have no individual ability to hit well or poorly against left-handed pitching. However, there is a strong group tendency to do so.
- There is no such thing as a "streaky hitter" in either the positive or the negative senses.
- There is no "protection effect" for hitters in a lineup. In other words, the quality of hitters before and after a particular hitter has no effect on the performance of the hitter.
James goes on to criticize the common technique employed in the various sabermetric studies that are typically cited to "prove" these conclusions - for example, Dick Cramer's famous 1977 Baseball Research Journal article on clutch hitting and James' own look at platoon differentials in the 1988 Baseball Abstract. That technique involves the search for recurrence or persistence of the phenomenon being studied. In other words, in each of the cases above studies were done that attempted to determine if the effect (clutch hitting, catcher's ERA, winning close games, etc.) persisted across seasons. In each case repeated studies have shown that it doesn't - therefore the effect is, in the words of James, "transient" and not "persistent". That which is not persistent is then assumed not to be real.
James then argues that in many of these cases the negative conclusion - the phenomenon is not real - is flawed because there is too much instability in the data used to reach it. For example, the conclusion that there is no specific ability to hit well or poorly against left-handed pitching is based on platoon differentials where the number of plate appearances against left-handed pitchers is around 120 in a season. The randomness involved in such a small sample size tends to swamp the differential itself, thereby making the results meaningless. James notes that Cramer's original study of clutch hitting was flawed for the same reason.
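To put a rough number on that, here is a minimal sketch of the binomial arithmetic. The 120 plate appearances come from the discussion above; the .270 true average and the 400 plate appearances against right-handers are purely illustrative assumptions of mine, and walks are ignored so that every plate appearance is treated as an at-bat.

```python
import math

TRUE_AVG = 0.270     # assumed true batting average (illustrative only)
PA_VS_LEFT = 120     # plate appearances vs. left-handers, as cited in the text
PA_VS_RIGHT = 400    # assumed plate appearances vs. right-handers (illustrative only)


def batting_avg_se(true_avg, trials):
    """Standard error of an observed average over `trials` independent at-bats."""
    return math.sqrt(true_avg * (1.0 - true_avg) / trials)


def differential_se(true_avg, trials_a, trials_b):
    """Standard error of the difference between two independently observed averages."""
    variance = true_avg * (1.0 - true_avg) * (1.0 / trials_a + 1.0 / trials_b)
    return math.sqrt(variance)


se_left = batting_avg_se(TRUE_AVG, PA_VS_LEFT)
se_diff = differential_se(TRUE_AVG, PA_VS_LEFT, PA_VS_RIGHT)

print(f"Standard error vs. left-handers: {se_left * 1000:.0f} points")
print(f"Standard error of the platoon differential: {se_diff * 1000:.0f} points")
```

Under those assumptions a single season's platoon differential carries a standard error of roughly 46 points of batting average from chance alone, so any real individual split much smaller than that would be very hard to see in one season.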
Of course, sabermetricians have always cautioned against drawing conclusions based on small sample sizes and so James' cautions here are well taken. But I believe he was also making a more subtle point: Sometimes the magnitude of the phenomenon under study - if it exists - is smaller than the magnitude of the normal variation in the statistics we use to try to study it. In other words, while a skill like clutch hitting may indeed exist in the real world, the noise or fog in the data used to try to measure it will obscure our finding and measuring that skill. These are two different problems and it seems to me that the former is solvable by accumulating more trials while the latter is not.
Having said that, I'll take a pass through the nine conclusions above and discuss whether they are in question because of small sample sizes or because the skills involved have effects too small to measure. Taken one by one...
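The sample-size half of this can be sketched with a small simulation of the persistence test. This is my own toy model, not anything from James' article: every hitter is given a real but small clutch skill, and the same test (correlating one sample against another) is run first at season-sized samples and then at career-sized ones. The skill spread, the number of hitters, and both sample sizes are assumptions chosen only for illustration.

```python
import random

# All of these values are illustrative assumptions, not figures from James' article.
BASE_AVG = 0.270   # assumed average in clutch situations
SKILL_SD = 0.010   # assumed spread of true clutch skill (10 points of average)
PLAYERS = 300      # assumed number of hitters in the study


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def observed_avg(true_avg, trials, rng):
    """Simulate `trials` independent at-bats and return the observed average."""
    hits = sum(1 for _ in range(trials) if rng.random() < true_avg)
    return hits / trials


def persistence(trials, seed=1):
    """Correlation between two independent samples of `trials` at-bats per hitter."""
    rng = random.Random(seed)
    talents = [BASE_AVG + rng.gauss(0, SKILL_SD) for _ in range(PLAYERS)]
    first = [observed_avg(t, trials, rng) for t in talents]
    second = [observed_avg(t, trials, rng) for t in talents]
    return pearson(first, second)


season_r = persistence(150)    # assumed clutch plate appearances in a season
career_r = persistence(4000)   # assumed clutch plate appearances in a career
print(f"season-sized samples: r = {season_r:.2f}")
print(f"career-sized samples: r = {career_r:.2f}")
```

In this model the expected correlation is the skill variance divided by the total variance, which works out to about 0.07 at 150 trials and about 0.67 at 4,000. The skill is identical in both runs; only the amount of fog around it changes.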
- It would seem to me that larger sample sizes than those in Cramer's study, spanning entire careers for example, would help us to understand whether clutch hitting is indeed a skill. In that regard Tangotiger has come to the conclusion that clutch hitting does indeed exist, but that it is rather rare and not that important. From that perspective sabermetric analysis has done a useful service in pointing out that clutch hitting simply cannot be that important since its effects are so hard to measure.
- While I haven't seen or done any long term study on this subject, this too would seem to need more than a comparison across seasons and would also need to take into account run support and league contexts. In other words, this is likely a case of inadequate sample sizes. However, I have little doubt that the conclusion is essentially correct.
- This conclusion rests on the inherently small samples of one-run games played over the course of two seasons. As a result, I think James' main point clearly applies to this conclusion. Further, in this scenario there is no way to increase the sample size since team composition can change radically from season to season, thereby making it impossible to make comparisons across a set of seasons.
- This conclusion too is based on small sample size but has an additional problem that applies to James' second point. While increasing the sample size in an effort to tease out the effect would be useful, so many other factors (other defenders primarily) go into run prevention that ERA may simply be too blunt a tool to use.
- In this area James concedes that McCracken's observation has the equivalent of a "stable platform" against which to judge, in that historically 70% of the balls put into play result in outs. Coupled with the fact that starting pitchers often face over 1,200 batters per season, resulting in fairly stable data, this conclusion appears pretty solid. Even so, a longer term study done by Tom Tippett has shown that there is indeed an element of skill, small though it may be.
- This conclusion is especially interesting to me. Measured at a team level I would agree that the collection of normal offensive elements that ignore base running beyond stolen bases will accurately predict the number of runs scored. In other words, when taken collectively, base running other than stolen bases tends not to be a factor in predicting the number of runs scored. Of course, the reason could simply be that good and bad base runners on a team tend to cancel each other out, coupled with the fact that the effect of good base running is simply not that large. I've done some work in this area and like James feel that "base running can be measured in simple, objective terms". My research and that of Baseball Prospectus indicate that good base runners can pick up as many as five runs per season for their team. The persistence of that base running effect of course is the question and I've found a positive correlation using 2003-2004 data indicating that there is indeed something real here.
- To me it seems obvious that this effect could be measured simply by increasing the size of the sample. So while I agree that season to season platoon differentials are not very useful, career ones should be. This is buttressed by the recognition that there is a group effect. In other words, everyone agrees there is a differential; it simply cannot be reliably used in the context of a single season because of small sample sizes.
- The existence of streakiness seems to be unlike the other conclusions in that it does not rely on small sample sizes nor need larger sample sizes in order to study. If I understand the studies that have been done in this area they generally conclude that hitting streaks simply do not historically occur more often than would be predicted by a random model. Having said that, Albert and Bennett in Curve Ball used a model that showed that a few players may indeed be streaky. In a post a couple months back I talked about how the author of A Mathematician at the Ballpark is convinced that streakiness is a persistent phenomenon in sport based on studies of bowling. I'm not convinced these have much to do with baseball, however. At this point I'm willing to concede that streakiness may be a real phenomenon but it falls under the radar of normal variation per James' second point.
- As far as "protection effect" is concerned I don't have much to offer. However, it would seem that there are probably few plate appearances where the hitter behind or ahead of the batter actually has an influence on the pitcher. And so if there is an effect I would venture that it requires study across multiple seasons and even if present it probably falls into James' second category.
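For the streakiness item above, the kind of random model those studies compare against can be sketched roughly as follows. This is my own illustration and not the method used by Albert and Bennett or anyone else mentioned here: each game is treated as an independent coin flip, and the 0.75 chance of a hit in any given game, the 150-game season, and the 15-game threshold are all assumptions picked only for the example.

```python
import random

HIT_PROB_PER_GAME = 0.75   # assumed chance of at least one hit in a game (illustrative)
GAMES_PER_SEASON = 150     # assumed games played in a season (illustrative)
SEASONS = 2000             # number of simulated seasons


def longest_streak(games):
    """Length of the longest run of consecutive games with a hit."""
    best = current = 0
    for got_hit in games:
        current = current + 1 if got_hit else 0
        best = max(best, current)
    return best


def simulate_longest_streaks(hit_prob, games, seasons, seed=7):
    """Longest hitting streak in each simulated season of independent games."""
    rng = random.Random(seed)
    return [
        longest_streak(rng.random() < hit_prob for _ in range(games))
        for _ in range(seasons)
    ]


def share_at_least(streaks, length):
    """Fraction of simulated seasons whose longest streak reached `length` games."""
    return sum(1 for s in streaks if s >= length) / len(streaks)


streaks = simulate_longest_streaks(HIT_PROB_PER_GAME, GAMES_PER_SEASON, SEASONS)
print(f"Seasons with a streak of 15 or more games: {share_at_least(streaks, 15):.1%}")
```

A streakiness study then asks whether real hitters produce long streaks noticeably more often than a purely random hitter of the same quality does in a simulation like this one.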