
Wednesday, August 04, 2004

Where have the .400 hitters gone?

This is a question that comes up from time to time among baseball fans who lament that the giants of yesteryear - Ty Cobb, Rogers Hornsby, and Ted Williams among them - are long gone and that we'll never see their kind again.

If one doesn't accept the premise that these immortals were truly superior to modern stars such as George Brett, Tony Gwynn, and Ichiro Suzuki, what accounts for the disappearance of the .400 hitter? Some of the factors discussed in recent years that might put modern hitters at a disadvantage include changes in the game itself - the rise of pitching specialists, the development of the slider and the split-finger fastball, the prevalence of night games, and transcontinental travel. But do any of these hold water?

Recently, this topic came up again on the SABR listserv in the context of sabermetric references made by the late Harvard paleontologist and baseball fan Stephen Jay Gould in his book Triumph and Tragedy in Mudville: A Lifelong Passion for Baseball. The discussion revolved around an essay that Gould wrote for Discover magazine in 1986 and that was reprinted both in his 1996 book Full House and in Triumph and Tragedy under the title "Why No One Hits .400 Any More?"

In epitome, Gould's argument was that .400 hitters haven't disappeared because of cosmetic changes in the game, nor because the heroes of the past were supermen, but rather as the natural consequence of an increasing level of play that comes ever closer to the "right wall" of human ability, coupled with the stabilization of the game itself. These factors tend to decrease the differences between average and stellar performers. As a result, although the mean batting average has remained roughly .260 since the 1940s, there are now fewer players at both the left and right ends of the spectrum. In other words, "variation in batting averages must decrease as improving play eliminates the rough edges that great players could exploit, and as average performance moves towards the limits of human possibility and compresses great players into an ever decreasing space between average play and the immovable right wall."

To support his argument, Gould (actually his research assistant) calculated the standard deviation (a measure of the spread of values around the mean) of batting averages over time, plotted them on a graph, and presented the following table, which also shows the coefficient of variation (the standard deviation divided by the mean, useful for comparing distributions with different means).

Decade Stdev Coeff
1870s .0496 19.25
1880s .0460 18.45
1890s .0436 15.60
1900s .0386 14.97
1910s .0371 13.97
1920s .0374 12.70
1930s .0340 12.00
1940s .0326 12.23
1950s .0325 12.25
1960s .0316 12.31
1970s .0317 12.13
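
As a quick arithmetic check, the coefficient of variation in the table is just the standard deviation divided by the mean, expressed here as a percentage, so the decade mean implied by any row can be backed out directly. A minimal sketch, using the 1920s row (the variable names are just for illustration):

```python
# Coefficient of variation (as a percentage) = stdev / mean * 100,
# so the implied mean = stdev / (coeff / 100). Using the 1920s row above:
stdev_1920s = 0.0374
coeff_1920s = 12.70
implied_mean = stdev_1920s / (coeff_1920s / 100)
print(round(implied_mean, 3))  # ~0.294, consistent with the high league averages of the 1920s
```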

This certainly shows a trend towards decreasing variability over time. Gould's conclusion was that this decreasing variability was due to refinements in the game (standardized pitching techniques starting in the 1880s, the introduction of gloves, stabilization of the number of balls and strikes, and the refinement of strategies), coupled with the entire system moving closer to the limits of human ability, in much the same way that track athletes move closer to that wall with each Olympics, thereby decreasing the variation in sprint times. However, since Gould's data was published almost 20 years ago, I decided to take a look and see if I could reproduce it and then add data for the last two-plus decades.

To do so I used the Lahman database and calculated the league batting average and OPS for the 250 league seasons included in the database. I then selected the 74,277 seasons in which a player batted at least once, calculating each player's batting average, OPS, and plate appearances along with the respective league averages. Finally, I selected all of the players with more than 2 at bats per game (relative to the league schedule), which pared the list down to 18,104 seasons. From these I calculated the standard deviation (using the league average) and the coefficient of variation by decade and produced the following table.

Decade Seasons Stdev Coeff

1870s 587 .0508 18.60
1880s 1189 .0423 16.85
1890s 1004 .0402 14.57
1900s 1110 .0373 14.68
1910s 1220 .0372 14.56
1920s 1194 .0369 12.93
1930s 1233 .0349 12.53
1940s 1149 .0329 12.64
1950s 1145 .0334 12.88
1960s 1448 .0319 12.83
1970s 1867 .0316 12.33
1980s 1970 .0299 11.54
1990s 2103 .0311 11.73
2000s 885 .0310 11.71

As you can see, I wasn't able to recreate Gould's results precisely for some reason, but I came very close in several decades, including the 1970s (.0316 versus Gould's .0317) and the 1910s (.0372 versus .0371). Overall there appears to be less spread in my data in the early years and more in the later years, again for unknown reasons. I've tried several different cutoffs (200 at bats, 250 at bats, etc.), but none came any closer to reproducing Gould's numbers. It is also possible that Gould used a different dataset that was less complete for the years prior to 1900. The Lahman database includes the National Association for 1871-1875, the American Association for 1882-1891, the Players' League for 1890, and the Federal League for 1914-1915, and adding more player seasons will tend to decrease the standard deviation. Overall, though, the trend towards lower standard deviations seems to continue: with the addition of the 1980s through 2003, the three lowest standard deviations and coefficients of variation belong to the three most recent decades.
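
For anyone who wants to poke at the numbers themselves, here is a minimal sketch of the decade-level calculation described above, using pandas against the standard Lahman CSV exports (Batting.csv and Teams.csv). The file names, and the use of the longest team schedule to approximate the "more than 2 at bats per game" cutoff, are assumptions, and the standard deviation is taken around the league average as described in the text, so the output should be close to, but not necessarily identical with, the table above.

```python
import numpy as np
import pandas as pd

# Standard Lahman CSV exports; the file locations are an assumption.
bat = pd.read_csv("Batting.csv")      # playerID, yearID, lgID, G, AB, H, ...
teams = pd.read_csv("Teams.csv")      # yearID, lgID, teamID, G, ...

# Collapse multiple stints into one line per player-season.
player = (bat.groupby(["playerID", "yearID", "lgID"], as_index=False)[["AB", "H"]]
             .sum())
player = player[player["AB"] > 0].copy()
player["avg"] = player["H"] / player["AB"]

# League batting average for each league-season.
lg = (bat.groupby(["yearID", "lgID"], as_index=False)[["AB", "H"]].sum()
         .rename(columns={"AB": "lgAB", "H": "lgH"}))
lg["lgavg"] = lg["lgH"] / lg["lgAB"]

# Approximate the league schedule as the longest team schedule that season.
sched = (teams.groupby(["yearID", "lgID"], as_index=False)["G"].max()
              .rename(columns={"G": "schedG"}))

df = player.merge(lg, on=["yearID", "lgID"]).merge(sched, on=["yearID", "lgID"])

# Keep player-seasons with more than 2 at bats per scheduled game.
qual = df[df["AB"] > 2 * df["schedG"]].copy()
qual["decade"] = (qual["yearID"] // 10) * 10

# Standard deviation around the league average, and the coefficient of variation
# (stdev divided by the mean average, expressed as a percentage), by decade.
qual["sqdev"] = (qual["avg"] - qual["lgavg"]) ** 2
out = qual.groupby("decade").agg(
    seasons=("avg", "size"),
    stdev=("sqdev", lambda s: np.sqrt(s.mean())),
    mean_avg=("avg", "mean"),
)
out["coeff"] = out["stdev"] / out["mean_avg"] * 100
print(out.round(4))
```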

I then produced the following scatter plot showing the standard deviation for each year.

[Scatter plot: standard deviation of batting averages by year]

Intuitively, it seems as if Gould's argument holds. However, the discussion on the SABR-L list raised several suggestions and points, including:

  • The analysis shows that standard deviations have fallen over time, but much more slowly since the 1940s. Many agreed that the stabilization of the game had largely occurred by the 1940s.
  • In order to test Gould's hypothesis, some argued that higher standard deviations should be found for expansion years, since the talent pool expands and lets in more players, some of whom would not have been in the major leagues the previous year. Looking at the years 1901, 1961, 1962, 1969, 1977, and 1993, there is no evidence that the standard deviations were greater in those years. The reason this study doesn't find that result may be that players with 2 or more at bats per game in an expansion year were really already bona fide major leaguers who simply hadn't gotten the at bats before expansion. When the cutoff is lowered to 50 at bats, you do see small increases in the stdev in 1901, 1962, 1969, 1977, and 1993.
  • The league average has not been consistently .260, and so some argued that in order to perform the calculation you need to standardize the averages. I reran the numbers, computing each normalized average as AVG/lgAVG*0.260, and produced the following table (a short sketch of the rescaling appears after the table). These results are very similar to the first set, although the variation appears more consistent from the 1920s through the 1970s before dropping in the 1980s and later:

Decade Seasons Stdev
1870s 587 .0483
1880s 1189 .0439
1890s 1004 .0379
1900s 1110 .0381
1910s 1220 .0379
1920s 1194 .0336
1930s 1233 .0325
1940s 1149 .0328
1950s 1145 .0335
1960s 1448 .0334
1970s 1867 .0321
1980s 1970 .0300
1990s 2103 .0305
2000s 885 .0305
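
The rescaling in that bullet is a one-line adjustment; a sketch, continuing from the hypothetical qual data frame in the earlier example (and using the simple sample standard deviation here):

```python
# Normalize each player's average to a common .260 league context,
# then recompute the spread by decade.
qual["norm_avg"] = qual["avg"] / qual["lgavg"] * 0.260
print(qual.groupby("decade")["norm_avg"].std().round(4))
```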

  • To be more precise, some argued that the averages should be weighted by the number of at bats. However, since we're already selecting players who garnered significant playing time, with more than 2 at bats per game (and I'm too lazy to do the weighted calculation), I doubt weighting would change the results very much.
  • Perhaps, as some speculated, the disappearance of the .400 hitter has more to do with an emphasis on power over average since the 1920s. In other words, hitters are knowingly sacrificing average for power in the modern era. I have no doubt that this is generally true, as both the increase in strikeouts and the increased diversity of skills among batting champions show. I'm just not sure there hasn't always been a substantial population of players who focused on average and who would test the limits of singles hitting. In addition, major league baseball and the general public still hold batting average in high regard, and so players are still rewarded for high averages over on-base percentage.
  • Others contended that the stdev of other measures such as OPS and SLG has increased over time or held steady, which would put a hole in Gould's hypothesis. I don't think that's the case, since the increase in slugging percentage (after 1920, for example) clearly doesn't reflect the talent level but rather many players adopting a different style of play, coupled with rule changes, which would naturally increase the standard deviation. On a side note, some mentioned that Gould's analysis was fatally flawed since it considers only batting average, a dubious although ubiquitous measure of offensive value. I wouldn't disagree if the discussion were about pure offensive value. The question, however, is what happened to .400 hitters, which by definition is about batting average.
  • Others argued that in order to test Gould's hypothesis you should really be looking at the percentage of players several standard deviations above the league average, since the players in the population are not normally distributed but rather represent the right-hand tail of the distribution (players much below the league average won't get enough at bats, while players above it will). I performed this calculation, selecting only those players whose batting average was greater than the league average plus 2.5 times the standard deviation. The percentage of players per decade is shown below, with a sketch of the calculation after the table. Interestingly, one would expect a higher percentage of players in the early years, but this is not the case; the percentage begins to drop only in the 1960s, and why that is I don't know.

Decade %Players #Players
1870s 0.0153 26
1880s 0.0162 86
1890s 0.0127 112
1900s 0.0172 153
1910s 0.0181 248
1920s 0.0167 226
1930s 0.0161 166
1940s 0.0174 261
1950s 0.0177 236
1960s 0.0138 322
1970s 0.0110 311
1980s 0.0108 328
1990s 0.0097 407
2000s 0.0091 198
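
A sketch of the tail calculation from the bullet above, again reusing the hypothetical qual data frame from the earlier example. The post doesn't pin down exactly which standard deviation or which denominator was used, so each decade's own spread and its qualifying player pool are assumed here, and the output may not match the table exactly:

```python
# Flag player-seasons more than 2.5 standard deviations above the league average,
# using each decade's standard deviation of qualifying averages (an assumption).
decade_sd = qual.groupby("decade")["avg"].transform("std")
in_tail = qual["avg"] > qual["lgavg"] + 2.5 * decade_sd

tail_by_decade = pd.DataFrame({
    "pct_players": in_tail.groupby(qual["decade"]).mean().round(4),
    "num_players": in_tail.groupby(qual["decade"]).sum(),
})
print(tail_by_decade)
```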

  • Finally, another way to look at this problem, as pointed out by Bill James, is to calculate how far .400 sits from the typical average, measured in standard deviations, for the various decades. To do this I took the players with more than 2 at bats per game and calculated their average batting average by decade. I then subtracted that average from .400 and divided by the standard deviation for the decade to figure out how many standard deviations the average player in the study was away from .400 during that period (a worked check appears after the table below). This analysis showed that the players who qualified in the 1920s hit .300 with a standard deviation of .0369, which puts them only 2.7 standard deviations away from .400. By contrast, the players who qualified in the 1990s hit .276 with a stdev of .0311, putting them 3.98 standard deviations away from .400. The higher averages combined with higher standard deviations made it statistically more likely that someone would hit .400, as evidenced by the number of .400 hitters per decade (7 in the 1920s, 0 in the 1990s). It appears from this that both factors, the higher relative averages and the larger standard deviations, account for the earlier prevalence of .400 hitters. This also shows that, since league averages have risen in the past 11 years, the odds are now slightly better that someone will hit .400 than they were in the 1970s and 80s (although not as good as in the 1930s through 1950s).


Decade Seasons .400 Hitters AVG Stdev Stdevs from .400
1870s 587 9 0.275 0.0505 2.466
1880s 1189 3 0.261 0.0423 3.283
1890s 1004 11 0.286 0.0402 2.837
1900s 1110 1 0.268 0.0373 3.544
1910s 1220 3 0.272 0.0372 3.445
1920s 1194 7 0.300 0.0369 2.705
1930s 1233 1 0.294 0.0349 3.050
1940s 1149 1 0.275 0.0329 3.800
1950s 1145 0 0.275 0.0334 3.733
1960s 1448 0 0.265 0.0319 4.218
1970s 1867 0 0.269 0.0316 4.138
1980s 1970 0 0.269 0.0299 4.369
1990s 2103 0 0.276 0.0311 3.983
2000s 885 0 0.278 0.0310 3.941
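
The last column is straightforward arithmetic: subtract the decade's mean average from .400 and divide by the decade's standard deviation. A quick check against the 1920s and 1990s rows (the small differences from the table come from rounding the inputs to three or four decimals):

```python
# Distance from .400 in decade standard deviations: (.400 - decade mean) / decade stdev.
print(round((0.400 - 0.300) / 0.0369, 3))  # 1920s: ~2.71 (table shows 2.705)
print(round((0.400 - 0.276) / 0.0311, 3))  # 1990s: ~3.99 (table shows 3.983)
```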

Conclusion
So, in the final analysis, what does it all mean? My tentative conclusions are:

  • Based on intuition and observation, Gould was certainly correct that baseball players are better than they were in the past and that the game is better played now than ever before. Analogies with basketball and track make this apparent.
  • Gould was partially correct that decreasing standard deviations do in fact record an increasing and standardized level of play. However, decreased league batting averages also played a role in the disappearance of the .400 hitter.

For other analyses on this question see this article and this one.

