Monday, May 17, 2004

Statistics and Baseball

This is a short essay on the relationship between statistics and baseball that I wrote 10 years ago and have just found among some old papers.


Statistics are the lifeblood of baseball. The true fan cannot discuss baseball without referring to them passionately and often. For many Americans the morning ritual of reading the box scores is a momentary escape and an assurance that all is well with the world. The mere mention of numbers like 2,130, 56 and 61 is powerful enough to evoke images of ballplayers past and present. The emphasis on statistics throughout baseball history has been one of the truly fortunate, or perhaps divinely inspired, aspects of the game. With data spanning more than a century, we can now look back over the stat lines of players long since dead and get a feel for the kind of players they were. That ability connects us to the past and helps give baseball the sense of continuity that makes it the National Pastime.

Statistics are also what fuel the fire of the never-ending controversies over who was better than who (or rather whom). Without the baseline of career batting average, how could one even argue whether or not Shoeless Joe Jackson (a career .356 hitter) was better than Ted Williams (.344)? Or whether or not Hank Aaron's 755 home runs were a more impressive feat than Babe Ruth's 714? The players compared above never played in the same game and were separated by a decade or more. Statistics give us a place to begin the discussion.

To understand the depth and richness that statistics provide to the game, simply think of the careers of two players, Josh Gibson and Satchel Paige. Both in their day were considered the best players in the Negro Leagues. Because statistics were rarely compiled and the level of competition was not always major league caliber, it is now impossible to pin down with any certainty how great these players were. Dizzy Dean, one of the best pitchers in baseball in the 1930s, considered Paige one of the fastest and best he ever saw. Paige in fact pitched effectively in the major leagues from 1948 to 1953 in his mid to late forties (Satchel never seemed to be able to make up his mind whether he was born in 1906 or some other year in the vicinity). Had he played in the majors in his prime he might have racked up more wins and strikeouts than any pitcher in history. Gibson's power was legendary and he was said to have hit over 80 home runs in a Negro League season. Could he have challenged or even beaten Ruth? These questions cannot be answered with any degree of certainty or precision, since there is no baseline from which to begin the discussion.

In another sense baseball statistics are meaningful because baseball is a team game of individual accomplishments and confrontations. The individual's performance is paramount at any one moment and can be separated from the team's performance. No doubt, what the individual does has a great impact on the success or failure of his team, but what he does is in a sense separate from the team as well. Statistics also lend themselves well to baseball because baseball, in the jargon of my profession as a computer scientist, is "event-driven". By "event-driven" I mean that a baseball game unfolds as a series of discrete events which can be separated and analyzed. A pitch is thrown: what are the results? Was it a strike? Did the batter swing? What was the count? That event can then be recorded for posterity and tracked. In other games, notably basketball and football, individuals act in concert with teammates to attempt to produce desirable results. In these games it is not so easy to separate the actions of one man from his team. In a typical play in football, 22 players are moving simultaneously and the outcome of the play is dependent on a whole host of variables that are not easily identified. When the play is over what can you record? The ball moved from the 20 yard line to the 23 yard line and Emmitt Smith carried the ball before being tackled by Lawrence Taylor. While that accounts for two of the 22 players on the field, what do you say about the other 20? Certainly they were an integral part of how the play unfolded and yet there is no easy way to evaluate their performance. In contrast, the confrontations of batter vs. pitcher and fielder vs. ball are much more easily recorded and analyzed.
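For readers who think in code, here is a rough sketch of what "event-driven" means in practice. It is only an illustration under stated assumptions: the PlateAppearance record, the outcome codes, and the player names are all invented for this example, and real scoring rules (sacrifices, hit-by-pitch, errors and so on) are ignored. The point is simply that each confrontation can be written down as one self-contained record, and an individual statistic such as batting average falls out of a pass over those records.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlateAppearance:
    """One discrete event: a batter faces a pitcher and something happens.

    Hypothetical record layout, invented for this illustration.
    """
    batter: str
    pitcher: str
    outcome: str  # assumed codes: single, double, triple, home_run, out, strikeout, walk


# Simplified scoring assumptions: these outcomes count as hits...
HITS = {"single", "double", "triple", "home_run"}
# ...and a walk is a plate appearance but not an official at-bat.
NOT_AT_BAT = {"walk"}


def batting_averages(events):
    """Return {batter: hits / at_bats} from a list of PlateAppearance events.

    Batters with no official at-bats are left out of the result.
    """
    hits = defaultdict(int)
    at_bats = defaultdict(int)
    for event in events:
        if event.outcome in NOT_AT_BAT:
            continue
        at_bats[event.batter] += 1
        if event.outcome in HITS:
            hits[event.batter] += 1
    return {batter: hits[batter] / at_bats[batter] for batter in at_bats}


# A made-up game log; the names are placeholders, not real players.
game_log = [
    PlateAppearance("Batter A", "Pitcher X", "single"),
    PlateAppearance("Batter A", "Pitcher X", "strikeout"),
    PlateAppearance("Batter B", "Pitcher X", "walk"),
    PlateAppearance("Batter B", "Pitcher X", "home_run"),
]

for batter, average in batting_averages(game_log).items():
    print(f"{batter}: {average:.3f}")
```

Notice that nothing in the record refers to the other players on the field. Try to write the equivalent record for the football play described above and you quickly run out of fields that mean anything for 20 of the 22 players.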

The games themselves (football and basketball) involve a flurry of action from multiple players simultaneously and hence are more interesting on television, a fact which accounts for their increase in popularity in the age of television. We can get some sense that Joe Montana is a great quarterback because of his completion percentage and his touchdown-to-interception ratio, but put him on another team with worse linemen and worse receivers and his statistics will change markedly. Still, because of his position, Montana is the most visible player on the field and can be more easily tracked than, say, the left guard or center. So too in basketball it is clear that a player's statistics are heavily influenced by his teammates. For instance Byron Scott is a great shooting guard, but when he played with Magic Johnson at the point guard position, he became that much better since he received more fastbreak attempts and more open shots than he would have with merely an average teammate. How much better was he? That's hard to say since Scott now plays with another point guard. However, if you take a veteran baseball player with a career average of .275 and trade him, you can be reasonably assured that he will perform likewise for his new team (taking into consideration the effects of age and ballpark, to be sure).

So we are lucky indeed that baseball lends itself so well to being recorded, so that we have data to endlessly analyze and discuss. The role of sabermetrics is to tease out of the observations the truth that lies behind them.
