Saturday, August 07, 2004

Rejoice and Be Glad

Statistics are the lifeblood of baseball. The true fan cannot discuss baseball without referring to them passionately and often. For many Americans the morning ritual of reading the box scores is a momentary escape and an assurance that all is well with the world. The mere mention of numbers like 73, .406, 755 and 56 is powerful enough to evoke images of ballplayers past and present. The emphasis on statistics throughout baseball history has been one of the truly fortunate (or perhaps divinely inspired?) aspects of the game. With data spanning well over a century, we can now look back over the stat lines of players long since dead and get a feel for the kind of players they were; that ability connects us to the past and helps give baseball the sense of continuity that makes it the National Pastime. Truly, as noted sabermetrician Bill James has said, we love baseball statistics because they have acquired the power of language. (The word "sabermetrician" describes those who work to analyze the stats; it derives from SABR, the acronym of the Society for American Baseball Research, to which I proudly belong.)

Statistics of course are also what fuel the fire of the never-ending controversies over who was better than who (or rather whom). Without the baseline of career batting average, how could one even argue whether or not Shoeless Joe Jackson - a career .356 hitter - was better than Ted Williams at .344? Or whether or not Hank Aaron's 755 home runs were a more impressive feat than Babe Ruth's 714 or Barry Bonds’ 685 and counting? These players never played in the same game and were separated by decades. And although the statistics haven’t always been optimally fashioned or tracked (a situation that the likes of Retrosheet are attempting to remedy), they do give us a place to begin the discussion.

To understand the depth and richness that statistics provide to the game, one need only reflect on the careers of two players, Josh Gibson and Satchel Paige. Both in their day were considered the best players in the Negro Leagues. Because statistics were rarely compiled and the level of competition varied, it is now impossible to pin down with any certainty how great these players were. Dizzy Dean, one of the best pitchers in baseball in the 1930s, considered Paige one of the fastest and best he ever saw. A faint image of Paige’s greatness can be detected in the fact that he pitched effectively in the major leagues from 1948 to 1953 in his mid to late forties (Satchel never seemed to be able to make up his mind whether he was born in 1906 or some other year in the vicinity). Had he played in the majors in his prime he might have racked up enough wins and strikeouts to rank with the greatest in the game. Gibson's power was legendary and he was said to have hit over 80 home runs in a Negro League season. Could he have challenged or even beaten Ruth? These questions cannot be answered with any degree of certainty or precision, since there is no baseline from which to begin the discussion.

In another sense baseball statistics are meaningful because baseball is a team game of individual accomplishments and confrontations. The individual's performance is paramount at any one moment and can be separated from the team's performance. No doubt, what the individual does has a great impact on the success or failure of his team, but what he does is also in a sense separate from the team. Statistics also lend themselves to baseball because baseball, in the jargon of my profession as a computer scientist, is "event-driven". By "event-driven" I mean that a baseball game unfolds as a series of discrete events which can be separated and analyzed. A pitch is thrown; what are the results? Was it a strike? Did the batter swing? What was the count? Did the runner move? Who fielded the ball? That event can then be recorded for posterity and tracked (I might add with increasing precision here at

In other games, notably basketball and football, individuals act in concert with teammates to attempt to produce desirable results. In these games it is not so easy to separate the actions of one man from those of his team. In a typical play in football, 22 players are moving simultaneously and the outcome of the play is dependent on a whole host of variables that are not easily identified. When the play is over what can you record? The ball moved from the 20-yard line to the 23-yard line and Priest Holmes carried the ball before being tackled by Brian Urlacher. While that accounts for two of the 22 players on the field, what do you say about the other 20? Certainly they were an integral part of how the play unfolded and yet there is no easy way to evaluate their performance. In contrast, the confrontations of batter versus pitcher and even fielder versus ball (despite Branch Rickey’s protestation that “There is nothing on earth anybody can do with fielding”) are much more easily recorded and analyzed.

And finally of course, statistics are the glue that binds generations of baseball fans and makes baseball unique in its role as a heritage passed down from fathers to sons (or daughters in my case). When a father first tells his son about DiMaggio and 56 or Williams and .406, or when a father and daughter witness George Brett’s 3,000th hit, they are speaking a language that connects them to each other and to an ongoing story.

So today, as you peruse the box scores, take a second to reflect on the good fortune before you and, along with the Psalmist, “rejoice and be glad in it.”

(this is a revision of an essay I previously posted on this blog)

