A few weeks back I wrote a column at Baseball Prospectus titled "The Information Revolution" in which I did a little analysis of the data captured by MLB.com's Enhanced Gameday application. Towards the end of the piece I anticipated what detractors might say about the growing amount of information: that it would overwhelm without informing.
"...to me the answer to the problem of overload is in how the information is analyzed, and then how it is presented. Information for information's sake is not the goal. Information analyzed in such a way as to inform the decision-making process on the field and in the front office is the end game. Indeed, more is not always better. More does provide the opportunity for better analysis, though, by broadening the kinds of questions that can be asked, and also the kinds of answers that we'll find."
That process of using new kinds of information to ask better questions and find answers that were not possible before was illustrated in an excellent essay in The Hardball Times Baseball Annual 2007 titled "Which Way Did It Go?" by Greg Rybarczyk. Greg runs the Hit Tracker web site, and after providing some background on home runs and how they traveled in 2006, he gets down to business analyzing the information he tracks in the cases of Josh Beckett and Glendon Rusch. Since Rusch is a Cub, I found it especially interesting.
For those who care to remember that season, Rusch had about as bad a year as one could have. Way back on March 18, my buddies and I witnessed Rusch being struck by a line drive off the bat of Joe Borchard in a spring training game against the White Sox down in Tucson. From there things got worse. As Rybarczyk documents, Rusch gave up 11 home runs in 22.3 April innings, and by the time a blood clot in his lung ended his season on September 12th he had given up a remarkable 21 home runs in 66.3 innings. Final line: 86 hits and 33 walks in those 66.3 innings, an ERA of 7.46, and a Pitching Runs Above Average (PRAA) of a dismal -20.
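To put that stat line in per-game terms, here is a minimal sketch of the arithmetic. It assumes "66.3" and "22.3" are decimal shorthand for 66 1/3 and 22 1/3 innings (box scores would write 66.1 and 22.1), and the two rate stats it computes, home runs per nine innings and walks plus hits per inning pitched (WHIP), are standard ones chosen here for illustration rather than figures cited in the original.

```python
from fractions import Fraction


def innings(whole: int, extra_outs: int = 0) -> Fraction:
    """Innings pitched as an exact number: full innings plus extra outs (thirds)."""
    return whole + Fraction(extra_outs, 3)


def per_nine(count: int, ip: Fraction) -> float:
    """Scale a counting stat (home runs, hits plus walks, ...) to a nine-inning rate."""
    return float(count * 9 / ip)


# Assumption: "66.3" and "22.3" in the text mean 66 1/3 and 22 1/3 innings.
season_ip = innings(66, 1)
april_ip = innings(22, 1)

hr_per_9_season = per_nine(21, season_ip)  # 21 home runs over the whole season
hr_per_9_april = per_nine(11, april_ip)    # 11 home runs in April alone
whip = float((86 + 33) / season_ip)        # (hits + walks) per inning pitched

print(f"HR/9 season: {hr_per_9_season:.2f}, April: {hr_per_9_april:.2f}, WHIP: {whip:.2f}")
```

Using `Fraction` keeps the thirds of an inning exact until the final conversion to a float.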
Perhaps that March performance really did set the tone: on that day the wind was blowing 20 to 30 miles per hour straight out, and Rusch gave up two home runs in the outing. According to Rybarczyk, 20 of the 21 home runs hit off Rusch enjoyed a tail wind, which increased their distance by an average of 27 feet. Further, all of them were hit at an altitude of 535 feet or greater. Both circumstances were highly unlikely. Taking the distance, altitude, and direction of each of Rusch's homers and placing them in the context of Shea Stadium (elevation 10 feet, with an average wind speed of 10 mph blowing in from left field), he finds that only two of the 21 would have left the park and only one other ball would likely have been a double off the wall. More realistically, he then removes the wind factor altogether and calculates that 9 of the balls would have gone out, with one additional double. Yes, Rusch pitched poorly in 2006, but he was also the victim of some awful luck that made his season look worse than it was.
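The adjustment described above (strip out the wind and altitude a ball actually enjoyed, then replay it in another park) can be sketched roughly as follows. This is not Hit Tracker's model: it ignores trajectory, spray direction, and fence height, and the linear altitude coefficient, the single fence distance, the "off the wall" margin, the function names, and the sample balls are all hypothetical placeholders invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass

# HYPOTHETICAL placeholder, not Hit Tracker's actual model: fraction of
# extra carry per 1,000 feet of ballpark elevation, treated as linear.
ALTITUDE_GAIN_PER_1000_FT = 0.01


@dataclass
class HomeRun:
    distance_ft: float    # observed distance of the home run
    wind_boost_ft: float  # estimated feet added (+) or taken away (-) by wind
    elevation_ft: float   # elevation of the park where it was hit


def altitude_factor(elevation_ft: float) -> float:
    """Distance multiplier relative to sea level under the linear assumption."""
    return 1 + ALTITUDE_GAIN_PER_1000_FT * elevation_ft / 1000


def relocate(hr: HomeRun, target_elevation_ft: float, target_wind_ft: float = 0.0) -> float:
    """Estimate how far the same batted ball would carry in a different setting."""
    still_air = hr.distance_ft - hr.wind_boost_ft             # remove the observed wind
    sea_level = still_air / altitude_factor(hr.elevation_ft)  # remove the observed altitude
    return sea_level * altitude_factor(target_elevation_ft) + target_wind_ft


def classify(distance_ft: float, fence_ft: float, wall_margin_ft: float = 15.0) -> str:
    """'home run' if it clears the fence, 'double' if it falls just short, else 'out'.

    Treating anything within wall_margin_ft of the fence as a double off the
    wall is a crude simplification.
    """
    if distance_ft >= fence_ft:
        return "home run"
    if distance_ft >= fence_ft - wall_margin_ft:
        return "double"
    return "out"


# Invented example balls, NOT Rusch's actual 2006 home runs.
balls = [HomeRun(400, 27, 600), HomeRun(425, 10, 600), HomeRun(375, 30, 600)]
FENCE_FT = 380  # hypothetical single fence distance for the target park
outcomes = Counter(classify(relocate(b, target_elevation_ft=10), FENCE_FT) for b in balls)
print(outcomes)
```

Leaving `target_wind_ft` at zero corresponds to the no-wind comparison; passing a negative number of feet would stand in for a park where the wind typically blows in, since this sketch does not convert wind speed into distance.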
As a side note, luck seems to affect Rusch more severely than other pitchers; as I've noted in the past, his BABIP has fluctuated widely, from .386 in 2003 to .298 in 2004 to .350 in 2005.
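For readers unfamiliar with BABIP, the common formula counts only balls put in play, leaving out home runs and strikeouts, which is why it is often read as a gauge of defense and luck as much as of the pitcher. A minimal sketch: the function name and the sample counting stats in the tests are illustrative, while the season values are the ones quoted above.

```python
def babip(h: int, hr: int, ab: int, k: int, sf: int = 0) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = ab - k - hr + sf
    return (h - hr) / balls_in_play


# Season-by-season BABIP values for Rusch, as quoted in the text.
rusch_babip = {2003: 0.386, 2004: 0.298, 2005: 0.350}
swing = max(rusch_babip.values()) - min(rusch_babip.values())
print(f"Range across 2003-2005: {swing:.3f}")
```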
The ability to do this kind of analysis only grows with the availability of information. Obviously a team could use this kind of analysis to help decide whether to give a guy like Rusch another shot, or even to project what a player might do in another ballpark (Rybarczyk's analysis of Beckett is interesting in this regard). As the data set expands to include not only home runs but all batted balls, a whole new set of adjustments can be created that not only remove the effects of wind and altitude but also go a long way toward factoring out luck in general. And as the layers of the onion are peeled back, we'll get closer to the skills and attributes that can better inform the decision-making process.