Dan Agonistes: The Baseball Same Game

at 5:56 AM

Stephen Lombardi was kind enough to send me a copy of his recently published book The Baseball Same Game: Finding Comparable Players from the National Pastime (224 pages, iUniverse Books). I got a chance to read it over the holiday weekend and want to give you a glimpse into the methodology and structure of the book.

In the book Stephen presents 65 "cases", each comparing two players who had very similar career statistics. The statistics (or, as he points out, what are better called "performance data") that he uses for position players are:

Games played
Plate appearances
Runs Created Above Average (RCAA)
Offensive Winning Percentage (OWP)
OPS vs. League
Runs Created per Game (RC/G) vs. League

Of course, what I like about this approach is that he uses sabermetric performance measures to make his comparisons rather than the traditional runs, RBIs, home runs, and batting average. These measures serve as a much better foundation since they track more closely with producing runs and therefore with winning baseball games. Two of the measures built on Runs Created (RCAA and OWP) are also adjusted for the player's home park, thereby creating a level playing field for players who find themselves in extreme parks. I also like that when he uses OPS and Runs Created he places them in the context of the league in which the player played.

By including games played and plate appearances he contextualizes the other stats so that the comparisons are meaningful (for example, two players with an OWP of .560 are not really comparable if one had 1,000 career plate appearances and the other 10,000). Taken together, then, these measures do a great job of finding comparable offensive players.

While I do think this methodology is on the right track, I have two criticisms. First, OPS vs. League and RC/G vs. League are not park adjusted. As I’ve discussed previously, OPS can easily be normalized for both park and league to be a more accurate measure. I’m not sure why RC/G vs. League couldn’t be adjusted as well.
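One common way to do that kind of normalization, sketched here in Python, is to index OBP and SLG separately against the league and then divide by a park factor. This is a simplified illustration and not necessarily the exact form discussed previously or anything used in the book: the function name, the `park_factor` convention (1.00 is neutral, above 1.00 favors hitters), and the sample numbers are all assumptions.

```python
def normalized_ops(obp, slg, lg_obp, lg_slg, park_factor=1.0):
    """League- and park-normalized OPS on a scale where 100 is league average.

    park_factor is a hypothetical run-scoring factor for the player's
    home park: 1.00 is neutral, values above 1.00 favor hitters.
    """
    # Index each component against the league, then combine them.
    relative = obp / lg_obp + slg / lg_slg - 1
    # Deflate (or inflate) the result for the home park.
    return 100 * relative / park_factor


# Hypothetical hitter: .360 OBP and .500 SLG in a .330/.420 league.
neutral = normalized_ops(0.360, 0.500, 0.330, 0.420)
hitters_park = normalized_ops(0.360, 0.500, 0.330, 0.420, park_factor=1.05)
print(round(neutral), round(hitters_park))
```

The same division by a park factor could in principle be applied to RC/G before comparing it to the league.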

Second, within the cases themselves the measures are presented as follows, using case #1 as an example.


                Roy Campanella Sixto Lezcano
G                       1215        1291
PA                      4816        4814
RCAA                     135         146
OWP                     .586        .606
OPS vs. League          .107        .084
RC/G vs. League         1.27        1.17

As you can see the measures against league are presented as the difference between the league average and the player measures. In this case Campanella created 1.27 runs per game more than the league average and had an OPS .107 higher than the league average.

In thinking about this presentation, it occurred to me that it would tend to help players who played in eras when more runs were scored, because the run environment dictates the cost (in runs) of an additional win. For example, using the Pythagorean formula, a team that scores 700 runs and gives up 700 runs (a run environment of 4.32 runs per game) will obviously win 81 games. To get to 82 wins, they need to score about 710 runs, so the cost of that additional win is 10 runs (they could also get that extra victory by giving up five fewer runs and scoring five more). If, however, a team scores 600 runs and gives up 600 (a run environment of 3.70 runs per game), that 82nd victory can be purchased at the cost of about eight runs. As a result, a player who created 1 run more per game than the league in the 4.32 run per game environment would not be as valuable as a player who created 1 run more per game in the 3.70 run per game environment.

A better comparison would be to divide the player's RC/G by the league average and multiply by 100 to produce a value centered around 100. In fairness to Stephen, this is essentially what offensive winning percentage (OWP) does by taking the square of the player's RC/G and dividing it by the sum of the squares of the RC/G and the league average runs scored per game. The same argument can be used for OPS, where a difference of .107 in a lower scoring league (1968) means more than the same difference in a high scoring league (2001).
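To make the arithmetic concrete, here is a minimal Python sketch of the run cost of an extra win and of the two ratio-based alternatives. The function names are labels chosen for illustration, and the 162-game schedule and the treatment of league RC/G as equal to league runs per game are assumptions. The exponent is left as a parameter: 2 is the classic Pythagorean form, while a value such as 1.83 (an assumption about which variant is in play) lands closer to the 10 and 8 run figures quoted in the text.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from the Pythagorean formula."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


def runs_for_one_more_win(team_runs, games=162, exponent=2.0):
    """Extra runs scored that lift an even team (RS == RA) one win above .500."""
    target_wins = games / 2 + 1
    # Solve rs**e / (rs**e + ra**e) = target_wins / games for rs, with ra fixed.
    win_ratio = target_wins / (games - target_wins)
    needed = team_runs * win_ratio ** (1 / exponent)
    return needed - team_runs


def relative_index(player_rcg, league_rcg):
    """Player RC/G as a percentage of league average (100 = average)."""
    return 100 * player_rcg / league_rcg


def offensive_winning_pct(player_rcg, league_rcg, exponent=2.0):
    """Pythagorean winning percentage of a lineup of this hitter vs. the league."""
    p = player_rcg ** exponent
    lg = league_rcg ** exponent
    return p / (p + lg)


for team_runs in (700, 600):
    for e in (2.0, 1.83):
        cost = runs_for_one_more_win(team_runs, exponent=e)
        print(f"{team_runs} runs, exponent {e}: {cost:.1f} runs per extra win")

# A hitter one run per game above the league in each environment.
for league in (4.32, 3.70):
    player = league + 1.0
    print(league, round(relative_index(player, league), 1),
          round(offensive_winning_pct(player, league), 3))
```

Whichever exponent is used, the cost of the extra win scales in proportion to the run environment, and both the index and OWP rate the hitter in the 3.70 environment higher than the one in the 4.32 environment even though each is exactly one run per game above the league.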

On the pitching side he uses:

Innings Pitched
Runs Saved Above Average (RSAA)
ERA vs. League
K to BB ratio vs. League
Base runners allowed per 9 IP vs. League
K per 9 IP vs. League

Of these, RSAA (along with RCAA) is a sabermetric measure created by Lee Sinins of Sabermetric Baseball Encyclopedia fame. RSAA measures the number of runs the pitcher saved over and above an average pitcher given the same number of innings and league and is park adjusted.
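Going only by that description, the calculation can be sketched as below. This is not Lee Sinins's actual formula, which is not spelled out here; the function and parameter names, the use of runs allowed per nine innings, and the way the park factor is applied are all assumptions made for illustration.

```python
def runs_saved_above_average(innings, runs_allowed, league_ra9, park_factor=1.0):
    """Runs saved versus an average pitcher over the same innings.

    league_ra9 is the league's runs allowed per nine innings, and
    park_factor is a hypothetical adjustment (1.00 is neutral, above
    1.00 means a hitter-friendly home park).
    """
    # Runs an average pitcher would be expected to allow in these innings.
    expected = league_ra9 * park_factor * innings / 9
    return expected - runs_allowed


# Hypothetical season: 180 innings, 80 runs allowed, league at 4.50 per nine.
print(runs_saved_above_average(180, 80, 4.50))
```

A positive result means the pitcher allowed fewer runs than an average pitcher would have in the same innings, and a hitter-friendly park raises the expected total and therefore the credit.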

As with the offensive measures that are compared to the league average, these have the same issue of relative importance given the run, base runner, or strikeout and walk rates for the league in question.

As for the cases themselves, there are some very interesting ones. I especially enjoyed seeing the similarities between Tom Seaver and Christy Mathewson, Mark McGwire and Johnny Mize, George Brett and Sam Crawford, Fergie Jenkins and Eddie Plank, and Barry Larkin and Jim Rice. The cases are especially interesting when the players made their contributions in very different ways, as with Larkin and Rice and, to a lesser degree, Mize and McGwire. In each case Stephen provides a brief synopsis of the two careers and often uses the chance to discuss the relative merits of one or the other player for the Hall of Fame, or to make a point about some other aspect of their careers that contributes to a fan's perception of these players - for example, discussing the value of the Save statistic in the case of Willie Hernandez and Jeff Reardon. I enjoyed the readability of the book and found lots of nuggets of info I hadn't heard before.

In some of the cases, such as Campanella and Lezcano above, Larkin and Rice just mentioned, and cases like Bruce Sutter and Smokey Joe Wood, I tend to question the value of the comparison not because the numbers don’t line up but because the players in question played different positions. Stephen fully realized that this would cause a bit of controversy and noted on his web site that

“my point is that once you step into the batter's box, you're a hitter, and you should not be given extra credit (or lose something) because of the position you play in the field when you are not batting.

Giving someone ‘extra’ or an ‘adjusted’ offensive value in his relative batting results because of his position in the field implies that just playing that position in the field provides a batting benefit to his team.”

While that reasoning makes perfect sense, my preference would have been to make comparisons within ranges of the defensive spectrum so that they would have been a bit more meaningful. For example, while Larkin and Rice may have had similar offensive numbers, Larkin was far more valuable because he played a more demanding defensive position. And I’m not sure that comparisons between starters and relievers (Sutter and Wood, for example) make much sense, since it has been shown that relievers’ ERAs are not really equivalent to starters’ ERAs and therefore should be adjusted based on something like Component ERA.

And as Stephen says, these are his cases and you could find your own, but I'm glad he did the work. Overall, it's an interesting book that I'd recommend.

Monday, June 13, 2005
