
Tuesday, December 11, 2007

Searching for Mr. Clutch

I'm sure many of you read the recent (SI calls it "provocative") piece by Bill James on Sports Illustrated.com titled "Mr. Clutch: Big Papi, Chipper, Pujols come through when it counts". I was reminded of it when I found my copy of the 2008 Hardball Times Annual patiently waiting in the mailbox over the weekend; the article is, of course, also printed there.

Now, I'm certainly among those who view James as a pioneer, and even as a person who had an impact on my development. In fact, I submitted a short piece about how the Abstracts reached me at just the right time, and it was printed in the book How Bill James Changed Our View of the Game of Baseball:


"When I was a teenager, I loved baseball and started dabbling in analysis of its numbers - even searching for information on platoon splits at the library at Cooperstown on a summer trip with my family. For me, reading my first Abstract in the spring of 1984, and particularly its introductory essays "Inside-Out Perspective" and "Logic and Methods in Baseball Analysis" was validation that there was an expanded way to view the game. From that point forward, my view of the game changed to one where the primary question I asked myself was not 'what happened?' but 'why did it happen?' That mindset, based on logical reasoning and inculcated by reading Bill James, has served me well in areas that transcend baseball."

So I'm always interested in what James has to say and am predisposed, I would guess, to be less critical than others. But when I read this piece I was a little disappointed, for two primary reasons.

First and foremost, the article seems to promote the idea that, since the now-famous study "Do Clutch Hitters Exist?" by Dick Cramer was published in the 1977 Baseball Research Journal, little to no work has been done on the subject of clutch hitting, and that what has been done has carried an ingrained bias. Quite the contrary: the topic has been the subject of almost continual debate, with a variety of studies published over the years, as documented on Cyril Morong's fine site. More recently there have been several very good analyses, as I discussed in the introduction to my Schrodinger's Bat column of March 1, 2007.


The controversy was never more in evidence than in the spring and summer of 2005 when, in the wake of the Bill James piece "Underestimating the Fog" published in Volume 33 of the Baseball Research Journal, there was plenty of point-counterpoint in the analysis community.

BP's own James Click got in on the act with two interesting articles in the fall of 2005, in which he used Keith Woolner's Win Expectancy (WX) framework combined with first VORP and then Marginal Lineup Value (MLV) to generate measures he termed PrjWINS and Clutch. He concluded that the correlation for Clutch from year to year, and even over halves of a career, indicated that the measure was "nearly completely random."
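Click's conclusion rests on a simple test: if Clutch measured a persistent skill, a player's score in one season (or one half of a career) should predict his score in the other. The sketch below is a hypothetical simulation, not Click's data or code: each simulated player gets a fixed true clutch skill plus independent seasonal luck, and the two seasons are correlated. The parameters `true_sd`, `noise_sd`, and `n_players` are assumptions chosen purely for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_year_to_year_r(true_sd, noise_sd, n_players=500, seed=2007):
    """Correlate a simulated 'clutch' score across two seasons.

    Hypothetical model: each player has a fixed true clutch skill drawn
    from N(0, true_sd), and each season's observed score adds independent
    N(0, noise_sd) luck on top of that skill.
    """
    rng = random.Random(seed)
    skills = [rng.gauss(0, true_sd) for _ in range(n_players)]
    year1 = [s + rng.gauss(0, noise_sd) for s in skills]
    year2 = [s + rng.gauss(0, noise_sd) for s in skills]
    return pearson_r(year1, year2)


if __name__ == "__main__":
    print("no true skill:        r =", round(simulate_year_to_year_r(0.0, 1.0), 3))
    print("skill equal to luck:  r =", round(simulate_year_to_year_r(1.0, 1.0), 3))
```

With `true_sd=0` the correlation hovers near zero, which is what "nearly completely random" looks like in practice; with skill and luck of equal size it lands near 0.5, because under this model the expected correlation is the share of total variance that comes from skill.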

The issue was resurrected again after the publication of Tom M. Tango, Mitchel Lichtman, and Andrew Dolphin's The Book in early 2006, wherein the authors noted that there is indeed a small player-to-player variation in clutch skill. They measured that one in six players increase their on-base percentage by eight points or more when faced with pressure situations (defined as any situation in which the batter's team is trailing by one, two, or three runs in the eighth inning or later), while a comparable number of players decrease it by as much. The spread decreases to six points when using their weighted on-base average (wOBA) metric, and when regressed to the mean the wOBA skill maxes out at around two points.
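The step from an observed spread to a regressed "true skill" figure follows a standard idea: observed variance is true-talent variance plus chance variance, so chance can be subtracted out, and any single observation can be shrunk toward the mean by the measurement's reliability. The sketch below illustrates that general idea under a simple normal model; it is not necessarily the exact procedure used in The Book, and the numbers in the usage note are hypothetical.

```python
import math


def true_skill_sd(observed_sd, noise_sd):
    """Estimate the spread of true talent.

    Observed variance = true variance + chance variance, so subtract the
    variance expected from luck alone (floored at zero).
    """
    return math.sqrt(max(observed_sd ** 2 - noise_sd ** 2, 0.0))


def regress_to_mean(observed, mean, true_sd, noise_sd):
    """Shrink an observed value toward the mean by the measurement's reliability."""
    reliability = true_sd ** 2 / (true_sd ** 2 + noise_sd ** 2)
    return mean + reliability * (observed - mean)


if __name__ == "__main__":
    # Hypothetical figures, in points (thousandths) of a rate stat.
    print(true_skill_sd(5.0, 3.0))
    print(regress_to_mean(10.0, 0.0, true_sd=3.0, noise_sd=4.0))
```

For example, if the true spread were 3 points and chance contributed 4 points, a hitter who showed +10 points in pressure situations would be regressed to roughly +3.6 points, which is why observed clutch splits shrink so sharply once luck is accounted for.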

Around the same time, Nate Silver (in his chapter "Is David Ortiz a Clutch Hitter?" in Baseball Between the Numbers; get your copy in paperback today) used an approach similar to Click's, combining WX with a modified version of MLV, but also included Leverage to create a measure likewise termed Clutch. After crunching the numbers, he found that players with higher walk rates and lower strikeout rates do perform slightly better than would otherwise be expected. Overall, he concluded, clutch hitting accounts for something on the order of two percent of what it takes to produce wins at the plate.

So when James implies that the discussion has "stalled out" and that no good work (at least no work worth mentioning) has been undertaken by the sabermetric community, I'm at a loss, since not only have there been a variety of studies, but James himself participated in the debate as recently as 2005. The tone of his article is misleading at best and intellectually dishonest at worst. Although I can understand simplifying things for the more general audience of Sports Illustrated, publishing the same version in THT's annual (where the audience is well aware of everything I've written here) makes the piece seem oddly out of place among essays by Dave Studeman, Tom Tango, John Walsh, John Beamer, and Mitchel Lichtman, among others.

Secondly, as concerns the methodology, James has added the opposition, standings, and calendar to the attributes already accounted for in Win Expectancy (and used by Silver in the BBTN chapter, for example) to come up with the criteria for selecting which plate appearances count as clutch. As far as I know, that in and of itself is novel. It's just too bad he doesn't give us any insight into how he's weighting these factors, since without that information others in the community can't offer much of an evaluation of the technique. Still, I'm prepared to give him a pass on that one, since he may be refining things as he moves forward.

What I think we can infer from the article is that, by adding the three new factors, he expects the seasonal or career correlations in these situations to have a better chance of being statistically significant, thereby allowing one to measure a clutch skill beyond the small effect Silver and others have found. I'm more than a little skeptical on several grounds, and I wonder why he didn't at least publish some preliminary results beyond the comment "there may be a decentralization under pressure, the good hitters getting better and the weaker hitters struggling to stay where they are."

My skepticism has two sources. First, adding criteria reduces the sample size (as evidenced by Mike Sweeney's 29 "clutch" AB in 2005), which makes it more difficult to detect a clutch skill if it exists. Second, my intuition says that opposition, standings, and schedule would have the least influence of all the factors (major league baseball is fraught with intense selection pressure, so I find it hard to believe that the difference between Tampa Bay playing Texas in April and San Diego playing Los Angeles in September will be discernible). But of course on this second point I don't really know, and that's the problem. Instead of doing a little analysis, or waiting until he had something at least preliminary to say, he cherry-picked seven players and showed their clutch batting records for seasons since 2002. While that makes for a nice sub-head on SI.com and gets the article mentioned everywhere, it tells us...absolutely nothing. Worse, by showing only these seven players, he leaves the general reader with the implication that this method will show clutch hitting to be a skill of larger magnitude than most people think. That's putting the cart before the horse, to say the least.
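The sample-size point can be made concrete with the binomial standard error of an on-base rate, sqrt(p(1 - p) / n). The sketch below assumes a hypothetical .350 on-base rate and treats Sweeney's 29 at-bats as 29 plate appearances for simplicity; it is a back-of-the-envelope check under those assumptions, not anything from James's method. The helper names are illustrative only.

```python
import math


def obp_standard_error(p, n):
    """Chance-alone standard deviation of an on-base rate p over n plate appearances (binomial model)."""
    return math.sqrt(p * (1 - p) / n)


def pa_needed(p, target_se):
    """Plate appearances needed before chance alone shrinks to target_se (inverts the formula above)."""
    return math.ceil(p * (1 - p) / target_se ** 2)


if __name__ == "__main__":
    league_obp = 0.350  # hypothetical true on-base rate
    for n in (29, 150, 600):
        se_points = 1000 * obp_standard_error(league_obp, n)
        print(f"{n:>4} PA: +/- {se_points:.0f} points of OBP from luck alone")
    print("PA for an 8-point standard error:", pa_needed(league_obp, 0.008))
```

At 29 PA, luck alone produces swings of roughly 89 points of OBP, versus roughly 19 points over a full 600-PA season, and it takes about 3,555 PA before chance shrinks to the eight-point spread The Book reports. Slicing the data into ever narrower "clutch" situations pushes in exactly the wrong direction.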

As James has said in the past:

"It is a characteristic of statisticians that they see the game by the thousands. It's a way of looking at the biggest possible picture of the game. Backing away from it a great distance and trying to see patterns that aren't apparent close up."

Maybe someday he'll get to the patterns, but given the following quote from the article, I won't be holding my breath:

"As to whether these data prove that David is a clutch hitter ... I ain't going there. This discussion has been messed up for 30 years because we got our shoulders way out in front of our shoelaces. From now on, I'm holding back."

And finally, there's the smaller question of why he restricted himself to six years of data. I realize BIS has complete data going back only that far, but given the parameters he mentioned in the piece, I see no reason why he couldn't run his system on all of the Retrosheet data in a day. He mentions that they'll get to it, but one wonders what's holding them back.

In the end, though, there is another way to look at this piece, and that's through the prism of James's upcoming book The Bill James Gold Mine 2008, due to be published in February. Under this view the piece was little more than an ad, just not a very good one. Because I've always loved James' writing and have learned immense amounts from his historical commentary and analysis, I will of course be purchasing a copy (and in fact have pre-ordered it). I just hope the actual content far exceeds the trailer.

5 comments:

Anonymous said...

Dan:
Excellent post. A very balanced assessment, I think. Like you, I am a huge James admirer, but one who has been disappointed in much of what James has written over the past few years.

After reading his Bert Blyleven article in last year's THT annual, which essentially explored whether Blyleven was an "unclutch" pitcher, I made the following observation. Unfortunately, I think it applies here as well:

"The piece seems less important for what it tells us about Blyleven (little new) than what it says about James. In this piece, as in his "Fog" article and his piece on non-random hitting clusters, James seems to want to find evidence that there's more meaning -- and less luck -- in baseball performance than most of us assume. Less luck, in fact, than most of us were taught existed long ago by one Bill James. Similarly, Win Shares focused on evaluating players within the context of their team, a departure from James' earlier writing that focused on methods for gaining a clearer picture of players' contributions by extracting them from their context (team, park, etc.).

One gets the sense that James is responding to, or has sympathy for, the argument that statistical analysis has gone too far in removing players from the "real game," and too far in downgrading factors like "clutch" performance. Perhaps he has misgivings about his own contributions in this regard. But personally, I liked the "old" James better."

Dan Agonistes said...

I had that same thought in the back of my mind when I read this piece but couldn't articulate it as well as you did. I wonder whether moving from an outsider's perspective to that of an insider has given him some insights that I haven't seen him write about. If so, it would be great (and I say this in all seriousness) if he would expound on them and enlighten us.

studes said...

James has certainly been out of the sabermetric mainstream for a long time now; that's nothing new. It will be interesting to see how much he's pulled back into it with his new website going up.

Personally, I like this version of Bill James, who doesn't try to pull the nth degree of conclusion out of analyses. He may be overreacting to the recent tendency of some analysts to reduce everything to the greatest possible degree. But if he is, it's not an inappropriate way to act.

Anonymous said...

Dan,

I'm coming to this a little late, but I wonder why you're concerned about the topic of clutch hitting. James is writing about it because he's written all these various imprecise things over the years [e.g. - we can't analyze fielding with traditional stats; we can't project pitchers; there absolutely is no clutch] and has then come back to reconsider them.

So I see his piece as a refinement of his comments in 'Fog'. And also a marketing promotion. But whether he gave it a fair treatment in SI of all places? It doesn't seem like something to get worked up about, not when there are a lot of really interesting topics out there that don't involve situational OPS...

Dan Agonistes said...

Thanks, Gabe. I agree that, like the other topics you mention, this is a reassessment and a refinement, but I was disappointed by how devoid of content it was.

And I was more put off by its inclusion in THT than in SI, although again, the impression left with the general reader, and the omission of any mention of other work on the subject, was simply wrong.