
Sunday, February 10, 2008

Statistical Profiling?

As always, Alan Schwarz has an interesting piece in the New York Times, this time on using statistics as a benchmark for triggering increased testing for performance-enhancing substances. The idea, floated by Representative Mark Souder, an Indiana Republican, is that by comparing a player's actual statistical performance to his own history and to the performance projected for an "average" or typical career path, Major League Baseball would flag certain players as more likely to be users of performance-enhancing drugs. Those players would then, I assume, be tested more frequently or more closely until they met some criterion under which their new performance level is accepted as legitimate.

Schwarz then goes through the litany of reasons why such "statistical profiling" would likely be futile, ranging from the inherent variability in the career path of any particular player, to the problem of what one would measure to try to catch such anomalies, to the fact that the current evidence stemming from the Mitchell Report is inconclusive at best.

In the piece, he points to the similarities in the careers of Hank Aaron and Barry Bonds as evidence that the first of these points, the variability of career paths, makes it unrealistic to use career paths as a measure.

Using my unsophisticated system for projecting Normalized OPS (a park- and league-adjusted OPS), which takes a three-year weighted average, regresses it to the mean, and adjusts for age and league, here are the two careers mentioned above.
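For readers who want to see how a projection of this general shape fits together, here is a minimal sketch. It is not the author's actual system: the OPS+-style formula for Normalized OPS, the 5/4/3 season weights, the 1,200 plate appearances of league-average "ballast" used for regression, the peak age of 27, and the flat 1.5-point yearly age adjustment are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

LEAGUE_MEAN = 100.0  # normalized OPS is scaled so that a league-average hitter scores 100


@dataclass
class Season:
    age: int
    pa: int             # plate appearances
    obp: float
    slg: float
    lg_obp: float       # league on-base percentage
    lg_slg: float       # league slugging percentage
    park: float = 1.0   # hypothetical park factor; 1.0 means a neutral park


def normalized_ops(s: Season) -> float:
    """OPS+-style index (assumed formula): league- and park-adjusted, 100 = average."""
    return 100.0 * (s.obp / (s.lg_obp * s.park) + s.slg / (s.lg_slg * s.park) - 1.0)


def project_next(seasons: list[Season],
                 weights: tuple[float, ...] = (5.0, 4.0, 3.0),  # hypothetical, most recent first
                 regress_pa: float = 1200.0,  # hypothetical PA of league-average ballast
                 peak_age: int = 27,          # hypothetical peak age
                 growth: float = 1.5,         # hypothetical yearly gain up to the peak
                 decline: float = 1.5) -> float:  # hypothetical yearly loss after the peak
    """Project normalized OPS for the season after the last one in `seasons`."""
    if not seasons:
        raise ValueError("need at least one season")
    recent = list(reversed(seasons[-3:]))  # up to three seasons, most recent first

    # 1. Weighted average: each season counts by its weight times its playing time.
    num = sum(w * s.pa * normalized_ops(s) for w, s in zip(weights, recent))
    den = sum(w * s.pa for w, s in zip(weights, recent))
    weighted = num / den

    # 2. Regress toward the league mean; fewer plate appearances means more regression.
    total_pa = sum(s.pa for s in recent)
    regressed = (total_pa * weighted + regress_pa * LEAGUE_MEAN) / (total_pa + regress_pa)

    # 3. Age adjustment for the one-year step to next season.
    next_age = recent[0].age + 1
    return regressed + (growth if next_age <= peak_age else -decline)
```

As a usage check, three exactly league-average seasons ending at age 29 project to 98.5 for age 30: regression toward 100 changes nothing, and the age-30 step subtracts `decline`. A real system would use an age curve rather than a flat step, which is the main simplification here.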






To me, what's interesting about these two is that Aaron certainly did have his most productive season at age 37 and his fourth most productive at age 39, with a nice year thrown in at age 35. However, these were interspersed with seasons at ages 36, 38, and 40 that were pretty much what a projection system would indicate. Essentially, Aaron had a very gradual decline phase with some excellent seasons interspersed.

Bonds, however, had his four best seasons, by a large margin, at the consecutive ages of 36, 37, 38, and 39. This clearly indicates that he established a new performance level around 25% higher than his established level from ages 29 through 34. I'm certainly not saying that I agree with Souder that this kind of profiling should be the sole criterion used to trigger a more stringent testing regime for specific players. However, it seems reasonable that statistics could be included in the set of criteria used to determine whether enhanced testing is warranted (assuming that the general concept of this second level of testing is even accepted). In the case of Bonds, his associations, the suspicions of club officials, his physical appearance, and his performance on the field should have combined to tip the balance in favor of increased scrutiny.
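The contrast between Aaron's isolated peaks and Bonds' run of consecutive peaks suggests one way a statistical criterion could be framed: flag only sustained runs of seasons well above projection, not single spikes. The sketch below assumes a hypothetical threshold (20% above projection) and a hypothetical minimum run length (three seasons); it is an illustration of the idea, not a rule anyone has proposed, and the numbers in the usage note are invented.

```python
def flag_runs(actual: list[float], projected: list[float],
              threshold: float = 0.20, min_run: int = 3) -> list[tuple[int, int]]:
    """Return (first, last) index pairs of runs of at least `min_run` consecutive
    seasons in which actual normalized OPS beats the projection by more than `threshold`."""
    if len(actual) != len(projected):
        raise ValueError("actual and projected must have the same length")
    runs: list[tuple[int, int]] = []
    start = None  # index where the current run of above-projection seasons began
    for i, (a, p) in enumerate(zip(actual, projected)):
        if a > p * (1.0 + threshold):
            if start is None:
                start = i
        elif start is not None:
            # The run just ended; keep it only if it lasted long enough.
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends through the final season.
    if start is not None and len(actual) - start >= min_run:
        runs.append((start, len(actual) - 1))
    return runs
```

With illustrative (not real) numbers and projections of `[130, 128, 126, 124, 122]`, an Aaron-like pattern of `[130, 165, 128, 160, 126]` returns `[]` because each spike is isolated, while a Bonds-like pattern of `[130, 165, 170, 172, 168]` returns `[(1, 4)]`.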

Of course, it's also true, as Schwarz indicates, that deciding which statistics to use would be problematic. Here we're looking at overall productivity, but OPS intertwines a measure of power (usually argued to be the telltale sign of steroid use, though that is more problematic for other substances such as human growth hormone) with a measure of patience. For Bonds, both components increased greatly: his power scared the daylights out of opposing teams to the extent that they would walk or pitch around him any time a runner was on base. There's no reason to believe that would happen as a general rule.
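Since OPS is just on-base percentage plus slugging percentage, a jump in OPS can come from power, patience, or both. One way to pull the two apart, sketched here with the standard formulas for AVG, OBP, and SLG, is to look at isolated power (SLG minus AVG) alongside walk rate. Choosing those two as the power and patience measures is an assumption for illustration, and the plate appearance total is approximated as AB + BB + HBP + SF, which ignores sacrifice bunts and catcher's interference.

```python
def slash_components(ab: int, h: int, doubles: int, triples: int, hr: int,
                     bb: int, hbp: int = 0, sf: int = 0) -> dict[str, float]:
    """Split OPS into a power component (ISO) and a patience component (walk rate)."""
    tb = h + doubles + 2 * triples + 3 * hr  # total bases: singles count 1, HR count 4
    avg = h / ab
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)
    slg = tb / ab
    pa = ab + bb + hbp + sf  # approximation: ignores sacrifice bunts and interference
    return {
        "avg": avg,
        "obp": obp,
        "slg": slg,
        "ops": obp + slg,
        "iso": slg - avg,    # isolated power: extra bases per at-bat
        "bb_rate": bb / pa,  # walks per plate appearance
    }
```

For example, a hypothetical 500 at-bat season with 150 hits, 30 doubles, 40 home runs, and 100 walks gives an OPS of about 1.017, made up of an ISO of .300 and a walk rate of about 16.7%. Subtracting intentional walks would sharpen the patience measure further, since those reflect the opponent's choice rather than the hitter's.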

For Aaron, there were (ostensibly) no other circumstances that raised red flags, and so on the strength of his career path alone that kind of scrutiny wouldn't be warranted. The reason, as Schwarz articulates, is that career paths do indeed vary significantly. For example, consider the case of Carlton Fisk.



Fisk showed a steady decline from his age-26 season through age 34 and then had a resurgent age-35 season in 1983 with the White Sox. After continuing the decline through age 39, he suddenly enjoyed three consecutive seasons at productivity levels he hadn't seen since his mid-20s, albeit in fewer plate appearances.

And then, of course, there are those players about whom there are whispers but no actual evidence, and whose career paths could be interpreted either way. A case in point is Sammy Sosa.



Sosa's rise came a little earlier, starting at age 29 and peaking at age 32, and there are also other explanations for it, including a changed approach at the plate under the tutelage of Jeff Pentland and certainly enhanced weight training (with the use of creatine). All of this makes it more than a little dicey to base enhanced testing on the statistical record alone.

With that said, the case is a little more convincing when looking at Mark McGwire.


Like Bonds, McGwire saw his established level of performance jump at a rather late age (31) and sustained it through age 36 (at age 29 he had just 112 plate appearances, hitting .333/.427/.726). If this kind of increase were coupled with allegations by former teammates and the use of the steroid precursor androstenedione (although it was legal at the time), then it just might rise to the level that Souder is talking about. It should be noted, though, that Jose Canseco did not (as far as I know) finger McGwire or anyone else while McGwire was still active. However, the Mitchell Report makes it clear that both Tony La Russa and Dave McKay (and possibly Sandy Alderson, although he denies it) knew that Canseco was using steroids and did not report it. Had they done so, that should have cast suspicion on McGwire's 1996 and 1997 performances while he was still with the A's.

In the final analysis, while I believe that statistics could be one data point in a much more complex evaluation system, they should not be used blindly in the way Souder seems to be suggesting. Baseball, like other human activities, is simply too dynamic, with too many interacting variables in play, to warrant that kind of simplistic system.

11 comments:

Matt Mitchell said...

Dan,

While the point of your post accurately shows that stats are not a "be-all and end-all" measure, I think the bigger thing to note from Alan's article is how Congress, once again, shows how little it knows about statistics and how willing it is to fling them around without any supporting explanation. This most certainly isn't the first instance of statistical misunderstanding by a politician, and it likely won't be the last one you'll hear about this month.

Anonymous said...

This is my first time visiting the site. I use Firefox, and I have to say that those mouseover popup ads are about the worst I've seen on any site. There was literally no safe space for my mouse on your page; even when it was anywhere near the graphs, something popped up directly over the graph. Anyway, not a baseball comment, but something you may find useful.

Anonymous said...

What are your thoughts on Clemens' production in his 30s and 40s, Dan?

Dan Agonistes said...

Well, that's been the subject of lots of speculation over the last couple weeks as you know. I thought Nate Silver did a nice job of going through the data the other day and I'd have to agree that unlike with Bonds, there is no statistical smoking gun when you look at the time frames mentioned in the Mitchell report.

Anonymous said...

Well, that's the problem: Nate based his entire analysis only on the time frames mentioned in the Mitchell Report.

From the Mitchell Report we had no idea Pettitte used HGH in 2004, or that Knoblauch used more than McNamee claimed, but we later found out that both admitted additional HGH use that had nothing to do with McNamee.

I'm sure if you compare Bonds' career numbers starting in 1999 (age 34) to Clemens' numbers starting in 1997 (age 34), you'll see some similarities.

Bonds' stat line is more stark, but Clemens' stat line is a lot more suspicious than those of McGwire and Sosa, or even Palmeiro. I don't see any strange increase in production in either of their careers in their mid-to-late 30s. Well, maybe McGwire, but Clemens' stat line looks more stark.

Dan Agonistes said...

In reference to the first Anonymous post, I removed the snap.com script. I too was starting to find it annoying. It seems it wasn't quite so bad in the past, and then they put it into overdrive.

Unknown said...

Throw in Randy Johnson as another who got better with age. Perhaps Honus Wagner too, depending on how much you value the Win Shares system.

Anonymous said...

In the Aaron analysis, you neglect to point out what Bill James and others have noted: that Aaron's move to Atlanta-Fulton County Stadium (AFC) as his home park significantly helped his numbers, masking a natural decline.

Dan Agonistes said...

That's a good question. Keep in mind, though, that Normalized OPS takes into account park factors and so it is accounted for.
