Showing posts with label Projections. Show all posts

Sunday, February 10, 2008

Statistical Profiling?

As always, Alan Schwarz has an interesting piece in the New York Times, this time on the topic of using statistics as a trigger for increased testing for performance-enhancing substances. The idea, floated by Representative Mark Souder, an Indiana Republican, is that by comparing a player's actual statistical performance to his own history and to the performance projected on the basis of an "average" or typical career path, Major League Baseball would flag certain players as more likely to be users of performance-enhancing drugs. Those players would be tested more frequently or more closely, I assume, until they passed some criterion under which their new performance level is accepted as legitimate.

Schwarz then goes through the litany of reasons why such "statistical profiling" would likely be futile, ranging from the inherent variability in any particular player's career path, to the problem of what one would measure to catch such anomalies, to the fact that the current evidence stemming from the Mitchell report is inconclusive at best.

In the piece, however, he points to the similarities in the careers of Hank Aaron and Barry Bonds as evidence for why the first point above makes it unrealistic to use career paths as a measure.

Using my unsophisticated system for projecting Normalized OPS (a park- and league-adjusted OPS; the projection is a three-year weighted average, regressed to the mean and adjusted for age and league), here are the two careers mentioned above.






To me, what's interesting about these two is that in the case of Aaron it certainly is true that he had his most productive season at age 37 and his fourth most productive at age 39 with a nice year thrown in at age 35. However, these were interspersed with seasons at ages 36, 38 and 40 that were pretty much what a projection system would indicate. Essentially Aaron had a very slight decline phase with some excellent seasons interspersed.

Bonds, however, had his four best seasons by a large margin at the consecutive ages of 36, 37, 38, and 39. Clearly this indicates that he established a new performance level that was around 25% higher than his established level from ages 29 through 34. I'm certainly not saying that I would agree with Souder that this kind of profiling should be the sole criterion used to trigger a more stringent testing regime for specific players. However, it certainly seems reasonable that statistics could be included in the set of criteria used to determine whether enhanced testing is warranted (assuming that the general concept of this second level is even accepted). In the case of Bonds, his associations, the suspicions of club officials, his physical appearance, and his performance on the field should have combined to tip the balance in favor of increased scrutiny.

Of course it's also true, as Schwarz indicates, that deciding just which statistics to use would be problematic. Here we're looking at overall productivity, but intertwined in OPS are both a measure of power (which is usually argued to be the telltale sign of steroid use but is more problematic when looking at other substances like human growth hormone) and a measure of patience. For Bonds, both components increased greatly as his power scared the daylights out of opposing teams to the extent that they would walk or pitch around him any time a runner was on base. There's no reason to believe that would necessarily be the case as a general rule.

For Aaron there were (ostensibly) no other circumstances that raised red flags and so on the strength of his career path alone that kind of scrutiny wouldn't be warranted. The reason, as Schwarz articulates, is that career paths do indeed vary significantly. For example, consider the case of Carlton Fisk.



Fisk showed a steady decline from his age 26 season through age 34 and then had a resurgent age 35 season in 1983 with the White Sox. After continuing the decline through age 39 he suddenly enjoyed three consecutive seasons at productivity levels he hadn't seen since his mid-20s albeit doing so in fewer plate appearances.

And then of course there are those players about whom there are whispers but no actual evidence coupled with a career path that could be interpreted in both ways. A case in point is Sammy Sosa.



Sosa's rise comes a little earlier, starting at age 29 and maxing out at age 32, and there is also other evidence to weigh, including a changed approach at the plate under the tutelage of Jeff Pentland and certainly enhanced weight training (with the use of creatine), all of which makes it more than a little dicey to base enhanced testing on the statistical record alone.

With that said, the case is a little more convincing when looking at Mark McGwire.


Like Bonds, his established level of performance jumped at a rather late age (31) and was sustained through age 36 (at age 29 he had just 112 plate appearances and hit .333/.427/.726). If this kind of increase were coupled with allegations by former teammates and the use of the steroid precursor androstenedione (although legal at the time), then it just may rise to the level that Souder is talking about. It should be noted, though, that Jose Canseco did not (as far as I know) finger McGwire or anyone else while McGwire was still active, although from the Mitchell report it is clear that both Tony LaRussa and Dave McKay (and possibly Sandy Alderson, although he denies it) knew that Canseco was using steroids and did not report it. Had they done so, it should have cast suspicion on McGwire's 1996 and 1997 performances while still with the A's.

In the final analysis, while I believe that statistics could be one data point in a much more complex evaluation system, they should not be used blindly as Souder seems to be indicating. Baseball, like other human activities, is simply too dynamic and there are too many interacting variables in play to warrant that kind of simplistic system.

Thursday, February 07, 2008

Baseball's Toughest Division

Which division is baseball's toughest?

Well, if you listen to the media you'll no doubt respond that the AL Central is clearly the toughest division in baseball. Having heard that so often in the past couple of months in the wake of the Tigers' deal for Dontrelle Willis and Miguel Cabrera and the Johan Santana trade last week, I decided to take a look based on the actual performance of the divisions in intradivisional play as well as interleague results stretching back to 1997. The end result is discussed this week in my column at Baseball Prospectus.

In the second half of the column I take a look at the simple projection system I created and wrote about several months ago. This time around I have it project into the future and show the top 2008 projections in terms of Normalized and Park Adjusted OPS. From there I take a look at where the projections differ the most as well as the track record, in graphical form, of the projections for Magglio Ordonez, Alex Rodriguez, Andruw Jones, Torii Hunter, Gary Sheffield, and Ken Griffey Jr. Enjoy.

Sunday, November 11, 2007

Projections On the Cheap

As I mentioned previously, in last week's Schrodinger's Bat I took a stab at the biggest booms and busts in baseball history for hitters. Over the weekend I posted a follow-up on BP's Unfiltered that, at the behest of readers, looked at Brady Anderson in 1996, Adrian Beltre in 2004, Magglio Ordonez in 2007, and Juan Gonzalez in 1994.

In writing that post I was looking at projections for individuals and thought I'd post the results for three players: Pete Rose, whose projections were fairly close; Willie Mays, who was in the middle of the pack in terms of the standard deviation of the difference between the actual and the projected; and Rogers Hornsby, who came down on the high side, in part because of his 1926 season, when the projection had him at an NOPS/PF of 151 and he came in at 116.
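For anyone who wants to reproduce that kind of ranking, here is a rough Python sketch of the spread calculation. It is only an illustration under stated assumptions: the function names are hypothetical, and it uses the population standard deviation because the post does not say whether the population or sample version was used.

```python
from statistics import pstdev


def projection_errors(actual, projected):
    """Return actual minus projected for each season, in order."""
    if len(actual) != len(projected):
        raise ValueError("actual and projected must cover the same seasons")
    return [a - p for a, p in zip(actual, projected)]


def error_spread(actual, projected):
    """Standard deviation of the projection errors over a career.

    Assumption: population standard deviation (pstdev); the original
    post does not specify which form was used.
    """
    return pstdev(projection_errors(actual, projected))
```

For Hornsby's 1926 season, `projection_errors([116], [151])` gives `[-35]`, the 35-point miss mentioned above. A player like Rose would show a small `error_spread` over his career and a player like Hornsby a larger one.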

Keep in mind these projections are based on a three-year weighted average, regressed to the mean, and are age and league adjusted (as well as park adjusted).






Thursday, November 08, 2007

Resolving the Past

"The reason people find it so hard to be happy is that they always see the past better than it was, the present worse than it is, and the future less resolved than it will be" - Blaise Pascal

It's that time of year again when baseball minds start to think about next year and indulge in all manner of projections and forecasts. It is of course a perfect complement to the hot stove talk as we imagine what this or that player might do for our favorite team. In my column on Baseball Prospectus today I take a historical look at projections by "projecting the past" using a simple but fairly effective algorithm for generating projections for every player in every season from 1903 through 2006, some 17,000 player seasons in all.

What I was looking for were the biggest booms and busts of all-time; in other words, which players most exceeded a reasonable expectation and which ones went into the tank. Although our (and our favorite front office's) expectations may not always be reasonable, the article takes a couple of different approaches to comparing the projection to the reality and lists a top and bottom ten using each. Hope you enjoy it. The projections use normalized and park-adjusted OPS and are based on a three-year weighted average that is regressed to the mean and adjusted for age and league.
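To give a feel for the shape of that calculation, here is a minimal Python sketch of a three-year weighted average that is regressed to the mean and then adjusted for age. Everything specific in it is an assumption for illustration rather than the values used in the column: the 5/4/3 season weights, the 1,200 plate appearances of league-average performance used for the regression, the peak age of 29, and the per-year aging rates are placeholders, and the names `Season` and `project_nops` are hypothetical. It also takes the league and park adjustments as already baked into the inputs, with 100 meaning league average.

```python
from dataclasses import dataclass


@dataclass
class Season:
    nops: float  # normalized, park-adjusted OPS (100 = league average)
    pa: int      # plate appearances


def project_nops(seasons, age, weights=(5, 4, 3), regress_pa=1200,
                 league_mean=100.0, peak_age=29,
                 young_rate=0.006, old_rate=0.003):
    """Project next season's NOPS from up to three prior seasons.

    `seasons` is ordered most recent first. All default parameter
    values are illustrative assumptions, not the column's settings.
    """
    # Weighted sum of past performance, weighting recent seasons and
    # seasons with more plate appearances more heavily.
    num = sum(w * s.pa * s.nops for w, s in zip(weights, seasons))
    den = sum(w * s.pa for w, s in zip(weights, seasons))

    # Regress to the mean by mixing in `regress_pa` plate appearances
    # of league-average performance.
    regressed = (num + regress_pa * league_mean) / (den + regress_pa)

    # Age adjustment: a boost before the assumed peak age, a decline after.
    rate = young_rate if age < peak_age else old_rate
    return regressed * (1 + (peak_age - age) * rate)
```

A hitter with three straight 600 PA seasons at an NOPS of 150 comes out at about 143 at age 29 under these placeholder settings, which shows the regression pulling an extreme track record back toward 100.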

So surf over and find out who was the biggest bust of all-time (a hint: he hit a few taters in his time) and the biggest boom (hint: my sister's favorite player and not a favorite of Mad Dog's)

Wednesday, February 21, 2007

Projection Systems

Nice article on ESPN by BP alum Jonah Keri on projection systems. I liked this bit:

"With all these tools at their disposal, you might expect the experts to achieve huge success rates, routinely nailing the vast majority of their projections. But various studies, done by industry leaders and outsiders alike, peg the success rate for a typical weighted three-year projection system like Marcel at about 65 percent. The goal for primo projectionists is to eke out a bit more accuracy, for a year-to-year success rate approaching 70 percent. A perfect projection system, or even something close to it, is widely considered to be impossible -- at least until stat-generating robots replace human beings at Yankee Stadium."
That's a testament to the inherent variability in the game and, in the end, what draws us to it.