Thursday, November 10, 2005

Searching for Significance

My article looking at matchup data for 2003 through 2005 is up on THT today. It examines the 30,481 batter/pitcher matchups from that period to see which ones are statistically significant and to make some general observations about matchups.

Although it didn't come out prominently enough in the article, Lawrence Weintraub (who is in astrophysics at Caltech and has been up each night waiting for the clouds to part) and John Walsh really helped me work through the issues surrounding the results I was getting, so I'd like to thank them more publicly. John will also have some very interesting articles published on THT in the coming days.

1 comment:

Aweb said...

I liked the article, but I just want to mention that you aren't (statistically) allowed to run this many tests at the 95% significance level at once. Basically, each test has a 5% random chance of leading you to the wrong conclusion when there is no real effect, and when you do thousands of tests, you expect to find extreme examples among them. For instance, with the Anderson/Anderson matchup, you expect to find results that extreme or more so about 1 in 10,000 times. Given that you did 30,000 tests, you might expect one like this anyway.
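To put rough numbers on that point, here is a minimal Python sketch of the arithmetic. It assumes the 30,481 tests are independent and that every one of them could actually reach the stated p-values, which is a simplification (many matchups have too few at-bats). The Bonferroni cutoff is one common correction and is offered only as an illustration, not as the method used in the article.

```python
# Back-of-the-envelope multiple-testing arithmetic.
# Assumption: tests are independent and each can reach the stated p-values.
n_tests = 30_481    # matchups examined in the article
alpha = 0.05        # per-test significance level
p_extreme = 1e-4    # "about 1 in 10,000" from the comment

# Expected number of matchups flagged at the 5% level by chance alone.
expected_false_positives = n_tests * alpha

# Expected number of results at least as extreme as the 1-in-10,000 case.
expected_extreme = n_tests * p_extreme

# Chance of seeing at least one such result somewhere in the data set.
prob_at_least_one = 1 - (1 - p_extreme) ** n_tests

# Bonferroni-style per-test cutoff that keeps the overall error rate near 5%.
bonferroni_alpha = alpha / n_tests

print(f"expected false positives at 5%: {expected_false_positives:.0f}")
print(f"expected 1-in-10,000 results:   {expected_extreme:.2f}")
print(f"P(at least one such result):    {prob_at_least_one:.3f}")
print(f"Bonferroni per-test cutoff:     {bonferroni_alpha:.2e}")
```

Under these assumptions, roughly 1,500 matchups would clear the 5% bar by chance, and a handful of 1-in-10,000 results would be unsurprising.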

It would be possible to determine how many of the matchups you tested could even produce a p-value this low, as you demonstrated in the 5 AB example. For instance, if only 1,500 matchups could achieve a p-value this low, the extremeness of the result is more noteworthy than if 15,000 could. This context for the tests is important when drawing conclusions.
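One way that count could be sketched in Python: for each matchup, compute the smallest p-value it could possibly produce (all hits or no hits in its at-bats) and see whether that clears the threshold. Everything here is an assumption for illustration: the simple binomial model, the 0.265 expected average, the function names, and the made-up list of at-bat counts. It is not the article's data or method.

```python
def min_achievable_p(at_bats: int, expected_avg: float) -> float:
    """Smallest one-sided p-value a matchup can produce under a binomial
    model: the batter either gets a hit every time or never gets one."""
    all_hits = expected_avg ** at_bats
    no_hits = (1.0 - expected_avg) ** at_bats
    return min(all_hits, no_hits)


def count_capable(at_bats_per_matchup, expected_avg, threshold):
    """Count matchups whose most extreme outcome reaches the threshold."""
    return sum(
        1
        for ab in at_bats_per_matchup
        if min_achievable_p(ab, expected_avg) <= threshold
    )


# Hypothetical at-bat counts for six matchups (not real data).
sample_at_bats = [3, 5, 7, 12, 30, 45]
EXPECTED_AVG = 0.265   # assumed expected batting average
THRESHOLD = 1e-4       # "about 1 in 10,000"

capable = count_capable(sample_at_bats, EXPECTED_AVG, THRESHOLD)
print(f"{capable} of {len(sample_at_bats)} matchups could reach p <= {THRESHOLD}")
```

In this toy list the 3 AB and 5 AB matchups can never reach the threshold, so only the remaining four should be counted as tests when judging how surprising a 1-in-10,000 result is.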

Again though, I liked the article. I agree that it would be better to do this with a measure other than BA. I remember Carlos Delgado hitting, I think, 5 straight HRs off Sosa (on the Braves this year). That sort of dominance is missed by using BA.