
Tuesday, November 15, 2005

Fishing Expeditions

I'd like to thank everyone who gave me feedback on my article on matchups at THT, and I thought I'd take this opportunity to answer the two most frequently asked questions.

First, several readers pointed out that since I tested over 30,000 outcomes, I should expect a certain percentage of them to fall in the statistically significant range by chance alone. In other words, if some low-probability outcomes were expected anyway, how can I treat any of them as evidence that the model doesn't hold? For example, how can I conclude that Brian Anderson has some ability to get Garret Anderson out that the model doesn't capture?

I admit that I didn't catch the issue behind the question immediately, but the questioners make an excellent point. Because I was on what statisticians call a "fishing expedition," I was likely to find some improbable results. As a result, we can't conclude that the low p-value matchups are necessarily evidence of a hitter's ability to mash a particular pitcher, or of a pitcher's ability to flummox a particular hitter. However, I would say that these low p-value matchups are more likely to be the ones where the model doesn't hold, so a judicious manager could rightly use that data to make pinch-hitting decisions.
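The readers' point can be given a rough number. Suppose the model held for every matchup and each matchup were tested at the .05 level. In that case, something on the order of 5% of the 30,481 matchups would look "significant" by chance alone. The sketch below is only back-of-the-envelope arithmetic, and the variable names are illustrative. It also treats the tests as independent, which they are not, since many matchups share a hitter or a pitcher.

```python
ALPHA = 0.05      # conventional significance level
N_TESTS = 30_481  # number of matchups tested

# Rough count of matchups expected to fall under ALPHA by chance alone
# if the model held everywhere (the exact rate depends on how each
# p-value is computed).
expected_false_positives = ALPHA * N_TESTS

# Chance that at least one matchup falls under ALPHA, assuming the tests
# are independent (an assumption: matchups sharing a hitter or pitcher are not).
family_wise_error = 1 - (1 - ALPHA) ** N_TESTS

print(round(expected_false_positives))  # 1524
print(family_wise_error)                # 1.0
```

With that many tests, at least one "significant" matchup is a near certainty even if the model is exactly right. That is why a stricter standard is needed.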

When statisticians go on a fishing expedition like this, they often use a stricter standard of proof than the typical p-value threshold of .05. One way to raise the bar is to apply the Bonferroni correction. This simple technique says that if we are testing n outcomes instead of a single outcome, we divide our alpha level by n. So instead of looking at .05, we would look at .05/30,481, or about .0000016. That is a really, really small threshold, and none of the 30,481 matchups fell under it. In other words, none would be significant under this most conservative correction. In the battle of the Andersons, Garret would have had to go 0-33 against Brian to reach this level.
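The Bonferroni arithmetic above can be checked with a few lines of Python. This is a sketch that assumes each at-bat is an independent trial at the model's expected average. For Garret Anderson against Brian Anderson, that expected average is .335, as listed in the table of matchups. Under that assumption, the chance of going 0-for-n is (1 - .335)^n. The function names are illustrative, not the author's.

```python
ALPHA = 0.05      # conventional significance level
N_TESTS = 30_481  # matchups tested

def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold under the Bonferroni correction: alpha / n."""
    return alpha / n_tests

def hitless_at_bats_needed(expected_avg: float, threshold: float) -> int:
    """Smallest n for which going 0-for-n, with probability
    (1 - expected_avg) ** n, falls below the threshold.
    Assumes independent at-bats."""
    n = 1
    while (1 - expected_avg) ** n >= threshold:
        n += 1
    return n

threshold = bonferroni_threshold(ALPHA, N_TESTS)
print(f"{threshold:.2e}")                        # 1.64e-06
print(hitless_at_bats_needed(0.335, threshold))  # 33
```

Under the same assumption, the looser .01 level would take only 12 hitless at-bats.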

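A more liberal alternative to the full Bonferroni correction is to lower the threshold only to .01. When that is done, 133 matchups fall under it. Here they are, sorted by p-value. In the table:

- AB and H are the at-bats and hits in the matchup.
- Avg is the hitter's average in the matchup.
- HitAvg is the hitter's overall average.
- PitchAvg is the average allowed by the pitcher.
- ExAvg is the average the model expects for the matchup.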

Hitter             Pitcher            AB   H     Avg  HitAvg PitchAvg  ExAvg    p-value
Larry Bigbie       Andy Pettitte      14  11   0.786   0.276    0.247  0.256   0.000051
Garret Anderson    Brian Anderson     22   0   0.000   0.300    0.299  0.335   0.000127
Michael Young      Brandon Backe      10   9   0.900   0.317    0.267  0.318   0.000235
Bill Mueller       Mike Mussina       23   0   0.000   0.303    0.264  0.301   0.000269
Marcus Giles       Jason Schmidt      14  10   0.714   0.305    0.214  0.248   0.000318
Preston Wilson     Jae Seo             6   6   1.000   0.268    0.270  0.271   0.000395
Preston Wilson     Byung-Hyun Kim     10   8   0.800   0.268    0.253  0.254   0.000465
Enrique Wilson     Pedro Martinez     13   8   0.615   0.214    0.219  0.174   0.000465
Jose Reyes         Jon Lieber         13  10   0.769   0.277    0.280  0.292   0.000505
Mark Grudzielanek  Tim Hudson          6   6   1.000   0.304    0.250  0.286   0.000547
Derrek Lee         Mark Mulder        15  11   0.733   0.295    0.266  0.294   0.000558
Matt Holliday      Woody Williams      6   6   1.000   0.299    0.263  0.296   0.000671
Aubrey Huff        Jon Lieber         13  10   0.769   0.290    0.280  0.305   0.000756
Todd Helton        Damian Moss         7   7   1.000   0.343    0.288  0.367   0.000897
Clint Barmes       Odalis Perez       12   9   0.750   0.289    0.259  0.282   0.001022
Reggie Sanders     David Weathers      5   5   1.000   0.272    0.260  0.266   0.001327
David Bell         Gary Majewski       7   6   0.857   0.253    0.264  0.251   0.001365
Charles Johnson    Jake Peavy          6   5   0.833   0.230    0.230  0.197   0.001499
Rondell White      Jake Westbrook     19   0   0.000   0.289    0.265  0.288   0.001588
David Dellucci     Kevin Brown        14   9   0.643   0.242    0.265  0.241   0.001607
Mark Kotsay        Jamie Moyer        21  13   0.619   0.288    0.267  0.288   0.001621
Alfonso Soriano    John Lackey        26   1   0.038   0.280    0.271  0.285   0.001876
Adrian Beltre      Dontrelle Willi     9   7   0.778   0.277    0.254  0.264   0.001923
Brad Wilkerson     Mike Matthews       7   6   0.857   0.257    0.278  0.268   0.001981
Matt LeCroy        Nate Robertson     15  10   0.667   0.273    0.274  0.280   0.002075
Mark Sweeney       Adam Eaton         11   8   0.727   0.277    0.261  0.271   0.002131
Jermaine Dye       Jarrod Washburn    24  13   0.542   0.253    0.266  0.252   0.002278
Aaron Rowand       Tim Wakefield       9   7   0.778   0.288    0.251  0.272   0.002322
David Ortiz        Bartolo Colon      18   0   0.000   0.297    0.255  0.285   0.002385
Jeff Cirillo       Javier Vazquez      6   5   0.833   0.234    0.250  0.218   0.002427
Kevin Millar       Jorge Sosa          9   7   0.778   0.282    0.259  0.274   0.002432
Rocco Baldelli     Jake Westbrook     13   9   0.692   0.285    0.265  0.283   0.002604
Carlos Lee         Jeff Suppan        17   0   0.000   0.287    0.271  0.291   0.002875
Frank Catalanotto  Mike Timlin         7   6   0.857   0.298    0.257  0.288   0.003029
Frank Catalanotto  Dan Wright          5   5   1.000   0.298    0.285  0.318   0.003231
Hideki Matsui      Aaron Sele         14   0   0.000   0.297    0.303  0.336   0.003269
Mike Lowell        Rheal Cormier       6   5   0.833   0.270    0.230  0.234   0.003355
Omar Vizquel       Javier Vazquez     12   8   0.667   0.274    0.250  0.257   0.003376
Bobby Abreu        Mike Hampton       28   2   0.071   0.296    0.273  0.303   0.003453
Ivan Rodriguez     Jon Garland        16   0   0.000   0.303    0.262  0.298   0.003507
Chone Figgins      Esteban Loaiza     11   8   0.727   0.293    0.265  0.292   0.003530
Shane Halter       Eddie Guardado      5   4   0.800   0.213    0.215  0.169   0.003561
Carl Crawford      Mark Buehrle        7   6   0.857   0.293    0.271  0.297   0.003578
Miguel Cabrera     Steve Trachsel     13   9   0.692   0.300    0.263  0.296   0.003686
Orlando Cabrera    Jae-Weong Seo      16  10   0.625   0.274    0.270  0.277   0.003758
Tony Graffanino    Brian Anderson     21   1   0.048   0.281    0.299  0.315   0.003822
Aubrey Huff        Bronson Arroyo     16  10   0.625   0.290    0.254  0.277   0.003826
Frank Catalanotto  Ryan Franklin       7   6   0.857   0.298    0.272  0.304   0.004070
Edgar Renteria     Sean Burnett        5   5   1.000   0.297    0.301  0.334   0.004139
Brian Buchanan     Jason Schmidt       7   5   0.714   0.244    0.214  0.195   0.004176
Edgar Renteria     Gustavo Chacin      9   7   0.778   0.297    0.268  0.299   0.004218
Hideki Matsui      John Parrish       12   8   0.667   0.297    0.238  0.266   0.004280
Kevin Mench        Ryan Franklin      16  10   0.625   0.276    0.272  0.281   0.004284
D'Angelo Jimenez   Brian Anderson      9   7   0.778   0.268    0.299  0.300   0.004306
Jason Bay          Jeff Francis        5   5   1.000   0.295    0.307  0.338   0.004415
Rafael Palmeiro    Orlando Hernand     8   6   0.750   0.261    0.258  0.252   0.004427
Khalil Greene      Dustin Hermanso     6   5   0.833   0.259    0.256  0.249   0.004512
Lew Ford           Rafael Betancou     8   6   0.750   0.285    0.236  0.253   0.004515
Edgar Renteria     Brian Lawrence      9   7   0.778   0.297    0.272  0.304   0.004637
Jack Wilson        Kazuhisa Ishii      8   6   0.750   0.275    0.246  0.254   0.004648
Craig Monroe       Brad Radke         21  12   0.571   0.271    0.276  0.280   0.004808
Eric Chavez        Roy Halladay        8   6   0.750   0.275    0.248  0.256   0.004840
Michael Tucker     Derek Lowe         10   7   0.700   0.254    0.276  0.264   0.004894
Jose Hernandez     Randy Johnson      12   7   0.583   0.241    0.232  0.208   0.004994
Adrian Beltre      Ismael Valdez       7   6   0.857   0.277    0.305  0.317   0.005174
Jonny Gomes        Bartolo Colon       6   5   0.833   0.268    0.255  0.257   0.005274
Mark Loretta       Kirk Rueter        27   3   0.111   0.314    0.298  0.349   0.005321
Brian Roberts      Mike Mussina       26  14   0.538   0.286    0.264  0.283   0.005446
Miguel Tejada      Justin Miller       5   5   1.000   0.298    0.319  0.354   0.005562
Adrian Beltre      Javier Vazquez      6   5   0.833   0.277    0.250  0.260   0.005580
Brad Hawpe         Duaner Sanchez      6   5   0.833   0.259    0.268  0.260   0.005633
Reed Johnson       Johan Santana      12   7   0.583   0.277    0.205  0.214   0.005819
Doug Glanville     Ramon Ortiz         6   5   0.833   0.240    0.291  0.263   0.005865
Geoff Blum         Jeff Suppan        11   7   0.636   0.237    0.271  0.242   0.006177
Tony Clark         Jeff Weaver         6   5   0.833   0.258    0.276  0.267   0.006338
Jimmy Rollins      Salomon Torres      6   5   0.833   0.281    0.254  0.268   0.006427
Jose Guillen       Pedro Martinez     11   7   0.636   0.295    0.219  0.245   0.006643
Jorge Posada       Curt Schilling     13   8   0.615   0.271    0.252  0.256   0.006658
Travis Hafner      Bartolo Colon      15   0   0.000   0.295    0.255  0.284   0.006699
Kevin Millar       Rick Bauer          6   5   0.833   0.282    0.256  0.271   0.006768
Eric Byrnes        John Halama         6   5   0.833   0.260    0.280  0.274   0.007111
Marlon Byrd        Greg Maddux         8   6   0.750   0.271    0.271  0.275   0.007151
Keith Ginter       Juan Cruz           9   6   0.667   0.244    0.257  0.235   0.007267
Pat Burrell        Horacio Ramirez    16   9   0.563   0.249    0.266  0.249   0.007326
Damion Easley      Kevin Millwood      7   5   0.714   0.229    0.257  0.221   0.007326
Eric Chavez        Horacio Ramirez     6   5   0.833   0.275    0.266  0.275   0.007328
Carlos Delgado     Jorge Julio         6   5   0.833   0.292    0.251  0.276   0.007364
Eddie Perez        Randy Johnson       7   5   0.714   0.254    0.232  0.221   0.007378
Gregg Zaun         Jon Garland        11   7   0.636   0.254    0.262  0.249   0.007396
Aramis Ramirez     Garrett Stephen    10   7   0.700   0.296    0.254  0.283   0.007423
Cody McKay         Dan Miceli          5   4   0.800   0.230    0.240  0.206   0.007520
Ichiro Suzuki      Mark Buehrle       19  12   0.632   0.330    0.271  0.334   0.007638
Aramis Ramirez     Tony Armas          6   5   0.833   0.296    0.249  0.278   0.007644
Alex Rodriguez     Jamie Moyer        23  13   0.565   0.302    0.267  0.302   0.007645
A.J. Pierzynski    Mike Mussina        6   5   0.833   0.281    0.264  0.278   0.007692
Alfonso Soriano    Erik Bedard         6   5   0.833   0.280    0.265  0.278   0.007706
Abraham Nunez      Josh Fogg           6   5   0.833   0.256    0.289  0.278   0.007711
Scott Rolen        Brett Myers        10   7   0.700   0.289    0.262  0.285   0.007741
Eric Chavez        Jeff Weaver        10   7   0.700   0.275    0.276  0.285   0.007749
Luis Gonzalez      Jim Brower          8   6   0.750   0.280    0.266  0.280   0.007762
Corey Koskie       Adam Bernero        6   5   0.833   0.266    0.280  0.279   0.007822
Adam Kennedy       Esteban Loaiza      8   6   0.750   0.282    0.265  0.280   0.007877
Edgar Renteria     Rodrigo Lopez      13   0   0.000   0.297    0.279  0.311   0.007885
Rob Mackowiak      Chris Carpenter    20  10   0.500   0.261    0.237  0.231   0.007902
Carlos Lee         Brian Anderson     33   4   0.121   0.287    0.299  0.320   0.007963
Adrian Beltre      Jae-Weong Seo       6   5   0.833   0.277    0.270  0.280   0.007975
Paul Konerko       John Halama         6   5   0.833   0.267    0.280  0.281   0.008061
Antonio Perez      Jason Schmidt       7   5   0.714   0.280    0.214  0.226   0.008093
Mark Bellhorn      Jose Contreras      5   4   0.800   0.239    0.236  0.210   0.008115
Russell Branyan    Kip Wells           7   5   0.714   0.237    0.255  0.226   0.008180
Rod Barajas        Bartolo Colon      18   0   0.000   0.244    0.255  0.234   0.008328
Jim Edmonds        Jason Jennings     13   0   0.000   0.280    0.293  0.308   0.008382
David Newhan       Jon Lieber          6   5   0.833   0.271    0.280  0.285   0.008564
Craig Counsell     Scott Linebrink     5   4   0.800   0.246    0.232  0.214   0.008624
Carlos Zambrano    Chris Carpenter     7   5   0.714   0.258    0.237  0.229   0.008638
Geoff Jenkins      Tim Redding        14   9   0.643   0.283    0.285  0.302   0.008699
Miguel Olivo       Ted Lilly           5   4   0.800   0.229    0.249  0.214   0.008702
Xavier Nady        Kerry Wood          5   4   0.800   0.262    0.219  0.215   0.008885
Alfonso Soriano    Kevin Brown        15   9   0.600   0.280    0.265  0.278   0.008916
Pat Burrell        Aaron Cook          8   6   0.750   0.249    0.306  0.287   0.008929
Royce Clayton      Rick White          6   5   0.833   0.260    0.294  0.288   0.009009
Fernando Vina      Matt Clement        5   4   0.800   0.243    0.239  0.217   0.009213
J.T. Snow          Kevin Brown         6   5   0.833   0.291    0.265  0.289   0.009247
Kevin Millar       Scot Shields        9   6   0.667   0.282    0.232  0.246   0.009283
Juan Pierre        Roy Oswalt         10   7   0.700   0.303    0.258  0.294   0.009315
Richard Hidalgo    Mike Mussina       11   7   0.636   0.262    0.264  0.259   0.009398
Ramon Martinez     Scott Linebrink     7   5   0.714   0.268    0.232  0.233   0.009438
Miguel Cabrera     Adam Eaton         10   7   0.700   0.300    0.261  0.294   0.009463
Casey Blake        Wade Miller         7   5   0.714   0.257    0.244  0.235   0.009794
Brad Ausmus        Mike Remlinger      5   4   0.800   0.244    0.242  0.221   0.009841
Phil Nevin         Darren Oliver       6   5   0.833   0.270    0.290  0.293   0.009866
John Mabry         Brandon Webb        7   5   0.714   0.258    0.244  0.236   0.009882
Mark Teixeira      Aaron Sele         12   0   0.000   0.282    0.303  0.319   0.009886
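The post doesn't spell out how the ExAvg and p-value columns are computed. The following is a sketch under stated assumptions, not the author's documented method.

ExAvg appears consistent with the odds-ratio ("log5") method using a league average of about .266. For example, that reproduces the .335 for Garret Anderson against Brian Anderson and the .174 for Enrique Wilson against Pedro Martinez.

The p-values look like one-sided binomial tail probabilities that treat each at-bat as an independent trial at the expected average. Larry Bigbie's 11-for-14 at .256 gives about .000051.

The league average, the function names, and the choice of tail are all assumptions. Small mismatches with the table come from the rounded inputs; for instance, 0-for-22 at .335 gives about .000126 rather than .000127.

```python
import math

# Assumed league batting average; the post doesn't state one, but about .266
# reproduces the ExAvg column for the rows checked.
LEAGUE_AVG = 0.266

def odds(p: float) -> float:
    return p / (1 - p)

def expected_avg(hit_avg: float, pitch_avg: float, league_avg: float = LEAGUE_AVG) -> float:
    """Odds-ratio ("log5") estimate of the hitter's average against this pitcher."""
    o = odds(hit_avg) * odds(pitch_avg) / odds(league_avg)
    return o / (1 + o)

def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k hits in n independent at-bats."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def tail_p_value(hits: int, at_bats: int, exp_avg: float) -> float:
    """One-sided binomial p-value in the direction of the observed deviation:
    the chance of at least this many hits (if at or above expectation)
    or at most this many hits (if below)."""
    if hits >= at_bats * exp_avg:
        ks = range(hits, at_bats + 1)
    else:
        ks = range(0, hits + 1)
    return sum(binom_pmf(k, at_bats, exp_avg) for k in ks)

# Garret Anderson vs. Brian Anderson: .300 hitter, .299 allowed by the pitcher
print(round(expected_avg(0.300, 0.299), 3))  # 0.335
# Larry Bigbie vs. Andy Pettitte: 11-for-14 with an expected .256
print(f"{tail_p_value(11, 14, 0.256):.6f}")  # 0.000051
```

Choosing the tail from the observed direction lets one function cover both the hot matchups (like 11-for-14) and the cold ones (like 0-for-22).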

Once again, we can never say for sure that these matchups are statistically significant (i.e., that they reflect something the model doesn't capture), but they are much more likely to be so.

The second question was about why I thought certain hitters did well against certain pitchers and vice versa. This question sprang from the realization that Brian Anderson made the article's top-25 list of lowest-hit matchups three times, once each against Garret Anderson, Tony Graffanino, and Carlos Lee. At first glance one might think this had something to do with platoon effects, but of course neither Lee nor Graffanino bats left-handed. And in any case, Anderson doesn't have large split differences (.282/.325/.484 against lefties over the last three years and .304/.341/.510 against righties).

If these matchups are indeed significant, it tells me that Anderson likely has something in his delivery that gives certain hitters trouble. For example, his arm angle may be difficult for some hitters to pick up. Or there may be something in his repertoire that is hard to handle for hitters who hit certain pitches well. I looked at all of Anderson's matchups (116 or so) and couldn't discern much of a pattern. So I'm still somewhat at a loss to explain it, if indeed it isn't simply randomness.

If anyone has better ideas I'm all ears.
