Monday, December 31, 2007

SFR in the Infield - AL East

Today we'll look at the AL East infielders in terms of SFR...


Name Team POS Balls ExRunners Runners SFR
Steve Trachsel BAL 1 32 3 0 1.9
Jeremy Guthrie BAL 1 22 2 0 1.8
Erik Bedard BAL 1 17 2 0 1.4
Chad Bradford BAL 1 17 1 1 0.3
Daniel Cabrera BAL 1 18 2 5 -2.3
Ramon Hernandez BAL 2 26 2 2 0.1
Chris Gomez BAL 3 32 5 4 0.4
Aubrey Huff BAL 3 83 13 15 -2.0
Kevin Millar BAL 3 207 32 35 -2.7
Brian Roberts BAL 4 644 165 153 8.6
Brandon Fahey BAL 4 17 4 2 1.8
Chris Gomez BAL 4 25 7 6 0.6
Melvin Mora BAL 5 404 89 85 2.6
Chris Gomez BAL 5 70 15 15 -0.1
Aubrey Huff BAL 5 43 9 11 -1.5
Scott Moore BAL 5 29 6 11 -3.6
Luis Hernandez BAL 6 71 19 12 5.8
Miguel Tejada BAL 6 507 146 142 4.3
Chris Gomez BAL 6 42 12 11 0.9
Brandon Fahey BAL 6 38 11 9 0.8
Freddie Bynum BAL 6 30 7 7 -0.5
-----------------------------------------------------------------------
Daisuke Matsuzaka BOS 1 25 2 1 0.9
Tim Wakefield BOS 1 26 2 2 0.3
Josh Beckett BOS 1 15 1 3 -1.2
Jason Varitek BOS 2 35 1 3 -1.2
Kevin Youkilis BOS 3 286 42 33 6.6
Eric Hinske BOS 3 71 11 10 0.3
Dustin Pedroia BOS 4 537 127 118 6.8
Alex Cora BOS 4 134 31 27 3.6
Mike Lowell BOS 5 419 86 75 8.3
Kevin Youkilis BOS 5 42 10 9 0.9
Alex Cora BOS 6 89 25 23 2.5
Julio Lugo BOS 6 582 156 151 0.1
-----------------------------------------------------------------------
Mike Mussina NYA 1 29 3 2 0.9
Roger Clemens NYA 1 16 2 1 0.5
Chien-Ming Wang NYA 1 38 4 4 -0.3
Andy Pettitte NYA 1 30 3 4 -0.8
Jorge Posada NYA 2 55 5 5 -0.1
Andy Phillips NYA 3 106 16 13 2.3
Doug Mientkiewicz NYA 3 125 17 15 1.7
Jason Giambi NYA 3 30 5 4 0.9
Wilson Betemit NYA 3 17 3 3 0.1
Miguel Cairo NYA 3 45 7 8 -0.3
Josh Phelps NYA 3 37 5 8 -2.4
Robinson Cano NYA 4 686 167 154 9.2
Wilson Betemit NYA 5 19 5 4 0.2
Alex Rodriguez NYA 5 433 90 89 0.1
Miguel Cairo NYA 6 34 10 8 1.1
Alberto Gonzalez NYA 6 23 5 5 0.0
Wilson Betemit NYA 6 18 4 5 -0.7
Derek Jeter NYA 6 618 172 197 -20.8
-----------------------------------------------------------------------
Casey Fossum TBA 1 17 2 1 0.9
James Shields TBA 1 30 3 3 -0.1
Andrew Sonnanstine TBA 1 17 2 3 -0.9
Scott Kazmir TBA 1 22 3 6 -2.3
Dioner Navarro TBA 2 29 2 3 -0.6
Carlos Pena TBA 3 321 46 43 1.9
Ty Wigginton TBA 3 32 5 8 -2.8
B.J. Upton TBA 4 223 51 49 1.9
Jorge Velandia TBA 4 43 11 10 0.5
Josh Wilson TBA 4 90 23 22 0.2
Ty Wigginton TBA 4 140 35 36 -0.7
Brendan Harris TBA 4 154 37 43 -3.8
Akinori Iwamura TBA 5 331 71 71 0.2
Josh Wilson TBA 5 28 7 7 -0.1
Ty Wigginton TBA 5 89 17 19 -1.6
Jorge Velandia TBA 6 17 5 4 1.5
Ben Zobrist TBA 6 112 31 41 -7.7
Josh Wilson TBA 6 190 53 67 -10.6
Brendan Harris TBA 6 356 101 121 -11.3
-----------------------------------------------------------------------
Shaun Marcum TOR 1 27 4 0 2.7
Jesse Litsch TOR 1 24 3 1 1.3
Dustin McGowan TOR 1 32 2 1 1.1
A.J. Burnett TOR 1 16 2 1 0.4
Josh Towers TOR 1 23 2 2 -0.1
Roy Halladay TOR 1 42 5 6 -1.0
Scott Downs TOR 1 17 2 4 -1.8
Gregg Zaun TOR 2 36 3 3 -0.4
Lyle Overbay TOR 3 253 39 31 6.4
Matt Stairs TOR 3 103 15 16 -0.7
Curtis Thigpen TOR 3 24 4 5 -0.8
Aaron Hill TOR 4 735 190 158 23.5
John McDonald TOR 5 42 9 5 3.2
Jason Smith TOR 5 37 7 5 1.8
Howie Clark TOR 5 25 6 5 0.1
Troy Glaus TOR 5 305 66 66 0.0
Hector Luna TOR 5 35 8 8 -0.1
Russ Adams TOR 5 26 6 11 -3.9
John McDonald TOR 6 406 111 94 11.3
Ray Olmedo TOR 6 69 20 15 4.4
Royce Clayton TOR 6 266 74 69 2.9
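
As a reading aid for the table above: the gap between expected and actual runners is the number of runners a fielder saved (or cost), and SFR expresses that gap in runs. The sketch below recomputes the gap for three rows copied from the table. The runs-per-runner figure it prints is only what those rows imply, not a constant stated in this post, and since the expected-runner column is rounded, small samples will look noisy.

```python
# Rows copied from the AL East table:
# (name, position, expected runners, actual runners, SFR)
rows = [
    ("Aaron Hill", 4, 190, 158, 23.5),
    ("John McDonald", 6, 111, 94, 11.3),
    ("Derek Jeter", 6, 172, 197, -20.8),
]


def runners_saved(expected, actual):
    """Positive when fewer runners reached than expected for those balls."""
    return expected - actual


for name, pos, expected, actual, sfr in rows:
    saved = runners_saved(expected, actual)
    # Implied run value per runner for this row only
    # (an inference from the table, not a published constant).
    implied = sfr / saved
    print(f"{name:14s} POS {pos}: saved {saved:+d} runners, "
          f"SFR {sfr:+.1f}, implied {implied:.2f} runs per runner")
```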

Sunday, December 30, 2007

Score One for Raines

Looks like Tim Raines just gained an additional vote for the Hall (and you can read about his case here). Two points I don't understand, though. First, in discussing the qualifications of Jim Rice, no mention is made of his park context, which figured greatly into his peak numbers from 1975 through 1986. In fact, when you look at OPS normalized by league and park (which is a good proxy for overall production) for players with over 1,000 plate appearances in the same period, Rice ranks 28th on the list, behind lots of guys who are clearly not Hall of Famers (he ranks 7th when not taking park into account).


Rank Name G PA OPS NOPS/PF
1 Mike Schmidt 1800 7657 931 131
2 Pedro Guerrero 825 3213 889 129
3 Don Mattingly 572 2449 913 126
4 Reggie Smith 830 3188 884 126
5 D. Strawberry 516 2107 863 125
6 George Brett 1595 6952 901 124
7 Willie Stargell 731 2547 873 123
8 Eddie Murray 1499 6415 879 122
9 Gene Tenace 1036 3642 831 122
10 Dave Winfield 1763 7482 841 121
11 Jack Clark 1235 5111 839 121
12 George Foster 1637 6763 838 120
13 Keith Hernandez 1707 7063 837 120
14 Oscar Gamble 1058 3404 852 120
15 Wade Boggs 725 3243 897 120
16 Bob Watson 1079 3993 825 119
17 Dave Parker 1652 6939 846 119
18 Greg Luzinski 1389 5766 855 119
19 Joe Morgan 1303 5390 829 119
20 Tim Raines 876 3888 823 119
21 Bob Horner 960 3966 846 118
22 Dale Murphy 1360 5689 847 118
23 Glenn Davis 276 1112 815 118
24 Ken Phelps 371 1121 865 118
25 Ken Singleton 1446 6071 834 118
26 Larry Hisle 597 2487 839 118
27 Fred Lynn 1522 6373 864 117
28 Jim Rice 1766 7754 875 117
29 Jose Cruz 1744 7043 793 117
30 Reggie Jackson 1631 6655 841 117
31 Rod Carew 1440 6219 838 117
32 Chris Brown 270 1040 788 116
33 Gary Carter 1680 6871 806 116
34 Leon Durham 862 3415 835 116
35 Rick Monday 918 2973 813 116
36 Rickey Henderson 1087 4843 828 116
37 Ron Cey 1704 6901 811 116
38 Tony Gwynn 612 2590 804 116
39 Bill Madlock 1549 6333 807 115
40 Andre Dawson 1443 6138 801 115
41 Cal Ripken 830 3562 833 115
42 Johnny Bench 1064 4082 811 115
43 Kirk Gibson 765 3104 831 115
44 Al Oliver 1496 6123 805 114
45 Alvin Davis 442 1917 838 114
46 Bobby Bonds 835 3480 811 114
47 Mike Marshall 575 2122 783 114
48 Rico Carty 674 2690 810 114
49 Bobby Grich 1516 6146 801 113
50 Andre Thornton 1405 5797 821 113


Second, there was this nugget from Peter Gammons:

Raines, Rickey Henderson and Wade Boggs were the best of the '80s and early '90s, and while some of our sabermetric fellows do not believe players are humans, Raines made every team he was on better, not just because he was such a good player, but because his effervescent personality made teammates relax and play better; you'd go out to the cage and players would all be following him around.
Why go to the trouble to paint sabermetric analysts and hence analysis in that negative light? That's the second time Gammons has done this recently. What gives?

Also saw this from Tracy Ringolsby in an otherwise good interview about the BBWAA:

Rich: Speaking of which, who are you voting for this year?

Tracy: Alphabetically, Bert Blyleven, Dave Concepcion, Rich Gossage, Jack Morris, Lee Smith and Alan Trammell. The biggest debates for me were Tim Raines, who obviously was overshadowed by Rickey Henderson, but also if you take Vince Coleman's five top years, I would say he outperformed Raines, too, and I don't see Coleman as a Hall of Famer.
Really? As the comments to the post note there is really no comparison between Raines and Coleman. If you want to compare just baserunning then Coleman had three stellar seasons in 1985-1987 that netted him over ten runs per season on the bases but Raines had five such seasons spread out over a longer period of time (1982-1992) and topped +7 runs three more times. In a previous post Chone asks me to post the career numbers for Raines, which I will do either here or on Baseball Prospectus.

Friday, December 28, 2007

The Bottom of the Pile

After my post the other day on the "Daily Double" for the Cubs in 1984 I got to thinking about who might be the worst baserunner of the last 40 or so years. I've run baserunning numbers back to 1970 (excluding 1999 of course) and the two players with the lowest overall numbers were Todd Zeile and Wade Boggs. Zeile narrowly edged Boggs at -48.4 to -48.1. What's interesting about Zeile is that he had two seasons of -10 runs or lower in 1992 and 1997 because he attempted to run more frequently than Boggs. However, Boggs was more consistently a drag on his team and managed to record a negative Equivalent Baserunning Runs (EqBRR) in each of his 17 seasons (it would be interesting to see if he actually made it 18 years but again 1999 is missing).

Others who did poorly include Ted Simmons (-47.8), Mike Piazza (-47.5), Lance Parrish (-45.7), and Eddie Murray (-44.7).

The case of Mike Piazza is also worth mentioning since I read Tom Tango's excellent piece "With or Without You" in the 2008 Hardball Times Annual. In that essay Tango looks at the defensive value of catchers and, not surprisingly, Piazza finds himself second from the bottom in defensive contribution per 5,000 batters at -9.6 runs. Over the equivalent of 10.5 seasons that is almost exactly 100 runs worse than average (Dick Dietz was easily last at -13.2 runs). When you combine his baserunning and defense, that means Piazza lost about 150 runs for his teams, a fact that highlights just how good an offensive player he is. Not many players (think Derek Jeter) would get enough playing time to cost their teams 15 wins in secondary skills.
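
The arithmetic behind that 150-run figure can be checked quickly. This is a back-of-the-envelope sketch: it treats 5,000 batters as one catcher-season, which is what the 10.5-season equivalence implies, and it assumes the common rule of thumb of roughly ten runs per win, which is not spelled out above.

```python
DEFENSE_PER_SEASON = -9.6   # runs per 5,000 batters (Tango's figure quoted above)
SEASONS = 10.5              # equivalent seasons behind the plate
BASERUNNING = -47.5         # Piazza's career baserunning total quoted earlier
RUNS_PER_WIN = 10.0         # assumption: rule-of-thumb conversion, not from the post

defense = DEFENSE_PER_SEASON * SEASONS   # career defensive runs
total = defense + BASERUNNING            # defense plus baserunning
wins = total / RUNS_PER_WIN              # rough conversion to wins

print(f"defense: {defense:.1f} runs")
print(f"defense + baserunning: {total:.1f} runs")
print(f"approximate wins: {wins:.1f}")
```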

But I digress. Here are our trailers' career lines in terms of baserunning...


Todd Zeile
Year Team Opps EqGAR Opps EqSBR Opps EqAAR Opps EqHAR Opps EqOAR Opps EqBRR
1989 SLN 4 0.2 0 0.0 6 0.0 6 -0.4 44 -0.2 60 -0.4
1990 SLN 29 -0.5 7 -2.9 41 -0.2 39 0.0 256 0.6 372 -2.9
1991 SLN 33 -0.4 29 -3.2 44 -1.3 46 -0.5 329 -1.0 481 -6.3
1992 SLN 23 -0.7 17 -4.0 43 -1.1 55 -5.4 287 0.0 425 -11.2
1993 SLN 19 -0.4 10 -1.9 36 0.6 53 -2.0 328 1.3 446 -2.4
1994 SLN 24 -0.2 5 -1.1 29 0.3 30 -1.3 249 -1.2 337 -3.6
1995 CHN 12 -0.3 1 0.2 27 -0.3 20 1.0 122 -0.3 182 0.3
1995 SLN 5 0.0 1 0.1 8 0.4 12 0.4 63 -0.1 89 0.8
1996 BAL 6 -0.9 2 -0.1 9 0.4 9 0.3 50 0.8 76 0.5
1996 PHI 22 -0.2 1 -0.4 35 -0.7 43 -1.6 239 -0.4 340 -3.3
1997 LAN 33 -0.5 17 -4.8 25 0.0 61 -4.5 348 -0.4 484 -10.2
1998 FLO 11 0.2 5 -1.8 16 0.3 23 -0.5 76 -0.1 131 -1.9
1998 LAN 5 -0.4 1 0.1 12 0.4 9 0.2 40 0.2 67 0.4
1998 TEX 9 -0.4 1 0.1 10 0.8 13 0.5 80 0.6 113 1.6
2000 NYN 19 -0.5 8 -1.4 35 0.6 33 -2.1 279 0.7 374 -2.7
2001 NYN 31 -0.7 3 0.2 51 -1.1 55 -1.6 327 0.2 467 -3.0
2002 COL 23 -0.2 2 -0.2 26 -0.7 31 0.3 244 0.1 326 -0.7
2003 MON 2 0.1 1 0.3 5 0.0 12 0.0 51 0.4 71 0.8
2003 NYA 5 0.2 0 0.0 6 0.1 20 -0.3 97 -0.2 128 -0.2
2004 NYN 16 -0.5 0 0.0 14 -0.8 26 -1.0 203 -1.6 259 -3.9
331 -6.2 111 -20.7 478 -2.4 596 -18.6 3712 -0.4 5228 -48.4



Wade Boggs
Year Team Opps EqGAR Opps EqSBR Opps EqAAR Opps EqHAR Opps EqOAR Opps EqBRR
1982 BOS 32 0.7 0 0.0 22 -0.5 33 -0.1 250 -0.5 337 -0.3
1983 BOS 52 -2.0 6 -0.8 68 -0.5 57 -0.7 506 -1.0 689 -5.1
1984 BOS 50 -0.2 5 -0.5 49 -1.3 71 -0.8 528 0.3 703 -2.5
1985 BOS 49 -0.7 4 0.0 91 -0.3 101 1.1 579 -0.5 824 -0.4
1986 BOS 64 0.2 5 -2.4 96 0.4 52 -1.9 535 -0.6 752 -4.3
1987 BOS 37 -1.8 4 -1.8 54 1.4 49 0.6 409 -0.4 553 -1.9
1988 BOS 63 1.7 5 -0.8 82 -0.9 91 -1.5 600 -1.5 841 -3.0
1989 BOS 60 -0.4 11 -3.1 60 0.5 84 -0.3 529 -1.2 744 -4.5
1990 BOS 48 -1.1 2 -0.6 57 0.5 82 -2.9 453 -1.2 642 -5.2
1991 BOS 55 -1.2 3 -0.8 44 -0.9 75 -2.7 489 -0.2 666 -5.8
1992 BOS 19 0.0 4 -1.8 51 0.3 50 0.7 328 -0.2 452 -1.0
1993 NYA 40 -1.3 3 -1.0 57 -1.5 72 1.4 424 0.1 596 -2.4
1994 NYA 22 -0.3 3 -0.4 41 -1.8 46 -0.9 279 0.0 391 -3.4
1995 NYA 37 -1.0 2 -0.8 51 0.0 70 2.2 380 -0.7 540 -0.2
1996 NYA 47 0.5 3 -1.0 52 0.9 67 -2.7 406 -0.4 575 -2.6
1997 NYA 26 -0.4 1 -0.4 21 1.3 41 -3.0 221 -0.4 310 -2.9
1998 TBA 7 -0.5 4 -1.3 5 0.0 16 -0.8 111 -0.2 143 -2.7
708 -7.7 65 -17.6 901 -2.2 1057 -12.2 7027 -8.5 9758 -48.1

SFR in the Infield - NL East

And now the NL East infielders in Simple Fielding Runs (SFR).


Name Team POS Balls ExRunners Runners SFR
Tim Hudson ATL 1 42 4 0 2.8
Peter Moylan ATL 1 21 2 1 0.9
Kyle Davies ATL 1 16 2 2 -0.1
Buddy Carlyle ATL 1 20 3 3 -0.2
John Smoltz ATL 1 34 3 4 -0.4
Chuck James ATL 1 22 3 7 -3.1
Brian McCann ATL 2 56 4 4 0.1
Craig Wilson ATL 3 39 6 4 1.5
Julio Franco ATL 3 25 5 3 1.3
Mark Teixeira ATL 3 125 16 16 0.1
Jarrod Saltalamacchia ATL 3 26 4 5 -1.3
Scott Thorman ATL 3 156 25 30 -3.8
Kelly Johnson ATL 4 545 139 133 4.8
Martin Prado ATL 4 40 11 6 3.3
Yunel Escobar ATL 4 64 17 13 2.8
Chris Woodward ATL 4 27 7 7 -0.3
Chipper Jones ATL 5 342 73 64 7.0
Yunel Escobar ATL 5 50 11 10 0.6
Pete Orr ATL 5 20 5 5 0.0
Chris Woodward ATL 5 27 6 9 -2.1
Chris Woodward ATL 6 27 8 9 0.6
Yunel Escobar ATL 6 181 48 46 0.5
Edgar Renteria ATL 6 501 141 139 0.2
-----------------------------------------------------------------------
Sergio Mitre FLO 1 30 3 0 2.3
Scott Olsen FLO 1 25 3 4 -0.6
Dontrelle Willis FLO 1 45 5 6 -1.0
Miguel Olivo FLO 2 50 4 5 -0.7
Matt Treanor FLO 2 15 1 2 -0.8
Jason Wood FLO 3 39 6 5 0.6
Aaron Boone FLO 3 105 15 19 -2.9
Mike Jacobs FLO 3 208 31 38 -5.3
Alfredo Amezaga FLO 4 30 6 6 0.4
Dan Uggla FLO 4 629 158 190 -24.5
Alfredo Amezaga FLO 5 17 2 4 -1.3
Aaron Boone FLO 5 20 3 6 -1.6
Miguel Cabrera FLO 5 446 96 112 -12.6
Alfredo Amezaga FLO 6 58 15 16 2.5
Hanley Ramirez FLO 6 662 177 197 -15.1
-----------------------------------------------------------------------
Orlando Hernandez NYN 1 21 2 0 1.6
John Maine NYN 1 19 2 0 1.4
Tom Glavine NYN 1 28 3 1 1.4
Guillermo Mota NYN 1 15 1 0 1.0
Aaron Heilman NYN 1 16 2 2 -0.3
Pedro Feliciano NYN 1 19 2 3 -1.0
Paul Lo Duca NYN 2 35 2 2 -0.4
Carlos Delgado NYN 3 285 40 40 -0.1
Shawn Green NYN 3 24 3 4 -1.5
Damion Easley NYN 4 161 38 34 2.7
Jose Valentin NYN 4 195 44 40 2.7
Luis Castillo NYN 4 159 38 44 -4.5
Ruben Gotay NYN 4 133 31 38 -5.4
David Wright NYN 5 504 107 101 4.3
Jose Reyes NYN 6 654 179 154 17.9
-----------------------------------------------------------------------
Jamie Moyer PHI 1 32 3 2 1.1
Kyle Kendrick PHI 1 21 2 1 0.7
Cole Hamels PHI 1 25 3 2 0.5
Adam Eaton PHI 1 25 3 3 0.1
Carlos Ruiz PHI 2 47 3 1 1.5
Greg Dobbs PHI 3 22 3 1 1.2
Wes Helms PHI 3 20 2 3 -0.2
Ryan Howard PHI 3 310 42 46 -3.3
Chase Utley PHI 4 532 133 121 8.6
Tadahito Iguchi PHI 4 115 28 27 0.6
Abraham Nunez PHI 5 257 58 53 3.6
Wes Helms PHI 5 149 33 33 0.0
Greg Dobbs PHI 5 151 31 40 -7.3
Jimmy Rollins PHI 6 717 198 187 6.1
-----------------------------------------------------------------------
Saul Rivera WAS 1 26 2 0 1.6
Matt Chico WAS 1 16 2 0 1.2
Shawn Hill WAS 1 16 1 0 1.1
Tim Redding WAS 1 15 1 1 0.4
Mike Bacsik WAS 1 16 1 1 0.3
Brian Schneider WAS 2 42 3 1 1.1
Jesus Flores WAS 2 21 1 0 1.1
Robert Fick WAS 3 97 15 16 -0.6
Tony Batista WAS 3 28 3 5 -1.1
Dmitri Young WAS 3 198 28 35 -5.4
D'Angelo Jimenez WAS 4 33 8 8 0.3
Felipe Lopez WAS 4 172 40 41 -0.7
Ronnie Belliard WAS 4 457 106 117 -7.7
Ryan Zimmerman WAS 5 548 115 104 9.0
Cristian Guzman WAS 6 159 43 42 0.9
Josh Wilson WAS 6 25 7 7 -0.3
Ronnie Belliard WAS 6 16 4 4 -0.3
Felipe Lopez WAS 6 488 128 126 -1.0
D'Angelo Jimenez WAS 6 38 10 12 -1.6

Thursday, December 27, 2007

SFR in the Infield - NL West

A few days ago I posted the SFR numbers for the infielders in the NL Central and so today it's time to do the NL West. You'll notice that our previous overall leader, Omar Vizquel, loses a few runs with the new refinements and is now at about +24 and falls to second place behind Mark Ellis at +27 runs. Ryan Braun and Dan Uggla are still at the bottom at -25 runs while Rickie Weeks (-21) and Derek Jeter (-21) are not far behind.


Name Team POS Balls ExRunners Runners SFR
Livan Hernandez ARI 1 38 4 0 3.0
Doug Davis ARI 1 34 3 2 1.0
Micah Owings ARI 1 26 2 4 -1.2
Brandon Webb ARI 1 62 6 9 -2.2
Edgar G Gonzalez ARI 1 15 1 4 -2.2
Miguel Montero ARI 2 19 1 0 1.0
Chris Snyder ARI 2 28 3 2 0.8
Tony Clark ARI 3 110 17 12 3.5
Chad Tracy ARI 3 23 4 3 1.2
Conor Jackson ARI 3 204 30 29 0.3
Orlando Hudson ARI 4 573 141 138 2.7
Alberto Callaspo ARI 4 18 6 4 1.1
Augie Ojeda ARI 4 97 26 25 0.1
Emilio Bonifacio ARI 4 19 5 6 -0.9
Chad Tracy ARI 5 119 24 22 1.6
Mark Reynolds ARI 5 249 54 52 1.5
Jeff Cirillo ARI 5 18 4 2 1.2
Alberto Callaspo ARI 5 37 7 6 0.8
Alberto Callaspo ARI 6 36 10 6 2.5
Augie Ojeda ARI 6 41 11 14 -1.6
Stephen Drew ARI 6 612 170 178 -6.4
-----------------------------------------------------------------------
Jeff Francis COL 1 30 3 0 2.0
Aaron Cook COL 1 41 4 3 0.7
Josh Fogg COL 1 22 2 1 0.6
Rodrigo Lopez COL 1 16 2 1 0.5
Ubaldo Jimenez COL 1 17 2 2 -0.2
Chris Iannetta COL 2 20 1 0 0.8
Yorvit Torrealba COL 2 59 5 5 -0.2
Todd Helton COL 3 346 50 40 8.1
Jeff Baker COL 3 22 3 4 -0.1
Kazuo Matsui COL 4 420 105 85 14.2
Jamey Carroll COL 4 212 55 47 5.9
Omar Quintanilla COL 4 71 17 17 0.5
Ian Stewart COL 5 20 4 0 3.0
Jamey Carroll COL 5 20 5 5 -0.2
Garrett Atkins COL 5 402 89 96 -5.4
Troy Tulowitzki COL 6 814 222 199 18.6
Jamey Carroll COL 6 32 9 7 0.7
Clint Barmes COL 6 19 5 5 -0.4
-----------------------------------------------------------------------
Mark Hendrickson LAN 1 23 3 1 1.5
Brad Penny LAN 1 26 3 1 1.2
Joe Beimel LAN 1 19 3 1 1.2
Chad Billingsley LAN 1 18 2 1 0.8
Derek Lowe LAN 1 29 3 2 0.6
Randy Wolf LAN 1 17 2 3 -0.6
Russell Martin LAN 2 63 5 3 1.4
Nomar Garciaparra LAN 3 131 21 24 -2.3
James Loney LAN 3 203 29 33 -3.5
Tony Abreu LAN 4 60 16 14 1.4
Wilson Valdez LAN 4 38 9 9 -0.2
Ramon Martinez LAN 4 71 17 21 -2.7
Jeff Kent LAN 4 487 124 132 -6.4
Ramon Martinez LAN 5 40 10 5 3.4
Andy LaRoche LAN 5 81 17 16 0.5
Nomar Garciaparra LAN 5 110 23 23 0.1
Tony Abreu LAN 5 82 19 19 -0.5
Wilson Betemit LAN 5 106 23 27 -2.8
Shea Hillenbrand LAN 5 58 13 17 -3.5
Rafael Furcal LAN 6 642 173 165 7.6
Wilson Valdez LAN 6 29 9 4 3.5
Tony Abreu LAN 6 25 7 4 1.8
Chin-lung Hu LAN 6 34 10 13 -1.4
Ramon Martinez LAN 6 28 8 13 -3.9
-----------------------------------------------------------------------
Jake Peavy SDN 1 31 3 1 1.2
Doug Brocail SDN 1 17 2 0 1.2
Justin Germano SDN 1 22 2 1 0.6
Heath Bell SDN 1 17 2 1 0.5
Justin Hampson SDN 1 15 1 1 0.3
David Wells SDN 1 21 3 3 -0.1
Greg Maddux SDN 1 60 6 7 -0.8
Cla Meredith SDN 1 21 2 4 -1.6
Josh Bard SDN 2 41 2 3 -0.7
Adrian Gonzalez SDN 3 364 57 54 2.6
Marcus Giles SDN 4 473 119 107 8.7
Geoff Blum SDN 4 224 57 46 8.6
Oscar Robles SDN 4 18 5 6 -0.1
Morgan Ensberg SDN 5 35 8 4 3.1
Russell Branyan SDN 5 56 12 10 1.8
Kevin Kouzmanoff SDN 5 360 77 80 -2.5
Khalil Greene SDN 6 662 182 158 17.9
Geoff Blum SDN 6 43 12 11 -0.2
-----------------------------------------------------------------------
Matt Cain SFN 1 28 3 3 0.0
Barry Zito SFN 1 25 2 4 -1.3
Matt Morris SFN 1 17 2 4 -1.5
Noah Lowry SFN 1 29 3 6 -2.0
Bengie Molina SFN 2 46 3 2 0.5
Rich Aurilia SFN 3 78 10 8 1.8
Dan Ortmeier SFN 3 46 7 5 1.5
Mark Sweeney SFN 3 16 2 3 -0.5
Ryan Klesko SFN 3 220 30 30 -0.5
Rich Aurilia SFN 4 21 6 5 0.1
Kevin Frandsen SFN 4 155 40 44 -2.5
Ray Durham SFN 4 432 105 111 -4.3
Pedro Feliz SFN 5 449 96 74 16.9
Rich Aurilia SFN 5 59 12 11 1.1
Kevin Frandsen SFN 5 17 5 5 -0.7
Omar Vizquel SFN 6 643 176 137 24.8
Rich Aurilia SFN 6 49 14 13 1.0
Kevin Frandsen SFN 6 71 18 23 -2.4

Monday, December 24, 2007

Team Totals for SFR 2007 Redux

A while back I posted the leaders in SFR (which includes only infielders) for 2007. Well, that was several revisions of the system ago and so here is the new ranking based on refinements that have been made.

I've also included the middle and corner infield numbers from Baseball Info Solutions Plus/Minus system as published in The Hardball Times Annual (which I'm having a wonderful time reading over the break I might add). As a result I've excluded pitchers and catchers in the numbers below.

I also wanted to point out that Phil Birnbaum has an excellent post on his blog that shows the similarities between SFR and Bill James' Win Shares method and how using a more granular data set helps us be a little more discerning. And although I've mentioned it a few times, Sean Smith developed his TotalZone metric back in April of 2007 and so I was basically thinking the same thoughts well after Sean had developed his system. I simply didn't find his article when I sat down to construct the system late one Friday night. As is pointed out in the comments to that post, systems like these could be used as is when Retrosheet data contains both the fielder who fielded the ball and the hit type. Where the hit type is missing a simpler system would have to be used, which would therefore make it less accurate. Where it could be applied immediately, however, is to minor league data, a task on my radar for after the break.

 
Team Balls ExRunners Runners SFR Plus/Minus
TOR 2370 563 498 47.6 81
COL 2414 572 512 45.7 30
SDN 2265 535 480 40.8 -6
SFN 2280 526 474 35.0 47
OAK 2349 550 504 34.7 22
BOS 2179 491 451 28.0 17
KCA 2185 505 492 22.9 49
SLN 2413 547 524 17.5 41
BAL 2270 544 524 16.5 -2
ATL 2234 528 508 13.9 -1
CHN 2126 473 454 13.4 16
NYN 2172 488 471 13.0 26
PHI 2297 533 518 8.7 23
ARI 2173 512 503 6.7 1
MIN 2291 526 516 5.1 30
ANA 2166 504 503 1.7 35
TEX 2326 561 557 0.4 -23
NYA 2237 515 520 -7.0 -3
LAN 2255 532 546 -8.8 -28
WAS 2271 510 519 -9.3 -41
DET 2322 544 555 -10.3 29
HOU 2315 536 553 -11.1 -23
CLE 2424 572 589 -11.9 -4
SEA 2358 541 563 -14.4 -25
PIT 2372 551 583 -19.8 -22
CIN 2216 503 534 -24.5 -22
CHA 2336 536 567 -27.6 -53
TBA 2169 499 555 -36.4 -68
MIL 2177 500 561 -46.4 -47
FLO 2230 514 598 -61.6 -98


The following plot shows the relationship between SFR and Plus/Minus. The correlation coefficient is 0.83 and so the systems agree substantially with the big differences being the Padres, Orioles, Angels, and Twins.
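
For readers who want to check the agreement themselves, the coefficient can be recomputed from the team table above. This is a minimal sketch using a plain Pearson correlation over the 30 team rows; whether the original figure was computed in exactly this way is an assumption.

```python
from math import sqrt

# (team, SFR, Plus/Minus) copied from the team table above.
teams = [
    ("TOR", 47.6, 81), ("COL", 45.7, 30), ("SDN", 40.8, -6),
    ("SFN", 35.0, 47), ("OAK", 34.7, 22), ("BOS", 28.0, 17),
    ("KCA", 22.9, 49), ("SLN", 17.5, 41), ("BAL", 16.5, -2),
    ("ATL", 13.9, -1), ("CHN", 13.4, 16), ("NYN", 13.0, 26),
    ("PHI", 8.7, 23), ("ARI", 6.7, 1), ("MIN", 5.1, 30),
    ("ANA", 1.7, 35), ("TEX", 0.4, -23), ("NYA", -7.0, -3),
    ("LAN", -8.8, -28), ("WAS", -9.3, -41), ("DET", -10.3, 29),
    ("HOU", -11.1, -23), ("CLE", -11.9, -4), ("SEA", -14.4, -25),
    ("PIT", -19.8, -22), ("CIN", -24.5, -22), ("CHA", -27.6, -53),
    ("TBA", -36.4, -68), ("MIL", -46.4, -47), ("FLO", -61.6, -98),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


sfr = [t[1] for t in teams]
plus_minus = [t[2] for t in teams]
r = pearson(sfr, plus_minus)
print(f"correlation between SFR and Plus/Minus: {r:.2f}")
```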

Sunday, December 23, 2007

The Daily Double

While attempting to recover from a cold I spent a good part of the weekend back in 1984. Yesterday morning I popped in the DVD of the first game of the 1984 NLCS between the Cubs and Padres from the A&E Chicago Cubs Legends: Great Games Collector's Edition DVD Set. I love the set and you can find other sets by following the link to the right. Anyway, I was recounting to my daughters how for this game I left school (I was a junior in high school) at noon with I think an excuse from my parents although I might have feigned illness. As I was leaving for the walk home the superintendent stopped me but after realizing what day it was, and most folks in our small Iowa town did, he simply winked and sent me on my way.

In watching the game again for the first time since it was originally broadcast on ABC I had forgotten that Earl Weaver and Reggie Jackson joined Don Drysdale in the booth. And while generally Weaver and Drysdale provided some insight, Jackson said little that wasn't obvious or that was not from the typical player "guts and glory" script. Interestingly, at the very beginning of the broadcast Jackson noted (incorrectly) how Weaver introduced platooning to the majors several years prior and Weaver offered a few comments on the particular mix of players he had that made the strategy successful. I had also forgotten that the umpires went on strike seeking better pay for the postseason and so the game was umpired by four replacement umps - one of seven games affected that season. The home plate umpire was inconsistent, a point that Drysdale hammered on, and in at least one instance an umpire found himself out of position although fortunately he still got the call correct. As the game went on and finally got out of hand as the Cubs scored six times in the fifth inning, the home plate umpiring became even more inconsistent. Finally, the quality of the broadcast overall seemed pretty poor (even for the times, since this morning I popped in the DVD of "the Sandberg game" broadcast on NBC earlier that season, which had much better quality), and there were several glitches along the way. They did, however, introduce their new "super slo-mo" camera jointly developed with Sony and used first at the 1984 Olympics just a few months prior. Unfortunately, although Weaver referred to the stitch configuration on several of the pitches, the quality of the broadcast made it difficult even in slo-mo to see the rotation very well on Rick Sutcliffe's curve and on the Eric Show slider.

In the Sandberg game called by Bob Costas and Tony Kubek there were a couple of great plays by Ozzie Smith in the first three innings that I had forgotten about. Costas used the opportunity to contrast Cubs shortstop Larry Bowa's surehandedness with Smith's great range in pointing out that Smith made something like 140 more plays than Bowa in 1983. They also talked about Bowa turning to glasses and how he was now using the new lenses that darkened in the sun - a point I identified with since about that same time I too went to lenses like that. I'm not sure how long Bowa played with them but I found that after a while they seemed to never lighten up totally when moving indoors. Costas and Kubek also made the point a couple of times in the broadcast that walks and on base percentage are two of the most underrated statistics in baseball and used the occasion to point out that this was perhaps the most important reason why Gary Matthews was in the third spot in the batting order.


But in watching both games I was thinking about the "Daily Double" of Bob Dernier and Ryne Sandberg in the first two spots in the Cubs order. Dernier of course was added to the roster as a part of the trade with the Phillies in spring training that also netted Gary Matthews and Porfi Altamirano in exchange for Bill Campbell and Mike Diaz. Anyway, I was curious how many runs the Daily Double actually contributed and so here's a look at the 1984 Cubs roster in terms of its baserunning.

Name               Opps  EqGAR   Opps  EqSBR   Opps  EqAAR HAOpps  EqHAR   Opps  EqOAR   Opps  EqBRR
Bob Dernier 34 1.4 68 0.9 47 0.6 52 1.7 391 1.7 592 6.2
Gary Matthews 29 0.3 27 -1.2 54 0.2 49 2.1 368 1.3 527 2.7
Thad Bosley 4 0.2 6 0.2 7 0.1 7 0.9 66 0.5 90 1.9
Henry Cotto 6 0.5 13 0.0 19 0.9 18 -0.2 141 0.2 197 1.4
Davey Lopes 2 0.1 3 0.4 2 0.0 4 0.7 6 0.0 17 1.1
Leon Durham 17 0.0 24 -0.6 41 0.1 36 1.1 253 0.4 371 1.0
Jay Johnstone 6 0.1 0 0.0 3 0.2 5 0.0 42 0.5 56 0.7
Dick Ruthven 2 0.0 0 0.0 0 0.0 1 -0.1 12 0.3 15 0.2
Rick Reuschel 2 0.0 0 0.0 1 0.2 4 0.1 20 -0.1 27 0.1
Steve Lake 4 -0.1 0 0.0 2 0.1 0 0.0 16 0.0 22 0.0
Lee Smith 0 0.0 0 0.0 0 0.0 0 0.0 1 0.0 1 0.0
Tim Stoddard 0 0.0 0 0.0 0 0.0 0 0.0 2 0.0 2 0.0
Dickie Noles 0 0.0 0 0.0 0 0.0 0 0.0 4 0.0 4 0.0
Warren Brusstar 0 0.0 0 0.0 0 0.0 0 0.0 5 0.0 5 0.0
Billy Hatcher 0 0.0 0 0.0 0 0.0 0 0.0 6 0.0 6 0.0
Bill Buckner 2 0.1 0 0.0 5 -0.1 0 0.0 5 0.0 12 0.0
Scott Sanderson 0 0.0 0 0.0 0 0.0 0 0.0 10 0.0 10 0.0
Chuck Rainey 1 0.0 0 0.0 0 0.0 2 0.0 6 0.0 9 -0.1
Ron Hassey 0 0.0 0 0.0 0 0.0 0 0.0 18 -0.1 18 -0.1
Dan Rohn 1 -0.1 0 0.0 1 0.0 1 0.0 9 0.0 12 -0.1
Richie Hebner 6 -0.1 1 0.2 5 0.2 7 -0.5 56 0.0 75 -0.2
Steve Trout 3 0.1 0 0.0 3 0.0 4 -0.3 20 -0.1 30 -0.3
Mel Hall 6 -0.4 4 -0.3 6 -0.2 12 0.2 76 0.3 104 -0.3
Tom Veryzer 9 -0.1 0 0.0 1 0.2 2 -0.6 29 0.1 41 -0.4
Ryne Sandberg 24 -0.7 40 1.0 53 -2.8 61 2.3 377 -0.3 555 -0.5
Dave Owen 5 -0.4 3 -1.0 4 -0.1 8 -0.1 49 0.9 69 -0.8
Rich Bordi 1 0.1 0 0.0 0 0.0 1 -1.1 3 0.0 5 -1.1
Larry Bowa 29 1.3 13 -0.6 15 -1.3 25 0.3 201 -0.8 283 -1.1
Gary Woods 8 -0.3 3 -0.1 9 0.6 7 -1.4 52 0.0 79 -1.1
D Eckersley 3 0.2 0 0.0 0 0.0 4 -1.7 15 -0.1 22 -1.6
Ron Cey 25 -0.3 5 -0.7 29 -0.2 37 -0.5 250 0.1 346 -1.6
Rick Sutcliffe 6 0.0 0 0.0 4 -0.5 5 -1.3 26 -0.1 41 -1.9
Keith Moreland 23 -0.3 5 -2.1 31 -1.2 36 0.0 253 1.2 348 -2.4
Jody Davis 21 -0.5 10 -2.3 23 -0.5 29 -2.0 251 0.2 334 -5.2

279 0.8 225 -6.2 365 -3.5 417 -0.5 3039 5.9 4325 -3.4



So Dernier was worth about +6 runs and recorded positive numbers in all five baserunning categories.

EqGAR - advancing on ground outs
EqSBR - stolen base attempts and pick-offs
EqAAR - advancing on fly balls
EqHAR - advancing on hits
EqOAR - other advancement including wild pitches, passed balls, and balks

Sandberg, however, left a little to be desired as he was especially poor in advancing on fly balls and recorded a -2.8 EqAAR.

Overall, the Cubs as a team were at -3.4 although that's not as bad as you might think. Even with that value they placed ninth in baseball since at the time teams did very poorly on average in EqSBR. If we look at the team rankings and add a column for Equivalent Baserunning Runs (EqBRR) without EqSBR included, the Cubs were right in the middle of the pack.


Team EqBRR No EqSBR EqBRR Rank
HOU 3.8 15.9 3
SDN 1.5 12.5 5
NYN 7.7 11.0 1
KCA -4.9 9.1 10
SLN 4.3 7.7 2
TOR 1.8 7.6 4
LAN -10.8 7.4 14
CLE -14.0 6.5 20
CAL -5.0 6.2 11
OAK -0.3 6.0 7
CIN -0.1 5.1 6
TEX -8.3 4.9 12
SFN -11.8 3.3 16
CHN -3.4 2.8 9
ATL -12.7 2.2 17
CHA -11.5 -0.5 15
DET -20.1 -1.7 22
MIN -13.1 -1.8 18
MIL -26.4 -3.3 24
PHI -1.4 -3.9 8
BAL -13.1 -4.4 19
NYA -15.9 -6.6 21
MON -8.9 -10.0 13
PIT -28.1 -12.3 26
SEA -26.5 -12.4 25
BOS -25.8 -16.8 23
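
The re-ranking described above can be reproduced from the table. The sketch below copies the two numeric columns by hand, ranks the teams both ways, and backs out each team's EqSBR as the difference between the two columns; treating that difference as the team EqSBR total is an inference from how the column is described, not a formula given in the post.

```python
# (team, EqBRR, EqBRR with EqSBR removed) copied from the 1984 table above.
teams_1984 = [
    ("HOU", 3.8, 15.9), ("SDN", 1.5, 12.5), ("NYN", 7.7, 11.0),
    ("KCA", -4.9, 9.1), ("SLN", 4.3, 7.7), ("TOR", 1.8, 7.6),
    ("LAN", -10.8, 7.4), ("CLE", -14.0, 6.5), ("CAL", -5.0, 6.2),
    ("OAK", -0.3, 6.0), ("CIN", -0.1, 5.1), ("TEX", -8.3, 4.9),
    ("SFN", -11.8, 3.3), ("CHN", -3.4, 2.8), ("ATL", -12.7, 2.2),
    ("CHA", -11.5, -0.5), ("DET", -20.1, -1.7), ("MIN", -13.1, -1.8),
    ("MIL", -26.4, -3.3), ("PHI", -1.4, -3.9), ("BAL", -13.1, -4.4),
    ("NYA", -15.9, -6.6), ("MON", -8.9, -10.0), ("PIT", -28.1, -12.3),
    ("SEA", -26.5, -12.4), ("BOS", -25.8, -16.8),
]


def rank_by(rows, column):
    """Return {team: rank}, where rank 1 is the highest value in the column."""
    ordered = sorted(rows, key=lambda row: row[column], reverse=True)
    return {row[0]: position for position, row in enumerate(ordered, start=1)}


rank_eqbrr = rank_by(teams_1984, 1)
rank_no_sbr = rank_by(teams_1984, 2)

# EqSBR implied by the gap between the two columns.
implied_eqsbr = {team: eqbrr - no_sbr for team, eqbrr, no_sbr in teams_1984}

print(f"Cubs rank by EqBRR: {rank_eqbrr['CHN']} of {len(teams_1984)}")
print(f"Cubs rank without EqSBR: {rank_no_sbr['CHN']} of {len(teams_1984)}")
print(f"Cubs implied EqSBR: {implied_eqsbr['CHN']:.1f}")
```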


One of the reasons that teams did more poorly in EqSBR during that era was that stolen base percentages were lower overall as I discussed in a column back in November and documented by this graph.



Since EqSBR and all the baserunning metrics are based on break-even percentages calculated from Run Expectancy matrices, lower stolen base percentages lead directly to negative EqSBR values. Some readers have pointed out, however, that at least some of the difference in success rate over time has to do with the more frequent reliance on the hit and run play in the past. I'm certain that is the case. You might think that some of that reliance would be offset by increased EqHAR values, since on a successful hit and run the runner will advance from first to third more often, but that difference will not be reflected in EqHAR. This is the case since EqHAR is calculated based on a comparison to the average success rate for the current season and not against a historical norm. In any case, future adjustments to EqSBR based on whether the batter offered at the pitch (or, when that is not available, whether the batter struck out) will be able to better get a handle on this. My suspicion is that even with the adjustments we'll find that EqSBR values will still be lower, as increased awareness of the costs of allowing fast but relatively poor success rate players (in the 55-70% range) to steal has driven down their opportunities.
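
To make the break-even idea concrete, here is a minimal sketch of how a break-even stolen base percentage falls out of run expectancy. The three run expectancy values are hypothetical round numbers chosen for illustration, not entries from any season's matrix, and the function names are my own; the actual EqSBR calculation involves more than this.

```python
def break_even_rate(re_start, re_success, re_caught):
    """Success rate at which an attempt neither gains nor loses expected runs."""
    return (re_start - re_caught) / (re_success - re_caught)


def expected_run_change(success_rate, re_start, re_success, re_caught):
    """Average change in run expectancy per attempt at a given success rate."""
    gain = re_success - re_start
    loss = re_caught - re_start
    return success_rate * gain + (1.0 - success_rate) * loss


# Hypothetical run expectancy values (illustrative only):
RE_FIRST_0_OUT = 0.85    # runner on first, nobody out
RE_SECOND_0_OUT = 1.10   # runner on second, nobody out (steal succeeds)
RE_EMPTY_1_OUT = 0.25    # bases empty, one out (runner caught)

p = break_even_rate(RE_FIRST_0_OUT, RE_SECOND_0_OUT, RE_EMPTY_1_OUT)
print(f"break-even success rate: {p:.1%}")

# A runner succeeding 65% of the time sits below this break-even point,
# so each attempt costs runs on average.
cost = expected_run_change(0.65, RE_FIRST_0_OUT, RE_SECOND_0_OUT, RE_EMPTY_1_OUT)
print(f"runs per attempt at 65%: {cost:+.3f}")
```

With values like these, a league whose overall success rate sits below the break-even point will show negative stolen base runs in aggregate, which is the effect described above.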

To wrap up let's look at the career baserunning numbers for the Daily Double starting with Sandberg.


Year Team Opps EqGAR Opps EqSBR Opps EqAAR HAOpps EqHAR Opps EqOAR Opps EqBRR
1981 PHI 1 0.1 0 0.0 1 0.0 2 0.1 15 -0.1 19 0.1
1982 CHN 49 4.0 49 -0.8 41 -1.4 42 3.3 332 0.4 513 5.5
1983 CHN 45 1.2 50 1.2 61 -3.0 32 0.7 369 0.0 557 0.1
1984 CHN 24 -0.7 40 1.0 53 -2.8 61 2.3 377 -0.3 555 -0.5
1985 CHN 27 -0.8 64 5.6 52 0.6 47 2.5 372 1.3 562 9.2
1986 CHN 34 0.7 45 1.0 50 -2.0 16 0.5 318 -1.7 463 -1.4
1987 CHN 37 -1.1 23 3.0 32 0.7 39 0.9 297 -1.0 428 2.5
1988 CHN 41 -0.3 36 -0.7 53 0.2 50 0.2 354 -0.3 534 -0.9
1989 CHN 29 -0.9 20 -1.0 40 0.5 64 1.2 356 -0.1 509 -0.3
1990 CHN 26 -0.3 35 1.3 48 0.9 50 0.9 326 -0.1 485 2.7
1991 CHN 30 0.3 34 0.3 66 -1.5 50 -0.6 359 1.7 539 0.2
1992 CHN 42 -1.1 25 0.5 64 -0.7 66 2.5 397 0.3 594 1.5
1993 CHN 27 -0.4 11 0.4 36 0.0 50 2.2 278 -0.1 402 2.1
1994 CHN 14 -0.2 5 -1.4 23 0.4 28 0.6 128 -0.6 198 -1.2
1996 CHN 21 0.3 18 -1.4 40 0.9 40 -0.2 260 -0.3 379 -0.7
1997 CHN 25 -0.2 10 -0.9 26 0.9 30 2.7 213 0.3 304 2.8
472 0.5 465 8.0 686 -6.2 667 19.7 4751 -0.3 7041 21.7


He adds about two wins to his career totals with his work on the bases, with almost all of that coming from his stolen bases, where he was a good career percentage stealer, and from advancing on hits. As I remember it, his EqGAR and EqAAR totals accurately reflect the fact that he was less than a stellar runner when it came to making decisions on advancing on ground outs and fly balls, often taking chances that weren't warranted and getting thrown out.

To put some perspective on this, the elite runners who played a few more seasons than Sandberg including Tim Raines, Rickey Henderson, and Willie Wilson are in the +100 run range and someone like Vince Coleman who played about the same length was at +70 runs.

Finally, we'll take a look at Bob Dernier.


Year Team Opps EqGAR Opps EqSBR Opps EqAAR HAOpps EqHAR Opps EqOAR Opps EqBRR
1980 PHI 2 0.0 4 0.1 3 0.0 2 0.5 16 -0.1 27 0.5
1981 PHI 2 -0.1 3 0.2 1 0.0 1 0.2 14 -0.1 21 0.2
1982 PHI 19 -0.2 57 1.2 30 -2.0 29 0.8 210 1.2 345 1.1
1983 PHI 19 0.6 42 1.2 19 -1.1 22 1.8 172 -0.3 274 2.2
1984 CHN 34 1.4 68 0.9 47 0.6 52 1.7 391 1.7 592 6.2
1985 CHN 25 0.2 42 1.8 44 -1.5 37 0.1 275 0.7 423 1.2
1986 CHN 30 0.7 29 4.7 24 -0.1 14 0.2 195 0.9 292 6.5
1987 CHN 11 -0.1 23 -0.2 11 0.0 22 1.4 134 0.0 201 1.1
1988 PHI 23 -0.3 19 0.2 11 -0.3 18 -0.7 122 -0.4 193 -1.4
1989 PHI 8 -0.2 7 -1.2 12 0.0 21 0.9 111 0.7 159 0.2
Career Total 173 2.0 294 9.0 202 -4.5 218 6.9 1640 4.4 2527 17.8


Dernier had his best overall season on the bases in 1986 (+6.5) and wound up his career at +18 runs overall. As for the Daily Double, their best season together was 1985, when Sandberg was at +9.2 and Dernier at +1.2 for a total of +10.4 runs.
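As a quick check on the Daily Double arithmetic, the EqBRR columns from the two tables above can be added season by season for the years both players were Cubs (1984 through 1987). This is only a sketch; the dictionary names are mine and the values are copied from the tables.

```python
# EqBRR by season, copied from the Sandberg and Dernier tables above
# (only the seasons in which both played for the Cubs).
sandberg = {1984: -0.5, 1985: 9.2, 1986: -1.4, 1987: 2.5}
dernier = {1984: 6.2, 1985: 1.2, 1986: 6.5, 1987: 1.1}

# Combined baserunning runs per season, rounded to match the tables.
combined = {year: round(sandberg[year] + dernier[year], 1) for year in sandberg}
best_year = max(combined, key=combined.get)

for year, runs in combined.items():
    print(year, f"{runs:+.1f}")
print("best season together:", best_year, f"{combined[best_year]:+.1f}")
```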

While the actual numbers don't indicate that the Daily Double paid huge dividends with their feet in 1984, their ability to get on base (Dernier's OBP was .356, the second best of his career) and Sandberg's MVP season certainly propelled the offense to a division title and gave us Cubs fans a season we'll never forget.

Friday, December 21, 2007

Chat Transcript 12/21

As always, thanks to everyone who participated in today's chat. Feel free to email or comment here if you have a follow-up...

NCAA Limits Live Blogging

Thought this story on blogging sporting events and the NCAA, sent to me by a colleague, was pretty interesting. I agree that making rules like this only hurts the sport in the long run. Of course the technology will advance to where, using a handheld device, you could essentially simulcast the game from your seat with commentary etc. but blogging is still a long way from that...

Thursday, December 20, 2007

And Even More Refinements in SFR

As some readers will know I've been writing the last couple weeks here and on Baseball Prospectus about a play by play fielding system termed Simple Fielding Runs or SFR.

After last week's column I had a chance to tweak the system yet again, and made the following changes.

  • Took into consideration the additional context of whether first base was occupied for all infielders. It turns out that for first basemen on ground balls hit by lefties, the percentage of runners who reach (hit or error) goes from 16% to 24% when first is occupied, and from 20% to 29% for right-handers. For second basemen in the same scenarios it goes from 27% to 29% and from 30% to 34%. For shortstops, interestingly, the trend is the opposite, going from 34% to 30% and from 32% to 31%, and for third basemen it's pretty steady at 27% to 27% and 24% to 26%. Although bunts are a smaller percentage of fielded balls, the differences are significant when there is a runner on first, since the vast majority of such attempts are sacrifice bunts rather than bunt hit attempts, which have a higher success rate. The end result, I'm hoping, is a higher correlation with UZR for first basemen. I have not, however, taken into consideration the ability of first basemen to catch throws from infielders.


  • Changed the partitioning rules for which balls are assigned to first and third basemen. In the past I was partitioning all balls in the shared areas of responsibility except, for middle infielders, all line drives that resulted in extra base hits. Now I also exclude bunts from the calculation and assign all extra base hits on ground balls to left field to the third baseman and all extra base hits on grounders to right to the first baseman, on the assumption that they're hit down the line. I also no longer partition any fly balls or line drives to the outfield, since doing so didn't seem to add much information to what we already knew. Finally, in the last column on the subject I noted that I used a 50/50 split on ground balls up the middle. After the article was submitted, however, I realized that there is a simple way to partition balls between the short and second areas of responsibility following the same principle outlined in the column. That change has been made.


  • While I haven't re-run the correlations with UZR I did want to show a few results and so here are all infielders who were assigned 15 or more balls in 2007 in the National League Central. You'll notice that the range is smaller with Milwaukee's infield not looking quite so terrible as in previous versions of the system.


    Team Name POS Balls SFR
    CHN Carlos Zambrano 1 33 0.6
    CHN Jason Marquis 1 30 -0.4
    CHN Rich Hill 1 26 -1.7
    CHN Ted Lilly 1 30 -1.8
    CHN Sean Marshall 1 17 -2.3
    CHN Jason Kendall 2 17 -0.5
    CHN Michael Barrett 2 23 -0.9
    CHN Daryle Ward 3 21 -1.5
    CHN Derrek Lee 3 294 -1.8
    CHN Ryan Theriot 4 102 1.3
    CHN Ronny Cedeno 4 18 0.9
    CHN Mike Fontenot 4 200 0.1
    CHN Mark DeRosa 4 295 -0.1
    CHN Aramis Ramirez 5 406 7.1
    CHN Mark DeRosa 5 103 4.9
    CHN Cesar Izturis 6 189 2.3
    CHN Ryan Theriot 6 405 2.1
    CHN Ronny Cedeno 6 48 2.0
    -------------------------------------------------
    CIN Bobby Livingston 1 17 1.4
    CIN Aaron Harang 1 25 0.8
    CIN Bronson Arroyo 1 32 -0.4
    CIN Matt Belisle 1 18 -0.9
    CIN Javier Valentin 2 21 0.0
    CIN David Ross 2 49 -0.5
    CIN Scott Hatteberg 3 167 1.2
    CIN Jeff Conine 3 91 0.1
    CIN Jorge Cantu 3 17 -0.9
    CIN Joey Votto 3 32 -1.1
    CIN Jeff Keppinger 4 16 0.3
    CIN Brandon Phillips 4 662 -10.0
    CIN Jeff Keppinger 5 34 1.8
    CIN Ryan Freel 5 52 0.5
    CIN Juan Castro 5 21 -1.5
    CIN Edwin Encarnacion 5 402 -10.5
    CIN Alex Gonzalez 6 420 2.0
    CIN Juan Castro 6 44 -0.5
    CIN Jeff Keppinger 6 184 -1.6
    CIN Pedro Lopez 6 41 -1.7
    -------------------------------------------------
    HOU Chris Sampson 1 25 1.9
    HOU Roy Oswalt 1 38 1.2
    HOU Wandy Rodriguez 1 22 1.0
    HOU Woody Williams 1 40 0.2
    HOU Dave Borkowski 1 21 0.0
    HOU Brad Ausmus 2 35 0.1
    HOU Eric Munson 2 18 -0.2
    HOU Mark Loretta 3 51 2.9
    HOU Mike Lamb 3 67 -1.1
    HOU Lance Berkman 3 277 -3.5
    HOU Chris Burke 4 150 2.5
    HOU Mark Loretta 4 77 -0.3
    HOU Craig Biggio 4 398 -9.7
    HOU Ty Wigginton 5 143 1.7
    HOU Mark Loretta 5 48 -0.5
    HOU Mike Lamb 5 144 -3.5
    HOU Morgan Ensberg 5 186 -9.0
    HOU Adam Everett 6 281 9.4
    HOU Eric Bruntlett 6 180 0.3
    HOU Cody Ransom 6 35 -0.1
    HOU Mark Loretta 6 248 -2.2
    -------------------------------------------------
    MIL Jeff Suppan 1 35 2.1
    MIL Chris Capuano 1 30 1.1
    MIL Dave Bush 1 26 0.4
    MIL Claudio Vargas 1 15 0.2
    MIL Johnny Estrada 2 38 0.8
    MIL Tony Graffanino 3 25 0.7
    MIL Prince Fielder 3 325 -9.7
    MIL Craig Counsell 4 84 3.0
    MIL Tony Graffanino 4 92 0.5
    MIL Rickie Weeks 4 439 -21.4
    MIL Craig Counsell 5 112 3.9
    MIL Tony Graffanino 5 74 2.2
    MIL Ryan Braun 5 309 -24.7
    MIL Craig Counsell 6 79 -0.1
    MIL J.J. Hardy 6 619 -1.7
    -------------------------------------------------
    PIT Matt Morris 1 18 1.6
    PIT Ian Snell 1 20 1.6
    PIT Paul Maholm 1 35 0.8
    PIT Shawn Chacon 1 20 0.4
    PIT Tom Gorzelanny 1 28 -0.4
    PIT Zach Duke 1 23 -0.6
    PIT Ronny Paulino 2 42 0.7
    PIT Adam LaRoche 3 327 3.0
    PIT Josh Phelps 3 25 1.2
    PIT Matt Kata 4 18 -1.9
    PIT Jose Castillo 4 56 -5.7
    PIT Freddy Sanchez 4 526 -13.0
    PIT Jose Castillo 5 130 2.6
    PIT Matt Kata 5 23 0.1
    PIT Jose Bautista 5 410 -4.8
    PIT Cesar Izturis 6 107 -0.5
    PIT Jack Wilson 6 665 -0.6
    PIT Jose Castillo 6 32 -0.6
    -------------------------------------------------
    SLN Adam Wainwright 1 29 1.8
    SLN Braden Looper 1 21 0.4
    SLN Joel Pineiro 1 15 0.2
    SLN Kip Wells 1 23 -0.5
    SLN Yadier Molina 2 34 0.9
    SLN Gary Bennett 2 17 0.1
    SLN Albert Pujols 3 396 10.6
    SLN Brendan Ryan 4 64 1.3
    SLN Miguel Cairo 4 19 0.6
    SLN Aaron Miles 4 242 -0.8
    SLN Adam Kennedy 4 333 -2.8
    SLN Scott Rolen 5 351 13.7
    SLN Scott Spiezio 5 68 2.4
    SLN Russell Branyan 5 28 1.6
    SLN Miguel Cairo 5 31 1.5
    SLN Brendan Ryan 5 72 0.3
    SLN Brendan Ryan 6 91 3.7
    SLN Brian Barden 6 17 1.0
    SLN David Eckstein 6 493 -7.2
    SLN Aaron Miles 6 155 -8.6
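The first-base-occupied adjustment described in the list above can be sketched as a lookup of baseline reach rates by position, batter hand, and whether first is occupied. The rates below are the percentages quoted in that bullet. The function name, the sample workload, and the actual-runner count are hypothetical, and the step that converts runners saved into runs is left out because the run values aren't given here.

```python
# Share of ground balls on which the batter reached (hit or error), using the
# percentages quoted in the list above.
# Key: (position, batter hand, first base occupied) -> reach rate
REACH_RATE = {
    ("1B", "L", False): 0.16, ("1B", "L", True): 0.24,
    ("1B", "R", False): 0.20, ("1B", "R", True): 0.29,
    ("2B", "L", False): 0.27, ("2B", "L", True): 0.29,
    ("2B", "R", False): 0.30, ("2B", "R", True): 0.34,
    ("SS", "L", False): 0.34, ("SS", "L", True): 0.30,
    ("SS", "R", False): 0.32, ("SS", "R", True): 0.31,
    ("3B", "L", False): 0.27, ("3B", "L", True): 0.27,
    ("3B", "R", False): 0.24, ("3B", "R", True): 0.26,
}


def expected_runners(position, ball_counts):
    """Sum the baseline reach rates over a fielder's assigned ground balls.

    ball_counts maps (batter hand, first occupied) -> number of balls.
    """
    return sum(REACH_RATE[(position, hand, occupied)] * n
               for (hand, occupied), n in ball_counts.items())


# Hypothetical workload for one first baseman (not real data).
sample = {("L", False): 100, ("L", True): 50, ("R", False): 80, ("R", True): 40}
expected = expected_runners("1B", sample)
actual = 52  # hypothetical count of runners who actually reached

print(f"expected {expected:.1f}, actual {actual}, saved {expected - actual:+.1f}")
```

A fielder who allows fewer runners than the context-adjusted baseline comes out positive. Without the first-base split, the same fielder would be judged against a single blended rate regardless of how often he played with a runner on first.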

    Mitchell and the Outfield

    Today in my column at BP I take a look at two topics that couldn't be more different. First, I give my take on the Mitchell report and then delve into outfield defense in a continuation of my series on Simple Fielding Runs (SFR).

    One of the goals that prompted creating a fielding system with Retrosheet-style play by play data was to combine the outfield SFR ratings with the throwing arm ratings (Equivalent Throwing Runs) I developed for an essay to be published in Baseball Prospectus 2008. The final table in today's column shows the two measures side by side for all outfielders in 2007 who played in 80 or more adjusted games (9-inning equivalents). I did not, however, total them up by team, so here are the totals for each team in 2007. Note that the SFR values are a first pass and are park adjusted using a three-year factor. Those adjustments, I believe, lead to the imbalance you see, with more teams coming out positive. I haven't looked into it deeply, but that's my assumption anyway.

    Also, just a reminder that I'll be chatting at BP tomorrow at 11AM Mountain, 1PM Eastern. Hope to see you there.


    2007 Outfield Defense
    Team Balls SFR EqThr Total
    BOS 2100 65.0 -0.9 64.1
    CHN 2081 46.5 11.9 58.4
    NYN 2206 29.2 -6.1 23.1
    WAS 2394 29.2 -7.0 22.2
    COL 2202 27.0 -1.8 25.2
    ARI 2207 25.3 1.0 26.3
    ANA 2268 20.5 0.0 20.5
    TEX 2250 20.2 3.1 23.2
    FLO 2366 16.7 4.6 21.3
    TOR 2010 13.3 8.5 21.8
    DET 2286 10.0 4.7 14.7
    CLE 2248 9.6 -7.5 2.1
    NYA 2340 7.3 6.5 13.8
    CHA 2290 2.8 -6.2 -3.4
    HOU 2259 1.9 -2.1 -0.2
    LAN 2120 1.4 -14.6 -13.2
    CIN 2418 0.8 -1.2 -0.4
    KCA 2426 -2.1 3.9 1.8
    PIT 2315 -2.9 1.8 -1.0
    ATL 2191 -3.9 13.0 9.0
    MIL 2294 -4.5 -4.2 -8.7
    BAL 2243 -5.0 1.4 -3.6
    OAK 2231 -11.3 -16.8 -28.1
    PHI 2249 -15.5 13.5 -2.0
    MIN 2216 -19.5 2.8 -16.7
    SDN 2153 -19.6 -10.4 -30.1
    TBA 2318 -22.9 11.4 -11.5
    SLN 2266 -24.6 -1.6 -26.3
    SFN 2242 -32.2 -5.3 -37.5
    SEA 2296 -38.1 6.2 -31.9
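To put a number on the imbalance mentioned above, the SFR column can simply be counted and summed. This is a small sketch with the values copied from the table; the variable names are mine.

```python
# Team outfield SFR values copied from the 2007 Outfield Defense table above.
sfr = {
    "BOS": 65.0, "CHN": 46.5, "NYN": 29.2, "WAS": 29.2, "COL": 27.0,
    "ARI": 25.3, "ANA": 20.5, "TEX": 20.2, "FLO": 16.7, "TOR": 13.3,
    "DET": 10.0, "CLE": 9.6, "NYA": 7.3, "CHA": 2.8, "HOU": 1.9,
    "LAN": 1.4, "CIN": 0.8, "KCA": -2.1, "PIT": -2.9, "ATL": -3.9,
    "MIL": -4.5, "BAL": -5.0, "OAK": -11.3, "PHI": -15.5, "MIN": -19.5,
    "SDN": -19.6, "TBA": -22.9, "SLN": -24.6, "SFN": -32.2, "SEA": -38.1,
}

positive = [team for team, runs in sfr.items() if runs > 0]
league_total = sum(sfr.values())

print(f"{len(positive)} of {len(sfr)} teams are positive")
print(f"league-wide SFR sum: {league_total:+.1f}")
```

If the system were perfectly balanced, the league-wide sum would sit near zero, so a large positive sum is one way to see the effect of the adjustments.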

    Friday, December 14, 2007

    Wins and the Quantum

    In light of the recent post on the clutch hitting article by Bill James I thought this piece, that first appeared on Baseball Prospectus on April 6, 2006, might be appropriate.



    Schrodinger's Bat: Wins and the Quantum
    by Dan Fox

    "I'm reminded a bit of the principle of superposition--each player in the game produces a contribution that has an effect on the probability of winning, somewhat analogous to a wave function. Add up these "wave functions" for each team, and you get a result that expresses how likely the team is to win with these particular sets of contributions, yet at this point it's still unknown whether the team actually wins (much like the fate of Schrödinger's cat inside the box). However, the wave function only collapses to the actual result when the game is played (or the box containing the cat is opened)."

    --Keith Woolner, “Aim for the Head” October 24, 2001

    When Woolner wrote that over four years ago, this column wasn’t even a twinkle in BP’s collective eye. I do love the analogy, though, and before moving on to this week’s topic--which connects to Woolner’s quote--I wanted to take a minute to explain the title of this column and how, other than the play on words, it relates to baseball. (If quantum physics doesn't interest you, skip ahead to the “Win Expectancy 101” section for the baseball part of the article.)

    Erwin Schrödinger was an Austrian physicist who, in 1926, formulated the fundamental equation of quantum mechanics. His equation described a world where properties of a particle (such as the location of an electron) at a specified time can be pinpointed only probabilistically. In other words, the particle may have a greater chance of being in one place than in another, but its location is described by a wave where the peak represents the position with the greatest probability. The quantum world appeared to be a fuzzy one governed by probability unlike the clock-work deterministic world of our everyday experience.

    By 1935, many physicists, including Niels Bohr (although famously not Albert Einstein), had interpreted this waveform equation to mean that particles do not in fact possess specified properties (such as location) before measurements are taken; they are in a spread out and fuzzy state of superposition (literally beyond position) until a measurement is taken and causes the waveform to “collapse” to a particular value.

    As a response to this view, Schrödinger devised a thought experiment that came to be known as “Schrödinger’s Cat”; he believed it showed that Bohr’s interpretation of quantum theory was, at the very least, incomplete.

    In short, the experiment involves a cat in a box with a trigger that releases poison. That trigger is tied to a device that measures a property of a particle. According to the waveform equation there is a 50% chance that the particle will be in state A and a 50% chance of it being in state B. If the device measures it in state A the poison is released and the cat dies.

    The core question for Schrödinger was simply this: if the particle’s property is not determined until it is measured, under Bohr’s interpretation isn’t the cat--through its connection with the particle via the device--also left in an undetermined state and therefore neither dead nor alive until the box is opened? Bohr’s interpretation didn’t really answer the question since it didn’t define any rules about the nature of measurement and observation.

    To make a long story short, which is told in wonderfully accessible prose by Brian Greene in The Fabric of the Cosmos, the questions raised in this thought experiment baffled physicists for years, but now have been mostly resolved by applying a concept known as decoherence. That concept holds that long before the box is opened the influence of the environment (from photons to air molecules and other particles) has nudged the waveform function into taking on a specific value, meaning that the cat is in fact really dead or alive and not caught in some state of limbo.

    What I like about this episode in the history of science is that Schrödinger devised a clever experiment used to test a common perception in his own field of quantum mechanics. That experiment made people think deeply about what they knew or thought they knew about the nature of reality itself. And while I’m not pretending that baseball has anything profound to say about such matters (it is, after all, just entertainment), I do hope that through this column, at least now and then, we can devise clever experiments that put to the test both conventional and sabermetric wisdom and help us think more deeply about our shared distraction.

    Before moving on I should also mention that reader John MacKenzie noted that he’s been using the moniker Schrödinger’s Bat (with a different spelling) for his fantasy league team for several years. We were of course unaware of that usage, so please don’t give John a hard time thinking that he lifted it from us.

    And now on to your regularly scheduled programming…


    Win Expectancy 101
    The concept of Win Expectancy (or Win Probability Added) is now an old one in performance analysis circles. Simply put, Win Expectancy is the probability of winning a game given the inning, score, and base/out situation. Using the Expected Win Matrix here at BP you can see, for example, that in 2005 when the visiting team was behind by a run in the top of the 6th inning with runners on first and third and nobody out, their probability of winning was exactly 50%.

    Changes in that probability throughout a game can be tracked and then applied to a host of questions both strategic (when to sacrifice, when to steal, when to issue an intentional walk, when to bring in a reliever) and reflective (who contributed most or least to increasing their team’s chances of winning in 2005--their aggregate contribution to the waveform function for each game in which they played, to use Woolner’s analogy).
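Tracking those changes amounts to crediting each batter with the win expectancy after his plate appearance minus the win expectancy before it, then summing. Here is a minimal sketch of that bookkeeping; the batters and probabilities are made up for illustration and the function name is mine.

```python
from collections import defaultdict

# Each event: (batter, win expectancy before, win expectancy after), all from
# the batting team's point of view. Every value here is hypothetical.
events = [
    ("Batter A", 0.500, 0.580),
    ("Batter B", 0.580, 0.540),
    ("Batter A", 0.460, 0.610),
    ("Batter C", 0.610, 0.570),
]


def tally_win_expectancy(events):
    """Sum the change in win expectancy credited to each batter."""
    credit = defaultdict(float)
    for batter, before, after in events:
        credit[batter] += after - before
    return dict(credit)


totals = tally_win_expectancy(events)
for batter, wins in sorted(totals.items(), key=lambda item: -item[1]):
    print(batter, f"{wins:+.3f}")
```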

    Those readers who’ve treated themselves to The Numbers Game by Alan Schwarz or Curve Ball: Baseball, Statistics, and the Role of Chance in the Game by Jim Albert and Jay Bennett know all about the Mills Brothers and their computation of “Player Win Averages” (PWA) for the 1969 season published in their 1970 book Player Win Averages: A Computer Guide to Winning Baseball Players. There they devised a system where changes in win expectancy were assigned to players and multiplied by a point system to compute Win and Loss points. The ratio of those became the PWA, their goal being to formulate a statistic like batting average used to discover clutch performers.

    But simply because it’s a topic with some legs doesn’t mean there aren’t new applications and refinements that can be made. Woolner himself contributed to this endeavor through the publication of the Win Expectancy Framework (WX), first discussed in the 2005 Baseball Prospectus and again in the 2006 version as well as in Baseball Between the Numbers, where it is applied to topics ranging from relief pitching to stolen bases.

    For those unfamiliar, the framework allows for the computation of the probability of winning a game given the current inning, score, base/out state, run environment (both home and visiting teams), and run differential. It does so by calculating all the permutations of possible outcomes from that point forward to determine the probability of each team winning.
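To give a feel for what "calculating all the permutations of possible outcomes" involves, here is a heavily simplified sketch. It is not Woolner's framework: it ignores the base/out state, gives both teams the same made-up run distribution for each half-inning, and treats a tie after nine innings as a coin flip. All names and numbers are assumptions for illustration.

```python
from functools import lru_cache

# Hypothetical chance of scoring 0-4 runs in a half-inning (sums to 1.0).
RUN_DIST = ((0, 0.73), (1, 0.15), (2, 0.07), (3, 0.03), (4, 0.02))


@lru_cache(maxsize=None)
def visitor_win_prob(lead, halves_left, visitor_bats):
    """P(visitors win) given their lead and the half-innings still to play."""
    if halves_left == 0:
        if lead > 0:
            return 1.0
        if lead < 0:
            return 0.0
        return 0.5  # simplification: treat extra innings as a coin flip
    prob = 0.0
    for runs, p in RUN_DIST:
        new_lead = lead + runs if visitor_bats else lead - runs
        prob += p * visitor_win_prob(new_lead, halves_left - 1, not visitor_bats)
    return prob


# Visitors trail by one entering the top of the 6th: 8 half-innings remain.
print(round(visitor_win_prob(-1, 8, True), 3))
```

Because every probability is computed from the run distribution rather than counted from games that happened to occur, changing the distribution for one team is all it takes to model a different run environment.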

    The key difference between the framework and matrices such as the one referenced previously, is that the probabilities produced are theoretical and the situations from which they derive needn’t have occurred in real life. This has a twofold advantage:

    1. it allows the framework to be more flexible by considering parameters such as the offensive environment of each team instead of being averaged across all teams
    2. it eliminates the problem of small sample size where a particular situation that occurred only a handful of times--or not at all--results in probabilities that are counterintuitive.


    For example, in the scenario described above, the visiting team had a 50% chance of winning as revealed in the table. However, in the intuitively less favorable situation where the visitors had a runner only on first, their probability of winning in 2005 was 52.4%. The inherent nature of WX eliminates these problems. From that perspective, WX is more similar to the approach used by the Mills brothers in computing PWA where they used computer simulation to derive the probabilities.
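One way to see why an empirical table can produce that kind of inversion is to look at the standard error of a winning percentage built from a limited number of games. The sketch below uses the usual binomial formula; the sample sizes are hypothetical, since the number of times each situation occurred in 2005 isn't given here.

```python
import math


def standard_error(p, n):
    """Standard error of an observed proportion p over n independent games."""
    return math.sqrt(p * (1 - p) / n)


gap = 0.524 - 0.500  # the two 2005 win probabilities quoted above

for n in (50, 100, 400):  # hypothetical numbers of games in each situation
    se = standard_error(0.5, n)
    print(f"n={n}: standard error {se:.3f}, gap is {gap / se:.2f} standard errors")
```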

    Leveling the Playing Field?
    In any case, in his 2006 Baseball Prospectus article “Adventures in Win Expectancy” Woolner applied WX to hitter seasons using play by play data extending from 1960 through 2005. In other words he calculated and then summed the change in win expectancy across all plate appearances for each hitter using the WX framework to produce a kind of “number of wins above average” contributed by each player. The results were then shown in two tables that reveal the 15 highest and lowest seasonal Batting WX and the 20 highest and lowest career Batting WX for the time period. The two tables below show the top and bottom five for each.

    Seasonal Batting WX
    Year Name PA WX
    2004 Barry Bonds 617 12.07
    2001 Barry Bonds 664 11.71
    2002 Barry Bonds 612 10.45
    1969 Willie McCovey 623 10.02
    1998 Mark McGwire 681 9.65

    ---------------------------------

    2003 Royce Clayton 543 -4.28
    1970 Larry Bowa 577 -4.29
    1968 Hal Lanier 518 -4.45
    1997 Gary DiSarcina 583 -4.76
    2002 Neifi Perez 585 -6.69


    Career Batting WX
    Name WX
    Barry Bonds 115.71
    Willie McCovey 74.11
    Hank Aaron 71.15
    Willie Mays 63.41
    Frank Robinson 63.04
    -----------------------
    Tim Foli -24.61
    Doug Flynn -25.69
    Royce Clayton -27.29
    Alfredo Griffin -28.90
    Larry Bowa -31.50

    Obviously the gap between Barry Bonds and the rest of the pack is wide because of Barry’s late and unprecedented 1999-2004 performance illustrated in the first table, but also because Willie Mays, Hank Aaron, and Frank Robinson all played significant portions of their careers prior to 1960 with Mays' rookie season in 1951, Aaron's in 1954, and Robinson's in 1956. And what about Babe Ruth, Ted Williams, Ty Cobb and the rest? How do they compare to Bonds?

    Fortunately, we can augment these lists to stretch back through time by applying a formula and a table of slopes and intercepts Woolner provided to estimate the win value of an offensive event given any offensive environment.

    First, by applying the formula to the league run environment over time for the National League we can produce the following two graphs:




    There is an interesting aspect of the first graph, as noted by Woolner. In eras of higher run scoring--such as when the average NL team scored 7.36 runs per game in 1894, 5.68 runs per game in 1930, and around 5.00 runs per game in 1999-2000--each offensive event contributes less to a win than in lower run scoring environments such as 1908 (with 3.32 runs per game), and 1968 (at 3.42).

    In other words, contrary to the notion that home runs during the dead-ball era weren’t as important as small ball tactics, they were in fact even more important, since each extra base hit--especially one that plates a run--has a larger relative impact on winning the game. Looking closer you’ll notice that as the number of bases gained by the event increases the relative value also increases as the run environment decreases. So in 1908 a home run is worth 3.14 times more than a single while in 1930 it’s worth exactly three times as much.
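The mechanics can be sketched by assuming the formula is linear in league runs per game, with one intercept and slope per event type. The coefficients below are placeholders that I picked only so the home run to single ratios land near the figures quoted above. They are not Woolner's published values, and the function name is mine.

```python
# Placeholder (intercept, slope) pairs per event, in wins per event.
# These are NOT Woolner's published coefficients.
COEFFS = {
    "single": (0.060, -0.0040),
    "home_run": (0.195, -0.0146),
}


def win_value(event, runs_per_game):
    """Estimated win value of an event in a given run environment (linear form)."""
    intercept, slope = COEFFS[event]
    return intercept + slope * runs_per_game


# NL runs per game for 1908 and 1930 as quoted in the text.
for year, rpg in ((1908, 3.32), (1930, 5.68)):
    single = win_value("single", rpg)
    homer = win_value("home_run", rpg)
    print(year, round(single, 4), round(homer, 4), round(homer / single, 2))
```

With negative slopes every event is worth more when runs are scarce, and because the home run's value falls off slightly more slowly in relative terms, its ratio to a single is higher in the low-scoring year.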

    It is in that context that the following quote from the supreme hitter of the dead-ball era is relevant:

    "If I had set out to be a homerun hitter, I am confident in a good season I would have made between twenty and thirty homers...I would naturally have sacrificed place hitting, which, to my way of thinking, is the supreme pinnacle of batting art."
    --Ty Cobb as quoted in F.C. Lane’s Batting.

    If Cobb could indeed have hit 25 home runs a season in the days before 1920, as he is also purported to have contended in the oft-recited anecdote in which he hit three home runs to prove the point, then he would have been well served to do so.

    In the second graph the value of various kinds of outs are shown and what is revealing is that the win values of strikeouts and other kinds of outs don’t change very much over time. The graph also shows how much more costly getting caught stealing is than other kinds of outs and that caught stealing fluctuates more with the run environment. In higher scoring eras getting thrown out doesn’t cost as much as in lower run scoring eras since when runs are scarce and runners are hard to come by, losing a baserunner has a larger relative impact on winning or losing. The long and short of it, as illustrated by Woolner in the original article and discussed by James Click in Baseball Between the Numbers and shown in the following graph, is that you have to be successful at a higher rate in low run scoring environments than you do when runs are more plentiful.



    Attentive readers will note that the break even percentages shown here vary somewhat and are lower than those shown in the original article. The reason is that these are based on the overall win expectancies calculated using Woolner’s formula and not on specific situations in various run environments.
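The break-even rate itself follows from setting the expected gain from steal attempts to zero: p times the win value of a stolen base must equal (1 - p) times the win cost of a caught stealing. A minimal sketch follows; the win values plugged in are hypothetical, not taken from Woolner's table.

```python
def break_even_rate(sb_gain, cs_cost):
    """Success rate at which steal attempts neither add nor subtract wins.

    Solves p * sb_gain - (1 - p) * cs_cost = 0 for p.
    """
    return cs_cost / (sb_gain + cs_cost)


# Hypothetical win values (wins per event). The caught stealing is made
# costlier in the low-scoring case, as the text describes.
high_scoring = break_even_rate(sb_gain=0.020, cs_cost=0.040)
low_scoring = break_even_rate(sb_gain=0.020, cs_cost=0.048)

print(f"high-scoring era: {high_scoring:.1%}")
print(f"low-scoring era:  {low_scoring:.1%}")
```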

    So by joining lower win values for offensive events in higher run scoring environments and very similar win values for most outs in lower run scoring environments you get something rather counterintuitive. But both of those statements have their roots in the fact that the basic structure of the game hasn’t changed much. Despite styles of play that come in and out of vogue, you still get just three outs per inning and 27 outs per regulation game and a home run has always been the most efficient way to score runs.

    So let’s apply the formula to individual batter seasons and adjust for both the league run environment as well as the ballpark using three-year park factors. After all, just as an extra base hit increases the win probability in the low run scoring environment of 1968 more so than in 2001, it does so to a greater degree at Dodger Stadium in 2001 than it does in Coors Field.

    After making the calculation for 83,733 player seasons (starting in 1876 in the NL and 1901 in the AL) we find the following top and bottom 15 seasons. Note that there are two tables of the bottom performers, since the bottom performers were dominated by pre-1900 players.

    Year Team Name PA WX1
    2001 SFN Barry Bonds 664 11.59
    2002 SFN Barry Bonds 612 10.85
    1923 NYA Babe Ruth 699 9.83
    1921 NYA Babe Ruth 693 9.61
    1920 NYA Babe Ruth 615 9.44
    1927 NYA Babe Ruth 691 9.17
    1926 NYA Babe Ruth 652 9.10
    1927 NYA Lou Gehrig 717 9.09
    1941 BOS Ted Williams 606 8.94
    1946 BOS Ted Williams 672 8.94
    2004 SFN Barry Bonds 617 8.79
    1957 NYA Mickey Mantle 623 8.74
    1917 DET Ty Cobb 669 8.66
    1924 SLN Rogers Hornsby 640 8.55
    1942 BOS Ted Williams 671 8.49


    Year Team Name PA WX1
    1894 CHN Jiggs Parrott 536 -6.02
    1933 SLA Jim Levey 567 -5.84
    1886 KCN Jim Lillie 427 -5.52
    1893 SLN Joe Quinn 584 -5.36
    1894 NY1 John Ward 575 -5.24
    1894 CL4 Chippy McGarr 554 -5.07
    1885 NY1 Joe Gerhardt 423 -5.05
    1895 PHI Jack Boyle 625 -5.03
    2002 KCA Neifi Perez 585 -5.03
    1890 CL4 Bob Gilks 582 -5.02
    1884 BFN Jim Lillie 476 -4.81
    1891 CIN Germany Smith 551 -4.81
    1890 BRO Germany Smith 526 -4.67
    1879 CN1 Will White 300 -4.64
    1892 BSN Joe Quinn 574 -4.64


    Post 1900 Only
    Year Team Name PA WX1
    1933 SLA Jim Levey 567 -5.84
    2002 KCA Neifi Perez 585 -5.03
    1933 SLA Art Scharein 522 -4.62
    1953 SLA Billy Hunter 604 -4.54
    1934 SLA Ski Melillo 589 -4.53
    1909 BRO Bill Bergen 372 -4.48
    1999 COL Neifi Perez 732 -4.47
    1932 SLA Ski Melillo 659 -4.45
    1931 SLA Jim Levey 540 -4.42
    1936 PHA Skeeter Newsome 508 -4.33
    1937 CHA Jackie Hayes 631 -4.30
    1977 OAK Rob Picciolo 446 -4.09
    1902 CLE John Gochnauer 506 -4.08
    1970 CIN Tommy Helms 605 -4.07
    2000 COL Neifi Perez 699 -4.07

    What stands out of course is that the WX values for Bonds from the tables shown previously don’t match the WX1 values in the first table here. The reason is that the formula applied to calculate these values is more of an approximation and doesn’t put into complete context each individual plate appearance. As a result you would expect to see more variability when play-by-play data is used since a player may find himself more or less frequently used in highly leveraged situations through both chance and managerial decision.

    In other words, the price we pay for being able to reach back before play-by-play data was available is a loss in precision. However, given that clutch hitting ability--if it exists at all--is likely quite small, some might argue that WX1 has the advantage of removing the effect of randomness and in that way actually provides a more “pure” technique for comparison.

    Bonds’ 2001 and 2002 seasons still come out on top, but Ruth makes his mark with five consecutive entries on the list which is rounded out by appearances by Lou Gehrig, Ted Williams, Mickey Mantle, Ty Cobb, and Rogers Hornsby. This is pretty much what you’d expect from similar lists that look at Equivalent Runs (EqR), or park adjusted Runs Created (RC) or BaseRuns (BsR). The top active player is Albert Pujols whose 2003 season came in 35th with a WX1 of 7.47.

    Cubs fans will no doubt be disheartened to see Neifi Perez grab three of the bottom 15 slots since 1900.

    We can then sum the WX1 values for entire careers and provide the following top and bottom 20 career performers with the bottom performers list being duplicated once again for post 1900.

    Name PA WX1
    Babe Ruth 10616 117.37
    Barry Bonds 11636 108.73
    Ty Cobb 13072 105.14
    Ted Williams 9791 98.37
    Hank Aaron 13940 90.41
    Stan Musial 12712 87.85
    Willie Mays 12493 85.29
    Mickey Mantle 9909 82.43
    Lou Gehrig 9660 79.31
    Rogers Hornsby 9475 78.44
    Tris Speaker 11988 77.15
    Frank Robinson 11743 73.09
    Mel Ott 11337 72.06
    Honus Wagner 11739 65.08
    Eddie Collins 12037 65.03
    Rickey Henderson 13346 58.73
    Jimmie Foxx 9670 57.57
    Jeff Bagwell 9431 56.65
    Joe Morgan 11329 55.48
    Frank Thomas 8602 53.22


    Name PA WX1
    Tommy Corcoran 8275 -41.25
    Joe Quinn 6341 -38.43
    Germany Smith 4652 -34.46
    Alfredo Griffin 7330 -34.24
    John Ward 7470 -34.14
    Bobby Lowe 7741 -33.95
    Bill Bergen 3228 -33.22
    Kid Gleason 8198 -32.70
    Malachi Kittridg 4446 -32.18
    Ozzie Guillen 7133 -31.80
    Bones Ely 5000 -30.73
    Davy Force 3081 -30.04
    Fred Pfeffer 6563 -29.65
    Ed Brinkman 6640 -29.40
    Don Kessinger 8529 -29.13
    Ski Melillo 5536 -29.03
    Herman Long 7845 -28.97
    Everett Scott 6373 -28.77
    Larry Bowa 9103 -28.64
    Tim Foli 6573 -28.49


    Post 1900 Only
    Name PA WX1
    Alfredo Griffin 7330 -34.24
    Bill Bergen 3228 -33.22
    Ozzie Guillen 7133 -31.80
    Ed Brinkman 6640 -29.40
    Don Kessinger 8529 -29.13
    Ski Melillo 5536 -29.03
    Everett Scott 6373 -28.77
    Larry Bowa 9103 -28.64
    Tim Foli 6573 -28.49
    George McBride 6235 -27.02
    Tommy Thevenow 4484 -26.57
    Neifi Perez 5123 -25.53
    Aurelio Rodrigue 7078 -25.16
    Hal Lanier 3940 -24.77
    Leo Durocher 5827 -24.60
    Mark Belanger 6602 -24.50
    Luke Sewell 6041 -23.92
    Roy McMillan 7653 -23.74
    Wally Gerber 5816 -23.18
    Rabbit Warstler 4611 -22.97

    You’ll notice that the total WX1 here for Bonds is about seven wins less than in the table shown earlier (108.73 versus 115.71) while Ruth overtakes him at 117.37. Of course, Ruth’s contribution to winning here does not include his pitching performance which would further distance him from Bonds. Nor do these values include fielding which would help Bonds close the gap a bit.

    Mays and Aaron also add about 22 and 19 wins, respectively, by including their entire careers; Ty Cobb comes out very well, and both Ted Williams and Stan Musial round out the top seven. Jeff Bagwell and Frank Thomas, two underrated players of the modern era, also make the top 20.

    Perhaps the most interesting thing about the top performers list is that Willie McCovey--second only to Bonds with 74.11 in Woolner’s original table--comes in 24th at 50.10 in WX1. His 1969 season that was rated at 10.02 in WX comes out to 7.00 in WX1. The most probable explanations: McCovey happened to have more highly leveraged plate appearances over the course of his career than would have been expected, he happened to hit well in the highly leveraged opportunities he had, he was one of the few true clutch hitters, or a combination of all three.
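The career comparisons in the last few paragraphs come down to subtracting the play-by-play WX totals (1960 through 2005 only) from the formula-based WX1 totals (entire careers). A short sketch with the figures copied from the tables and text above; the variable names are mine.

```python
# Career Batting WX (play-by-play, 1960-2005) from the earlier table.
career_wx = {
    "Barry Bonds": 115.71, "Willie McCovey": 74.11, "Hank Aaron": 71.15,
    "Willie Mays": 63.41, "Frank Robinson": 63.04,
}
# Career WX1 (formula-based, full careers) from the table and text above.
career_wx1 = {
    "Barry Bonds": 108.73, "Willie McCovey": 50.10, "Hank Aaron": 90.41,
    "Willie Mays": 85.29, "Frank Robinson": 73.09,
}

diff = {name: round(career_wx1[name] - career_wx[name], 2) for name in career_wx}

for name, change in diff.items():
    print(f"{name:15s} WX {career_wx[name]:7.2f}  WX1 {career_wx1[name]:7.2f}  change {change:+.2f}")
```

For Aaron, Mays, and Robinson most of the gain reflects the pre-1960 seasons that WX1 adds. For Bonds and McCovey, whose careers fall almost entirely inside the play-by-play window, the difference is mostly about leverage and context.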

    Clearing the Bases
    To wrap up, there are also a couple issues I wanted to address from last week’s column regarding platoon splits.

    I mentioned last week that The Book notes that right-handers need about 2,000 plate appearances against lefties before their measured platoon split can be considered reliable. I received several comments on this to the effect that since 2,000 plate appearances is the equivalent of 10 to 12 years of playing time, that seems like an awfully long time to wait before you can say anything about a player’s split.

    I agree. The point is not that you can’t know anything about the player’s split ability in fewer plate appearances. The point is that if you had only two pieces of information--a hitter's platoon split and the average split for right handed batters--and you had to choose which was more accurate, you would choose the average split.

    That doesn't mean that you couldn't get a better estimate by regressing the player's split to the mean using a weighted value, which the authors also discuss. So you certainly don't need to ignore the measured platoon split of players like Wily Mo Pena or Eduardo Perez. However, in the case of a player like Perez, who has just over 300 career plate appearances against southpaws, your best estimate of his true platoon split would be pretty heavily regressed to the mean.
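A common way to do that weighting is to blend the observed split with the league average in proportion to playing time. The sketch below assumes the 2,000 plate appearance figure can be used as the "ballast" at which the two get equal weight. That reading, the function name, and the split values are my assumptions, not something taken from The Book.

```python
def regressed_split(observed, league_mean, pa, ballast=2000):
    """Blend an observed platoon split with the league mean.

    The observed split gets weight pa / (pa + ballast), so at pa == ballast
    the observed and average splits count equally.
    """
    weight = pa / (pa + ballast)
    return weight * observed + (1 - weight) * league_mean


# Hypothetical splits (in points of OBP or similar), with about 300 PA
# against left-handers as in the Perez example.
observed_split = 0.100
average_split = 0.020

weight_300 = 300 / (300 + 2000)
estimate = regressed_split(observed_split, average_split, pa=300)

print(f"weight on observed split: {weight_300:.2f}")
print(f"estimated true split: {estimate:.3f}")
```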

    Second, because in this case the statistical threshold is so high, teams can and do combine both scouting information and statistical data to make predictions about future performance. So Epstein’s comments about Pena’s ability to perhaps contribute immediately because of his platoon split hopefully also reflect their scouting of his swing mechanics and pitch recognition among other attributes.

    And finally, I’d like to thank all the regular BP readers who have so kindly welcomed me into the fold. Your support is appreciated and your feedback encouraged anytime.