Saturday, August 26, 2006

Skill and Variation

Phil Birnbaum reviews the baserunning series I've been writing at BP over at Sabermetric Research. I particularly appreciate his calling out my comments on the true skill and random variation that together make up the results I've published. I will likely be publishing the entire data set at some point so that more work can be done on sorting out the two.

Thursday, August 24, 2006

Introducing EqSBR

In my ongoing series on BP I've been writing about quantifying baserunning. This week I'll take a look at stolen bases and pickoffs and introduce a metric called Equivalent Stolen Base Runs (EqSBR). As with the other metrics I've created this one uses the Run Expectancy matrix for 2000-2005 and then assigns run values to various stolen base attempts and pickoffs based on the starting base/out situation. And also as with the other metrics what we're trying to do is measure how many theoretical runs a player or team gained or gave up on the base paths in total in order to put this aspect of the game in perspective.
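
The column has the actual methodology, but the core idea of valuing a steal attempt from a run expectancy table can be sketched roughly as follows. This is a minimal illustration only: the `RUN_EXP` numbers are made-up placeholders (not the 2000-2005 matrix), the function names are hypothetical, and only a steal of second with a lone runner on first is modeled.

```python
# Placeholder run expectancy values, keyed by (occupied bases, outs).
# These are illustrative assumptions, NOT the 2000-2005 matrix used in the column.
RUN_EXP = {
    ("1", 0): 0.90, ("2", 0): 1.15, ("", 1): 0.28,
    ("1", 1): 0.55, ("2", 1): 0.70, ("", 2): 0.11,
    ("1", 2): 0.24, ("2", 2): 0.34, ("", 3): 0.00,
}


def steal_value(outs, success):
    """Change in run expectancy for a steal of second with a runner on first only."""
    before = RUN_EXP[("1", outs)]
    if success:
        after = RUN_EXP[("2", outs)]      # runner now on second, same outs
    else:
        after = RUN_EXP[("", outs + 1)]   # bases empty, one more out
    return after - before


def break_even_rate(outs):
    """Success rate at which the expected run value of attempting is zero."""
    gain = steal_value(outs, True)
    loss = steal_value(outs, False)
    return -loss / (gain - loss)


def total_steal_runs(attempts):
    """Sum run values over a list of (outs, success) attempts."""
    return sum(steal_value(outs, success) for outs, success in attempts)


if __name__ == "__main__":
    season = [(0, True), (1, True), (2, False)]
    print(round(total_steal_runs(season), 2))
    print(round(break_even_rate(0), 3))
```

Because a caught stealing costs more than a successful steal gains in most base/out states, a runner with a high raw total of steals can still come out negative, which is what the table below shows for several high-volume base stealers.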

Here are the values for 2005 for players with 10 or more opportunities (stolen base attempts + pickoffs).


Name Opps PO CS EqSBR
Alfonso Soriano 32 0 2 4.92
Johnny Damon 19 0 1 2.84
Kenny Lofton 24 0 3 2.64
Jason Bay 22 0 1 2.39
Rafael Furcal 57 1 11 2.36
Jimmy Rollins 50 3 9 2.24
Torii Hunter 30 0 7 2.18
Marcus Giles 18 0 3 2.04
Willie Bloomquist 16 1 2 1.74
Jose Reyes 77 2 17 1.73
Joe Mauer 14 1 2 1.60
Craig Biggio 12 0 1 1.55
Albert Pujols 19 1 3 1.54
Reggie Sanders 15 0 1 1.39
Julio Lugo 47 0 11 1.29
Ichiro Suzuki 42 1 9 1.23
Craig Counsell 33 0 7 1.22
Damian Jackson 17 1 3 1.18
Jacque Jones 16 0 4 1.03
Cliff Floyd 14 0 2 1.01
Russ Adams 11 0 2 0.90
Jayson Werth 14 1 3 0.83
Juan Pierre 75 1 18 0.82
Mike Cameron 13 1 2 0.65
Aaron Rowand 22 1 6 0.63
Carlos Lee 18 1 5 0.61
Orlando Cabrera 23 1 3 0.61
Cory Sullivan 15 0 3 0.58
Derrek Lee 19 1 4 0.55
Chase Utley 19 1 4 0.53
Raul Ibanez 10 0 4 0.39
Tony Womack 32 2 7 0.37
Joey Gathright 25 0 5 0.36
Royce Clayton 17 1 4 0.28
Antonio Perez 14 0 4 0.26
Adam Kennedy 24 1 5 0.26
Jason Ellison 20 0 6 0.22
Gary Sheffield 10 0 2 0.17
Carlos Beltran 22 0 6 0.14
Derek Jeter 17 0 5 0.09
So Taguchi 11 0 2 0.02
Rickie Weeks 17 1 3 0.01
Junior Spivey 12 0 3 0.01
Adam Everett 28 0 7 0.00
Willie Harris 14 1 4 -0.02
Jason Kendall 11 0 3 -0.04
Carl Crawford 58 4 12 -0.08
Maicer Izturis 12 0 3 -0.08
Hector Luna 12 0 2 -0.11
David Wright 22 0 7 -0.12
Miguel Cairo 15 1 4 -0.12
Vernon Wells 11 0 3 -0.20
Eric Hinske 12 0 4 -0.24
Vladimir Guerrero 13 1 2 -0.24
Chone Figgins 80 1 18 -0.30
Jack Wilson 10 0 3 -0.32
Eric Bruntlett 10 1 3 -0.36
Jermaine Dye 15 0 4 -0.40
Edgar Renteria 13 0 4 -0.43
Gary Matthews Jr. 12 1 3 -0.47
Nook Logan 31 3 9 -0.50
Alex Rodriguez 26 1 7 -0.53
Tadahito Iguchi 18 1 6 -0.57
Brian Giles 19 1 6 -0.58
Craig Monroe 11 0 3 -0.61
Alex Sanchez 13 0 5 -0.61
Coco Crisp 22 1 7 -0.61
Neifi Perez 12 0 4 -0.69
Lew Ford 18 0 6 -0.72
Corey Patterson 21 1 6 -0.81
Steve Finley 12 0 4 -0.85
Nick Punto 21 0 8 -0.87
Scott Podsednik 86 4 27 -0.92
Bobby Abreu 40 2 11 -0.93
Bill Hall 25 1 7 -0.93
Willy Taveras 47 2 13 -0.93
Melvin Mora 10 0 4 -0.97
Pablo Ozuna 21 0 7 -0.99
Felipe Lopez 22 1 8 -1.01
Aaron Boone 13 1 4 -1.05
Eric Byrnes 10 1 3 -1.06
Darin Erstad 13 1 4 -1.08
Travis Lee 11 0 4 -1.12
Clint Barmes 10 0 4 -1.21
Ivan Rodriguez 11 1 4 -1.25
Luis Castillo 17 0 7 -1.28
Mark Loretta 12 0 4 -1.29
Emil Brown 14 3 4 -1.36
Jason Lane 10 2 4 -1.39
Jonny Gomes 14 0 5 -1.54
Eric Young 13 0 6 -1.59
Jeff DaVanon 17 0 6 -1.63
Grady Sizemore 33 1 11 -1.63
Ryan Freel 49 3 13 -1.67
Shawn Green 13 1 5 -1.73
Matt Holliday 19 2 5 -1.82
Preston Wilson 12 0 6 -1.90
Matt Lawton 28 1 10 -1.91
Rob Mackowiak 12 1 5 -1.96
Dave Roberts 35 1 13 -1.97
Omar Vizquel 35 2 12 -1.97
David Dellucci 10 2 5 -1.98
Shannon Stewart 12 0 5 -1.99
Angel Berroa 14 2 7 -2.09
Mark Grudzielanek 14 0 6 -2.11
Mark Kotsay 11 1 6 -2.16
David DeJesus 11 1 6 -2.26
Juan Encarnacion 12 1 6 -2.30
Brandon Inge 14 1 7 -2.31
Reed Johnson 11 0 6 -2.31
Alex Rios 23 0 9 -2.47
Cristian Guzman 12 1 5 -2.48
Aubrey Huff 13 1 8 -2.49
Brian Roberts 39 3 13 -2.62
Juan Uribe 11 1 7 -2.64
David Eckstein 19 0 8 -2.70
Cesar Izturis 16 0 8 -2.80
Chris Burke 17 1 7 -2.85
Morgan Ensberg 13 0 7 -2.90
Luis Matos 28 2 11 -3.06
Jerry Hairston 17 0 9 -3.32
Nick Johnson 12 1 9 -3.43
Randy Winn 30 1 12 -3.69
Jeremy Reed 23 1 12 -3.72
Juan Rivera 10 0 9 -3.76
Jeromy Burnitz 12 3 7 -4.02
Brady Clark 23 0 13 -4.20
Brad Wilkerson 19 2 12 -6.23

Wednesday, August 23, 2006

The Science of Pujols

In case you missed it here are two references to a battery of tests performed at St. Louis University on Albert Pujols.

What was interesting was that these tests were similar to those performed on Babe Ruth in 1921 by graduate students at Columbia University. The September issue of GQ magazine includes more detail, but not surprisingly both scored very well. This kind of analysis may shed some light on the question of how players have improved over time. The unfortunate thing is that the tests Ruth took weren't as well documented or controlled, so comparisons are probably pretty difficult.

One of the more interesting tests was this one:

Asked to place a mark through a specific letter each time it appeared on a page of randomly positioned letters, Pujols used a search strategy that White had never witnessed in 18 years of administering the test.

"What was remarkable about Mr. Pujols' performance was not his speed but his unique visual search strategy," White said. "Most people search for targets on a page from left to right, much as they would when reading. In observing Mr. Pujols' performance, I initially thought he was searching randomly. As I watched, however, I realized that he was searching as if the page were divided into sectors. After locating a single target within a sector, he moved to another sector. Only after locating a single target within each sector, did he return to previously searched sectors and continue his scan for additional targets."

There is also some nice video of Pujols performing a swing test where he came out at 86.99 mph using a 31.5-ounce bat. Ruth, on the other hand, swung a 54-ounce bat at an estimated 75 mph.

Monday, August 21, 2006

Parting is Such Sweet Sorrow

Before the season started I wrote about the Battles of Spring and particularly the battle for the second base job at Wrigley Field.

In Chicago the situation is the reverse of that in Washington. In Mesa this spring you have three players who all actually want to play second base—well Neifi Perez may not prefer it and will likely log lots of time at shortstop if youngster Ronny Cedeno even stumbles a little and Dusty Baker relegates him to the bench. Incidentally, Perez was an impressive +20 at shortstop in 2005, ranking him fifth while his offense...well, let's not go down that sad road.

Todd Walker is the incumbent, having logged 93 starts there in 2005 while Jerry Hairston, Jr. started 36 games and Neifi Perez 18. So far this spring Hairston has been working exclusively at second base and getting some tips from Hall of Famer Ryne Sandberg, while Walker has been slowed up by a knee injury suffered last September.

Well, now all three are gone, with Hairston dealt to the Rangers for Phil Nevin, Walker to the Padres for Jose Ceda, and today Perez to the Tigers for 22-year-old catcher Chris Robinson, who played in A ball this year. The Tigers and Jim Leyland apparently think Perez will help them fill the hole left by Placido Polanco's separated shoulder in mid-August. Right.

But for Cubs fans finally the Neifi era has ended.

I especially liked these quotes from the MLB.com piece:

"It's no secret that when you're doing as well as they are and you lose an outstanding player like Polanco, we all know Dave Dombrowski was going to try to act quickly," Hendry said.

Hendry said they did not exchange names until Sunday. Perez and Tigers manager Jim Leyland were together in Colorado.

"Neifi did an outstanding job for us from the time we picked him up from the Giants," Hendry said. "He didn't get off to a great start this year, but has played well the last couple months."

So by "acting quickly" Dombrowski was able to snag Perez before he was wooed away from the Cubs by the other contenders. Oh, and in June, July, and August Neifi's OBPs were .300, .304, .242. Since the All-Star break he hit .260/.296/.254. So I guess you could say he played well for...well, for him. In his defense all the Cubs quoted in the MLB.com article note what a great guy Neifi is and how much he's taught them - presumably on a subject other than hitting a baseball.

Robinson was a third-round selection in 2005 and although he hasn't shown any power and has a poor strikeout-to-walk ratio (72/25), he's not Neifi Perez (or his $2.5M contract for 2007 or spot on the 40-man roster) and that's enough for right now.

If the Tigers use Neifi as he should be used, that is, sparingly and only for defense instead of as the first pinch hitter off the bench during his ride with Dusty, they'll be ok but God help them if they install him in the infield for the duration. And of course there's always next year and the realization that they've got an expensive utility infielder on their hands.

More on Robinson at The Cub Reporter.

Sunday, August 20, 2006

Sabermetric Research

I had forgotten to mention that Phil Birnbaum, editor of SABR's By the Numbers newsletter, has a new blog simply called "Sabermetric Research" where he includes "Links to and reviews of sabermetric studies and sports research". I see that Phil's including some interesting studies on sports other than baseball as well.

To me this is a particularly welcome addition to the baseball blogosphere since I've written before that what I think is needed at this point in the performance analysis revolution is a clearinghouse where researchers are able to search and access all previous studies on a particular topic. At one point I had started laying the groundwork for such a site but then of course fell into other pursuits. At this year's SABR Statistical Analysis Committee meeting there was some talk on this very topic and I believe several members volunteered to get the ball rolling. I, for one, would be a big supporter.

But back to Phil's blog...

One of my favorite posts thus far was this one where he cites a study that discusses the true boost in attendance from interleague play. The gist is that while MLB touts a 13% increase, the real increase is along the lines of 5% once you factor in the months in which interleague games are played, the prevalence of weekend matchups, and the boost from the inaugural 1997 season. It had occurred to me this was likely the case but I never had the gumption to break it down.

I also enjoyed the post where Phil summarized the work of previous studies related to hit batsmen. That's a subject near and dear to my heart since I took a look at the historical trends in a series of three articles earlier this summer on BP. What I find most interesting about the subject are three trends: a) the increasing number of hit batsmen over time, b) the difference between the leagues from just before the introduction of the DH to 1993 and c) the way in which the rates have evened out across the leagues since that time. I look at all three in my series and conclude:

...from an initial difference of nearly 21% in the rate of hit batsmen between the two leagues in the 1973-1993 period, just over 7% can be accounted for by the presence of more true hitters in the lineup and another 4% by two hitters [Don Baylor and Chet Lemon both of whom played in the AL] who were exceptionally "gifted" at getting plunked. This still leaves ample room for the moral hazard theory, a theory that incorporates differences in the two leagues relating to strike zone or styles of play, or a combination of all of the above to operate.

In any case, it's a fascinating subject that provides ample room for speculation and the proposal and testing of various theories.

Finally, I (like many others) enjoyed both posts related to attempts to measure the improvement of players over time. In other words, attempting to answer the question of just how good a Babe Ruth or a Rogers Hornsby would be in the modern game, given the skills they possessed when they played. The relevance is that a chapter of Baseball Between the Numbers includes this very discussion, written by Nate Silver. There Silver creates a league difficulty factor based on Davenport Translations by examining the performance of players in successive seasons. He then uses these factors to translate statistics across eras. His analysis of Ruth concludes as follows:

Ruth's career EqA would be .274. He probably would have made the All-Star team a couple of times, with an EqA in his best seasons approaching .300. But he'd be remembered as merely a good player and certainly wouldn't be a credible candidate for the Hall of Fame. In modern terms, Ruth might be a Tino Martinez (career .274 EqA) or Raul Mondesi (.278).

Birnbaum seems to think that studies like this are inherently flawed:

But can you design an experiment, like [Dick] Cramer tried to do, that will find an answer without looking to physics? I can't find the reference, but I'm pretty sure Bill James once speculated that there's no way to do it. I think I agree.

His view is that performance data (not data from the world of physics) is so intertwined with variations in the underlying difficulty level of the sport that one likely cannot devise a study that disentangles them. I'm not so sure but will have to ruminate on it for a while. I'm of the opinion that Silver is essentially correct, but these discussions have sent me back to the drawing board in terms of thinking about how we could measure it.

Team Advancement

My column last week on Baseball Prospectus was put up on the site for free and so even those without subscriptions can take a look. Basically, in it I take a look at the two metrics created for crediting advancement on ground outs and air outs from a team perspective. Enjoy.

Wednesday, August 16, 2006

Introducing EqGAR

As some readers know I've been developing some baserunning metrics in a series of columns on BP in an effort to eventually consolidate them into a single baserunning number and to get a better picture of the relative importance of advancing on outs, hits, and stealing bases and the costs of getting picked off and thrown out.

Several weeks ago I developed and then reworked a metric called Equivalent Ground Advancement Runs, or EqGAR, similar to the EqAAR discussed a while back. You'll need to read the column if you're interested in the methodology, but here are the players in 2005 with 25 or more ground advancement opportunities on outs, sorted by total EqGAR. Notice that there is a rate stat in the last column you can also use to compare players.


Name Opps ExGAR EqGAR GA Rate
Chone Figgins 53 6.13 4.52 1.74
Juan Pierre 54 5.72 3.52 1.61
Brady Clark 65 5.55 2.63 1.47
Jason Ellison 33 3.09 2.25 1.73
Jose Reyes 52 5.76 1.76 1.31
Brian Roberts 49 6.30 1.67 1.27
Willy Taveras 40 2.94 1.41 1.48
Jose Cruz Jr. 29 2.86 1.35 1.47
Juan Uribe 29 2.81 1.30 1.46
Derrek Lee 39 3.17 1.15 1.36
Dave Roberts 37 3.23 1.10 1.34
Orlando Cabrera 31 2.21 1.02 1.46
Julio Lugo 42 3.42 0.99 1.29
Pedro Feliz 34 2.19 0.95 1.43
So Taguchi 30 1.61 0.94 1.59
Craig Counsell 47 4.80 0.91 1.19
Cory Sullivan 28 2.30 0.85 1.37
Jimmy Rollins 51 5.44 0.79 1.14
Jim Edmonds 26 1.37 0.77 1.56
Kevin Mench 25 1.38 0.73 1.53
Joe Crede 29 2.02 0.72 1.36
Luis Gonzalez 29 2.70 0.68 1.25
Johnny Damon 42 2.83 0.66 1.23
Jeremy Reed 37 2.14 0.56 1.26
Ichiro Suzuki 47 4.03 0.55 1.14
David Wright 35 3.15 0.52 1.17
Craig Monroe 33 2.27 0.47 1.21
Jerry Hairston 31 3.79 0.46 1.12
Ray Durham 35 2.00 0.35 1.17
Jeff Conine 27 1.71 0.34 1.20
Adam Kennedy 34 2.07 0.33 1.16
Rafael Furcal 41 3.89 0.32 1.08
Mark Grudzielanek 32 2.71 0.31 1.11
Geoff Jenkins 31 1.53 0.30 1.20
Darin Erstad 39 3.59 0.30 1.08
Cristian Guzman 45 4.21 0.30 1.07
Jack Wilson 37 3.32 0.27 1.08
David Eckstein 53 3.31 0.24 1.07
Brad Ausmus 50 3.16 0.23 1.07
Aaron Rowand 28 2.15 0.22 1.10
Grady Sizemore 46 4.53 0.22 1.05
Abraham Nunez 29 2.06 0.21 1.10
Scott Podsednik 45 4.84 0.20 1.04
Craig Biggio 37 2.82 0.18 1.06
Todd Helton 27 1.26 0.17 1.14
Jacque Jones 25 1.37 0.15 1.11
Jason Kendall 40 2.30 0.15 1.06
Alex Gonzalez 43 2.83 0.14 1.05
Michael Cuddyer 33 2.97 0.11 1.04
Ryan Langerhans 31 2.33 0.10 1.04
Angel Berroa 28 2.86 0.09 1.03
Mark Kotsay 27 1.78 0.07 1.04
Moises Alou 29 1.41 0.07 1.05
Derek Jeter 47 2.68 0.06 1.02
Luis Matos 28 1.91 0.06 1.03
Alex Rios 26 2.43 0.04 1.02
Chris Snyder 25 2.00 0.03 1.01
Shannon Stewart 51 3.83 0.02 1.01
Bobby Kielty 30 1.89 0.02 1.01
Chad Tracy 34 2.18 0.00 1.00
Orlando Hudson 33 2.52 -0.02 0.99
Carl Crawford 28 1.83 -0.02 0.99
Russ Adams 35 2.46 -0.03 0.99
Michael Barrett 30 2.18 -0.04 0.98
David Bell 26 1.40 -0.04 0.97
Mike Lieberthal 36 3.75 -0.05 0.99
Brian Schneider 26 1.52 -0.06 0.96
Alfonso Soriano 31 1.69 -0.08 0.95
Jose Guillen 28 1.88 -0.09 0.95
Richie Sexson 27 2.90 -0.10 0.97
Matt Lawton 35 1.87 -0.12 0.94
Ronnie Belliard 33 2.09 -0.18 0.91
Troy Glaus 27 1.50 -0.19 0.88
Alex Rodriguez 34 2.72 -0.20 0.93
Shawn Green 31 1.94 -0.20 0.90
Mark Teahen 31 2.58 -0.21 0.92
Mark Ellis 32 2.52 -0.21 0.92
Aubrey Huff 25 0.96 -0.22 0.77
Chase Utley 43 3.32 -0.22 0.93
Luis Castillo 33 1.80 -0.23 0.87
Khalil Greene 31 3.50 -0.24 0.93
Joe Mauer 30 1.64 -0.25 0.85
Victor Martinez 39 1.71 -0.26 0.85
Mike Matheny 35 2.56 -0.29 0.89
Lew Ford 33 2.01 -0.29 0.86
Nick Johnson 32 1.91 -0.31 0.84
Hank Blalock 25 1.21 -0.34 0.72
Mike Cameron 25 1.80 -0.35 0.81
Rob Mackowiak 25 1.97 -0.36 0.82
Yadier Molina 28 2.08 -0.36 0.83
Casey Blake 27 1.48 -0.38 0.74
Gregg Zaun 29 1.64 -0.43 0.74
Eric Chavez 26 1.68 -0.45 0.73
Paul Konerko 28 1.57 -0.46 0.71
Omar Vizquel 40 2.42 -0.46 0.81
Garrett Atkins 27 2.03 -0.47 0.77
Brian Giles 25 2.02 -0.48 0.76
Adam Everett 34 2.40 -0.49 0.80
Neifi Perez 26 1.57 -0.55 0.65
Marcus Giles 27 1.79 -0.59 0.67
Eric Hinske 29 2.27 -0.61 0.73
Bill Hall 32 2.92 -0.63 0.78
Gary Matthews Jr. 27 2.06 -0.64 0.69
Michael Young 29 1.38 -0.66 0.52
Randy Winn 41 3.47 -0.67 0.81
David DeJesus 39 2.75 -0.68 0.75
Brad Wilkerson 48 4.60 -0.80 0.83
Paul Lo Duca 25 1.66 -0.92 0.45
Joe Randa 26 1.68 -0.93 0.45
Jason Varitek 30 1.74 -0.93 0.46
Shea Hillenbrand 25 2.21 -1.04 0.53
Bobby Abreu 26 2.66 -1.05 0.61
Brandon Inge 57 4.25 -1.18 0.72
Emil Brown 31 2.34 -1.40 0.40
Travis Hafner 29 2.69 -1.43 0.47
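
The post doesn't spell out how the GA Rate column is computed, but the listed values appear consistent with treating ExGAR as the expected runs and EqGAR as the runs above or below that expectation, so that the rate is actual over expected. The sketch below checks that reading against a few rows copied from the table; the formula is an inference from the numbers, not something taken from the column, and `ga_rate` is a hypothetical name.

```python
def ga_rate(ex_gar, eq_gar):
    """Inferred rate: (expected + runs above expected) / expected."""
    return (ex_gar + eq_gar) / ex_gar


# (name, ExGAR, EqGAR, GA Rate as listed) copied from the table above.
ROWS = [
    ("Chone Figgins", 6.13, 4.52, 1.74),
    ("Brady Clark", 5.55, 2.63, 1.47),
    ("Chad Tracy", 2.18, 0.00, 1.00),
    ("Emil Brown", 2.34, -1.40, 0.40),
    ("Travis Hafner", 2.69, -1.43, 0.47),
]

if __name__ == "__main__":
    for name, ex_gar, eq_gar, listed in ROWS:
        computed = round(ga_rate(ex_gar, eq_gar), 2)
        print(f"{name}: computed {computed:.2f}, listed {listed:.2f}")
```

Read this way, a rate of 1.00 means a runner advanced exactly as much as an average runner would have given the same opportunities, which is why the rate is handy for comparing players with very different opportunity counts.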

Monday, August 14, 2006

A Half Run?

So I was driving to the ballpark yesterday to score the Rockies' 4-3 walk-off victory over the Diamondbacks when on ESPN I heard an exchange between the host of the program and ESPN's Peter Pascarelli. As they were running down the chances of various teams making the playoffs, Pascarelli noted that it would be tough for the Red Sox because their pitching is in disarray, but also that in losing Jason Varitek to a cartilage tear in his left knee they probably lost (paraphrasing) "a half run per game" because of his handling of the staff.

Really?

The idea that there is some skill in game calling that can depress ERAs has been around for a long time as one of the unproven assumptions of baseball. Several years ago Keith Woolner at BP did some research on the topic and wrote about it in The 1999 Baseball Prospectus. The end result of his research (confirmed by others) was that:

"There is no statistical evidence for a large game-calling ability, but that doesn’t preclude that a small ability. For example, a genuine game-calling ability that reduces a pitcher’s ERA by 0.01, resulting in a savings of about 1.6 runs per year for the entire team and could be masked by the statistical variance in the sample size we have to work with. Players would need to play thousands more games than they actually do to have enough data to successfully detect such a skill statistically."

So while there may be some skill involved, the natural variation overwhelms the signal the skill may be giving off which means that for all intents and purposes you may as well make decisions as if there were no skill operating at all.
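
For a sense of scale, the arithmetic behind the 1.6 runs figure can be reproduced with a quick sketch. The 162-game, nine-innings-per-game season is my assumption (the quote doesn't state the innings total), and `runs_saved_per_season` is a hypothetical helper.

```python
def runs_saved_per_season(era_reduction, games=162, innings_per_game=9):
    """Runs saved over a season by lowering a staff's ERA by era_reduction.

    ERA is earned runs per nine innings, so the saving is the reduction
    multiplied by the number of nine-inning blocks pitched.
    """
    innings = games * innings_per_game
    return era_reduction * innings / 9


if __name__ == "__main__":
    # The small ability Woolner describes: 0.01 of ERA.
    print(round(runs_saved_per_season(0.01), 2))
    # The claim heard on the radio: half a run per game.
    print(round(runs_saved_per_season(0.5), 1))
```

The second call puts the radio claim on the same scale: half a run per game over a full season is dozens of runs, an effect large enough that the research should have been able to detect it.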

The reason the myth persists IMHO is that Catcher's ERA (CERA) can easily be found (for example in each team's game notes published each day for the media and made available in the press box) and the inherent variation does indeed sometimes show that the staff performed better under catcher A than catcher B. This difference is misinterpreted by writers and even teams as meaningful when in fact there is no evidence that you would expect the difference to remain given another equally large sample size. If it were the case that you always saw little variation in CERA among a team's backstops it wouldn't have the allure it does. But that variation is a mirage.
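
To see how easily chance alone produces CERA gaps, here is a rough simulation in which two catchers have exactly the same effect on run prevention. Everything in it is an assumption for illustration: the per-inning run distribution, the 900/500 innings split, and the half-run threshold are invented, and a real study would model runs far more carefully.

```python
import random

# Assumed distribution of runs allowed in a single inning (illustrative only).
RUNS = [0, 1, 2, 3, 4]
WEIGHTS = [0.73, 0.15, 0.07, 0.03, 0.02]


def simulated_cera(innings, rng):
    """Catcher's ERA over a number of innings drawn from the shared distribution."""
    runs = sum(rng.choices(RUNS, weights=WEIGHTS, k=innings))
    return 9 * runs / innings


def share_of_big_gaps(innings_a=900, innings_b=500, threshold=0.5,
                      trials=1000, seed=1):
    """Fraction of simulated seasons in which two identical catchers'
    CERAs differ by at least `threshold` runs."""
    rng = random.Random(seed)
    big = 0
    for _ in range(trials):
        gap = simulated_cera(innings_a, rng) - simulated_cera(innings_b, rng)
        if abs(gap) >= threshold:
            big += 1
    return big / trials


if __name__ == "__main__":
    print(share_of_big_gaps())
```

Since both catchers draw from the same distribution, any gap the simulation reports is noise by construction, which is the sense in which the variation in CERA is a mirage.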

This dovetails nicely with what Rany Jazayerli had to say about differences in hitting given small sample sizes quoted yesterday.

The State of Statistical Analysis

Rany Jazayerli had an excellent two-part analysis of the Tigers last week on BP. The level of detail in analyzing the construction of the 2006 roster and the decisions made by Dave Dombrowski is enlightening to say the least. However, from a big picture standpoint I especially enjoyed these quotes from the second piece where he discusses the ever-decreasing benefit to statistical analysis as compared to the advantages to be gained by good scouting.

"When no one took statistical analysis seriously, a team that bucked the trend could find major inefficiencies in the market. But over the last decade the acceptance of statistical analysis throughout the game--there isn’t a major league team that doesn’t employ someone doing statistical work for them--has squeezed most of the inefficiencies out of the market. Statistical measures of offense were the first to catch on, because they were the most accurate. Using those measures before everyone else allowed the A’s to build an offense that ranked in the top four in the AL in runs scored between 1999 and 2001. But as other teams have caught on, their old tricks don’t work anymore. The A’s haven’t ranked higher than sixth in runs scored since, and this year rank dead last in the league."

"The best way to find inefficiencies in the numbers today is to have access to data other teams don’t have--which may explain why the A’s, with their own proprietary fielding numbers, have allowed the second-fewest runs (only the Tigers have allowed fewer) in the league. And certainly, combining the best of statistical analysis with the best in traditional scouting measures is always going to be a recipe for success, as it was for the Red Sox in 2004."

"The best way to find inefficiencies worth exploiting is to have better information than your competition. The beauty of data--that it is discrete and precise--is also its weakness. If everyone has the same numbers, then everyone has the same information. While there is such a thing as good data analysis vs. bad data analysis, anyone qualified to work for a major league team is unlikely to make any egregious errors on that front. Some writers might think it’s meaningful that Joe Shlabotnik has hit .320 in the #2 hole and .280 in the #5 hole in 100 plate appearances each; I doubt any professional analyst would make that kind of mistake. The very fact that statistical analysis is mainstream makes it that much more difficult for the very best analysts to hold much of an advantage on the second-tier guys chasing them."

Very well said, and the reason that ever more granular data will be required in order to reap benefits. No more low-hanging fruit.
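
Rany's Joe Shlabotnik example can be checked with a back-of-the-envelope two-proportion test. This is only a sketch under simplifying assumptions: it treats each of the 100 plate appearances as an at bat with an independent chance of a hit, and `z_for_difference` is a hypothetical helper name.

```python
import math


def z_for_difference(avg_a, avg_b, n_a, n_b):
    """z-score for the gap between two batting averages, using a pooled
    binomial standard error."""
    pooled = (avg_a * n_a + avg_b * n_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (avg_a - avg_b) / se


if __name__ == "__main__":
    # .320 in the #2 hole vs .280 in the #5 hole, 100 trials each.
    print(round(z_for_difference(0.320, 0.280, 100, 100), 2))
```

A z-score well under 1 means a 40-point gap in samples this small is entirely ordinary for a hitter whose true ability is the same in both lineup spots.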

Saturday, August 12, 2006

Civil War Refresher

Before my 10-year-old daughter heads back to school next month we wanted to cover a few topics to refresh her memory and get her thinking. One of those topics was the Civil War, and I wanted to briefly discuss what we did since it worked very well and so that others could perhaps take advantage of what we learned.

Our goal was to fix in her mind the timeline of the war and its major events, and over the course of five sessions of 75 minutes or so we were able to do just that. Each session covered one year (1861-1865) and the time was split into two parts. First, we worked for roughly 30 minutes on a timeline. Using a roll of cheap paper we marked out a timeline roughly three feet in length and divided it into five years. For those first 30 minutes we would highlight four or five of the most important events of the year and let her draw pictures to represent them. For example, for March of 1862 she drew a small picture of the battle of the CSA's Virginia and USA's Monitor. For each battle discussed she drew an appropriate icon representing a Union or Confederate victory. Then we would discuss how the war had gone up to that point for both sides by way of review.

For the second half of the lesson we used Ken Burns's Civil War documentary (which you can get from the library) and selected specific scenes that highlighted those events we had placed on the timeline. This did two things: first, it cemented the concepts by using an alternate form of presentation and second, it helped it come alive through the pictures and sounds. Now there were several times during each lesson where I had to stop the DVD and explain something, but the nice thing is that the scenes in the documentary are short enough to keep a 10-year-old's attention. Our 7-year-old, however, when she noticed we were loading up the documentary, would exclaim in disgust, "awwh, not the war again!"

At the end of five nights the war was over as we ended with Lincoln's assassination. I've always been interested in the Civil War, and relating my experiences visiting battlefields such as Shiloh, Fredericksburg, and Gettysburg helped it come alive for my daughter as well. Since it worked so well we're considering using the technique for other material.

Cubs vs Rox Game 2

This weekend marks the annual visit of the Cubs to the Front Range and I'm getting set to score game two of the series for MLB.com. Last night the Cubs were pounded by the Rockies 10-2, giving up three home runs in the process to a team that hits them infrequently in Coors Field. That makes a three-game losing streak, although the team has played much better in July (14-12) and August (5-5) after compiling a 16-40 record in May and June.

Rich Hill, after two consecutive nice outings, once again struggled with his control, walking five in 4 2/3 innings. In fact during the series the Cubs will start three rookies, having called up Angel Guzman (0-2, 5.68) to face the Rockies tonight. He pitched well in his last start at Iowa, going over 7 innings and giving up just 3 hits and 0 runs while striking out 6. Tomorrow afternoon, in a game my family and I will attend as fans, Carlos Marmol (5-5, 5.09) will go for the Cubs against Byung-Hyun Kim. I'm sure my daughters will get a kick out of watching Kim throw - either that or they'll be keenly watching the cotton candy, slushy, Dippin' Dots and other vendors. Either way it should be fun.

In fact, my column this week on Baseball Prospectus, titled "Replacing Ricky Gutierrez", focused on the Cubs and their pickup of Cesar Izturis for Greg Maddux (who incidentally has a 1.50 ERA in 12 innings with the Dodgers, including 6 innings of no-hit ball in his first start). As I noted in the piece, I have no problem with the idea of flipping Maddux for anything of value, but in Izturis I'm afraid the Cubs have simply extended their search for the next Ricky Gutierrez, hence the title of the article, for another year and a half. Izturis simply won't be able to field well enough to justify his anemic offensive performances. They further compounded their problems by moving Ronny Cedeno to second base where, unless his offensive production improves markedly, he and Izturis will serve as the anchor (literally) of the Cubs offense. Oh, and Izturis will be using his vast offensive skills in his "double leadoff" role so you gotta love that. And that's of course given the struggle the Cubs already have in the outfield to produce runs. One can only imagine what would happen if they were also getting a subpar contribution from the catching position, where Michael Barrett has been outstanding (defensively that's another story, as opposing baserunners have swiped 79 bases in 94 attempts, or 84%, this season).

And in a move that surprised no one who actually watched him pitch recently, Mark Prior was put back on the DL retroactive to August 11 with "right shoulder tendinitis". That officially makes it the third trip to the DL this season. His transaction history this year from ESPN:

July 15: Placed pitcher Mark Prior on the 15-day disabled list, retroactive to July 5, with a strained left oblique muscle...

June 19: Activated pitcher Mark Prior from the 15-day disabled list...

March 28: Placed Mark Prior on the 15-day disabled list [right subscapularis strain]

In other notes, given the number of injuries the Cubs have had it's not surprising that going into tonight they've had six different rookies (Hill, Guzman, Sean Marshall, Marmol, Juan Mateo and Jae Kuk Ryu) start 44 games, second only to the Marlins' 63. Those rookies are 13-20 with a 5.85 ERA in 229.3 IP. Not exactly Justin Verlander and Francisco Liriano-like. You'll also not be surprised to learn that the Cubs staff leads the majors in walks at 484. They also, however, lead the majors in strikeouts at 849 (thanks in part to Carlos Zambrano's National League-leading total of 162). The last team to lead the league in both was the Texas Rangers in 1989.

As far as the Rockies are concerned I notice from the lineups I just entered into the Gameday software that tonight may mark the first time all season that they have the lineup that to me makes the most sense for them and that I would have liked to see them use as often as possible.

Carroll 2B
Helton 1B
Atkins 3B
Holliday LF
Hawpe RF
Spilborghs CF
Torrealba C
Barmes SS

OK, so maybe not Barmes at short and instead Luis Gonzalez, or move Carroll to short and let Gonzalez play second. But in any case they finally have Helton batting second, where his ability to draw walks will be helpful, and have inserted Ryan Spilborghs in center field. Spilborghs collected three hits last night and was hitting over .330 with 20 doubles in 68 games in Colorado Springs. He has good patience and is a better fielder than Choo Freeman by a long shot. The fact that the Rockies have given so many at bats to the duo of Freeman and Cory Sullivan is puzzling given the offensive woes of this team. In addition Yorvit Torrealba has hit the ball well, at least by catcher standards (.261/.309/.451), and clearly neither Danny Ardoin nor J.D. Closser is the answer.

Play Ball!

Monday, August 07, 2006

Bautista Impresses

Last night my daughter and I headed over to Security Service Field to catch the latest Rockies acquisition, Denny Bautista, pitch for the Sky Sox against the Albuquerque Isotopes. I had seen Bautista pitch in spring training and was impressed with his dominance in the first three innings of that start in March. Obviously things haven't gone so well for him since: he posted a 5.66 ERA with the Royals in 35 innings, walking 17, and fared even worse with the Omaha Royals, posting a 7.36 ERA while walking 32 in 44 innings.


Well, last night it looked as if things would follow this same trend as he walked the first two hitters. After that, though, his breaking ball came alive and he struck out the next three. In six rain-interrupted innings (we left in the bottom of the 6th, having already waited out two delays, as rain started coming down again) he didn't surrender any runs, gave up three hits, struck out nine and walked four. It was an impressive outing: his fastball consistently topped 94 and his hard slider sat around 82 or 83, leaving Isotopes hitters helpless much of the night. With such a live arm one would hope he can harness his control and, if so, become at least a serviceable part of the Rockies bullpen.

Friday, August 04, 2006

Advancing in Context

My column this week at Baseball Prospectus refines the framework I'm developing for crediting baserunners with advancing on air outs by taking a look at how the park affects the results. Using data from 2000-2005 I was able to calculate an Air Advancement Park Factor (AAPF) for each outfield position for each of the 37 parks in use during that time.

For example, here is a graphical way of looking at Petco Park. Petco of course had only two years' worth of data, so its numbers should be taken with a grain of salt. That, by the way, is the reason I calculated only a single park factor for each park rather than creating weighted versions for each year. And no, this doesn't take into account configuration changes in the parks like moving the fences in at Comerica and out at Kauffman Stadium.



As you can see, this would indicate that Petco inhibits advancing on fly balls to left and right by 5% and 15% respectively while making it a bit easier (by 1%) when the ball is caught by the centerfielder. I then use these park factors to adjust the Equivalent Air Advancement Runs (EqAAR) I discussed in a recent post.
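For those who want to read the table below the same way, here is a minimal sketch of turning an AAPF into the percentages quoted above. It assumes that 1.00 means a neutral park, which is how the Petco numbers read; the function and variable names are my own for illustration and are not taken from the column.

```python
# Petco Park AAPFs as listed in the table in this post,
# keyed by scorer's position number (7 = LF, 8 = CF, 9 = RF).
PETCO_AAPF = {7: 0.95, 8: 1.01, 9: 0.85}
FIELD_NAMES = {7: "left", 8: "center", 9: "right"}


def percent_effect(aapf):
    """Signed whole-percent effect on advancement, assuming 1.00 is neutral."""
    return round((aapf - 1.0) * 100)


def describe(park_factors):
    """Build one readable line per outfield position."""
    lines = []
    for pos in sorted(park_factors):
        pct = percent_effect(park_factors[pos])
        if pct < 0:
            verb = "inhibits"
        elif pct > 0:
            verb = "boosts"
        else:
            verb = "does not change"
        lines.append(f"{FIELD_NAMES[pos]}: {verb} advancement ({pct:+d}%)")
    return lines


if __name__ == "__main__":
    for line in describe(PETCO_AAPF):
        print(line)
```

This only covers reading the factors; how they are applied to the raw EqAAR totals is described in the column itself and is not reproduced here.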

Because space didn't allow for publishing all of the park factors in the column, the following shows those AAPFs broken down by park and field along with the number of opportunities on which each is based. You'll see that the small number of opportunities for parks for which I only have one or two years' worth of data may skew the results (take a look at R.F.K. for example). All parks are listed by their most recent names (and a pox on those teams that change naming rights). As mentioned in the column, the 2003 and 2004 data for Montreal conflates Stade Olympique with Hiram Bithorn Stadium, hence they are treated as one park for this analysis while Stade Olympique for 2000-2001 is treated separately.


Park Opps Pos AAPF
(Pos is the scorer's position number: 7 = left field, 8 = center field, 9 = right field)
--------------------------------------------------------
Ameriquest Field 564 7 1.03
Ameriquest Field 687 8 1.03
Ameriquest Field 566 9 1.04


Angels Stadium of Anaheim 532 7 1.00
Angels Stadium of Anaheim 704 8 0.99
Angels Stadium of Anaheim 548 9 1.07


Bank One Ballpark 425 7 1.06
Bank One Ballpark 657 8 1.07
Bank One Ballpark 565 9 1.04


Busch Stadium II 461 7 0.97
Busch Stadium II 647 8 1.00
Busch Stadium II 527 9 1.06


Cinergy Field 229 7 0.97
Cinergy Field 320 8 1.03
Cinergy Field 277 9 0.91


Citizen's Bank Park 164 7 0.90
Citizen's Bank Park 220 8 1.02
Citizen's Bank Park 157 9 0.98


Comerica Park 522 7 1.04
Comerica Park 723 8 1.01
Comerica Park 541 9 0.97


Coors Field 475 7 1.10
Coors Field 663 8 1.06
Coors Field 500 9 0.97


County Stadium 69 7 0.94
County Stadium 106 8 1.09
County Stadium 99 9 0.94


Dodger Stadium 371 7 1.12
Dodger Stadium 572 8 0.99
Dodger Stadium 449 9 1.05


Fenway Park II 449 7 0.87
Fenway Park II 653 8 0.99
Fenway Park II 535 9 0.98


Great American Ball Park 233 7 1.03
Great American Ball Park 301 8 1.01
Great American Ball Park 260 9 1.01


Hubert H Humphrey Metrodome 522 7 1.03
Hubert H Humphrey Metrodome 647 8 1.06
Hubert H Humphrey Metrodome 504 9 0.97


Jacobs Field 486 7 0.94
Jacobs Field 644 8 0.92
Jacobs Field 472 9 0.93


McAfee Coliseum 500 7 0.99
McAfee Coliseum 590 8 0.99
McAfee Coliseum 459 9 0.98


Miller Park 343 7 0.89
Miller Park 507 8 1.02
Miller Park 418 9 1.02


Minute Maid Park 344 7 1.04
Minute Maid Park 612 8 1.00
Minute Maid Park 503 9 1.06


Oriole Park at Camden Yards 572 7 1.06
Oriole Park at Camden Yards 700 8 1.02
Oriole Park at Camden Yards 507 9 0.92


Petco Park 178 7 0.95
Petco Park 214 8 1.01
Petco Park 180 9 0.85


PNC Park 400 7 0.94
PNC Park 514 8 0.97
PNC Park 410 9 1.14


Pro Player Stadium 446 7 0.96
Pro Player Stadium 602 8 0.96
Pro Player Stadium 478 9 0.98


Qualcomm Stadium 323 7 0.97
Qualcomm Stadium 371 8 0.90
Qualcomm Stadium 326 9 0.88


R.F.K. Stadium 85 7 1.43
R.F.K. Stadium 91 8 1.19
R.F.K. Stadium 100 9 1.06


Rogers Centre 459 7 1.03
Rogers Centre 646 8 1.03
Rogers Centre 516 9 1.09


Royals Stadium 572 7 1.05
Royals Stadium 729 8 1.01
Royals Stadium 585 9 1.04


Safeco Field 537 7 0.96
Safeco Field 709 8 0.97
Safeco Field 555 9 0.98


SBC Park 471 7 0.98
SBC Park 642 8 1.04
SBC Park 518 9 1.08


Shea Stadium 497 7 1.07
Shea Stadium 577 8 1.01
Shea Stadium 538 9 1.01


Stade Olympique 212 7 1.00
Stade Olympique 319 8 0.88
Stade Olympique 243 9 0.81


Stade Olympique/Hiram Bithorn Stad 129 7 0.92
Stade Olympique/Hiram Bithorn Stad 184 8 0.98
Stade Olympique/Hiram Bithorn Stad 155 9 1.01


Three Rivers Stadium 73 7 0.98
Three Rivers Stadium 102 8 1.05
Three Rivers Stadium 104 9 0.97


Tropicana Field 571 7 1.02
Tropicana Field 647 8 0.98
Tropicana Field 530 9 1.00


Turner Field 442 7 1.11
Turner Field 611 8 0.99
Turner Field 508 9 1.06


U.S. Cellular Field 503 7 0.95
U.S. Cellular Field 613 8 0.98
U.S. Cellular Field 523 9 1.01


Veterans Stadium 294 7 0.93
Veterans Stadium 389 8 0.97
Veterans Stadium 310 9 1.02


Wrigley Field 414 7 0.96
Wrigley Field 565 8 0.97
Wrigley Field 478 9 0.97


Yankee Stadium II 526 7 1.06
Yankee Stadium II 663 8 1.02
Yankee Stadium II 479 9 0.98
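
Since the warning above the table is about small samples, here is a rough sketch of how one might flag the rows that deserve the most skepticism. The rows are copied from the table, but the 150-opportunity cutoff is purely a hypothetical number I picked for illustration, not something used in the column.

```python
# A handful of rows copied from the table above: (park, opps, pos, aapf).
ROWS = [
    ("R.F.K. Stadium", 85, 7, 1.43),
    ("R.F.K. Stadium", 91, 8, 1.19),
    ("R.F.K. Stadium", 100, 9, 1.06),
    ("County Stadium", 69, 7, 0.94),
    ("Coors Field", 475, 7, 1.10),
    ("Coors Field", 663, 8, 1.06),
    ("Coors Field", 500, 9, 0.97),
]

# Hypothetical cutoff: an assumption for illustration only.
MIN_OPPS = 150


def small_sample_rows(rows, min_opps=MIN_OPPS):
    """Return the rows whose opportunity count falls below the cutoff."""
    return [row for row in rows if row[1] < min_opps]


if __name__ == "__main__":
    for park, opps, pos, aapf in small_sample_rows(ROWS):
        print(f"{park} (pos {pos}): AAPF {aapf:.2f} on only {opps} opportunities")
```

With these sample rows, the R.F.K. and County Stadium entries are flagged while the Coors Field entries are not, which matches the intuition that the 1.43 for left field at R.F.K. rests on very little data.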