
Sunday, October 31, 2004

Cubs 2005 Hot Stove Heats Up

The final week's collapse by the Cubs, as they lost 7 of 8 games to hand the wild card to the Astros, is behind us. I'm ready to start thinking about 2005. A few observations.

  • The Cubs found a new third base coach in Chris Speier after firing "waving" Wendell Kim. Speier played for the Cubs in 1985-86. Personally, I thought Kim did a much better job this season than last, when it was clear that he used poor judgment in sending runners. Overall, the Cubs definitely need to improve their baserunning. I noticed a marked increase in aggressiveness from Corey Patterson after Vince Coleman was added as a coach at midseason. From Patterson and Derrek Lee that kind of aggressiveness should pay off.
  • The Cubs declined their $11.5M option on Moises Alou and their $6M option on Mark Grudzielanek, buying them out for $2M and $500K respectively and freeing up $15M. Both had decent years, although Grudz was hurt much of the time. It is possible that Alou could be back under a different contract, although of course every Cub fan's dream right now is that the money is used to lure Carlos Beltran to Chicago. Beltran's agent Scott Boras is probably looking at $20M, although that's certainly more than he's worth. J.D. Drew is also an option.
  • That also leaves a hole at second base. Todd Walker is a free agent and has said that he doesn't want a part-time job next season, although he'd consider one with the Cubs (November 2004 Vine Line). It seems to me that it's a no-brainer to go after Walker and make him the everyday second baseman. That would also put a left-handed bat in the lineup every day, and someone who gets on base.
  • The Cubs picked up the option on Ryan Dempster for $2M. In the November issue of Vine Line, GM Jim Hendry said that Dempster has the mentality for the closer role and "has the makeup to do it." Uh oh. Hendry also said, however, that they're leaving Dempster's role up in the air right now and he could be a "rotation guy" or setup man. To me, Dempster has clearly never had enough control to pitch late in the game or come in with runners on base, and he didn't show any more control last season. He's a long reliever and spot starter at best.
  • Nomar Garciaparra filed for free agency. He says he'll consider staying in Chicago. Given his brittleness, I don't think a big contract for more than two years (preferably one) makes any sense. Also on the market are Edgar Renteria (728 OPS), Orlando Cabrera (689 OPS), Cristian Guzman (693 OPS), and Jose Valentin (760 OPS). Renteria is probably the best choice both offensively and defensively, while Guzman is the youngest. I wouldn't waste any money on the other two, since Valentin will be 35 and Cabrera has a .316 lifetime OBP and will command a larger amount because of his improbable postseason performance.
  • Other Cubs who are free agents include Matt Clement, Glendon Rusch, Kent Mercker, Ramon Martinez, Tom Goodwin, Paul Bako, and Ben Grieve. I wouldn't mind seeing Rusch, Martinez, and Grieve back in pinstripes: Rusch as a swingman and lefty spot starter; Martinez as a good utility man whose offensive production is better than that of most utility men despite a poor season (659 OPS); and Grieve as a fourth outfielder and a patient left-handed bat off the bench. I'm not sure what the status of Todd Hollandsworth is, but I'm sure the Cubs have overrated his ability based on his hot start last year.
  • Letting Clement go frees up $6M to help in signing free agents.
  • There is no way they should re-sign Bako. At $865K he's already way overpriced. If future Hall of Famer Greg Maddux can't pitch to Michael Barrett, then there is something seriously wrong. Backup catchers like him can be had for the league minimum, and almost all of them would be far superior hitters.
  • I'd be OK with bringing Kent Mercker back. He still walks an awful lot of batters, but he could be your situational lefty again as long as Dusty finally realizes that Mike Remlinger is not a situational lefty and actually has better stats against right-handed batters. Of course, with the option picked up on Dempster, that may leave Mercker out.
  • Also, in Vine Line Jim Hendry has this to say regarding Neifi Perez: "We have every interest in trying to re-sign...Neifi Perez more sooner than later-hopefully sometime even before the end of October." I hope he wasn't serious. The only role Perez should play is that of utility infielder, and with his history of attitude problems I'm not so sure about that. He is still a good defensive player, but if the Cubs think he is suddenly going to perform over a full season at the level he did at the end of last season, they're kidding themselves.
  • A story in Vine Line hinted that the Mets might want Sammy Sosa because of his connection with Mets GM Omar Minaya. No way. If Sammy is traded he automatically gets his $18M for 2006. I can't imagine that any team in its right mind would trade for the quickly aging Sosa. No, I'm hoping he can bounce back for one more tour with the Cubs in '05, and then they can buy out his contract for $3.5M. On a veteran team like the Cubs, his last-day shenanigans will likely not have that big an impact.
  • Although the Cubs' pitching is relatively young, this is still a team built to win now, before Aramis Ramirez, Mark Prior and Carlos Zambrano get to be free agents.

Friday, October 29, 2004

Mobility in the .NET DJ

For those who've read Jon's editorial in the latest .NET DJ here is source code for the Big League Pocket Manager and here is the link to the .cab files for installation. Have fun.


.NET UG in Wichita

Yesterday afternoon I was pleased to be able to present at the Wichita .NET User's Group at the invitation of Robert Tesch. I'd like to thank Robert and all the guys there. I had a nice time meeting everyone and talking about CF.

The slides for the talk can be found here. The complete code for the MLB Pocket Manager is here and the .cab files for installation on the Pocket PC are here. The code for the network presence component is with the MSDN whitepaper here. The code for the SmartAppUpdater is not yet available, since it should be coming out as part of an MSDN whitepaper soon. Stay tuned.


The Corps Of Discovery, October 29, 1804

This fall I started reading the abridged version of the journals of Lewis and Clark by Anthony Brandt and each morning read what Lewis and Clark wrote that day. By October 26, 1804 the Corps of Discovery had made contact with the Mandan tribes in present day North Dakota and on October 29th Clark wrote:

"We collected the chiefs and commenced a council under an awning and our sails stretched around to keep out as much wind as possible. We delivered a long speech, the substance of which was similar to what we had delivered to the nations below [they had already come in contact with the Arikara, Hidatsa, Yankton Sioux, Omaha, and Missouri]...After the council we gave presents with much ceremony and put the medals on the chiefs we intended to make, one for each town, to whom we gave coats, hats, flags, and one grant chief for each nation, to whom we gave medals with the President's likeness."

As was their custom, they also fired Lewis's air gun, which always astonished the Indians. Clark then goes on to talk about a prairie fire that killed a man and a woman and how a young half-white boy was saved when his mother threw a buffalo skin over him. The Indians thought he had been saved "by the Great Spirit medicine because he was white".

At this point the Corps was almost ready to set up their winter camp and had scouted for good wood a few days previously, but hadn't found much as the land was mostly prairie.



Information at Your Fingertips

Great article here on how the Red Sox use information. Also some insight on this from David Pinto at Baseball Info Solutions.

Stoney Says Goodbye

Derek at The Big Red C reports that Steve Stone is officially gone. Derek also includes his resignation letter. So both sides of the WGN booth are moving on with Chip Caray broadcasting in Atlanta next year with his father.

For my part I think Stoney was among the best color men I've heard. He always did a good job of making his comments concise and to the point and wasn't afraid of anticipating the play rather than simply reacting. He also had a great sense for strategy even though I don't think it was at all refined by numerical analysis. He seemed to be skeptical of Moneyball when it came out but not violently against the ideas as some in the baseball community are.

Thursday, October 28, 2004

Little Bingle

As I was thumbing through F.C. Lane's fascinating 1925 book Batting (a topic for another day) I noticed several references to "bingles", typical of which is the following from the last page, where Honus Wagner is quoted:

"It's the hit that counts. You can't score many runs without the old bingle."

Having not heard this term before I Googled it and found several references to bingle as being a synonym for "single" in online dictionaries but no history of the word or thoughts on why it fell out of use.

I posted my query to SABR-L and Merritt Clifton was kind enough to explain that "bingle" is a contraction of "bunt single." He says that as it was fading from vogue it came to mean any single, but that was not its original use. In other words, a "bingle" was a slap hit as practiced in the deadball era and later by the likes of Maury Wills and Luis Aparicio. Merritt pointed out that, for example, a third base coach might shout:

"C'mon, little bingle now, drop one in."

The idea is to coax the third baseman to play in, thereby giving the hitter a better chance to slap the ball past him. When I played, I recall coaches saying, "C'mon now, little bingo" but not bingle. I asked several fans older than myself whether they'd heard the term, and none had, going back to the mid-fifties, so I assume the term died out shortly after the deadball era as more and more hitters turned to "slugging," as Lane would say.

His response made me wonder how many of Ichiro Suzuki's 225 singles were actually "bingles"?


Wednesday, October 27, 2004

And the Simulator Says...

With the Cardinals down 3-0, the simulator now says their odds of coming back are about 9.1%. That's not all that bad, and the reason their odds are better than the Red Sox's were in the same situation is that the Cards are a better home and road team than the Red Sox on a percentage basis.

A Brief History of Run Estimation: Estimated Runs Produced

Earlier in this series we looked at Runs Created, developed by Bill James in the late 1970s and early 1980s, and Batting Runs, a part of the Linear Weights system devised by Pete Palmer in the late 1970s and published in The Hidden Game of Baseball in 1984. Both of these formulas are what Albert and Bennett in Curve Ball call "intuitive" formulas because they attempt to estimate the number of runs created using a model of how baseball is played. However, the former is a non-linear formula, since its underlying premise is that runs are the product of getting on base and advancing runners, while the latter is a linear formula, since it assigns weights to the various offensive events. So now we’re ready to explore Paul Johnson’s Estimated Runs Produced, or ERP, in the third installment of the series.

History
The story of ERP starts with Bill James’s 1985 Baseball Abstract. In that book James published an essay by Paul Johnson on ERP after receiving a letter from Johnson explaining his work. Essentially, James published the essay because he found that ERP was simple, on average more accurate than his own Runs Created formula, and because he already knew that his formula overstated runs for teams with both high slugging percentages and high on base percentages (something he subsequently fixed, as discussed in my previous article).

In a quick study that James did to assess the accuracy of ERP, he found that for 100 teams from 1955 to 1975 ERP had an average difference of 18.4 runs per team, while Runs Created was at 19.3. On the strength of this, James, in a move that should be applauded, felt compelled to provide Johnson with a forum to share his ideas with the baseball analyst community.

In introducing Johnson’s formula James also takes the opportunity to criticize Batting Runs, which he does more fully in the first version of his Historical Baseball Abstract also published in 1985.

"Pete Palmer in The Hidden Game makes a similar claim [for accuracy] for the linear weights method, and Pete is a good friend and an outstanding analyst of the game, but in fact linear weights do [not] meet any acceptable standard of accuracy in assessing an offense."

So what is the formula? In that article Johnson gave as his complete version:

ERP = (2*(TB+BB+HB)+H+SB-(.605*(AB+CS+GIDP-H)))*.16

As you can see the strength of ERP is its simplicity. Only addition, subtraction, and multiplication are required with only eight counting statistics needed. The formula essentially breaks into two sections, the left hand side representing positive offensive accomplishment and right hand side representing negative (we’ll get to the bit about .16 in a minute).
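To make the arithmetic concrete, here is a minimal Python sketch of the complete formula as a plain function. The parameter names are my own shorthand for the eight counting statistics (HB is hit batsmen, GIDP is grounded into double plays), and the stat line in the usage example is hypothetical, not any real player's season.

```python
def erp(tb, bb, hb, h, sb, ab, cs, gidp):
    """Paul Johnson's complete Estimated Runs Produced (1985 Abstract version)."""
    positive = 2 * (tb + bb + hb) + h + sb  # weighted advancement (left-hand side)
    outs = ab + cs + gidp - h               # outs charged to the batter (right-hand side)
    return (positive - 0.605 * outs) * 0.16


# Hypothetical season: 550 AB, 165 H, 270 TB, 60 BB, 5 HB, 10 SB, 4 CS, 12 GIDP
season = dict(tb=270, bb=60, hb=5, h=165, sb=10, ab=550, cs=4, gidp=12)
print(round(erp(**season), 1))  # 96.4
```

Keeping the positive and negative sides as separate variables mirrors the way the formula is discussed below.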

A second strength of ERP is that, like Batting Runs, it is a linear formula. In other words, when you sum the ERP for all players on a team you get the total ERP calculated for that team. That is not the case with non-linear run estimators like Runs Created and Base Runs, which we’ll look at in our next article in this series. And because ERP is a linear formula, Johnson spends the first part of his essay showing how ERP better estimates runs for teams with the combination of high slugging percentages and high on base percentages. He does this not only by looking at teams with the highest number of homeruns and top slugging percentages but also by aggregating high-scoring World Series games and comparing them with individual players with the same basic profile. For example, he compares the aggregate statistics of 14 World Series games with Babe Ruth’s 1929 season and finds that in those games teams scored 124 runs. His ERP formula estimated 129 runs while Runs Created estimated 148. As mentioned previously, this was a weakness in the Runs Created formula that James explains in his afterword to the essay and has subsequently corrected.
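The linearity point is easy to check numerically. This sketch sums two hypothetical players into a "team" and compares ERP with the basic version of Runs Created, (H + BB) * TB / (AB + BB). I'm assuming the basic formula here only for illustration; it is not one of the later, corrected versions James adopted.

```python
FIELDS = ("tb", "bb", "hb", "h", "sb", "ab", "cs", "gidp")


def erp(tb, bb, hb, h, sb, ab, cs, gidp):
    return (2 * (tb + bb + hb) + h + sb - 0.605 * (ab + cs + gidp - h)) * 0.16


def basic_rc(h, bb, tb, ab):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)


def combine(*lines):
    """Add stat lines field by field, as if the players were one team."""
    return {k: sum(line[k] for line in lines) for k in FIELDS}


# Two hypothetical players with very different profiles
slugger = dict(tb=250, bb=80, hb=5, h=150, sb=5, ab=500, cs=2, gidp=10)
slap_hitter = dict(tb=150, bb=30, hb=2, h=110, sb=30, ab=450, cs=8, gidp=5)
team = combine(slugger, slap_hitter)

erp_sum = erp(**slugger) + erp(**slap_hitter)
rc_sum = sum(basic_rc(p["h"], p["bb"], p["tb"], p["ab"]) for p in (slugger, slap_hitter))
rc_team = basic_rc(team["h"], team["bb"], team["tb"], team["ab"])

print(f"ERP: sum of players {erp_sum:.2f}, team {erp(**team):.2f}")
print(f"RC:  sum of players {rc_sum:.2f}, team {rc_team:.2f}")
```

The ERP totals match exactly, while the Runs Created totals differ, because multiplying on-base by advancement does not distribute over a sum of players.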

So how did Johnson come up with ERP?

To quote Johnson the formula is “based on charts I made of the number of bases advanced by batters and baserunners on various offensive plays”. From that information Johnson realized that homeruns moved batters and baserunners three times as many bases as did the typical single and that walks advanced the batter and baserunners only two-thirds as many bases as did a single. These insights led to the design of the left-hand side of the formula since:

Home Run = 9 = 2*(4+0+0)+1+0
Single = 3 = 2*(1+0+0)+1+0
Walk = 2 = 2*(0+1+0)+0+0

Values for the other offensive events then follow:

Triple = 7 = 2*(3+0+0)+1+0
Double = 5 = 2*(2+0+0)+1+0
Stolen Base = 1 = 2*(0+0+0)+0+1
Hit by Pitch = 2 = 2*(0+0+1)+0+0

As you can see, this formula is indeed intuitive, since it attempts to model how runs are scored by looking at the advancement value of each offensive event.
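The event values above can be generated from the left-hand side of the formula alone. This short sketch expresses each event as the counting stats it adds and evaluates 2*(TB+BB+HB)+H+SB; the event table is my own encoding of the lists above.

```python
def advancement_points(tb=0, bb=0, hb=0, h=0, sb=0):
    """Left-hand side of ERP: 2*(TB+BB+HB) + H + SB."""
    return 2 * (tb + bb + hb) + h + sb


# Each event expressed as the counting stats it adds
EVENTS = {
    "Single": dict(tb=1, h=1),
    "Double": dict(tb=2, h=1),
    "Triple": dict(tb=3, h=1),
    "Home Run": dict(tb=4, h=1),
    "Walk": dict(bb=1),
    "Hit by Pitch": dict(hb=1),
    "Stolen Base": dict(sb=1),
}

points = {name: advancement_points(**stats) for name, stats in EVENTS.items()}
for name, value in points.items():
    print(f"{name:<13} {value}")
```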

And so the relative weights assigned by Johnson to the events using singles as a baseline were:

Walk = .667
Hit by Pitch = .667
Double = 1.67
Triple = 2.33
Homerun = 3
Stolen Base = .333


If this sounds suspiciously like Batting Runs then you’re on the right track. The weights used in the 1989 version of the formula from Total Baseball were:

Single = .47
Double = .78
Triple = 1.09
Homerun = 1.40
Stolen Base = .30
Walk = .33
Hit by Pitch = .33

Which calculate to weights relative to a single of:

Walk = .702
Hit by Pitch = .702
Double = 1.66
Triple = 2.32
Homerun = 2.98
Stolen Base = .638

As you compare the weights in the two lists, you’ll notice that, other than the stolen base, the relative weights of the offensive events are nearly the same. What Johnson found out with his table was the same information that George Lindsey found from scoring games in the 1950s and that Pete Palmer found when running his simulations in the 1970s. Johnson’s innovation was in expressing these relative weights in an algebraically simpler formula. What Johnson sacrificed for this simplicity was a small amount of precision.

The difference in the relative weight of stolen bases, from .333 for Johnson to .638 for Palmer, is interesting. As discussed in my previous article, Palmer originally found that the weight for stolen bases actually ranged from .19 to .22 depending on era. He upped the value to .30 on the argument that by and large stolen bases come at strategically more important times and so should be weighted accordingly. While he was no doubt correct in his assessment of the strategic importance of stolen bases, it doesn’t make sense to add that to a formula whose goal is to average out the impacts of all sorts of situation-dependent variables. Anyway, Palmer eventually changed his mind and lowered the weight of the stolen base to .22 in the 2004 Baseball Encyclopedia. Using this value, the relative weight of the stolen base for Batting Runs is .468, much more in line with what Johnson used.

We now move to the right hand side of the equation.

This side of the formula calculates the negative effect of making outs and therefore represents the context in which the positive weighted events from the left-hand side of the equation occur. This part of the formula counts the number of outs the batter is responsible for by subtracting hits from at bats plus caught stealing and grounded into double plays. The sum of the outs is then multiplied by .605 before being subtracted from the weighted positive offensive events. As a result, the weight of an out relative to a single is -.20 (-.605/3). However, when you look at Batting Runs you notice that the weight of an out is -.25 and so the weight of an out relative to a single is much higher at -.53 (-.25/.47). Why the difference?

The difference here lies in what each formula is attempting to measure. In Batting Runs the end result is the marginal runs or the runs contributed by the batter above what a league average hitter would have supplied whereas ERP, like Runs Created, is attempting to measure the absolute or total number of runs contributed by a batter.

In order for Batting Runs to measure the runs contributed above an average hitter, the formula takes into consideration the value of all outs made and discovers that each out is worth -.25 runs. In the 4.3 runs per game context that Batting Runs was formulated in that means that each out decreases the run potential by .16 runs in terms of shrinking the opportunity for scoring in each inning (4.3 divided by 27 is .16). However, Batting Runs is also taking into consideration the negative value outs have in terms of moving runners along during the inning and this value is then the difference between -.25 and -.16 or -.09. In other words the value of outs can be split into two components; the -.16 that represents the effect an out has on moving closer to the end of an inning, and the -.09 that represents the lack of runner advancement. So the weight of outs relative to singles with respect to advancing runners is -.19 (-.09/.47). This turns out to be the same relative weight Johnson used. If Johnson had used a weight of 1.5 instead of .605 for his outs he would have gotten the same results as Batting Runs and measured instead the marginal runs.
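The decomposition in the previous paragraph is just a few divisions and subtractions. This sketch reproduces it with the 4.3 runs-per-game environment, the -.25 out and the .47 single quoted above, and puts Johnson's relative out weight next to it for comparison.

```python
RUNS_PER_GAME = 4.3   # run environment Batting Runs was formulated in
OUTS_PER_GAME = 27
BR_OUT = -0.25        # Batting Runs weight of an out
BR_SINGLE = 0.47      # Batting Runs weight of a single

inning_cost = -RUNS_PER_GAME / OUTS_PER_GAME  # part of an out that just ends the inning
advancement_cost = BR_OUT - inning_cost       # part that fails to move runners

print(f"inning component      {inning_cost:.2f}")
print(f"advancement component {advancement_cost:.2f}")
print(f"advancement vs single {advancement_cost / BR_SINGLE:.2f}")
print(f"ERP out vs single     {-0.605 / 3:.2f}")
```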

So why does using the smaller relative weight equate to absolute runs? By removing the decreased run potential automatically assigned for each out (-.16) you in essence remove the background noise and judge the hitter or team purely on the basis of the interaction of offensive events and that portion of the outs they make that suppress baserunner advancement. In other words, there is nothing a team can do about those 27 outs they’re going to make each game and so removing their non-discretionary cost results in a measure of the total number of runs scored.

I don’t think Johnson used this kind of logic to come up with his value of .605; more likely he simply played around with his formula until he found something that worked. In fact, he says in his essay that,

“The numbers exist only to put proper emphasis on the various events. They are essential to making the equation work, but there’s no need for me to go into how they came to be what they are. I’ll just tell you that it took a hell of a lot of experimenting to settle on the darned things.”

In the final step Johnson takes the right side of his equation, subtracts it from the left, and then multiplies the whole thing by .16. Again, why the .16?

Once you realize that ERP is a simplified version of Batting Runs you can see that the weights assigned by Johnson to offensive events in the left-hand side of his equation multiplied by .16 approximate the weights found by Palmer.

Single = 3*.16 = .48
Double = 5*.16 = .80
Triple = 7*.16 = 1.12
Homerun = 9*.16 = 1.44
Stolen Base = 1*.16 = .16
Walk = 2*.16 = .32
Hit by Pitch = 2*.16 = .32

And as you might have guessed taking the value of outs as -.605 and multiplying it by .16 yields -.097, which not coincidentally is the weight of an out with respect to advancing baserunners in Batting Runs.
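Scaling by .16 can be checked the same way. This sketch multiplies Johnson's advancement points and his out coefficient by .16 and prints them beside the 1989 Batting Runs weights listed earlier.

```python
SCALE = 0.16
ERP_POINTS = {"Single": 3, "Double": 5, "Triple": 7, "Homerun": 9,
              "Stolen Base": 1, "Walk": 2, "Hit by Pitch": 2}
BATTING_RUNS_1989 = {"Single": .47, "Double": .78, "Triple": 1.09, "Homerun": 1.40,
                     "Stolen Base": .30, "Walk": .33, "Hit by Pitch": .33}

scaled = {event: pts * SCALE for event, pts in ERP_POINTS.items()}
for event, runs in scaled.items():
    print(f"{event:<13} ERP {runs:.2f}   Batting Runs {BATTING_RUNS_1989[event]:.2f}")

out_runs = -0.605 * SCALE  # compare with the -.09 advancement part of a Batting Runs out
print(f"{'Out':<13} ERP {out_runs:.3f}")
```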

However, by using this smaller value for the weight of outs ERP runs into a conceptual problem that Batting Runs does not. It is possible for hitters to accumulate negative ERP values. This doesn’t make sense in a formula that tries to estimate the absolute number of runs contributed by a player. The lower bounds should logically be zero. In fact, the zero-level, the level at which a player has a 0 ERP, is an OPS of between .320 and .330 (depending on the frequency of walks and total bases). What this means is that in practice ERP does not “work” for very restricted run environments. After all, common sense says that a hitter or team with an OPS as low as .320 will still create occasional runs through homeruns and stringing together a few hits. However, the offensive environment this represents is right around a run per game or slightly less. And since a team that scores less than a run per game does not in fact produce any positive offense, you can reasonably assume that a player that contributes at that level would not either.

Johnson went on to give two additional versions of the formula. The first is a simplified version to use when caught stealing, hit batsmen, and double plays grounded into are not available.

ERP2 = (2*(TB+BB)+H+SB-(.615*(AB-H)))*.16

You can see that he simply increased the weight of the out to compensate. The second version Johnson says works better for players with high stolen base totals. I assume he means when caught stealing is not available.

ERP3 = (2*(TB+BB)+H+SB-(.610*(AB+(SB/4)-H)))*.16

This version simply estimates the number of times caught stealing by dividing stolen bases by 4 and adding the result to at bats, thereby making the outs component larger.
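Here are the two reduced versions as a Python sketch. The batting line is hypothetical; running it with 10 and then 60 steals shows how the SB/4 term in ERP3 charges a heavy base stealer for estimated caught stealings that ERP2 ignores.

```python
def erp2(tb, bb, h, sb, ab):
    """Simplified ERP for when CS, HB and GIDP are unavailable."""
    return (2 * (tb + bb) + h + sb - 0.615 * (ab - h)) * 0.16


def erp3(tb, bb, h, sb, ab):
    """ERP variant that estimates caught stealing as SB/4."""
    estimated_cs = sb / 4
    return (2 * (tb + bb) + h + sb - 0.610 * (ab + estimated_cs - h)) * 0.16


# Hypothetical batting line, evaluated with a modest and a heavy steal total
line = dict(tb=270, bb=60, h=165, ab=550)
for sb in (10, 60):
    print(sb, round(erp2(sb=sb, **line), 1), round(erp3(sb=sb, **line), 1))
```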

In the end James apparently did not realize that Palmer’s formula he so roundly criticized in The Historical Baseball Abstract was in fact the same formula as ERP in an admittedly simpler guise and with the twist of using a reduced weight for outs. In an ironic comment James says:

“I was originally suspicious of the system when I saw the ‘.16’ at the end of it. Wouldn’t it seem more likely that the most accurate possible system would require multiplication by .15974 or something? My assumption, as I said, was that if better methods were to be developed, they would have to be more complex, more difficult to figure, and that they would grow out of the existing methods.”

In fact, ERP did grow out of an existing method, it’s just that neither Johnson himself nor James realized it at the time.

Because ERP is essentially equivalent to Batting Runs, as we’ve shown here, most sabermetricians don’t use it and instead rely on the more precise weightings of Batting Runs or Extrapolated Runs (XR), discussed below.

Derivatives
In his original essay Johnson then goes on to offer two extensions to ERP used to calculate the number of runs produced per 162 games. The first is:

ERP/162 = ERP3/(AB+(SB/4)-H)*458

And the second is:

ERP/162 = ERP/(AB+CS+GIDP-H)*474

Apparently, these formulas are an attempt to pro-rate ERP over 162 games and can be used for comparison purposes. These formulas assume a basis of 458 or 474 outs and simply multiply that by the ERP per out. However, I’m not certain where the 458 and 474 came from and Johnson does not say in his essay.
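Both extensions are "ERP per out, times a fixed number of outs." This sketch implements them directly; the 458 and 474 constants are Johnson's, and nothing here explains where they came from. A quick sanity check is that a player who makes exactly 474 (or 458) outs gets back his raw ERP. The stat line used is hypothetical.

```python
def erp(tb, bb, hb, h, sb, ab, cs, gidp):
    return (2 * (tb + bb + hb) + h + sb - 0.605 * (ab + cs + gidp - h)) * 0.16


def erp3(tb, bb, h, sb, ab):
    return (2 * (tb + bb) + h + sb - 0.610 * (ab + sb / 4 - h)) * 0.16


def erp3_per_162(tb, bb, h, sb, ab):
    """ERP3 per out, scaled to 458 outs."""
    outs = ab + sb / 4 - h
    return erp3(tb, bb, h, sb, ab) / outs * 458


def erp_per_162(tb, bb, hb, h, sb, ab, cs, gidp):
    """Complete ERP per out, scaled to 474 outs."""
    outs = ab + cs + gidp - h
    return erp(tb, bb, hb, h, sb, ab, cs, gidp) / outs * 474


# Hypothetical season with 401 outs, scaled up to a 474-out basis
season = dict(tb=270, bb=60, hb=5, h=165, sb=10, ab=550, cs=4, gidp=12)
print(round(erp(**season), 1), round(erp_per_162(**season), 1))
```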

Johnson went on to refine his formula in the STATS 1991 Baseball Scoreboard and christen it “New Estimated Runs Produced” or NERP. The formula presented was:

NERP=(TB/3.15) + ((BB-IBB+HBP-CS-GIDP)/3) + (H/4) + (SB/5) - (AB/11.75)

Or if you prefer:

NERP=TB*.318 + ((BB-IBB+HBP-CS-GIDP)*.333) + (H*.25) + (SB*.2) - (AB*.085)

Once again, this formula is a linear one that can be broken down into left and right hand sides. NERP weights homeruns at 1.52, triples at 1.2, doubles at .89, and singles at .57. It also takes intentional walks out of the equation and weights other single bases gained at .33. Note that stolen bases are now weighted at .2, very similar to Batting Runs. However, the most interesting part is simply subtracting at bats multiplied by .085 from the left hand side of the equation. At first glance this seems to be an arbitrary attempt at estimating the typical number of outs a player makes and at accounting for the slightly higher weights in the formula. A value of around .065 would be more typical if the weights were lower, as in the Batting Runs formula. However, I don’t want to speculate too much without the original essay in which it was explained.
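A sketch of NERP in the divisor form, plus the per-hit weights from the TB/3.15 and H/4 terms alone (leaving the AB term aside, as in the paragraph above). With the exact divisor the double comes out at about .885; the rounded .318 coefficient gives .886, which is where the .89 above comes from.

```python
def nerp(tb, bb, ibb, hbp, cs, gidp, h, sb, ab):
    """New Estimated Runs Produced (1991 Baseball Scoreboard version)."""
    return (tb / 3.15 + (bb - ibb + hbp - cs - gidp) / 3
            + h / 4 + sb / 5 - ab / 11.75)


# Weight of each hit type from the positive terms only: bases/3.15 + 1/4
hit_weights = {name: bases / 3.15 + 1 / 4
               for name, bases in (("Single", 1), ("Double", 2),
                                   ("Triple", 3), ("Homerun", 4))}
for name, weight in hit_weights.items():
    print(f"{name:<8} {weight:.3f}")
```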

A few years later Jim Furtado enters the picture. Jim studied ERP and Runs Created and came to the same conclusions about the relationship of ERP and Batting Runs I’ve talked about here. He then went the next step and tried to develop a more accurate linear formula using a combination of regression analysis, comparison to other methods, peer review, and empirical analysis. His result was the Extrapolated Runs (XR) formulas published in the 1999 Big Bad Baseball Annual. He developed three versions as shown here.

XR = (.50 * 1B) + (.72 * 2B) + (1.04 * 3B) + (1.44 * HR) + (.34 * (HP+TBB-IBB)) +(.25 * IBB)+ (.18 * SB) + (-.32 * CS) + (-.090 * (AB - H - K)) + (-.098 * K)+ (-.37 * GIDP) + (.37 * SF) + (.04 * SH)

XRR - Extrapolated Runs Reduced = (.50 * 1B) + (.72 * 2B) + (1.04 * 3B) + (1.44 * HR) + (.33 * (HP+TBB)) + (.18 * SB) + (-.32 * CS) + (-.098 * (AB - H))

XRB Extrapolated Runs Basic = (.50 * 1B) + (.72 * 2B) + (1.04 * 3B) + (1.44 * HR) + (.34 * (TBB)) + (.18 * SB) + (-.32 * CS) + (-.096 * (AB - H))

As you can see each of these formulas takes the same form as the Batting Runs formula with very similar weights. The difference is that strikeouts are weighted slightly more heavily (-.098) than other outs (-.09) while GIDP and caught stealing are weighted even more heavily. Weighting strikeouts in this way makes logical sense since strikeouts have no opportunity to advance runners.

The outs value here corresponds with the smaller -.09 value discussed previously. It is also interesting that sacrifice flies (SF) and sacrifice bunts (SH) are both included and given positive values. Albert and Bennett in Curve Ball added sacrifice flies to their least squares regression model (p. 187) and found that in isolation they correlated strongly with run scoring, but their weight was inordinately high, and so they did not use them in their model. My assumption has always been that sacrifice flies are primarily situation dependent, much like RBIs themselves, and so generally should not be included in run estimation formulas. Sacrifice bunts are also typically seen as a net drain on offensive production, so it is surprising to see them included with even a very small positive coefficient.
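For reference, here are the full XR and XR Basic formulas as Python functions with keyword arguments that default to zero, so single events can be plugged in to read off their weights. The argument names follow the abbreviations in the formulas above; the keyword-only style is simply my choice for readability.

```python
def xr(*, singles=0, doubles=0, triples=0, hr=0, hbp=0, tbb=0, ibb=0,
       sb=0, cs=0, ab=0, h=0, k=0, gidp=0, sf=0, sh=0):
    """Full Extrapolated Runs."""
    return (0.50 * singles + 0.72 * doubles + 1.04 * triples + 1.44 * hr
            + 0.34 * (hbp + tbb - ibb) + 0.25 * ibb
            + 0.18 * sb - 0.32 * cs
            - 0.090 * (ab - h - k) - 0.098 * k
            - 0.37 * gidp + 0.37 * sf + 0.04 * sh)


def xrb(*, singles=0, doubles=0, triples=0, hr=0, tbb=0, sb=0, cs=0, ab=0, h=0):
    """Extrapolated Runs Basic."""
    return (0.50 * singles + 0.72 * doubles + 1.04 * triples + 1.44 * hr
            + 0.34 * tbb + 0.18 * sb - 0.32 * cs - 0.096 * (ab - h))


# Single events make the weights easy to read off
print(round(xr(ab=1), 3))           # an out in play
print(round(xr(ab=1, k=1), 3))      # a strikeout
print(round(xr(ab=1, gidp=1), 3))   # the out itself plus the GIDP penalty
```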

Game 3 Notes

A couple of notes on last night's game 3 victory by the Red Sox:

Killer Play
Contrary to Tim McCarver and Joe Buck's comments, the play by Jeff Suppan in the bottom of the third inning was the killer for the Cardinals. With runners on first and third and nobody out, Suppan hesitated on the ground ball hit to the second baseman, went back to third, started for home again, and finally headed back to third, only to be thrown out by David Ortiz. Had he scored, it would have made the score 2-1 with 1 out and a runner on third with Albert Pujols at the plate. The odds of scoring in that situation are 66.2% with an average hitter and well over 70% with Pujols, so the Cardinals would very likely have tied the game.



Further, it would have stretched Pedro Martinez a bit more and possibly pushed him to the magical 100-pitch mark (he was taken out after 98) an inning sooner, which the Cardinals really needed. No, that was clearly the key play of the game. Larry Walker’s attempt to tag up on the fly ball to Manny Ramirez with one out in the first inning seemed to me a gamble worth taking early in the game.

Percentage Play
Later, the Red Sox had runners on first and second and nobody out when a fly ball (a "popup" as Joe Buck says of everything not a homerun) was hit to Jim Edmonds in medium centerfield. The runner on second, Orlando Cabrera, got set to tag and then did not as Edmonds made a pretty strong throw to third baseman Scott Rolen. Tim McCarver quickly opined that having Cabrera tag in that situation would have been a good play since the Red Sox were already up 2-0 and it would have gotten the runner to third.

I don't think McCarver was right. In that situation the run expectancy is 1.573 runs and the probability of scoring any runs is 64.1%. Had the tag been successful and both runners moved up, it would have changed the run expectancy to 1.467 and the probability of scoring to 69.5%. However, when you fail, the run expectancy drops like a rock to .344 and the probability of scoring to 22.3%. Because the cost of failure in this situation is so high and the relative gain so small, when you calculate the break-even percentages on these numbers you quickly find that it is never the "percentage play" to try to tag if your goal is to maximize the number of runs you’ll score in the inning. It is advisable to tag if you’re trying to score a single run, but only if you think your odds of making it are greater than 88.6%. With Edmonds making the throw, I don’t think Cabrera's odds of making it were anything like 88%.
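The break-even percentages come from asking what success rate p makes p*success + (1-p)*failure equal to the value of not trying. This sketch applies that standard formula to the run expectancy and scoring probabilities quoted above. The double steal inputs aren't given in the post, so that case isn't reproduced here.

```python
def break_even(current, success, failure):
    """Success rate p at which p*success + (1 - p)*failure equals current."""
    return (current - failure) / (success - failure)


# Run expectancy and scoring probabilities quoted for the Cabrera tag-up
runs = break_even(current=1.573, success=1.467, failure=0.344)
one_run = break_even(current=0.641, success=0.695, failure=0.223)

print(f"maximize runs: {runs:.1%}")     # 109.4%: no success rate makes it pay
print(f"score one run: {one_run:.1%}")  # 88.6%
```

A break-even above 100% is the numerical way of saying the play is never worth it: even a certain success leaves you with less run expectancy than holding.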

A far better strategy in that situation would have been to try a double steal. The break-even percentages on that play are only 52.2% to score a single run and 63.9% to maximize runs. Those break-evens only decrease with one out, and so with a fast runner on second it is probably one of the most underutilized plays in baseball.

Tuesday, October 26, 2004

More on the Double Switch

As often happens, a member of SABR added more information to the question of the first double-switch in major league history. Maria Vaccaro noted that she thought it had been done in the 19th century, but the first actual citation she could produce was for 8/2/1906, when Highlander manager Clark Griffith put himself in as a relief pitcher in the eighth inning at Detroit and also put in catcher Ira Thomas. Before the switch the catcher's position batted 8th and the pitcher's position batted 9th. Griffith inserted himself in the 8th slot and put Thomas 9th.


Monday, October 25, 2004

Is it in the Cards?

With the Cardinals down two games to none I was interested to find out how often they might come back. In 100,000 trials of my Series Simulator, the Red Sox won the first two games 30,401 times. The Cardinals came back to win 8,210 of those series, or 27%, and were swept 4,077 times, or around 13%. Here are the complete numbers:

Outcome       Series  Share
Cards in 7     4,788    16%
Cards in 6     3,422    11%
Sox in 7       5,215    17%
Sox in 6       7,470    25%
Sox in 5       5,429    18%
Sox sweep      4,077    13%


So, as you would guess, the likely outcome of the series is a Red Sox win (73% of the simulated series above), and the single most likely outcome is that they'll win it in six games.
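The summary percentages can be recomputed from the simulated counts in the table. A small Python sketch (the dictionary layout is only for illustration):

```python
# Simulated outcomes of the 30,401 series in which the Red Sox won games 1 and 2
counts = {
    "Cards in 7": 4788,
    "Cards in 6": 3422,
    "Sox in 7": 5215,
    "Sox in 6": 7470,
    "Sox in 5": 5429,
    "Sox sweep": 4077,
}

total = sum(counts.values())
cards = sum(n for outcome, n in counts.items() if outcome.startswith("Cards"))
sox = total - cards

for outcome, n in counts.items():
    print(f"{outcome:<11} {n:>6} {n / total:6.1%}")
print(f"Cards win series: {cards / total:.1%}")
print(f"Sox win series:   {sox / total:.1%}")
print("Most likely single outcome:", max(counts, key=counts.get))
```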

Sunday, October 24, 2004

The First Double Switch?

Dave Smith at Retrosheet had an interesting post on SABR-L that I thought was worth sharing (not that all of the other members' posts aren't, of course). Someone asked a question about double-switches and he responded that a few years ago someone researched the question and found that the earliest double-switches came on May 18 and May 21, 1955, both made by the Orioles and their manager Paul Richards. Note that this is simply the earliest one that can be found and so is not necessarily the final word.

The original poster noted that a double-switch was described in a 1973 communication from the league office, although that communication did not call it a "double-switch".

Saturday, October 23, 2004

NLCS Recap

Here are my quick takes on the Cardinals' 4-3 NLCS series win over the Astros.

  • Julian Tavarez has issues. I kind of knew this when he was with the Cubs but his antics this season leave little room for doubt. Any team that signs him is taking a risk.
  • Is there a more complete player in baseball than Carlos Beltran? Having watched him play here in Kansas City I wasn't surprised at what he could do but doing it on the bigger stage made it all the more impressive. It was interesting that when he was traded from the Royals in June one of the local sports radio guys seemed to denigrate the trade by noting that after all, the Royals hadn't won with Beltran. What a ridiculous statement. It'll be interesting to see how much the Yankees (or whoever) pay him next season. He can win games with power (8 homeruns in the post season), speed (his steal of second and tag to third on a medium depth fly ball in game 7 led to a run), and defense (his game saver in game 5) and at 27 he's at or near as good as he'll get. He'd look good in a Cubs uniform.
  • Should Carlos Beltran steal more? With his steal of second base off of Jeff Suppan in game seven he had 34 consecutive stolen bases in the National League. During that time he was 16 of 16 stealing third. This question was discussed on SABR-L this week with several arguing that his very high stolen base percentage (highest in history I believe at 89.3% and 192 steals) is an indication that he doesn't steal enough since it is far higher than the break-even percentage required to make it advantageous. That makes sense to me. He could probably steal 50 or 60 bases per season and still keep his percentage well above 70%.
  • I thought Phil Garner did the right thing in using Roy Oswalt as his first option in the 7th inning. Too many times a manager will still go with the pitchers he's used during the season in that situation instead of a starter who is almost always a better pitcher (that's why he's a starter).
  • It was great to see Suppan execute the squeeze play in game 7 with Tony Womack on third. I've always thought the squeeze was an underutilized strategy. In any situation where you'd settle for a sacrifice fly I would think the odds are better to try the squeeze.
  • I don't understand the pitching pattern Roger Clemens used in the 6th and deciding inning of the series. It seemed he fell in love with his fastball, which he didn't locate very well, and only after he gave up the 2-run homer to Scott Rolen did he go back to the splitter when pitching to Jim Edmonds. Was he really trying to throw fastballs past Albert Pujols?
  • By the way, does anyone believe Pujols is really 24 years old?
  • I also don't understand the Astros approach once they fell behind going into the 7th inning in game 7. They only got 3 hits all day but didn't exhibit any patience in the final three innings. Kiko Calero threw 16 pitches in the 7th, Tavarez threw 10 in the 8th, and Jason Isringhausen threw just 5 in the 9th.
  • I've watched a lot of games at Enron/Minute Maid, and the left field situation there is ridiculous. Left fielders play on the warning track, probably because they're used to standing 300 feet from the plate. They need to erect a really big wall there instead of just the scoreboard that's out there now. Left fielders also need to play medium depth there and take away some line drive singles.
  • The broadcasting crew for FOX did a good job. Thom Brenneman is always good.
  • This is probably the end for the killer B's as far as postseason runs go. Great careers by Craig Biggio and Jeff Bagwell may end without a trip to the World Series. Biggio is a liability in left field with his second baseman's arm and his offensive production is probably not enough for a left fielder. Beltran is gone and I would be a little surprised if Roger Clemens comes back but you never know. As far as I know only Sandy Koufax has retired after a Cy Young award winning season.
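The stolen base break-even argument in the Beltran item can be sketched as follows. The run values below (+0.30 for a steal, -0.60 for a caught stealing) are the approximate linear-weights figures usually attributed to Pete Palmer; treat them as assumptions, not numbers from the SABR-L discussion. The two season lines are hypothetical.

```python
# Assumed run values (approximate linear weights, not from the text)
SB_RUNS = 0.30   # runs added by a stolen base
CS_RUNS = -0.60  # runs lost on a caught stealing


def sb_break_even(sb_runs: float = SB_RUNS, cs_runs: float = CS_RUNS) -> float:
    """Success rate at which steal attempts add zero expected runs."""
    return -cs_runs / (sb_runs - cs_runs)


def net_runs(steals: int, caught: int) -> float:
    """Net runs from a season's stolen base attempts under the assumed weights."""
    return steals * SB_RUNS + caught * CS_RUNS


print(f"break-even success rate: {sb_break_even():.3f}")
# Hypothetical seasons: a cautious 27-for-30 versus a busier 60-for-70
print(f"27 SB, 3 CS:  {net_runs(27, 3):+.1f} runs")
print(f"60 SB, 10 CS: {net_runs(60, 10):+.1f} runs")
```

Under these assumed weights the break-even rate is about two-thirds, so more attempts at a rate still comfortably above that line add more runs than fewer attempts at a near-perfect rate.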

Friday, October 22, 2004

Three Questions on Probability and the Playoffs

Cards vs. Red Sox
With the All-Star game now deciding home field advantage in the World Series, Cardinal fans in particular are a disgruntled lot. Why not award home field to the team with the best season record? With the advent of interleague play I think that’s a viable solution. If the leagues were still totally separate and if players didn't move freely between leagues, the records of the two teams really would have no basis for comparison. Since they do share at least some common opponents, I say base it on record.

Note: For those who don't remember, the genesis of the current system was the Selig-driven abomination that was the 2002 All-Star game in Milwaukee, where the appropriate solution would simply have been to force the American League to forfeit. There are no ties in baseball.

But the question is, how big an advantage does home cooking give the Red Sox?

Using my Series Simulator I ran 100,000 series twice, once with the Red Sox holding home field advantage and once with the Cardinals. With the Red Sox having the advantage the Cardinals won 56% of the time, largely on the strength of their major-league-best .642 winning percentage on the road. When the Cardinals had the advantage they won 60% of the time.

Home Field Advantage Generally
But given average major league teams what is the advantage?

The average home field winning percentage in baseball is around .540. So, given two teams that both play .520 ball at home and .480 ball on the road, the winning percentage when matched up is .540 for the home team given the Log5 formula. Running a simulation for a seven game series indicates that the team with home field advantage wins just over 51% of the time. The same holds true in a 5 game series. So given two average teams the home field advantage doesn't seem to be that significant.
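For two evenly matched teams the series odds can also be computed exactly instead of simulated. A minimal Python sketch, assuming a 2-3-2 format for seven games and a 2-2-1 format for five (the five-game format is an assumption), with team A holding home field; the function names are illustrative:

```python
from itertools import product


def log5(a: float, b: float) -> float:
    """Chance a home team with home pct a beats a visitor with road pct b."""
    return (a - a * b) / (a + b - 2 * a * b)


def series_win_prob(sites: str, p_home: float, p_road: float) -> float:
    """Exact probability that team A wins a short series.

    sites has one character per game: "A" when team A is at home, "B" otherwise.
    Playing out every game never changes which team reaches the majority first,
    so it is enough to enumerate all full-length win/loss sequences.
    """
    need = len(sites) // 2 + 1
    total = 0.0
    for a_wins in product((True, False), repeat=len(sites)):
        prob = 1.0
        for site, won in zip(sites, a_wins):
            p = p_home if site == "A" else p_road
            prob *= p if won else 1 - p
        if sum(a_wins) >= need:
            total += prob
    return total


# Two average teams: .520 at home, .480 on the road
p_home = log5(0.520, 0.480)  # A's chance of winning a game in A's park
p_road = 1 - p_home          # A's chance of winning a game in B's park

seven = series_win_prob("AABBBAA", p_home, p_road)  # 2-3-2 format
five = series_win_prob("AABBA", p_home, p_road)     # 2-2-1 format (assumed)
print(f"home game {p_home:.3f}, 7 games {seven:.3f}, 5 games {five:.3f}")
```

Both formats come out a little over .51 for the team with home field, in line with the simulation.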

Wild Card Team's Chances
Another question that comes up in relation to playoff series is how often an inferior team, such as a Wild Card team, beats the best team in the league. To see how often this happens I simulated a Wild Card team that played .540 ball at home and .490 on the road (83 wins) against a "best" team, or division winner, that played .650 at home and .550 on the road (97 wins). With that configuration the Wild Card team still won 31% of the time in a five-game series.

The lesson is that even though baseball is the "game of the long season," as George Will says, the best teams still have a considerable chance of getting beaten in any playoff series, just as they do in a single game. I think this point should take the edge off the criticism the Braves have taken for winning only one World Series during 13 consecutive division titles, and especially the criticism of the A's for supposedly not being able to win while employing a "Moneyball" approach. Overall, the addition of the Wild Card adds excitement to the end of the season for many teams (Cubs, Padres, Astros, Rangers, Red Sox, and A's this season) at the cost of correlating the winner of the World Series less closely with the team that garnered the best season record. Take your pick.
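The Wild Card matchup can also be computed exactly rather than simulated. This sketch assumes a 2-2-1 five-game format with the division winner at home in games 1, 2, and 5 (the text doesn't say who held home field) and walks the series state recursively; the names are illustrative:

```python
from functools import lru_cache


def log5(a: float, b: float) -> float:
    """Chance a home team with home pct a beats a visitor with road pct b."""
    return (a - a * b) / (a + b - 2 * a * b)


# Teams from the text as (home pct, road pct)
BEST = (0.650, 0.550)       # division winner, about 97 wins
WILD_CARD = (0.540, 0.490)  # wild card, about 83 wins

# Wild card's chance of winning a single game at each park
WC_AT_HOME = log5(WILD_CARD[0], BEST[1])
WC_ON_ROAD = 1 - log5(BEST[0], WILD_CARD[1])

SITES = "BBWWB"  # assumed 2-2-1: "B" = best team at home, "W" = wild card at home


@lru_cache(maxsize=None)
def wc_wins_from(game: int, wc_wins: int, best_wins: int) -> float:
    """Probability the wild card takes the series from the given state."""
    if wc_wins == 3:
        return 1.0
    if best_wins == 3:
        return 0.0
    p = WC_AT_HOME if SITES[game] == "W" else WC_ON_ROAD
    return (p * wc_wins_from(game + 1, wc_wins + 1, best_wins)
            + (1 - p) * wc_wins_from(game + 1, wc_wins, best_wins + 1))


print(f"wild card wins the series: {wc_wins_from(0, 0, 0):.3f}")
```

Under those assumptions it lands near .32, close to the simulated 31%.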

Thursday, October 21, 2004

How Probable Was the BoSox Comeback?

That seems to be the question on everyone's mind today. To have some fun with it I constructed a simulator in Visual Basic .NET that plays seven-game series in a 2-3-2 format. I calculated the probability of each team winning its home games using the log5 method described by Bill James in the 1981 Baseball Abstract and documented by Tom Tippett at Diamond Mind Baseball.


$$\mathrm{WPct} = \frac{A - A B}{A + B - 2 A B}$$

So if TeamA played .550 ball at home and TeamB played .450 on the road, the probability of TeamA winning a game at home would be .599: (.55-(.55*.45))/(.55+.45-(2*.55*.45)) = .3025/.505. Interestingly, I also found a slightly different formula by Rodney Sparapani posted this April that gives the same results.

For the 2004 season the Yankees played .704 ball at home and .543 on the road, while the Red Sox played .679 at home and .531 on the road. Using the log5 formula for home games, the Yankees had a probability of .677 of beating the Red Sox at home, while the Red Sox had a probability of .643 at Fenway Park against the Yankees.
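The log5 step can be checked directly. A minimal Python sketch of the formula above, alongside the algebraically equivalent form that appears commented out in the simulator below; the function names are illustrative:

```python
def log5(a: float, b: float) -> float:
    """log5 as written above: a = home team's home pct, b = visitor's road pct."""
    return (a - a * b) / (a + b - 2 * a * b)


def log5_alt(a: float, b: float) -> float:
    """Equivalent form a(1 - b) / (a(1 - b) + (1 - a)b)."""
    return a * (1 - b) / (a * (1 - b) + (1 - a) * b)


print(f"{log5(0.550, 0.450):.3f}")  # .550 home team vs .450 road team
print(f"{log5(0.704, 0.531):.3f}")  # Yankees at home vs Red Sox on the road
print(f"{log5(0.679, 0.543):.3f}")  # Red Sox at home vs Yankees on the road
```

With the three-decimal inputs quoted above this reproduces the .599 example and the Yankees' .677; the same inputs give about .640 for the Red Sox at Fenway, a shade under the .643 quoted.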

I then used the random number generator to simulate the outcome of each game, recording the number of victories for each side and the number of series in which the team that did not have home field advantage (the Red Sox in this case) won the last 4 games of the series.

Drum roll please....

In 100,000 ALCS matchups the results were:

Yankees: 60,109
Red Sox: 39,891

So the Yankees win about 60% of the time and the Red Sox 40%. In those 100,000 contests the Red Sox won the series after going down 3-0 754 times or .754% of the time or once every 132 series. The Yankees took the first three games 16,701 times and so the Red Sox made their comeback 4.5% of the time. That's very close to the actual number of times that the feat has now been accomplished - 1 out of 26 or 3.8%.
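Because the Red Sox needed four straight wins, the comeback probability can also be computed exactly from the two per-game probabilities above, which gives a check on the simulation. A minimal Python sketch, taking the .677 and .643 values as given:

```python
# Per-game probabilities from the log5 step
NYY_AT_HOME = 0.677  # Yankees beat the Red Sox at Yankee Stadium
BOS_AT_HOME = 0.643  # Red Sox beat the Yankees at Fenway Park

# 2-3-2 format: games 1, 2, 6, 7 in New York; 3, 4, 5 in Boston.
# Yankees' chance of winning each game, in order.
nyy_game = [NYY_AT_HOME, NYY_AT_HOME, 1 - BOS_AT_HOME, 1 - BOS_AT_HOME,
            1 - BOS_AT_HOME, NYY_AT_HOME, NYY_AT_HOME]

up_three = nyy_game[0] * nyy_game[1] * nyy_game[2]  # Yankees win games 1-3

comeback = 1.0
for p in nyy_game[3:]:
    comeback *= 1 - p  # Red Sox win games 4-7

print(f"Yankees up 3-0:           {up_three:.3f}")
print(f"Sox win from 0-3 down:    {comeback:.3f}")
print(f"0-3 comeback, any series: {up_three * comeback:.4f}")
```

With these inputs the exact figures come out near .164 for a Yankees 3-0 start, .043 for the comeback from there, and .0071 overall, reasonably close to the simulated 16.7%, 4.5%, and .754%.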

As an aside the Yankees swept the series 5,789 times while the Red Sox swept it 4,269 times. Here is the complete breakdown:

In case you're interested here are the probabilities for the Red Sox facing either the Astros or Cardinals.

Red Sox versus Astros = Red Sox win 60.5% of the time
Red Sox versus Cardinals = Cardinals win 56.3% of the time

For those interested the VB .NET code to run each trial is as follows:

Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module Module1

    ' Tally of final series scores, keyed as team1 wins & team2 wins
    ' ("41" = team1 won 4-1, "34" = team2 won 4-3).
    Public outcomes As New Dictionary(Of String, Integer)

    Public Sub Main()
        For Each key As String In New String() {"40", "41", "42", "43", "04", "14", "24", "34"}
            outcomes.Add(key, 0)
        Next

        ' Seed the generator once. Calling Randomize() at the start of every
        ' trial reseeds from the system timer and can repeat sequences.
        Randomize()

        ' Console driver in place of the original Form1, which isn't shown here.
        Dim yankees As New Team(0.704D, 0.543D, "Yankees")
        Dim redSox As New Team(0.679D, 0.531D, "Red Sox")
        For trial As Integer = 1 To 100000
            RunTrial(yankees, redSox)
        Next

        For Each kvp As KeyValuePair(Of String, Integer) In outcomes
            Console.WriteLine("{0}-{1}: {2}", kvp.Key.Chars(0), kvp.Key.Chars(1), kvp.Value)
        Next
    End Sub

    ' Plays one best-of-seven series in the 2-3-2 format; team1 has home field.
    Public Function RunTrial(ByVal team1 As Team, ByVal team2 As Team) As Results
        Dim team1Won As Integer = 0
        Dim team2Won As Integer = 0
        Dim scores As String = ""

        ' log5: odds of team1 winning its home games
        Dim team1Home As Decimal = (team1.HomeWPct - (team2.AwayWPct * team1.HomeWPct)) / _
            (team1.HomeWPct + team2.AwayWPct - (2 * team1.HomeWPct * team2.AwayWPct))
        ' Equivalent form:
        ' team1Home = (team1.HomeWPct * (1 - team2.AwayWPct)) / _
        '     ((team1.HomeWPct * (1 - team2.AwayWPct)) + ((1 - team1.HomeWPct) * team2.AwayWPct))

        ' log5: odds of team2 winning its home games
        Dim team2Home As Decimal = (team2.HomeWPct - (team1.AwayWPct * team2.HomeWPct)) / _
            (team2.HomeWPct + team1.AwayWPct - (2 * team2.HomeWPct * team1.AwayWPct))

        For i As Integer = 1 To 7
            Select Case i
                Case 1, 2, 6, 7 ' games at team1's park
                    If PlayGame(team1Home) Then
                        team1Won += 1
                        scores &= "1"
                    Else
                        team2Won += 1
                        scores &= "2"
                    End If
                Case 3, 4, 5 ' games at team2's park
                    If PlayGame(team2Home) Then
                        team2Won += 1
                        scores &= "2"
                    Else
                        team1Won += 1
                        scores &= "1"
                    End If
            End Select

            If team1Won = 4 OrElse team2Won = 4 Then
                ' series over
                Return New Results(team1Won, team2Won, scores)
            End If
        Next

        ' Unreachable: one team always reaches four wins within seven games.
        Throw New InvalidOperationException("Series ended without a winner.")
    End Function

    ' Returns True if the team whose win probability is prob wins the game.
    Public Function PlayGame(ByVal prob As Decimal) As Boolean
        ' Rnd() returns a Single in [0, 1), so this is True with probability prob.
        Return CDec(Rnd()) < prob
    End Function

End Module

Public Class Team
    Public Sub New(ByVal home As Decimal, ByVal away As Decimal, ByVal name As String)
        HomeWPct = home
        AwayWPct = away
        TeamName = name
    End Sub
    Public HomeWPct As Decimal
    Public AwayWPct As Decimal
    Public TeamName As String
End Class

Public Class Results
    Public Sub New(ByVal team1 As Integer, ByVal team2 As Integer, ByVal scores As String)
        Team1Won = team1
        Team2Won = team2
        Games = scores
        ' Record the final series score in the module-level tally.
        outcomes(team1.ToString() & team2.ToString()) += 1
    End Sub
    Public Team1Won As Integer
    Public Team2Won As Integer
    Public Games As String
End Class

A Victory for Sabermetrics?

David Pinto over at Baseball Musings writes about the sabermetric tide in baseball given the success of the Red Sox. One of the commenters on that post asked:

"doesn't the spread of sabermetric ideas mean that the market inefficiencies of the past will be competed away soon and there will be no more possibilities for progress? Or are there a host of new sabermetric discoveries to be made?"

Well, this season the Red Sox and A's, rather than progressing through new sabermetric insights, tried instead to exploit the undervaluing of defense now that the value of on-base percentage and OPS is becoming better understood, as pointed out by Peter Gammons and Ken Rosenthal. Billy Beane also pointed this out in a recent interview.

However, as I've said before I think the inherent structure of the game of baseball ultimately limits the possibilities for progress. Using the Batting Runs formula, for example, the value of a homerun or a stolen base or a walk has changed only slightly since 1900. From the offensive perspective it isn't like someone will suddenly learn that stolen bases indeed are more valuable than power hitting. And even in the big picture sabermetricians pretty well understand the relative values of offense and defense (split into pitching and fielding). As a result sabermetrics will become more specialized and so I wouldn't look for "a host" of new sabermetric breakthroughs. In addition, taking advantage of what is currently undervalued can only get you so far if what is undervalued is not as valuable in an absolute sense. Right now we're in a time when some teams can and do exploit the market but with the spread of sabermetrics this advantage will be lessened.

To me, one of the last big frontiers is valuing defense more precisely. There's a lot of disagreement as to how many runs a really good shortstop saves over an average one. Getting a defensive run measure to the same visibility as OPS and deprecating fielding percentage would be a big win for sabermetrics and for baseball in general.

ALCS Recap

The Red Sox completed their more than improbable comeback last night with a 10-3 victory over the Yankees. 25 other teams had gotten down 0-3 in a best of seven but only the Red Sox even forced a game 7, let alone won it. A few observations on game 7 and the series.

  • Like many, I was surprised that the Yankees did not have a better approach to Curt Schilling in game 6. They were neither patient nor willing to bunt, so Schilling needed only 99 pitches to get through 7 innings and his bad ankle was never tested. When a pitcher is clearly injured you need to wear him down, which the Yankees seemed unwilling to do.
  • Games 6 and 7 weren't really the problem for the Yankees. Games 4 and 5 were. In game 4 they had a 4-3 lead going into the bottom of the 9th when a Mariano Rivera walk turned into a run to tie the game. In game 5 they held a 4-2 lead in the bottom of the eighth before David Ortiz homered off of Tom Gordon and Rivera gave up the sacrifice fly to Trot Nixon to tie the game. Perhaps Rivera should have been brought in in the 8th inning of game 5 instead of waiting.
  • In game 7 I was shocked to see Pedro Martinez on the mound to start the 7th inning. He was coming back on only one day's rest having thrown 111 pitches in game 5. Presumably he would pitch either game 1 or game 2 of the World Series. When he came in he was not sharp at all but seemed to pick up the velocity after he let in the 2 runs, striking out John Olerud and retiring Miguel Cairo. Regardless of the outcome that was a bad decision by manager Terry Francona. Derek Lowe had only thrown 69 pitches and was in command. It worked out but I don't understand it at all.
  • Throughout the series Al Leiter added a lot with his commentary and showed what thoughtful pitchers are thinking about during a game. His comments in game 7 about the Yankees' pitching pattern to Orlando Cabrera were especially interesting. He noted that the Yankees seemed to try to get Cabrera out on fastballs, although Leiter knows from experience, and backed it up with statistics, that Cabrera is a fastball hitter, hitting .213 on breaking balls and .297 on fastballs. That may explain part of the reason Cabrera hit .379 in the series.
  • On the other hand, the Yankees had an excellent approach to pitching Mark Bellhorn. They pounded him inside, both low and high. His homerun in game 6 was on a pitch out of the strike zone away and just a little up, and his homerun in game 7 was on a pitch left out over the plate. Interestingly, both Tim McCarver and Joe Buck seemed critical of the decision to bat Bellhorn second in the lineup, emphasizing the fact that he struck out with a man on second and nobody out in the first inning, therefore failing to move the runner over. To me this is an example of the Red Sox employing the strategy of "be the house". Yes, Bellhorn strikes out a lot (177 times, tops in the AL, not tops in the majors as McCarver said last night), but he also walks a lot (88 times). Over the course of a season his 88 walks and .444 slugging percentage are going to move over a lot of runners, while his strikeouts are going to result in fewer double plays (he hit into 8). I'll take that tradeoff any day, especially if the alternative is to bat Cabrera second, who has a .316 career OBP and grounded into 16 double plays in 2004.
  • As noted by the broadcast team last night, the Yankees got 3.3 innings, 6 hits, 8 runs, and 7 walks out of $25M worth of pitchers in Kevin Brown and Javier Vazquez. Many will use this as proof that money doesn't win championships. The payroll rankings of the eight teams that made the playoffs are 1, 2, 3, 7, 8, 11, 12, and 19. The fact is that large payrolls result in many more playoff appearances but, because of the role chance plays in short playoff series, don't guarantee championships. Albert and Bennett in Curve Ball constructed a model of team performance and through a simulation concluded that the best team in any given season has a 98% chance of making the playoffs but only a 21% chance of winning the World Series.
  • I'm not sure that Joe Torre shouldn't have brought in Mariano Rivera in the 2nd inning when the game was on the line. That would have been a bold move. Vazquez did not pitch all that well in game 3 going 4.1 innings and giving up 4 runs and repeated that trick last night.
  • David Ortiz deserved the ALCS MVP award. All three of his homeruns were huge, not to mention his game-winning single in the 14th inning of game 5, and he hit .387 and drove in 11 runs.
  • George Steinbrenner is sure to shake things up in the Bronx. I wouldn't be surprised to see a change of pitching coaches, but although that's where it will start, it'll likely be only the beginning.
  • Best Tim McCarver quote of the series - "The riptide of big innings are walks"
  • I agree with Will Carroll who notes that for a team with a really big payroll the Yankees had several obvious holes in their lineup during the series. Batting Kenny Lofton at DH and the combination of Tony Clark and John Olerud at first base as well as Miguel Cairo seems strange for a team with that much money. I don't mind Ruben Sierra at DH so much.

It's been a long week of baseball. One more night with the NLCS game 7. It's hard to bet against Roger Clemens.


Wednesday, October 20, 2004

ARod and Interference

For those interested here is the relevant passage in the rule book relating to last night's interference call on Alex Rodriguez.

"(a) Offensive interference is an act by the team at bat which interferes with, obstructs, impedes, hinders or confuses any fielder attempting to make a play. If the umpire declares the batter, batter runner, or a runner out for interference, all other runners shall return to the last base that was in the judgment of the umpire, legally touched at the time of the interference, unless otherwise provided by these rules. In the event the batter runner has not reached first base, all runners shall return to the base last occupied at the time of the pitch."

This is a part of rule 2 and was clearly violated by ARod. However, as Craig Burley points out the umpires also use a manual that interprets various rules which says in section 6.1:

"while contact may occur between a fielder and runner during a tag attempt, a runner is not allowed to use his hands or arms to commit an obviously malicious or unsportsmanlike act such as grabbing, tackling, intentionally slapping at the baseball, punching, kicking, flagrantly using his arms or forearms, etc. to commit an intentional act of interference unrelated to running the bases."

This also makes it clear that the umpires were correct.

The Federal Marriage Amendment

Robert Bork has a nice piece in First Things defending the Federal Marriage Amendment now before the Congress which reads:

"Marriage in the United States shall consist only of the union of a man and a woman. Neither this Constitution nor the constitution of any state shall be construed to require that marital status or the legal incidents thereof be conferred upon unmarried couples or groups."

Bork argues against other social conservatives such as George Will and Charles Krauthammer who think it unwise to amend the Constitution for this, or it appears, any reason. Bork then goes on to explore other possible wordings of an amendment and the consequences of homosexual marriage on the culture.

In some opposition to an amendment, as when Krauthammer says "for me the sanctity of the Constitution trumps everything," I see an unhealthy veneration of the Constitution that appears to have grown in the last 80 years. This "sanctity" of the Constitution has had the effect of making it increasingly unlikely that the Constitution will be amended. The irony is that over time more and more interpretations by courts have only the slightest connection with the text itself and are rather based on the whim of a few people. It is just such a situation that calls out for the process of amendment. A first step is to once again realize that the Constitution is a human document and that its interpretation occasionally needs to be clarified by the will of the people.

And because the amendment will ultimately fail, not because of its content but because of the Constitution's "sanctity," homosexual marriage in the states is a foregone conclusion. Bork outlines the undeniable scenario that will unfold:

"A homosexual couple will marry in Massachusetts, move to another state (say, Texas), and claim the status and benefits of marriage there. They will cite the Full Faith and Credit Clause of Article IV of the Constitution, which declares that states must accept the public acts of every other state. Texas will refuse recognition, relying on the federal Defense of Marriage Act (DOMA), passed in reliance on Article IV's further provision that Congress may prescribe the effect of such out-of-state acts. The couple will respond with a challenge to DOMA under the federal Due Process and Equal Protection Clauses. The Supreme Court will then uphold their challenge by finding a federal constitutional right to same-sex marriage that invalidates DOMA. The FMA would prevent this almost-certain outcome. Instead of state-by-state experimentation, we are going to have a uniform rule one way or the other: homosexual marriage everywhere or nowhere. The choice is that stark and judges are forcing us to make it."

Given the past behavior of the Supreme Court, can anyone honestly reason it will go down any other way?

My own view has been that homosexual marriage in isolation should probably be approved. After all, almost everyone agrees that some form of civil union is desirable in the interests of financial and legal fairness and once you go that far, civil marriage is largely a distinction without a difference. But once you go there (and why I don't think we should even if it's the "fair" thing to do), then marriage has lost any mooring it once had and can reasonably be conceived to mean absolutely anything (the same argument can be used against civil unions). I see no compelling reason why a young man may not "marry" an elderly woman for financial interests or a woman marry her son, or a group of people marry for financial and legal protection, or people "marrying" and divorcing the same or different people on a semi-annual basis in order to receive tax breaks. It is the symbolic link with child-rearing that is the basis for marriage as it now exists.

Bork addresses the argument that such arrangements couldn't or wouldn't happen.

"Many consider such hypotheticals ridiculous, claiming that no one would want to be in a group marriage. The fact is that some people do, and they are urging that it be accepted. There is a movement for polyamory - sexual arrangements, including marriage, among three or more persons. The outlandishness of such notions is no guarantee that they will not become serious possibilities or actualities in the not-too-distant future. Ten years ago, the idea of a marriage between two men seemed preposterous, not something we needed to concern ourselves with. With same-sex marriage a line is being crossed, and no other line to separate moral and immoral consensual sex will hold."

Conservatives such as Thomas Sowell often define the difference between themselves and liberals by saying that conservatives take seriously the law of unintended consequences. Sever the link between marriage and the family and I'm betting you'll see that law unleashed.

Tuesday, October 19, 2004

Excellent Umpiring

Two times in tonight's game between the Yankees and Red Sox the umpiring crew got together when a call was disputed, talked it over, and got the call correct. The first time was on the 3-run homerun by Mark Bellhorn that actually hit a fan before falling back onto the field of play and the second time when Alex Rodriguez swatted the ball out of Bronson Arroyo's glove in the bottom of the 8th. Kudos to the crew, who did the correct thing and got it right both times.

Up until just a few years ago it was not common for crews to do this. The umpire who made the original call usually stuck to his guns and wouldn't ask for help.

OPS vs OAPS

There was an interesting piece in The New York Times on September 19th by Alan Schwarz titled "Ball Four! Take Your Measly Base, Slugger". Schwarz is the author of The Numbers Game. In the piece Schwarz discusses the 1.433 OPS of Barry Bonds (at the time of the article; Bonds finished the season at 1.422, breaking his own 2002 record for OPS) and how David Neft, the former vice president for research at Gannett, devised OAPS (On-base advantage plus slugging percentage) in order to "acknowledge this strategic aspect of the game: how walks are, to varying degrees, conscious choices by pitchers to avoid the potential damage done by slugging."

To take this into account Neft subtracts the batter's slugging percentage from 1 and uses this in the calculation of what he calls "on base advantage". Basically, this is an acknowledgment that there is an opportunity cost associated with taking a walk and that pitchers sometimes choose to walk a batter based on the potential damage pitching to the batter could do. So in practice this means that if a batter had 240 total bases, 80 walks, and 150 hits in 500 at bats, his OBA would be .397, his SLUG .480, and his OPS .877. His OA would then be .330, since his walks would be worth only 52% of normal (1 - .48), and his OAPS .810.
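The worked example can be reproduced in a few lines. A minimal Python sketch; it assumes walks are discounted by (1 - SLG) in the numerator only, with the denominator left at AB + BB, since that is what reproduces the .330 above. HBP and sacrifice flies are ignored, as in the example, and the function name is illustrative:

```python
def oaps_line(hits: int, walks: int, at_bats: int, total_bases: int):
    """Return OBA, SLG, OPS, Neft's on-base advantage (OA), and OAPS."""
    pa = at_bats + walks
    slg = total_bases / at_bats
    oba = (hits + walks) / pa
    oa = (hits + walks * (1 - slg)) / pa  # walks count (1 - SLG) of normal
    return oba, slg, oba + slg, oa, oa + slg


oba, slg, ops, oa, oaps = oaps_line(hits=150, walks=80, at_bats=500, total_bases=240)
print(f"OBA {oba:.3f}  SLG {slg:.3f}  OPS {ops:.3f}  OA {oa:.3f}  OAPS {oaps:.3f}")
```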

For a slugger like Bonds the difference between his OBA and OA will grow proportionately with his walk total. Whereas in our example player the difference was .067 (.397 - .330), for Bonds it's more like .35, about five times as large, and it has the effect of bringing his OAPS down to around 1.07 or so. The following chart illustrates how OPS continues to rise more sharply as a batter accumulates walks while OAPS has a much smaller slope, rewarding batters less for each walk.

This approach also seems to hold up fairly well in computer models as noted by Schwarz:

"Mark Pankin, a 59-year-old investment adviser and avid baseball statistics researcher in Arlington, Va., plugged Neft's concept into his computer model and found it held up. Though the average walk costs a pitcher 0.33 runs, Bonds's walks each cost 0.17, with other players' figures going up as their slugging percentage goes down: Beltre (0.25), Guerrero (0.27) , up to the notoriously nonslugging David Eckstein of the Angels (0.36). 'It can be a meaningful difference: about 40 percent from Beltre to Eckstein,' Pankin said."

What Pankin is saying is that when Bonds doesn't walk, his plate appearances are worth an average of .16 runs. As a result, when a pitcher walks him it increases the run potential by .17 runs (.33 - .16). For Eckstein, on the other hand, his runs per plate appearance are -.03, and so when a pitcher walks him it increases the run potential by .36 runs (.33-(-.03)). In other words, it is relatively less costly to walk Bonds than it is to walk Eckstein. And of course whether you want to roll the dice and see if Bonds will make an out depends in large part on whether there are runners on base, hence the strategic nature of many of his walks.

However, there is something that doesn't quite sit right about this. I don't really like the idea of punishing a hitter because they have a high slugging percentage.

One way to correct this would be to weight only Bonds' intentional walks using the formula suggested by Neft. After all, it is in these situations that we know the defense chose a strategy of avoidance. Since 120 of Bonds' 232 walks were intentional, those 120 should be counted at .188 (1 minus his slugging percentage), giving Bonds effectively 135 walks (.188*120+112), or valuing each walk at .58. Guerrero, on the other hand, had 14 IBB and 38 regular walks, and so his walks would be valued at .84 (((14*.401)+38)/52).
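The intentional-walk-only adjustment is a simple weighted average. A minimal Python sketch using the walk counts and (1 - SLG) weights quoted above (the function name is illustrative):

```python
def walk_value(ibb: int, ubb: int, ibb_weight: float) -> float:
    """Average value of a walk when only intentional walks are discounted.

    ibb_weight is Neft's (1 - SLG) discount; unintentional walks count in full.
    """
    effective_walks = ibb * ibb_weight + ubb
    return effective_walks / (ibb + ubb)


bonds = walk_value(ibb=120, ubb=112, ibb_weight=0.188)
guerrero = walk_value(ibb=14, ubb=38, ibb_weight=0.401)
print(f"Bonds: {bonds:.2f}  Guerrero: {guerrero:.2f}")
```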

But of course making this correction doesn't acknowledge the times a hitter is pitched around resulting in a "regular" walk. And for those hitters who never get an intentional walk their walks will always be valued the same as a base hit in the OPS calculation, something we know from Batting Runs that is not correct. Walks are 70% as valuable as singles overall, being worth .33 runs with singles at .47. A better approach overall would simply be to use Batting Runs with a weight such as .17 or so for the intentional walks.
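A Batting Runs version of that suggestion might look like the sketch below. Only the .47 single and .33 walk values come from the text, and the .17 intentional-walk weight is the suggestion above; the double, triple, homerun, and out values are the approximate figures usually cited for Pete Palmer's linear weights and should be treated as assumptions. The stat line is hypothetical.

```python
# Assumed linear-weight run values; 1B and UBB from the text, IBB as suggested,
# the rest approximate Palmer weights (assumptions).
WEIGHTS = {
    "1B": 0.47, "2B": 0.78, "3B": 1.09, "HR": 1.40,
    "UBB": 0.33, "IBB": 0.17, "OUT": -0.25,
}


def batting_runs(line: dict) -> float:
    """Batting Runs estimate: the weighted sum of a stat line's events."""
    return sum(weight * line.get(event, 0) for event, weight in WEIGHTS.items())


# Hypothetical season line; OUT = at bats minus hits
player = {"1B": 100, "2B": 30, "3B": 5, "HR": 20, "UBB": 40, "IBB": 10, "OUT": 350}
print(f"{batting_runs(player):.2f}")
```

Separating intentional from unintentional walks keeps the adjustment confined to the plate appearances where we know the defense chose avoidance, without touching the rest of the line.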

In the end of course, one of the main strengths of OPS is that it is good for back of the envelope calculations and so introducing more complexity into it defeats some of its purpose.

Monday, October 18, 2004

Kauffman Park Effects 2004

Since the season is over for the Royals I thought I'd update the Kauffman Stadium run scoring spreadsheet that I started before the season started and blogged about previously.

My interest in this began when considering how the fences being moved back 10 feet at the K last winter would affect the run scoring at the park. So here are the updated numbers:

       ---------- Home -----------   ---------- Away -----------    ------- Index --------
Year    Games  Royals   Opp  Opp%+    Games  Royals   Opp  Opp%+    Royals    Opp  Overall
2004       80     338   426    26%       82     382   479    25%       -9%    -9%      -9%
2003       80     433   512    18%       82     403   355   -12%       10%    48%      28%
2002       81     434   505    16%       81     303   386    27%       43%    31%      36%
2001       81     382   485    27%       81     347   373     7%       10%    30%      20%
2000       81     451   488     8%       81     428   442     3%        5%    10%       8%
1999       80     441   449     2%       81     415   472    14%        8%    -4%       2%
1998       80     353   492    39%       81     461   407   -12%      -22%    22%      -1%
1997       80     387   434    12%       81     360   386     7%        9%    14%      11%
1996       80     372   369    -1%       81     374   417    11%        1%   -10%      -5%
1995       72     285   346    21%       72     344   345     0%      -17%     0%      -8%
1994       59     325   287   -12%       58     249   245    -2%       28%    15%      22%
1993       81     370   354    -4%       81     305   340    11%       21%     4%      12%
1992       81     314   346    10%       81     296   331    12%        6%     5%       5%
1991       81     344   378    10%       81     383   344   -10%      -10%    10%      -1%

1991-94   302    1353  1365     1%      301    1233  1260     2%        9%     8%       9%
1995-03   715    3538  4080    15%      721    3435  3583     4%        4%    15%       9%

Opp%+ = opponents' runs relative to Royals' runs in those games. Index = runs per game at home relative to runs per game on the road (for the Royals, their opponents, and both combined).

As you might expect, the end result is that 9% fewer runs were scored at Kauffman Stadium this season than on the road, a big change from 2001-2003, when about 26% more runs were scored at the K, and from the historical trend of +9% since 1991. One shudders to think how many runs Darrell May and Brian Anderson might have given up with the fences at their 2003 distances. As it was, they gave up 38 and 33 homeruns respectively, the former setting a Royals record.
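The Index columns in the table can be reproduced from the raw totals. This is a minimal sketch, assuming (as the numbers bear out) that each index is runs per game at home divided by runs per game on the road, minus one; the function names are hypothetical.

```python
def per_game(runs: int, games: int) -> float:
    return runs / games


def park_index(home_runs: int, home_games: int, away_runs: int, away_games: int) -> float:
    """Fractional difference between runs per game at home and on the road."""
    return per_game(home_runs, home_games) / per_game(away_runs, away_games) - 1


# 2004 row: home 80 games (Royals 338, Opp 426); away 82 games (Royals 382, Opp 479)
royals = park_index(338, 80, 382, 82)
opponents = park_index(426, 80, 479, 82)
overall = park_index(338 + 426, 80, 382 + 479, 82)
print(f"Royals {royals:+.0%}, Opp {opponents:+.0%}, Overall {overall:+.0%}")
```

Swapping in another row's numbers (for example 512, 80, 355, 82 for the 2003 opponents) reproduces the other entries in the same way.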

However, these quick and dirty numbers don't translate directly into the park factors sabermetricians use to normalize statistics such as OPS and Runs Created. The calculation of BPF (Batter Park Factor) and PPF (Pitcher Park Factor) is rather complicated; a complete explanation can be found on baseball-reference.com, which uses the same basic formula as Total Baseball, originally documented in The Hidden Game of Baseball.

Suffice it to say the BPF and PPF also take into account things like:

1) the innings pitched difference between home and road games (a good team at home will get fewer at bats and thus score fewer runs than on the road)
2) the impact of the team's park on the park factors of other teams
3) the fact that hitters don't get to bat against their own pitchers and vice versa
4) the fact that a player plays only half his games at his home park.

In addition, they are typically calculated using averages over several seasons (I think it might make more sense to use weighted averages, since I would assume that weather patterns vary across groups of years as well as single years). This matters because run totals in a park vary a good deal from season to season based on weather, individual players, and simple luck. So I wouldn't be surprised if run scoring increased at the K next year, since we had a cooler and wetter summer in KC than in years past.

So the BPF and PPF for these seasons from baseball-reference.com, which uses a three-year average, actually work out to:

Year BPF PPF
2004 95 96
2003 113 112
2002 117 115
2001 110 109
2000 104 103
1999 101 101
1998 104 105
1997 102 103
1996 97 98
1995 103 103
1994 104 104
1993 104 105
1992 103 103
1991 100 101

Obviously this tends to even out the fluctuations between 1994-95 and 1997-98. But it would appear that the 2004 factors of 95/96 may not be the result of a three-year average; if they were, the single-season 2004 BPF would have had to be very low indeed. If they are not, that makes sense, since averaging across seasons is misleading when the park's configuration changes, as it did at the K this season.
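To see why a straight three-year average looks unlikely, here is a minimal sketch that averages the Overall index from the first table for 2002-2004, both plainly and with the kind of recency weighting suggested above. This is not the baseball-reference.com BPF calculation, which also makes the adjustments listed earlier; the weights and function name are hypothetical.

```python
from typing import Optional, Sequence


def weighted_average(values: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Average of values; equal weights when none are given."""
    if weights is None:
        weights = [1.0] * len(values)
    if len(weights) != len(values):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Overall index from the table as home/road run ratios, oldest to newest:
# 2002 +36%, 2003 +28%, 2004 -9%
overall_ratio = [1.36, 1.28, 0.91]

simple = weighted_average(overall_ratio)             # equal weights
recent = weighted_average(overall_ratio, [1, 2, 3])  # hypothetical weights favoring 2004
print(f"simple {simple:.2f}, weighted {recent:.2f}")
```

Either way the 2002-2004 ratio stays well above 1, which is consistent with the suspicion that the 2004 factors were not built from a straight three-year average.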

Triples Galore

In response to a question on SABR-L, I dug into the Retrosheet data for 1992 (I now have the 2003 data, courtesy of the Stats Software group on Yahoo, but haven't loaded it yet). The questioner asked how often triples drove in runs, how often the batter hitting the triple scored, and how often both occurred. Here's what I came up with:



Lg     3B   RBI  Score3b     Pct  3bScored     Pct  3bRbiScore     Pct
NL    459   289      207   45.1%       316   68.8%         117   25.5%
AL    386   264      187   48.4%       252   65.3%          93   24.1%

where Score3b is the number of times a triple drove in at least one run, 3bScored is the number of times the batter who hit the triple scored, and 3bRbiScore is the number of times the batter drove in a run and later scored in the inning. Each Pct column is that count divided by the league's total triples.

If I did the calculation correctly (this was my first effort at intra-inning analysis with Retrosheet data), it is surprising that 46% of triples drove in at least one run, since the OBP is roughly .330. That means that triples are hit with men on base more often than you would expect. The percentage of batters hitting triples who go on to score (67%) seems about right when you look at run expectancy tables.
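The Pct columns and the combined figures quoted above follow directly from the counts. A small sketch; the dictionary layout and function name are hypothetical.

```python
# League totals from the 1992 table above
TRIPLES = {
    "NL": {"3b": 459, "score3b": 207, "scored": 316, "rbi_and_scored": 117},
    "AL": {"3b": 386, "score3b": 187, "scored": 252, "rbi_and_scored": 93},
}


def shares(counts: dict) -> dict:
    """Express each count as a share of all triples hit."""
    total = counts["3b"]
    return {k: v / total for k, v in counts.items() if k != "3b"}


for lg, counts in TRIPLES.items():
    s = shares(counts)
    print(f"{lg}: {s['score3b']:.1%} drove in a run, {s['scored']:.1%} scored")

# Both leagues together
combined = {k: TRIPLES["NL"][k] + TRIPLES["AL"][k] for k in TRIPLES["NL"]}
both = shares(combined)
print(f"Both: {both['score3b']:.1%} drove in a run, {both['scored']:.1%} scored")
```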

Sunday, October 17, 2004

Advanced .NET Compact Framework Development

Here is a PPT talk that I'll be giving at the Wichita .NET User's Group in a couple of weeks. It covers several advanced .NET Compact Framework development techniques, including P/Invoke, Detecting Network Presence, and making Smart Devices Smart Clients.

A Study of 1 Peter

The small group my wife and I attend is doing a 9-week study on 1 Peter. This year I'm leading the group and so have prepared a few notes for each week's meeting. For those interested, I've placed the notes I use here for download:

Week 1: Introduction and Background
Week 2: 1 Peter 1:1-13 A Precious Salvation
Week 3: 1 Peter 1:14-25 A New Way of Life
Week 4: 1 Peter 2:1-10 A Chosen Priesthood
Week 5: 1 Peter 2:11-25 Submission to Rulers and Masters
Week 6: 1 Peter 3:1-12 Wives and Husbands
Week 7: 1 Peter 3:13-4:6 Doing Good: The Promise of Vindication
Week 8: 1 Peter 4:7-19 Mutual Love: The Key to Christian Community in the End Times
Week 9: 1 Peter 5:1-14 The Responsibilities of a Church in the Midst of Trials and Concluding Remarks


The outline I'm following is based on one created by Daniel B. Wallace, Ph.D., Professor of New Testament Studies at Dallas Theological Seminary.

Disclaimer: I have no formal theological training, so read through these at your own risk. I did try to document where the background material for each week comes from so you can find the original sources yourself. I generally come from an evangelical perspective, but I tend to differ in areas such as eschatology and inerrancy.

Saturday, October 16, 2004

A Brief History of Run Estimation: Batting Runs

The following is the second article in my series on the sabermetric history of run estimation. This article covers Pete Palmer's Batting Runs, a component of the Linear Weights system.

History
Batting Runs, a linear run estimator, was developed by Pete Palmer in the 1970s and introduced to the world as part of his Linear Weights (LWTS) system in his and John Thorn's 1984 book The Hidden Game of Baseball, which, like Bill James's Baseball Abstracts, is one of the preeminent documents in the history of sabermetrics. Palmer went on to apply his linear weights system to defense and pitching and to derive his Total Player Rating (TPR) system, which was tracked in Total Baseball and continues as Batter-Fielder Wins (BFR) and Pitcher Wins (PW) in the 2004 edition of The Baseball Encyclopedia.

However, the history of Batting Runs and Linear Weights actually goes back much further.


As documented by Alan Schwarz in his excellent book The Numbers Game, F.C. Lane, editor of Baseball Magazine from 1912 to 1937, was actually the pioneer of linear weights. He observed that batting average was an inadequate way of measuring the contribution individual players make to winning baseball games, remarking in 1916:

"Would a system that placed nickels, dimes, quarters, 50-cent pieces on the same basis be much of a system whereby to compute a man's financial resources? And yet it is precisely such a loose, inaccurate system which obtains in baseball..."

So Lane took it upon himself to correct the situation and tracked the outcomes of 1,000 hits in order to assign coefficients for an equation he developed. The simple equation was:

Total Run Value = (1B*a)+(2B*b)+(3B*c)+(HR*d)

The values he assigned to a, b, c, and d were .30, .60, .90, and 1.15. The core of Lane's observations of the 1,000 hits was that hits were valuable not only for the obviously different number of bases each gained, but also for an advancement component that contributed to run creation. Later Lane also assigned a value of .164 to walks, a value now recognized as too low by about half but revolutionary for its time for crediting a walk as valuable at all.

It must also be remembered that Lane's innovation came at a time when batting average, made official way back in 1876, was the only way most people had ever evaluated offensive players. It is true that Henry Chadwick developed a stat in the 1860s he called "Total Bases Per Game", which was slugging percentage with a different denominator, but it didn't really catch on, and slugging percentage was not made official in the National League until 1923 and in the American League until 1946.

Lane used his formula to compare Brooklyn first baseman and former batting champion Jake Daubert to the Phillies' slugger Gavvy Cravath, who had hit 24 homeruns in 1915. Not surprisingly, Lane's analysis showed that Cravath was the more valuable player, with a Total Run Value of 79 versus 62 for Daubert.

Lane went on to adjust his formula and eventually settled on the following:

Total Run Value = (1B*.457)+(2B*.786)+(3B*1.15)+(HR*1.55)+(BB*.164)
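Both versions of Lane's equation are simple weighted sums. A minimal sketch using the coefficients quoted above; the function name is hypothetical, and the stat line in the example is made up, not Cravath's or Daubert's.

```python
# Lane's two sets of weights as quoted above; the first version had no walk term
LANE_1916 = {"1b": 0.30, "2b": 0.60, "3b": 0.90, "hr": 1.15, "bb": 0.0}
LANE_LATER = {"1b": 0.457, "2b": 0.786, "3b": 1.15, "hr": 1.55, "bb": 0.164}


def total_run_value(singles, doubles, triples, homers, walks=0, weights=LANE_LATER):
    """Lane's Total Run Value: each event count times its weight, summed."""
    return (singles * weights["1b"] + doubles * weights["2b"]
            + triples * weights["3b"] + homers * weights["hr"]
            + walks * weights["bb"])


# Hypothetical season: 120 1B, 30 2B, 8 3B, 20 HR, 60 BB
later = total_run_value(120, 30, 8, 20, 60)
early = total_run_value(120, 30, 8, 20, 60, weights=LANE_1916)
print(f"later weights {later:.1f}, 1916 weights {early:.1f}")
```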

Unfortunately, Lane's pioneering work was all but forgotten soon after.

In the mid 1950s George Lindsey, a military officer, listened to and watched around 400 baseball games, and from what he learned he began submitting articles on various aspects of baseball strategy to the statistical journal Operations Research. With the help of his retired father, their combined scoring efforts produced the 1963 article "An Investigation of Strategies in Baseball", another of sabermetrics' founding documents. In that article Lindsey produced the first Run Expectancy table, showing how many runs were expected to score from each of the 24 base/out combinations (I use a similar table in my Big League Pocket Manager application for the Pocket PC to calculate the break-even probabilities for various strategies).



Outs/Bases Empty   1st   2nd   3rd   1,2   1,3   2,3  Full
0           0.46  0.81  1.19  1.39  1.47  1.94  1.96  2.22
1           0.24  0.50  0.67  0.98  0.94  1.12  1.56  1.64
2           0.10  0.22  0.30  0.36  0.40  0.53  0.69  0.82

From here it was a simple step to compute the run expectancy before and after an offensive event and, using the typical advancement on singles and doubles and the frequency of each base/out combination (which Lindsey also tracked), to compute the run value, or weight, of each offensive event. Lindsey came up with .41 for singles, .82 for doubles, 1.06 for triples, and 1.42 for homeruns, very similar to what Lane had done 40 years earlier.
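The core step, valuing a single play as the change in run expectancy plus any runs that scored, can be sketched with Lindsey's table as given above. This shows one base/out transition at a time; Lindsey's .41 for singles averages such values over every state, weighted by how often each state occurs and by typical advancement, which this sketch does not attempt. The table layout and function name are hypothetical.

```python
# Lindsey's run expectancy table from above: RE[outs][bases]
# The bases key lists occupied bases: "" = empty, "123" = loaded.
RE = {
    0: {"": 0.46, "1": 0.81, "2": 1.19, "3": 1.39,
        "12": 1.47, "13": 1.94, "23": 1.96, "123": 2.22},
    1: {"": 0.24, "1": 0.50, "2": 0.67, "3": 0.98,
        "12": 0.94, "13": 1.12, "23": 1.56, "123": 1.64},
    2: {"": 0.10, "1": 0.22, "2": 0.30, "3": 0.36,
        "12": 0.40, "13": 0.53, "23": 0.69, "123": 0.82},
}


def run_value(outs_before, bases_before, outs_after, bases_after, runs_scored):
    """Run value of one play: change in run expectancy plus runs that scored."""
    before = RE[outs_before][bases_before]
    after = 0.0 if outs_after >= 3 else RE[outs_after][bases_after]
    return after - before + runs_scored


single = run_value(0, "", 0, "1", 0)     # leadoff single, bases empty
homer = run_value(0, "", 0, "", 1)       # solo homerun, none out
strikeout = run_value(2, "", 3, "", 0)   # inning-ending strikeout
print(f"{single:.2f} {homer:.2f} {strikeout:.2f}")
```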

Interestingly, Lindsey, like Lane, then used his system to compare a singles hitter, in this case the Tigers' Harvey Kuenn, who had won the 1959 AL batting championship hitting .353, with a homerun hitter, the Indians' Rocky Colavito, who had hit 42 homeruns. This comparison had a bit more riding on it, as the two were traded for each other. Colavito came out on top 114.5 to 112.6.

Using Run Expectancy and advancement tables like those calculated by Lindsey is only one way of calculating run values for various offensive events. And this brings us back to Batting Runs and Pete Palmer.

In 1978 Pete Palmer ran a computer simulation of "all major-league games played since 1901." From that simulation Palmer tabulated the frequencies of the offensive events and by assigning advancement values based on observation of 100 World Series games was able to calculate the expected run values for each event. The formula he devised was:

Batting Runs = (.46*1B)+(.80*2B)+(1.02*3B)+(1.40*HR)+(.33*(BB+HBP))+(.30*SB)+(-.60*CS)+(-.25*(AB-H))-(.50*OOB)

What is interesting about this formula is, first, that it includes hit by pitch (HBP) and stolen bases, and of course that the weights are similar to those calculated by both Lane and Lindsey. Its real import, however, is that for the first time the number of outs the player is responsible for (AB-H, CS, and OOB, or "outs on base") is included and given a coefficient. Like other offensive events, outs have a run value; it is simply negative, since outs decrease the opportunity for scoring runs by either ending an inning or moving the team in that direction. OOB is typically difficult if not impossible to find for individuals without play-by-play data, but for teams it is simple to calculate as OOB = H+BB+HBP-LOB-R-CS.
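Both formulas translate directly into code. A minimal sketch with the weights quoted above; the function names and the player and team lines in the example are hypothetical. OOB defaults to zero because, as noted, it is rarely available for individuals.

```python
def batting_runs_palmer(singles, doubles, triples, homers, bb, hbp,
                        sb, cs, ab, h, oob=0):
    """Palmer's Batting Runs with the weights quoted above (4.3 runs/game context)."""
    return (0.46 * singles + 0.80 * doubles + 1.02 * triples + 1.40 * homers
            + 0.33 * (bb + hbp) + 0.30 * sb - 0.60 * cs
            - 0.25 * (ab - h) - 0.50 * oob)


def team_outs_on_base(h, bb, hbp, lob, r, cs):
    """Times on base that ended neither stranded, scoring, nor caught stealing."""
    return h + bb + hbp - lob - r - cs


# Hypothetical player: 160 hits in 550 AB (100 1B, 30 2B, 5 3B, 25 HR)
player_br = batting_runs_palmer(100, 30, 5, 25, bb=70, hbp=5, sb=10, cs=5,
                                ab=550, h=160)
# Hypothetical team season totals
team_oob = team_outs_on_base(h=1400, bb=500, hbp=50, lob=1100, r=750, cs=40)
print(f"player BR {player_br:.2f}, team OOB {team_oob}")
```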

Stolen bases and caught stealing can also be taken out of the Batting Runs formula and be calculated separately as Stolen Base Runs (SBR) or Base Stealing Runs (BSR) as (.30*SB)-(.60*CS). Originally, the value of the stolen base and caught stealing was set at around .20 and -.35 respectively. However, Palmer was convinced by Dave Smith of Retrosheet to increase both the positive and negative impacts of the stolen base on the basis that they occur in situations where games are more in question. In other words, stolen bases are strategically more important and so have a greater impact on wins and losses. Not many people seem to buy this argument since runs and not wins are what is being calculated. Apparently, Palmer agreed and so in The 2004 Baseball Encyclopedia BSR is simply calculated as (.22*SB)-(.38*CS).

The most important fact about Batting Runs is that, because of the negative values for outs, Batting Runs is a measure of the "net runs produced above average" in a given offensive context. In other words, a Batting Runs value of 55 means that the batter produced 55 runs above what an average batter would have produced given the same opportunities, which means given the same number of outs consumed. Of course, this also means that a player can be assigned negative Batting Runs, indicating he performed below average. Batting Runs, therefore, cannot be compared with Runs Created without making adjustments. That adjustment is to reduce the value of the out from around -.25 to -.10 or -.09. The basis for this is straightforward. The value of an out (or of any offensive event, for that matter) can be thought of as the sum of its value in moving runners over and its value in ending the inning. Using the run environment of 4.3 runs per game (the average from 1901-1977), each out is worth -.16 runs in terms of its inning-ending value. Subtracting -.16 from -.25 yields -.09 as the value of the out related to moving runners along. By using -.09 as the value for outs, Batting Runs can be compared to Runs Created.
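The text doesn't spell out how the .16 inning-ending value is obtained. One reading that reproduces it is the league's runs per game spread over the 27 outs a team makes in a game; here is a sketch under that assumption, with hypothetical function names.

```python
def inning_ending_value(runs_per_game: float, outs_per_game: int = 27) -> float:
    """Assumed inning-ending cost of an out: runs per game spread over 27 outs."""
    return runs_per_game / outs_per_game


def runs_created_out_value(batting_runs_out: float = -0.25,
                           runs_per_game: float = 4.3) -> float:
    """Out value comparable to Runs Created: strip the inning-ending component."""
    return batting_runs_out + inning_ending_value(runs_per_game)


print(f"{inning_ending_value(4.3):.2f}")
print(f"{runs_created_out_value():.2f}")
```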

It's also important to keep in mind that technically the Batting Runs formula shown above is valid only for a given offensive context, namely the 4.3 runs per game of 1901-1977. Palmer and Thorn show in The Hidden Game several sets of weights by period (1901-20, 1921-40, 1941-60, and 1961-77). Fortunately, these values are very similar, something Palmer apparently did not expect, thinking that in the "deadball era" the relative value of a stolen base might be significantly greater and homerun smaller (they were but only very slightly, for example the homerun going from 1.36 in the earliest period to 1.42 in the latest and the stolen base going from .20 to .19).

As a result, Palmer was able to present a single formula and use the value of the out to adjust for differences by era. Some out values for different eras as noted in Curve Ball and The Hidden Game are:

-.24 for 1901-1920
-.30 for 1921-1940
-.27 for 1941-1960
-.25 for 1961-1977

In the modern era Palmer recommends that a value of -.25 be used when pitchers' hitting is included (for example in the NL), while a value of -.27 is recommended when the DH is employed, since making an out is more costly when the run environment expands, as it does when pitchers are not hitting.

Batting Runs has also been adjusted slightly throughout the years using different weights. For example, the formula in the 1989 edition of Total Baseball was:

BR = (.47*1B)+(.78*2B)+(1.09*3B)+(1.40*HR)+(.33*(BB+HBP))+(.30*SB)+(-.60*CS)+(-.25*(AB-H))

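And the formula in the 2004 edition of The Baseball Encyclopedia folds the value of a single into the weight for all hits, since singles are not weighted separately, so each extra-base hit carries only its increment over a single (a double is worth .47+.38 = .85, a triple 1.02, and a homerun 1.40):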

BR = (.47*H)+(.38*2B)+(.55*3B)+(.93*HR)+(.33*(BB+HBP))+(.22*SB)+(-.38*CS)-(ABF*(AB-H))

Also included here is ABF, or the "league batting factor". This is essentially a custom "out" value for the league context to ensure that the average batter's Batting Runs equal zero for the given league and year. It is calculated using league totals as:

ABF =((.33*(BB+HBP))+(.47*H)+(.38*2B)+(.55*3B)+(.93*HR))/(AB-(LGF*H))

For example, the ABF in the NL was .28 for 2003 and .23 for 1968, since the increased offensive context of 2003 dictates that an out costs a team more potential runs than it did in 1968.

LGF in the calculation of ABF is the league factor designed to increase the number of Batting Runs for leagues deemed inferior to the typical major league. It equals 1 except for:

Union Association (1884) = .8
Federal League (1914-15) = .9
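A sketch of the 2004 version, assuming the BR and ABF formulas exactly as given above; the Line class, function names, and league totals are hypothetical. With LGF = 1, the league's own totals come out to zero Batting Runs once stolen bases are left out, which is the point of ABF.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Counting stats for a player or a whole league (hypothetical container)."""
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int
    bb: int
    hbp: int
    sb: int = 0
    cs: int = 0


def positive_runs(s: Line) -> float:
    """Hit and walk terms shared by the 2004 Batting Runs formula and ABF."""
    return (0.47 * s.h + 0.38 * s.doubles + 0.55 * s.triples
            + 0.93 * s.hr + 0.33 * (s.bb + s.hbp))


def abf(league: Line, lgf: float = 1.0) -> float:
    """League out value: positive terms / (AB - LGF * H), from league totals."""
    return positive_runs(league) / (league.ab - lgf * league.h)


def batting_runs_2004(player: Line, league_abf: float) -> float:
    """BR = hit/walk terms + .22*SB - .38*CS - ABF * (AB - H)."""
    return (positive_runs(player) + 0.22 * player.sb - 0.38 * player.cs
            - league_abf * (player.ab - player.h))


# Hypothetical league totals, not real data
league = Line(ab=80000, h=21000, doubles=4200, triples=450, hr=2500, bb=7500, hbp=800)
out_value = abf(league)
league_br = batting_runs_2004(league, out_value)  # the average batter nets zero
print(f"ABF {out_value:.3f}, league BR {league_br:.6f}")
```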

In reality, the linear weights associated with Batting Runs differ not only by era but also by league, within each league by team, and within each team by position in the batting order. In other words, to calculate how many runs an individual player is responsible for, it would be necessary to calculate weights for each offensive event particular to his team and position in the batting order. However, because such calculations are complex, and because custom linear weights at the lower levels are less useful for comparison across teams, leagues, and eras, most sabermetricians use a single formula and adjust the out value based on era or league. For an interesting discussion of creating custom linear weights, see Tangotiger's site.

Another area for refinement in the era of Barry Bonds is separating the value of a regular base on balls from an intentional walk, and for that matter hit-by-pitch. The general consensus is that a regular walk has a weight around .31 while an intentional walk is around .18 and a HBP slightly more than a regular walk (since walks occur disproportionately with two outs and when first base is empty).
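To see how much the split matters for a hitter like Bonds, here is a minimal sketch applying the consensus weights of .31 and .18 to his 232 walks, 120 of them intentional (figures from the OPS discussion above), compared with a single .33 weight. HBP is left out because no separate number is given for it; the function name is hypothetical.

```python
def walk_runs(walks: int, intentional: int,
              w_bb: float = 0.31, w_ibb: float = 0.18) -> float:
    """Run value of walks with separate weights for regular and intentional walks."""
    unintentional = walks - intentional
    return w_bb * unintentional + w_ibb * intentional


split = walk_runs(232, 120)   # separate weights
uniform = 0.33 * 232          # single .33 weight from the Batting Runs formula
print(f"split {split:.2f} runs vs. uniform {uniform:.2f} runs")
```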

Derivatives
From the beginning adjustments have been made to Batting Runs. The most obvious, and one that Palmer and Thorn discuss in The Hidden Game is to take the batter's home park context into consideration. To do so they first calculate the BPF or Batter's Park Factor. This number is based on the number of runs scored in the park versus the number of runs scored in road games and takes into account the fact that hitters don't have to face their own pitchers. BPF is centered on 1 and so an above average hitter's park will have value slightly above 1 such as 1.04 while a pitcher's park will have a BPF of under one, say .96.

In order then to calculate the Adjusted Batting Runs or ABR the following calculation is used:

ABR = BR-((BPF-1)*RPA*PA/BPF)

Here BR is the unadjusted Batting Runs, RPA is the number of runs per plate appearance for the league, and PA is the plate appearances for the batter. For example, if the player had 55.0 batting runs in 700 plate appearances while playing in a pitcher's park with a BPF of .92, in a league like the 2003 National League where .122 runs were scored per plate appearance, the ABR would be 55-((.92-1)*.122*700/.92) = 62.4.
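The park adjustment is a one-line function. A minimal sketch of the ABR formula exactly as written above, with a hypothetical function name:

```python
def adjusted_batting_runs(br: float, bpf: float, runs_per_pa: float, pa: int) -> float:
    """Park-adjusted Batting Runs: ABR = BR - ((BPF - 1) * RPA * PA / BPF)."""
    return br - ((bpf - 1) * runs_per_pa * pa / bpf)


abr = adjusted_batting_runs(55.0, 0.92, 0.122, 700)
print(f"{abr:.1f}")
```

Because BPF is below 1 in a pitcher's park, the subtracted term is negative and the adjustment adds runs; a hitter's park (BPF above 1) would reduce the total instead.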

A second derivative is a conversion from Batting Runs to wins, or Batting Wins. This statistic is based on Palmer's empirical observation that on average a win is purchased at the cost of 10 extra runs. In other words, if a player contributed 10 Adjusted Batting Runs, then he was worth 1 extra win to his team. Of course, the number of runs per win varies with the league context and can be calculated as:

RPW = 10*Sqrt(RPI)

RPI or Runs per Inning here is the runs scored by both teams per inning. So for a league that scores 4.5 runs per game, the two teams combined score 1 run per inning, the square root of which is 1, multiplied times 10 equals 10. As a result, in lean offensive times like the 1968 NL the RPW will be around 8.75 while in good offensive times like the 2003 NL it will be closer to 10.5.

It is then a simple matter to divide the Adjusted Batting Runs by RPW to get the Batting Wins. The formula used in the 2004 edition of The Baseball Encyclopedia is:

BW = ABR/(10*Sqrt(RPI+(ABR/G/9)))

In this case the runs per inning of the player is added to the runs per inning for the league to take into consideration the increased or decreased offensive context that the player contributes.
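Both conversions can be sketched directly from the formulas above; the function names and the 10-run, 162-game example are hypothetical.

```python
import math


def runs_per_win(rpi: float) -> float:
    """RPW = 10 * sqrt(RPI), where RPI counts runs by both teams per inning."""
    return 10 * math.sqrt(rpi)


def batting_wins(abr: float, league_rpi: float, games: int) -> float:
    """BW = ABR / (10 * sqrt(RPI + ABR/G/9)), adding the player's own runs per inning."""
    return abr / (10 * math.sqrt(league_rpi + abr / games / 9))


# 4.5 runs per team per game: both teams score 9 runs in 9 innings, so RPI = 1.0
rpi = (4.5 * 2) / 9
print(runs_per_win(rpi))
print(f"{batting_wins(10, rpi, 162):.3f}")
```

The player's own contribution nudges the denominator up slightly, so 10 Adjusted Batting Runs come out just under one win.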

Wednesday, October 13, 2004

Royals Records

The 2004 season didn't provide much to get excited about if you're a Royals fan, but on the bright side the team did set some records (ok, not all of them positive). I've been going through these since I volunteered to help The Sporting News update their Royals section for the 2005 Complete Baseball Record Book. So the following is a list of all the records that the 2004 Royals or their players eclipsed or affected:

  • Most homeruns switch-hitter career: Carlos Beltran 123
  • Most hit by pitch team: 76
  • Highest batting average career: Mike Sweeney now tied with George Brett at .305
  • Highest on-base percentage career: Mike Sweeney .377
  • Most hits allowed by the team in a season - 1,638
  • Most homeruns allowed pitcher: Darrell May 38
  • Most players season - 58
  • Most games lost - 104
  • Lowest winning pct. - .358 (58-104)
  • Overall Record: 2,835-2,858 (36 seasons) - notice that they dipped under .500 this season
  • Interleague Record: 57-83 (8 seasons)
  • Number of times worst in the league: 1 in 2004
  • Most runs game - 26 vs. Detroit, Sept 9, 2004 (gm 1 of DH)
  • Most hits game - 26 vs. Detroit, Sept 9, 2004 (gm 1 of DH)
  • Most hits game opponent - 27 vs. Detroit, May 27, 2004
  • Largest crowd day game - 41,575 vs. Chicago, Apr 5, 2004
  • Largest crowd for a home opener - 41,575 vs. Chicago, Apr 5, 2004
  • Most players used before the All-Star break – 47
  • Most RBI game – 25 at Detroit September 9, 2004
  • Most RBIs both teams-game – 30 at Detroit September 9, 2004
  • Largest margin of victory game - 21 at Detroit September 9, 2004
  • Most plate appearances inning – 16 at Detroit September 9, 2004
  • Most consecutive singles - 7 at Detroit September 9, 2004
  • Most runs doubleheader - 26 at Detroit September 9, 2004
  • Most hits doubleheader – 33 at Detroit September 9, 2004
  • Most RBIs doubleheader – 25 at Detroit September 9, 2004
  • Most runs both clubs doubleheader – 39 at Detroit September 9, 2004
  • Most RBIs both clubs doubleheader – 38 at Detroit September 9, 2004
  • Tied most singles in a game Joe Randa - 5 at Detroit September 9, 2004
  • Most double plays by a 2nd baseman in a game – 5 by Donnie Murphy 9/24 at Chicago

And here are American League or Major League records set or tied by the 2004 Royals.

  • Joe Randa ties AL record 9 inning game with 6 hits September 9, 2004 at Detroit
  • Joe Randa ties ML and AL record with 6 runs September 9, 2004 at Detroit
  • Most runs scored 2 players in a game (10 - Berroa 4, Randa 6 or Randa 6, DeJesus 4) ties AL record September 9, 2004 at Detroit
  • Joe Randa ties AL record most AB 9 inning game - 7 September 9, 2004 at Detroit
  • Tied AL record with 13 straight batters reaching base in the 3rd inning September 9, 2004 at Detroit

Tuesday, October 12, 2004

Defensive Indifference

In the Braves/Astros game last night there were several instances of defensive indifference. David Smith at Retrosheet included the following numbers in a post on SABR-L that categorize how often DI's occur. In the last 15 seasons (1990-2004) there were 2,005 occurrences:

Defensive Indifference by year:
2004 247
2003 219
2002 201
2001 213
2000 199
1999 166
1998 54
1997 122
1996 124
1995 88
1994 82
1993 85
1992 85
1991 78
1990 42

Defensive Indifference by base:
2nd base 1940
3rd base 65

Defensive Indifference by inning:
1st 1
2nd 1
3rd 1
4th 3
5th 12
6th 36
7th 69
8th 212
9th 1498
extra 172

As you might expect, the vast majority come in the 8th, 9th, and extra innings, when the value of the run represented is minimal. Why there are so many more in recent years I'm not sure. One possibility is that historically managers just didn't recognize the value in taking the free base in order to make a putout on a groundball harder. Most DI's occur with 2 outs (especially if the batter is right-handed), since with 0 or 1 outs the defensive team often wants to keep the double play in order. In fact, it's still the case that teams don't take full advantage of indifference, as I've seen Steve Stone criticize many teams for not taking the free base.
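A quick check of just how lopsided the inning breakdown is; the dictionary is simply the table above transcribed.

```python
# Defensive indifference by inning, 1990-2004, from the table above
DI_BY_INNING = {"1st": 1, "2nd": 1, "3rd": 1, "4th": 3, "5th": 12, "6th": 36,
                "7th": 69, "8th": 212, "9th": 1498, "extra": 172}

total = sum(DI_BY_INNING.values())
late = sum(DI_BY_INNING[k] for k in ("8th", "9th", "extra"))
print(f"{late} of {total} ({late / total:.1%}) came in the 8th inning or later")
```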

It is interesting that DI's are described in the rule book as "undefended steals" but are not credited as stolen bases. Instead, the official scorer scores the advance as a fielder's choice:

"FIELDER'S CHOICE is the act of a fielder who handles a fair grounder and, instead of throwing to first base to put out the batter runner, throws to another base in an attempt to put out a preceding runner. The term is also used by scorers (a) to account for the advance of the batter runner who takes one or more extra bases when the fielder who handles his safe hit attempts to put out a preceding runner; (b) to account for the advance of a runner (other than by stolen base or error) while a fielder is attempting to put out another runner; and (c) to account for the advance of a runner made solely because of the defensive team's indifference (undefended steal)."

I've always argued that DI's should be recorded as stolen bases since the runner did actually take an extra base for which he is not getting credit and which has some value, and because it does take a modicum of running ability to take the base.

Monday, October 11, 2004

Box on Mobility

Here's the editorial Jon Box wrote that included my Compact Framework Big League Pocket Manager. Jon gives a broad outline of mobility that includes more than just Pocket PCs.

Sunday, October 10, 2004

Design the Residue of Luck?

Branch Rickey is often quoted as saying that "Luck is the residue of design". Jay Bennett (the co-author of Curve Ball) and Aryn Martin turn this phrase on its head in a chapter entitled "The Numbers Game: What Fans Should Know About the Stats They Love". The chapter is included in the book Baseball and Philosophy: Thinking Outside the Batter's Box.

In particular I think it's instructive to consider their analysis of the variation in batting average among the 146 players who qualified for the batting title. They conclude that the variation resulted from two sources, ability and chance, and that fully half of the variation (the averages ranged from .215 for Jeremy Burnitz to .370 for Barry Bonds) is the result of chance, with the rest, the residue, the result of ability. For most fans that just doesn't sound right, since we think of .300 hitters, for example, as having the ability to hit .300, when in reality their actual ability is best thought of as a range instead of a point. This is similar to Stephen Jay Gould's often repeated statement that variation, not type, is the central aspect of biology.
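Bennett and Martin's exact procedure isn't described here. One standard way to split observed variation into chance and ability treats every at-bat as an independent trial with the same hit probability, so chance alone contributes binomial variance p(1 - p)/AB. Here is a sketch under that assumption, with hypothetical averages, a fixed at-bat count for every hitter, and a hypothetical function name:

```python
from statistics import mean, pvariance


def chance_share(averages, at_bats):
    """Fraction of the observed variance in batting average expected from chance alone.

    Assumes independent at-bats at the group's mean average p, so chance
    variance is p * (1 - p) / at_bats; ability accounts for the remainder.
    """
    p = mean(averages)
    chance_variance = p * (1 - p) / at_bats
    return chance_variance / pvariance(averages)


# Hypothetical group of hitters, each with 550 at-bats
share = chance_share([0.250, 0.270, 0.290, 0.310, 0.330], 550)
print(f"chance {share:.0%}, ability {1 - share:.0%}")
```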

More interestingly for sabermetricians, Bennett and Martin then go on to perform a similar analysis with on-base percentage, slugging percentage, and OPS. They found that ability played a much larger role in determining these values, as much as 3 to 4 times that of chance. So not only does OPS correlate better with run scoring, it is also a better measure of a hitter's true ability. While general managers are just now figuring out the former, I haven't heard much discussion of the latter.

Saturday, October 09, 2004

C.S. Lewis: A Biography

It's sometimes said that you learn as much about the biographer as the subject. In the case of C.S. Lewis: A Biography (or should I say "pathography") by A.N. Wilson I think you learn more about Wilson than about CSL. In 312 well written pages Wilson gives you his view, or should I say his Freudian psychoanalysis of the life of Lewis (1898-1963).

And that view essentially boils Lewis down to a man who could never come to terms with the death of his mother in 1908; who, as a result of his Oedipus complex, had a sexual relationship with a mother-figure (Mrs. Janie Moore) 25 years his senior that also fulfilled his need to be dominated by a woman; who rejected and then accepted God only when his biological father died in 1929 (God as wish-fulfillment); who rejected the intellectual defense of Christianity and retreated into his childhood when challenged (and by a woman no less); and who, when Mrs. Moore died, married a coarse and disagreeable woman in Joy Davidman, again because of his sadomasochistic need to be dominated by women. There.

There are some redeeming aspects of the book. First, Wilson's literary analysis of many of CSL's works is quite interesting. For Wilson, CSL's scholarly works such as The Discarded Image, The Allegory of Love, and his history of medieval literature are among his best, and his comments on these books were interesting for me as someone who knows precious little about literature. And Wilson does a good job of describing the academic environment and politics in which Lewis worked in both Oxford and Cambridge and how Lewis was thought of by his peers. One gets the impression that Wilson shares CSL's peers' view that when CSL began his defense of the faith he compromised his intellectual standing.

Not surprisingly then, Wilson does not care for CSL's apologetic works and takes a dim view of The Problem of Pain, Mere Christianity, and Miracles. In each, he condescendingly picks apart arguments and illustrations (Lord, Liar, Lunatic, the incarnation, and that naturalism abolishes reason) and attempts to make the books, and CSL's whole attempt at a defense, appear simple and childish. He especially fixes on one incident after Miracles was published as a key point in his story. As the story goes the philosopher Elizabeth Anscombe read a paper at a meeting of the Socratic Club (an Oxford society where CSL was president and where an atheist and a Christian would each read a paper and defend it) on chapter 3 of Miracles. Wilson writes that Lewis was so crushed by Anscombe's obvious disposal of his argument against naturalism that he essentially abandoned his intellectual defense of Christianity and retreated into his childhood which produced the Narnia stories. I thought this sounded strange and in pulling out my copy of Miracles I noticed that the last revision of the book was made by CSL in 1960, long after the crushing defeat when he supposedly abandoned an intellectual faith. I've since discovered that he edited Mere Christianity in 1952 as well. In Wilson's view Lewis develops an imaginative view of faith only after Joy dies as exemplified in A Grief Observed.

But Wilson is especially keen on pillorying CSL's spiritual autobiography Surprised By Joy published in 1955. Wilson thinks it a cruel book (he says the same of The Screwtape Letters) because of the chapter on Albert Lewis (which I'll admit is tough but very funny) and one that does not portray the real reasons, by which he means Freudian reasons, Lewis became a Christian.

In the end this book gives you a picture of CSL as a man struggling against his neuroses that began with the death of his mother and who hides his feelings behind his books, his smoking and drinking, and bullying of his students and peers.

If that's not enough, CSL's brother Warnie doesn't come off well either. He's simply depicted as a frustrated man stuck in his own childhood, totally dependent on Jack (the name everyone called Lewis), and a whiskey addict who periodically disappears to Ireland and elsewhere for a "bender". Oh, Wilson does say he was a fine historian for his book on French history.

Perhaps the main reason Wilson wrote the book is revealed both in the preface and in the last chapter. Wilson abhors the "CSL Industry" by which he means the idea that CSL has become a kind of saint to many, especially in America. And so his aim is to inject some realism into the plastic portraits of Lewis (for example that CSL wasn't a loud and bullying smoker and drinker - something that Wilson apparently thinks must be abhorrent to all those American evangelicals who appreciate CSL) that have in his view replaced the real man. I'm not sure this book is any better than those plastic portraits since it seems to erect its own false image of Lewis. I do agree that those like Walter Hooper (who has also changed his own handwriting and speaking voice to be more in accord with his master) who now believe in the perpetual virginity of CSL have obviously moved from appreciation of Lewis to misguided veneration.

I'll have to admit that this theme of the book hit close to home, since my wife, elder daughter, and I visited Oxford last spring, while in England on business, with the express purpose of seeing where CSL lived and worked. I think my wife and I were both surprised that for so famous a writer there is precious little public recognition of his work outside of a few pictures in the Eagle and Child pub, where the Inklings met, and a plaque on the house where Joy Davidman lived when she first moved to Oxford. But perhaps that's the point and reveals Wilson's perspective. CSL is not viewed as a famous writer in Oxford, or England for that matter, and so Americans like us visiting Oxford, and the continued popularity of his apologetical books, probably seem grotesque to someone like Wilson.

All I know is that for me the works of CSL provide challenging insight and clarity to many aspects of the Christian life, from its intellectual components to the moral landscape on which I find myself daily doing battle. I don't think many who appreciate CSL as an author hold the plastic portrait view of the man but rather, like us all, think of him as he described Joy and himself in A Grief Observed, "A sinful woman married to a sinful man; two of God's patients, not yet cured."

Since there are many inaccuracies and other difficulties with the book, I'll leave these two links if you're hungry for more. I'm told that George Sayer's biography is much better, and I might just go ahead and read it as a corrective. In case you didn't get the message, this book is not recommended.

http://cslewis.drzeus.net/papers/anwilsonerrata.html
http://www.solcon.nl/arendsmilde/cslewis/reflections/e-definitivebiography.htm

Friday, October 08, 2004

Dusty's Not Walkin'

Saw this quote from Dusty Baker on The Cub Reporter:

"Yeah, you need on-base percentage guys to put the pitcher in the stretch. I don't agree with going up there looking for a walk unless the game situation dictates it. This isn't Little League."
Dusty Baker (Chicago Sun-Times - 10/4/04)

Little League? Just to be complete, Dusty started off the year with this gem on the base on balls:

"I think walks are overrated unless you can run. If you get a walk and put the pitcher in a stretch, that helps, but the guy who walks and can't run, most of the time he's clogging up the bases for somebody who can run."

So in Dusty's mind walks are primarily useful for...

1. Putting the pitcher in the stretch
2. Clogging the bases

How about getting runners on base so that they can eventually score on, say, one of the 235 home runs your team hit this year??? Geez. Slow learner. I say this as I watch Mark Bellhorn, a guy Dusty gave up on, and his 88 walks and .373 OBP help the Red Sox on their way to a series sweep of the Angels.

In Dusty's defense he did also say:

"The whole thing boils down to that half of on-base percentage is getting a good pitch to hit. Most of the times when guys are striking out, a bad pitch has been swung at during the course of that at-bat." Dusty Baker (Chicago Sun-Times - 10/4/04)

That sentiment I certainly agree with. But why not recognize that walks have value in and of themselves?

Thursday, October 07, 2004

A Brief History of Run Estimation: Runs Created

As promised, we'll start our look at run estimation formulas with what is perhaps the best-known formula, Bill James' Runs Created.

History
James, who coined the term "sabermetrics", introduced the formula in an early Baseball Abstract (1979, I believe). It belongs to a class of run estimation formulas, several of which I'll discuss in this series, that Albert and Bennett in their book Curve Ball call "intuitive" formulas, because they are based not on rigorous statistical models such as regression analysis but on a common sense model of how the game of baseball actually works.

In its basic incarnation the Runs Created formula James initially published (although he confesses in the 1984 Baseball Abstract that he developed and discarded 30 or 40 such formulas - a sure sign that this is an intuitive formula) simply consists of three components:

RC = (A*B)/C

where A = H+BB or the number of runners on base, B = Total Bases or what is done to move runners along, and C = AB+BB, or the context in which A and B occur. So in total the basic formula was:

RC = ((H+BB)*TB)/(AB+BB)

The main advantage of this formula is that it is simple to calculate and based on counting statistics such as hits, at bats, and walks that are readily available. The other advantage is that the scale is the same as runs batted in or runs scored and so 100 Runs Created is an excellent season. To give an example, Aramis Ramirez in 2004 had 174 hits, 49 walks, 316 total bases, and 547 at bats. That gives him ((174+49)*316)/(547+49) = 118.2 Runs Created, a fine season.

The basic premise, or the intuitive model, behind the formula is that offense is essentially the product of getting on base and advancing runners through extra base hits within a particular offensive context. Typically, this formula is accurate to within 1% for a given league for a year.
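For readers who want to check the arithmetic, here is a minimal Python sketch of the basic formula. The function name is just an illustrative choice; the inputs are the Aramis Ramirez figures quoted above.

```python
def runs_created_basic(h: int, bb: int, tb: int, ab: int) -> float:
    """Basic Runs Created: (A * B) / C."""
    a = h + bb   # A: runners reaching base
    b = tb       # B: advancement (total bases)
    c = ab + bb  # C: opportunities
    return a * b / c


# Aramis Ramirez, 2004: 174 H, 49 BB, 316 TB, 547 AB
print(round(runs_created_basic(h=174, bb=49, tb=316, ab=547), 1))  # 118.2
```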

In the 1984 Baseball Abstract James introduced two additional versions of the formula: a stolen base version and a technical version.

The stolen base version uses the following formulas for A, B, and C:

A = H+BB-CS
B = TB+.55*SB
C = AB+BB

So the complete formula is:

RC = ((H+BB-CS)*(TB+(.55*SB)))/(AB+BB)

In the 1984 Abstract James explains that he moved away from the .70 weight for stolen bases published in the 1983 Abstract and removed caught stealing from the C factor of the equation, since doing so could be shown to make logical sense.
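The stolen base version can be sketched the same way. The player line in the example is hypothetical, made up only to show how steals (added to B) and caught stealing (subtracted from A) move the estimate.

```python
def runs_created_sb(h, bb, tb, ab, sb, cs):
    """Stolen base version of Runs Created (weights as given in the text)."""
    a = h + bb - cs     # a runner caught stealing is erased from the bases
    b = tb + 0.55 * sb  # each steal counts as a little over half a base
    c = ab + bb
    return a * b / c


# Hypothetical player: 150 H, 60 BB, 250 TB, 500 AB, 30 SB, 10 CS
print(round(runs_created_sb(150, 60, 250, 500, 30, 10), 1))  # 95.2
# The same player with no steals or caught stealing, for comparison
print(round(runs_created_sb(150, 60, 250, 500, 0, 0), 1))
```

With zero steals and zero caught stealing, this reduces to the basic formula.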

The technical version (also called Tech-1) simply expands the stolen base version by including all the available counting statistics. The A, B, and C factors then become:

A = H+BB+HBP-CS-GIDP
B = TB+.26*(BB-IBB+HBP)+.52*(SB+SF+SH)
C = AB+BB+HBP+SF+SH

As you can see, hit batsmen (HBP) and grounded into double plays (GIDP) are now included in the A factor, since the former adds a runner to the bases and the latter removes one. In the B factor, intentional walks (IBB) are subtracted from walks and hit batsmen are added, the reasoning being that unintentional walks and hit batsmen both have some advancement value, here weighted at .26. Sacrifice flies (SF) and sacrifice bunts (SH) are also included along with stolen bases and given a weight of .52, slightly lower than .55, since one advantage of the stolen base, that it helps prevent double plays, is already accounted for in the A factor. The C factor then expands to cover the entire offensive context, encompassing all plate appearances.
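Here is a sketch of Tech-1 under the same conventions. The `BattingLine` container is a hypothetical helper of my own (any stat not supplied defaults to zero), and the player line is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Season counting stats; anything not supplied defaults to zero."""
    ab: int
    h: int
    tb: int
    bb: int
    ibb: int = 0
    hbp: int = 0
    sb: int = 0
    cs: int = 0
    sh: int = 0
    sf: int = 0
    gidp: int = 0


def runs_created_tech1(s: BattingLine) -> float:
    # A: runners on base, minus those erased by CS and double plays
    a = s.h + s.bb + s.hbp - s.cs - s.gidp
    # B: total bases plus partial credit for walks, HBP, steals, and sacrifices
    b = s.tb + 0.26 * (s.bb - s.ibb + s.hbp) + 0.52 * (s.sb + s.sf + s.sh)
    # C: all plate appearances
    c = s.ab + s.bb + s.hbp + s.sf + s.sh
    return a * b / c


# Hypothetical player line, for illustration only
line = BattingLine(ab=500, h=150, tb=250, bb=60, ibb=5, hbp=5,
                   sb=20, cs=5, sh=2, sf=6, gidp=12)
print(round(runs_created_tech1(line), 1))  # 96.8
```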

These three versions of Runs Created remain among the most often used by sabermetricians because of their ease of use. However, James kept innovating. For example, in 1989 the authors of Total Baseball included 13 additional technical versions of the formula (Tech-2 through Tech-14), introduced in the 1988 edition of The Bill James Historical Baseball Abstract, that adjust the weights and use only the counting statistics available in each period from 1900 to 1954. These are variations of Tech-1 and include:

  • Tech-2: 1954; B factor drops IBB
  • Tech-3: AL 1940-53, NL 1951-53; SF is dropped from C factor; B factor changes to 1.025*TB+.26*(BB+HBP)+.52*(SH+SB)
  • Tech-4: AL 1939; B factor becomes TB+.26*(BB+HBP)+.52*(SH+SB)
  • Tech-5: AL 1931-38; A factor becomes .96*(H+BB+HBP-CS)
  • Tech-6: AL 1928-30, 1920-26, NL 1920-25; B factor weights for SH and SB change from .52 to .51
  • Tech-7: AL 1927, NL 1926-30; A factor changes to .93*(H+BB+HBP) and B becomes TB+.26*(BB+HBP)+.46*SH
  • Tech-8: AL 1913,1917-19, NL 1913-14, 1917-1919; A factor becomes H+BB+HBP-.02*AB and B becomes TB+.85*(SH+SB)
  • Tech-9: AL 1914-16, NL 1915-16; A becomes H+BB+HBP-CS
  • Tech-10: AL and NL 1908-12; B becomes 1.025*(TB+SB)+.75*SH
  • Tech-11: AL and NL 1900-1907; A becomes H+BB+HBP
  • Tech-12: NL 1939-50; A becomes H+BB+HBP-GIDP; B becomes TB+.26*(BB+HBP)+.52*SH
  • Tech-13: NL 1933-38; B factor becomes 1.025*TB+.26*(BB+HBP)+.52*SH
  • Tech-14: NL 1931-32; A becomes .95*(H+BB+HBP)
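Since each technical version applies to a specific league and span of years, one way to keep them straight is a small lookup table. This is a hypothetical helper of my own, transcribed from the list above (reading "AL 128-30" as 1928-30, the only gap in the AL years). It returns None for seasons the list does not cover rather than guessing which version applies there.

```python
from typing import Optional

# (version, league, first_year, last_year), transcribed from the list above
TECH_RANGES = [
    (2, "AL", 1954, 1954), (2, "NL", 1954, 1954),
    (3, "AL", 1940, 1953), (3, "NL", 1951, 1953),
    (4, "AL", 1939, 1939),
    (5, "AL", 1931, 1938),
    (6, "AL", 1928, 1930), (6, "AL", 1920, 1926), (6, "NL", 1920, 1925),
    (7, "AL", 1927, 1927), (7, "NL", 1926, 1930),
    (8, "AL", 1913, 1913), (8, "AL", 1917, 1919),
    (8, "NL", 1913, 1914), (8, "NL", 1917, 1919),
    (9, "AL", 1914, 1916), (9, "NL", 1915, 1916),
    (10, "AL", 1908, 1912), (10, "NL", 1908, 1912),
    (11, "AL", 1900, 1907), (11, "NL", 1900, 1907),
    (12, "NL", 1939, 1950),
    (13, "NL", 1933, 1938),
    (14, "NL", 1931, 1932),
]


def tech_version(league: str, year: int) -> Optional[int]:
    """Return the Tech-N version listed for a league-season, or None."""
    for version, lg, first, last in TECH_RANGES:
        if lg == league and first <= year <= last:
            return version
    return None


print(tech_version("AL", 1929))  # 6
print(tech_version("NL", 1932))  # 14
```

As transcribed, every AL and NL season from 1900 through 1954 maps to exactly one version, which is a handy check that the list is internally consistent.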

Most recently James included a newer version in his 2004 edition of The Bill James Handbook and his 2002 book Win Shares where:

A = H+BB+HBP-CS-GIDP
B = TB+.24*(BB-IBB+HBP)+.62*SB+.5*(SH+SF)-.03*SO
C = AB+BB+HBP+SH+SF

While all three factors remain relatively intact (the B factor now gives different weights to stolen bases versus sacrifice flies and bunts, and even includes a small penalty for striking out), the relationship between the factors has also changed over time from the simple (A*B)/C and has become a bit more complicated:

RC = (((2.4*C+A)*(3*C+B))/(9*C))-(.9*C)

The basic structure remains, but as James described in Win Shares, the calculation has been modified to address one of the criticisms of earlier versions of the formula. Essentially, since the previous versions simply multiplied the A and B factors, "it presented the player as if his offensive elements were interacting with one another". This leads to runs created estimates that are too high for players and teams with high slugging and on base percentages. Albert and Bennett note this in Curve Ball when they discuss how "product models" like Runs Created "tend to be unrealistic for players at either end of the offensive production spectrum." In reality, a player's offensive elements interact with those of the other players on his team. However, James concluded that if you calculate a player's runs created by computing the runs created for his team and then determining how many runs the team would have created without him, the same player would rate slightly differently on good and bad offensive teams. His solution is to evaluate the player "as if he played in a context of eight other players of average skill, each having the same number of plate appearances." For average skill he simply used a player with a .300 on base percentage and a .400 slugging percentage. Albert and Bennett advocate using this same idea but instead place the player in the context of an average team for the league and year in which he played.

With this information you can see that the A factor of the equation has been augmented with 8 other players, each with a .300 on base percentage (8 * .300 = 2.4 per plate appearance). The B factor has likewise been augmented with 8 other players, but its multiplier of 3 works out to .375 per plate appearance for each of them (8 * .375 = 3), not the 3.2 that 8 * .400 would give. The C factor then includes the plate appearances for all 9 players. After performing (A*B)/C, the runs created by the other 8 players are removed by subtracting .9 times the player's plate appearances. This matches the runs those 8 typical players would create on their own: (2.4*C * 3*C)/(8*C) = .9*C.
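The newer version can be sketched in two steps: compute A, B, and C with the 2002 weights, then place the player among the 8 typical teammates and subtract their contribution. The function names and the player line are hypothetical, chosen for illustration.

```python
def rc_factors_2002(ab, h, tb, bb, ibb, hbp, sb, cs, sh, sf, gidp, so):
    """Return the (A, B, C) factors with the newer weights."""
    a = h + bb + hbp - cs - gidp
    b = tb + 0.24 * (bb - ibb + hbp) + 0.62 * sb + 0.5 * (sh + sf) - 0.03 * so
    c = ab + bb + hbp + sh + sf
    return a, b, c


def runs_created_2002(a, b, c):
    """Place the player among 8 typical teammates, then remove their runs."""
    # Teammates add 2.4 * C to the A side and 3 * C to the B side; 9 * C PA in all
    team_runs = (2.4 * c + a) * (3 * c + b) / (9 * c)
    # On their own the teammates would create (2.4c * 3c) / (8c) = 0.9c runs
    return team_runs - 0.9 * c


# Hypothetical player line, for illustration only
a, b, c = rc_factors_2002(ab=500, h=150, tb=250, bb=60, ibb=5, hbp=5,
                          sb=20, cs=5, sh=2, sf=6, gidp=12, so=100)
print(round(runs_created_2002(a, b, c), 1))  # 93.4
```

One property worth noting: a player who exactly matches the teammates (A = .300*C, B = .375*C) comes out at .1125 runs per plate appearance, the same rate the teammates produce on their own, so the context neither helps nor hurts an average hitter.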

There are additional adjustments when home runs with men on base and batting average with runners in scoring position are available.

D = (HRISP-(ABRISP*AVG))+(HRROB-(ABROB*HR/AB))

Here HRISP and ABRISP are the player's hits and at bats with runners in scoring position, while HRROB and ABROB are his home runs and at bats with runners on base. Essentially, D is the number of hits with runners in scoring position plus home runs with men on base above what would be expected given the player's typical performance. D is then added to the runs created found through the previous equation, apparently on the theory that each is worth about a run.
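A minimal sketch of the D adjustment follows, taking the first term as hits (not home runs) with runners in scoring position, which is what the description of D as extra hits with runners in scoring position calls for. The function name and the sample numbers are hypothetical.

```python
def situational_bonus(h_risp, ab_risp, avg, hr_rob, ab_rob, hr, ab):
    """D: RISP hits and men-on homers beyond what the player's norms predict."""
    extra_risp_hits = h_risp - ab_risp * avg      # actual minus expected RISP hits
    extra_rob_homers = hr_rob - ab_rob * hr / ab  # actual minus expected men-on HR
    return extra_risp_hits + extra_rob_homers


# Hypothetical .300 hitter with 30 HR in 500 AB:
# 50 hits in 150 RISP at bats (45 expected), 15 HR in 200 men-on at bats (12 expected)
print(situational_bonus(50, 150, 0.300, 15, 200, 30, 500))  # 8.0
```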

Derivatives
Once Runs Created has been calculated, it can be used to create derivative statistics that are appropriate for comparing players. This is necessary since, like other counting statistics such as hits or runs scored, Runs Created is heavily influenced by the opportunities the player has. Even a very good hitter with 100 plate appearances will create fewer runs than a very bad hitter with 600 plate appearances.

The first is Runs Created per 27 outs, or RC/27 (also sometimes called Runs Created per Game, or RC/G). Simply put, this formula estimates the number of runs per game that a team made up of nine of the same player would score. To calculate it, you divide the number of Runs Created by the number of outs the player made and then multiply by 27.

RC/27 = (RC / (AB-H+SH+SF+CS+GIDP)) * 27

Early versions of the formula by James used 27 outs; later versions use the league average outs per game, like so:

RC/27 = ((RC*3*LgIP)/(2*LgG))/(AB-H+SH+SF+CS+GIDP)

where LgIP is the number of innings pitched in the league and LgG is the number of games played in the league.
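Both versions can be sketched in a few lines. The outs formula comes straight from the text. Two assumptions are mine: innings pitched are entered as true innings (1443 2/3 rather than the scorebook's 1443.2), and LgG counts each game once, so 3*LgIP/(2*LgG) is the average number of outs a team makes per game. The league totals in the example are hypothetical round numbers chosen to give exactly 27 outs per team game, so both versions agree.

```python
def outs_made(ab, h, sh, sf, cs, gidp):
    """Outs consumed, using the denominator from the text."""
    return ab - h + sh + sf + cs + gidp


def rc_per_27(rc, outs):
    """Early version: scale to a 27-out game."""
    return rc / outs * 27


def rc_per_game(rc, outs, lg_ip, lg_g):
    """Later version: scale to the league's actual outs per team game."""
    # Every inning is 3 outs, and each game's innings are split between two teams
    outs_per_team_game = 3 * lg_ip / (2 * lg_g)
    return rc * outs_per_team_game / outs


print(rc_per_27(100, 400))                 # 6.75
print(rc_per_game(100, 400, 18000, 1000))  # 6.75
```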

Of course, this statistic produces a rate and so is ideal for comparison. For example, here are the top 10 single seasons in RC/G from Lee Sinins' Sabermetric Encyclopedia, where he calculates Runs Created using the appropriate technical version:



Rank  Player          Year   RC/G
1     Barry Bonds     2004   22.08
2     Barry Bonds     2002   21.23
3     Ted Williams    1941   19.14
4     Barry Bonds     2001   18.65
5     Babe Ruth       1920   18.41
6     Babe Ruth       1921   17.90
7     Babe Ruth       1923   17.31
8     Barry Bonds     2003   16.75
9     Ted Williams    1957   16.44
10    Nap Lajoie      1901   15.78

A second derivative statistic is OWP or Offensive Winning Percentage. OWP is a calculation of the winning percentage of a hypothetical team made up of nine of a particular player on offense and league average pitching and defense. It is based on RC/27 and the Pythagorean Formula which estimates how many games a team should win based on their runs scored and runs allowed. The formula is:

OWP = (RC/27)^2/((RC/27)^2+(LgR/G)^2)

This produces a winning percentage that can easily be used to compare players. Its advantage over RC/27 is that it takes into account the league context in which the player played by using the league runs scored per game (LgR/G). Notice the differences between the top 10 in RC/27 and the top 10 in OWP shown below:


Rank  Player          Year   OWP
1     Barry Bonds     2004   .958
2     Barry Bonds     2002   .942
3     Barry Bonds     2001   .922
4     Mickey Mantle   1957   .915
5     Babe Ruth       1920   .913
6     Ted Williams    1941   .908
7     Babe Ruth       1923   .896
8     Babe Ruth       1921   .891
9     Ted Williams    1957   .891
10    Babe Ruth       1926   .883
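The OWP calculation is a one-liner once RC/27 is in hand. Here is a sketch using the exponent of 2 from the formula above; the inputs are hypothetical.

```python
def offensive_winning_pct(rc27: float, lg_runs_per_game: float) -> float:
    """Pythagorean-style OWP with exponent 2."""
    off = rc27 ** 2
    return off / (off + lg_runs_per_game ** 2)


# Hypothetical: a hitter at 10 RC/27 in a league averaging 5 runs per game
print(offensive_winning_pct(10.0, 5.0))  # 0.8
```

A hitter whose RC/27 equals the league's runs per game comes out at exactly .500, which is why the same RC/27 is worth more in a low-scoring league than in a high-scoring one.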

Finally, Runs Created can also be used for comparisons with average players. For example, Lee Sinins created RCAA, or Runs Created Above Average, which estimates how many runs a player contributed beyond what a league average player would have contributed with the same number of outs. As a result, RCAA can have negative values, much like Batting Runs, as we'll discuss in the next post in this series.
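The general idea behind an "above average" figure can be sketched as follows. This is a simplified illustration under my own assumptions (the league rate is taken as league runs created per out), not Sinins' exact method, which may include adjustments not shown here. The league totals are hypothetical.

```python
def rcaa(rc, outs, lg_rc, lg_outs):
    """Simplified Runs Created Above Average (a sketch, not Sinins' method)."""
    lg_rc_per_out = lg_rc / lg_outs   # what an average hitter creates per out
    return rc - lg_rc_per_out * outs  # surplus (or deficit) over that hitter


# Hypothetical league: 10,000 runs created on 50,000 outs (0.2 per out)
print(round(rcaa(100, 400, 10000, 50000), 1))  # 20.0
print(round(rcaa(50, 400, 10000, 50000), 1))   # -30.0
```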

Wednesday, October 06, 2004

The Toughest Out in Baseball

In last night's Twins-Yankees game Fox broadcaster Tim McCarver called Ichiro Suzuki "the toughest out in baseball". Is that so?

Not really. If you define the "toughest out" as the player who makes the fewest outs per plate appearance, then Suzuki actually ranks 12th. Here are the leaders:



Player     Outs   Outs/PA
Bonds      238    0.388
Helton     357    0.527
Berkman    372    0.546
Drew       360    0.562
Thomas     175    0.565
Snow       233    0.567
Abreu      401    0.568
Walker     181    0.569
Edmonds    348    0.576
Pujols     396    0.580
Mora       363    0.582
Suzuki     442    0.584

The strange thing about this list is that 9 of the top 10 are in the National League. You'll also notice that because Suzuki had so many at bats (702), he also made the most outs in baseball. Of course, since outs per plate appearance is roughly one minus on base percentage, this list is in essentially the same order as a list sorted by on base percentage from highest to lowest.

Tuesday, October 05, 2004

A Brief History of Run Estimation, Part I

One of the axioms of sabermetrics listed in my post Sabermetrics 101 is:

"The goal of a batter is to help his team score runs, the goal of a defensive player is to prevent runs. Therefore statistics that do not directly measure run production (e.g. batting average) or run prevention (pitcher's wins) are less meaningful than those that do."

As a result, sabermetricians have long been on a quest for the perfect run estimation formula so that an offensive player's contribution can be adequately measured. In this series of posts I'll detail the various formulas and how they've evolved over the years, including:

  • Runs Created
  • Batting Runs
  • Estimated Runs Produced
  • Extrapolated Runs
  • Base Runs

Along the way I'll detail some of the derivative formulas used to contextualize and produce rates from the results of these formulas as well as talk about the strengths and weaknesses of each.

As a little background, run estimation formulas are first and foremost counting statistics. In other words, these formulas attempt to estimate, or count, the number of runs a player is responsible for. They are therefore not rate statistics, which are used to compare the rate at which, for example, an offensive player gets hits (batting average) or accumulates bases (slugging percentage).

These formulas can further be classified as linear or non-linear. A linear formula like Batting Runs or Extrapolated Runs assigns weights to various offensive events and adds the weighted values to estimate the number of runs they account for. As a result, linear formulas produce a straight line when plotted against changing values in one of the offensive categories.



Non-linear formulas attempt to model the interaction of offensive events and so essentially multiply events to produce the estimate. As a result, when graphed, the plotted line curves rather than running straight.
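The difference between the two shapes is easy to see numerically. The sketch below adds home runs in steps of 10 to an otherwise fixed, hypothetical batting line and prints the runs gained per step under a linear model and under basic Runs Created (a product model). The linear weights are illustrative values chosen for this sketch, not an official Batting Runs set, and every out is treated as an at bat for simplicity.

```python
# Illustrative weights only (hypothetical, not an official Batting Runs set)
WEIGHTS = {"1B": 0.47, "2B": 0.78, "3B": 1.09, "HR": 1.40, "BB": 0.33, "OUT": -0.25}


def linear_runs(s1, s2, s3, hr, bb, outs):
    """Linear model: each event adds a fixed weight."""
    return (WEIGHTS["1B"] * s1 + WEIGHTS["2B"] * s2 + WEIGHTS["3B"] * s3
            + WEIGHTS["HR"] * hr + WEIGHTS["BB"] * bb + WEIGHTS["OUT"] * outs)


def product_runs(s1, s2, s3, hr, bb, outs):
    """Non-linear (product) model: basic Runs Created."""
    h = s1 + s2 + s3 + hr
    tb = s1 + 2 * s2 + 3 * s3 + 4 * hr
    ab = h + outs  # simplification: every out is charged as an at bat
    return (h + bb) * tb / (ab + bb)


def step_gains(model, hr_values):
    """Runs added by each step up in home runs, other stats held fixed."""
    runs = [model(100, 30, 2, hr, 60, 400) for hr in hr_values]
    return [round(later - earlier, 3) for earlier, later in zip(runs, runs[1:])]


hr_steps = range(0, 60, 10)
print(step_gains(linear_runs, hr_steps))   # the same gain at every step
print(step_gains(product_runs, hr_steps))  # gains grow as home runs accumulate
```

The constant gains trace a straight line; the growing gains trace the upward curve typical of product models like Runs Created.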



In my next post I'll start with perhaps the most well-known run estimator, Runs Created.


The Other Other Stone

Happily, it turns out I was wrong in my post the other day. Steve Stone is not yet gone as the Cubs broadcaster; his fate is still undetermined. Stone's comment the other day referred to Chip Caray, who will be broadcasting next season with his dad Skip in Atlanta. I've always thought Chip was a good announcer as well and will miss him on WGN. Hopefully, the Cubs will retain Stone.

More disturbing is Sammy Sosa's early exit during the final game. That's just wrong, and I hope he gets disciplined. It will be interesting to see if he can rebound from his last year and a half and his general four-year decline:



Year   G     HR   BB    OPS
2001   160   64   116   1.174
2002   150   49   103   .993
2003   137   40   62    .911
2004   126   35   56    .849



On another note, Dusty Baker said the following about Corey Patterson:

"Corey can be an excellent leadoff hitter," Baker said. "It's a matter of No. 1, cutting down on strikeouts. It's knowing when to bunt, when not to bunt. He's still in the process of learning it. We're still in the process of teaching him."

Bunting and strikeouts? Neither of these is at the top of the list of leadoff hitter skills. Hmmm, no word here about increasing his walks, although cutting down his strikeouts will lead in that direction. Let's pretend that's what Baker meant when he said "He's just swinging at balls out of the strike zone...", which is absolutely true. Overall in 2004 Patterson saw 3.46 pitches per plate appearance (P/PA). His highest P/PA, 3.70, came in Sept/Oct, when he posted an OPS of just .623. That's probably because he struck out 45 times in 135 plate appearances. In the months when he hit well (June and August) he saw more pitches per plate appearance (3.56) than in the other months (3.28).

Monday, October 04, 2004

Random Royals Thoughts

After the game yesterday the Royals made several player moves:

Castillo may be back if Benito Santiago won't accept a backup role. Hopefully, Santiago won't, and Baird can move him and his $2M-plus salary before spring training. Guiel will become a free agent, but it sounds like he wants to stay with the Royals. If he recovers from his eye problems by spring training (a big if, it seems), I think he'd be an ideal 4th outfielder. Brown, Kinney, and Huisman will be free agents and the Royals won't pursue them. To me, Kinney still looks like a guy who can throw, but Brown and his slow bat should definitely be done, and Bukvich outperformed Huisman in Omaha, so that probably leaves Huisman out. Dennys Reyes, Desi Relaford, and Kelly Stinnett can also elect free agency. I hope Baird signs Reyes, but I think it's time for Relaford to move on, if only so that Tony Pena has one less reason to play a middle infielder in the outfield. Stinnett isn't really needed and so will drift into the Backup Catcher Oort Cloud, to be captured again by some other team's gravitational field.

I mentioned in my previous post that I thought the rotation should be Greinke, Bautista, Hernandez, and a new pitcher. I know that leaves out Darrell May and Brian Anderson, but of course the Royals will in fact go with a 5-man rotation, and so either May or Anderson will be in there with the other in a long relief role, unless Baird can move one or both.

Baird mentioned this morning on 610 AM that his priorities included a left fielder, an "innings eater", a fourth outfielder, a stop-gap third baseman, some relief help, and a utility infielder. I heard a suggestion on the radio the other day that I kind of like: make Ken Harvey the stop-gap third baseman. He played there in college, and he's not really a good first baseman (yes, Mike Sweeney is worse). That would free Sweeney and Calvin Pickering to alternate at DH/1B, with Stairs playing right field most of the time. I know that once again sounds like a brutal defensive arrangement, but it does utilize the resources they have. It would also increase Harvey's trade value, something Baird should be exploring.

Baird also said that he doesn't project 24-year-old Miguel Asencio to be ready until June. However, with a career K/9 rate of 4.46, I wouldn't expect much production long term. Hopefully, he'll look good when he comes back and Baird can unload him.


Sunday, October 03, 2004

104 in 2004

With a 5-0 loss today the Royals end at 58-104, setting a new franchise record for losses. This little box from the Royals website kind of sums up the season:

Team Leaders
AVG: J. Randa, .288
HR: M. Sweeney, 22
RBI: M. Sweeney, 79
W: Two tied, 9
ERA: D. May, 5.61
K: D. May, 120

Ouch. Also, if you look at sabermetric stats like Win Shares, you'll see that as of 9/23, remarkably, Carlos Beltran still leads the team with a paltry 13, tied with Mike Sweeney. Joe Randa was second with 12. A Win Share is defined as one third of a win, so Sweeney was credited with contributing just over 4 wins. No other team's leader comes anywhere near this low.

Calvin Pickering, in the short time he was up, had as many Win Shares as Brian Anderson and Dee Brown combined (4).


The Other Stone Drops

Steve Stone just basically announced he wouldn't be back next year. Game 162, Cubs 9 Braves 4 in the top of the 6th. It's strange to watch a Cubs game that has no meaning...been a long time, which for a Cubs fan is a good thing to say. Lots of next year wishing.

Stretching a Single

Here's an interesting data point from SABR's Clem Comly, which he posted on SABR-L. The question was how often a batter succeeds in stretching a single into a double. With the bases empty, Clem came up with the following numbers:

Year   2B     1B+out   Pct
1962   2315   89       95.9%
2003   4962   114      97.8%

In other words, in 2003 batters were successful 97.8% of the time when going for a double with the bases empty. Certainly on a large percentage of the doubles there was no play on the runner, but it appears that major leaguers are pretty good at determining when they can make it and when they can't.

Given these numbers, one wonders whether runners shouldn't take more chances, since, depending on the number of outs, the odds of scoring a run increase by 12 to 19% with a runner on second rather than on first.
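The percentages in the table are the successful stretches divided by all attempts (doubles plus singles where the batter was thrown out going for two). A quick sketch, with a hypothetical function name, treating every double in the count as an attempt, which, as noted above, overstates the number of contested plays:

```python
def stretch_success_rate(doubles: int, thrown_out: int) -> float:
    """Share of stretch attempts that ended with the batter safe at second."""
    return doubles / (doubles + thrown_out)


print(f"{stretch_success_rate(4962, 114):.1%}")  # 97.8%
print(f"{stretch_success_rate(2315, 89):.1%}")   # 96.3%
```

The 2003 counts reproduce the 97.8% in the table. The 1962 counts as listed work out to about 96.3% rather than 95.9%, so one of those figures may have been transcribed differently.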

Ave Catuli 2004

The Cubs were officially eliminated today 8-6 by the Braves. One of the themes of the nightmare week continued as Kyle Farnsworth and Mike Remlinger let it slip away in the 8th. Sammy Sosa hit number 574 to pass Harmon Killebrew along with homers by Moises Alou and Aramis Ramirez but it wasn't enough.

It doesn't hurt as much as last year, since the inconsistency of this team would have doomed it in the playoffs. I'm hoping Steve Stone is retained, as he's one of the best in the business. Personally, I didn't think his comments as written were that bad. Dusty did make some questionable moves this week, as he usually does, but the failure of the team should be laid squarely at the feet of the players. The offense was non-existent the last week and a half, and the bullpen didn't come through.

The Cubs will have some interesting decisions on the offensive side in the off-season. Only Sosa, Michael Barrett, and Derrek Lee are signed for next season, I believe. The Cubs have an option on Mark Grudzielanek, and Ramirez is eligible for salary arbitration. It's likely Alou and Matt Clement will not be back, which will free up some money (oops, I know the Cubs needn't be concerned about money, but that's my Royals reflex kicking in). I wouldn't sign Nomar Garciaparra with his injury history, and for the love of all that's holy I'm pleading that Hendry refrain from signing Dusty's collection of misfits, including Neifi Perez, Tom Goodwin, Calvin Murray, et al. More than anything they need someone who can get on base... say Carlos Beltran, who would then move to left field. I don't see anything coming out of the minors next season, since the top prospects were traded in the Nomar deal.

Friday, October 01, 2004

Baird's Moves

Although the Royals season on the field has been anything but fun to watch (I scored 27 games for MLB.com and attended 5 or 6 others so I speak from first-hand experience) GM Allard Baird can be commended for making some pretty good deals during the season. Here are the highlights:

  • 8/13 Claimed RHP Matt Kinney off waivers from Milwaukee. Kinney is a guy who appears to still have a good arm at 27 and can still throw in the mid 90s. He's struggled a bit with the Royals in 16.3 innings but has struck out 21. I think the Royals are trying to work with him on his mechanics. If he can control the number of homeruns he gives up (58 in 377.67 career innings) he could become a valuable guy out of the bullpen.
  • 7/31 Acquired OF Abraham Nunez from the Florida Marlins for RHP Rudy Seanez. Nunez, while perhaps not the 5-tool player he claimed to be, was worth taking a risk on, especially for the 35-plus-year-old Seanez. Nunez is 27 years old, has some power and pretty good plate discipline, and is a switch-hitter. For the Royals he's gone .236/.306/.349 after a good start, which I grant is not inspiring. He has a chance to be a decent 4th outfielder if the Royals can address their corner-outfield problems by picking up at least one outfielder in the off-season.
  • 7/30 Traded INF Jose Bautista to the New York Mets for C Justin Huber. Huber, at just 22 years old, is a definite prospect, while Bautista is only 23 but was not needed after the Royals acquired Mark Teahen (below). Huber had a .414 OBP for the Mets' AA team before being traded. He was a catcher, but after undergoing surgery on his left knee in August to repair torn cartilage, it looks more like he'll drift leftward on the defensive spectrum and end up at first base. The question will be whether he has enough power to hold down first base and, if he develops it, what the Royals will do with their glut of first basemen (Calvin Pickering, Ken Harvey, Mike Sweeney, Matt Stairs).
  • 7/2 Acquired outfielder Ruben Mateo from the Pittsburgh Pirates for cash considerations. Like Nunez, Mateo was worth taking a risk on. As a formerly highly regarded prospect it was just possible that he would blossom. Of course he didn't but still a decent risk for minimal investment along the lines of Nunez.
  • 6/24 Acquired pitcher Octavio Dotel and catcher John Buck from the Houston Astros for outfielder Carlos Beltran; traded Dotel and cash considerations to the Oakland Athletics for pitcher Mike Wood and third baseman Mark Teahen. This was the biggest deal of the summer. Since the Royals were going to lose Beltran anyway at the end of the season, they addressed three needs: a starting catcher, a third baseman, and an interim starting pitcher. After struggling early, Buck seems to have found his stroke with .274/.299/.562 in September and 11 home runs in his last 153 at bats. Teahen hit .280/.344/.447 for Omaha in 267 plate appearances. He struck out an awful lot, but he is projected to be in Kansas City at some point next season. And of course the deal frees up $11M for next season, assuming the same payroll. I don't really see Mike Wood having much of a shot at winning a spot in the rotation next season (see below).
  • 6/21 Acquired RHP Denny Bautista from the Orioles for RHP Jason Grimsley. The 24-year-old Bautista has the best pure stuff of any pitcher the Royals have (with the possible exception of Jeremy Affeldt): a mid 90s fastball, slider, curve, and changeup. His stats don't look great, but his last two outings have been much better (12 IP, 4 ER, 9 K, 0 HR). He hadn't been throwing his slider until the other night against the White Sox, and he has a very good one. He's also reportedly put on almost 20 pounds since joining the Royals. This may be the sleeper deal of the summer as he vies for a position in the rotation next season. Right now I'm projecting Zack Greinke, Bautista, Runelvys Hernandez, a pitcher they pick up in the off-season (their second biggest priority behind a corner outfielder), and then one of Jimmy Gobble, Mike Wood, Kyle Snyder, and the rest of the cast of characters. Of course, I'd rather see them use a four-man rotation, but that's an argument for another day...
  • 4/8 Acquired RHP Justin Huisman from the Rockies for RHP Zach McClellan, INF Chris Fallon, and cash. Huisman was up with the Royals for a short time and then sent back to Omaha. He had nice numbers as a closer in 2001-2003 and allowed only 1 home run in 61.67 innings in 2003. He pitched adequately for Omaha, although Ryan Bukvich, despite having less control, was more impressive in that role.

Overall, Baird has succeeded in filling three positions, shoring up the outfield some, and acquiring some pitching depth at almost no cost. Personally, I'd like to see the Royals commit to a youth movement to see if the core of DeJesus, Buck, Teahen, Greinke, Bautista, Angel Berroa, Gotay, and Hernandez, along with a power-hitting corner outfielder and of course Calvin Pickering, can make some headway in the next 2 or 3 years.


In Dusty we Trusty?

I've read some criticism of Dusty Baker for yesterday's loss. To be fair, although Dusty is generally a poor game tactician, the loss cannot reasonably be laid at his feet.

  • Yes, Mark Prior batted twice with the bases loaded but with the bullpen issues the Cubs have had you can't blame him for not pinch hitting. Also one of the times was in the 2nd inning.
  • Also he was criticized for not pinch hitting Ben Grieve in several situations, especially when a baserunner was needed. Well, Grieve has looked awful the last few days and is 3-14 in September with exactly 1 walk. In fact, he's had only that one walk in his last 48 times to the plate so you're probably not giving much up by not pinch hitting him.
  • Third, he was pilloried for the Nomar Garciaparra bunt with 1 out in the bottom of the twelfth. Yes, on its face this is a terrible move, as it changes the probability of scoring from 28.3% to 22.3%, the run potential from .573 to .344, and the win probability from 20.8% to 14.3%, all bad things. But I think it was clear that the bunt attempt was for a hit and was done by Nomar himself. Actually, that's not too bad a play if it succeeds, of course, with the 3rd baseman playing back. I'm not sure, with Nomar's bad groin, whether it was a good idea, though. But it was certainly not Dusty's fault.

Mike Hampton against Kerry Wood today at 2:20PM central.