Monday, January 31, 2005
More on Sammy
Ok, I was wrong yesterday when I said that the Cubs would be getting Hairston for $11.7M. It now appears that the Cubs will pick up $12M of Sosa's $17M 2005 salary and have to pay a $3.5M severance that was written into Sosa's contract.
So the Cubs will pay over $17M for Jerry Hairston Jr.'s services and the chance that Fontenot and/or Crouthers might be future contributors.
It also now appears that Hendry is not interested in Magglio Ordonez. As Phil Rogers wrote on ESPN, that may mean that they'll go after Jeromy Burnitz. Let's hope not. Burnitz is coming off a season in Colorado where, of course, he looked better than he is: he hit just .244/.327/.448 on the road. And if the Cubs sign him, that will simply add to the money they've already thrown away.
At this point it looks as if Hendry may actually think that Hairston is a right-fielder. Hmmmmm
Posted by Dan Agonistes at 10:44 PM 0 comments
Sunday, January 30, 2005
Slammin' Sammy Says So Long
The big news this weekend for Cubs fans is that Sammy Sosa looks like he’ll be dealt to the Baltimore Orioles in exchange for Jerry Hairston Jr. and a couple of minor leaguers, second baseman Mike Fontenot and pitcher Dave Crouthers. Reportedly, the Cubs will pick up around $6M of Sosa’s $17M 2005 salary. Some reports mentioned that perhaps Kyle Farnsworth might be part of the deal but that seems doubtful.
In analyzing the proposed deal, first it should be remembered that Hairston is a second baseman by trade, not an outfielder. The O’s elected to use him in the outfield only because Brian Roberts has been pushing him every step of the way the last two years and finally won the job after Hairston broke his right ring finger in spring training. That was a repeat of 2003, when Hairston broke his foot on May 20 after landing wrong while fouling a pitch off his foot.
After the 2004 season ended O’s GM Jim Beattie said in no uncertain terms that Roberts was the everyday second baseman in 2005. Apparently, manager Lee Mazzilli wanted Roberts to compete for the job but Beattie said no, which is understandable since Roberts did hit 50 doubles and steal 29 bases last year.
And of course Hairston hits like a second baseman (.261/.334/.371 career). Last season was his best, albeit in only 335 plate appearances (.303/.378/.397), but at age 29 (as of May 29th) I wouldn’t expect him to get any better. He has absolutely no power, average plate discipline, and is a poor percentage base stealer, and so even in 2004 his Normalized OPS (OPS+) was only 100 (league average). Of course, he’d make a better backup infielder than either Jose Macias or Neifi Perez offensively, but it would be the height of irresponsibility for Cubs GM Jim Hendry to expect Hairston to play right field on a regular basis. And forget about installing Hairston as a leadoff hitter as is mentioned on the Cubs site. Todd Walker is by far the better candidate. Hairston will probably make around $1.7M in 2005.
Interesting stat from Cubs.com: "Last year, the wind blew in for 35 games, and the Cubs were 16-19 in those games, hitting 44 homers and scoring 165 runs. In the 39 games in which the wind was blowing out, the Cubs were 26-13, hit 81 homers and totaled 226 runs."
And all of this leads to the question of who will play right field for the Cubs in 2005. Losing both Alou and Sosa means that 74 homeruns have departed. The platoon of Todd Hollandsworth and Jason Dubois will be lucky to recoup 25 of those and so the right field slot becomes even more important. I doubt that David Kelton is the answer as he still appears unready and it's doubtful that Dusty Baker would give a rookie much of a shot in the first place. To me this means that Magglio Ordonez has to be in the sights of Hendry and company. Ordonez is the only viable option left in the free agent market, although Ordonez and his agent Scott Boras met with the Tigers last week. The only other option, since the minors have been pretty well cleaned out, might be to deal one of the big three starters or Dubois, which seems highly unlikely.
As for the minor leaguers mentioned in the deal, Fontenot is also a second baseman and former first round draft choice (2001). He hit .281 at AAA Ottawa last year with 30 doubles and 10 triples. He walked 49 times (.346 OBP) and struck out 111 times while hitting only 8 homeruns. He had a great 2003 season in AA hitting .325/.399/.481. He’s only 24, however, and could play a role in the future.
Crouthers, a 6’3” right-hander, was also drafted in 2001 and pitched at AA Bowie in 2004. He’s a hard thrower and struck out 138 in 140 innings while giving up 134 hits. He walked 68, however, and gave up 23 homeruns which gave him a 5.03 ERA in 27 starts and 1.45 WHIP. He seems to have regressed a little from 2003 when he was promoted to AA and pitched pretty well (1.22 WHIP). He’s probably a long shot at age 25 to be much of a contributor long term.
If this is the deal, in the final analysis the Cubs will pay $7.7M for a utility man (Hairston) in 2005 and the possibility that Fontenot can contribute down the line. That seems like a pretty high price to pay.
Posted by Dan Agonistes at 10:15 PM 0 comments
Friday, January 28, 2005
Beane and Bloggers
Found this quote on Athletics Nation as quoted by Baseball Musings. This is Billy Beane talking about the web and bloggers...
"There is a tendency [in blogs] to really analyze things in detail. Ultimately, because there is so much conversation and investigation on a site like yours, people may not ultimately agree with it, but they stumble onto what you're trying to do. Someone emailed me something written on a Cardinals' blog, and they had nailed all the things we were talking about. The economic reasons, the personnel reasons and the reasons we made the exchange. The world of a Web log will lend itself to a lot of investigation. And you will often stumble across the answer more than someone who has to write in two hours to meet deadline just to make sure something is out in the paper the next day."
To me Beane's comments reflect one of the big changes that the critical mass of the web in general and blogging in particular have wrought. Simply put, expert opinions (present company excluded of course) in a seemingly endless array of fields are now essentially free.
Why are they expert opinions? Because in a community where multiple millions of people can share information at very low cost, the law of large numbers kicks in, resulting in a substantial number of people who, because of their passion and intelligence, will acquire specialized knowledge that they're willing to share. And often these folks are not employed in the field they are commenting on (as is the case with the Cardinals blogger mentioned in Beane's comment), which enables them to avoid the competitive nature of the "insiders" who hoard knowledge to gain competitive advantage or only use it for specific purposes (as in the case of sportswriters who are under different constraints). The result is that what was once the possession of the few becomes available to the many at virtually no cost.
I say virtually no cost because although Google has become a verb ("to Google" or "I just Googled it"), and RSS is ubiquitous in the blogging community, it is still somewhat difficult - meaning not automated enough - to find and cull the expert opinions and to receive only what's relevant. In both these areas what is needed are intelligent agents (something like this RSS filter is the very beginning) that can search contextually on natural language input instead of on simple keywords and learn based on the reader's actions. I've seen demos of some of this technology produced at Microsoft Research but I don't know how soon it may make it into real products (or whether some of it has already).
From the baseball perspective the expert opinion effect is magnified (or has been accelerated) because of the public nature of not only the performance of the players, but the economics of each team, the relationship between labor and management, and even the input into the system via high school and college players. In short, almost all of the relevant information that experts need to synthesize is already present.
Posted by Dan Agonistes at 8:10 PM 0 comments
A Week with Felix Millan
If, like me, you've ever thought about going to a fantasy camp but never actually done it, you might want to check out this blog, put up especially to chronicle a week at the Mets camp. Sounds like the Mets do it right. Maybe someday...
Posted by Dan Agonistes at 8:07 PM 0 comments
Wednesday, January 26, 2005
Show Me the Money
One of the interesting aspects of the Lahman database is that it contains salary data that goes back to 1985. While I can’t vouch for its accuracy (I believe the data originally came from the late Doug Pappas’ database but I’m not certain) I thought it might be interesting to run some numbers related to team payroll.
Baseline
For the analysis that follows I only used the data from 1992 forward (376 team seasons) since it appears the data is fairly spotty before that time. For example, there are several teams in 1987 that have salaries for only a handful of players. From 1992 on the data appears to be pretty good, and the following shows the average, maximum, minimum, and standard deviation of team payrolls (in millions of dollars) for each year.
Year Avg Max Min Std
1992 $31.0 $44.8 $9.4 $9.0
1993 $32.2 $47.3 $10.4 $9.1
1994 $33.1 $49.4 $14.9 $8.4
1995 $34.0 $50.6 $12.4 $9.3
1996 $34.2 $54.5 $16.3 $10.5
1997 $40.3 $62.2 $10.8 $12.8
1998 $42.6 $72.4 $10.6 $15.1
1999 $49.8 $86.7 $17.9 $20.2
2000 $55.5 $92.3 $16.5 $21.1
2001 $65.4 $112.3 $24.1 $24.3
2002 $67.5 $125.9 $34.4 $24.3
2003 $70.9 $152.7 $19.6 $27.5
2004 $69.0 $184.2 $27.5 $32.3
What you can immediately see from this is that while the average team payroll doubled from 1996 to 2004, the maximum payroll more than tripled and the minimum payroll didn’t quite double. This widening gap is illustrated by the fact that the standard deviation tripled as well and, from 1997 on, exceeded the minimum payroll in every year except 2002. In graph form the data looks as follows:
In the rest of this article I’ll examine a few issues that came to my mind as I looked at this data.
Dollars and Wins
One of the obvious questions that might be answered with this data is how well payroll correlates with wins and losses. Much as I did in my post on A Mathematician at the Ballpark, I used Excel’s CORREL() function to calculate the correlation coefficient (r value) for payroll compared to record. However, to make the calculation more precise and remove the inherent salary inflation, I calculated what I call the Normalized Payroll, defined as a team’s payroll divided by the average payroll for that year, and then calculated its correlation with winning percentage. Doing so produced a correlation of .432, a value that indicates that while the correlation is positive, it is weak (an r value > .70 would be considered strong). You can see below the scatter plot of Normalized Payrolls versus winning percentage and the linear regression line that shows the positive correlation. The graph also shows the coefficient of determination (r^2) and the regression equation. This analysis indicates that payroll plays a part in fielding a winning team but is far from the determining factor, explaining less than 19% of the variation in winning percentage.
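For anyone who would rather script this than use Excel, here is a minimal Python sketch of the same two steps: normalize payroll within a season, then compute the correlation coefficient by hand. The four-team season is made up purely for illustration, and the function names are my own; the real analysis would normalize each of the 13 seasons separately before pooling all 376 team seasons.

```python
from math import sqrt


def normalize(payrolls):
    """Divide each payroll by that season's average payroll."""
    average = sum(payrolls) / len(payrolls)
    return [p / average for p in payrolls]


def pearson_r(xs, ys):
    """Pearson correlation coefficient (the statistic Excel's CORREL() reports)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical four-team season (payrolls in $M), for illustration only.
payrolls = [30.0, 45.0, 60.0, 105.0]
win_pcts = [0.450, 0.520, 0.480, 0.550]

npayrolls = normalize(payrolls)
r = pearson_r(npayrolls, win_pcts)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")

# The r of .432 reported above corresponds to this share of explained variation:
print(f"{0.432 ** 2:.3f}")  # 0.187
```

Squaring r is where the "less than 19%" figure comes from.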
Team Efficiency
A second way to look at the data is to determine which teams were the most efficient with their payrolls. In other words, who got the most bang for their buck in terms of dollars per win? Using Normalized Payroll it’s simple to calculate a value I call Payroll Efficiency (PE) by dividing winning percentage by Normalized Payroll. By this measure, higher numbers are better. The top teams since 1992 (the period covered by this data) were:
Year Team Wins Payroll PE
1997 Pittsburgh 79 $10.8 1.82
1998 Montreal 65 $10.6 1.61
1992 Cleveland 76 $9.4 1.55
2000 Minnesota 69 $16.5 1.43
2001 Minnesota 85 $24.1 1.42
It’s not too surprising that all but one of these teams are below .500 since any major league team will win 30% of their games just by showing up regardless of how low their payroll is. The first team with a winning record is the 2001 Twins who ranked 5th when they won 85 games with a PE of 1.42. The top really good team on the list is the 2001 A’s who ranked 10th and won 102 games with a payroll of just $33.8M in a year when the average payroll was $65.4M for a PE of 1.22. The A’s repeated the trick in 2002, ranking 19th with a PE of 1.07.
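As a check on the arithmetic, here is a small Python sketch of the Payroll Efficiency calculation using figures from the tables in this post. The function name and the 162-game default are my own assumptions (the strike-shortened 1994 and 1995 seasons would need a different game count), and because the yearly averages are rounded the results may differ from the database values in the last digit.

```python
def payroll_efficiency(wins, payroll, league_avg_payroll, games=162):
    """Winning percentage divided by Normalized Payroll."""
    win_pct = wins / games
    normalized_payroll = payroll / league_avg_payroll
    return win_pct / normalized_payroll


# Payrolls in $M, taken from the tables in this post.
athletics_2001 = payroll_efficiency(102, 33.8, 65.4)
pirates_1997 = payroll_efficiency(79, 10.8, 40.3)
yankees_2004 = payroll_efficiency(101, 184.2, 69.0)

print(round(athletics_2001, 2))  # 1.22
print(round(pirates_1997, 2))    # 1.82
print(round(yankees_2004, 2))    # 0.23
```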
I was surprised, however, that teams from the last few years such as the Twins, A’s, or Expos didn’t crack the top five. In fact, the top team of more recent vintage was the 2003 Devil Rays who won just 63 games and did it with a PE of 1.41. The 2004 Brewers, 2004 Devil Rays, and 2004 Indians all cracked the top 30 as well.
On the bottom of the scale, the least efficient teams since 1992 are:
Year Team Wins Payroll PE
2004 Yankees 101 $184.2 0.23
2003 Mets 66 $116.9 0.25
1995 Blue Jays 56 $50.6 0.26
1992 Dodgers 63 $44.8 0.27
2002 Rangers 72 $105.2 0.28
Although the bottom of the spectrum is also dominated by bad teams, the Yankees’ recent orgy of spending has made them the notable exception. Even winning 101 games last year didn’t save them from being the least economically efficient team since 1992. Their 2003 season, when they also won 101 games, ranks 7th worst as well.
Dollars and the Post Season
Finally, another way to look at the data is to see how well payroll correlates with post season appearances.
Criteria Teams Normalized Payroll
Post Season 122 1.19
Wild Card 20 1.21
Division Winners 75 1.24
League Winners 22 1.36
World Series Winners 11 1.36
This can also be broken down into periods
1992-1997
Criteria Teams Normalized Payroll
Post Season 52 1.19
Wild Card 6 1.30
Division Winners 32 1.18
League Winners 10 1.31
World Series Winners 5 1.42
1998-2004
Criteria Teams Normalized Payroll
Post Season 62 1.26
Wild Card 14 1.17
Division Winners 43 1.29
League Winners 12 1.42
World Series Winners 6 1.31
Clearly this data shows a relationship between high payrolls and getting to the postseason. At the very least it appears a team needs to spend in the range of 20% more than the league average in order to be competitive and 30% more to compete for a league championship or World Series victory. The standard deviation of Normalized Payroll is around .35 and so World Series champions are more than one standard deviation above the mean.
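To put the "one standard deviation" remark in concrete terms, here is a quick sketch that converts each group's average Normalized Payroll into a number of standard deviations above the mean. It assumes a mean of 1.0 (true within each season by construction) and the rough .35 standard deviation quoted above; the variable names are mine.

```python
NP_MEAN = 1.0  # Normalized Payroll averages 1.0 within each season by construction
NP_STD = 0.35  # approximate standard deviation quoted above

# Average Normalized Payroll by group, 1992-2004, from the first table above.
avg_npayroll = {
    "Post Season": 1.19,
    "Wild Card": 1.21,
    "Division Winners": 1.24,
    "League Winners": 1.36,
    "World Series Winners": 1.36,
}

z_scores = {group: (npay - NP_MEAN) / NP_STD for group, npay in avg_npayroll.items()}

for group, z in z_scores.items():
    print(f"{group:<22}{z:.2f} standard deviations above average")
```

Only the league and World Series winners clear one full standard deviation; the division winners and wild cards sit roughly two-thirds of a standard deviation above average.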
There is also a predictable relationship between payroll and how well a team finishes within their division. Since 1992 the numbers are:
Rank NPayroll Teams
1 1.24 75
2 1.05 73
3 0.97 75
4 0.90 73
5 0.83 59
6 0.79 15
7 1.01 6
While these values don’t appear too alarming it looks a bit different when you break it into periods:
1992-1997
Rank NPayroll Teams
1 1.18 32
2 1.05 32
3 0.96 33
4 0.91 31
5 0.91 24
6 0.83 8
7 1.01 6
1998-2004
Rank NPayroll Teams
1 1.29 43
2 1.06 41
3 0.98 42
4 0.89 42
5 0.77 35
6 0.76 7
Now you can see that over the last few years, if a team is being outspent, they’re going to find themselves in the second division.
Cubs and Royals
Since I follow the Cubs and Royals I thought I’d end by showing how these two teams have done in Normalized Payroll and Payroll Efficiency.
First, the Cubs…
Year Payroll WPct NPayroll PE
1992 $ 29,829,686.00 0.481 0.963 0.500
1993 $ 39,386,666.00 0.519 1.223 0.424
1994 $ 36,287,333.00 0.434 1.095 0.396
1995 $ 29,505,834.00 0.507 0.868 0.584
1996 $ 33,081,000.00 0.469 0.967 0.485
1997 $ 42,155,333.00 0.420 1.047 0.401
1998 $ 50,838,000.00 0.552 1.193 0.463
1999 $ 62,343,000.00 0.414 1.252 0.330
2000 $ 60,539,333.00 0.401 1.090 0.368
2001 $ 64,715,833.00 0.543 0.990 0.549
2002 $ 75,690,833.00 0.414 1.122 0.369
2003 $ 79,868,333.00 0.543 1.126 0.482
2004 $ 90,560,000.00 0.549 1.312 0.419
As you can see last year the Cubs finally broke the “magical” 130% of average payroll but have historically not gotten much for their money.
The Royals…
Year Payroll WPct NPayroll PE
1992 $ 33,893,834.00 0.444 1.094 0.406
1993 $ 41,346,167.00 0.519 1.284 0.404
1994 $ 40,541,334.00 0.557 1.223 0.455
1995 $ 29,532,834.00 0.486 0.869 0.559
1996 $ 20,281,250.00 0.466 0.593 0.786
1997 $ 34,655,000.00 0.416 0.861 0.483
1998 $ 36,862,500.00 0.447 0.865 0.517
1999 $ 26,225,000.00 0.398 0.527 0.755
2000 $ 23,433,000.00 0.475 0.422 1.127
2001 $ 35,422,500.00 0.401 0.542 0.740
2002 $ 47,257,000.00 0.383 0.700 0.546
2003 $ 40,518,000.00 0.512 0.571 0.897
2004 $ 47,609,000.00 0.358 0.690 0.519
This tells a different story as the Royals’ payroll has plummeted since the early 90s to where it is now a paltry 70% of the average. Rest assured, given that level of spending and past history, it is unlikely that the Royals can be competitive. The only bright spot is that they occasionally have been fairly efficient with their money (a PE of .567 is average).
Posted by Dan Agonistes at 9:16 PM 1 comments
Tuesday, January 25, 2005
A Mathematician at the Ballpark
One of the books I received over Christmas and just had a chance to read is A Mathematician at the Ballpark: Odds and Probabilities for Baseball Fans by Ken Ross. In this book Ross introduces the reader to probability and statistics using a good dose of baseball along with gambling, cards, and other topics. The book that most closely resembles it is Curve Ball, although the goal of this book is definitely to teach probability rather than using probability and statistics to elucidate relationships in baseball as the authors of Curve Ball do.
Some of the topics covered include converting probabilities to odds and vice versa, understanding combinations (how many ways are there to select n elements from a set of k elements), probability and Bernoulli trials, correlation, and linear regression. Although many of the examples are related to baseball there are plenty more related strictly to the lottery, casino games, and betting on baseball. Since I'm not much of a gambler these latter examples kind of lost my interest.
Correlation and Offense
What did pique my interest was his discussion of correlation and offensive statistics on pages 129-131. Here Ross introduces the notion of correlation and notes several correlations of offensive team statistics and winning percentage for 2003. Those he provides are:
HR .387
AVG .554
SLG .578
OPS .625
BRA .628
OBP .655
Interestingly, OBP correlates more strongly with winning percentage than does either OPS (OBP+SLG) or BRA (Batter Run Average = OBP*SLG). None of them, however, has a "strong correlation," defined as an r value greater than .70. Keep in mind that correlation in this case is simply a measure of the linear relationship between winning percentage and these other statistics. In other words, as OBP increases, winning percentage will increase in a more uniform manner than it does with BRA, OPS and the others.
I thought it would be interesting to calculate the correlation between team runs scored with not only these offensive measures but also several of the run estimators I discussed in a recent series:
A Brief History of Run Estimation
Run Estimation: Runs Created
Run Estimation: Batting Runs
Run Estimation: Estimated Runs Produced
Run Estimation: Base Runs
To do this I loaded the new 5.2 version of the Lahman database into SQL Server. I then plugged the run estimators into a query and then loaded the results into Excel. I used Excel's CORREL() function to calculate the correlation coefficient, r, you see below.
These are the formulas I used:
- Runs Created (RC) - I used the version found in The 2005 Bill James Handbook and talked about here. I also calculated the basic version (RC-Basic or RC-B) for comparison
- Batting Runs (BR) - I used the formula found in the 2004 Baseball Encyclopedia but instead of using the ABF factor I used a custom out value of -.10. I did this so that Batting Runs would calculate total runs instead of marginal runs above the league average
- Estimated Runs Produced (ERP) - rather than use Paul Johnson's formulas I used two of Jim Furtado's eXtrapolated Runs formulas - XR and XRR or eXtrapolated Runs Reduced
- BaseRuns (BsR) - I used the version of the formula found on Tangotiger's site
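Of the estimators listed above, the basic version of Runs Created is simple enough to show in a few lines. The sketch below uses the commonly cited basic formula, (H + BB) x TB / (AB + BB), which I am assuming matches what is labeled RC-B here; the team totals are hypothetical, and the other estimators (RC, BR, XR, XRR, BsR) use longer formulas that are not reproduced.

```python
def runs_created_basic(hits, walks, total_bases, at_bats):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (hits + walks) * total_bases / (at_bats + walks)


# Hypothetical team season totals, for illustration only.
team = dict(hits=1450, walks=550, total_bases=2300, at_bats=5500)
actual_runs = 775  # also hypothetical

estimate = runs_created_basic(**team)
print(round(estimate))                # 760
print(round(estimate) - actual_runs)  # -15, i.e. the estimate is 15 runs low
```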
In ascending order of correlation (r value) with runs scored, my data set revealed:
SB -.02
BB .590
HR .719
AVG .843
OBP .910
SLG .913
OPS .955
BRA .9576
RC-B .958
BR .9586
BsR .9591
XRR .961
RC .9638
XR .9641
As you can see all of the statistics except stolen bases and walks correlate strongly with runs scored, that is, they have an r value greater than .70. In fact, stolen bases actually have a negative r value, meaning that there is essentially no, or a slightly negative, correlation between teams that score a lot of runs and those that steal a lot of bases. At first glance you can imagine that this is because some teams that cannot hit homeruns and doubles must resort to stolen bases in order to try and manufacture some offense while some teams that are more proficient in extra base hits also include some speedy players. In that sense it's not likely that stolen bases actually inhibit run production but it does reveal that it is not necessary to steal bases in order to score a lot of runs. In this data set teams that scored more than 800 runs stole an average of 90 bases while those that scored fewer than 800 runs stole 94.
However, once you get past slugging percentage you can see that the remainder of the measures starting with OPS produce r values clustered between .955 and .964. In other words, all of them are very closely correlated with run scoring and so as the measures go up and down, so does run scoring. This is why sabermetricians prefer to use these other measures to evaluate offensive production rather than the standard batting average (official since 1876) or homeruns. There are two additional points here.
First, of these measures OPS, although at the bottom of the cluster, is by far the easiest to calculate, using only addition with readily available statistics. That's why many prefer to use it instead of batting average. For myself, although OPS is a much more informative number than batting average alone, I'd prefer to see all three standard offensive measures since they convey more information when viewed as a group than OPS does by itself. In addition, BRA, as mentioned by Ross, is not distorted by players who have a low SLG and a high OBP or vice versa. For example, a player with a .250 SLG and a .475 OBP will have an OPS of .725 while a player with a .350 SLG and a .370 OBP will have an OPS of .720. However, when you calculate their BRA the first player's is .119 while the second player's is .130. Since BRA is also slightly more closely correlated with run scoring I'd have to conclude that BRA is the better measure.
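The two-player comparison is easy to reproduce. Here is a minimal Python sketch using the definitions given earlier (OPS = OBP + SLG, BRA = OBP * SLG); the function names and player labels are my own.

```python
from decimal import Decimal, ROUND_HALF_UP


def ops(obp, slg):
    """On-base plus slugging."""
    return obp + slg


def bra(obp, slg):
    """Batter Run Average: on-base times slugging."""
    return obp * slg


def three_places(value):
    """Round half up to three decimals, the way batting stats are usually printed."""
    return value.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)


# The two hypothetical players from the paragraph above, as (OBP, SLG).
players = {
    "high OBP, low SLG": (Decimal("0.475"), Decimal("0.250")),
    "balanced": (Decimal("0.370"), Decimal("0.350")),
}

for label, (obp, slg) in players.items():
    print(label, three_places(ops(obp, slg)), three_places(bra(obp, slg)))
# high OBP, low SLG 0.725 0.119
# balanced 0.720 0.130
```

Decimal is used instead of floats because .370 * .350 is exactly .1295, which sits right on a rounding boundary and can come out as .129 with binary floating point.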
Secondly, because the run estimators, starting with RC-Basic, actually do attempt to estimate runs, the formulas can be applied to individuals in the hopes that they approximate the run contribution of the individual. So given this list it appears that Furtado’s eXtrapolated Runs is the best measure to use since it correlates more closely with run scoring than any of the others (even if the difference is minute).
However, there are two other ways to measure these run estimators. First, we can find the average error for each estimator, or the average number of runs each was off.
RC-B 44.67
XRR 20.75
XR 19.65
BR 19.14
RC 18.58
BsR 18.11
Judging from these numbers we would assume that BaseRuns is the most accurate since it produces the smallest average error. But is this better than using the r value? Remember that the r value measures the strength of the linear relationship between runs and the thing measured, while the smallest average error tells us which one is closer on average. So this could mean that while XR is better at ranking, BsR is actually more accurate.
There is actually a third way to rank these estimators and that is to use the standard deviation of their errors.
RC-B 23.01
BsR 16.58
XRR 16.51
BR 16.35
XR 15.49
RC 14.32
And doing so gives us yet a third winner, RC, whose standard deviation is about a run better than any of the others. This means that the spread of errors for Runs Created is smaller than for any of the others; in other words, it does not tend to make big mistakes. This is easily seen by noticing that RC-B misses the mark by 102 runs for the 2000 White Sox while the biggest error RC makes is 72 for the 2002 Phillies.
So we can conclude that XR gives us the best correlation, BsR gives us the smallest average error and RC gives us the smallest error distribution. So which one should we use? Since all of these (except the basic Runs Created formula) are so close, in practice it rarely matters given the restricted range of offensive levels at which major league baseball is played. As mentioned in my series the linear formulas (BR, XR, and XRR) tend to underestimate run production at higher levels while multiplicative formulas (RC-B and RC) tend to overestimate run production at higher levels (although James has corrected for this in RC in recent years). BsR alone seems to provide a better mix in either environment since it is an intuitively more accurate way of modeling run production.
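To make the three yardsticks concrete, here is a minimal Python sketch that scores an estimator all three ways. It assumes that "average error" means the mean absolute error and that the standard deviation is taken over the signed errors (estimate minus actual); the run totals and both estimators are made up, so only the mechanics carry over to the real data.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def score(actual, estimated):
    """Return the three measures discussed above for one run estimator."""
    errors = [est - act for act, est in zip(actual, estimated)]
    return {
        "r": pearson_r(actual, estimated),
        "avg_error": mean([abs(e) for e in errors]),
        "error_std": pstdev(errors),
    }


# Hypothetical team run totals and two hypothetical estimators.
actual = [700, 750, 800, 850]
estimator_a = [710, 760, 810, 860]  # always exactly 10 runs high
estimator_b = [690, 755, 795, 860]  # closer on average, but less consistent

score_a = score(actual, estimator_a)
score_b = score(actual, estimator_b)
print(score_a)
print(score_b)
```

Estimator A wins on correlation and on error spread, while estimator B wins on average error, which is the same kind of split seen among XR, RC, and BsR above.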
Streaks
The other aspect of the book that caught my attention was the author's discussion of streaks or "hot hands" in chapter 8. After reviewing the standard studies done on the subject as they relate to baseball and basketball, Ross reviews a study done by Reid Dorsey-Palmateer and Gary Schmidt that looked at professional bowlers. This study showed pretty convincingly that the proportion of strikes after strikes was higher, to a statistically significant degree, than the proportion of strikes after non-strikes. Ross then uses this study to conclude that "I must now assert what I have long intuited: Sports players do get in 'the groove', or have 'hot hands'". To be fair, he also acknowledges that this is very difficult to detect in complex games like baseball and basketball.
I'm not so sure that he's correct when it comes to baseball. Couldn't it just be that bowling is different from baseball or basketball in that the trials (frames) occur more frequently and under more uniform conditions? In other words, maybe what allows bowlers to repeat their performance is the fact that their adrenalin doesn't diminish between frames as it does for baseball players between at bats. My intuition is that these are apples to oranges comparisons and that the result for bowlers cannot be extrapolated to mean that there is a "hot hand" phenomenon in baseball.
Articles
In addition to the above sections I appreciated the author's inclusion of summaries of some of the interesting work done by my fellow SABR members and published in By The Numbers, the newsletter of the statistical committee.
Overall, this is a good book for those wishing to learn more about statistics and those interested in odds and betting. If you're already sabermetrically literate you'll probably not find a whole lot that's new.
Posted by Dan Agonistes at 6:02 AM 2 comments
Monday, January 24, 2005
Patterson Leading Off Again
Great post on the Cub Reporter about Corey Patterson batting leadoff as noted on MLB.com.
One year ago, the Cubs weren't sure if Patterson was healed from a severe knee injury. He played 157 games in 2004, and Baker projects the center fielder will be leading off again.
"I need some speed," Baker said of his lineup. "If he could cut the strikeouts one-third or half, think how effective he'll be."
The problem with Patterson is pitch selection, something hitting coach Gene Clines will focus on. Baker wasn't worried.
"He'll get it. He's a smart kid," Baker said.
It looks like Dusty will try him there once again this season. What I like most about the post is Alex's comment that it doesn't matter much where you bat as long as you get the right players in the lineup to begin with. In other words, don't play Neifi Perez or Jose Macias if you want to win.
Alex's simulator also confirmed what my simple analysis, done last season when discussing Barry Bonds, showed: even moving a great player like Bonds up in the lineup produces only slightly more than one extra win per season. Alex noted that moving Derrek Lee up last season would have produced only 7 more runs, less than one win.
Posted by Dan Agonistes at 10:28 PM 0 comments
Scouting vs. Sabermetrics Redux
Here is a follow-up Alan Schwarz wrote to his piece on Scouts versus Statistics, which I blogged about here.
Posted by Dan Agonistes at 10:24 PM 0 comments
Saturday, January 22, 2005
Gould and the Moral Argument
Last week I heard a sermon that talked in part about how materialism, or the belief that matter is the only ultimate reality, cannot account for the existence of ethics and morality. This was essentially a defense of the general Moral Argument (these are really a family of arguments) I mentioned in my previous post on Apologetic Arguments. The structure of the Moral Argument is analogous to the Argument from Reason in that it postulates that materialism is proven false by the existence, in this case morals or ethics, of something that cannot be explained by it.
Anyway, I say all that to mention that after the sermon I pulled out my copy of The Hedgehog, the Fox, and the Magister's Pox by the late Stephen Jay Gould. This was the last book Gould wrote before he died in 2002 and was not really finished, which you can tell by some of the rougher prose that he didn't get time to clean up. In the book Gould is very uncomfortable with Edward O. Wilson's belief in strict reductionism as espoused in his book Consilience (a term coined by William Whewell in the 1840s that literally means a "jumping together" and used by Whewell to explicate the idea that disparate facts could be coordinated to formulate lower-level laws explaining higher-level structures) when it comes to ethics and morality. Interestingly and sadly, after making a case for the existence of morality as not being based in the physical (factual) world he says:
"At this point, one can hardly avoid the question of questions: If factual nature cannot establish the basis of moral truth, where then can we find it? I don't feel excessively evasive or stupid in admitting that I have struggled with this deepest of issues all my conscious life, and although I can summarize the classical positions offered by our best thinkers through history, I have never been able to formulate anything new or better. After all, if David Hume, and others ten times smarter than I could ever be, have similarly struggled and basically failed, I need not berate myself for coming no closer."
What's interesting to me is that it seems Gould essentially rejected strict reductionism but still could not bring himself to follow the implications of the Moral Argument he was all the while defending. Instead, he seems to have given up and apparently did not include Christian or other religious thinkers among his "best thinkers through history".
Posted by Dan Agonistes at 7:03 AM 1 comments
A Left-field Platoon?
As I mentioned last week, The 2005 Bill James Handbook includes player projections. One of the things I found interesting was taking a look at the Royals projected outfielders, especially in light of the fact that Royals GM Allard Baird has thus far failed to obtain the corner outfielder with power that he talked about just after the season.
He did, however, obtain both Terrence Long and Eli Marrero, who might just be able to form a serviceable left-field platoon with Long playing against right-handed pitching and Marrero against left-handers. Anyway, here are the projections from James:
Player AB H 2B 3B HR R RBI RC BB SO SB CS AVG OBP SLG OPS
Long 380 101 22 2 9 50 46 49 29 61 3 2 .266 .318 .405 .723
Marrero 241 63 13 1 8 33 35 33 21 48 4 1 .261 .321 .423 .744
TOTAL 621 164 35 3 17 83 81 82 50 109 7 3 .264 .319 .412 .731
Ok, so these aren't world-beater numbers by any means, but if you told Baird right now that he would get 17 homeruns, 81 RBIs, and 82 Runs Created from left-field in 2005 he'd probably take it given the disastrous performance by Royals left-fielders in 2004. They hit just 13 homeruns and struck out 143 times with 43 walks for a .216 average, a microscopic OBP of .283, and a SLG of .324 (the Royals trotted 15 guys out there in 2004 with Dee Brown, the "albatross", and Aaron Guiel garnering the majority of the at bats). Of course, Guiel is also in the mix assuming he can recover from his eye problems of a year ago.
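For anyone who wants to check how the TOTAL row for the platoon comes together, here is a minimal sketch in Python. The BattingLine class and its field names are my own invention for illustration; only the counting stats come from the projections above. OBP is left out because the table doesn't list HBP or SF, which the usual OBP formula needs.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Hypothetical container for the counting stats needed for AVG and SLG."""
    ab: int
    h: int
    doubles: int
    triples: int
    hr: int

    def __add__(self, other):
        # Counting stats for a platoon simply add together.
        return BattingLine(
            self.ab + other.ab,
            self.h + other.h,
            self.doubles + other.doubles,
            self.triples + other.triples,
            self.hr + other.hr,
        )

    @property
    def total_bases(self):
        # Every hit is worth one base, plus one extra per double,
        # two extra per triple, and three extra per homerun.
        return self.h + self.doubles + 2 * self.triples + 3 * self.hr

    @property
    def avg(self):
        return self.h / self.ab

    @property
    def slg(self):
        return self.total_bases / self.ab


long_proj = BattingLine(ab=380, h=101, doubles=22, triples=2, hr=9)
marrero_proj = BattingLine(ab=241, h=63, doubles=13, triples=1, hr=8)
platoon = long_proj + marrero_proj

print(f"{platoon.ab} AB, {platoon.hr} HR, "
      f"{platoon.avg:.3f} AVG, {platoon.slg:.3f} SLG")
```

Note that the rate stats are recomputed from the summed counting stats rather than averaged, so Long's larger share of the at bats is weighted properly.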
In right-field, right now it's Matt Stairs and Abraham Nunez. The totals of their projections are very similar though not quite as good at 22 homeruns and 73 Runs Created while in center-field a full year of David DeJesus looks like this:
Player    AB    H  2B  3B  HR   R  RBI  RC  BB   SO  SB  CS   AVG   OBP   SLG   OPS
DeJesus  575  168  32   6  12  99   57  86  62   79  16  17  .292  .361  .431  .792
To me, these seem eminently reasonable given DeJesus' performance once he adjusted to major league pitching and particularly given his strong second half .314/.385/.453 performance.
In all, the outfield may not be anything to brag about in 2005 but it could easily improve by 30 runs or so.
Posted by Dan Agonistes at 6:03 AM 0 comments
Thursday, January 20, 2005
Royals Rumblings
A few notes which pass for hot-stove action when you're interested in the Royals:
- Signed former Giants pitcher Ryan Jensen and former Twins and Rockies infielder Denny Hocking to minor league contracts. Hocking is viewed as insurance in the utility infielder role if Chris Clapinski doesn't pan out or if the idea of Chris Truby is a non-starter at third base. Hocking played at AAA Iowa for the Cubs last season and did reasonably well. He's never had any plate discipline or power (.250/.308/.346 career) but can play seven positions reasonably well. Jensen is more perplexing as he's just 28 and threw well for the Giants in 2002 (171.7 IP, 4.51 ERA). He did not pitch well the last two seasons in Fresno for some reason, however, so he seems a bit of a long shot to make the rotation. But what the heck, it can't hurt to give him a shot.
- Jeremy Affeldt filed for arbitration yesterday. Affeldt requested $1.2M and the Royals countered with $950K. Not much to haggle over. He made $350K last year and he's the only Royal eligible. He was 13 of 17 in save opportunities but what was most disturbing about last season is that his BB/IP rose (.301 in 2003, .419 in 2004) and his K/IP dropped (.778 in 2003, .642 in 2004) as his H/IP increased. Some of this can be attributed to the poor coaching he received in spring training that led to him throwing way too many off-speed pitches. Hopefully, he'll be let loose this season to see what he can do as a closer. If he performs, around $1M will be a bargain for a good closer.
- Abraham Nunez will try swinging exclusively from the right side, where the Royals think he has more power, in spring training. He did hit "better" against left-handers in 2004 with a bit more power but the differences are not anything to get excited about. For his career he's a .215 hitter against left-handers and a .206 hitter against right-handers with homerun rates of 23.3 against lefties and 104.5 against righties. I'm skeptical that this will make him into a real option in the outfield but you never know.
- The Royals are working with Jackson County to devise a plan to make renovations to Kauffman and/or pursue a downtown stadium. Neither option seems necessary to me, each for a different reason. First, given baseball's economic structure - and if history is any indication - a new downtown stadium will provide only a short-term boost in revenues for the Royals. And even with a boost, there won't be nearly enough revenue to actually compete with the big boys to sign a slew of free agents. TV revenues dominate the disparity in revenues of baseball teams with the Yankees at $187.9M and the Devil Rays at $24.4M. Ticket sales are minor by comparison. For example, consider that while Texas averaged 31,818 fans at home and Boston 35,028, the revenue difference was $79.2M to $130.M. Second, renovating the current stadium and stocking it with upgrades is unnecessary when you're not going to sell out very often (again given the economic realities of baseball), particularly given its location (the stadium is only a place to watch a game, not a destination in and of itself that could support other businesses). I've spent a good deal of time at the ballpark the last two seasons and it is a great place to watch a ballgame: very comfortable, with plenty of concessions and restrooms. Jackson County should do the minimum it needs to do to make the stadium safe.
Posted by Dan Agonistes at 1:16 PM 1 comments
Tuesday, January 18, 2005
DIPS in Excel
I had mentioned Defense Independent Pitching Stats (DIPS) in a previous post, so for those unfamiliar with the concept here is a link to a nice spreadsheet put together by Larry Mahnken that allows for easy calculation and takes park effects into consideration.
Posted by Dan Agonistes at 10:38 PM 0 comments
Scouts and Jered Weaver
Almost missed this great post from Rich's Weekend Baseball Beat on the scouts versus stats debate that I blogged about last week. Rich follows up on Angels scouting director Eddie Bane's comment that Mark Prior's and Jered Weaver's stats are not comparable by making the comparison and adjusting for park effects.
However, I'm not sure that Bane is being "disingenuous" as Rich says when Bane says that Weaver's and Prior's stats have "no correlation whatsoever". I tend to think that's simply how the scouting community in general looks at non-major league stats. He probably views them as not comparable since the two played at different times against totally different individuals (because of the turnover every four years). Under that view there was no common denominator in Bane's mind. What the view fails to take into account is that the level of skill required to play at that level remains relatively constant with respect to the major league skill level. Further, when both pitchers dominate to almost exactly the same degree with the same sorts of K/BB, WHIP, and K/IP ratios, that certainly should tell you something about how Weaver might fare given Prior's success.
Scott Boras, Weaver's agent, is being perfectly logical if his request is the same kind of contract Prior got with the Cubs after 2001 (5 years for $10.5M).
Posted by Dan Agonistes at 12:28 PM 0 comments
Sunday, January 16, 2005
Apologetic Arguments
In the Bible study I'm involved with we talked about 1 Peter 3:13-16 a few weeks back. I thought these notes might be of interest to some.
In the first section here (verses 13-16) Peter again notes that by doing good Christians will not naturally be the target of persecution. However, Christians may still suffer but will have a clear conscience if they continue to do what is right. Interestingly, although this passage is the root of the branch of Christian study known as apologetics (from apologia, rendered “make a defense” in the NASB), note that Peter has in mind here that non-Christians will ask the Christian why their behavior differs from that of the world. From this one can conclude that the primary part of our witness is through our behavior towards others.
That said, some of the various basic “arguments” Christians have used when giving an answer include:
a) Argument from Experience – the personal transformation in the life of a believer as manifested in actions. This to me is what the author of 1 Peter has in mind
b) Argument from Reason[1] – human reason is explainable only through the existence of a creator. I've blogged about this argument in the past as it was used by C.S. Lewis in his 1947 book Miracles. Over Christmas I received C.S. Lewis' Dangerous Idea by Reppert, which is both a defense and a refinement of Lewis' argument. Very readable even for those without a philosophical background such as myself.
c) Cosmological Argument[2] – kalam is the most popular form of this argument, originally formulated by Islamic philosophers in the late middle ages, which basically states that whatever had a beginning began to exist at a point in time in the past, and that anything that began to exist had to have a cause. Since the universe began to exist at a finite time in the past (as evidenced by the Big Bang), then it had a beginning and therefore a cause. That cause can reasonably be identified as God.
d) Ontological Argument[3] – developed by St. Anselm and goes like this: God is that which than nothing greater can be conceived. It is better for God to exist than for Him not to exist (an existing object, by definition, is better than non-existence). Therefore, God exists [because it is better for him to exist than not to exist; and if he is "that which than nothing greater can be conceived", he necessarily must exist.] Thus, by considering and examining these propositions, we must admit that an idea of God necessitates his existence. Thomas Aquinas refined this argument a bit but it has been attacked by modern philosophers and so is not in vogue.
e) Argument from Design – the creation is only explainable as the product of a designer. This argument can take both a cosmological bent (by pointing out the apparent “fine-tuning” of the cosmological constants such as the nuclear forces etc.) or natural bent (natural theology as in the watchmaker arguments of William Paley, 1743-1805). More recently, the Intelligent Design (ID) movement spurred by William Dembski uses this argument in relation to the information content of DNA and the existence of irreducibly complex systems in living beings such as the blood clotting system described by Michael Behe[4]
f) Argument from Joy[5] – a human’s longing for God proves the existence of God just as our longing for food proves that food exists. This argument was used quite effectively by C.S. Lewis
g) Argument from History – two pronged, the NT documents are historically reliable based on documentary evidence and since they are it is logical to believe that Jesus is who he said he was (known as the Trilemma, i.e. Lord, Liar, or Lunatic). This argument was used effectively by Josh McDowell in Evidence That Demands a Verdict.
h) Argument from the Human Condition – Christianity explains human sinfulness combined with a knowledge of that sinfulness and provides a solution
i) Moral Argument – a knowledge of objective moral law or values exists, and a belief in a giver of the moral law best explains its existence. Used by C.S. Lewis in Mere Christianity (“something above and beyond the actual facts of human behaviour. In this case, besides the actual facts, you have something else – a real law which we did not invent and which we know we ought to obey”) and The Abolition of Man, and explained well by Moreland and Craig.
[1] See my blog at http://danagonistes.blogspot.com/2004/11/argument-from-reason.html
[2] William Lane Craig has a good explanation of the kalam argument in Reasonable Faith. J.P. Moreland has another explanation in Scaling the Secular City.
[3] http://en.wikipedia.org/wiki/Ontological_argument
[4] An interesting critique of Behe’s book Darwin’s Black Box can be found at http://www.talkorigins.org/faqs/behe.html
[5] See Surprised by Joy, C.S. Lewis
Posted by Dan Agonistes at 7:51 AM 3 comments
Plate Discipline
Nice article by Alan Schwarz on what it means to be a patient hitter.
Posted by Dan Agonistes at 5:46 AM 1 comments
The Bill James 2005 Handbook
I picked up a copy of the 2005 edition of the Bill James Handbook and was happy to find some very interesting new content. Here's the rundown.
A New, New Runs Created Formula
As I wrote about in my series on run estimators James changed his Runs Created formula for his 2002 book Win Shares in order to place the hitter in a neutral offensive context and remove the bias that Runs Created typically creates for teams and players with high total base and walk totals.
In this edition of the book he tweaks it once again with an eye towards reducing the estimations for teams that hit a lot of homeruns - namely those since the power surge post 1992. He does this by reducing the "B" factor of his equation (Runs Created has always been an (A*B)/C formula), which represents the runner advancement value of various offensive events. In the past he simply added Total Bases to (.24 * (BB+HBP-IBB)) + (.62 * SB) + (.5 * (SH+SF)) - (.03 * SO). His problem with this was that "it assumes that a home run does four times as much to advance a runner as a single does, which it really doesn't" and so he now assigns weights to the events like so:
B = (1.125 * 1B) + (1.69 * 2B) + (3.02 * 3B) + (3.73 * HR) + (.29 * (BB+HBP-IBB)) + (.492 * (SH+SF+SB)) - (.04 * SO)
The first four factors add up to what he calls "adjusted Total Bases" and he notes that "some other people may have come up with similar concepts in their own systems". I'm not sure if he's being facetious here but obviously this is most similar in concept to Batting Runs, where offensive events are weighted by their value in producing runs. However, since James is here calculating only advancement value and not run value, his weights are between 2 and 3 times those of the Batting Runs formula (interestingly the values for triples and homeruns are about 2.7 times the Batting Runs weights while the weights for singles and doubles are 2.4 and 2.2 times respectively). As you can see, his adjusted Total Bases value will be smaller for teams that hit a lot of doubles and homeruns and not greatly affected for those that hit a lot of singles and triples. You can also see that stolen bases have been discounted and the value of walks slightly increased, although he gives no justification for either change.
This formula, says James, is 8% more accurate for teams from 1955-1992 and "significantly more accurate since 1992". He doesn't say how much however.
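To get a feel for what the reweighting does, here is a rough sketch in Python that computes the old and new "B" factors side by side. The function names and the two team stat lines are hypothetical (made up to represent a slugging team and a contact-and-speed team), and I'm assuming the sacrifice terms are grouped as (SH+SF) in both versions.

```python
def b_factor_old(*, s, d, t, hr, bb, hbp, ibb, sb, sh, sf, so):
    """Old advancement factor: plain Total Bases plus the secondary terms."""
    total_bases = s + 2 * d + 3 * t + 4 * hr
    return (total_bases
            + 0.24 * (bb + hbp - ibb)
            + 0.62 * sb
            + 0.5 * (sh + sf)
            - 0.03 * so)


def b_factor_new(*, s, d, t, hr, bb, hbp, ibb, sb, sh, sf, so):
    """New advancement factor using the "adjusted Total Bases" weights."""
    adjusted_tb = 1.125 * s + 1.69 * d + 3.02 * t + 3.73 * hr
    return (adjusted_tb
            + 0.29 * (bb + hbp - ibb)
            + 0.492 * (sh + sf + sb)
            - 0.04 * so)


# Hypothetical season totals, not real teams.
power_team = dict(s=850, d=320, t=20, hr=240, bb=600, hbp=50, ibb=40,
                  sb=60, sh=30, sf=45, so=1100)
contact_team = dict(s=1100, d=250, t=40, hr=100, bb=450, hbp=40, ibb=30,
                    sb=120, sh=60, sf=50, so=800)

for name, line in (("power", power_team), ("contact", contact_team)):
    old, new = b_factor_old(**line), b_factor_new(**line)
    print(f"{name:8s} old B = {old:7.1f}  new B = {new:7.1f}  "
          f"change = {100 * (new / old - 1):+.1f}%")
```

With these made-up lines the slugging team's B factor shrinks while the singles-heavy team's grows a little, which is the direction James says he was after.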
Team Efficiency
A new section of this year's book deals with team efficiency, essentially taking a first stab at figuring out which teams are efficiently producing wins given their offensive and defensive elements.
On the offensive side there isn't much to write about. James applies his new Runs Created formula to teams and compares that to the actual number of runs scored. After multiplying by 100 he calculates a "Hitting Efficiency". The most efficient offensive teams of 2004 are all AL teams:
White Sox 106
Rangers 103
Royals 103
Yankees 101
In other words the White Sox were the most efficient team in baseball in 2004, scoring 6% more runs than their offensive elements would otherwise indicate (they scored 865 runs when the RC formula said 819). Every other team, including all the National League teams, is below 100 (the Reds are at 99 and all teams fall within 8%, with the Brewers lowest at 92). That is, only those four teams scored more runs than would have been expected. This seems strange to me and indicates that his new Runs Created formula continues to overpredict runs, just not by as much as previously.
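The index itself is just a ratio. A tiny sketch in Python, using the White Sox figures quoted above (the function name is mine, not James'):

```python
def hitting_efficiency(actual_runs, runs_created):
    """Actual runs scored as a percentage of what Runs Created predicted."""
    return 100 * actual_runs / runs_created


white_sox = hitting_efficiency(actual_runs=865, runs_created=819)
print(round(white_sox))  # prints 106
```

Anything over 100 means a team scored more than the formula expected, and anything under 100 means it scored less.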
When I think about offensive team efficiency I'm generally drawn towards a calculation of how efficient teams are in getting runners around to score. This is the basis of another run estimator known as the BaseRuns formula:
BsR = (BaseRunners * ScoreRate) + HR
In this formula if you know the number of base runners and the number of homeruns hit you can easily calculate the efficiency with which that team plated its runners by solving for the ScoreRate. The top teams of 2004 in terms of ScoreRate were:
Angels .328
Red Sox .320
Rangers .318
White Sox .315
Orioles .314
Only two of the teams at the top of the James list make this one while the Yankees finish 9th and the Royals 13th. The least efficient teams are the Diamondbacks, Brewers, and Expos with the Cubs coming in near the bottom at 25th (not a surprise to those who watched the Cubs closely in 2004).
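Solving the BaseRuns equation for ScoreRate takes one line of algebra. Here is a minimal sketch in Python that assumes you set BsR equal to the runs a team actually scored; the team totals are hypothetical and the count of base runners is simply taken as an input, since how it is tallied depends on which version of BaseRuns you use.

```python
def score_rate(runs, base_runners, home_runs):
    """Rearrange BsR = (BaseRunners * ScoreRate) + HR for ScoreRate.

    Assumes BsR is replaced by actual runs scored, so the result is the
    share of non-homerun base runners who came around to score.
    """
    return (runs - home_runs) / base_runners


def base_runs(base_runners, rate, home_runs):
    """The BaseRuns estimate as written in the post."""
    return base_runners * rate + home_runs


# Hypothetical team: 800 runs, 160 homeruns, 2000 base runners.
rate = score_rate(runs=800, base_runners=2000, home_runs=160)
print(f"ScoreRate = {rate:.3f}")  # prints ScoreRate = 0.320
```

Plugging the rate back into base_runs returns the original run total, which is a handy sanity check when running this over a full league.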
What's more interesting in this section, however, is James' new formula for estimating the number of runs a team should give up given their pitching statistics. The formula, which has the familiar A*B/C construction and which he calls Expected Runs Allowed or ExRA is:
A = H+BB+HBP+(.7*Errors)-DP
B = (HR*4)+((H-HR)*1.048)+Errors+(.7*(PB+Balks+WP))+(.32*(BB+HBP+IBB))
C = BFP (Batters Facing Pitcher)
Using this formula the most efficient defensive teams in 2004 were:
Braves 108
Cubs 104
Rangers 104
Mets 103
Astros 103
Generally, the ExRA favors the National League where 10 of the 16 teams are above 100 while in the American League only 6 of the 14 are over 100. Once again, this formula seems to underpredict the number of runs given up for NL teams.
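For the curious, here is a rough sketch of the ExRA calculation in Python. Two things are assumptions on my part: the grouping of the last two terms of B as .7*(PB+Balks+WP) and .32*(BB+HBP+IBB), since the parentheses in the formula as I typed it are ambiguous, and the pitching totals, which are invented for a hypothetical staff.

```python
def expected_runs_allowed(*, h, bb, hbp, errors, dp, hr,
                          pb, balks, wp, ibb, bfp):
    """Expected Runs Allowed as A * B / C.

    The grouping of the passed ball and walk terms in B is an assumed
    reading of the formula, not something confirmed from the book.
    """
    a = h + bb + hbp + 0.7 * errors - dp
    b = (4 * hr
         + 1.048 * (h - hr)
         + errors
         + 0.7 * (pb + balks + wp)
         + 0.32 * (bb + hbp + ibb))
    c = bfp
    return a * b / c


# Hypothetical season totals for a pitching staff and its defense.
staff = dict(h=1450, bb=520, hbp=55, errors=105, dp=145, hr=170,
             pb=10, balks=5, wp=50, ibb=40, bfp=6250)

print(f"ExRA = {expected_runs_allowed(**staff):.0f} runs")
```

Comparing that estimate against the runs a team actually allowed gives the kind of efficiency index shown in the list above.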
Prediction is Difficult, Especially About the Future
Another new section of the book deals with player projections both for 2005 and career, ostensibly to compete with the PECOTA system used by Baseball Prospectus. In the introduction to this section as the discussion focused on the difficulty of projecting career statistics (they have Bonds for 918 homeruns), I found this comment interesting.
"We are all in a kind of denial about Barry Bonds' skills. We have a well-established notion of what it is possible for a hitter to do, based on our experience with hundreds or thousands of other players. It is hard to get used to the fact that Bonds does not fit within that box - but he very clearly does not. He's different."
I would think that with the BALCO revelations most people now believe that Bonds does not fit within the box because he's playing by chemically enhanced rules. James' comments here seem to be a bit of a departure from what he said in a book I received at Christmas, Brushbacks and Knockdowns, which I highly recommend, where he argues against the notion that Bonds used steroids - or at least against the notion that it can be shown statistically that he did. My own position is that Bonds' performance is so far above the normal career trajectory and correlated with 40 pounds of lean muscle mass after the age of 35 that a reasonable person should assume he's had some help. As an aside, MLB this week announced a new steroids policy that gets a bit tougher by allowing random tests during both the season and off-season although the penalties (you have to be caught four times before you're suspended for a year) are entirely too weak.
But back to the projections I couldn't help but share this one:
Player             AB    H  2B  3B  HR   R  RBI  BB   SO  AVG/OBP/SLG
Calvin Pickering  448  124  21   2  34  83  103  98  131  .277/.407/.560
This is why those who subscribe to sabermetric evaluation are so high on Pickering. He appears to have the offensive skills needed to actually hit at the major league level and yet he has never gotten the opportunity to compete for a job. Traditional methods of player evaluation overemphasize his bulk and slowness while undervaluing his power and plate discipline. Of course, he won't get 448 at bats this season in Kansas City unless Mike Sweeney and Ken Harvey are both in traction but it's nice to dream. By the way, Harvey's 2005 projection is .276/.321/.425, making him almost useless as a first baseman even assuming he was a good fielder, which he's not.
These kinds of severe differences over the evaluation of players between the sabermetric community and the traditional community stem from the differences in how statistics are viewed. For the traditionalist statistics are a record of the past while for the sabermetrician they are the key to the future. In the words of Adam Smith, "Knowing what has happened is the most important part of knowing what's going to happen."
The other part of the player projection system that caught my interest was probability of injury. This was developed by Sig Mejdal using a new database of player injuries. Each player is ranked with a low, medium, or high probability of injury given their age, position, and injury history. Not surprisingly, Mejdal found that the most important predictor of injury was past injuries. He then calculated the probabilities of players sustaining all different kinds of injuries. By totaling them up he came up with those who are the likeliest to be injured in 2005.
Ken Griffey Jr. .359
Cliff Floyd .357
Mark McLemore .331
Sammy Sosa .324
No surprises here. And of course Mike Sweeney led the league in the probability of suffering a back injury at 10.6%, a figure that seems too low. James thinks these figures are too low since they don't seem to take into account career ending injuries. Anyway, it's interesting to look at the ratings when thinking about who a team should sign.
Mejdal also chimed in a bit on the debate about pitchers' injuries that I touched on in my review of The James/Neyer Guide to Pitchers. Particularly, Mejdal found that high pitch outings (the basis of the Pitcher Abuse Points or PAP used by Baseball Prospectus) didn't add any more predictive power for injuries to pitchers once past injury history and general usage over the last two seasons were factored in. In other words he didn't find evidence that the PAP system as a measure of high-pitch outings could be used to predict pitcher injuries.
However, I did see that Mejdal did "discover a noteworthy correlation regarding the number of high pitch outings...experienced by youthful pitchers (i.e. 25 years or less) and later shoulder injuries." As I said previously, I think this is the crux of the disagreement between James on one side and Keith Woolner and Rany Jazayerli (the inventors of PAP) on the other over the effectiveness of PAP. High pitch outings simply don't affect mature pitchers in the way they affect younger pitchers, which waters down PAP's predictive power. This was the heart of the argument made by Craig Wright in The Diamond Appraised twenty years ago, as Mejdal notes.
The Rest
The bulk of the book contains the familiar player register that includes hitting, pitching, and fielding along with the leader boards jam-packed with fascinating stats like the fact that Brian Anderson, in an otherwise horrible year, actually led the league by allowing opposing baserunners a stolen base percentage of just 20%. Or that Rich Harden led the AL with an average fastball speed of 94.5 mph with Tim Wakefield predictably the slowest at 75.9 (which is why he threw only 9% fastballs, also the lowest in baseball).
There is also a section on the ballparks where I noticed that Kauffman Stadium ended the season with a homerun index of 74 (26% fewer homeruns at the K than on the road) while in the previous two years the number stood at 120.
Posted by Dan Agonistes at 5:39 AM 0 comments
Saturday, January 15, 2005
Sweeney Frustrated
Mike Sweeney had this to say today in the KC Star:
“It looks like the plan isn't what I thought it would be. It is a bit frustrating. I'm not the owner. And I'm not the GM. But I am a player who wants to win. It's frustrating when you're told they're going to build the team around you. I was willing to do that. I said, ‘Just show me that it'll be worth it for me.' Not financially, but just show me you're going to build a team, a winning team, around me. I don't feel like Mr. Glass lied to me. He's a fine man. I just feel like I've been misled a little bit.”
Sweeney says David Glass told him the payroll would inch up towards the $60M mark instead of being stuck down at $42M to $45M where it's been the last two seasons.
I think a lot of fans are a bit frustrated with this offseason, in which it doesn't appear the Royals have made any moves to make them better in the long term. Acquiring Terrence Long, Eli Marrero, Jose Lima, and Dennis Tankersley isn't likely to improve the team's fortunes significantly in 2005. They won't be worse, but they won't be much better either.
It feels a lot different than last year at this time when we were all excited about the possibilities. But what do I know, I was one of those excited and then the wheels fell off and the boat developed a gaping hole.
That said, I'm still looking forward to my trip to Spring Training March 19-22 to catch a few Royals and Cubs games in the land of the sun. Can't wait.
Posted by Dan Agonistes at 12:23 AM 0 comments
Wednesday, January 12, 2005
Creating Exchange Appointments with WebDAV
The project I'm currently working on has a requirement that appointments be created in Exchange by a Windows Service. To do this I decided to use WebDAV on Exchange 2003. Because it took a few hours to figure out how it works I thought others might benefit from the following links:
1. Here is the link to the Exchange SDK base code to get you going
2. Here is the link to a couple of methods that perform the synthetic logon to OWA that you'll need if you are attempting to do this with forms authentication enabled on the Exchange server. You'll need to add the cookies that are retrieved in this snippet to the HttpRequest object before making the request
3. Here is the link you'll need if you're attempting to do this over SSL when the certificate needs to be accepted. You'll need to call this method before performing the forms authentication.
With these three pieces you can create a little ExchangeHelper class with a method that creates an appointment. Happy Exchanging!
Posted by Dan Agonistes at 2:03 PM 0 comments
Monday, January 10, 2005
Colorado Bound
Well, the last month has certainly been a time of change for me and my family. After resigning from Quilogy after nine years I've now accepted a position in Colorado Springs with Compassion International. Compassion is a Christian ministry that links sponsors with children in need in over 20 countries. Their credo is "Releasing children from poverty in Jesus' name". Needless to say I'm excited about the opportunity to use my IT skills and experience to serve in this fashion. Currently Compassion assists over 600,000 children, a number which will be increasing in the coming years.
Of course it also means a move from the Kansas City area, which we'll greatly miss as we've very much enjoyed our 10 years in Overland Park and Shawnee since our move from Houston. To that end our house in western Shawnee (west of I-435 off Monticello in the Lakepointe subdivision) is now on the market. Click here to go to the listing on realtor.com.
Posted by Dan Agonistes at 5:38 PM 0 comments
Saturday, January 08, 2005
Scouting vs. Sabermetrics
Here's an excellent roundtable discussion hosted by The Numbers Game author Alan Schwarz on Baseball America's site. It's styled as a debate on the merits of the "Moneyball" philosophy versus traditional scouting. On the sabermetric side are Gary Huckabay of Baseball Prospectus and Voros McCracken, both of whom do analysis for major league clubs. On the scouting side are Eddie Bane, the Angels scouting director, and Gary Hughes, the Cubs assistant General Manager.
As a Cubs fan and a proponent of sabermetric analysis I became more and more frustrated by the responses of Gary Hughes as I continued to read. Here are just three short snippets to whet your appetite (or turn your stomach depending on your viewpoint)...
On Evaluating Hitting Prospects
"ALAN SCHWARZ: OK, so it's the trading deadline, and you want to evaluate another team's Double-A right-field prospect. Everyone agrees that he has considerable skills, and you're going to scout him for three games. How will you evaluate what kind of asset he might be for your big league club a few years from now?
GARY HUGHES: You'll have a history coming in, but you'll evaluate his five tools. You'll compare what you have on your own club. You'll think about what your immediate needs are and what your long-term needs are. And you'll make your decision based on your feeling.
EDDIE BANE: The first thing I do when I get to the ballpark is, I don't care about his right-field play. I don't care about his running speed. I want to see him hit. If he don't hit, I don't have to stay three days. I'm going to pick up the stat sheet--I'm going to look at the strikeouts and walks. I'm going to look at the batting average. I'm going to know all that stuff because I've been on the computer. But if I don't think this guy can hit for the Anaheim Angels, the other stuff is secondary.
ALAN SCHWARZ: But what would you have to see to be encouraged?
GARY HUGHES: The swing, the approach at the plate, the show of fear.
EDDIE BANE: If you show fear, you're gone.
VOROS McCRACKEN: How would someone show fear?
GARY HUGHES: There would be a little give at the plate.
EDDIE BANE: You give on a pitcher with a decent slider . . .
VOROS McCRACKEN: That happens to everyone--everyone gets their knees buckled every once in a while. So if you rule a guy out that gets his knees buckled, that seems extreme. You'd need to see him show fear a bit more consistently. I'm not sure . . .
EDDIE BANE: I am sure. Because if I see fear in a hitter, I'm not ever coming back. I don't see fear in good big league hitters. I know that they get fooled and they'll bail on balls. But for me, that's a different term than fear."
For me, this exchange was disheartening because it gets to the heart of the debate over the meaning of baseball statistics that I touched on in my review of The Thinking Fan's Guide to Baseball. Here you have Gary Hughes and Eddie Bane saying that over a three game stretch they would rule out a guy based on his reaction to a nasty slider! While they give a slight nod to looking at the stat sheet they are clearly more trusting of what their eyes tell them. The underlying reason for that, which they emphasize in another part of the debate (and which was echoed by Leonard Koppett in his book), is that they view statistics as encapsulating the past but their observation as encapsulating the future.
Here's the problem with that approach in a nutshell. Three games (12 plate appearances) is an incredibly small sample size in which to tell anything definitive about a hitter. I would argue that a player's past performance is almost always a better indicator of how he'll perform in the future. This coupled with the biases of human perception such as the tendency to magnify singular events and forget a slew of "non-events" makes the observational approach even less reliable. This is not to mention the frustration with the fact that a team might be about to make a $300,000 decision on a guy and yet doesn't have the discipline to avail themselves of all the tools (and most of them are free) out there.
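To put a rough number on how little 12 plate appearances can tell you, here is a small sketch in Python. It leans on some simplifying assumptions of mine: every plate appearance is treated as an at bat and as an independent trial with a fixed chance of a hit, and the two "true" batting averages (.300 and .220) are hypothetical.

```python
from math import comb, sqrt


def binom_pmf(k, n, p):
    """Probability of exactly k hits in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


def prob_at_most(k, n, p):
    """Probability of k or fewer hits in n trials."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


n = 12                      # three games' worth of plate appearances
good, weak = 0.300, 0.220   # hypothetical true talent levels

std_error = sqrt(good * (1 - good) / n)
print(f"Standard error of a {good:.3f} hitter's average over {n} PA: "
      f"{std_error:.3f}")
print(f"Chance the {good:.3f} hitter gets 2 hits or fewer: "
      f"{prob_at_most(2, n, good):.0%}")
print(f"Chance the {weak:.3f} hitter gets 4 hits or more: "
      f"{1 - prob_at_most(3, n, weak):.0%}")
```

Under these assumptions the spread over three games is far wider than the gap between a good hitter and a weak one, so the weaker hitter will look like the better one a fair share of the time.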
On DIPS
ALAN SCHWARZ: One thing that Eddie and Gary, you might not be aware of, is that a few years ago Voros came up with something called Defense Independent Pitching Stats, which . . .
EDDIE BANE: Alan, you said, "You guys may not be aware." That's one of the things we're battling. We are aware. I read these guys' stuff all the time.
ALAN SCHWARZ: I said, "May not be aware." Gary, have you ever heard of DIPS?
GARY HUGHES: No.
ALAN SCHWARZ: OK then! (Laughter)
Ok, so here you have an assistant GM of a major league ballclub who has no idea what DIPS is. Even though I know that many "baseball men" are not comfortable with numbers I can't fathom that a front office wouldn't at least be conversant about something like DIPS, which has been around for 5 years, and which had a major impact on the baseball research community. This is their industry and it could potentially provide them a competitive advantage and yet they've never heard of it? If you're not a Cubs fan feel lucky.
In another part of the conversation Schwarz floats the idea that perhaps the GM role be a platoon of sorts with sabermetric and scouting functions that coalesce to make decisions. This certainly appears to be the trend as more and more teams hire statistical consultants. However, since they are consultants they likely have less input than full-time employees. While I've argued in the past that an analyst's role is best served by a degree of separation from the day to day workings of the team, this is one area where that model might be less than optimum.
On Minor/Major League Equivalency
ALAN SCHWARZ: That gets us to this question--do you guys think Triple-A stats can predict player performance in the majors?
GARY HUGHES: I don't know. I can't answer that. That's not my thing.
VOROS McCRACKEN: I think to the extent that that's your answer, that you don't really know . . .
GARY HUGHES: I don't think you know.
VOROS McCRACKEN: I don't know. But I do have an idea. I have looked at stats for tons of Triple-A players, and what they've done in the major leagues, and I think with this sort of information, I don't think that "I don't know" should be the final answer. I think, "I don't know, and I would like to find out" would be the better approach. I'm not sure that's always been the approach. I would say that you know almost as much about what a guy's going to do in the big leagues from his Triple-A stats as you do from his major league stats.
Once again here is an assistant GM admitting that he has no idea how to evaluate minor league statistics! One wonders what meetings with Jim Hendry and Gary Hughes are like when they're deciding who to add to the 40-man roster, invite to spring training, or take in the Rule 5 draft.
And unlike DIPS, the recognition of a relationship between minor and major league statistics has a history going back 20 years to the work of Bill James, as codified in one of the principles in Sabermetrics 101:
"Performance at the major league level can be predicted by performance at the minor league level and to a lesser degree in other leagues including college, the Japanese, and Mexican leagues"
There's lots more in the article which you'll have to read to believe.
Posted by Dan Agonistes at 4:30 PM 0 comments
Sandberg, Whitaker, and Trammell
With the election of Ryne Sandberg to the Hall of Fame this week I've heard a bit of rumbling about the apparent snubs of other middle infielders of the 1980s, namely Alan Trammell and Lou Whitaker. A quick look at their career lines shows just how similar they were:
          Seasons     G    AB     R     H   2B  3B   HR   RBI   SB    BB    SO   AVG   OBP  SLUG
Sandberg       16  2164  8385  1318  2386  403  76  282  1061  344   761  1260  .285  .344  .452
Trammell       20  2293  8288  1231  2365  412  55  183  1003  236   850   874  .285  .352  .415
Whitaker       19  2390  8570  1386  2369  420  65  244  1084  143  1197  1099  .276  .363  .426
To provide a measure of just how similar these stat lines are, Bill James in 1986 developed a system called "Similarity Scores" that he introduced in his book The Politics of Glory, which dealt with the Hall of Fame. This system starts by assigning two identical statistical lines a Similarity Score of 1000. From there, points are subtracted for each difference between the two players' statistics in the following fashion, as reported on baseball-reference.com:
One point for each difference of 20 games played.
One point for each difference of 75 at bats.
One point for each difference of 10 runs scored.
One point for each difference of 15 hits.
One point for each difference of 5 doubles.
One point for each difference of 4 triples.
One point for each difference of 2 home runs.
One point for each difference of 10 RBI.
One point for each difference of 25 walks.
One point for each difference of 150 strikeouts.
One point for each difference of 20 stolen bases.
One point for each difference of .001 in batting average.
One point for each difference of .002 in slugging percentage.
There is also a positional adjustment:
240 - Catcher
168 - Shortstop
132 - Second Base
84 - Third Base
48 - Outfield
12 - First Base
0 - DH
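The rules above are mechanical enough to sketch in a few lines of Python. This is only a rough illustration, not the baseball-reference.com implementation. It assumes that partial points count (a 17-hit gap costs 17/15 of a point), that the positional adjustment is applied by subtracting the difference between the two players' positional values, and that Whitaker's stolen-base total is his career figure of 143. The names UNITS, POSITION_VALUE, and similarity_score are made up for this sketch.

```python
# Points lost per "unit" of difference, taken from the list above.
UNITS = {
    "G": 20, "AB": 75, "R": 10, "H": 15, "2B": 5, "3B": 4, "HR": 2,
    "RBI": 10, "BB": 25, "SO": 150, "SB": 20, "AVG": 0.001, "SLG": 0.002,
}

# Positional values from the list above.
POSITION_VALUE = {"C": 240, "SS": 168, "2B": 132, "3B": 84, "OF": 48, "1B": 12, "DH": 0}


def similarity_score(a, b, pos_a, pos_b):
    """Start at 1000 and subtract points for each statistical difference."""
    score = 1000.0
    for stat, unit in UNITS.items():
        # Assumption: fractional points are allowed (no rounding per category).
        score -= abs(a[stat] - b[stat]) / unit
    # Assumption: the positional adjustment is the gap between the two values.
    score -= abs(POSITION_VALUE[pos_a] - POSITION_VALUE[pos_b])
    return score


sandberg = {"G": 2164, "AB": 8385, "R": 1318, "H": 2386, "2B": 403, "3B": 76,
            "HR": 282, "RBI": 1061, "BB": 761, "SO": 1260, "SB": 344,
            "AVG": 0.285, "SLG": 0.452}
trammell = {"G": 2293, "AB": 8288, "R": 1231, "H": 2365, "2B": 412, "3B": 55,
            "HR": 183, "RBI": 1003, "BB": 850, "SO": 874, "SB": 236,
            "AVG": 0.285, "SLG": 0.415}
# SB here is Whitaker's career stolen-base total (143).
whitaker = {"G": 2390, "AB": 8570, "R": 1386, "H": 2369, "2B": 420, "3B": 65,
            "HR": 244, "RBI": 1084, "BB": 1197, "SO": 1099, "SB": 143,
            "AVG": 0.276, "SLG": 0.426}

sandberg_whitaker = similarity_score(sandberg, whitaker, "2B", "2B")
sandberg_trammell = similarity_score(sandberg, trammell, "2B", "SS")
print(f"Sandberg vs. Whitaker: {sandberg_whitaker:.1f}")
print(f"Sandberg vs. Trammell: {sandberg_trammell:.1f}")
```

Under those assumptions the Sandberg/Whitaker score comes out at about 900 and the Sandberg/Trammell score in the mid-850s. The 36-point shortstop versus second base gap accounts for a good chunk of the Trammell difference, with the home run gap costing even more.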
Calculating the similarity scores for these three (also reported on baseball-reference.com), we see that the most similar player in history to Ryne Sandberg is indeed Lou Whitaker at 900. Alan Trammell is the 5th most comparable player to Ryno (857) behind Steve Finley, Joe Torre, and Barry Larkin. Looking at Trammell, Whitaker is 5th at 868 while Sandberg is 7th at 857. For Whitaker, Sandberg comes in first and Trammell second. So clearly these three are very similar indeed.
So why did Whitaker's only appearance on the voting rolls garner him just 15 votes in 2001 (2.91%) while Trammell received 74 (15.7%), 70 (14.1%), and 70 (13.8%) votes in the period 2002-2004?
I think there are a couple of reasons.
First, Ryno was perceived as the better all-around player. He had a better reputation for his defense than either Trammell or Whitaker, as evidenced by his nine Gold Gloves to Trammell's four and Whitaker's three. Sandberg also stole more bases than either of the other two by a wide margin and was the more prolific homerun hitter, not to mention the cachet that goes with being the all-time leader in homeruns at his position when he retired. In contrast Trammell had only one truly great season, 1987, when he hit .343 with 28 homeruns and 105 runs driven in. His other good seasons are a good deal below this level. Whitaker had several fine seasons (1983, 1987, 1991) but none match the level of Sandberg's 1984, 1990, and 1991 seasons.
Secondly, Sandberg had a higher profile because he played in Chicago rather than Detroit, won an MVP award at a young age (the "Sandberg game" certainly didn't hurt), maintained an errorless streak, led his league in runs scored three times, and homeruns and total bases once (1990). In the final analysis election to the Hall is at its essence a popularity contest.
Essentially these sorts of accomplishments can also be boiled down into ratings. Several of these, such as the Black Ink Test, Gray Ink Test, HOF Standards, and HOF Monitor, are also tracked on baseball-reference.com. All four of these were also created by James and began to be introduced in The Politics of Glory. Looking at these we see the following:
Black Ink: Batting - Average HOFer ~ 27
Gray Ink: Batting - Average HOFer ~ 144
HOF Standards: Batting - Average HOFer ~ 50
HOF Monitor: Batting - Likely HOFer > 100
Sandberg
Black Ink: Batting - 14 (162)
Gray Ink: Batting - 134 (121)
HOF Standards: Batting - 42.7 (120)
HOF Monitor: Batting - 157.5 (62)
Whitaker
Black Ink: Batting - 1 (692)
Gray Ink: Batting - 31 (720)
HOF Standards: Batting - 42.8 (119)
HOF Monitor: Batting - 92.5 (160)
Trammell
Black Ink: Batting - 0
Gray Ink: Batting - 48 (499)
HOF Standards: Batting - 40.4 (141)
HOF Monitor: Batting - 118.5 (112)
The numbers in parentheses are the career rankings.
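One way to put these numbers side by side is to express each score as a fraction of its benchmark. The short Python sketch below does that with the figures listed above. The 80% cutoff for calling a score "close" is purely my own assumption for illustration (James did not define one), and the names BENCHMARKS, SCORES, and measures_met are made up for the example.

```python
# Benchmarks and scores exactly as listed above.
BENCHMARKS = {"Black Ink": 27, "Gray Ink": 144, "HOF Standards": 50, "HOF Monitor": 100}
SCORES = {
    "Sandberg": {"Black Ink": 14, "Gray Ink": 134, "HOF Standards": 42.7, "HOF Monitor": 157.5},
    "Whitaker": {"Black Ink": 1, "Gray Ink": 31, "HOF Standards": 42.8, "HOF Monitor": 92.5},
    "Trammell": {"Black Ink": 0, "Gray Ink": 48, "HOF Standards": 40.4, "HOF Monitor": 118.5},
}

CLOSE_CUTOFF = 0.80  # Assumption: "close" means at least 80% of the benchmark.


def measures_met(scores, cutoff=CLOSE_CUTOFF):
    """Return the measures where a player is close to or over the benchmark."""
    return [name for name, bench in BENCHMARKS.items() if scores[name] / bench >= cutoff]


for player, scores in SCORES.items():
    ratios = ", ".join(f"{name} {scores[name] / bench:.0%}" for name, bench in BENCHMARKS.items())
    print(f"{player}: {ratios} -> {len(measures_met(scores))} of 4")
```

Moving the cutoff changes the counts, so treat the tallies as a rough guide rather than a verdict.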
From these we can see that Sandberg is close to or over the benchmark in three of the four measures (all but Black Ink), while the other two get there only in HOF Standards and HOF Monitor, with Trammell's 118.5 on the Monitor the only mark of theirs that actually clears the bar. So while I'm not surprised that calls for the induction of Whitaker and Trammell are being made, it seems clear that Ryno's career has more of the aspects that generally result in election.
Of course, one of the arguments made in defense of Trammell and Whitaker is that using sabermetric analysis their careers look equivalent to or slightly better than Sandberg's. This is the case since both walked more than Ryne, a skill that is still undervalued. This shows up in the fact that Whitaker's career OPS+ is 117 to Sandberg's 114 and Trammell's 110. So I certainly grant that from a career offensive perspective there isn't much to separate these three. However, Bill James has also created a system called Win Shares that attempts to allocate portions of team wins to individual players using both offense and defense. In analysis shared on the SABR-L list this week, Cyril Morong found that Sandberg is 13th among second basemen in Win Shares per 648 plate appearances with 24.16 and Whitaker is 21st with 22.82.
Perhaps I'm biased but I do think that Ryno's career was superior to either of the others judged by his peak performance level and broader array of skills. So if any of them deserves induction it would be Sandberg.
Posted by Dan Agonistes at 7:30 AM 4 comments
Wednesday, January 05, 2005
Galileo's Daughter
Recently I assigned my 9-year-old daughter a report on the life of Galileo (1564-1642). In order to better educate myself I listened to the unabridged audio tapes of the book Galileo's Daughter: A Historical Memoir of Science, Faith, and Love, by Dava Sobel.
What is unique about this book is that it reveals another side of Galileo through the 125 or so surviving letters written by his eldest child, his daughter Virginia (Sister Maria Celeste), who lived in the local convent of San Matteo with her sister from an early age until her own death in her early 30s in 1634. Galileo never married Virginia's mother, and so she was sent to the convent because she was deemed unmarriageable. Unfortunately, none of Galileo's replies survive.
Virginia's wonderfully written letters are sprinkled throughout the book and are offered as a backdrop to the fairly straightforward biography of Galileo presented in the book. In that sense it's not really about Virginia's life but rather is an illumination of Galileo's life through the eyes of the daughter he called "a woman of exquisite mind, singular goodness, and tenderly attached to me" and to whom Galileo was very much devoted.
Not having read a biography on this subject before I was interested in both aspects but was especially drawn to the peeks at daily 17th century life offered by Virginia when she speaks of medicines she concocted in the convent's apothecary to battle bubonic plague, the constant neediness of the convent, and the care of her father's estate while he was off in Rome before the Inquisition and later under a form of arrest at a bishop's residence in Siena some 40 miles from Florence. Virginia apparently even copied the manuscript of the Dialogue Concerning the Two Chief World Systems in preparation for its printing in 1632 and helped secure important papers left at his villa when he was before the Inquisition.
I was also struck in the early parts of the book by how much Galileo's father, through his approach to music, likely influenced the observational approach that Galileo applied to astronomy (then lumped together with mathematics). Sobel also definitely paints Galileo as one who tried to work within the Catholic faith and seems to support the idea that Galileo really didn't believe he had taught contrary to the Ptolemaic system and did not believe in the Copernican system. Other tidbits I've read paint a starker picture of Galileo as a man who was firmly convinced of the Copernican system early on and who knew full well what he was doing. For example, Sobel downplays the legendary comment Galileo reportedly made under his breath, "Eppur si muove" (And yet it moves), after signing his recantation before his Inquisitors.
Unfortunately, one of the questions that remains unanswered is how his daughter felt about the matter. While it's clear from her letters that she revered him and felt he was a brilliant man being smeared, you don't really get a picture of what this very intelligent woman believed about such things.
One of the interesting aspects of Galileo's work that was just touched on in the book pertains to his observations of Saturn. Being committed to the doctrine of observation over authority, Galileo was convinced that Saturn was a composite object made up of three bodies: a large one in the center and a smaller one on each side. The relative weakness of his telescope did not allow for the detail needed to perceive the rings, which are easily seen in the 70mm refracting starter scope I purchased for the kids. Of course, it could also be that, as Stephen Jay Gould argued, the conceptual world in which he operated simply didn't have space for planets with rings and so while we could have "seen" them, he could not really see them. A few years later Galileo was shocked to observe the two smaller bodies apparently disappear as the rings turned edge-on to the earth. This was one puzzle he never solved.
The final chapter of the book that recounts the burial and subsequent re-burial of Galileo ends with a moving connection to Virginia which I won't spoil in case some of you haven't read the book.
Posted by Dan Agonistes at 7:27 AM 0 comments
Sandberg Gets the Nod
The highlight for Cubs fans yesterday was the election of Ryne Sandberg to the Hall of Fame. This was his third year of eligibility and he was placed on 393 ballots or 76.2% (you need 75% to be elected). Last year he was on 61.1% of the ballots. Wade Boggs was also elected with 474 votes (91.9%) on his first try.
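The vote totals and percentages above are enough to back out how many ballots were cast and how thin Sandberg's margin was. Here is a quick Python check. It assumes the quoted percentages are rounded to one decimal place and that the 75% threshold is rounded up to a whole vote, and the helper name implied_ballots is made up for this sketch.

```python
import math


def implied_ballots(votes, pct):
    """Back out the number of ballots cast from a vote total and its percentage."""
    return round(votes / (pct / 100))


ballots_from_sandberg = implied_ballots(393, 76.2)
ballots_from_boggs = implied_ballots(474, 91.9)

# Both players' lines should point to the same number of ballots.
ballots = ballots_from_sandberg

# Assumption: 75% of ballots, rounded up to a whole vote, is needed for election.
votes_needed = math.ceil(0.75 * ballots)
margin = 393 - votes_needed

print(ballots_from_sandberg, ballots_from_boggs)  # 516 516
print(votes_needed, margin)                       # 387 6
```

In other words, both lines imply 516 ballots, and Ryno cleared the bar by just six votes.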
Although Sandberg had a relatively short career, playing just 16 seasons (he retired after the 1994 season and didn't play in 1995), he did have two major things going for him. First, he was considered the premier player at his position during his era (1983-1993). When you think of second basemen during that time Sandberg clearly comes out on top. He was an All-Star from 1984 through 1993. Second, he was a complete player combining both offensive and defensive accomplishments. He won the Gold Glove from 1983 through 1991 to go with his 1984 MVP award and 1990 homerun and total base crowns. He also stole 344 bases, topping 50 in 1985. When he retired he was the all-time leader in homeruns for a second baseman with 277 (since surpassed by Jeff Kent). He finished with 282 homeruns. From that perspective he was a little like watching Carlos Beltran play. He didn't have the best statistics but he could do four things to beat an opposing team: get on base, hit with power, steal bases, and play defense.
Cyril Morong has noted that Sandberg is 13th all-time at second base in Win Shares per 648 plate appearances through 2001 with 24.16. The only modern players who top him are Joe Morgan (29.29), Bobby Grich (25.94), Craig Biggio (25.72), and Roberto Alomar (24.96). That also puts him 146th all-time. His OPS+ for his career was a very respectable 114.
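For anyone curious how a "per 648 plate appearances" rate works, it is just career Win Shares scaled to a full-season workload (648 is 162 games times 4 plate appearances, which is presumably why that number is used). A minimal Python sketch follows. The career inputs for Sandberg (346 Win Shares, 9,282 plate appearances) are my own assumed figures for illustration and are not taken from Morong's post, and the function name is made up.

```python
FULL_SEASON_PA = 648  # 162 games x 4 plate appearances


def win_shares_per_648(win_shares, plate_appearances):
    """Scale career Win Shares to a 648-plate-appearance season."""
    return win_shares / plate_appearances * FULL_SEASON_PA


# ASSUMED inputs for illustration only (not given in the post).
sandberg_rate = win_shares_per_648(346, 9282)
print(round(sandberg_rate, 2))

# Gap between the two rates quoted for Sandberg and Whitaker.
gap = 24.16 - 22.82
print(round(gap, 2))
```

With those assumed inputs the rate lands right around the 24.16 quoted above, and the gap to Whitaker works out to a bit more than one Win Share per full season.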
Sandberg was drafted in the 20th round by the Phillies in 1978 and traded to the Cubs with Larry Bowa for Ivan DeJesus. He played third base in 1982, starting the year about 1 for 31 before finally turning it around and posting very respectable rookie numbers (172 hits, 32 stolen bases, 103 runs scored, and a .271 AVG). Sandberg was known for his slow starts and generally did not start to hit until May.
Of course, the game that catapulted Ryno into the national spotlight was his performance against the Cardinals on June 23, 1984. In that game Sandberg went 5 for 6, hitting homeruns off of Bruce Sutter in the 9th and 10th innings to tie the score both times before the Cubs won it in the 11th. As quoted on MLB.com, Whitey Herzog, the Cards manager at the time, has said:
"One day I think he's one of the best players in the National League. The next day, I think he's one of the best players I've ever seen."
What I remember most about watching Sandberg play on WGN was how well he could hit fastballs. I recall always being surprised when pitchers would throw him a fastball that was anywhere over the belt. He could turn on anybody's fastball as well as take them to right, although he was essentially a pull hitter. He simply hit high fastballs as well as anyone I've ever seen. Consequently, the book on him was to throw sliders down and away and hope he chased them. As he got older, especially noticeable in 1996, he started to chase that pitch more and more often, usually resulting in a weak grounder to second. But he could still hit the fastball.
Congrats to Ryno!
Posted by Dan Agonistes at 6:40 AM 0 comments
Monday, January 03, 2005
Not so Tolerant
A discerning reader left this comment on my post about Forms of Tolerance:
Actually, I don't think that "epistemological tolerance" as defined here is really "tolerance" at all -- it's really acceptance which is quite different. To be sure, it's still referred to as tolerance by the left so that we have no real defense against it without sounding as if we're not tolerant -- so it's a verbal slight-of-hand -- Rather like calling everybody who doesn't accept and acknowledge the validity of the homosexual lifestyle as "homophobic." By accepting the word, "tolerance" in the phrase, "epistemological tolerance," we're buying into the game. Rather, we should be protesting against the left's use of the word "tolerance" at all, to denote something that really goes well beyond.
I couldn't agree more and my post should have ended with this warning against using the term incorrectly as the left often does.
Posted by Dan Agonistes at 9:49 PM 0 comments