
Sunday, January 16, 2005

The Bill James 2005 Handbook

I picked up a copy of the 2005 edition of the Bill James Handbook and was happy to find some very interesting new content. Here's the rundown.

A New, New Runs Created Formula
As I wrote in my series on run estimators, James changed his Runs Created formula for his 2002 book Win Shares in order to place the hitter in a neutral offensive context and remove the bias that Runs Created typically creates for teams and players with high total base and walk totals.

In this edition of the book he tweaks it once again with an eye towards reducing the estimates for teams that hit a lot of homeruns - namely those since the power surge post 1992. He does this by reducing the "B" factor of his equation (Runs Created has always been an (A*B)/C formula), which represents the runner advancement value of various offensive events. In the past he simply added Total Bases to (.24 * (BB+HBP-IBB)) + (.62 * SB) + (.5 * (SH+SF)) - (.03 * SO). His problem with this was that "it assumes that a home run does four times as much to advance a runner as a single does, which it really doesn't" and so he now assigns weights to the events like so:

B = (1B * 1.125) + (2B * 1.69) + (3B * 3.02) + (HR * 3.73) + (.29 * (BB+HBP-IBB)) + (.492 * (SH+SF+SB)) - (.04 * SO)

The first four factors add up to what he calls "adjusted Total Bases" and notes that "some other people may have come up with similar concepts in their own systems". I'm not sure if he's being facetious here, but obviously the most similar system in concept is Batting Runs, where offensive events are weighted by their value in producing runs. However, since James is here calculating only advancement value and not run value, his weights are between 2 and 3 times those of the Batting Runs formula (interestingly, the values for triples and homeruns are about 2.7 times the Batting Runs weights while those for singles and doubles are 2.4 and 2.2 times respectively). As you can see, his adjusted Total Bases value will be smaller for teams that hit a lot of doubles and homeruns and not greatly affected for those that hit a lot of singles and triples. Stolen bases have also been discounted and the value of walks slightly increased, although he gives no justification for either change.

This formula, says James, is 8% more accurate for teams from 1955-1992 and "significantly more accurate since 1992". He doesn't say by how much, however.
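To make the change to the "B" factor concrete, here is a minimal Python sketch that computes both the old and the new version from a stat line, using only the weights quoted above. The function names and the sample team line are my own invention for illustration and are not taken from the book.

```python
def old_b_factor(s):
    """Old advancement factor: plain Total Bases plus the smaller terms."""
    total_bases = s["1B"] + 2 * s["2B"] + 3 * s["3B"] + 4 * s["HR"]
    return (total_bases
            + 0.24 * (s["BB"] + s["HBP"] - s["IBB"])
            + 0.62 * s["SB"]
            + 0.5 * (s["SH"] + s["SF"])
            - 0.03 * s["SO"])


def new_b_factor(s):
    """New advancement factor: "adjusted Total Bases" plus the reweighted terms."""
    adjusted_tb = (1.125 * s["1B"] + 1.69 * s["2B"]
                   + 3.02 * s["3B"] + 3.73 * s["HR"])
    return (adjusted_tb
            + 0.29 * (s["BB"] + s["HBP"] - s["IBB"])
            + 0.492 * (s["SH"] + s["SF"] + s["SB"])
            - 0.04 * s["SO"])


# Hypothetical homerun-heavy team line (made-up numbers, not a real team).
power_team = {"1B": 900, "2B": 300, "3B": 20, "HR": 240, "BB": 550,
              "HBP": 50, "IBB": 40, "SB": 80, "SH": 30, "SF": 45, "SO": 1100}

print("old B:", old_b_factor(power_team))
print("new B:", new_b_factor(power_team))
```

For a line like this one the new B comes out lower than the old B, which is the direction James was after for teams that hit a lot of homeruns.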

Team Efficiency
A new section of this year's book deals with team efficiency, essentially taking a first stab at figuring out which teams are efficiently producing wins given their offensive and defensive elements.

On the offensive side there isn't much to write about. James applies his new Runs Created formula to teams and compares the result to the actual number of runs scored. Dividing actual runs by Runs Created and multiplying by 100 gives what he calls "Hitting Efficiency". The most efficient offensive teams of 2004 are all AL teams:

White Sox 106
Rangers 103
Royals 103
Yankees 101

In other words, the White Sox were the most efficient offensive team in baseball in 2004, scoring 6% more runs than their offensive elements would otherwise indicate (they scored 865 runs when the RC formula said 819). Every other team, including all the National League teams, is below 100 (the Reds are at 99, and all teams fall within 8%, with the Brewers lowest at 92). Only those four teams, then, scored more runs than would have been expected. This seems strange to me and indicates that his new Runs Created formula continues to overpredict runs, just not by as much as previously.
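The White Sox figure is easy to check. This sketch assumes Hitting Efficiency is simply actual runs divided by Runs Created, times 100, which is consistent with the 865 and 819 quoted above; the function name is mine.

```python
def hitting_efficiency(actual_runs, runs_created):
    """Actual runs as a percentage of the Runs Created estimate (assumed definition)."""
    return 100 * actual_runs / runs_created


# 2004 White Sox: 865 runs scored against a Runs Created estimate of 819.
print(round(hitting_efficiency(865, 819)))  # 106
```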

When I think about offensive team efficiency I'm generally drawn towards a calculation of how efficient teams are in getting runners around to score. This is the basis of another run estimator known as the BaseRuns formula:

BsR = (BaseRunners * ScoreRate) + HR

In this formula if you know the number of base runners and the number of homeruns hit you can easily calculate the efficiency with which that team plated its runners by solving for the ScoreRate. The top teams of 2004 in terms of ScoreRate were:

Angels .328
Red Sox .320
Rangers .318
White Sox .315
Orioles .314

Only two of the teams atop the James list make this one, while the Yankees finish 9th and the Royals 13th. The least efficient teams are the Diamondbacks, Brewers, and Expos, with the Cubs coming in near the bottom at 25th (not a surprise to those who watched the Cubs closely in 2004).
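Solving the simplified BaseRuns equation above for ScoreRate can be sketched as follows. Two assumptions are mine: actual runs scored are plugged in for BsR, and the base runner count is taken as a given input, since how to count base runners isn't spelled out here. The sample numbers are made up.

```python
def score_rate(runs, home_runs, base_runners):
    """Solve BsR = (BaseRunners * ScoreRate) + HR for ScoreRate.

    Assumes actual runs stand in for BsR and that base_runners has
    already been counted by whatever definition you prefer.
    """
    return (runs - home_runs) / base_runners


def base_runs(base_runners, rate, home_runs):
    """The simplified BaseRuns form used in the text."""
    return base_runners * rate + home_runs


# Hypothetical team: 850 runs, 200 homeruns, 2000 base runners.
rate = score_rate(850, 200, 2000)
print(f"ScoreRate: {rate:.3f}")
print("Round trip:", base_runs(2000, rate, 200))
```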

What's more interesting in this section, however, is James' new formula for estimating the number of runs a team should give up given its pitching statistics. The formula, which has the familiar A*B/C construction and which he calls Expected Runs Allowed or ExRA, is:

A = H+BB+HBP+(.7*Errors)-DP
B = (HR*4)+((H-HR)*1.048)+Errors+(.7*(PB+Balks+WP))+(.32*BB+HBP+IBB)
C = BFP (Batters Facing Pitcher)

Using this formula the most efficient defensive teams in 2004 were:

Braves 108
Cubs 104
Rangers 104
Mets 103
Astros 103

Generally, the ExRA favors the National League where 10 of the 16 teams are above 100 while in the American League only 6 of the 14 are over 100. Once again, this formula seems to underpredict the number of runs given up for NL teams.
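For anyone who wants to play with ExRA, here is a rough Python sketch of the A*B/C construction given above. One grouping is my assumption: I read the last term of B as .32 times the sum (BB+HBP+IBB), by analogy with the walk term in Runs Created, and the formula as printed may intend something else. The dictionary keys and the sample pitching line are hypothetical.

```python
def expected_runs_allowed(p):
    """ExRA = A * B / C, using the factors quoted in the text."""
    a = p["H"] + p["BB"] + p["HBP"] + 0.7 * p["E"] - p["DP"]
    b = (4 * p["HR"]
         + 1.048 * (p["H"] - p["HR"])
         + p["E"]
         + 0.7 * (p["PB"] + p["BK"] + p["WP"])
         # ASSUMPTION: .32 applies to the whole (BB+HBP+IBB) group.
         + 0.32 * (p["BB"] + p["HBP"] + p["IBB"]))
    c = p["BFP"]
    return a * b / c


# Hypothetical team pitching line (made-up numbers, not a real team).
staff = {"H": 1400, "BB": 500, "HBP": 50, "IBB": 40, "E": 100, "DP": 150,
         "HR": 170, "PB": 10, "BK": 5, "WP": 50, "BFP": 6200}

print(f"ExRA: {expected_runs_allowed(staff):.1f}")
```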

Prediction is Difficult, Especially About the Future
Another new section of the book deals with player projections both for 2005 and career, ostensibly to compete with the PECOTA system used by Baseball Prospectus. In the introduction to this section, where the discussion focuses on the difficulty of projecting career statistics (they have Bonds for 918 homeruns), I found this comment interesting:

"We are all in a kind of denial about Barry Bonds' skills. We have a well-established notion of what it is possible for a hitter to do, based on our experience with hundreds or thousands of other players. It is hard to get used to the fact that Bonds does not fit within that box - but he very clearly does not. He's different."

I would think that with the BALCO revelations most people now believe that Bonds does not fit within the box because he's playing by chemically enhanced rules. James' comments here seem to be a bit of a departure from what he said in a book I received at Christmas, Brushbacks and Knockdowns, which I highly recommend, where he argues against the notion that Bonds used steroids - or at least that statistically it can be shown that he did. My own position is that Bonds' performance is so far above the normal career trajectory, and so correlated with 40 pounds of lean muscle mass added after the age of 35, that a reasonable person should assume he's had some help. As an aside, MLB this week announced a new steroids policy that gets a bit tougher by allowing random tests during both the season and off-season, although the penalties (you have to be caught four times before you're suspended for a year) are entirely too weak.

But back to the projections, where I couldn't help but share this one:

                   AB    H  2B  3B  HR   R  RBI  BB   SO  AVG/OBP/SLG
Calvin Pickering  448  124  21   2  34  83  103  98  131  .277/.407/.560

This is why those who subscribe to sabermetric evaluation are so high on Pickering. He appears to have the offensive skills needed to actually hit at the major league level and yet he has never gotten the opportunity to compete for a job. Traditional methods of player evaluation overemphasize his bulk and slowness while undervaluing his power and plate discipline. Of course, he won't get 448 at bats this season in Kansas City unless Mike Sweeney and Ken Harvey are both in traction, but it's nice to dream. By the way, Harvey's 2005 projection is .276/.321/.425, making him almost useless as a first baseman even assuming he were a good fielder, which he's not.

These kinds of severe differences over the evaluation of players between the sabermetric community and the traditional community stem from differences in how statistics are viewed. For the traditionalist, statistics are a record of the past, while for the sabermetrician they are the key to the future. In the words of Adam Smith, "Knowing what has happened is the most important part of knowing what's going to happen."

The other part of the player projection system that caught my interest was probability of injury. This was developed by Sig Mejdal using a new database of player injuries. Each player is ranked with a low, medium, or high probability of injury given his age, position, and injury history. Not surprisingly, Mejdal found that the most important predictor of injury was past injuries. He then calculated the probabilities of players sustaining all different kinds of injuries. By totaling them up he came up with those who are the likeliest to be injured in 2005.

Ken Griffey Jr. .359
Cliff Floyd .357
Mark McLemore .331
Sammy Sosa .324

No surprises here. And of course Mike Sweeney led the league in the probability of suffering a back injury at 10.6%, a figure that seems too low. James thinks these figures are too low since they don't seem to take into account career ending injuries. Anyway, it's interesting to look at the ratings when thinking about who a team should sign.

Mejdal also chimed in a bit on the debate about pitchers' injuries that I touched on in my review of The James/Neyer Guide to Pitchers. In particular, Mejdal found that high-pitch outings (the basis of the Pitcher Abuse Points or PAP used by Baseball Prospectus) didn't add any more predictive power for injuries to pitchers once past injury history and general usage over the last two seasons were factored in. In other words, he didn't find evidence that the PAP system as a measure of high-pitch outings could be used to predict pitcher injuries.

However, I did see that Mejdal did "discover a noteworthy correlation regarding the number of high pitch outings...experienced by youthful pitchers (i.e. 25 years or less) and later shoulder injuries." As I said previously, I think this is the crux of the disagreement between James and Keith Woolner and Rany Jazayerli (the inventors of PAP) over the effectiveness of PAP. High-pitch outings simply don't affect mature pitchers in the way they affect younger pitchers, which waters down PAP's predictive power. This was the heart of the argument made by Craig Wright in The Diamond Appraised twenty years ago, as Mejdal notes.

The Rest
The bulk of the book contains the familiar player register that includes hitting, pitching, and fielding, along with the leader boards jam-packed with fascinating stats like the fact that Brian Anderson, in an otherwise horrible year, actually led the league by allowing opposing baserunners a stolen base percentage of just 20%. Or that Rich Harden led the AL with an average fastball speed of 94.5 mph, with Tim Wakefield predictably the slowest at 75.9 (which is why he threw only 9% fastballs, also the lowest in baseball).

There is also a section on the ballparks where I noticed that Kauffman Stadium ended the season with a homerun index of 74 (26% fewer homeruns at the K than on the road) while in the previous two years the number stood at 120.
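For readers unfamiliar with park indices, the arithmetic behind a figure like 74 can be sketched as below. I'm assuming the index is the home homerun rate per game divided by the road rate, times 100; the book's exact method may differ (it may, for instance, use at bats rather than games). The counts are invented so that the result lands on 74.

```python
def home_run_index(home_hr, home_games, road_hr, road_games):
    """Homeruns per game at home as a percentage of the road rate (assumed definition)."""
    home_rate = home_hr / home_games
    road_rate = road_hr / road_games
    return 100 * home_rate / road_rate


# Hypothetical counts: 148 homeruns in 81 home games, 200 in 81 road games.
print(round(home_run_index(148, 81, 200, 81)))  # 74
```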
