
Tuesday, August 31, 2004


Ichiro Suzuki picked up three more hits tonight to make it 56 for the month, breaking the record of 54 set by Alex Rodriguez in 1996. It also puts him at 212 for the season with 31 games to go. He'll need to average 1.45 hits per game to tie George Sisler's 1920 record of 257. So far he's averaged 1.63 hits per game in the games he's played, so it's certainly within reach.
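The pace calculation is simple enough to sketch in a few lines. This is only an illustration; the function name is made up for the example and the inputs are the figures quoted in this post.

```python
def hits_per_game_needed(target_hits, current_hits, games_left):
    """Average hits per remaining game needed to reach target_hits."""
    return (target_hits - current_hits) / games_left


# 257 is Sisler's record, 212 is Ichiro's total so far, 31 games remain
needed = hits_per_game_needed(target_hits=257, current_hits=212, games_left=31)
print(f"{needed:.2f}")  # 1.45
```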

The bigger question is, how important is the record?

In one sense getting 257 hits in a season is a remarkable accomplishment. On the other hand, for all of Ichiro's hitting he only ranks 14th in Value Over Replacement Player at 62.2 runs, behind J.D. Drew, Miguel Tejada, and Mark Loretta among others. Barry Bonds is tops at 117.8 in 115 fewer plate appearances. The simple reason is that Ichiro doesn't walk much, just 37 times thus far, and has only 34 extra base hits, a paltry 16% of his total. I heard it said the other night on ESPN's Baseball Tonight that Suzuki could hit for a lot more power if he wanted to, based on his performance in batting practice. If he really could turn on the power, he should.

Incidentally, the VORP leaders are a good indicator for who should get the MVP. The top 5 in the NL are:

Bonds 117.8
Pujols 79.5
Beltre 77.7
Helton 75.7
Edmonds 73.7

Nobody is even close to Bonds. Adrian Beltre is certainly making a run at it with his 42 homeruns but Bonds is still the most valuable.

Homerun Distribution

A question came up this week on the SABR list as to the distribution of homeruns. Using the 1992 play-by-play data from Retrosheet I ran a couple of queries:

HR by field

Field    HR     Pct
Left     1654   55%
Center   718    24%
Right    651    22%

HR by batter

Bats   HR     Pct
L      1116   37%
R      1909   63%

HR by field and batter (field 7 = left, 8 = center, 9 = right)

Bats   Field   HR     Pct
L      7       80     3%
R      7       1574   52%
L      8       445    15%
R      8       273    9%
L      9       591    20%
R      9       60     2%
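The queries themselves aren't shown, but the roll-ups can be sketched from the six batter-and-field counts. This is a rough Python illustration, not the original query; the helper names are invented, and reading fields 7, 8 and 9 as left, center and right (the usual scoring position numbers) is an assumption.

```python
from collections import defaultdict

# (batter hand, field) -> home runs, copied from the table above
hr_counts = {
    ("L", 7): 80, ("R", 7): 1574,
    ("L", 8): 445, ("R", 8): 273,
    ("L", 9): 591, ("R", 9): 60,
}


def totals_by(counts, key_index):
    """Sum home runs over one part of the key (0 = batter hand, 1 = field)."""
    totals = defaultdict(int)
    for key, hr in counts.items():
        totals[key[key_index]] += hr
    return dict(totals)


def shares(totals):
    """Whole-number percentage of the grand total for each group."""
    grand_total = sum(totals.values())
    return {k: round(100 * v / grand_total) for k, v in totals.items()}


by_field = totals_by(hr_counts, 1)
by_batter = totals_by(hr_counts, 0)
print(by_field, shares(by_field))
print(by_batter, shares(by_batter))
```

Summing the six cells gives a slightly smaller total than the by-batter counts, presumably because a couple of homeruns had no field recorded.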

Retrosheet also includes more sophisticated data including homeruns down the line. For example, of the 1654 homeruns to left they were distributed as:

Fly ball down the line: 273
Other fly balls: 1110
Line drive down the line: 77
Other line drives: 194

A BABE in the Woods

I noticed a week or two ago that Allen St. John in his "By The Numbers" column in the Wall Street Journal introduced a new rate stat to rank pitchers called BABE or bases per batter. This stat is calculated as follows:

BABE = (TB + BB) / BFP

where BFP is batters faced by pitcher. As you can see, this stat attempts to measure how many bases a pitcher gives up per plate appearance, something like Total Average (TA) for offensive players, developed by Thomas Boswell in the early 1980s. St. John makes the point that some traditional stats such as wins and losses don't accurately measure a pitcher's effectiveness and so a stat like BABE is needed. The example he gives is Randy Johnson's 11-8 record but .345 BABE, currently leading the league. I can't argue that BABE is a better measure than wins and losses, but is BABE really needed?
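As a quick illustration of the formula, here is a small Python sketch. The function name and the season line fed into it are hypothetical; they are not Randy Johnson's actual totals.

```python
def babe(total_bases, walks, batters_faced):
    """Bases allowed per batter faced: (TB + BB) / BFP."""
    if batters_faced <= 0:
        raise ValueError("batters_faced must be positive")
    return (total_bases + walks) / batters_faced


# Made-up season line used only to exercise the formula
example_babe = babe(total_bases=250, walks=50, batters_faced=870)
print(round(example_babe, 3))  # 0.345
```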

When you boil it down, wins and losses are fundamentally about runs. Indeed, Bill James calls runs the currency with which wins are purchased. That's the reason that when evaluating offensive contributions sabermetricians strive to estimate the number of runs a player has contributed through formulas like Linear Weights, Runs Created, Base Runs, and so on. It seems only natural that for pitchers we would want to do the same thing by estimating the number of runs they are responsible for. But we already have that with Earned Run Average. ERA organically captures the interaction of offensive events that result in runs. BABE merely adds up the individual bases and does not weight offensive events like Linear Weights does or take a multiplicative approach as does Runs Created.

And so as with TA, BABE cannot be as accurate (in terms of predicting how many runs a pitcher would give up) as these other formulas. Since it combines components of on base average and slugging percentage it will be more accurate than either of them taken separately and yet not as accurate as OPS (OBP + SLUG) or BRA (batter run average = OBP * SLUG) as demonstrated by Albert and Bennett in Curveball. If anything St. John would be better off looking at OPS against the pitcher.
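For reference, the two combined measures mentioned above are trivial to compute once OBP and SLUG are known. The sketch below uses a made-up opponents' line; the function names are just for illustration.

```python
def ops(obp, slg):
    """On-base plus slugging: OBP + SLUG."""
    return obp + slg


def bra(obp, slg):
    """Batter run average: OBP * SLUG."""
    return obp * slg


# Hypothetical line allowed by a pitcher: .300 OBP, .400 SLUG
print(round(ops(0.300, 0.400), 3))
print(round(bra(0.300, 0.400), 3))
```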

But ERA does have its problems, the two biggest of which are:

1) It doesn't include all the runs that the pitcher is responsible for, since some percentage of unearned runs will be the pitcher's fault. See this post for an attempt at assigning responsibility.

2) It doesn't work very well for relief pitchers since they pitch a lot of partial innings and come in with runners on base.

And so clearly at least for relief pitchers ERA is not the stat that should be used for comparisons. For starters and long relievers I think ERA is the appropriate measure. However, in order to compare all pitchers, rather than invent a new stat St. John should look no further than the existing sabermetric work of Component ERA (ERC) or Expected Earned Run Average (XERA), both of which were derived just for the purpose of comparing starters and relievers and which get to the heart of the matter - runs.

In summary, while BABE is not without its merits it can be safely discarded in the sabermetric trash can.


As you might expect one of the first questions that attendees of the devlab here in Redmond had this week was regarding the packaging and pricing of Visual Studio 2005 Team System. Prashant Sridharan answered some of those questions yesterday, assuring us this was not under NDA. Here are the highlights.

There are six different installable packages associated with Team System:

* Visual Studio Team Architect
* Visual Studio Team Developer
* Visual Studio Team Test
* Visual Studio Team Suite
* Visual Studio Team Foundation
* Visual Studio Team Foundation Client

The idea is that when the product releases, subscribers to MSDN Universal will be allowed to choose one of the first three SKUs and be grandfathered in and given that price in perpetuity. Those who haven't subscribed to MSDN Universal by the time the product releases will have to pony up at a significantly higher price than today. That’s a great incentive to get MSDN Universal before Team System is out. In addition, existing subscribers will be able to purchase the Team Suite for a smaller additional cost. Essentially Team Suite will be separately priced so that if you were going to buy two of the first three products, you’d be better off buying the suite.

The VS Team Foundation is the server portion and will always be sold separately. The licensing for it will be on a CAL model priced something like SQL Server standard edition and is targeted for one install per 500 users. There will also be the option of purchasing the Team Foundation Client which includes the client bits for integration with Excel, MS Project, and the VS shell in which to work with projects.

Visual Studio Team Developer can run in a stand-alone mode not connected to Team Foundation, which is essentially what was available in the beta 1.

Team Foundation Integration

The Visual Studio 2005 Team Foundation server is the core of VSTS and is the store for work items, source code control, reporting, build management, change management, and project management, along with the services that tools invoke. Internally it uses an instance of SQL Server 2005 (Yukon) and web services to interoperate with the various versions of VS.

Although the new releases of VS will interoperate with Team Foundation Microsoft does not have plans at this point to create the integration for either Visual Studio 6 or Visual Studio .NET 2003. It is expected that other tool vendors will create the integration for their tools. Creating the integration for the existing Microsoft dev tools seems to me like a major opportunity for a third party. A command-line interface will be available for tools like the source code control so developers can use Team Foundation without full shell integration.

VSTS and Process

Saw a great session today by Lori Lamkin, Group Program Manager for Visual Studio Team System. VSTS will ship with at least MSF 4.0 Agile and MSF 4.0 Formal. The following is an excerpt from Keith Rowe's blog:

"One process can't work for every project - so we're restructuring MSF with a base definition layer and a series of instantiations. In the first release, we will have two:

1. MSF Formal - aimed at larger scale, traditional projects that need and want a lot of 'ceremony' around handing off work products from group to group and phase to phase.

2. MSF Agile - aimed at smaller iterative development projects. We are very lucky to have hired Randy Miller, one of the big thinkers in the Agile movement, to help us design this version of MSF. It's still in development so I'll describe this in more detail in future postings.

Second, methodologies can drive the tools. We are introducing 'methodology templates' to the toolset. At project inception, you can select one of the methodology templates stored on your VS Team System server. This template describes:

a) the work item types ('bug', 'scenario', 'risk', etc) and their state transitions - this implements project workflow
b) predefined work item instances ('gather all user stories') to guide the work
c) check in rules that enforce policy around what a legal check-in looks like (e.g. 'all check-ins must be run through a buddy build')
d) report layouts used by the data warehouse to show project status
e) help files that describe how the methodology works

Thus, you can shape the behavior of the tools so that all team members can easily follow the prescribed methodology.

But, don't fear. We aren't shoving this down your throat. The templates are all editable. You can create your own or elect to not use one at all. We're already working with a number of third parties who plan to create their own methodology templates and I expect that there will be lots of them available from industrious individuals who want to show their own techniques. In my talks with customers so far, I expect everyone will end up editing these templates for their own use."

The real strength of what Team System offers here is illustrated when a task is created by a Project Manager in Excel or MS Project: the task (work item) is automatically added to the developer’s queue, which they then see in their IDE. As developers then update the task with costing information or completion data, the PM’s view of the data in their spreadsheet or Project document is updated automatically. Essentially, this kind of integration automates the transitions between the tools and the people, enabling a more friction-free flow of information. As Lori said in her talk: No status meetings, no logjams, no copy and paste.

Lori’s session also made clear that although there will likely be some limitations in the first release (single ownership of work items, no parent/child relationships), these tools will go a long way towards helping teams.

If you’d like to get some more information on MSF 4.0 Agile take a look at GotDotNet here.


This week we received a new version of the Community Technology Preview for Visual Studio 2005 Team System. From the email describing the release:

"The CTP kit will contain a Visual Studio 2005 Beta 1 Refresh and SQL Server 2005 Beta 2 (required prerequisite). This release is based on the Whidbey Beta 1 bits, but now includes the Team Foundation server installation. You will have a matched set of Whidbey compilers/editors/frameworks and Visual Studio 2005 Team System all in one place. While the Whidbey components are still Beta quality, not all Team System bits are beta quality yet."

For now you’ll need to run the Yukon database that stores the Team Foundation data on a separate server from the Team Foundation integration services (web services) because of dependencies on different versions of the CLR. At release the Team Foundation will ship as a single atomic “thingy” and will not be separable.

Visual Studio Team System Information

A few interesting links to VSTS related info that was passed along at the devlab this week...

Randy Miller's whitepaper titled An Integrated approach to Agile or Formal Software Development Process for Software Development magazine’s Oct issue is up online already.

Check out Sam Guckenheimer's whitepaper on MSF titled As simple as possible but no simpler, that was published last month.

Jack Greenfield and Keith Short's whitepaper advertorial on Moving to Software Factories was published online in July this year.

AutoTagging Ballplayers

Since I write a lot of posts that include the names of baseball players like Willie Mays, Sammy Sosa, Babe Ruth and others, I wanted to find a way to automatically create hyperlinks to the statistics for the players without having to search manually. Since I couldn't find anything out there that did this, I wrote a small Windows application using the .NET Framework.

Here's the simple interface.

When the application starts or is activated it copies whatever text is in the clipboard into its main window using the Clipboard object in the Framework. By clicking on Go it parses the text in the window looking for player names. I defined player names as any two consecutive words that begin with capital letters (I know this will miss some players like J.D. Drew). In order to deal with HTML that already contains hyperlinks and text with paragraphs, the program adds spaces around the angle brackets in the copy of the text it is parsing and replaces carriage return-line feed (CRLF) characters with spaces. It then splits the text into words using the Split method in the String class and searches for names, ignoring punctuation using the handy IsPunctuation method.

When it finds a name it may prompt the user if the checkbox is checked and if not go on to make an HTTP POST against the search page at using the classes in the System.Net namespace. On that site if the search does not return a unique player the same URL that was posted to is returned and so the program continues on. Of course this means that when multiple players with the same name are found, such as Randy Johnson, it will not tag the name. If the URL returned is different, it will be the player's page which is then saved and inserted into the text with a hyperlink. When all of the replacements are made, the text in the window is copied back to the clipboard so I can easily paste it into the blogging window.

Here is the part of the program that parses the text:

' Requires Imports System.Text (StringBuilder) at the top of the file
Private Sub Button1_Click(ByVal sender As System.Object, _
        ByVal e As System.EventArgs) Handles Button1.Click

    ' Parse the text for names,
    ' defined as 2 consecutive words that start with capital letters

    Dim text As String = TextBox1.Text

    ' Replacements are made against this untouched copy of the text
    Dim sbText As New StringBuilder(text)

    ' Make some room for hyperlinks
    text = text.Replace(">", " > ")
    text = text.Replace("<", " < ")
    text = text.Replace(vbCrLf, " ")
    ' Create an array of words
    Dim words() As String = text.Split(" "c)
    Dim searchText, s, searchTextOrig As String
    Dim lastWord As String = ""
    Dim firstUpper As Boolean = False
    Dim replacedText As New ArrayList
    Dim replacements As Integer = 0
    For Each s In words
        If s.Length > 0 AndAlso Char.IsUpper(s.Chars(0)) Then
            If firstUpper Then
                ' Last two are upper case so search
                searchText = lastWord & " " & s
                ' See if we've replaced it already
                If Not replacedText.Contains(searchText) Then
                    searchTextOrig = searchText
                    ' SearchForName is not shown in this excerpt; it is assumed
                    ' to take searchText ByRef and swap in the hyperlinked name
                    If SearchForName(searchText) Then
                        sbText.Replace(searchTextOrig, searchText)
                        replacedText.Add(searchTextOrig)
                        replacements += 1
                    End If
                End If
            End If

            ' Words that end in punctuation are not considered
            If Not Char.IsPunctuation(s.Chars(s.Length - 1)) Then
                firstUpper = True
                lastWord = s
            Else
                firstUpper = False
            End If
        Else
            firstUpper = False
        End If
    Next

    TextBox1.Text = sbText.ToString
    ' Put the data in the clipboard
    Clipboard.SetDataObject(TextBox1.Text, True)
    sbMessage.Panels(0).Text = "Done. " & replacements.ToString & _
        " replacements made."

End Sub

And here is the part that searches:

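' Requires Imports System.IO, System.Net and System.Text at the top of the file.
' searchUri is a form-level String holding the search page URL (not shown here).
Private Function SearchRef(ByVal searchText As String) As String

    Try
        Dim brHttp As HttpWebRequest = _
            CType(WebRequest.Create(searchUri), HttpWebRequest)

        ' *** Send the POST data
        ' (the continuation lines were lost from this listing; URL-encoding
        ' the name and sending ASCII bytes are assumptions)
        Dim brPostData As String = "search=" & _
            Uri.EscapeDataString(searchText)
        brHttp.Method = "POST"
        brHttp.ContentType = "application/x-www-form-urlencoded"

        Dim lbPostBuffer() As Byte = _
            Encoding.ASCII.GetBytes(brPostData)
        brHttp.ContentLength = lbPostBuffer.Length

        Dim sPostData As Stream = brHttp.GetRequestStream()
        sPostData.Write(lbPostBuffer, 0, lbPostBuffer.Length)
        sPostData.Close()

        ' Get the response and check and see if it's the search page
        Dim loWebResponse As HttpWebResponse = _
            CType(brHttp.GetResponse(), HttpWebResponse)
        Dim responseUri As String = loWebResponse.ResponseUri.ToString
        loWebResponse.Close()
        If responseUri <> searchUri Then
            ' Redirected away from the search page, so this is a player's page
            Return responseUri
        Else
            Return ""
        End If

    Catch ex As Exception
        MsgBox("Could not search the site: " & ex.Message)
        Return ""
    End Try

End Function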

BTW, this was the first post processed through this tool.

Mad Dog Wins Again

Greg Maddux won his 302nd game for the Cubs tonight with 7 shutout innings against the Expos. He's been the Cubs' most consistent pitcher this season, a state of affairs that I would not have guessed when the season started.

A very typical Maddux season, 13-8, 3.70, 175IP, 179H, 28BB, 120K, and a 1.18 WHIP.

Monday, August 30, 2004

To End All Wars

Saw this movie this weekend about prisoners in a Japanese prison camp in WWII. This is an adaptation of a 1963 book that tells the true story of prisoners building a railway that was also the basis for Bridge on the River Kwai (1957). This movie is much more powerful and explores the relationship of justice, forgiveness, self-sacrifice, and the application of the golden rule in a time of war. You can get it on DVD.

The screenplay was written by Brian Godawa, whose web site has more info on the movie and some great movie reviews. Highly recommended.

The movie was recommended to us by good friend and columnist Eric Lapp.

Visual Studio Team System

I'm attending a developer lab in Redmond this week for Visual Studio Team System, which is due out in the middle of next year. Lots of cool stuff, some of which is not under NDA and which I'll be sharing in the next few days...stay tuned.

Saturday, August 28, 2004

The Effect of Pitchers

In an earlier post I introduced the MLB Pocket Manager that used play-by-play data from 1999-2002 in order to calculate probabilities for various offensive strategies including sacrificing and stealing. I wrote the application using the .NET Compact Framework and it can be downloaded here.

One of the immediate questions that comes up when looking at how the application works is how the odds change when a pitcher is at the plate. This is an issue since the data used in the tables is an aggregate of all hitters, and everyone knows that decisions about sacrificing, for example, are impacted greatly by whether or not a pitcher is hitting.

In order to answer that question I calculated the probability of various offensive events for all pitchers in the period 1999-2002 using the Lahman database. They were:

Out = .798
Single = .120
Double = .027
Triple = .002
Homerun = .008
Walk = .047

These calculate to a .164 batting average, .221 slug, and .203 on base percentage.
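Here is a rough Python sketch of how those three rates fall out of the event probabilities. It assumes the six events are the only outcomes (so sacrifices, hit batters and the like are ignored), and the function name is invented for the example. Because the inputs are already rounded to three places, the results can differ from the figures above by a point in the last digit.

```python
# Per-plate-appearance probabilities for pitchers batting, 1999-2002
pitcher_probs = {
    "out": 0.798, "single": 0.120, "double": 0.027,
    "triple": 0.002, "homerun": 0.008, "walk": 0.047,
}


def rate_stats(p):
    """Return (batting average, slugging, on-base) from event probabilities."""
    hits = p["single"] + p["double"] + p["triple"] + p["homerun"]
    at_bats = hits + p["out"]  # walks do not count as at bats
    total_bases = (p["single"] + 2 * p["double"]
                   + 3 * p["triple"] + 4 * p["homerun"])
    plate_appearances = at_bats + p["walk"]
    avg = hits / at_bats
    slg = total_bases / at_bats
    obp = (hits + p["walk"]) / plate_appearances
    return avg, slg, obp


avg, slg, obp = rate_stats(pitcher_probs)
print(f"AVG {avg:.3f}  SLG {slg:.3f}  OBP {obp:.3f}")
```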

Next, I used a little algebra as described by Albert and Bennett in their excellent book Curveball. For example, the probability of scoring with a runner on first and nobody out is 43.7%. The same situation with a pitcher up yields a probability of 36.7%. This is calculated by multiplying the probability of each offensive event by the probability of scoring after the event takes place and summing the values. For example, a pitcher makes an out 79.8% of the time. This is then multiplied by the probability of scoring after the out, 28.3%, which equals .226. The same is done with each offensive event shown below and the values added:

.798 * .283 = .226
.120 * .641 = .077
.027 * .876 = .024
.002 * 1 = .002
.008 * 1 = .008
.047 * .641 = .030

The sum of these numbers is, accounting for rounding, .367. This number is then used in the break-even equation by substituting the present value (Pv) for either run potential or scoring probability with a value calculated explicitly for pitchers. You'll recall that to calculate the break even percentage you use the following equation:

P = (Pv - Fv) / (Sv - Fv)

where Fv is failure value and Sv is success value.
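The two steps can be sketched together in Python. The event and scoring probabilities are the ones listed above for a runner on first and nobody out. The failure value (runner still on first with one out, .283) and especially the success value of .40 are assumptions made up for the example, not figures from the Pocket Manager tables, and the function names are invented.

```python
# Probability of each event with a pitcher at the plate
event_prob = {
    "out": 0.798, "single": 0.120, "double": 0.027,
    "triple": 0.002, "homerun": 0.008, "walk": 0.047,
}

# Probability of scoring at least one run after each event,
# starting with a runner on first and nobody out
score_after = {
    "out": 0.283, "single": 0.641, "double": 0.876,
    "triple": 1.0, "homerun": 1.0, "walk": 0.641,
}


def present_value(event_prob, value_after):
    """Weight the value after each event by the chance the event happens."""
    return sum(p * value_after[event] for event, p in event_prob.items())


def break_even(pv, fv, sv):
    """P = (Pv - Fv) / (Sv - Fv)."""
    return (pv - fv) / (sv - fv)


pv = present_value(event_prob, score_after)
print(round(pv, 3))  # 0.367

# fv: failed bunt leaves a runner on first with one out (assumed)
# sv: hypothetical value for a runner on second with one out
print(round(break_even(pv, fv=0.283, sv=0.40), 3))
```

A result above 1 from break_even means the strategy can never pay off, which is how the "never bunt" cases show up.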

Finally, I added a checkbox to the user interface that can be checked in order to calculate the break-even percentages with the pitcher up. I also added an entry to the tables list that shows the pitcher stats that were used in the calculation.

Enough of the math, so what are the results? Here are some examples:

With a runner on first and nobody out the Pocket Manager says that an average hitter should never bunt. With the pitcher up the strategy makes sense if the goal is to score one run and if the pitcher's odds of laying down the bunt are 83.1% or better. In another example, with runners on first and second and nobody out an average hitter should not sacrifice if the goal is to maximize runs and must be successful 79.9% of the time if the goal is to score a single run. With the pitcher up these odds change dramatically, as it makes sense to sacrifice with a break-even percentage of just 35.9% to score one run and 62.1% to maximize runs. In a third example a baserunner needs to be successful stealing second base 73.5% of the time with nobody out and an average hitter up in order to maximize the number of runs scored in the inning. With the pitcher up the stolen base percentage needed goes down to just 56%.

Of course, the lesson here is that the weaker the hitter, the more risks a manager can take with one-run type strategies like the bunt and stolen base.

Christianity and the Royals

As I was running errands last Wednesday morning in the car I was surprised to listen to a discussion of Christianity and pro sports on the local sports talk radio station. The host, Soren Petro (who incidentally I think is the most prepared and knowledgeable host on the station), was asking the question as to whether the open Christianity on the Royals as evidenced by Mike Sweeney, Tony Graffanino, Jeremy Affeldt, and Carlos Beltran (before the trade) was dividing the team and contributing to the 45-81 record.

As possible evidence he pointed to two things. The first was Christian Family Day, when after the game the Royals mentioned above plus Corey Koskie of the Twins gave their testimonies and a short inspirational message to those who wanted to stay. The second was the fact that after games a few of the players (mostly American as opposed to Latin players, he pointed out) participate in a Bible study "on company grounds" (something Petro hadn't seen before). His partner on the show, Danny Klinkscale, did offer that he thought when a team had as many Christians on the roster as the Royals do it did tend to be divisive. He offered no examples of other teams to back up his point, however. To Petro's credit he explicitly denied that Christians were somehow less competitive, citing the Chiefs' Tony Richardson. That was the implicit criticism leveled by former Royals manager Tony Muser in his now famous comment regarding his team's slow start in 2001:

"Chewing cookies, drinking milk and praying isn't going to get it done. It's going to take a lot of hard work."

I'm not surprised that the issue is being brought up as it was in 2001. Petro specifically said that he was bringing up the issue because it has been discussed behind the scenes by the media and I assume by front office personnel. He also did a nice job of giving a pastor who called in from Athletes In Action a good chunk of time to discuss the issue.

As a Christian I feel a need at least to think through the issue and so a few points follow:

  • I do understand what Petro is talking about when he says that members of the media have been discussing this issue. I could feel and hear the resentment to Christian Family Day in the press box when I scored the game for
  • This issue was only raised (whether or not it was talked about privately before) publicly when the team is losing. Last year with a winning team the same dynamics were in place and yet no one publicly criticized the Christians. This reminds me of a comment Bill James made in The New Historical Baseball Abstract in discussing one time Royals third baseman Kevin Seitzer. "He was a born-again Christian who sometimes irritated his teammates and managers, perhaps for good reason or perhaps just because, when things go wrong, it's easy to blame the Christian." I would venture that the latter is mostly the case here.
  • Christianity is inherently divisive as Jesus himself said. I think it does naturally offend some who will automatically feel as if the Christian is claiming moral superiority or special knowledge. In reality, the Christian should be humble in their realization that only through Christ's work could they hope to gain salvation. As to the point that those attending the Bible study were American as opposed to Latin players I can only offer that Carlos Beltran I believe came to Christ through his relationship with Mike Sweeney so it doesn't appear as if there is something exclusionary going on.
  • I doubt that Christianity can be more divisive than the differences between personalities generally. How about the players who don't go out and carouse at night? How about the player who actually likes to read books on roadtrips? How about the player who is active politically? These are simply differences between individuals that all of us deal with in our professional lives every day.
  • Bible studies "on company grounds" are common. When I worked at a Fortune 500 company a weekly Bible study met during lunch in a conference room on the corporate campus. The room was given up for work related meetings of course and the Bible studies were never conducted during normal working hours. As long as the studies and chapel services aren't held in lieu of batting or fielding practice for example, I can't see how there would be cause for complaint.
  • The life of a major league baseball player is quite different from the corporate 9 to 5 world however. For much of the season the clubhouse is the central place for players to congregate not only for baseball activities but also for recreation (cards for example) and relaxing. Under those circumstances a Bible study in the clubhouse should be no more a problem than a game of cards or a discussion about politics or movies. This also relates to chapel services in the clubhouse. Since the lifestyle of the major leaguer means that they are often at the ballpark on Sundays it only seems sensitive (and we all want to be sensitive) to have a service.
Now, there are other more relevant criticisms that I think can be made of professional athletes who profess to be Christians. Some of these I'll deal with in a later post.

Friday, August 27, 2004

Beane on Moneyball

Here is an excerpt from the Billy Beane interview on Athletics Nation. I was alerted to it by David Pinto's great blog

"Blez: How does a team like Oakland continue to stay competitive? The book talks a lot about OBP and my personal perception, and in talking to Michael Lewis, was that the A's have evolved beyond that into defense and other things.

BB: That's what's interesting when you talk about critics. We've never really sat down with any of these guys and explained to them what we're doing. They assumed they know what we're doing, which is great because they're out there blasting on the airwaves and they don't really have a clue what we're doing. But the closest thing I can say is that we're in a finite market and we're always trying to take advantage of any inefficiencies. Right now, you take on-base percentage and it's en vogue. It wasn't 10 years ago. We could get guys like Matt Stairs and Geronimo Berroa.

Blez: And guys like Scott Hatteberg.

BB: Exactly, guys like Scottie Hatteberg. Now people are recognizing the value of that and they're paying for it. And if we're in a bidding war, we're going to lose that. So we have evolved. If you look at some of our first playoff teams, the '99 team that won 87 games, it was a power, on-base team. Now we're tops in the league in defense and pitching. For us, it's all about filling in on the backend and figuring out what people are undervaluing. You know, one day we're going to have a team with guys who steal 50 bases because people aren't paying for it. But it's all about wins. That's all that matters. "

I agree that one of the underlying messages of Moneyball was that the A's had found ways to exploit the market inefficiencies in baseball - inefficiencies caused by the conservative nature of "baseball people" made more exploitable by the information age. I'm not so sure, however, that Beane is correct in his assumption that there will continue to be skills that are undervalued, meaning skills that are both cheap and that lead to winning baseball games.

A second message of Moneyball, brought out beautifully in The Numbers Game in the discussion of Eric Walker, is that the realization that OBP was important in winning baseball games coincided with the A's Sandy Alderson exploiting the fact that it was undervalued. The A's have also exploited the overvaluation of high school pitchers, closers, and pitchers who throw hard. However, each of these items may be a one time discovery and therefore there might not be too many more times in history when both vectors - importance and undervaluation - meet.

I know this may sound a little like the physicist in 1899 saying that there was nothing more to discover, but to illustrate, imagine if the A's did field a team of speedsters who were great defensively (I'm not considering pitching here because pitching has never been undervalued). The value of stolen bases in terms of run production is pretty well established using Linear Weights, Base Runs, Runs Created and other run production estimators. Would the A's score more runs if the team stole 300 bases or hit 200 homeruns? Obviously, the latter. Likewise the difference in run prevention between a great and an average defensive team pales in comparison to the difference between teams with average and stellar offenses. In other words, the gains made by focusing on speed and defense are smaller than those made when you have a great offense because of the inherent structure of the game. Perhaps if baseball reverts to a dead-ball like style of play and changes a bunch of rules (for example calling a batter out who fouls off a two-strike pitch) there will be a set of skills (contact hitting, bunting, and the hit and run for instance) that are undervalued, but short of that the game is what it is and the value of certain offensive events is pretty well fixed.
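To put rough numbers on the stolen base versus homerun comparison, here is a small Python sketch using Linear Weights. The run values are the commonly quoted ones from Pete Palmer's system (about 1.40 runs per homerun, 0.30 per steal, and minus 0.60 per caught stealing); treat them as approximate assumptions, since the exact weights shift with the run environment.

```python
# Approximate Linear Weights run values (assumed, see note above)
LW_HOME_RUN = 1.40
LW_STOLEN_BASE = 0.30
LW_CAUGHT_STEALING = -0.60


def steal_runs(stolen_bases, caught_stealing=0):
    """Estimated runs added by a team's running game."""
    return (stolen_bases * LW_STOLEN_BASE
            + caught_stealing * LW_CAUGHT_STEALING)


def homer_runs(home_runs):
    """Estimated runs added by homeruns alone."""
    return home_runs * LW_HOME_RUN


# Even with nobody caught, 300 steals trail 200 homeruns by a wide margin
print(round(steal_runs(300), 1))
print(round(homer_runs(200), 1))
```

Any realistic number of times caught stealing only widens the gap.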

Now as Beane says it's true that as the skills talked about in Moneyball become more valued teams like the A's will have a harder time securing those skills because of the vast disparity between revenues in small and large markets. Turning to other lesser skills, however, is not going to help them win more ball games. It seems to me that in the long run small market teams would do well to press the large market teams into a workable revenue sharing structure.

Thursday, August 26, 2004

Unearned Runs Again

I was reading an article on Bill James's Win Shares system and, although I've read the book, I noticed that the article mentioned that unearned runs are split 50-50 between the pitchers and fielders. Based on my previous post on unearned runs I would think that the split would be more like 80% fielders and 20% pitchers. There I noted that the difference in URA between good and bad pitchers along three axes (ERA+, WHIP, and K/9) did show a difference, indicating that good pitchers suppress unearned runs as well as earned runs, but that the difference was fairly small when compared with the total number of unearned runs that score.

Thinking that I had missed something I reran some numbers for 2003 to check my thinking. I reasoned that if unearned runs are primarily or even as much as 50% the fault of the pitcher, then pitchers with high URAs (unearned run average) should be worse pitchers than those whose URAs are low. In other words, bad pitchers will allow opponents to cash in on mistakes by the fielders that play behind them, so there should be a correlation between high URAs and bad pitchers.

To check this out I found the 76 worst pitchers in URA who pitched over 50 innings and the best 77 pitchers in URA for 2003, roughly the bottom and top quartiles. The worst URA pitchers had a cumulative URA of .716 while the best pitchers were at .100 - a difference of over a half run per game. However, when you look at these two groups their rate statistics are not all that different. For example, their strikeouts per 9 innings are close (6.38 to 7.27), their walks per 9 are close (4.96 to 4.34), their homeruns per 9 are close (1.41 to 1.16), their BABIP is close (.333 to .328), their WHIP is close (1.42 to 1.29), and the team errors per game are close (.69 to .65). Their ERAs were 4.46 and 4.03 respectively.

Certainly, the high URA pitchers trail in all these categories and are therefore worse pitchers. As I said in my previous post I do think that better pitchers give up proportionally fewer unearned runs, but cumulatively I doubt that these differences could add up to over a half an unearned run per nine innings. After all, the high URA group only gives up just a little over one more baserunner per 9 innings.

In fact, if you reduce the group size to the top and bottom 10% (roughly 30 pitchers in each group) the URA difference climbs to almost a run per game (.999 to .046) but the rate stats don't change.

So what accounts for the differences? I think there are three possible answers:

1) Luck. The high URA pitchers were mostly unlucky. Either their teams committed a lot more errors behind them than behind other pitchers or the errors happened to come at the wrong time. Either way, they were unlucky. Conversely, the low URA pitchers received good defense that didn't make errors with men on base.

2) Choking. The high URA pitchers pitch well when things are going well but implode when a mistake is made behind them.

3) A combination of both.

At some level the third option must be the correct answer but the hard part is figuring out what the mix is. My intuition would say that luck is the far larger factor here. There may be some players (perhaps Brian Anderson whose 1.64 URA in 2003 was second behind Scott Elarton and who is largely repeating that at 1.35 this season) who tend to fall apart when errors are made. If luck is the primary factor then we should be able to detect it since luck tends to even out with the number of trials, which in this case are innings. So let's take a look at the innings pitched:

The average IP for the entire group of 295 pitchers was 119.3, the average of the high URA group was 114.8 and for the low URA group 104.5. The middle group of 142 pitchers had an average IP of 129.7. So both ends of the spectrum tend to include pitchers with fewer innings pitched which would be expected if luck played a role.

You could also look at pitchers with high URAs that also threw a lot of innings and see if these are guys tagged with the "choker" label. The top 10 who threw more than 125 innings are:

Pitcher            Team    URA     IP
Brian Anderson     CLE   1.642  148.0
Jeremy Bonderman   DET   1.000  162.0
Shawn Estes        CHN    .947  152.3
Ramon Ortiz        ANA    .850  180.0
Jason Davis        CLE    .818  165.3
Mark Hendrickson   TOR    .797  158.3
Mark Buehrle       CHA    .704  230.3
Andy Pettitte      NYA    .692  208.3
Joe Kennedy        TBA    .677  133.7
Jae Weong Seo      NYN    .670  188.3

In perusing this list none of these pitchers jumps out at me as having a reputation for imploding other than Anderson so I doubt that choking is the primary cause.

You'll also notice that the spread of the top 10 pitchers is quite large which would indicate that we're likely dealing with a significant amount of randomness.

In summary I can say:

1. I still support the idea that good pitchers give up fewer unearned runs given the same misplays behind them

2. That the variation between high and low URA is between half a run and a run per 9 innings for pitchers who've pitched a modicum of innings

3. That the difference in URA is mostly accounted for by luck

4. That the luck probably accounts for 70 to 90% of the difference with the rest being related to the skill of the pitcher primarily related to reducing the number of baserunners per inning*

5. Given the above that the Win Shares system should be modified to assign fielders the responsibility for 85% of the unearned runs and pitchers 15%

* In the previous post I found that the difference in URA between good and bad pitchers was between .050 and .100 depending on how you defined the concept of "good pitcher". So for an average pitcher with an URA of .450 luck accounts for 78-89% of the total.
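For anyone who wants to check the arithmetic in that footnote, here is a minimal sketch in Python. The function name `luck_share` is mine, and it assumes the simplest possible split: whatever part of an average pitcher's URA is not covered by the good/bad pitcher gap gets attributed to luck and defense.

```python
def luck_share(skill_gap, average_ura):
    """Share of an average pitcher's URA that the skill gap does NOT explain.

    skill_gap   -- URA difference between good and bad pitchers (runs per 9)
    average_ura -- URA of an average pitcher (runs per 9)
    """
    return 1.0 - skill_gap / average_ura


AVERAGE_URA = 0.450  # average URA since 1960, from the earlier post

# The two ends of the range found in the earlier post (.050 and .100).
for gap in (0.050, 0.100):
    print(f"skill gap {gap:.3f} -> luck share {luck_share(gap, AVERAGE_URA):.0%}")
```

The two results bracket the 78-89% range quoted in the footnote.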

BTW, here is a great link to Win Shares including current calculations. It's sad to note that Carlos Beltran still leads the Royals in Win Shares with 14 and he hasn't played for the Royals since late June.

Wednesday, August 25, 2004

Picking on Pena

Ron Hostetter has a nice post discussing the hiring of Tony Pena as the Royals manager. In particular he says:

"But it is clear that Pena's managing abilities are not up to par. He loves to bunt away precious outs in early innings. He often yanks pitchers who are doing well, while leaving pitchers who are getting killed out there way too long. The relievers never know what role they have. And too often, utility infielders are playing outfield positions."

As if to underscore his point, in tonight's 7-5 loss to the Angels, Angel Berroa doubled to lead off the second inning with the Royals already up 3 to 1. While the next batter, John Buck, is no Barry Bonds, he has hit better of late (.481 SLUG in August), and yet Pena had him sacrifice Berroa to third, which he did. After a ground out by DeJesus, Desi Relaford luckily singled to score the run.

But should Pena have bunted?

To answer that question I turned to my handy dandy MLB Pocket Manager. The Pocket Manager (based on tables derived from play-by-play data from 1999-2002) says that in that situation, runner on 2nd with no outs, the run potential is 1.189 runs while the scoring probability is 63.2%. The calculator then says that if the manager attempts to bunt the break even percentage for being successful and scoring at least one run is an astounding 93.9%. Further, the Pocket Manager says that it is never a good idea to sacrifice in that situation if your goal is to maximize runs.

In short, Pena should not have bunted in that situation even if he was virtually certain that Buck could get the bunt down. In that situation a manager especially with the Royals pitching staff and only a 2 run lead should not be playing for a single run. Ultimately, of course the Angels came back and won the game. Buck later hit a solo homerun. A great example of a manager squandering precious outs in the early innings.
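I don't know exactly how the Pocket Manager arrives at its break-even number, but a simple two-outcome model gives the flavor of it. In the sketch below the bunt either succeeds or fails, and the break-even success rate is the one where the blended scoring probability equals the probability of scoring without bunting. Only the 63.2% figure comes from the Pocket Manager; the other two probabilities are made-up placeholders, not values from its tables, and the function name is hypothetical.

```python
def break_even_success_rate(p_now, p_after_success, p_after_failure):
    """Success rate p at which bunting neither helps nor hurts.

    Solves p * p_after_success + (1 - p) * p_after_failure = p_now for p.
    All three inputs are probabilities of scoring at least one run.
    """
    if p_after_success <= p_after_failure:
        raise ValueError("a successful bunt must improve on a failed one")
    return (p_now - p_after_failure) / (p_after_success - p_after_failure)


P_NOW = 0.632            # runner on 2nd, no outs (from the Pocket Manager)
P_AFTER_SUCCESS = 0.65   # PLACEHOLDER: runner on 3rd, one out
P_AFTER_FAILURE = 0.40   # PLACEHOLDER: runner still on 2nd, one out

rate = break_even_success_rate(P_NOW, P_AFTER_SUCCESS, P_AFTER_FAILURE)
print(f"break-even bunt success rate: {rate:.1%}")
```

The point of the model is that when a successful bunt barely improves the chance of scoring, the bunt has to work almost every time just to break even.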

Tuesday, August 24, 2004

Pickering Picks'em Up

I was scoring the Royals game on Sunday for when Calvin Pickering, fresh from Omaha, hit two homeruns including a grand slam against the Rangers in a 10-2 win. His second homerun went 440 feet, the longest I've seen hit to centerfield this year. He also showed his plate discipline, picking up a walk and battling a tough lefty before lining out to second. Last night he went 1 for 4 with a homerun off of Bartolo Colon in a 9-4 Royals loss. I didn't see the game.

I've long been advocating bringing up Pickering because of his monster season at AAA and because he's a better hitter right now than Ken Harvey and because at least it would give Royals fans something to watch for the remainder of the season. Hate to sound like a told-you-so but if the shoe fits...

Seriously, Pickering seems to have decided to apply himself a bit more since returning from an injury that cost him all of 2002:

"I've always put up numbers my whole career but, being young, you have a tendency to waste an at-bat. What I mean is you get two hits in your first two at-bats and you end up giving up your last two," he said.

"I finally came to the conclusion that I have four at-bats and I'm not going to give one away. If I get four home runs, I'm going to try to get them -- put a good swing on them and see what happens."

Pickering has actually had only 137 plate appearances in the major leagues. His body type and his primary non-traditional skill (plate discipline) have both conspired to keep him on the farm. On the downside he's almost 28 so this may be as good as it gets (although Harvey turned 26 in spring training) and he is really big (6'5" 280 pounds) which increases his odds of injury and makes him useful only on the far left side of the defensive spectrum (DH - 1B - LF - RF - 3B - CF - 2B - SS - C). You never know though with a guy who has talent and who is mature enough to use it. At the very least Allard Baird needs to explore what he can get for either Pickering or Harvey since they both won't be playing first base and DH for the Royals in 2005 unless by some miracle he's able to unload Mike Sweeney. Pickering's value should be increasing in the post-Moneyball era as his skills have become more valued and so he might be able to bring something the Royals could use in a trade with a contender that needs a lefty bat off the bench. To Baird's credit he did sign Pickering when other clubs didn't saying that "if you look back we signed him because he has the ability to walk and the ability to hit the ball out of ballpark." Now might be the time to cash that chip in.

As far as Harvey is concerned I think Baird needs to make a determination as to whether he'll ever develop power with that ugly inside out swing that allows pitchers to throw hard stuff down and in repeatedly to get him out. If so, they should keep him; if not, trade him yesterday since his All-Star appearance gives him some perceived value he might quickly lose. Contrary to some opinions he's not a good defensive first baseman and only looks decent because Royals fans have seen Sweeney man the position. I don't have any illusions that Pickering would be better defensively but he's not really going to be much worse.

Monday, August 23, 2004

Unearned Runs: Whose Fault?

I've blogged before about the idea of getting rid of unearned runs and hence ERA and simply moving to a RA (run average) for pitchers. This idea is championed by Michael Wolverton of Baseball Prospectus. Recently, an article by Keith Scherer attacking the idea as discussed by Bob Sheehan on BP was posted on Rob Neyer's web site. Scherer says:

"The argument against ERA is straightforward: the distinction between earned and unearned runs (UER) is a false dichotomy caused by a misperception. As Michael Wolverton puts it, 'The main problem with unearned runs isn't errors, it's the notion that the pitcher's job ends whenever an error is made.'

ERA supposedly avoids charging a pitcher with runs that scored through no fault of his own. According to Baseball Prospectus, this metric has it backwards. ERA seeks to blame fielders for unearned runs, but unearned runs, like earned runs, are really caused by pitching failures and not by bad fielding. Wolverton is Baseball Prospectus’s chief exponent of the argument. He puts it this way:

'Errors will happen. Good pitchers will minimize the damage caused by them. That is, a good pitcher will allow fewer runners on base before the errors happen (so there aren't runners to score on the errors), and will allow fewer hits and walks after errors happen (so the runners who reached on errors won't score).' "

Scherer goes on to discuss why, in his view, this argument is wrong. His two lines of evidence are that:

1) The difference in errors between teams since 1990 with high and low numbers of unearned runs varies more dramatically than the difference in hits, walks, homeruns, and earned runs

2) A random sample of 10 pitchers showed that 7 of them were within 2 percent of each other in percentage of runs that were unearned (UER%), the average being around 10%

His first line of evidence correlates unearned runs with errors (putting the onus on the fielders) while the second shows little correlation between the percentage of unearned runs and pitchers (taking the onus off the pitchers).

I wanted to look into his second point more closely and so I constructed a study where I selected all of the pitcher seasons since 1960 and calculated their UER%, URA (Unearned run average i.e. the number of unearned runs given up per 9 innings), Team Errors/Game, BABIP (batting average on balls put in play), WHIP (walks+hits per inning pitched), ERA+ (ERA relative to the league with 100 being league average), and K/9.

First, the totals for these 19,185 seasons:

UER% = 10.2%
URA = .450
E/G = .805
ERA = 3.95
K/9 = 5.7
BABIP = 0.313
WHIP = 1.35

So the average pitcher since 1960 gives up just under half an unearned run per game with about 10% of their runs being unearned. This validates Scherer's data point that historically 10% of runs are unearned.
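As a rough illustration of how the rate stats defined above fall out of a pitcher's season line, here is a small Python sketch. The function name and the sample season are hypothetical (I picked numbers that land near the averages above), and innings must be passed as true decimals, so 152 1/3 innings is 152.333 rather than the box-score shorthand.

```python
def pitcher_rates(ip, runs, earned_runs, hits, walks, strikeouts):
    """Compute the per-pitcher rate stats used in this study.

    ip is innings pitched as a true decimal (152 1/3 innings -> 152.333).
    """
    unearned = runs - earned_runs
    return {
        "UER%": unearned / runs,       # share of runs allowed that were unearned
        "URA": 9 * unearned / ip,      # unearned runs per 9 innings
        "ERA": 9 * earned_runs / ip,   # earned runs per 9 innings
        "WHIP": (walks + hits) / ip,   # walks plus hits per inning
        "K/9": 9 * strikeouts / ip,    # strikeouts per 9 innings
    }


# Hypothetical season, invented for illustration only.
season = pitcher_rates(ip=200.0, runs=100, earned_runs=90,
                       hits=200, walks=70, strikeouts=127)
for name, value in season.items():
    print(f"{name:5s} {value:.3f}")
```

BABIP and ERA+ are left out because they need inputs (balls in play, league ERA) that aren't part of a basic pitching line.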

To see whether the UER% or URA varied by quality of pitcher I split the pitchers by ERA+ below and above 100 (> 100 being worse than the league average). Doing so I got the following:

Group        Seasons   URA   E/G  UER%   ERA+   ERA   K/9  BABIP
ERA+ < 100      7739  .415  .787  .114   80.9  3.19   6.3   .296
ERA+ > 100      9202  .486  .807  .091  122.5  4.83   5.7   .329

The pitchers with lower ERAs (the first row) had a higher UER% and a lower URA. In other words, the better pitchers actually gave up more unearned runs as a percentage of their total runs allowed. A higher UER% makes sense for better pitchers on the theory that the defense plays the primary role since better pitchers give up fewer earned runs by definition but can't control unearned runs to the same extent. The result being that their unearned run percentage will be higher. Although Scherer did not point this out in his article this is a support for his argument. Score one for Scherer. Notice that this occurred even though the teams played slightly better defense behind the better pitchers (.787 errors per game versus .807, a difference which translates to about a half an error per 200 innings pitched).

On the other hand the pitchers with lower ERAs also had a lower URA. A lower URA supports the theory that better pitchers give up fewer unearned runs because they don't allow runners who get on base via errors to score as often and so unearned runs are not that important. Score one for Wolverton.

So which theory is correct? They both are.

To understand why you simply need to think about how unearned runs actually score. It is obvious that even good pitchers can be victimized by bad defense. Consider the case where a pitcher gets the first out on a groundball, gives up a bloop double, strikes out the third batter looking, and then a ground ball goes right through the first baseman's legs scoring the runner from second. Obviously, these kinds of situations are out of the control of the pitcher and regardless of how good the pitcher is, the run will still score. I think these kinds of situations result in a higher UER% for good pitchers than for bad. Conversely, we've all seen innings where the leadoff hitter gets on base via an error, the pitcher strikes out the second hitter, gives up a single to the third hitter putting runners on first and second, the fourth batter flies out and the next batter hits a three-run homer making all the runs unearned. These are the kinds of things that happen to bad pitchers which good pitchers avoid and which tends to drive up the URA for bad pitchers.

The question then is not which of these theories is true, but rather which of them is more important and has more impact.

This can be calculated by looking at the run differences produced by the variation in URA. The difference of .071 runs per 9 innings between low and high ERA pitchers calculates to be about 1.6 runs over the course of 200 innings, or 16%. In other words better pitchers do appear to suppress unearned runs but do so only marginally, saving their teams a handful or fewer runs per year. The majority of unearned runs (the remaining 9 or so over 200 innings) appear to be of the variety that would score anyway.
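The conversion from a URA gap to runs over a season is simple enough to sketch in Python. The function name is mine; the inputs are the URA values for the two ERA+ groups and the overall .450 average from the tables above.

```python
def runs_over_innings(ura_gap, innings=200):
    """Convert a per-9-innings gap in URA into runs over a block of innings."""
    return ura_gap * innings / 9


AVERAGE_URA = 0.450       # all pitcher seasons since 1960
era_gap = 0.486 - 0.415   # URA of worse pitchers minus URA of better pitchers

saved = runs_over_innings(era_gap)        # runs the better pitchers save
total = runs_over_innings(AVERAGE_URA)    # unearned runs for an average pitcher
share = era_gap / AVERAGE_URA             # gap as a fraction of the average URA

print(f"runs saved per 200 IP:        {saved:.1f}")
print(f"average unearned per 200 IP:  {total:.1f}")
print(f"share attributable to skill:  {share:.0%}")
```

An average pitcher allows about 10 unearned runs per 200 innings, so a saving of roughly 1.6 leaves the bulk of them unexplained by pitcher quality.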

Now of course proponents of getting rid of unearned runs will point out that the split based on ERA was perhaps not the correct way to look at the data. Perhaps other skills a pitcher has suppress unearned runs to a much greater extent?

To check this out I also split the data by K/9 above and below the league average and WHIP above and below 1.36 (the average for the seasons in the study). I got the following results.

Group        Seasons   URA   E/G  UER%   ERA+   ERA   K/9  BABIP
K/9 > Lg        9829  .426  .794  .103     96  3.78  7.26   .312
K/9 < Lg        7112  .472  .804  .105  104.1  4.11  4.64   .311

WHIP < 1.36     7177  .394  .796  .108   85.4  3.27  6.19   .293
WHIP > 1.36     9764  .513  .800  .098  117.4  4.75  5.85   .333

In examining these results you'll notice that the differences in UER% and URA are not as great between strikeout and non-strikeout pitchers as between low and high ERA pitchers. However, high strikeout pitchers did give up a lower percentage of their runs as unearned which makes sense if you assume that high strikeout pitchers would suppress unearned runs by not allowing as many runners to advance on outs.

What's interesting, however, is that the low WHIP pitchers have the lowest URA in the study at .394 translating into a savings of over 2.5 runs per 200 innings pitched, a savings of 30%. Common sense says that this is because fewer baserunners means fewer opportunities for errors to produce unearned runs. But even if you increase the split, taking only the top and bottom 5% of pitchers the difference is around 10 unearned runs over the course of 200 innings. So although keeping runners off base makes a difference in giving up unearned runs, once again the difference does not account for the lion's share of the unearned runs given up.

As I mentioned above, both ideas about unearned runs are true. They are a product of the defense and they are magnified when errors are made behind bad pitchers. It's just that they're not magnified to the extent that using unearned runs to assess pitchers would make the assessment more accurate. A good working estimate would be that between 16% and 30% of unearned runs should be assigned to the pitcher and the remainder to the defense.

Sunday, August 22, 2004

Cracks in The Da Vinci Code

While I've not read Dan Brown's The Da Vinci Code by all accounts it is a page-turner and a fairly well written book. Unfortunately, I've also heard that it promotes a view of Christianity that is questionable at best and relies on unhistorical information. This morning in church our pastor gave a five point talk on "Cracks in The Da Vinci Code". In short, here were his main points:

1. Bad Background. Although Brown says he bases his book on historical facts wrapped in fictional characters and events, his "factual basis" is anything but. In particular his use of "The Priory of Sion" as the secret society that has protected the secrets of Christianity has no foundation. In actuality, the Frenchman who started The Priory of Sion (in the 1960s, I believe) admitted years ago under oath that the organization was not based on anything historical. His list of "Grand Masters" and the rest were entirely made up. By the way, he also thought he was descended from Jesus and was the rightful king of France. Unfortunately, information about the group was put into the book Holy Blood Holy Grail from which Brown apparently pulled it. His use of Opus Dei and the role he gives the organization within the Catholic Church is also not realistic.

2. Jesus anointed as divine at the council of Nicea. Brown states in the book that Jesus was only declared divine at the council in 325AD and that his early followers saw him only as a prophet. The problem with that view is that it is not supported by historical documentation. From the writings of the church fathers dating from 105AD up to 300AD it is clear that considering Jesus as divine was normative in the church well before Nicea. Of course, the New Testament writings of Paul and Peter make this clear as well. Nicea addressed the heresy of Arius, that Jesus was a separate created being, but that view was not widely held in the church before or after Nicea.

3. The gnostic gospels were suppressed at the council of Nicea. Brown's view is that the gnostic gospels like Philip were suppressed at Nicea in favor of gospels that promoted Jesus as divine and the NT gospels were "embellished". Unfortunately for Brown the historical record shows that the gnostic gospels were written in the late 2nd to 4th centuries, well after the four NT gospels, and that the council of Nicea did not address the question of the canon. The canon of the NT was substantially in place by the early 2nd century and consolidated by the end of the 2nd century. In fact, when the church fathers criticized various heresies (for example in Against Heresies) they didn't even quote from the gnostic gospels because they were not a part of mainstream Christian thought. Secondly, his claim that the NT gospels were embellished is empirically false since numerous copies of the gospels and citations from both before and after Nicea exist. When you compare them there are not substantial differences.

4. A Revisionist view of the development of the NT canon. Brown relies on the revisionist view of history, namely that history is written by the winners, and so the early beliefs of Christianity and the secret history of the church have been suppressed by powerful forces starting at Nicea. While this is an increasingly popular view of history and you can find scholarly proponents of it such as Elaine Pagels, there is no actual evidence to support it.

Here are a couple of critical reviews of the book:

Breaking the Da Vinci Code

Not InDavincible

Saturday, August 21, 2004

The Impact of Count

It's often said, as Tim McCarver did today on the Fox broadcast of the Red Sox/White Sox game, that the count matters tremendously. He backed up his statement with the fact that AL hitters are hitting .186 with an 0-2 count in 2004. I remember tracking batting averages by count by making tally marks on paper back in 1982 when watching Braves and Cubs games on TV. As I recall I enlisted my sister to help on occasion, perhaps turning her into the rabid Braves fan she is today. For that I'm sorry but I did use the data I collected in a speech for high school speech class. I don't remember the grade. Palmer and Thorn also include a table of averages by count in The Hidden Game of Baseball.

But is the count that significant?

Fortunately, Retrosheet makes tabulating averages by count much easier than my method of 1982 and so I spent about 10 minutes during the Olympic basketball game today creating the table shown below for the 1992 AL.

Count     AB     H   2B  3B   HR    TB    BB  IBB  HBP    SO   AVG  SLUG  A-SO  S-SO
0-2     5090   854  134  20   54  1190     0    0   66  2232  .168  .234  .299  .416
1-2    10683  1879  314  36  121  2628     0    0   90  4249  .176  .246  .292  .408
2-2    10488  2027  346  40  160  2933     0    0   56  3714  .193  .280  .299  .433
3-2     7052  1575  324  41  154  2443  3248    4   16  2001  .223  .346  .312  .484
0-1     6942  2099  334  36  142  2931     0    0   81     0  .302  .422  .302  .422
0-0    10986  3347  555  60  315  4967     0    0  142     0  .305  .452  .305  .452
3-0      192    59   13   1    6    92  1530  593    3     0  .307  .479  .307  .479
1-1     7554  2338  427  45  195  3440     0    0   63     0  .310  .455  .310  .455
1-0     7647  2399  471  42  223  3623     0    0   35     0  .314  .474  .314  .474
3-1     2319   738  137  12  108  1223  2300   29    8     0  .318  .527  .318  .527
2-0     2838   925  207  20  120  1532     0    0    8     0  .326  .540  .326  .540
2-1     5356  1766  334  33  178  2700     0    0   17     0  .330  .504  .330  .504

At a glance what this clearly shows is that McCarver is correct. Hitting with an 0-2 count produced a batting average of .168 with a slugging percentage of .234. Conversely, hitting with a 2-0 count produces a .326/.540 result.

But what's missing from that first-look analysis is considering the impact of strikeouts. In counts where the batter has two strikes he has a chance to strike out. By excluding strikeouts from the calculation (the last two columns in the table) on the argument that when there are fewer than two strikes taking a strike or swinging through a pitch does not negatively impact the average, you can see that both the averages and slugging percentages are close to the non-two strike counts. This makes perfect sense since the odds of a ball put in play turning into a hit cannot logically be much impacted by the count. You can also see, however, that the slugging percentage is more impacted than the average. Again this makes sense since with two strikes hitters tend to protect the plate and cut down their swings a bit. Aggregating these numbers you get:


          AVG  SLUG  A-SO  S-SO
Ahead    .303  .478  .318  .501
Behind   .215  .300  .298  .415
Even     .269  .395  .305  .447

So the real lesson is that a hitter's potential for extra bases goes down when behind in the count but their batting average doesn't suffer that much as long as they can put the ball in play. Of course, that's why great contact hitters like Tony Gwynn still hit well with two strikes.
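The strikeout-adjusted columns in the by-count table (A-SO and S-SO) are just the usual average and slugging percentage with strikeouts removed from the at-bat total. A short Python sketch, using two rows copied from the 1992 AL table; the function name is my own:

```python
def count_line(ab, hits, total_bases, strikeouts):
    """Return (AVG, SLUG, A-SO, S-SO) for one count, rounded to 3 places."""
    in_play_ab = ab - strikeouts  # at-bats that did not end in a strikeout
    return (
        round(hits / ab, 3),
        round(total_bases / ab, 3),
        round(hits / in_play_ab, 3),
        round(total_bases / in_play_ab, 3),
    )


# (AB, H, TB, SO) taken from the 1992 AL table above.
rows = {
    "0-2": (5090, 854, 1190, 2232),
    "3-2": (7052, 1575, 2443, 2001),
}

for count, (ab, h, tb, so) in rows.items():
    avg, slug, a_so, s_so = count_line(ab, h, tb, so)
    print(f"{count}  AVG {avg:.3f}  SLUG {slug:.3f}  A-SO {a_so:.3f}  S-SO {s_so:.3f}")
```

For counts with fewer than two strikes the strikeout total is zero, so the adjusted and unadjusted figures are identical, which is what the table shows.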

NBA Falls Flat in Athens

I was struck watching the men's Olympic basketball loss to Lithuania today by how much more fundamentally sound and skilled the Lithuanians were than the NBA players. In all aspects of the game that show true skill (free throw shooting, passing, three-point shooting, and game awareness) the Lithuanians dominated. This underscores for me why the NBA has become so much less interesting over the past decade. The game relies too much on pure physicality and too little on skills. I'll take the college game any day. I wonder when USA Basketball will accept the fact that they need to take fundamentally sound players and good shooters over pure athletes.

In other Olympic news my wife's sister's husband Jim Gruenwald is a Greco-Roman wrestler for the American team at 60kg (132 lbs). His tournament starts August 24th and can be seen live on MSNBC at 4:00AM. Since Rulon Gardner won the gold in Sydney there should be more coverage of the entire team this time around. Jim suffered a serious injury in the 2003 World Championships and battled back to win at the Olympic trials. He also wrestled in Sydney and got 6th place. Hopefully, he'll get a good draw (it's random) to start the tournament.

But is it the ball?

In my previous post about the increase in homeruns since 1993 I listed a number of factors I thought were all contributors including greater strength by hitters, the development of thin handled bats, the use of aluminum bats, and the crackdown on bean balls accompanied by the decreased fear by hitters to lunge over the outside corner and drive the ball. In a subsequent post I attempted to quantify the impact of decreased foul territory on homeruns and came to the conclusion that while it has certainly had an effect, the effect is trivial.

There is one other factor that always gets bandied about in such conversations and that is the theory that the ball has been juiced. In many cases this argument is presented with conspiratorial overtones that the powers behind MLB are regularly manipulating the ball to increase offense and thus fan interest. On the SABR list this week Alan M. Nathan, a physicist at the University of Illinois offered that he and a colleague had tested the COR (coefficient of restitution, a measure of the "bounciness" of the ball) of twelve 2004 MLB balls from Rawlings and nine unused mid-1970s baseballs he received from late A's owner Charlie Finley's wife.

His conclusion from this small sample was that the balls were identical to the precision of the measurements. Similar studies I've heard about in the past but can't find at the moment came to the same basic conclusion. One should point out, however, that the power surge of 1987 (which resulted in 49 homeruns for Andre Dawson of the Cubs and Mark McGwire of the A's tied for the highest total in the 1980s) may be attributed to differences in the ball. That year Rawlings manufactured the balls in Costa Rica instead of Haiti and it may be that it took some time for workers in the new plant to set the machines properly to produce consistent baseballs.

I think the principal reason that livelier balls are always pointed to as a culprit is because of the association between livelier balls and the homerun explosion of the early 1920s. In reality, although "a better quality of yarn was available after World War I the effect was not dramatic" (James, The New Historical Baseball Abstract). The true reason that homeruns increased was threefold:

1. Babe Ruth had initiated a new style of play that fans responded to and that other players emulated, namely, taking a full swing at the ball and holding the bat at the end. One bit of evidence for this is the often told story of Ty Cobb adopting Ruth's style in a game and hitting three homeruns to show writers how easy it was

2. The owners, fearing a backlash from the Black Sox scandal just breaking at the end of 1920, did nothing to prevent the new style of offense from being adopted. Baseball being inherently conservative would likely have taken measures to squash Ruth in less turbulent times

3. When Ray Chapman was killed by a Carl Mays pitch in 1920 the league began to direct umpires to keep newer and therefore whiter and livelier balls in play. This also led to the ban in 1921 of the spit ball, shine ball, and other forms of defacing the ball that had been popular since just after the turn of the century. As a result pitchers had to learn how to more effectively make the ball move without the aid of foreign substances

Incidentally, Nathan has on his web site a talk "How Does a Baseball Bat Work?" where he offers the following simple equation for estimating the speed of a hit ball:

Speed of ball on impact = (.2 * vBall) + (1.2 * vBat)

The thing to notice is that the speed of the bat is much more important than the speed of ball. Hitters with increased strength can bring the bat to greater velocities through the strike zone thereby putting the ball in play at high speeds which result in more homeruns.
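To see how lopsided the two terms are, here is a quick Python sketch of the equation as quoted above. The function name and the 90 mph pitch and 70 mph swing are just illustrative numbers I picked, not figures from Nathan's talk.

```python
def batted_ball_speed(v_ball, v_bat, ball_coeff=0.2, bat_coeff=1.2):
    """Estimated speed of the hit ball using the simple linear formula above.

    v_ball -- pitch speed at contact
    v_bat  -- bat speed at the point of contact (same units as v_ball)
    """
    return ball_coeff * v_ball + bat_coeff * v_bat


# Illustrative speeds in mph (hypothetical values).
base = batted_ball_speed(90, 70)
faster_pitch = batted_ball_speed(95, 70)  # 5 mph more on the pitch
faster_bat = batted_ball_speed(90, 75)    # 5 mph more on the swing

print(f"baseline:         {base:.1f} mph")
print(f"+5 mph on pitch:  {faster_pitch:.1f} mph (+{faster_pitch - base:.1f})")
print(f"+5 mph on bat:    {faster_bat:.1f} mph (+{faster_bat - base:.1f})")
```

Under this formula an extra 5 mph of bat speed is worth six times as much as an extra 5 mph of pitch speed.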

Friday, August 20, 2004

MLB News and Notes

Working as a stringer for tonight at the Royals/Rangers game. A few notes while I wait for the game to start...

  • Tony Graffanino of the Royals will have season-ending surgery on his right rotator cuff.
  • After reading The Numbers Game by Alan Schwarz, thumbing through The Great American Stat Book, 1987, and looking at the Retrosheet play-by-play data I finally realized that the scoring system I use for and even the paper scoresheet I keep during the game are based on the coding system developed for Project Scoresheet, the volunteer play-by-play coding effort started by Bill James, in the late 1980s. Schwarz includes the interesting history of the break between Project Scoresheet and Stats, Inc.
  • Bud Selig's contract was extended through 2009. The MLB press release quotes George Will - "The office of Commissioner of Baseball is today and will ever be the lengthening shadow of Bud Selig...Baseball's golden age coincides with Bud Selig's Commissionership in no small measure because of the service he has rendered to the sport." Their press release talks about the revenue increases since 1992, the collective bargaining agreement of 2002, franchise values, and even MLB Advanced Media - for which I work, among over a dozen other accomplishments.
  • The Cubs are now first in homeruns in the NL with 170 and last in walks with 338 which adds up to a team that has been shut out 9 times and is in the bottom half in run scoring in the league (526). The Cubs still lead the league with 44% of their runs coming on homeruns. At least Dusty finally started batting Derrek Lee higher in the order. After the Cubs finish playing the Marlins in the next week or so they won't face another team over .500 the rest of the season. The wild card is theirs to lose.
  • The Royals are now last in the AL in runs scored with 502 and are second to last in the league in runs allowed with 656, last in strikeouts by pitchers, and also last in errors with 103.
  • David DeJesus is 25 for 64 (.391) since July 30. Since his 1 for 23 to start the season he's hitting .298.
  • The Royals have used 53 players this season, tying their club record set just last year.
  • The Royals have had better pitching of late getting 10 quality starts in their last 14 games. They're still only 6-8 in those games.
  • The Royals are 37-8 when entering the 9th inning with a lead. Affeldt will be back with the Royals tomorrow.
  • Allard Baird continues to make good deals. He picked up Matt Kinney from the Brewers last week. I'm planning on writing a post that details how he's done a nice job of acquiring talent this season (Huber, Kinney, Nunez, Mateo, Bautista, Teahan, Buck, Wood). Speaking of Wood he's pitching tonight and has looked good in his last two starts, 13 2/3, 4 ER, 2 walks. Actually, seven of his last eight starts have been good although he's picked up only 1 win in that span (his run support has been low even for the Royals at 3.4 runs/start).

Software Factories

Here are a few good links on understanding a bit about the idea of "software factories".

From chapter 17 of the book:

“Software Factories….are focused on product families, and can therefore make specific assumptions about the problem domain, the architecture, the implementation technologies and the development process. This lets them provide appropriate pattern palettes and bindings tuned for individual product families.”

This idea of product families is the key. They envision that tools (the Whitehorse tools included in VS 2005, for example, representing the tip of the iceberg) will enable the creation of software product families based on what they call a “software schema” that allow development teams to quickly build customized versions of their software for different clients. Through domain specific languages (DSLs) and models that capture lots of metadata they propose to automate much of the construction of the software so that if you change a requirement at a higher level it gets reflected in the architecture, implementation, and deployment environments. It really introduces a software development methodology different from MDA, UP, XP, and Agile Modeling.

In short, the software factory encompasses the idea of "economies of scope" for building enterprise business software rather than the "economies of scale" upon which manufacturing and commercial software such as operating systems are based.

It seems to me that following this approach would mean that the Microsoft partner community might become more verticalized as a whole with different partners making the investment to specialize in different product families (product line developers in their terminology). Of course, partners would also need the skills to customize product families using the tools and lower-level technologies (product developers in their terminology). I would also assume that the Microsoft tools will include a few product families that are perhaps more generic.

Thursday, August 19, 2004

Blanco and Defense

A great post by Paul White on Berroa vs. Blanco. He advocates trading Berroa if he can turn it around down in AA since it appears Blanco is ready to play. I'm not sure about Blanco's offensive ability (he's only 20 but did post just a 617 OPS in A ball last year after all) since he's batted fewer than 60 times with an OPS of 710.

To estimate a range of his true ability regarding OPS you can calculate a 95 percent confidence interval as (he's had 57 plate appearances):

[.592, .828] = .710 +/- 1.96 * SquareRoot(.710*(1-.710)/57)

So with some confidence you can say he's somewhere between an OPS of 592 and 828 at the major league level, not very helpful at this stage of his career. Berroa's OPS was 789 last year and 663 this year before being sent down.
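The interval above can be reproduced with a short Python sketch. It assumes, as the calculation in this post does, that OPS (scaled to .710) can be treated like a simple proportion and that the normal approximation is reasonable for 57 plate appearances; the function name is only illustrative.

```python
import math

def proportion_interval(p, n, z=1.96):
    """Normal-approximation interval: p +/- z * sqrt(p * (1 - p) / n)."""
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Blanco: OPS of .710 in 57 plate appearances (figures from the post).
low, high = proportion_interval(0.710, 57)
print(f"{low:.3f} to {high:.3f}")
```

Because the margin shrinks with the square root of the sample size, it takes roughly four times as many plate appearances to cut the width of the interval in half.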

But Blanco's defense certainly looks like the real deal. In last night's game, which I attended with my younger daughter and some friends, he made several great defensive plays including jumping clean over the oncoming runner in two double play situations and taking a nice flip by Ruben Gotay to retire a runner. Based on what I've seen I have no reason to doubt, as Paul notes, that his range numbers will be far above average.

As for trading Berroa I think I'd wait to see if Blanco will hit at all. Although having good infield defense is important with a young pitching staff, having both Gotay (534 OPS in 39 plate appearances but OPS numbers of 833 and 727 in A ball the last two years) and Blanco in the lineup might be too big a drag on your offense long term. Berroa showed some signs of power last year which might develop if he can learn the strike zone (he saw only 3.6 pitches per plate appearance this year) and lay off the sliders down and away. If he does he'll be a bargain at his $2.6m price tag for the next four years.

Decreasing Foul Territory

Today on the Fox Sports broadcast of the Cubs/Brewers game Steve Stone and Chip Caray were discussing the relative difficulties of reaching 500 homeruns and 300 wins. Steve mentioned that in his view virtually all the changes in baseball since 1968 have favored the hitters. Specifically, he mentioned...

1. Lowering the mound
2. Playing in smaller ballparks
3. Allowing hitters to pad up their lead elbow

Although I think he left out the most significant reason, that of weight training and increased bat speed, the first and third reasons he cites I agree with. The second is debatable.

Contrary to Steve's opinion, the general consensus is that the new ballparks are not smaller in terms of their outfield dimensions, although this is difficult to calculate since newer parks are not as symmetrical and distances are not well-marked in most parks. However, there is agreement that the decrease in foul territory does contribute to the increase in offense. This is also a factor in some older parks, by the way, as teams in recent years have added premium seating behind the plate and down the lines, thereby decreasing foul territory. Fenway Park, Wrigley Field, and Kauffman Stadium among others have had these changes made.

But how big is the effect of decreased foul territory?

In February of 2003 Tom Tippett of Diamond Mind did some quick analysis of the question. In looking at play-by-play data he found that there was an average of 138 foul outs per park per season from 1999-2002. 121 of these were on foul outs in the infield area and 17 on foul fly balls in the outfield. After Fenway Park added some new seats prior to 2002 it decreased the infield foul outs from the previous average of 128 to just 111. So assuming that the seats cost 17 outs, that's a little over 1 per month per team - certainly not enough to make much of a difference in run scoring or homeruns.

But what about the decreased foul territory in new parks? Network Coliseum and Dodger Stadium, both older ballparks (pre-1993), saw averages of 186 and 178 foul outs per season respectively. Assuming newer parks (post-1993) are configured more like Fenway than like Network Coliseum, the new parks could have cost pitchers 30 to 50 outs per season per park using a generous estimate. Since 1993 seven of the twelve existing NL teams and six of the fourteen AL teams have moved into new ballparks. These thirteen parks therefore may have cost as many as 520 outs per season (13*40) on the high side and therefore about 21 homeruns per season (the major league average during this period was 1.07 HR per 27 outs so 520 outs would produce (1.07/27)*520 = 20.6). Once again, a total too small to affect the general trend.
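The back-of-the-envelope arithmetic above is easy to rerun for the whole 30 to 50 range. This is only a sketch using the figures quoted in this post; the constant and function names are made up for illustration.

```python
NEW_PARKS = 13          # parks opened since 1993, per the post
HR_PER_27_OUTS = 1.07   # major league average for the period, per the post

def extra_homeruns(parks, outs_lost_per_park, hr_per_27_outs=HR_PER_27_OUTS):
    """Return (extra outs handed back to hitters, homeruns those outs would produce)."""
    extra_outs = parks * outs_lost_per_park
    return extra_outs, extra_outs * hr_per_27_outs / 27

# Low, middle, and high estimates of outs lost per park per season.
for outs_lost in (30, 40, 50):
    outs, hr = extra_homeruns(NEW_PARKS, outs_lost)
    print(f"{outs_lost} outs/park -> {outs} outs, {hr:.1f} HR per season")
```

Even the most generous end of the range produces only a couple dozen homeruns a season across the majors.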

So I doubt that decreased foul territory has played a significant role in the increase in offense since 1993.

On another note, the Cubs today signed Neifi Perez to a minor league contract. Apparently, the Cubs are trying to see how many of the worst offensive players in history they can sign, having already run through Rey Ordonez this season.

Wednesday, August 18, 2004

Scoring by Inning

Related to my post about lineups the other day I received a question as to the breakdown in scoring by inning. The thought was that higher scoring in the first inning might indicate that the lineup does make a difference contrary to my conclusion that messing with the lineup doesn't generally have much effect.

Since I wasn't able to find scoring by innings after a Google search, I ran a quick query on the 1992 AL and NL Retrosheet data and got the following:

1992 National League

Inning    1      2      3      4      5      6      7      8      9      10     11+
Visitor   11.8%  9.7%   11.2%  11.0%  10.0%  11.5%  11.3%  9.2%   11.1%  1.3%   1.9%
Home      14.2%  9.1%   10.8%  11.4%  11.8%  12.3%  11.7%  11.6%  5.1%   0.7%   1.3%

Inning    1      2      3      4      5      6      7      8      9      10     11+
Visitor   427    353    405    399    364    415    408    333    401    48     69
Home      557    356    422    445    464    480    460    453    201    27     52

1992 American League

Inning    1      2      3      4      5      6      7      8      9      10     11+
Visitor   12.1%  10.7%  11.9%  10.1%  12.2%  11.5%  10.2%  10.2%  9.0%   0.9%   1.2%
Home      12.6%  11.1%  12.2%  11.1%  12.6%  12.0%  11.7%  11.5%  3.8%   0.6%   0.8%

Inning    1      2      3      4      5      6      7      8      9      10     11+
Visitor   586    518    579    493    592    561    497    494    438    45     58
Home      623    548    602    550    621    594    576    570    187    29     41
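For anyone who wants to check the percentage rows, each one is just that inning's runs divided by the row total. A small Python sketch using the 1992 NL run counts from the table above (the variable and function names are only illustrative):

```python
# 1992 NL runs by inning (1 through 9, 10, 11+), copied from the table above.
nl_runs = {
    "Visitor": [427, 353, 405, 399, 364, 415, 408, 333, 401, 48, 69],
    "Home":    [557, 356, 422, 445, 464, 480, 460, 453, 201, 27, 52],
}

def inning_shares(runs):
    """Percentage of a side's total runs scored in each inning, to one decimal."""
    total = sum(runs)
    return [round(100 * r / total, 1) for r in runs]

for side, runs in nl_runs.items():
    print(side, inning_shares(runs))
```

The same function works on the AL rows.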

This certainly shows that the scoring varies more by inning in the National League because of the pitcher's spot (note the lower scoring in the second inning) and does show a slight increase in the first inning overall. This is what you would expect, however, when several of the team's top hitters are guaranteed to bat in the inning.

It also may be the case that scoring increases in the first inning because some pitchers don't have their good stuff and so get hit hard in the first inning. After righting themselves and because all the really bad starts will be selected out after the first inning, the scoring drops.

Scoring then picks up a bit in the 6th and 7th as the team's 3-5 hitters bat for the third time.

Note also that scoring in the bottom of the ninth decreases since the home team doesn't bat as often and therefore has fewer opportunities.

Tuesday, August 17, 2004

OPS Over 1000

A member of SABR mentioned this week that the Cardinals' 2 through 5 hitters (Rolen, Edmonds, Pujols, and Walker) all have an OPS over 1000. He was wondering if any team had duplicated such a performance for a full season. A quick query shows that the 1930 Cardinals, 1996 Mariners, and 2000 Astros all finished the season with four hitters with an OPS over 1000. Here they are...

1930 Cardinals
Player            AB    OPS
Ray Blades        101   1118
Showboat Fisher   254   1019
Chick Hafey       446   1059
George Watkins    391   1037

1996 Mariners
Player            AB    OPS
Ken Griffey Jr.   545   1020
Edgar Martinez    499   1059
Alex Rodriguez    601   1045
Mark Whiten       140   1006

2000 Astros
Player            AB    OPS
Moises Alou       454   1039
Jeff Bagwell      590   1039
Ken Caminiti      208   1001
Richard Hidalgo   558   1028

You'll notice that each team has one player with relatively few at bats and of course the Cardinals will have Walker for less than half a season.

Power Surge

On the SABR listserv this week there's been a lively discussion on the increase in homeruns since 1992. One poster asked whether perhaps a contributing factor is the fact that teams carry more pitchers these days and so give a greater percentage of their innings to inferior pitchers.

In thinking about this question it occurred to me that one way to possibly test this would be to look at those pitchers that threw the most innings (on the assumption that generally the "best pitchers" get the most innings) and see if their homerun rate had differed over time or held constant. If it held constant, then the increase in homeruns could be attributed to the other, presumably inferior, pitchers who are now getting more innings. If it had increased along with the league average, then the best pitchers are getting victimized by the homerun along with everybody else.

What I did was to select the top seasons in innings pitched for each of the last four spans of 11 years (60, 72, 78, and 90 pitchers in each group to track with the increase in number of teams). Then I calculated their homeruns per 9 innings and compared that with the weighted league average during that span. The results:

Period      AvgIP   HR/9   LgHR/9   Pct of Lg
1993-2003   243.9   0.81   1.07     76%
1982-1992   266.6   0.74   0.82     90%
1971-1981   308.5   0.65   0.73     89%
1960-1970   302.4   0.67   0.83     81%

The average innings pitched of the top pitchers in each set definitely decreased and their homeruns per 9 innings increased (the bump in the 1971-1981 period can be attributed I think to knuckleballers Wood and Niekro who took 7 of the top 21 slots). So their rate did not remain constant which would tend to support the idea that inferior pitchers pitching more innings does not account for the increase in homeruns. However, relative to the league, the top pitchers in the period 1993-2003 gave up fewer homeruns than did those of the previous 3 periods. So tentatively, it looks like there is some support for the argument although it appears to some degree that a rising tide of homeruns has lifted all boats.
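The "Pct of Lg" column is simply the top pitchers' HR/9 divided by the league HR/9 for the same span. A minimal sketch in Python using the numbers from the table (the names are illustrative only):

```python
# (period, HR/9 for the top innings-pitched seasons, league HR/9), from the table above.
periods = [
    ("1993-2003", 0.81, 1.07),
    ("1982-1992", 0.74, 0.82),
    ("1971-1981", 0.65, 0.73),
    ("1960-1970", 0.67, 0.83),
]

def pct_of_league(top_hr9, league_hr9):
    """Top pitchers' homerun rate as a whole-number percentage of the league rate."""
    return round(100 * top_hr9 / league_hr9)

for period, top_hr9, league_hr9 in periods:
    print(period, f"{pct_of_league(top_hr9, league_hr9)}%")
```

Looking at the ratio rather than the raw rate is what separates "the workhorses are giving up more homeruns" from "the workhorses are giving up more homeruns than the league's increase would explain."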

My personal opinion is that this may be a factor to add to the confluence of contributors that include:

* Greater strength by hitters through weight training that has increased the speed at which they swing the bat and therefore the distance they hit the ball

* The development of thin-handled bats that allow for whipping the bat through the strike zone at higher speeds

* The use of aluminum bats at lower levels that train pitchers to work the outside rather than the inside corner

* The crackdown on bean balls and fights that works to the advantage of the hitter since the pitcher is the one getting ejected, and because it allows hitters to dive out after the ball on the outside corner and hit opposite field homeruns (a rarity as you’ll remember in the 1980s and before)

You'll note that smaller ballparks and expansion are not included in this list. The reason the latter is not included is that expansion effects should only be evident for a year or two. The former isn't included because I don't think there's any evidence that the newer parks are any smaller in terms of dimensions. Certainly, the newer parks have smaller foul territory, which would increase offense since balls that would normally be outs turn into second chances. I'm not sure it could account for much of a difference in homeruns per game although I'm willing to add it to the list of contributors.

Homer Havens?

In a story posted on by Jesse Sanchez the top homerun parks in the majors are discussed. While the story was well-written and interesting with its quotes from players as to their favorite parks in which to hit and to hit homeruns, anyone with a sabermetric bent will immediately spot the analytical problem with the article.

The article lists the following as the homer havens.

1. U.S. Cellular Field
2. Citizens Bank
3. Coors Field
4. Wrigley Field
5. Yankee Stadium
6. Great American Ballpark
7. Ameriquest Field
8. Bank One
9. Fenway Park
10. Oakland Coliseum

And the top 5 parks that are not homer havens are:

1. Olympic Stadium
2. PETCO Park
3. Shea Stadium
4. Miller Park
5. PNC Park

Both of these lists were determined by looking at the total number of homeruns hit in the park this season. The context, which is the home team's hitters and pitchers, is entirely missing. Could it be that Yankee Stadium makes the top 10 because they have Giambi, Sheffield, ARod and the rest? In order to see more accurately which parks are the homer havens and which suppress homeruns you need to contextualize the stats by looking at the difference in homeruns between home and away games for the team in the park, using a formula like the one used on ESPN:

HR PF = (HR in Park/Games in Park)/(HR in Road Games/Road Games)

Doing that you get the following lists:

1. Turner Field
2. Wrigley Field
3. U.S. Cellular Field
4. Coors Field
5. Skydome
6. Citizens Bank
7. Bank One
8. Network Coliseum
9. Fenway Park
10. Ameriquest Field

While eight of the ten parks are the same, Turner Field goes from nowhere to first while Yankee Stadium and the Great American Ballpark drop out with Skydome entering the list. This is the case because the Braves don't hit that many homeruns while the Yankees do.
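Here is the HR park factor formula from above as a small Python function. The two example teams are entirely made up to show the effect (they are not real 2004 totals), and I'm assuming the homerun counts include both the team and its opponents.

```python
def hr_park_factor(home_hr, home_games, road_hr, road_games):
    """HR PF = (HR in park / games in park) / (HR in road games / road games)."""
    return (home_hr / home_games) / (road_hr / road_games)

# HYPOTHETICAL totals, for illustration only.
# A light-hitting team: few homeruns overall, but far more at home than on the road.
light_hitters = hr_park_factor(home_hr=100, home_games=60, road_hr=80, road_games=60)
# A slugging team: lots of homeruns everywhere, so the park itself looks neutral.
sluggers = hr_park_factor(home_hr=150, home_games=60, road_hr=150, road_games=60)

print(f"light hitters: {light_hitters:.3f}, sluggers: {sluggers:.3f}")
```

In this made-up case the sluggers' park would rank higher on raw homerun totals even though the light hitters' park is the one actually boosting homeruns.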

On the other end of the spectrum the bottom five are:

2. Jacobs Field
3. Shea Stadium
4. Busch Stadium
5. Kauffman Stadium

Only three of the five listed in the Sanchez article actually depress homeruns very much (Olympic Stadium was sixth) and PNC Park actually has a positive HR factor (1.022).

Saturday, August 14, 2004

Batting Barry First?

Earlier this season there was much discussion by folks advocating that the Giants bat Barry Bonds higher in the lineup, possibly even first, citing the increased number of at bats he would get, which would result in more runs and, as the theory went, more wins for the Giants.

I've heard arguments like this before with different people citing the number of extra at bats a player would garner at different spots in the order. In fact, Earnshaw Cook in his 1964 book Percentage Baseball advocated just such a strategy and claimed it would greatly increase run scoring if a team used a lineup in descending order by run production. Well, I finally had a chance to sit down today and see for myself just what the difference actually is and whether or not it would really matter.

To see what the difference in plate appearances actually is 1 through 9 in a typical season I loaded the 1992 (the latest season available) Retrosheet play-by-play data into SQL Server. I then wrote a couple queries to show how many plate appearances each spot in the batting order had. The results:

American League

Lineup   PA      PA/G   PA/162
1        10621   4.68   758.64
2        10395   4.58   742.50
3        10161   4.48   725.79
4        9911    4.37   707.93
5        9661    4.26   690.07
6        9428    4.16   673.43
7        9176    4.05   655.43
8        8915    3.93   636.79
9        8619    3.80   615.64

National League

Lineup   PA      PA/G   PA/162
1        8995    4.63   749.58
2        8809    4.53   734.08
3        8616    4.43   718.00
4        8413    4.33   701.08
5        8205    4.22   683.75
6        7996    4.11   666.33
7        7766    3.99   647.17
8        7536    3.88   628.00
9        7322    3.77   610.17

So from top to bottom there was a difference of about 140 plate appearances, with roughly 15 to 20 fewer plate appearances for each position lower in the batting order. So moving Bonds from fourth to first would provide somewhere between 45 and 60 extra plate appearances in which to do his damage. Since 2003 was a higher scoring season than 1992, each lineup position would get more at bats, but I'm not sure that would change the difference between positions or the ratio. How would moving Bonds to leadoff then impact the Giants?
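The PA/G and PA/162 columns can be rebuilt from the raw plate appearance totals. This Python sketch assumes 14 AL teams and 12 NL teams in 1992, each playing a 162-game schedule; the names are illustrative and the SQL I actually ran isn't shown here.

```python
GAMES = 162

# 1992 plate appearances by lineup slot (1 through 9), from the tables above.
leagues = {
    "AL": {"teams": 14, "pa": [10621, 10395, 10161, 9911, 9661, 9428, 9176, 8915, 8619]},
    "NL": {"teams": 12, "pa": [8995, 8809, 8616, 8413, 8205, 7996, 7766, 7536, 7322]},
}

def per_team_rates(pa, teams, games=GAMES):
    """Return (PA per team game, PA per team over a full season) for one lineup slot."""
    return pa / (teams * games), pa / teams

for name, league in leagues.items():
    pa, teams = league["pa"], league["teams"]
    # Extra plate appearances a cleanup hitter would gain by moving to leadoff.
    gain = (pa[0] - pa[3]) / teams
    print(f"{name}: leadoff {per_team_rates(pa[0], teams)[0]:.2f} PA/G, "
          f"4th to 1st gains about {gain:.1f} PA per season")
```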

One way to try and answer this question is to look at how many more runs Barry would create with those 45 to 60 extra plate appearances. This is easy to do given Bill James' basic Runs Created formula ((H+BB)*(TB)/(AB+BB)). Since Bonds created an average of .283 runs per plate appearance, giving him 45 to 60 extra plate appearances would yield between 12.8 and 17 extra runs. Using the Pythagorean Formula (Winning Pct = (R^2)/(R^2+RA^2)) you can then estimate how many more games the Giants of 2003 would have won with those extra 15 or so runs. Since they actually scored 755 runs and allowed 638 adding 15 more runs gives them an extra 1.6 wins (a simple rule is that in most leagues a win is purchased with 10 to 11 runs). So moving Bonds would help but not by much.
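The two formulas in that paragraph fit in a few lines of Python. This is a rough sketch: the .283 runs per plate appearance and the 755/638 run totals come from the post, while the 162-game season length and the function names are my assumptions for illustration.

```python
def runs_created(h, bb, tb, ab):
    """Bill James' basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)

def pythagorean_pct(runs, runs_allowed):
    """Pythagorean winning percentage: R^2 / (R^2 + RA^2)."""
    return runs ** 2 / (runs ** 2 + runs_allowed ** 2)

def extra_wins(runs, runs_allowed, extra_runs, games=162):
    """Estimated wins gained over a season by adding extra_runs to a team's total."""
    before = pythagorean_pct(runs, runs_allowed)
    after = pythagorean_pct(runs + extra_runs, runs_allowed)
    return games * (after - before)

BONDS_RC_PER_PA = 0.283  # from the post
for extra_pa in (45, 60):
    print(f"{extra_pa} extra PA -> {BONDS_RC_PER_PA * extra_pa:.1f} extra runs")

# 2003 Giants: 755 runs scored, 638 allowed, plus about 15 runs from the move.
print(f"extra wins: {extra_wins(755, 638, 15):.2f}")
```

The result comes out to roughly a win and a half, which squares with the rule of thumb that a win costs 10 to 11 runs.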

A related question is whether a team should reorder its lineup in descending order by ability to maximize its production, per Cook. To determine how many runs a team might gain by maximizing its plate appearances I simply calculated the Runs Created per Plate Appearance for the nine lineup positions in the 1992 AL and NL and then reassigned the plate appearances in descending order starting with the highest RC/PA. For example, in the AL in 1992 the third hitter in the lineup created the most runs per plate appearance at .1296. I therefore assigned the third position the 10,621 plate appearances that the leadoff hitters amassed and recalculated the Runs Created. Finally, I added up all the Runs Created for the new optimal lineup to see how it differed from the non-adjusted lineup. Here are the results:

1992 AL Non-adjusted Lineup = 694.2
1992 AL Optimal Lineup = 695.3

1992 NL Non-adjusted Lineup = 643.7
1992 NL Optimal Lineup = 645.2

In both cases the optimal lineup that gives plate appearances to its best run creators only created the equivalent of a little over 1 run for an entire season.

I then turned the tables and created the least optimal lineup by reversing the optimal order. This resulted in 684.8 runs for the AL and 625.4 runs for the NL, a loss of 10 runs in the AL and 18 in the NL. The larger drop in the NL is accounted for by the fact that a much weaker hitter, the pitcher, would be getting the most plate appearances. This also highlights the fact that on any individual team the variance between lineup positions will be greater than the averages used for this analysis. That means that an average team could be expected to pick up more than a single run when its lineup is optimized, and the amount would depend on the magnitude of the offensive differences between its players. For example, if a team in the NL in 1992 were to switch its 9th and 2nd hitters it would score 634.7 runs, only nine fewer than the normal lineup. This is a far smaller effect than I had previously supposed.
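The reassignment procedure described above can be sketched in Python. The plate appearance totals are the real 1992 AL numbers from the table, but the RC/PA rates are HYPOTHETICAL placeholders (only the .1296 for the third slot comes from this post), so the totals it prints will not match the 694.2 and 695.3 figures. The function names are illustrative too.

```python
TEAMS = 14  # AL teams in 1992

# Real 1992 AL plate appearances by lineup slot, 1 through 9.
al_pa = [10621, 10395, 10161, 9911, 9661, 9428, 9176, 8915, 8619]
# HYPOTHETICAL runs created per PA by slot; only slot 3's .1296 is from the post.
rc_per_pa = [0.110, 0.108, 0.1296, 0.125, 0.115, 0.105, 0.100, 0.095, 0.090]

def team_runs(pa_by_slot, rates, teams=TEAMS):
    """Average runs created per team given PA and RC/PA for each slot."""
    return sum(pa * rate for pa, rate in zip(pa_by_slot, rates)) / teams

def reassign(pa_by_slot, rates, best_first=True):
    """Hand the largest PA totals to the best (or, if best_first=False, worst) slots."""
    pa_sorted = sorted(pa_by_slot, reverse=True)
    slots = sorted(range(len(rates)), key=lambda i: rates[i], reverse=best_first)
    new_pa = [0] * len(rates)
    for pa, slot in zip(pa_sorted, slots):
        new_pa[slot] = pa
    return new_pa

optimal = reassign(al_pa, rc_per_pa)
worst = reassign(al_pa, rc_per_pa, best_first=False)
for label, pa in (("actual", al_pa), ("optimal", optimal), ("worst", worst)):
    print(f"{label}: {team_runs(pa, rc_per_pa):.1f} runs per team")
```

Since the slot-to-slot gap in plate appearances is small compared to the totals, even a fully sorted lineup moves the season total by only a handful of runs.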

This analysis clearly vindicates the conclusion of Palmer and Thorn in The Hidden Game of Baseball when they said, "All the time managers put into masterminding a winning lineup is so much thumb twiddling, and they are hereby granted an additional hour's sleep a night."