The following article first appeared on Baseball Prospectus in April of 2006, and I thought it was timely given the previous post on the subject and John Walsh's fine article digging deeper into the causes of these split differences.
April 13, 2006
Schrodinger's Bat: The Irreducible Essence of Platoon Splits
by Dan Fox
"In short, we view means and medians as the hard ‘realities,’ and variation that permits their calculation as a set of transient and imperfect measurements of this hidden essence…But all evolutionary biologists know that variation itself is nature’s only irreducible essence. Variation is the hard reality, not a set of imperfect measures for a central tendency. Means and medians are the abstractions."
--Stephen Jay Gould, from the essay "The Median Isn’t the Message"
That quote is from one of my favorite authors (himself a big baseball fan), and was appropriately used by Nate Silver when he introduced the PECOTA system in the 2003 Baseball Prospectus. The concepts embodied in the quote have been on my mind the past couple of weeks, ever since I mentioned Wily Mo Pena’s platoon split in my inaugural column and received a healthy dose of reader feedback.
Perhaps the most prevalent question asked why it is that the performance analysis community in general doesn’t seem to pay much attention to individual platoon splits or view them as very important.
In order to answer that, we’ll explore the nature of variation in these splits, using a couple of different methods in an attempt to illustrate why variation is the irreducible essence when it comes to platoon splits.
Of Lefty Mashers and LOOGYs
Platooning in baseball, known more formally as the "two-platoon" system, has been around for nearly 100 years. Most famously, it was exploited by Casey Stengel when he managed the Yankees in the 1950s; it is often said that he learned the strategy from John McGraw when the lefty-hitting Stengel was himself platooned, remaining on the bench against southpaws after he'd been dealt to McGraw's New York Giants in 1921. In 1922 and 1923, Stengel would hit .368/.436/.564 and .339/.400/.505 in that role. However, there is ample evidence that McGraw wasn’t the only one who understood that hitters tended to hit better against pitchers of the opposite hand. In fact, some credit the 1914 "Miracle" Boston Braves with being the first team, guided by Braves manager George Stallings, to extensively take advantage of this strategy.
Last season, this debate was again brought to the fore with the publication of Bill James’ essay "Underestimating the Fog" in the 2005 Baseball Research Journal and more recently by the excellent chapters devoted to the subject in The Book and by our own James Click in Baseball Between the Numbers.
To provide a little context to the question, I used Retrosheet data and took a look at all 16,562 player seasons from 1970 through 1992. Overall, and excluding switch hitters, the 3,371 players this encompassed performed as follows:
Throws  --------Right--------   --------Left---------   ----Platoon Split----
Bats    AVG   OBP   SLG   OPS   AVG   OBP   SLG   OPS   AVG   OBP   SLG   OPS
Right   .248  .301  .369  670   .264  .320  .403  722   .017  .019  .033   52
Left    .270  .331  .409  740   .246  .305  .353  658   .024  .026  .056   82
In other words, as a group, right-handed hitters enjoyed improvements of 17 points in batting average, 19 points in on base percentage, and 33 points in slugging percentage against left-handed pitchers, while the advantage for lefties versus right-handers was a bit higher, at 24, 26, and 56 points, respectively.
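The split columns are simple differences between a group's line against opposite-handed pitching and its line against same-handed pitching. A minimal Python sketch of that arithmetic follows, using the rounded figures from the table; the function and constant names are illustrative and not part of the original analysis.

```python
# Slash lines (AVG, OBP, SLG) from the 1970-1992 table, stored as integer
# "points" (thousandths) to avoid floating-point noise.
# All names here are illustrative, not taken from the original study.
RIGHTY_VS_RHP = (248, 301, 369)
RIGHTY_VS_LHP = (264, 320, 403)
LEFTY_VS_RHP = (270, 331, 409)
LEFTY_VS_LHP = (246, 305, 353)


def ops(line):
    """OPS in points: on-base percentage plus slugging percentage."""
    _, obp, slg = line
    return obp + slg


def platoon_split(opposite, same):
    """Advantage in points (AVG, OBP, SLG, OPS) versus opposite-handed pitching."""
    diffs = tuple(o - s for o, s in zip(opposite, same))
    return diffs + (ops(opposite) - ops(same),)


righty_split = platoon_split(RIGHTY_VS_LHP, RIGHTY_VS_RHP)
lefty_split = platoon_split(LEFTY_VS_RHP, LEFTY_VS_LHP)
print("Right-handed hitters:", righty_split)
print("Left-handed hitters: ", lefty_split)
```

Because the inputs are already rounded to three places, the computed differences can land a point away from the published split columns, which were presumably calculated from unrounded totals.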
Clearly this confirms that platoon advantages exist and are significant, and that the difference is larger for left-handed hitters than for right-handers. This fact is exploited by modern managers who employ LOOGYs (left-handed one-out guys) like the Cubs' Scott Eyre, whose sole job is to come in to pitch to the opposing team’s toughest left-handed batters.
But for the analysis that follows we'll limit the list of players to non-switch-hitters who came to the plate more than 2000 times, which leaves us with 505 players (188 left-handed hitters and 317 right-handers).
Throws  --------Right--------   --------Left---------   ----Platoon Split----
Bats    AVG   OBP   SLG   OPS   AVG   OBP   SLG   OPS   AVG   OBP   SLG   OPS
Right   .259  .316  .393  709   .276  .336  .431  766   .017  .020  .037   57
Left    .280  .343  .430  772   .254  .314  .367  681   .026  .029  .063   91
As you can see, although the performance improves because we’re now dealing with better hitters overall, the differences are basically the same, even a bit larger. This is a bit counterintuitive, since one might be inclined to think that these splits would be smaller, on the theory that players who have more ABs do so in part because they are able to handle a wider variety of pitchers.
But our questioners, like George Stallings, John McGraw, and Casey Stengel, already knew that the advantage existed. Lurking beneath the question of why this advantage doesn’t get more play from analysts lies another question. Is the ability to hit one side better than another a repeatable skill, or simply a phenomenon that affects all hitters more or less equally? In other words, when it comes to individual hitters, is it the mean or the variation that is the hard reality?
Variations on the Theme
To explore this issue, we’ll look at two measures of the variation of these platoon splits using the group of 505 hitters with 2,000 or more plate appearances from 1970-1992.
One simple approach we can use to study these splits a bit closer is to look at how they are distributed. If platoon differences were spread randomly across the population of hitters without regard to skill, we would expect the pattern of those differences to follow a normal distribution, one that takes the shape of a bell curve. Some hitters would certainly do better than others, but the differences could be attributed to chance. And as many readers are aware, the empirical rule relating to normal distributions states that approximately 68% of the values will fall within one standard deviation of the mean, 95% within two, and 99.7% within three standard deviations.
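As a rough illustration of the empirical rule, the Python sketch below measures what share of a set of values falls within one, two, and three standard deviations of the mean. The data are synthetic draws from a normal distribution, not real player splits, and the choice of the population standard deviation is an assumption, since the column does not say which form was used.

```python
import random
import statistics


def share_within(values, k):
    """Fraction of values lying within k standard deviations of their mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD; an assumption
    return sum(abs(v - mean) <= k * sd for v in values) / len(values)


# Synthetic "splits" drawn from a normal distribution (NOT real data).
# The mean and SD passed to gauss() are arbitrary illustrative values.
rng = random.Random(2006)
fake_splits = [rng.gauss(0.020, 0.019) for _ in range(50_000)]

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(fake_splits, k):.1%}")
```

Running the same function over each hitter's actual split would produce the kind of percentages that can be compared against the 68/95/99.7 benchmarks.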
So what do we find when we look at these splits?
The following tables show the mean, standard deviation, and the percentage of split values that fall within the first three standard deviations for left and right-handed hitters for various offensive categories.
Left-handed hitters
        Mean     SD      <=1 SD  <=2 SD  <=3 SD
AVG      0.0339  0.0223   73.9%   94.7%   98.4%
OBP      0.0359  0.0243   69.1%   94.1%   98.9%
SLG      0.0776  0.0397   69.1%   95.2%  100.0%
K/PA    -0.0499  0.0268   75.0%   94.7%   99.5%
BB/PA   -0.0058  0.0158   70.2%   94.1%  100.0%

Right-handed hitters
        Mean     SD      <=1 SD  <=2 SD  <=3 SD
AVG      0.0181  0.0169   68.1%   95.6%   99.7%
OBP      0.0207  0.0187   70.0%   95.6%   99.4%
SLG      0.0408  0.0338   71.3%   95.0%   99.4%
K/PA    -0.0236  0.0194   70.3%   94.0%   99.4%
BB/PA    0.0013  0.0019   72.9%   93.4%  100.0%
The tables reveal that the average left-handed hitter hit 34 points better against right-handers, but that the range within one standard deviation of that mean runs from 12 to 56 points (34 plus and minus the standard deviation of 22 points). For right-handed hitters the mean value for batting average is 18 points, with a one-standard-deviation range of between 1 and 35 points. As you can see, there is a fairly large amount of variability for hitters from both sides.
It should be noted that the mean values shown in these tables differ from the split values shown in the previous table: the earlier values are computed from totals for the entire data set, whereas these are averages of each player's individual split.
This data can also be looked at graphically; the following graph shows the split differences in OBP for right-handed hitters:
What the tables show pretty clearly is that platoon splits are indeed normally distributed for both populations. The conclusion one might draw from this is that even if you found a player with a large platoon split, one couldn’t automatically assume that the split wasn’t the result of random chance. After all, assuming the splits were random you would still by chance have a couple of players with splits that are greater than three standard deviations from the mean.
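To put a number on "a couple of players," the sketch below computes how many of the 505 hitters would be expected to fall more than three standard deviations from the mean if the splits were exactly normal. It is a back-of-the-envelope check under that normality assumption, not a calculation from the original study.

```python
import math


def normal_tail_share(k):
    """Two-sided share of a normal distribution beyond k standard deviations."""
    return math.erfc(k / math.sqrt(2))


players = 505  # non-switch-hitters with 2,000+ plate appearances, 1970-1992
expected_beyond_3sd = players * normal_tail_share(3)
print(f"expected hitters beyond 3 SD: {expected_beyond_3sd:.2f}")
```

The result works out to between one and two hitters, so a handful of extreme career splits is just what chance alone would produce.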
In and of itself, the fact that the values are normally distributed does not mean they are necessarily random. After all, all kinds of characteristics in the real world--such as IQ scores--are normally distributed, and yet most people assume that there are true ability-based differences between individuals.
A more sophisticated approach to looking at the variation is to try and determine how persistent split differences are throughout a player’s career. In other words, even assuming a normal distribution, do the same hitters have either large or small splits from year to year or do their splits vary more or less randomly within the distribution? If the splits are persistent, we would be more likely to assume they are based on actual skill differences rather than random fluctuations around a general advantage.
A technique that’s been used by others to look into similar questions is to divide a player’s career into even and odd years and then compare the splits to each other. This has two advantages. First, it allows for the comparisons to be made with larger sample sizes than doing so between individual seasons. In this data set, the average hitter had 126 plate appearances against lefties per season, whereas when using career halves the average hitter had over 1,500 plate appearances against right-handers in each half of their career and almost 700 plate appearances against lefties. Incidentally, the number of plate appearances versus southpaws has declined over the years. In this data set there were 27 seasons where players garnered over 300 plate appearances against lefties (Rusty Staub had 364 in 1978)--a feat that is non-existent today. Second, using even and odd seasons tends to even out changes in the offensive environment as well as differences in the player’s overall performance over time.
After splitting each career into its respective halves, we can then plot those halves against each other to see if there is a healthy positive correlation between the two halves of the career. If so, we would be inclined to side with the view that platoon splits have their basis in ability differences between individuals.
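The even/odd technique can be sketched in a few lines of Python. The record layout below (keys such as `h_opp` and `ab_same`) is hypothetical, the seasons are made up, and only the batting-average split is shown; the same pooling would apply to OBP, SLG, or the strikeout and walk rates.

```python
from collections import defaultdict


def split_by_half(seasons):
    """Pool a hitter's seasons into even-year and odd-year halves.

    Each season is a dict with hypothetical keys: year, hits and at-bats
    versus opposite-handed pitchers (h_opp, ab_opp) and versus same-handed
    pitchers (h_same, ab_same). Returns the batting-average platoon split
    for each half, or None if a half has no at-bats on one side.
    """
    totals = {"even": defaultdict(int), "odd": defaultdict(int)}
    for s in seasons:
        half = "even" if s["year"] % 2 == 0 else "odd"
        for key in ("h_opp", "ab_opp", "h_same", "ab_same"):
            totals[half][key] += s[key]

    result = {}
    for half, t in totals.items():
        if t["ab_opp"] == 0 or t["ab_same"] == 0:
            result[half] = None
        else:
            result[half] = t["h_opp"] / t["ab_opp"] - t["h_same"] / t["ab_same"]
    return result


# Made-up seasons for one hypothetical hitter (not real data).
career = [
    {"year": 1980, "h_opp": 30, "ab_opp": 100, "h_same": 100, "ab_same": 400},
    {"year": 1981, "h_opp": 28, "ab_opp": 100, "h_same": 104, "ab_same": 400},
    {"year": 1982, "h_opp": 26, "ab_opp": 100, "h_same": 108, "ab_same": 400},
    {"year": 1983, "h_opp": 32, "ab_opp": 100, "h_same": 96, "ab_same": 400},
]
halves = split_by_half(career)
print(halves)
```

Pooling the counting stats before dividing, rather than averaging the per-season splits, is a design choice made here so that seasons with more playing time carry more weight.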
For example, the following graph plots the even and odd years for left-handed hitters’ slugging percentage split difference.
The line shown is the best fit line through the points; its slope reflects the relationship between the two halves.
The numerical measure of the strength of that relationship is termed the correlation coefficient (r), which ranges from -1 (a perfect negative correlation) to 1 (a perfect positive one). Typically, r values greater than .70 indicate a strong positive correlation between the two values, whereas values less than .30 indicate a weak one. For the above graph the correlation coefficient is .18, and while positive--as indicated by the upward slope of the best fit line--the value sits well inside the weak range. In other words, a left-handed hitter’s slugging percentage split in even years is a poor predictor of his slugging percentage split in odd years.
By squaring r we can get the coefficient of determination, or R-square, which is a measure of how much the variation in one of the values can explain the variation in the other. In this case the R-square is just 3.1%.
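For readers who want to reproduce the calculation, here is a minimal Python sketch of the Pearson correlation coefficient and its square. The paired values are invented for illustration; in the study each pair would be one hitter's even-year and odd-year split.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return sxy / math.sqrt(sxx * syy)


# Invented even-year and odd-year splits for five hypothetical hitters.
even_splits = [0.010, 0.045, 0.030, 0.060, 0.025]
odd_splits = [0.040, 0.020, 0.055, 0.035, 0.015]

r = pearson_r(even_splits, odd_splits)
r_square = r ** 2
print(f"r = {r:.3f}, R-square = {r_square:.3f}")
```

As a quick check on the relationship between the two measures, an r of 0.176 squares to about 0.031, or 3.1%.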
The following tables show the r value for each of the splits for lefties and righties.
Left-handed hitters
        r      R-square  .05 Sig?
AVG     0.113  0.013     No
OBP     0.176  0.031     No
SLG     0.195  0.038     Yes
K/PA    0.438  0.192     Yes
BB/PA   0.242  0.059     Yes

Right-handed hitters
        r      R-square  .05 Sig?
AVG     0.028  0.001     No
OBP     0.136  0.019     Yes
SLG     0.112  0.013     Yes
K/PA    0.441  0.195     Yes
BB/PA   0.356  0.127     Yes
As you might have expected given the previous graph, the correlation coefficients for batting average, on base percentage, and slugging percentage are pretty small, which is an indication that split values vary widely even between the larger sample sizes that result from dividing careers into even and odd years.
The rightmost column records whether or not the correlation is significant at the 95% confidence level. In other words, when considering split differences in AVG for both sides and OBP for lefties, we cannot conclude with 95% confidence that the correlation coefficient is actually different from 0. Interestingly, OBP for lefties is not statistically significant even though its r is higher than that for righties. That’s the case because the sample of left-handers is smaller (188 hitters versus 317) and there is also more variability in the split difference for left-handers.
Another interesting point is that both strikeout and walk rate are more strongly correlated than the other measures, indicating that they contain larger ability components. Strikeout and walk rates are not typically what springs to mind when people think of platoon differences among individual hitters, and yet the data support the notion that strikeout and walk rates are, in fact, the most persistent characteristics of these splits.
So where does this leave us?
Using these two ways of looking at variation it would appear that, as Gould said, variation is the hard reality when it comes to platoon splits and the mean values we’re left with are mostly a byproduct of that reality. Another way of saying this is that, as Albert and Bennett put it in their book Curve Ball, platoon splits are an example of a "bias situation," where all hitters tend to take advantage to the same degree in the long run (once the vagaries of chance have evened out), rather than an "ability situation" where differences are governed by individual hitter differences.
As an aside, the fact that platoon splits are influenced very heavily by small sample size and luck resolves a conundrum my brother and I encountered when simulating seasons with Strat-O-Matic baseball back in the early 1980s.
In the card version of the game you have a 50% probability of determining the outcome of a play based on the hitter’s card, and a 50% chance based on the pitcher’s card. Players with massive platoon splits faithfully recorded on their cards (say, Keith Moreland in 1983) in single seasons could therefore always be found. On the other hand, most pitchers--usually having faced more hitters and therefore having larger sample sizes and smaller fluctuations--did not have such extreme splits. Consequently, it was usually easy to stack the lineup and neutralize even the best left-handed starting pitchers and be very successful pinch-hitting even against would-be LOOGYs (few of them as there were in the early 1980s). As a result, left-handed pitchers were worth far less than they should have been and teams loaded with left-handers fared worse than expected. Trust me. I managed the 1983 Padres with the lefty trio of starters Tim Lollar, Dave Dravecky, and Mark Thurmond who "helped" my team to over 100 losses--19 games worse than their actual record.
I admit that I haven’t kept up with how Strat-O-Matic or other simulation games have evolved, but incorporating a general platoon advantage or even basing the platoon split on multiple years in order to dampen the variation would have been a step in the right direction.
A Step Beyond
I mentioned in the opening that an excellent chapter on platooning appeared in The Book, and so let’s come full circle.
Looking at the overall distributions and correlations can give us big clues that platoon splits for the most part are a group characteristic rather than an individual one. But you’ll notice that while the correlations certainly are weak, they are positive and most are statistically significant. So what is the meaning of those very weak correlations?
In the chapter discussing this topic, the authors apply a more rigorous statistical technique using data from 2000-2004 to clear away some of the fog. As I noted in a previous column, they conclude that when using their statistic wOBA (weighted On Base Average, which is similar to linear weights), a right-handed hitter’s platoon split should be regressed to the mean by weighting the league average by 2,200 plate appearances whereas a lefty’s split would need to be weighted by 1,000 league average plate appearances. In other words, these small correlations can be accounted for when trying to estimate what a player’s true platoon skill is, but doing so requires putting heavy emphasis on the league average.
Using this technique they show that even the most extreme lefty masher from 2000-2004, Brian Jordan, would have his measured wOBA platoon split of .101 regressed by 63% to .037.
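The general idea of regressing toward the mean by adding league-average plate appearances can be sketched as a weighted average. This is a generic shrinkage formula written under stated assumptions, not a reproduction of the procedure in The Book; the function names, the plate appearance count, and the league split used below are all hypothetical.

```python
def regress_split(observed_split, pa, league_split, ballast_pa):
    """Blend an observed split with the league split.

    The observed split is weighted by the hitter's own plate appearances
    and the league split by `ballast_pa` league-average plate appearances
    (for example 2,200 for righties or 1,000 for lefties, per the text).
    """
    if pa < 0 or ballast_pa < 0 or pa + ballast_pa == 0:
        raise ValueError("plate appearance weights must be non-negative and not both zero")
    return (observed_split * pa + league_split * ballast_pa) / (pa + ballast_pa)


def share_regressed(pa, ballast_pa):
    """Fraction of the estimate's weight that comes from the league average."""
    return ballast_pa / (pa + ballast_pa)


# Hypothetical right-handed hitter: all four inputs are made up.
estimate = regress_split(observed_split=0.060, pa=1100,
                         league_split=0.020, ballast_pa=2200)
weight_on_league = share_regressed(pa=1100, ballast_pa=2200)
print(f"estimated true split: {estimate:.3f}")
print(f"weight on league average: {weight_on_league:.0%}")
```

With ballast this large, a hitter needs several thousand plate appearances before his own record outweighs the league average, which is why even extreme measured splits shrink so dramatically.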
And that’s the essence of why platoon splits, while important in a general sense, are not an object of focus to the degree that one might expect. Put simply, random year-to-year fluctuation, driven by small sample sizes and the inherent variability of the splits, swamps the actual skill that players do possess, making the splits less valuable as tools for prediction and assessing value.