Many baseball games turn on a decisive inning when a pitcher starts to lose it and isn’t yanked fast enough.
Does this effect, in the end, make a huge difference? How much of the game revolves around the performance of the team as a whole, and how much around how fast the manager spots problems with his pitchers?
The right way to study this would be to review every game, find the innings with pitching changes, drop those innings, and see whether the won/loss ratios between the teams change with the new scores.
That gets messy when there’s more than one pitching change. I could try to program that, but instead I used the inning with the most runs scored as a proxy for the inning when the opposing pitcher pooped out. Yes, there are problems with this kind of proxy: the sun getting in the outfielders' eyes can undermine the defense and contribute to extra runs, for example.
But first, here’s where I got the data for regular season games in 2010 and 2011:
The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".
There may be errors in the site’s data, but I’ll assume it is accurate enough to serve the purpose: the uncertainty of my proxy is undoubtedly larger than their typographical error rate.
I read the text file into OpenOffice and removed almost all the columns, leaving the date, team names and leagues, number of outs in the game, whether there were forfeits or protests, and the line scores. The rest was just simple awk scripts, plus ROOT for the graphics.
The procedure is simple. Look at the line scores and, for each team, find the inning with the most runs. Remove it from that team's total. Now look at which team won the game; a tie counts as half a win for each side. For a season, add up each team's wins, rank the teams once using the normal scores and once using the reduced scores, and see whether the standings change.
I ignore inter-league games, and toss out any games flagged as forfeits or as protested by either side. I also don’t worry about divisions; all the teams in a league are lumped together.
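Here is a minimal Python sketch of those steps. It is not the awk code actually used; the `Game` record and its field names are hypothetical stand-ins for the trimmed columns, and each team's line score is assumed to be already parsed into a list of runs per inning.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Game:
    # Hypothetical record for one trimmed game-log row (illustrative names).
    visitor: str
    home: str
    visitor_innings: list[int]  # runs per inning for the visiting team
    home_innings: list[int]     # runs per inning for the home team
    interleague: bool = False
    forfeit: bool = False
    protested: bool = False


def reduced_total(innings: list[int]) -> int:
    """Total runs with the single highest-scoring inning removed."""
    return sum(innings) - max(innings)


def credits(away_runs: int, home_runs: int) -> tuple[float, float]:
    """Win credit (visitor, home); a tie is half a win for each side."""
    if away_runs > home_runs:
        return 1.0, 0.0
    if home_runs > away_runs:
        return 0.0, 1.0
    return 0.5, 0.5


def standings(games: list[Game], reduced: bool) -> list[tuple[str, float]]:
    """Sum win credits per team and sort from most to fewest wins."""
    wins: dict[str, float] = defaultdict(float)
    for g in games:
        # Skip inter-league, forfeited and protested games.
        if g.interleague or g.forfeit or g.protested:
            continue
        if reduced:
            a, h = reduced_total(g.visitor_innings), reduced_total(g.home_innings)
        else:
            a, h = sum(g.visitor_innings), sum(g.home_innings)
        away_credit, home_credit = credits(a, h)
        wins[g.visitor] += away_credit
        wins[g.home] += home_credit
    return sorted(wins.items(), key=lambda kv: kv[1], reverse=True)
```

Calling `standings(games, reduced=False)` and `standings(games, reduced=True)` on the same list of games gives the two columns being compared. Divisions are ignored simply by running it once per league.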
The first thing to check is consistency. When I look at the win totals I find that they are close to, but smaller than, those on baseball-reference.com. I have not yet accounted for the discrepancy.
Checking for a home-team advantage, I find one: about 2 sigma for the ordinary score and a bit over 1 sigma for the reduced score.
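One simple way to express a home-team advantage in sigma is sketched below. It assumes the null hypothesis that every game is a fair coin flip, so home wins are binomial with mean n/2 and standard deviation sqrt(n)/2; the post doesn't say exactly how its sigmas were computed, and the function name is hypothetical.

```python
import math


def home_advantage_sigma(home_wins: float, games: int) -> float:
    """Standard deviations by which home wins exceed a 50/50 split.

    home_wins may include half-wins from ties.
    """
    expected = games / 2
    sigma = math.sqrt(games) / 2
    return (home_wins - expected) / sigma
```

For example, 60 home wins in 100 games would come out at exactly 2 sigma.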
So far so good, modulo the discrepancy with the official stats.
The distributions of the scores look a little different! They aren't Poissonian, which isn't too surprising: unless it's a home run, if one man scores he had some help, so there's a chance somebody else is now on base and ready to try to score too. And, if I didn't botch this calculation, over half the average scoring comes from a single inning's success.
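Both claims can be checked with a couple of small functions. This sketch (hypothetical names, not the calculation actually used) computes the share of all runs that come from each team's best inning, and the variance-to-mean ratio of a list of run counts, which is 1 for a Poisson distribution and larger when runs come in clumps.

```python
from statistics import mean, pvariance


def top_inning_share(line_scores: list[list[int]]) -> float:
    """Fraction of all runs scored in each team's best inning.

    Each entry is one team's runs-per-inning list for one game.
    """
    total = sum(sum(innings) for innings in line_scores)
    top = sum(max(innings) for innings in line_scores)
    return top / total if total else 0.0


def dispersion_index(runs: list[int]) -> float:
    """Variance divided by mean; equals 1 for a Poisson distribution."""
    return pvariance(runs) / mean(runs)
```

A share above 0.5 from `top_inning_share` over a whole season would correspond to "over half the average scoring comes from a single inning".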
Add them up and rank them!
As you might expect, the modified win totals cluster more tightly together.
The overall shape of the rankings is also pretty much the same.
But the final rankings do change. In 2011 Arizona takes a big fall when ranked with the modified score, from 3rd in the league to 6th. Can their opponents all have had so many moments of bad pitching?
The astute reader will notice that there are no error estimates on the modified win totals. One source of error is that, with the high-scoring innings left out, the home team would sometimes have come to bat in the bottom of the 9th when in the real world it didn't. Real scores average about .47 runs/inning and modified scores about .22 runs/inning.
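Rank changes like Arizona's can be read off by pairing the two orderings. A small sketch, assuming each standings list is already sorted from most to fewest wins; ties get positional ranks in list order, as in the tables below.

```python
def rank_changes(normal: list[tuple[str, float]],
                 modified: list[tuple[str, float]]) -> dict[str, tuple[int, int]]:
    """Map each team to (rank by normal score, rank by modified score), 1-based."""
    normal_rank = {team: i for i, (team, _) in enumerate(normal, start=1)}
    modified_rank = {team: i for i, (team, _) in enumerate(modified, start=1)}
    return {team: (normal_rank[team], modified_rank[team]) for team in normal_rank}
```

Fed the top six rows of the 2011 National League table, it gives `(3, 6)` for "ARI".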
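The bottom-of-the-9th problem can at least be counted. This sketch flags games where the home team skipped the bottom of the 9th in reality but would not have been ahead under the reduced scores. It assumes the Retrosheet game-log line-score convention (one character per inning, runs of 10 or more in parentheses, "x" for an unplayed half-inning); the function names are hypothetical.

```python
import re


def parse_line_score(text: str) -> tuple[list[int], bool]:
    """Parse a line score such as '00000100x' or '0(10)0000000'.

    Returns the runs per inning and whether an 'x' was present.
    """
    tokens = re.findall(r"\(\d+\)|\d|x", text)
    skipped = "x" in tokens
    runs = [int(t.strip("()")) for t in tokens if t != "x"]
    return runs, skipped


def ninth_inning_ambiguous(visitor: str, home: str) -> bool:
    """True if the home team skipped the bottom of the 9th but would
    not have been leading under the reduced scores."""
    v_runs, _ = parse_line_score(visitor)
    h_runs, home_skipped = parse_line_score(home)
    if not home_skipped:
        return False
    v_reduced = sum(v_runs) - max(v_runs)
    h_reduced = sum(h_runs) - max(h_runs)
    return h_reduced <= v_reduced
```

The fraction of flagged games gives a rough sense of how many modified results rest on an incomplete inning.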
2010 Season
National League
Rank | Team (normal) | Wins | Team (modified) | Modified wins |
1 | "PHI" | 87 | "CIN" | 84.5 |
2 | "SFN" | 85 | "PHI" | 84 |
3 | "CIN" | 83 | "SFN" | 83 |
4 | "ATL" | 82 | "COL" | 78.5 |
5 | "SDN" | 81 | "SDN" | 76.5 |
6 | "SLN" | 77 | "SLN" | 76 |
7 | "LAN" | 76 | "ATL" | 75.5 |
8 | "COL" | 74 | "LAN" | 75 |
9 | "HOU" | 73 | "NYN" | 73.5 |
10 | "FLO" | 73 | "CHN" | 71.5 |
11 | "MIL" | 68 | "HOU" | 71 |
12 | "CHN" | 67 | "FLO" | 70.5 |
13 | "NYN" | 66 | "MIL" | 69.5 |
14 | "WAS" | 64 | "WAS" | 67.5 |
15 | "ARI" | 59 | "ARI" | 58.5 |
16 | "PIT" | 55 | "PIT" | 55 |
American League
Rank | Team (normal) | Wins | Team (modified) | Modified wins |
1 | "TBA" | 88 | "TBA" | 83.5 |
2 | "MIN" | 86 | "NYA" | 83 |
3 | "NYA" | 83 | "MIN" | 82.5 |
4 | "TOR" | 78 | "BOS" | 77 |
5 | "TEX" | 76 | "TOR" | 77 |
6 | "BOS" | 76 | "TEX" | 75.5 |
7 | "OAK" | 73 | "OAK" | 74 |
8 | "CHA" | 73 | "CHA" | 74 |
9 | "DET" | 70 | "ANA" | 70.5 |
10 | "ANA" | 69 | "CLE" | 68.5 |
11 | "CLE" | 64 | "DET" | 67 |
12 | "BAL" | 59 | "KCA" | 62.5 |
13 | "KCA" | 59 | "BAL" | 61 |
14 | "SEA" | 52 | "SEA" | 50 |
2011 Season
National League
Rank | Team (normal) | Wins | Team (modified) | Modified wins |
1 | "PHI" | 93 | "MIL" | 90 |
2 | "MIL" | 90 | "PHI" | 88 |
3 | "ARI" | 84 | "SLN" | 86.5 |
4 | "SLN" | 82 | "SFN" | 80.5 |
5 | "ATL" | 79 | "ATL" | 78 |
6 | "SFN" | 76 | "ARI" | 76.5 |
7 | "LAN" | 75 | "CIN" | 74 |
8 | "CIN" | 73 | "WAS" | 71 |
9 | "WAS" | 72 | "LAN" | 70.5 |
10 | "NYN" | 68 | "COL" | 68.5 |
11 | "CHN" | 66 | "NYN" | 68.5 |
12 | "SDN" | 65 | "SDN" | 67 |
13 | "COL" | 65 | "CHN" | 66 |
14 | "FLO" | 64 | "PIT" | 64 |
15 | "PIT" | 64 | "FLO" | 62 |
16 | "HOU" | 52 | "HOU" | 57 |
American League
Rank | Team (normal) | Wins | Team (modified) | Modified wins |
1 | "DET" | 88 | "TEX" | 86 |
2 | "TEX" | 87 | "NYA" | 85 |
3 | "NYA" | 84 | "DET" | 84 |
4 | "BOS" | 80 | "BOS" | 82 |
5 | "TBA" | 79 | "TBA" | 75 |
6 | "ANA" | 73 | "TOR" | 74.5 |
7 | "TOR" | 73 | "CHA" | 71.5 |
8 | "CLE" | 69 | "ANA" | 70.5 |
9 | "CHA" | 68 | "KCA" | 68.5 |
10 | "OAK" | 66 | "OAK" | 67 |
11 | "KCA" | 66 | "CLE" | 66.5 |
12 | "BAL" | 62 | "BAL" | 61 |
13 | "SEA" | 58 | "SEA" | 58.5 |
14 | "MIN" | 55 | "MIN" | 58 |
I should check that the score distribution in the real world agrees with other measurements, just to be safe.
UPDATE: But if there's no error, then something is different about those high-scoring innings.
I really need to be more careful. The maximum of a set of samples has a different distribution than the samples themselves. If you throw a die once you'll get a 6 about 1 time in 6, but if you throw it 3 times and keep the highest value you'll wind up with a 6 about 42% of the time (1 - (5/6)^3 = 91/216). So I have to show that the real-world high-scoring inning has a different distribution than what you'd get by picking the best of 9 tries from the reduced distribution.
The following plot shows, in the upper right, the average distribution of runs per inning with the high-scoring inning excluded. In the lower right is an estimate of the distribution of the highest-scoring inning out of 9, assuming the upper-right distribution is correct. Its average is about 0.7, far less than the 2.3 average for the real-world high-scoring inning (lower right).
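The best-of-n distribution follows from the fact that the maximum of n independent draws is at most k only if every draw is, so P(max ≤ k) = F(k)^n, where F is the single-draw cumulative distribution. A sketch, assuming innings are independent and identically distributed (the same assumption behind "best of 9 tries"); exact fractions reproduce the die example, and a list of floats such as a measured runs-per-inning histogram works the same way.

```python
from fractions import Fraction


def max_distribution(pmf: list[Fraction], n: int) -> list[Fraction]:
    """Distribution of the largest of n independent draws.

    pmf[k] is the probability of value k.
    """
    out = []
    cdf = Fraction(0)
    prev = Fraction(0)
    for p in pmf:
        cdf += p
        cur = cdf ** n          # P(max <= k)
        out.append(cur - prev)  # P(max == k)
        prev = cur
    return out


def expected_value(pmf: list[Fraction]) -> Fraction:
    """Mean of a distribution over the values 0, 1, 2, ..."""
    return sum(k * p for k, p in enumerate(pmf))


# A fair die: value 0 is impossible, values 1..6 each have probability 1/6.
die = [Fraction(0)] + [Fraction(1, 6)] * 6
best_of_three = max_distribution(die, 3)
```

Here `best_of_three[6]` is 91/216, about 42%. Applying `expected_value(max_distribution(pmf, 9))` to a per-inning distribution gives the kind of best-of-9 average compared against the real-world high-scoring inning.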
There is something different about the high-scoring innings.