Tuesday, July 17, 2007

How to fix ERA

In every baseball game you watch, pitchers are described by the "pitching triple crown" stats: W-L record, ERA, and strikeouts. Saves are also used when discussing a closer.

The problems with W-L record are fairly obvious: it is highly dependent on the number of runs your team scores. It is very difficult to win when your team scores only 2 runs, and impossible if they score 0.

Strikeouts are, of course, a useful tool for evaluating pitchers. They measure your ability to get batters out without the help of fielders. But strikeouts are most informative when combined with walks and innings pitched.
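To illustrate what combining strikeouts with walks and innings looks like, here is a minimal Python sketch of two common rate stats, strikeouts per nine innings (K/9) and strikeout-to-walk ratio (K/BB). The function name and the choice to pass innings as outs recorded (innings = outs / 3) are my own assumptions for the example.

```python
def strikeout_rates(strikeouts, walks, outs_recorded):
    """Return (K/9, K/BB) for a pitcher, with innings passed as outs recorded."""
    if outs_recorded <= 0:
        raise ValueError("need at least one out recorded")
    # 9 * K / innings, where innings = outs / 3
    k_per_9 = 27 * strikeouts / outs_recorded
    # With zero walks the ratio is undefined; report infinity instead of crashing
    k_per_bb = strikeouts / walks if walks else float("inf")
    return k_per_9, k_per_bb


# Hypothetical season line: 200 K, 50 BB, 200 innings (600 outs)
print(strikeout_rates(200, 50, 600))  # (9.0, 4.0)
```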

The third stat is ERA, which has been around for a long time and is highly flawed. That's what I want to focus on today. ERA is, of course, 9 * earned runs / innings pitched. The factor of 9 puts it on the scale of one nine-inning regulation game, although the concept is the same if you only consider earned runs per inning. In fact, in the early days of baseball, complete games were very common, so a starter's ERA would be very close to his earned runs per game.
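As a quick sanity check on the formula, here is a small Python sketch. Passing innings as outs recorded is my own assumption for the example (box scores write 6 1/3 innings as "6.1", which would break a naive division); with innings = outs / 3, 9 * ER / IP becomes 27 * ER / outs.

```python
def era(earned_runs, outs_recorded):
    """Earned run average: 9 * earned runs / innings pitched, innings = outs / 3."""
    if outs_recorded <= 0:
        raise ValueError("ERA is undefined before any outs are recorded")
    return 27 * earned_runs / outs_recorded


# 3 ER in 6 innings (18 outs)
print(era(3, 18))  # 4.5
# A complete game (27 outs): ERA equals the earned runs allowed in that game
print(era(2, 27))  # 2.0
```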

My problem comes in the definition of earned runs. Here's a hypothetical (and quite stupid) example. A pitcher gets the first 2 outs of an inning. The next batter reaches base on an error. The next 40 batters hit HRs. The pitcher is charged with 0 ER: the inning would have been over if the fielder had made the play, so every run after the error is unearned. Clearly he did not pitch well, but the order of events leads to 0 ER. If he had given up the 40 HRs first and then the batter reached on the error, he would be charged with 40 ER. See the top of the 4th for a more realistic example involving 3 HRs.
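To make the order-of-events problem concrete, here is a minimal Python sketch of a simplified version of today's rule. The event encoding ("OUT", "ERROR", "HR") and the function name current_rule_runs are hypothetical, and the model skips most real scoring cases: it only counts outs plus batters reached on error, treats every run after that count reaches 3 as unearned, and treats any run scored by a runner who reached on an error as unearned.

```python
def current_rule_runs(events):
    """Return (runs, earned_runs) for one half-inning under a simplified
    version of the current earned-run rule.

    Hypothetical events: "OUT", "ERROR" (batter reaches on an error), "HR".
    Runners are a queue: a new runner forces the others up one base, and a
    fourth runner forces the lead runner home.
    """
    outs = 0
    chances = 0   # outs + batters reached on error; the inning "should" end at 3
    runners = []  # one flag per runner on base: True if he reached on an error
    runs = earned = 0

    def score(reached_on_error):
        nonlocal runs, earned
        runs += 1
        # Earned only if the runner didn't reach on an error AND the inning
        # would not already have ended without the errors.
        if not reached_on_error and chances < 3:
            earned += 1

    for event in events:
        if outs == 3:
            break
        if event == "OUT":
            outs += 1
            chances += 1
        elif event == "ERROR":
            chances += 1
            runners.append(True)
            if len(runners) > 3:  # bases were loaded: lead runner forced home
                score(runners.pop(0))
        elif event == "HR":
            for reached_on_error in runners:  # everyone on base scores
                score(reached_on_error)
            runners.clear()
            score(False)  # the batter himself
        else:
            raise ValueError(f"unknown event: {event}")
    return runs, earned


inning = ["OUT", "OUT", "ERROR"] + ["HR"] * 40 + ["OUT"]
print(current_rule_runs(inning))     # (41, 0)
reordered = ["HR"] * 40 + ["ERROR", "OUT", "OUT", "OUT"]
print(current_rule_runs(reordered))  # (40, 40)
```

Both calls contain essentially the same bad pitching; only the order changes, and the earned run total swings from 0 to 40.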

In case you're curious, since 1957 (the baseball-reference game index only goes back that far), the record for most runs given up by a pitcher while being charged with 0 ER belongs to Andy Hawkins in 1989. The Yankees made 6 errors that day. Oddly enough, the following year, Hawkins pitched a no-hitter (8 innings, so not an official no-hitter) while giving up 4 runs, which is also the most runs allowed in a no-hitter over the same span. I remember watching that game, and the fielding was very bad. The Yankees didn't lead the league in errors either year, but when you clump them all into one game, bad things happen.

So, my proposed definition would make any runs scored by batters who reached on an error unearned. Runners advancing on errors would be left partly to the scorer's discretion. All other runs are earned, even after the total of outs plus batters reached on error is 3 or more. (Other finer details still exist, of course. If batter 1 reaches on an error and batter 2 reaches on a fielder's choice that retires batter 1, the pitcher is not responsible for batter 2 either.)
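Here is the same kind of hypothetical Python sketch for the proposed rule, including the fielder's choice detail. The event names, the function name proposed_rule_runs, and the simplification that a fielder's choice always retires the runner most recently put on base are assumptions for the example, not part of the proposal itself.

```python
def proposed_rule_runs(events):
    """Return (runs, earned_runs) for one half-inning under the proposed rule:
    a run is unearned only if the runner who scored reached on an error.

    Hypothetical events: "OUT", "ERROR" (batter reaches on an error),
    "FC" (batter reaches on a fielder's choice that retires the runner most
    recently put on base), and "HR".
    """
    outs = 0
    runners = []  # one flag per runner on base: True if his run would be unearned
    runs = earned = 0

    def score(unearned):
        nonlocal runs, earned
        runs += 1
        if not unearned:
            earned += 1  # no "should have been 3 outs" cutoff under this rule

    for event in events:
        if outs == 3:
            break
        if event == "OUT":
            outs += 1
        elif event == "ERROR":
            runners.append(True)
            if len(runners) > 3:  # bases were loaded: lead runner forced home
                score(runners.pop(0))
        elif event == "FC":
            if not runners:
                raise ValueError("a fielder's choice needs a runner on base")
            # The retired runner's entry is reused for the batter, so the batter
            # inherits his status: if that runner reached on an error, the
            # pitcher is not charged if the batter later scores.
            outs += 1
        elif event == "HR":
            for unearned in runners:  # everyone on base scores
                score(unearned)
            runners.clear()
            score(False)  # the batter himself
        else:
            raise ValueError(f"unknown event: {event}")
    return runs, earned


inning = ["OUT", "OUT", "ERROR"] + ["HR"] * 40 + ["OUT"]
print(proposed_rule_runs(inning))                                # (41, 40)
print(proposed_rule_runs(["ERROR", "FC", "HR", "OUT", "OUT"]))  # (2, 1)
```

On the 40-HR inning, only the runner who reached on the error scores an unearned run, so the pitcher is charged with the 40 runs he actually gave up.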

Will this make ERA the greatest stat since sliced bread? Not even close. But it would make it more useful. The big problem is that going back and recalculating earned runs would require play-by-play data, which is available for all recent games but can be difficult to find for earlier eras. And even with play-by-play data, automation would not be very easy. By the way, I know there are many better pitching stats than even my new ERA (should I call it NERA??), but I'm trying to propose a fix for mainstream use. Adding a new stat to mainstream use can be excruciatingly slow and painful.

Also, if I want to evaluate a pitcher's entire defensive ability (pitching + fielding), shouldn't an error by the pitcher himself have no effect on whether a run is earned? If we're interested only in pitching skills, then these should be treated just like any other error, but I think it's more informative to hold pitchers accountable for their own mistakes.
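A sketch of that variant, again in Python with hypothetical names: each run scored is tagged with how that runner reached base, using scorebook notation where "E1" is an error by the pitcher (position 1) and "E6" an error by the shortstop. It ignores the fielder's choice detail and simply counts runs by runners who reached on the pitcher's own error as earned when charge_own_errors is True.

```python
PITCHER = 1  # scorebook position number for the pitcher


def earned_runs(scoring_runners, charge_own_errors=True):
    """Count earned runs under the proposed rule.

    scoring_runners has one entry per run scored, describing how that runner
    reached base, e.g. "1B", "BB", "HR", or "E6" (error by the shortstop).
    """
    earned = 0
    for how_reached in scoring_runners:
        if how_reached.startswith("E"):
            fielder = int(how_reached[1:])
            # Errors by other fielders make the run unearned; the pitcher's own
            # error counts against him only if charge_own_errors is True.
            if charge_own_errors and fielder == PITCHER:
                earned += 1
        else:
            earned += 1
    return earned


runs = ["1B", "E1", "E6", "HR"]
print(earned_runs(runs))                           # 3
print(earned_runs(runs, charge_own_errors=False))  # 2
```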
