Wednesday, April 02, 2008

Joe DiMaggio's Hitting Streak Revisited

A couple of days ago in the New York Times, there was a funny article written by two Cornell math types about Joe DiMaggio's amazing hitting streak. You can find it here. Based on computer simulations, the authors find that Joltin' Joe's hitting streak is actually no big deal. According to them, 50 game hitting streaks should be a fairly common thing.

Um. No.

The problem with Arbesman and Strogatz's analysis is that their computer model isn't grounded in reality. Models need to be calibrated to observations. An uncalibrated model is about as useful at imparting information as random typing on a keyboard. It's gibberish.

Arbesman and Strogatz have tried to distill all of Major League Baseball into the equivalent of 10,000 seasons of weighted coin tosses. This is silly business. It is true that if you look at baseball this way, long hitting streaks should be far more common than they are. For example, take a lifetime .333 hitter. He has about 3 official at bats a game. The likelihood is almost 100 percent* that he should get a hit on average once a game. Add in random fluctuations and there should be in that batter's career many instances of 20 and 30 game hitting streaks, maybe even longer.
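For anyone who wants to play with this kind of weighted coin model, here is a minimal Python sketch. It is not Arbesman and Strogatz's code. The function name, the 15 seasons of 150 games, and the random seed are my own assumptions for illustration, and it treats every at bat as an independent toss with the same odds, which is exactly the simplification in question.

```python
import random


def longest_hit_streak(num_games, batting_avg=0.333, at_bats_per_game=3, rng=None):
    """Longest run of consecutive games with at least one hit, coin-toss model."""
    rng = rng or random.Random()
    # Chance of at least one hit in a game, assuming independent at bats.
    p_game = 1.0 - (1.0 - batting_avg) ** at_bats_per_game
    longest = current = 0
    for _ in range(num_games):
        if rng.random() < p_game:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


if __name__ == "__main__":
    rng = random.Random(2008)  # arbitrary seed so runs are repeatable
    # Hypothetical career length: 15 seasons of 150 games (an assumption).
    careers = sorted(longest_hit_streak(15 * 150, rng=rng) for _ in range(1000))
    print("median longest streak over 1000 simulated careers:", careers[len(careers) // 2])
```

Changing batting_avg or at_bats_per_game shows how sensitive the longest streak is to the inputs, which is one reason a model like this needs to be checked against the record.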

But there aren't. Thirty game hitting streaks are rare in baseball. They occur about once every four years. And that tells you something right off the bat (sorry about this pun, but I couldn't pass it up). Looking at hitting streaks by using a weighted coin model is a dumb idea.

Instead of running computer models, you can look at the actual data. And there is a lot of it. Baseball has been going on for over 130 years. There have been hundreds of thousands of games played, millions of at bats. When you look at the data, you can get a fairly decent idea of how amazing Joe DiMaggio's hitting streak is.

Above I've made a rough calculation based on observations. On the x axis, hitting streaks are grouped by length. On the y axis is their recurrence interval (on average, how many years pass between hitting streaks of that length). The data for hitting streaks of 30 to 45 games follow a fairly straight line. If you extrapolate that line to the 56 game hitting streak of Joe DiMaggio, you find that on average once every 1300 years, a ballplayer in the Major Leagues should be able to achieve a hitting streak of this magnitude.
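The extrapolation step can be sketched in a few lines of Python. This is only an illustration of fitting a straight line and extending it, under the assumption that the recurrence interval is plotted on a logarithmic axis. The function names are made up, and the numbers in the example are PLACEHOLDER values chosen to be obviously artificial. They are not the streak data behind the graph.

```python
import math


def fit_log_linear(streak_lengths, recurrence_years):
    """Least-squares line through (streak length, log10 of recurrence interval)."""
    n = len(streak_lengths)
    ys = [math.log10(r) for r in recurrence_years]
    mean_x = sum(streak_lengths) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in streak_lengths)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(streak_lengths, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def extrapolate_years(slope, intercept, streak_length):
    """Recurrence interval in years predicted by the fitted line."""
    return 10 ** (intercept + slope * streak_length)


# PLACEHOLDER points only (not real baseball data): each step of 10 games
# multiplies the recurrence interval by 10.
lengths = [10, 20, 30, 40]
years = [1, 10, 100, 1000]

slope, intercept = fit_log_linear(lengths, years)
print(extrapolate_years(slope, intercept, 50))  # approximately 10000 for these placeholder points
```

To repeat the calculation for real, replace the placeholder lists with observed streak lengths and their recurrence intervals, then evaluate the line at 56 games.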

Joe DiMaggio's hitting streak is a one in a thousand year event. As that Les Brown song from the 1940s goes, "Joe, Joe DiMaggio we want you on our side." And for good reason.

*Note correction in comments section. It's 70 percent.

**Addendum. I couldn't resist extending the graph above down to the 20 game hitting streak level. It looks even better.

Methinks Arbesman and Strogatz need to apologize to the DiMaggio family for denigrating Joltin' Joe's achievements with a truly dopey analysis. Just an opinion, mind you. OK, back to work.

**What I hope to be the final Addendum: An email from Strogatz says the following, "To keep things simple for the lay audience, we presented the results of what we felt was the easiest model to understand. I guess we should have emphasized how simplified it is, and therefore how tentative (and you're right, unconvincing) the conclusions are." Thank you Prof. Strogatz. We are now on the same page.


Sam said...

For a .333 hitter with three at bats per game, the probability of not getting a hit in a game is (1-0.333)^3=0.297. The probability of getting a ten game hitting streak: (1-0.297)^10=3%. Pretty small. The probability of getting a 20 game hitting streak: (1-0.297)^20=0.09%: really small. Probability is very counterintuitive.
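The arithmetic above can be checked with a short Python sketch. The function names are made up for illustration, and the calculation assumes every at bat is independent with the same .333 chance of a hit. Note that it gives the chance of hitting safely in each game of one particular stretch of N games, not the chance that such a streak turns up somewhere in a career.

```python
def p_hit_in_game(batting_avg, at_bats):
    """Probability of at least one hit in a game, assuming independent at bats."""
    return 1.0 - (1.0 - batting_avg) ** at_bats


def p_streak(batting_avg, at_bats, games):
    """Probability of a hit in every one of a given run of consecutive games."""
    return p_hit_in_game(batting_avg, at_bats) ** games


p_game = p_hit_in_game(0.333, 3)
print(f"hit in a single game:  {p_game:.3f}")                     # about 0.70
print(f"10 straight games:     {p_streak(0.333, 3, 10):.2%}")     # about 3%
print(f"20 straight games:     {p_streak(0.333, 3, 20):.3%}")     # about 0.09%
```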

fortyquestions said...

Thanks for the correction. I make stupid mistakes, too. The odds are 70 percent, not almost 100. The model, though, produces long hitting streaks far more often than they occur in MLB. And these words:

"In other words, streaks of 56 games or longer are not at all an unusual occurrence. Forty-two percent of the simulated baseball histories have a streak of DiMaggio’s length or longer. You shouldn’t be too surprised that someone, at some time in the history of the game, accomplished what DiMaggio did."

are not only counterintuitive. They defy the observational data. They need a better model, one that honors the data.