Lessons from Moneyball

[If you haven’t read/seen it, consider this a low-level SPOILER ALERT for the entire post. Sorry for the length, I didn’t have time to be more concise.]

The movie was very good. It is, of course, based on the excellent book by Michael Lewis, but somehow seeing it in the new format allowed me to simplify some of the lessons from the story. Some of them are similar to elements of pragmatist philosophy. A more review-like piece may come, but I couldn’t shake these four thoughts as I watched:

1. Re-define the problem.

There is a somewhat tense but entertaining scene early on in which Billy Beane (Brad Pitt) is discussing with his team of scouts how to deal with the loss of Jason Giambi, Johnny Damon, and Jason Isringhausen. When Beane asks the head scout what the problem is, the scout gives the straightforward and predictable answer that they need to replace those three players. Wrong. Another scout takes a somewhat more abstract approach and says they need to replace a certain number of hits, runs, etc. Maybe, but how can Oakland do so with their meager budget? Beane, with the help of Peter Brand (Jonah Hill), says that they can’t make up for Giambi et al. on a one-to-one basis. They need to recreate the combined effect of these three players by getting three new players whose average on-base percentage matches that of the three they lost. By redefining the problem, they come up with a new roadmap to success, one that is achievable given their constraints.
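To make the redefined problem concrete, here is a minimal sketch of the arithmetic. The player labels and on-base percentages are made up for illustration and are not the real figures from the book or the movie. A plain average also ignores how often each player comes to the plate, which a more careful version would account for.

```python
from statistics import mean

# Hypothetical on-base percentages, for illustration only (not real player data).
departing = {"player_a": 0.450, "player_b": 0.320, "player_c": 0.310}
candidates = {"player_x": 0.370, "player_y": 0.360, "player_z": 0.350}

# The redefined problem: match the group's average, not each individual player.
target_obp = mean(departing.values())
replacement_obp = mean(candidates.values())


def meets_target(replacement, target, tolerance=0.001):
    """Return True if the replacement average is within tolerance of the target or better."""
    return replacement >= target - tolerance


print(f"target {target_obp:.3f}, replacements {replacement_obp:.3f}")
print("good enough:", meets_target(replacement_obp, target_obp))
```

Notice that no single candidate has to match the best departing player. Only the group average has to hold up.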

2. Combine information in new ways.

On-base percentage was, according to Lewis, an under-appreciated statistic in the early 2000s and before. The information required to calculate it was certainly available, but because it was not a common summary statistic it was ignored by the people whose decisions matter. Bill James had of course argued for its importance in the 1970s, but he was outside the traditional realm of baseball thinking and thus largely ignored (which is of course a theme of the book: traditional versus information-driven practices).

The disdain for James (and, by association, Beane and Brand) and his dismissal as a “statistics guy” completely misses the point of statistics. This is partly the fault of statisticians and analysts themselves, who often accept the portrayal of their work as mystical since it gives them a sense of prestige or self-importance. In reality the work of taking data and summarizing it into more useful forms (which I take to be the essence of statistics, whether it’s used to create infographics, stock reports, or academic papers) can be done while maintaining transparency and correspondence to reality. A batting average is never seen in reality: it takes multiple attempts and must be observed over time. On-base percentage counts every way a player reaches base, walks included, so it sits a step closer to scoring than batting average does. By taking a measurement of input that is one step closer to the output they care about (runs, which lead to wins), Beane and Brand got a better indicator of a player’s ability to help the team win. This is statistics at its finest.
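For readers who want to see the difference between the two statistics, here is a small sketch. It uses the standard definitions (batting average is hits divided by at-bats; on-base percentage adds walks and hit-by-pitches to the numerator and walks, hit-by-pitches, and sacrifice flies to the denominator). The two stat lines are invented for illustration and do not belong to any real player.

```python
def batting_average(hits, at_bats):
    """Hits divided by at-bats."""
    return hits / at_bats


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """Times on base divided by the plate appearances that count toward OBP."""
    times_on_base = hits + walks + hit_by_pitch
    opportunities = at_bats + walks + hit_by_pitch + sacrifice_flies
    return times_on_base / opportunities


# Hypothetical season lines, for illustration only.
free_swinger = dict(hits=150, walks=20, hit_by_pitch=0, at_bats=500, sacrifice_flies=0)
patient_hitter = dict(hits=135, walks=90, hit_by_pitch=5, at_bats=500, sacrifice_flies=5)

for name, line in [("free swinger", free_swinger), ("patient hitter", patient_hitter)]:
    avg = batting_average(line["hits"], line["at_bats"])
    obp = on_base_percentage(**line)
    print(f"{name}: AVG {avg:.3f}, OBP {obp:.3f}")
```

In this made-up comparison the free swinger has the better batting average, while the patient hitter reaches base noticeably more often. That is the kind of player a scout looking only at batting average would undervalue.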

3. Be clear in your thinking.

Let’s be honest, the part about statistics that scares people is the numbers. This is true even (especially?) for academics, whose research can sometimes benefit from quantification. There are two dimensions to quantification. One is turning real phenomena into numbers. We call this measurement. Peter Brand did this by breaking up the baseball field into a grid and indicating where the ball actually went rather than simply counting “hit,” “run,” or “ground rule double”–again, a better measurement allowed for more precise quantification of reality.

The second dimension of quantification is that it allows your thinking to be falsifiable. Your equation either works or it doesn’t. While you can argue about whether a given equation or mathematical process was appropriate to apply in a given situation, you cannot argue whether hits divided by at-bats equals batting average or not. A statistical model is simply a combination of multiple statistics into a form that gives us some sort of indicator to predict the phenomenon we care about, putting the first two points above together. But what makes it useful is that it can be tested. If your model doesn’t work under certain circumstances, you need to be clear about this rather than just hand-waving and saying “well of course it doesn’t predict the number of wins in a season with a lot of rainy days.” If some factor that matters is not included in your model, be clear about it or you will be ignored or mocked.
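One way to picture what “it can be tested” means: write down the predictions ahead of time, compare them to what happened, and list the misses openly. The sketch below assumes a hypothetical model that has already produced win predictions. The season labels, numbers, and the five-win tolerance are all invented for illustration.

```python
def failing_cases(predictions, observations, tolerance):
    """Return the labels where the prediction misses the observation by more than tolerance."""
    return [
        label
        for label, predicted in predictions.items()
        if abs(predicted - observations[label]) > tolerance
    ]


# Hypothetical predicted and observed win totals, for illustration only.
predicted_wins = {"season_1": 90, "season_2": 85, "season_3": 95}
observed_wins = {"season_1": 92, "season_2": 84, "season_3": 80}

misses = failing_cases(predicted_wins, observed_wins, tolerance=5)
print("seasons the model got wrong:", misses)
```

The useful part is the list of misses. Those are the circumstances you need to be upfront about, or go back and account for in the model.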

4. Know how to explain your views.

Another of my favorite sets of scenes in the movie is when Beane and Brand have to explain to players why they should take more walks. Sure, it doesn’t excite the fans, but it gets them closer to something fans care about more: winning. They don’t sit ballplayers down and explain to them how the model predicts that on-base percentage corresponds more closely with runs and wins than batting average does. Despite the benefits of clear thinking, we also have to know when and how to use less precise wording in order to convey meaning to end-users. This is the second lesson above put into action.

Perhaps the most useful lesson, though, was that you also need to know when not to explain yourself. When firing a player, Beane tells Brand to just give it to him straight. In a way this is the same lesson, about being respectful to the person you’re informing, but it is applied differently. This is another way of explaining the lesson that statistics is a rhetorical practice.

Whether or not you care about any of these lessons, Moneyball is still a great movie and well worth your time. If you have any other takeaways from it, feel free to leave them in the comments.