Epstein on Athletes

As a follow-up to the most recent series of posts, you may enjoy this TED talk by David Epstein. Epstein is the author of The Sports Gene and offered the claim that kicked off those earlier posts–that he could accurately guess an Olympian’s sport knowing only her height and weight.

The talk offers some additional context for Epstein’s claim. Specifically, Epstein describes how the average heights and weights of athletes in a set of 24 sports have diverged over time:

In the early half of the 20th century, physical education instructors and coaches had the idea that the average body type was the best for all athletic endeavors: medium height, medium weight, no matter the sport. And this showed in athletes’ bodies. In the 1920s, the average elite high-jumper and average elite shot-putter were the same exact size. But as that idea started to fade away, as sports scientists and coaches realized that rather than the average body type, you want highly specialized bodies that fit into certain athletic niches, a form of artificial selection took place, a self-sorting for bodies that fit certain sports, and athletes’ bodies became more different from one another. Today, rather than the same size as the average elite high jumper, the average elite shot-putter is two and a half inches taller and 130 pounds heavier. And this happened throughout the sports world.

Here’s the chart used to support that point, with data points from the early twentieth century in yellow and more recent data points in blue:

Average height and mass for athletes in 24 sports in the early twentieth century (yellow) and today (blue)

This suggests that it has become easier over time to guess individuals’ sports based on physical characteristics, but, as we saw in the earlier posts, it is still difficult to do with a high degree of accuracy.

Another interesting change highlighted in the talk is the role of technology:

In 1936, Jesse Owens held the world record in the 100 meters. Had Jesse Owens been racing last year in the world championships of the 100 meters, when Jamaican sprinter Usain Bolt finished, Owens would have still had 14 feet to go…. [C]onsider that Usain Bolt started by propelling himself out of blocks down a specially fabricated carpet designed to allow him to travel as fast as humanly possible. Jesse Owens, on the other hand, ran on cinders, the ash from burnt wood, and that soft surface stole far more energy from his legs as he ran. Rather than blocks, Jesse Owens had a gardening trowel that he had to use to dig holes in the cinders to start from. Biomechanical analysis of the speed of Owens’ joints shows that had he been running on the same surface as Bolt, he wouldn’t have been 14 feet behind, he would have been within one stride.

The third change Epstein discusses is more dubious: a “changing mindset” among athletes giving them a “can do” attitude. In particular, he mentions Roger Bannister’s four-minute mile as a major psychological breakthrough in sport. As this interview makes clear, however, Bannister himself attributes the lack of progress in the fastest mile time between 1945 and 1954 to the destruction, rationing, and general disruption of WWII. It’s also possible that a four-minute mile was run as early as 1770. I wonder what Epstein’s claims would look like on that time scale?

Classifying Olympic Athletes by Sport and Event (Part 3)

This is the last post in a three-part series. Part one, describing the data, is here. Part two gives an overview of the machine learning methods and can be found here. This post presents the results.

To present the results I will use classification matrices, transformed into heatmaps. The rows indicate Olympians’ actual sports, and the columns are their predicted sports. A dark value on the diagonal indicates accurate predictions (the athlete is predicted to be in their actual sport) while light values on the diagonal suggest that Olympians in a certain sport are misclassified by the algorithms used. In each case results for the training set are in the left column and results for the test set are on the right. For a higher resolution version, see this pdf.
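As a minimal sketch of how such a classification matrix can be built, the snippet below counts (actual, predicted) pairs and then converts each row to proportions. The labels are hypothetical toy values, not the Olympic data, and the row normalization is an assumption: the post does not say whether the heatmaps were normalized.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Rows are actual sports, columns are predicted sports."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def row_normalize(matrix):
    """Turn counts into per-row proportions so large sports do not dominate the shading."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


# Hypothetical toy labels for illustration only.
labels = ["basketball", "swimming", "weightlifting"]
actual = ["basketball", "basketball", "swimming", "swimming", "swimming", "weightlifting"]
predicted = ["basketball", "swimming", "swimming", "swimming", "swimming", "swimming"]

matrix = confusion_matrix(actual, predicted, labels)
proportions = row_normalize(matrix)
```

In this toy case the swimming column fills up while the weightlifting diagonal stays empty, which is the pattern a light diagonal cell in the heatmaps would indicate.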

Classifying Athletes by Sport

Classification matrices for athletes by sport: training set (left) and test set (right)

For most rows, swimming is the most common predicted sport. That’s partly because there are so many swimmers in the data and partly because swimmers have a fairly generic body type as measured by height and weight (see the first post). With more features, such as arm length and torso length, we could better distinguish between swimmers and non-swimmers.

Three out of the four methods perform similarly. The real oddball here is random forest: it classifies the training data very well, but does only about as well on the test data as the other methods. This gap between training and test performance suggests that random forest is overfitting the data, and won’t give us great predictions on new data.
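The overfitting diagnosis comes down to comparing accuracy on the training set with accuracy on the test set. The sketch below illustrates that check with a deliberately overfit stand-in, a hypothetical `Memorizer` class that simply recalls training rows; it is not a random forest and not the code used for these posts, and the data are made up.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


class Memorizer:
    """Hypothetical overfit model: recalls training rows exactly, otherwise guesses the majority class."""

    def fit(self, X, y):
        self.table = dict(zip(X, y))
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.table.get(x, self.default) for x in X]


# Made-up (height in cm, weight in kg) rows for illustration only.
X_train = [(180, 75), (182, 77), (200, 95), (160, 70)]
y_train = ["swimming", "swimming", "basketball", "weightlifting"]
X_test = [(181, 76), (201, 96), (183, 78)]
y_test = ["swimming", "basketball", "swimming"]

model = Memorizer().fit(X_train, y_train)
train_acc = accuracy(y_train, model.predict(X_train))
test_acc = accuracy(y_test, model.predict(X_test))
gap = train_acc - test_acc  # a large positive gap is the overfitting signal
```

A model that scores far higher on the rows it was fit to than on held-out rows has learned those particular rows rather than a rule that generalizes.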

Classifying Athletes by Event

Classification matrices for athletes by event: training set (left) and test set (right)

The results here are similar to the ones above: all four methods do about equally well for the test data, while random forest overfits the training data. The two squares in each figure correspond to men’s and women’s events. This is a good sanity check–at least our methods aren’t misclassifying men into women’s events or vice versa (recall that sex is one of the four features used for classification).

Accuracy

Visualizations are more helpful than looking at a large table of predicted probabilities, but what are the actual numbers? How accurate are the predictions from these methods? The table below presents accuracy for both tasks, for training and test sets.

Accuracy of each method for both tasks, on the training and test sets

The various methods classify Olympians into sports and events with about 25-30 percent accuracy. This isn’t great performance. Keep in mind that we only had four features to go on, though–with additional data about the participants we could probably do better.
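One way to put an accuracy figure in context is to compare it with a naive baseline that always guesses the most common sport in the training data. The sketch below shows that calculation on made-up labels; the function name `majority_baseline_accuracy` is hypothetical, and no baseline was reported in the original analysis.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy obtained by always predicting the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(label == most_common_label for label in test_labels)
    return most_common_label, hits / len(test_labels)


# Made-up labels for illustration only, not the Olympic data.
train_labels = ["swimming"] * 5 + ["rowing"] * 3 + ["judo"] * 2
test_labels = ["swimming", "swimming", "rowing", "judo"]

baseline_label, baseline_acc = majority_baseline_accuracy(train_labels, test_labels)
```

A classifier is only adding information from height, weight, age, and sex to the extent that it beats this kind of baseline on the test set.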

After seeing these results I am deeply skeptical that David Epstein could classify Olympians by event using only their height and weight. Giving him the benefit of the doubt, he probably had in mind the kind of sports and events that we saw were easy to classify: basketball, weightlifting, and high jump, for example. These are the types of competitions that The Sports Gene focuses on. As we have seen, though, there is a wide range of sporting events and a corresponding diversity of body types. Being naturally tall or strong doesn’t hurt, but it also doesn’t automatically qualify you for the Olympics. Training and hard work play an important role, and Olympic athletes exhibit a wide range of physical characteristics.