Poor naked wretches, wheresoe’er you are,
That bide the pelting of this pitiless storm,
How shall your houseless heads and unfed sides,
Your loop’d and window’d raggedness, defend you
From seasons such as these?
—The Tragedy of King Lear Act III, Scene 4
“Whenever people talk to me about the weather,” the Irish writer Oscar Wilde once remarked, “I always feel quite certain that they mean something else.” As it happens, this year’s British Open has been delayed by high winds and will not finish the regulation 72 holes until Monday at the earliest. Which raises a question: why does the Open need to finish all 72 holes? The answer concerns something called “Simpson’s Paradox,” and it also demonstrates just how talk about the weather at the British Open is in fact talk about something else: namely, the 2016 American presidential election.
To see how, it’s first necessary to see the difference between the British Open and other professional golf tournaments, which are perfectly fine with shortening themselves. Take, for instance, the 2005 Northern Trust Open in Los Angeles: Adam Scott won in a playoff against Chad Campbell after the tournament was shortened to 36 holes due to weather. In 2013, the Tournament of Champions at Kapalua in Hawaii was “first cut to 54 holes because of unplayable conditions over the first two days,” according to Reuters, and was under threat of “being further trimmed to 36 holes.” The same story also quoted tour officials as saying “the eventual champion would wind up with an ‘unofficial win’” were the tournament to be shortened to 36 holes. (As things shook out, the field did complete 54 holes, and so Dustin Johnson’s win officially counted.) In a standard PGA Tour event, then, the “magic number” for an “official” tournament is 54 holes. But if so, then why does the Open need 72?
To answer that, let’s take a closer look at the standard professional golf tournament. Most such tournaments are conducted according to what the Rules of Golf calls “stroke play”: four rounds of golf, or 72 holes, at the end of which the players who have made it that far add up their scores, that is, their number of strokes. It may seem to go without saying that the player with the lowest score wins. But it does need to be said, because that isn’t the only option.
Many amateur tournaments after all, such as the United States Amateur, use the rules format known as “match play.” Under this format, the winner of the contest is not necessarily the player who shoots the lowest overall score, as in stroke play. Instead, as John Van der Borght has put the matter on the website of the United States Golf Association, in match play the “winner is the player who wins the most holes.” It’s a seemingly minor difference—but in fact it creates such a difference that match play is virtually a different sport than stroke play.
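To make the difference concrete, here is a minimal sketch in Python. The nine-hole scores are invented purely for illustration, and the match play function is a simplification: it counts every hole, whereas a real match ends as soon as one player leads by more holes than remain.

```python
# Hypothetical nine-hole scores, invented purely for illustration.
player_a = [4, 4, 3, 5, 4, 4, 3, 4, 4]  # steady golf, no disasters
player_b = [3, 3, 2, 4, 9, 8, 7, 3, 3]  # brilliant, with three blow-up holes


def stroke_play_winner(a, b):
    """Stroke play: the lower total number of strokes wins."""
    total_a, total_b = sum(a), sum(b)
    if total_a == total_b:
        return "tie"
    return "A" if total_a < total_b else "B"


def match_play_winner(a, b):
    """Match play (simplified): each hole is its own contest,
    and whoever wins more holes wins the match."""
    holes_a = sum(1 for x, y in zip(a, b) if x < y)
    holes_b = sum(1 for x, y in zip(a, b) if y < x)
    if holes_a == holes_b:
        return "tie"
    return "A" if holes_a > holes_b else "B"


print("Totals:", sum(player_a), "vs", sum(player_b))
print("Stroke play winner:", stroke_play_winner(player_a, player_b))
print("Match play winner:", match_play_winner(player_a, player_b))
```

With these made-up cards, player A takes seven fewer strokes and wins at stroke play, while player B wins six holes to three and takes the match. A triple bogey costs exactly one hole in match play, no matter how ugly it is.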
Consider, for instance, the Accenture Match Play tournament—the only tournament on the PGA Tour to be held under match play rules. The 2014 edition (held at the Dove Mountain course near Tucson, Arizona), had some results that demonstrate just how different match play is than stroke play, as Doug Ferguson of the Associated Press observed. “Pablo Larrazabal shot a 68 and was on his way back to Spain,” Ferguson noted about the first day’s results, while “Ernie Els shot 75 and has a tee time at Dove Mountain on Thursday.” In other words, Larrazabal lost his match and Els won his, even though Larrazabal was arguably the better player at this tournament—at least, if you consider the “better player” to be the one who puts his ball in the hole most efficiently.
Such a result might seem unfair. But why? It could be argued that while shooting a lower number might be what stroke play golf is, that isn’t what match play golf is. In other words, Larrazabal obviously wasn’t better at whatever it was that this tournament measured: if Larrazabal couldn’t beat his opponent, while Els could, then clearly Els deserved to continue to play while Larrazabal did not. While you might feel that, somehow or other, Larrazabal got jobbed, that’s merely a sentimental reaction to what ought to be a hardhearted calculation: maybe it’s true that under stroke play rules Larrazabal would have won, but those weren’t the rules of the contest at Dove Mountain. In other words, you could say that golfing ability was, in a sense, socially constructed: what matters isn’t some “ahistorical” ability to golf, but instead how it is measured.
Here’s the $64,000 question a guy named Bill James might ask in response to such an argument, however (couched in terms of baseball players): “If you were trying to win a pennant, how badly would you want this guy?” In other words, based on the evidence presented, what would you conclude about the respective golf ability of Els and Larrazabal? Wouldn’t you conclude that Larrazabal is better at the task of putting his ball in the hole, and that the various rule systems that could be constructed around that task are merely different ways of measuring that ability—an ability that pre-existed those systems of measurement?
“We’re not trying to embarrass the best players in the game,” said Sandy Tatum at the 1974 U.S. Open, the so-called Massacre at Winged Foot: “We’re trying to identify them.” Scoring systems, in short, should be aimed at revealing, not concealing, ability. I choose Bill James to make the point not just because the question he asks is so pithy, but because he invented an equation that is designed to discover underlying ability: an equation called the Pythagorean Expectation. That equation, in turn, demonstrates why match play and stroke play are not simply different, yet equally valid, measures of playing ability. In so doing, it also demonstrates why the Open Championship requires that all 72 holes be played.
So named because it resembles so closely that formula, fundamental to mathematics, called the Pythagorean Theorem, what the Pythagorean Expectation says is that the ratio of a team’s (or player’s) points scored to that team’s (or player’s) points allowed is a better predictor of future success than the team’s (or player’s) ratio of wins to losses. (James used “runs” because he was dealing with baseball.) More or less it works: as Graham MacAree puts it on the website FanGraphs, using James’ formula makes it “relatively easy to predict a team’s win-loss record”—even in sports other than baseball. Yet why is this so—how can a single formula predict future success at any sport? It might be thought, after all, that different sports exercise different muscles, or use different strategies: how can one formula describe underlying value in many different venues—and thus, incidentally, demonstrate that ability can be differentiated from the tools we use to measure it?
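For the curious, James’ formula in its original form can be sketched in a few lines of Python. The function name and the sample run totals are my own inventions for illustration; the exponent of 2 is the one James started with, and other values are sometimes fitted for other sports.

```python
def pythagorean_expectation(points_for, points_against, exponent=2.0):
    """Expected winning percentage from points scored and points allowed.

    James' original baseball version uses runs and an exponent of 2:
        win% = RS**2 / (RS**2 + RA**2)
    """
    if points_for < 0 or points_against < 0:
        raise ValueError("point totals cannot be negative")
    if points_for == 0 and points_against == 0:
        raise ValueError("at least one point total must be positive")
    scored = points_for ** exponent
    allowed = points_against ** exponent
    return scored / (scored + allowed)


# Hypothetical season: a team scores 800 runs and allows 700.
expected = pythagorean_expectation(800, 700)
print(f"Expected winning percentage: {expected:.3f}")
```

A team that scores exactly as much as it allows comes out at .500, as it should. The hypothetical team above comes out at roughly .566 without a single win or loss being consulted.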
The answer to these questions is that adding up the total points scored, rather than the total games won, gives us a better notion of the relative value of a player or a team because it avoids something called “Simpson’s Paradox,” which is what happens when, according to Wikipedia, it “appears that two sets of data separately support a certain hypothesis, but, when considered together, they support the opposite hypothesis.” Consider what happens, for example, when we set Ernie Els’ 75 beside Pablo Larrazabal’s 68: if we judge them by who won more holes against his opponent, Els comes out the winner, but if we just compare raw scores, Larrazabal does. Simpson’s Paradoxes appear, in short, when we draw the boundaries around the raw data differently: the same score looks different depending on what lens is used to view it. That answer might seem to validate those who think that underlying ability doesn’t exist, but only the means used to measure it. But what Simpson’s Paradox shows isn’t that all boundaries around the data are equal. In fact, it shows just the opposite.
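Since the paradox is easier to see than to describe, here is a small numerical sketch, again in Python and in baseball terms. The two hitters and their hit and at-bat counts are illustrative figures chosen to make the arithmetic work, not anyone’s official record.

```python
from fractions import Fraction

# (hits, at_bats) for two seasons; illustrative numbers only.
seasons = {
    "A": [(12, 48), (183, 582)],
    "B": [(104, 411), (45, 140)],
}


def per_season(player):
    """Batting average in each season, as exact fractions."""
    return [Fraction(hits, at_bats) for hits, at_bats in seasons[player]]


def combined(player):
    """Batting average over both seasons pooled together."""
    hits = sum(h for h, _ in seasons[player])
    at_bats = sum(ab for _, ab in seasons[player])
    return Fraction(hits, at_bats)


for player in ("A", "B"):
    split = ", ".join(f"{float(avg):.3f}" for avg in per_season(player))
    print(f"{player}: by season {split}; combined {float(combined(player)):.3f}")
```

Hitter B has the higher average in each season taken separately, yet hitter A has the higher average once the seasons are pooled. The reversal comes from the uneven at-bat counts: where the boundary is drawn decides who looks better.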
What Simpson’s Paradox shows, in other words, is that drawing boundaries around the data can produce illusions of value if that drawing isn’t done carefully—and most specifically, if the boundaries don’t capture all of the data. That’s why the response golf fans might have to the assertion that Pablo Larrazabal is better than Ernie Els proves, rather than invalidates, the argument so far: people highly familiar with golf might respond, “well, you haven’t considered the total picture—Els, for instance, has won two U.S. Opens, widely considered to be the hardest tournament in the world, and Larrazabal hasn’t won any.” But then consider that what you have done just demonstrates the point made by Simpson’s Paradox: in order to say that Els is better, you have opened up the data set; you have redrawn the boundaries of the data in order to include more information. So what you would have conceded, were you to object to the characterization of Larrazabal as a better golfer than Els on the grounds that Els has a better overall record than Larrazabal, is that the way to determine the better golfer is to cast the net as wide as possible. You have demanded that the sample size be increased.
That then is why a tournament contested over only 36 holes isn’t considered an “official” PGA tournament, while 54 holes isn’t enough to crown the winner of a major tournament like the Open Championship (which is what the British Open is called when it’s at home). It’s all right if a run-of-the-mill tournament be cut to 54 holes, or even 36 (though in that case we don’t want the win to be official). But in the case of a major championship, we want there to be no misunderstandings, no “fluky” situations like the one in which Els wins and Larrazabal doesn’t. The way to do that, we understand, is to maximize chances, to make the data set as wide as possible: in sum, to make a large sample size. We all, I think, understand this intuitively: it’s why baseball has a World Series rather than a World Championship Game. So that is why, in a major championship, it doesn’t matter how long it takes—all the players qualified are going to play all 72 holes.
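The intuition about sample size can be put in rough numbers. The sketch below is a back-of-the-envelope model, not a description of real scoring: it assumes each hole is independent, that the better player gains a small fixed edge per hole, and that the gap in totals is approximately normal. The edge and spread values are invented for illustration.

```python
import math


def prob_better_player_wins(holes, edge_per_hole=0.02, sd_per_hole=0.7):
    """Chance the better player posts the lower total over `holes` holes.

    Assumptions (all hypothetical): independent holes, an average edge of
    `edge_per_hole` strokes per hole, each player's per-hole score spread
    is `sd_per_hole`, and the gap in totals is approximately normal.
    """
    mean_gap = holes * edge_per_hole
    # Variances of the two players add, hence the factor of 2.
    sd_gap = math.sqrt(2 * holes) * sd_per_hole
    z = mean_gap / sd_gap
    # Standard normal CDF written with the error function.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


for holes in (18, 36, 54, 72):
    print(f"{holes} holes: {prob_better_player_wins(holes):.3f}")
```

Whatever values are plugged in, the better player’s chance rises with the number of holes, because the expected gap grows in proportion to the holes played while the noise grows only with their square root.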
Here I will, as they say in both golf and baseball, turn for home. What all of this about Simpson’s Paradoxes means, at the end of the day, is that a tournament like the Open Championship is important, as opposed to, say, an American presidential election. In a presidential election, as everyone knows, what matters isn’t the total number of votes a candidate wins, but how many electoral votes, which are awarded state by state. In that sense, American presidential elections are conducted according to what, in golf, would be considered match play instead of stroke play. Now, as Bill James might acknowledge, that raises the question: does that process result in better candidates being elected?
As James might ask in response: would you like to bet?