At Play In The Fields Of The Lord

Logo for 2015 US Amateur at Olympia Fields Country Club

Behold, I send you forth as sheep in the midst of wolves:
be ye therefore wise as serpents, and harmless as doves.
—Matthew 10:16

Now that the professional, Open tournaments are out of the way, the U.S. Amateur approaches. A tournament that has always been a symbol of wealth and discrimination—the Amateur was a tournament invented specifically to keep out the riff-raff of professional golfers—the site of this year’s edition might be considered particularly unfortunate considering that this year the tournament will fall just more than a year after the Michael Brown shooting in Ferguson, Missouri: Olympia Fields, in Chicago’s south suburbs, is a relatively wealthy enclave among a swath of exceedingly poor villages and towns very like the terrain of the St. Louis suburbs just a few hundred miles away. Yet there’s a deeper irony at work here that might be missed even by those who’d like to point out that similarity of setting: the format of the tournament, match-play, highlights precisely what the real message of the Brown shooting was. That real message, the one that is actually dangerous to power, wasn’t the one shouted by protestors—that American police departments are “racist.” The really dangerous message is the one echoed by the Amateur: a message that, read properly, tells us that our government’s structure is broken.

The later rounds of the U.S. Amateur are played under golf’s match play, rather than stroke play, rules—a difference that will seem arcane to those unfamiliar with the sport, but is a very significant one nevertheless. In stroke play, competitors play whatever number of holes is required—in professional tournaments, usually 72 holes—and count up however many strokes each took: the player with the fewest strokes wins. In stroke play, then, each golfer is effectively playing against every other player in the field, because all the strokes of every player count. But this is not so in match play.

In the first place, match play consists of, as the name suggests, matches: that is, once the field is cut to the 64 players with the lowest scores after an initial two-day stroke play tournament, each of those 64 contestants plays an 18-hole match against one other contestant. The winner of each of these matches then moves on, until there is a champion—a single-elimination tournament that is exactly like the NCAA basketball tournament held every year in March. The winner of each match in turn, as John Van der Borght says on the website of the United States Golf Association, “is the player who wins the most holes.” That is, what matters on every hole is just whether the golfer has shot a lower score than the opponent for that hole, not overall. Each hole starts the competition again, in other words—like flipping coins, what happened in the past is irrelevant. It’s a format that might sound hopeful, because on each hole whatever screw-ups a player commits are consigned to the dustbin of history. In fact, however, it’s just this element that makes match-play the least egalitarian of formats—and ties it to Ferguson.
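To make the difference concrete, here is a minimal Python sketch that scores the same two scorecards both ways. The cards are invented for illustration (they come from no real match), the function names are hypothetical, and the match play tally is simplified: it counts holes won over all 18 rather than stopping once one player is mathematically out of reach.

```python
def stroke_play_total(card):
    """Stroke play: every stroke on every hole counts toward one total."""
    return sum(card)


def match_play_holes_won(card_a, card_b):
    """Match play (simplified): count holes won by each side.

    Only who had the lower score on each hole matters; the size of the
    margin is discarded. Tied holes are halved and count for neither side.
    """
    a_won = sum(1 for a, b in zip(card_a, card_b) if a < b)
    b_won = sum(1 for a, b in zip(card_a, card_b) if b < a)
    return a_won, b_won


# Hypothetical 18-hole scorecards (assumed data, for illustration only).
player_a = [3, 4, 4, 3, 4, 4, 3, 4, 4, 9, 4, 4, 9, 4, 4, 3, 4, 8]
player_b = [4, 4, 5, 4, 4, 5, 4, 4, 5, 4, 4, 5, 4, 4, 5, 4, 4, 4]

total_a = stroke_play_total(player_a)
total_b = stroke_play_total(player_b)
holes_a, holes_b = match_play_holes_won(player_a, player_b)

print(f"Stroke play: A shoots {total_a}, B shoots {total_b}")
print(f"Match play:  A wins {holes_a} holes, B wins {holes_b} holes")
```

On these made-up cards player A takes five more strokes than player B over the round, yet wins nine holes to B's three: A's three blow-up holes are ruinous in stroke play but cost only three holes in match play.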

Tournaments conducted under match play rules are always subject to a kind of mathematical oddity called a Simpson’s Paradox: such a paradox occurs when, as the definition on Wikipedia says, it “appears that two sets of data separately support a certain hypothesis, but, when considered together, they support the opposite hypothesis.” For example, as I have mentioned in this blog before, in the first round of the PGA Tour’s 2014 Accenture Match Play tournament in Tucson, an unknown named Pablo Larrazabal shot a 68 to Hall-of-Famer Ernie Els’ 75—but because they played different opponents, Larrazabal was out of the tournament and Els was in. Admittedly, even with such an illustration the idea might still sound opaque, but the meaning can be seen by considering, for example, the tennis player Roger Federer’s record versus his rival Rafael Nadal.

Roger Federer has won 17 major championships in men’s tennis, a record—and yet many people argue that he is not the Greatest Of All Time (G.O.A.T.). Those people can argue that because, as Michael Steinberger pointed out in the New York Times not long ago, Federer “has a losing record against Nadal, and a lopsided one at that.” Steinberger then proceeded to argue why that record should be discarded and Federer should be called the “GOAT” anyway. But weirdly, Steinberger didn’t attempt—and neither, so far as I can tell, has anyone else—what an anonymous blogger did in 2009: a feat that demonstrates just what a Simpson’s Paradox is, and how it might apply both to the U.S. Amateur and Ferguson, Missouri.

What that blogger did, on a blog entitled SW19—a reference to the United Kingdom’s postal code for Wimbledon, the great tennis arena—was he counted up the points.

Let me repeat: he counted up the points.

That might sound trivial, of course, but as the writer of the SW19 blog realized, tennis is a game that abounds in Simpson’s Paradoxes: that is, it is a game in which it is possible to score fewer points than your opponent, but still win the match. Many people don’t realize this: it might be expected, for example, that because Nadal has an overwhelmingly dominant win-loss record versus Federer, he must also have won an equally dominant number of points from the Swiss champion. But an examination of the points scored in each of the matches between Federer and Nadal demonstrates that in fact the difference between them was minuscule.

The SW19 blogger wrote his post in 2009; at that time Nadal led Federer by 13 matches to 7 matches, a 65 percent winning edge for the Spaniard. Among those 20 matches was the 2008 French Open final, which Nadal won—on his best surface, clay—in straight sets, 6-1, 6-3, 6-0. In those 20 matches, the two men played 4,394 total points: that is, where one player served and the two volleyed back and forth until one player failed to deliver the ball to the other court according to the rules. If tennis had a straightforward relationship between points and wins—like golf’s stroke play format, in which every “point” (stroke) is simply added to the total and the winner has the fewest points—then it might be expected that Nadal would have won about 65 percent of those 4,394 points played, which would be about 2,856 points. In other words, to get a 65 percent edge in total matches, Nadal should have about a 65 percent edge in total points: the point total, as opposed to the match record, between the two ought to be about 2,856 to 1,538.

Yet this, as the SW19 blogger realized, is not the case: the real margin between the two players was Nadal, 2,221, and Federer, 2,173. In other words, even including the epic beating at Roland Garros in 2008, Nadal had only beaten Federer by a total of 48 points over the course of their careers—roughly one percent of all the points played. Not merely that, but if that single match at the 2008 French Open is excluded, then the margin becomes eight points. The mathematical difference between Nadal and Federer, thus, is the difference between a couple of motes of dust on the edge of a coin while it’s being flipped—if what is measured is the act that is the basis of the sport, the act of scoring points. In terms of points scored, Nadal’s edge over an even split is about half a percentage point—and most of that edge was generated by a single match. But Nadal had a 65 percent edge in their matches.
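The arithmetic in the last two paragraphs can be checked in a few lines of Python. The figures are the ones quoted above from the SW19 post; the variable names are just labels chosen for this sketch.

```python
# Head-to-head figures as quoted above from the SW19 blog (2009).
nadal_match_wins, federer_match_wins = 13, 7
nadal_points, federer_points = 2221, 2173

total_matches = nadal_match_wins + federer_match_wins
total_points = nadal_points + federer_points

# Nadal's share of matches versus his share of points.
match_share = nadal_match_wins / total_matches
point_share = nadal_points / total_points

# Points Nadal would have won if points tracked matches one-for-one.
expected_nadal_points = round(match_share * total_points)
point_margin = nadal_points - federer_points

print(f"Share of matches won by Nadal: {match_share:.1%}")
print(f"Share of points won by Nadal:  {point_share:.1%}")
print(f"Points expected at the match rate: {expected_nadal_points}")
print(f"Points actually won by Nadal:      {nadal_points}")
print(f"Actual margin in points:           {point_margin}")
```

The gap between the two shares is the whole story: a 65 percent share of the matches sits on top of a share of the points that is barely distinguishable from a coin flip.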

How did that happen? The answer is that the structure of tennis scoring is similar to that of match play in golf: the relation between wins and points isn’t direct. In fact, as the SW19 blogger shows, of the twenty matches Nadal and Federer had played to that moment in 2009, Federer had actually scored more points than Nadal in three of them—and still lost the match. If there were a direct relation between points and wins in tennis, that is, the record between Federer and Nadal would actually stand even, at 10-10, instead of what it was in reality, 13-7—a record that would have accurately captured the real point differential between them. But because what matters in tennis isn’t—exactly—the total number of points you score, but instead the numbers of games and sets you win, it is entirely possible to score more points than your opponent in a tennis match—and still lose. (Or, the converse.)

That is possible, as Florida State University professor Ryan Rodenberg put it in The Atlantic not long ago, because of “tennis’ decidedly unique scoring system.” (Actually, not unique, because as might be obvious by now match play golf is scored similarly.) In sports like soccer, baseball, or stroke play golf, as sports psychologist Allen Fox once wrote in Tennis magazine, “score is cumulative throughout the contest … and whoever has the most [or, in the case of stroke play golf, least] points at the end wins.” But in tennis things are different: “[i]f you reach game point and win it, you get the entire game while your opponent gets nothing—all the points he or she won in the game are eliminated.” Just in the same way that what matters in tennis is the game, not the point, in match play golf all that matters is the hole, and not the stroke.

Such scoring systems breed Simpson’s Paradoxes: that is, results that don’t reflect the underlying value a scoring system is meant to reflect—we want our games to be won by the better player, not the lucky one—but instead are merely artifacts of the system used to measure. The point (ha!) can be shown by way of an example taken from a blog written by one David Smith, head of marketing for a company called Revolution Analytics, about U.S. median wages. In that 2013 post, Smith reported that the “median US wage has risen about 1%, adjusted for inflation,” since 2000. But was that statistic important—that is, did it measure real value?

Well, what Smith found was that wages for high school dropouts, high school graduates, high school graduates with some college, college graduates, and people with advanced degrees all fell over the same period. Or, as Smith says, “within every educational subgroup, the median wage is now lower than it was in 2000.” But how can it be that “overall wages have risen, but wages within every subgroup have fallen?” The answer is similar to the reason why Nadal had a 65 percent winning margin against Federer: there are more college graduates now than in 2000, and the wages of college graduates haven’t fallen (1.2%) as far as those of, say, high school dropouts (7.9%), so the growing share of better-paid workers pulls the overall median up even as the median within each group falls. So despite the fact that everyone is poorer—everyone is receiving lower wages, adjusted for inflation—than in 2000, mendacious people can say wages are actually up. Wages are up—if you “compartmentalize” the numbers in just the way that reflects the story you’d like to tell.
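The same effect can be reproduced with a toy example. The numbers below are invented for illustration and are not Smith’s data: ten workers in each year, two education groups, wages in thousands of dollars. The sketch uses only `median` from Python’s standard `statistics` module.

```python
from statistics import median

# Hypothetical wages in $1,000s (assumed for illustration; NOT Smith's data).
# Between the two years the mix of workers shifts toward the degree group.
wages_2000 = {"no_degree": [30] * 7, "degree": [60] * 3}
wages_2013 = {"no_degree": [28] * 3, "degree": [59] * 7}


def group_medians(wages):
    """Median wage within each education group."""
    return {group: median(values) for group, values in wages.items()}


def overall_median(wages):
    """Median wage with all groups pooled into one pile."""
    pooled = [wage for values in wages.values() for wage in values]
    return median(pooled)


for year, wages in (("2000", wages_2000), ("2013", wages_2013)):
    print(year, group_medians(wages), "overall:", overall_median(wages))
```

In this made-up data each group’s median falls (30 to 28, and 60 to 59), yet the pooled median rises from 30 to 59, purely because more of the workers now sit in the better-paid group. Whether wages went “up” or “down” depends on where the boundaries are drawn.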

Now, while the story about American wages might suggest a connection to Ferguson—and it does—that isn’t the connection between the U.S. Amateur and Ferguson, Missouri, I’d like to discuss. That connection is this one: the trouble with the U.S. Amateur is that it is conducted under match play—a format that permits Simpson’s Paradox results—and Simpson’s Paradoxes are, at heart, boundary disputes—arguments about whether to divide up the raw data into smaller piles or present them as one big pile. That suggests the real link to Ferguson: the real issue behind Darren Wilson’s shooting of Michael Brown isn’t racism—or at least, the way to solve it isn’t to talk about racism. Instead, it’s to talk borders.

After Ferguson police officer Darren Wilson shot Michael Brown last August, the Department of Justice issued a report that was meant, as Zoë Carpenter of The Nation wrote this past March, to “address the roots of the police force’s discriminatory practices.” That report held that those practices were not “simply the result of racist cops,” but instead stemmed “from the way the city preys on residents financially, relying on the fines that accompany even minor offenses to balance its budget.” The report found an email from Ferguson’s finance director to the town’s police chief that, Carpenter reported, said “unless ticket writing ramps up significantly before the end of the year, it will be hard to significantly raise collections next year.” The finance director’s concerns were justified: only slightly less than a quarter of Ferguson’s total budget was generated by traffic tickets and other citations. The continuing operation of the town depends on revenue raised by the police—a need, in turn, that drives the kind of police zealotry that the Department of Justice said contributed to Brown’s death.

All of which might seem quite far from the concerns of the golf fans watching the results of the matches at the U.S. Amateur. Yet consider a town not far from Ferguson: Beverly Hills, Missouri. Like Ferguson, Beverly Hills is located to the northwest of downtown St. Louis, and like Ferguson it is a majority black town. But where Ferguson has over 20,000 residents, Beverly Hills has only around 600 residents—and that size difference is enough to make the connection to the U.S. Amateur’s format of play, match play, crystalline.

Ferguson after all is not alone in depending so highly on police actions for its revenues: Calverton Park, for instance, is another Missouri “municipality that last fiscal year raised a quarter of its revenue from traffic fines,” according to the St. Louis Post-Dispatch. Yet while Ferguson, like Calverton Park, also raised about a quarter of its budget from police actions, Beverly Hills raised something like half of its municipal budget on traffic and other kinds of citations, as a story in the Washington Post reported. All these little towns, all dependent on traffic tickets to meet their budgets; “Most of the roughly ninety municipalities in St. Louis County,” Carpenter reports in The Nation, “have their own courts, which … function much like Ferguson’s: for the purpose of balancing budgets.” Without even getting into the issue of the fairness of property taxes or sales taxes as a basis for municipal budgeting, it seems obvious that depending on traffic tickets as a major source of revenue is poor planning at best. Yet without the revenue provided by cops writing tickets—and, as a result of Ferguson, the state of Missouri is considering limiting the percentage of a town’s budget that can be raised by such tickets, as the St. Louis Post-Dispatch article says—many of these towns will simply fail. And that is the connection to the U.S. Amateur.

What these towns are having to consider in other words is, according to the St. Louis Post-Dispatch, an option mentioned by St. Louis County Executive Steve Stenger last December: during an interview, the official said that “the consolidation of North County municipalities is what we should be talking about” in response to the threat of cutting back reliance on tickets. Small towns like Beverly Hills may simply be too small: they create too little revenue to support themselves without a huge effort on the part of the police force to find—and thus, in a sense, create—what are essentially taxable crimes. The way to solve the problem of a “racist” police department, in other words, might not be to conduct workshops or seminars in order to “retrain” the officers on the frontline, but instead to redraw the political boundaries of the greater St. Louis metropolitan area.

That, at least, is a solution that our great-grandparents considered, as an article by writer Kim-Mai Cutler for TechCrunch this past April remarked. Examining the historical roots of the housing crisis in San Francisco, Cutler discovered that in “1912, a Greater San Francisco movement emerged and the city tried to annex Oakland,” a move Oakland resisted. Yet as a consequence of not creating a Bay-wide government, Cutler says, “the Bay Area’s housing, transit infrastructure and tax system has been haunted by the region’s fragmented governance” ever since: the BART (Bay Area Rapid Transit) system, for example, as originally designed “would have run around the entire Bay Area,” Cutler says, “but San Mateo County dropped out in 1961 and then Marin did too.” Many of the problems of that part of Northern California could be solved, Cutler thus suggests through this and other instances—contra the received wisdom of our day—by bigger, not smaller, government.

“Bigger,” that is, in the sense of “more consolidated”: by the metric of sheer numbers, a government built to a larger scale might not employ as many people as do the scattered suburban governments of America today. But what such a government would do is capture all of the efficiencies of economies of scale available to a larger entity—thus, it might be in a sense smaller than the units it replaced, but definitely would be more powerful. What Missourians and Californians—and possibly others—may be realizing then is that the divisions between their towns are like the divisions tennis makes around its points, or match play golf makes around its strokes: dividing a finite resource, whether points or strokes or tax dollars (or votes), into smaller pools creates what might be called “unnatural,” or “artificial,” results—i.e., results that inadequately reflect the real value of the underlying resource. Just like match play can make Ernie Els’ 75 look better than Pablo Larrazabal’s 68, or tennis’ scoring system can make Rafael Nadal look much better than Federer—when in reality the difference between them is (or was) no more than a sliver of a gnat’s eyelash—dozens of little towns dissipate the real value, economic and otherwise, of the people that inhabit a region.

That’s why when Eric Holder, Attorney General for the United States, said that “the underlying culture” of the police department and court system of Ferguson needs to be reformed, he got it exactly wrong. The problems in St. Louis and San Francisco, the evidence suggests, are created not because government is getting in the way, but because government isn’t structured correctly to channel the real value of the people: scoring systems that leave participants subject to the vagaries of Simpson’s Paradox results might be perfectly fine for games like tennis or golf—where the downsides are minimal—but they shouldn’t be how real life gets scored, and especially not in government. Contra Holder, the problem is not that the members of the Ferguson police department are racists. The problem is that the government structure requires them, like occupying soldiers or cowboys, to view their fellow citizens as a kind of herd. Or, to put the matter in a pithier way: A system that depends on the harvesting of sheep will turn its agents into wolves. Instead of drowning the effects of racism—as a big enough government would through its very size—multiplying struggling towns only encourages racism: instead of diffusing racism, a system broken into little towns focuses it. The real problem of Ferguson then—the real problem of America—is not that Americans are systematically discriminatory: it’s that the systems used by Americans aren’t keeping the score right.

This Pitiless Storm

Poor naked wretches, whereso’er you are,
That bide the pelting of this pitiless storm,
How shall your houseless heads and unfed sides,
Your loop’d and window’d raggedness, defend you,
From seasons such as these?
The Tragedy of King Lear, Act III, Scene 4

“Whenever people talk to me about the weather,” the Irish writer Oscar Wilde once remarked, “I always feel quite certain that they mean something else.” As it happens, play at this year’s British Open has been delayed by high winds, and the tournament will not finish the regulation 72 holes until Monday at the earliest. Which raises a question: why does the Open need to finish all 72 holes? The answer concerns something called a “Simpson’s Paradox”—an answer that also demonstrates just how talk about the weather at the British Open is in fact talk about something else. Namely, the 2016 American presidential election.

To see how, it’s first necessary to see the difference between the British Open and other professional golf tournaments, which are perfectly fine with shortening themselves. Take for instance the 2005 Northern Trust Open in Los Angeles: Adam Scott won in a playoff against Chad Campbell after the tournament was shortened to 36 holes due to weather. In 2013, the Tournament of Champions at Kapalua in Hawaii was “first cut to 54 holes because of unplayable conditions over the first two days,” according to Reuters, and was under threat of “being further trimmed to 36 holes.” The same story also quoted tour officials as saying “the eventual champion would wind up with an ‘unofficial win’” were the tournament to be shortened to 36 holes. (As things shook out they did end up completing 54 holes, and so Dustin Johnson’s win officially counted.) In a standard PGA tournament then, the “magic number” for an “official” tournament is 54 holes. But if so, then why does the Open need 72?

To answer that, let’s take a closer look at the standard professional golf tournament. Most such tournaments are conducted according to what the Rules of Golf calls “stroke play”: four rounds of golf, or 72 holes, at the end of which the players who have made it that far add up their scores—their number of strokes. The player with the lowest score, it may seem like it goes without saying, wins. But it does need to be said—because that isn’t the only option.

Many amateur tournaments after all, such as the United States Amateur, use the rules format known as “match play.” Under this format, the winner of the contest is not necessarily the player who shoots the lowest overall score, as in stroke play. Instead, as John Van der Borght has put the matter on the website of the United States Golf Association, in match play the “winner is the player who wins the most holes.” It’s a seemingly minor difference—but in fact it creates such a difference that match play is virtually a different sport than stroke play.

Consider, for instance, the Accenture Match Play tournament—the only tournament on the PGA Tour to be held under match play rules. The 2014 edition (held at the Dove Mountain course near Tucson, Arizona), had some results that demonstrate just how different match play is than stroke play, as Doug Ferguson of the Associated Press observed. “Pablo Larrazabal shot a 68 and was on his way back to Spain,” Ferguson noted about the first day’s results, while “Ernie Els shot 75 and has a tee time at Dove Mountain on Thursday.” In other words, Larrazabal lost his match and Els won his, even though Larrazabal was arguably the better player at this tournament—at least, if you consider the “better player” to be the one who puts his ball in the hole most efficiently.

Such a result might seem unfair—but why? It could be argued that while shooting a lower number might be what stroke play golf is, that isn’t what match play golf is. In other words, Larrazabal obviously wasn’t better at whatever it was that this tournament measured: if Larrazabal couldn’t beat his opponent, while Els could, then clearly Els deserved to continue to play while Larrazabal did not. While you might feel that, somehow or other, Larrazabal got jobbed, that’s merely a sentimental reaction to what ought to be a hardhearted calculation: maybe it’s true that under stroke play rules Larrazabal would have won, but those weren’t the rules of the contest at Dove Mountain. In other words, you could say that golfing ability was, in a sense, socially constructed: what matters isn’t some “ahistorical” ability to golf, but instead how it is measured.

Here’s the $64,000 question a guy named Bill James might ask in response to such an argument, however (couched in terms of baseball players): “If you were trying to win a pennant, how badly would you want this guy?” In other words, based on the evidence presented, what would you conclude about the respective golf ability of Els and Larrazabal? Wouldn’t you conclude that Larrazabal is better at the task of putting his ball in the hole, and that the various rule systems that could be constructed around that task are merely different ways of measuring that ability—an ability that pre-existed those systems of measurement?

“We’re not trying to embarrass the best players in the game,” said Sandy Tatum at the 1974 U.S. Open, the so-called Massacre at Winged Foot: “We’re trying to identify them.” Scoring systems in short should be aimed at revealing, not concealing, ability. I choose Bill James to make the point not just because the question he asks is so pithy, but because he invented an equation that is designed to discover underlying ability: an equation called the Pythagorean Expectation. That equation, in turn, demonstrates just why it is that match play and stroke play are not just different—yet equally valid—measures of playing ability. In so doing, James also demonstrates just why it is that the Open Championship requires that all 72 holes be played.

So named because it resembles so closely that formula, fundamental to mathematics, called the Pythagorean Theorem, what the Pythagorean Expectation says is that the ratio of a team’s (or player’s) points scored to that team’s (or player’s) points allowed is a better predictor of future success than the team’s (or player’s) ratio of wins to losses. (James used “runs” because he was dealing with baseball.) More or less it works: as Graham MacAree puts it on the website FanGraphs, using James’ formula makes it “relatively easy to predict a team’s win-loss record”—even in sports other than baseball. Yet why is this so—how can a single formula predict future success at any sport? It might be thought, after all, that different sports exercise different muscles, or use different strategies: how can one formula describe underlying value in many different venues—and thus, incidentally, demonstrate that ability can be differentiated from the tools we use to measure it?
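For readers who want to see it, here is a minimal sketch of the formula in its commonly cited original form, which squares both totals (hence the resemblance to the Pythagorean Theorem); later variants tune the exponent. The function name and the season totals are hypothetical, chosen only to show the calculation.

```python
def pythagorean_expectation(scored, allowed, exponent=2.0):
    """Estimate a winning fraction from points (or runs) scored and allowed.

    Uses the commonly cited form: scored^k / (scored^k + allowed^k),
    with k = 2 by default. Other exponents are an adjustable assumption.
    """
    if scored < 0 or allowed < 0:
        raise ValueError("scored and allowed must be non-negative")
    s = scored ** exponent
    a = allowed ** exponent
    if s + a == 0:
        raise ValueError("at least one of scored/allowed must be positive")
    return s / (s + a)


# Hypothetical baseball season (assumed totals, not a real team).
runs_scored, runs_allowed, games = 800, 700, 162

expected_pct = pythagorean_expectation(runs_scored, runs_allowed)
expected_wins = expected_pct * games

print(f"Expected winning percentage: {expected_pct:.3f}")
print(f"Expected wins over {games} games: {expected_wins:.1f}")
```

Note what the formula ignores: it never asks which games the runs came in. Like stroke play, it pools every run into one total, and that pooling is exactly what lets it see past a fluky win-loss record.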

The answer to these questions is that adding up the total points scored, rather than the total games won, gives us a better notion of the relative value of a player or a team because it avoids something called the “Simpson’s Paradox”—which is what happens when, according to Wikipedia, it “appears that two sets of data separately support a certain hypothesis, but, when considered together, they support the opposite hypothesis.” Consider what happens for example when we match Ernie Els’ 75 to Pablo Larrazabal’s 68: if we match them according to who won each hole, Els comes out the winner—but if we just compared raw scores, then Larrazabal would. Simpson’s Paradoxes appear, in short, when we draw the boundaries around the raw data differently: the same score looks different depending on what lens is used to view it—an answer that might seem to validate those who think that underlying ability doesn’t exist, but only the means used to measure it. But what Simpson’s Paradox shows isn’t that all boundaries around the data are equal—in fact, it shows just the opposite.

What Simpson’s Paradox shows, in other words, is that drawing boundaries around the data can produce illusions of value if that drawing isn’t done carefully—and most specifically, if the boundaries don’t capture all of the data. That’s why the response golf fans might have to the assertion that Pablo Larrazabal is better than Ernie Els proves, rather than invalidates, the argument so far: people highly familiar with golf might respond, “well, you haven’t considered the total picture—Els, for instance, has won two U.S. Opens, widely considered to be the hardest tournament in the world, and Larrazabal hasn’t won any.” But then consider that what you have done just demonstrates the point made by Simpson’s Paradox: in order to say that Els is better, you have opened up the data set; you have redrawn the boundaries of the data in order to include more information. So what you would have conceded, were you to object to the characterization of Larrazabal as a better golfer than Els on the grounds that Els has a better overall record than Larrazabal, is that the way to determine the better golfer is to cast the net as wide as possible. You have demanded that the sample size be increased.

That then is why a tournament contested over only 36 holes isn’t considered an “official” PGA tournament, while 54 holes isn’t enough to crown the winner of a major tournament like the Open Championship (which is what the British Open is called when it’s at home). It’s all right if a run-of-the-mill tournament be cut to 54 holes, or even 36 (though in that case we don’t want the win to be official). But in the case of a major championship, we want there to be no misunderstandings, no “fluky” situations like the one in which Els wins and Larrazabal doesn’t. The way to do that, we understand, is to maximize chances, to make the data set as wide as possible: in sum, to make a large sample size. We all, I think, understand this intuitively: it’s why baseball has a World Series rather than a World Championship Game. So that is why, in a major championship, it doesn’t matter how long it takes—all the players qualified are going to play all 72 holes.

Here I will, as they say in both golf and baseball, turn for home. What all of this about Simpson’s Paradoxes means, at the end of the day, is that a tournament like the Open Championship is important—as opposed to, say, an American presidential election. In a presidential election as everyone knows, what matters isn’t the total number of votes a candidate wins, but how many states. In that sense, American presidential elections are conducted according to what, in golf, would be considered match play instead of stroke play. Now, as Bill James might acknowledge, that raises the question: does that process result in better candidates being elected?

As James might ask in response: would you like to bet?