Size Matters

That men would die was a matter of necessity; which men would die, though, was a matter of circumstance, and Yossarian was willing to be the victim of anything but circumstance.
Catch-22.
I do not pretend to understand the moral universe; the arc is a long one, my eye reaches but little ways; I cannot calculate the curve and complete the figure by the experience of sight; I can divine it by conscience. And from what I see I am sure it bends towards justice.
Things refuse to be mismanaged long.
—“Of Justice and the Conscience.”

 

The Casino at Monte Carlo

 

 

Once, wrote the baseball statistician Bill James, there was “a time when Americans” were such “an honest, trusting people” that they actually had “an unhealthy faith in the validity of statistical evidence”–but by the time James wrote in 1985, things had gone so far the other way that “the intellectually lazy [had] adopted the position that so long as something was stated as a statistic it was probably false.” Today, in no small part because of James’ work, that is likely no longer as true as it once was, but nevertheless the news has not spread to many portions of academia: as University of Virginia historian Sophia Rosenfeld remarked in 2012, in many departments it’s still fairly common to hear it asserted—for example—that all “universal notions are actually forms of ideology,” and that “there is no such thing as universal common sense.” Usually such assertions are followed by a claim for their political utility—but in reality widespread ignorance of statistical effects is what allowed Donald Trump to be elected, because although the media spent much of the presidential campaign focused on questions like the size of Donald Trump’s … hands, the size that actually mattered in determining the election was a statistical concept called sample size.

First articulated by the mathematician Jacob Bernoulli in his 1713 book, Ars Conjectandi, sample size is the idea that “it is not enough to take one or another observation for such a reasoning about an event, but that a large number of them are needed.” Admittedly, it might not seem like much of an observation: as Bernoulli himself acknowledged, even “the most stupid person, all by himself and without any preliminary instruction,” knows that “the more such observations are taken into account, the less is the danger of straying from the goal.” But Bernoulli’s remark is the very basis of science: as an article in the journal Nature put the point in 2013, “a study with low statistical power”—that is, few observations—“has a reduced chance of detecting a true effect.” Sample sizes need to be large enough to be able to eliminate chance as a possible factor.
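Bernoulli’s point is easy to make concrete with a short simulation. The sketch below, in plain Python, assumes a made-up “true effect” that shows up 60 percent of the time and estimates how often a study points in exactly the wrong direction at different sample sizes; the 60 percent figure and the sample sizes are illustrative assumptions, nothing more.

```python
import random

random.seed(42)

TRUE_RATE = 0.6      # assumed size of the "true effect" (illustrative only)
TRIALS = 10_000      # simulated studies per sample size

def misleading_rate(n_observations: int) -> float:
    """Estimate how often a sample of this size points the wrong way,
    i.e., a majority of observations land on the wrong side of 50/50."""
    wrong = 0
    for _ in range(TRIALS):
        successes = sum(random.random() < TRUE_RATE for _ in range(n_observations))
        if successes < n_observations / 2:
            wrong += 1
    return wrong / TRIALS

for n in (5, 50, 500):
    print(f"n = {n:3d}: misleading result in about {misleading_rate(n):.0%} of studies")
```

With five observations the simulated studies mislead roughly a third of the time; with five hundred, almost never. That is all Bernoulli’s “most stupid person” was claiming.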

If that isn’t understood, it’s possible to go seriously astray: consider an example drawn from the work of the Israeli psychologists Amos Tversky (a MacArthur “genius” grant winner) and Daniel Kahneman (a Nobel Prize winner)—a study of which “of two toys infants will prefer.” Let’s say that in the course of research our investigator finds that, of “the first five infants studied, four have shown a preference for the same toy.” To most psychologists, the two say, this would be enough for the researcher to conclude that she’s on to something—but in fact, the two write, a “quick computation” shows that “the probability of a result as extreme as the one obtained” being due simply to chance “is as high as 3/8.” The scientist might be inclined to think, in other words, that she has learned something—but in fact her result has a 37.5 percent chance of being due to nothing at all.
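Tversky and Kahneman’s “quick computation” is just the binomial distribution under the null hypothesis that neither toy is actually preferred. A few lines of Python reproduce the 3/8 figure; the only assumption made here is that “as extreme as” means four or five infants agreeing, in either direction.

```python
from math import comb

n = 5      # infants observed
p = 0.5    # null hypothesis: neither toy is preferred

# Splits at least as extreme as 4-to-1: 5-0, 4-1, 1-4, or 0-5.
extreme_counts = (0, 1, 4, 5)
probability = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in extreme_counts)

print(probability)            # 0.375
print(probability == 3 / 8)   # True
```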

Yet when we turn from science to politics, what we find is that an American presidential election is like a study that draws grand conclusions from five babies. Instead of being one big sample—as a direct popular national election would be—presidential elections are broken up into fifty state-level elections: the Electoral College system. What that means is that American presidential elections maximize the role of chance, not minimize it.

The laws of statistics, in other words, predict that chance will play a large role in presidential elections—and as it happens, Tim Meko, Denise Lu and Lazaro Gamio reported for The Washington Post three days after the election that “Trump won the presidency with razor-thin margins in swing states.” “This election was effectively decided,” the trio went on to say, “by 107,000 people”—in an election in which more than 120 million votes were cast, that means the election was decided by less than a tenth of one percent of the total votes. Trump won Pennsylvania by less than 70,000 votes out of nearly 6 million, Wisconsin by less than 30,000 out of just under three million, and finally Michigan by less than 11,000 out of 4.5 million: the first two by just more than one percent of the total vote each—and Michigan by a whopping .2 percent! Just to give an idea of how insignificant these numbers are by comparison with the total vote cast: according to the Michigan Department of Transportation, it’s possible that a thousand people in the five largest counties were involved in car crashes—which isn’t even to mention the people who simply stayed home because they couldn’t find a babysitter.
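The arithmetic is easy to check against the rounded figures quoted above (the official totals differ slightly, so treat these as ballpark numbers rather than exact results):

```python
# Margins as a share of votes cast, using the rounded figures in the text.
margins = {
    "Pennsylvania": (70_000, 6_000_000),
    "Wisconsin":    (30_000, 3_000_000),
    "Michigan":     (11_000, 4_500_000),
}

for state, (margin, votes_cast) in margins.items():
    print(f"{state}: {margin / votes_cast:.2%} of votes cast")

# The three margins together, measured against the national total:
print(f"Nationally: {107_000 / 120_000_000:.3%}")   # well under a tenth of one percent
```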

Trump owes his election, in short, to a system that is vulnerable to chance because it is constructed to turn a large sample (the total number of American voters) into small samples (the fifty states). Science tells us that small sample sizes increase the risk of random chance playing a role, American presidential elections use a smaller sample size than they could, and like several other presidential elections, the 2016 election did not go as predicted. Donald Trump could, in other words, be called “His Accidency” with even greater justice than John Tyler—the first vice-president to be promoted due to the death of his boss in office—was. Yet, why isn’t that point being made more publicly?

According to John Cassidy of The New Yorker, it’s because Americans haven’t “been schooled in how to think in probabilistic terms.” But just why that’s true—he’s essentially making the same point Bill James did in 1985, though more delicately—has an answer that is, I think, highly damaging to many of Clinton’s biggest fans: they’ve made it that way. It’s the disciplines where many of Clinton’s most vocal supporters make their home, in other words, that are most directly opposed to the type of probabilistic thinking that’s required to see the flaws in the Electoral College system.

As Stanford literary scholar Franco Moretti once observed, the “United States is the country of close reading”: the disciplines dealing with matters of politics, history, and the law within the American system have, in fact, more or less been explicitly constructed to prevent importing knowledge of the laws of chance into them. Law schools, for example, use what’s called the “case method,” in which a single case is used to stand in for an entire body of law: a point indicated by the first textbook to use this method, Christopher Langdell’s A Selection of Cases on the Law of Contracts. Other disciplines, such as history, are similar: as Emory University’s Mark Bauerlein has written, many such disciplines depend for their very livelihood upon “affirming that an incisive reading of a single text or event is sufficient to illustrate a theoretical or historical generality.” In other words, it’s the very basis of the humanities to reject the concept of sample size.

What’s particularly disturbing about this point is that, as Joe Pinsker documented in The Atlantic last year, the humanities attract a wealthier student pool than other disciplines—which is to say that the humanities tend to be populated by students and faculty with a direct interest in maintaining obscurity around the interaction between the laws of chance and the Electoral College. That doesn’t mean that there’s a connection between the architecture of presidential elections and the fact that—as Geoffrey Harpham, former president and director of the National Humanities Center, has observed—“the modern concept of the humanities” (that is, as a set of disciplines distinct from the sciences) “is truly native only to the United States, where the term acquired a meaning and a peculiar cultural force that it does not have elsewhere.” But it does perhaps explain just why many in the national media have been silent regarding that design in the month after the election.

Still, as many in the humanities like to say, the current American university and political structure is “socially constructed,” which is to say it could be constructed differently. The American division between the sciences and the humanities is not the only way to organize knowledge: as the editors of the massive volumes of The Literary and Cultural Reception of Darwin in Europe pointed out in 2014, “one has to bear in mind that the opposition of natural sciences … and humanities … does not apply to the nineteenth century.” If the opposition we today find so omnipresent did not exist then, it might not be necessary now. Hence, if the choice facing the American people is between getting a real say in the affairs of government (and there’s very good reason to think they don’t have one now) and preserving the right of a bunch of rich yahoos to spend their early twenties getting drunk, reading The Great Gatsby, and talking about their terrible childhoods … well, I know which side I’m on. But perhaps more significantly, although I would not expect it to happen tomorrow, still, given the laws of sample size and the prospect of eternity, I know how I’d bet.

Or, as another sharp operator who’d read his Bernoulli once put the point:

“The arc of the moral universe is long, but it bends towards justice.”

 


An Unfair Game

The sheer quantity of brain power that hurled itself voluntarily and quixotically into the search for new baseball knowledge was either exhilarating or depressing, depending on how you felt about baseball.
Moneyball: The Art of Winning an Unfair Game

“Today, in sports,” wrote James Surowiecki in The New Yorker a couple of years ago, “what you are is what you make yourself into”—unlike forty or fifty years ago, nearly all elite-level athletes now have a tremendous entourage of dietitians, strength coaches, skill coaches, and mental coaches to help them do their jobs. But not just athletes: at the team level, coaches and scouts have learned to use data both to recruit the best players and to turn that talent into successful strategies. Surowiecki notes, for instance, that when sports columnist Mark Montieth went back and looked at old NBA games from the 1950s and 60s, he found that NBA coaches at the time “hadn’t yet come up with offenses sophisticated enough to create what are considered good shots today.” That improvement, however, is not limited to sports: Surowiecki also notes that in fields as varied as chess, classical music, airline safety, and small-unit infantry tactics, the same basic sorts of techniques have greatly improved performance. What “underlies all these performance revolutions,” Surowiecki says, “is captured by the Japanese term kaizen, or continuous improvement”—that is, the careful analysis of technique. Still, what is most curious about the fact that so many disparate fields have been improved by kaizen-type innovations is not that the techniques can be applied so variously, but that they have not been applied to many other fields: among them, Surowiecki lists medicine and education. Yet the field that might be ripest for the advent of kaizen—and with the greatest payoff for Americans, greater even than the fact that lemon cars are for the most part a thing of the past—is politics.

To be sure, politics doesn’t lend itself particularly well to training in a wind tunnel, as the top-level cyclists Surowiecki discusses do. Nor is politics likely to be much improved by ensuring, as the Portland Trail Blazers do, that everyone in government gets enough rest, or eats correctly—although one imagines that, in the case of several politicians, the latter might greatly improve their performance. But while the “taking care of the talent” side of the equation might not, in the field of politics, be the most efficient use of resources, the kinds of techniques that have helped teams improve their strategies just might be. For example, in baseball, examining statistical evidence for how best to defend against a particular batter has become wildly more popular in recent years—and that strategy has certain obvious applications to American politics.

That team-level strategy is the “infield shift,” the technique whereby fielders are positioned in unusual configurations in order to take account of a particular batter’s tendencies. If, for example, a player tends to hit the ball to the left side of the field—a tendency readily observable in the post-Moneyball era of advanced statistical analysis—teams might move the second baseman (on the right side of the infield) to behind second base, or even further left, so as to have an extra fielder where the batter tends to place his hits. According to the Los Angeles Times, the use of the “infield shift” has grown far greater than ever: the “number of shifts,” the Times’ Zach Helfand wrote last year, “has nearly doubled nearly every year since 2011, from 2,357 to 13,298 last year.” This past season (2015), the use of shifts exploded again: there had already been 10,262 shifts “by the All-Star break,” Helfand reported. The use of shifts is growing so quickly, of course, because they work: the “strategy saved 190 runs in the first half this (2015) season, according to estimates from Baseball Info Solutions,” Helfand says. The idea makes intuitive sense: putting players where they are not needed is an inefficient use of a team’s resources.
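The logic of the shift is simple enough to sketch in code. The spray data and the 60 percent threshold below are invented purely for illustration; they are not drawn from any team’s actual playbook or from Baseball Info Solutions’ data.

```python
from collections import Counter

# Hypothetical spray chart: which third of the field a batter's balls in play went to.
batted_balls = ["pull"] * 34 + ["center"] * 11 + ["opposite"] * 5

def should_shift(balls, threshold=0.6):
    """Recommend a shift when the batter pulls more than `threshold` of his
    balls in play (the threshold is an arbitrary, illustrative cutoff)."""
    pull_rate = Counter(balls)["pull"] / len(balls)
    return pull_rate > threshold, pull_rate

shift, rate = should_shift(batted_balls)
print(f"pull rate {rate:.0%} -> {'shift the infield' if shift else 'play straight up'}")
```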

The infield shift is a strategy, as it happens, that one of the greatest of America’s Supreme Court justices, Earl Warren, would have approved of—because he, in effect, directed the greatest infield shift of all time. About the line of cases now known as the “apportionment cases,” the former Chief Justice wrote that, despite having presided over such famous cases as Brown v. Board of Education (the case that desegregated American schools) and Miranda v. Arizona (which required that criminal suspects be informed of their rights), he was most proud of his role in these cases, which took on the fact that the “legislatures of more than forty states were so unbalanced as to give people in certain parts of them vastly greater voting representation than in others.” In the first of that line, Baker v. Carr, the facts were that the number of qualified voters in Tennessee had grown from 487,380 to 2,092,891 since 1900, and that those voters were not distributed evenly throughout the state but instead were concentrated in urban areas like Nashville and Memphis, and yet Tennessee had not reapportioned its legislature since 1901. This, said Warren, was ridiculous: in effect, Tennessee’s legislature was not only not shifted, but it was wrongly shifted. If the people of Tennessee were a right-handed pull-hitter (i.e., one that tends to hit to the left side of the field), in other words, Tennessee’s legislature had the second baseman, the shortstop, and the third baseman on the right side of the field—i.e., toward first base, not third.

“Legislators represent people, not trees or acres,” Warren wrote for a later “apportionment case,” Reynolds v. Sims (about Alabama’s legislature, which was, like Tennessee’s, also wildly malapportioned). What Warren was saying was that legislators ought to be where the constituents are—much as baseball fielders ought to be where the ball is likely to be hit. In Reynolds, the Alabama legislature wasn’t: because the Alabama Constitution provided that the state senate would be composed of one senator from each Alabama county, some senate districts had voting populations as much as 41 times that of the least populated. Warren’s work remedied that vast disparity: as a result of the U.S. Supreme Court’s decisions in Baker, Reynolds, and the other cases in the “apportionment” line, nearly every state legislature in the United States was forced to redraw boundaries and, in general, make sure the legislators were where the people are.

Of course, it might be noted that the apportionment cases were decided more than fifty years ago, and that the injustices they addressed have now all been corrected. Yet, it is not merely American state legislatures that were badly misaligned with the American population. After all, if the state senate of Alabama was badly malapportioned through much of the twentieth century and before, it is also true that the Senate of the United States continues to be malapportioned today: if the difference between Alabama’s least populated county and its most in the early 1960s was more than 40 times, the difference between the number of voters in Wyoming, the least populated American state, and California, the most, is now more than 60 times—and yet each state has precisely the same number of senators in the U.S. Senate. These differences, much like infield shifts, have consequences: in such books as Sizing Up the Senate: The Unequal Consequences of Equal Representation, political scientists like Frances E. Lee and Bruce I. Oppenheimer have demonstrated that, for example, “less populous states consistently receive more federal funding than states with more people.” Putting legislators where the people aren’t, in other words, has much the same effect as not shifting a baseball team’s infield: it allows money, and the other goods directed by a legislature, to flow—like hits—in directions that it wouldn’t were there fielders, or legislators, in place to redirect those flows.

To say that moving America’s legislature around would have an instantaneously positive effect on American life, of course, is likely to overstate the effect such a move might make: some batters in the major leagues, like Albert Pujols, have been able to overcome the effects of an infield shift. (Pujols, it seems, bats 28 points higher when a shift is on than when it isn’t, Zach Helfand reported.) Yet teams still use the shift on Pujols—on the theory, apparently, that even if he bats better against it overall, he is unlikely to keep that up, and that on the occasions when he misses “hitting the gaps,” a fielder will be there.

Similarly, although it might be so that, as Senator Everett Dirksen of Illinois argued in the aftermath of 1964’s Reynolds, “the forces of our national life are not brought to bear on public questions solely in proportion to the weight of numbers,” the forces behind such examples as Billy Beane’s Oakland A’s teams—assembled largely on the weight of the statistics put up by the players—or Japanese car companies—which redesigned workspaces, Surowiecki says, “so workers didn’t have to waste time twisting and turning to reach their tools”—beg to differ: although not every question can be solved by the application of kaizen-like techniques, surely a number of them can.

Among them, it may be, is gun-control legislation, which has continually been held up by structural features of the American Congress that have much to do with malapportionment. Surely, in other words, with regard to gun policy it matters that the Senate is heavily stacked in favor of mostly-rural states. Were it not, it would be much easier to imagine the United States having a gun policy much more in line with that of other industrialized democracies. Which, in light of incidents like the recent shooting deaths in Orlando, casts an old baseball phrase in a new light.

That phrase?

“Hit ’em where they ain’t.”

The Oldest Mistake

Monte Ward traded [Willie] Keeler away for almost nothing because … he made the oldest mistake in management: he focused on what the player couldn’t do, rather than on what he could.
The New Bill James Historical Baseball Abstract

 

 

What does an American “leftist” look like? According to academics and the inhabitants of Brooklyn and its spiritual suburbs, there are means of tribal recognition: unusual hair or jewelry; a mode of dress either strikingly old-fashioned or futuristic; peculiar eyeglasses, shoes, or other accessories. There’s a deep concern about food, particularly that such food be the product of as small, and preferably foreign, an operation as possible—despite a concomitant enmity toward global warming. Their subject of study at college was at minimum one of the humanities, and possibly self-designed. If they are fans of a sport at all, it is either something obscure, obscenely technical, and ball-free—think bicycle racing—or it is soccer. And so on. Yet, while each of us has exactly such a picture in mind—probably you know at least a few such people, or are one yourself—that is not what a real American leftist looks like at the beginning of the twenty-first century. In reality, a person of the actual left today drinks macro-, not micro-, brews, studied computer science or some other such discipline at university, and—above all—is a fan of either baseball or football. And why is that? Because such a person understands statistics intuitively—and the great American political battle of the twenty-first century will be led by the followers of Strabo, not Pyrrho.

Both men were Greeks: the one a geographer, the other a philosopher—the latter often credited with being one of the first “Westerners” to visit India. “Nothing really exists,” Pyrrho reportedly held, “but human life is governed by convention”—a philosophy very like that of the current American “cultural left,” governed as it is by the notion, as put by American literary critic Stanley Fish, that “norms and standards and rules … are in every instance a function or extension of history, convention, and local practice.” Arguably, most of the “political” work of the American academy over the past several generations has been done under that rubric: as Fish and others have admitted in recent years, it’s only by acceding to some version of that doctrine that anyone can work as an American academic in the humanities these days.

Yet while “official” leftism has prospered in the academy under a Pyrrhonian rose, in the meantime enterprises like fantasy football and, above all, sabermetrics have expanded as a matter of “entertainment.” But what an odd form of relaxation! It’s a bizarre kind of escapism that requires a familiarity with both acronyms and the formulas used to compute them: WAR, OPS, DIPS, and above all (with a nod to Greek antecedents), the “Pythagorean expectation.” Yet the work on these matters has mainly been undertaken as a purely amateur endeavor—Bill James spent decades putting out his baseball work without any remuneration, until finally being hired by the Boston Red Sox in 2003 (the same year that Michael Lewis published Moneyball, a book about how the Oakland A’s were using methods pioneered by James and his disciples). Still, all of these various methods of computing the value of both a player and a team have a perhaps-unintended effect: that of training the mind in the principle of the Greek geographer Strabo.

“It is proper to derive our explanations from things which are obvious,” Strabo wrote two thousand years ago, in a line that would later be adopted by the Englishman who constructed geology, Charles Lyell. In his Principles of Geology (which largely founded the field) Lyell held—in contrast to the mysteriousness of Pyrrho—that the causes of things are likely to be similar to those already around us, and not due to unique, unrepeatable events. Similarly, sabermetricians—as opposed to the old-school scouts depicted in the film version of Moneyball—judge players based on their performance on the field, not on their nebulous “promise” or “intangibles.” (In Moneyball scouts were said to judge players on such qualities as the relative attractiveness of their girlfriends, which was said to signify the player’s own confidence in his ability.) Sabermetricians disregard such “methods” of analysis in favor of examination of the acts performed by the player as recorded by statistics.

Why, however, would that methodological commitment lead sabermetricians to be politically “liberal”—or for that matter, why would it lead in a political direction at all? The answer to the latter question is, I suspect, inevitable: sabermetrics, after all, is a discipline well-suited for the purpose of discovering how to run a professional sports team—and in its broadest sense, managing organizations simply is what “politics” is. The Greek philosopher Aristotle, for that reason, defined politics as a “practical science”—as the discipline of organizing human beings for particular purposes. It seems inevitable then that at least some people who have spent time wondering about, say, how to organize a baseball team most effectively might turn their imaginations towards some other end.

Still, even were that so, why “liberalism,” however that is defined, as opposed to some other kind of political philosophy? Going by anecdotal evidence, after all, the most popular such doctrine among sports fans might be libertarianism. Yet, besides the fact that libertarianism is the philosophy of twelve-year-old boys (not necessarily a knockdown argument against its success), it seems to me that anyone following the methods of sabermetrics will be led towards positions usually called “liberal” in today’s America, because from that sabermetrical, Strabonian perspective, certain key features of the American system nearly instantly jump out.

The first of those features will be that, as it now stands, the American system is designed in a fashion contrary to the first principle of sabermetrical analysis: the Pythagorean expectation. As Charles Hofacker described it in a 1983 article for Baseball Analyst, the “Pythagorean equation was devised by Bill James to predict winning percentage from … the critical difference between runs that [a team] scores and runs that it allows.” By comparing these numbers—the ratio of a team’s runs scored and runs allowed versus the team’s actual winning percentage—James found that a rough approximation of a team’s real value could be determined: generally, a large difference between those two sets of numbers means that something fluky is happening.

If a team scores a lot of runs while also preventing its opponents from scoring, in other words, and yet somehow isn’t winning as many games as those numbers would suggest, then that suggests that that team is either tremendously unlucky or there is some hidden factor preventing success. Maybe, for instance, that team is scoring most of its runs at home because its home field is particularly friendly to the type of hitters the team has … and so forth. A disparity between runs scored/runs allowed and actual winning percentage, in short, compels further investigation.
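In code, the check is a one-line formula plus a comparison. The run totals and the 81-81 record below are hypothetical, chosen only to show how a gap between expected and actual wins gets flagged; James’ original version of the formula, with an exponent of two, is used here.

```python
def pythagorean_win_pct(runs_scored: int, runs_allowed: int) -> float:
    """Bill James' Pythagorean expectation, in its original exponent-of-two form."""
    return runs_scored**2 / (runs_scored**2 + runs_allowed**2)

def luck_gap(runs_scored, runs_allowed, wins, losses):
    """Wins above (positive) or below (negative) what the run totals predict."""
    games = wins + losses
    expected_wins = pythagorean_win_pct(runs_scored, runs_allowed) * games
    return wins - expected_wins

# A hypothetical club outscores its opponents 800 to 700 over a 162-game
# season yet finishes only 81-81: about eleven wins are "missing."
print(round(luck_gap(800, 700, 81, 81), 1))   # -10.8
```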

Weirdly, however, the American system regularly produces similar disparities—and yet while, in the case of a baseball team, that would set off alerts for a sabermetrician, no such alarms are set off in the case of the so-called “official” American left, which apparently has resigned itself to the seemingly inevitable. In fact, instead of being the subject of curiosity and even alarm, many of the features of the U.S. constitution, like the Senate and the Electoral College—not to speak of the Supreme Court itself—are expressly designed to thwart what Chief Justice Earl Warren said was “the clear and strong command of our Constitution’s Equal Protection Clause”: the idea that “Legislators represent people … [and] are elected by voters, not farms or cities or economic interests.” Whereas a professional baseball team, in the post-James era, would be remiss if it were to ignore a difference between its ratio of runs scored and allowed and its games won and lost, under the American political system the difference between the will of the electorate as expressed by votes cast and the actual results of that system as expressed by legislation passed is not only ignored, but actively encouraged.

“The existence of the United States Senate”—wrote Justice Harlan, for example, in his dissent to the 1962 case of Baker v. Carr—“is proof enough” that “those who have the responsibility for devising a system of representation may permissibly consider that factors other than bare numbers should be taken into account.” That is, the existence of the U.S. Senate, which sends two senators from each state regardless of each state’s population, is support enough for those who believe—as the American “cultural left” does—in the importance of factors like “history” or the like in political decisions, as opposed to, say, the will of the American voters as expressed by the tally of all American votes.

As Jonathan Cohn remarked in The New Republic not long ago, in the Senate “predominantly rural, thinly populated states like Arkansas and North Dakota have the exact same representation as more urban, densely populated states like California and New York”—meaning that voters in those rural states have more effective political power than voters in the urban ones do. In sum, the Senate is, as Cohn says, one of the Constitution’s “levers for thwarting the majority.” Or to put it in sabermetrical terms, it is a means of hiding a severe disconnect in America’s Pythagorean expectation.

Some will defend that disconnect, as Justice Harlan did over fifty years ago, in terms familiar to the “cultural left”: those of “history” and “local practice” and so forth. In other words, that is how the Constitution originally constructed the American state. Yet, attempting (in Cohn’s words) to “prevent majorities from having the power to determine election outcomes” is a dangerous undertaking; as The Atlantic’s Ta-Nehisi Coates wrote recently about certain actions taken by the Republican party designed to discourage voting, to “see the only other major political party in the country effectively giving up on convincing voters, and instead embarking on a strategy of disenfranchisement, is a bad sign for American democracy.” In baseball, the sabermetricians know, a team with a high difference between its “Pythagorean expectation” and its win-loss record will usually “snap back” to the mean. In politics, as everyone since before Aristotle has known, such a “snap back” is usually a bit more costly than, say, the price of a new pitcher—which is to say that, if you see any American revolutionaries around you right now, he or she is likely wearing, not a poncho or a black turtleneck, but an Oakland A’s hat.

Mr. Tatum’s Razor

Arise, awake, and learn by approaching the exalted ones, for that path is sharp as a razor’s edge, impassable, and hard to go by, say the wise.
Katha Upanishad 1-III-14

Plurality is never to be posited without necessity.
—William of Ockham. Questions on the Sentences of Peter Lombard. (1318).

“The United States had lost. And won.” So wrote the European-born, now naturalized-American John Cassidy recently, when Team USA advanced out of the “group stage” of the World Cup soccer tournament despite losing its last game of that stage. (To Germany, 1-0.) So even though they got beaten, it was the first time the U.S. had advanced out of the group stage in back-to-back Cups. But while the moment represented a breakthrough by the team, Cassidy warns it hasn’t been accompanied by a breakthrough in the fandom: “don’t ask [Americans] to explain how goal difference works,” he advises. He’s right that most are unfamiliar with the rule that allowed the Americans to play on, but he’s wrong if he’s implying that Americans aren’t capable of understanding it: the “sabermetric revolution”—the statistical study of the National Pastime—begins by recognizing the same principle that backs goal difference. Yet while there is thus precedent to think that Americans could understand goal difference—and, maybe, accept soccer as a big-time sport—there’s one reason to think America can’t: the American political system. And, though that might sound wacky enough for any one piece of writing, golf—a sport equally at home in America and Europe—is ideally suited to explain why.

Goal difference is a procedure that applies at the opening stage of the World Cup, which is organized differently from other large sporting tournaments. The NCAA college basketball tournament, for instance, is an “elimination” tournament: it sorts its 64 teams into four different brackets, then seeds each bracket from a #1-ranked team down to a #16-ranked team. Each team then plays the team seeded at the opposite end of its bracket, so that the best team plays the lowest-ranked team, and so on. Winning allows a team to continue; losing sends that team home, which is what makes it an “elimination” tournament.

The World Cup also breaks its entrants into smaller groups, and for the same reason—so that the best teams don’t play each other too early—but that’s where the similarities end. The beginning, “group” stage of the tournament is conducted in a round-robin format: each team in a group plays every other team in a group. Two teams from each group then continue to the next part of the competition.

Because the group stage is played under a round-robin, rather than an elimination, structure, losing a game doesn’t necessarily result in exiting the tournament—which is not only why the United States was not eliminated from competition by losing to Germany, but also what makes the World Cup un-American in Cassidy’s estimation. “Isn’t cheering a team of losers,” Cassidy writes, “an un-American activity?” But there are at least two questionable ideas packed into this sentence: the first is that a team that has lost—a “loser”—is devoid of athletic ability, or what we might call value; the second, that “losers” are un-American, or anyway that cheering for them is.

The round-robin format of the group stage, after all, just means that the tournament does not treat the loss of a single game as necessarily revealing anything definitive about the value of a team: only a team’s record against all the other teams in its group does that. If the tournament is still unsure about the value of a team—that is, if two or more teams are tied for the best, or second-best, record (two teams advance)—then the tournament also looks at other ways to determine value. That’s what “goal difference,” or differential, is: as Ken Boehlke put it on CBS Sports’ website (“Understanding FIFA World Cup Procedures”), goal difference is “found by simply subtracting a team’s goals against from its goals scored.” What that means is that, by the way the World Cup reckons things, it matters not only whether a team loses a close game, but also whether it wins a blowout.

Goal difference was, as Cassidy says, the reason the American team was able to be one of the two teams from its group to advance. It’s true that the Americans were tied by win-loss record with another team in their group, Portugal. But the Americans only lost to Germany by one goal, while earlier in the stage the Portuguese had lost to the same German side 4-0. That, combined with some other results, meant that the United States advanced and Portugal did not. What the World Cup understands is that just winning games isn’t necessarily a marker of a team’s quality, or value: what also matters is how many goals a team allows, and scores.
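The tiebreak itself is easy to sketch. The group totals below are the 2014 Group G figures as I recall them, simplified to the two tied teams; FIFA’s full procedure has further tiebreakers beyond goal difference, which are omitted here.

```python
# Both teams finished the group on 4 points; goal difference breaks the tie.
teams = {
    "United States": {"points": 4, "scored": 4, "allowed": 4},
    "Portugal":      {"points": 4, "scored": 4, "allowed": 7},
}

def goal_difference(record):
    return record["scored"] - record["allowed"]

# Rank on points first, then on goal difference, as the group stage does.
table = sorted(teams.items(),
               key=lambda item: (item[1]["points"], goal_difference(item[1])),
               reverse=True)

for team, record in table:
    print(team, record["points"], goal_difference(record))
# United States 4 0
# Portugal 4 -3
```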

Now, John Cassidy appears to think that this concept is entirely foreign to Americans, and maybe he’s right—except for any of the Americans who happen to have seen the movie Moneyball, which not only grossed over $75 million in the United States and was nominated for six Oscars but also starred Brad Pitt. “What are you really worth?” was the film’s tagline, and in the speech that is the centerpiece of the movie, the character Peter Brand (played by Jonah Hill, another fairly well-known actor) says to his boss—general manager of the Oakland A’s Billy Beane (played by Pitt)—that “Your goal … should be to buy wins. And in order to buy wins, you need to buy runs.” And while Moneyball, the film, was released just a few years ago, the ideas that fuel it have been around since the 1970s.

To be sure, it’s hardly news that scoring points results in winning games—the key insight is that, as Graham MacAree put it on the website FanGraphs, it is “relatively easy to predict a team’s win-loss record using a simple formula,” a formula invented by a man named Bill James in the 1970s. The formula so resembled the classic Pythagorean Theorem that James called it the Pythagorean Expectation: what it expressed was that the ratio of a team’s past runs scored to runs allowed is a better predictor of future success (i.e., future wins and losses) than that team’s past ratio of wins to losses. What it meant was that, to quote MacAree again, “pure pythagorean expectancy is probably a better way of gauging a team than actual wins and losses.” Or to put it another way, knowing how many runs a team scored versus how many that team’s opponents scored is more valuable than knowing how many games it won.
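In James’ original version the “simple formula” is just a ratio of squares. A minimal sketch, using invented run totals for a 162-game season:

```python
def expected_record(runs_scored: int, runs_allowed: int, games: int = 162):
    """Pythagorean expectation: a win-loss record predicted from runs alone."""
    win_pct = runs_scored**2 / (runs_scored**2 + runs_allowed**2)
    wins = round(win_pct * games)
    return wins, games - wins

# A team that scores 750 runs and allows 650 projects to roughly 93 wins.
print(expected_record(750, 650))   # (93, 69)
```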

What the Pythagorean Expectation model and the goal difference model do, then, is concentrate attention on the foundational act of their respective sports: scoring runs and scoring goals. Conversely, both draw attention away from winning and losing as such. That might appear odd: isn’t the point of playing a game to win, not (just) to score? But what both these methods recognize is that a focus on winning and losing, instead of scoring, is vulnerable to a particular statistical illusion called a Simpson’s Paradox.

As it happens, an episode of the television series Numb3rs used a comparison of the batting averages of Derek Jeter and David Justice in the mid-1990s to introduce the idea of what a Simpson’s Paradox is, and the comparison seems tailor-made for the purpose. Here is a table—a more accurate one than the television show used—that shows those averages during the 1995, 1996, and 1997 seasons:

                 1995            1996            1997            Totals
Derek Jeter      12/48   .250    183/582  .314   190/654  .291   385/1284  .300
David Justice    104/411 .253    45/140   .321   163/495  .329   312/1046  .298

(hits/at-bats, followed by batting average)

Compare the year-by-year averages: Jeter, you will find, has a worse average than Justice in every year. Then compare the two players’ totals: Jeter actually has a slightly better average than Justice. A Simpson’s Paradox results, as the Stanford Encyclopedia of Philosophy puts it, when the “structures that underlie” a set of facts “invalidate … arguments that many people, at least initially, take to be intuitively valid.” Or, as the definition on Wikipedia describes it a bit more elegantly, the paradox occurs when it “appears that two sets of data separately support a certain hypothesis, but, when considered together, they support the opposite hypothesis.” In this case, if we consider the data year-by-year, it seems that Justice is the better hitter—but when we consolidate all of the data, it supports the notion that Jeter is better than Justice.
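The reversal can be checked directly from the table’s numbers; pooling the three seasons is all it takes to flip the comparison.

```python
# (hits, at-bats) per season, from the table above.
jeter   = {"1995": (12, 48),   "1996": (183, 582), "1997": (190, 654)}
justice = {"1995": (104, 411), "1996": (45, 140),  "1997": (163, 495)}

def average(hits, at_bats):
    return hits / at_bats

# Year by year, Justice wins every comparison...
for year in ("1995", "1996", "1997"):
    print(year, round(average(*jeter[year]), 3), round(average(*justice[year]), 3))

# ...but pool the seasons and the ranking flips: that is Simpson's Paradox.
def pooled(seasons):
    return average(sum(h for h, _ in seasons.values()),
                   sum(ab for _, ab in seasons.values()))

print("totals", round(pooled(jeter), 3), round(pooled(justice), 3))
```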

There are at least two ways we can think that the latter hypothesis is the more likely: one is the simple fact that 1995 was Derek Jeter’s first appearance in the major leagues; born in 1974, he was eight years younger than Justice, who was already a veteran player. Quite obviously, then, from the perspective of a general manager looking at these numbers after the 1997 season, buying Jeter is the better move because more of Jeter’s career is available to be bought: since Jeter is only retiring this year (2014), that means that in 1997 there were 17 seasons of Derek Jeter still available, whereas, since David Justice retired in 2002, there were only 5 more seasons of David Justice available. Of course, none of that information would have been available in 1997—and injuries are always possible—but given the age difference it would have been safe to say that, assuming you valued each player relatively equally on the field, Jeter was still more valuable. In one sense, though, that exercise isn’t very helpful, because it doesn’t address just what Simpson’s Paradox has to do with thinking about Derek Jeter.

In another sense, though, it has everything to do with it. The only question that matters about a baseball player, says Bill James, is “If you were trying to win a pennant, how badly would you want this guy?” Or in other words: don’t be hypnotized by statistics. It sounds like a simple enough lesson, which in a way it is—but it’s terribly difficult to put into practice. In this case, it is easy to become mystified by the two players’ batting averages, but what James might advise is to look at the events that those numbers represent: instead of looking at the averages, look at the components of those averages.

What looking at the raw numbers reveals is that Jeter had more hits than Justice over the three seasons: 385 to 312. That difference matters because—unlike the difference in batting average over the same period, which is only a couple of points—73 more hits is a lot more hits, and as James wrote in The New Bill James Historical Baseball Abstract, the “essential measure of a hitter’s success is how many runs he has created.” Further, without getting too far into the math of it, smart people who’ve studied baseball have found that a single hit is worth nearly half a run. (Joe Posnanski, former Senior Writer at Sports Illustrated and one of those people, has a nice post summarizing the point called “Trading Walks For Hits” at joeposnanski.com.) What that would mean is that Jeter may have created more runs than Justice did over the same period: depending on the particular method used, perhaps more than twenty more runs. And since runs create wins (that conversion being calculated as about ten runs to the win), that implies that Jeter likely helped his team to two more wins than Justice did over the same period.
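A back-of-the-envelope version of that arithmetic, using the rough conventions just mentioned (about half a run per hit, about ten runs per win). It counts only the extra hits and ignores everything else the two players did, so it should be read as a crude upper-bound sketch rather than a real run estimate:

```python
jeter_hits, justice_hits = 385, 312
extra_hits = jeter_hits - justice_hits     # 73

RUNS_PER_HIT = 0.47   # rough rule-of-thumb value of an average hit
RUNS_PER_WIN = 10     # the conventional runs-to-wins conversion

extra_runs = extra_hits * RUNS_PER_HIT
extra_wins = extra_runs / RUNS_PER_WIN
print(round(extra_runs, 1), round(extra_wins, 1))   # roughly 34 runs, 3.4 wins at most
```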

To really know which player contributed more to winning would require a lot more investigation than that, but the point is that following James’ method leads towards the primary events that generate outcomes, and away from the illusions that a focus on outcomes fosters. Wins are generated by runs, so focus on runs; runs are created by hits, so focus on hits. So too does goal difference mean that while the World Cup recognizes wins, it also recognizes the events—goals—that produce wins. Put that way, it sounds quite commonsensical—but in fact James was lucky, in a sense, to stumble upon it, because there are two ways to organize a sport, and only one of them is amenable to this kind of analysis. It was fortunate, both for James and for baseball, that he was a fan of a game that could easily be analyzed this way.

In sports like baseball, there’s a fairly predictable relationship between scoring and winning. In other sports, though, there isn’t, and that’s why golf is so useful here. Played one way, golf is very amenable to the kind of analysis behind the World Cup’s goal difference or Bill James’ Pythagorean Expectation. Golf, however, can also be played another way, and under that form there is no predictable relationship between scores and wins. What the evidence will show is that having two different forms of the sport isn’t due to a mistake on the part of the designers: each form of the game was designed for a different purpose. And what that will show, I will argue, is that whether a game has one sort of scoring system or the other predicts what the purpose of the design is—and vice versa.

On the PGA Tour, the standard tournament consists of four rounds, or 72 holes, at the end of which the players who have made it that far add up their scores—their number of strokes—and the lowest one wins. In the Rules of Golf, this format is known as “stroke play.” That’s what makes it like the group stage of the World Cup or Bill James’ conception of baseball: play begins, the players attempt some action that produces a “score” (however that is determined), and at the end of play each of those scoring events is added together and compared. The player or team with the best tally of these “scoring events” (the most, or in golf’s case the fewest) is then declared the winner. In short, under the rules of stroke play—just as in the World Cup’s group stage, or in Bill James’ notion of baseball—there is a direct relationship between the elemental act of the game, scoring, and winning.

But the format most often used by golf’s professionals is not the only method available: many amateur tournaments, such as the United States Amateur, use the rules format known as “match play.” Under this format, the winner of the contest is not necessarily the player who shoots the lowest overall score, as in stroke play. Instead, as John Van der Borght has put the matter on the website of the United States Golf Association, the official rule-making body of the sport, in match play the “winner is the player who wins the most holes.” It’s a seemingly minor change—but in fact it makes such a difference that match play is virtually a different sport from stroke play.

Consider, for instance, this year’s Accenture Match Play tournament, held at the Dove Mountain course near Tucson, Arizona. (The only tournament on the PGA Tour to be held under match play rules.) “Factoring in conceded putts,” wrote Doug Ferguson of the Associated Press earlier this season, “Pablo Larrazabal shot a 68 and was on his way back to Spain,” while “Ernie Els shot 75 and has a tee time at Dove Mountain on Thursday.” In other words, Larrazabal lost his match and Els won his, even though Larrazabal played better than Els. Intuitively, Larrazabal was the better player at this tournament, which would lead one to expect that Larrazabal continued to play and Els exited—but the actual results were the reverse. It’s a Simpson’s Paradox, and unlike stroke play—which cannot generate Simpson’s Paradoxes—match play produces them all the time. That’s why match play golf does not resemble baseball or soccer, as golf does in stroke play, but instead resembles a sport whose most prestigious tournament—Wimbledon—just concluded. And tennis is the High Church of Simpson’s Paradox.
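The mechanism is easy to show with a deliberately exaggerated, invented scorecard: one player wins many holes narrowly, the other wins fewer holes by wide margins.

```python
# Hypothetical 18-hole match: A takes ten holes by one stroke each,
# B takes eight holes by three strokes each.
a_scores = [4] * 10 + [6] * 8    # A's hole-by-hole strokes (88 total)
b_scores = [5] * 10 + [3] * 8    # B's hole-by-hole strokes (74 total)

holes_to_a = sum(a < b for a, b in zip(a_scores, b_scores))
holes_to_b = sum(b < a for a, b in zip(a_scores, b_scores))

print("holes won:", holes_to_a, "-", holes_to_b)            # 10 - 8: A wins the match
print("total strokes:", sum(a_scores), "vs", sum(b_scores)) # 88 vs 74: B played "better"
```

Under stroke play, B wins going away; under match play, B goes home.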

Simpson’s Paradox, for example, is why many people don’t think Roger Federer is the greatest tennis player who ever lived. The case for the Swiss is that he has won 17 major championships, a record, among other career accomplishments. “But,” as Michael Steinberger wrote in the New York Times not long ago, “he has a losing record against [Rafael] Nadal, and a lopsided one at that.” (Nadal leads 23-10.) “How can you be considered the greatest player ever if you were arguably not even the best player of your own era?” Steinberger asks. Heroically, Steinberger attempts to answer that question in favor of Federer—the piece is a marvel of argumentation, in which the writer sets up a seemingly insurmountable rhetorical burden, the aforementioned question, then seeks to overcome it. What’s interesting, though—and in several searches through the Internet I discovered many other pieces tackling more or less the same subject—is that neither Steinberger nor anyone else attempted what an anonymous blogger did in 2009.

He added up the points.

The blog is called SW19, the postal code for the London district that contains Wimbledon. The writer, “Rahul,” is obviously young—he (or she) stopped posting in December of 2009, because of the pressures of college—yet Rahul did something I have not seen any other tennis journalist attempt: in a post called “Nadal vs. Federer: A Pythagorean Perspective,” Rahul broke down “the Federer/Nadal rivalry on a point-by-point basis, just to see if it really is as lopsided as one would expect.” That is, given Nadal’s dominant win-loss record, the expectation would be that Nadal must win an equally-impressive number of points from Federer.

By July of 2009—the time of publication—Nadal led Federer by 13-7 in terms of their head-to-head record, a 65 percent winning percentage. The two champions had played 4,394 total points across those 20 matches—one of them the 2008 French Open, won by Nadal in straight sets, 6-1, 6-3, 6-0. (Nadal has, as of 2014, now won 9 French Opens, a majors record, while Federer has only won the French once—the very next year after Nadal blew him off the court: 2009.) Now, if there was a straightforward relation between points and wins, Nadal’s percentage of those points ought to be at least somewhat similar to his winning percentage of those matches.

But what Rahul found was this: of the total points, Nadal had won 2,221 and Federer 2,173. Nadal had only out-scored Federer by 48 points, total, over their careers to that point, including the smackdown at Roland Garros in 2008. That is barely one percent of all the points. And if you took that match out of the total, Nadal had won a grand total of eight more points than Federer, out of over 4,000 points and 19 other matches. It is not 65 percent. It is not even 55 percent.

Still, it’s the final nugget that Rahul uncovered that is likely of the most relevance. In three of the matches Nadal had won to that moment in their careers, Federer had actually won more points: two matches in 2006, in Dubai and Rome, and once at the Australian Open in 2009. As Rahul points out, “if Federer had won those three matches, the record would sit at 10-10”—and at least in 2009, nobody would have been talking about Federer’s Achilles heel. I don’t know where the Pythagorean record between the two players stands at the moment, but it’s interesting that nobody has taken up this detail when discussing Federer’s greatness—though the nub of it has been taken up as a serious topic concerning tennis as a whole.

In January in The Atlantic, Professor Ryan Rodenberg of Florida State University noted that not only did Federer have the 17 Grand Slam titles and the 302 weeks ranked No. 1 in the world, but he also held another distinction: “the worst record among players active since 1990 in so-called ‘Simpson’s Paradox’ matches—those where the loser of the match wins more points than the winner.” Federer’s overall record in these matches is like his record against Nadal: not good. The Swiss is only 4-24.

To tennis aficionados, it’s a point that must appear irrelevant—at least, no one before Professor Rodenberg appears to have mentioned it online. To be sure, it does seem questionably relevant: Federer has played nearly 1200 matches professionally; 28 is a pittance. But Rodenberg, along with his co-authors, found that matches like the 2010 Isner-Mahut marathon at Wimbledon, where the loser out-scored the winner, constituted “about 4.5 percent” of “61,000 men’s ATP and Grand Slam matches dating back to 1990.” That’s nearly 3,000 matches—and given that in exactly zero soccer matches or baseball games, over that time frame or any other, has the losing side netted more goals or plated more runs than the winner, it at the least raises some questions.

How, after all, is it possible for one side of the net to win despite winning fewer of the points? The answer, as Rodenberg puts it, is “tennis’ decidedly unique scoring system.” In sports like baseball, sports psychologist Allen Fox wrote recently for the website of the magazine Tennis, “score is cumulative throughout the contest … and whoever has the most points at the end wins.” Sports like tennis or match play golf are different, however: in tennis, as Fox says, “[i]f you reach game point and win it, you get the entire game while your opponent gets nothing—all the points he or she won in the game are eliminated.” In the same fashion, once a hole is over in match play golf it doesn’t matter what either competitor scored on that hole: each total is struck out, and the match in effect begins again. What that in turn means is that certain points, certain scoring events, have more value than others: in golf, what matters is the stroke that takes a hole, just as in tennis what matters is the point that takes a game, or a set, or a match. Those points are more valuable than other points—a fact of tremendous importance.
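Fox’s point, that winning the game point erases the opponent’s points from the ledger, is what opens the door to the paradox. An invented set score-sheet makes it visible: the set is decided by games won, while the points inside lost games simply stop counting.

```python
# Hypothetical set: player A wins six games, each by a narrow 5 points to 3,
# and loses four games at love, 0 points to 4.
a_wins = [(5, 3)] * 6    # games A wins: (A's points, B's points)
b_wins = [(0, 4)] * 4    # games B wins

games_a, games_b = len(a_wins), len(b_wins)
points_a = sum(a for a, _ in a_wins) + sum(a for a, _ in b_wins)   # 30
points_b = sum(b for _, b in a_wins) + sum(b for _, b in b_wins)   # 34

print(f"set: {games_a}-{games_b} to A; points: {points_b}-{points_a} to B")
# set: 6-4 to A; points: 34-30 to B
```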

It’s this scoring mechanism that allows tennis and match play golf to produce Simpson’s Paradox games: a system whereby the competition as a whole is divided into smaller competitions, each of which functions independently of the others. In order to get Simpson’s Paradox results, having a system like this is essential. The $64,000 question, however, is: just who would design a system like that, a system that can, in effect, punish the player who performs the sport’s defining act better and more often than his opponent? It isn’t enough just to say that results like that are uncommon, because why allow them to happen at all? In virtually every other sport, after all, no result like these would ever come up. The only serious answer must be that tennis and match play golf were specifically designed to produce Simpson’s Paradoxes—but why? The only way to seek that answer, I’d say, is to search back through history.

The game we today call tennis in reality is correctly termed “lawn tennis,” which is why the formal name of the organization that sponsors the Wimbledon tournament is the “All England Lawn Tennis and Croquet Club.” The sport is properly called that in order to distinguish it from the older game known as “real tennis” or, in French, Jeu de Paume. Whereas our game of tennis, or lawn tennis, is generally played outdoors and on a single plane, Jeu de Paume is played indoors, in unique, non-standardized courts where strange bounces and funny angles are the norm. And while lawn tennis only came into existence in 1874, Jeu de Paume goes well back into the Middle Ages. “World titles in the sport were first competed in 1740,” as Rolf Potts noted in a piece about the game in the online magazine, The Smart Set, “and have continued to the present day, making Jeu de Paume men’s singles the oldest continuous championship event in sport.” Jeu de Paume, thus, is arguably the oldest sport in the world.

Aside from its antiquity, the game is also, and not unrelatedly, noted for its roots in the ancien regime: “Nearly all French royalty were familiar with the sport from the 13th century on,” as Rolf Potts notes. And not just French royalty: Henry VIII of England is regularly described as a great player by historians. These are not irrelevant facts, because the status of the players of Jeu de Paume in fact may be directly relevant to how tennis is scored today.

“When modern tennis,” writes Potts, “was simplified into its popular form in 1874, it appropriated the scoring system of the ancient French game.” So our game of tennis did not invent its own method of scoring; it merely lifted another game’s method. And that game’s method may be connected to the fact that it was played by aristocrats, and to the related fact that so much about Jeu de Paume is bound up with gambling.

“In October of 1532,” Potts reports, Henry VIII lost 50 pounds on tennis matches: “about a thousand times the sum most Englishmen earned in a week.” Anne Boleyn, Henry’s second wife, by some accounts “was betting on a tennis game when Henry’s men arrested her in May of 1536,” while others say that her husband received the news of her execution while he himself was playing a match. Two centuries earlier, in 1355, King John II of France had been recorded paying off a bet with “two lengths of Belgian cloth.” And in his paper “Real Tennis and the Civilising Process,” published in the journal Sport in History, Rob Lake claims that “the game provided opportunities for nobles to engage in conspicuous consumption … through gambling displays.”

So much so, in fact, that Potts also reports that “some have speculated that tennis scoring was based on the gros denier coin, which was said to be worth 15 deniers.” Be that as it may, two facts stand out: the first is that the game’s “gradual slide into obscurity began when fixed games and gambling scandals sullied its reputation in the late 17th century,” and the second that “games are still regulated by a complicated handicapping system … so that each player begins the game with an equal expectation of winning.” So elaborate is that handicap system, in fact, that when Rolf Potts played the first match of his life, against a club professional who was instructing him, he “was able to play a close game.” Gambling, it seems, was—as Potts says—“intrinsic to Jeu de Paume.” And since the sport still has a handicap system, a feature essential to gambling, it still is.

We can think about why that is by comparing Jeu de Paume to match play golf, which also has early connections to both feudalism and gambling. As Michael Bohn records in Money Golf: 600 Years of Bettin’ on Birdies, the “earliest record of a golf bet in Scotland was in 1503,” when on February 3 King James IV paid out 42 shillings to the Earl of Bothwell in “play at the golf.” And as John Paul Newport of the Wall Street Journal writes, “historically all the early recorded competitions—King James IV in 1503, for example, or the Duke of York, later King James II [of England], in 1681—were match play.” That is likely not a coincidence, because the link between the aristocracy, gambling, and match play is not difficult to explain.

In the first place, the link between the nobility and gambling is not hard to understand, since aristocrats were virtually the only people with both the money and the time for sport: the opportunity, as a prosecutor would say. “With idle people amusement is the business of life,” as the London magazine The Spectator noted in 1837; and King James’ bet with the Earl of Bothwell, 42 shillings, or a little over £2, would have bought roughly six months’ work from a laborer during the sixteenth century. Not merely that: the aristocracy were practically the only people who, legally speaking, could gamble during the Renaissance. As Nicholas Tosney notes in a 2010 paper for the University of Nevada, Las Vegas, “Gaming in Britain and America: Some Historical Comparisons,” gambling in England was outlawed in 1541 for anyone not at least a gentleman.

Yet opportunity alone does not carry a case; a motive is also required, and a motive isn’t hard to find when it comes to gambling. Aside from the obvious financial inducement, aristocratic men had something extra pushing them toward gaming. As the same 1837 Spectator article noted, gambling was widely thought to be “a necessary accomplishment of a young man in fashionable circles.” After all, what better way to demonstrate belonging to the upper classes than that form of conspicuous consumption that buys, precisely, nothing? The literature on the subject is too extensive to need trotting out in its entirety: nobles had both the means and the motive to gamble, so it seems reasonable to suppose that a game adopted by gamblers would be a game well suited to gambling.

And examined closely, match play does have such features. Gambling, after all, would best explain why match play consists of what John Van der Borght calls “18 one-hole contests.” According to John Paul Newport, that’s so “an awful hole here or there doesn’t spoil the day”; a better explanation, though, is that doing things that way allows the previous hole’s loser to bet again. Multiplying contests obviously multiplies the opportunities to bet, and thus the opportunities for a sucker to lose more. That is why the match play format’s link to the nobility and to gambling matters: it helps demonstrate that the two formats of golf are not just different versions of the same game but serve two different purposes, purposes so different that they are virtually different sports.

That difference in purpose is likely why, as Newport observes, not “until the mid-18th century are there records of stroke-play competitions.” One reason for the invention of the stroke play format was, Newport tells us, “to make tournaments involving larger numbers of golfers feasible.” The writer for the Wall Street Journal (make of that connection what you will) presents the new format as simply demanded by the increasing number of players, a sign, though Newport does not mention it, that the game was spreading beyond the boundaries of the nobility. But in reality stroke play was invented to serve a different purpose than match play, a purpose even now recognized by the United States Golf Association.

Perhaps the best definition of the purpose of stroke play, and thus of its difference from match play, can be found in the reply Sandy Tatum, then chairman of the United States Golf Association’s championship committee, gave to a reporter at the 1974 U.S. Open at Winged Foot. That tournament would become known as “the Massacre at Winged Foot,” because even the winner, Hale Irwin, finished over par (+7). When it became obvious just how tough the course was playing, one reporter asked Tatum whether the USGA was trying to embarrass the best players in the world. Tatum’s reply is about as succinct an explanation of the purpose of the U.S. Open, and of stroke play, as is possible.

“Our objective is not to humiliate the best golfers in the world,” Tatum said in response: “It’s to identify them.” And identifying the greatest golfers is still the objective of the USGA. That’s why, when Newport interviewed Mike Davis, the USGA’s current executive director, about the difference between stroke play and match play for his article, Davis told him, “If all you are trying to do is determine who is playing the best over a relatively short period of time, [then] 72 holes of stroke play is more equitable [than match play].” The position of the USGA is clear: if the purpose of the competition is to “identify,” as Tatum said, or “determine,” as Davis said, the best player, then the better format for that purpose is stroke play, not match play.

One reason the USGA can know this is that it is obviously not in a gambler’s interest to be identified as a great player. Consider, for instance, a photo printed along with Golf magazine’s excerpt of Kevin Cook’s book, Titanic Thompson: The Man Who Bet On Everything. The photo depicts one Alvin “Titanic Thompson” Thomas, swinging a club late in life. Thomas, born in 1892, was, Cook says, “the last great player to ignore tournament golf”—or stroke play golf, anyway. Not because he couldn’t have competed: Cook reports that Byron Nelson, who among other exploits won 11 PGA Tour tournaments in a row in the summer of 1945, and who thus seems an excellent judge, said “there was ‘no question’ that Titanic could have excelled on Tour, ‘but he didn’t have to’”—because Titanic “was at a higher level, playing for $25,000 a nine while we [Tour players] played for $150.” Thomas, or Thompson, was the greatest of golf gamblers; hence the caption of the photo: “Few golf photos exist of Thompson,” it reads, “for obvious reasons.” Being easily identifiable as a great golfer, after all, is of little use to a gambler, so a format designed for gambling would have little incentive to “out” its better players.

To put it simply, then, the game of tennis has the structure it does today because it descends from a different game, a game whose intent was not to identify the best player but to enable the best player to maximize his profits. Where the example of tennis, or of match play golf, should lead is to a hypothesis: any point-driven competition with non-continuous scoring, which is to say one divided into sub-competitions whose results are independent of all the others, and in which some parts of the competition are worth more than others, ought to raise doubt, at the least, about the validity of its results.

The nature of such structures makes it elementary to conceal precisely what the structure is ostensibly designed to reveal: the underlying value of the whole operation, whether that is the athletic ability of an individual or a team, or something else entirely. Where goal difference, the Pythagorean Expectation, and stroke play all consolidate scores in order to get at the true value those scoring events represent, tennis’ method and match play divide scores in order to obscure that value.
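For the consolidating side of that contrast, a small sketch may help. Bill James’ Pythagorean Expectation estimates a team’s winning percentage from just two season-long totals, runs scored and runs allowed. The run totals below are hypothetical, and the classical exponent of 2 is used, so treat this as an illustration of the idea rather than anyone’s official figures.

```python
# Consolidation in miniature: estimate a team's winning percentage from its
# season-long run totals (the Pythagorean Expectation, classical exponent 2).
# The run totals below are made up for illustration.

def pythagorean_expectation(runs_scored: float, runs_allowed: float,
                            exponent: float = 2.0) -> float:
    """Winning percentage implied by consolidated scoring totals."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)

# A team that scores 800 runs and allows 700 over a season projects to win
# roughly 57 percent of its games, regardless of how the runs were distributed.
print(round(pythagorean_expectation(800, 700), 3))  # 0.566
```

The point of the contrast is that every run counts once in the totals: nothing about when the runs were scored can hide a team’s true strength, which is exactly what the divided scoring of tennis and match play permits.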

That’s why match play is so appealing to golf gamblers: it allows the skilled player to hide his talent, and thus maximize his income. Conversely, that’s why the U.S. Open uses stroke play: because the USGA wants to reveal the best player. Some formats lend themselves to one purpose or the other, and that leads to a kind of thought experiment. If the notion advanced here is correct, then there are two ways a given sport may score itself, and two different purposes those means of scoring may serve. If a sport is more like golf’s match play than like golf’s stroke play, in short, it can be predicted that the sport is likely to be vulnerable to gamblers.

As it happens, it’s widely believed that professional tennis has a gambling problem. “Everyone knows,” Andy Murray, last year’s Wimbledon winner, said as far back as October of 2007, “that match-fixing takes place in professional tennis.” A story in the Guardian that year summed up the scandal that had broken over the sport that August, which began when the world’s largest online betting exchange, Betfair, reported “irregular gambling patterns” on a match at the Polish Open between Nikolay Davydenko, once ranked as high as #3 in the world, and Martin Arguello, then ranked #87. At the end of September 2007, Novak Djokovic, this year’s Wimbledon champion, said “he was offered £10,000 to lose in a tournament in St. Petersburg” the previous year. In late October of 2007, after Murray’s comment to the press, “French undercover police” were “invited into the Paris Masters amid suspicions of match-fixing in tennis.” But what Simpson’s Paradox would tell the police, or tennis’ governing bodies, is that looking for fixed matches is exactly what the cunning gambler would want the authorities to do.

“The appeal of tennis to gamblers,” wrote Louisa Thomas for Grantland earlier this year, “makes total sense” for a number of reasons. One is that “tennis is played everywhere, all the time”: there’s likely a tournament, somewhere in the world, any time anyone feels the urge to bet, unlike a lot of other sports. That ubiquity makes tennis vulnerable to crooked gamblers: as Thomas observes, there are “tens of thousands of professional matches, hundreds of thousands of games, millions of points”—a spread of numbers so wide that the volume alone discourages detection by any authority.

Another reason tennis should appeal to gamblers is that “bettors can make wagers during play itself”: you can get online while watching a match and lay down some action. As The Australian reported this year, when a young man was arrested at the Australian Open with an electronic device designed to transmit scores faster than the official tournament feed, there are “websites that allow bets to be laid on individual events such as whether a player faults on serve.” The scam the man at the Australian Open was arrested for is, essentially, the same con depicted in the film The Sting, exploiting a delay in the official results to bet on an outcome already known, which itself tells something of a tale about the sport.

But the real scandal of tennis, though perhaps Thomas does not emphasize this enough, is that it is vulnerable to manipulation simply because it is “broken into discrete points, games, sets, matches, and tournaments.” It’s a point, however, that one of Professor Rodenberg’s students understands.

What Benjamin Wright, a graduate student in Rodenberg’s department at Florida State University, knows is that because of tennis’ scoring system, the sport doesn’t need crooked players throwing matches in order to be corrupt. “Governing bodies must be aware,” Wright writes in his master’s thesis, “Best of N Contests: Implications of Simpson’s Paradox in Tennis,” “that since tennis does not use a running score like other sports intentionally losing points, games, and sets is plausible since such acts may not have long-term implications.” In other words, “a player would not need to lose an entire match intentionally.” All that’s necessary, especially since it’s possible to bet on tennis in real time, is for a player to lose “points during specific periods of a match.” All a gambler needs to know, that is, is that a player will throw, say, the second point of the fourth game of the second set: knowledge that is nearly undetectable, because under the rules of the game it is entirely possible for a player to shave points without risking the loss of the match.
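Wright’s claim is easy enough to check by simulation. The rough Monte Carlo sketch below is mine, not his, and the rules are deliberately simplified (no deuce, no tiebreak, first to six games takes a set, best of three sets), with every point treated as an independent coin flip the better player wins with probability p. Under those assumptions, deliberately losing one pre-agreed point, the second point of the fourth game of the second set, barely moves the better player’s chance of winning the match.

```python
import random

# Simplified tennis: no deuce (first to 4 points wins a game), no tiebreak
# (first to 6 games wins a set), best of three sets. Each point is won by the
# better player independently with probability p. These are my simplifications,
# not the actual rules of the sport.

def play_game(p, throw=False):
    """True if the better player wins the game; optionally give away point #2."""
    a = b = 0
    point_no = 0
    while max(a, b) < 4:
        point_no += 1
        if throw and point_no == 2:   # the pre-agreed point, lost on purpose
            b += 1
        elif random.random() < p:
            a += 1
        else:
            b += 1
    return a > b

def play_set(p, throw_game=None):
    """True if the better player wins the set; one game may be targeted."""
    a = b = 0
    game_no = 0
    while max(a, b) < 6:
        game_no += 1
        if play_game(p, throw=(game_no == throw_game)):
            a += 1
        else:
            b += 1
    return a > b

def play_match(p, fix=False):
    """True if the better player wins; the fix targets set 2, game 4, point 2."""
    a = b = 0
    set_no = 0
    while max(a, b) < 2:
        set_no += 1
        throw_game = 4 if (fix and set_no == 2) else None
        if play_set(p, throw_game):
            a += 1
        else:
            b += 1
    return a > b

def win_rate(p, fix, trials=20000):
    return sum(play_match(p, fix) for _ in range(trials)) / trials

random.seed(1)
print("honest match win rate:", win_rate(0.55, fix=False))
print("one thrown point     :", win_rate(0.55, fix=True))
```

In runs of this sketch with p = 0.55, the honest and the fixed win rates come out within a percentage point or so of each other: a bettor in on the arrangement can profit on the in-play market for that single point, while the match result, the only thing most observers ever check, looks entirely normal.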

“Who’s to say,” says Thomas about the undetectability of corruption, that a player is “not just having a really rotten day?” But what Thomas doesn’t appear to grasp fully is the sharper question: how could a player even be accused of corruption if she has won her match? The real scandal is how even apparently well-trained journalists can miss that point. “Although tennis is perceived as a genteel sport,” wrote Joe Drape of the New York Times about the Davydenko scandal in 2007, “it has always confronted the same problem as other contests based on individual competition like boxing.” That problem, Drape said, is that a “fixer needs to sway only one person, and taking a dive is hard to detect.” Drape is right about what he says, so far as it goes. But he does not explain, likely because he does not understand, why “taking a dive” is so difficult to unmask in tennis: because it is possible to throw a point, or a game, or a set, without affecting the outcome of the match.

Now, this is so obviously crooked that the gall of it is simply breathtaking. Yet the reality is that, aside from a few very naive people who could probably stand to have a few dollars taken from them by shrewd, and likely Russian, mobsters, no one really loses much by this arrangement. There are far worse scams in the world, and people who bet on tennis are probably not very sympathetic victims. But what we now know about tennis, and about match play golf, allows us to evaluate all competitions: any contest with the characteristics isolated here (non-cumulative scoring, with some parts of the contest worth more than others) can produce Simpson’s Paradox results. Further, any contest that produces Simpson’s Paradox results does so by design: there’s no reason to add an extra layer of complexity to a competition unless it is in somebody’s interest. Lastly, since the only reason to add that layer of complexity, and thus produce Simpson’s Paradoxes, is to conceal value, it’s more likely than not that those interests are not entirely legitimate.

Now, it so happens that there is a competition that has those two characteristics and has demonstrably produced at least one paradoxical result: one where the “winner” lost and the “loser” won.

That competition is called an American presidential election.

The Weight We Must Obey

The weight of this sad time we must obey,
Speak what we feel, not what we ought to say.
King Lear, V.iii

There’s a scene in the film Caddyshack that at first glance seems like a mere throwaway one-liner, but that rather neatly sums up what I’m going to call the “Kirby Puckett” problem. Ted Knight’s Judge Smails asks Chevy Chase’s Ty Webb how, if Webb doesn’t keep score (as he claims), he measures himself against other golfers. “By height,” Webb replies. It’s a witty enough reply on its own, of course. But it also (and perhaps there’s a greater humor to be found here) raises a rather profound question: is there a way to know someone is a great athlete aside from their production on the field? Or, to put the point another way, what do bodies tell us?

I call this the “Kirby Puckett” problem because of something Bill James, the noted sabermetrician, once wrote in his New Historical Baseball Abstract: “Kirby Puckett,” James observed, “once said that his fantasy was to have a body like Glenn Braggs’.” Never heard of Glenn Braggs? Well, that’s James’ point: Glenn Braggs looked like a great ballplayer—“slender, fast, very graceful”—but Kirby Puckett was a great ballplayer: a first-ballot Hall of Famer, in fact. Yet despite his own greatness, and surely Kirby Puckett was aware he was, by any measure, a better player than Glenn Braggs, Puckett could not help but wish he looked more like the great player he, in reality, was.

What we can conclude from this is that a) we all (or most of us) have an idea of what athletes look like, and b) that it’s extremely disturbing when that idea is called into question, even when you yourself are a great athlete.

This isn’t a new problem, to be sure. It’s the subject, for instance, of Moneyball, the book (and the movie) about how the Oakland A’s, and particularly their general manager Billy Beane, began to apply statistical analysis to baseball. “Some scouts,” wrote Michael Lewis in that book, about the difference between the A’s old and new ways of doing things, “still believed they could tell by the structure of a young man’s face not only his character but his future in pro ball.” What Moneyball is about is how Beane and his staff learned to ignore what their eyes told them, and judge their players solely on the numbers.

Or, in other words, to predict future production only by past production, instead of by what appearances seemed to promise. Now, fairly obviously, that doesn’t mean coaches and general managers in every sport should ignore their players’ bodies when evaluating their future value. Indisputably, many sports have an ideal body. Jockeys, of course, are small men, whereas football players are large ones. Basketball players are large, too, but in a different way: taller and not as bulky. Runners and bicyclists have yet another shape. Pretty clearly, completely ignoring those factors would quickly lead any talent judge astray.

Still, the variety of successful body types within a given sport may be broader than we imagine, and how much broader probably depends on the sport in question. Golf, for example, might be a sport with a particularly wide range of potentially successful bodies: roughly speaking, golfers of almost every body type have been major champions.

“Bantam” Ben Hogan, for example, greatest of ballstrikers, stood 5’7” and weighed about 135 pounds in his prime; going farther back, Harry Vardon, who invented the grip used almost universally today and won the British Open six times, stood 5’9” and weighed about 155 pounds. But then again, Jack Nicklaus was known as “Fat Jack” when he first came out on tour, a nickname that tells its own story, and long before him Vardon had competed against Ted Ray, who won two majors of his own (the 1912 British and the 1920 U.S. Opens) and was described by his contemporaries as “hefty.” This is not even to bring up, say, John Daly.

The mere existence of John Daly, however, isn’t strong enough to expand our idea of what constitutes an athlete’s body. Golfers like Daly and the rest don’t suggest that the overweight can be surprisingly athletic; instead, they provoke the question of whether golf is a sport at all. “Is Tiger Woods proof that golf is a sport, or is John Daly confirmation to the contrary?” asks a post on Popular Science’s website entitled “Is Golf a Sport?” There’s even a Facebook page entitled “Golf Is Not a Sport.”

Facebook pages like that one confirm just how difficult it is to overcome our idealized notions of what athletes are. It has reached the point that if somebody, no matter how skillful his efforts, doesn’t appear athletic, we are more likely to narrow our definition of athletic acts than to expand our definition of athletic bodies. Thus Kirby Puckett had trouble thinking of himself as an athlete, despite excelling at a sport that virtually anyone would define as one.

Where that conclusion could (and, to some minds, should) lead us is to the notion that a great deal of what we think of as “natural” is, in fact, “cultural”—that favorite thesis of the academic Left in the United States, the American liberal arts professors proclaiming the good news that culture trumps nature. One particular subspecies of the genus is the supposedly expanding (aaannnddd rimshot) field called by its proponents “Fat Studies,” which (according to Elizabeth Kolbert of The New Yorker) holds that “weight is not a dietary issue but a political one.” What these academics think, in other words, is that we are too much the captives of our own ideas of what constitutes a proper body.

In a narrow (or, anti-wide) sense, that is true: even Kirby Puckett was surprised that he, Kirby Puckett, could do Kirby Puckett-like things while looking like Kirby Puckett. To the academics involved in “Fat Studies” his reaction might be a sign of “fatphobia, the fear and hatred of fatness and fat people.” It’s the view of Kirby Puckett, that is, as self-hater; one researcher, it seems, has compared “fat prejudice … to anti-semitism.” In “a social context in which fat hatred is endemic,” this line of thinking might go, even people who achieve great success with the bodies they have can’t imagine that success without the bodies that culture tells them ought to be attached to it.

What this line of work might then lead us to is the conclusion that the physical dimensions of a player matter very little. That would make each athlete’s success largely independent of physical advantage, and would thereby let thousands of coaches everywhere, at least in golf, justify asserting that success is due to the “will to succeed” rather than a random roll of the genetic dice. It might also mean that nations looking to achieve success in golf (in expectation, perhaps, of the next Summer Olympics, where golf will be a medal sport), like the Scandinavian countries whose youth athletics programs groom golfers, or nations like Russia or China with large populations but next to no national golf tradition, should look for young people with particular psychological characteristics rather than particular physical ones.

Yet whereas “Fat Studies” or the like might focus on Kirby Puckett’s self-image, Bill James focuses on Kirby Puckett’s body: the question James asks isn’t whether Puckett played well despite his bad self-image, but rather whether Puckett played well because he actually had a good body for baseball. James asks whether “short, powerful, funny-looking kind of guy[s]” actually have an advantage when it comes to baseball, rather than the advantage conventionally assumed to belong to height, such as the faster bat speed that longer levers supposedly allow. “Long arms,” James speculates, “really do not help you when you’re hitting; short arms work better.” Maybe, in fact, “[c]ompressed power is more effective than diffuse power,” and James goes on to name a dozen or more baseball stars who were all built something like Honus Wagner, who stood 5’11” and weighed 200 pounds. Which, as it happens, was also about the stat line for Jack Nicklaus in his prime.

So too, as it happens, were a number of other golfers. For years the average height of a PGA Tour player was usually said to be 5’9”; these days, thanks to players like Dustin Johnson, that figure is most often put at about 5’11”. Still, as the website Golf Today remarks, “very tall yet successful golfers are a rarity.” I don’t have the ShotLink data—the record of every shot hit on the PGA Tour since 2003—that would be needed to show that players of one size or another hold a natural advantage, though today it probably could be obtained. What’s interesting about even asking the question, however, is that it is a problem that is much better than merely theoretically solvable, which sharply distinguishes it from questions framed around our notions of what constitutes an athletic body, as the scholars of “Fat Studies” might frame them.

Even aside from the narrow issue of allocating athletic resources, however, there’s reason to distrust those scholars. It’s true, to be sure, that Kirby Puckett’s reaction to being Kirby Puckett lends some basis for thinking that a critical view of our notions of the body is salutary, particularly in an age when those notions are, to add to an already-frothy mix of elements, increasingly driven by an advertising industry that, in the guise of actors and models, endlessly seeks out the most attractive bodies.

It would be easier to absorb such warnings, however, were there not evidence that obesity is not holding steady but is, so to say, a growing problem. As Kolbert reports, the federal government’s Centers for Disease Control, which has measured American health for decades, found that whereas in the early 1960s a quarter of Americans were overweight, now more than a third are. When those results were written up in 1994 in the Journal of the American Medical Association, one researcher put it this way, Kolbert reports: “If this was about tuberculosis, it would be called an epidemic.” Over the decade before that report, Americans had collectively gained over a billion pounds.

Even if “the fat … are subject to prejudice and even cruelty,” in other words, that doesn’t mean that being that way doesn’t pose serious health risks both for the individual and for society as a whole. The extra weight carried by Americans, Kolbert for instance observes, “costs the airlines a quarter of a billion dollars’ worth of jet fuel annually,” and this isn’t to speak of the long-term health care costs that attach themselves to the public pocketbook in nearly unimaginable ways. (Kolbert notes that, for example, doors to public buildings are now built to be fifteen, instead of twelve, feet wide.)

“Fat Studies” researchers might claim, in other words, as Kolbert says, that by shattering our expectations of what a body ought to be so thoroughly, fat people (they insist on the term, it seems) can shift from being “revolting … agents of abhorrence and disgust” to “‘revolting’ in a different way … in terms of overthrowing authority, rebelling, protesting, and rejecting.” They might insist that “corpulence carries a whole new weight [sic] as a subversive cultural practice.” In “contrast to the field’s claims about itself,” Kolbert says, however, “fat studies ends up taking some remarkably conservative positions,” in part because it “effectively allies itself with McDonald’s and the rest of the processed-food industry, while opposing the sorts of groups that advocate better school-lunch programs and more public parks.” By taking such an extreme position, in short, “Fat Studies” ends up only strengthening the most reactionary policy tendencies.

As, logically speaking, it must. “To claim that some people are just meant to be fat is not quite the same as arguing that some people are just meant to be poor,” Kolbert observes, “but it comes uncomfortably close.” Similarly, arguing that our image of the successful athletic body is tyrannical can, if not done carefully, end up little different from the fanatical coach’s insistence that determination is the only thing separating his charges from championships. Maybe it’s true that success in golf, and in other sports, is largely a matter of “will”—but if it is, wouldn’t it be better to be able to prove it? And if it isn’t, knowing that would certainly enable a more rational distribution of effort all the way around: from the players themselves (who might seek another sport at an earlier age) to recruiters, and from national sporting agencies to American universities, all of whom would then know what they were looking for. Maybe, in other words, measuring golfers by height isn’t so ridiculous after all.