Home of the Brave

audentes Fortuna iuvat.
The Aeneid. Book X, line 284. 

American prosecutors in the last few decades have—Patrick Keefe recently noted in The New Yorker—come to use more and more “a type of deal, known as a deferred-prosecution agreement, in which the company would acknowledge wrongdoing, pay a fine, and pledge to improve its corporate culture,” rather than prosecuting either the company officers or the company itself for criminal acts. According to prosecutors, it seems, this is because “the problem with convicting a company was that it could have ‘collateral consequences’ that would be borne by employees, shareholders, and other innocent parties.” In other words, taking action against a corporation could put it out of business. Yet, declining to prosecute because of the possible consequences is an odd position for a prosecutor to take: “Normally a grand jury will indict a ham sandwich if a prosecutor asks it to,” former Virginia governor Chuck Robb, once a prosecutor himself, famously remarked. Prosecutors, in other words, aren’t usually known for their sensitivity to circumstance—so why the change in recent decades? The answer may lie, perhaps, in a knowledge of child-raising practices of the ancient European nobility—and the life of Galileo Galilei.

“In those days,” begins one of the stories described by Nicola Clarke in The Muslim Conquest of Iberia: Medieval Arabic Narratives, “the custom existed amongst the Goths that the sons and daughters of the nobles were brought up in the king’s palace.” Clarke is describing the tradition of “fosterage”: the custom, among the medieval aristocracy, of sending one’s children to be raised by another noble family while raising another such family’s children in turn. “It is not clear what … was the motive” for fostering children, according to Laurence Ginnell’s The Brehon Laws (from 1894), “but its practice, whether designed for that end or not, helped materially to strengthen the natural ties of kinship and sympathy which bound the chief and clan or the flaith and sept together.” In Ginnell’s telling, “a stronger affection oftentimes sprang up between persons standing in those relations than that between immediate relatives by birth.” One of the purposes of fostering, in other words, was to decrease the risk of conflict by ensuring that members of the ruling classes grew up together: it’s a lot harder to go to war, the thinking apparently went, when you are thinking of your potential opponent as the kid who skinned his knee that one time, instead of the fearsome leader of a gang of killers.

Perhaps one explanation for why prosecutors appear to be willing to go easier on corporate criminals these days than in the past is that they share “natural ties”: they attended the same schools as those they are authorized to prosecute. Although statistics on the matter appear lacking, there’s reason to think that future white collar criminals and their (potential) prosecutors share the same “old school” ties more and more these days: there’s reason to think, in other words, that just as American law schools have seized a monopoly on the production of lawyers—Robert H. Jackson, who served from 1941 to 1954, was the last American Supreme Court Justice without a law degree—so too have America’s “selective” colleges seized a monopoly on the production of CEOs. “Just over 10% of the highest paid CEOs in America came from the Ivy League plus MIT and Stanford,” a Forbes article noted in 2012—a percentage higher than at any previous moment in American history. In other words, just as lawyers all come from the same schools these days, so too does upper management—producing the sorts of “natural ties” that not only lead to rethinking that cattle raid on your neighbor’s castle, but perhaps also any thoughts of subjecting Jamie Dimon to a “perp walk.” Yet as plausible an explanation as that might seem, it’s even more satisfying when it is combined with an incident in the life of the great astronomer.

In 1621, a Catholic priest named Scipio Chiaramonti published a book about a supernova that had occurred in 1572; the exploded star (as we now know it to have been) had been visible during daylight for several weeks in that year. The question for astronomers in that pre-Copernican time was whether the star had been one of the “fixed stars,” and thus existed beyond the moon, or whether it was closer to the earth than the moon: since—as James Franklin, from whose The Science of Conjecture: Evidence and Probability Before Pascal I take this account, notes—it was “the doctrine of the Aristotelians that there could be no change beyond the sphere of the moon,” a nova that far away would refute their theory. Chiaramonti’s book claimed that the measurements of 12 astronomers showed that the object was not as far as the moon—but Galileo pointed out that Chiaramonti’s work had, in effect, “cherrypicked”: he did not use all the data actually available, but merely used that which supported his thesis. Galileo’s argument, oddly enough, can also be applied to why American prosecutors aren’t pursuing financial crimes.

The point is supplied, Keefe tells us, by James Comey: the recent head of the FBI fired by President Trump. Before moving to Washington, Comey was U.S. Attorney for the Southern District of New York, in which position he once called—Keefe informs us—some of the attorneys working for the Justice Department members of “the Chickenshit Club.” Comey’s point was that while a “perfect record of convictions and guilty pleas might signal simply that you’re a crackerjack attorney,” it might instead “mean that you’re taking only those cases you’re sure you’ll win.” To Comey’s mind, the marvelous winning records of those working under him were not a guarantee of the ability of those attorneys, but instead a sign that his office was not pursuing enough cases. In other words, just as Chiaramonti chose only those data points that confirmed his thesis, the attorneys in Comey’s office were choosing only those cases they were sure they would win.

Yet, assuming that the decrease in financial prosecution is due to prosecutorial choice, why are prosecutors more likely, when it comes to financial crimes, to “cherrypick” today than they were a few decades ago? Keefe says this may be because “people who go to law school are risk-averse types”—but that begs the question of why today’s lawyers are more risk-averse than their predecessors. The answer, at least according to a former Yale professor, may be that they are more likely to cherrypick because they are the product of cherrypicking.

Such at least was the answer William Deresiewicz arrived at in 2014’s “Don’t Send Your Kid to the Ivy League”—the most downloaded article in the history of The New Republic. “Our system of elite education manufactures young people who are smart and talented and driven, yes,” Deresiewicz wrote there—but, he wrote, it also produces students that are “anxious, timid, and lost.” Such students, the Yale faculty member wrote, had “little intellectual curiosity and a stunted sense of purpose”; they are “great at what they’re doing but [have] no idea why they’re doing it.” The question Deresiewicz wanted answered was, of course, why the students he saw in New Haven were this way; the answer he hit upon was that the students he saw were themselves the product of a cherrypicking process.

“So extreme are the admissions standards now,” Deresiewicz wrote in “Don’t,” “that kids who manage to get into elite colleges have, by definition, never experienced anything but success.” The “result,” he concluded, “is a violent aversion to risk.” Deresiewicz, in other words, is thinking systematically: it isn’t so much that prosecutors and white collar criminals share the same background that has made prosecutions so much less likely, but instead the fact that prosecutors have experienced a certain kind of winnowing process in the course of achieving their positions in life.

To most people, in other words, scarcity equals value: Harvard admits very few people, therefore Harvard must provide an excellent education. But what the Chiaramonti episode brings to light is the notion that what makes Harvard so great may not be that it provides an excellent education, but instead that it admits such “excellent” people in the first place: Harvard’s notably long list of excellent alumni may not be a result of what’s happening in the classroom, but instead in the admissions office. The usual understanding of education, in other words, takes the significant action of education to be what happens inside the school—but what Galileo’s statistical perspective says, instead, is that the important play may be what happens before the students even arrive.
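The selection effect described here is easy to sketch. In the toy model below, every number is an assumption made for illustration: ten thousand applicants have an “ability” score drawn from a bell curve, a school admits roughly the top five percent, and the school itself adds nothing at all. The alumni still come out well ahead of the average applicant.

```python
import random
import statistics

# Hypothetical applicant pool: scores drawn from a bell curve (mean 100, spread 15).
rng = random.Random(0)
applicants = [rng.gauss(100, 15) for _ in range(10_000)]

# Admit the 500 highest scorers, i.e. the top five percent.
cutoff = sorted(applicants)[-500]
admitted = [score for score in applicants if score >= cutoff]

# The school adds no value in this model: outcome equals ability on arrival.
print(f"average applicant: {statistics.mean(applicants):.1f}")
print(f"average alumnus:   {statistics.mean(admitted):.1f}")
```

Nothing happened in the imaginary classroom, yet the alumni list looks distinguished; all of the work was done at the admissions office.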

What Deresiewicz’ work suggests, in turn, is that this very process may itself have unseen effects: efforts to make Harvard (along with other schools) more “exclusive”—and thus, ostensibly, provide a better education—may actually be making students worse off than they might otherwise be. Furthermore, Keefe’s work intimates that this insidious effect might not be limited to education; it may be causing invisible ripples throughout American society—ripples that may not be limited to the criminal justice system. If the same effects Keefe says are affecting lawyers are also affecting the future CEOs the prosecutors are not prosecuting, then perhaps CEOs are becoming less likely to pursue the legitimate risks that are the economic lifeblood of the nation—and perhaps more susceptible to pursuing illegitimate risks, of the sort that once landed CEOs in non-pinstriped suits. Accordingly, perhaps that old conservative bumper sticker really does have something to teach American academics—it’s just that what both sides ought perhaps to realize is that this relationship may be, at bottom, a mathematical one. That relation, you ask?

The “land of the free” because of “the brave.”

Stormy Weather

They can see no reasons …
—“I Don’t Like Mondays” 
The Boomtown Rats.
The Fine Art of Surfacing. 1979.


“Since Tuesday night,” John Cassidy wrote in The New Yorker this week, “there has been a lot of handwringing about how the media, with all its fancy analytics, failed to foresee Donald Trump’s victory”: as the New York Times headline had it, “How Data Failed Us in Calling an Election.” The failure of Nate Silver and other statistical analysts in the lead-up to Election Day rehearses, once again, a seemingly ancient argument between what are now known as the sciences and the humanities—an argument sometimes held to be as old as the moment when Herodotus (the “Father of History”) asserted that his object in telling the story of the Greco-Persian Wars of 2500 years ago was “to set forth the reasons why [the Greeks and Persians] wage war on each other.” In other words, Herodotus thought that, to investigate war, it was necessary to understand the motives of the people who fought it—just as Cassidy says the failure of the press to get it right about this election was “a failure of analysis, rather than of observation.” The argument both Herodotus and Cassidy are making is the seemingly unanswerable one that it is the interpretation of the evidence, rather than the evidence itself, that is significant—a position that seems inarguable so long as you aren’t in the Prussian Army, dodging Nazi bombs during the last year of the Second World War, or living in Malibu.

The reason why it seems inarguable, some might say, is because the argument both Herodotus and Cassidy are making is inescapable: obviously, given Herodotus’ participation, it is a very ancient one, and yet new versions are produced all the time. Consider for instance a debate conducted by English literature professor Michael Bérubé and philosopher John Searle some years ago, about a distinction between what Searle called “brute fact” and “social fact.” “Brute facts,” Bérubé wrote later, are “phenomena like Neptune, DNA, and the cosmic background radiation,” while the second kind are “items whose existence and meaning are obviously dependent entirely on human interpretation,” such as “touchdowns and twenty-dollar bills.” Like Searle, most people might like to say that “brute fact” is clearly more significant than “social fact,” in the sense that Neptune doesn’t care what we think about it, whereas touchdowns and twenty dollar bills are just as surely entirely dependent on what we think of them.

Not so fast, said Bérubé: “there’s a compelling sense,” the professor of literature argued, in which social facts are “prior to and even constitutive of” brute facts—if social facts are the means by which we obtain our knowledge of the outside world, then social facts could “be philosophically prior to and certainly more immediately available to us humans than the world of brute fact.” The only way we know about Neptune is because a number of human beings thought it was important enough to discover; Neptune doesn’t give a damn one way or the other.

“Is the distinction between social facts and brute facts,” Bérubé therefore asks, “a social fact or a brute fact?” (Boom! Mic drop.) That is, whatever the brute facts are, we can only interpret them in the light of social facts—which would seem to grant priority to those disciplines dealing with social facts, rather than those disciplines that deal with brute fact; Hillary Clinton, Bérubé might say, would have been better off hiring an English professor, rather than a statistician, to forecast the election. Yet, despite the smugness with which Bérubé delivers what he believes is a coup de grâce, it does not seem to occur to him that traffic between the two realms can also go the other way: while it may be possible to see how “social facts” subtly influence our ability to see “brute facts,” it’s also possible to see how “brute facts” subtly influence our ability to see “social facts.” It’s merely necessary to understand how the nineteenth-century Prussian Army treated its horses.

The book that treats that question about German military horsemanship is called The Law of Small Numbers, which was published in 1898 by one Ladislaus Bortkiewicz: a Pole who lived in the Russian Empire and yet conducted a study on data about the incidence of deaths caused by horse kicks in the nineteenth-century Prussian Army. Apparently, this was a cause of some concern to military leaders: they wanted to know whether, say, an army corps that experienced several horse kick deaths in a year—an exceptional number of deaths from this category—was using bad techniques, or whether it happened to buy particularly ornery horses. Why, in short, did some corps have what looked like an epidemic of horse kick deaths in a given year, while others might go for years without a single death? What Bortkiewicz found answered those questions—though perhaps not in a fashion the army brass might have liked.

Bortkiewicz began by assembling data about the number of fatal horse kicks in fourteen Prussian army corps over twenty years, which he then combined into “corps-years”: each corps in each year counted as one observation, 280 in all. What he found—as E.J. Gumbel puts it in the International Encyclopedia of the Social Sciences—was that for “over half the corps-year combinations there were no deaths from horse kicks,” while “for the other combinations the number of deaths ranged up to four.” In most years, in other words, no one was killed in any given corps by a horse kick, while in some years someone was—and in terrible years four were. Deaths by horse kick, then, were uncommon, which meant they were hard to study: given that they happened so rarely, it was difficult to determine what caused them—which was why Bortkiewicz had to assemble so much data about them. By doing so, the Russian Pole hoped to be able to isolate a common factor among these deaths.

In the course of studying these deaths, Bortkiewicz ended up independently re-discovering something that a French mathematician, Siméon Denis Poisson, had already, in 1837, used in connection with discussing the verdicts of juries: an arrangement of data now known as the Poisson distribution. And as the mathematics department at the University of Massachusetts is happy to tell us (https://www.umass.edu/wsp/resources/poisson/), the Poisson distribution applies when four conditions are met: first, “the event is something that can be counted in whole numbers”; second, “occurrences are independent, so that one occurrence neither diminishes nor increases the chance of another”; third, “the average frequency of occurrence for the time period in question is known”; and finally “it is possible to count how many events have occurred.” If these things are known, it seems, the Poisson distribution will tell you how often the event in question will happen in the future—a pretty useful feature for, say, predicting the results of an election. But that wasn’t what was intriguing about Bortkiewicz’ study: what made it important enough to outlast the government that commissioned it was that Bortkiewicz found that the Poisson distribution “may be used in reverse”—a discovery that ended up telling us about far more than the care of Prussian horses.
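For readers who want to see the formula at work, here is a minimal sketch in Python. Poisson’s formula says that if an event happens on average λ times per period, the chance of seeing exactly k of them in a period is λ^k e^(-λ) / k!. The rate of 0.7 deaths per corps-year used below is an assumption chosen for illustration, not a figure taken from the sources quoted above, and the function name is likewise hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Chance of exactly k events when the average per period is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

RATE = 0.7  # assumed average deaths per corps-year (illustrative only)

for k in range(5):
    print(f"{k} deaths in a corps-year: {poisson_pmf(k, RATE):.3f}")
```

With a low average rate, most of the probability sits on zero and one, while larger counts are rare without being impossible.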

What “Bortkiewicz realized,” as Aatish Bhatia of Wired wrote some years ago, was “that he could use Poisson’s formula to work out how many deaths you could expect to see” if the deaths from horse kicks in the Prussian army were random. The key to the Poisson distribution, in other words, is the second component, “occurrences are independent, so that one occurrence neither diminishes nor increases the chance of another”: a Poisson distribution describes processes that are like the flip of a coin. As everyone knows, each flip of a coin is independent of the one that came before; hence, the record of successive flips is the record of a random process—a process that will leave its mark, Bortkiewicz understood.

A Poisson distribution maps a random process; therefore, if the process in question maps a Poisson distribution, then it must be a random process. A distribution that matches the results a Poisson distribution would predict must also be a process in which each occurrence is independent of those that came before. As the UMass mathematicians say, “if the data are lumpy, we look for what might be causing the lump,” while conversely, if “the data fit the Poisson expectation closely, then there is no strong reason to believe that something other than random occurrence is at work.” Anything that follows a Poisson distribution is likely the result of a random process; hence, what Bortkiewicz had discovered was a tool to find randomness.
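The “in reverse” use can be sketched in a few lines of Python. The example below does not use Bortkiewicz’ actual figures; it simulates 280 imaginary corps-years in which every death is independent by construction (the rate, the number of chances per year, and the random seed are all assumptions), then sets the tally beside what Poisson’s formula expects. A close match is what an independent process tends to produce; as the UMass quotation puts it, such a match gives no strong reason to suspect anything else, which is a weaker claim than proof of randomness.

```python
import math
import random
from collections import Counter

CORPS_YEARS = 280        # 14 corps x 20 years, as in the text
CHANCES_PER_YEAR = 1000  # assumed: many small, independent chances of a fatal kick
RATE = 0.7               # assumed average deaths per corps-year (illustrative only)

def poisson_pmf(k, lam):
    """Chance of exactly k events when the average per period is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

def simulate_corps_year(rng):
    """Count deaths in one imaginary corps-year of independent small risks."""
    p = RATE / CHANCES_PER_YEAR
    return sum(1 for _ in range(CHANCES_PER_YEAR) if rng.random() < p)

rng = random.Random(1898)  # fixed seed so the run is repeatable
observed = Counter(simulate_corps_year(rng) for _ in range(CORPS_YEARS))
expected = {k: CORPS_YEARS * poisson_pmf(k, RATE) for k in range(10)}

for k in range(6):
    print(f"{k} deaths: simulated {observed[k]:3d}, Poisson expects {expected[k]:6.1f}")
```

A real analysis would replace the simulated tally with the recorded one and ask whether the gaps between the two columns are larger than chance alone would produce.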

Take, for example, the case of German V-2 rocket attacks on London during the last years of World War II—the background, as it happens, to novelist Thomas Pynchon’s Gravity’s Rainbow. As Pynchon’s book relates, the flying missiles were falling in a pattern: some parts of London were hit multiple times, while others were spared. Some Londoners argued that this “clustering” demonstrated that the Germans must have discovered a way to guide these missiles—something that would have been highly, highly advanced for mid-twentieth century technology. (Even today, guided missiles are incredibly advanced: much less than ten percent of all the bombs dropped during the 1991 Gulf War, for instance, had “smart bomb” technology.) So what British scientist R. D. Clarke did was to look at the data for all the targets of V-2s that fell on London. What he found was that the results matched a Poisson distribution—the Germans did not possess super-advanced guidance systems. They were just lucky.
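The same point can be made with a toy map. In the Python sketch below, hits land on a grid of squares purely at random; the grid size, the number of hits, and the seed are hypothetical and are not Clarke’s data. Even with no aiming at all, some squares collect several hits while many escape entirely, roughly in the proportions Poisson’s formula expects.

```python
import math
import random
from collections import Counter

GRID_SIDE = 24  # hypothetical: a 24 x 24 map, 576 squares
HITS = 500      # hypothetical number of impacts

rng = random.Random(1944)  # fixed seed so the run is repeatable
squares = GRID_SIDE * GRID_SIDE

# Each hit picks a square uniformly at random, with no guidance whatsoever.
hits_per_square = Counter(rng.randrange(squares) for _ in range(HITS))

# Tally how many squares took 0, 1, 2, ... hits.
tally = Counter(hits_per_square.values())
tally[0] = squares - len(hits_per_square)

rate = HITS / squares  # average hits per square
for k in range(5):
    expected = squares * rate ** k * math.exp(-rate) / math.factorial(k)
    print(f"squares with {k} hits: simulated {tally[k]:3d}, Poisson expects {expected:6.1f}")
```

To an observer on the ground, the squares hit three or four times would look like targets, although nothing in the simulation chose them.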

Daniel Kahneman, the Israeli psychologist, has a similar story: “‘During the Yom Kippur War, in 1973,’” Kahneman told New Yorker writer Atul Gawande, he was approached by the Israeli Air Force to investigate why, of two squads that took to the skies during the war, “‘one had lost four planes and the other had lost none.’” Kahneman told them not to waste their time, because a “difference of four lost planes could easily have occurred by chance.” Without knowing about Bortkiewicz, that is, the Israeli Air Force “would inevitably find some measurable differences between the squadrons and feel compelled to act on them”—differences that, in reality, mattered not at all. Presumably, Israel’s opponents were bound to hit some of Israel’s warplanes; it just so happened that they were clustered in one squadron and not the other.
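Kahneman’s point can be checked with a line of arithmetic. Suppose, as a simplifying assumption not stated in the original account, that the two squadrons were equally exposed and that each of the four lost planes was independently as likely to come from one as from the other. The Python sketch below computes how often all the losses would then fall on a single squadron.

```python
def chance_all_in_one_squadron(losses):
    """Chance that every loss lands on the same one of two equally exposed squadrons."""
    # Either squadron can be the unlucky one, hence the factor of two.
    return 2 * 0.5 ** losses

print(chance_all_in_one_squadron(4))  # prints 0.125
```

Under those assumptions a four-to-nothing split turns up one time in eight, which is hardly rare enough to demand an explanation.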

Why, though, should any of this matter in terms of the distinction between “brute” and “social” facts? Well, consider what Herodotus wrote more than two millennia ago: what matters, when studying war, is the reasons people had for fighting. After all, wars are some of the best examples of a “social fact” anywhere: wars only exist, Herodotus is claiming, because of what people think about them. But what if it could be shown that, actually, there’s a good case to be made for thinking of war as a “brute fact”—something more like DNA or Neptune than like money or a home run? As it happens, at least one person, following in Bortkiewicz’ footsteps, already has.

In November of 1941, the British meteorologist and statistician Lewis Fry Richardson published, in the journal Nature, a curious article entitled “Frequency of Occurrence of Wars and Other Quarrels.” Richardson, it seems, had had enough of the endless theorizing about war’s causes: whether it be due to, say, simple material greed, or religion, or differences between various cultures or races. (Take for instance the American Civil War: according to some Southerners, the war could be ascribed to the racial differences between Southern “Celtics” versus Northern “Anglo-Saxons”; according to William Seward, Abraham Lincoln’s Secretary of State, the war was due to the differences in economic systems between the two regions—while to Lincoln himself, perhaps characteristically, it was all due to slavery.) Rather than argue with the historians, Richardson decided to instead gather data: he compiled a list of real wars going back centuries, then attempted to analyze the data he had collected.

What Richardson found was, to say the least, highly damaging to Herodotus: as Brian Hayes puts it in a recent article in American Scientist about Richardson’s work, when Richardson compared a group of wars with similar amounts of casualties to a Poisson distribution, he found that the “match is very close.” The British scientist also “performed a similar analysis of the dates on which wars ended—the ‘outbreaks of peace’—with the same result.” Finally, he checked another data set concerning wars, this one compiled by the University of Chicago’s Quincy Wright—“and again found good agreement.” “Thus,” Hayes writes, “the data offer no reason to believe that wars are anything other than randomly distributed accidents.” Although Herodotus argued that the only way to study wars is to study the motivations of those who fought them, there may in reality be no more “reason” for the existence of war than the existence of a forest fire in Southern California.

Herodotus, to be sure, could not have seen that: the mathematics of his time were nowhere near sophisticated enough to run a Poisson distribution. Therefore, the Greek historian was eminently justified in thinking that wars must have “reasons”: he literally did not have the conceptual tools necessary to think that wars may not have reasons at all. That was an unavailable option. But through the work of Bortkiewicz and his successors, that has now become an option: indeed, the innovation of these statisticians has been to show that our default assumption ought to be what statisticians call the “null hypothesis,” which is defined by the Cambridge Dictionary of Statistics to be “the ‘no difference’ or ‘no association’ hypothesis.” Unlike Herodotus, who presumed that explanations must equal causes, we now assume that we ought to be first sure that there is anything to explain before trying to explain it.

In this case, then, it may be the “brute fact” of the press’ Herodotian commitment to discovering “reasons” that explains why nobody in the public sphere predicted Donald Trump’s victory: because the press is already committed to the supremacy of analysis over observation, it could not perform the observations necessary to think Trump could win. Or, as Cassidy put it, when a reporter saw the statistical election model of choice “registering the chances of the election going a certain way at ninety per cent, or ninety-five per cent, it’s easy to dismiss the other outcome as a live possibility—particularly if you haven’t been schooled in how to think in probabilistic terms, which many people haven’t.” Just how powerful the assumption of the force of analysis over data can be is demonstrated by the fact that—even despite noting the widespread lack of probabilistic thinking—Cassidy still thinks it possible that “F.B.I. Director James Comey’s intervention ten days before the election,” in which Comey announced his staff was still investigating Clinton’s emails, “may have proved decisive.” In other words, despite knowing something about the impact of probability, Cassidy still thinks it possible that a letter from the F.B.I. director was somehow more important to the outcome of this past election than the evidence of their own lives was to millions of Americans—or, say, the effect of a system in which the answer to the question where outweighs that of how many?

Probabilistic reasoning, of course, was unavailable to Herodotus, who lived two millennia before the mathematical tools necessary were even invented—which is to say that, while some like to claim that the war between interpretation and data is eternal, it might not be. Yet John Cassidy—and Michael Bérubé—don’t live before those tools were invented, and yet they persist in writing as if they do. While that’s fine, so far as it is their choice as private citizens, it ought to be quite a different thing insofar as it is their jobs as journalist and teacher, respectively—particularly in the case, as say in the 2016 election, when it is of importance to the continued health of the nation as a whole that there be a clear public understanding of events. Some people appear to think that continuing the quarrels of people whose habits of mind, today, would barely qualify them to teach Sunday school is something noble; in reality, it may just be a measure of how far we have yet to travel.