We may be confident that the Great American Poem will not be written, no matter what genius attempts it, until democracy, the idea of our day and nation and race, has agonized and conquered through centuries, and made its work secure.
But the Great American Novel—the picture of the ordinary emotions and manners of American existence … will, we suppose, be possible earlier.
—John William De Forest. “The Great American Novel.” The Nation, 9 January 1868.

Things refuse to be mismanaged long.
—Theodore Parker. “Of Justice and the Conscience.” 1853.
“It was,” begins Chapter Seven of The Great Gatsby, “when curiosity about Gatsby was at its highest that the lights in his house failed to go on one Saturday night—and, as obscurely as it began, his career as Trimalchio was over.” Trimalchio is a character in the ancient Roman novel The Satyricon who, like Gatsby, throws enormous and extravagant parties; there is a lot that could be said in comparing the two novels, and some of it has been said by scholars. The trouble with any such comparison, however, is that, unlike Gatsby, The Satyricon is “unfinished”: we today have only the 141 not-always-contiguous chapters collated by 17th-century editors from two medieval manuscript copies, which are clearly not the entire book. Hence, comparing The Satyricon to Gatsby, or to any other novel, is always handicapped by the fact that, as the Wikipedia page continues, “its true length cannot be known.” Yet is it really impossible to estimate a message’s total length from only a part of the whole? Contrary to the collective wisdom of classical scholars and Wikipedia contributors, it isn’t—which we know thanks to techniques developed at the behest of a megalomaniacal Trimalchio convinced that Shakespeare was not Shakespeare: work that eventually became the foundation of the National Security Agency.
Before getting to the history of those techniques, however, it might be best to describe first what they are. Essentially, the problem of figuring out the actual length of The Satyricon is a problem of sampling: that is, of estimating whether you have, like Christopher Columbus, run up on an island—or, like John Cabot, smacked into a continent. In biology, for instance, a researcher might count the number of organisms in a small plot, then extrapolate to the entire area. Another biological technique is to capture and tag some animals in an area, then capture a second sample from the same area some time later—the fraction of that second sample already carrying tags provides a ratio useful for estimating the true size of the population. (The fewer tagged animals recaptured, the larger the total population must be.) The same logic governs forecasting the final record of a baseball team from its start, which is what the baseball writer Bill James did earlier this year on his website (in “Red Hot Start,” from 16 April): estimating the “true underlying win percentage” of the Boston Red Sox given that the team’s record in its first fifteen games was 13-2. The way that James did it is, perhaps, instructive about possible methods for determining the length of The Satyricon.
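The mark-recapture ratio just described is usually formalized as the Lincoln-Petersen estimator; here is a minimal sketch (the fish-survey numbers are invented for illustration):

```python
def lincoln_petersen(marked_first, caught_second, recaptured):
    """Estimate total population size from a mark-recapture survey.

    If `recaptured` of the `caught_second` animals in the second sample
    carry tags, the tagged fraction of that sample estimates the tagged
    fraction of the whole population:
        recaptured / caught_second ~ marked_first / N
    so N ~ marked_first * caught_second / recaptured.
    """
    if recaptured == 0:
        raise ValueError("no recaptures: population too large to estimate")
    return marked_first * caught_second / recaptured

# Tag 100 fish, later net 100 more and find 20 already tagged:
print(lincoln_petersen(100, 100, 20))  # 500.0
# Fewer recaptures implies a larger population:
print(lincoln_petersen(100, 100, 10))  # 1000.0
```

Note how the estimator encodes the parenthetical above: halving the number of recaptures doubles the estimated population.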
James begins by noting that because the “probability that a .500 team would go 13-2 or better in a stretch of 15 games is … one in 312,” while the “probability that a .600 team would go 13-2 in a stretch of 15 games is … one in 46,” it is therefore “much more likely that they are a .600 team than that they are a .500 team”—though with the caveat that, because “there are many more .500 teams than .600 teams,” this is not “EXACTLY true” (emp. James). Next, James computes a standard statistical measure, the standard deviation: the amount by which actual team records spread around the .500 mark of 81-81. James finds this number for teams in the years 2000-2015 to be .070, a low figure, meaning that most team records in that era bunched closely around .500. (By comparison, the historical standard deviation for “all [major league] teams in baseball history” is .102, meaning that there used to be a wider spread between first-place teams and last-place teams than there is now.) Finally, James arranges the possible records of baseball teams according to what mathematicians call the “Gaussian,” or “normal,” distribution: that is, how team records would look were they to follow the “bell-shaped” curve familiar from introductory statistics courses, in which most teams have .500 records and very few teams have either 100 wins—or 100 losses.
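James does not show his arithmetic, but his two figures are reproduced by the ordinary binomial formula for the probability of an exact 13-2 record (a sketch of the calculation, not James’s own code):

```python
from math import comb

def p_exact_record(wins, games, true_pct):
    """Binomial probability that a team whose true winning percentage
    is `true_pct` goes exactly `wins`-(games - wins) over `games` games."""
    losses = games - wins
    return comb(games, wins) * true_pct**wins * (1 - true_pct)**losses

# James's two figures: a 13-2 stretch for a .500 team vs. a .600 team.
print(round(1 / p_exact_record(13, 15, 0.5)))  # 312 -> "one in 312"
print(round(1 / p_exact_record(13, 15, 0.6)))  # 46  -> "one in 46"
```

(Both quoted odds match the probability of going exactly 13-2 over fifteen games, which is presumably what James computed.)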
If the records of actual baseball teams follow such a distribution, James finds, then “in a population of 1,000 teams with a standard deviation of .070” there should be 2 teams above .700, 4 teams with percentages from .675 to .700, 10 teams from .650 to .675, 21 teams from .625 to .650, and so on, down to 141 teams from .500 to .525. (These numbers are mirrored, in turn, by teams with losing records.) Obviously, teams with better final records have better chances of starting 13-2—but at the same time, there are far fewer teams with final records of .700 than there are teams finishing at .600. As James writes, it is “much more likely that a 13-2 team is actually a .650 to .675 team than that they are actually a .675 to .700 team—just because there are so many more teams” (i.e., 10 teams as compared to 4). So the chance of each level of the distribution producing a 13-2 team actually grows as we approach .500—until, James says, we reach winning percentages of .550 to .575, where the number of teams is finally outweighed by the quality of those teams. Whereas the 66 teams in a thousand expected to have winning percentages of .575 to .600 should produce slightly more than one 13-2 start (1.171341, to be precise), the 97 teams from .550 to .575 should produce only 1.100297. Doing a bit more mathematics, which I won’t bore you with, James eventually concludes that the 2018 Boston Red Sox will most likely finish the season with a .585 winning percentage—between a 95-67 season and a 94-68 season.
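James’s table of winning-percentage bands, and the number of 13-2 starts each band should produce, can be approximated directly from the normal distribution he describes. A sketch (using each band’s midpoint as a rough stand-in for the varying quality within the band, so the figures land near, not exactly on, James’s 66, 97, 1.171341, and 1.100297):

```python
from math import comb
from statistics import NormalDist

teams = 1000
league = NormalDist(mu=0.500, sigma=0.070)  # James's 2000-2015 spread

def p_13_2(pct):
    """Binomial probability of an exact 13-2 start at true percentage pct."""
    return comb(15, 13) * pct**13 * (1 - pct)**2

# For each band of "true" winning percentage: how many of 1,000 teams
# fall in it, and how many 13-2 starts should the band produce?
bands = [(0.675, 0.700), (0.650, 0.675), (0.625, 0.650),
         (0.600, 0.625), (0.575, 0.600), (0.550, 0.575),
         (0.525, 0.550), (0.500, 0.525)]
for lo, hi in bands:
    count = teams * (league.cdf(hi) - league.cdf(lo))
    expected = count * p_13_2((lo + hi) / 2)
    print(f"{lo:.3f}-{hi:.3f}: {count:5.1f} teams, "
          f"{expected:.3f} expected 13-2 starts")
```

Running this shows the crossover James describes: the .575–.600 band out-produces the more numerous .550–.575 band, because quality finally outweighs quantity.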
What, however, does all of this have to do with The Satyricon, much less with the National Security Agency? In the specific case of the Roman novel, James provides a model for how to go about estimating the total length of the now-lost complete work: a model that begins by figuring out what league Petronius is playing in, so to speak. In other words, we would have to know something about the distribution of the lengths of fictional works: do they converge strongly on some average length—i.e., have a low standard deviation—the way that baseball teams converge around 81-81? Or do they wander far afield, so that the standard deviation is high? The author(s) of the Wikipedia article appear to believe that even this much is impossible, or nearly so; as the Stanford literary scholar Franco Moretti admits, though he works “on West European narrative between 1790 and 1930,” he “already feel[s] like a charlatan” because he works only “on its canonical fraction, which is not even one percent of published literature.” There are, Moretti observes, “thirty thousand nineteenth-century British novels out there”—or are there forty, or fifty, or sixty thousand? “[N]o one really knows,” he concludes—and that is before counting the “French novels, Chinese, Argentinian, [or] American” ones. But to compare The Satyricon to all novels would be to accept a high standard deviation—and hence a fairly wide range of possible lengths.
Alternately, The Satyricon could be compared only to its ancient comrades and competitors: the five ancient Greek novels that survive complete from antiquity, for example, along with the only Roman novel to survive complete—Apuleius’ The Metamorphoses. Were The Satyricon compared only to ancient novels (and of those, only the complete ones), the standard deviation would likely be lower, meaning that the lengths would cluster more tightly around the mean. That would imply a tighter range of possible lengths—at the risk of a greater error in the estimate, since six ancient novels might happen to differ in length from The Satyricon far more than a broad sample of all novels ever written would. The choice of comparison set (all novels, or ancient novels) is thereby a choice between a higher chance of being accurate and a higher chance of being precise. Either way, Wikipedia’s claim that the length “cannot be known” holds only if the words “with absolute certainty” are added. The best guess we can make can either be nearly certain to contain the true length within its range, or be nearly certain—if it is accurate at all—to sit very close to the true length. Which is to say that it is entirely possible we could know the true length of The Satyricon, even if we could not be certain that we knew it.
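The trade-off between the two comparison sets can be made concrete with a toy calculation. The word counts below are placeholders invented for illustration—they are not real philological data for the six complete ancient novels—but the mechanics are the point: a small, tight comparison set yields a narrow interval, and widening the set widens it.

```python
from statistics import mean, stdev

# HYPOTHETICAL word counts standing in for the six complete ancient
# novels -- invented for illustration, not real measurements.
lengths = [31000, 42000, 55000, 38000, 47000, 60000]

m, s = mean(lengths), stdev(lengths)
low, high = m - 2 * s, m + 2 * s  # a rough two-standard-deviation band
print(f"estimated length: {m:.0f} words, "
      f"plausible range {low:.0f}-{high:.0f}")
```

Swapping in a larger, more heterogeneous set of lengths would raise `s` and stretch the band: more likely to contain the truth, less useful as a pinpoint.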
That answers the question of how we could know the length of The Satyricon—but when I began this story I promised that I would (eventually) relate it to the foundations of the National Security Agency. Those, I mentioned, began with an eccentric millionaire convinced that William Shakespeare did not write the plays that now bear his name. The millionaire’s name was George Fabyan; in the early 20th century he brought together a number of researchers in the new field of cryptography in order to “prove” his pet theory that Francis Bacon was the true author of the Bard’s work. Bacon was known as the inventor of the cipher that bears his name, and Fabyan accordingly subscribed to the proposition that Bacon had concealed the fact of his authorship by means of coded messages hidden within the plays themselves. The first professional American codebreakers thereby found themselves employed on Fabyan’s 350-acre estate (“Riverbank”) on the Fox River just south of Geneva, Illinois—which is still there today, and where American military minds found them upon the American entry into World War One in 1917.
Specifically, they found Elizebeth Smith and William Friedman (who would later marry). During the war the couple helped to train several federal employees in the art of codebreaking. By 1921 they had been hired away by the War Department, and Elizebeth went on to spend the 1920s in the service of the Coast Guard, breaking the codes of gangsters smuggling liquor into the dry United States. During World War Two, Elizebeth would be employed in breaking one of the Enigma codes used by the German Navy; meanwhile, her husband William had founded the Army’s Signal Intelligence Service—the outfit that broke the Japanese “Purple” diplomatic cipher, and the direct predecessor of the National Security Agency. William had also written the scientific papers that underlay their work; he had, in fact, coined the word cryptanalysis itself.
Central to Friedman’s work was something now called the “Friedman test,” then known as the “kappa test.” Like Bill James’s work, this test compares two probabilities: the first is the chance that two randomly selected letters of a text will match if every letter is equally likely—for a 26-letter alphabet, one in 26, or 0.0385. The second is not so obvious: the chance that two randomly selected letters from a real source text turn out to be the same letter, which for English is known to be about 0.067, because English uses some letters far more often than others. Knowing those two figures, plus the length of an intercepted coded message, allows the cryptographer to estimate the length of the key—the repeating translation parameter that determines the output—just as James can calculate the likely final record of a team that starts 13-2 using two different probabilities. Figuring out the length of The Satyricon, then, might not be quite the Herculean task it has been represented to be—which raises the question: why has it been represented that way?
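A sketch of the idea may help. The version below is a standard modern descendant of Friedman’s index-of-coincidence reasoning applied to a Vigenère cipher; the cipher choice and the key “BACON” are my own illustrative choices, not Friedman’s example, and the plaintext is the Gatsby passage quoted above. The trick: split the ciphertext into every-kth-letter columns for each candidate key length k. At the true key length, each column is ordinary English shifted by one letter, so its coincidence rate sits near 0.067; at wrong lengths the columns look like random text, near 0.0385.

```python
import string

def index_of_coincidence(text):
    """Chance two letters drawn at random from `text` match:
    ~0.067 for English, ~0.0385 (1/26) for uniformly random letters."""
    n = len(text)
    counts = [text.count(c) for c in string.ascii_uppercase]
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))

def vigenere_encrypt(plain, key):
    """Shift each plaintext letter by the corresponding key letter."""
    return "".join(chr((ord(p) + ord(k) - 130) % 26 + 65)
                   for p, k in zip(plain, key * (len(plain) // len(key) + 1)))

def estimate_key_length(cipher, max_len=8):
    """Return the smallest period whose columns all look English-like."""
    for k in range(1, max_len + 1):
        cosets = [cipher[i::k] for i in range(k)]
        avg_ic = sum(index_of_coincidence(c) for c in cosets) / k
        if avg_ic > 0.055:  # threshold between 0.0385 and 0.067
            return k
    return None

PASSAGE = ("It was when curiosity about Gatsby was at its highest that the "
           "lights in his house failed to go on one Saturday night and as "
           "obscurely as it began his career as Trimalchio was over "
           "Trimalchio is a character in the ancient Roman novel the "
           "Satyricon who like Gatsby throws enormous and extravagant "
           "parties the problem with comparing the two novels however is "
           "that unlike Gatsby the Satyricon is unfinished we today have "
           "only the chapters collated by editors from two medieval "
           "manuscript copies which are clearly not the entire book hence "
           "comparing the Satyricon to any other novel is handicapped by "
           "the fact that its true length cannot be known")

plain = "".join(c for c in PASSAGE.upper() if c.isalpha())
cipher = vigenere_encrypt(plain, "BACON")  # key chosen for the occasion
print("estimated key length:", estimate_key_length(cipher))
```

Run on this text, the estimator recovers the five-letter period of the key without ever seeing the plaintext: two probabilities plus the message length, exactly the inventory the kappa test requires.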
The answer to that question, it seems to me, has something to do with the status of the “humanities” themselves: using statistical techniques to estimate the length of The Satyricon would damage the “firewall” that preserves disciplines like Classics, or literary study generally, from the grubby no-’ccount hands of the sciences. That firewall, we are eternally reminded, is necessary in order to foster what Geoffrey Harpham, former director of the National Humanities Center, has called “the capacity to sympathize, empathize, or otherwise inhabit the experience of others” so “clearly essential to democratic citizenship.” That may be so—but it is also true that maintaining the firewall allows law schools, as Sanford Levinson of the University of Texas remarked some time ago, to continue to emphasize “traditional, classical legal skills” at the expense of “‘finding out how the empirical world operates.’” That habit has given the U.S. Supreme Court the luxury of considering whether to ignore a statistical measure of gerrymandering (in Gill v. Whitford); meanwhile, it is quite certain that the disciplines known as the humanities collect students from wealthy backgrounds at a disproportionate rate. One might wonder, then, precisely in what way those disciplines are “essential to democratic citizenship”—or rather, what idea of “democracy” is really being preserved. If so, then—looking out over what Fitzgerald called “the dark fields of the republic”—the final record of the United States can quite easily be predicted.