22 February 2012

What's the count in the American empire debate?

As part of a project I'm working on this term, I ran these Google Books n-grams for a professor. (They aren't the first or even the most impressive ones I've run, but they are the ones that I'm most behind on sharing.)

The first chart gives a broad overview of how often the term "American empire" appears in Google's corpus of American English.

"American empire", corpus: American English, 1850-2008

What emerges is a somewhat surprising pattern: There's a long decline over the nineteenth century, a blip following the acquisition of an actual American empire in 1898, and then a long plateau following the United States' inadvertent victory in World War I.

The decline that set in after the height of Vietnam War guilt is reversed almost immediately by the events of September 11, which sent discussions of "American empire" to an all-time high (that's right, even eclipsing the period of time when we stole--there's no other nice word for it--Hawai'i, Puerto Rico, the Philippines, and Guam!).

"American empire", corpus: American English, 1968-2008

This trend is confirmed by an examination of the British English corpus (more discussion below).

16 February 2012

In torrential rains, you need an umbrella. For torrents of data, you need statistics

Technically, "because I didn't have observational data." Working with experimental data requires you only to be able to calculate two means and look up a t-statistic on a table.

The excellent Silbey at the Edge of the American West is stunned by the torrents of data that future historians will be able to deal with. He predicts that the petabytes of data being captured by government organizations such as the Air Force will be a major boon for historians of the future --

and I surely can't be the only person who always says "Of the future!" in the same way that the announcers of the 1930s Flash Gordon serials would announce the impending arrival of aliens --

but that this torrent of data means that it will take vastly longer for historians to sort through the historical record.

He is wrong. It means precisely the opposite. It means that history is on the verge of becoming a quantified academic discipline.

The sensations Silbey is feeling have already been captured by an earlier historian, Henry Adams, who wrote of his visit to the Great Exposition of Paris:
"He [Adams] cared little about his experiments and less about his statesmen, who seemed to him quite as ignorant as himself and, as a rule, no more honest; but he insisted on a relation of sequence. And if he could not reach it by one method, he would try as many methods as science knew. Satisfied that the sequence of men led to nothing and that the sequence of their society could lead no further, while the mere sequence of time was artificial, and the sequence of thought was chaos, he turned at last to the sequence of force; and thus it happened that, after ten years’ pursuit, he found himself lying in the Gallery of Machines at the Great Exposition of 1900, his historical neck broken by the sudden irruption of forces totally new."

Because it is strictly impossible for the human brain to cope with large amounts of data, in the age of big data we will have to turn to the tools we've devised to solve exactly that problem. And those tools are statistics.

It will not be human brains that directly run through each of the petabytes of data the US Air Force collects. It will be statistical software routines. And the historical record that the modal historian of the future confronts will be one that is mediated by statistical distributions. Inshallah, yes, this means that historians will debate whether a given event was caused by a process that follows a negative binomial or a Poisson distribution.
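For the curious, here is a rough sketch of what the opening move in such a debate might look like. A Poisson process has a variance equal to its mean, while a negative binomial allows the variance to exceed the mean, so a first, crude check is the ratio of the two. Everything here is an assumption made for illustration: the function names, the tolerance of 0.25, and the counts themselves are all invented, and a real analysis would fit both models and compare them properly rather than eyeball one ratio.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Return the sample variance divided by the sample mean of the counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; dispersion index is undefined")
    return variance(counts) / m


def suggest_count_model(counts, tolerance=0.25):
    """Crude heuristic, not a formal test.

    A ratio near 1 is consistent with a Poisson process; a ratio well
    above 1 (overdispersion) points toward a negative binomial.
    The tolerance is an arbitrary illustrative threshold.
    """
    ratio = dispersion_index(counts)
    if ratio > 1 + tolerance:
        return "negative binomial"
    if ratio < 1 - tolerance:
        return "underdispersed"
    return "Poisson"


# Hypothetical yearly event counts, invented for illustration only.
steady = [1, 3, 5, 2, 4, 3, 6, 2, 1, 3]
bursty = [0, 0, 12, 1, 0, 9, 0, 2]

print(suggest_count_model(steady))
print(suggest_count_model(bursty))
```

Both invented series average three events a year, but the second arrives in bursts, and that clustering is exactly what the ratio picks up.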

And scholarship will be better for it.


I do have the notes for my long-promised sequel to "What do quallys know, anyway?", tentatively entitled "Quantoids don't know anything," and this will probably prompt me to finally finish it.