By Eric Loy
This summer I attended DHSI at U Victoria (again). I had the great fortune to take James O’Sullivan’s course on Computation and Literary Criticism. (I also had the great fortune to eat at Red Fish Blue Fish, like, four times in five days.)
As one could guess, we learned a lot about distant reading and macroanalytic approaches to literary study, focusing on the technological pragmatics. So: we messed around in RStudio, creating stylometric cluster dendrograms; we dumped huge corpuses into Voyant Tools; we experimented with an open source topic modeling app (and talked about how mathematically insane topic modeling is).
The Blake Archive, of course, contains a trove of text that’s easily mineable from the backend. (Our tech editor Mike Fox emailed me plain text files of all Archive transcriptions for my experimenting.) Here are a couple of results from those experiments:
On the left, one of those stylometric cluster dendrograms. On the right, the same data dumped into a network visualization.
What exactly are these visualizing? Well, each is essentially a comparative statistical analysis of Blake’s illuminated books, using the 100 most frequent words in the corpus. (Most frequent words is an alarmingly reliable …
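If you want a sense of the mechanics behind those dendrograms, here's a minimal sketch of most-frequent-word stylometry. The titles and text snippets are toy stand-ins for the Archive's plain-text transcriptions, and I'm using 5 most frequent words instead of 100 because the toy texts are tiny; the real analysis was done with the stylo package in RStudio, so treat this as an illustration of the idea, not the actual pipeline.

```python
# Sketch of most-frequent-word (MFW) stylometry: represent each text as a
# vector of relative frequencies over the corpus's most frequent words, then
# cluster the texts hierarchically. Toy data throughout (assumption).
from collections import Counter

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

corpus = {
    "Songs of Innocence": "piping down the valleys wild piping songs of pleasant glee",
    "Songs of Experience": "tyger tyger burning bright in the forests of the night",
    "The Book of Thel": "the daughters of the seraphim led round their sunny flocks",
}

# Word counts per text.
counts = {title: Counter(text.split()) for title, text in corpus.items()}

# The N most frequent words across the whole corpus (100 in the real analysis).
total = Counter()
for c in counts.values():
    total.update(c)
mfw = [w for w, _ in total.most_common(5)]

# Each text becomes a vector of relative frequencies over the MFW list.
matrix = np.array([
    [counts[t][w] / sum(counts[t].values()) for w in mfw]
    for t in corpus
])

# Hierarchical clustering on pairwise distances; the resulting linkage tree
# is exactly what a dendrogram plot draws.
tree = linkage(pdist(matrix, metric="cityblock"), method="average")
```

Passing `tree` to `scipy.cluster.hierarchy.dendrogram` (with matplotlib) would draw the familiar branching diagram; texts whose MFW profiles are closest join at the lowest branches.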