By the Book: Catching Up with the XML Tag Set

By Eric Loy

Screenshot of XML encoding

This is not a call for reform. This is not an admission of guilt.

Moving deeper into Blake’s typographic works for the Archive has raised a number of new encoding questions, particularly about how to handle potentially “secondary” text on the page, like printer’s marks, catchwords, page numbers, titles, etc.

The first kinds of questions we asked dealt with transcription display:

  • How would we handle different fonts/sizes/spacing?
  • Do we want to display “secondary text” with a specific color code?
  • How many different kinds of “secondary text” should we classify?

These are good questions to ask because a rich transcription can help users make sense of a manuscript. In the case of typographic works, it’s usually not a strain to read the text, but a rich transcription can help distinguish different kinds of texts at work on the typographic page.

We were, perhaps, getting ahead of ourselves.

Questions of transcription display are always implicitly questions about encoding. When we talk about a “color code” or text classification, what we’re really talking about, behind the transcription display, is XML attributes that make editorial observations (or claims) about what a particular part of the manuscript is doing. OK, so what should we do about it?
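To make the link between display and encoding concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `<l>` element, the `type` attribute, its values, the colors, and the placeholder text are hypothetical and are not the Archive's actual tag set or display rules. The point is only that a display color is derived from a claim already recorded in the markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding of one typographic page. The element and
# attribute names are illustrative, not the Archive's real tag set.
SAMPLE = """<page>
  <l type="title">Title of the work</l>
  <l>A line of the main text</l>
  <l type="catchword">Next</l>
  <l type="pageNumber">2</l>
</page>"""

# Hypothetical display rule: each editorial classification gets a color.
COLORS = {
    "body": "black",
    "title": "blue",
    "catchword": "gray",
    "pageNumber": "gray",
}


def render(xml_text):
    """Return (classification, color, text) for every encoded line."""
    root = ET.fromstring(xml_text)
    rendered = []
    for line in root.iter("l"):
        # A line without a type attribute is treated as primary text.
        kind = line.get("type", "body")
        color = COLORS.get(kind, "black")
        rendered.append((kind, color, (line.text or "").strip()))
    return rendered


for kind, color, text in render(SAMPLE):
    print(f"{color:>5}  [{kind}] {text}")
```

In this sketch, changing how many kinds of “secondary text” the display distinguishes means changing the set of `type` values in the encoding first; the color table only follows from that decision.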

