Method: How did we make this edition?

Planning a variorum reading experience

This is a variorum edition in the sense that it assembles and displays the variant forms of a work. For our Frankenstein Variorum, we are grateful for the inspiration of, and consultation with, Barbara Bordalejo and her “Online Variorum of Darwin’s Origin of Species,” which shared a similar goal, with six variant editions published in the author’s lifetime over a period of 14 or 15 years. We were also impressed with Ben Fry’s “On the Origin of Species: The Preservation of Favoured Traces,” an interactive visualization of how much that work changed over 14 years of Darwin’s revisions, where, on mouseover, you can access passages of the text in transition. In our team’s early meetings at the Carnegie Mellon University Library, we sketched several design ideas on whiteboards and came to a significant conclusion about an interface to invite reading for variation. Side-by-side panels are typically how we read variant texts (via Juxta Commons, which was then popular, or the Versioning Machine, or the early experiments with the Pennsylvania Electronic Edition’s side-by-side view of the 1818 and 1831 Frankenstein texts).

We agreed that surely a five-way comparison was not best served by five narrow side-by-side panels, yet we wanted our readers to be able to see all available variations of a passage at once. For this, a note panel seemed most appropriate, especially if we could link to each other edition at a particular instance of variation. Bordalejo’s Variorum highlights variant passages, color-coded to their specific edition, and offers a mechanism to view each of the other variants on that passage momentarily, on mouseover of the highlighted text. We liked this but wanted it to be more readily available to the viewer for exploring the edition. Our Variorum viewer is related to that of the Darwin Variorum, but we decided to foreground the variant apparatus view and make it the basis of visualizing and navigating our edition.

We decided to display a single edition at a time and to foreground "hotspots" of variance, to alert the reader to passages that are different in this text than in the other versions. As they explore a particular edition, readers can discover variant passages based on highlights of light to dark intensity. On interacting with a variant passage, a side panel appears to display the data about variation in each of the other four editions, known as the critical apparatus (designed in editions to store information about variation). That side panel, in turn, links the reader to each of the other editions available at that moment. While Bordalejo’s Variorum allows the reader to select any two editions to read side-by-side, we do not provide this in our Variorum; rather, we have chosen to foreground the view of all variations at once on the screen and use it as a basis for navigation. Note that the variant passages represented in the critical apparatus panel are not exactly the same as a passage’s literal appearance in its source text (visible on click). This is because we are displaying normalized text showing our basis for comparing the editions. For example, the normalized view ignores case differences in lettering, interprets “&” as equivalent to “and,” and brings forward the places where some versions hold paragraph boundaries and others do not. To view the text as it appears in its witness, follow the link to it in the critical apparatus panel. In openly sharing the normalized view of the texts in the critical apparatus panel, we wanted to display not only the variations but also our basis for identifying and grouping those variations.
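The normalization described above can be illustrated with a minimal Python sketch. It is not the project's actual code: the function names `normalize` and `same_reading` are hypothetical, and only the two rules named above (case folding and reading “&” as “and”) are covered.

```python
def normalize(token: str) -> str:
    """Return a comparison form of one token (illustrative rules only)."""
    token = token.lower()              # ignore case differences in lettering
    if token == "&":                   # treat the ampersand as the word "and"
        token = "and"
    return token


def same_reading(a: str, b: str) -> bool:
    """True when two passages are identical after normalization."""
    return [normalize(t) for t in a.split()] == [normalize(t) for t in b.split()]


print(same_reading("Clerval & I", "clerval and i"))
print(same_reading("a dreary night", "a dismal night"))
```

Under these assumed rules, the first pair counts as the same reading even though the witnesses print it differently, which is why the apparatus panel can differ from the literal source text.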

While the Frankenstein Variorum may certainly be accessed to read a single edition from start to finish, it seems more likely that readers will wish to go wandering, exploring the edition at interesting moments and collecting digital “core samples” of significantly altered passages to track their changes. We recommend reading the Frankenstein Variorum from any point of departure and in any direction. We invite the reader to a non-linear adventure in reading across the editions, exploring for variation. Exploring variants in this edition may complement reading a print edition of Frankenstein and, we hope, will reward curious readers, student projects, and scholarly researchers investigating precisely how this novel transformed from 1816 to 1831.

To accomplish the vision of our interface, we had much work to do to prepare the texts for comparison. We needed to identify the passages that vary, and do so in a way that followed the logic

Preparing the texts for machine-assisted collation

When we began this project, we set ourselves the challenge to collate existing digital editions of the 1818 and 1831 texts with the Shelley-Godwin Archive’s TEI XML edition of the manuscript notebooks (the "MS" in our Variorum). The print editions were encoded based on their nested semantic structure of volumes, chapters, and paragraphs, with pagination in the original source texts a secondary phenomenon barely worth representing on the screen. The MS consisted of thousands of XML documents, with a separate file for each individual notebook with a documentary line-by-line encoding of the marks of the page surfaces, including marginal annotations. These could be bundled into larger files, but the major structural divisions in this edition are page surfaces. Chapter, paragraph, and other such meaningful structures were, thankfully for us, encoded carefully in “milestone marker” elements. In XML, this means that they were signaled in position but not used to provide structure to the documents.

With careful tracking of all the distinct elements in each edition, we noted where and how the editions marked each meaningful structure in the novel. We applied Extensible Stylesheet Language Transformations (XSLT) to negotiate the different paradigms of markup in these digital editions. We applied XSLT to “flatten” the structure of all the editions, converting all the meaningful structure elements into “milestone markers,” which we thought of as signal beacons for the collation process that would follow. Locating analogous markup and flattening all the editions to include that meaningful markup was key to our preparation of the editions for machine-assisted collation. Crucially, we could not include all the markup tagging in the collation. Markers of volumes, letters, chapters, paragraphs, and poetry were meaningful points of comparison. However, we also had to mark off, effectively mask away, elements in the MS files that marked page surfaces and each line on the page. We could not lose these markers: they were important to construct the editions you see in the Variorum interface (where we do display lineation). But we also had to bundle the S-GA page XML files into clusters to align roughly with the structural divisions of the print editions.
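To make the idea of “flattening” concrete, here is a minimal sketch in Python. The project itself did this work with XSLT; this version only illustrates the principle. The set of structural element names and the `sID`/`eID` attribute names for start and end markers are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Assumed names of the meaningful structural elements to keep as milestones.
STRUCTURAL = {"div", "p", "lg"}


def flatten(elem, counter=None):
    """Serialize elem as text with paired, self-closing milestone markers."""
    if counter is None:
        counter = {}
    parts = []
    ident = None
    if elem.tag in STRUCTURAL:
        counter[elem.tag] = counter.get(elem.tag, 0) + 1
        ident = f"{elem.tag}{counter[elem.tag]}"
        parts.append(f'<{elem.tag} sID="{ident}"/>')   # start milestone
    parts.append(escape(elem.text or ""))
    for child in elem:
        parts.append(flatten(child, counter))
        parts.append(escape(child.tail or ""))
    if ident is not None:
        parts.append(f'<{elem.tag} eID="{ident}"/>')   # end milestone
    return "".join(parts)


source = "<div><p>It was on a dreary night.</p><p>I beheld <hi>the</hi> wretch.</p></div>"
print(flatten(ET.fromstring(source)))
```

In this sketch, elements outside the assumed structural set (such as `hi`) lose their tags but keep their text. The output is a sequence of text and empty marker elements, so it needs a wrapping root element before it can be parsed as XML again.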

We followed the Gothenburg Model of computer-aided textual collation, which requires clarity on how we would:

Alignment proved a significant challenge. We divided the novel into 33 portions (casually deemed “chunks”) that shared the same or very similar passages as starting and ending points. Often these were set at chapter boundaries, or at the start of a passage shared across all five editions, like the famous "It was on a dreary night of November" shared in all of the texts. These "chunks" would share much the same end-points as well. These were prepared so that the CollateX collation software would more reliably and efficiently locate variant passages than it could by working with the entire novel. This was also important because the MS notebooks were not a complete representation of the novel, but were missing portions, so we needed to identify which collation units we had present.
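The chunking step can be pictured as splitting each witness at passages that every edition shares. The sketch below is a simplified illustration and not the project's procedure: the function name `split_at_anchors` and the sample text are hypothetical, and it assumes that each anchor passage appears verbatim and in order in the witness.

```python
def split_at_anchors(text: str, anchors: list[str]) -> list[str]:
    """Split text into chunks, each one starting at an anchor passage."""
    positions = []
    search_from = 0
    for anchor in anchors:
        pos = text.find(anchor, search_from)
        if pos == -1:
            # A witness that lacks this passage needs separate handling.
            raise ValueError(f"anchor not found: {anchor!r}")
        positions.append(pos)
        search_from = pos + len(anchor)
    # Chunk boundaries: start of text, each anchor position, end of text.
    bounds = sorted({0, *positions, len(text)})
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]


sample = "Opening matter. It was on a dreary night of November. Closing matter."
for chunk in split_at_anchors(sample, ["It was on a dreary night", "Closing matter"]):
    print(repr(chunk))
```

Raising an error for a missing anchor is a deliberate choice in this sketch: an incomplete witness such as the MS notebooks would need that case handled explicitly, to record which collation units it actually contains.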

To understand the contents of the Frankenstein Variorum, it helps to see the various pieces of the manuscript notebook aligned with the 33 collation units that we prepared for the full print editions available in their totality. The MS notebooks were missing a large portion of the opening of the novel, a full 7 collation units. They featured a gap in the middle, around which we identified collation unit 19, and they contained a few extra copies of passages at C-20, C-24, and C-29 - C-33. The MS files were identified by their position in one of three boxes at the Bodleian Library, all fully encoded by the Shelley-Godwin Archive. The following interactive diagram is a visual summary of how the pieces aligned prior to collation:

[Figure: Alignment of the MS Notebook Collation Units. This SVG displays how the MS Frankenstein Notebooks align with the collation units devised for the published editions of Frankenstein. Labels in the diagram: Print editions (full range of collation units, C-01 to C-33); MS Box 56 (C-08, C-18 frag); MS Box 57 (C-20, C-24, C-29); MS Box 58 (C-33).]
Visualization of the collation units prepared from the Manuscript Notebooks. Click on the underlined links in the image to visit the Variorum edition at each alignment boundary.

We prepared all editions to be compared with one another with computer-aided collation. To create the TEI variorum, we prepared all the print editions with the same XML elements, and then we “flattened” those elements as self-closing milestone markers for collation, because the collation process needs to be able to locate alterations that collapse or open up new paragraphs and chapters. We similarly flattened the markup of the Shelley-Godwin Archive texts, and we wrote an algorithm in Python to exclude page surface and line markers from the collation, because our process compares what we think of as semantic structures; thus, the paragraph, chapter, and volume boundaries matter where the page boundaries and lineation do not. When the editions are thus prepared in comparable “flat” XML, we process them with CollateX, which locates the points of variance (or “deltas”) and outputs these in TEI XML critical apparatus markup. From the TEI critical apparatus, we have devised a structure that we think of as the “spine” of the edition, which points to specific locations in the manuscript notebooks. This provides a way to link a reading interface of the novel that highlights “hotspots” of variance in the print edition and that links into relevant passages in the Notebooks.
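As a rough illustration of masking page and line markers, the Python sketch below tokenizes flattened XML into the pre-tokenized JSON shape that CollateX accepts, in which each token has an original form `t` and a normalized form `n`. It is not the project's algorithm. The marker names (`surface`, `pb`, `lb`), the `sID`/`eID` attributes, and the helper names are assumptions. Masked markers are carried along in `t` so they are not lost, but they never appear in `n`, the form that is compared.

```python
import re

# Assumed names for page-surface and line markers to hide from comparison.
MASKED = re.compile(r"<(?:surface|pb|lb)\b[^>]*/>")
TOKEN = re.compile(r"<[^>]+>|[^<\s]+")   # one tag, or one run of non-space text


def normalize(raw: str) -> str:
    """Comparison form: milestones lose their ids, words are case-folded."""
    if raw.startswith("<"):
        return re.sub(r'="[^"]*"', "", raw)
    return raw.lower()


def tokenize(flat_xml: str) -> list[dict]:
    tokens, pending = [], ""
    for raw in TOKEN.findall(flat_xml):
        if MASKED.fullmatch(raw):
            pending += raw               # hold the marker for the next token
            continue
        tokens.append({"t": pending + raw, "n": normalize(raw)})
        pending = ""
    if pending and tokens:
        tokens[-1]["t"] += pending       # trailing markers join the last token
    return tokens


ms = '<p sID="p1"/>It was <lb n="2"/>on a dreary night<p eID="p1"/>'
collation_input = {"witnesses": [{"id": "MS", "tokens": tokenize(ms)}]}
for token in collation_input["witnesses"][0]["tokens"]:
    print(token)
```

Because the line marker travels inside the `t` value of the word that follows it, lineation can be restored for display after collation, while the comparison itself sees only words and structural milestones. This sketch drops inter-word whitespace from `t`; a fuller version would preserve it.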

From Collation Data to Variorum Edition

Here describe the making of the TEI spine and the edition files.

The Static Web Interface for the Variorum

Here describe the static website with Astro.

Presentations