Let’s look at a more nuanced tool that was specifically created for collating witnesses: Juxta, http://www.juxtasoftware.org
Juxta is a free and open source project (APLv2) designed to support textual scholarship. It was developed by NINES (i.e. Networked Infrastructure for Nineteenth-Century Electronic Scholarship). People have found a wide variety of uses for the software. For example, here is a post by Stephanie Kingsley at the Juxta blog about using it for editing OCR transcripts, much like I am doing at Digital Aladore. Juxta was first available as a Java-based desktop application, but the most recent version, called Juxta Web Service, is online only. The code can be found at GitHub if you want to start your own instance. However, a complete pipeline is available to use for free at the Juxta Commons, http://juxtacommons.org
Go register for an account and start collating your own texts online! Oh, what fun!
Seriously, Juxta is a simple and powerful tool that quickly reveals the exact differences between the witnesses. For the purposes of Digital Aladore, where the source texts are the simple HTML files I prepared in the last post, the older desktop version and the newer web version are almost identical. The web version just looks a little slicker and enables easy sharing of your work.
The desktop version is very simple. Just click the Plus icon to add each witness and click the Refresh icon to collate the selected texts. This will generate a view like this:
There are two main ways to visualize the collation of the witnesses: Heatmap or Side-by-side view. The Heatmap is the default view, the upper right pane in the screenshot above. The text displayed is the “base text”, i.e. one of the witnesses, in this screenshot the 1915 edition. The base can be switched by clicking on a different witness in the left pane. Areas where the other witnesses differ from the base text are highlighted in blue. If you have many witnesses, the color will be lighter or darker depending on how much variance is present. For example, if all the witnesses have a different word in one location, that word is highlighted in dark blue. If only one of five witnesses has a different word, it is highlighted in light blue. Clicking on a highlighted area brings up a window showing the alternative reading (i.e. what the other witness says that differs from the base text).
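To make the shading idea concrete, here is a rough sketch in Python. It is not how Juxta itself computes the Heatmap; it just uses the standard library's difflib to count, for each word of the base text, what fraction of the other witnesses fail to match it. The function name `heatmap_levels` and the sample sentences are made up for illustration and are not quotations from Aladore.

```python
from difflib import SequenceMatcher


def heatmap_levels(base, others):
    """Return (word, fraction) pairs for the base text.

    fraction is the share of the other witnesses whose text does not
    match that base word: 0.0 means no highlight, 1.0 the darkest one.
    Words that only exist in another witness (pure insertions) are not
    represented, because only base words are listed.
    """
    base_tokens = base.split()
    differing = [0] * len(base_tokens)
    for other in others:
        matcher = SequenceMatcher(None, base_tokens, other.split(), autojunk=False)
        matched = [False] * len(base_tokens)
        # Mark every base word that lies inside a matching block.
        for block in matcher.get_matching_blocks():
            for i in range(block.a, block.a + block.size):
                matched[i] = True
        # Any base word left unmarked differs in this witness.
        for i, is_matched in enumerate(matched):
            if not is_matched:
                differing[i] += 1
    return [(tok, n / len(others)) for tok, n in zip(base_tokens, differing)]


# Invented sample witnesses, purely for demonstration.
base = "the hall of Aladore was silent"
others = [
    "the hall of Aladore was silent",
    "the hall of Aladore was quiet",
    "the halls of Aladore was quiet",
]

for word, level in heatmap_levels(base, others):
    print(f"{word:10s} {level:.2f}")
```

A word with a higher fraction would get a darker highlight. Real collation tools handle tokenization, punctuation, and moved passages far more carefully than this word-by-word comparison does.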
The Side-by-side view displays two witnesses aligned next to each other with highlights on the differences. Lines visually connect the differences so that you can easily see how they relate. A histogram showing the areas of variance can be opened to easily navigate through the text.
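If you are curious what sits behind a view like this, the pairing of differences can be approximated with Python's built-in difflib. Again, this is only a sketch under my own assumptions, not Juxta's algorithm: the helper `paired_differences` and the two sample sentences are invented for illustration.

```python
from difflib import SequenceMatcher


def paired_differences(left, right):
    """List the places where two witnesses differ, word by word.

    Each entry is (kind, left_words, right_words), where kind is
    'replace', 'delete' or 'insert'. These pairs play the role of the
    connecting lines in a side-by-side display.
    """
    left_tokens, right_tokens = left.split(), right.split()
    matcher = SequenceMatcher(None, left_tokens, right_tokens, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            pairs.append(
                (tag, " ".join(left_tokens[i1:i2]), " ".join(right_tokens[j1:j2]))
            )
    return pairs


# Invented sample witnesses, purely for demonstration.
witness_a = "he came to the city of Aladore at dusk"
witness_b = "he came to the City of Aladore at the dusk"

for kind, left_words, right_words in paired_differences(witness_a, witness_b):
    print(f"{kind:8s} {left_words!r} -> {right_words!r}")
```

An empty string on one side means the words exist in only one of the two witnesses.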
Using Juxta Commons is basically the same, although the workflow is a little more complicated with the browser-based controls. The extra steps allow for more input types and advanced processing of text sources, which we don’t need for Digital Aladore. After logging in, you need to add sources, i.e. upload your files or connect to a URL. Once a source is uploaded, click the little arrow next to the file name to “Prepare Witness.” A processed version will now show up in the Witnesses window. Once you have all the witnesses ready, check off the ones you want to compare, click on Witnesses at the top of the window, and click “Create Set with selected.” The screen will look something like this:
Collation of the witnesses may take a few minutes. A green circle will appear to the left of the Set name when processing is complete. Click on the Set’s name to open the visualizations. The view options are the same as on the desktop version. The Side-by-side view of Digital Aladore 1914 and 1915 looks like this:
So the collation is all set up, and we will start USING it in the next post…