Using Juxta

So the collation of the 1914 and 1915 Aladore texts is pretty neat–but it is also useful.  I don’t think it is revealing any interesting differences between the two printed editions, but it is surfacing many simple errors in the OCR that I haven’t spotted by other means.  The 1914 and 1915 print editions seem to be identical except for a few page breaks.  I do not need to unravel issues with the transmission or make any complex editorial decisions based on textual scholarship.  Instead, the comparison is part of a process to get the new OCR witnesses to better match the digital image witnesses.  The OCR of each edition has slightly different errors which are highlighted by the collation.  Thus, by combining the information of two imperfect witnesses (all witnesses are imperfect!), we can create a new best text that is more accurate than its parents.

Here is an example:

OCR errors in Juxta.

OCR errors in Juxta.

The 1915 has “world’s. four roads” and the 1914 “world’s {our roads”, so both have errors!  Looking at the page images “world’s four roads” is obviously the correct reading, but the 1915 edition has a tiny blemish at the end of a line which OCRed as a period, and the “f” of four in 1914 edition is slightly faint contributing to its missed identification.  These are exactly the sort of OCR errors that are hard to detect in any other way since they do not show up when spell checking and are not visually obvious.

In case you want to play along, I shared the comparison on Juxta Commons: http://juxtacommons.org/shares/qWLf2p

However, Juxta Commons is not ideal for actually fixing the errors, since it does not allow you to edit the source texts.  You will have to open the base text in an external text editor to fix issues as you look through the collation on Juxta.  Instead, I prefer to use the desktop version which does allow you to edit on the fly–handy!  One other difference is that the desktop application displays the raw HTML rather than the rendered text shown on the Commons.  This can be distracting if your tags are too ugly but I like seeing them since I want to ensure both the text content and tags are correct.

Here is my workflow for the process:

1) Download and install the desktop version of Juxta from http://www.juxtasoftware.org/download

2) Open Juxta, and click File > New Comparison Set.

Juxta menu bar and icons.

Juxta menu bar and icons.

3) Click the Plus icon to add your witnesses (i.e. Digital Aladore 1914 and 1915).

4) Click the Refresh icon to collate the witnesses.  This brings up a dialog box with options for the collation.  Since I want to detect ALL differences, I uncheck all the boxes.  Processing might take a few seconds.

collate options.

collate options.

You will now have a workspace something like this:

Aladore collated on Juxta.

Aladore collated on Juxta.

Clicking on a witness listed on the “Comparison Set” pane (left) changes the base text.  The selected base text will appear in the lower right “Source” pane.  The upper right pane can be switched between “Collation view” or “Comparison view” (i.e. Heatmap or Side-by-side in Juxta Commons terminology, as described in the previous post) using the tabs on the bottom of the pane.  Clicking on a word in Collation pane will move the source text and highlight the selection in the Source pane.

5) Choose one of the source texts to edit (I used Aladore 1914).  Then, click “Edit” in the lower right corner of the Source pane.  This allows you to edit the text, but nothing is saved until you click “Update.”

6) Now, I scroll through the text on the Collation pane (I prefer to use the “Comparison view”) and decide which errors need to be edited in the base text.  I click on the highlighted errors in the Collation pane (if using Comparison view, make sure you click on the side representing your base text or it will switch the Source pane).  This brings up the corresponding spot in the Source pane to edit.  If the correct reading is not obvious, I quickly check the original page image (I have the directory open with thumbnails and a preview window so they are easy to reference).

7) When finished working through the entire collation, click “Update” on the lower right corner of the Source pane and save the edited text with a new name.

8) Click on the new text in the Comparison Set pane.  Then, click File > Export Source Document.  This saves a text file of the new witness.

Done!

This little activity caught a lot more errors than I expected!  In particular, the 1914 text had “b” instead of “h” in many places.  There was also many misplaced periods in random locations.  The interface is a little cumbersome for editing in this way, but I definitely think it is a useful tool.

Advertisements

One comment

  1. Pingback: Another Juxta use! | Digital Aladore

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s