The Digital Aladore project has more than 25 posts, so as we enter stage “4. Capturing Text” I want to quickly review where we are.
The Background stage looked at the author of Aladore, Henry Newbolt. He was a Victorian celebrity, but also had a complex love life that impacted his writing. In fact, Aladore seems to be a product of his passionate affair with Alice Hylton, who illustrated the novel.
The Witnesses stage looked at the print and digital publication history of Aladore. There was a short diversion into two stories by Edgar Allan Poe to practice creating and distributing EPUBs.
Digital Witnesses (3.5) reflected on how traditional concepts of textual criticism relate to digitization and unraveling digital texts.
Now we are at the Capturing Text stage. We officially have our JPG page images of the two digital witnesses of Aladore. We will now be testing a few different OCR platforms to “capture” our digital text.
Looking ahead, here is the plan:
5. Compare Editions: After we have a couple of full transcripts of the witnesses, we will use some tools to compare the texts. Curious? Okay, check out http://www.juxtasoftware.org for a preview of a great tool! We will do this with the output of the different OCR platforms to evaluate them, but also with the two corrected texts to see if there are any differences between the editions.
6. Edit into a Best Text: we will fix up the raw OCR transcript into a more correct text, following the evidence of the page images wherever the text appears incorrect. Will we find everything that is inaccurate? No… but we just want a good reading text, not a completely authoritative edition.
7. Massage epub: At this point hopefully we have a lovely text in some format, but we need to make it into a beautiful EPUB! I haven’t thought too much about the intricacies of EPUB formatting yet… but here is a good lead, someone who IS thinking about the presentation of text in ebook formats: Yellow Buick Review, http://yellowbuickreview.wordpress.com. I can slap together a working EPUB using Sigil, but I hope to learn more about the finer points, such as using CSS to provide better formatting.
8. Release: Once everything looks great, I will put the epub out into the world (i.e. www). It will most likely be at Archive.org.
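To give a rough feel for what the comparison in step 5 involves, here is a minimal sketch in Python using the standard library's difflib. This is not how Juxta works, just a stand-in for the general idea of lining up two transcripts and listing where they disagree. The function name compare_transcripts and the two sample strings are made up for illustration; they are not real OCR output from the Aladore page images.

```python
import difflib


def compare_transcripts(text_a, text_b):
    """Compare two transcripts line by line.

    Returns a similarity ratio between 0 and 1, plus a list of
    (tag, lines_from_a, lines_from_b) tuples for every stretch
    where the two transcripts differ.
    """
    lines_a = text_a.splitlines()
    lines_b = text_b.splitlines()
    matcher = difflib.SequenceMatcher(None, lines_a, lines_b, autojunk=False)

    differences = []
    for tag, a1, a2, b1, b2 in matcher.get_opcodes():
        if tag != "equal":
            differences.append((tag, lines_a[a1:a2], lines_b[b1:b2]))
    return matcher.ratio(), differences


# Invented sample text standing in for the output of two OCR platforms.
ocr_a = "The knight rode out at dawn.\nHe did justice upon the road."
ocr_b = "The knight rode out at dawn.\nHe did justiee upon the road."

ratio, differences = compare_transcripts(ocr_a, ocr_b)
print("similarity:", ratio)
for tag, from_a, from_b in differences:
    print(tag, from_a, from_b)
```

Comparing whole lines is crude, since a single misread letter marks the entire line as different, but it is enough to see which parts of a page two OCR platforms disagree on.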
Sound good? Sound possible to finish some day?