Sorry for the lack of new posts for a while, but Digital Aladore has not been idle!
I have been processing the second digital witness, Aladore 1915 (New York: E.P. Dutton & Company, digitized by Internet Archive in 2006). This time around things went quite quickly and efficiently, since I wasn’t testing various options and I am now familiar with all the software. The page background was cleaner in this digitized edition, which I think made the OCR a tiny bit more accurate. However, the actual book seemed to have a few more print errors, for example a single letter or punctuation mark missing or distorted. I think this sometimes happened with later printings of a book, since they were often reproduced from the plates used in the first printing. Wear and tear on the older plates could introduce errors into the new text.
Like the first time around, I divided the page images into six batches (i.e. directories) to simplify processing. I preprocessed the pages with ScanTailor, ran OCR with YAGF, and batch-edited the HTML with BlueFish. Those three steps, including the computer’s processing time, took about four to five hours in total. You could rush through the process faster, but I think this estimate reflects a fairly careful, low-stress pace.
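For anyone who wants to script the batching step rather than sort files by hand, here is a minimal sketch in Python. It is only an illustration, not the method used for this project: the function names, the `batch_01`-style directory names, and the `*.jpg` pattern are all assumptions, and it copies files rather than moving them so the originals stay untouched.

```python
import shutil
from pathlib import Path


def split_into_batches(items, n_batches):
    """Split a sequence into n_batches contiguous, near-equal chunks."""
    items = list(items)
    size, extra = divmod(len(items), n_batches)
    batches = []
    start = 0
    for i in range(n_batches):
        # The first `extra` batches each take one additional item.
        end = start + size + (1 if i < extra else 0)
        batches.append(items[start:end])
        start = end
    return batches


def distribute_pages(source_dir, n_batches=6, pattern="*.jpg"):
    """Copy page images into batch_01 ... batch_NN subdirectories.

    Directory names and the file pattern are hypothetical; adjust them
    to match how your page images are actually named.
    """
    source = Path(source_dir)
    # Sorting by name keeps pages in order if the files are zero-padded.
    pages = sorted(source.glob(pattern))
    created = []
    for number, batch in enumerate(split_into_batches(pages, n_batches), start=1):
        target = source / f"batch_{number:02d}"
        target.mkdir(exist_ok=True)
        for page in batch:
            shutil.copy2(page, target / page.name)
        created.append(target)
    return created
```

Calling `distribute_pages("aladore_1915_pages")` (a made-up directory name) would leave six subdirectories, each holding a contiguous run of pages ready to open in the preprocessing tool.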
I am curious to compare this new text with the first one I created, so I will be setting that up next! Stay tuned…