Witnesses Ready

The next stage of the project, which I decided to call Capturing Text, aims to test several OCR platforms and create a usable digital text from the digitized page images of Aladore.

I have been talking a lot about the witnesses available online, but I was frustrated with the PDF versions. PDF is just not a very good format to work with, and the quality was too variable. Although OCR is possible on PDF files, most OCR programs are really set up to work with individual page images. To get the best possible digital text, I need the best-quality images I can get.

No one provides individual page images per se… but actually they do! Most online reader platforms are really just containers for serving up JPG pages. For example, check out the Internet Archive reader display of Aladore 1915: https://archive.org/stream/aladorehen00newbrich#page/n7/mode/2up

Right-click on one of the pages and choose “View Image”. You'll find a JPG! You can get better-quality files by zooming in first and then viewing the image. If you study the resulting URL, you will be able to figure out the pattern used to name the individual page images and their quality levels.
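Once you have spotted the pattern, generating the full list of page URLs is mechanical. Here is a minimal Python sketch, assuming a hypothetical pattern in which the page number is zero-padded to four digits and the quality is a `scale` parameter. The example.org address, the `{page}`/`{scale}` placeholders and the `page_urls` helper are all illustrative, not the real Internet Archive or HathiTrust URL scheme, which you need to read off the image URL yourself.

```python
def page_urls(template: str, first: int, last: int, width: int = 4, scale: int = 1) -> list[str]:
    """Expand a URL template into one URL per page, from first to last inclusive.

    The template may contain {page} (zero-padded to `width` digits) and {scale}
    (an image-quality setting). str.format ignores unused keyword arguments,
    so a template without {scale} still works.
    """
    return [
        template.format(page=str(n).zfill(width), scale=scale)
        for n in range(first, last + 1)
    ]


# Hypothetical template: replace it with the pattern you find via "View Image".
TEMPLATE = "https://example.org/aladore/page_{page}.jpg?scale={scale}"

if __name__ == "__main__":
    for url in page_urls(TEMPLATE, 1, 3):
        print(url)
```

With the placeholder template, the first URL printed is `https://example.org/aladore/page_0001.jpg?scale=1`. Changing `scale` is how you would request a different quality, if the real pattern exposes one.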

I used DownThemAll!, a free and open-source download manager available as a Firefox extension: http://www.downthemall.net

This allowed me to efficiently download a full set of images for the 1914 (from Internet Archive) and 1915 (from HathiTrust) editions. While this is not how IA and Hathi intended the images in their online readers to be used, flipping through every page of the book in the reader would request exactly the same files.
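DownThemAll! handled the downloading for me in the browser. If you prefer a script, a comparable bulk download can be sketched with Python's standard library. This is a minimal sketch, not the method I used. It assumes you already have a list of page-image URLs worked out from the naming pattern, and the `download_pages` helper and the output names (`page_0001.jpg` and so on) are hypothetical choices. The pause between requests keeps the load close to that of someone paging through the reader.

```python
import time
import urllib.request
from pathlib import Path


def download_pages(urls, out_dir, delay=1.0):
    """Save each URL in order as page_0001.jpg, page_0002.jpg, ... in out_dir.

    Files that already exist are skipped, so an interrupted run can simply be
    restarted. After each actual download the script sleeps for `delay` seconds.
    Returns the list of target paths in page order.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, url in enumerate(urls, start=1):
        target = out / f"page_{i:04d}.jpg"  # hypothetical naming scheme
        if not target.exists():
            with urllib.request.urlopen(url) as resp:
                target.write_bytes(resp.read())
            time.sleep(delay)  # be polite to the server
        saved.append(target)
    return saved
```

Because existing files are skipped, re-running the script after a dropped connection only fetches the pages that are still missing.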

So, awesome! Digital witnesses ready to go!

Aladore page images!
