Tagged: Internet Archive

DigitalAladore 1.5, EPUB3 Edition!

Is DigitalAladore 1.0 looking crummy on your ultra high def 10 inch tablet screen?

Well, give DigitalAladore 1.5 a try! Following the workflow outlined in previous posts, I generated an Aladore EPUB3 edition. The images are much bigger and the CSS is slightly tweaked with larger screens in mind. Personally, I still find reading ebooks on tablets a bit unsatisfying, slightly too big and bright. But I think this version will look pretty good! However, at over 9MB it might be slow to load on your e-ink reader.

So without further ado, you can find the new EPUB3 at Internet Archive:

DigitalAladore 1.5: Aladore, by Henry Newbolt (1914, epub3), https://archive.org/details/AladoreHenryNewbolt3


New Aladore EPUB!

If you have been following along, all that prototyping, testing, and tweaking eventually brings us to a NEW Aladore EPUB! I am calling it DigitalAladore 1.0, because there might be some more versions to come (for example, conversion to the EPUB3 standard)…

This is an EPUB2 file which should render well on dedicated e-ink readers for a high quality reading experience. The text is much better than the auto-generated editions I encountered at the beginning of this project (here is one of the source editions on Internet Archive with a crummy PDF and EPUB available). We have done a lot of work to go beyond the first Digital Aladore draft edition. The images are nicer, the underlying markup is sensible, the metadata is complete, and the EPUB package is put together correctly. And we did it all with Free software.

This is a major milestone for Digital Aladore, but I still have more to say (of course). For example, I uploaded the new EPUB to Internet Archive, which I think is an amazing resource: we need to talk more about free distribution and the public domain. Let's save it for another day! For now:

DigitalAladore 1.0: Aladore, by Henry Newbolt (1914), https://archive.org/details/AladoreHenryNewbolt

Excitement at Internet Archive!

Internet Archive has recently rolled out a major redesign of their website. I don’t love everything about the design (it is much less information dense, so it requires more navigation and is less easy to browse), but one thing is AWESOME: they now provide direct download of the page image files!

Remember that workaround I came up with to harvest the page images out of their online reader? You don’t have to do that any more. Just click on the download button, and most items will offer raw and edited scan images (JP2) in addition to the usual EPUB, PDF, and other access derivatives. They actually expose all files related to an item if you click “See All Files”, including the metadata in a bunch of formats.

Check out Aladore 1915 for an example: https://archive.org/details/aladorehen00newbrich
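
If you would rather script this than click through the download page, here is a minimal Python sketch. It assumes Internet Archive's public metadata endpoint (https://archive.org/metadata/<identifier>) and its download URL pattern (https://archive.org/download/<identifier>/<filename>). The helper names and the idea of matching on "jp2" are my own, and the exact file names and format labels vary from item to item, so treat it as a starting point rather than the official way to do it.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# URL patterns assumed from Internet Archive's public site layout.
METADATA_URL = "https://archive.org/metadata/{identifier}"
DOWNLOAD_URL = "https://archive.org/download/{identifier}/{name}"


def fetch_metadata(identifier):
    """Fetch the JSON metadata record for one item (needs network access)."""
    with urlopen(METADATA_URL.format(identifier=identifier)) as resp:
        return json.load(resp)


def page_image_files(metadata, marker="jp2"):
    """Return names of files whose name or format label mentions `marker`.

    Matching on a substring is a guess at how the JP2 bundles are labelled;
    check the "See All Files" page for the item you care about.
    """
    marker = marker.lower()
    names = []
    for entry in metadata.get("files", []):
        name = entry.get("name", "")
        fmt = entry.get("format", "")
        if marker in name.lower() or marker in fmt.lower():
            names.append(name)
    return names


def download_url(identifier, name):
    """Build a direct download link for one file of an item."""
    return DOWNLOAD_URL.format(identifier=identifier, name=quote(name))


if __name__ == "__main__":
    item = "aladorehen00newbrich"  # the 1915 Aladore scan linked above
    for name in page_image_files(fetch_metadata(item)):
        print(download_url(item, name))
```

Running the script prints one link per matching file, which you can hand to your browser or a download tool of your choice.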

Wow! I am seriously impressed and excited! I mentioned the issue of only providing limited access versions in previous posts, so it's great to see the huge collection at Internet Archive take this massive leap forward in enabling users and re-use.

P.S. On a related note, just as I cropped out the illustrations from Aladore and provided them in nicely edited versions here for re-use, Internet Archive initiated a project about a year ago to mine the wealth of images out of their digitized books. They started posting images on their Flickr page in July 2014: https://www.flickr.com/photos/internetarchivebookimages

Today they have 2,878,891 images posted!  That’s a lot to browse through… they haven’t gotten to Aladore yet, so you will still have to visit here!

Digitized Aladore

Finally, let's get some digital witnesses to work with!

There are basically two scanned books available online in many different versions with different processing.

Aladore, 1914 Title page


The first is a copy of the 1914 Blackwood standard edition, scanned at University of Toronto in 2006. Several different versions of the scanned book are available. U of T previously hosted PDFs on their own library website. You can still get their two versions from the legacy links, although the catalog no longer points to them:

U of T 1914 edition: http://scans.library.utoronto.ca/pdf/1/5/aladoren00newbuoft/aladoren00newbuoft.pdf

U of T 1914 edition, processed to black and white: http://scans.library.utoronto.ca/pdf/1/5/aladoren00newbuoft/aladoren00newbuoft_bw.pdf

The U of T library now points to Scholars Portal Books, hosted by the Ontario Council of University Libraries.  The listing is here: http://books1.scholarsportal.info/viewdoc.html?id=75462

The PDFs there are identical to the U of T ones. They are large (57.2 MB) and well made.

Internet Archive also hosts a derivative of this scan.  However, the PDF is much lower quality (10.9 MB) and has slow performance due to the odd layered post-processing.  IA also provides automatically generated alternative formats, such as EPUB, but the accuracy of the transcription is horrible. On the upside, IA provides much more metadata than any of the other sites.  https://archive.org/details/aladoren00newbuoft


Aladore, 1915 Title page


The second is a copy of the 1915 Dutton edition, scanned at University of California Libraries in 2006. This copy exists in two versions online.

University of California Libraries point to the record on Hathi Trust Digital Library. Using Hathi can be annoying because they limit downloading of many of their items, despite the fact that they are in the public domain. They have a sort of paywall that requires logging in from a partner institution to access the full site. The second annoyance is that Hathi adds a huge border around the PDF pages that has a reference to the source file, plus a watermark over the bottom of the page. In this case the watermark says “Digitized by Internet Archive, Original from University of California.” Internet Archive does NOT include this watermark on their copy! Unfortunately, in providing this format, Hathi does not seem to consider READING and readers. It also seems like a pathetic possessiveness over public domain materials. Color and black+white PDFs are available from the Hathi catalog listing, and are of high quality (119 MB): http://catalog.hathitrust.org/Record/006155073

Internet Archive also hosts a version of this scan, but again, their post-processing creates a lower quality (13.5 MB) and less usable PDF: https://archive.org/details/aladorehen00newbrich

Looking at the metadata provided by Internet Archive is really fascinating. The digitization of both editions was sponsored by Microsoft. One was shot using a Canon EOS 5D (at 400 ppi), the other a 1Ds (at 500 ppi), almost certainly using an ATIZ BookDrive. The 1915 was shot November 7th, 2006, and the 1914 was shot ten days later. The operators were “scanner-melissa-cunningham” and “scanner-katie-lawson.”

Even with all this random metadata, we do not know much about the digitization project or the post-processing.  To me it is strange that we do not better document the process, to understand the intentions behind how these objects were created.  It is interesting that the library catalogs do not represent ANY of the digitization metadata.  The catalogs only refer to the original object and seem completely uninterested in the digital one or how it came into existence.