Tagged: Aladore

DigitalAladore 1.5, EPUB3 Edition!

Is DigitalAladore 1.0 looking crummy on your ultra high def 10 inch tablet screen?

Well, give DigitalAladore 1.5 a try! Following the workflow outlined in previous posts, I generated an Aladore EPUB3 edition. The images are much bigger and the CSS is slightly tweaked with larger screens in mind. Personally, I still find reading ebooks on tablets a bit unsatisfying, slightly too big and bright. But I think this version will look pretty good! However, at over 9MB it might be slow to load on your e-ink reader.

So without further ado, you can find the new EPUB3 at Internet Archive:

DigitalAladore 1.5: Aladore, by Henry Newbolt (1914, epub3), https://archive.org/details/AladoreHenryNewbolt3

Let’s Read Together: Chapter One

ALADORE

CHAPTER I.
OF THE HALL OF SULNEY AND HOW SIR YWAIN LEFT IT.

SIR YWAIN sat in the Hall of Sulney and did justice upon wrong-doers. And one man had gathered sticks where he ought not, and this was for the twentieth time; and another had snared a rabbit of his lord’s, and this was for the fortieth time; and another had beaten his wife, and she him, and this was for the hundredth time: so that Sir Ywain was weary of the sight of them. Moreover, his steward stood beside him, and put him in remembrance of all the misery that had else been forgotten.

And in the midst of his judging there was brought into the hall a child that had been found in the road, a boy of seven years as it seemed: and he was dressed in fine hunting green, but not after the fashion of that day or country. Also when they spoke to him he answered becomingly, but in a speech that no one could understand.

So Sir Ywain had him set by the table at his own side, and now and again as he judged those wrong-doers, he cast a look upon the child. And always the child looked back at him with bright eyes, and even when there was no looking between them, he listened to what was being said, and smiled as though that which was weariness to others was to him something new and joyful. But as the hour passed, Sir Ywain felt his mind slacken more and more, and whenever he saw the boy smiling, his own heart became heavier and heavier between his shoulders, and his life and the life of his people seemed like a high-road, dusty and endless, that might never be left without trespassing. And though he would not break off from his judging, yet he groaned over the offenders instead of rebuking them; and when he should have punished, he dismissed them upon their promise, so that his steward was mortified, and the guilty could not believe their ears.

Then when all was said and done the hall was cleared, and Sir Ywain was left alone with the boy.

But the steward, looking slyly back through the hinges of the door, saw that his lord and the child were speaking together; and he perceived that they understood one another well enough, though how this should have come about he was not able to guess, having himself heard the boy answering to all questions in none but an outlandish tongue.

Then he saw Sir Ywain rise up, and suddenly he was aware that his lord was calling for him loudly and with a hearty voice, as he would call for him long since, when they were at the wars together. And when he went in, Sir Ywain bade him summon all the household.

Now when the household were come into the hall they stood at a little distance from the dais, in the order of their service, and Sir Ywain stood above them in front of the high table. And beside him was the boy, and before him was his own brother, who was now an esquire grown, with hawk on wrist.

Then Sir Ywain bade his brother kneel down, and there he made him knight, taking his sword from him and laying it on his shoulder, and afterwards belting it again round his body. And he took the keys from his own girdle and the gold spurs from his own feet, and said aloud: I call you all to witness that as I have done off my knighthood and the Honour of Sulney, and given them to this my brother Sir Turquin, so also by these tokens do I deliver unto him the quiet possession of my house and goods and the seisin of all my lands, to hold unto him and his heirs for ever, by the service due and accustomed for the same. And henceforth I go free.

How Sir Ywain was led away of a Child

Then his brother, who was both glad and sorry, and moreover was still in doubt how this might end, stood holding the keys and the spurs, and looking at him without a word. And he looked also at the child, and he saw that for all the difference in their years, the eyes of Sir Ywain had become like the boy’s eyes: and as he looked his heart became heavy, and for a moment he envied his brother and feared for himself. But in his fear he moved his hands, and the keys clanked and the spurs clinked together, and his heart leaped up again for joy of his possessions.

And all this Ywain saw as it were a great way off, and he smiled, and forgot it again instantly. And the boy took his hand, and they went down the hall together. And when they came to the door to pass out, the steward got before them and bowed as he was used to do, and he spoke very gravely to Sir Ywain, reminding him that this same afternoon had been appointed among the lords, his neighbours, for the witnessing of certain charters.

But Ywain and the boy looked at one another and laughed, and the steward saw that they laughed at the lords and at him and at the very greatness of the business: and he was enraged, and turned away and went to his new master.

Then Sir Turquin came hastily after them, and he laid his hand upon his brother’s arm and bent his head a little, and spoke to him so that none else should hear, and he said: What is this that you are doing; for no man leaves all that he has, and departs suddenly, taking nothing with him. But those two went from him without answering, and they passed, as it seemed, very swiftly along the road under the woodside, and were hidden from him. And again, as he stood still watching, he saw them going swiftly above the wood where there was no path, but only the bare wold before them.

Keep reading: get the Digital Aladore ebook at Internet Archive!

New Aladore EPUB!

If you have been following along, all that prototyping, testing, and tweaking eventually brings us to a NEW Aladore EPUB! I am calling it DigitalAladore1.0, because there might be some more versions to come (for example conversion to epub3 standard)…

This is an EPUB2 file which should render well on dedicated e-ink readers for a high quality reading experience. The text is much better than the auto-generated editions I encountered at the beginning of this project (here is one of the source editions on Internet Archive with a crummy PDF and epub available). We have done a lot of work to go beyond the first Digital Aladore draft edition. The images are nicer, the underlying markup is sensible, the metadata is complete, and the epub package is put together correctly. And we did it all with Free software.

This is a major milestone for Digital Aladore, but I still have more to say (of course). For example, I uploaded the new epub to Internet Archive, which I think is an amazing resource: we need to talk more about free distribution and the public domain. Let’s save it for another day! For now:

DigitalAladore 1.0: Aladore, by Henry Newbolt (1914), https://archive.org/details/AladoreHenryNewbolt

Poetry Markup

If you remember, I talked a bit about the difficulties of formatting the poetry in an ebook (Aladore didn’t turn out tooooo complicated, since the poems and structures were pretty simple). And if you remember way back, we started on the journey To Aladore with a bit of poetry, the Song of the Children in Paladore.

So, to preview the new poetry markup and styling, I put the new CSS inline so we can revisit the song:

To Aladore, to Aladore,
Who goes the pilgrim way?
Who goes with us to Aladore
Before the dawn of day?

O if we go the pilgrim way,
Tell us, tell us true,
How do they make their pilgrimage
That walk the way with you?

O you must make your pilgrimage
By noonday and by night,
By seven years of the hard hard road
And an hour of starry light.

O if we go by the hard hard road,
Tell us, tell us true,
What shall they find in Aladore
That walk the way with you?

You shall find dreams in Aladore,
All that ever were known:
And you shall dream in Aladore
The dreams that were your own.

O then, O then to Aladore,
We’ll go the pilgrim way,
To Aladore, to Aladore,
Before the dawn of day.

Do you like it better than the Blockquote on the old To Aladore post?

Here is what it looked like in print:

To Aladore, p.250, 1914 edition

You can see that the print verse is actually in a slightly smaller font. To reproduce a bit of these typographical techniques used to set off the poetry from body text, I added a few more CSS tweaks: font-size: 0.95em; word-spacing: 0.2em.

Digital Aladore Poster

This week I gave a presentation about Digital Aladore that I grandly titled:

Digital Aladore: Creating and Reading with Free Software and the Public Domain

It was a great time chatting with people about the project and imagining how it relates to work that others want to do.

Here is a PDF of the poster I made for the event: Digital Aladore Poster

Also, here is some of the abstract. It repeats things said elsewhere in this blog, but in a more condensed form:

I love reading old books and I love reading on my e-reader. I also love free stuff: free of cost and free as in freedom, like Free software. Well-known authors in the public domain have many high-quality ebook editions freely available from organizations such as Project Gutenberg or commercial vendors such as Feedbooks. But what if you want to read something more obscure?

I was trying to read Henry Newbolt’s Aladore, a “forgotten fantasy” novel from 1914.  Print editions were digitized by the Internet Archive, but a true ebook version was never created.  The digitized books can be read in an online viewer or downloaded as an image-based PDF.  Unfortunately, the PDFs are too highly compressed, requiring extensive rendering time in any reader.  Internet Archive also provides numerous other formats derived from automated OCR. Unfortunately, the quality is very poor and there has been no attempt to edit the resulting text or format the ebooks.   Basically, you have a choice between a low-quality-ridiculously-slow-and-cumbersome PDF or a gibberish-filled-automatically-generated EPUB.  This makes for a horrible reading experience!

It was so horrible, I decided to create my own ebook.  Thus, Digital Aladore was launched September 2014—exactly one hundred years after Aladore was first published.  It was originally conceived as a “Create” project for LIBR 559Q Open Knowledge, but my work has continued beyond the context of the course.  The idea is to use freely available public domain materials (the digitized copies of Aladore) and free software to create a GOOD digital reading edition of the text, and to blog about the entire process along the way.

The bigger idea is to demystify the creation of ebooks, empowering readers to be reflective creators. Digital Aladore has a zero dollar budget: public domain content, free software, recycled hardware. All you need is some interest, passion, and perseverance. The blog explores preserving, creating, and sharing through public domain, free software, open formats, and Creative Commons. It reflects on the process of digitization and textual transmission. However, the main focus is a practical hands-on spirit: crack open an EPUB or an old computer, and look inside!

Draft EPUB released!

Since I have been holding the Digital Aladore world in suspense for too long, I decided to release a draft version of the EPUB.  I uploaded it to the Internet Archive Community Texts collection for easy distribution:  https://archive.org/details/AladoreNewbolt

Draft cover image.

This version of the EPUB is minimally formatted. The cover is pretty ugly. And there are no stylesheets, so it won’t look very fancy. But, it has the most up-to-date edited text, all the images, and it works! So enjoy!

P.S. I also uploaded the plain text version to the Internet Archive page.

Digital Aladore versus Internet Archive!

No, this isn’t about some kind of battle–I love Internet Archive!

But, it goes back to the beginning of this project:  I was trying to read the Aladore EPUB available from Internet Archive.  It was terrible!  These files are automatically generated by OCR and not human edited (a tag in the HTML headers says they were generated by “abbyy to epub tool, v0.2.”), so maybe I should be impressed at how good they are… but the thousands of tiny errors make for a frustrating and ugly reading experience.

When I first read the Internet Archive Aladore EPUB back in July 2014, I did some quick find & replace to clean up the EPUB a bit using Calibre.  However, some of the errors are very difficult to eliminate without painstaking editing of the entire text.  Particularly annoying are the headers and page numbers.  They both cause a lot of OCR errors, so are not predictable enough to remove with find & replace.

At that point, I just read it, and dealt with the crummy text… If I wanted something better, there were two options: 1) edit the OCR text available on Internet Archive, or 2) start from scratch and do the whole process myself.

Although option one would probably be easier in the short term, the Digital Aladore project followed option two–because it seemed more interesting! Trust me, I know the breadth of this project wasn’t entirely necessary, but I think it demonstrates how a single person using only Free software can create a quality digitized text. To finish polishing and editing the text, larger projects that can harness the power of crowdsourcing, like Project Gutenberg’s Distributed Proofreaders (or Distributed Proofreaders Canada), are great. But what if it’s such an obscure or personal text that you can’t generate that kind of participation and interest? I think Digital Aladore shows it’s not impossible to just go it alone. Be empowered over your EPUBs! Create, Edit, Read!

Anyway, to see how far (or not very far) the Digital Aladore text has come by starting over from scratch, I thought it would be interesting to do a Juxta comparison with the Internet Archive text. To set up the comparison I had to strip out some of the technical content in the HTML, since I really just want to compare the written text, not the tagging. Here is a quick outline of what I did, since the concepts may be helpful if you want to start from an existing EPUB and polish it up, rather than go through the entire OCR process:

Open the EPUB with Sigil (downloaded from https://archive.org/details/aladorehen00newbrich )

Explore the contents to understand how the files are divided.  In this case the EPUB has a bunch of files named “leaf” which are the covers and random pages from the front matter.  The actual text is contained in three arbitrarily divided HTML files named “part.”

Merge the HTML containing the text (In the “Book Browser” pane highlight the files, then right click and select Merge).

I noticed that the text contains a bunch of page divisions represented by <div> tags. They are not very accurate, and won’t relate to the Digital Aladore text, so I wanted to remove them all. I used Advanced Find & Replace with this regex: <div class=".*?" id=".*?"/>

Then, search for div, since there are a few more scattered around to remove.

Next, I needed to remove the illustrations since I just want to compare the text. I looked at how they were tagged in the files, and used this regex Find & Replace string to eliminate them: <p class="illus"><img alt=".*?" src=".*?"/></p>

Finally, right click on the HTML file and Save As to export it from the EPUB.
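If you would rather script the cleanup than click through Sigil, the two regex steps above can be sketched in Python. This is only a rough sketch: the two patterns are the ones quoted in the steps, but the function names and the sample markup are made up for illustration, and I am assuming the merged HTML has already been exported to a string.

```python
import re

# The two Find & Replace patterns quoted in the steps above.
PAGE_DIV = re.compile(r'<div class=".*?" id=".*?"/>')
ILLUSTRATION = re.compile(r'<p class="illus"><img alt=".*?" src=".*?"/></p>')


def strip_page_markup(html):
    """Remove self-closing page-division divs and illustration paragraphs."""
    html = PAGE_DIV.sub("", html)
    html = ILLUSTRATION.sub("", html)
    return html


def remaining_divs(html):
    """List any div tags the first pattern missed, for manual review."""
    return re.findall(r"</?div\b[^>]*>", html)


# Hypothetical sample markup, invented for illustration only.
sample = (
    '<div class="newpage" id="page-5"/>'
    "<p>SIR YWAIN sat in the Hall of Sulney</p>"
    '<p class="illus"><img alt="" src="images/leaf0021.png"/></p>'
    '<div id="other">'
    "<p>and did justice upon wrong-doers.</p>"
    "</div>"
)

cleaned = strip_page_markup(sample)
print(cleaned)
print(remaining_divs(cleaned))
```

The second function mirrors the "search for div" step: it only reports the leftovers, so you can decide by hand what to do with each one rather than deleting blindly.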

Ready to compare!

Here it is at Juxta Commons: http://juxtacommons.org/shares/BUCgJl
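Juxta is what I actually used, but if you just want a rough local look at the same kind of comparison, Python’s standard difflib module can do a word-by-word diff. This is a swapped-in technique, not what Juxta does internally, and the two snippets below are invented to imitate a typical OCR error.

```python
import difflib
import re
from html import unescape


def visible_words(html):
    """Drop tags, decode entities, and split the remaining text into words."""
    text = re.sub(r"<[^>]+>", " ", html)
    return unescape(text).split()


def word_changes(old_html, new_html):
    """Return (operation, old words, new words) for each differing run."""
    old_words = visible_words(old_html)
    new_words = visible_words(new_html)
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            changes.append(
                (op, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
            )
    return changes


# Hypothetical snippets, invented to imitate a typical OCR error.
ocr_version = "<p>SIR YWAIN sat in the Hail of Sulney and did justice</p>"
edited_version = "<p>SIR YWAIN sat in the <i>Hall</i> of Sulney and did justice</p>"

for change in word_changes(ocr_version, edited_version):
    print(change)
```

Because the tags are thrown away before comparing, differences in markup (like the italics in the second snippet) do not show up, only differences in the written text.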

Publication history, summary

Aladore (1914), page 5

Just to summarize the last few posts, here is a full publication history of Aladore in one list.

Aladore, by Henry Newbolt:

  • 1914 [serially published articles] Blackwood’s Magazine, Vol. 195-196.

Aladore, by Henry Newbolt, illustrated by Lady Hylton:

  • 1914 [de luxe, limited edition] Edinburgh and London: William Blackwood and Sons.
  • 1914 [standard edition] Edinburgh and London: William Blackwood and Sons.
  • 1915 New York: E.P. Dutton & Company.
  • 1916 Edinburgh and London: William Blackwood and Sons.
  • 1975 [Newcastle Forgotten Fantasy Library, vol.5] Hollywood (Calif.): Newcastle.
  • 1980 San Bernardino, Calif.: Borgo Press.
  • 2006 digitized copy: New York: E.P. Dutton & Company, 1915.
  • 2006 digitized copy: Edinburgh and London: William Blackwood and Sons, 1914.

Digitized Aladore

Finally, let’s get some digital witnesses to work with!

There are basically two scanned books available online in many different versions with different processing.

Aladore, 1914 Title page

First is a copy of the 1914 Blackwood standard edition scanned at University of Toronto in 2006. Several different versions of the scanned book are available. U of T previously hosted PDFs on their own library website. You can still get their two versions from the legacy links, although the catalog no longer points to them:

U of T 1914 edition: http://scans.library.utoronto.ca/pdf/1/5/aladoren00newbuoft/aladoren00newbuoft.pdf

U of T 1914 edition, processed to black and white: http://scans.library.utoronto.ca/pdf/1/5/aladoren00newbuoft/aladoren00newbuoft_bw.pdf

The U of T library now points to Scholars Portal Books, hosted by the Ontario Council of University Libraries.  The listing is here: http://books1.scholarsportal.info/viewdoc.html?id=75462

These PDFs are identical.  They are large (57.2 MB) and well made.

Internet Archive also hosts a derivative of this scan. However, the PDF is much lower quality (10.9 MB) and has slow performance due to the odd layered post-processing. IA also provides automatically generated alternative formats, such as EPUB, but the accuracy of the transcription is horrible. On the upside, IA provides much more metadata than any of the other sites: https://archive.org/details/aladoren00newbuoft

Aladore, 1915 Title page

Second is a copy of the 1915 Dutton edition scanned at University of California Libraries in 2006. This copy exists in two versions online.

University of California Libraries point to the record on Hathi Trust Digital Library. Using Hathi can be annoying because they limit downloading many of their items, despite the fact that they are in the public domain. They have a sort of paywall that requires logging in from a partner institution to access the full site. The second annoyance is that Hathi adds a huge border around the PDF pages that has a reference to the source file, plus a watermark over the bottom of the page. In this case the watermark says “Digitized by Internet Archive, Original from University of California.” Internet Archive does NOT include this watermark on their copy! Unfortunately, in providing this format, Hathi does not seem to consider READING and readers. It also comes across as a pathetic possessiveness over public domain materials. Color and black+white PDFs are available from the Hathi catalog listing, and are of high quality (119 MB): http://catalog.hathitrust.org/Record/006155073

Internet Archive also hosts a version of this scan, but again, their post-processing creates a lower quality (13.5 MB) and less useable PDF: https://archive.org/details/aladorehen00newbrich

Looking at the metadata provided by Internet Archive is really fascinating. The digitization of both editions was sponsored by Microsoft. One was shot using a Canon EOS 5D (at 400 ppi), the other a 1Ds (at 500 ppi)–almost certainly using an ATIZ BookDrive. The 1915 was shot November 7th 2006, and the 1914 was shot ten days later. The operators were “scanner-melissa-cunningham” and “scanner-katie-lawson.”

Even with all this random metadata, we do not know much about the digitization project or the post-processing.  To me it is strange that we do not better document the process, to understand the intentions behind how these objects were created.  It is interesting that the library catalogs do not represent ANY of the digitization metadata.  The catalogs only refer to the original object and seem completely uninterested in the digital one or how it came into existence.