Category: 1. Introduction

The Idea

Aladore is a classic fantasy novel by Henry Newbolt, first published in 1914. The work was strongly influenced by the early fantasy romances of William Morris, and is filled with archaic language, allegorical elements, and a fairy-tale-like plot progression.

The book is now in the public domain and there are a few digitized copies available for free download. However, all are PDFs of scanned materials with poor-quality OCR. The PDFs are poorly formatted for reading and load too slowly. The automatically generated e-book formats are full of errors.

They both offer a horrible reading experience!

While there are a few more polished e-book editions being sold online, there are no high-quality versions available for free. Despite being republished in the 1970s and 80s in “forgotten fantasy” type paperback series, Aladore is not well known or popular. Two of Newbolt’s more famous poetry collections are available from Project Gutenberg, but it is unlikely that a version of Aladore will be created there.

I love reading on my epaper reader.  I love reading early fantasy.  I love William Morris.  So I stumbled across Aladore.

And I wanted to read it!

But I was frustrated by the terrible experience of reading cumbersome PDFs or unedited OCR text. If you like reading old books in the public domain, this experience will sound familiar…

So, this blog is about a personal attempt to create a high-quality, readable version of the text, to be freely distributed. I hope to test and compare different software options and reflect on the creation process, with the goal of producing a simple “open-digital-critical-edition”.

I will explain more about the plans and ideas later!

Quick Outline

So the last post was about the general idea, i.e., to make a nice, usable ebook edition of Aladore [while doing some experiential learning along the way!]. Today I just want to outline the basic work plan going forward. I will post about each of the steps in more detail as I work through the process, but here is the start:

1) Background research: learn more about the author, book, and history of publication.  If we were going to make an actual critical edition this research would be, well, critical.  I am not going into that level of detail, but I want to be informed about the materials I am working with.

2) Get PDFs: find the raw materials for this project.  I would like to have two different scanned copies of the work to compare.

3) Test OCR: to create the ebook text, we need to run Optical Character Recognition on the PDFs.  I would like to test out some open source OCR tools against the standard commercial ones such as ABBYY.

4) Compare Editions: use visualization tools to compare the different book versions.  Quick and dirty textual criticism light!

5) Edit into Best Edition: weave together the highest quality edition based on the best available original text, scanning, and OCR.

6) Massage epub: edit and polish the text to create a readable ebook.

7) Release: send the text out into the world to be used!
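
As a rough illustration of step 4, here is a minimal Python sketch of a word-level comparison between two transcriptions of the same passage, using the standard library's difflib. It is only one possible approach, not necessarily the one this project will end up using. The function name `collate` and the two sample strings are made up for the example and are not taken from the real scans.

```python
import difflib


def collate(text_a, text_b):
    """Compare two transcriptions word by word.

    Returns a similarity ratio between 0 and 1, plus a list of
    (operation, words_in_a, words_in_b) tuples for every disagreement.
    """
    words_a = text_a.split()
    words_b = text_b.split()
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
    variants = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag != "equal":
            variants.append(
                (tag, " ".join(words_a[a0:a1]), " ".join(words_b[b0:b1]))
            )
    return matcher.ratio(), variants


# Hypothetical OCR output for the same line from two different scans.
scan_a = "the knight rode into the forest at dawn"
scan_b = "the kniglit rode into the forest at dawn"

ratio, variants = collate(scan_a, scan_b)
print(f"similarity: {ratio:.3f}")
for tag, from_a, from_b in variants:
    print(f"{tag}: {from_a!r} vs {from_b!r}")
```

Each reported variant is a spot where a human editor would need to check the page image and decide which reading is correct.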

 

Looks pretty straightforward here in a list… but, don’t worry, it will get ugly!

I have some good ideas for most of the process [I am fuzziest on #7], but send me some ideas and comments,

thanks!

Creative Commons

Sorry for the lack of new posts for a while. I live in a neighborhood full of kids and I can confirm that cold season is in full swing here now…

You may have noticed that I added a handy CC License icon to the blog sidebar recently. I realized I should have the license advertised loud and clear. There are a lot of great reasons to choose a CC license. I just like to share, and I hope that publicly offering my work generates mutual respect and more sharing!

This applies to the content of the blog. The Aladore text itself, however, will be released into the public domain.

If you are interested in adding this cool widget (and licensing) to your blog, follow two easy steps:

1) use this handy CC widget generator to choose a license and get the HTML:

http://creativecommons.org/choose

2) go into your blog Appearance > Widgets settings and add a “Text Widget.”  A text widget allows you to paste in any text, including HTML, to add to your widget areas.  Maybe it should be called a widget widget since the main use will be to paste in HTML widgets from other sites.  So copy and paste the HTML generated by the CC page into the text widget, and Wow!  You have a handy CC License on your sidebar!

Aladore Bibliography

Just for reference, here is a bibliography of print sources I used in the first few stages of the project.  I will also provide a tool bibliography in a future post.

Related to Aladore:

Susan Chitty, Playing the Game: A Biography of Sir Henry Newbolt (London: Quartet Books, 1997).

Henry Newbolt,

  • Admirals All and Other Verses (London: Elkin Mathews, 1898).
  • Clifton Chapel and Other School Poems (London: John Murray, 1908).
  • The Old Country: A Romance (New York: Dutton, 1906).
  • The Book of Cupid: Being an Anthology from the English poets, with 23 illustrations by Lady Hylton and introduction by Henry Newbolt (London: Constable & Co., 1909).
  • Song of the Children in Paladore. Two-part Song for Children’s Voices, poem by H. Newbolt, by Granville Bantock (J. Curwen & Sons, 1929).

Robert Reginald, “Paladorean Idylls: Sir Henry Newbolt’s Aladore”, in Xenograffiti: Essays On Fantastic Literature (Rockville, Maryland: Wildside Press, 1996).

Derek Winterbottom, Henry Newbolt and the Spirit of Clifton (Bristol: Redcliffe Press, 1986).

Related to critical editions and online books:

Fredson Bowers, “Some principles for scholarly editions of nineteenth-century American authors,” Studies in Bibliography 17 (1964), http://etext.virginia.edu/etcbin/toccer-sb?id=sibv017&images=bsuva/sb/images&data=/texts/english/bibliog/SB&tag=public&part=17&division=div

Gregory Crane, “What Do You Do with a Million Books?” D-Lib Magazine 12.3 (2006). doi:10.1045/march2006-crane, http://www.dlib.org/dlib/march06/crane/03crane.html

Marilyn Deegan and Kathryn Sutherland, eds., Text Editing, Print and the Digital World (Burlington, VT: Ashgate Publishing, 2009).

Wilfred L. Guerin, et al., A Handbook of Critical Approaches to Literature, 5th edition (New York: Oxford University Press, 2005).

Katherine D. Harris, “TechnoRomanticism: Creating Digital Editions in an Undergraduate Classroom.” Journal of Victorian Culture 16.1 (2011): 89-94. http://dx.doi.org/10.1080/13555502.2011.554679

 

Tools Bibliography

Here is a basic list of tools used during the project.

The listing provides a link to the main software website. However, if you search for related posts on this blog, I usually give more details about using the tool and links to resources. I tried to mark Free Software as (Free), although all items listed are cost free. If I highly recommend a tool, I have marked it with three asterisks (***)!

General resources:

Creative Commons License choosing widget, http://creativecommons.org/choose/?lang=en

“What is Free software,” http://www.fsf.org/about/what-is-free-software

Linux Command Line, http://LinuxCommand.org

Project Gutenberg Distributed Proofreaders [crowd source editing of ebooks], http://www.pgdp.net

DIY Book Scanner [community with guides on how to do your own digitization], http://www.diybookscanner.org

EPUB:

Sigil *** [EPUB editor/creator (Free)], https://github.com/user-none/Sigil

Writer2epub [EPUB export extension for Open/LibreOffice (Free)], http://writer2epub.it/en

Calibre *** [ebook management and editing (Free)], http://calibre-ebook.com

HTML:

Bluefish Editor *** [powerful text editor for web development and programming (Free)], http://bluefish.openoffice.nl

PDF Readers:

Sumatra PDF *** [multi-purpose reader (Free)], http://blog.kowalczyk.info/software/sumatrapdf/free-pdf-reader.html

Evince [multi-purpose reader (Free)], https://wiki.gnome.org/Apps/Evince

PDF Readers, http://pdfreaders.org

PDF Utilities:

PDF Shaper [Freeware simple PDF utilities], http://www.glorylogic.com/pdf-shaper.html

Briss [cropping utility with page clustering (Free)], http://briss.sourceforge.net

K2PDF [PDF reformatter for ereaders (Free)], http://www.willus.com/k2pdfopt

jPdf Tweak [“Swiss Army Knife for PDF files”(Free)], http://jpdftweak.sourceforge.net

iText RUPS [PDF structure utility (Free)], http://sourceforge.net/projects/itextrups

diff-pdf [document comparison utility], http://vslavik.github.io/diff-pdf

OCR Engines:

Tesseract *** (Free), https://code.google.com/p/tesseract-ocr

Ocrad (Free), http://www.gnu.org/software/ocrad

CuneiForm [main website is broken, get it from a repository (Free)], http://en.wikipedia.org/wiki/CuneiForm_%28software%29

OCR GUIs:

YAGF *** (Free), http://sourceforge.net/projects/yagf-ocr

OCRFeeder *** (Free), https://wiki.gnome.org/OCRFeeder

SimpleOCR, http://www.simpleocr.com/Download.asp

Lector, https://code.google.com/p/lector

gImageReader, http://sourceforge.net/projects/gimagereader

Image Processing:

ScanTailor *** [post-processing suite for scanned images (Free)], http://scantailor.org

GIMP *** [raster image editor (Free)], http://www.gimp.org

ImageJ [advanced scientific image processing and analysis (Free)], http://rsbweb.nih.gov/ij/index.html

Fiji [=ImageJ], http://fiji.sc/Fiji

FastStone Image Viewer, http://www.faststone.org/FSViewerDetail.htm

Office Suites:

LibreOffice *** (Free), http://www.libreoffice.org

OpenOffice *** (Free), https://www.openoffice.org

Other Utilities:

Notepad++ *** [text file editor with many plugins and features (Free)], http://notepad-plus-plus.org

Juxta [text collation tool (Free)], http://www.juxtasoftware.org

7-zip [archive package manager (Free)], http://www.7-zip.org

DownThemAll! [download manager, Firefox extension (Free)], http://www.downthemall.net

AdvancedRenamer [file and directory batch renamer], http://www.advancedrenamer.com

GPRename [file and directory batch renamer], http://gprename.sourceforge.net/index.php

 

 

 

Reflective moment

I finished the draft Aladore EPUB a few posts ago, but still haven’t polished up the file…

Right now it’s just raw text and images in HTML crammed into an EPUB package. Yes, it’s an EPUB, but it’s not a BOOK yet! There are a few steps left. I am close, but hang in there: I am not ready to share just yet.

However, I recently had to give a couple-minute introduction (a Digital Aladore elevator speech) and write up a summary post about the project. It has some background information and thoughts about the process, so I thought I would repost it here:

Digital Aladore is a blog documenting a project to create an e-reader-friendly e-book edition of the obscure early fantasy novel Aladore.

Background bibliography: https://digitalaladore.wordpress.com/2014/11/17/aladore-bibliography

Tool bibliography: https://digitalaladore.wordpress.com/2014/11/26/tools-bibliography

I first came across Aladore because I love the early fantasy works of William Morris, published between 1850 and 1900. Morris’ “prose romances” were highly influential in developing the fantasy genre we know today, inspiring writers such as C.S. Lewis and Tolkien. Aladore is clearly influenced by Morris’ style and prose: it is a fantastic allegorical romance set in an invented world, told with archaic language and narrative styling. Henry Newbolt (1862-1938) was a famous poet in Britain, best known for his stirring patriotic verse, which also caused him to be ignored and forgotten in the post-war disenchantment with the propagandist use of duty and nationalism. He did not write any other work in the genre of Aladore.

I conceived of a project to create a reading edition of a public domain work because I love reading old books and I love reading on my e-reader. I also love free stuff—cost free and freedom free stuff! (like Free software)

Better-known public domain authors, such as William Morris, have many good free ebook editions available from organizations such as Project Gutenberg or commercial vendors such as Feedbooks. Print editions of Aladore were digitized by the Internet Archive, but a true ebook edition was never created. The digitized versions can be read in an online viewer or downloaded as an image-based PDF [see: https://digitalaladore.wordpress.com/2014/09/29/digitized-aladore ]. Internet Archive also provides numerous other formats derived from automated OCR. Unfortunately, the quality of the OCR is very poor and there has been no attempt to edit the resulting text. Furthermore, the image-based PDFs are too highly compressed, requiring extensive rendering time in any PDF reader. Basically, you have a choice between a gibberish-filled, automatically generated EPUB or a low-quality, ridiculously slow and cumbersome PDF. This makes for a horrible reading experience!

So, I started reading Aladore and thought: “I really need to convert this to a good EPUB.” And so Digital Aladore was launched in early September 2014, exactly one hundred years after Aladore was first published!

The project provided a great opportunity to explore and learn about a huge variety of things that I only vaguely knew about at the beginning. Attempting to carry out tasks—then encountering issues—then searching for solutions—then articulating the ideas in the blog was a great way to build concrete skills and give body to theoretical concepts. I thought extensively about open formats, digital rights, implications of the public domain, and how to provide access. I also realized strange resonances with my academic background, such as exploring the digital text in terms of the traditional field of textual transmission.

I started with a clear outline and intended workflow. However, the actual work veered off pretty regularly as I pursued new discoveries, new tools, and new ideas—new distractions. I ended up spending more time evaluating and documenting multiple options to complete each step of the project, rather than just following a more expedient route to get things done. Basically, I spent too much time on it—but still haven’t completed the original outline or even published all my thoughts to the blog. Yet, I learned more than I intended to!

Blogging on WordPress was also a great experience. I continuously tweaked the Digital Aladore site to test solutions to different practical and theoretical considerations. Through comments, likes, and follows, I was able to connect with an interesting mix of other bloggers, some of whom became important resources for the project.

The blog site was first intended to be totally minimalist to reflect the idea of reading on an e-reader. It gradually grew to include more features to make it more usable and more images to make it more interesting. However, the design continues to have a fairly stripped down aesthetic focused on reading. The first half of the content started out mostly theoretical and reflective, while the second half is much more technical and practical. I wish I could have made a better mix of the two, but there was so much technical workflow information that I wanted to document to make the blog a resource. Hopefully the reflective bits will catch up before too long.

Oh yeah, and the content on Digital Aladore is licensed CC-by-sa, and the ebook products are public domain. The tools focus on Free Software.

Digital Aladore is not done—so please follow along in the future and send your comments!

Some files:

Two Stories by Edgar Allan Poe relating to Blackwood’s Magazine (November 1, 1838) [an epub and pdf created for testing purposes early in the project], https://archive.org/details/PoeBlackwoodArticle

Gallery of Aladore’s illustrations, https://digitalaladore.wordpress.com/illustrations-gallery

Look for the EPUB soon!

 

 

Digital Aladore Poster

This week I had a presentation about Digital Aladore that I grandly titled:

Digital Aladore: Creating and Reading with Free Software and the Public Domain

It was a great time chatting with people about the project and imagining how it relates to work that others want to do.

Here is a PDF of the poster I made for the event: Digital Aladore Poster

Also, here is some of the abstract. It repeats things said elsewhere in this blog, but in a more condensed form:

I love reading old books and I love reading on my e-reader.  I also love free stuff—cost free and freedom-free, like Free software.  Well known authors in the public domain have many high quality ebook editions freely available from organizations such as Project Gutenberg or commercial vendors such as Feedbooks.  But what if you want to read something more obscure?

I was trying to read Henry Newbolt’s Aladore, a “forgotten fantasy” novel from 1914.  Print editions were digitized by the Internet Archive, but a true ebook version was never created.  The digitized books can be read in an online viewer or downloaded as an image-based PDF.  Unfortunately, the PDFs are too highly compressed, requiring extensive rendering time in any reader.  Internet Archive also provides numerous other formats derived from automated OCR. Unfortunately, the quality is very poor and there has been no attempt to edit the resulting text or format the ebooks.   Basically, you have a choice between a low-quality-ridiculously-slow-and-cumbersome PDF or a gibberish-filled-automatically-generated EPUB.  This makes for a horrible reading experience!

It was so horrible, I decided to create my own ebook.  Thus, Digital Aladore was launched September 2014—exactly one hundred years after Aladore was first published.  It was originally conceived as a “Create” project for LIBR 559Q Open Knowledge, but my work has continued beyond the context of the course.  The idea is to use freely available public domain materials (the digitized copies of Aladore) and free software to create a GOOD digital reading edition of the text, and to blog about the entire process along the way.

The bigger idea is to demystify the creation of ebooks, empowering readers to be reflective creators.  Digital Aladore has a zero dollar budget: public domain content, free software, recycled hardware—all you need is some interest, passion, and perseverance.  The blog explores preserving, creating, and sharing through public domain, free software, open formats, and Creative Commons.  It reflects about the process of digitization and textual transmission.  However, the main focus is a practical hands-on spirit: crack open an EPUB or an old computer, and look inside!
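
To make "crack open an EPUB" concrete: an EPUB file is a ZIP archive, so Python's standard zipfile module can look inside one. The sketch below builds a tiny EPUB-like skeleton in memory and lists its contents. The helper names (`build_demo_epub`, `list_epub`), the file names under OEBPS, and the placeholder contents are assumptions for illustration. The skeleton is not a complete, valid book: a real EPUB also needs a package document and a proper container.xml pointing to it.

```python
import io
import zipfile


def build_demo_epub():
    """Build a minimal EPUB-like ZIP archive in memory (illustrative only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry comes first and is stored uncompressed.
        zf.writestr(
            "mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED
        )
        # Placeholder: a real container.xml names the package (.opf) file.
        zf.writestr("META-INF/container.xml", "<container/>")
        # Hypothetical chapter file holding the book text as XHTML.
        zf.writestr(
            "OEBPS/chapter1.xhtml", "<html><body><p>Hello</p></body></html>"
        )
    return buf.getvalue()


def list_epub(data):
    """Return the declared mimetype and the list of files in the archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        mimetype = zf.read("mimetype").decode("ascii")
        return mimetype, zf.namelist()


mimetype, names = list_epub(build_demo_epub())
print(mimetype)
for name in names:
    print(name)
```

To inspect a real file instead, `zipfile.ZipFile("some-book.epub")` (the path here is hypothetical) opens it the same way.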