DigitalAladore 1.0 is a valid EPUB2. To recap: EPUB was chosen for the ebook because it is a free and open format built on open web standards (in contrast to proprietary formats such as Kindle AZW). And we love Free because of the many practical benefits of open source development plus the moral ideals of respecting the user’s freedom.
The EPUB2 standard was first released in 2007, but has since been superseded by EPUB3, released in October 2011. EPUB3 was designed to take advantage of new elements introduced in HTML5 and to allow more interactive functionality (scripting). However, support for the full specification continues to be very poor. The only readers with full support seem to be commercial apps that deliver interactive books in a closed ecosystem. For example, AZARDI offers a cost-free reading app that has good support for advanced EPUB3 features, but it is focused on secure “content fulfillment” of interactive textbook subscriptions. To publish to the platform, authors must use the company’s proprietary ebook creation application. Kobo and Apple have developed tweaked versions of EPUB3 that do not fully comply with the standard and focus on the possibilities for improved DRM, rather than functionality not found in EPUB2.
However, for simple functionality (i.e. a linear novel) EPUB3 is supported by most reading devices. I decided to update the Aladore EPUB2 to an EPUB3 version for future compatibility, higher specs, and improved semantic inflection. Guidelines now suggest adding larger images and cover images than I used in the EPUB2 to ensure they don’t look terrible on HD tablets. So while DigitalAladore 1.0 was optimized for older e-ink ereaders, the EPUB3 version will be optimized for larger, more powerful devices.
However, Sigil does not currently support the creation of ebooks following the EPUB3 spec. If you make changes to the markup following EPUB3, Sigil will actually correct them back to EPUB2 when saving the file. So, to create the Aladore EPUB3 we have to do a few extra steps:
- Replace all the image files with larger versions using Sigil.
- Use the Sigil plugin ePub3-itizer to export a pseudo EPUB3. Sigil developers intend to implement full EPUB3 creation and editing support soon, so this plugin is considered a “stop-gap measure.” It changes the HTML headers, restructures a few files, and adds the nav.xhtml.
- Unzip the ePub3-itizer output to edit the contents. Because Sigil limits the markup to XHTML that is valid under the EPUB2 spec, it is not possible to add HTML5 tags such as section or EPUB3 attributes such as epub:type from within Sigil (thus, the export is what I call a pseudo EPUB3). I used the IDPF Accessibility Guidelines (The epub:type attribute) plus the EPUB 3 Structural Semantics Vocabulary to add some semantic structure to the text. This markup can be used for styling the document with CSS, but is also useful for machine processing and accessibility options. You can mark up sections of the ebook (frontmatter, body, backmatter), divisions within them (abstract, chapters), types of content (footnote), or individual elements (title). In the EPUB2 I had added div tags with class attributes, which I now converted to section tags. For example, each <div class="chapter"> became <section epub:type="chapter">. I used these epub:type values: cover, titlepage, chapter, epigraph, toc, and loi. Since I made each chapter a single XHTML file, another option would be to add the epub:type attribute to the body element. However, those attributes would be lost if the HTML files were merged, so I prefer the section tags.
- Delete the toc.ncx file. This file was used by older reading devices to provide navigation functionality, but it is not part of the EPUB3 spec as it is replaced by nav.xhtml. However, many people seem to be leaving this file in the EPUB for legacy support. If you leave it, everything should work fine, but the file will NOT fully validate.
- Re-Zip the new EPUB3. EPUBs need to be zipped in the correct order or they will not function. This means you must create the zip archive first (in Windows right click somewhere and choose New > Compressed (zip) Folder), then add the mimetype file (drag it into the new zip folder). Then all the rest of the content can be added. Finally, change the extension from .zip to .epub.
- Validate with the IDPF EPUB Validator.
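The re-zip step above can also be scripted instead of done by hand. Here is a minimal sketch in Python, assuming the unzipped ebook sits in a single folder. The function names build_epub and mimetype_is_valid are my own, not part of any EPUB tool. Note that besides coming first, the mimetype entry also has to be stored without compression, which the script handles explicitly.

```python
import zipfile
from pathlib import Path

MIMETYPE = "application/epub+zip"


def build_epub(source_dir, epub_path):
    """Zip an unpacked EPUB folder, writing the mimetype entry first."""
    source = Path(source_dir)
    target = Path(epub_path)
    with zipfile.ZipFile(target, "w") as archive:
        # mimetype goes in first and is stored, not compressed.
        archive.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for path in sorted(source.rglob("*")):
            if not path.is_file() or path.resolve() == target.resolve():
                continue
            arcname = path.relative_to(source).as_posix()
            if arcname == "mimetype":
                continue  # already written above
            archive.write(path, arcname, compress_type=zipfile.ZIP_DEFLATED)


def mimetype_is_valid(epub_path):
    """Check that mimetype is the first entry, uncompressed, with the right text."""
    with zipfile.ZipFile(epub_path) as archive:
        first = archive.infolist()[0]
        return (
            first.filename == "mimetype"
            and first.compress_type == zipfile.ZIP_STORED
            and archive.read(first) == MIMETYPE.encode("ascii")
        )


# Example call (the folder and file names here are hypothetical):
# build_epub("aladore_epub3", "aladore.epub")
```

This is only a quick sanity check on the package order. It does not replace running the finished file through the validator.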
The sketchyTech blog talks about the differences created by this process in more detail if you want to hear from someone else…
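For anyone who would rather not hand-edit every file, the div-to-section conversion described in the steps above could be scripted. This is a rough sketch using Python's standard ElementTree module, not the way I actually did it. It assumes each div's class name matches the epub:type value you want (only the chapter case is confirmed above), and the function name divs_to_sections is made up for illustration. Be aware that ElementTree drops the XML declaration and DOCTYPE, and chokes on named HTML entities such as &nbsp;, so check the output before trusting it.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
EPUB_NS = "http://www.idpf.org/2007/ops"

# Keep the usual prefixes when the tree is written back out.
ET.register_namespace("", XHTML_NS)
ET.register_namespace("epub", EPUB_NS)


def divs_to_sections(xhtml_text, types=("chapter",)):
    """Turn <div class="X"> into <section epub:type="X"> for each X in types."""
    root = ET.fromstring(xhtml_text)
    for element in list(root.iter(f"{{{XHTML_NS}}}div")):
        css_class = element.get("class")
        if css_class in types:
            element.tag = f"{{{XHTML_NS}}}section"
            del element.attrib["class"]
            element.set(f"{{{EPUB_NS}}}type", css_class)
    return ET.tostring(root, encoding="unicode")


sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<div class="chapter"><h2>Chapter I</h2></div>'
    "</body></html>"
)
print(divs_to_sections(sample))
```

Divs with any other class are left alone, so styling hooks that have nothing to do with semantics survive the conversion.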
But basically, that’s it! Not too complicated, although it requires some thought about 1) the quality of images to include, 2) changes to styling with larger screens in mind, and 3) consideration of semantic inflection to provide better accessibility and machine readability. I will post the new Aladore EPUB3 soon!
After completing the tweaks outlined in the last few posts, I opened the Aladore epub with Calibre’s built-in editor for a final look. As mentioned in previous posts, the editor is comparable to Sigil, although not necessarily designed for creating ebooks from scratch. However, because it is built into Calibre’s ebook library management platform, it is great for making tweaks on the fly for testing on your reading devices. Also, development on the project currently seems more active than Sigil’s.
To get an overview of the contents of the epub, I open Reports from the Tools menu. This analyzes the package, listing all the files, words, images, styles, characters, and links. It is a nice way to quickly look for any issues that might still be lurking. I scan through the words to see if any weirdness stands out, then check the characters to ensure there is nothing strange. You will learn interesting factoids, such as that “and” is the most used word at 4001 times, or that there are 66,910 spaces in the ebook.
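If you are curious how such word and character tallies could be produced outside Calibre, here is a small Python sketch. It is not how Calibre does it, and its numbers will likely differ a little from the Reports tool, since it counts all text in every XHTML file (including title elements) and uses a very simple definition of a word. The names TextExtractor and epub_text_report are my own.

```python
import re
import zipfile
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags, ignoring the markup itself."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def epub_text_report(epub_path):
    """Return (word counts, character counts) for the XHTML files in an EPUB."""
    words = Counter()
    characters = Counter()
    with zipfile.ZipFile(epub_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith((".xhtml", ".html", ".htm")):
                continue
            parser = TextExtractor()
            parser.feed(archive.read(name).decode("utf-8"))
            parser.close()
            text = "".join(parser.chunks)
            # A "word" here is simply a run of letters, lowercased.
            words.update(re.findall(r"[^\W\d_]+", text.lower()))
            characters.update(text)
    return words, characters


# Example call (the file name is hypothetical):
# words, characters = epub_text_report("aladore.epub")
# print(words.most_common(5), characters[" "])
```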
It is worth noting that Calibre slightly modifies the metadata when ebooks are added to the library. If you are anal about your newly perfected markup, you might want to re-edit it. One powerful feature of the editor is “Compare to another book” under the File menu. It creates a nice visualization highlighting the differences between versions of the ebook (compare with Juxta used earlier in Digital Aladore). Here it is showing the differences introduced by the automatic Calibre metadata edits:
So everything looks okay! I also flipped through it on my reader for a final “user testing” session.
Finally, we want to run it through IDPF’s EPUB Validator (a free web-based tool) to ensure everything is kosher:
Ready for distribution?
Wow, it’s been a while since my last post–sorry. Time to get back on track and get the polished EPUB out there!
I have a series of posts that will deal with editing and styling the draft EPUB into a more finished book. But first, let’s review what an EPUB is:
EPUB is a free and open ebook format maintained by the International Digital Publishing Forum. The current version is EPUB3, but many ebooks (like the Aladore draft) are still provided following the EPUB2 specs. It is built on web standards, with the main content of the book contained in a series of XHTML files. The standard allows for CSS style sheets, images, and other supplemental files to be included. The book’s structure is communicated to ereading devices by the Open Packaging Format (.opf), an XML file that contains metadata, a manifest of the contents, and a “spine” element defining the linear order for reading the content files. All the components of the ebook are wrapped in a zip package given the .epub extension.
For example, the draft Aladore EPUB looks like this unzipped:
The first file, mimetype, simply says “application/epub+zip”, telling the device what the package is.
The container.xml file tells the device where to find the “root” file, which explains the structure of the book. It will normally reference the content.opf. The OPF is where things start to get more interesting, so let’s take a quick look:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookId" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:identifier id="BookId" opf:scheme="UUID">urn:uuid:3e795f7d-7913-4181-b82c-3bd18e706507</dc:identifier>
    <dc:creator opf:file-as="Newbolt, Henry" opf:role="aut">Henry Newbolt</dc:creator>
    <dc:publisher>Digital Aladore project</dc:publisher>
    <dc:description>A new ebook text based on digitized versions of the 1914 and 1915 editions of Aladore. Created in 2014 by the Digital Aladore project.</dc:description>
    <dc:contributor opf:role="ill">Alice Hylton</dc:contributor>
    <meta content="0.8.5" name="Sigil version" />
    <meta name="cover" content="aladore_cover.jpg" />
  </metadata>
  <manifest>
    <item href="toc.ncx" id="ncx" media-type="application/x-dtbncx+xml" />
    <item href="Text/title.xhtml" id="title.xhtml" media-type="application/xhtml+xml" />
    <item href="Text/Section0001.xhtml" id="Section0001.xhtml" media-type="application/xhtml+xml" />
    <item href="Images/aladore_01.jpg" id="aladore_01.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_02.jpg" id="aladore_02.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_03.jpg" id="aladore_03.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_04.jpg" id="aladore_04.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_05.jpg" id="aladore_05.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_06.jpg" id="aladore_06.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_07.jpg" id="aladore_07.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_08.jpg" id="aladore_08.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_09.jpg" id="aladore_09.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_10.jpg" id="aladore_10.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_11.jpg" id="aladore_11.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_12.jpg" id="aladore_12.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_13.jpg" id="aladore_13.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_14.jpg" id="aladore_14.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_15.jpg" id="aladore_15.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_cover.jpg" id="aladore_cover.jpg" media-type="image/jpeg" />
    <item href="Text/cover.xhtml" id="cover.xhtml" media-type="application/xhtml+xml" />
    <item href="Styles/Style0001.css" id="Style0001.css" media-type="text/css" />
  </manifest>
  <spine toc="ncx">
    <itemref idref="cover.xhtml" />
    <itemref idref="title.xhtml" />
    <itemref idref="Section0001.xhtml" />
  </spine>
  <guide>
    <reference href="Text/cover.xhtml" title="Cover" type="cover" />
  </guide>
</package>
If you are unfamiliar with XML it might look like intimidating gibberish, but this example is not too complex if you take it slowly. First, skip down to where it says <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">. This opens the metadata element that provides information describing the book. The two attributes of the metadata element let us know that it follows the Dublin Core schema and the IDPF OPF qualifiers. Dublin Core gives us the basic fields you would expect, such as Title, Author, and Date. IDPF OPF qualifiers give the information more specific semantic meanings for ereading devices. For example, Dublin Core Creator is “Henry Newbolt”, but the OPF attribute says to file the name as “Newbolt, Henry”, since we normally sort authors by their last names.
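To see how a program might read this metadata, here is a minimal Python sketch using the standard ElementTree module. It is only an illustration, not what any particular ereader does. SAMPLE_OPF is a cut-down copy of the Aladore metadata, and the function name read_people is my own invention.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"
NS = {"opf": OPF_NS, "dc": DC_NS}

# A trimmed-down version of the Aladore OPF, metadata only.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookId" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:creator opf:file-as="Newbolt, Henry" opf:role="aut">Henry Newbolt</dc:creator>
    <dc:contributor opf:role="ill">Alice Hylton</dc:contributor>
  </metadata>
</package>"""


def read_people(opf_text):
    """List creators and contributors with their OPF role and file-as values."""
    root = ET.fromstring(opf_text)
    people = []
    for tag in ("creator", "contributor"):
        for element in root.findall(f"opf:metadata/dc:{tag}", NS):
            people.append({
                "name": element.text,
                # Namespaced attributes are looked up as {namespace}name.
                "role": element.get(f"{{{OPF_NS}}}role"),
                "file_as": element.get(f"{{{OPF_NS}}}file-as"),
            })
    return people


for person in read_people(SAMPLE_OPF):
    print(person)
```

The Dublin Core element carries the display name, while the OPF attributes carry the extra hints (role, sort name) that a library app can use.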
The next element, <manifest>, simply lists every file in the package. You can see the XHTML files that are the book’s text, the image files (JPGs), and the CSS style sheet.
Then, we have the spine element. This tells the ereader the correct order to read the files. In this case: first look at the cover, then the title page, then the text! It also points out that the table of contents will be an NCX file. The NCX is part of the EPUB2 spec that provides navigation and accessibility within the book. However, it is not part of the EPUB3 specification.
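Putting the pieces together, a reading app has to follow a small chain: container.xml points to the OPF, the spine lists ids in reading order, and the manifest maps each id to a file. Here is a rough Python sketch of that lookup, assuming a well-formed EPUB2 package. The function name reading_order is hypothetical, and real reading software certainly does more error handling than this.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(epub_path):
    """Return the content files of an EPUB in spine (reading) order."""
    with zipfile.ZipFile(epub_path) as archive:
        # Step 1: container.xml tells us where the OPF "root" file lives.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        package = ET.fromstring(archive.read(opf_path))

    # Step 2: the manifest maps each item id to a file path.
    manifest = {
        item.get("id"): item.get("href")
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
    }

    # Step 3: the spine lists item ids in reading order.
    # Manifest hrefs are relative to the folder that holds the OPF.
    opf_dir = posixpath.dirname(opf_path)
    return [
        posixpath.join(opf_dir, manifest[ref.get("idref")])
        for ref in package.findall("opf:spine/opf:itemref", OPF_NS)
    ]


# Example call (the file name is hypothetical):
# print(reading_order("aladore.epub"))
```

For the draft Aladore OPF, this would give the cover first, then the title page, then the main text file.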
EPUB3 was approved in October 2011 to take advantage of new HTML5 elements, provide better semantic markup, and allow more interactive functionality (scripting). However, support for the full specification continues to be very poor in reading apps and devices. I will write a full post about the issue soon. The draft Aladore EPUB was created using Sigil, which follows the EPUB2 standard (for now).
I have a whole series of posts about polishing the draft with better markup and styling *almost* ready! Coming soon!