Since Digital Aladore is more than a year old (see The Idea), I thought I should check in with a few of the key tools for any news. First up is Sigil Ebook editor, used for creating the various EPUB versions. As I have said many times, it is a great tool! There are a few features like the character report and auto merging html that I wish were in my everyday text editor.
After a scary period where it looked like development on Sigil might stall, I am happy to see it surge back to an active project full of interesting changes. This week version 0.9.1 was released stabilizing a host of new features moving the application towards full EPUB3 support. Also be sure to check through the Plugin Index to find many useful extensions for the editor.
Creating an editor that supports both EPUB2 and 3 is a bit complicated. As I mentioned in an earlier post, older versions of Sigil automatically correct markup and packaging to match the EPUB2 standard. To fix this issue, version 0.9.1 replaces Xerces (xml parser) and Tidy (html parser) with Python lxml and Google Gumbo, and makes the FlightCrew EPUB2 validator a plugin rather than built in tool.
Despite the major overhaul under the hood, using Sigil remains almost unchanged, which is great. So thank you to current maintainers Kevin Hendricks and Doug Massay and everyone else who makes this Free and open tool available!
Check out the code or get the latest version at Github.
Interested in minuet EPUB intricacies? Good, you are in for a treat!
One of the steps I mentioned for going from EPUB2 to EPUB3 is removing the toc.ncx file. This is actually a some what involved step (that is probably unnecessary) so I thought I would expand on it a bit. It also gives you a chance to poke around EPUB innards…
The NCX, i.e. “Navigation Center eXtended” was a feature to enhance navigation and accessibility based on the DAISY/NISO Standard. It was a required Spine element in EPUB2. However, the EPUB 3.0.1 spec that tells you the NCX is Superseded. Instead you are required to include an EPUB Navigation Document (i.e. nav.xhtml) that makes use of the HTML5 nav element. Basically you need to set up a <nav>, with <ol> inside, with <li> that have <a> relative links to parts of the ebook.
Since this file is a valid HTML document, it can be easily rendered by the reading device. Thus the new navigation file can serve both as a human and machine readable TOC. You can write the table of contents once in <nav>, use it at the beginning of the book (for people) and for the device to understand the reading order of the digital files to provide extended navigation.
So in EPUB3 you need a Nav doc, but do you need to get rid of NCX? No, not really… NCX Superseded says that we “MAY” include the NCX since it will not interfere with anything, “but EPUB 3 Reading Systems must ignore the NCX.” I.e. older devices will keep looking for NCX, but newer ones will definitely not.
So we have an EPUB2, we create a new Nav based TOC, the question is NCX to keep or not to keep… I really don’t have a good answer. It seems there is no reason to not keep it?
But if you want to get rid of it, building a Pure EPUB3, its more complicated than just deleting one file. Here’s what you need to do:
- Unzip your EPUB (it is probably already unzipped if you are monkeying around with the EPUB2 to 3 transition), navigate to the OEBPS directory.
- Delete the toc.ncx.
- Open the content.opf file in a text editor. This is an XML that defines the ebook.
- Look for the <manifest> element and find an <item> listing for the NCX (easiest just to search for “.ncx”). It should look something like this:
<item href="toc.ncx" id="ncx" media-type="application/x-dtbncx+xml"/>. Delete it!
- Find the <spine> element. It should have a TOC attribute that looks like this:
<spine toc="ncx">. Delete the whole attribute as it is optional in EPUB3, leaving
- Save your cleansed content.opf!
DigitalAladore 1.0 is a valid EPUB2. To recap: EPUB was chosen for the ebook because it is a free and open format built on open web standards (in contrast to proprietary formats such as Kindle AZW). And we love Free because of the many practical benefits of open source development plus the moral ideals of respecting the user’s freedom.
The EPUB2 standard was first released in 2007, but has since been superseded by EPUB3 released in October 2011. EPUB3 was designed to take advantage of new elements introduced in HTML5 and allow more interactive functionality (script). However, support of the full specification continues to be very poor. The only readers with full support seem to be commercial apps that deliver interactive books in a closed ecosystem. For example, AZARDI offers a cost-free reading app that has good support of advanced features of EPUB3, but it is focused on secure “content fulfillment” of interactive textbook subscriptions. To publish to the platform, authors must use their proprietary ebook creation application. Kobo and Apple have developed tweaked versions of EPUB3 that do not fully comply with the standard and focus on the possibilities for improved DRM, rather than functionality not found in EPUB2.
However, for simple functionality (i.e. a linear novel) EPUB3 is supported by most reading devices. I decided to update the Aladore EPUB2 to an EPUB3 version for future compatibility, higher specs, and improved semantic inflection. Guidelines now suggest adding larger images and cover images than I used in the EPUB2 to ensure they don’t look terrible on HD tablets. So while DigitalAladore 1.0 was optimized for older e-ink ereaders, the EPUB3 version will be optimized for larger, more powerful devices.
However, Sigil does not currently support the creation of ebooks following the EPUB3 spec. If you make changes to the markup following EPUB3, Sigil will actually correct them back to EPUB2 when saving the file. So, to create the Aladore EPUB3 we have to do a few extra steps:
- Replace all the image files with larger versions using Sigil.
- Use the Sigil plugin ePub3-itizer to export a pseudo EPUB3. Sigil developers intend to implement full EPUB3 creation and editing support soon, so this plugin is considered a “stop-gap measure.” It changes the HTML headers, restructures a few files, and adds the nav.xhtml.
- Unzip the ePub3-itizer output to edit the contents. Because Sigil limits the markup to XHTML valid to the EPUB2 spec, it is not possible to add HTML5 tags such as section or EPUB3 attributes such as epub:type (thus, it is what I call a pseudo EPUB3). I used the IDPF Accessibility Guidelines (The epub:type attribute) plus the attribute vocab EPUB 3 Structural Semantics Vocabulary to add some semantic structure to the text. This markup can be used for styling the document with CSS, but is also useful for machine processing and accessibility options. You can mark up sections of the ebook (frontmatter, body, backmatter), divisions within (abstract, chapters), types of content (footnote), or individual elements (title). I added div tags with attributes in the EPUB2 which I converted to section tags, for example, each <div class=”chapter”> became <section epub:type=”chapter”>. I used these epub:type values: cover, titlepage, chapter, epigraph, toc, and loi. Since I made each chapter a single XHMTL file, another option would be to add the epub:type attribute to the body element. However, those attributes would be lost if merging the HTML, so I prefer the section tags.
- Delete the toc.ncx file. This file was used by older reading devices to provide navigation functionality, but it is not part of the EPUB3 spec as it is replaced by nav.xhtml. However, many people seem to be leaving this file in the EPUB for legacy support. If you leave it, everything should work fine, but the file will NOT fully validate.
- Re-Zip the new EPUB3. EPUBs need to be zipped in the correct order or they will not function. This means you must create the zip archive first (in Windows right click somewhere and choose New > Compressed (zip) Folder), then add the mimetype file (drag it into the new zip folder). Then all the rest of the content can be added. Finally, change the extension from .zip to .epub.
- Validate with the IDPF EPUB Validator.
The sketchyTech blog talks about the differences created by this process in more detail if you want to hear from some one else…
But basically, that’s it! Not too complicated, although it requires some thought about 1) the quality of images to include, 2) changes to styling with larger screens in mind, and 3) consideration of semantic inflection to provide better accessibility and machine readability. I will post the new Aladore EPUB3 soon!