Interested in minute EPUB intricacies? Good, you are in for a treat!
One of the steps I mentioned for going from EPUB2 to EPUB3 is removing the toc.ncx file. This is actually a somewhat involved step (and probably an unnecessary one), so I thought I would expand on it a bit. It also gives you a chance to poke around the EPUB innards…
The NCX (“Navigation Control file for XML”) is a feature, borrowed from the DAISY/NISO standard, that enhances navigation and accessibility. EPUB2 required it, and the <spine> element had to point to it through its toc attribute. However, the EPUB 3.0.1 spec declares the NCX superseded (see its “NCX Superseded” section). Instead, you are required to include an EPUB Navigation Document (e.g. nav.xhtml) that makes use of the HTML5 nav element. Basically, you need to set up a <nav> containing an <ol>, whose <li> items hold <a> elements with relative links to parts of the ebook.
Since this file is a valid HTML document, it can be easily rendered by the reading device. Thus the new navigation file can serve as both a human-readable and a machine-readable TOC. You can write the table of contents once in <nav>, show it at the beginning of the book (for people), and let the device use the same file to understand the book's structure and provide extended navigation.
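To make the <nav>/<ol>/<li>/<a> structure concrete, here is a minimal Python sketch that builds a navigation document from a list of entries. The function name build_nav, the labels, and the hrefs are all made up for illustration, and I am assuming the nav file sits in the same directory as the content files so the relative links stay short.

```python
from xml.sax.saxutils import escape, quoteattr


def build_nav(entries, title="Table of Contents"):
    """Return a minimal EPUB3 navigation document as a string.

    entries: list of (label, href) pairs; hrefs are relative to the nav file.
    """
    items = "\n".join(
        f"        <li><a href={quoteattr(href)}>{escape(label)}</a></li>"
        for label, href in entries
    )
    return f"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>{escape(title)}</title></head>
  <body>
    <nav epub:type="toc" id="toc">
      <h1>{escape(title)}</h1>
      <ol>
{items}
      </ol>
    </nav>
  </body>
</html>
"""


# Hypothetical entries; a real book would list its own files here.
nav = build_nav([
    ("Cover", "cover.xhtml"),
    ("Title Page", "title.xhtml"),
    ("Aladore", "Section0001.xhtml"),
])
print(nav)
```

The epub:type="toc" attribute is what marks this particular <nav> as the table of contents for the reading system; everything else is ordinary XHTML.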
So in EPUB3 you need a Nav doc, but do you need to get rid of the NCX? No, not really… The “NCX Superseded” section says that we “MAY” include the NCX for compatibility with EPUB2 reading systems, “but EPUB 3 Reading Systems must ignore the NCX.” In other words, older devices will keep looking for the NCX, while newer ones will ignore it.
So we have an EPUB2 and we create a new Nav-based TOC. The question is: NCX, to keep or not to keep? I really don’t have a good answer. It seems there is no reason not to keep it?
But if you want to get rid of it and build a pure EPUB3, it’s more complicated than just deleting one file. Here’s what you need to do:
- Unzip your EPUB (it is probably already unzipped if you are monkeying around with the EPUB2 to 3 transition) and navigate to the OEBPS directory.
- Delete the toc.ncx.
- Open the content.opf file in a text editor. This is an XML file that defines the ebook.
- Look for the <manifest> element and find an <item> listing for the NCX (easiest just to search for “.ncx”). It should look something like this:
<item href="toc.ncx" id="ncx" media-type="application/x-dtbncx+xml"/>. Delete it!
- Find the <spine> element. It should have a toc attribute that looks like this:
<spine toc="ncx">. Delete the whole attribute, since it is optional in EPUB3, leaving just <spine>.
- Save your cleansed content.opf!
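If you would rather script the content.opf edits than do them by hand, the steps above can be sketched in Python roughly like this. The function name strip_ncx and the trimmed-down sample OPF are my own inventions for illustration. The sketch only changes the parsed tree; I have left out writing the file back because ElementTree rewrites namespace prefixes when it serializes, so you would want to check its output before repackaging.

```python
import xml.etree.ElementTree as ET

OPF = "{http://www.idpf.org/2007/opf}"
NCX_MEDIA_TYPE = "application/x-dtbncx+xml"

# A cut-down, hypothetical content.opf with just the parts we touch.
SAMPLE_OPF = """<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookId" version="2.0">
  <manifest>
    <item href="toc.ncx" id="ncx" media-type="application/x-dtbncx+xml"/>
    <item href="Text/title.xhtml" id="title.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="title.xhtml"/>
  </spine>
</package>
"""


def strip_ncx(package):
    """Remove NCX references from a parsed OPF <package> element, in place.

    Returns the hrefs of the removed manifest items so the caller can
    delete those files from the unzipped EPUB as well.
    """
    removed = []
    manifest = package.find(OPF + "manifest")
    if manifest is not None:
        for item in list(manifest):  # iterate over a copy while removing
            if item.get("media-type") == NCX_MEDIA_TYPE:
                removed.append(item.get("href"))
                manifest.remove(item)
    spine = package.find(OPF + "spine")
    if spine is not None:
        spine.attrib.pop("toc", None)  # the toc attribute is optional in EPUB3
    return removed


package = ET.fromstring(SAMPLE_OPF.encode("utf-8"))
removed = strip_ncx(package)
print(removed)  # ['toc.ncx']
```

Matching on the media type rather than on the file name means the sketch still works if the NCX is not called toc.ncx.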
Wow, it’s been a while since my last post. Sorry! Time to get back on track and get the polished EPUB out there!
I have a series of posts that will deal with editing and styling the draft EPUB into a more finished book. But first, let’s review what an EPUB is:
EPUB is a free and open ebook format maintained by the International Digital Publishing Forum. The current version is EPUB3, but many ebooks (like the Aladore draft) are still provided following the EPUB2 specs. It is built on web standards, with the main content of the book contained in a series of XHTML files. The standard allows for CSS style sheets, images, and other supplemental files to be included. The book’s structure is communicated to ereading devices by the Open Packaging Format (.opf), an XML file that contains metadata, a manifest of the contents, and a “spine” element defining the linear order for reading the content files. All the components of the ebook are wrapped in a zip package given the .epub extension.
For example, the draft Aladore EPUB looks like this unzipped:
The first file, mimetype, simply says “application/epub+zip”, telling the device what the package is.
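Since an EPUB is just a zip, you can check the mimetype file with a few lines of Python. This is only a sketch: check_mimetype is a name I made up, and the demo builds a tiny fake archive in memory rather than opening a real book. Beyond what I said above, it also checks that the mimetype entry is stored uncompressed, which is something the EPUB container rules ask for.

```python
import io
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def check_mimetype(epub_file):
    """Return True if the archive starts with a valid 'mimetype' entry.

    The entry must be first, stored without compression, and contain
    exactly the EPUB media type (no trailing newline).
    """
    with zipfile.ZipFile(epub_file) as zf:
        infos = zf.infolist()
        if not infos or infos[0].filename != "mimetype":
            return False
        first = infos[0]
        return (
            first.compress_type == zipfile.ZIP_STORED
            and zf.read(first) == EPUB_MIMETYPE
        )


# Build a tiny stand-in archive in memory (not a complete EPUB).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "application/epub+zip",
                compress_type=zipfile.ZIP_STORED)
    zf.writestr("META-INF/container.xml", "<container/>",
                compress_type=zipfile.ZIP_DEFLATED)

print(check_mimetype(buf))  # True
```

To try it on a real book, pass the path to the .epub file instead of the in-memory buffer.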
The container.xml file tells the device where to find the “root” file that explains the structure of the book. It will normally reference the content.opf. The OPF is where things start to get more interesting, so let’s take a quick look:
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookId" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:identifier id="BookId" opf:scheme="UUID">urn:uuid:3e795f7d-7913-4181-b82c-3bd18e706507</dc:identifier>
    <dc:creator opf:file-as="Newbolt, Henry" opf:role="aut">Henry Newbolt</dc:creator>
    <dc:publisher>Digital Aladore project</dc:publisher>
    <dc:description>A new ebook text based on digitized versions of the 1914 and 1915 editions of Aladore. Created in 2014 by the Digital Aladore project.</dc:description>
    <dc:contributor opf:role="ill">Alice Hylton</dc:contributor>
    <meta content="0.8.5" name="Sigil version" />
    <meta name="cover" content="aladore_cover.jpg" />
  </metadata>
  <manifest>
    <item href="toc.ncx" id="ncx" media-type="application/x-dtbncx+xml" />
    <item href="Text/title.xhtml" id="title.xhtml" media-type="application/xhtml+xml" />
    <item href="Text/Section0001.xhtml" id="Section0001.xhtml" media-type="application/xhtml+xml" />
    <item href="Images/aladore_01.jpg" id="aladore_01.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_02.jpg" id="aladore_02.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_03.jpg" id="aladore_03.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_04.jpg" id="aladore_04.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_05.jpg" id="aladore_05.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_06.jpg" id="aladore_06.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_07.jpg" id="aladore_07.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_08.jpg" id="aladore_08.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_09.jpg" id="aladore_09.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_10.jpg" id="aladore_10.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_11.jpg" id="aladore_11.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_12.jpg" id="aladore_12.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_13.jpg" id="aladore_13.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_14.jpg" id="aladore_14.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_15.jpg" id="aladore_15.jpg" media-type="image/jpeg" />
    <item href="Images/aladore_cover.jpg" id="aladore_cover.jpg" media-type="image/jpeg" />
    <item href="Text/cover.xhtml" id="cover.xhtml" media-type="application/xhtml+xml" />
    <item href="Styles/Style0001.css" id="Style0001.css" media-type="text/css" />
  </manifest>
  <spine toc="ncx">
    <itemref idref="cover.xhtml" />
    <itemref idref="title.xhtml" />
    <itemref idref="Section0001.xhtml" />
  </spine>
  <guide>
    <reference href="Text/cover.xhtml" title="Cover" type="cover" />
  </guide>
</package>
If you are unfamiliar with XML, it might look like intimidating gibberish, but this example is not too complex if you take it slowly. First, skip down to where it says
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">. This opens the metadata element, which provides information describing the book. Its two attributes (namespace declarations) let us know that the metadata follows the Dublin Core schema and the IDPF OPF qualifiers. Dublin Core gives us the basic fields you would expect, such as Title, Author, and Date. The IDPF OPF qualifiers give the information more specific semantic meanings for ereading devices. For example, the Dublin Core Creator is “Henry Newbolt”, but the OPF attribute says to file the name as “Newbolt, Henry”, since we normally sort authors by their last names.
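To see how the two namespaces work together, here is a small Python sketch that pulls the creator and contributor out of a trimmed copy of the metadata above. The function name people is hypothetical, and the sample keeps only the two elements I want to show.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}
OPF_ATTR = "{http://www.idpf.org/2007/opf}"  # prefix for opf: attributes

# Trimmed from the Aladore OPF; only the name-related elements are kept.
SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <dc:creator opf:file-as="Newbolt, Henry" opf:role="aut">Henry Newbolt</dc:creator>
    <dc:contributor opf:role="ill">Alice Hylton</dc:contributor>
  </metadata>
</package>"""


def people(package):
    """Return (element, display name, sort name, role) for each person listed."""
    found = []
    for tag in ("creator", "contributor"):
        for el in package.findall(f"opf:metadata/dc:{tag}", NS):
            found.append((
                tag,
                el.text,                       # Dublin Core value
                el.get(OPF_ATTR + "file-as"),  # OPF qualifier, may be absent
                el.get(OPF_ATTR + "role"),     # OPF qualifier, e.g. aut, ill
            ))
    return found


result = people(ET.fromstring(SAMPLE))
for row in result:
    print(row)
```

The element names come from Dublin Core (the dc: prefix), while the file-as and role attributes come from the OPF namespace, which is why the lookup needs both.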
The next element,
<manifest>, simply lists every file in the package. You can see the XHTML files that are the book’s text, the image files (JPGs), and the CSS style sheet.
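If you want to see the manifest sorted out by kind of file, a rough Python sketch like the following groups the listed files by media type. The function name files_by_type is made up, and the sample manifest is a shortened version of the one above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"opf": "http://www.idpf.org/2007/opf"}

# Shortened from the Aladore OPF: one image instead of sixteen.
SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item href="toc.ncx" id="ncx" media-type="application/x-dtbncx+xml" />
    <item href="Text/title.xhtml" id="title.xhtml" media-type="application/xhtml+xml" />
    <item href="Text/Section0001.xhtml" id="Section0001.xhtml" media-type="application/xhtml+xml" />
    <item href="Images/aladore_01.jpg" id="aladore_01.jpg" media-type="image/jpeg" />
    <item href="Styles/Style0001.css" id="Style0001.css" media-type="text/css" />
  </manifest>
</package>"""


def files_by_type(package):
    """Map each media type in the manifest to the list of hrefs using it."""
    groups = defaultdict(list)
    for item in package.findall("opf:manifest/opf:item", NS):
        groups[item.get("media-type")].append(item.get("href"))
    return dict(groups)


groups = files_by_type(ET.fromstring(SAMPLE))
for media_type, hrefs in groups.items():
    print(media_type, hrefs)
```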
Then, we have the
spine element. This tells the ereader the correct order to read the files. In this case: first look at the cover, then the title page, then the text! Its toc attribute also points out that the table of contents will be an NCX file. The NCX is part of the EPUB2 spec that provides navigation and accessibility within the book. However, the EPUB3 specification supersedes it.
EPUB3 was approved in 2011 to take advantage of new HTML5 elements, better semantic markup, and allow more interactive functionality (scripting). However, support of the full specification continues to be very poor in reading apps and devices. I will write a full post about the issue soon. The draft Aladore EPUB was created using Sigil, which follows the EPUB2 standard (for now).
I have a whole series of posts about polishing the draft with better markup and styling *almost* ready! Coming soon!
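Here is roughly how a reading system might turn the spine into an actual list of files, sketched in Python. Each itemref in the spine points at a manifest item by id, so the lookup goes spine to manifest to href. The function name reading_order is hypothetical, and the sample is a shortened copy of the Aladore OPF.

```python
import xml.etree.ElementTree as ET

NS = {"opf": "http://www.idpf.org/2007/opf"}

# Shortened from the Aladore OPF: text files and the NCX only.
SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item href="toc.ncx" id="ncx" media-type="application/x-dtbncx+xml" />
    <item href="Text/title.xhtml" id="title.xhtml" media-type="application/xhtml+xml" />
    <item href="Text/Section0001.xhtml" id="Section0001.xhtml" media-type="application/xhtml+xml" />
    <item href="Text/cover.xhtml" id="cover.xhtml" media-type="application/xhtml+xml" />
  </manifest>
  <spine toc="ncx">
    <itemref idref="cover.xhtml" />
    <itemref idref="title.xhtml" />
    <itemref idref="Section0001.xhtml" />
  </spine>
</package>"""


def reading_order(package):
    """Return (list of hrefs in spine order, href of the NCX or None)."""
    # Index the manifest by id so spine references can be resolved.
    hrefs = {
        item.get("id"): item.get("href")
        for item in package.findall("opf:manifest/opf:item", NS)
    }
    spine = package.find("opf:spine", NS)
    order = [hrefs[ref.get("idref")] for ref in spine.findall("opf:itemref", NS)]
    toc_id = spine.get("toc")  # absent once the NCX has been removed
    toc_href = hrefs.get(toc_id) if toc_id else None
    return order, toc_href


order, toc_href = reading_order(ET.fromstring(SAMPLE))
print(order)
print(toc_href)
```

Note that the order comes from the spine, not from the order of the items in the manifest; the cover is listed last in this manifest but read first.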