Google Docs EPUB

Some news on the EPUB creation front:

Google Docs just enabled a feature to export as EPUB! To use it, simply open the doc and look under File > Download as > EPUB Publication.

This is a handy and very easy method to create an ebook. However, the consistency and quality aren’t good. The markup it creates is downright bizarre, with tons of unnecessary <span> tags and strange CSS. It also does not create a cover. In theory you could open this Google Docs EPUB with Sigil and do some polishing up, but given how unnecessarily complex the markup is, it would be more work than starting fresh.
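To give a rough sense of what "polishing up" that export involves, here is a minimal Python sketch (not a feature of Google Docs or Sigil) that counts and unwraps every <span> tag with a regular expression, keeping the text inside. The class names in the sample (c1, c2, c3) are placeholders, not necessarily what Google Docs emits. Note the catch: some of those spans carry real formatting, such as italics, through their classes, so blindly unwrapping them loses information. That hints at why starting fresh can be the easier path.

```python
import re

# Matches opening tags like <span class="c1"> and closing </span> tags.
# A regex is only adequate for a quick cleanup pass, not general HTML parsing.
SPAN_TAG = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)


def unwrap_spans(html: str) -> str:
    """Remove every <span ...> and </span> tag, keeping the text inside."""
    return SPAN_TAG.sub("", html)


def count_spans(html: str) -> int:
    """Count opening <span> tags, a rough measure of markup clutter."""
    return len(re.findall(r"<span\b", html, re.IGNORECASE))


if __name__ == "__main__":
    # Hypothetical export snippet; real class names may differ.
    sample = '<p class="c3"><span class="c1">THEN</span><span class="c2"> Ywain turned</span></p>'
    print(count_spans(sample))
    print(unwrap_spans(sample))
```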

So if you need a super quick EPUB for some reason, just click the “Download as” option. Otherwise, stick with the tools that provide better markup results, such as Writer2ePub and Sigil.

DigitalAladore 1.5, EPUB3 Edition!

Is DigitalAladore 1.0 looking crummy on your ultra-high-def 10-inch tablet screen?

Well, give DigitalAladore 1.5 a try! Following the workflow outlined in previous posts, I generated an Aladore EPUB3 edition. The images are much bigger and the CSS is slightly tweaked with larger screens in mind. Personally, I still find reading ebooks on tablets a bit unsatisfying, slightly too big and bright. But I think this version will look pretty good! However, at over 9MB it might be slow to load on your e-ink reader.

So without further ado, you can find the new EPUB3 at Internet Archive:

DigitalAladore 1.5: Aladore, by Henry Newbolt (1914, epub3), https://archive.org/details/AladoreHenryNewbolt3

New Aladore EPUB!

If you have been following along, all that prototyping, testing, and tweaking eventually brings us to a NEW Aladore EPUB! I am calling it DigitalAladore 1.0, because there might be more versions to come (for example, a conversion to the EPUB3 standard)…

This is an EPUB2 file that should render well on dedicated e-ink readers for a high-quality reading experience. The text is much better than the auto-generated editions I encountered at the beginning of this project (here is one of the source editions on Internet Archive, with a crummy PDF and epub available). We have done a lot of work to go beyond the first Digital Aladore draft edition. The images are nicer, the underlying markup is sensible, the metadata is complete, and the epub package is put together correctly. And we did it all with Free software.
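What does "put together correctly" mean at the package level? An EPUB is a ZIP archive with one strict rule at the front: a file named mimetype, containing exactly application/epub+zip, must be the first entry and must be stored uncompressed, followed by META-INF/container.xml pointing at the package document. Sigil handles this automatically; the Python sketch below only illustrates the layout. The OEBPS/content.opf path and the package_epub name are illustrative assumptions, and a real book also needs a complete OPF (metadata, manifest, spine) plus its content files.

```python
import io
import zipfile

# container.xml tells the reading system where the package document lives.
# The OEBPS/content.opf path is an assumption (a common Sigil layout).
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def package_epub(files):
    """Zip book files into EPUB bytes.

    files maps archive paths to str/bytes content, for everything except
    mimetype and META-INF/container.xml, which are added here.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        # mimetype must be the first entry and stored without compression.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=zipfile.ZIP_DEFLATED)
        for path, data in files.items():
            zf.writestr(path, data, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()
```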

This is a major milestone for Digital Aladore, but I still have more to say (of course). For example, I uploaded the new epub to Internet Archive, which I think is an amazing resource: we need to talk more about free distribution and the public domain. Let's save that for another day! For now:

DigitalAladore 1.0: Aladore, by Henry Newbolt (1914), https://archive.org/details/AladoreHenryNewbolt

Take a Peek at the Markup!

A few posts ago I outlined the improvements to the underlying markup of the text, moving us beyond the draft ebook. Let's take a concrete look at what this means.

Here is what Chapter 3 of the draft Aladore EPUB looked like:

<h2 id="sigil_toc_id_3" style="text-align: center;">CHAPTER III.<br />
HOW IT FORTUNED TO YWAIN TO FIND A STAFF IN THE PLACE OF HIS SWORD.</h2>

<p>THEN Ywain turned his face towards the village, and went down the hill: and he went with a good heart, for though the boy had left him, yet he hoped not to be long without him, and even now when he looked straight forward it seemed that he had the joy of his company and his laughter. But when he turned and looked beside him, there was but his own shadow; black it lay and long, and about the edges of it a brightness was shining. Then he remembered that the sun was low and night rising among the hollows, and he bethought him of his supper and sleep.</p>

<p>So he went quickly to the village, and passed through it and came to the farmer's house that lay beside the great wood: and the farmwife gave him welcome, as one that knew not who he was, but could well pitch her guess within a mile or so. And she whispered to her husband, but he was hard of hearing and full of slumber from the fields. So when Ywain had supped, they showed him where he should lie. And when he was come there he laid him down, and the day went from him in a moment and he knew no more whether he were alive or dead.</p>

The new version of the markup looks like this:

<div class="chapter">
<h2 class="chapterHeading"><span class="chapterNumber">CHAPTER III.</span><br />HOW IT FORTUNED TO YWAIN TO FIND A STAFF IN THE PLACE OF HIS SWORD.</h2>

<p class="firstParagraph">THEN Ywain turned his face towards the village, and went down the hill: and he went with a good heart, for though the boy had left him, yet he hoped not to be long without him, and even now when he looked straight forward it seemed that he had the joy of his company and his laughter. But when he turned and looked beside him, there was but his own shadow; black it lay and long, and about the edges of it a brightness was shining. Then he remembered that the sun was low and night rising among the hollows, and he bethought him of his supper and sleep.</p>

<p>So he went quickly to the village, and passed through it and came to the farmer's house that lay beside the great wood: and the farmwife gave him welcome, as one that knew not who he was, but could well pitch her guess within a mile or so. And she whispered to her husband, but he was hard of hearing and full of slumber from the fields. So when Ywain had supped, they showed him where he should lie. And when he was come there he laid him down, and the day went from him in a moment and he knew no more whether he were alive or dead.</p>

Note the "chapter" div, the "chapterHeading" class on the heading, the "chapterNumber" span, and the "firstParagraph" class. Here is the new CSS relevant to this selection:

body {
  font-family: Georgia, serif;
  margin-left: 1.1em;
  margin-right: 1.1em;
}

h2.chapterHeading {
  font-family: Georgia, serif;
  font-size: 1.15em;
  line-height: 1.6;
  text-align: center;
  margin-top: 5em;
  margin-left: 3em;
  margin-right: 3em;
}

span.chapterNumber {
  font-size: 1.35em;
  letter-spacing: 0.1em;
  line-height: 2.5;
}

p {
  text-indent: 1.2em;
  line-height: 1.5;
  margin-top: 0em;
  margin-bottom: 0em;
}

p.firstParagraph {
  text-indent: 0;
  margin-top: 0.5em;
}

Which should get us to something that looks like this:

Chapter 3 rendered by the Readium app.

The Art & Practice of Typography

It's been a long and busy summer, but nothing much has happened at Digital Aladore. Far too long since the last posts! There are really only a few more to go before the project can wrap up and release the final ebook to the world. If only I can find the time…

If you need some heavy reading in traditional typography, check out Edmund G. Gress, The Art & Practice of Typography, digitized by the Smithsonian:

It is amusing to download the EPUB: why, oh why was this created? Not only is the OCR appalling on all the strange fonts and columns, but the EPUB misses the point of showing off the art of typography!

First, the OCR could be vastly improved with just a few tiny edits. For example, the first paragraph of the preface reads:

IN the preface to the first edition of “The Art and Practice of Typograpliy,” the author stated that he did not “anticipate again having tlie pleasure of producing a book as elaborate as tliis one,” but the favor witli wliich tlie volume was received made anotlier edition advisable

It takes one human glance to realize that ABBYY is misreading “h” as “li”, something a computer could also catch with a simple spelling dictionary. (Readers of Digital Aladore, of course, could fix this file up in no time!)
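As a rough illustration of that dictionary idea, here is a minimal Python sketch, assuming a word list is available (a tiny hand-made set stands in for it here). For each word that is not in the dictionary, it tries turning each “li” into “h” and keeps the first variant that is a real word. The function names are hypothetical, and this is not how ABBYY or the Internet Archive pipeline works. A real cleanup would need a full dictionary: with a sparse one, a genuine word like “lie” could be “corrected” to “he”.

```python
import itertools
import re


def li_variants(word):
    """Yield every spelling of word with each "li" either kept or changed to "h"."""
    parts = word.split("li")
    for choice in itertools.product(("li", "h"), repeat=len(parts) - 1):
        rebuilt = parts[0]
        for joiner, part in zip(choice, parts[1:]):
            rebuilt += joiner + part
        yield rebuilt


def repair_li(text, dictionary):
    """Replace misread words whose "li" -> "h" variant is in the dictionary."""
    def fix(match):
        word = match.group(0)
        if "li" not in word or word.lower() in dictionary:
            return word  # nothing to repair, or already a real word
        for candidate in li_variants(word):
            if candidate.lower() in dictionary:
                return candidate
        return word  # no dictionary match: leave it for a human
    return re.sub(r"[A-Za-z]+", fix, text)


if __name__ == "__main__":
    words = {"the", "this", "with", "which", "another", "typography", "pleasure"}
    ocr = "tlie pleasure of producing a book as elaborate as tliis one"
    print(repair_li(ocr, words))
```

Trying every combination of kept and replaced “li” pairs, rather than replacing all of them at once, protects words that contain both a genuine “li” and a misread one.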

Meanwhile, the varied examples of type are reduced to this single stylesheet:

body {
  font-family: "Palatino Linotype", "Book Antiqua", Palatino, Georgia, "Times New Roman", serif;
}
h1, h2, h3, h4 {
  font-family: "Palatino Linotype", "Book Antiqua", Palatino, Georgia, "Times New Roman", serif;
}
p {
  font-family: Georgia, "Palatino Linotype", "Book Antiqua", Palatino, "Times New Roman", serif;
}
img {
  display: block;
  text-align: center;
  margin: 1em auto;
}

Here is an amusing example: page 170,

The Art & Practice of Typography, page 170.

is reduced by “ABBYY to EPUB” to this:

<div class="newpage" id="page-170"/>
<p> THE ART AND PRACTICE OF TYPOGRAPHY</p>
<p> EXAMPLE 465</p>
<p> Evolution of Roman lower-case type-faces. (A) Pen-made Roman capitals. (B) Development into Minuscules or lower-case thru rapid lettering. (C) Black Letter or German Text developed from Roman Uncials. (D) White Letter, the open, legible Caroline Minuscules, on which Jenson based his Roman type-face of 1470. (E) A recent typeface closely modeled on Jenson s Roman types. (F) Joseph Moxon's letters of 1676. (G) Caslon s type-face of 1722</p>
<p> The face first selected—and witlioiit Iicsitatioii—was foundries and tliat are available for niacliine composition.</p>
<p> Caslon Oldstyle as originally designed. Scotcli Roman was It may be well to inject liere a warning that most so-called</p>
<p> the second selection, Cheltenham Oldstyle the tliird, Clois- Caslon Oldstylcs are not as good as the one selected (Ex-</p>
<p> ter Oldstyle tlie fourth, Bodoni Book the fifth, and French amjile KiT-B) ; that Jenson Oldstyle is inferior to Clois-</p>
<p> Oldstyle the sixtli. (All shown in Example 4(57-) ter Oldstyle (Example I(i7-A) as a re])resentative of the</p>
<p> Type-faces designed and cut for j^rivate use were not original Jenson type. However, good representatives of</p>
<p> considered in making these selections, as it was believed Scotch Roman (Example 'KiT-D) are obtainable under the</p>
<p> best to adhere to type-faces that are procurable from most name of Wayside, of National Roman, etc.</p>
<p> ABCDEFGH IJ K L M N O P O R S T U V WX YZ</p>
<p> a h c cl e f g h i j k I m n o ]:&gt; q r s t u v w x y z How it appears assembieel</p>
<p> (A) Modernized Oldstyle, the Miller &amp; Richard type-face of about 1852</p>
<p> ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdef ghi jklmnopqrstuvwxyz    How it appears assembled</p>
<p> (B) Century Expanded, the Benton "modern" type-face of 1901</p>
<p> EXAMPLE 466</p>
<p> Two standard type-faces that rate high in legibility, but that are colorless in the mass and lacking in the pleasing irregularities of form that characterized Roman type-faces before the nineteenth century. The various qualities of legibility found in Modernized Oldstyle have been converted to narrower letter shapes and more "modern " form in Century Expanded</p>

Which means it looks something like this on your ereader:

Hard on the eyes and meaningless!

The example illustrates the issues we have been dealing with at Digital Aladore: ebooks are awesome, but how can we bring the craft back into publishing?

CSS for Aladore

With the new and improved markup in place (detailed in the last post), I started prototyping CSS to style the ebook.  After exploring many variations, I decided to approximate the style of the original print edition.

First page of Chapter 3.

For example, in the print edition the chapter headings appear about one third of the way down a new page. This white space reinforces the break and gives room for the large headings. I followed the same pattern, adding a margin-top to the <h2> that marks the chapter headings. Looking more closely, you can see in the print edition that, compared to the chapter titles, the chapter numbers are slightly larger, have slightly exaggerated letter spacing, and have a bit of extra line space. I approximated this in the CSS with font-size: 1.35em, letter-spacing: 0.1em, and line-height: 2.5.
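Because em units compound, the effective sizes are larger than the individual numbers suggest. Here is a small Python sketch of the arithmetic, assuming a hypothetical 16 px base font size (reading systems pick their own default, and readers can change it). The chapter number ends up at 1.15 × 1.35 ≈ 1.55 times the body text, and the heading’s margin-top: 5em is measured against the heading’s own enlarged font, not the body’s. The chapter_heading_sizes name is purely illustrative.

```python
def chapter_heading_sizes(base_px=16.0):
    """Resolve the chapter heading CSS values to pixels for a given body size."""
    h2_font = base_px * 1.15             # h2.chapterHeading { font-size: 1.15em }
    h2_margin_top = h2_font * 5          # margin-top: 5em uses the h2's own font size
    number_font = h2_font * 1.35         # span.chapterNumber { font-size: 1.35em }
    number_line = number_font * 2.5      # unitless line-height: 2.5 scales with the span
    number_tracking = number_font * 0.1  # letter-spacing: 0.1em
    return {
        "h2 font-size": h2_font,
        "h2 margin-top": h2_margin_top,
        "number font-size": number_font,
        "number line-height": number_line,
        "number letter-spacing": number_tracking,
    }


if __name__ == "__main__":
    base = 16.0  # hypothetical default body size in px
    for name, px in chapter_heading_sizes(base).items():
        print(f"{name}: {px:.2f}px ({px / base:.3f} x body text)")
```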

Chapter 3 rendered by the Readium app.

To choose a font, I balanced three criteria: what is commonly available, what is considered good for reading, and what resonates with the original print. I selected Georgia because it seems very close to the print edition, is very readable, and is commonly available on reading devices. The common sans-serif fonts, such as Arial, look too modern and utilitarian for Aladore.
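The fallback behavior behind font-family: Georgia, serif can be modeled roughly as follows. This is a simplified Python sketch, assuming a reading system that checks each family in order and always honors a generic family such as serif. Real renderers also fall back character by character, and some ereaders substitute their own fonts entirely. The point is that a device without Georgia still lands on a serif face rather than a sans-serif one. The device font sets below are hypothetical.

```python
GENERIC_FAMILIES = {"serif", "sans-serif", "monospace", "cursive", "fantasy"}


def resolve_font(stack, installed):
    """Return the first family in a CSS font stack that the device can use.

    Simplified model: a named family is usable if installed; a generic
    family always resolves to whatever default the device maps it to.
    """
    for family in stack:
        if family in installed or family.lower() in GENERIC_FAMILIES:
            return family
    return None  # nothing matched: the reading system's own default applies


if __name__ == "__main__":
    aladore_stack = ["Georgia", "serif"]
    print(resolve_font(aladore_stack, {"Georgia", "Arial"}))
    print(resolve_font(aladore_stack, {"Arial"}))
```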

After creating a full set of CSS for the content, I added the stylesheets to the EPUB using Sigil. Since the title page and TOC use totally separate styling from the rest of the content, I gave them their own CSS. This minimizes the file size and simplifies things overall.

We tested this prototype on the three reading devices. Readium on the laptop was the only one that rendered all of the CSS styling. The Nook and Sony Reader did not support some of the details, such as letter-spacing.  The testing quickly identified issues to tweak, setting off repeated rounds of prototyping.

Readium was especially helpful for this process, since it did not involve sending the file to a separate device. However, it is not much different from looking at the XHTML in a narrow browser window, and it has the widest CSS support of any current reading app, so I felt it was not a true test of the ebook. One issue Readium did have was splitting the illustrations in two. Both Nook and Sony automatically limit images to the height of one screen, breaking the text before and after them. Image handling is complex and frustrating, since CSS properties such as page-break-inside: avoid are not well supported. One approach could be to do the equivalent of what the print edition did when inserting the plates: put each illustration in its own XHTML file, ensuring that there is a page break before and after the image. This would decrease the chance that devices break the image in half, although it would not guarantee it! Still, I am not satisfied with the solution in the old print edition: it's kind of an ugly way to integrate the illustrations, and the same goes for ebooks.
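The one-illustration-per-file idea could be scripted rather than done by hand. Below is a minimal Python sketch that builds a bare EPUB2-style XHTML page holding a single image; since reading systems generally start each spine item on a new page, the image gets a break before and after it. The illustration_page name, the image path, and the "illustration" class are hypothetical, and, as noted above, this lowers the chance of a split image rather than guaranteeing against it.

```python
from xml.sax.saxutils import escape, quoteattr

PAGE_TEMPLATE = """<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>{title}</title>
</head>
<body>
<div class="illustration">
<img src={src} alt={alt} />
</div>
</body>
</html>
"""


def illustration_page(image_href, alt_text, title="Illustration"):
    """Build a standalone XHTML file containing one illustration."""
    return PAGE_TEMPLATE.format(
        title=escape(title),
        src=quoteattr(image_href),  # quoteattr escapes and adds the surrounding quotes
        alt=quoteattr(alt_text),
    )


if __name__ == "__main__":
    # Hypothetical file name; each page also needs manifest and spine entries in the OPF.
    print(illustration_page("../Images/plate01.jpg", "Ywain in the great wood"))
```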

After some more testing, we settled on one version (for now); I will show you some details in the next post.

 

Ebook Design Challenges…

Designing an ebook is complicated: it's somewhere between print and web design, but presents many unique challenges. In this section, 6.5 Ebook Design, I want to explore the design process for the final Digital Aladore EPUB edition.

In a few earlier posts, I touched briefly on optimizing the EPUB for reader devices by creating smaller XHTML files, cleaner markup, and reasonable image sizes. However, up to this point we haven’t looked closely at styling the ebook. While the draft version of the Aladore EPUB is usable and readable (an improvement over the gibberish-filled, automatically generated ebook or the overly compressed PDF provided at Internet Archive), a book is more than raw text. The designer must decide how to present the content most effectively. The format, layout, typography, and design elements are influenced by practical and aesthetic concerns, and informed by the user model.

In the draft Aladore EPUB, there is no CSS or inline styling, so the presentation is left up to the defaults of the reading app rendering the book. These defaults can be wildly different, with even less consistency than web browsers. For example, some readers, such as Nook, automatically indent paragraphs and add a slight margin from the edge of the screen. Others do not, leaving the text uncomfortably close to the edge. This hints at the challenges of designing and styling an ebook: the file needs to be flexible enough to meet user needs and expectations in a highly diverse and non-standardized environment.

Ebooks are used on many device types, including e-ink readers, tablets, phones, and computers. This presents several hardware challenges:

  • Screen sizes vary from tiny to huge. An ebook perfectly styled for the average 6” e-ink screen can look strange on a 10” HD tablet. In general, styling needs to be based on relative measures, not specific dimensions.
  • Screen refresh rates vary considerably, since dedicated ereaders can save battery with minimal refresh rates. However, these low refresh rates often cause interactive features to produce strange looking artifacts or not function well.
  • Input methods vary: devices support interacting with the document via several different means. Some older ereaders have only a few hard buttons, allowing very minimal interaction with the text other than turning pages. Newer ereaders and tablets have touch screens only, which allow a more tactile, gesture-based interaction. Computers are mostly mouse and keyboard focused, which allows more detailed input but creates a considerably different experience of interacting with the text.
  • Hardware specs vary considerably. E-readers tend to have low-end processors and small memory to maximize battery life, so any rendering that requires heavy processing or loading large files becomes cumbersome.
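To make "relative measures, not specific dimensions" concrete, here is a rough Python sketch that rewrites fixed pixel values in a stylesheet as ems. It assumes a hypothetical 16 px base size. Because em values are resolved against the font size in effect for each element, the conversion is only exact where that size matches the assumed base. The px_to_em name is illustrative, not part of any EPUB tool.

```python
import re

# A number (optionally with decimals) immediately followed by "px".
PX_VALUE = re.compile(r"(\d+(?:\.\d+)?)px\b")


def px_to_em(css, base_px=16.0):
    """Rewrite every Npx length in a CSS string as an em value."""
    def convert(match):
        ems = float(match.group(1)) / base_px
        return f"{ems:g}em"  # :g drops trailing zeros, e.g. 5.0 -> "5"
    return PX_VALUE.sub(convert, css)


if __name__ == "__main__":
    fixed = "h2 { margin-top: 80px; margin-left: 48px; } body { margin-left: 18px; }"
    print(px_to_em(fixed))
```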

These issues are compounded by a number of software based challenges, since ebooks must be rendered by a reading application. Users are aware that they have a choice in what web browser they use and that there is slight variety in how each browser renders content. Designers use a variety of techniques, such as browser CSS prefixes, to create markup that will render consistently in the different browser engines.

Dedicated ereaders typically have a built-in reading app with a specialized and proprietary rendering engine developed by the device maker (often based on Adobe Digital Editions). Since the app is essentially hidden, users are not really aware of it, and there is very little information available about the different engines. Tablets can install independent reading apps, so the user can choose one based on the functionality and look they want (or, more often, the commercial ecosystem they buy from). However, since these rendering engines are more niche, less standardized, and less well known than web browser engines, there are not as many techniques for designers to automatically account for differences. Furthermore, results from EPUBTest demonstrate that support for the full EPUB specification varies considerably and is low overall (if you want to try it yourself, a set of epub3 books can be downloaded to systematically test devices for functionality and feature support). Because of these challenges, there are few specific guidelines for styling and formatting ebooks.

Ebook UX

The main objective of this section of the Digital Aladore project is to create a polished EPUB2 edition optimized for ereader devices.

Anytime we are designing something, from websites to houses, we should be thinking about the end users. How will the features of the object support an enjoyable and efficient user experience? Even in a very informal design context (such as here in the one man Digital Aladore UX Design Center) our products are molded by how we imagine the user–in this case, the potential readers.

My user model is based on my own experience (as an avid reader of early fantasy novels and a user of an ereading device), informal feedback from friends, and MobileRead forum content. I imagine users of the Aladore ebook will be focused on reading the novel in a linear fashion, yet are not interested in strict pagination (like a PDF). Instead they would value clear, readable text that reflows and can be customized to suit their particular device. However, they also do not want to be distracted by the technology or innovative features of the ebook. They bring expectations of the genre based on experience reading print novels, and they expect the design of the book to reflect the character of the content, or at least not clash with it. For example, take a look at Middlemarch using the Terminal CSS style from EPUB Zen Garden:

Terminal CSS, EPUB Garden.

EPUB Zen Garden was inspired by CSS Zen Garden, but unfortunately the site is no longer live (the designs are reproduced in the epub-zen project on GitHub). A user who selects the Terminal style is likely looking for amusing novelty, not efficient reading or the traditional experience of the novel. The green-screen style is unexpected, distracting, and changes the atmosphere. Furthermore, this style would not work on a greyscale e-ink reader.

Designing for ebooks is complicated by the fact that users will have a diverse mix of devices for reading. Since the function of dedicated ereaders is fairly simple, users replace them less frequently than phones or tablets. I imagine users, such as myself, with e-ink readers that have lower-end specs from five or more years ago. Since a focus of Digital Aladore is openness and sharing, the ebook should not exclude these users. This suggests the need for a flexible design that supports a wide range of screen sizes from phone to tablet, screen types from e-ink to retina display, and hardware specs. Since devices with larger screens and more powerful processing can use PDF versions of Aladore, this project focuses on optimizing the ebook for e-ink readers. I imagine users having devices such as Sony Reader, Kobo Touch, or Nook Simple Touch in addition to newer e-ink models.

Unfortunately, ereading devices and applications have very inconsistent support for the EPUB specification, which means it is hard to create designs that will behave consistently. Some devices ignore all CSS, only render a small subsection of the styles, or override them by default. Most devices also allow users to change style settings such as font and font size. For example, the native Reader app on Barnes & Noble Nook overrides any included styles by default, setting its own font, font size, line spacing, text-indent, and margins. The user must open the options and toggle “Publisher Defaults” on to view the ebook with its built in styles.  This means even if you spend time creating the perfect XHTML and CSS markup, the reader may never see your design!

While there are significant challenges to presenting a consistent representation of the document to users, good design of the ebook will still support user choice, accessibility, and usability.

Solid markup creates a sustainable and flexible product that can “play nicely” with the world of web standards. In a presentation titled “A Cautionary Tale About Poor Ebook Markup” (Ebookcraft 2014), Liza Daly gave examples of common bad practices in commercial ebook markup, such as using attributes to create elements that already exist and adding tables as images rather than as HTML. These habits not only make the book less accessible to services such as text-to-speech and enhancements for the visually impaired, but also limit its discoverability on the web. Although most reading devices do not support the semantic markup possibilities of EPUB3, Google does. Machines are already using the full markup to “understand” content for better search, indexing, and creating snippets.

So while the main objective of Digital Aladore is to create a good reading edition of Aladore for users of e-ink devices, I also hope to create a flexible and reusable text that efficiently utilizes standards to be more human and machine readable. This will ensure the ebook is usable (and enjoyable) for a wide variety of current and future readers.

 

 

Draft EPUB released!

Since I have been holding the Digital Aladore world in suspense for too long, I decided to release a draft version of the EPUB.  I uploaded it to the Internet Archive Community Texts collection for easy distribution:  https://archive.org/details/AladoreNewbolt

Draft cover image.

This version of the EPUB is minimally formatted. The cover is pretty ugly. And there are no stylesheets, so it won’t look very fancy. But it has the most up-to-date edited text, all the images, and it works! So enjoy!

P.S. I also uploaded the plain text version to the Internet Archive page.