Visualizing Aladore the Book

Recently, images of Jason Shulman’s “Photographs of Film” project came across my Twitter feed. I love long exposure photography, so I was really enjoying the idea, but it also brought up the controversy surrounding his exhibit about a year ago. The issue is explained by researcher Kevin L. Ferguson in “To Cite or to Steal? When a Scholarly Project Turns Up in a Gallery”.

What is most interesting to me is Ferguson’s insistence on openness about his methods and techniques, versus Shulman’s deliberate secrecy. Readers of Digital Aladore will not be surprised that despite many years working as a studio artist, I sympathize with the openness!

Ferguson’s article included a handy mini-tutorial about ImageJ–which set me off to find some old images that I never shared on this blog. Early in the project, I covered using ImageJ to preprocess Aladore’s page images (Preprocessing 2). However, you can do many more interesting and beautiful things with ImageJ…

Like visualize the entire text block at once:

I find this image oddly compelling. It had a practical purpose of helping find the text block to crop out of the page images, thus simplifying OCR. But it is also a unique visualization of the physical book, allowing us to read in a new way. We can see the pages summarized visually, perhaps revealing new insights into this artifact, hinting at the inky physicality missing in the digital images.

Anyway, they are really interesting to look at. So let’s do some more!

Get a copy of ImageJ2 (this is a newer version than the one used in my old posts; I suggest using the Fiji version). Make sure you have an up-to-date 64-bit version of Java installed (if you use 32-bit Java, your memory will be limited).

Get a big stack of images, for example a full copy of Aladore from Internet Archive. Visit the Aladore 1914 book page and download the “SINGLE PAGE PROCESSED JPEG ZIP”. Unzip the package.

Open ImageJ / Fiji, go to “Edit” > “Options” > “Memory & threads”. Give it as much memory as seems reasonable for your system.

Now open your images: “File” > “Import” > “Image sequence”, navigate into the image folder and click “Open”. In the “Sequence Options” dialog, check “Use virtual stack” to save memory. This will create an image stack. Browse through the stack by using the slider bar at the bottom.

If you want to work with these full sized images, you will need lots of RAM; otherwise, resize the stack with “Image” > “Scale”. For example, Aladore is just over 400 page images, at around 2800 x 1800 pixels each. Running a Z Project with “Sum Slices” required around 15 GB of memory. Scaled at “0.5”, it was able to run on 4 GB.
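To see why scaling helps so much, here is a rough back-of-the-envelope sketch in Python. The function name and the 4 bytes per pixel are my own assumptions for illustration (not something ImageJ reports), and real memory use during processing will be higher than this raw figure. The point is only that halving both dimensions quarters the pixel count.

```python
def stack_bytes(pages, width, height, bytes_per_pixel=4, scale=1.0):
    """Rough size in bytes of an uncompressed image stack held in memory.

    bytes_per_pixel=4 is an assumption (e.g. 32-bit pixels); adjust to taste.
    """
    scaled_w = int(width * scale)
    scaled_h = int(height * scale)
    return pages * scaled_w * scaled_h * bytes_per_pixel


# Figures from the post: about 400 pages at roughly 2800 x 1800 pixels.
full = stack_bytes(400, 2800, 1800)
half = stack_bytes(400, 2800, 1800, scale=0.5)

print(f"full size:  {full / 1e9:.1f} GB")   # full size:  8.1 GB
print(f"scaled 0.5: {half / 1e9:.1f} GB")   # scaled 0.5: 2.0 GB
print(f"ratio:      {full / half:.0f}x")    # ratio:      4x
```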

If you would like to crop the stack to clean up the edges, drag a rectangle over the image and choose “Image” > “Crop”.

The image above was created using “Image” > “Stacks” > “Z Project”, then selecting “Projection type” > “Min Intensity”. The processing will take a bit depending on your machine.
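If you are curious what a minimum intensity projection actually computes, here is a minimal sketch in plain Python. It is only an illustration of the idea, not ImageJ's implementation: the `z_project` helper and the tiny made-up "pages" are my own assumptions. For every pixel position it keeps the darkest value found anywhere in the stack, which is why ink from every page piles up into one text block.

```python
def z_project(stack, reduce_fn):
    """Collapse a stack of equally sized 2D images into one 2D image.

    stack is a list of images; each image is a list of rows of pixel values.
    reduce_fn receives the values found at one (row, column) position across
    all slices and returns the single output value for that position.
    """
    rows = len(stack[0])
    cols = len(stack[0][0])
    return [
        [reduce_fn([image[r][c] for image in stack]) for c in range(cols)]
        for r in range(rows)
    ]


# Three tiny 2x3 "pages": 255 is white paper, low values are ink.
pages = [
    [[255, 40, 255], [255, 255, 255]],
    [[255, 255, 30], [255, 255, 255]],
    [[255, 50, 255], [20, 255, 255]],
]

min_projection = z_project(pages, min)
print(min_projection)  # [[255, 40, 30], [20, 255, 255]]
```

Any position that was inked on at least one page comes out dark; only positions that stayed white on every page remain white.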

Here are some other interesting visualizations to try:

Image > Stacks > Make montage, reveals the entire book in one image. Are there patterns and rhythms in the layout?

Image > Stacks > Z Project > Projection type > Sum Slices, creates this ghostly image:

Image > Stacks > Z Project > Projection type > Standard Deviation, a more sinister feeling visualization of the text block:

Image > Stacks > Plot Z-axis profile, charts the mean intensity, so the downward spikes are the blank pages after each illustration:

Image > Stacks > Orthogonal views, creates an interesting way to navigate the stack. The images on the right and bottom reveal cross sections of the book (image stack); use the cross-hairs to cut through the stack:
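The other stack operations listed above follow the same pattern as the first projection, so here is a small plain Python sketch of the arithmetic behind Sum Slices, Standard Deviation, and the Z-axis profile. The `project` helper and the toy data are my own assumptions, and I use the population standard deviation from the standard library, which may not match ImageJ's exact formula.

```python
from statistics import mean, pstdev

# Three tiny 2x2 "pages" of grey values (made-up data for illustration).
pages = [
    [[200, 100], [200, 200]],
    [[200, 0], [200, 200]],
    [[200, 200], [200, 200]],
]


def project(stack, reduce_fn):
    """Apply reduce_fn to the values at each pixel position across all slices."""
    return [[reduce_fn(values) for values in zip(*rows)] for rows in zip(*stack)]


# Sum Slices: add up every page at each pixel position.
sum_slices = project(pages, sum)

# Standard Deviation: how much each pixel position varies from page to page.
std_dev = project(pages, pstdev)

# Z-axis profile: one number per page, the mean intensity of that page.
z_profile = [mean(value for row in image for value in row) for image in pages]

print(sum_slices)                # [[600, 300], [600, 600]]
print(round(std_dev[0][1], 2))   # 81.65
print(z_profile)                 # [175, 150, 200]
```

Positions that look the same on every page get a standard deviation of zero, while positions where ink comes and goes stand out.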

ImageJ is powerful and capable of creating fascinating visualizations of batches of images. It is developed for scientists working with imagery data, often biomedical microscopy tracking the numbers of cells or counting biomarkers. Digital humanities researchers such as Ferguson have expanded its use to media studies and visual culture. However, don’t let that stop you from “misusing” it to create something beautiful.

On that note, check out Ferguson’s recent use of ImageJ to manipulate video, “Edge”.

Reflecting on ColorOurCollections

My new favorite national holiday is #ColorOurCollections week, Feb 1-5, 2016.

The idea originated at the New York Academy of Medicine (NYAM) to merge the adult coloring book craze with digital collections. Amazing libraries around the world joined in, sharing some fun and beautiful coloring books. Open Culture listed some of the big names, such as Bodleian, Smithsonian, DPLA, and Europeana. However, it was great to see so many less famous libraries creating awesome coloring books highlighting their fascinating collections– and having some fun!

None-the-less, #ColorOurCollections also highlighted some less-than-best practices in the creation and distribution of digital files. Many of these libraries are on the forefront of digital preservation and/or user experience, but put out PDFs that violated all the rules… It’s a bit disappointing since it’s not that hard to get a few little details right, and libraries should be leading by example.

A lot of libraries did a great job, but here are some little things that bugged me about many coloring book offerings:

  • Random file names. If you are creating a PDF for public distribution don’t name it “coloringbook1.pdf” or “color-our-collections.pdf” or “jim_file3.pdf”. The file name is one of the few bits of metadata that can be easily understood without even opening the file. Make your file name descriptive and meaningful, providing the basic metadata (creator, title, and date) in a place where everyone can see it. Example: “ThisLibrary_CoolColoringBook_2016.pdf”
  • Huge file sizes. A coloring book is a mostly black and white document designed to be printed at approximately letter size paper. A reasonable file size for something around 15 pages is under 2 MB, and could be smaller. I saw many PDFs in the range of 25MB, and some over 65MB! Larger file sizes will not give better quality–this is for public web distribution not a professional press run printing glossy coffee table books. Think about your users and your web servers. People have to DOWNLOAD the PDF! Please make it a reasonable size.
  • No embedded metadata. When creating a PDF you should always check the embedded metadata. If you are exporting a PDF from LibreOffice Writer or MS Word, it will have embedded metadata automatically created based on your profile. Unfortunately, many people don’t realize that and have never checked their profile. Many of the coloring books thus have metadata like “Title: Microsoft Word – coloringbook_draft3_fromJim.docx” and an Author that is the profile name of whoever first created the file. This metadata will be displayed when users import the PDF into an ebook management tool, such as Calibre. Furthermore, this information is not helpful for future users trying to understand where the file came from and what it is–and could be a bit embarrassing depending on what the automatically generated information contains. I suggest you carefully edit the metadata before exporting the final version of your PDF. It should contain a meaningful title, a subject such as “#ColorOurCollections 2016”, an author/creator that relates to the institution, and a URL to find more information.
  • Lack of image metadata. If you send out a document highlighting some fascinating treasures of your collections–there had better be a clear means for users to find out more information! Every image used in the coloring book needs metadata directly on the page where it appears. Each page does not need the full archival description, but please give enough information for the users to find the item in your online collections. A title, identifier, and URL is nice. I think these references need to be given on each coloring page, not in a separate reference and index page. Online resources and printed coloring book pages are quickly disassociated from their original context–don’t expect that the information given on an introduction or TOC page will be available to users.
  • Lack of overall context. Many of the coloring books were just pages of images. That is great for many users, but I would like to see a short introduction page that explains the context. Where did these images come from? Why are they interesting in the scope of your collections? Where can I learn more? This is an easy chance to communicate with patrons and invite them into our collections–which is the point of #ColorOurCollections.
  • Links to paid databases. A few coloring books had reference links to paid databases. I found this a bit insulting and against the spirit of #ColorOurCollections. One of the most amazing aspects of digital collections is the ability to democratically open up the public domain to the PUBLIC. We are able to take fragile materials traditionally hidden away in a locked basement, and give them out freely to the world! It is disappointing to see objects in the public domain digitized and then LOCKED back up in a proprietary, paid database. It’s even more disappointing to see those overpriced rip-offs promoted in a library coloring book.
  • Grey backgrounds. Sorry, but this is a coloring book! Who wants to color on grey paper? Who wants to waste printer ink printing a grey page background? Some images are just more appropriate for a coloring book than others. You cannot just desaturate a digitized image and call it a coloring book. Digitized pages have a color, and that page background needs to be removed to make a quality coloring book page. Generally, most coloring book images should be fully binarized, i.e. only pure black and white. Using GIMP you could desaturate (Colors > Desaturate) or greyscale (Image > Mode > Greyscale) the image, then use a Threshold (Colors > Threshold) to eliminate the “color” of the page background. The coloring book image should be reduced to clean black lines and white background. ScanTailor is a great tool that can do this pre-processing for you for many coloring book appropriate images. Play around with the output options in Black & White mode, tweaking “Thickness” and “Despeckling” until you get a good result.
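For the curious, the thresholding step mentioned in the last point is simple arithmetic. Here is a minimal plain Python sketch of the idea. The `binarize` function, the cutoff of 128, and the sample values are my own assumptions for illustration; GIMP and ScanTailor do something more sophisticated, so treat this as the concept only.

```python
def binarize(gray_image, threshold=128):
    """Map every greyscale pixel to pure black (0) or pure white (255).

    Values at or above the threshold become white paper; values below it
    become black ink. The threshold of 128 is an arbitrary starting point.
    """
    return [[255 if value >= threshold else 0 for value in row] for row in gray_image]


# A tiny made-up scan: greyish paper (around 200) with a few dark ink pixels.
scan = [
    [210, 205, 60],
    [198, 35, 215],
]

print(binarize(scan, threshold=128))  # [[255, 255, 0], [255, 0, 255]]
```

The greyish page background (values around 200) becomes clean white, and only the ink survives as black.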

Why no Digital Aladore coloring book?

ScanTailor b&w processing

I was thinking about putting together an Aladore coloring book, but I found the images had too many shades of grey scale hatching to reduce nicely to clean lines. Processing ends up with too many black blobs, with too little detail. The images just don’t work as a coloring page! Here is a one page PDF attempt just to show you what I mean:

DigitalAladore_YwainColoringPage_2016

Anyway, #ColorOurCollections was good fun, and I am looking forward to it next year!

Google Docs EPUB

Some news on the EPUB creation front:

Google Docs just enabled a feature to export as EPUB! To use it, simply open the doc and look under File > Download as > EPUB Publication.

This is a handy and very easy method to create an ebook. However, the consistency and quality aren’t good. The markup it creates is downright bizarre, with tons of unnecessary <span> tags and strange CSS. It also does not create a cover. In theory you could open this Google Doc EPUB with Sigil and do some polishing up, but given how unnecessarily complex the markup is, it would be more work than starting fresh.
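If you want to judge the markup for yourself without opening an editor, remember that an EPUB is just a zip archive of (X)HTML files. Here is a minimal sketch using only the Python standard library that counts the tags in each content file. The class and function names are my own, and the demo builds a tiny fake archive in memory rather than using a real Google Docs export.

```python
import io
import zipfile
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how many times each opening tag appears in a document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tags_in_epub(epub_file):
    """Return {content file name: {tag: count}} for each (X)HTML file in an EPUB."""
    report = {}
    with zipfile.ZipFile(epub_file) as archive:
        for name in archive.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = TagCounter()
                parser.feed(archive.read(name).decode("utf-8", errors="replace"))
                report[name] = parser.counts
    return report


# Demo: a tiny in-memory stand-in for an EPUB with needlessly nested spans.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr(
        "chapter1.xhtml",
        "<html><body><p><span><span>Hi</span></span></p></body></html>",
    )
demo.seek(0)

print(count_tags_in_epub(demo))
# {'chapter1.xhtml': {'html': 1, 'body': 1, 'p': 1, 'span': 2}}
```

To try it on a real file, pass the path to the downloaded EPUB instead of the in-memory demo. A span count far higher than the paragraph count is a decent hint that cleanup would be tedious.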

So if you need a super quick EPUB for some reason, just click the “Download as” option. Otherwise, stick with the tools that provide better markup results, such as Writer2ePub and Sigil.

Public Domain Day 2016!

Happy 2016 from Digital Aladore!

January 1st brings us to another joyous Public Domain Day, the holiday where lots of people celebrate the New Year AND a new crop of works entering the public domain.

Sadly, in America we have NOTHING to celebrate. Because of bizarre copyright extensions, we will not have any works entering the public domain until 2019. It is stunning to think that while the rest of the world is celebrating, the USA has not had a happy Public Domain Day since 1978… When copyright was first introduced in America the term was 14 years; current works now enjoy life of the author + 70 years, or if the work of multiple authors (corporate authorship) 95 years from publication. The extensions in 1978 and 1998 applied retrospectively to old works, creating a crazy tangle of rules (check out a summary from Peter B. Hirtle), which highlights the nonsense of the move: the rationale for extended terms was incentivizing creation, but it seems hard to fathom it motivating a bunch of dead people! Meanwhile the preservation of our cultural resources has become illegal, with fragile artifacts such as our film heritage literally disappearing.

Here is what I said last year, and things have only gotten worse:

Recent research and economic modeling suggest that current copyright terms are too long and do NOT provide incentive for creation. Instead our shared culture is being locked away by corporate profiteers. In fact, the majority of works still protected by copyright are orphans–out of print with no likelihood of ever being used again commercially. Projects like Digital Aladore, Free software, and honestly the majority of the internet point out that creators aren’t purely profit driven. It’s time to reform copyright to benefit the creators rather than hoarders of capital (who already have plenty of power and wealth!).

North of the border, in Canada, things are more cheerful this year. The works of a lot of great authors and thinkers will become freely available resources to drive current learning, thought, and creativity. Libraries and Archives will be able to legally preserve, digitize, and provide access to valuable cultural creations. Check out the Public Domain Review’s Class of 2016 for some highlights. However, there is a pall on the celebrations. The Trans-Pacific Partnership trade deal threatens to force countries to have a minimum of life+70 years copyright term.

A sad holiday indeed. Learn more at the Center for the Study of the Public Domain:

“What do these laws mean to you? As you can read in our analysis here, they impose great (and in many cases unnecessary) costs on creativity, on libraries and archives, on education and on scholarship. More broadly, they impose costs on our collective culture. We have little reason to celebrate on Public Domain Day because our public domain has been shrinking, not growing.”

None-the-less, here at Digital Aladore we wish you all the Best for the New Year! 

News at Sigil Ebook

Since Digital Aladore is more than a year old (see The Idea), I thought I should check in with a few of the key tools for any news. First up is Sigil Ebook editor, used for creating the various EPUB versions. As I have said many times, it is a great tool! There are a few features like the character report and auto merging html that I wish were in my everyday text editor.

After a scary period where it looked like development on Sigil might stall, I am happy to see it surge back to an active project full of interesting changes. This week version 0.9.1 was released, stabilizing a host of new features and moving the application towards full EPUB3 support. Also be sure to check through the Plugin Index to find many useful extensions for the editor.

Creating an editor that supports both EPUB2 and 3 is a bit complicated. As I mentioned in an earlier post, older versions of Sigil automatically correct markup and packaging to match the EPUB2 standard. To fix this issue, version 0.9.1 replaces Xerces (XML parser) and Tidy (HTML parser) with Python lxml and Google Gumbo, and makes the FlightCrew EPUB2 validator a plugin rather than a built-in tool.

Despite the major overhaul under the hood, using Sigil remains almost unchanged, which is great. So thank you to current maintainers Kevin Hendricks and Doug Massay and everyone else who makes this Free and open tool available!

Check out the code or get the latest version at Github.

Thoughts About EPUB3

When I first started looking into the EPUB3 specs, I was excited by the possibilities of a more powerful ebook format. Just think of all the neat things you can create with simple CSS and JS! I imagined creating little “epub apps” like a calculator or timer. It would be a neat way to add functionality to very simple devices such as the Sony Reader. I created a few test versions; however, while these demos often worked in Calibre’s built-in reader, they were not functional on any actual ereaders.

Of course, the point would be to go beyond silly little apps and add some interesting and valuable extensions to the ebook, such as text collation or visualizations. Simple adjustable collation tools could be embedded so that the reader could query the text while reading. Some of this functionality has been built into the reading apps on some devices, such as Kindle X-Ray. Simple interactive elements would be useful for textbooks and manuals to make information delivery more interesting. Imagine something like Jupyter Notebook, which can run embedded Python code.

Unfortunately, there just isn’t good support for the advanced features of EPUB3 in an open and flexible way.  As I mentioned in a previous post, device makers only seem interested in the possibilities of further limiting users with tougher DRM, rather than enabling new possibilities. In the ideal world we could combine the open format with open hardware and software!

Let’s Read Together: Chapter One

ALADORE

CHAPTER I.
OF THE HALL OF SULNEY AND HOW SIR YWAIN LEFT IT.

SIR YWAIN sat in the Hall of Sulney and did justice upon wrong-doers. And one man had gathered sticks where he ought not, and this was for the twentieth time; and another had snared a rabbit of his lord’s, and this was for the fortieth time; and another had beaten his wife, and she him, and this was for the hundredth time: so that Sir Ywain was weary of the sight of them. Moreover, his steward stood beside him, and put him in remembrance of all the misery that had else been forgotten.

And in the midst of his judging there was brought into the hall a child that had been found in the road, a boy of seven years as it seemed: and he was dressed in fine hunting green, but not after the fashion of that day or country. Also when they spoke to him he answered becomingly, but in a speech that no one could understand.

So Sir Ywain had him set by the table at his own side, and now and again as he judged those wrong-doers, he cast a look upon the child. And always the child looked back at him with bright eyes, and even when there was no looking between them, he listened to what was being said, and smiled as though that which was weariness to others was to him something new and joyful. But as the hour passed, Sir Ywain felt his mind slacken more and more, and whenever he saw the boy smiling, his own heart became heavier and heavier between his shoulders, and his life and the life of his people seemed like a high-road, dusty and endless, that might never be left without trespassing. And though he would not break off from his judging, yet he groaned over the offenders instead of rebuking them; and when he should have punished, he dismissed them upon their promise, so that his steward was mortified, and the guilty could not believe their ears.

Then when all was said and done the hall was cleared, and Sir Ywain was left alone with the boy.

But the steward, looking slyly back through the hinges of the door, saw that his lord and the child were speaking together; and he perceived that they understood one another well enough, though how this should have come about he was not able to guess, having himself heard the boy answering to all questions in none but an outlandish tongue.

Then he saw Sir Ywain rise up, and suddenly he was aware that his lord was calling for him loudly and with a hearty voice, as he would call for him long since, when they were at the wars together. And when he went in, Sir Ywain bade him summon all the household.

Now when the household were come into the hall they stood at a little distance from the dais, in the order of their service, and Sir Ywain stood above them in front of the high table. And beside him was the boy, and before him was his own brother, who was now an esquire grown, with hawk on wrist.

Then Sir Ywain bade his brother kneel down, and there he made him knight, taking his sword from him and laying it on his shoulder, and afterwards belting it again round his body. And he took the keys from his own girdle and the gold spurs from his own feet, and said aloud: I call you all to witness that as I have done off my knighthood and the Honour of Sulney, and given them to this my brother Sir Turquin, so also by these tokens do I deliver unto him the quiet possession of my house and goods and the seisin of all my lands, to hold unto him and his heirs for ever, by the service due and accustomed for the same. And henceforth I go free.

How Sir Ywain was led away of a Child

Then his brother, who was both glad and sorry, and moreover was still in doubt how this might end, stood holding the keys and the spurs, and looking at him without a word. And he looked also at the child, and he saw that for all the difference in their years, the eyes of Sir Ywain had become like the boy’s eyes: and as he looked his heart became heavy, and for a moment he envied his brother and feared for himself. But in his fear he moved his hands, and the keys clanked and the spurs clinked together, and his heart leaped up again for joy of his possessions.

And all this Ywain saw as it were a great way off, and he smiled, and forgot it again instantly. And the boy took his hand, and they went down the hall together. And when they came to the door to pass out, the steward got before them and bowed as he was used to do, and he spoke very gravely to Sir Ywain, reminding him that this same afternoon had been appointed among the lords, his neighbours, for the witnessing of certain charters.

But Ywain and the boy looked at one another and laughed, and the steward saw that they laughed at the lords and at him and at the very greatness of the business: and he was enraged, and turned away and went to his new master.

Then Sir Turquin came hastily after them, and he laid his hand upon his brother’s arm and bent his head a little, and spoke to him so that none else should hear, and he said: What is this that you are doing; for no man leaves all that he has, and departs suddenly, taking nothing with him. But those two went from him without answering, and they passed, as it seemed, very swiftly along the road under the woodside, and were hidden from him. And again, as he stood still watching, he saw them going swiftly above the wood where there was no path, but only the bare wold before them.

Keep reading: get the Digital Aladore ebook at Internet Archive!

Code Typography

At Digital Aladore we have been talking a bit about typography and about code. So what if you want a bit of good typographical design in your code?

Luckily, version 2.0 of the Hack typeface was released this week– oh the excitement!! Hack is Free and Open, and specifically designed for coding, AND I think looks pretty nice.  So it fits right in to the Digital Aladore scheme.

But if Hack doesn’t do it for you, check out Programming Fonts of the World (really it exists!), a blog devoted to monospace font news. They have a fun app where you can test out the various combinations of theme and font in a simulated editor window (based on CodeMirror, a JS browser editor). Or you can get all the fonts with this package for the ultra-customizable (and fun) text editor Atom. Find something that floats your boat and makes coding a bit more beautiful and fun!

Read Aladore as a Word Tree

Why not?

Check out an Aladore word tree visualization

created using Word Tree, http://www.jasondavies.com/wordtree

Basically you choose a word, and the visualization shows you all the things that follow that word anywhere in the text. If you choose a word that is only used once, there will be just one line of text. If you choose a commonly used word, you will get an amazing cross section of the entire text arranged on the tree. It is a quick and interesting way to browse the text or get a sense of what it says about specific topics.
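The idea behind a word tree is simple enough to sketch in a few lines of Python. This is only my own rough illustration of the concept, not how the Word Tree site is implemented: the `word_tree` function, its `depth` parameter, and the crude tokenizing (which ignores sentence boundaries) are all assumptions.

```python
import re


def word_tree(text, root, depth=3):
    """Build a nested dict of the words that follow `root` anywhere in `text`.

    Each key is a word; its value is a dict of the words that came next.
    Only the first `depth` words after each occurrence are kept.
    """
    words = re.findall(r"[\w'’-]+", text.lower())
    tree = {}
    for i, word in enumerate(words):
        if word != root.lower():
            continue
        node = tree
        for follower in words[i + 1 : i + 1 + depth]:
            node = node.setdefault(follower, {})
    return tree


# Three phrases taken from Chapter I of Aladore.
sample = (
    "Sir Ywain sat in the Hall of Sulney. "
    "Then Sir Ywain bade his brother kneel down. "
    "Sir Ywain was left alone with the boy."
)

tree = word_tree(sample, "Ywain", depth=2)
print(tree)
# {'sat': {'in': {}}, 'bade': {'his': {}}, 'was': {'left': {}}}
```

Every occurrence of the chosen word starts a branch, and phrases that begin the same way share a branch until they diverge.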

Enjoy!

Updating posts

I wanted to mention that I have recently done a number of minor updates to posts throughout Digital Aladore.

I know this isn’t exactly the right way to deal with blog content since followers may be left out of the loop and it disturbs the chronological sedimentary “authenticity” of the writing. However, I tend to think of the project in the long term, as if it could be read straight through rather than followed serially, so I want the posts to be accurate and make sense sequentially. In general the updates just fix links or format of content, or change tags or categories. I also have clarified a few sentences here and there. Rarely, I have expanded a few points where the original post seemed hard to follow.

If you actually read Digital Aladore in the past, it would probably be difficult to notice the updates, but I just wanted to mention it for full disclosure!