Continuing from the last post, I went looking for a Briss-like application for individual image files. What I discovered is the unique Java application ImageJ. Although it is a bit of a tangent, I thought I would pass along my notes, since they might be useful for some other projects. And anyway, it's a really fascinating program!
Since ImageJ was developed at the National Institutes of Health, a US government agency, it is in the public domain. ImageJ can also be used via Fiji, a distribution that bundles the ImageJ core with dependencies and plugins focused on processing scientific research images. Fiji is actively developed and receives continuous updates. ImageJ has a very steep learning curve, even if you are familiar with other image editing software, since it uses unconventional terminology and workflows. However, it offers many unique and powerful features, and I have barely scratched the surface. If you want to figure out how to do something, search the forums or read the giant User Guide.
I could not figure out how to automatically cluster the page images like Briss does, although I think it might be possible. However, based on a variety of scattered forum posts and some experimentation, I pieced together this simple workflow to crop batches of page images:
- Open a batch of images in ImageJ (multiple items can be selected from the file browser). I used one chapter of pages.
- Create a “stack”, i.e. combine the opened images into a single window that can be worked on as a batch: On the menu, select Image > Stacks > Images to Stack. Then click OK to create the stack.
- Create a “Z Projection” to make the features of all pages visible at once, so we can decide where to crop: Select Image > Stacks > Z Project. The default projection should be “Average Intensity”, but “Sum Slices” gave a better representation for my purposes, so use the drop-down menu to switch to it and click OK.
- Select the text area on the Z Projection: After processing we have a new window (the “Z projection”) displaying the “Sum” of the page images. Basically we can see exactly where all the text is for every page at once. Use the rectangle selection tool (on the toolbar) to click and drag a rectangle over the part of the page we want to keep. ImageJ calls this the “region of interest”!
- Transfer this selection to the “Stack”: click on the stack you first created. On the menu, go to Edit > Selection > Restore Selection. The selection box you drew on the Z Projection will be transferred to the Stack.
- Crop: on the menu, go to Image > Crop. The stack will be cropped.
- Export the cropped pages from the stack: on the menu, go to File > Save As > Image Sequence. The dialog box will let you set the parameters for the export. Select the option “Use slice labels as file name” to use the original file names.
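For readers who think better in code, the idea behind the steps above can be sketched in plain Python. This is only an illustration of the concept, not what ImageJ does internally: pages are assumed to be same-sized grayscale grids (lists of rows, 0 for black ink and 255 for white paper), loading and saving real files is left out, and `ink_bounds` is a hypothetical stand-in for drawing the rectangle by hand on the Z Projection.

```python
# Pages are grayscale grids: lists of rows, 0 = black ink, 255 = white paper.
# Assumption: every page in the stack has the same width and height.

def sum_projection(stack):
    """Add up the pixel values at each position across all pages ("Sum Slices")."""
    height, width = len(stack[0]), len(stack[0][0])
    projection = [[0] * width for _ in range(height)]
    for page in stack:
        for y in range(height):
            for x in range(width):
                projection[y][x] += page[y][x]
    return projection


def ink_bounds(projection, page_count, white=255):
    """Hypothetical helper standing in for the hand-drawn rectangle.

    Returns (left, top, right, bottom), right and bottom exclusive, enclosing
    every position that holds ink on at least one page.
    """
    blank = white * page_count  # projected value where all pages are white
    rows = [y for y, row in enumerate(projection) if any(v < blank for v in row)]
    cols = [x for x in range(len(projection[0]))
            if any(row[x] < blank for row in projection)]
    if not rows:
        raise ValueError("no ink found on any page")
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)


def crop_stack(stack, box):
    """Apply the same rectangle to every page in the stack."""
    left, top, right, bottom = box
    return [[row[left:right] for row in page[top:bottom]] for page in stack]


# Two tiny made-up "pages" with ink in slightly different places.
W = 255
page_a = [
    [W, W, W, W, W],
    [W, 0, 0, W, W],
    [W, W, W, W, W],
    [W, W, W, W, W],
]
page_b = [
    [W, W, W, W, W],
    [W, W, W, W, W],
    [W, W, 0, 0, W],
    [W, W, W, W, W],
]

stack = [page_a, page_b]
projection = sum_projection(stack)
box = ink_bounds(projection, len(stack))
cropped = crop_stack(stack, box)
print(box)
```

Because one rectangle is shared by the whole stack, each cropped page keeps some white space wherever another page had ink, which is the same trade-off the manual workflow makes.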
After a bit of processing, you will have a batch of nicely cropped page images! This approach is pretty neat and efficient, but since we are not adjusting for the variation of each page individually, some pages end up with bits of the header or footer remaining. It would be better if we could get a program to automatically detect the text body instead.
More on that in the next post!