To do a quick and easy comparison of the 1914 and 1915 HTML texts, the best tool is Notepad++, http://notepad-plus-plus.org. This is a weird case where Windows has an awesome Free software tool that Linux doesn’t! Notepad++ is a powerful and flexible text editor with extensive features to make coding easier.
The Notepad++ community also creates bunches of plugins to extend its functionality. For this task we need the Compare plugin. You may need to add it: on the menu click Plugins > Show Plugin Manager, then find Compare, check the box, and click Install. Now you are ready to compare any type of text-based file. Easy!
Simply open the files you want to compare (Notepad++ uses tabs), then click Plugins > Compare > Compare. With our two Aladore HTML files, it will look something like this:
The texts are aligned and scroll in sync, with a representation of the differences displayed on the right side. Each line with a discrepancy is highlighted (but not the actual characters or words that differ). The type of change is indicated by colors and icons (for example, line added, line deleted, or line moved). This quickly reveals simple formatting issues.
In the example pictured above, the red highlights reveal a paragraph that was broken incorrectly in the 1915 text. Both files can be edited, so this is easily fixed by deleting the extra <p> tags and empty lines. The text was the same, only the HTML tagging was incorrect. I quickly worked through the red (line deleted) and green (line added) highlights, which were all similar formatting issues. This resulted in two HTML files with 1105 lines each. This confirms that the editions are nearly identical!
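Once both files have the same number of lines, the same line-by-line check can be roughly reproduced outside Notepad++. The following is only a minimal Python sketch, not how the Compare plugin works internally; the function name `differing_lines` and the sample strings are made up for illustration and are not taken from the Aladore files.

```python
def differing_lines(old_text, new_text):
    """Return the 1-based numbers of lines that differ between two texts.

    Assumes both texts have the same number of lines, as the two HTML
    files do after the paragraph breaks have been fixed.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    if len(old_lines) != len(new_lines):
        raise ValueError(
            f"line counts differ: {len(old_lines)} vs {len(new_lines)}"
        )
    # Pair up the lines and keep the numbers of those that are not identical.
    return [
        number
        for number, (old, new) in enumerate(zip(old_lines, new_lines), start=1)
        if old != new
    ]


# Invented sample text, standing in for the two editions.
sample_a = "<p>First paragraph.</p>\n<p>The Sun was low.</p>"
sample_b = "<p>First paragraph.</p>\n<p>The sun was low.</p>"
print(differing_lines(sample_a, sample_b))  # [2]
```

With real files, you would read each one into a string first (for example with `pathlib.Path(...).read_text()`) and pass the two strings in. Like the yellow highlights, this only says which lines changed, not what changed inside them.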
However, this still leaves hundreds of yellow highlighted lines, which simply indicate some change somewhere in the line (i.e., within a single paragraph, <p> to </p>). The exact difference is NOT highlighted. The majority of these differences are a single character, such as “S” versus “s”. It would be painstaking to find them all using Compare.
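To give a sense of what finding the exact difference inside a line involves, here is a small Python sketch using the standard library's `difflib.SequenceMatcher`. It is just one possible approach under my own assumptions, not something Compare does; the helper name `char_changes` and the sample lines are invented for illustration.

```python
import difflib


def char_changes(old_line, new_line):
    """List the character-level changes between two versions of a line.

    Each item is (tag, old_fragment, new_fragment), where tag is
    'replace', 'delete', or 'insert' as reported by difflib.
    """
    matcher = difflib.SequenceMatcher(None, old_line, new_line, autojunk=False)
    return [
        (tag, old_line[i1:i2], new_line[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # skip the stretches that are identical
    ]


# Invented sample lines that differ by a single capital letter.
print(char_changes("the Sun rose", "the sun rose"))  # [('replace', 'S', 's')]
```

Run over every pair of yellow lines, something like this would list the single-character differences without having to hunt for each one by eye.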
Honestly, it isn’t really necessary to go any further for this project, but to explore a few more tools, we will look more into these differences in the next post…