The long S of ebook editing

At artefacto, we do quite a few projects with open access and public domain texts. Many of these publications have been converted to digital formats using automated processes, and this can result in some... interesting interpretations of the text.

For example, the Internet Archive uses a 'Scribe system' to scan books and then runs OCR software over the PDF files. You can see a short video on the Scribe digitisation process here. Project Gutenberg uses a process of distributed proofreading to tidy up and improve the text of its books.

These processes are constantly changing to take advantage of new scanning and digitisation technologies. Despite this, a lot of encoding and transcription issues still make it through into the published version. The old-style long 's' (ſ), which looks a lot like an 'f', suddenly gets rendered as an 'f'. Old spellings (or different fonts) get misunderstood and misinterpreted. And formatting... the less said the better.

One of the secrets of epub editing is that epub files are really just a variation of a zip file. This means that if you need to 'unpack' an epub file to make minor edits, you can change the file extension from .epub to .zip and then unzip the file to access particular components.
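If you would rather not rename the file by hand, the same unpacking can be done in a few lines of Python, since the standard zipfile module does not care about the file extension. This is only a minimal sketch: the function name unpack_epub and the paths in the usage note are made up for illustration.

```python
import zipfile


def unpack_epub(epub_path: str, out_dir: str) -> list[str]:
    """Extract an epub (which is a zip archive) into out_dir.

    Returns the names of the archive members in the order they are stored.
    """
    with zipfile.ZipFile(epub_path) as archive:
        archive.extractall(out_dir)  # creates out_dir if it does not exist
        return archive.namelist()
```

Calling unpack_epub("book.epub", "book_unpacked") would leave the mimetype file, the META-INF folder and the content folder sitting in book_unpacked, ready for a text editor.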

An epub 'file' is really a collection of files in a particular structure. This includes:

- mimetype file
- META-INF folder
  - container.xml
- Document folder
  - HTML, CSS, image files, plus OPF and NCX files
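The name of the document folder differs from book to book, but container.xml records where the OPF file lives, so you can always find it from there. The sketch below reads that entry straight out of the zip. The function name find_opf_path is hypothetical, and it assumes the book has a single, well-formed container.xml.

```python
import xml.etree.ElementTree as ET
import zipfile

# XML namespace used by META-INF/container.xml
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_opf_path(epub_path: str) -> str:
    """Return the path (inside the epub) of the OPF file named in container.xml."""
    with zipfile.ZipFile(epub_path) as archive:
        root = ET.fromstring(archive.read("META-INF/container.xml"))
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no rootfile entry")
    return rootfile.attrib["full-path"]
```

The OPF file in turn lists every HTML, CSS and image file in the book, which makes it a good starting point when you are trying to work out what is where.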

The minimum files you really, really, really need for a valid .epub file are META-INF/container.xml and the mimetype file, plus the OPF file that container.xml points to. The rest might vary a bit between different ebook readers and authoring tools.
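One catch when zipping an edited book back up: the mimetype file has to be the first entry in the archive and stored without compression, and some zip tools will not do that for you. Here is a rough sketch of a layout check and a repacking step. The function names are invented for this post, and the check only covers the two fixed files, so it is no substitute for a real validator.

```python
import zipfile
from pathlib import Path

MIMETYPE = "application/epub+zip"


def check_epub_layout(epub_path: str) -> list[str]:
    """Return a list of problems with the two fixed epub files (empty if none)."""
    with zipfile.ZipFile(epub_path) as archive:
        names = archive.namelist()
    problems = []
    if not names or names[0] != "mimetype":
        problems.append("mimetype must be the first entry")
    if "META-INF/container.xml" not in names:
        problems.append("META-INF/container.xml is missing")
    return problems


def pack_epub(src_dir: str, epub_path: str) -> None:
    """Zip an unpacked epub folder into epub_path.

    epub_path should live outside src_dir, otherwise the half-written
    archive would be picked up as one of the book's own files.
    """
    src = Path(src_dir)
    with zipfile.ZipFile(epub_path, "w") as archive:
        # Write a fresh mimetype entry first, uncompressed.
        archive.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        for path in sorted(src.rglob("*")):
            arcname = path.relative_to(src).as_posix()
            if path.is_file() and arcname != "mimetype":
                archive.write(path, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

Running check_epub_layout on the repacked file is a quick sanity test before you try opening it in a reader.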

A lot of people use Adobe InDesign for creating and viewing epub files but there are various open source tools that you can also use (but if you are using InDesign, is a great source of tips). And, on the more expensive side of things, there are commercial applications such as the £200 oXygen Author software. Browser plugins such as EPUBreader plugin are also useful for creating and testing ebooks.

We used a few different tools to edit the ebooks. Calibre ebook manager now includes ebook editing functionality that picks up where Sigil left off. Jutoh is another tool that rose from the ashes of Sigil. There are also smaller tools, like the Writer2ePub plugin for LibreOffice, that came in handy for rebuilding smaller titles from scratch.

PressBooks is another option: an online platform for creating ebooks in various formats. It’s open source, so you can either use the hosted version or install it on your own server.

The workflow that I've used for our most recent open book distribution project was primarily based on Calibre. Find and Replace in ebook editors is a bit of a godsend, as many of the encoding issues you find will recur throughout the text, and trying to edit these one by one in a 500-page book can be a bit overwhelming. There are also some handy shortcuts, like adding new assets (such as images) directly from the menu in Calibre and (re)creating the table of contents. Of course, if you are familiar with the epub format, you also have the option of using a text editor for edits such as these, which can be particularly helpful when it comes to encoding issues.
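For anyone going down the text editor route, a bulk find and replace over an unpacked book can also be scripted. The sketch below is one way to do it, not the workflow we used in Calibre: the function names are hypothetical, the word list in the usage note is an invented example of long 's' misreadings, and it assumes the HTML files are UTF-8. It matches whole words only, because blindly swapping every 'f' for an 's' would wreck the text.

```python
import re
from pathlib import Path


def fix_words(text: str, corrections: dict[str, str]) -> tuple[str, int]:
    """Replace whole-word misreadings; return the new text and the number of changes."""
    if not corrections:
        return text, 0
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, corrections)) + r")\b")
    return pattern.subn(lambda m: corrections[m.group(1)], text)


def fix_unpacked_epub(root_dir: str, corrections: dict[str, str]) -> int:
    """Apply fix_words to every HTML file under root_dir; return the total changes."""
    total = 0
    for path in Path(root_dir).rglob("*"):
        if path.is_file() and path.suffix.lower() in {".html", ".xhtml", ".htm"}:
            text = path.read_text(encoding="utf-8")  # assumption: files are UTF-8
            fixed, count = fix_words(text, corrections)
            if count:
                path.write_text(fixed, encoding="utf-8")
                total += count
    return total
```

Something like fix_unpacked_epub("book_unpacked", {"fuch": "such", "muft": "must"}) would then fix those two words everywhere in one go. The match is case-sensitive and does not distinguish markup from text, so keep the word list specific and work on a copy of the book.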

At the end, you can cross your fingers and head to the epub validator at: to help identify any remaining issues.

All in all, it can be messy and intricate work to tidy up the files so that they are readable in the range of ebook reader apps and software currently available.

  1. ebooks
  2. epub
  3. open access
  4. public domain
  5. calibre
  6. PressBooks