While working on the Kindle (Mobi) version of Antigua and the Antiguans, I found that I was spending too much time attempting to
carefully craft some XHTML markup and a stylesheet that would get the final book to look the way I expected it to. Because the Mobi
format is only HTML 3.2 with a few extra elements, and no stylesheet support at all, Kindlegen has to do a lot of wizardry to
down-convert any EPUB (XHTML + CSS) you are starting with, and the results don’t appear to be consistent across an entire book.
It was while attempting to mark up the genealogical information in the appendices of the book that Kindlegen really came unstuck,
and I had to change tack completely. My markup in the EPUB looked like this:
<p class="offspring">Thomas Howard, 2nd Duke of Norfolk...</p>
<p class="offspring">Elizabeth, m. Thomas Boleyne, Viscount...</p>
<p class="offspring">1. George Boleyne, Viscount Rochford,....</p>
<p class="offspring">2. Anne, youngest ... left issue,</p>
<p class="offspring">Elizabeth, Queen of England</p>
<p class="offspring">3. Mary, eldest dau. of Thomas,...</p>
This struck me as sensible, semantic markup, with each successive generation being further indented, as in the original print book.
I wished it to look as below, but Kindlegen resolutely refused to play ball:
It struck me that I could assign an individual indent to each of the paragraphs, and get the hanging indent to work with an
appropriate number of non-breaking spaces at beginning of each paragraph, but why should I compromise the clean markup, or maintain
separate source documents for EPUB and Mobi?
The way I’ve decided to approach this is to write a script that will take my chosen markup vocabulary (XHTML elements with classes
of my choosing) and transform it into Mobi’s HTML 3.2 + the few mbp-namespace elements, such that the resulting document passes
through Kindlegen without any further changes taking place.
I still use Amazon’s chosen packaging program. (This is a big one, as Amazon seem to have recently rejected some books built
The transformations performed by the script can handle things that Kindlegen won’t even touch, such as small caps support.
Predictability of output, without having to put odd workarounds for Kindlegen failings in the source document. (Want a top margin
and a left margin without putting a
<div> inside a
This approach won’t work for KF8 documents with new features. That doesn’t matter for me, as I’m producing text-heavy books, but
anyone wanting to take advantage of KF8 features would have to run Kindlegen on their KF8 source, and then insert the output from
a script like mine in place of the mangled Mobi that Kindlegen had produced. (In effect, you’d need the hypothetical
to match the existing
mobiunpack.py, or try Peter Hatch’s ingenious
This is definitely in the “some assembly required” category. It works for my vocabulary for my documents. The transformations
from the input vocabulary to the mobi output may vary somewhat depending on the book, but I expect the vocabulary to remain
consistent over a long period.
I have to perform any transformations that Kindlegen would normally handle well, such as centring elements.
OK, what do I actually do?
Enough of the theory. What transformations do I perform on documents? Here is a selection of recipes that you may want to try. I’ll
use the CSS element.class notation.
Remove any existing
<link rel="stylesheet" ... /> elements. We don’t want Kindlegen to have any CSS information to allow it to
further affect our document.
Delete all comments as they are pointless and have provoked Kindlegen bugs.
Process all children of
span.sc to make a small caps effect, by inserting
<font size="-2"> around lowercase characters.
caption elements into centred paragraphs, immediately before their table.
Kindlegen doesn’t handle
p elements inside
blockquote, so just remove the blockquote and indent all paragraphs appropriately.
Convert ZWSP characters to ZWNJ to support line wrapping at em dashes
without unduly spacing them out.
Insert page breaks before
Produce nicely-indented lines of poetry.
My understanding of XSLT is not good enough to implement all of my recipes in the fashion that I think would be neatest, so I use a
Perl script and the XML::LibXML module. It lets me use XPath expressions to quickly select nodes of interest, and then add and
replace nodes at will.