Archive for January, 2012

That well-known Unicode character, Zero Width Non Joiner Freaky Repeater

January 17th, 2012 Paul Flo Williams 2 comments

Just when you think you’ve been all clever by putting zero width non joiner characters around em dashes, the Kindle renderer decides to get its knickers in a twist and does this:

The markup that produced the third line of that image was this:

<p width="0">Same bug with <i>italic</i>&#x200c;,
<b>bold</b>&#x200c; or
<span>spans</span>&#x200c;</p>

Do you see repeated words in the markup? No, me neither. Putting a ZWNJ (U+200C) character immediately after an element’s closing tag causes the Kindle to render the final word of that element twice.

In practice, this is easily avoided: leave out the ZWNJ before any em dash that comes directly after a styled element (e.g. bold, italic or a text size change). You won’t hit this bug, and all you lose is one line-break point.

If you weren’t using any ZWNJ characters around em dashes, you’d have fewer line-break points anyway, so nothing is lost.

As I’ve said before, I mark up em dashes with zero width spaces around them, and a script changes those into zero width non joiners for the Kindle’s benefit. That script also searches for places where a ZWNJ would trigger the Kindle bug and removes it, so I don’t have to think about this bug when I’m marking up a text.
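The removal step can be sketched in a few lines of Python. This is only an illustration, not the actual script: the name `drop_risky_zwnj` and the regular expression are assumptions. It treats a ZWNJ as risky whenever it directly follows a closing tag, whether it is written literally or as a numeric character reference.

```python
import re

# A ZWNJ, written literally or as a numeric character reference.
ZWNJ = r"(?:\u200c|&#x200c;|&#8204;)"

# A closing tag such as </i>, </b> or </span>, directly followed by ZWNJs.
RISKY = re.compile(r"(</[A-Za-z][^<>]*>)" + ZWNJ + "+", re.IGNORECASE)


def drop_risky_zwnj(xhtml):
    """Remove any ZWNJ that sits directly after a closing tag."""
    return RISKY.sub(r"\1", xhtml)
```

A ZWNJ between plain text and a dash is left alone, so only the break point after a styled element is lost.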

Categories: Ebooks

No, Kindle Previewer, you may not auto update

January 17th, 2012 Paul Flo Williams 1 comment

Kindle Previewer is a great time saver for checking formatting, and I’m very pleased that it runs under Wine, as I’m currently running Fedora 16 (Verne).

However, this morning when I ran it, it auto-updated to the latest version, and that crashes. Auto updating is bad enough when you know you have a set of software that works exactly as you like, but updating to a version that won’t run is just bloody rude.

I bet that there’s a cute SELinux trick that could be used to stop Kindle Previewer having network access, but a simple stop-gap is to reinstall the old version, go to the directory where you installed it, and prevent autoupdate.jar from being used. In my case, that’s these two lines:

 $ cd ~/.wine/drive_c/Program\ Files/kindle\ previewer
 $ chmod 0 autoupdate.jar

Kindle Previewer now moans that it can’t access that jar file when it runs (duh), but pressing “OK” allows the rest of it to run just fine. I’ll investigate more when I decide I want to see how things look on the Kindle Fire, and that’s not going to be until they’re available in the UK.

Categories: Ebooks, Linux

Taming em dashes on the Kindle

January 4th, 2012 Paul Flo Williams 1 comment

I like em dashes, and use them when writing (or preserving the style of older books when I’m formatting them), but they need some taming for the Kindle, as discussions on MobileRead will show you.

If you attempt to insert them between two words without any spaces, the Kindle will stubbornly keep both words together when breaking lines. A simple way of getting round this would be to use a spaced en dash instead, as Jaye Manus suggests, but can we do any better?

The Unicode-compliant method of hinting that a line break can occur without a visible space is to put U+200B ZERO WIDTH SPACE either side of the dash, but the Kindle doesn’t recognise this character. However, it does recognise U+200C ZERO WIDTH NON-JOINER, and appears to treat that in exactly the way that ZWSP should be treated.

So, I would mark up the first em dash in Jaye’s example sentence: “I think he’s the best—and I use that loosely—so will let him live.” as

...the best&#x200c;&#x2014;&#x200c;and I use...

in the XHTML that I submit to Kindlegen.

With the U+200C ZWNJ included, the Kindle allows line breaking around the dash, and will even insert some space around the dash if necessary to justify the line. Dictionary lookups will also work exactly as you’d expect.

If you were sure that you always wanted a possible break point on both sides of your em dashes, you could just run a search and replace after marking up your document, but you may want to avoid breaks before a trailing dash at the end of a paragraph, for example.
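A rough Python sketch of such a search and replace follows. The function name `add_break_hints` is hypothetical, and the rule for a trailing dash (nothing but optional whitespace between the dash and `</p>`) is an assumption. It wraps each em dash in ZWNJ references, and leaves a dash that closes a paragraph untouched.

```python
import re

HINT = "&#x200c;"  # ZWNJ, written the same way as in the markup above

# An em dash, written literally or as a numeric character reference.
EM_DASH = re.compile(r"\u2014|&#x2014;|&#8212;", re.IGNORECASE)

# Optional whitespace and then the end of the paragraph.
PARAGRAPH_END = re.compile(r"\s*</p>", re.IGNORECASE)


def add_break_hints(xhtml, hint=HINT):
    """Put a break hint on both sides of every non-trailing em dash."""
    def wrap(match):
        dash = match.group(0)
        # A dash that ends the paragraph gets no break point before it.
        if PARAGRAPH_END.match(xhtml, match.end()):
            return dash
        return hint + dash + hint
    return EM_DASH.sub(wrap, xhtml)
```

This sketch does not notice a closing quotation mark between the dash and `</p>`, and it does not check whether the dash directly follows a closing tag, so treat it as a starting point only.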

The only further wrinkle here (for me) is that ZWNJ is the wrong character if you’re trying to maintain a ‘clean’ master XHTML file for EPUB conversion or as a web page, so I actually mark up the document with ZWSP and make the ZWSP-to-ZWNJ conversion one of the transformations that I run before running Kindlegen.
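The conversion itself is a one-line substitution. Here is a minimal Python sketch, assuming the ZWSP may appear either as a literal character or as a numeric character reference; the name `zwsp_to_zwnj` is made up for illustration.

```python
import re

# A ZWSP, written literally or as a numeric character reference.
ZWSP = re.compile(r"\u200b|&#x200b;|&#8203;", re.IGNORECASE)


def zwsp_to_zwnj(xhtml):
    """Return a Kindle-bound copy of the text with every ZWSP turned into a ZWNJ."""
    return ZWSP.sub("&#x200c;", xhtml)
```

Because the function returns a new string, the master file stays clean: read it, convert it, and write the result to a separate file that goes to Kindlegen.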

Categories: Ebooks