<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Ebooks on His Deeds Are Dust</title>
    <link>https://hisdeedsaredust.com/tags/ebooks/</link>
    <description>Recent content in Ebooks on His Deeds Are Dust</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-gb</language>
    <copyright>Paul Flo Williams</copyright>
    <lastBuildDate>Thu, 14 Mar 2013 17:26:26 +0000</lastBuildDate><atom:link href="https://hisdeedsaredust.com/tags/ebooks/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>TEI with light markup</title>
      <link>https://hisdeedsaredust.com/posts/2013/tei-with-light-markup/</link>
      <pubDate>Thu, 14 Mar 2013 17:26:26 +0000</pubDate>
      
      <guid>https://hisdeedsaredust.com/posts/2013/tei-with-light-markup/</guid>
      <description>&lt;p&gt;After preparing ebooks for years with HTML and getting frustrated with a morass of divs
and spans with classes, I&amp;rsquo;ve decided to experiment with preparing texts in the vocabulary
of the &lt;a href=&#34;http://www.tei-c.org&#34;&gt;Text Encoding Initiative&lt;/a&gt;.
Conversion to XHTML for web, EPUB and Kindle formats will be handled by scripts,
which are written in Perl for now but may become XSLT later.&lt;/p&gt;
&lt;p&gt;As I&amp;rsquo;m preparing books from OCRed scans, I&amp;rsquo;d like to keep my marked-up text as close as possible
to the original layout of the printed book, because it helps me spot errors. I&amp;rsquo;ve recently made
two major leaps forward that allow me to work through and correct text a lot faster.&lt;/p&gt;
&lt;p&gt;The first one is to keep all of the end-of-line hyphens intact, without even marking them as &amp;ldquo;hard&amp;rdquo; or &amp;ldquo;soft&amp;rdquo; hyphens.
The TEI to XHTML script decides whether to remove or keep each hyphen by using a spell checker. I&amp;rsquo;m using the Perl module
Text::Hunspell, which can use not only multiple dictionaries (essential when recent works contain words in English, French,
German, Latin and Hindi), but also a book-specific dictionary containing proper names and unusual or archaic words.&lt;/p&gt;
&lt;p&gt;The second speed-up concerns quotation marks. Most quotation marks are removed from the text entirely and replaced by one
of the TEI elements &lt;code&gt;&amp;lt;q&amp;gt;&lt;/code&gt; or &lt;code&gt;&amp;lt;soCalled&amp;gt;&lt;/code&gt;. The only quote marks left in the text are apostrophes, which
are kept as the ASCII single quote character because the script can unambiguously change them to the Unicode right single
quote U+2019. The script also generates the quote marks for the &lt;code&gt;&amp;lt;q&amp;gt;&lt;/code&gt; and &lt;code&gt;&amp;lt;soCalled&amp;gt;&lt;/code&gt; elements (doubles and singles,
nested as required).&lt;/p&gt;
&lt;p&gt;There&amp;rsquo;s a lot more work to do, but I&amp;rsquo;ve put the results of some experiments online so I can test reading through them.
So far, I&amp;rsquo;ve put up two of Charles E. Pearce&amp;rsquo;s works, &lt;a href=&#34;http://charlespearce.org/pub/star-of-the-east/&#34;&gt;&lt;em&gt;A Star of the East&lt;/em&gt;&lt;/a&gt; and &lt;a href=&#34;http://charlespearce.org/pub/dragged-from-the-dark/&#34;&gt;&lt;em&gt;Dragged from the Dark!&lt;/em&gt;&lt;/a&gt;
The sources for those aren&amp;rsquo;t online yet, but I&amp;rsquo;ll put them up shortly.&lt;/p&gt;</description>
    </item>
    
  </channel>
</rss>
