Extracting font licence metadata

February 17th, 2010

I’ve just been looking at packaging a font for Fedora, and have found that there is no separate licence file, though the font metadata contains the full licence text. I should ask upstream to put a copy of the licence in their archive, the next time they do a release, but what if they aren’t interested? How would I extract the licence text?

The simplest starting point seems to be TTX, which can take a TrueType font and convert some or all of it to an XML document. From there, anything that can parse XML or plain text should help. I’ll just extract the TrueType “name” table:

$ ttx -t name font.ttf

This creates a file called font.ttx. Annoyingly, I can’t send the output to stdout, and I can’t ask it to overwrite any old file of the same name; it’ll just insist on creating font.ttx, font#1.ttx, font#2.ttx and so on.

If we take a look at font.ttx, we can see the field I’m interested in, in a namerecord element with attribute nameID of 13:

<namerecord nameID="13" platformID="1" platEncID="0" langID="0x0">
       This font software is copyright (c) 2007, Frixxon Wanglewurx. All rights reserved.&#13;&#10;"Frixxon Muck" is a Reserved ...
</namerecord>

The entire licence is extracted as one line in the XML output, with carriage returns and line feeds encoded as &#13; and &#10;, respectively. There may be more than one copy of this licence, with different platformID values.
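To check how many copies there are, and whether they agree, a few lines of Python will do. This is only a rough sketch using the standard library: SAMPLE_TTX is a made-up stand-in for font.ttx, and licence_records is a hypothetical helper name, not part of TTX.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the font.ttx written by "ttx -t name font.ttf".
SAMPLE_TTX = """<ttFont>
  <name>
    <namerecord nameID="1" platformID="1" platEncID="0" langID="0x0">
      Frixxon Muck
    </namerecord>
    <namerecord nameID="13" platformID="1" platEncID="0" langID="0x0">
      Line one.&#13;&#10;Line two.
    </namerecord>
    <namerecord nameID="13" platformID="3" platEncID="1" langID="0x409">
      Line one.&#13;&#10;Line two.
    </namerecord>
  </name>
</ttFont>
"""


def licence_records(ttx_text):
    """Return (platformID, text) pairs for every nameID 13 record."""
    root = ET.fromstring(ttx_text)
    return [
        (rec.get("platformID"), (rec.text or "").strip())
        for rec in root.iter("namerecord")
        if rec.get("nameID") == "13"
    ]


records = licence_records(SAMPLE_TTX)
for platform_id, text in records:
    print(platform_id, len(text))

# True when every platform carries exactly the same licence text.
all_same = len({text for _, text in records}) == 1
print("identical copies:", all_same)
```

If the copies differ, it is probably worth reading them side by side before picking one to ship.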

To go from here to a text file with no more than 80 characters per line and LF endings, I could use either sed or xsltproc. Let’s take a look at how both would work. Here’s getlicence.sed:

/<namerecord nameID="13"/ {
  n
  s/^ \+//
  s/&#13;//g
  s/&#9;/\t/g
  s/&#10;/\n/g
  s/&amp;/\&/g
  p
  q
}

Which reads as: find the namerecord, skip to the next line, strip leading spaces, convert some XML entities to text, print it and quit.

I’d invoke this with:

$ sed -nf getlicence.sed font.ttx | fold -s > LICENSE.txt

Yes, despite using the British spelling of licence normally, I’ll use license for packaging.

The XML entity conversion is a bit of a faff which a proper XML tool could take care of for me. Here’s getlicence.xsl:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match='/'>
  <xsl:value-of select="//namerecord[@nameID='13'][position()=1]"/>
</xsl:template>
</xsl:stylesheet>

I’d use this with:

$ xsltproc getlicence.xsl font.ttx | dos2unix | fold -s > LICENSE.txt
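Since TTX is itself a Python tool, a third route is to do the whole job in Python. The following is a minimal sketch using only the standard library, not a drop-in replacement for either pipeline: extract_licence is a hypothetical name, the sample text is invented, and textwrap does not break lines in exactly the same places as fold -s.

```python
import textwrap
import xml.etree.ElementTree as ET

# Invented sample standing in for font.ttx; the long line forces wrapping.
LONG_LINE = "All rights reserved. " * 8
SAMPLE_TTX = f"""<ttFont><name>
  <namerecord nameID="13" platformID="1" platEncID="0" langID="0x0">
    {LONG_LINE}&#13;&#10;&#13;&#10;Second &amp; last.
  </namerecord>
</name></ttFont>"""


def extract_licence(root, width=80):
    """Return the first nameID 13 record as LF-terminated, wrapped text."""
    for rec in root.iter("namerecord"):
        if rec.get("nameID") == "13":
            raw = (rec.text or "").strip()
            break
    else:
        raise ValueError("no nameID 13 record found")
    # The XML parser has already decoded the entities; normalise CRLF to LF.
    raw = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Wrap each original line on its own so that blank lines survive.
    wrapped = [textwrap.fill(line, width=width) for line in raw.split("\n")]
    return "\n".join(wrapped) + "\n"


licence = extract_licence(ET.fromstring(SAMPLE_TTX))
print(licence, end="")
```

For a real file, ET.parse("font.ttx").getroot() would supply the root element, and the result can be written straight to LICENSE.txt.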

There are plenty more tiny hacks that TTX and XSLT can help with. I haven’t yet tried many round trip conversions with TTX, but if I lose my way in FontForge’s Font Information menus and dialogs, I could be tempted.

flo Fonts

Security through perversity

December 17th, 2009

The corporate Information Services overlords have recently introduced a single sign-on solution for our intranet applications, CA SiteMinder. Somewhere along the line, a discussion must have taken place about a feature of that pesky Mozilla Firefox, helpfully remembering passwords. Although we have to sign on to even access our internal network, there must have been raised eyebrows that Firefox could automatically sign us on to intranet applications as well. I’ll show you how our webgrunts “solved” the problem. I have no idea whether this is part of SiteMinder, or simply a local perversion.

Firstly, I’ll tell you what I’m not showing you.

  • I’m not showing you that this claims to be an XHTML page, despite the invalid element nesting that makes Firebug mark many of the elements of the DOM in its faded “what the hell?” style.
  • I’m not showing you that the same JavaScript function is included twice, for no useful purpose.

In fact, I’m showing you a cleaned-up snippet that demonstrates the behaviour of the login form while sparing you some of the worst syntax of the original.

The login form

The login form appears on the page as two text fields, for username and password, and a “Connect” button, as you’d expect. What is not apparent until you examine the source is that the username and password are in different forms, which I believe is the key to this trick.

 <form name="login"
       action="sm_login.fcc"
       method="post"
       onsubmit="connect(); return false;">
    <input type=hidden name=SMAUTHREASON value="0">
    <input type=hidden name=SMAGENTNAME value="IoqEFNY64K">
    <input type=hidden name=POSTPRESERVATIONDATA value="">
    <input type=hidden name="SMENC" value="ISO-8859-1">
    <input type=hidden name="SMLOCALE" value="US-EN">
    <input type="hidden" name="PASSWORD" value="">
    <input type="hidden" name="lang" value="">
    Username: <input type="text" name="USER" maxlength="8" size="23">
  </form>
  <form name="pwd"
       onsubmit="connect(); return false;"
       method="post"
       action="">
    Password: <input type="password" name="tpep" size="23">
  </form>
  <input type="button" value=" Connect " onclick="javascript:connect()">

So here we have two forms. One is the real login form, login, but what should have been a password text field has been made hidden. A second form, called pwd, has been used to hold a new password text field. The “Connect” button doesn’t belong to either form, but calls a global connect() function, just as the two forms do.
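The split is easy to confirm mechanically. Here is a small Python sketch, using the standard html.parser module, that records which form each input belongs to. SNIPPET is a trimmed copy of the markup above, and FormFieldMap and forms_with are hypothetical names. It only demonstrates the structure; whether this is really what stops Firefox from offering to remember the password is my assumption, not something the sketch proves.

```python
from html.parser import HTMLParser

# Trimmed copy of the login markup: just the fields that matter here.
SNIPPET = """
<form name="login" action="sm_login.fcc" method="post">
  <input type="hidden" name="PASSWORD" value="">
  Username: <input type="text" name="USER" maxlength="8" size="23">
</form>
<form name="pwd" method="post" action="">
  Password: <input type="password" name="tpep" size="23">
</form>
<input type="button" value=" Connect ">
"""


class FormFieldMap(HTMLParser):
    """Record which <form> (by name) each <input> sits inside."""

    def __init__(self):
        super().__init__()
        self.current_form = None
        self.fields = []  # (form name or None, input type, input name)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.current_form = attrs.get("name")
        elif tag == "input":
            self.fields.append(
                (self.current_form, attrs.get("type"), attrs.get("name"))
            )

    def handle_endtag(self, tag):
        if tag == "form":
            self.current_form = None


parser = FormFieldMap()
parser.feed(SNIPPET)
for field in parser.fields:
    print(field)


def forms_with(input_type):
    """Names of the forms that contain an input of the given type."""
    return {form for form, kind, _ in parser.fields if kind == input_type}


# Forms holding both a visible text field and a visible password field.
same_form = forms_with("text") & forms_with("password")
print("forms with both username and password:", same_form)
```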

connect() function

Both forms run submissions through a piece of JavaScript:

function connect() {
  if (document.login.USER.value == "") {
    alert('Enter your user name');
    document.login.USER.focus();
  } else {
    if (document.pwd.tpep.value == "") {
      document.pwd.tpep.focus();
    } else {
      document.login.PASSWORD.value = document.pwd.tpep.value;
      document.login.lang.value = document.pwd.lang.value;
      document.pwd.tpep.value='';
      document.login.submit();
    }
  }
}

This little beauty does some standard checking that neither the username nor the password is blank, but then it copies the password that we typed into the other form’s hidden field, blanks the visible password, and then forces submission of the login form.

This function can’t just return true or false to allow or stop the normal submission event, because it might have been invoked from the second form, “pwd”, if I pressed Enter after typing my password.

Greasemonkeying this baby

I routinely use Greasemonkey to make our intranet even vaguely usable, which for me means making sure that text isn’t too small for me to read, correcting structural defects that Internet Explorer ignores but which break the layout on Firefox, and adding functionality to applications.

Unfortunately, the structure of the real page that contains this snippet is ugly enough that I’ve so far failed to fix it. What I need to do is to defang the connect() function and put the password field in the first form, but I’m going to have to rewrite quite a chunk of the login page to achieve this.

While considering how to do it, however, I was playing around in Firebug, and quite by accident managed to rewrite the form enough to get Firefox to remember the password, and then insert it in the real thing. Bizarre, but satisfying :-P

flo Poor choices

The magic of #ifdef

December 16th, 2009

The beauty of the preprocessor #ifdef . . . #endif directives in C and C++ is that there are so many ways to abuse them.

I’ve been working on some vintage code (at least 15 years old) that provides a model for how not to do things. The compilation is controlled by no fewer than 750 #ifdef switches. Many of these are used in header files as guards against double inclusion, but the others roughly split into these jobs:

  • controlling platform-specific code generation
  • controlling project-specific code generation
  • experimental features
  • controlling different versions of hardware

The oddest of these switches are:

For the people who don’t like to document:

#ifdef __

For dyslexic programmers:

#ifdef DSTL_UPGRADES
#ifdef DTSL_UPGRADES

For those who think the code runs too quickly:

#ifdef INEFFICIENT

For those hopeful of a quick fix:

#ifdef MAKETHISWORK

For those who aren’t confident of our source control systems (and are dyslexic):

#ifdef OLD_SLOW_WAY
#ifdef ORIGINAL_CODE
#ifdef OROGINAL_CODE
#ifdef REDUNDANT_CODE
#ifdef REDUNDANT_FUNCTIONS

For those who can’t quite remember which operating system they are using:

#ifdef _vxworks
#ifdef __vxworks
#ifdef VXWORKS
#ifdef _VXWORKS
#ifdef __VXWORKS

For those trying the super-secret go-faster-stripes:

#ifdef WIN32_LEAN_AND_MEAN

For those who super unpositively don’t no way double negative want that code:

#ifdef _WIN32_trynot

. . . but just the once, or later, or huh, maybe not at all?:

#ifdef __JUSTONCE__
#ifdef _JUST_ONCE_
#ifdef notdef
#ifdef __NOTSMART_
#ifdef _NOTSMART_
#ifdef notyet
#ifdef NOTYET
#ifdef THIS_IS_NECESSARY
#ifdef THIS_IS_TOO_EXPENSIVE

Even the choice of names for include guards shows how coding standards change over time, or are ignored, or how the language standards themselves are ignored (the leading underscores):

xxx_inc_
_xxx_H_
__xxx_H__
xxx_include
xxx_H
xxx_inc
xxx_h
xxx_Hinc
_INC_xxx
INC_xxx
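An inventory like this does not have to be compiled by eye. The Python sketch below shows one way to pull switch names out of source text and flag suspicious near-duplicates. It is a rough illustration under stated assumptions: the function names are hypothetical, the 0.9 similarity threshold is an arbitrary guess, SAMPLE is a made-up fragment using names from the lists above, and only #ifdef and #ifndef lines are matched, not #if defined(...).

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

# Matches "#ifdef NAME" and "#ifndef NAME", allowing spaces around the "#".
IFDEF = re.compile(r"^[ \t]*#[ \t]*ifn?def[ \t]+(\w+)", re.MULTILINE)

# Made-up fragment using switch names from the lists above.
SAMPLE = """\
#ifdef DSTL_UPGRADES
#ifdef DTSL_UPGRADES
  #  ifdef ORIGINAL_CODE
#ifndef OROGINAL_CODE
#ifdef _vxworks
#ifdef __vxworks
#ifdef VXWORKS
#ifdef _VXWORKS
#ifdef __VXWORKS
#ifdef INEFFICIENT
"""


def normalise(name):
    """Ignore case and leading/trailing underscores."""
    return name.strip("_").upper()


def spelling_variants(names):
    """Group names that differ only by case or surrounding underscores."""
    groups = defaultdict(set)
    for name in names:
        groups[normalise(name)].add(name)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


def likely_typos(names, threshold=0.9):
    """Pairs of distinct normalised names that are suspiciously similar."""
    keys = sorted({normalise(name) for name in names})
    return [
        (a, b)
        for a, b in combinations(keys, 2)
        if SequenceMatcher(None, a, b).ratio() >= threshold
    ]


names = set(IFDEF.findall(SAMPLE))
variants = spelling_variants(names)
typos = likely_typos(names)
print(variants)
print(typos)
```

Run over a real source tree, the same functions would take the concatenated contents of the .c and .h files in place of SAMPLE; the threshold would probably need tuning to keep false positives down.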

flo Poor choices