<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Perl on His Deeds Are Dust</title>
    <link>https://hisdeedsaredust.com/tags/perl/</link>
    <description>Recent content in Perl on His Deeds Are Dust</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-gb</language>
    <copyright>Paul Flo Williams</copyright>
    <lastBuildDate>Thu, 26 Dec 2024 18:33:34 +0000</lastBuildDate><atom:link href="https://hisdeedsaredust.com/tags/perl/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>My unsophisticated Perl cribsheet</title>
      <link>https://hisdeedsaredust.com/posts/2024/perl-cribsheet/</link>
      <pubDate>Thu, 26 Dec 2024 18:33:34 +0000</pubDate>
      
      <guid>https://hisdeedsaredust.com/posts/2024/perl-cribsheet/</guid>
      <description>&lt;p&gt;For donkey&amp;rsquo;s years I have been developing web applications with Apache httpd,
Perl, CGI and MySQL, because that has always been the default setup on my web
host. I &lt;em&gt;know&lt;/em&gt; I should be moving away from nearly all of these, with the
exception of Perl, but that would involve me doing something funky with a new
server environment, containers, droplets or, &lt;em&gt;sigh&lt;/em&gt;, anything that gets kicked
into next year&amp;rsquo;s resolutions.&lt;sup id=&#34;fnref:1&#34;&gt;&lt;a href=&#34;#fn:1&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;1&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Unfortunately, since 1999 I have been tripped up every few years with a new release of one of the
above layers that actually improves their Unicode support (&lt;strong&gt;yay!&lt;/strong&gt;), while
triggering problems somewhere else (&lt;strong&gt;boo&lt;/strong&gt;.) I still remember how scared I was
when I found out that some strings are internally Latin-1 and some are Unicode
and thinking that I needed to mess with internal flags to manipulate them. &lt;em&gt;So&lt;/em&gt;
very glad to have been proven wrong, there.&lt;sup id=&#34;fnref:2&#34;&gt;&lt;a href=&#34;#fn:2&#34; class=&#34;footnote-ref&#34; role=&#34;doc-noteref&#34;&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
&lt;p&gt;Long story short, &lt;em&gt;for my own benefit&lt;/em&gt; (and before I trigger another layer
collapse), this is how I am currently tackling Unicode from bottom to top of my
web environment, because I want poo emojis in my database as much as anyone
else. This reflects my understanding of the setup that is currently working for
me.&lt;/p&gt;
&lt;h2 id=&#34;perl&#34;&gt;Perl &lt;a href=&#34;#perl&#34; class=&#34;anchor&#34;&gt;🔗&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Since my scripts are written in Perl, this is the major part. I need all these
parts to support Unicode:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Command line arguments. I like testing web scripts from the command line,
particularly because I like to see a JSON output for the big hashes that I
normally pump through &lt;a href=&#34;https://template-toolkit.org/&#34;&gt;Template Toolkit&lt;/a&gt; in
order to produce a web page.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Database connection.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Web response, which really means getting the correct encoding on stdout.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;JSON output. At the moment, I use this for testing but I do hope to get more
sophisticated over time and have more AJAX-y pages or a working API for my
applications.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The first lines of my Perl scripts are:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-perl&#34; data-lang=&#34;perl&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#75715e&#34;&gt;#!/usr/bin/perl -CAS&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;use&lt;/span&gt; v5&lt;span style=&#34;color:#ae81ff&#34;&gt;.34&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;use&lt;/span&gt; utf8;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Taken in order:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://perldoc.perl.org/perlrun#-C-%5Bnumber/list%5D&#34;&gt;&lt;code&gt;perl -CAS&lt;/code&gt;&lt;/a&gt; says that the standard file handles (stdin, stdout, stderr) are
UTF-8 encoded (the &lt;code&gt;S&lt;/code&gt;) and that command line arguments are also UTF-8 encoded (the &lt;code&gt;A&lt;/code&gt;). Essentially, all
my strings in Perl will contain Unicode characters, not octets, and serialising
to/from UTF-8 happens at my interfaces. With these options, I no longer have to
put &lt;code&gt;binmode STDOUT, &#39;:utf8&#39;&lt;/code&gt; in my scripts.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;code&gt;use v5.34;&lt;/code&gt; just keeps me up-to-date with the latest features I can use on
my webhost, which allows me to say &lt;code&gt;say&lt;/code&gt;. I no longer have to say &lt;code&gt;use strict&lt;/code&gt;, but I&amp;rsquo;d still need &lt;code&gt;use warnings&lt;/code&gt; until I get to v5.36.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;a href=&#34;https://perldoc.perl.org/perlunicode#use-utf8-still-needed-to-enable-UTF-8-in-scripts&#34;&gt;&lt;code&gt;use utf8;&lt;/code&gt;&lt;/a&gt; allows me to put Unicode characters directly in my Perl script,
&lt;em&gt;and that is all it does&lt;/em&gt;. I like to do this directly, only resorting to &lt;code&gt;\N{...}&lt;/code&gt; when I can&amp;rsquo;t clearly see
what the character is meant to be, which, in my monospaced Vim environment, means
long dashes. I&amp;rsquo;ve never actually used a poo emoji in a script, though I&amp;rsquo;ve
undoubtedly written many a program which could be judged that way.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
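&lt;p&gt;As a minimal sketch of the difference between characters and octets that these three lines set up (the sample string and variable names are made up for illustration, and &lt;code&gt;Encode&lt;/code&gt; is used only to show what would cross an interface):&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-perl&#34; data-lang=&#34;perl&#34;&gt;use v5.34;
use utf8;                      # literals in this file are read as characters
use Encode qw(encode);

my $text   = &amp;#39;café&amp;#39;;                  # made-up sample string
my $octets = encode(&amp;#39;UTF-8&amp;#39;, $text);  # what would cross an interface

say length $text;      # 4 characters
say length $octets;    # 5 octets, because é takes two in UTF-8
say uc($text) eq &amp;#34;CAF\x{C9}&amp;#34; ? &amp;#39;uc works on characters&amp;#39; : &amp;#39;uc failed&amp;#39;;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Everything printed here is plain ASCII, so the sketch behaves the same with or without &lt;code&gt;-CAS&lt;/code&gt;.&lt;/p&gt;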
&lt;h2 id=&#34;cgi&#34;&gt;CGI &lt;a href=&#34;#cgi&#34; class=&#34;anchor&#34;&gt;🔗&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Although I still use &lt;a href=&#34;https://metacpan.org/pod/CGI&#34;&gt;CGI.pm&lt;/a&gt;, it now only gets used for retrieving parameters and
setting the content type of the response. I can either do:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-perl&#34; data-lang=&#34;perl&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;use&lt;/span&gt; CGI ();
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;use&lt;/span&gt; Encode;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;my&lt;/span&gt; $cgi &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; CGI&lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;new&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;my&lt;/span&gt; $p &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; decode(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;UTF-8&amp;#39;&lt;/span&gt;, $cgi&lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt;param(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;q&amp;#39;&lt;/span&gt;));
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;to decode parameters myself, or&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-perl&#34; data-lang=&#34;perl&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;use&lt;/span&gt; CGI &lt;span style=&#34;color:#e6db74&#34;&gt;qw(-utf8)&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;my&lt;/span&gt; $cgi &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; CGI&lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;new&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;my&lt;/span&gt; $p &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; $cgi&lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt;param(&lt;span style=&#34;color:#e6db74&#34;&gt;&amp;#39;q&amp;#39;&lt;/span&gt;);
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Clearly the latter is simpler. Either method does the appropriate
deserialisation for me so that, again, I&amp;rsquo;m handling Unicode characters
internally, not octets. As a bonus, for debug, I can just output things to
stderr (where it doesn&amp;rsquo;t interfere with the web response), and the
UTF-8 serialisation will happen for me.&lt;/p&gt;
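&lt;p&gt;To illustrate what the explicit decode is doing, here is a small sketch that skips CGI.pm entirely and starts from a hard-coded octet string, which is an assumption standing in for a raw parameter value:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-perl&#34; data-lang=&#34;perl&#34;&gt;use v5.34;
use Encode qw(decode);

# Hypothetical raw value: &amp;#34;caf&amp;#34; followed by U+00E9 as two UTF-8 octets.
my $octets = &amp;#34;caf\xC3\xA9&amp;#34;;
my $chars  = decode(&amp;#39;UTF-8&amp;#39;, $octets);

say length $octets;    # 5 octets before decoding
say length $chars;     # 4 characters after decoding
say sprintf(&amp;#39;U+%04X&amp;#39;, ord(substr($chars, -1)));    # U+00E9
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;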
&lt;h2 id=&#34;mysql&#34;&gt;MySQL &lt;a href=&#34;#mysql&#34; class=&#34;anchor&#34;&gt;🔗&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;My web host (Pair Networks) uses MySQL 8.0, so my database text columns are
created with a charset of utf8mb4 and a collation of utf8mb4_general_ci. Oddly
enough, until two minutes ago, when I
&lt;a href=&#34;https://www.pair.com/support/kb/mysql-80/&#34;&gt;checked&lt;/a&gt;, I believed they were still using
version 5.7. Clearly I&amp;rsquo;m not sophisticated enough to have tripped over problems
during their transition.&lt;/p&gt;
&lt;p&gt;My connection line looks like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-perl&#34; data-lang=&#34;perl&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;my&lt;/span&gt; $dbh &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; DBI&lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt;connect($source, $user, $pass,
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;          { mysql_enable_utf8mb4 &lt;span style=&#34;color:#f92672&#34;&gt;=&amp;gt;&lt;/span&gt; &lt;span style=&#34;color:#ae81ff&#34;&gt;1&lt;/span&gt; });
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;and that performs all the (de)serialisation I need. That MySQL option used to be
&lt;code&gt;mysql_enable_utf8&lt;/code&gt;, before characters got larger.&lt;/p&gt;
&lt;h2 id=&#34;json-output&#34;&gt;JSON output &lt;a href=&#34;#json-output&#34; class=&#34;anchor&#34;&gt;🔗&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;JSON output is as simple as:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-perl&#34; data-lang=&#34;perl&#34;&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;use&lt;/span&gt; JSON ();
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;my&lt;/span&gt; $json &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; JSON&lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;new&lt;/span&gt;;
&lt;/span&gt;&lt;/span&gt;&lt;span style=&#34;display:flex;&#34;&gt;&lt;span&gt;&lt;span style=&#34;color:#66d9ef&#34;&gt;my&lt;/span&gt; $json_text &lt;span style=&#34;color:#f92672&#34;&gt;=&lt;/span&gt; $json&lt;span style=&#34;color:#f92672&#34;&gt;-&amp;gt;&lt;/span&gt;encode($r);
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;The thing that tripped me up most recently was using JSON&amp;rsquo;s encode_json routine,
without noticing that &lt;em&gt;that&lt;/em&gt; does the UTF-8 serialisation, which resulted in me
double-encoding the output. I find that I have to read documentation very
carefully in order to distinguish between interfaces (functions) that consume or
produce UTF-8 octets &lt;em&gt;versus&lt;/em&gt; Unicode characters. I want Unicode characters internally, so that
counting or splitting works as expected.&lt;/p&gt;
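&lt;p&gt;A sketch of the double-encoding trap, using the core &lt;code&gt;JSON::PP&lt;/code&gt; module as a stand-in for &lt;code&gt;JSON&lt;/code&gt; (an assumption made so that the example runs without CPAN modules; the data is made up). Turning on the &lt;code&gt;utf8&lt;/code&gt; option produces octets, which is also what &lt;code&gt;encode_json&lt;/code&gt; does:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; style=&#34;color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;&#34;&gt;&lt;code class=&#34;language-perl&#34; data-lang=&#34;perl&#34;&gt;use v5.34;
use JSON::PP ();

my $r = { poo =&amp;gt; &amp;#34;\x{1F4A9}&amp;#34; };    # made-up data: one character, U+1F4A9

my $chars  = JSON::PP-&amp;gt;new-&amp;gt;encode($r);          # Unicode characters
my $octets = JSON::PP-&amp;gt;new-&amp;gt;utf8-&amp;gt;encode($r);    # UTF-8 octets

say length $chars;     # 11 characters
say length $octets;    # 14 octets, because the emoji takes four in UTF-8
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p&gt;Printing the second string to a handle that already has a UTF-8 layer, as &lt;code&gt;-CAS&lt;/code&gt; arranges for stdout, would encode it a second time.&lt;/p&gt;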
&lt;h2 id=&#34;thats-a-wrap&#34;&gt;That&amp;rsquo;s a wrap &lt;a href=&#34;#thats-a-wrap&#34; class=&#34;anchor&#34;&gt;🔗&lt;/a&gt;&lt;/h2&gt;&lt;p&gt;Now I&amp;rsquo;ve written it down and tested it, it looks simple again, but I was worried
that I&amp;rsquo;d done something tragically wrong when I picked the wrong JSON routine
and convinced myself that &lt;em&gt;that&lt;/em&gt; was the correct part, and that the fault lay in the rest of
the components, which had been working correctly up to that point. Sometimes I&amp;rsquo;m
unsure enough of my understanding that I presume I&amp;rsquo;m more likely to have got two
wrongs making a right than a clean stack.&lt;/p&gt;
&lt;div class=&#34;footnotes&#34; role=&#34;doc-endnotes&#34;&gt;
&lt;hr&gt;
&lt;ol&gt;
&lt;li id=&#34;fn:1&#34;&gt;
&lt;p&gt;Moving to &lt;a href=&#34;https://metacpan.org/pod/Dancer2&#34;&gt;Dancer2&lt;/a&gt; is my top resolution for 2025, but I really need someone
else to write that blog post that says &amp;ldquo;Get an account with so-and-so, run this
super deployment script and copy your application &lt;em&gt;here&lt;/em&gt; and Bob&amp;rsquo;s your uncle,
and it&amp;rsquo;ll cost you 37p per month.&amp;rdquo;&amp;#160;&lt;a href=&#34;#fnref:1&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&#34;fn:2&#34;&gt;
&lt;p&gt;With the maturing of the &amp;lsquo;unicode_strings&amp;rsquo; feature, this too became an historic worry.&amp;#160;&lt;a href=&#34;#fnref:2&#34; class=&#34;footnote-backref&#34; role=&#34;doc-backlink&#34;&gt;&amp;#x21a9;&amp;#xfe0e;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
</description>
    </item>
    
  </channel>
</rss>
