Tue Sep 17 14:23:41 +07 2019

Dealing with Let's Encrypt certificates

Update, April 2023: the system also works with Certbot.

The old manual procedure is commented out at the bottom of this page.

Rationale

Before Let's Encrypt (LE) can sign a certificate request (CSR) and issue a certificate (CRT) it must verify that you are the owner of the domain for that certificate.

There are two major ways the verification can happen: via the web or via DNS. As part of the verification process, LE provides you with a one-time challenge string, and you either publish this string in a file served by the web server for the domain or install it as a TXT record in the DNS of the domain. In either case, you have proven that you can modify the web site or the DNS, so you must be the rightful owner of the domain.
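For the DNS method, the value that ends up in the TXT record is derived from the challenge token as described in RFC 8555 (section 8.4). The following is a minimal sketch of that derivation; the token and account thumbprint in the usage note are made-up values, and in practice the ACME client (acme.sh here) computes all of this for you.

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Base64url encoding without padding, the encoding ACME uses."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_txt_record(domain: str, token: str, account_thumbprint: str):
    """Return the (record name, TXT value) pair for a DNS challenge.

    The key authorization is "<token>.<thumbprint>"; the TXT value is the
    base64url-encoded SHA-256 digest of that string.
    """
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return f"_acme-challenge.{domain}", b64url(digest)
```

Calling dns01_txt_record("service.exemple.com", "made-up-token", "made-up-thumbprint") returns the record name _acme-challenge.service.exemple.com and a 43-character value to publish.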

The web verification method is fully automated when the certificate is created for a web server. But when the domain has no web server attached (think, for example, of an LDAP server), the DNS method must be used.

While the DNS method is "working", I have several issues with it. The main one is that the heavy, redundant caching in DNS can make the verification of the challenge quite hard to achieve. Once LE has failed to verify the challenge (for example because it has not fully propagated yet), that failure is cached in the worldwide DNS and you must wait for the cache to expire before you can try again.

I also have the feeling that Cloudflare DNS (used by LE) is slower to update than Google DNS.

This resulted in a lot of frustration when a certificate renewal had to be requested many times over a 24-hour period before it finally succeeded.

Another issue with DNS verification is that the DNS zone serving the domain must allow dynamic updates. When a zone allows dynamic updates, BIND annoyingly rewrites the zone file into a leaner format, removing all comments, etc. I want my zone files to keep their comments because other processes rely on them. In the past, I would keep a copy of the zone file and manually restore it after the certificates had been issued, putting the zone in dynamic update mode and then back in static mode, but this is inconvenient and error-prone.

I have several domains with certificates, each with different requirements, with different directories to install the files, some with combined form (CRT+KEY, CRT+CA). No two machines are equal when it comes to where and how to install the certificate, what service to restart after the installation, etc. A manual renewal demands a great deal of care to do something that could be automated.

To solve the problem of BIND mangling the DNS zone, I came up with the idea of having a separate sub-zone for each domain that needs a certificate. Each sub-zone is in update mode; BIND can rewrite that sub-zone, and it does not matter to me or to my other processes.

And with that I had all the bricks I needed to fully automate the issuing of certificates.

Synopsis

The solution is based on:

  1. a specific DNS sub-zone for each domain that needs a certificate;
  2. a database to store the certificates for each domain;
  3. a script that requests new certificates from LE and stores them in the database;
  4. a script on each server that gets the certificate from the database, installs it and does what is needed by that specific server to activate the new certificate;
  5. and a script of minor importance that creates a new entry in the database for a new domain.

The DNS sub-zones

In the file named.conf there is a similar entry for each domain:

zone "service.exemple.com" in {
     type master;
     allow-query { any; };
     file "db.service.exemple.com";
     zone-statistics yes;
     update-policy {
         grant acme-sh-update name _acme-challenge.service.exemple.com TXT;
     };
};

This entry defines a sub-zone for the domain service.exemple.com and makes the sub-zone updatable; acme-sh-update is the key used in the communications between the DNS server and the certificate update script. I use TSIG signatures, for example:

key "acme-sh-update" {
        algorithm hmac-sha512;
        secret "*************************************************************************************==";
};

This is the same key that is defined in the file pointed to by the parameter SAVED_NSUPDATE_KEY in the .acme.sh/account.conf file.
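To make the role of the key and of the update-policy more concrete, here is a rough sketch of the kind of dynamic update that gets sent to the DNS server. It only builds the text that would be fed to nsupdate -k with the key file; the function name and the TTL are my own choices for illustration, and acme.sh handles this step itself.

```python
def nsupdate_script(server: str, zone: str, txt_value: str, ttl: int = 60) -> str:
    """Build the commands to pipe into `nsupdate -k <keyfile>`.

    Only the TXT record of _acme-challenge.<zone> is touched, which is
    exactly what the update-policy of the sub-zone grants.
    """
    record = f"_acme-challenge.{zone}."
    return "\n".join([
        f"server {server}",
        f"zone {zone}",
        f"update delete {record} TXT",                   # drop any stale challenge
        f'update add {record} {ttl} TXT "{txt_value}"',  # publish the new one
        "send",
        "",
    ])
```

For example, nsupdate_script("dns.exemple.com", "service.exemple.com", "challenge-value") yields a five-line script ending with send.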

The main zone file for exemple.com is then modified and the record:

service          IN      A       192.168.0.19

is changed into:

service          IN      NS      dns.exemple.com.

This change declares a sub-zone served on the same DNS server.

Finally, the sub-zone file db.service.exemple.com looks like:

$ORIGIN .
$TTL 43200      ; 12 hours
service.exemple.com     IN SOA  dns.exemple.com. postmaster.exemple.com. (
       5          ; serial
       21600      ; refresh (6 hours)
       1800       ; retry (30 minutes)
       1209600    ; expire (2 weeks)
       43200      ; minimum (12 hours)
       )
                        NS      dns.exemple.com.
                        A       192.168.0.10

Once the sub-zone has been created and the DNS server has been restarted, I suggest waiting at least 12 hours for the sub-zone to propagate.

You can also use a tool like DNS Checker to monitor the propagation of the DNS information for service.exemple.com. It is worth adding dns.google and one.one.one.one (Cloudflare) to the checker. Once the checker returns stable results, you can start issuing certificates.
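If you prefer to check from the command line, the same idea can be sketched with dig. This is only an illustration: it assumes dig is installed, it asks the two public resolvers mentioned above at their well-known addresses, and "stable" is simply taken to mean that both return the same non-empty answer.

```python
import subprocess

# Public resolvers named above, with their well-known addresses.
RESOLVERS = {"dns.google": "8.8.8.8", "one.one.one.one": "1.1.1.1"}


def dig_short(resolver_ip, name, rtype="A"):
    """Ask one resolver with `dig +short` and return the sorted answers."""
    out = subprocess.run(
        ["dig", "+short", f"@{resolver_ip}", name, rtype],
        capture_output=True, text=True, check=True,
    ).stdout
    return sorted(line.strip() for line in out.splitlines() if line.strip())


def propagation_status(name, rtype="A", query=dig_short, resolvers=RESOLVERS):
    """Return (stable, answers); stable means all resolvers agree on a
    non-empty answer. `query` can be replaced for testing."""
    answers = {label: tuple(query(ip, name, rtype))
               for label, ip in resolvers.items()}
    distinct = set(answers.values())
    stable = len(distinct) == 1 and () not in distinct
    return stable, answers
```

Running propagation_status("service.exemple.com") a few times over several hours gives a rough picture of when it is safe to start issuing certificates.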

Note: the sub-zone is dynamic, so you cannot edit it directly once it is online. You must rndc freeze it before editing and rndc thaw it afterwards.
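One way to make sure the thaw is never forgotten is to wrap the edit in a small helper. This is just a sketch of the freeze/edit/thaw sequence; the helper name is mine and it assumes rndc is usable by the account running it.

```python
import subprocess
from contextlib import contextmanager


@contextmanager
def frozen_zone(zone, run=subprocess.run):
    """Freeze a dynamic zone for manual editing and always thaw it after."""
    run(["rndc", "freeze", zone], check=True)
    try:
        yield
    finally:
        # Thaw even if the edit failed, so dynamic updates resume.
        run(["rndc", "thaw", zone], check=True)
```

The zone file is then edited inside a with frozen_zone("service.exemple.com"): block.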

Since each server has its own sub-zone, DNS verification cannot work for alternative names: if a certificate request is issued for foo.exemple.com and bar.exemple.com, each name has its own sub-zone, but the request for the LE certificate will try to install the challenge _acme-challenge.bar.exemple.com in the sub-zone foo.exemple.com and it will fail. The only way it could succeed would be to use a single zone at the upper level, exemple.com, but that would bring back the drawbacks mentioned above.

In my experience, using sub-zones makes LE verification over DNS far more reliable: I achieved a 100% success rate, compared to the frustration I was experiencing before.

Certbot

Going with certbot avoids all the DNS procedure above, but it is limited to web sites that are hosted on the server that runs the download_cert.pl script.

The script will tell certbot to write the challenge in a specific location that is defined by webroot_path and must correspond to the DocumentRoot directory of the web site.

To allow for alternative names, certbot is used with the option --webroot only.

All web sites covered by the primary and alternative names must have the same DocumentRoot directory: they must all be different names for one single web site (for example when the service is changing name).
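As an illustration, the certbot call for such a site could be assembled along the following lines. This is a sketch, not the actual code of download_cert.pl: the function name is hypothetical, and it assumes the names to certify come from the CSR stored in the database, which is how certbot behaves when given --csr.

```python
def certbot_command(csr_file, webroot_path):
    """Build a certbot invocation using the webroot method and an
    existing CSR. Returns an argument list suitable for subprocess.run."""
    if not webroot_path:
        raise ValueError("webroot_path must be set for certbot domains")
    return [
        "certbot", "certonly",
        "--non-interactive",
        "--webroot",
        "--webroot-path", webroot_path,  # must match the DocumentRoot
        "--csr", csr_file,               # primary and alternative names come from here
    ]
```

The resulting list can be passed to subprocess.run on the server that hosts the web site.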

The database

The database has only one table: certs. The table structure is shown below:

CREATE TABLE `certs` (
  `name` char(250) COLLATE ascii_bin NOT NULL,
  `enable` enum('Y','N') COLLATE ascii_bin DEFAULT 'Y',
  `comment` tinytext COLLATE ascii_bin,
  `certbot` enum('Y','N') COLLATE ascii_bin DEFAULT 'N',
  `csr` blob NOT NULL,
  `ca` blob,
  `crt` blob,
  `md5_modulus` char(64) COLLATE ascii_bin NOT NULL,
  `md5_ca` char(64) COLLATE ascii_bin DEFAULT NULL,
  `ts_expire` int(64) unsigned DEFAULT NULL,
  `ts_last_update` int(64) unsigned DEFAULT NULL,
  `count_fail_update` int(8) DEFAULT NULL,
  `old_ca` blob,
  `old_crt` blob,
  `webroot_path` tinytext COLLATE ascii_bin,
  PRIMARY KEY (`name`),
  KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=ascii COLLATE=ascii_bin

The fields are used as follows:

name
the domain name, the name that will be secured with the certificate, this field is compulsory;
enable
if set to N, the certificate for that domain will not be updated automatically;
comment
it can be useful;
certbot
should be Y if the certificate is retrieved via certbot instead of acme.sh; this is important for certificates with alternative names;
csr
the CSR for the domain, this field is compulsory as there could not be a certificate issued without a request;
ca
the certificate for the Certification Authority (CA); once in a while the CA will change and needs to be updated accordingly;
crt
the certificate obtained from LE;
md5_modulus
the MD5 of the modulus of the key/CSR, the modulus should be consistent on the key, the CSR and the CRT, having the modulus saved in the database allows a faster check, this field is compulsory but the data is obtained from the CSR;
md5_ca
the MD5 of the CA certificate, it is useful to check if the CA has changed;
ts_expire
the timestamp of the expiry date of the CRT, used to check if any new certificate has been saved in the database; caching that value saves time;
ts_last_update
the timestamp of the last attempt to contact LE to issue that certificate; LE enforces rate limits, and this timestamp is used to avoid contacting LE too often;
count_fail_update
the number of unsuccessful updates for the given certificate; if that number grows above 10, it means there is a serious problem; it should never grow above 2;
old_ca
the old CA certificate, once a new CA certificate has been issued;
old_crt
the old CRT once a new CRT has been issued;
webroot_path
the directory where certbot will place the challenge, it must correspond to the DocumentRoot directory of the web site.
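The modulus check behind md5_modulus can be sketched as follows. This assumes the usual openssl idiom of hashing the output of the -modulus option; whether the scripts hash exactly that text is my assumption, and the helper names are made up for the example.

```python
import hashlib
import subprocess


def modulus_output(kind, path):
    """Run openssl on a key ('rsa'), a CSR ('req') or a certificate
    ('x509') and return the "Modulus=..." line it prints."""
    return subprocess.run(
        ["openssl", kind, "-noout", "-modulus", "-in", path],
        capture_output=True, text=True, check=True,
    ).stdout


def md5_modulus(output):
    """MD5 (hex) of the text printed by `openssl ... -modulus`."""
    return hashlib.md5(output.encode("ascii")).hexdigest()


def consistent(outputs):
    """True when key, CSR and CRT all share the same modulus."""
    return len({md5_modulus(o) for o in outputs}) == 1
```

A typical check would be consistent([modulus_output("rsa", key), modulus_output("req", csr), modulus_output("x509", crt)]) with the three file paths of a given domain.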

Creating alternative names

The web site foo.exemple.com can also be accessed by bar.otherdomain.com. Both URLs point to the same IP address, the second URL is a CNAME to the first URL.

The certificate request must include both names, where foo is the primary name and bar is the alternative name.

To create the CSR, you must create a configuration file similar to:

[req]
default_bits       = 2048
distinguished_name = req_distinguished_name
req_extensions     = req_ext
prompt             = no
[req_distinguished_name]
countryName             = TH
stateOrProvinceName     = Pathumthani
localityName            = Klong Luang
organizationName        = Asian Institute of Technology
organizationalUnitName  = CSIM
commonName              = foo.exemple.com
emailAddress            = on@cs.ait.ac.th
[req_ext]
subjectAltName = @alt_names
[alt_names]
DNS.1 = bar.otherdomain.com

In the openssl req -new... command, add -config config_file and use the CSR in the database as usual.
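When there are several alternative names, the configuration file can be generated rather than typed by hand. The sketch below reproduces the layout of the file above; the function and its parameters are hypothetical, and the subject fields are whatever applies to your organisation.

```python
def csr_config(common_name, alt_names, subject_fields):
    """Return the text of an openssl req configuration file with one
    DNS.n entry per alternative name."""
    lines = [
        "[req]",
        "default_bits       = 2048",
        "distinguished_name = req_distinguished_name",
        "req_extensions     = req_ext",
        "prompt             = no",
        "[req_distinguished_name]",
    ]
    lines += [f"{key} = {value}" for key, value in subject_fields.items()]
    lines.append(f"commonName = {common_name}")
    lines += ["[req_ext]", "subjectAltName = @alt_names", "[alt_names]"]
    lines += [f"DNS.{i} = {name}" for i, name in enumerate(alt_names, start=1)]
    return "\n".join(lines) + "\n"
```

For instance, csr_config("foo.exemple.com", ["bar.otherdomain.com"], {"countryName": "TH"}) produces a file with a single DNS.1 entry, ready to be passed with -config.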

The scripts should work with a CSR that has alternative names, but certbot must be enabled for that host and the webroot_path must be filled in.

The download_cert.pl script

This script is run once a day on a central server.

The script:

  1. for each domain that is enabled:
    • if the certificate is about to expire (less than n days until ts_expire):
      • tries to get a certificate from LE
      • if successful and the modulus is consistent with md5_modulus, saves the CRT in the database, updates ts_expire, saves the CA in the database and updates md5_ca;
      • if unsuccessful, increases count_fail_update
  2. sleeps one hour to avoid LE throttling;
  3. repeats for each domain that was unsuccessful until success or count_fail_update becomes too high.
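The logic of these steps can be sketched as below. The real script is written in Perl and talks to the database and to LE; here the rows are plain dictionaries and issue is a hypothetical callback that returns the new certificate data or None, so the numbers (30 days, 10 failures) are only placeholders.

```python
import time

DAY = 86400


def due_for_renewal(row, now, days_before=30):
    """Enabled domains whose certificate expires within days_before days."""
    return row["enable"] == "Y" and row["ts_expire"] - now < days_before * DAY


def renew_all(rows, issue, now=None, max_failures=10, pause=3600,
              sleep=time.sleep):
    """Try to renew every due domain, retrying failures after a pause."""
    now = time.time() if now is None else now
    pending = [row for row in rows if due_for_renewal(row, now)]
    while pending:
        failed = []
        for row in pending:
            row["ts_last_update"] = int(now)
            result = issue(row)  # hypothetical: dict on success, None on failure
            if result and result["md5_modulus"] == row["md5_modulus"]:
                # Keep the previous files, then store the new ones.
                row["old_crt"], row["old_ca"] = row.get("crt"), row.get("ca")
                row.update(crt=result["crt"], ca=result["ca"],
                           md5_ca=result["md5_ca"],
                           ts_expire=result["ts_expire"])
            else:
                row["count_fail_update"] = (row["count_fail_update"] or 0) + 1
                if row["count_fail_update"] < max_failures:
                    failed.append(row)
        pending = failed
        if pending:
            sleep(pause)  # one hour by default, to stay under LE rate limits
```

Passing a fake issue function and a fake sleep makes it easy to test the retry behaviour without contacting LE.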

The install_cert.pl script

This script is run once a day on each server that uses an LE certificate.

To accommodate slight variation in the clock of the systems, two timestamps that differ by less than one hour are deemed equivalent.

The script:

  1. if the domain is enabled
  2. if the certificate in the database is not yet expired (ts_expire later than the current time)
  3. if the certificate in the database is newer than the certificate currently installed
    1. makes a copy of the current certificate;
    2. installs the new certificate file;
    3. if the CA has been updated (comparing md5_ca) installs the new CA;
  4. if a new certificate has been installed, does whatever is needed to restart the service:
    • simply restarts the service and checks it is still accessible afterward;
    • creates some files combining the CRT with the key or the CA before restarting the service;
    • sends an email to an operator who will restart the service manually;
    • reboots the server;
    • logs data to syslog;
    • etc.

This script needs to be customized to fit the operating system, the service, the environment, etc.
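The part that stays the same on every server is the decision to install or not. A minimal sketch of that test, with the one-hour tolerance mentioned above, could look like this; the function name and arguments are hypothetical and the real script is in Perl.

```python
TOLERANCE = 3600  # timestamps closer than one hour are deemed equivalent


def should_install(enable, db_ts_expire, installed_ts_expire, now):
    """Decide whether the certificate in the database must be installed.

    db_ts_expire is ts_expire from the database, installed_ts_expire is
    the expiry of the certificate currently on disk (both Unix times).
    """
    if enable != "Y":
        return False
    if db_ts_expire <= now:
        return False  # the database certificate has itself expired
    # Newer means it expires later by at least the tolerance.
    return db_ts_expire - installed_ts_expire >= TOLERANCE
```

Everything after a True result (copying files, restarting the service, sending mail) is what has to be written per server.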

This script uses the Perl modules Unix::Syslog, Mail::SendEasy, Date::Manip::Date, DBI, DBD::mysql, as well as any module necessary to test that the service is running (Net::LDAP, Authen::Simple::RADIUS, etc.). The modules need to be installed on every server running the script.

So far I have examples of install scripts for Apache web sites, Postfix mail, OpenLDAP, FreeRadius and VMware ESXi. Other install scripts are not categorised. As FreeRadius is running in a chroot'ed environment, the certificate gets installed in the chroot directory.

The populate.pl script

This script is used to add a domain in the database. It takes the name of the domain as argument.

The CSR must exist in a pre-defined place. The md5_modulus is extracted from the CSR.

In the database, enable is set to N.

Other fields are left null or set to zero.

The fields ts_expire and ts_last_update must be set to 1 to avoid a problem in the download script.
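The initial values can be summarised in a short sketch. The real script is Perl against MySQL; here sqlite3 is used as a stand-in with a simplified version of the certs table, and the CSR text and modulus hash are supplied by the caller.

```python
import sqlite3

# Simplified stand-in for the certs table, for demonstration only.
DEMO_SCHEMA = """CREATE TABLE certs (
    name TEXT PRIMARY KEY, enable TEXT, csr BLOB NOT NULL,
    md5_modulus TEXT NOT NULL, ts_expire INTEGER,
    ts_last_update INTEGER, count_fail_update INTEGER)"""


def new_entry(name, csr_pem, md5_modulus):
    """Initial row for a new domain, following the rules above."""
    return {
        "name": name,
        "enable": "N",           # disabled until checked by hand
        "csr": csr_pem,
        "md5_modulus": md5_modulus,
        "ts_expire": 1,          # 1 rather than NULL or 0, see above
        "ts_last_update": 1,
        "count_fail_update": 0,
    }


def insert_entry(conn, entry):
    """Insert the row with a parametrised statement."""
    columns = ", ".join(entry)
    marks = ", ".join("?" for _ in entry)
    conn.execute(f"INSERT INTO certs ({columns}) VALUES ({marks})",
                 tuple(entry.values()))
```

Once the entry has been verified, setting enable to Y lets the download script pick the domain up on its next run.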


Posted by Olivier | Permanent link | File under: administration