Ok, so, it’s been like 5 years since my last blog post. In my defense, I was getting my Master’s degree at PSU over that time. Also, I’ve been raising a daughter, which generally keeps me short on time.

Now that I’m fully a master at all of the computer sciences, I figured it’s time to take another stab at this whole writing thing. Besides, I’m always working on projects and nobody gets to know about them if I don’t do this. So, here goes…

Hugo

When I decided to reboot this thing a few days ago, I first wanted to make sure I was using a reasonable framework. I’ve made poor decisions in the past about static site generators, using Misaki on the original site, and then moving to Hexo for the 2017 “reboot” (3 articles probably doesn’t count as a reboot). I know Jekyll has been around forever, but arbitrarily I went with Hugo instead. Thankfully, as far as content went, the port was simple this time around, since I had already converted everything to Markdown with front matter for the Hexo move. Honestly, the Hugo port is boring and I don’t wanna spend a lot of time writing about it. The hardest part was finding a free theme with decent features, and in that search I found PaperMod. I am fairly happy with it, but the docs could use some work. That being said, because Hugo compiles pages so fast, I was able to use general trial-and-error to get what I wanted in just an hour or so.
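If you haven’t used Hugo, that feedback loop is just the stock dev server, which rebuilds and live-reloads on every save (-D includes draft posts; nothing specific to my setup here):

hugo server -D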

Netlify

The second part of rebooting the blog was moving away from hosting the site myself on a DigitalOcean droplet VPS to some automated hosting. I had heard from my good friend and co-worker Chris that Netlify is very good for this sort of thing, so I decided to sign up. Netlify is excellent. I quickly had it wired up to my GitLab repo, with Netlify installing a deploy webhook as part of the process. Again, I don’t really think there’s much to talk about here, as there are a lot of tutorials on hosting with Netlify.
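For what it’s worth, the entire build configuration for a Hugo site fits in a tiny netlify.toml at the repo root. This is a minimal sketch, not a copy of my actual file — the build command and version pin are assumptions:

# netlify.toml — minimal Hugo build config (hypothetical values)
[build]
  command = "hugo --gc --minify"
  publish = "public"

[build.environment]
  HUGO_VERSION = "0.101.0"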

It’s probably worth mentioning that the previous VPS-hosted version used some dumb scripting I wrote myself that used rsync over ssh and included decrypting an ssh key within the Travis workflow. It wasn’t the worst, but it was also not very good. The original version of the site was even worse and used scp and expect. I dunno what I was thinking…

Books

Ok, so this is actually the meat of the post. On the original VPS, I had an nginx configuration that hosted the blog under /, and also just served up files at /books/. These were mostly PDFs of books we read for CS Book Club, once upon a time, as well as whatever comics we were reading in the comic book club I started. I’ve never liked this setup: I end up collecting random books on my laptop, and then I have to scp them to the VPS and put them in just the right directory, every single time. Also, my VPS isn’t the biggest thing, so I was always contending with disk space. I told myself today that I wanted to solve this problem forever, somehow. That somehow became pretty interesting to me, and I think perhaps others would also be interested.

S3

I expect people know that Amazon S3 is an object store where you can host effectively infinite files for pennies. It makes a lot of sense for me to host these there, as they’re very static, not too big, and it’s something like $0.09 per GB to transfer out of the bucket to the wild and woolly Internet. The issue with hosting in S3 is that there’s not really a straightforward interface for access if you aren’t an AWS account owner or something similar. You can easily link to public objects and fetch them out of the bucket, but you have to have the link for it to be useful. There’s no going to the root of a directory and getting a listing like you can with apache or nginx, so that was the first complication. To solve this, I found github.com/rufuspollock/s3-bucket-listing, which neatly solves this problem for S3 buckets set up as static website hosting: it fetches the bucket’s XML listing, parses it, and generates an HTML listing from the contents. By setting this script as the index page, as well as the error page, it can handle whichever directory you try to navigate to, fetching the listings for subdirectories, all from one static file. I thought this was very clever.
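One prerequisite for that trick: the bucket has to be set up for static website hosting, with index.html configured as both the index and the error document. If you’re using the AWS CLI, that’s a one-liner (bucket name is a placeholder):

aws s3 website s3://<bucket name>/ --index-document index.html --error-document index.html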

There are a few things to know about hosting files this way. First, if you want to use a custom domain to point at the bucket, you need to name the bucket to match the domain. I made this mistake at first, and had to destroy my original bucket and replace it with the correctly-named one. Second, static sites hosted from S3 cannot use HTTPS. If you want that, you will need to use CloudFront instead, which can still use S3 as the data source. I may end up doing this as a later improvement. It would also allow me to use Lambda@Edge to set up simple authentication.
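If I were doing it over, creating the correctly-named bucket up front would look something like this (the domain is a stand-in, and note that regions outside us-east-1 need the LocationConstraint):

aws s3api create-bucket \
  --bucket books.example.com \
  --region us-west-2 \
  --create-bucket-configuration LocationConstraint=us-west-2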

Rclone

Another thing to figure out was how to automate getting my books from my NAS at home to the S3 bucket, on a regular schedule. Again, my friend and co-worker Chris had mentioned rclone as the right tool for the job. rclone is basically rsync on crack. rsync is great and it’s served me well for years, but rclone does what I need while abstracting away all the various cloud providers’ storage backends, rather than being limited to ssh-reachable hosts.

rclone needs a small config file with the cloud provider details. You can set one up interactively by running the following:

rclone config

For S3, my generated configuration looks like the following:

home :: ~ % cat ~/.config/rclone/rclone.conf
[remote]
type = s3
provider = AWS
env_auth = true
region = us-west-2
location_constraint = us-west-2
acl = public-read
server_side_encryption = AES256
storage_class = STANDARD

For the above config, env_auth = true tells rclone to pull the AWS credentials from the standard AWS_ variables in the environment. I opted to do this to keep credentials out of the configuration.
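Concretely, those are the standard AWS trio (values elided, of course):

export AWS_ACCESS_KEY_ID="<access key>"
export AWS_SECRET_ACCESS_KEY="<secret key>"
export AWS_DEFAULT_REGION="us-west-2"

With those variables set, here’s an example of using rclone: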

rclone sync ./local-dir remote:my-s3-bucket
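A word of caution that applies to rsync-alikes generally: sync makes the destination match the source, which includes deleting remote files that no longer exist locally. rclone supports a --dry-run flag, so a prudent first pass looks like:

rclone sync --dry-run ./local-dir remote:my-s3-bucket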

rclone can do a whole lot, which is my main motivation for using it. It means if I decided to move to a different cloud object store (or something local even), I could just adjust the configuration and not have to retool.

NixOS

Now, having a destination for the files and a tool for transferring them, I needed a place to run the tool from. My NAS contains all the books, and I run NixOS there, so my initial plan was a simple cronjob-type thing from there. I already have a similar cronjob for my snapraid-runner module, so I planned to copy the module I made for that and refactor it for this new situation. If you’re not familiar with NixOS, it’s a Linux distribution where configuration is centralized and declarative. I’m honestly not an expert at Nix yet, but I use my NAS as a way to learn it, and I’m getting better over time.

Instead of jumping right into the module itself, here is its instantiation. Seeing how the module is used should give more context for how it’s structured.

# these are function arguments
# they aren't actually used in the module instantiation, but the calling module will pass them, regardless
{ pkgs, config, lib, ... }:

# these are local bindings that just avoid some duplication in the body of the module
let
  bucket = "<bucket name>";
  region = "<region>";
in
{
  services.bucket-server = {
    enable = true;

    # this is an "attrset", which is nix's name for a map/dictionary/hash
    # the idea here is <key> is a local directory, which is synced to the bucket's <value> subdirectory
    # so '/mnt/storage/comics -> <s3 bucket>/comics', etc
    syncDirs = {
      "/mnt/storage/comics" = "comics";
      "/mnt/storage/books" = "books";
    };

    bucketName = bucket;
    bucketUrl = "https://s3.${region}.amazonaws.com/${bucket}";

    # this is for additional flags for rclone, which aren't required
    # '-v' is for 'verbose', which is to make the logs more useful
    # rclone is pretty quiet by default
    globalFlags = "-v";

    # this is probably not the best way to handle these secrets, but it keeps them out of source control
    awsAccessKeyId = builtins.readFile "/var/secret/aws-access-key-id.txt";
    awsSecretAccessKey = builtins.readFile "/var/secret/aws-secret-access-key.txt";
    awsDefaultRegion = region;
  };
}

Hopefully the in-line comments help make some sense of what each setting is about.

Now, we can look at the module itself.

{ config, pkgs, lib, ... }:

with lib;
let
  # this is a convenience, because we need access to the instantiation configuration regularly
  # 'cfg' allows access to all the settings we defined in the previous section
  cfg = config.services.bucket-server;
in
{
  # 'options' is the interface of your module.
  #  you have all of your settings defined here, with their types and default values (if any)
  options.services.bucket-server = with lib; with types; {
    enable = mkEnableOption "bucket-server";

    # systemd timer settings
    startAt = mkOption {
      type = str;
      default = "*-*-* *:00:00"; # every hour
    };

    # rclone settings
    syncDirs = mkOption {
      type = attrs;
    };

    globalFlags = mkOption {
      type = str;
      default = "";
    };

    bucketName = mkOption {
      type = str;
    };

    # s3 listing settings
    bucketUrl = mkOption {
      type = str;
    };

    indexTitle = mkOption {
      type = str;
      default = "S3 Bucket Listing Generator";
    };

    s3blIgnorePath = mkOption {
      type = str;
      default = "false";
    };

    excludeFile = mkOption {
      type = str;
      default = "index.html";
    };

    autoTitle = mkOption {
      type = str;
      default = "true";
    };

    # aws settings (for rclone)
    awsAccessKeyId = mkOption {
      type = str;
    };

    awsSecretAccessKey = mkOption {
      type = str;
    };

    awsDefaultRegion = mkOption {
      type = str;
      default = "us-east-1";
    };
  };

  # config is the 'meat' of the module, where the work is actually done
  # what we're making here is a systemd timer (effectively a cron job) that does the following:
  #   1. copies an index.html file to the s3 bucket via rclone, using the settings we defined before.
  #      defaults were used in many cases, but can be overridden in our instantiation.
  #
  #   2. creates the rclone remote configuration, also filled in with settings from our instantiation
  #      in this case, it is minimal and only setting the aws region of the bucket
  #
  #   3. setting up the systemd timer, with a description, environment, trigger time, etc
  #
  #   4. generating the script that is run by the systemd timer. we'll discuss what's going on there more, below
  config = mkIf cfg.enable {
    systemd.services.bucket-server =
    let
      # 1. create the index.html for s3 bucket listing
      # https://github.com/rufuspollock/s3-bucket-listing
      indexFile = builtins.toFile "index.html" ''
        <!DOCTYPE html>
        <html>
        <head>
          <title>${cfg.indexTitle}</title>
        </head>
        <body>
          <div id="navigation"></div>
          <div id="listing"></div>

        <script type="text/javascript" src="https://ajax.googleapis.com/ajax/libs/jquery/3.1.1/jquery.min.js"></script>
        <script type="text/javascript">
          var S3BL_IGNORE_PATH = ${cfg.s3blIgnorePath};
          var BUCKET_URL = '${cfg.bucketUrl}';
          var EXCLUDE_FILE = '${cfg.excludeFile}';
          var AUTO_TITLE = ${cfg.autoTitle};
          var BUCKET_WEBSITE_URL = 'http://${cfg.bucketName}'
        </script>
        <script type="text/javascript" src="https://rufuspollock.github.io/s3-bucket-listing/list.js"></script>
        </body>
        </html>
      '';

      # 2. create the rclone cloud configuration
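      # (passing an empty name to toFile works here; it's also why the store path in the generated script ends in a bare '-')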
      rcloneConfig = builtins.toFile "" ''
        [remote]
        type = s3
        provider = AWS
        env_auth = true
        region = ${cfg.awsDefaultRegion}
        location_constraint = ${cfg.awsDefaultRegion}
        acl = public-read
        server_side_encryption = AES256
        storage_class = STANDARD
      '';
    in
    {
      # 3. setting up the systemd timer settings, including the AWS environment variables necessary for rclone
      description = "Bucket Server";
      serviceConfig = {
        Type = "oneshot";
        User = "root";
      };
      environment = {
        AWS_ACCESS_KEY_ID = cfg.awsAccessKeyId;
        AWS_SECRET_ACCESS_KEY = cfg.awsSecretAccessKey;
        AWS_DEFAULT_REGION = cfg.awsDefaultRegion;
      };
      path = [
        pkgs.rclone
      ];
      startAt = cfg.startAt;

      # 4. generating the systemd script run by the timer
      script = ''
        set +e

        ${pkgs.rclone}/bin/rclone --config="${rcloneConfig}" ${cfg.globalFlags} copyto ${indexFile} remote:${cfg.bucketName}/index.html
        ${concatStringsSep "\n"
            (forEach (attrNames cfg.syncDirs) (syncDir:
              ''${pkgs.rclone}/bin/rclone --config="${rcloneConfig}" ${cfg.globalFlags} --track-renames sync ${syncDir} remote:${cfg.bucketName}/${cfg.syncDirs.${syncDir}}''))
        }
      '';
    };
  };
}

I think this is a pretty standard NixOS module, although I haven’t spent a ton of time looking at other modules. There are probably a ton of better ones to look at, but this one is mine.

Problem

The nix language is a functional language, so the tools you reach for when writing imperative-style code aren’t always available. I knew that I wanted the ability to pass a map of source and destination directories into this module, and have the module iterate over them to generate multiple calls to rclone, syncing each one in turn. However, I had never really dealt with programming over the collection types in nix. Put concisely, what I wanted was this:

# In:
syncDirs = {
  "/local/dir/1" = "a";
  "/local/dir/2" = "b";
  ...
};

# Out:
''
rclone sync /local/dir/1 remote:bucket/a
rclone sync /local/dir/2 remote:bucket/b
...
''

Solution

To do this, I used the following “function”:

concatStringsSep "\n"
    (forEach (attrNames cfg.syncDirs) (syncDir:
      ''${pkgs.rclone}/bin/rclone --config="${rcloneConfig}" ${cfg.globalFlags} --track-renames sync ${syncDir} remote:${cfg.bucketName}/${cfg.syncDirs.${syncDir}}''))

Let’s work from the inside out. The inner-most portion is our generated string. All of the variable substitutions come from the enclosing scope, except syncDir.

Lambda

syncDir: ''${pkgs.rclone}/bin/rclone --config="${rcloneConfig}" ${cfg.globalFlags} --track-renames sync ${syncDir} remote:${cfg.bucketName}/${cfg.syncDirs.${syncDir}}''

This is actually just an unnamed function (or lambda) that takes one argument, syncDir, and returns a new string that contains syncDir (among everything else).
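If you want to poke at lambdas in isolation, nix repl is handy. A toy example (not from the module — the bucket name is made up):

nix-repl> (syncDir: "rclone sync ${syncDir} remote:bucket") "/mnt/storage/books"
"rclone sync /mnt/storage/books remote:bucket"

The syncDir: prefix declares the argument, and applying the function to an argument is just juxtaposition.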

attrNames

let
  set = {
    a = 1;
    b = 2;
    c = 3;
  };
in
  attrNames set

This returns a list of the keys of an attrset, so in our given example above, it would return [ "a" "b" "c" ]

forEach

forEach (attrNames cfg.syncDirs) (syncDir:
  ''${pkgs.rclone}/bin/rclone --config="${rcloneConfig}" ${cfg.globalFlags} --track-renames sync ${syncDir} remote:${cfg.bucketName}/${cfg.syncDirs.${syncDir}}'')

forEach is actually just map with the arguments reversed. I could have used map instead, but I think I stumbled upon forEach first when randomly googling for examples. If you’re unfamiliar with map, I will forgo the details and just describe forEach. forEach xs f takes a list as its first argument and a function as its second. It calls the function on each element of the list, collects the results, and returns them in a new list.

forEach [ "a" "b" "c" ] (arg: "hello ${arg}")

# returns:
["hello a" "hello b" "hello c"]

We are getting extremely close to what we need now. The last piece is that this expression is embedded within the larger context of the script, which needs to be a string, so we have to join the list of strings back into one.

concatStringsSep

concatStringsSep sep xs takes a separator string and a list of strings, and will concatenate all the elements of the list, using the separator between.

concatStringsSep " uwu " [ "a" "b" "c" ]

# returns:
"a uwu b uwu c"
# "\n" means newline
concatStringsSep "\n" [ "a" "b" "c" ]

# returns:
"a
b
c"

With this, we are able to take our list of generated rclone commands, and concatenate them into a multiline string that can be embedded into our systemd script. We are done, and the script is semi-dynamic. If I decide to add another directory to be synced, it’s as easy as adding a key and value to my syncDirs setting within my module instantiation. Huzzah.
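As an aside, nixpkgs’ lib also has mapAttrsToList, which hands your function each key and value directly. I didn’t know about it at the time, but it would collapse the attrNames/forEach combo into a single call. A sketch of the equivalent (not what the module actually uses):

# same output as the attrNames/forEach version above
concatStringsSep "\n"
  (mapAttrsToList
    (dir: dest: "rclone sync ${dir} remote:bucket/${dest}")
    cfg.syncDirs)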

Tying It All Together

Having created my S3 bucket (with the correct name for using a custom domain), learned how rclone works, and written a NixOS module to manage rclone, AWS credentials, and a systemd timer, let’s see it all in action.

home :: ~/code/nixos-config ‹master*› % sudo nixos-rebuild switch -I nixos-config=./hosts/rattnix/configuration.nix
building Nix...
building the system configuration...
trace: warning: findutils locate does not support pruning by directory component
activating the configuration...
setting up /etc...
reloading user units for rattboi...
setting up tmpfiles
the following new units were started: bucket-server.timer

nixos-rebuild switch is the command that applies your declarative, centralized configuration in NixOS. The last line shows that bucket-server.timer was started.
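As a sanity check, you can also ask systemd when the timer will next fire:

systemctl list-timers bucket-server.timer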

Let’s look at the systemd unit file that was generated.

home :: ~/code/nixos-config ‹master*› % systemctl cat bucket-server.service
# /etc/systemd/system/bucket-server.service
[Unit]
Description=Bucket Server

[Service]
Environment="AWS_ACCESS_KEY_ID=..."
Environment="AWS_DEFAULT_REGION=..."
Environment="AWS_SECRET_ACCESS_KEY=..."
Environment="LOCALE_ARCHIVE=/nix/store/gfzp1a6ab4ffwg75bnrycwdrd7cqki1i-glibc-locales-2.33-117/lib/locale/locale-archi>
Environment="PATH=/nix/store/ndvmiwn5ir7s55kla8m2kxsc2lyah79j-rclone-1.57.0/bin:/nix/store/jd1y449cf66yx5d1hwyjvc4562b>
Environment="TZDIR=/nix/store/hcrw29p0rv8lkb31yb728kgna4nq1ydd-tzdata-2021c/share/zoneinfo"

ExecStart=/nix/store/7lzqqxif5kyapbs0pqxmj5fd2gm9qx25-unit-script-bucket-server-start/bin/bucket-server-start
Type=oneshot
User=root

Looking good. Let’s go one level deeper, to the systemd script referenced above as ExecStart.

home :: ~/code/nixos-config ‹master*› % cat /nix/store/7lzqqxif5kyapbs0pqxmj5fd2gm9qx25-unit-script-bucket-server-start/bin/bucket-server-start
#!/nix/store/bm7jr70d9ghn5cczb3q0w90apsm05p54-bash-5.1-p8/bin/bash -e
set +e

/nix/store/ndvmiwn5ir7s55kla8m2kxsc2lyah79j-rclone-1.57.0/bin/rclone --config="/nix/store/asi51j2d04nyw6zq14vq5gsv2qwgyfm8-" -v copyto /nix/store/bhzna6x6sq7x6xw21gqv1pwajchw3jb5-index.html remote:<domain>/index.html
/nix/store/ndvmiwn5ir7s55kla8m2kxsc2lyah79j-rclone-1.57.0/bin/rclone --config="/nix/store/asi51j2d04nyw6zq14vq5gsv2qwgyfm8-" -v --track-renames sync /mnt/storage/books remote:<domain>/books
/nix/store/ndvmiwn5ir7s55kla8m2kxsc2lyah79j-rclone-1.57.0/bin/rclone --config="/nix/store/asi51j2d04nyw6zq14vq5gsv2qwgyfm8-" -v --track-renames sync /mnt/storage/comics remote:<domain>/comics

Of course there’s a lot of noise there, but this script can and does live independently of any globally-installed software, and can co-exist with any other installed version of bash, rclone, etc. And the fact is that you don’t have to see this, as it’s all abstracted away in the NixOS module I wrote. Lastly, let’s look at the output of the script running.

home :: ~/code/nixos-config ‹master*› % sudo journalctl --follow -u bucket-server.service
[sudo] password for rattboi:
-- Journal begins at Wed 2022-06-22 03:04:27 PDT. --
Jun 27 01:00:06 home bucket-server-start[1802236]: INFO  : S3 bucket <bucket> path comics: Making map for --track-renames
Jun 27 01:00:06 home bucket-server-start[1802236]: INFO  : S3 bucket <bucket> path comics: Finished making map for --track-renames
Jun 27 01:00:07 home bucket-server-start[1802236]: INFO  : There was nothing to transfer
Jun 27 01:00:07 home bucket-server-start[1802236]: INFO  :
Jun 27 01:00:07 home bucket-server-start[1802236]: Transferred:                     0 B / 0 B, -, 0 B/s, ETA -
Jun 27 01:00:07 home bucket-server-start[1802236]: Checks:               265 / 265, 100%
Jun 27 01:00:07 home bucket-server-start[1802236]: Elapsed time:         1.4s
Jun 27 01:00:07 home systemd[1]: bucket-server.service: Deactivated successfully.
Jun 27 01:00:07 home systemd[1]: Finished Bucket Server.
Jun 27 01:00:07 home systemd[1]: bucket-server.service: Consumed 2.440s CPU time, received 976.9K IP traffic, sent 545.1K IP traffic.

Glorious. There were no new files to sync, so the script took only 1.4 seconds to run. If any file is added to one of the local directories I’ve defined in the module instantiation, the remote subdirectory is updated within the hour.
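And if waiting up to an hour is too patient for you, the service behind the timer can be kicked off on demand:

sudo systemctl start bucket-server.service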

Conclusion

I hope you’ve learned something from reading this long-winded post, even if what you’ve learned is that you never want to read my blog again. It’s the longest post I’ve ever written, which is by no means a statement about its quality. The main things I hoped to show are that NixOS and functional programming are cool, and that most problems aren’t that deep if you know how to google effectively. I was able to get most of what was described in this post done in a few hours, because I was able to leverage the many open-source tools that others before me have developed, and for that I am grateful.

Thanks again for reading my post. I don’t know how to conclude a post, obviously. I’ll work on it.