The blog of dlaa.me

It depends... [A look at the footprint of popular Node.js command-line parsing packages]

In a recent discussion of the Node.js ecosystem, I opined that packages with a large number of dependencies contribute to excessive disk space use by apps that reference them.

But I didn't have data to back that claim up, so I made some measurements to find out. Command-line argument parsing is a common need and there are a variety of packages to make it easier. I found nine of the most popular and installed each into a new, blank project as a standard dependency item in package.json via npm install. Then I counted the number of direct dependencies for that package, the total (transitive) number of packages that end up being installed, and the size (in bytes) of disk space consumed (on Windows). I tabulated the results below and follow with a few observations.

Important: I made no attempt to assess the quality or usefulness of these packages. They are all popular and each offers a different approach to the problem. Some are feature-rich, while others offer a simple API. I am not promoting or critiquing any of them; rather, I am using the aggregate as a source of data.

Package Popularity Direct Dependencies Transitive Dependencies Size on Disk
argparse 494 1 2 152,661
commander 18865 0 1 48,328
command-line-args 677 3 5 237,789
dashdash 156 1 2 94,377
meow 2344 10 43 455,525
minimatch 2335 1 4 57,803
minimist 8490 0 1 31,151
nomnom 549 2 6 119,237
yargs 7516 12 44 576,724

These metrics were captured on 2017-11-25 and may have changed by the time you read this.

Notes:

  • The two most popular packages are the smallest on disk and have no dependencies; the third and fourth most popular are the biggest and have the most dependencies.
  • Packages with fewer dependencies tend to have the smallest size; those with the most dependencies have the largest.
  • The difference between the extremes of direct dependency count is about 10x.
  • The difference between extremes for transitive dependency count is about 40x.
  • The difference between disk space extremes is about 20x.

While this was a simple experiment that doesn't represent the whole Node ecosystem, it seems reasonable to conclude that:

Similar packages can exhibit differences of an order of magnitude (or more) in dependency count and size. If that matters for your scenario, measure before you choose!

For my part, I tend to resist taking on additional dependencies when possible and prefer using dependencies that adhere to the same principle. Reinventing the wheel is wasteful, of course - but sometimes less is more and it's good to keep complexity to a minimum.

Back to backup [Revisiting and refreshing my approach to backups]

A topic that comes up from time to time is how people deal with backing up their data. It's one of those times again, so I'm sharing my technique for the benefit of anyone who's interested. My previous post on backup strategy was written over 11 years ago (!), and it's surprising how much is still relevant. This post expands on the original and includes my latest practices. Of course, we all have different priorities, restrictions, and aversion to data loss, so what I describe here doesn't apply universally. Just take whatever's relevant and ignore the rest. :)

Considerations

Dependability - Hardware eventually fails. Things with moving parts tend to die sooner, but even solid-state drives have problems in the long run. Sometimes, the risk of failure is compounded because a storage device is so tightly integrated with the computer that it can become unusable due to failure of an unrelated component. And even if hardware doesn't break on its own, events like a lightning strike can take things out unexpectedly.

Accidental deletion or overwrite - As careful as one might be, every now and then it's possible to accidentally delete an important file. Or maybe just open it, make some inadvertent edits, and absentmindedly save them. This is a scenario many backup strategies don't account for; if an unwanted change to the original data is immediately replicated to a backup, it can be difficult to recover from the mistake.

Undetected corruption - A random hardware or software failure (or power loss) can invisibly result in a garbled file, directory, or disk. The challenge is that corruption can go undetected for a long time until something happens to need the relevant bits on disk. It's easy to propagate corruption to a backup copy when you don't realize it's present.

Local disaster - Fire and tornadoes are rare, but can destroy everything in the house. Even with a perfect a backup on a spare drive, if that drive was in the same place at the time of the disaster, the original and backup are both lost. Odds of avoiding the problem are improved slightly by keeping drives in different rooms, but a sufficiently serious calamity will not be deterred.

Large-scale disaster - The likelihood is quite low, but if a flood (as a timely example) comes through the area, it won't matter how many backups are stored around the house because all of them will be underwater. The only protection is to store things offsite, preferably far away.

Security/privacy - Most people won't be the target of industrial espionage, but a nosy house guest can be just as problematic - especially so because they have prolonged access to your data. If thieves steal a computer, they can read its drives. Bank records, tax documents, source code, etc., can all be compromised if precautions haven't been taken to encrypt the original and all backups.

Bandwidth - Many people use services that store data in the cloud. With a fast-enough connection to the Internet, this works well. But typical Internet plans have limited upstream bandwidth, meaning it can take hours to upload a single video. Things catch up over the course of a few days, but the latest data can be lost if service is abruptly cut off.

Validation - In order to be sure backups are successful, it's necessary to test them - regularly. Otherwise, you might not be as well prepared as you thought. Some backup approaches are easy to verify, others not so much.

Ease of recovery - In the event of a complete restore from backup, things have probably gone very wrong and any additional hardship is of little concern. That said, having immediate access to all data is a nice perk.

Cross-platform support - Whether you prefer Windows, Mac, or Linux, you'll probably stay with your OS of choice after catastrophes strikes. But it's nice to have a backup strategy where data can be recovered from different platform. If your threat model includes a global virus that wipes out a particular operating system, cross-platform support is important.

Implementation

All drives storing data I care about are encrypted with BitLocker. This mitigates the risk of theft/tampering and means I can leave backup drives anywhere. Each backup drive is a 1 or 2 TB 2.5" external USB drive. These devices are nice because they don't need a separate power supply and they're inherently portable and resilient - especially when stored inside a waterproof Ziploc bag.

At the end of each day, I mirror the latest changes from the primary drive in my computer to a backup drive sitting next to it. This step protects against hardware failures (high risk) and this is the drive I'll take if I need to leave home in a hurry.

Every couple of months, I calculate the checksum of every file and save them to the drive. Then I mirror to the backup, calculate checksums for everything on the backup, and make sure all the checksums match. This makes sure every bit on both drives is correctly written and readable. Any files that weren't changed help detect bit rot because those checksums are stable. Once verified, I swap the backup drive with another just like it at a nearby location (ex: friend's house, safe deposit box, etc.). This protects against theft or fire (lower risk).

Once a year, I swap the latest backup drive with another drive in a different geographic region (ex: relative's house). This protects against large-scale disaster like a flood (much lower risk), so the fact that it can be up to a year out of date is acceptable.

Because I've been doing this for a while, I also have a couple of spare drives that don't get updated regularly. These provide access to even older backups and can be used to recover outdated versions of a file in the event of persistent, undetected corruption or significant operator error.

Conclusion

This system addresses the above considerations in a way that works well for me. Extra redundancy provides peace of mind and requires very little effort most of the time. Cost is low and every step is under my control, so I don't need to worry about recurring fees or vendors going out of business. Recovery is simple and my family knows how to access backups if I'm not available. Remote backup drives consume no power, can live in the back of a drawer, and can be lost or destroyed without concern.

Nota bene

When I did the math on storage requirements a decade ago, I was concerned about capacity needs outpacing storage advances. But that didn't happen. In practice, the price of a drive with room for everything has stayed around (or below) $100. I buy a new drive every couple of years, meaning the amortized cost is minimal.

Tags: Technical

Hour of No Code [Solving Day 1 Part 1 of Advent of Code 2016 without writing any code]

Advent of Code is a cool idea! Author Eric Wastl describes it as "a series of small programming puzzles for a variety of skill levels". I thought I might work through the first few days with non-programmer children, so I read the description of Day 1: No Time for a Taxicab:

[...] start at the given coordinates [...] and face North. Then, follow the provided sequence: either turn left (L) or right (R) 90 degrees, then walk forward the given number of blocks [...]

Following simple, mechanical steps seemed like a fine introduction to programming for young people, so I signed in and requested the data for Part 1 of the puzzle.

Oh...

You see, while the examples Eric provides are quite simple, the actual input is over 150 steps and some of the distances are three-digit numbers. So while this puzzle is perfect for a small programming challenge, it'd be too tedious to work through manually.

Unless...

What if we could solve the programming puzzle without any programming?

What if we could solve it with something familiar to non-developers?

What if we could solve it with a basic spreadsheet???

Hmmm...

Let's try!

We'll be using Google Sheets because it's free, easy to use, and convenient. (But any decent spreadsheet will do.) And we'll be working with the sample data (not the actual puzzle data) so as not to give too much away. Keep reading to see all the steps - or follow along in a spreadsheet if you want to get your hands dirty.

Good, let's start with the sample data:

R5, L5, R5, R3 leaves you 12 blocks away.

To understand why and familiarize yourself with the puzzle, read about taxicab geometry and work through the steps on paper first. It's important to realize that R and L steps do not necessarily alternate and that the direction at the end of each step depends on the direction at the beginning of the step. However, given the state after N steps, the state after step N+1 is completely defined by that direction and distance.

Spreadsheets typically go "down", so let's set ours up like that by adding the steps to a Data column (leaving row 2 blank for reasons that will be clear later):

A
1 Data
2
3 R5
4 L5
5 R5
6 R3

Each step contains two pieces of information: the Direction to turn and the Distance to move. Let's start by pulling them apart so we can deal with them separately. The LEFT and RIGHT functions are our friends here (with a little help from LEN). The Direction is simply the first character of the data, or =LEFT(A3, 1) (assuming we're looking at row 3, the first value). And the Distance is just the rest of the string, or =RIGHT(A3, LEN(A3)-1).

Type those formulas into cells B3 and C3 and replicate the formulas down to row 6. (Your spreadsheet program should automatically update the formulas for rows 4-6 to be relative to their respective cells.) Done correctly, that gives the following:

A B C
1 Data Direction Distance
2
3 R5 R 5
4 L5 L 5
5 R5 R 5
6 R3 R 3

We'd like to turn the Direction letter into a number so we can do math on it - the IF function works nicely for that. By using -1 to represent a left turn and 1 to represent a right turn, use the formula =IF(B3="L", -1, 1) to create a Rotation field for row 3 (and then replicate the formula down):

A B C D
1 Data Direction Distance Rotation
2
3 R5 R 5 1
4 L5 L 5 -1
5 R5 R 5 1
6 R3 R 3 1

Now let's use Rotation to create a Heading that tracks where the player is pointing after each move. The four possible values are North, East, South, and West - it's convenient to use the numbers 0, 1, 2, and 3 respectively. Because we're adding and subtracting numbers to determine the Heading, it's possible to get values outside that range (ex: 3 + 1 = 4). Just like with a 12-hour clock (11am + 2 hours = 1pm), we'll use modular arithmetic to stay within the desired range. The starting direction is North, so we'll seed an initial Heading by putting the value 0 in cell E2. (See, I told you row 2 would be useful!) After that, the MOD function can be used to figure out the new Heading by adding the Rotation to the current value: =MOD(E2 + D3, 4). Replicating this formula down gives the following:

A B C D E
1 Data Direction Distance Rotation Heading
2 0
3 R5 R 5 1 1
4 L5 L 5 -1 0
5 R5 R 5 1 1
6 R3 R 3 1 2

Knowing a Heading and a Distance means we can track of the X and Y coordinates as the player moves around the city. If the player is facing East and moves forward, the X coordinate will increase by that Distance; if facing West, it will decrease. Similarly, a move facing North will increase the Y coordinate, while a move South will decrease it. The CHOOSE function allows us to represent this quite easily. The only catch is that CHOOSE indices are 1-based and our Heading values are 0-based, so we need to adjust for that by adding 1. After seeding 0 values for the starting X and Y coordinates in row 2, we can update the X coordinate with =F2 + CHOOSE(E3+1, 0, C3, 0, -C3) and the Y coordinate with =G2 + CHOOSE(E3+1, C3, 0, -C3, 0). Replicating those formulas down leads to:

A B C D E F G
1 Data Direction Distance Rotation Heading X Y
2 0 0 0
3 R5 R 5 1 1 5 0
4 L5 L 5 -1 0 5 5
5 R5 R 5 1 1 10 5
6 R3 R 3 1 2 10 2

Now all that's left to solve the puzzle is to compute the taxicab distance of each X/Y coordinate from the starting point. Well, that's just the magnitude of the X coordinate added to the magnitude of the Y coordinate and that can be expressed using the ABS function: =ABS(F3) + ABS(G3). Replicating that down gives the following, final, table and the answer to the sample puzzle:

A B C D E F G H
1 Data Direction Distance Rotation Heading X Y Taxicab Distance from Start
2 0 0 0
3 R5 R 5 1 1 5 0 5
4 L5 L 5 -1 0 5 5 10
5 R5 R 5 1 1 10 5 15
6 R3 R 3 1 2 10 2 12

Sure enough, the taxicab distance after the final move (cell H6) is 12 - just like the sample said it should be! And we figured that out without writing any "code" at all - we simply came up with few small formulas to methodically break the problem down into little pieces we could solve with a spreadsheet. That may not quite be coding (in the usual sense), but it sure is programming!

PS - Having definitively established that you are 1337 with a spreadsheet, you might be looking for other challenges... Good news: Part 2 of the Advent of Code taxicab problem has been left as an exercise for the reader! :)

Just another rando with a polyfill [math-random-polyfill.js is a browser-based polyfill for JavaScript's Math.random() that tries to make it more random]

JavaScript's Math.random() function has a well-deserved reputation for not generating truly random numbers. (Gasp!) Modern browsers offer a solution with the crypto.getRandomValues() function that new code should be using instead. However, most legacy scripts haven't been - and won't be - updated for the new hotness.

I wanted to improve the behavior of legacy code and looked around for a polyfill of Math.random() that leveraged crypto.getRandomValues() to generate output, but didn't find one. It seemed straightforward to implement, so I created math-random-polyfill.js with the tagline: A browser-based polyfill for JavaScript's Math.random() that tries to make it more random.

You can learn more about math-random-polyfill.js on its GitHub project page which includes the following:

The MDN documentation for Math.random() explicitly warns that return values should not be used for cryptographic purposes. Failing to heed that advice can lead to problems, such as those documented in the article TIFU by using Math.random(). However, there are scenarios - especially involving legacy code - that don't lend themselves to easily replacing Math.random() with crypto.getRandomValues(). For those scenarios, math-random-polyfill.js attempts to provide a more random implementation of Math.random() to mitigate some of its disadvantages.

...

math-random-polyfill.js works by intercepting calls to Math.random() and returning the same 0 <= value < 1 based on random data provided by crypto.getRandomValues(). Values returned by Math.random() should be completely unpredictable and evenly distributed - both of which are true of the random bits returned by crypto.getRandomValues(). The polyfill maps those values into floating point numbers by using the random bits to create integers distributed evenly across the range 0 <= value < Number.MAX_SAFE_INTEGER then dividing by Number.MAX_SAFE_INTEGER + 1. This maintains the greatest amount of randomness and precision during the transfer from the integer domain to the floating point domain.

I've included a set of unit tests meant to detect the kinds of mistakes that would compromise the usefulness of math-random-polyfill.js. The test suite passes on the five (most popular) browsers I tried, which leads me to be cautiously optimistic about the validity and viability of this approach. :)

Sky-Hole Revisited [Pi-Hole in a cloud VM for easy DNS-based ad-blocking]

I wrote about my adventures running a Pi-Hole in the cloud for DNS-based ad-blocking roughly a year ago. In the time since, I've happily used a Sky-Hole for all the devices and traffic at home. When updating my Sky-Hole virtual machine recently, I used a simpler approach than before and wanted to briefly document the new workflow.

For more context on why someone might want to use a DNS-based ad-blocker, please refer to the original post.

Installation

  1. Create an Ubuntu Server virtual machine with your cloud provider of choice (such as Azure or AWS)

    Note: Thanks to improvements by the Pi-Hole team, it's now able to run in the smallest virtual machine size

  2. Connect via SSH and update the package database:

    sudo apt-get update

  3. Install Pi-Hole:

    curl -L https://install.pi-hole.net | bash

    Note: Running scripts directly from the internet is risky, so consider using the alternate install instead

  4. Open the dnsmasq configuration file:

    sudo nano /etc/dnsmasq.d/01-pihole.conf

  5. Turn off logging by commenting-out the corresponding line:

    #log-queries

  6. Open the Pi-Hole configuration file:

    sudo nano /etc/pihole/setupVars.conf

  7. Update it to use an invalid address for blocked domains:

    IPv4_address=0.0.0.0

  8. Re-generate the block list:

    sudo /opt/pihole/gravity.sh

  9. Verify the block list looks reasonable:

    cat /etc/pihole/gravity.list

  10. Verify logging is off:

    cat /var/log/pihole.log

  11. Reboot to ensure everything loads successfully:

    sudo reboot

  12. Grant access to the virtual machine's public IP address by opening the relevant network ports (incoming UDP and TCP on port 53)

Don't forget

If you use a Pi-Hole regularly, please consider donating to the Pi-Hole project so the maintainers can continue developing and improving it.

Binary Log OBjects, gotta download 'em all! [A simple tool to download blobs from an Azure container]

The latest in a series of "I didn't want to write a thing, but couldn't find another thing that already did exactly what I wanted, which is probably because I'm too picky, but whatever" projects, azure-blob-container-download (a.k.a. abcd) is a simple, command-line tool to download all the blobs in an Azure storage container. Here's how it's described in the README:

A simple, cross-platform tool to bulk-download blobs from an Azure storage container.

Though limited in scope, it does a specific set of things vs. the official tools:

The motivation for this project was the same as with my previous post about getting an HTTPS certificate: I've migrated my website from a virtual machine to an Azure Web App. And while it's easy to enable logging for a Web App and get hourly log files in the W3C Extended Log File Format, it wasn't obvious to me how to parse those logs offline to measure traffic, referrers, etc.. (Although that's not something I've bothered with up to now, it's an ability I'd like to have.) What I wanted was a trustworthy, cross-platform tool to download all those log files to a local machine - but the options I investigated each seemed to be missing something.

So I wrote a simple Node.JS CLI and gave it a few extra features to make my life easier. The code is fairly compact and straightforward (and the dependencies minimal), so it's easy to audit. The complete options for downloading and filtering are:

Usage: abcd [options]

Options:
  --account           Storage account (or set AZURE_STORAGE_ACCOUNT)  [string]
  --key               Storage access key (or set AZURE_STORAGE_ACCESS_KEY)  [string]
  --containerPattern  Regular expression filter for container names  [string]
  --blobPattern       Regular expression filter for blob names  [string]
  --startDate         Starting date for blobs  [string]
  --endDate           Ending date for blobs  [string]
  --snapshots         True to include blob snapshots  [boolean]
  --version           Show version number  [boolean]
  --help              Show help  [boolean]

Download blobs from an Azure container.
https://github.com/DavidAnson/azure-blob-container-download

Azure Web Apps create a new log file every hour, so they add up quickly; abcd's date filtering options make it easy to perform incremental downloads. The default directory structure (based on / separators) is collapsed during download, so all files end up in the same directory (named by container) and ordered by date. The tool limits itself to one download at a time, so things proceed at a steady, moderate pace. Once blobs have finished downloading, you're free to do with them as you please. :)

Find out more on the GitHub project page for azure-blob-container-download.

Free as in ... HTTPS certificates? [Obtaining and configuring a free HTTPS certificate for an Azure Web App with a custom domain]

Providing secure access to all Internet content - not just that for banking and buying - is quickly becoming the norm. Although setting up a web site has been fairly easy for years, enabling HTTPS for that site was more challenging. The Let's Encrypt project is trying to improve things for everyone - by making certificates free and easier to use, they enable more sites to offer secure access.

Let's Encrypt is notable for (at least) two achievements. The first is lowering the cost for anyone to obtain a certificate - you can't beat free! The second is simplifying the steps to enable HTTPS on a server. Thus far, Let's Encrypt has focused their efforts on Linux systems, so the process for Windows servers hasn't changed much. Further complicating things, many sites nowadays are hosted by services like Azure or CloudFlare, which makes validating ownership more difficult.

As someone who is in the process of migrating content from a virtual machine with a custom domain to an Azure Web App, I've been looking for an easy way to make use of Let's Encrypt certificates. A bit of searching turned up some helpful resources:

Nothing was exactly what I wanted, so I came up with the following approach based on tweaks to the first two articles above. The Let's Encrypt tool runs on Linux, so I use that platform exclusively. Everything can be done in a terminal window, so it's easily scripted. There is no need to open a firewall or use another machine; everything can be done in one place. And by taking advantage of the nifty ability to boot from a Live CD, the technique is easy to apply even if you don't have a Linux box handy.

  1. Boot an Ubuntu 16.04 Live CD

  2. Run "Software & Updates" and enable the "universe" repository

  3. sudo apt install letsencrypt

  4. sudo apt install git

  5. git config --global user.email "user@example.com"

  6. git config --global user.name "User Name"

  7. git clone https://example.scm.azurewebsites.net:443/Example.git

    • Be sure /.well-known/acme-challenge/web.config exits and is configured to allow extension-less files:

      <configuration>
        <system.webServer>
          <staticContent>
            <mimeMap fileExtension="" mimeType="text/plain"/>
          </staticContent>
        </system.webServer>
      </configuration>
      
  8. sudo letsencrypt certonly --manual --domain example.com --domain www.example.com --email user@example.com --agree-tos --text

    • Note: Include the --test-cert option when trying things out
  9. Repeat for each domain:

    1. nano verification-file and paste the provided content
    2. git add verification-file
    3. git commit -m "Add verification file."
    4. git push
    5. Allow Let's Encrypt to verify ownership by fetching the verification file
  10. sudo openssl pkcs12 -export -inkey /etc/letsencrypt/live/example.com/privkey.pem -in /etc/letsencrypt/live/example.com/fullchain.pem -out fullchain.pfx -passout pass:your-password

  11. Follow the steps to Configure a custom domain name in Azure App Service using fullchain.pfx

  12. Enjoy browsing your site securely!

Respect my securitah! [The check-pages suite now prefers HTTPS and includes a CLI]

There are many best practices to keep in mind when maintaining a web site, so it's helpful to have tools that check for common mistakes. I've previously written about two Node.js packages I created for this purpose, check-pages and grunt-check-pages, both of which can be easily integrated into an automated workflow. I updated them recently and wish to highlight two aspects.

HTTPS

There's a movement underway to make the Internet safer, and one of the best ways is to use the secure HTTPS protocol when browsing the web. Not all sites support HTTPS, but many do, and it's good to link to the secure version of a page when available. The trick is knowing when that's possible - especially for links created long ago or before a site was updated to support HTTPS. That's where the new --preferSecure option comes in: it raises an error whenever a page links to potentially-secure content insecurely. Scanning a site with the --checkLinks/--preferSecure option enabled is now an easy way to identify links that could be updated to provide a safer browsing experience.

Aside: The moarTLS Chrome extension does a similar thing in the browser; check it out!

CLI

check-pages is easy to integrate into an automated workflow, but sometimes it's nice to run one-off tests or experiment interactively with a site's configuration. To that end, I created a simple command-line wrapper that exposes all the check-pages functionality (including --preferSecure) in a way that's easy to use on the platform/shell of your choice. Simply install it via npm, point it at the page(s) of interest, and review the list of possible issues. Here's the output of the --help command:

Usage: check-pages <page URLs> [options]

Checks:
  --checkLinks        Validates each link on a page  [boolean]
  --checkCaching      Validates Cache-Control/ETag  [boolean]
  --checkCompression  Validates Content-Encoding  [boolean]
  --checkXhtml        Validates page structure  [boolean]

checkLinks options:
  --linksToIgnore     List of URLs to ignore  [array]
  --noEmptyFragments  Fails for empty fragments  [boolean]
  --noLocalLinks      Fails for local links  [boolean]
  --noRedirects       Fails for HTTP redirects  [boolean]
  --onlySameDomain    Ignores links to other domains  [boolean]
  --preferSecure      Verifies HTTPS when available  [boolean]
  --queryHashes       Verifies query string file hashes  [boolean]

Options:
  --summary          Summarizes issues after running  [boolean]
  --terse            Results on one line, no progress  [boolean]
  --maxResponseTime  Response timeout (milliseconds)  [number]
  --userAgent        Custom User-Agent header  [string]
  --version          Show version number  [boolean]
  --help             Show help  [boolean]

Checks various aspects of a web page for correctness.
https://github.com/DavidAnson/check-pages-cli

This tool made from 100% recycled electrons [DHCPLite is a small, simple, configuration-free DHCP server for Windows]

Earlier this week, I posted the code for a tool I wrote many years ago:

In 2001, I wrote DHCPLite to unblock development scenarios between Windows and prototype hardware we were developing on. I came up with the simplest DHCP implementation possible and took all the shortcuts I could - but it was enough to get the job done! Since then, I've heard from other teams using DHCPLite for scenarios of their own. And recently, I was asked by some IoT devs to share DHCPLite with that community. So I dug up the code, cleaned it up a bit, got it compiling with the latest toolset, and am sharing the result here.

If that sounds interesting, please visit the DHCPLite project page on GitHub.

Trivia
  • Other than a couple of assumptions that changed in the last 15 years (ex: interface order), the original code worked pretty much as-is.
  • Now that the CRT has sensible string functions, I was able to replace my custom strncpyz helper with strncpy_s and _TRUNCATE.
  • Now that the IP Helper API is broadly available, I was able to statically link to it instead of loading dynamically via classes LibraryLoader and AnsiString.
  • Now that support for the STL is commonplace, I was able to replace the custom GrowableThingCollection implementation with std::vector.
  • I used the "one true brace style" (editors note: hyperbole!) from Code Complete, but it never really caught on and the code now honors Visual Studio defaults.

It's been a while since I wrote native code and revisiting DHCPLite was a fond trip down memory lane. I remain a firm believer in the benefits of a garbage-collected environment like JavaScript or C#/.NET for productivity and correctness - but there's also something very satisfying about writing native code and getting all the tricky little details right.

Or at least thinking you have... :)

Delayed Reaction [My experience converting a jQuery/Knockout.js application to use the React library]

It's important to stay up-to-date with technology trends and popular frameworks. That was part of the reason I wrote this blog using Node.js and it's part of the reason I recently converted a project to use the React library. That project was PassWeb, a simple, secure cloud-based password manager. I wrote PassWeb almost two years ago and use it nearly every day. If you're interested in how it works, please read the introductory blog post about PassWeb. For the purposes of this post, the thing to know is that PassWeb is built on the popular jQuery and Knockout.js frameworks.

To be clear, both frameworks are perfectly good - but switching was a great opportunity to learn about React. :)

Conversion

The original architecture was pretty much what you'd expect: application logic lives in a JavaScript file and the user interface lives in an HTML file. My goal when converting to React was to make as few changes to the logic as possible in order to minimize the risk of introducing behavioral bugs. So I worked in stages:

Having performed the bulk of the migration, all that remained was to identify and fix the handful of bugs that got introduced along the way.

Details

While JSX isn't required to use React, it's a natural fit and I chose JSX so I could get the full React experience. Putting JSX in the browser means using a transpiler to convert the embedded HTML to JavaScript. Babel provides excellent support for this via the React preset and was easy to work with. Because I was now running code through a transpiler, I also enabled the ES2015 Preset which supports newer features of the JavaScript language like let, const, and lambda expressions. I only scratched the surface of ES2015, but it was nice to be able to do so for "free".

One thing I noticed as I migrated more and more code was that much of what I was writing was boilerplate to deal with the propagation of state to and from (observable) properties. I captured this repetitive code in three helper methods and doing so significantly simplified components. (Projects like ReactLink formalize this pattern within the React ecosystem.)

Findings

Something I was curious about was how performance would differ after converting to React. For the most part, things were fast enough before that there was no need to optimize. Except for one scenario: filtering the list of items interactively. So I'd already tuned the Knockout implementation for better performance by toggling the visibility (CSS display:none) of unfiltered items instead of removing and re-adding them to/from the DOM.

When I converted to React, I used the simplest implementation and - unsurprisingly - this scenario performed worse. The first thing I did was implement the shouldComponentUpdate function on the component corresponding to each list item (as recommended by the Advanced Performance section of the docs). React's built-in performance tools are very useful and quickly showed the need for this optimization (as well as confirming the benefits). Two helpful posts that discuss the topic further are Optimizing React Performance using keys, component life cycle, and performance tools and Performance Engineering with React.

Implementing shouldComponentUpdate was a good start, but I had the same basic problem that adding and removing hundreds of elements just wasn't snappy. So I made the same visibility optimization, introducing another component to act as a thin wrapper around the existing one and deal exclusively with visibility. After that, the overall performance of the filter scenario was improved to approximate parity. (Actually, React was still a little slower for the 10,000 item case, but fared better in other areas, and I'm comfortable declaring performance roughly equivalent between the two implementations.)

Other considerations are complexity and size. Two frameworks have been replaced by one, so that's a pretty clear win on the complexity side. Size is a little murky, though. The minified size of the React framework is a little smaller then the combined sizes of jQuery and Knockout. However, the size of the new JSX file is notably larger than the templated HTML it replaces (recall that the code for logic stayed basically the same). And compiling JSX tends to expand the size of the code. But fortunately, Babel lets you minify scripts and that's enough to offset most of the growth. In the end, the React version of PassWeb is slightly smaller than the jQuery/Knockout version - but not enough to be the sole reason to convert.

Conclusion

Now that the dust has settled, would I do it all over again? Definitely! :)

Although there weren't dramatic victories in performance or size, I like the modular approach React encourages and feel it may lead to simpler code. I also like that React combines UI logic and presentation better and allowed me to completely gut the HTML file (which now contains only head and script tags). I also see value in unifying an application's state into one place (formalized by libraries like Redux), though I deliberately didn't explore that here. Most importantly, this was a good learning experience and I really enjoyed getting to know React.

I'll definitely consider React for my next project - maybe even finding an excuse to explore React Native...