# The blog of dlaa.me

## Interviewing for Fun and Profit [An unreasonably detailed consideration of a practical programming puzzle]

A few days ago, Brent Simmons published a thoughtful blog post about one aspect of programming interviews. As someone who conducts interviews regularly for work, the topic is of interest to me. Rather than summarizing the post, I encourage you to go read Practicing the Coding Challenges now. (You might also enjoy some of the replies to Brent's tweet linking to it.)

After reading the post, I tweeted a response:

I agree with the spirit of this post - that clarity beats premature optimization - but feel the conclusion skips an important point. The "efficient" solution is correct for arbitrarily large numbers while the "clear" one fails after MAX_INT (or similar).

I thought about discussing that, but felt like I'd be muddying up the post too much.

And I responded:

Understood, thanks! I don't know how it was posed on LeetCode, so that difference may not even be relevant. But if I were discussing this with a candidate during an interview, it's something I'd want them to raise for discussion as an indication they're thinking about edge cases.

And I kept thinking about this problem in the context of a programming interview. When I saw Olof Hellman's well-reasoned follow-up post Practicing the Coding Challenges, I realized I wasn't the only one.

I was curious what the "clear" and "efficient" solutions might look like in practice, so I decided to write them and see. I used JavaScript because it's my "go to" language. Things will look a little different in other languages, but I had to pick one and I chose something popular. I'm not including comments in the code below because they'll distract from it and I'm not validating parameters (for `null`-ness or range) per typical JavaScript practice.

As a reminder, here's the problem definition:

You need to add two numbers and get a sum. But, importantly, the digits of those numbers are stored in arrays, and they're backwards.

The return value also needs to be in a backwards array.

If inputs are [4,2,6] and [3,8,0,8], the function should return [7,0,7,8] - because 624 + 8083 == 8707.

Let's start with the "clear" approach. The point here is to be obvious and straightforward - like you normally want your code to be. Recall Brent's pseudocode proposal:

``````let num1 = number(from: array1)
let num2 = number(from: array2)
let sum = num1 + num2
return array(from: sum)
``````

Now, in order to run that, we need to implement those two conversion functions. As John Siracusa pointed out, we can do anything we want inside there and because we want to handle arbitrarily large inputs, we'll use JavaScript's `BigInt` built-in.

As an aside, `BigInt` does not have widespread support across browsers. In fact, none of the three JavaScript tools I use on iOS support it and I had to switch to a "real" computer to finish this post.

Okay, here's what I came up with:

``````function clear(array1, array2) {
const num1 = toBigInt(array1);
const num2 = toBigInt(array2);
const sum = num1 + num2;
}
function toBigInt(arr) {
const copy = [...arr];
copy.reverse();
const numberAsString = copy.join("");
return BigInt(numberAsString);
}
function toArray(num) {
const numString = num.toString();
const charArray = numString.split("");
const digitArray = charArray.map(Number);
digitArray.reverse();
return digitArray;
}
``````

Because this is the "clear" implementation, each line does a single thing and I've mostly avoided idiomatic JavaScript. You could argue `map(Number)` is a bit subtle, but it should be familiar to most JavaScript programmers. The `BigInt` constructor takes a `Number` or a `String`, so we need to bounce the input through a `String` on the way into/out of `BigInt`. We copy the input array to avoid mutating data we don't own.

Okay, so what does the "efficient" code look like? Here's what I came up with:

``````function efficient(array1, array2) {
const length = Math.max(array1.length, array2.length);
const res = new Array(length);
let carry = 0;
for (let i = 0; (i < length) || carry; i++) {
const sum = (array1[i] || 0) + (array2[i] || 0) + carry;
carry = (sum >= 10) ? 1 : 0;
res[i] = sum % 10;
}
return res;
}
``````

Because this isn't the "clear" solution, I took some liberties with this implementation in the interest of compactness and efficiency. The output array is pre-allocated (though it may need to grow by one) and there is a single loop over the input with no redundancy and fairly little overhead I see. I'll acknowledge some folks might be uncomfortable with the (deliberate) ignorance of array bounds and that the use of `carry` in the loop condition may be too clever by half. (It can be removed at the cost of a single additional line of code, but I like how this approach reuses the core logic for the final overflow.) Otherwise, I tried not to be unnecessarily obscure.

Conclusions?

The "clear" code is definitely clearer, but it's not as simple as it originally seemed to be. The need to implement helper functions basically tripled the amount of code that needed to be written. The "efficient" code is smaller and ought to be quite a bit cheaper to run considering the algorithmic efficiency and the fact that it doesn't require JIT-ing `BigInt` or the need for `String` parsing. In fact, by eschewing those dependencies, it's able to run in many environments (like iOS) where `BigInt` doesn't exist - so it's also the more "flexible" implementation!

Of course, there's no "right" answer here - just inspiration and food for thought. My thanks go out to everyone who contributed to the discussion and especially Brent, whose NetNewsWire app is my favorite RSS reader!

## Existence is random [JavaScript code to efficiently generate a random (version 4) UUID]

A few months ago, I dashed out a JavaScript function to efficiently generate a random (version 4) UUID per RFC 4122. This was inspired by code I saw for the same purpose that was not especially efficient - or random. (For more on randomness in JavaScript, see my post about polyfilling `Math.random()`.) My UUID generator uses crypto.getRandomValues() for the best randomness and tries to do as little extra work as possible for efficiency.

The generateRandomUUID page on GitHub has the code and a link to the generateRandomUUID sample page that runs in your browser and generates a new UUID every time it's loaded.

Because it's so small, I've included the code here for reference:

``````// generateRandomUUID.js
// https://github.com/DavidAnson/generateRandomUUID
// 2019-10-27

"use strict";

// JavaScript code to efficiently generate a random UUID per RFC 4122
function generateRandomUUID() {
// UUIDs have 16 byte values
const bytes = new Uint8Array(16);
// Seed bytes with cryptographically random values
crypto.getRandomValues(bytes);
bytes[6] = (bytes[6] & 0x0f) | 0x40;
bytes[8] = (bytes[8] & 0x3f) | 0x80;
// Convert bytes to hex and format appropriately
const uuid = Array.prototype.map.call(bytes, (b, i) => {
// Left-pad single-character values with 0,
return ((b < 16) ? "0" : "") +
b.toString(16) +
(((i % 2) && (i < 10) && (i > 2)) ? "-" : "");
}).join("");
// Return the string
return uuid;
}
``````

## Let's go to the video tape! [tape-player is a simple, terse, in-process reporter for the tape test harness for Node.js]

I've been a happy user of the `nodeunit` test harness for a long time, but it was deprecated a few years ago. Recently, I went looking for a similar Node.js test harness to replace it. I prefer small, simple packages and settled on the `tape` test harness. I enjoy nearly everything about it, but didn't like having to pipe output to a formatter (more on this below). So I wrote a quick bit of code to create an in-process reporter. Then I realized what I'd done could have broader applicability (in my own projects, if nowhere else!) and published a reusable package after adding scenario tests to ensure formatted output for all of the the `tape` primitives is reasonable.

If this seems interesting, the README goes into more detail:

The Test Anything Protocol (TAP) used by many test harnesses is versatile, but it's not much to look at - or rather, it's too much to look at. There are many custom formatters that work with the `tape` test harness, but most work by piping process output. This is a useful technique, but interferes with the exit status of the test harness which is a problem in scripts that are meant to fail when tests fail (like `npm test`). (Though there are workarounds for this, they are shell- and platform-specific.)

Fortunately, `tape` offers an alternative logging mechanism via its `createStream` API. This technique is easy to use and runs in-process so it doesn't interfere with the exit status of the test harness. `tape-player` takes advantage of this to produce a concise test log that's easy to enable.

You can find directions to install and enable `tape-player` on the GitHub project for `tape-player`.

## Good, cheap, fast: you only get one and you don't get to pick [CoLR, the Camera of Last Resort]

One way I learn about new technologies is by using them. Sometimes I have a compelling scenario that yearns for a creative solution. Other times I don't. This was one of those times... So I created what's probably the worst camera app ever as a way of experimenting with four new (to me) technologies.

• CSS Grid Layout - A modern layout system based on grid cells with the great working reference A Complete Guide to Grid
• MediaDevices.getUserMedia() API - The part of the WebRTC Media Streams API that allows websites to access the device camera for video and image capture
• Verdict: A somewhat cumbersome API that's nonetheless quite powerful
• Preact - Exactly what it says on the box, a "Fast 3kB alternative to React with the same modern API"
• Verdict: Works like a charm, easy for someone with React.js experience
• Dexie.js - More truth in advertising, a "Minimalistic Wrapper for IndexedDB", where IndexedDB is "a JavaScript-based object-oriented database" that runs in the browser
• Verdict: Easy to use, even for someone with no prior IndexedDB experience

If you want to learn more, please visit the CoLR project on GitHub. There, you'll find complete source code, instructions, links to resources, and some FAQ's.

If you just want to see how bad a camera app can be (and your device has a camera and your browser supports ECMAScript 2015), please open the CoLR web app in your browser.

## Oops, I did it again [A rewrite of my website/blog platform - now public, open-source, and on GitHub as simple-website-with-blog]

Almost 5 years ago, I moved my blog (and website) from a hosted environment to a Node.js implementation of my own creation. At the time, I was new to Node and this was a great way to learn. That code has served me well, but over time I've found a few things I wanted to change - in part due to the rapid growth of the Node platform. Some of the changes were foundational, so I chose to do a rewrite instead of multiple revisions.

The rewrite ended up taking about twice as long as I anticipated, but I'm glad I did it. The new architecture is similar to what it was before - and the user interface almost identical - but there are some nice improvements and modernizations under the hood. Of note, the rendering was separated so it's easy to customize (there are samples for a text blog and a photo blog; the unit tests are implemented as a blog, too), search is much more powerful (with inclusion, exclusion, partial matches, etc.), and the code is simplified by the removal of some undue complexity (mostly features I thought were neat but that added little). Finally, because the new project was written to be reused for other purposes and by other people, I'm able to share it.

For context, the project goals are:

• An easy way to create a simple, secure website with a blog
• Support for text-based and photo-based blog formats
• Easy authoring in HTML, Markdown (with code formatting), or JSON
• Ordering of posts by publish date or content date
• Easy customization of site layout and formatting
• High resolution (2x) support for photo blog images
• Support for Windows and Linux hosting with Node.js
• Simple post format that separates content and metadata
• Ability to author hidden posts and schedule a publish date
• Ability to create posts that never show up in the timeline
• Support for archive links and tagging of posts by category
• Quick search of post content, including simple search queries
• Automatic cross-linking of related posts
• No JavaScript requirement for client browsers

To see it in action, just browse this blog and site. :)

## A picture is worth a thousand words [A small script to update contact photos on iOS]

I'm always looking for new ways to develop code. My latest adventure was writing a script to update contact photos on iOS for a prettier experience in the Messages app conversation list. The script is called `update-ios-contact-images.js` and it runs in Scriptable, a great app for interacting with the iOS platform from JavaScript. The idea is:

There are various conditions where Apple iOS can't (or won't) synchronize Contact photos between iPhone/iPad devices. If you're in this situation and want it to "just work", you can configure each device manually. Or you can run this script to do that for you.

`update-ios-contact-images.js` takes a list of email addresses and optional image links and sets the photo for matching contacts in your address book. If an image link is provided, it's used as-is; if not, the Gravatar for that email address is used instead.

You can find more in the update-ios-contact-images.js repository on Github, but the code is short enough that I've included it below.

Notes

• As I said on Twitter, this was the first meaningful programming project I did completely on iPad (and iPhone). Research, prototyping, coding, debugging, documentation, and posting to GitHub were all done on an iOS device (using a Bluetooth keyboard at times). The overall workflow has some rough edges, but many of the pieces are there today to do real-world development tasks.
• I'm accustomed to using JavaScript Promises directly, but took this opportunity to try out async and await. The latter are definitely easier to use - and probably easier to understand (though the syntax error JavaScriptCore gives for using `await` outside an `async` function is not obvious to me). However, the lack of support for "parallelism" by `async`/`await` means you still need to know about Promises and be comfortable using helpers like Promise.all (so I wonder how much of a leaky abstraction this ends up being).
• Yes, I know JavaScript is technically single-threaded in this context; that's why I put the word "parallelism" in quotes above. :)

Code

``````// update-ios-contact-images
// A Scriptable (https://scriptable.app/) script to update contact photos on iOS
// Takes a list of email accounts and image URLs and assigns each image to the corresponding contact
// https://github.com/DavidAnson/update-ios-contact-images

// List of email accounts and images to update
const accounts = [
{
email: "test1@example.com"
// No "image" property; uses Gravatar
},
{
email: "test2@example.com",
image: "https://example.com/images/test2.jpg"
}
];

// MD5 hash for Gravatar (see https://github.com/blueimp/JavaScript-MD5 and https://cdnjs.com/libraries/blueimp-md5)

const contacts = await Contact.all(await ContactsContainer.all());

// For all accounts...
await Promise.all(accounts.map(account => {
const emailLower = account.email.trim().toLowerCase();
// For all contacts with that email...
return Promise.all(contacts.
map(async contact => {
// Use specified image or fallback to Gravatar (see https://en.gravatar.com/site/implement/images/)
const url = account.image || `https://www.gravatar.com/avatar/\${md5(emailLower)}`;
// Update contact
Contact.update(contact);
console.log(`Updated contact image for "\${emailLower}" to "\${url}"`);
}));
}));

// Save changes
Contact.persistChanges();
console.log("Saved changes");
``````

## Looking for greener pastures [A Practical Comparison of Mastodon and Micro.blog]

I've been looking into Twitter alternatives Mastodon and Micro.blog recently. I couldn't find a good comparison of the two services, so I created one and put it on GitHub to quickly iterate on any feedback. That's happened, so I'm posting the comparison here where it's easier to find. Enjoy!

### A Practical Comparison of Mastodon and Micro.blog

Many of us are looking at Twitter alternatives and there are two services that stand out: Micro.blog and Mastodon.

These services take different approaches, so choosing one is challenging. This page highlights some of the differences and is meant for non-nerds who don't want to get bogged down by implementation details. Every attempt has been made to be accurate, but some technical details are deliberately glossed over.

For more about similar services, see the Comparison of microblogging services on Wikipedia

Mastodon Micro.blog
Web site https://joinmastodon.org/ https://micro.blog/
Sales pitch "Social networking, back in your hands. Follow friends and discover new ones. Publish anything you want: links, pictures, text, video. All on a platform that is community-owned and ad-free." "A network of independent microblogs. Short posts like tweets but on your own web site that you control. Micro.blog is a safe community for microblogs. A timeline to follow friends and discover new posts. Blog hosting built on open standards."
Best price Free Free, but requires a separate blog for posting
Actual price Free \$5 per month, no blog needed
Code of conduct Depends on the server (Example) https://help.micro.blog/2017/community-guidelines/
Hashtags in posts Yes No
Replies handled differently No Yes
Able to export content Yes Yes
Cross-posting to Twitter No Yes, with a \$2/month or \$5/month subscription
Official iOS or Android app No iOS only
Official Mac or Windows app No Mac only
Third-party clients Yes Yes

Deliberately omitted from above: User counts, open source status, federation details

I'm not part of either project, so there may be mistakes in the table above. That's why this is on GitHub - please open an issue or send a pull request to correct any problems you find. If you are adding content, please do so for both platforms and link to your sources.

For other questions or to start a discussion, contact me on:

## Convert *all* the things [PowerShell script to convert RAW and HEIC photos to JPEG]

Like most people, I take lots of photos. Like many people, I save them in the highest-quality format (often RAW). Like some people, I edit those pictures on a desktop computer.

Support for RAW images has gotten better over the years, but there are still many tools and programs that do not support these bespoke formats. So it's handy to have a quick and easy way to convert such photos into a widely-supported format like JPEG. There are many tools to do so, but it's hard to beat a command line script for simplicity and ease of use.

I didn't know of one that met my criteria, so I wrote a PowerShell script:

ConvertTo-Jpeg - A PowerShell script that converts RAW (and other) image files to the widely-supported JPEG format

Notes:

• This script uses the Windows.Graphics.Imaging API to decode and encode. That API supports a variety of file formats and when new formats are added to the list, they are automatically recognized by the script. Because the underlying implementation is maintained by the Windows team, it is fast and secure.
• As it happens, support for a new format showed up in the days since I wrote this script: Windows 10's April 2018 Update added support for HEIC/HEIF images such as those created by iPhones running iOS 11.
• The Windows.Graphics.Imaging API is intended for use by Universal Windows Platform (UWP) applications, but I am using it from PowerShell. This is unfortunately harder than it should be, but allowed me to release a single script file which anybody can read and audit.
• Transparency is not a goal for everyone, but it's important to me - especially in today's environment where malware is so prevalent. I don't trust random code on the Internet, so I prefer to use - and create - open implementations when possible.
• The choice of PowerShell had some drawbacks. For one, it is not a language I work with often, so I spent more time looking things up than I normally do. For another, it's clear that interoperating with UWP APIs is not a core scenario for PowerShell. In particular, calling asynchronous methods is tricky, and I did a lot of searching before I found a solution I liked: Using WinRT's IAsyncOperation in PowerShell.
• There are some obvious improvements that could be made, but I deliberately started simple and will add features if/when the need arises.

## It depends... [A look at the footprint of popular Node.js command-line parsing packages]

In a recent discussion of the Node.js ecosystem, I opined that packages with a large number of dependencies contribute to excessive disk space use by apps that reference them.

But I didn't have data to back that claim up, so I made some measurements to find out. Command-line argument parsing is a common need and there are a variety of packages to make it easier. I found nine of the most popular and installed each into a new, blank project as a standard `dependency` item in `package.json` via `npm install`. Then I counted the number of direct dependencies for that package, the total (transitive) number of packages that end up being installed, and the size (in bytes) of disk space consumed (on Windows). I tabulated the results below and follow with a few observations.

Important: I made no attempt to assess the quality or usefulness of these packages. They are all popular and each offers a different approach to the problem. Some are feature-rich, while others offer a simple API. I am not promoting or critiquing any of them; rather, I am using the aggregate as a source of data.

Package Popularity Direct Dependencies Transitive Dependencies Size on Disk
argparse 494 1 2 152,661
commander 18865 0 1 48,328
command-line-args 677 3 5 237,789
dashdash 156 1 2 94,377
meow 2344 10 43 455,525
minimatch 2335 1 4 57,803
minimist 8490 0 1 31,151
nomnom 549 2 6 119,237
yargs 7516 12 44 576,724

These metrics were captured on 2017-11-25 and may have changed by the time you read this.

Notes:

• The two most popular packages are the smallest on disk and have no dependencies; the third and fourth most popular are the biggest and have the most dependencies.
• Packages with fewer dependencies tend to have the smallest size; those with the most dependencies have the largest.
• The difference between the extremes of direct dependency count is about 10x.
• The difference between extremes for transitive dependency count is about 40x.
• The difference between disk space extremes is about 20x.

While this was a simple experiment that doesn't represent the whole Node ecosystem, it seems reasonable to conclude that:

Similar packages can exhibit differences of an order of magnitude (or more) in dependency count and size. If that matters for your scenario, measure before you choose!

For my part, I tend to resist taking on additional dependencies when possible and prefer using dependencies that adhere to the same principle. Reinventing the wheel is wasteful, of course - but sometimes less is more and it's good to keep complexity to a minimum.

## Back to backup [Revisiting and refreshing my approach to backups]

A topic that comes up from time to time is how people deal with backing up their data. It's one of those times again, so I'm sharing my technique for the benefit of anyone who's interested. My previous post on backup strategy was written over 11 years ago (!), and it's surprising how much is still relevant. This post expands on the original and includes my latest practices. Of course, we all have different priorities, restrictions, and aversion to data loss, so what I describe here doesn't apply universally. Just take whatever's relevant and ignore the rest. :)

##### Considerations

Dependability - Hardware eventually fails. Things with moving parts tend to die sooner, but even solid-state drives have problems in the long run. Sometimes, the risk of failure is compounded because a storage device is so tightly integrated with the computer that it can become unusable due to failure of an unrelated component. And even if hardware doesn't break on its own, events like a lightning strike can take things out unexpectedly.

Accidental deletion or overwrite - As careful as one might be, every now and then it's possible to accidentally delete an important file. Or maybe just open it, make some inadvertent edits, and absentmindedly save them. This is a scenario many backup strategies don't account for; if an unwanted change to the original data is immediately replicated to a backup, it can be difficult to recover from the mistake.

Undetected corruption - A random hardware or software failure (or power loss) can invisibly result in a garbled file, directory, or disk. The challenge is that corruption can go undetected for a long time until something happens to need the relevant bits on disk. It's easy to propagate corruption to a backup copy when you don't realize it's present.

Local disaster - Fire and tornadoes are rare, but can destroy everything in the house. Even with a perfect a backup on a spare drive, if that drive was in the same place at the time of the disaster, the original and backup are both lost. Odds of avoiding the problem are improved slightly by keeping drives in different rooms, but a sufficiently serious calamity will not be deterred.

Large-scale disaster - The likelihood is quite low, but if a flood (as a timely example) comes through the area, it won't matter how many backups are stored around the house because all of them will be underwater. The only protection is to store things offsite, preferably far away.

Security/privacy - Most people won't be the target of industrial espionage, but a nosy house guest can be just as problematic - especially so because they have prolonged access to your data. If thieves steal a computer, they can read its drives. Bank records, tax documents, source code, etc., can all be compromised if precautions haven't been taken to encrypt the original and all backups.

Bandwidth - Many people use services that store data in the cloud. With a fast-enough connection to the Internet, this works well. But typical Internet plans have limited upstream bandwidth, meaning it can take hours to upload a single video. Things catch up over the course of a few days, but the latest data can be lost if service is abruptly cut off.

Validation - In order to be sure backups are successful, it's necessary to test them - regularly. Otherwise, you might not be as well prepared as you thought. Some backup approaches are easy to verify, others not so much.

Ease of recovery - In the event of a complete restore from backup, things have probably gone very wrong and any additional hardship is of little concern. That said, having immediate access to all data is a nice perk.

Cross-platform support - Whether you prefer Windows, Mac, or Linux, you'll probably stay with your OS of choice after catastrophes strikes. But it's nice to have a backup strategy where data can be recovered from different platform. If your threat model includes a global virus that wipes out a particular operating system, cross-platform support is important.

##### Implementation

All drives storing data I care about are encrypted with BitLocker. This mitigates the risk of theft/tampering and means I can leave backup drives anywhere. Each backup drive is a 1 or 2 TB 2.5" external USB drive. These devices are nice because they don't need a separate power supply and they're inherently portable and resilient - especially when stored inside a waterproof Ziploc bag.

At the end of each day, I mirror the latest changes from the primary drive in my computer to a backup drive sitting next to it. This step protects against hardware failures (high risk) and this is the drive I'll take if I need to leave home in a hurry.

Every couple of months, I calculate the checksum of every file and save them to the drive. Then I mirror to the backup, calculate checksums for everything on the backup, and make sure all the checksums match. This makes sure every bit on both drives is correctly written and readable. Any files that weren't changed help detect bit rot because those checksums are stable. Once verified, I swap the backup drive with another just like it at a nearby location (ex: friend's house, safe deposit box, etc.). This protects against theft or fire (lower risk).

Once a year, I swap the latest backup drive with another drive in a different geographic region (ex: relative's house). This protects against large-scale disaster like a flood (much lower risk), so the fact that it can be up to a year out of date is acceptable.

Because I've been doing this for a while, I also have a couple of spare drives that don't get updated regularly. These provide access to even older backups and can be used to recover outdated versions of a file in the event of persistent, undetected corruption or significant operator error.

##### Conclusion

This system addresses the above considerations in a way that works well for me. Extra redundancy provides peace of mind and requires very little effort most of the time. Cost is low and every step is under my control, so I don't need to worry about recurring fees or vendors going out of business. Recovery is simple and my family knows how to access backups if I'm not available. Remote backup drives consume no power, can live in the back of a drawer, and can be lost or destroyed without concern.

###### Nota bene

When I did the math on storage requirements a decade ago, I was concerned about capacity needs outpacing storage advances. But that didn't happen. In practice, the price of a drive with room for everything has stayed around (or below) \$100. I buy a new drive every couple of years, meaning the amortized cost is minimal.

Tags: Technical