The blog of dlaa.me

Posts tagged "Technical"

Without geometry, life is pointless [Another interesting programming puzzle worked through]

I happened across another coding puzzle recently that caught my interest. (If you're curious, here is a link to the puzzle I discussed a few weeks ago about implementing a weird kind of addition.) The problem this time is as follows:

You are given an array of coordinates, coordinates[i] = [x,y], where [x,y] represents the coordinate of a point. Check if these points make a straight line in the XY plane.

I saw an approach to solving this that didn't thrill me, so I wanted to solve it for myself.

If you're going to do the same, stop reading now because there are spoilers ahead!

As an aside, one thing I don't love about this problem is that it requires a certain amount of geometry knowledge to solve and furthermore the solution I use benefits from experience with trigonometric functions. However, one can be a perfectly good programmer without knowing either of these topics and they're not likely to come up in most roles. I try to avoid questions like this when I'm interviewing because they're likely to favor some backgrounds over others. That said, I didn't come up with this scenario and it's still interesting to think about, so let's go.

The first thing is to figure out how to approach the problem. If you remember much about linear equations, you might think to use the equation for a line to solve this: y = mx + b. If the slope, m, and the y-intercept, b, are the same for all pairs of points, then they all lie on the same line. This is helpful, but the equation starts to fall apart for points that lie on the same vertical line (for example, along the y-axis) because the slope there is infinite. Now, maybe you are comfortable dealing with infinity or risking division by zero, but I am not. That makes the vertical line scenario a special case and "special case" is often another phrase for "bug farm".

We don't want to use the slope equation, but we're in luck because there's something similar that's easier to work with: angle. Specifically:

The Math.atan2() function returns the angle in the plane (in radians) between the positive x-axis and the ray from (0,0) to the point (x,y), for Math.atan2(y,x).

That's not a solution as-is, but if we choose two points from the input and subtract the corresponding x and y coordinates, we get coordinates that define a ray in the manner described above. Now, two sets of points on line segments sharing the same angle (as calculated above) will be parallel, but not necessarily collinear. However, if we use the same point to calculate the angle to every other point of the input, that should be sufficient to determine collinearity of the set. In words:

If the angle of the line segment made up by pairing the first point of the input array with every other point of the array is the same, then we know all those rays point in the same direction and all originate at the same point (the first point) and therefore all the points are on the same line!

We're nearly there, but there's one more observation: two rays that go in exactly opposite directions from the same point are also on the same line. For example, the line segment defined by two points on a ray with an angle of 10° to the x-axis are parallel to the segment corresponding to two points on a ray with an angle of 190° to the x-axis. So instead of worrying about all 360°, we can focus on just the first 180°. The modulo operator provides an easy way to map angles between 180° and 360° into the range of 0° to 180°. (Except that the atan2 function returns values in radians instead of degrees because "Why not?" so we're going to be dealing with π.)

With all that behind us, the following algorithm should make sense. There are a few subtleties so it correctly handles empty lists as well as lists with just one point and also lists where the same point appears multiple times. In addition, it bails out as soon as it finds a non-collinear point for efficiency. Here's the code - as is the custom with JavaScript, there's no parameter validation, but this is intended to work for all valid inputs:

function arePointsCollinear(points) {
  let angle = null;
  const first = points[0];
  for (let i = 1; i < points.length; i++) {
    const current = points[i];
    const deltaX = current[0] - first[0];
    const deltaY = current[1] - first[1];
    if (deltaX || deltaY) {
      const theta = (Math.atan2(deltaY, deltaX) + Math.PI) % Math.PI;
      if (angle === null) {
        angle = theta;
      } else if (angle !== theta) {
        return false;
      }
    }
  }
  return true;
}

Interviewing for Fun and Profit [An unreasonably detailed consideration of a practical programming puzzle]

A few days ago, Brent Simmons published a thoughtful blog post about one aspect of programming interviews. As someone who conducts interviews regularly for work, the topic is of interest to me. Rather than summarizing the post, I encourage you to go read Practicing the Coding Challenges now. (You might also enjoy some of the replies to Brent's tweet linking to it.)

After reading the post, I tweeted a response:

I agree with the spirit of this post - that clarity beats premature optimization - but feel the conclusion skips an important point. The "efficient" solution is correct for arbitrarily large numbers while the "clear" one fails after MAX_INT (or similar).

Brent replied:

I thought about discussing that, but felt like I'd be muddying up the post too much.

And I responded:

Understood, thanks! I don't know how it was posed on LeetCode, so that difference may not even be relevant. But if I were discussing this with a candidate during an interview, it's something I'd want them to raise for discussion as an indication they're thinking about edge cases.

And I kept thinking about this problem in the context of a programming interview. When I saw Olof Hellman's well-reasoned follow-up post Practicing the Coding Challenges, I realized I wasn't the only one.

I was curious what the "clear" and "efficient" solutions might look like in practice, so I decided to write them and see. I used JavaScript because it's my "go to" language. Things will look a little different in other languages, but I had to pick one and I chose something popular. I'm not including comments in the code below because they'll distract from it and I'm not validating parameters (for null-ness or range) per typical JavaScript practice.

As a reminder, here's the problem definition:

You need to add two numbers and get a sum. But, importantly, the digits of those numbers are stored in arrays, and they're backwards.

The return value also needs to be in a backwards array.

If inputs are [4,2,6] and [3,8,0,8], the function should return [7,0,7,8] - because 624 + 8083 == 8707.

Let's start with the "clear" approach. The point here is to be obvious and straightforward - like you normally want your code to be. Recall Brent's pseudocode proposal:

let num1 = number(from: array1)
let num2 = number(from: array2)
let sum = num1 + num2
return array(from: sum)

Now, in order to run that, we need to implement those two conversion functions. As John Siracusa pointed out, we can do anything we want inside there and because we want to handle arbitrarily large inputs, we'll use JavaScript's BigInt built-in.

As an aside, BigInt does not have widespread support across browsers. In fact, none of the three JavaScript tools I use on iOS support it and I had to switch to a "real" computer to finish this post.

Okay, here's what I came up with:

function clear(array1, array2) {
  const num1 = toBigInt(array1);
  const num2 = toBigInt(array2);
  const sum = num1 + num2;
  return toArray(sum);
}
function toBigInt(arr) {
  const copy = [...arr];
  copy.reverse();
  const numberAsString = copy.join("");
  return BigInt(numberAsString);
}
function toArray(num) {
  const numString = num.toString();
  const charArray = numString.split("");
  const digitArray = charArray.map(Number);
  digitArray.reverse();
  return digitArray;
}

Because this is the "clear" implementation, each line does a single thing and I've mostly avoided idiomatic JavaScript. You could argue map(Number) is a bit subtle, but it should be familiar to most JavaScript programmers. The BigInt constructor takes a Number or a String, so we need to bounce the input through a String on the way into/out of BigInt. We copy the input array to avoid mutating data we don't own.

Okay, so what does the "efficient" code look like? Here's what I came up with:

function efficient(array1, array2) {
  const length = Math.max(array1.length, array2.length);
  const res = new Array(length);
  let carry = 0;
  for (let i = 0; (i < length) || carry; i++) {
    const sum = (array1[i] || 0) + (array2[i] || 0) + carry;
    carry = (sum >= 10) ? 1 : 0;
    res[i] = sum % 10;
  }
  return res;
}

Because this isn't the "clear" solution, I took some liberties with this implementation in the interest of compactness and efficiency. The output array is pre-allocated (though it may need to grow by one) and there is a single loop over the input with no redundancy and fairly little overhead I see. I'll acknowledge some folks might be uncomfortable with the (deliberate) ignorance of array bounds and that the use of carry in the loop condition may be too clever by half. (It can be removed at the cost of a single additional line of code, but I like how this approach reuses the core logic for the final overflow.) Otherwise, I tried not to be unnecessarily obscure.

Conclusions?

The "clear" code is definitely clearer, but it's not as simple as it originally seemed to be. The need to implement helper functions basically tripled the amount of code that needed to be written. The "efficient" code is smaller and ought to be quite a bit cheaper to run considering the algorithmic efficiency and the fact that it doesn't require JIT-ing BigInt or the need for String parsing. In fact, by eschewing those dependencies, it's able to run in many environments (like iOS) where BigInt doesn't exist - so it's also the more "flexible" implementation!

Of course, there's no "right" answer here - just inspiration and food for thought. My thanks go out to everyone who contributed to the discussion and especially Brent, whose NetNewsWire app is my favorite RSS reader!

Existence is random [JavaScript code to efficiently generate a random (version 4) UUID]

A few months ago, I dashed out a JavaScript function to efficiently generate a random (version 4) UUID per RFC 4122. This was inspired by code I saw for the same purpose that was not especially efficient - or random. (For more on randomness in JavaScript, see my post about polyfilling Math.random().) My UUID generator uses crypto.getRandomValues() for the best randomness and tries to do as little extra work as possible for efficiency.

The generateRandomUUID page on GitHub has the code and a link to the generateRandomUUID sample page that runs in your browser and generates a new UUID every time it's loaded.

Because it's so small, I've included the code here for reference:

// generateRandomUUID.js
// https://github.com/DavidAnson/generateRandomUUID
// 2019-10-27

"use strict";

// JavaScript code to efficiently generate a random UUID per RFC 4122
function generateRandomUUID() {
  // UUIDs have 16 byte values
  const bytes = new Uint8Array(16);
  // Seed bytes with cryptographically random values
  crypto.getRandomValues(bytes);
  // Set required fields for an RFC 4122 random UUID
  bytes[6] = (bytes[6] & 0x0f) | 0x40;
  bytes[8] = (bytes[8] & 0x3f) | 0x80;
  // Convert bytes to hex and format appropriately
  const uuid = Array.prototype.map.call(bytes, (b, i) => {
    // Left-pad single-character values with 0,
    // Convert to hexadecimal,
    // Add dashes
    return ((b < 16) ? "0" : "") +
      b.toString(16) +
      (((i % 2) && (i < 10) && (i > 2)) ? "-" : "");
  }).join("");
  // Return the string
  return uuid;
}

A picture is worth a thousand words [A small script to update contact photos on iOS]

I'm always looking for new ways to develop code. My latest adventure was writing a script to update contact photos on iOS for a prettier experience in the Messages app conversation list. The script is called update-ios-contact-images.js and it runs in Scriptable, a great app for interacting with the iOS platform from JavaScript. The idea is:

There are various conditions where Apple iOS can't (or won't) synchronize Contact photos between iPhone/iPad devices. If you're in this situation and want it to "just work", you can configure each device manually. Or you can run this script to do that for you.

update-ios-contact-images.js takes a list of email addresses and optional image links and sets the photo for matching contacts in your address book. If an image link is provided, it's used as-is; if not, the Gravatar for that email address is used instead.

You can find more in the update-ios-contact-images.js repository on Github, but the code is short enough that I've included it below.

Notes

  • As I said on Twitter, this was the first meaningful programming project I did completely on iPad (and iPhone). Research, prototyping, coding, debugging, documentation, and posting to GitHub were all done on an iOS device (using a Bluetooth keyboard at times). The overall workflow has some rough edges, but many of the pieces are there today to do real-world development tasks.
  • I'm accustomed to using JavaScript Promises directly, but took this opportunity to try out async and await. The latter are definitely easier to use - and probably easier to understand (though the syntax error JavaScriptCore gives for using await outside an async function is not obvious to me). However, the lack of support for "parallelism" by async/await means you still need to know about Promises and be comfortable using helpers like Promise.all (so I wonder how much of a leaky abstraction this ends up being).
    • Yes, I know JavaScript is technically single-threaded in this context; that's why I put the word "parallelism" in quotes above. :)

Code

// update-ios-contact-images
// A Scriptable (https://scriptable.app/) script to update contact photos on iOS
// Takes a list of email accounts and image URLs and assigns each image to the corresponding contact
// https://github.com/DavidAnson/update-ios-contact-images

// List of email accounts and images to update
const accounts = [
  {
    email: "test1@example.com"
    // No "image" property; uses Gravatar
  },
  {
    email: "test2@example.com",
    image: "https://example.com/images/test2.jpg"
  }
];

// MD5 hash for Gravatar (see https://github.com/blueimp/JavaScript-MD5 and https://cdnjs.com/libraries/blueimp-md5)
eval(await (new Request("https://cdnjs.cloudflare.com/ajax/libs/blueimp-md5/2.10.0/js/md5.min.js")).loadString());

// Load all address book contacts
const contacts = await Contact.all(await ContactsContainer.all());

// For all accounts...
await Promise.all(accounts.map(account => {
  // Normalize email address
  const emailLower = account.email.trim().toLowerCase();
  // For all contacts with that email...
  return Promise.all(contacts.
    filter(contact => contact.emailAddresses.some(address => address.value.toLowerCase() === emailLower)).
    map(async contact => {
      // Use specified image or fallback to Gravatar (see https://en.gravatar.com/site/implement/images/)
      const url = account.image || `https://www.gravatar.com/avatar/${md5(emailLower)}`;
      // Load image from web
      contact.image = await (new Request(url).loadImage());
      // Update contact
      Contact.update(contact);
      console.log(`Updated contact image for "${emailLower}" to "${url}"`);
    }));
}));

// Save changes
Contact.persistChanges();
console.log("Saved changes");

Looking for greener pastures [A Practical Comparison of Mastodon and Micro.blog]

I've been looking into Twitter alternatives Mastodon and Micro.blog recently. I couldn't find a good comparison of the two services, so I created one and put it on GitHub to quickly iterate on any feedback. That's happened, so I'm posting the comparison here where it's easier to find. Enjoy!

A Practical Comparison of Mastodon and Micro.blog

Many of us are looking at Twitter alternatives and there are two services that stand out: Micro.blog and Mastodon.

These services take different approaches, so choosing one is challenging. This page highlights some of the differences and is meant for non-nerds who don't want to get bogged down by implementation details. Every attempt has been made to be accurate, but some technical details are deliberately glossed over.

For more about similar services, see the Comparison of microblogging services on Wikipedia

Mastodon Micro.blog
Web site https://joinmastodon.org/ https://micro.blog/
Sales pitch "Social networking, back in your hands. Follow friends and discover new ones. Publish anything you want: links, pictures, text, video. All on a platform that is community-owned and ad-free." "A network of independent microblogs. Short posts like tweets but on your own web site that you control. Micro.blog is a safe community for microblogs. A timeline to follow friends and discover new posts. Blog hosting built on open standards."
Best price Free Free, but requires a separate blog for posting
Actual price Free $5 per month, no blog needed
Harassment and abuse https://blog.joinmastodon.org/2018/07/cage-the-mastodon/ https://help.micro.blog/2018/twitter-differences/
Code of conduct Depends on the server (Example) https://help.micro.blog/2017/community-guidelines/
Privacy policy Depends on the server https://help.micro.blog/2018/privacy-policy/
Hashtags in posts Yes No
Replies handled differently No Yes
Able to export content Yes Yes
Cross-posting to Twitter No Yes, with a $2/month or $5/month subscription
Import Twitter friends Yes No
Official iOS or Android app No iOS only
Official Mac or Windows app No Mac only
Third-party clients Yes Yes
Security User name + password (2FA is optional) Email address only

Deliberately omitted from above: User counts, open source status, federation details

I'm not part of either project, so there may be mistakes in the table above. That's why this is on GitHub - please open an issue or send a pull request to correct any problems you find. If you are adding content, please do so for both platforms and link to your sources.

For other questions or to start a discussion, contact me on:

Convert *all* the things [PowerShell script to convert RAW and HEIC photos to JPEG]

Like most people, I take lots of photos. Like many people, I save them in the highest-quality format (often RAW). Like some people, I edit those pictures on a desktop computer.

Support for RAW images has gotten better over the years, but there are still many tools and programs that do not support these bespoke formats. So it's handy to have a quick and easy way to convert such photos into a widely-supported format like JPEG. There are many tools to do so, but it's hard to beat a command line script for simplicity and ease of use.

I didn't know of one that met my criteria, so I wrote a PowerShell script:

ConvertTo-Jpeg - A PowerShell script that converts RAW (and other) image files to the widely-supported JPEG format

Notes:

  • This script uses the Windows.Graphics.Imaging API to decode and encode. That API supports a variety of file formats and when new formats are added to the list, they are automatically recognized by the script. Because the underlying implementation is maintained by the Windows team, it is fast and secure.
    • As it happens, support for a new format showed up in the days since I wrote this script: Windows 10's April 2018 Update added support for HEIC/HEIF images such as those created by iPhones running iOS 11.
  • The Windows.Graphics.Imaging API is intended for use by Universal Windows Platform (UWP) applications, but I am using it from PowerShell. This is unfortunately harder than it should be, but allowed me to release a single script file which anybody can read and audit.
    • Transparency is not a goal for everyone, but it's important to me - especially in today's environment where malware is so prevalent. I don't trust random code on the Internet, so I prefer to use - and create - open implementations when possible.
  • The choice of PowerShell had some drawbacks. For one, it is not a language I work with often, so I spent more time looking things up than I normally do. For another, it's clear that interoperating with UWP APIs is not a core scenario for PowerShell. In particular, calling asynchronous methods is tricky, and I did a lot of searching before I found a solution I liked: Using WinRT's IAsyncOperation in PowerShell.
  • There are some obvious improvements that could be made, but I deliberately started simple and will add features if/when the need arises.

It depends... [A look at the footprint of popular Node.js command-line parsing packages]

In a recent discussion of the Node.js ecosystem, I opined that packages with a large number of dependencies contribute to excessive disk space use by apps that reference them.

But I didn't have data to back that claim up, so I made some measurements to find out. Command-line argument parsing is a common need and there are a variety of packages to make it easier. I found nine of the most popular and installed each into a new, blank project as a standard dependency item in package.json via npm install. Then I counted the number of direct dependencies for that package, the total (transitive) number of packages that end up being installed, and the size (in bytes) of disk space consumed (on Windows). I tabulated the results below and follow with a few observations.

Important: I made no attempt to assess the quality or usefulness of these packages. They are all popular and each offers a different approach to the problem. Some are feature-rich, while others offer a simple API. I am not promoting or critiquing any of them; rather, I am using the aggregate as a source of data.

Package Popularity Direct Dependencies Transitive Dependencies Size on Disk
argparse 494 1 2 152,661
commander 18865 0 1 48,328
command-line-args 677 3 5 237,789
dashdash 156 1 2 94,377
meow 2344 10 43 455,525
minimatch 2335 1 4 57,803
minimist 8490 0 1 31,151
nomnom 549 2 6 119,237
yargs 7516 12 44 576,724

These metrics were captured on 2017-11-25 and may have changed by the time you read this.

Notes:

  • The two most popular packages are the smallest on disk and have no dependencies; the third and fourth most popular are the biggest and have the most dependencies.
  • Packages with fewer dependencies tend to have the smallest size; those with the most dependencies have the largest.
  • The difference between the extremes of direct dependency count is about 10x.
  • The difference between extremes for transitive dependency count is about 40x.
  • The difference between disk space extremes is about 20x.

While this was a simple experiment that doesn't represent the whole Node ecosystem, it seems reasonable to conclude that:

Similar packages can exhibit differences of an order of magnitude (or more) in dependency count and size. If that matters for your scenario, measure before you choose!

For my part, I tend to resist taking on additional dependencies when possible and prefer using dependencies that adhere to the same principle. Reinventing the wheel is wasteful, of course - but sometimes less is more and it's good to keep complexity to a minimum.

Back to backup [Revisiting and refreshing my approach to backups]

A topic that comes up from time to time is how people deal with backing up their data. It's one of those times again, so I'm sharing my technique for the benefit of anyone who's interested. My previous post on backup strategy was written over 11 years ago (!), and it's surprising how much is still relevant. This post expands on the original and includes my latest practices. Of course, we all have different priorities, restrictions, and aversion to data loss, so what I describe here doesn't apply universally. Just take whatever's relevant and ignore the rest. :)

Considerations

Dependability - Hardware eventually fails. Things with moving parts tend to die sooner, but even solid-state drives have problems in the long run. Sometimes, the risk of failure is compounded because a storage device is so tightly integrated with the computer that it can become unusable due to failure of an unrelated component. And even if hardware doesn't break on its own, events like a lightning strike can take things out unexpectedly.

Accidental deletion or overwrite - As careful as one might be, every now and then it's possible to accidentally delete an important file. Or maybe just open it, make some inadvertent edits, and absentmindedly save them. This is a scenario many backup strategies don't account for; if an unwanted change to the original data is immediately replicated to a backup, it can be difficult to recover from the mistake.

Undetected corruption - A random hardware or software failure (or power loss) can invisibly result in a garbled file, directory, or disk. The challenge is that corruption can go undetected for a long time until something happens to need the relevant bits on disk. It's easy to propagate corruption to a backup copy when you don't realize it's present.

Local disaster - Fire and tornadoes are rare, but can destroy everything in the house. Even with a perfect a backup on a spare drive, if that drive was in the same place at the time of the disaster, the original and backup are both lost. Odds of avoiding the problem are improved slightly by keeping drives in different rooms, but a sufficiently serious calamity will not be deterred.

Large-scale disaster - The likelihood is quite low, but if a flood (as a timely example) comes through the area, it won't matter how many backups are stored around the house because all of them will be underwater. The only protection is to store things offsite, preferably far away.

Security/privacy - Most people won't be the target of industrial espionage, but a nosy house guest can be just as problematic - especially so because they have prolonged access to your data. If thieves steal a computer, they can read its drives. Bank records, tax documents, source code, etc., can all be compromised if precautions haven't been taken to encrypt the original and all backups.

Bandwidth - Many people use services that store data in the cloud. With a fast-enough connection to the Internet, this works well. But typical Internet plans have limited upstream bandwidth, meaning it can take hours to upload a single video. Things catch up over the course of a few days, but the latest data can be lost if service is abruptly cut off.

Validation - In order to be sure backups are successful, it's necessary to test them - regularly. Otherwise, you might not be as well prepared as you thought. Some backup approaches are easy to verify, others not so much.

Ease of recovery - In the event of a complete restore from backup, things have probably gone very wrong and any additional hardship is of little concern. That said, having immediate access to all data is a nice perk.

Cross-platform support - Whether you prefer Windows, Mac, or Linux, you'll probably stay with your OS of choice after catastrophes strikes. But it's nice to have a backup strategy where data can be recovered from different platform. If your threat model includes a global virus that wipes out a particular operating system, cross-platform support is important.

Implementation

All drives storing data I care about are encrypted with BitLocker. This mitigates the risk of theft/tampering and means I can leave backup drives anywhere. Each backup drive is a 1 or 2 TB 2.5" external USB drive. These devices are nice because they don't need a separate power supply and they're inherently portable and resilient - especially when stored inside a waterproof Ziploc bag.

At the end of each day, I mirror the latest changes from the primary drive in my computer to a backup drive sitting next to it. This step protects against hardware failures (high risk) and this is the drive I'll take if I need to leave home in a hurry.

Every couple of months, I calculate the checksum of every file and save them to the drive. Then I mirror to the backup, calculate checksums for everything on the backup, and make sure all the checksums match. This makes sure every bit on both drives is correctly written and readable. Any files that weren't changed help detect bit rot because those checksums are stable. Once verified, I swap the backup drive with another just like it at a nearby location (ex: friend's house, safe deposit box, etc.). This protects against theft or fire (lower risk).

Once a year, I swap the latest backup drive with another drive in a different geographic region (ex: relative's house). This protects against large-scale disaster like a flood (much lower risk), so the fact that it can be up to a year out of date is acceptable.

Because I've been doing this for a while, I also have a couple of spare drives that don't get updated regularly. These provide access to even older backups and can be used to recover outdated versions of a file in the event of persistent, undetected corruption or significant operator error.

Conclusion

This system addresses the above considerations in a way that works well for me. Extra redundancy provides peace of mind and requires very little effort most of the time. Cost is low and every step is under my control, so I don't need to worry about recurring fees or vendors going out of business. Recovery is simple and my family knows how to access backups if I'm not available. Remote backup drives consume no power, can live in the back of a drawer, and can be lost or destroyed without concern.

Nota bene

When I did the math on storage requirements a decade ago, I was concerned about capacity needs outpacing storage advances. But that didn't happen. In practice, the price of a drive with room for everything has stayed around (or below) $100. I buy a new drive every couple of years, meaning the amortized cost is minimal.

Tags: Technical

Hour of No Code [Solving Day 1 Part 1 of Advent of Code 2016 without writing any code]

Advent of Code is a cool idea! Author Eric Wastl describes it as "a series of small programming puzzles for a variety of skill levels". I thought I might work through the first few days with non-programmer children, so I read the description of Day 1: No Time for a Taxicab:

[...] start at the given coordinates [...] and face North. Then, follow the provided sequence: either turn left (L) or right (R) 90 degrees, then walk forward the given number of blocks [...]

Following simple, mechanical steps seemed like a fine introduction to programming for young people, so I signed in and requested the data for Part 1 of the puzzle.

Oh...

You see, while the examples Eric provides are quite simple, the actual input is over 150 steps and some of the distances are three-digit numbers. So while this puzzle is perfect for a small programming challenge, it'd be too tedious to work through manually.

Unless...

What if we could solve the programming puzzle without any programming?

What if we could solve it with something familiar to non-developers?

What if we could solve it with a basic spreadsheet???

Hmmm...

Let's try!

We'll be using Google Sheets because it's free, easy to use, and convenient. (But any decent spreadsheet will do.) And we'll be working with the sample data (not the actual puzzle data) so as not to give too much away. Keep reading to see all the steps - or follow along in a spreadsheet if you want to get your hands dirty.

Good, let's start with the sample data:

R5, L5, R5, R3 leaves you 12 blocks away.

To understand why and familiarize yourself with the puzzle, read about taxicab geometry and work through the steps on paper first. It's important to realize that R and L steps do not necessarily alternate and that the direction at the end of each step depends on the direction at the beginning of the step. However, given the state after N steps, the state after step N+1 is completely defined by that direction and distance.

Spreadsheets typically go "down", so let's set ours up like that by adding the steps to a Data column (leaving row 2 blank for reasons that will be clear later):

A
1 Data
2
3 R5
4 L5
5 R5
6 R3

Each step contains two pieces of information: the Direction to turn and the Distance to move. Let's start by pulling them apart so we can deal with them separately. The LEFT and RIGHT functions are our friends here (with a little help from LEN). The Direction is simply the first character of the data, or =LEFT(A3, 1) (assuming we're looking at row 3, the first value). And the Distance is just the rest of the string, or =RIGHT(A3, LEN(A3)-1).

Type those formulas into cells B3 and C3 and replicate the formulas down to row 6. (Your spreadsheet program should automatically update the formulas for rows 4-6 to be relative to their respective cells.) Done correctly, that gives the following:

A B C
1 Data Direction Distance
2
3 R5 R 5
4 L5 L 5
5 R5 R 5
6 R3 R 3

We'd like to turn the Direction letter into a number so we can do math on it - the IF function works nicely for that. By using -1 to represent a left turn and 1 to represent a right turn, use the formula =IF(B3="L", -1, 1) to create a Rotation field for row 3 (and then replicate the formula down):

A B C D
1 Data Direction Distance Rotation
2
3 R5 R 5 1
4 L5 L 5 -1
5 R5 R 5 1
6 R3 R 3 1

Now let's use Rotation to create a Heading that tracks where the player is pointing after each move. The four possible values are North, East, South, and West - it's convenient to use the numbers 0, 1, 2, and 3 respectively. Because we're adding and subtracting numbers to determine the Heading, it's possible to get values outside that range (ex: 3 + 1 = 4). Just like with a 12-hour clock (11am + 2 hours = 1pm), we'll use modular arithmetic to stay within the desired range. The starting direction is North, so we'll seed an initial Heading by putting the value 0 in cell E2. (See, I told you row 2 would be useful!) After that, the MOD function can be used to figure out the new Heading by adding the Rotation to the current value: =MOD(E2 + D3, 4). Replicating this formula down gives the following:

A B C D E
1 Data Direction Distance Rotation Heading
2 0
3 R5 R 5 1 1
4 L5 L 5 -1 0
5 R5 R 5 1 1
6 R3 R 3 1 2

Knowing a Heading and a Distance means we can track of the X and Y coordinates as the player moves around the city. If the player is facing East and moves forward, the X coordinate will increase by that Distance; if facing West, it will decrease. Similarly, a move facing North will increase the Y coordinate, while a move South will decrease it. The CHOOSE function allows us to represent this quite easily. The only catch is that CHOOSE indices are 1-based and our Heading values are 0-based, so we need to adjust for that by adding 1. After seeding 0 values for the starting X and Y coordinates in row 2, we can update the X coordinate with =F2 + CHOOSE(E3+1, 0, C3, 0, -C3) and the Y coordinate with =G2 + CHOOSE(E3+1, C3, 0, -C3, 0). Replicating those formulas down leads to:

A B C D E F G
1 Data Direction Distance Rotation Heading X Y
2 0 0 0
3 R5 R 5 1 1 5 0
4 L5 L 5 -1 0 5 5
5 R5 R 5 1 1 10 5
6 R3 R 3 1 2 10 2

Now all that's left to solve the puzzle is to compute the taxicab distance of each X/Y coordinate from the starting point. Well, that's just the magnitude of the X coordinate added to the magnitude of the Y coordinate and that can be expressed using the ABS function: =ABS(F3) + ABS(G3). Replicating that down gives the following, final, table and the answer to the sample puzzle:

A B C D E F G H
1 Data Direction Distance Rotation Heading X Y Taxicab Distance from Start
2 0 0 0
3 R5 R 5 1 1 5 0 5
4 L5 L 5 -1 0 5 5 10
5 R5 R 5 1 1 10 5 15
6 R3 R 3 1 2 10 2 12

Sure enough, the taxicab distance after the final move (cell H6) is 12 - just like the sample said it should be! And we figured that out without writing any "code" at all - we simply came up with few small formulas to methodically break the problem down into little pieces we could solve with a spreadsheet. That may not quite be coding (in the usual sense), but it sure is programming!

PS - Having definitively established that you are 1337 with a spreadsheet, you might be looking for other challenges... Good news: Part 2 of the Advent of Code taxicab problem has been left as an exercise for the reader! :)

Just another rando with a polyfill [math-random-polyfill.js is a browser-based polyfill for JavaScript's Math.random() that tries to make it more random]

JavaScript's Math.random() function has a well-deserved reputation for not generating truly random numbers. (Gasp!) Modern browsers offer a solution with the crypto.getRandomValues() function that new code should be using instead. However, most legacy scripts haven't been - and won't be - updated for the new hotness.

I wanted to improve the behavior of legacy code and looked around for a polyfill of Math.random() that leveraged crypto.getRandomValues() to generate output, but didn't find one. It seemed straightforward to implement, so I created math-random-polyfill.js with the tagline: A browser-based polyfill for JavaScript's Math.random() that tries to make it more random.

You can learn more about math-random-polyfill.js on its GitHub project page which includes the following:

The MDN documentation for Math.random() explicitly warns that return values should not be used for cryptographic purposes. Failing to heed that advice can lead to problems, such as those documented in the article TIFU by using Math.random(). However, there are scenarios - especially involving legacy code - that don't lend themselves to easily replacing Math.random() with crypto.getRandomValues(). For those scenarios, math-random-polyfill.js attempts to provide a more random implementation of Math.random() to mitigate some of its disadvantages.

...

math-random-polyfill.js works by intercepting calls to Math.random() and returning the same 0 <= value < 1 based on random data provided by crypto.getRandomValues(). Values returned by Math.random() should be completely unpredictable and evenly distributed - both of which are true of the random bits returned by crypto.getRandomValues(). The polyfill maps those values into floating point numbers by using the random bits to create integers distributed evenly across the range 0 <= value < Number.MAX_SAFE_INTEGER then dividing by Number.MAX_SAFE_INTEGER + 1. This maintains the greatest amount of randomness and precision during the transfer from the integer domain to the floating point domain.

I've included a set of unit tests meant to detect the kinds of mistakes that would compromise the usefulness of math-random-polyfill.js. The test suite passes on the five (most popular) browsers I tried, which leads me to be cautiously optimistic about the validity and viability of this approach. :)