Posts tagged "Technical"

A picture is worth a thousand words [A small script to update contact photos on iOS]

Monday, November 5, 2018

I'm always looking for new ways to develop code. My latest adventure was writing a script to update contact photos on iOS for a prettier experience in the Messages app conversation list. The script is called update-ios-contact-images.js and it runs in Scriptable, a great app for interacting with the iOS platform from JavaScript. The idea is:

There are various conditions where Apple iOS can't (or won't) synchronize Contact photos between iPhone/iPad devices. If you're in this situation and want it to "just work", you can configure each device manually. Or you can run this script to do that for you.

update-ios-contact-images.js takes a list of email addresses and optional image links and sets the photo for matching contacts in your address book. If an image link is provided, it's used as-is; if not, the Gravatar for that email address is used instead.

You can find more in the update-ios-contact-images.js repository on GitHub, but the code is short enough that I've included it below.

Notes

As I said on Twitter, this was the first meaningful programming project I did completely on iPad (and iPhone). Research, prototyping, coding, debugging, documentation, and posting to GitHub were all done on an iOS device (using a Bluetooth keyboard at times). The overall workflow has some rough edges, but many of the pieces are there today to do real-world development tasks.
I'm accustomed to using JavaScript Promises directly, but took this opportunity to try out async and await. The latter are definitely easier to use - and probably easier to understand (though the syntax error JavaScriptCore gives for using await outside an async function is not obvious to me). However, the lack of support for "parallelism" by async/await means you still need to know about Promises and be comfortable using helpers like Promise.all (so I wonder how much of a leaky abstraction this ends up being).
- Yes, I know JavaScript is technically single-threaded in this context; that's why I put the word "parallelism" in quotes above. :)
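
To illustrate the point about "parallelism", here is a minimal sketch contrasting a plain await loop with Promise.all. The delay helper and the function names are made up for this example and are not part of the script below; delay stands in for a slow operation such as a network request.

```javascript
// Hypothetical stand-in for a slow operation (ex: loading an image from the web)
const delay = (ms, value) => new Promise(resolve => setTimeout(() => resolve(value), ms));

// Sequential: each call must finish before the next one starts
async function loadSequentially(items, load) {
  const results = [];
  for (const item of items) {
    results.push(await load(item));
  }
  return results;
}

// Concurrent: start every call first, then wait for all of them together
function loadConcurrently(items, load) {
  return Promise.all(items.map(item => load(item)));
}
```

With a load function that takes about 100ms per item, loadSequentially needs roughly 100ms times the number of items, while loadConcurrently finishes in roughly 100ms overall. Both return results in input order, which is why Promise.all is still worth knowing about even when everything else is written with async and await.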

Code

// update-ios-contact-images
// A Scriptable (https://scriptable.app/) script to update contact photos on iOS
// Takes a list of email accounts and image URLs and assigns each image to the corresponding contact
// https://github.com/DavidAnson/update-ios-contact-images

// List of email accounts and images to update
const accounts = [
  {
    email: "test1@example.com"
    // No "image" property; uses Gravatar
  },
  {
    email: "test2@example.com",
    image: "https://example.com/images/test2.jpg"
  }
];

// MD5 hash for Gravatar (see https://github.com/blueimp/JavaScript-MD5 and https://cdnjs.com/libraries/blueimp-md5)
eval(await (new Request("https://cdnjs.cloudflare.com/ajax/libs/blueimp-md5/2.10.0/js/md5.min.js")).loadString());

// Load all address book contacts
const contacts = await Contact.all(await ContactsContainer.all());

// For all accounts...
await Promise.all(accounts.map(account => {
  // Normalize email address
  const emailLower = account.email.trim().toLowerCase();
  // For all contacts with that email...
  return Promise.all(contacts.
    filter(contact => contact.emailAddresses.some(address => address.value.toLowerCase() === emailLower)).
    map(async contact => {
      // Use specified image or fallback to Gravatar (see https://en.gravatar.com/site/implement/images/)
      const url = account.image || `https://www.gravatar.com/avatar/${md5(emailLower)}`;
      // Load image from web
      contact.image = await (new Request(url).loadImage());
      // Update contact
      Contact.update(contact);
      console.log(`Updated contact image for "${emailLower}" to "${url}"`);
    }));
}));

// Save changes
await Contact.persistChanges();
console.log("Saved changes");

Tags: Technical Utilities

Looking for greener pastures [A Practical Comparison of Mastodon and Micro.blog]

Tuesday, September 4, 2018

I've been looking into Twitter alternatives Mastodon and Micro.blog recently. I couldn't find a good comparison of the two services, so I created one and put it on GitHub to quickly iterate on any feedback. That's happened, so I'm posting the comparison here where it's easier to find. Enjoy!

A Practical Comparison of Mastodon and Micro.blog

Many of us are looking at Twitter alternatives and there are two services that stand out: Micro.blog and Mastodon.

These services take different approaches, so choosing one is challenging. This page highlights some of the differences and is meant for non-nerds who don't want to get bogged down by implementation details. Every attempt has been made to be accurate, but some technical details are deliberately glossed over.

For more about similar services, see the Comparison of microblogging services on Wikipedia.

	Mastodon	Micro.blog
Web site	https://joinmastodon.org/	https://micro.blog/
Sales pitch	"Social networking, back in your hands. Follow friends and discover new ones. Publish anything you want: links, pictures, text, video. All on a platform that is community-owned and ad-free."	"A network of independent microblogs. Short posts like tweets but on your own web site that you control. Micro.blog is a safe community for microblogs. A timeline to follow friends and discover new posts. Blog hosting built on open standards."
Best price	Free	Free, but requires a separate blog for posting
Actual price	Free	$5 per month, no blog needed
Harassment and abuse	https://blog.joinmastodon.org/2018/07/cage-the-mastodon/	https://help.micro.blog/2018/twitter-differences/
Code of conduct	Depends on the server (Example)	https://help.micro.blog/2017/community-guidelines/
Privacy policy	Depends on the server	https://help.micro.blog/2018/privacy-policy/
Hashtags in posts	Yes	No
Replies handled differently	No	Yes
Able to export content	Yes	Yes
Cross-posting to Twitter	No	Yes, with a $2/month or $5/month subscription
Import Twitter friends	Yes	No
Official iOS or Android app	No	iOS only
Official Mac or Windows app	No	Mac only
Third-party clients	Yes	Yes
Security	User name + password (2FA is optional)	Email address only

Deliberately omitted from above: User counts, open source status, federation details

I'm not part of either project, so there may be mistakes in the table above. That's why this is on GitHub - please open an issue or send a pull request to correct any problems you find. If you are adding content, please do so for both platforms and link to your sources.

For other questions or to start a discussion, contact me on:

Tags: Miscellaneous Technical

Convert all the things [PowerShell script to convert RAW and HEIC photos to JPEG]

Monday, May 7, 2018

Like most people, I take lots of photos. Like many people, I save them in the highest-quality format (often RAW). Like some people, I edit those pictures on a desktop computer.

Support for RAW images has gotten better over the years, but there are still many tools and programs that do not support these bespoke formats. So it's handy to have a quick and easy way to convert such photos into a widely-supported format like JPEG. There are many tools to do so, but it's hard to beat a command line script for simplicity and ease of use.

I didn't know of one that met my criteria, so I wrote a PowerShell script:

ConvertTo-Jpeg - A PowerShell script that converts RAW (and other) image files to the widely-supported JPEG format

Notes:

This script uses the Windows.Graphics.Imaging API to decode and encode. That API supports a variety of file formats and when new formats are added to the list, they are automatically recognized by the script. Because the underlying implementation is maintained by the Windows team, it is fast and secure.
- As it happens, support for a new format showed up in the days since I wrote this script: Windows 10's April 2018 Update added support for HEIC/HEIF images such as those created by iPhones running iOS 11.
The Windows.Graphics.Imaging API is intended for use by Universal Windows Platform (UWP) applications, but I am using it from PowerShell. This is unfortunately harder than it should be, but allowed me to release a single script file which anybody can read and audit.
- Transparency is not a goal for everyone, but it's important to me - especially in today's environment where malware is so prevalent. I don't trust random code on the Internet, so I prefer to use - and create - open implementations when possible.
The choice of PowerShell had some drawbacks. For one, it is not a language I work with often, so I spent more time looking things up than I normally do. For another, it's clear that interoperating with UWP APIs is not a core scenario for PowerShell. In particular, calling asynchronous methods is tricky, and I did a lot of searching before I found a solution I liked: Using WinRT's IAsyncOperation in PowerShell.
There are some obvious improvements that could be made, but I deliberately started simple and will add features if/when the need arises.

Tags: Technical Utilities

It depends... [A look at the footprint of popular Node.js command-line parsing packages]

Tuesday, November 28, 2017

In a recent discussion of the Node.js ecosystem, I opined that packages with a large number of dependencies contribute to excessive disk space use by apps that reference them.

But I didn't have data to back that claim up, so I made some measurements to find out. Command-line argument parsing is a common need and there are a variety of packages to make it easier. I found nine of the most popular and installed each into a new, blank project as a standard dependency item in package.json via npm install. Then I counted the number of direct dependencies for that package, the total (transitive) number of packages that end up being installed, and the size (in bytes) of disk space consumed (on Windows). I tabulated the results below and follow with a few observations.

Important: I made no attempt to assess the quality or usefulness of these packages. They are all popular and each offers a different approach to the problem. Some are feature-rich, while others offer a simple API. I am not promoting or critiquing any of them; rather, I am using the aggregate as a source of data.

Package	Popularity	Direct Dependencies	Transitive Dependencies	Size on Disk (bytes)
argparse	494	1	2	152,661
commander	18865	0	1	48,328
command-line-args	677	3	5	237,789
dashdash	156	1	2	94,377
meow	2344	10	43	455,525
minimatch	2335	1	4	57,803
minimist	8490	0	1	31,151
nomnom	549	2	6	119,237
yargs	7516	12	44	576,724

These metrics were captured on 2017-11-25 and may have changed by the time you read this.

Notes:

The two most popular packages are the smallest on disk and have no dependencies; the third and fourth most popular are the biggest and have the most dependencies.
Packages with fewer dependencies tend to have the smallest size; those with the most dependencies have the largest.
The difference between the extremes of direct dependency count is about 10x.
The difference between extremes for transitive dependency count is about 40x.
The difference between disk space extremes is about 20x.

While this was a simple experiment that doesn't represent the whole Node ecosystem, it seems reasonable to conclude that:

Similar packages can exhibit differences of an order of magnitude (or more) in dependency count and size. If that matters for your scenario, measure before you choose!
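
If you want to measure a package yourself, something like the following Node.js sketch is one way to start. It is not the tooling used for the table above, and the function names are invented for this example. It adds up file lengths (which is close to, but not the same as, allocated space on disk) and only counts packages at the top level of node_modules, so nested copies are ignored.

```javascript
const fs = require("fs");
const path = require("path");

// Recursively total the number of files and their sizes (in bytes) under a directory
function measureDirectory(dir) {
  let files = 0;
  let bytes = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      const nested = measureDirectory(full);
      files += nested.files;
      bytes += nested.bytes;
    } else if (entry.isFile()) {
      files += 1;
      bytes += fs.statSync(full).size;
    }
  }
  return { files, bytes };
}

// Count packages at the top level of node_modules (assumes a flat install layout)
function countTopLevelPackages(nodeModules) {
  let count = 0;
  for (const entry of fs.readdirSync(nodeModules, { withFileTypes: true })) {
    // Skip files and helper directories such as .bin
    if (!entry.isDirectory() || entry.name.startsWith(".")) {
      continue;
    }
    if (entry.name.startsWith("@")) {
      // Scoped packages live one level down (ex: @scope/name)
      const scopeDir = path.join(nodeModules, entry.name);
      count += fs.readdirSync(scopeDir, { withFileTypes: true })
        .filter(scoped => scoped.isDirectory()).length;
    } else {
      count += 1;
    }
  }
  return count;
}
```

After running npm install in a blank project, calling measureDirectory("node_modules") and countTopLevelPackages("node_modules") gives a rough footprint to compare candidates.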

For my part, I tend to resist taking on additional dependencies when possible and prefer using dependencies that adhere to the same principle. Reinventing the wheel is wasteful, of course - but sometimes less is more and it's good to keep complexity to a minimum.

Tags: Node.js Technical

Back to backup [Revisiting and refreshing my approach to backups]

Thursday, September 7, 2017

A topic that comes up from time to time is how people deal with backing up their data. It's one of those times again, so I'm sharing my technique for the benefit of anyone who's interested. My previous post on backup strategy was written over 11 years ago (!), and it's surprising how much is still relevant. This post expands on the original and includes my latest practices. Of course, we all have different priorities, restrictions, and aversion to data loss, so what I describe here doesn't apply universally. Just take whatever's relevant and ignore the rest. :)

Considerations

Dependability - Hardware eventually fails. Things with moving parts tend to die sooner, but even solid-state drives have problems in the long run. Sometimes, the risk of failure is compounded because a storage device is so tightly integrated with the computer that it can become unusable due to failure of an unrelated component. And even if hardware doesn't break on its own, events like a lightning strike can take things out unexpectedly.

Accidental deletion or overwrite - As careful as one might be, every now and then it's possible to accidentally delete an important file. Or maybe just open it, make some inadvertent edits, and absentmindedly save them. This is a scenario many backup strategies don't account for; if an unwanted change to the original data is immediately replicated to a backup, it can be difficult to recover from the mistake.

Undetected corruption - A random hardware or software failure (or power loss) can invisibly result in a garbled file, directory, or disk. The challenge is that corruption can go undetected for a long time until something happens to need the relevant bits on disk. It's easy to propagate corruption to a backup copy when you don't realize it's present.

Local disaster - Fire and tornadoes are rare, but can destroy everything in the house. Even with a perfect backup on a spare drive, if that drive was in the same place at the time of the disaster, the original and backup are both lost. Odds of avoiding the problem are improved slightly by keeping drives in different rooms, but a sufficiently serious calamity will not be deterred.

Large-scale disaster - The likelihood is quite low, but if a flood (as a timely example) comes through the area, it won't matter how many backups are stored around the house because all of them will be underwater. The only protection is to store things offsite, preferably far away.

Security/privacy - Most people won't be the target of industrial espionage, but a nosy house guest can be just as problematic - especially so because they have prolonged access to your data. If thieves steal a computer, they can read its drives. Bank records, tax documents, source code, etc., can all be compromised if precautions haven't been taken to encrypt the original and all backups.

Bandwidth - Many people use services that store data in the cloud. With a fast-enough connection to the Internet, this works well. But typical Internet plans have limited upstream bandwidth, meaning it can take hours to upload a single video. Things catch up over the course of a few days, but the latest data can be lost if service is abruptly cut off.

Validation - In order to be sure backups are successful, it's necessary to test them - regularly. Otherwise, you might not be as well prepared as you thought. Some backup approaches are easy to verify, others not so much.

Ease of recovery - In the event of a complete restore from backup, things have probably gone very wrong and any additional hardship is of little concern. That said, having immediate access to all data is a nice perk.

Cross-platform support - Whether you prefer Windows, Mac, or Linux, you'll probably stay with your OS of choice after catastrophe strikes. But it's nice to have a backup strategy where data can be recovered from a different platform. If your threat model includes a global virus that wipes out a particular operating system, cross-platform support is important.

Implementation

All drives storing data I care about are encrypted with BitLocker. This mitigates the risk of theft/tampering and means I can leave backup drives anywhere. Each backup drive is a 1 or 2 TB 2.5" external USB drive. These devices are nice because they don't need a separate power supply and they're inherently portable and resilient - especially when stored inside a waterproof Ziploc bag.

At the end of each day, I mirror the latest changes from the primary drive in my computer to a backup drive sitting next to it. This step protects against hardware failures (high risk) and this is the drive I'll take if I need to leave home in a hurry.

Every couple of months, I calculate a checksum for every file and save the results to the drive. Then I mirror to the backup, calculate checksums for everything on the backup, and make sure all the checksums match. This makes sure every bit on both drives is correctly written and readable. Any files that weren't changed help detect bit rot because those checksums are stable. Once verified, I swap the backup drive with another just like it at a nearby location (ex: friend's house, safe deposit box, etc.). This protects against theft or fire (lower risk).
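
The checksum comparison can be sketched in a few lines of Node.js. This is an illustration rather than the tool I actually use; the function names and the choice of SHA-256 are assumptions made for the example, and it reads each file into memory, which is fine for a demonstration but not ideal for very large files.

```javascript
const crypto = require("crypto");
const fs = require("fs");
const path = require("path");

// Build a map of relative file path -> SHA-256 checksum for everything under root
function checksumTree(root) {
  const sums = new Map();
  const walk = dir => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      if (entry.isDirectory()) {
        walk(full);
      } else if (entry.isFile()) {
        const hash = crypto.createHash("sha256").update(fs.readFileSync(full)).digest("hex");
        sums.set(path.relative(root, full), hash);
      }
    }
  };
  walk(root);
  return sums;
}

// Report every file that is missing, extra, or different between the two maps
function compareTrees(primary, backup) {
  const differences = [];
  for (const [file, hash] of primary) {
    if (!backup.has(file)) {
      differences.push(`missing from backup: ${file}`);
    } else if (backup.get(file) !== hash) {
      differences.push(`checksum mismatch: ${file}`);
    }
  }
  for (const file of backup.keys()) {
    if (!primary.has(file)) {
      differences.push(`extra in backup: ${file}`);
    }
  }
  return differences;
}
```

An empty result from compareTrees(checksumTree(primaryPath), checksumTree(backupPath)) means both drives hold identical content. Saving the map from one run and comparing it to the next is one way to spot unchanged files whose checksums have drifted.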

Once a year, I swap the latest backup drive with another drive in a different geographic region (ex: relative's house). This protects against large-scale disaster like a flood (much lower risk), so the fact that it can be up to a year out of date is acceptable.

Because I've been doing this for a while, I also have a couple of spare drives that don't get updated regularly. These provide access to even older backups and can be used to recover outdated versions of a file in the event of persistent, undetected corruption or significant operator error.

Conclusion

This system addresses the above considerations in a way that works well for me. Extra redundancy provides peace of mind and requires very little effort most of the time. Cost is low and every step is under my control, so I don't need to worry about recurring fees or vendors going out of business. Recovery is simple and my family knows how to access backups if I'm not available. Remote backup drives consume no power, can live in the back of a drawer, and can be lost or destroyed without concern.

Nota bene

When I did the math on storage requirements a decade ago, I was concerned about capacity needs outpacing storage advances. But that didn't happen. In practice, the price of a drive with room for everything has stayed around (or below) $100. I buy a new drive every couple of years, meaning the amortized cost is minimal.

Tags: Technical

Hour of No Code [Solving Day 1 Part 1 of Advent of Code 2016 without writing any code]

Thursday, December 15, 2016

Advent of Code is a cool idea! Author Eric Wastl describes it as "a series of small programming puzzles for a variety of skill levels". I thought I might work through the first few days with non-programmer children, so I read the description of Day 1: No Time for a Taxicab:

[...] start at the given coordinates [...] and face North. Then, follow the provided sequence: either turn left (L) or right (R) 90 degrees, then walk forward the given number of blocks [...]

Following simple, mechanical steps seemed like a fine introduction to programming for young people, so I signed in and requested the data for Part 1 of the puzzle.

Oh...

You see, while the examples Eric provides are quite simple, the actual input is over 150 steps and some of the distances are three-digit numbers. So while this puzzle is perfect for a small programming challenge, it'd be too tedious to work through manually.

Unless...

What if we could solve the programming puzzle without any programming?

What if we could solve it with something familiar to non-developers?

What if we could solve it with a basic spreadsheet???

Hmmm...

Let's try!

We'll be using Google Sheets because it's free, easy to use, and convenient. (But any decent spreadsheet will do.) And we'll be working with the sample data (not the actual puzzle data) so as not to give too much away. Keep reading to see all the steps - or follow along in a spreadsheet if you want to get your hands dirty.

Good, let's start with the sample data:

R5, L5, R5, R3 leaves you 12 blocks away.

To understand why and familiarize yourself with the puzzle, read about taxicab geometry and work through the steps on paper first. It's important to realize that R and L steps do not necessarily alternate and that the direction at the end of each step depends on the direction at the beginning of the step. However, given the state after N steps, the state after step N+1 is completely defined by that direction and distance.

Spreadsheets typically go "down", so let's set ours up like that by adding the steps to a Data column (leaving row 2 blank for reasons that will be clear later):

	A
1	Data
2
3	R5
4	L5
5	R5
6	R3

Each step contains two pieces of information: the Direction to turn and the Distance to move. Let's start by pulling them apart so we can deal with them separately. The LEFT and RIGHT functions are our friends here (with a little help from LEN). The Direction is simply the first character of the data, or =LEFT(A3, 1) (assuming we're looking at row 3, the first value). And the Distance is just the rest of the string, or =RIGHT(A3, LEN(A3)-1).

Type those formulas into cells B3 and C3 and replicate the formulas down to row 6. (Your spreadsheet program should automatically update the formulas for rows 4-6 to be relative to their respective cells.) Done correctly, that gives the following:

	A	B	C
1	Data	Direction	Distance
2
3	R5	R	5
4	L5	L	5
5	R5	R	5
6	R3	R	3

We'd like to turn the Direction letter into a number so we can do math on it - the IF function works nicely for that. By using -1 to represent a left turn and 1 to represent a right turn, use the formula =IF(B3="L", -1, 1) to create a Rotation field for row 3 (and then replicate the formula down):

	A	B	C	D
1	Data	Direction	Distance	Rotation
2
3	R5	R	5	1
4	L5	L	5	-1
5	R5	R	5	1
6	R3	R	3	1

Now let's use Rotation to create a Heading that tracks where the player is pointing after each move. The four possible values are North, East, South, and West - it's convenient to use the numbers 0, 1, 2, and 3 respectively. Because we're adding and subtracting numbers to determine the Heading, it's possible to get values outside that range (ex: 3 + 1 = 4). Just like with a 12-hour clock (11am + 2 hours = 1pm), we'll use modular arithmetic to stay within the desired range. The starting direction is North, so we'll seed an initial Heading by putting the value 0 in cell E2. (See, I told you row 2 would be useful!) After that, the MOD function can be used to figure out the new Heading by adding the Rotation to the current value: =MOD(E2 + D3, 4). Replicating this formula down gives the following:

	A	B	C	D	E
1	Data	Direction	Distance	Rotation	Heading
2					0
3	R5	R	5	1	1
4	L5	L	5	-1	0
5	R5	R	5	1	1
6	R3	R	3	1	2
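
For anyone curious how the Heading column looks outside a spreadsheet, here is a small JavaScript sketch (the mod helper name is my own). One thing to watch for: JavaScript's % operator keeps the sign of the left operand, so it needs a small adjustment to behave like the spreadsheet MOD function when a left turn from North produces -1.

```javascript
// Spreadsheet-style MOD: the result is never negative for a positive divisor
function mod(value, divisor) {
  return ((value % divisor) + divisor) % divisor;
}

const rotations = [1, -1, 1, 1]; // Rotation column for R5, L5, R5, R3
let heading = 0;                 // Seed value from cell E2 (North)
const headings = rotations.map(rotation => {
  heading = mod(heading + rotation, 4);
  return heading;
});

console.log(headings);   // Same values as the Heading column: 1, 0, 1, 2
console.log(-1 % 4);     // -1 (plain JavaScript remainder)
console.log(mod(-1, 4)); // 3 (West, which is what the spreadsheet's MOD gives)
```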

Knowing a Heading and a Distance means we can keep track of the X and Y coordinates as the player moves around the city. If the player is facing East and moves forward, the X coordinate will increase by that Distance; if facing West, it will decrease. Similarly, a move facing North will increase the Y coordinate, while a move South will decrease it. The CHOOSE function allows us to represent this quite easily. The only catch is that CHOOSE indices are 1-based and our Heading values are 0-based, so we need to adjust for that by adding 1. After seeding 0 values for the starting X and Y coordinates in row 2, we can update the X coordinate with =F2 + CHOOSE(E3+1, 0, C3, 0, -C3) and the Y coordinate with =G2 + CHOOSE(E3+1, C3, 0, -C3, 0). Replicating those formulas down leads to:

	A	B	C	D	E	F	G
1	Data	Direction	Distance	Rotation	Heading	X	Y
2					0	0	0
3	R5	R	5	1	1	5	0
4	L5	L	5	-1	0	5	5
5	R5	R	5	1	1	10	5
6	R3	R	3	1	2	10	2

Now all that's left to solve the puzzle is to compute the taxicab distance of each X/Y coordinate from the starting point. Well, that's just the magnitude of the X coordinate added to the magnitude of the Y coordinate and that can be expressed using the ABS function: =ABS(F3) + ABS(G3). Replicating that down gives the following, final, table and the answer to the sample puzzle:

	A	B	C	D	E	F	G	H
1	Data	Direction	Distance	Rotation	Heading	X	Y	Taxicab Distance from Start
2					0	0	0
3	R5	R	5	1	1	5	0	5
4	L5	L	5	-1	0	5	5	10
5	R5	R	5	1	1	10	5	15
6	R3	R	3	1	2	10	2	12

Sure enough, the taxicab distance after the final move (cell H6) is 12 - just like the sample said it should be! And we figured that out without writing any "code" at all - we simply came up with a few small formulas to methodically break the problem down into little pieces we could solve with a spreadsheet. That may not quite be coding (in the usual sense), but it sure is programming!
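
For readers who do write code, the spreadsheet columns translate almost one-for-one into a short JavaScript function. This is just a sketch that mirrors the formulas above (the function name is made up), with each line annotated by the column it corresponds to.

```javascript
// Follow the steps and return the taxicab distance from the starting point
function taxicabDistance(input) {
  let heading = 0; // 0 = North, 1 = East, 2 = South, 3 = West
  let x = 0;
  let y = 0;
  for (const step of input.split(",").map(item => item.trim())) {
    const direction = step[0];                    // Direction: =LEFT(A3, 1)
    const distance = Number(step.slice(1));       // Distance: =RIGHT(A3, LEN(A3)-1)
    const rotation = direction === "L" ? -1 : 1;  // Rotation: =IF(B3="L", -1, 1)
    heading = (heading + rotation + 4) % 4;       // Heading: =MOD(E2 + D3, 4)
    x += [0, distance, 0, -distance][heading];    // X: CHOOSE(E3+1, 0, C3, 0, -C3)
    y += [distance, 0, -distance, 0][heading];    // Y: CHOOSE(E3+1, C3, 0, -C3, 0)
  }
  return Math.abs(x) + Math.abs(y);               // =ABS(F3) + ABS(G3)
}

console.log(taxicabDistance("R5, L5, R5, R3")); // 12
```

Adding 4 before taking the remainder keeps the heading non-negative, which is the same job MOD does in the spreadsheet.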

PS - Having definitively established that you are 1337 with a spreadsheet, you might be looking for other challenges... Good news: Part 2 of the Advent of Code taxicab problem has been left as an exercise for the reader! :)

Tags: Miscellaneous Technical

Just another rando with a polyfill [math-random-polyfill.js is a browser-based polyfill for JavaScript's Math.random() that tries to make it more random]

Thursday, December 8, 2016

JavaScript's Math.random() function has a well-deserved reputation for not generating truly random numbers. (Gasp!) Modern browsers offer a solution with the crypto.getRandomValues() function that new code should be using instead. However, most legacy scripts haven't been - and won't be - updated for the new hotness.

I wanted to improve the behavior of legacy code and looked around for a polyfill of Math.random() that leveraged crypto.getRandomValues() to generate output, but didn't find one. It seemed straightforward to implement, so I created math-random-polyfill.js with the tagline: A browser-based polyfill for JavaScript's Math.random() that tries to make it more random.

You can learn more about math-random-polyfill.js on its GitHub project page which includes the following:

The MDN documentation for Math.random() explicitly warns that return values should not be used for cryptographic purposes. Failing to heed that advice can lead to problems, such as those documented in the article TIFU by using Math.random(). However, there are scenarios - especially involving legacy code - that don't lend themselves to easily replacing Math.random() with crypto.getRandomValues(). For those scenarios, math-random-polyfill.js attempts to provide a more random implementation of Math.random() to mitigate some of its disadvantages.

...

math-random-polyfill.js works by intercepting calls to Math.random() and returning the same 0 <= value < 1 based on random data provided by crypto.getRandomValues(). Values returned by Math.random() should be completely unpredictable and evenly distributed - both of which are true of the random bits returned by crypto.getRandomValues(). The polyfill maps those values into floating point numbers by using the random bits to create integers distributed evenly across the range 0 <= value < Number.MAX_SAFE_INTEGER then dividing by Number.MAX_SAFE_INTEGER + 1. This maintains the greatest amount of randomness and precision during the transfer from the integer domain to the floating point domain.

I've included a set of unit tests meant to detect the kinds of mistakes that would compromise the usefulness of math-random-polyfill.js. The test suite passes on the five (most popular) browsers I tried, which leads me to be cautiously optimistic about the validity and viability of this approach. :)
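
To make the integer-to-float mapping concrete, here is a minimal sketch of the idea. It is not the code from math-random-polyfill.js (see the project page for that); the secureRandom name and the way the 53 bits are assembled are assumptions for illustration. It uses the global crypto object where available and falls back to Node's webcrypto otherwise.

```javascript
// Use the browser-style global if present, otherwise Node's implementation
const webCrypto = globalThis.crypto ?? require("crypto").webcrypto;

// Return a value in the range 0 <= value < 1 built from 53 random bits
function secureRandom() {
  const words = new Uint32Array(2);
  webCrypto.getRandomValues(words);
  const high = words[0] >>> 11;              // Keep 21 of the first 32 bits
  const low = words[1];                      // Keep all of the second 32 bits
  const integer = high * 0x100000000 + low;  // 0 to Number.MAX_SAFE_INTEGER inclusive
  return integer / (Number.MAX_SAFE_INTEGER + 1);
}
```

Because 53 bits is exactly what a double can represent without loss, every integer maps to a distinct floating point value and the division never rounds up to 1. A polyfill in this style would then assign a function like this to Math.random.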

Tags: Technical Web

Sky-Hole Revisited [Pi-Hole in a cloud VM for easy DNS-based ad-blocking]

Monday, November 21, 2016

I wrote about my adventures running a Pi-Hole in the cloud for DNS-based ad-blocking roughly a year ago. In the time since, I've happily used a Sky-Hole for all the devices and traffic at home. When updating my Sky-Hole virtual machine recently, I used a simpler approach than before and wanted to briefly document the new workflow.

For more context on why someone might want to use a DNS-based ad-blocker, please refer to the original post.

Installation

Create an Ubuntu Server virtual machine with your cloud provider of choice (such as Azure or AWS)

Note: Thanks to improvements by the Pi-Hole team, it's now able to run in the smallest virtual machine size

Connect via SSH and update the package database:

sudo apt-get update

Install Pi-Hole:

curl -L https://install.pi-hole.net | bash

Note: Running scripts directly from the internet is risky, so consider using the alternate install instead

Open the dnsmasq configuration file:

sudo nano /etc/dnsmasq.d/01-pihole.conf

Turn off logging by commenting-out the corresponding line:

#log-queries

Open the Pi-Hole configuration file:

sudo nano /etc/pihole/setupVars.conf

Update it to use an invalid address for blocked domains:

IPv4_address=0.0.0.0

Re-generate the block list:

sudo /opt/pihole/gravity.sh

Verify the block list looks reasonable:

cat /etc/pihole/gravity.list

Verify logging is off:

cat /var/log/pihole.log

Reboot to ensure everything loads successfully:

sudo reboot

Grant access to the virtual machine's public IP address by opening the relevant network ports (incoming UDP and TCP on port 53)

Don't forget

If you use a Pi-Hole regularly, please consider donating to the Pi-Hole project so the maintainers can continue developing and improving it.

Tags: Miscellaneous Technical Web

Binary Log OBjects, gotta download 'em all! [A simple tool to download blobs from an Azure container]

Wednesday, August 10, 2016

The latest in a series of "I didn't want to write a thing, but couldn't find another thing that already did exactly what I wanted, which is probably because I'm too picky, but whatever" projects, azure-blob-container-download (a.k.a. abcd) is a simple, command-line tool to download all the blobs in an Azure storage container. Here's how it's described in the README:

A simple, cross-platform tool to bulk-download blobs from an Azure storage container.

Though limited in scope, it does a specific set of things vs. the official tools:

AzCopy is not cross-platform

Azure CLI does not bulk-download

Azure PowerShell is not cross-platform

Azure Portal does not bulk-download

The motivation for this project was the same as with my previous post about getting an HTTPS certificate: I've migrated my website from a virtual machine to an Azure Web App. And while it's easy to enable logging for a Web App and get hourly log files in the W3C Extended Log File Format, it wasn't obvious to me how to parse those logs offline to measure traffic, referrers, etc. (Although that's not something I've bothered with up to now, it's an ability I'd like to have.) What I wanted was a trustworthy, cross-platform tool to download all those log files to a local machine - but the options I investigated each seemed to be missing something.

So I wrote a simple Node.js CLI and gave it a few extra features to make my life easier. The code is fairly compact and straightforward (and the dependencies minimal), so it's easy to audit. The complete options for downloading and filtering are:

Usage: abcd [options]

Options:
  --account           Storage account (or set AZURE_STORAGE_ACCOUNT)  [string]
  --key               Storage access key (or set AZURE_STORAGE_ACCESS_KEY)  [string]
  --containerPattern  Regular expression filter for container names  [string]
  --blobPattern       Regular expression filter for blob names  [string]
  --startDate         Starting date for blobs  [string]
  --endDate           Ending date for blobs  [string]
  --snapshots         True to include blob snapshots  [boolean]
  --version           Show version number  [boolean]
  --help              Show help  [boolean]

Download blobs from an Azure container.
https://github.com/DavidAnson/azure-blob-container-download

Azure Web Apps create a new log file every hour, so they add up quickly; abcd's date filtering options make it easy to perform incremental downloads. The default directory structure (based on / separators) is collapsed during download, so all files end up in the same directory (named by container) and ordered by date. The tool limits itself to one download at a time, so things proceed at a steady, moderate pace. Once blobs have finished downloading, you're free to do with them as you please. :)
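
The filtering idea behind --blobPattern, --startDate, and --endDate can be sketched as a small function. This is not abcd's actual code; the selectBlobs name, the { name, lastModified } blob shape, and the inclusive date bounds are all assumptions made for the example.

```javascript
// Hypothetical blob shape: { name: string, lastModified: Date }
// Keep blobs matching the optional name pattern and date range, oldest first
function selectBlobs(blobs, { blobPattern, startDate, endDate } = {}) {
  const pattern = blobPattern ? new RegExp(blobPattern) : null;
  return blobs
    .filter(blob => !pattern || pattern.test(blob.name))
    .filter(blob => !startDate || blob.lastModified >= startDate)
    .filter(blob => !endDate || blob.lastModified <= endDate)
    .sort((a, b) => a.lastModified - b.lastModified);
}
```

Passing the date of the most recent file from the previous run as startDate is the kind of incremental download the date options are meant to support.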

Find out more on the GitHub project page for azure-blob-container-download.

Tags: Node.js Technical Utilities

Free as in ... HTTPS certificates? [Obtaining and configuring a free HTTPS certificate for an Azure Web App with a custom domain]

Wednesday, May 18, 2016

Providing secure access to all Internet content - not just that for banking and buying - is quickly becoming the norm. Although setting up a web site has been fairly easy for years, enabling HTTPS for that site was more challenging. The Let's Encrypt project is trying to improve things for everyone - by making certificates free and easier to use, they enable more sites to offer secure access.

Let's Encrypt is notable for (at least) two achievements. The first is lowering the cost for anyone to obtain a certificate - you can't beat free! The second is simplifying the steps to enable HTTPS on a server. Thus far, Let's Encrypt has focused their efforts on Linux systems, so the process for Windows servers hasn't changed much. Further complicating things, many sites nowadays are hosted by services like Azure or CloudFlare, which makes validating ownership more difficult.

As someone who is in the process of migrating content from a virtual machine with a custom domain to an Azure Web App, I've been looking for an easy way to make use of Let's Encrypt certificates. A bit of searching turned up some helpful resources:

How to get a free SSL Cert for your Azure Web App with Let's Encrypt - In which a virtual machine and custom routing are used
How to Validate a Let's Encrypt Certificate on a Site Already Active on CloudFlare - In which command-line steps are outlined
Ssl certificate for your Azure website using Letsencrypt - In which a reverse proxy and Vagrant are used
Azure Web App Site Extension for easy installation and configuration of Let's Encrypt issued SSL certificates for custom domain names - In which a site extension is written to handle things automatically (though without support)
Add support for free SSL certs like those from Let's Encrypt - In which the Web Apps team endorses said extension

Nothing was exactly what I wanted, so I came up with the following approach based on tweaks to the first two articles above. The Let's Encrypt tool runs on Linux, so I use that platform exclusively. Everything can be done in a terminal window, so it's easily scripted. There is no need to open a firewall or use another machine; everything can be done in one place. And by taking advantage of the nifty ability to boot from a Live CD, the technique is easy to apply even if you don't have a Linux box handy.

Boot an Ubuntu 16.04 Live CD
- Or a future version of Windows with the Ubuntu subsystem
Run "Software & Updates" and enable the "universe" repository
sudo apt install letsencrypt
sudo apt install git
git config --global user.email "user@example.com"
git config --global user.name "User Name"

git clone https://example.scm.azurewebsites.net:443/Example.git

Be sure /.well-known/acme-challenge/web.config exists and is configured to allow extension-less files:

<configuration>
  <system.webServer>
    <staticContent>
      <mimeMap fileExtension="" mimeType="text/plain"/>
    </staticContent>
  </system.webServer>
</configuration>

sudo letsencrypt certonly --manual --domain example.com --domain www.example.com --email user@example.com --agree-tos --text
- Note: Include the --test-cert option when trying things out
Repeat for each domain:
1. nano verification-file and paste the provided content
2. git add verification-file
3. git commit -m "Add verification file."
4. git push
5. Allow Let's Encrypt to verify ownership by fetching the verification file
sudo openssl pkcs12 -export -inkey /etc/letsencrypt/live/example.com/privkey.pem -in /etc/letsencrypt/live/example.com/fullchain.pem -out fullchain.pfx -passout pass:your-password
Follow the steps to Configure a custom domain name in Azure App Service using fullchain.pfx
Enjoy browsing your site securely!
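
For anyone who would rather script the "Repeat for each domain" part than use nano, the file-creation step could be sketched in Node.js along these lines. This is a rough sketch under assumptions: writeChallengeFile and challengeUrl are hypothetical helpers, the token and content values stand in for whatever the letsencrypt tool prints, and it assumes the root of the cloned repository is the root of the site.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: write a verification file into a local clone of the site.
// Assumption: repoRoot maps to the web root, so the file is served from
// /.well-known/acme-challenge/<token> after "git push".
function writeChallengeFile(repoRoot, token, content) {
  const dir = path.join(repoRoot, ".well-known", "acme-challenge");
  fs.mkdirSync(dir, { recursive: true }); // no-op if the directory exists
  const file = path.join(dir, token);
  fs.writeFileSync(file, content);
  return file;
}

// Hypothetical helper: the URL expected to be fetched during verification
function challengeUrl(domain, token) {
  return `http://${domain}/.well-known/acme-challenge/${token}`;
}
```

The git add, git commit, and git push steps still need to run afterward, and opening the URL from challengeUrl in a browser is a quick way to confirm the file is being served before letting verification proceed.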

Tags: Technical Web