The blog of dlaa.me

Posts from August 2016

Binary Log OBjects, gotta download 'em all! [A simple tool to download blobs from an Azure container]

The latest in a series of "I didn't want to write a thing, but couldn't find another thing that already did exactly what I wanted, which is probably because I'm too picky, but whatever" projects, azure-blob-container-download (a.k.a. abcd) is a simple, command-line tool to download all the blobs in an Azure storage container. Here's how it's described in the README:

A simple, cross-platform tool to bulk-download blobs from an Azure storage container.

Though limited in scope, it does a specific set of things vs. the official tools:

The motivation for this project was the same as with my previous post about getting an HTTPS certificate: I've migrated my website from a virtual machine to an Azure Web App. And while it's easy to enable logging for a Web App and get hourly log files in the W3C Extended Log File Format, it wasn't obvious to me how to parse those logs offline to measure traffic, referrers, etc.. (Although that's not something I've bothered with up to now, it's an ability I'd like to have.) What I wanted was a trustworthy, cross-platform tool to download all those log files to a local machine - but the options I investigated each seemed to be missing something.

So I wrote a simple Node.JS CLI and gave it a few extra features to make my life easier. The code is fairly compact and straightforward (and the dependencies minimal), so it's easy to audit. The complete options for downloading and filtering are:

Usage: abcd [options]

Options:
  --account           Storage account (or set AZURE_STORAGE_ACCOUNT)  [string]
  --key               Storage access key (or set AZURE_STORAGE_ACCESS_KEY)  [string]
  --containerPattern  Regular expression filter for container names  [string]
  --blobPattern       Regular expression filter for blob names  [string]
  --startDate         Starting date for blobs  [string]
  --endDate           Ending date for blobs  [string]
  --snapshots         True to include blob snapshots  [boolean]
  --version           Show version number  [boolean]
  --help              Show help  [boolean]

Download blobs from an Azure container.
https://github.com/DavidAnson/azure-blob-container-download

Azure Web Apps create a new log file every hour, so they add up quickly; abcd's date filtering options make it easy to perform incremental downloads. The default directory structure (based on / separators) is collapsed during download, so all files end up in the same directory (named by container) and ordered by date. The tool limits itself to one download at a time, so things proceed at a steady, moderate pace. Once blobs have finished downloading, you're free to do with them as you please. :)

Find out more on the GitHub project page for azure-blob-container-download.