The blog of dlaa.me
Tag: "Node.js"
  • Binary Log OBjects, gotta download 'em all! [A simple tool to download blobs from an Azure container]
    Wednesday, August 10th 2016

    The latest in a series of "I didn't want to write a thing, but couldn't find another thing that already did exactly what I wanted, which is probably because I'm too picky, but whatever" projects, azure-blob-container-download (a.k.a. abcd) is a simple command-line tool to download all the blobs in an Azure storage container. Here's how it's described in the README:

    A simple, cross-platform tool to bulk-download blobs from an Azure storage container.

    Though limited in scope, it does a specific set of things that the official tools don't.

    The motivation for this project was the same as with my previous post about getting an HTTPS certificate: I've migrated my website from a virtual machine to an Azure Web App. And while it's easy to enable logging for a Web App and get hourly log files in the W3C Extended Log File Format, it wasn't obvious to me how to parse those logs offline to measure traffic, referrers, etc. (Although that's not something I've bothered with up to now, it's an ability I'd like to have.) What I wanted was a trustworthy, cross-platform tool to download all those log files to a local machine - but the options I investigated each seemed to be missing something.

    So I wrote a simple Node.js CLI and gave it a few extra features to make my life easier. The code is fairly compact and straightforward (and the dependencies minimal), so it's easy to audit. The complete options for downloading and filtering are:

    Usage: abcd [options]
    
    Options:
      --account           Storage account (or set AZURE_STORAGE_ACCOUNT)  [string]
      --key               Storage access key (or set AZURE_STORAGE_ACCESS_KEY)  [string]
      --containerPattern  Regular expression filter for container names  [string]
      --blobPattern       Regular expression filter for blob names  [string]
      --startDate         Starting date for blobs  [string]
      --endDate           Ending date for blobs  [string]
      --snapshots         True to include blob snapshots  [boolean]
      --version           Show version number  [boolean]
      --help              Show help  [boolean]
    
    Download blobs from an Azure container.
    https://github.com/DavidAnson/azure-blob-container-download
    

    Azure Web Apps create a new log file every hour, so they add up quickly; abcd's date filtering options make it easy to perform incremental downloads. The default directory structure (based on / separators) is collapsed during download, so all files end up in the same directory (named by container) and ordered by date. The tool limits itself to one download at a time, so things proceed at a steady, moderate pace. Once blobs have finished downloading, you're free to do with them as you please. :)
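
    For example, an incremental download of one container's logs might look something like this (the account, key, and container name are placeholders, and the ISO-style date format is an assumption about what the tool accepts):

    abcd --account myaccount --key "STORAGEKEY==" --containerPattern "^wwwlogs$" --startDate 2016-08-01 --endDate 2016-08-09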

    Find out more on the GitHub project page for azure-blob-container-download.

    Tags: Node.js Technical Utilities
  • Respect my securitah! [The check-pages suite now prefers HTTPS and includes a CLI]
    Wednesday, May 11th 2016

    There are many best practices to keep in mind when maintaining a web site, so it's helpful to have tools that check for common mistakes. I've previously written about two Node.js packages I created for this purpose, check-pages and grunt-check-pages, both of which can be easily integrated into an automated workflow. I updated them recently and wish to highlight two aspects.

    HTTPS

    There's a movement underway to make the Internet safer, and one of the best ways is to use the secure HTTPS protocol when browsing the web. Not all sites support HTTPS, but many do, and it's good to link to the secure version of a page when available. The trick is knowing when that's possible - especially for links created long ago or before a site was updated to support HTTPS. That's where the new --preferSecure option comes in: it raises an error whenever a page links to potentially secure content insecurely. Scanning a site with the --checkLinks/--preferSecure option enabled is now an easy way to identify links that could be updated to provide a safer browsing experience.

    Aside: The moarTLS Chrome extension does a similar thing in the browser; check it out!

    CLI

    check-pages is easy to integrate into an automated workflow, but sometimes it's nice to run one-off tests or experiment interactively with a site's configuration. To that end, I created a simple command-line wrapper that exposes all the check-pages functionality (including --preferSecure) in a way that's easy to use on the platform/shell of your choice. Simply install it via npm, point it at the page(s) of interest, and review the list of possible issues. Here's the output of the --help command:

    Usage: check-pages <page URLs> [options]
    
    Checks:
      --checkLinks        Validates each link on a page  [boolean]
      --checkCaching      Validates Cache-Control/ETag  [boolean]
      --checkCompression  Validates Content-Encoding  [boolean]
      --checkXhtml        Validates page structure  [boolean]
    
    checkLinks options:
      --linksToIgnore     List of URLs to ignore  [array]
      --noEmptyFragments  Fails for empty fragments  [boolean]
      --noLocalLinks      Fails for local links  [boolean]
      --noRedirects       Fails for HTTP redirects  [boolean]
      --onlySameDomain    Ignores links to other domains  [boolean]
      --preferSecure      Verifies HTTPS when available  [boolean]
      --queryHashes       Verifies query string file hashes  [boolean]
    
    Options:
      --summary          Summarizes issues after running  [boolean]
      --terse            Results on one line, no progress  [boolean]
      --maxResponseTime  Response timeout (milliseconds)  [number]
      --userAgent        Custom User-Agent header  [string]
      --version          Show version number  [boolean]
      --help             Show help  [boolean]
    
    Checks various aspects of a web page for correctness.
    https://github.com/DavidAnson/check-pages-cli
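
    For example, here's a one-off scan of a site for insecure links using the options above (the URL is a placeholder):

    check-pages https://example.com/ --checkLinks --preferSecure --summary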
    
    Tags: Node.js Technical Web
  • Catch common Markdown mistakes as you make them [markdownlint is a Visual Studio Code extension to lint Markdown files]
    Tuesday, December 8th 2015

    The lightweight, cross-platform Visual Studio Code editor recently gained support for extensions, third-party packages that add or enhance capabilities of the tool. Of particular interest to me are linters, syntax checkers that help avoid mistakes and maintain consistency when working with a language (either code or markup). I've previously written about markdownlint, a Node.js linter for the Markdown markup language. After looking at the VS Code API, creating a markdownlint extension for Code seemed straightforward. I did so and published markdownlint to the extension gallery, where it can be installed via the command ext install markdownlint. What's nice about editor integration for a linter is that feedback is immediate and interactive: mistakes are highlighted as they're made and it's easy to click a link for information about any rule violation.

    If linting Markdown is something that interests you, please try the markdownlint extension for VS Code and share your feedback!

    Tags: Miscellaneous Node.js Technical
  • Not romantically binding [promise-ring wraps Node.js callbacks with native ES6 Promises]
    Monday, July 20th 2015

    JavaScript Promises are a powerful way of working with asynchronous code. They make sequencing operations easy and offer a clear, predictable way to handle errors that might occur along the way. Much has been written about the benefits of Promises and I won't try to repeat it here.

    What I do hope to do is make Promises a slightly more natural part of the Node.js development experience. In version 0.12.* (as well as in io.js), ES6 Promises are natively available. But the standard modules (such as File System) still use their original callback-based design, and there's a bit of a disconnect between how you might want to write something and how you're able to. Fortunately, most of the Promise libraries that are already available include wrappers to convert callback-based functions into ones that return a Promise. However, most of those libraries assume you'll be using their custom implementation of Promise (from the "olden days" when that was the only option). And while different Promises/A+ implementations are meant to be interoperable, it seems silly to pull in a second Promise implementation when a perfectly good one is already available.

    That's where promise-ring comes in: it's a tiny npm package that provides functions to convert typical callback-based APIs into their Promise-based counterparts using the V8 JavaScript engine's native Promise implementation. Briefly:

    promise-ring is a small, simple library with no dependencies that eases the use of native JavaScript Promises in projects without a Promise library.

    Documentation is available in the README along with runnable samples demonstrating the use of each API. It's all quite simple and exactly what you'd expect. A bonus feature is the wrapAll function which makes it easier to work with modules that expose many different callback-based functions (such as the File System module; see below).

    For an example of using promise-ring and Promises to simplify code, here is a typical callback-based snippet to copy a file onto itself:

    var fs = require("fs");
    var file = "example.txt"; // sample values; the original snippet assumes these are already defined
    var encoding = "utf8";
    
    // Copy a file onto itself using callbacks
    fs.stat(file, function(err) {
      if (err) {
        console.error(err);
      } else {
        fs.readFile(file, encoding, function(errr, content) {
          if (errr) {
            console.error(errr);
          } else {
            fs.writeFile(file, content, encoding, function(errrr) {
              if (errrr) {
                console.error(errrr);
              } else {
                console.log("Copied " + file);
              }
            });
          }
        });
      }
    });
    

    And here's the same code converted to use Promises via promise-ring:

    var pr = require("promise-ring");
    var fsp = pr.wrapAll(require("fs"));
    var file = "example.txt"; // sample values, as above
    var encoding = "utf8";
    
    // Copy a file onto itself using Promises
    fsp.stat(file)
      .then(function() {
        return fsp.readFile(file, encoding);
      })
      .then(function(content) {
        return fsp.writeFile(file, content, encoding);
      })
      .then(function() {
        console.log("Copied " + file);
      })
      .catch(console.error);
    

    The second implementation is more concise, easier to follow, and DRY-er. That's the power of Promises! :)

    Find out more by visiting promise-ring on GitHub or promise-ring in the npm gallery.

    Tags: Node.js Technical
  • Lint-free documentation [markdownlint is a Node.js style checker and lint tool for Markdown files]
    Tuesday, May 12th 2015

    I'm a strong believer in using static analysis tools to identify problems and catch mistakes. The Node.js/io.js community has some great options for linting JavaScript code (ex: JSHint and ESLint), and I use them regularly. But code isn't the only important asset - documentation can be just as important to a project's success.

    The open-source community has pretty much standardized on Markdown for documentation, which is a great choice because it's easy to read, write, and understand. That said, Markdown has a syntax, so there are "right" and "wrong" ways to do things - and not all parsers handle nuances the same way (though the CommonMark effort is trying to standardize). In particular, there are constructs that can lead to missing/broken text in some parsers but which are not obviously wrong in the original Markdown.

    To show what I mean, I created a Gist of common Markdown mistakes. If you're not a Markdown expert, you might learn something by comparing the source and output. :)
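
    One classic example (some parsers handle it as intended; others don't): starting a list immediately after a paragraph, with no blank line in between:

    Some introductory text
    * A "list item" that some parsers render as part of the paragraph above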

    Aside: The Markdown parser used by GitHub is quite good - but many issues are user error and it can't (yet) read your mind.

     

    You shouldn't need to be a Markdown expert to avoid silly mistakes - that's what we have computers for. When I looked around for a Node-based linter, I didn't see anything - but I did find a very nice implementation for Ruby by Mark Harrison. I don't tend to have Ruby available in my development environment, but I had an itch to scratch, so I installed it and added a couple of rules to Mark's tool for the checks I wanted. Mark kindly accepted the corresponding pull requests, and all was well.

    Except that once I'd tasted of the fruit of Markdown linting, I wanted to integrate it into other workflows - many of which are exclusively Node-based. I briefly entertained the idea of creating a Node package to install Ruby and then use it to install and run a Ruby gem - but that made my head hurt...

     

    So I prototyped a Node version of markdownlint by porting a few rules over and then ran the idea by Mark. He was supportive (and raised some great points!), so I gradually ported the rest of the rules to JavaScript with the same numbering/naming system to make it easy for people to migrate between the two tools. Mark already had a fantastic test infrastructure and great documentation for rules, so I shamelessly reused both in the Node version. Configuration for JavaScript tools is typically JSON, so the Node version uses a slightly different format than Ruby (though both are simple/obvious). I started with a fully asynchronous API for efficiency, but ended up adding a synchronous version for scenarios where that's more convenient. I strived to achieve functional parity with the Ruby implementation (and continue to do so as Mark makes updates!), but duplicating the CLI was a non-goal (please have a look at the mdl gem if that's what you need).

    If this sounds interesting, please have a look at markdownlint on GitHub. As of this writing, it supports the same set of ~40 rules that the Ruby implementation does - you can read all about them in Mark's fantastic Rules.md. markdownlint exposes a single API which can be called in an asynchronous or synchronous manner and accepts an options object to identify the files/strings to lint and the set of rules to apply. It returns a simple object that lists the items that were checked along with the line numbers for any violations. The documentation shows all of this and includes examples of calling markdownlint from both gulp and Grunt.
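
    Here's a rough sketch of calling it (file names and rule choices below are just examples):

    var markdownlint = require("markdownlint");
    
    var options = {
      "files": [ "README.md" ],
      "config": {
        "default": true, // enable all rules by default...
        "MD013": false   // ...but disable MD013 (line length), for example
      }
    };
    
    // Asynchronous call; markdownlint.sync(options) is the synchronous equivalent
    markdownlint(options, function(err, result) {
      if (!err) {
        console.log(result.toString()); // formatted list of any violations
      }
    });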

     

    To make sure markdownlint works well, I've integrated it into some of my own projects, including this blog, which I wrote specifically to allow authoring in Markdown. That's a nice start, but it doesn't prove markdownlint can handle larger projects with significant documentation written by different people at different times. For that you'd need to integrate with a project like ESLint, which has extensive documentation that's entirely Markdown-based.

    So I did. :) Supporting ESLint was one of the motivating factors behind porting markdownlint to Node in the first place: I love the tool and use it in all my projects. The documentation is excellent, but every now and then I'd come across weird or broken text. After submitting a couple of pull requests with fixes, I decided adding a Markdown linter to their test script would be a better way to keep typos out of the documentation. It turns out this was on the team's radar as well, and they - especially project owner Nicholas - were very helpful and accommodating as I introduced markdownlint and tweaked things to satisfy some of the rules.

     

    At this point, maybe I've convinced you markdownlint works for my own purposes and that it works for some other purposes, but it's likely you have special requirements or would like to "try before you buy". (Which seems an ironic thing to say about free software, but there's a cost to everything, so maybe it's not that unreasonable after all.) Well, I have just the thing for you:

    An interactive markdownlint demo that runs in the browser!

    Although browser support was not (is not!) a goal, the relevant code is all JavaScript with just one dependency (that itself offers browser support) and only two methods that need polyfills (trimLeft/trimRight). So it was actually fairly straightforward (with some help from Browserify) to create a standalone, offline-enabled web page that lets anyone use a (modern) browser to experiment with markdownlint and validate arbitrary content. To make it super easy to get started, I made some deliberate mistakes in the sample content for the demo - feel free to fix them for me. :)

     

    In summary:

    • Markdown is great
    • It's easy to read and write
    • Sometimes it doesn't do what you think
    • There are tools to help
    • markdownlint is one of them
    • Get it for Ruby or Node
    • Or try it in the browser
    Tags: Node.js Technical Web
  • Extensibility is a wonderful thing [A set of Visual Studio Code tasks for common npm functionality in Node.js and io.js]
    Thursday, April 30th 2015

    Yesterday at its Build conference, Microsoft released the Visual Studio Code editor, which is a lightweight, cross-platform tool for building web and cloud applications. I've been using internal releases for a while and highly recommend trying it out!

    One thing I didn't know about until yesterday was support for Tasks to automate common steps like building and testing. As the documentation shows, there's already knowledge of common build frameworks, including gulp for Node.js and io.js. But for simple Node projects I like to automate via npm's scripts because they're simple and make it easy to integrate with CI systems like Travis. So I whipped up a simple tasks.json for Code that handles build, test, and lint for typical npm configurations. I've included it below for anyone who's interested.

    Note: Thanks to metadata, the build and test tasks are recognized as such by Code and easily run with the default hotkeys Ctrl+Shift+B and Ctrl+Shift+T.

    Enjoy!

     

    {
      "version": "0.1.0",
      "command": "npm",
      "isShellCommand": true,
      "suppressTaskName": true,
      "tasks": [
        {
          // Build task, Ctrl+Shift+B
          // "npm install --loglevel info"
          "taskName": "install",
          "isBuildCommand": true,
          "args": ["install", "--loglevel", "info"]
        },
        {
          // Test task, Ctrl+Shift+T
          // "npm test"
          "taskName": "test",
          "isTestCommand": true,
          "args": ["test"]
        },
        {
          // "npm run lint"
          "taskName": "lint",
          "args": ["run", "lint"]
        }
      ]
    }
    

    Updated 2015-05-02: Added --loglevel info to npm install for better progress reporting

    Updated 2016-02-27: Added isShellCommand, suppressTaskName, and updated args to work with newer versions of VS Code
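
    Note that the test and lint tasks assume package.json defines matching scripts - a minimal sketch (the mocha/eslint commands are placeholders for whatever the project actually uses):

    {
      "scripts": {
        "test": "mocha",
        "lint": "eslint ."
      }
    }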

    Tags: Node.js Miscellaneous
  • Supporting both sides of the Grunt vs. Gulp debate [check-pages is a Gulp-friendly task to check various aspects of a web page for correctness]
    Tuesday, February 10th 2015

    A few months ago, I wrote about grunt-check-pages, a Grunt task to check various aspects of a web page for correctness. I use grunt-check-pages when developing my blog and have found it very handy for preventing mistakes and maintaining consistency.

    Two things have changed since then:

    1. I released multiple enhancements to grunt-check-pages that make it more powerful
    2. I extracted its core functionality into the check-pages package which works well with Gulp

     

    First, an overview of the improvements; here's the change log for grunt-check-pages:

    • 0.1.0 - Initial release, support for checkLinks and checkXhtml.
    • 0.1.1 - Tweak README for better formatting.
    • 0.1.2 - Support page-only mode (no link or XHTML checks), show response time for requests.
    • 0.1.3 - Support maxResponseTime option, buffer all page responses, add "no-cache" header to requests.
    • 0.1.4 - Support checkCaching and checkCompression options, improve error handling, use gruntMock.
    • 0.1.5 - Support userAgent option, weak entity tags, update nock dependency.
    • 0.2.0 - Support noLocalLinks option, rename disallowRedirect option to noRedirects, switch to ESLint, update superagent and nock dependencies.
    • 0.3.0 - Support queryHashes option for CRC-32/MD5/SHA-1, update superagent dependency.
    • 0.4.0 - Rename onlySameDomainLinks option to onlySameDomain, fix handling of redirected page links, use page order for links, update all dependencies.
    • 0.5.0 - Show location of redirected links with noRedirects option, switch to crc-hash dependency.
    • 0.6.0 - Support summary option, update crc-hash, grunt-eslint, nock dependencies.
    • 0.6.1 - Add badges for automated build and coverage info to README (along with npm, GitHub, and license).
    • 0.6.2 - Switch from superagent to request, update grunt-eslint and nock dependencies.
    • 0.7.0 - Move task implementation into reusable check-pages package.
    • 0.7.1 - Fix misreporting of "Bad link" for redirected links when noRedirects enabled.

    There are now more things you can validate and better diagnostics during validation. For information about the various options, visit the grunt-check-pages package in the npm repository.

     

    Secondly, I started looking into Gulp as an alternative to Grunt. My blog's Gruntfile.js is the most complicated I have, so I tried converting it to a gulpfile.js. Conveniently, existing packages supported everything I already do (test, LESS, lint) - though not what I use grunt-check-pages for (no surprise).

    Clearly, the next step was to create a version of the task for Gulp - but it turns out that's not necessary! Gulp's task structure is simple enough that invoking standard asynchronous helpers is easy to do inline. So all I really needed was to factor out the core functionality into a reusable method.

    Here's how that looks:

    /**
     * Checks various aspects of a web page for correctness.
     *
     * @param {object} host Specifies the environment.
     * @param {object} options Configures the task.
     * @param {function} done Callback function.
     * @returns {void}
     */
    module.exports = function(host, options, done) { ... }
    

    With that in place, it's easy to invoke check-pages - whether from a Gulp task or something else entirely. The host parameter handles log/error messages (pass console for convenience), options configures things in the usual fashion, and the done callback gets called at the end (with an Error parameter if anything went wrong).

    Like so:

    var gulp = require("gulp");
    var checkPages = require("check-pages");
    
    gulp.task("checkDev", [ "start-development-server" ], function(callback) {
      var options = {
        pageUrls: [
          'http://localhost:8080/',
          'http://localhost:8080/blog',
          'http://localhost:8080/about.html'
        ],
        checkLinks: true,
        onlySameDomain: true,
        queryHashes: true,
        noRedirects: true,
        noLocalLinks: true,
        linksToIgnore: [
          'http://localhost:8080/broken.html'
        ],
        checkXhtml: true,
        checkCaching: true,
        checkCompression: true,
        maxResponseTime: 200,
        userAgent: 'custom-user-agent/1.2.3',
        summary: true
      };
      checkPages(console, options, callback);
    });
    
    gulp.task("checkProd", function(callback) {
      var options = {
        pageUrls: [
          'http://example.com/',
          'http://example.com/blog',
          'http://example.com/about.html'
        ],
        checkLinks: true,
        maxResponseTime: 500
      };
      checkPages(console, options, callback);
    });
    

    As a result, grunt-check-pages has become a thin wrapper over check-pages and there's no duplication between the two packages (though each has a complete set of tests just to be safe). For information about the options above, visit the check-pages package in the npm repository.

     

    The combined effect is that I'm able to do a better job validating web site updates and I can use whichever of Grunt or Gulp feels more appropriate for a given scenario. That's good for peace of mind - and a great way to become more familiar with both tools!

    Tags: Node.js Technical Web
  • Everything old is new again [crc-hash is a Node.js Crypto Hash implementation for the CRC algorithm]
    Tuesday, January 27th 2015

    Yep, another post about hash functions... True, I could have stopped when I implemented CRC-32 for .NET or when I implemented MD5 for Silverlight. Certainly, sharing the code for four versions of ComputeFileHashes could have been a good laurel upon which to rest.

    But then I started using Node.js, and found one more hash-oriented itch to scratch. :)

    From the project page:

    Node.js's Crypto module implements the Hash class which offers a simple Stream-based interface for creating hash digests of data. The createHash function supports many popular algorithms like SHA and MD5, but does not include older/simpler CRC algorithms like CRC-32. Fortunately, the crc package in npm provides comprehensive CRC support and offers an API that can be conveniently used by a Hash subclass.

    crc-hash is a Crypto Hash wrapper for the crc package that makes it easy for Node.js programs to use the CRC family of hash algorithms via a standard interface.

    With just one (transitive!) dependency, crc-hash is lightweight. Because it exposes a common interface, it's easy to integrate with existing scenarios. Thanks to crc, it offers support for all the popular CRC algorithms. You can learn more on the crc-hash npm page or the crc-hash GitHub page.
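
    Usage is what you'd expect - a minimal sketch, assuming the package mirrors Crypto's createHash factory as the description suggests:

    var crcHash = require("crc-hash");
    
    // Create a CRC-32 Hash instance and use the standard update/digest API
    var hash = crcHash.createHash("crc32");
    hash.update("The quick brown fox jumps over the lazy dog");
    console.log(hash.digest("hex"));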

    Notes:

    • One of the great things about the Node community is the breadth of packages available. In this case, I was able to leverage the comprehensive crc package by alexgorbatchev for all the algorithmic bits.
    • After being indifferent on the topic of badges, I discovered shields.io and its elegance won me over. You can see the five badges I picked near the top of README.md on the npm/GitHub pages above.
    Tags: Node.js Technical Web
  • "That's a funny looking warthog", a post about mocking Grunt [gruntMock is a simple mock for testing Grunt.js multi-tasks]
    Wednesday, September 10th 2014

    While writing the grunt-check-pages task for Grunt.js, I wanted a way to test the complete lifecycle: to load the task in a test context, run it against various inputs, and validate the output. It didn't seem practical to call into Grunt itself, so I looked around for a mock implementation of Grunt. There were plenty of mocks for use with Grunt, but I didn't find anything that mocked the API itself. So I wrote a very simple one and used it for testing.

    That worked well, so I wanted to formalize my gruntMock implementation and post it as an npm package for others to use. Along the way, I added a bunch of additional API support and pulled in domain-based exception handling for a clean, self-contained implementation. As I hoped, updating grunt-check-pages made its tests simpler and more consistent.

    Although gruntMock doesn't implement the complete Grunt API, it implements enough of it that I expect most tasks to be able to use it pretty easily. If not, please let me know what's missing! :)

     

    For more context, here's part of the introductory section of README.md:

    gruntMock is a simple mock object that simulates the Grunt task runner for multi-tasks and can be easily integrated into a unit testing environment such as Nodeunit. gruntMock invokes tasks the same way Grunt does and exposes (almost) the same set of APIs. After providing input to a task, gruntMock runs and captures its output so tests can verify expected behavior. Task success and failure are unified, so it's easy to write positive and negative tests.

    Here's what gruntMock looks like in a simple scenario under Nodeunit:

    var gruntMock = require('gruntmock');
    var example = require('./example-task.js');
    
    exports.exampleTest = {
    
      pass: function(test) {
        test.expect(4);
        var mock = gruntMock.create({
          target: 'pass',
          files: [
            { src: ['unused.txt'] }
          ],
          options: { str: 'string', num: 1 }
        });
        mock.invoke(example, function(err) {
          test.ok(!err);
          test.equal(mock.logOk.length, 1);
          test.equal(mock.logOk[0], 'pass');
          test.equal(mock.logError.length, 0);
          test.done();
        });
      },
    
      fail: function(test) {
        test.expect(2);
        var mock = gruntMock.create({ target: 'fail' });
        mock.invoke(example, function(err) {
          test.ok(err);
          test.equal(err.message, 'fail');
          test.done();
        });
      }
    };
    

     

    For a more in-depth example, have a look at the use of gruntMock by grunt-check-pages. That shows off integration with other mocks (specifically nock, a nice HTTP server mock) as well as the testOutput helper function that's used to validate each test case's output without duplicating code. It also demonstrates how gruntMock's unified handling of success and failure allows for clean, consistent testing of input validation, happy path, and failure scenarios.

    To learn more - or experiment with gruntMock - visit gruntMock on npm or gruntMock on GitHub.

    Happy mocking!

    Tags: Grunt Node.js Utilities Web
  • Say goodbye to dead links and inconsistent formatting [grunt-check-pages is a simple Grunt task to check various aspects of a web page for correctness]
    Tuesday, August 12th 2014

    As part of converting my blog to a custom Node.js app, I wrote a set of tests to validate its routes, structure, content, and behavior (using mocha/grunt-mocha-test). Most of these tests are specific to my blog, but some are broadly applicable and I wanted to make them available to anyone who was interested. So I created a Grunt plugin and published it to npm:

    grunt-check-pages

    An important aspect of creating web sites is to validate the structure and content of their pages. The checkPages task provides an easy way to integrate this testing into your normal Grunt workflow.

    By providing a list of pages to scan, the task can validate each page's links and verify its XHTML structure.

     

    Link validation is fairly uncontroversial: you want to ensure each hyperlink on a page points to valid content. grunt-check-pages supports the standard HTML link types (ex: <a href="..."/>, <img src="..."/>) and makes an HTTP HEAD request to each link to make sure it's valid. (Because some web servers misbehave, the task also tries a GET request before reporting a link broken.) There are options to limit checking to same-domain links, to disallow links that redirect, and to provide a set of known-broken links to ignore. (FYI: Links in draft elements (ex: picture) are not supported for now.)
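
    The fallback logic is roughly this shape - a minimal sketch using the request package for brevity, not the task's actual implementation:

    var request = require("request");
    
    // Try a HEAD request first; some servers reject HEAD, so fall back to GET
    function checkLink(link, callback) {
      request.head(link, function(headErr, headRes) {
        if (!headErr && headRes.statusCode === 200) {
          return callback(null);
        }
        request.get(link, function(getErr, getRes) {
          if (!getErr && getRes.statusCode === 200) {
            return callback(null);
          }
          callback(getErr || new Error("Bad link: " + link));
        });
      });
    }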

    XHTML compliance might be a little controversial. I'm not here to persuade you to love XHTML - but I do have some experience parsing HTML and can reasonably make a few claims:

    • HTML syntax errors are tricky for browsers to interpret and (historically) no two work the same way
    • Parsing ambiguity leads to rendering issues which create browser-specific quirks and surprises
    • HTML5 is more prescriptive about invalid syntax, but nothing beats a well-formed document
    • Being able to confidently parse web pages with simple tools is pleasant and quite handy
    • Putting a close '/' on your img and br tags is a small price to pay for peace of mind :)

    Accordingly, grunt-check-pages will (optionally) parse each page as XML and report the issues it finds.

     

    grunt.initConfig({
      checkPages: {
        development: {
          options: {
            pageUrls: [
              'http://localhost:8080/',
              'http://localhost:8080/blog',
              'http://localhost:8080/about.html'
            ],
            checkLinks: true,
            onlySameDomainLinks: true,
            disallowRedirect: false,
            linksToIgnore: [
              'http://localhost:8080/broken.html'
            ],
            checkXhtml: true
          }
        },
        production: {
          options: {
            pageUrls: [
              'http://example.com/',
              'http://example.com/blog',
              'http://example.com/about.html'
            ],
            checkLinks: true,
            checkXhtml: true
          }
        }
      }
    });
    

    Something I find useful (and outline above) is to define separate configurations for development and production. My development configuration limits itself to links within the blog and ignores some that don't work when I'm self-hosting. My production configuration tests everything across a broader set of pages. This lets me iterate quickly during development while validating the live deployment more thoroughly.

    If you'd like to incorporate grunt-check-pages into your workflow, you can get it via grunt-check-pages on npm or grunt-check-pages on GitHub. And if you have any feedback, please let me know!

     

    Footnote: grunt-check-pages is not a site crawler; it looks at exactly the set of pages you ask it to. If you're looking for a crawler, you may be interested in something like grunt-link-checker (though I haven't used it myself).

    Tags: Grunt Node.js Utilities Web