The blog of dlaa.me
  • Solving puzzles at 30,000 feet [An iterative solution for the "Is this a binary search tree?" programming problem]
    Tuesday, April 7th 2015

    Sitting on a plane recently looking for a distraction, I recalled a programming challenge by James Michael Hare: Little Puzzlers-Is Tree a Binary Search Tree?. All I had to work with was a web browser, so I used JavaScript to come up with a solution. James subsequently blogged a recursive implementation in C# which is quite elegant. Wikipedia's Binary search tree page uses the same approach and C++ for its verification sample.

    Because I did things a little differently, I thought I'd share - along with a few thoughts:

    /**
     * Determines if a tree of {value, left, right} nodes is a binary search tree.
     * @param {Object} root Root of the tree to examine.
     * @returns {Boolean} True iff root is a binary search tree.
     */
    function isBinarySearchTree(root) {
      var wrapper, node, stack = [{ node: root }];
      while (wrapper = stack.pop()) {
        if (node = wrapper.node) {
          if ((node.value <= wrapper.min) || (wrapper.max <= node.value)) {
            return false;
          }
          stack.push({ node: node.left, min: wrapper.min, max: node.value },
                     { node: node.right, min: node.value, max: wrapper.max });
        }
      }
      return true;
    }
    

    Notes:

    • Tree nodes are assumed to have a numeric value and references to their left and right nodes (both possibly null).
      • I used the name value (vs. data) because it is slightly more specific.
    • I decided on an iterative algorithm because it has two notable advantages over recursion:
      • In the worst case for a tree with N nodes, an iterative solution has bookkeeping for N/2 nodes (when starting to process the leaf nodes of a balanced tree assuming nodes were queued) whereas a recursive solution has bookkeeping for all N nodes (when processing the deepest node of a completely unbalanced tree).
        • Because there are two recursive calls, I don't think tail recursion can be counted on to fix the worst-case behavior.
      • The memory used for bookkeeping by an iterative solution comes from the heap which is generally much larger than the thread stack.
      • To be fair, neither advantage is likely to be significant in practice - but they make good discussion points during an interview. :)
    • The iterative algorithm has a disadvantage:
      • Bookkeeping requires an additional object type (wrapper in the code above) which associates the relevant min and max bounds with pending node instances.
        • ... unless you avoid the wrapper by augmenting the node elements themselves.
          • ... which is quite easy in JavaScript thanks to its dynamic type system.
        • The creation/destruction of wrapper objects creates additional memory pressure.
          • Although these objects are short-lived and therefore low-impact for typical garbage collection algorithms.
    • I intended the code to be concise, so I made use of assignments in conditional expressions.
    • The code uses a stack (vs. a queue) because stacks tend to be simpler than queues - especially when implemented with an array.
    • I made use of the fact that comparing a number to undefined evaluates to false so I could avoid specifying explicit minimum/maximum values (as in the Wikipedia example) or making HasValue checks (as in James's example).
    • If you have a different approach or a suggestion to simplify this one, please share!
      • And note: I'm interested in algorithmic changes, not tweaks like removing extra parentheses. :)
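
    One way to picture the "augmenting the node elements" idea from the notes above is the sketch below. It is not the code from this post: the _min and _max property names are made up for illustration, and it assumes callers don't mind the tree being annotated as a side effect. It leans on the same comparison-with-undefined behavior to leave the root unbounded.

    ```javascript
    // Sketch: same bounds check as above, but the bounds live on the nodes
    // themselves (hypothetical _min/_max properties) instead of in wrappers.
    function isBinarySearchTreeAugmented(root) {
      var node, stack = [];
      if (root) {
        // The root is unbounded; comparing a number to undefined is always false
        root._min = root._max = undefined;
        stack.push(root);
      }
      while (node = stack.pop()) {
        if ((node.value <= node._min) || (node._max <= node.value)) {
          return false;
        }
        if (node.left) {
          // Everything in the left subtree must stay below this node's value
          node.left._min = node._min;
          node.left._max = node.value;
          stack.push(node.left);
        }
        if (node.right) {
          // Everything in the right subtree must stay above this node's value
          node.right._min = node.value;
          node.right._max = node._max;
          stack.push(node.right);
        }
      }
      return true;
    }

    // Sample trees (made up for this example)
    var validTree = {
      value: 2,
      left: { value: 1, left: null, right: null },
      right: { value: 3, left: null, right: null }
    };
    var invalidTree = {
      value: 2,
      left: { value: 1, left: null, right: { value: 3, left: null, right: null } },
      right: null
    };
    console.log(isBinarySearchTreeAugmented(validTree));   // true
    console.log(isBinarySearchTreeAugmented(invalidTree)); // false (3 sits in the left subtree of 2)
    ```

    The trade-off is the one described above: no wrapper objects get allocated, but the nodes keep their annotations after the function returns.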
    Tags: Miscellaneous Technical
  • Supporting both sides of the Grunt vs. Gulp debate [check-pages is a Gulp-friendly task to check various aspects of a web page for correctness]
    Tuesday, February 10th 2015

    A few months ago, I wrote about grunt-check-pages, a Grunt task to check various aspects of a web page for correctness. I use grunt-check-pages when developing my blog and have found it very handy for preventing mistakes and maintaining consistency.

    Two things have changed since then:

    1. I released multiple enhancements to grunt-check-pages that make it more powerful
    2. I extracted its core functionality into the check-pages package which works well with Gulp


    First, an overview of the improvements; here's the change log for grunt-check-pages:

    • 0.1.0 - Initial release, support for checkLinks and checkXhtml.
    • 0.1.1 - Tweak README for better formatting.
    • 0.1.2 - Support page-only mode (no link or XHTML checks), show response time for requests.
    • 0.1.3 - Support maxResponseTime option, buffer all page responses, add "no-cache" header to requests.
    • 0.1.4 - Support checkCaching and checkCompression options, improve error handling, use gruntMock.
    • 0.1.5 - Support userAgent option, weak entity tags, update nock dependency.
    • 0.2.0 - Support noLocalLinks option, rename disallowRedirect option to noRedirects, switch to ESLint, update superagent and nock dependencies.
    • 0.3.0 - Support queryHashes option for CRC-32/MD5/SHA-1, update superagent dependency.
    • 0.4.0 - Rename onlySameDomainLinks option to onlySameDomain, fix handling of redirected page links, use page order for links, update all dependencies.
    • 0.5.0 - Show location of redirected links with noRedirects option, switch to crc-hash dependency.
    • 0.6.0 - Support summary option, update crc-hash, grunt-eslint, nock dependencies.
    • 0.6.1 - Add badges for automated build and coverage info to README (along with npm, GitHub, and license).
    • 0.6.2 - Switch from superagent to request, update grunt-eslint and nock dependencies.
    • 0.7.0 - Move task implementation into reusable check-pages package.
    • 0.7.1 - Fix misreporting of "Bad link" for redirected links when noRedirects enabled.

    There are now more things you can validate and better diagnostics during validation. For information about the various options, visit the grunt-check-pages package in the npm repository.


    Secondly, I started looking into Gulp as an alternative to Grunt. My blog's Gruntfile.js is the most complicated I have, so I tried converting it to a gulpfile.js. Conveniently, existing packages supported everything I already do (test, LESS, lint) - though not what I use grunt-check-pages for (no surprise).

    Clearly, the next step was to create a version of the task for Gulp - but it turns out that's not necessary! Gulp's task structure is simple enough that invoking standard asynchronous helpers is easy to do inline. So all I really needed was to factor out the core functionality into a reusable method.

    Here's how that looks:

    /**
     * Checks various aspects of a web page for correctness.
     *
     * @param {object} host Specifies the environment.
     * @param {object} options Configures the task.
     * @param {function} done Callback function.
     * @returns {void}
     */
    module.exports = function(host, options, done) { ... }
    

    With that in place, it's easy to invoke check-pages - whether from a Gulp task or something else entirely. The host parameter handles log/error messages (pass console for convenience), options configures things in the usual fashion, and the done callback gets called at the end (with an Error parameter if anything went wrong).

    Like so:

    var gulp = require("gulp");
    var checkPages = require("check-pages");
    
    gulp.task("checkDev", [ "start-development-server" ], function(callback) {
      var options = {
        pageUrls: [
          'http://localhost:8080/',
          'http://localhost:8080/blog',
          'http://localhost:8080/about.html'
        ],
        checkLinks: true,
        onlySameDomain: true,
        queryHashes: true,
        noRedirects: true,
        noLocalLinks: true,
        linksToIgnore: [
          'http://localhost:8080/broken.html'
        ],
        checkXhtml: true,
        checkCaching: true,
        checkCompression: true,
        maxResponseTime: 200,
        userAgent: 'custom-user-agent/1.2.3',
        summary: true
      };
      checkPages(console, options, callback);
    });
    
    gulp.task("checkProd", function(callback) {
      var options = {
        pageUrls: [
          'http://example.com/',
          'http://example.com/blog',
          'http://example.com/about.html'
        ],
        checkLinks: true,
        maxResponseTime: 500
      };
      checkPages(console, options, callback);
    });
    

    As a result, grunt-check-pages has become a thin wrapper over check-pages and there's no duplication between the two packages (though each has a complete set of tests just to be safe). For information about the options above, visit the check-pages package in the npm repository.
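
    Because the host parameter only has to handle log/error messages, something other than console can stand in for it. The sketch below assumes a host needs just log and error methods, and it uses a made-up fakeCheckPages function with the same (host, options, done) shape so it runs without the real package installed - it does not reproduce what check-pages actually checks.

    ```javascript
    // Hypothetical stand-in with the same (host, options, done) shape as check-pages
    function fakeCheckPages(host, options, done) {
      var failures = 0;
      options.pageUrls.forEach(function(url) {
        if (/^https?:\/\//.test(url)) {
          host.log("Page: " + url);
        } else {
          host.error("Bad page: " + url);
          failures++;
        }
      });
      // Report an Error if anything went wrong, nothing otherwise
      done(failures ? new Error(failures + " issue(s) found") : undefined);
    }

    // A host that collects messages instead of printing them
    function createCollectingHost() {
      var host = { logs: [], errors: [] };
      host.log = function(message) { host.logs.push(message); };
      host.error = function(message) { host.errors.push(message); };
      return host;
    }

    var host = createCollectingHost();
    var result;
    fakeCheckPages(host, { pageUrls: ["http://localhost:8080/", "not-a-url"] }, function(err) {
      result = err;
    });
    console.log(host.logs, host.errors, result && result.message);
    ```

    A collecting host like this could be handy in tests, where the messages need to be inspected rather than written to the console.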


    The combined effect is that I'm able to do a better job validating web site updates and I can use whichever of Grunt or Gulp feels more appropriate for a given scenario. That's good for peace of mind - and a great way to become more familiar with both tools!

    Tags: Node.js Technical Web
  • Everything old is new again [crc-hash is a Node.js Crypto Hash implementation for the CRC algorithm]
    Tuesday, January 27th 2015

    Yep, another post about hash functions... True, I could have stopped when I implemented CRC-32 for .NET or when I implemented MD5 for Silverlight. Certainly, sharing the code for four versions of ComputeFileHashes could have been a good laurel upon which to rest.

    But then I started using Node.js, and found one more hash-oriented itch to scratch. :)

    From the project page:

    Node.js's Crypto module implements the Hash class which offers a simple Stream-based interface for creating hash digests of data. The createHash function supports many popular algorithms like SHA and MD5, but does not include older/simpler CRC algorithms like CRC-32. Fortunately, the crc package in npm provides comprehensive CRC support and offers an API that can be conveniently used by a Hash subclass.

    crc-hash is a Crypto Hash wrapper for the crc package that makes it easy for Node.js programs to use the CRC family of hash algorithms via a standard interface.

    With just one (transitive!) dependency, crc-hash is lightweight. Because it exposes a common interface, it's easy to integrate with existing scenarios. Thanks to crc, it offers support for all the popular CRC algorithms. You can learn more on the crc-hash npm page or the crc-hash GitHub page.
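
    For anyone who hasn't used it, the "standard interface" in question is the update/digest pattern of the Hash class. The sketch below uses only Node's built-in crypto module with MD5 so it runs without any packages; crc-hash is described above as exposing the same shape, but its own factory function and algorithm names are not shown here (see its README for those).

    ```javascript
    var crypto = require("crypto");

    // Feed the data in chunks, then ask for the digest
    var chunked = crypto.createHash("md5");
    chunked.update("hello ");
    chunked.update("world");
    var chunkedDigest = chunked.digest("hex");

    // Hashing the same bytes in a single call gives the same digest
    var wholeDigest = crypto.createHash("md5").update("hello world").digest("hex");

    console.log(chunkedDigest === wholeDigest); // true
    ```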

    Notes:

    • One of the great things about the Node community is the breadth of packages available. In this case, I was able to leverage the comprehensive crc package by alexgorbatchev for all the algorithmic bits.
    • After being indifferent on the topic of badges, I discovered shields.io and its elegance won me over. You can see the five badges I picked near the top of README.md on the npm/GitHub pages above.
    Tags: Node.js Technical Web
  • Out of hibernation [A new home and a bunch of updates for TextAnalysisTool.NET]
    Monday, January 12th 2015

    TextAnalysisTool.NET is one of the first side projects I did at Microsoft, and one of the most popular. (Click here for relevant blog posts by tag.) Many people inside and outside the company have written me with questions, feature requests, or sometimes just to say "thank you". It's always great to hear from users, and they've provided a long list of suggestions and ideas for ways to make TextAnalysisTool.NET better.

    By virtue of changing teams and roles various times over the years, I don't find myself using TextAnalysisTool.NET as much as I once did. My time and interests are spread more thinly, and I haven't been updating the tool as aggressively. (Understatement of the year?)

    Various coworkers have asked for access to the code, but nothing much came of that - until recently, when a small group showed up with the interest, expertise, and motivation to drive TextAnalysisTool.NET forward! They inspired me to simplify the contribution process and they have been making a steady stream of enhancements for a while now. It's time to take things to the next level, and today marks the first public update to TextAnalysisTool.NET in a long time!


    The new source for all things TextAnalysisTool is: the TextAnalysisTool.NET home page

    That's where you'll find an overview, download link, release notes, and other resources. The page is owned by the new TextAnalysisTool GitHub organization, so all of us are able to make changes and publish new releases. There's also an issue tracker, so users can report bugs, comment on issues, update TODOs, make suggestions, etc.

    The new 2015-01-07 release can be downloaded from there, and includes the following changes since the 2013-05-07 release:

    2015-01-07 by Uriel Cohen (http://github.com/cohen-uriel)
    ----------
    * Added a tooltip to the loaded file indicator in the status bar
    * Fixed a bug where setting a marker used in an active filter causes the
      current selection of lines to be changed
    
    2015-01-07 by David Anson (http://dlaa.me/)
    ----------
    * Improve HTML representation of clipboard text when copying for more
      consistent paste behavior
    
    2015-01-01 by Uriel Cohen (http://github.com/cohen-uriel)
    ----------
    * Fixed a bug where TAB characters are omitted in the display
    * Fixed a bug where lines saved to file include an extra white space at the
      start
    
    2014-12-21 by Uriel Cohen (http://github.com/cohen-uriel)
    ----------
    * Changed compilation to target .NET Framework 4.0
    
    2014-12-11 by Uriel Cohen (http://github.com/cohen-uriel)
    ----------
    * Redesigned the status bar indications to be consistent with Visual Studio and
      added the number of currently selected lines
    
    2014-12-04 by Uriel Cohen (http://github.com/cohen-uriel)
    ----------
    * Added the ability to append an existing filters file to the current filters
      list
    
    2014-12-01 by Uriel Cohen (http://github.com/cohen-uriel)
    ----------
    * Added recent file/filter menus for easy access to commonly-used files
    * Added a new settings registry key to set the
      maximum number of recent files or filter files allowed in the
      corresponding file menus
    * Fixed bug where pressing SPACE with no matching lines from filters
      crashed the application
    * Fixed a bug where copy-pasting lines from the application to Lync
      resulted in one long line without carriage returns
    
    2014-11-11 by Uriel Cohen (http://github.com/cohen-uriel)
    ----------
    * Added support for selection of background color in the filters
      (different selection of colors than the foreground colors)
    * The background color can be saved and loaded with the filters
    * Filters from previous versions that lack a background color will have the
      default background color
    * Saving foreground color field in filters to 'foreColor' attribute.
      Old 'color' attribute is still being loaded for backward compatibility
      purposes.
    * Changed control alignment in Find dialog and Filter dialog
    
    2014-10-21 by Mike Morante (http://github.com/mike-mo)
    ----------
    * Fix localization issue with the build string generation
    
    2014-04-22 by Mike Morante (http://github.com/mike-mo)
    ----------
    * Line metadata is now visually separate from line text contents
    * Markers can be shown always/never/when in use to have more room for line text
      and the chosen setting persists across sessions
    * Added statusbar panel funnel icon to reflect the current status of the Show
      Only Filtered Lines setting
    
    2014-02-27 by Mike Morante (http://github.com/mike-mo)
    ----------
    * Added zoom controls to quickly increase/decrease the font size
    * Zoom level persists across sessions
    * Added status bar panel to show current zoom level
    


    These improvements were all possible thanks to the time and dedication of the new contributors (and organization members):

    Please join me in thanking these generous souls for taking time out of their busy schedule to contribute to TextAnalysisTool.NET! They've been a pleasure to work with, and a great source of ideas and suggestions. I've been really pleased with their changes and hope you find the new TextAnalysisTool.NET more useful than ever!

    Tags: Technical TextAnalysisTool Utilities
  • Casting a Spell✔ [A simple app that makes it easy to spell-check with a browser]
    Thursday, December 4th 2014

    I pay attention to spelling. Maybe it's because I'm not a very good speller. Maybe it's because I have perfectionist tendencies. Maybe it's just a personality flaw.

    Whatever the reason, I'm always looking for ways to avoid mistakes.

    Some time ago, I wrote a code analysis rule for .NET/FxCop. More recently I shared two command-line programs to extract strings and comments from C#/JavaScript and followed up with a similar extractor for HTML/XML.

    Those tools work well, but I also wanted something GUI-based for a more natural fit with UI-centric workflows. I prototyped a simple WPF app that worked okay, but it wasn't as ubiquitously available as I wanted. Surprisingly often, I'd find myself on a machine without the latest version of the tool. (Classic first world problem, I know...) So I decided to go with a web app instead.

    The key observation was that modern browsers already integrate with the host operating system's spell-checker via the spellcheck HTML attribute. By leveraging that, my app would automatically get a comprehensive dictionary, support for multiple languages, native-code performance, and support for the user's custom dictionary. #winning!

    Aside: The (very related) forceSpellCheck API isn't supported by any browser I've tried. Fortunately, it's not needed for Firefox, its absence can be coded around for Chrome, and there's a simple manual workaround for Internet Explorer. Click the "Help / About" link in the app for more information.


    Inspired by web browsers' native support for spell-checking, I've created Spell✔ (a.k.a. SpellV), a simple app that makes it easy to spell-check with a browser. Click the link to try it out now - it's offline-enabled, so you can use it again later even without a network connection!

    To import content, Spell✔ supports pasting text from the clipboard, drag-dropping a file, or browsing the folder structure. For a customized experience, you can switch among multiple views of the data, including:

    • Original text (duh...)
    • Unique words, sorted alphabetically and displayed compactly for easy scanning
    • HTML/XML content (including text, comments, and attributes)
    • JSON content (string values only)
    • JavaScript code comments and string literals
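
    To give a feel for the "unique words" view, here is a minimal sketch of how such a list could be produced. It is not Spell✔'s actual implementation: the uniqueWords name, the simple letters-and-apostrophes word pattern, and the case-sensitive de-duplication are all assumptions made for the example.

    ```javascript
    // Sketch: extract unique words from text and sort them for easy scanning
    function uniqueWords(text) {
      // Assumed word pattern: runs of ASCII letters, optionally joined by apostrophes
      var matches = text.match(/[A-Za-z]+(?:'[A-Za-z]+)*/g) || [];
      var seen = Object.create(null);
      matches.forEach(function(word) {
        seen[word] = true; // Case-sensitive, so "The" and "the" are both kept
      });
      return Object.keys(seen).sort();
    }

    console.log(uniqueWords("The quick brown fox; the lazy dog. The end"));
    // [ 'The', 'brown', 'dog', 'end', 'fox', 'lazy', 'quick', 'the' ]
    ```

    Keeping the list case-sensitive means capitalization mistakes stay visible; the default sort puts capitalized words ahead of lowercase ones.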

    Read more about how Spell✔ works, how it was built, or check out the code on the GitHub page for SpellV!

    Tags: Technical Utilities Web
  • A quick programming challenge with money [Counting and enumerating the ways to break a $1 bill]
    Tuesday, December 2nd 2014

    Sunday evening I happened across a blog post by Josh Smith about finding all the ways to break a $1 bill. Specifically, the goal is to:

    Count the number of ways to combine coins worth 100, 50, 25, 10, 5, and 1 cent for a total value of 100 cents

    As Josh says, it's a fun challenge and I encourage you to stop reading now and solve it!

    Seriously: Stop now. Spoilers ahead...

    I took his advice and sat down with pen, paper, and a self-imposed 30 minute time limit. I came up with the C# solution below just before time ran out. As I note in the code, I forgot one line (though I caught it when typing up the solution). Less embarrassingly, this implementation worked correctly the first time I ran it. What's more, it's flexible with regard to the target amount and number/value of the coins (both of which are passed as parameters to the constructor). For bonus points, it outputs all the combinations it finds along the way.

    I've added some comments to the code to outline the general algorithm. There are some opportunities to refactor for clarity and an implicit assumption that values are passed in decreasing order, but otherwise I'm pretty happy with how it turned out.

    If you take on the challenge and come up with something interesting, please leave a note - I'd love to see other approaches!


    using System;
    
    // Problem: Count (and output) all ways to make $1 with U.S. coins (100, 50, 25, 10, 5, 1 cents).
    // Inspiration: http://ijoshsmith.com/2014/11/30/getting-into-functional-programming-with-swift/
    
    class BreakADollar
    {
        public static void Main()
        {
            // Entry point/test harness
            Console.WriteLine("Total possibilities: {0}",
                new BreakADollar(
                    100,
                    new[] { 100, 50, 25, 10, 5, 1 })
                .Invoke());
        }
    
        // Input
        private readonly int _target;
        private readonly int[] _values;
        // State
        private readonly int[] _counts;
        private int _ways;
    
        public BreakADollar(int target, int[] values)
        {
            // Initialize
            _target = target;
            _values = values;
            _counts = new int[values.Length];
            _ways = 0;
        }
    
        public int Invoke()
        {
            // Start the recursive process and return the result
            Recurse(0, 0);
            return _ways;
        }
    
        private bool Recurse(int i, int sum)
        {
            if (_target == sum)
            {
                // Met the target, done here
                ShowCounts();
                _ways++;
                return false;
            }
            else if (i == _counts.Length)
            {
                // Out of values, keep looking
                return true;
            }
            else if (sum < _target)
            {
                // Search, using increasing counts of current value
                while (Recurse(i + 1, sum))
                {
                    _counts[i]++; // Note: Missed this line at first
                    sum += _values[i];
                }
                // Reset and continue
                _counts[i] = 0;
                return true;
            }
            else
            {
                // Exceeded the target, done here
                return false;
            }
        }
    
        private void ShowCounts()
        {
            // Show the count for each value
            for (var i = 0; i < _counts.Length; i++)
            {
                Console.Write("{0}:{1} ", _values[i], _counts[i]);
            }
            Console.WriteLine();
        }
    }
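
    As a cross-check on the total, here is a different technique from the recursive search above: a small dynamic-programming count, sketched in JavaScript. It only counts combinations (it does not enumerate them), and the countWays name is made up for this example.

    ```javascript
    // Sketch: count the ways to reach target using any number of each coin value
    function countWays(target, values) {
      // ways[sum] = number of combinations found so far that total exactly sum
      var ways = [1]; // One way to make 0: use no coins
      for (var s = 1; s <= target; s++) {
        ways[s] = 0;
      }
      // Handling one coin value at a time counts each combination once,
      // regardless of the order the coins would be handed over
      values.forEach(function(value) {
        for (var sum = value; sum <= target; sum++) {
          ways[sum] += ways[sum - value];
        }
      });
      return ways[target];
    }

    console.log(countWays(100, [100, 50, 25, 10, 5, 1])); // 293
    ```

    Unlike the recursive version, this approach does not depend on the order in which the coin values are passed.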
    
    Tags: Miscellaneous Technical
  • Not another POODLE pun [Batch script to disable SSL 3.0 on Windows servers - including virtual machines and Azure cloud services]
    Wednesday, October 29th 2014

    Much has been penned (and punned) recently about POODLE, the "Padding Oracle On Downgraded Legacy Encryption" security vulnerability. If you're not familiar with it, the Wikipedia entry for POODLE is a good start and Troy Hunt's POODLE treatise provides more detail.

    Assuming you've made the decision to disable SSL 3.0 to mitigate POODLE attacks, this Azure Blog post includes a two-part batch/PowerShell script to do that. Based on the information in KB245030, that script can be run on a bare OS, a VM, or as part of an Azure cloud service.

    It's a fine script as scripts go [ :) ], but maybe you're not a PowerShell fanatic or maybe you'd prefer a single file and/or less code to audit. If so, I present the following batch-only script for your consideration:

    @echo off
    setlocal
    set REBOOT=N
    set LOG=%~d0\DisableSslv3.log
    set SSLKEY=HKLM\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols\SSL 3.0
    
    call :MAIN >> %LOG% 2>&1
    
    goto :EOF
    
    
    :MAIN
    
    REM Show current SSL 3.0 configuration
    reg.exe query "%SSLKEY%" /s
    
    REM Check current SSL 3.0 configuration
    for /f "tokens=3" %%v in ('reg.exe query "%SSLKEY%\Server" /v Enabled') do (
        if NOT "0x0"=="%%v" set REBOOT=Y
    )
    if ERRORLEVEL 1 set REBOOT=Y
    for /f "tokens=3" %%v in ('reg.exe query "%SSLKEY%\Client" /v DisabledByDefault') do (
        if NOT "0x1"=="%%v" set REBOOT=Y
    )
    if ERRORLEVEL 1 set REBOOT=Y
    
    REM Update and reboot if necessary
    if "%REBOOT%"=="Y" (
        echo Update needed to disable SSL 3.0.
        reg.exe add "%SSLKEY%\Server" /v Enabled /t REG_DWORD /d 0 /f
        reg.exe add "%SSLKEY%\Client" /v DisabledByDefault /t REG_DWORD /d 1 /f
        echo Rebooting to apply changes...
        shutdown.exe /r /c "Rebooting to disable SSL 3.0" /f /d p:2:4
    ) else (
        echo SSL 3.0 already disabled.
    )
    
    goto :EOF
    

    Notes:

    • This is a riff on the aforementioned script, meant to serve as a jumping-off point and alternate approach.
    • Like the original script, this one is idempotent and can be safely run multiple times (for example, every startup) on a bare OS, VM, or cloud service. The log file is additive, so you can see if it ever made changes.
    • Security Advisory 3009008 only mentions disabling SSL 3.0 for server scenarios; this script also disables it for client scenarios to protect outgoing connections to machines that have not been secured.
    • I work almost exclusively with recent OS releases on Azure; SSL 2.0 is already disabled there, so this script leaves those settings alone. André Klingsheim's post on hardening Windows Server provides more context.
    • An immediate reboot is performed whenever changes are made - consider commenting that line out during testing. :)
    • While reviewing this post, I found a discussion of related techniques on Server Fault which may also be of interest.

    Whatever you do to address the POODLE vulnerability, be sure to check your work, perhaps with one of the following oft-recommended resources:

    Tags: Technical Web
  • A trip down memory (footprint) lane [Download for the original TextAnalysisTool, circa 2001]
    Monday, September 22nd 2014

    As you might guess from the name, TextAnalysisTool.NET (introductory blog post, related links) was not the first version of the tool. The original implementation was written in C, compiled for x86, slightly less capable, and named simply TextAnalysisTool. I got an email asking for a download link recently, so I dug up a copy and am posting it for anyone who's interested.

    The UI should be very familiar to TextAnalysisTool.NET users:

    The original TextAnalysisTool filtering a simple file

    The behavior is mostly the same as well (though the different hot key for "add filter" trips me up pretty consistently).

    A few notes:

    • The code is over 13 years old
    • So I'm not taking feature requests :)
    • But it runs on vintage operating systems (seriously, this is before Windows XP)
    • And it also runs great on Windows 8.1 (yay backward compatibility!)
    • It supports:
      • Text filters
      • Regular expressions
      • Markers
      • Find
      • Go to
      • Reload
      • Copy/paste
      • Saved configurations
      • Multi-threading
    • But does not support:
      • Colors
      • Rich selection
      • Rich copy
      • Line counts
      • Filter hot keys
      • Plugins
      • Unicode

    Because it uses ASCII encoding for strings (vs. .NET's Unicode representation), you can reasonably expect loading a text file in TextAnalysisTool to use about half as much memory as it does in TextAnalysisTool.NET. However, as a 32-bit application, TextAnalysisTool is limited to the standard 2GB virtual address space of 32-bit processes on Windows (even on a 64-bit OS). On the other hand, TextAnalysisTool.NET is an architecture-neutral application and can use the full 64-bit virtual address space on a 64-bit OS. There may be rare machine configurations where the physical/virtual memory situation is such that the older TextAnalysisTool can load a file that the newer TextAnalysisTool.NET can't - so if you're stuck, give it a try!
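
    The "about half" estimate comes straight from the bytes needed per character. As a rough illustration (using Node.js rather than either tool, with a made-up sample line, and ignoring any per-string overhead in either runtime):

    ```javascript
    // Compare the bytes needed for the same text in a one-byte encoding vs. UTF-16
    var line = "2014-09-22 12:00:00 INFO Sample log line";
    var asciiBytes = Buffer.byteLength(line, "ascii");   // One byte per character
    var utf16Bytes = Buffer.byteLength(line, "utf16le"); // Two bytes per character here
    console.log(asciiBytes, utf16Bytes, utf16Bytes / asciiBytes); // Ratio is 2
    ```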

    Aside: If you're really adventurous, you can try using EditBin to set the /LARGEADDRESSAWARE option on TextAnalysisTool.exe to get access to more virtual address space on a 64-bit OS or via /3GB on a 32-bit OS. But be warned that you're well into "undefined behavior" territory because I don't think that switch even existed when I wrote TextAnalysisTool. I've tried it briefly and things seem to work - but this is definitely sketchy. :)

    Writing the original TextAnalysisTool was a lot of fun and contributed significantly to a library of C utility functions I used at the time called ToolBox. It also provided an excellent conceptual foundation upon which to build TextAnalysisTool.NET in addition to good lessons about how to approach the problem space. If I ever get around to writing a third version (TextAnalysisTool.WPF? TextAnalysisTool.Next?), it will take inspiration from both projects - and handle absurdly-large files.

    So if you're curious to try a piece of antique software, click here to download the original TextAnalysisTool.

    But for everything else, you should probably click here to download the newer TextAnalysisTool.NET.

    Tags: Technical TextAnalysisTool Utilities
  • "That's a funny looking warthog", a post about mocking Grunt [gruntMock is a simple mock for testing Grunt.js multi-tasks]
    Wednesday, September 10th 2014

    While writing the grunt-check-pages task for Grunt.js, I wanted a way to test the complete lifecycle: to load the task in a test context, run it against various inputs, and validate the output. It didn't seem practical to call into Grunt itself, so I looked around for a mock implementation of Grunt. There were plenty of mocks for use with Grunt, but I didn't find anything that mocked the API itself. So I wrote a very simple one and used it for testing.

    That worked well, so I wanted to formalize my gruntMock implementation and post it as an npm package for others to use. Along the way, I added a bunch of additional API support and pulled in domain-based exception handling for a clean, self-contained implementation. As I hoped, updating grunt-check-pages made its tests simpler and more consistent.

    Although gruntMock doesn't implement the complete Grunt API, it implements enough of it that I expect most tasks to be able to use it pretty easily. If not, please let me know what's missing! :)


    For more context, here's part of the introductory section of README.md:

    gruntMock is a simple mock object that simulates the Grunt task runner for multi-tasks and can be easily integrated into a unit testing environment such as Nodeunit. gruntMock invokes tasks the same way Grunt does and exposes (almost) the same set of APIs. After providing input to a task, gruntMock runs and captures its output so tests can verify expected behavior. Task success and failure are unified, so it's easy to write positive and negative tests.

    Here's what gruntMock looks like in a simple scenario under Nodeunit:

    var gruntMock = require('gruntmock');
    var example = require('./example-task.js');
    
    exports.exampleTest = {
    
      pass: function(test) {
        test.expect(4);
        var mock = gruntMock.create({
          target: 'pass',
          files: [
            { src: ['unused.txt'] }
          ],
          options: { str: 'string', num: 1 }
        });
        mock.invoke(example, function(err) {
          test.ok(!err);
          test.equal(mock.logOk.length, 1);
          test.equal(mock.logOk[0], 'pass');
          test.equal(mock.logError.length, 0);
          test.done();
        });
      },
    
      fail: function(test) {
        test.expect(2);
        var mock = gruntMock.create({ target: 'fail' });
        mock.invoke(example, function(err) {
          test.ok(err);
          test.equal(err.message, 'fail');
          test.done();
        });
      }
    };
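
    The test above loads example-task.js without showing it. As a rough sketch only, a task that would fit those tests might look something like the following. The task name 'example' and its description are made up for illustration, and the assumption that the mock turns a grunt.fail.warn call into the err passed to the invoke callback should be checked against the gruntMock README.

```javascript
// example-task.js (hypothetical): a multi-task whose behavior depends on its target.
function exampleTask(grunt) {
  grunt.registerMultiTask('example', 'Illustrative task for exercising a mock.', function() {
    // this.target is the name of the target being run ('pass' or 'fail' in the tests).
    if (this.target === 'pass') {
      // Successful output is reported through the log API.
      grunt.log.ok('pass');
    } else {
      // Assumption: the mock surfaces this warning as the error given to the callback.
      grunt.fail.warn('fail');
    }
  });
}

module.exports = exampleTask;
```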
    


    For a more in-depth example, have a look at the use of gruntMock by grunt-check-pages. That shows off integration with other mocks (specifically nock, a nice HTTP server mock) as well as the testOutput helper function that's used to validate each test case's output without duplicating code. It also demonstrates how gruntMock's unified handling of success and failure allows for clean, consistent testing of input validation, happy path, and failure scenarios.
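
    To give a feel for the idea behind a helper like testOutput, here is a minimal sketch. It is not the code from grunt-check-pages: the signature, parameter names, and assertion messages are assumptions, and it relies only on the mock.logOk and mock.logError arrays and the Nodeunit-style test object seen in the example above.

```javascript
// Hypothetical helper: returns a callback that checks a mock's captured output.
function testOutput(test, mock, expectedOk, expectedError, expectedMessage) {
  return function(err) {
    // Compare everything the task logged against what the test case expects.
    test.deepEqual(mock.logOk, expectedOk, 'Unexpected logOk output');
    test.deepEqual(mock.logError, expectedError, 'Unexpected logError output');
    if (expectedMessage) {
      // Negative test: the task should have failed with this message.
      test.equal(err && err.message, expectedMessage, 'Unexpected failure message');
    } else {
      // Positive test: the task should have completed without an error.
      test.ok(!err, 'Unexpected failure');
    }
    test.done();
  };
}
```

    With something like this in place, a passing test case could shrink to mock.invoke(example, testOutput(test, mock, ['pass'], [])) and a failing one to mock.invoke(example, testOutput(test, mock, [], [], 'fail')).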

    To learn more - or experiment with gruntMock - visit gruntMock on npm or gruntMock on GitHub.

    Happy mocking!

    Tags: Grunt Node.js Utilities Web
  • Say goodbye to dead links and inconsistent formatting [grunt-check-pages is a simple Grunt task to check various aspects of a web page for correctness]
    Tuesday, August 12th 2014

    As part of converting my blog to a custom Node.js app, I wrote a set of tests to validate its routes, structure, content, and behavior (using mocha/grunt-mocha-test). Most of these tests are specific to my blog, but some are broadly applicable and I wanted to make them available to anyone who was interested. So I created a Grunt plugin and published it to npm:

    grunt-check-pages

    An important aspect of creating web sites is to validate the structure and content of their pages. The checkPages task provides an easy way to integrate this testing into your normal Grunt workflow.

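    By providing a list of pages to scan, the task can:

    • Validate that each link on a page points to valid content
    • Validate page structure for XHTML compliance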

    Link validation is fairly uncontroversial: you want to ensure each hyperlink on a page points to valid content. grunt-check-pages supports the standard HTML link types (ex: <a href="..."/>, <img src="..."/>) and makes an HTTP HEAD request to each link to make sure it's valid. (Because some web servers misbehave, the task also tries a GET request before reporting a link broken.) There are options to limit checking to same-domain links, to disallow links that redirect, and to provide a set of known-broken links to ignore. (FYI: Links in draft elements (ex: picture) are not supported for now.)
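
    The HEAD-then-GET fallback described above can be sketched roughly as follows. This is not the task's actual code: checkLink and the injected request(method, url, callback) function are hypothetical names, and treating any 2xx status as "valid" is an assumption made to keep the sketch short.

```javascript
// Hypothetical sketch: try HEAD first, then fall back to GET before reporting a link broken.
// request(method, url, cb) is an assumed helper that calls cb(err, statusCode).
function checkLink(url, request, callback) {
  function isSuccess(err, status) {
    return !err && status >= 200 && status < 300;
  }
  request('HEAD', url, function(headErr, headStatus) {
    if (isSuccess(headErr, headStatus)) {
      return callback(null, true);
    }
    // Some servers reject or mishandle HEAD, so retry with GET before giving up.
    request('GET', url, function(getErr, getStatus) {
      callback(null, isSuccess(getErr, getStatus));
    });
  });
}
```

    Passing the request function in keeps the sketch independent of any particular HTTP library and makes the fallback easy to exercise with a stub.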

    XHTML compliance might be a little controversial. I'm not here to persuade you to love XHTML - but I do have some experience parsing HTML and can reasonably make a few claims:

    • HTML syntax errors are tricky for browsers to interpret and (historically) no two work the same way
    • Parsing ambiguity leads to rendering issues which create browser-specific quirks and surprises
    • HTML5 is more prescriptive about invalid syntax, but nothing beats a well-formed document
    • Being able to confidently parse web pages with simple tools is pleasant and quite handy
    • Putting a close '/' on your img and br tags is a small price to pay for peace of mind :)

    Accordingly, grunt-check-pages will (optionally) parse each page as XML and report the issues it finds.
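
    To illustrate the kind of problem an XML parse catches (and an HTML parser silently tolerates), here is a deliberately crude sketch of stack-based tag matching. It is not how grunt-check-pages works and it is not a real XML parser: findTagMismatches is a made-up name, and the regular expression ignores comments, CDATA, and many other details.

```javascript
// Hypothetical sketch: report open/close tag mismatches using a stack.
function findTagMismatches(markup) {
  var issues = [];
  var openTags = [];
  // Captures: leading '/', tag name, attributes, trailing '/' (self-closing).
  var tagPattern = /<(\/?)([A-Za-z][\w:.-]*)([^>]*?)(\/?)>/g;
  var match;
  while ((match = tagPattern.exec(markup)) !== null) {
    var isClose = match[1] === '/';
    var name = match[2];
    var isSelfClosing = match[4] === '/';
    if (isClose) {
      // A close tag must match the most recently opened tag.
      var expected = openTags.pop();
      if (expected !== name) {
        issues.push(expected ?
          'Expected </' + expected + '> but found </' + name + '>' :
          'Unexpected </' + name + '>');
      }
    } else if (!isSelfClosing) {
      openTags.push(name);
    }
  }
  // Anything still open at the end was never closed.
  openTags.forEach(function(unclosed) {
    issues.push('Unclosed <' + unclosed + '>');
  });
  return issues;
}
```

    With this sketch, '<p>Line<br/>break</p>' produces no issues, while '<p>Line<br>break</p>' reports that </br> was expected where </p> was found.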


    grunt.initConfig({
      checkPages: {
        development: {
          options: {
            pageUrls: [
              'http://localhost:8080/',
              'http://localhost:8080/blog',
              'http://localhost:8080/about.html'
            ],
            checkLinks: true,
            onlySameDomainLinks: true,
            disallowRedirect: false,
            linksToIgnore: [
              'http://localhost:8080/broken.html'
            ],
            checkXhtml: true
          }
        },
        production: {
          options: {
            pageUrls: [
              'http://example.com/',
              'http://example.com/blog',
              'http://example.com/about.html'
            ],
            checkLinks: true,
            checkXhtml: true
          }
        }
      }
    });
    

    Something I find useful (and outline above) is to define separate configurations for development and production. My development configuration limits itself to links within the blog and ignores some that don't work when I'm self-hosting. My production configuration tests everything across a broader set of pages. This lets me iterate quickly during development while validating the live deployment more thoroughly.
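
    One possible way to trim the repetition between the two targets is to use Grunt's task-level options, which Grunt merges with each target's own options (target values win). The following is a sketch of a complete Gruntfile under that approach, not my actual configuration; the shortened pageUrls lists are placeholders.

```javascript
// Gruntfile.js (sketch): shared settings live in task-level options.
function gruntfile(grunt) {
  grunt.initConfig({
    checkPages: {
      // Task-level options apply to every target unless a target overrides them.
      options: {
        checkLinks: true,
        checkXhtml: true
      },
      development: {
        options: {
          pageUrls: ['http://localhost:8080/'],
          onlySameDomainLinks: true,
          linksToIgnore: ['http://localhost:8080/broken.html']
        }
      },
      production: {
        options: {
          pageUrls: ['http://example.com/']
        }
      }
    }
  });

  // Load the plugin so the checkPages task is available.
  grunt.loadNpmTasks('grunt-check-pages');
}

module.exports = gruntfile;
```

    A single target can then be run with grunt checkPages:development or grunt checkPages:production.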

    If you'd like to incorporate grunt-check-pages into your workflow, you can get it via grunt-check-pages on npm or grunt-check-pages on GitHub. And if you have any feedback, please let me know!


    Footnote: grunt-check-pages is not a site crawler; it looks at exactly the set of pages you ask it to. If you're looking for a crawler, you may be interested in something like grunt-link-checker (though I haven't used it myself).

    Tags: Grunt Node.js Utilities Web