The blog of dlaa.me

A trip down memory (footprint) lane [Download for the original TextAnalysisTool, circa 2001]

As you might guess from the name, TextAnalysisTool.NET (introductory blog post, related links) was not the first version of the tool. The original implementation was written in C, compiled for x86, slightly less capable, and named simply TextAnalysisTool. I got an email asking for a download link recently, so I dug up a copy and am posting it for anyone who's interested.

The UI should be very familiar to TextAnalysisTool.NET users:

The original TextAnalysisTool filtering a simple file

The behavior is mostly the same as well (though the different hot key for "add filter" trips me up pretty consistently).

A few notes:

  • The code is over 13 years old
  • So I'm not taking feature requests :)
  • But it runs on vintage operating systems (seriously, this is before Windows XP)
  • And it also runs great on Windows 8.1 (yay backward compatibility!)
  • It supports:
    • Text filters
    • Regular expressions
    • Markers
    • Find
    • Go to
    • Reload
    • Copy/paste
    • Saved configurations
    • Multi-threading
  • But does not support:
    • Colors
    • Rich selection
    • Rich copy
    • Line counts
    • Filter hot keys
    • Plugins
    • Unicode

Because it uses ASCII encoding for strings (vs. .NET's Unicode representation), you can reasonably expect loading a text file in TextAnalysisTool to use about half as much memory as it does in TextAnalysisTool.NET. However, as a 32-bit application, TextAnalysisTool is limited to the standard 2GB virtual address space of 32-bit processes on Windows (even on a 64-bit OS). On the other hand, TextAnalysisTool.NET is an architecture-neutral application and can use the full 64-bit virtual address space on a 64-bit OS. There may be rare machine configurations where the physical/virtual memory situation is such that the older TextAnalysisTool can load a file the newer TextAnalysisTool.NET can't - so if you're stuck, give it a try!

Aside: If you're really adventurous, you can try using EditBin to set the /LARGEADDRESSAWARE option on TextAnalysisTool.exe to get access to more virtual address space on a 64-bit OS or via /3GB on a 32-bit OS. But be warned that you're well into "undefined behavior" territory because I don't think that switch even existed when I wrote TextAnalysisTool. I've tried it briefly and things seem to work - but this is definitely sketchy. :)

Writing the original TextAnalysisTool was a lot of fun and contributed significantly to a library of C utility functions I used at the time called ToolBox. It also provided an excellent conceptual foundation upon which to build TextAnalysisTool.NET in addition to good lessons about how to approach the problem space. If I ever get around to writing a third version (TextAnalysisTool.WPF? TextAnalysisTool.Next?), it will take inspiration from both projects - and handle absurdly-large files.

So if you're curious to try a piece of antique software, click here to download the original TextAnalysisTool.

But for everything else, you should probably click here to download the newer TextAnalysisTool.NET.

"That's a funny looking warthog", a post about mocking Grunt [gruntMock is a simple mock for testing Grunt.js multi-tasks]

While writing the grunt-check-pages task for Grunt.js, I wanted a way to test the complete lifecycle: to load the task in a test context, run it against various inputs, and validate the output. It didn't seem practical to call into Grunt itself, so I looked around for a mock implementation of Grunt. There were plenty of mocks for use with Grunt, but I didn't find anything that mocked the API itself. So I wrote a very simple one and used it for testing.

That worked well, so I wanted to formalize my gruntMock implementation and post it as an npm package for others to use. Along the way, I added a bunch of additional API support and pulled in domain-based exception handling for a clean, self-contained implementation. As I hoped, updating grunt-check-pages made its tests simpler and more consistent.

Although gruntMock doesn't implement the complete Grunt API, it implements enough of it that I expect most tasks to be able to use it pretty easily. If not, please let me know what's missing! :)


For more context, here's part of the introductory section of README.md:

gruntMock is a simple mock object that simulates the Grunt task runner for multi-tasks and can be easily integrated into a unit testing environment such as Nodeunit. gruntMock invokes tasks the same way Grunt does and exposes (almost) the same set of APIs. After providing input to a task, gruntMock runs and captures its output so tests can verify expected behavior. Task success and failure are unified, so it's easy to write positive and negative tests.

Here's what gruntMock looks like in a simple scenario under Nodeunit:

var gruntMock = require('gruntmock');
var example = require('./example-task.js');

exports.exampleTest = {

  pass: function(test) {
    test.expect(4);
    var mock = gruntMock.create({
      target: 'pass',
      files: [
        { src: ['unused.txt'] }
      ],
      options: { str: 'string', num: 1 }
    });
    mock.invoke(example, function(err) {
      test.ok(!err);
      test.equal(mock.logOk.length, 1);
      test.equal(mock.logOk[0], 'pass');
      test.equal(mock.logError.length, 0);
      test.done();
    });
  },

  fail: function(test) {
    test.expect(2);
    var mock = gruntMock.create({ target: 'fail' });
    mock.invoke(example, function(err) {
      test.ok(err);
      test.equal(err.message, 'fail');
      test.done();
    });
  }
};
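
For context, the tests above load an example-task.js that isn't shown. A hypothetical minimal version of such a Grunt multi-task - names are illustrative, with just enough behavior for the two tests to exercise - might look like this:

// example-task.js: a hypothetical minimal Grunt multi-task for the tests above
module.exports = function(grunt) {
  grunt.registerMultiTask('example', 'Example multi-task', function() {
    if (this.target === 'fail') {
      // Abort the task; gruntMock surfaces this to the callback as an Error
      grunt.fail.warn('fail');
    } else {
      // Log success; gruntMock captures this in its logOk array
      grunt.log.ok('pass');
    }
  });
};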


For a more in-depth example, have a look at the use of gruntMock by grunt-check-pages. That shows off integration with other mocks (specifically nock, a nice HTTP server mock) as well as the testOutput helper function that's used to validate each test case's output without duplicating code. It also demonstrates how gruntMock's unified handling of success and failure allows for clean, consistent testing of input validation, happy path, and failure scenarios.

To learn more - or experiment with gruntMock - visit gruntMock on npm or gruntMock on GitHub.

Happy mocking!

Say goodbye to dead links and inconsistent formatting [grunt-check-pages is a simple Grunt task to check various aspects of a web page for correctness]

As part of converting my blog to a custom Node.js app, I wrote a set of tests to validate its routes, structure, content, and behavior (using mocha/grunt-mocha-test). Most of these tests are specific to my blog, but some are broadly applicable and I wanted to make them available to anyone who was interested. So I created a Grunt plugin and published it to npm:

grunt-check-pages

An important aspect of creating web sites is to validate the structure and content of their pages. The checkPages task provides an easy way to integrate this testing into your normal Grunt workflow.

By providing a list of pages to scan, the task can:

  • Validate the links on each page all point to working content (checkLinks)
  • Validate each page's structure for XHTML compliance (checkXhtml)

Link validation is fairly uncontroversial: you want to ensure each hyperlink on a page points to valid content. grunt-check-pages supports the standard HTML link types (ex: <a href="..."/>, <img src="..."/>) and makes an HTTP HEAD request to each link to make sure it's valid. (Because some web servers misbehave, the task also tries a GET request before reporting a link broken.) There are options to limit checking to same-domain links, to disallow links that redirect, and to provide a set of known-broken links to ignore. (FYI: Links in draft elements (ex: picture) are not supported for now.)
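
To illustrate the fallback, here's a rough sketch of HEAD-then-GET checking using the request package - not the task's actual code, just the general shape of the approach:

// Illustrative sketch: verify a link with HEAD, falling back to GET
var request = require('request');

function checkLink(link, callback) {
  request.head(link, function(err, res) {
    if (!err && res.statusCode === 200) {
      return callback(null); // HEAD succeeded, link is valid
    }
    // Some servers mishandle HEAD, so retry with GET before reporting the link broken
    request.get(link, function(err, res) {
      callback((!err && res.statusCode === 200) ? null : new Error('Bad link: ' + link));
    });
  });
}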

XHTML compliance might be a little controversial. I'm not here to persuade you to love XHTML - but I do have some experience parsing HTML and can reasonably make a few claims:

  • HTML syntax errors are tricky for browsers to interpret and (historically) no two work the same way
  • Parsing ambiguity leads to rendering issues which create browser-specific quirks and surprises
  • HTML5 is more prescriptive about invalid syntax, but nothing beats a well-formed document
  • Being able to confidently parse web pages with simple tools is pleasant and quite handy
  • Putting a closing '/' on your img and br tags is a small price to pay for peace of mind :)

Accordingly, grunt-check-pages will (optionally) parse each page as XML and report the issues it finds.


grunt.initConfig({
  checkPages: {
    development: {
      options: {
        pageUrls: [
          'http://localhost:8080/',
          'http://localhost:8080/blog',
          'http://localhost:8080/about.html'
        ],
        checkLinks: true,
        onlySameDomainLinks: true,
        disallowRedirect: false,
        linksToIgnore: [
          'http://localhost:8080/broken.html'
        ],
        checkXhtml: true
      }
    },
    production: {
      options: {
        pageUrls: [
          'http://example.com/',
          'http://example.com/blog',
          'http://example.com/about.html'
        ],
        checkLinks: true,
        checkXhtml: true
      }
    }
  }
});

Something I find useful (and outlined above) is to define separate configurations for development and production. My development configuration limits itself to links within the blog and ignores some that don't work when I'm self-hosting. My production configuration tests everything across a broader set of pages. This lets me iterate quickly during development (ex: grunt checkPages:development) while validating the live deployment more thoroughly.

If you'd like to incorporate grunt-check-pages into your workflow, you can get it via grunt-check-pages on npm or grunt-check-pages on GitHub. And if you have any feedback, please let me know!


Footnote: grunt-check-pages is not a site crawler; it looks at exactly the set of pages you ask it to. If you're looking for a crawler, you may be interested in something like grunt-link-checker (though I haven't used it myself).

Just because I'm paranoid doesn't mean they're not out to get me [Open-sourcing PassWeb: A simple, secure, cloud-based password manager]

I've used a password manager for many years because I feel that's the best way to maintain different (strong!) passwords for every account. I chose Password Safe when it was one of the only options and stuck with it until a year or so ago. Its one limitation was becoming more and more of an issue: it only runs on Windows and only on a PC. I'm increasingly using Windows on other devices (ex: phone or tablet) or running other operating systems (ex: iOS or Linux), and was unable to access my passwords more and more frequently.

To address that problem, I decided to switch to a cloud-based password manager for access from any platform. Surveying the landscape, there appeared to be many good options (some free, some paid), but I had a fundamental concern about trusting important personal data to someone else. Recent, high-profile hacks of large corporations suggest that some companies don't try all that hard to protect their customers' data. Unfortunately, the incentives aren't there to begin with, and the consequences (to the company) of a breach are fairly small.

Instead, I thought it would be interesting to write my own cloud-based password manager - because at least that way I'd know the author had my best interests at heart. :) On the flip side, I introduce the risk of my own mistake or bug compromising the system. But all software has bugs - so "Better the evil you know (or can manage) than the evil you don't". The good news is that I've taken steps to try to make things more secure; the bad news is that it only takes one bug to throw everything out the window...

Hence the disclaimer:

I've tried to ensure PassWeb is safe and secure for normal use in low-risk environments, but do not trust me. Before using PassWeb, you should evaluate it against your unique needs, priorities, threats, and comfort level. If you find a problem or a weakness, please let me know so I can address it - but ultimately you use PassWeb as-is and at your own risk.

With that in mind, I'm open-sourcing PassWeb's code for others to use, play around with, or find bugs in. In addition to being a good way of sharing knowledge and helping others, this will satisfy the requests for code that I've already gotten. :)

Some highlights:

  • PassWeb's client is built with HTML, CSS, and JavaScript and is an offline-enabled single-page application.
  • It runs on all recent browsers (I've tried) and is mobile-friendly so it works well on phones, too.
  • Entries have a title, the name/password for an account, a link to the login page, and a notes section for additional details (like answers to security questions).
  • PassWeb can generate strong, random passwords; use them as-is, tweak them, or ignore them and type your own.
  • The small server component runs on ASP.NET or Node.js (I provide both implementations and a set of unit tests).
  • Data is encrypted via AES/CBC and only ever decrypted on the client (the server never sees the user name or password); a rough sketch of the approach follows this list.
  • The code is small, with few dependencies, and should be easy to audit.
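
For the curious, here's a minimal sketch of what client-side encryption along these lines can look like, using the standard Web Crypto API. This is illustrative only - it is not PassWeb's actual implementation (see the repository below for that), and the key-derivation parameters are assumptions:

// Minimal sketch of client-side AES/CBC encryption via the Web Crypto API
// (illustrative only - not PassWeb's actual code)
async function encryptEntries(password, plaintext) {
  var encoder = new TextEncoder();
  // Derive an AES key from the user's password (PBKDF2 parameters are assumptions)
  var salt = crypto.getRandomValues(new Uint8Array(16));
  var baseKey = await crypto.subtle.importKey(
    "raw", encoder.encode(password), "PBKDF2", false, ["deriveKey"]);
  var key = await crypto.subtle.deriveKey(
    { name: "PBKDF2", salt: salt, iterations: 100000, hash: "SHA-256" },
    baseKey, { name: "AES-CBC", length: 256 }, false, ["encrypt"]);
  // Encrypt with a random IV; the server only ever stores salt + IV + ciphertext
  var iv = crypto.getRandomValues(new Uint8Array(16));
  var ciphertext = await crypto.subtle.encrypt(
    { name: "AES-CBC", iv: iv }, key, encoder.encode(plaintext));
  return { salt: salt, iv: iv, ciphertext: ciphertext };
}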

For details, instructions, and the code, visit the GitHub repository: https://github.com/DavidAnson/PassWeb

If you find PassWeb interesting or useful, that's great! If you have any thoughts or suggestions, please tell me. If you find a bug, let me know and I'll try to fix it.

Another tool in the fight against spelling errors. [Added HtmlStringExtractor for pulling strings, attributes, and comments from HTML/XML files and URLs]

When I blogged simple string extraction utilities for C# and JavaScript code, I thought I'd covered the programming languages that matter most to me. Then I realized my focus was too narrow; I spend a lot of time working with markup languages and want content there to be correctly spelled as well.

So I dashed off another string extractor based on the implementation of JavaScriptStringExtractor:

HtmlStringExtractor Runs on Node.js. Dependencies via npm. htmlparser2 for parsing, glob for globbing, and request for web access. Wildcard matching includes subdirectories when ** is used. Pass a URL to extract strings from the web.

HtmlStringExtractor works just like its predecessors, outputting all strings/attributes/comments to the console for redirection or processing. But it has an additional power: the ability to access content directly from the internet via URL. Because so much of the web is HTML, it seemed natural to support live extraction - which in turn makes it easier to spell-check a web site and be sure you're including "hidden" text (like title and alt attributes) that copy+paste doesn't cover.
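
At its core, the extraction is a thin layer over htmlparser2's streaming callbacks. A minimal sketch (not the tool's exact code; error handling omitted) of pulling text, attribute values, and comments from a file:

// Minimal sketch of text/attribute/comment extraction with htmlparser2
var fs = require('fs');
var htmlparser = require('htmlparser2');

var parser = new htmlparser.Parser({
  ontext: function(text) {
    if (text.trim()) { console.log(text.trim()); }
  },
  onattribute: function(name, value) {
    if (value.trim()) { console.log(value.trim()); }
  },
  oncomment: function(data) {
    if (data.trim()) { console.log(data.trim()); }
  }
});
parser.write(fs.readFileSync(process.argv[2], 'utf8'));
parser.end();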

Its HTML parser is very forgiving, so HtmlStringExtractor will happily work with HTML-like languages such as XML, ASPX, and PHP. Of course, the utility of doing so decreases as the language gets further removed from HTML, but for many scenarios the results are quite acceptable. In the specific case of XML, output should be entirely meaningful, filtering out all element and attribute metadata and leaving just the "real" data for review.

In keeping with the theme of "small and simple", I didn't add an option to exclude attributes by name - but you can imagine that filtering out things like id, src, and href would do a lot to reduce noise. Who knows, maybe I'll support that in a future update. :)

For now, things are simple. The StringExtractors GitHub repository has the complete code for all three extractors.

Enjoy!

Aside: As I wrote this post, I realized there's another "language" I use regularly: JSON. Because of its simple structure, I don't think there's a need for JsonStringExtractor - but if you feel otherwise, please let me know! (It'd be easy to create.)

Spelling is hard; let's go parsing. [Simple command-line utilities to extract strings and comments from C# and JavaScript code]

Maybe it's old fashioned, but I try to spell things correctly. I don't always succeed, so I rely on tools to help.

Integrated spell-checking for documents and email is ubiquitous, but there's not much support for source code. I've previously written about a code analysis rule I implemented for .NET/FxCop, but that doesn't help with JavaScript (which I've been using more and more).

Sometimes I'll copy+paste source code into Microsoft Word, but that's an act of true desperation because there are so many false positives for keywords, variables, syntax, and the like. So I wrote a simple tool to reduce the noise:

JavaScriptStringExtractor Runs on Node.js. Dependencies via npm. Esprima for parsing and glob for globbing. Wildcard matching includes subdirectories when ** is used.

It's a simple command-line utility to extract strings and comments from a JavaScript file and write the results to standard output. Redirect that output to a file and open it in Word for a much improved spell-checking experience!
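
The core of the extraction is only a few lines thanks to Esprima. Here's a rough sketch - assuming the Esprima API of that era, and leaving out the globbing and option handling the real tool adds:

// Rough sketch of string/comment extraction with Esprima
// (not JavaScriptStringExtractor's exact code)
var fs = require('fs');
var esprima = require('esprima');

var source = fs.readFileSync(process.argv[2], 'utf8');
// String literals come from the token stream
esprima.tokenize(source).forEach(function(token) {
  if (token.type === 'String') {
    console.log(token.value.slice(1, -1)); // strip the surrounding quotes
  }
});
// Comments come from a parse with comment collection enabled
esprima.parse(source, { comment: true }).comments.forEach(function(comment) {
  console.log(comment.value.trim());
});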

Granted, it's still not ideal, but it is quick and painless and filters out most of the noise. I scan for red squiggles, update the code, and get on with life, feeling better about myself for the effort. :)

JavaScript was so easy, I wrote a version for C#, too:

CSharpStringExtractor Runs on .NET. Dependencies via NuGet. Roslyn for parsing. Wildcard matching includes subdirectories.

It's pretty much the same tool, just for a different language.

With JavaScript and C# taken care of, my needs are satisfied - but there are lots more languages out there and maybe other people want to try.

So I created a GitHub repository with the complete code for both tools: StringExtractors

Enjoy!

PS - Pull requests for other languages are welcome. :)

Too liberal a definition of "input" [Raising awareness of a quirk of the input event for text elements with a placeholder on Internet Explorer]

I tracked down a strange problem in a web app recently that turned out to be the result of a browser bug. I wasted some time because searching for info wasn't all that helpful, so I'm blogging the details to save someone else a bit of trouble.

The scenario is simple: the application deals with private information and wants to log the user out after a period of inactivity. It does so by listening to the input event of its text boxes and resetting an inactivity timer whenever the user types. (There's similar handling for button clicks, etc.) If the inactivity timer ever fires, the user gets logged out. Because the freshly-loaded application has no private data, there's no need to start the inactivity timer until after the first user input.

This behavior is pretty straightforward and the initial implementation worked as intended. But there was a problem on Internet Explorer (versions 10 and 11): a newly-loaded instance of the app would reload itself even without any input! The issue turned out to be that an input element with its placeholder attribute set also had its autofocus attribute set - and when IE set focus to that element, it triggered the input event! The app (wrongly) believed the user had interacted with it and started its inactivity timer.

Fortunately, this is not catastrophic because the app fails securely (though more by chance than design). But it's still annoying and a bit of a nuisance...

Trying to detect and ignore the bogus first interaction on a single browser seemed dodgy, so I went with a call to setTimeout to clear the inactivity timer immediately after load on all browsers. Sometimes, this is harmlessly unnecessary; on IE, it undoes the effect of the bogus input event and achieves the desired behavior.
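
In code, the workaround amounts to something like the following sketch (names and timeout are illustrative; logOut is a hypothetical app function):

// Illustrative sketch of the inactivity timer and the post-load fix
var inactivityTimer = null;

function resetInactivityTimer() {
  clearTimeout(inactivityTimer);
  inactivityTimer = setTimeout(logOut, 15 * 60 * 1000); // logOut: hypothetical
}

// Any typing (re)starts the countdown
$('input').on('input', resetInactivityTimer);

// Immediately after load, clear any timer started by IE's bogus input event
setTimeout(function() {
  clearTimeout(inactivityTimer);
  inactivityTimer = null;
}, 0);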

Depending on your scenario, a different approach might be better - the biggest challenge is understanding what's going on in the first place! :)

Notes:

  • The Internet Explorer team is aware of and acknowledges the issue - you can track its status here

  • I created a simple demonstration:

    <!DOCTYPE html>
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>Input event behavior for text elements with/out a placeholder</title>
    </head>
    <body>
        <!-- Text boxes with/without placeholder text -->
        <input id="without" type="text"/>
        <input id="with" type="text" placeholder="Placeholder" autofocus="autofocus"/>
    
        <!-- Reference jQuery for convenience -->
        <script src="//code.jquery.com/jquery-1.11.0.min.js"></script>
    
        <script>
            // Helper function for logging
            function log(message) {
                $("body").append($("<div>" + message + "</div>"));
            }
    
            // Handler for the input event
            function onInput(evt) {
                log("input event for element '" + evt.target.id + "', value='" + evt.target.value + "'");
            }
    
            // Attach input handler to all input elements
            $("input").on("input", onInput);
    
            // Provide brief context
            log("&nbsp;");
            log("Click in and out of both text boxes to test your browser.");
            log("If you see an input event immediately on load, that's a bug.");
            log("&nbsp;");
        </script>
    </body>
    </html>
    

Avoiding unnecessary dependencies by writing a few lines of code [A simple URL rewrite implementation for Node.js apps under iisnode removes the need for the URL Rewrite module]

As the administrator of multiple computers, I want the machines I run to be as stable and secure as possible. (Duh!) One way I go about that is by not installing anything that isn't absolutely necessary. So whenever I see a dependency that's providing only a tiny bit of functionality, I try to get rid of it. Such was the case with the URL Rewrite module for IIS when I migrated my blog to Node.js.

Aside: I'm not hating: URL Rewrite does some cool stuff, lots of people love it, and it's recommended by Microsoft. However, my sites don't use it, so it represented a new dependency and was therefore something I wanted to avoid. :)

The role URL Rewrite plays in the Node.js scenario is minimal - it points all requests to app.js, passes along the original URL for the Node app to handle, and that's about it. Tomasz Janczuk outlines the recommended configuration in his post Using URL rewriting with node.js applications hosted in IIS using iisnode.


I figured I could do the same thing with a few lines of code by defining a simple IHttpModule implementation in App_Code. So I did! :)

The class I wrote is named IisNodeUrlRewriter and using it is easy. Starting from a working Node.js application on iisnode (refer to Hosting node.js applications in IIS on Windows for guidance), all you need to do is:

  1. Put a copy of IisNodeUrlRewriter.cs in the App_Code folder
  2. Update the site's web.config to load the new module
  3. Profit!

Here's what the relevant part of web.config looks like:

<configuration>
  <system.webServer>
    <modules>
      <add name="IisNodeUrlRewriter" type="IisNodeUrlRewriter"/>
    </modules>
  </system.webServer>
</configuration>


Once enabled and running, IisNodeUrlRewriter rewrites all incoming requests to /app.js except for:

  • Requests at or under /iisnode, iisnode's default log directory
  • Requests at or under /app.js/debug, iisnode's node-inspector entry-point
  • Requests at or under /app.js.debug, the directory iisnode creates to support node-inspector

If you prefer the name server.js, want to block one of the above paths in production, or are running Node as part of a larger ASP.NET site (as I do), tweak the code to fit your scenario.


I modified the standard Node.js "hello world" sample to prove everything works and show the original request URL getting passed through:

require("http").createServer(function (req, res) {
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("Requested URL: " + req.url);
}).listen(process.env.PORT, "127.0.0.1");
console.log("Server running...");

Sample output:

http://localhost/app.js
Requested URL: /app.js
http://localhost/
Requested URL: /
http://localhost/any/path?query=params
Requested URL: /any/path?query=params

I also verified that IIS's default logging of requests remains correct (though you can't tell that from the output above).

Note: I've found that accessing the debug path through IisNodeUrlRewriter can be somewhat finicky and take a few tries to load successfully. I'm not sure why, but because I don't use that functionality, I haven't spent much time investigating.


The implementation of IisNodeUrlRewriter is straightforward and boils down to two calls to HttpContext.RewritePath that leverage ASP.NET's default execution pipeline. One handy thing is the use of HttpContext.Items to save/restore the original URI. One obscure thing is the use of a single regular expression to match all three (actually six!) paths above. (If you're not comfortable with regular expressions or prefer to be more explicit, the relevant check can easily be turned into a set of string comparisons.)

The implementation is included below in its entirety; it ends up being more comments than code! I've also created a GitHub Gist for IisNodeUrlRewriter in case anyone wants to iterate on it or make improvements. (Some ideas: auto-detect the logging/debug paths by parsing web.config/iisnode.yml, use server.js instead of app.js when present, support redirects for only parts of the site, etc.)


// Copyright (c) 2014 by David Anson, http://dlaa.me/
// Released under the MIT license, http://opensource.org/licenses/MIT

using System;
using System.Text.RegularExpressions;
using System.Web;

/// <summary>
/// An IHttpModule that rewrites all URLs to /app.js (making the original URL available to the Node.js application).
/// </summary>
/// <remarks>
/// For use with Node.js applications running under iisnode on IIS as an alternative to installing the URL Rewrite module.
/// </remarks>
public class IisNodeUrlRewriter : IHttpModule
{
    /// <summary>
    /// Unique lookup key for the HttpContext.Items dictionary.
    /// </summary>
    private const string ItemsKey_PathAndQuery = "__IisNodeUrlRewriter_PathAndQuery__";

    /// <summary>
    /// Regex matching paths that should not be rewritten.
    /// </summary>
    /// <remarks>
    /// Specifically: /iisnode(/...) /app.js/debug(/...) and /app.js.debug(/...)
    /// </remarks>
    private Regex _noRewritePaths = new Regex(@"^/(iisnode|app\.js[/\.]debug)(/.*)?$", RegexOptions.IgnoreCase);

    /// <summary>
    /// Initializes a module and prepares it to handle requests.
    /// </summary>
    /// <param name="context">An HttpApplication that provides access to the methods, properties, and events common to all application objects within an ASP.NET application.</param>
    public void Init(HttpApplication context)
    {
        context.BeginRequest += HandleBeginRequest;
        context.PreRequestHandlerExecute += HandlePreRequestHandlerExecute;
    }

    /// <summary>
    /// Occurs as the first event in the HTTP pipeline chain of execution when ASP.NET responds to a request.
    /// </summary>
    /// <param name="sender">The source of the event.</param>
    /// <param name="e">An System.EventArgs that contains no event data.</param>
    private void HandleBeginRequest(object sender, EventArgs e)
    {
        // Get context
        var application = (HttpApplication)sender;
        var context = application.Context;

        // Rewrite all paths *except* those to the default log directory or those used for debugging
        if (!_noRewritePaths.IsMatch(context.Request.Path))
        {
            // Save original path
            context.Items[ItemsKey_PathAndQuery] = context.Request.Url.PathAndQuery;

            // Rewrite path
            context.RewritePath("/app.js");
        }
    }

    /// <summary>
    /// Occurs just before ASP.NET starts executing an event handler (for example, a page or an XML Web service).
    /// </summary>
    /// <param name="sender">The source of the event.</param>
    /// <param name="e">An System.EventArgs that contains no event data.</param>
    private void HandlePreRequestHandlerExecute(object sender, EventArgs e)
    {
        // Get context
        var application = (HttpApplication)sender;
        var context = application.Context;

        // Restore original path (if present)
        var originalPath = context.Items[ItemsKey_PathAndQuery] as string;
        if (null != originalPath)
        {
            context.RewritePath(originalPath);
        }
    }

    /// <summary>
    /// Disposes of the resources (other than memory) used by the module that implements IHttpModule.
    /// </summary>
    public void Dispose()
    {
        // Nothing to do here; implementation of this method is required by IHttpModule
    }
}

Plug it in, plug it in [Sample code for two TextAnalysisTool.NET plug-ins demonstrates support for custom file types]

A few days ago, @_yabloki tweeted asking how to write a TextAnalysisTool.NET plug-in. I've answered this question a few times in email, but never blogged it before now.

To understand the basis of the question, you need to know what TextAnalysisTool.NET is; for that, I refer you to the TextAnalysisTool.NET page for an overview.

Animated GIF showing basic TextAnalysisTool.NET functionality

To understand the rest of the question, you need to know what a plug-in is; for that, there's the following paragraph from the documentation:

TextAnalysisTool.NET's support for plug-ins allows users to add in their own
code that understands specialized file types.  Every time a file is opened,
each plug-in is given a chance to take responsibility for parsing that file.
When a plug-in takes responsibility for parsing a file, it becomes that plug-
in's job to produce a textual representation of the file for display in the
usual line display.  If no plug-in supports a particular file, then it gets
opened using TextAnalysisTool.NET's default parser (which displays the file's
contents directly).  One example of what a plug-in could do is read a binary
file format and produce meaningful textual output from it (e.g., if the file is
compressed or encrypted).  Another plug-in might add support for the .zip
format and display a list of the files within the archive.  A particularly
ambitious plug-in might translate text files from one language to another.  The
possibilities are endless!


Armed with an understanding of TextAnalysisTool.NET and its support for plug-ins, we're ready to look at the interface plug-ins must implement:

namespace TextAnalysisTool.NET.Plugin
{
    /// <summary>
    /// Interface that all TextAnalysisTool.NET plug-ins must implement
    /// </summary>
    internal interface ITextAnalysisToolPlugin
    {
        /// <summary>
        /// Gets a meaningful string describing the type of file supported by the plug-in
        /// </summary>
        /// <remarks>
        /// Used to populate the "Files of type" combo box in the Open file dialog
        /// </remarks>
        /// <example>
        /// "XML Files"
        /// </example>
        /// <returns>descriptive string</returns>
        string GetFileTypeDescription();

        /// <summary>
        /// Gets the file type pattern describing the type(s) of file supported by the plug-in
        /// </summary>
        /// <remarks>
        /// Used to populate the "Files of type" combo box in the Open file dialog
        /// </remarks>
        /// <example>
        /// "*.xml"
        /// </example>
        /// <returns>file type pattern</returns>
        string GetFileTypePattern();

        /// <summary>
        /// Indicates whether the plug-in is able to parse the specified file
        /// </summary>
        /// <param name="fileName">full path to the file</param>
        /// <remarks>
        /// Called whenever a file is being opened to give the plug-in a chance to handle it;
        /// ideally the result can be returned based solely on the file name, but it is
        /// acceptable to open, read, and close the file if necessary
        /// </remarks>
        /// <returns>true iff the file is supported</returns>
        bool IsFileTypeSupported(string fileName);

        /// <summary>
        /// Returns a TextReader instance that will be used to read the specified file
        /// </summary>
        /// <param name="fileName">full path to the file</param>
        /// <remarks>
        /// The only methods that will be called (and therefore need to be implemented) are
        /// TextReader.ReadLine() and IDisposable.Dispose()
        /// </remarks>
        /// <returns>TextReader instance</returns>
        System.IO.TextReader GetReaderForFile(string fileName);
    }
}

Disclaimer: I wrote TextAnalysisTool.NET many years ago as a way to learn the (then) newly-released .NET 1.0 Framework. Extensibility frameworks like MEF weren't available yet, so please forgive the omission! :)


As you can see, the plug-in interface is simple, straightforward, automatically integrates into the standard File|Open UI, and leaves a great deal of freedom around implementation and function. Specifically, the TextReader instance returned by GetReaderForFile can do pretty much whatever you want. For example:

  • Simple tweaks to the input (ex: normalizing time stamps)
  • Filtering of the input (ex: to remove irrelevant lines)
  • Complex transformations of the input (ex: format conversions)
  • Completely unrelated data (ex: input from a network socket)

There's a lot of flexibility, and maybe the open-endedness is daunting? :) To make things concrete, I've packaged two of the samples I came up with during the original plug-in definition.


TATPlugin_SampleData

Loads files named like 3.lines and renders that many lines of sample text into the display.

Input (file name):

3.lines

Output:

1: The quick brown fox jumps over a lazy dog.
2: The quick brown fox jumps over a lazy dog.
3: The quick brown fox jumps over a lazy dog.


TATPlugin_XMLFormatter

Loads well-formed XML and pretty-prints it for easier reading.

Input:

<root><element><nested>value</nested></element><element><shallow><deep>value</deep></shallow></element></root>

Output:

<root>
  <element>
    <nested>value</nested>
  </element>
  <element>
    <shallow>
      <deep>value</deep>
    </shallow>
  </element>
</root>


[Click here to download the source code and supporting files for the sample TextAnalysisTool.NET plug-ins]

The download ZIP also includes Plugin.cs (the file defining the above interface), a few sample data files, and some trivial Build.cmd scripts to compile everything from a Visual Studio Developer Command Prompt (or similar environment where csc.exe and MSBuild.exe are available).

Note: When experimenting with the samples, remember that TextAnalysisTool.NET loads its plug-ins from the current directory at startup. So put a copy of TextAnalysisTool.NET (and its .config file) alongside the DLL outputs in the root of the samples directory and remember to re-start it if you change one of the samples. To check that plug-ins are loaded successfully, use the Help|Installed plug-ins menu item.

Aside: Plug-ins are generally UI-less, but they don't have to be - take a look at what Tomer did with the WPPFormatter plug-in for an example.

It's not much of a prompt if the user can't see it [Raising awareness of window.prompt's troublesome behavior in Internet Explorer on Windows 8]

For as long as people have been running scripts on web pages, window.prompt has been a simple way to get user input. Although it rarely offers the ideal user experience, simplicity and ubiquity make window.prompt a good way to ask the user a question and get a simple response. And as part of the HTML5 specification, you know you can trust window.prompt with all your cross-browser prompting needs!

Well, unless you're running the metro/immersive/Windows Store/non-desktop version of Internet Explorer on Windows 8.

Then you're in trouble.

You see, window.prompt has a strange implementation in immersive IE on Windows 8.x: it doesn't actually prompt the user. Instead, it silently returns undefined without doing anything! To make matters worse, this behavior doesn't seem to be documented.

The result is a certain amount of surprise and workarounds by people who rely on window.prompt behaving as specified.

Unfortunately, this is the kind of thing you don't know is broken until a customer reports a problem. :( But fortunately, once you know about this behavior, you can detect it and take steps to accommodate it. I got burned recently, so I've written this post to raise awareness in the hope that I can save some of you a bit of trouble.

In brief, there are three classes of behavior for window.prompt in Internet Explorer on Windows 8:

User action              Return value
Provide a response       string (possibly empty)
Cancel the prompt        null
Running immersive IE     undefined (no prompt)
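
Code that wants to behave correctly everywhere can account for all three outcomes explicitly - a quick sketch:

// Handle all three possible outcomes of window.prompt
var response = window.prompt('Question?', 'default');
if (typeof response === 'string') {
  console.log('User responded: "' + response + '"');
} else if (response === null) {
  console.log('User cancelled the prompt');
} else {
  console.log('Prompt was never shown (immersive IE on Windows 8)');
}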

If this post helps just one person, I'll consider it a success. :)

Aside: While the choice of undefined for the immersive IE scenario is a reasonable one (it being a "falsy" value like null), I first thought it was at odds with the HTML5 specification. But on closer reading I think this behavior falls outside the specification which only concerns itself with scenarios where the prompt is displayed. So while I'd say immersive IE is violating the spec by not showing the prompt, its use of undefined as a return value is sensible and valid. :)

Further aside: If you think the chart above is all you need, you may still be in for a surprise. This page claims the Opera browser returns undefined when a user cancels the prompt - a violation of (my understanding of) the HTML5 specification.