Posts tagged "Technical"

Another tool in the fight against spelling errors. [Added HtmlStringExtractor for pulling strings, attributes, and comments from HTML/XML files and URLs]

Tuesday, July 1, 2014

When I blogged simple string extraction utilities for C# and JavaScript code, I thought I'd covered the programming languages that matter most to me. Then I realized my focus was too narrow; I spend a lot of time working with markup languages and want content there to be correctly spelled as well.

So I dashed off another string extractor based on the implementation of JavaScriptStringExtractor:

HtmlStringExtractor Runs on Node.js. Dependencies via npm. htmlparser2 for parsing, glob for globbing, and request for web access. Wildcard matching includes subdirectories when ** is used. Pass a URL to extract strings from the web.

HtmlStringExtractor works just like its predecessors, outputting all strings/attributes/comments to the console for redirection or processing. But it has an additional power: the ability to access content directly from the internet via URL. Because so much of the web is HTML, it seemed natural to support live-extraction - which in turn makes it easier to spell-check a web site and be sure you're including "hidden" text (like title and alt attributes) that copy+paste don't cover.

Its HTML parser is very forgiving, so HtmlStringExtractor will happily work with HTML-like languages such as XML, ASPX, and PHP. Of course, the utility of doing so decreases as the language gets further removed from HTML, but for many scenarios the results are quite acceptable. In the specific case of XML, output should be entirely meaningful, filtering out all element and attribute metadata and leaving just the "real" data for review.

In keeping with the theme of "small and simple", I didn't add an option to exclude attributes by name - but you can imagine that filtering out things like id, src, and href would do a lot to reduce noise. Who knows, maybe I'll support that in a future update. :)

For now, things are simple. The StringExtractors GitHub repository has the complete code for all three extractors.

Enjoy!

Aside: As I wrote this post, I realized there's another "language" I use regularly: JSON. Because of its simple structure, I don't think there's a need for JsonStringExtractor - but if you feel otherwise, please let me know! (It'd be easy to create.)

Tags: Technical Utilities

Spelling is hard; let's go parsing. [Simple command-line utilities to extract strings and comments from C# and JavaScript code]

Thursday, June 5, 2014

Maybe it's old fashioned, but I try to spell things correctly. I don't always succeed, so I rely on tools to help.

Integrated spell-checking for documents and email is ubiquitous, but there's not much support for source code. I've previously written about a code analysis rule I implemented for .NET/FxCop, but that doesn't help with JavaScript (which I've been using more and more).

Sometimes I'll copy+paste source code into Microsoft Word, but that's an act of true desperation because there are so many false positives for keywords, variables, syntax, and the like. So I wrote a simple tool to reduce the noise:

JavaScriptStringExtractor Runs on Node.js. Dependencies via npm. Esprima for parsing and glob for globbing. Wildcard matching includes subdirectories when ** is used.

It's a simple command-line utility to extract strings and comments from a JavaScript file and write the results to standard output. Redirect that output to a file and open it in Word for a much improved spell-checking experience!

Granted, it's still not ideal, but it is quick and painless and filters out most of the noise. I scan for red squiggles, update the code, and get on with life, feeling better about myself for the effort. :)

JavaScript was so easy, I wrote a version for C#, too:

CSharpStringExtractor Runs on .NET. Dependencies via NuGet. Roslyn for parsing. Wildcard matching includes subdirectories.

It's pretty much the same tool, just for a different language.

With JavaScript and C# taken care of, my needs are satisfied - but there are lots more languages out there and maybe other people want to try.

So I created a GitHub repository with the complete code for both tools: StringExtractors

Enjoy!

PS - Pull requests for other languages are welcome. :)

Tags: Technical Utilities

Too liberal a definition of "input" [Raising awareness of a quirk of the input event for text elements with a placeholder on Internet Explorer]

Tuesday, May 13, 2014

I tracked down a strange problem in a web app recently that turned out to be the result of a browser bug. I wasted some time because searching for info wasn't all that helpful and I wanted to blog the details so I could save someone else a bit of trouble.

The scenario is simple; this application deals with private information and wants to log the user out after a period of inactivity. It does so by listening to the input event of its text boxes and resetting an inactivity timer whenever the user types. (There's similar handling for button clicks, etc..) If the inactivity timer ever fires, the user gets logged out. Because the freshly-loaded application has no private data, there's no need to start the inactivity timer until after the first user input.

This behavior is pretty straightforward and the initial implementation worked as intended. But there was a problem on Internet Explorer (versions 10 and 11): a newly-loaded instance of the app would reload itself even without any input! What turned out to be the issue was that an input element with its placeholder attribute set also had its autofocus attribute set and when IE set focus to that element, it triggered the input event! The app (wrongly) believed the user had interacted with it and started its inactivity timer.

Fortunately, this is not catastrophic because the app fails securely (though more by chance than design). But it's still annoying and a bit of a nuisance...

Trying to detect and ignore the bogus first interaction on a single browser seemed dodgy, so I went with a call to setTimeout to clear the inactivity timer immediately after load on all browsers. Sometimes, this is harmlessly unnecessary; on IE, it undoes the effect of the bogus input event and achieves the desired behavior.

Depending on your scenario, a different approach might be better - the biggest challenge is understanding what's going on in the first place! :)

Notes:

The Internet Explorer team is aware of and acknowledges the issue - you can track its status here

I created a simple demonstration:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>Input event behavior for text elements with/out a placeholder</title>
</head>
<body>
    <!-- Text boxes with/without placeholder text -->
    <input id="without" type="text"/>
    <input id="with" type="text" placeholder="Placeholder" autofocus="autofocus"/>

    <!-- Reference jQuery for convenience -->
    <script src="//code.jquery.com/jquery-1.11.0.min.js"></script>

    <script>
        // Helper function for logging
        function log(message) {
            $("body").append($("<div>" + message + "</div>"));
        }

        // Handler for the input event
        function onInput(evt) {
            log("input event for element '" + evt.target.id + "', value='" + evt.target.value + "'");
        }

        // Attach input handler to all input elements
        $("input").on("input", onInput);

        // Provide brief context
        log("&nbsp;");
        log("Click in and out of both text boxes to test your browser.");
        log("If you see an input event immediately on load, that's a bug.");
        log("&nbsp;");
    </script>
</body>
</html>

Tags: Technical Web

Avoiding unnecessary dependencies by writing a few lines of code [A simple URL rewrite implementation for Node.js apps under iisnode removes the need for the URL Rewrite module]

Friday, May 2, 2014

As the administrator of multiple computers, I want the machines I run to be as stable and secure as possible. (Duh!) One way I go about that is by not installing anything that isn't absolutely necessary. So whenever I see a dependency that's providing only a tiny bit of functionality, I try to get rid of it. Such was the case with the URL Rewrite module for IIS when I migrated my blog to Node.js.

Aside: I'm not hating: URL Rewrite does some cool stuff, lots of people love it, and it's recommended by Microsoft. However, my sites don't use it, so it represented a new dependency and was therefore something I wanted to avoid. :)

The role URL Rewrite plays in the Node.js scenario is minimal - it points all requests to app.js, passes along the original URL for the Node app to handle, and that's about it. Tomasz Janczuk outlines the recommended configuration in his post Using URL rewriting with node.js applications hosted in IIS using iisnode.

I figured I could do the same thing with a few lines of code by defining a simple IHttpModule implementation in App_Code. So I did! :)

The class I wrote is named IisNodeUrlRewriter and using it is easy. Starting from a working Node.js application on iisnode (refer to Hosting node.js applications in IIS on Windows for guidance), all you need to do is:

Put a copy of IisNodeUrlRewriter.cs in the App_Code folder
Update the site's web.config to load the new module
Profit!

Here's what the relevant part of web.config looks like:

<configuration>
  <system.webServer>
    <modules>
      <add name="IisNodeUrlRewriter" type="IisNodeUrlRewriter"/>
    </modules>
  </system.webServer>
</configuration>

Once enabled and running, IisNodeUrlRewriter rewrites all incoming requests to /app.js except for:

Requests at or under /iisnode, iisnode's default log directory
Requests at or under /app.js/debug, iisnode's node-inspector entry-point
Requests at or under /app.js.debug, the directory iisnode creates to support node-inspector

If you prefer the name server.js, you want to block one of the above paths in production, or you're running Node as part of a larger ASP.NET site (as I do), tweak the code to fit your scenario.

I modified the standard Node.js "hello world" sample to prove everything works and show the original request URL getting passed through:

require("http").createServer(function (req, res) {
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("Requested URL: " + req.url);
}).listen(process.env.PORT, "127.0.0.1");
console.log("Server running...");

Sample output:

http://localhost/app.js
Requested URL: /app.js

http://localhost/
Requested URL: /

http://localhost/any/path?query=params
Requested URL: /any/path?query=params

I also verified that IIS's default logging of requests remains correct (though you can't tell that from the output above).

Note: I've found that accessing the debug path through IisNodeUrlRewriter can be somewhat finicky and take a few tries to load successfully. I'm not sure why, but because I don't use that functionality, I haven't spent much time investigating.

The implementation of IisNodeUrlRewriter is straightforward and boils down to two calls to HttpContext.RewritePath that leverage ASP.NET's default execution pipeline. One handy thing is the use of HttpContext.Items to save/restore the original URI. One obscure thing is the use of a single regular expression to match all three (actually six!) paths above. (If you're not comfortable with regular expressions or prefer to be more explicit, the relevant check can easily be turned into a set of string comparisons.)

The implementation is included below in its entirety; it ends up being more comments than code! I've also created a GitHub Gist for IisNodeUrlRewriter in case anyone wants to iterate on it or make improvements. (Some ideas: auto-detect the logging/debug paths by parsing web.config/iisnode.yml, use server.js instead of app.js when present, support redirects for only parts of the site, etc.)

// Copyright (c) 2014 by David Anson, http://dlaa.me/
// Released under the MIT license, http://opensource.org/licenses/MIT

using System;
using System.Text.RegularExpressions;
using System.Web;

/// <summary>
/// An IHttpModule that rewrites all URLs to /app.js (making the original URL available to the Node.js application).
/// </summary>
/// <remarks>
/// For use with Node.js applications running under iisnode on IIS as an alternative to installing the URL Rewrite module.
/// </remarks>
public class IisNodeUrlRewriter : IHttpModule
{
    /// <summary>
    /// Unique lookup key for the HttpContext.Items dictionary.
    /// </summary>
    private const string ItemsKey_PathAndQuery = "__IisNodeUrlRewriter_PathAndQuery__";

    /// <summary>
    /// Regex matching paths that should not be rewritten.
    /// </summary>
    /// <remarks>
    /// Specifically: /iisnode(/...) /app.js/debug(/...) and /app.js.debug(/...)
    /// </remarks>
    private Regex _noRewritePaths = new Regex(@"^/(iisnode|app\.js[/\.]debug)(/.*)?$", RegexOptions.IgnoreCase);

    /// <summary>
    /// Initializes a module and prepares it to handle requests.
    /// </summary>
    /// <param name="context">An HttpApplication that provides access to the methods, properties, and events common to all application objects within an ASP.NET application.</param>
    public void Init(HttpApplication context)
    {
        context.BeginRequest += HandleBeginRequest;
        context.PreRequestHandlerExecute += HandlePreRequestHandlerExecute;
    }

    /// <summary>
    /// Occurs as the first event in the HTTP pipeline chain of execution when ASP.NET responds to a request.
    /// </summary>
    /// <param name="sender">The source of the event.</param>
    /// <param name="e">An System.EventArgs that contains no event data.</param>
    private void HandleBeginRequest(object sender, EventArgs e)
    {
        // Get context
        var application = (HttpApplication)sender;
        var context = application.Context;

        // Rewrite all paths *except* those to the default log directory or those used for debugging
        if (!_noRewritePaths.IsMatch(context.Request.Path))
        {
            // Save original path
            context.Items[ItemsKey_PathAndQuery] = context.Request.Url.PathAndQuery;

            // Rewrite path
            context.RewritePath("/app.js");
        }
    }

    /// <summary>
    /// Occurs just before ASP.NET starts executing an event handler (for example, a page or an XML Web service).
    /// </summary>
    /// <param name="sender">The source of the event.</param>
    /// <param name="e">An System.EventArgs that contains no event data.</param>
    private void HandlePreRequestHandlerExecute(object sender, EventArgs e)
    {
        // Get context
        var application = (HttpApplication)sender;
        var context = application.Context;

        // Restore original path (if present)
        var originalPath = context.Items[ItemsKey_PathAndQuery] as string;
        if (null != originalPath)
        {
            context.RewritePath(originalPath);
        }
    }

    /// <summary>
    /// Disposes of the resources (other than memory) used by the module that implements IHttpModule.
    /// </summary>
    public void Dispose()
    {
        // Nothing to do here; implementation of this method is required by IHttpModule
    }
}

Tags: Node.js Technical Web

Plug it in, plug it in [Sample code for two TextAnalysisTool.NET plug-ins demonstrates support for custom file types]

Tuesday, April 29, 2014

A few days ago, @_yabloki tweeted asking how to write a TextAnalysisTool.NET plug-in. I've answered this question a few times in email, but never blogged it before now.

To understand the basis of the question, you need to know what TextAnalysisTool.NET is; for that, I refer you to the TextAnalysisTool.NET page for an overview.

Animated GIF showing basic TextAnalysisTool.NET functionality

To understand the rest of the question, you need to know what a plug-in is; for that, there's the following paragraph from the documentation:

TextAnalysisTool.NET's support for plug-ins allows users to add in their own
code that understands specialized file types.  Every time a file is opened,
each plug-in is given a chance to take responsibility for parsing that file.
When a plug-in takes responsibility for parsing a file, it becomes that plug-
in's job to produce a textual representation of the file for display in the
usual line display.  If no plug-in supports a particular file, then it gets
opened using TextAnalysisTool.NET's default parser (which displays the file's
contents directly).  One example of what a plug-in could do is read a binary
file format and produce meaningful textual output from it (e.g., if the file is
compressed or encrypted).  Another plug-in might add support for the .zip
format and display a list of the files within the archive.  A particularly
ambitious plug-in might translate text files from one language to another.  The
possibilities are endless!

Armed with an understanding of TextAnalysisTool.NET and its support for plug-ins, we're ready to look at the interface plug-ins must implement:

namespace TextAnalysisTool.NET.Plugin
{
    /// <summary>
    /// Interface that all TextAnalysisTool.NET plug-ins must implement
    /// </summary>
    internal interface ITextAnalysisToolPlugin
    {
        /// <summary>
        /// Gets a meaningful string describing the type of file supported by the plug-in
        /// </summary>
        /// <remarks>
        /// Used to populate the "Files of type" combo box in the Open file dialog
        /// </remarks>
        /// <example>
        /// "XML Files"
        /// </example>
        /// <returns>descriptive string</returns>
        string GetFileTypeDescription();

        /// <summary>
        /// Gets the file type pattern describing the type(s) of file supported by the plug-in
        /// </summary>
        /// <remarks>
        /// Used to populate the "Files of type" combo box in the Open file dialog
        /// </remarks>
        /// <example>
        /// "*.xml"
        /// </example>
        /// <returns>file type pattern</returns>
        string GetFileTypePattern();

        /// <summary>
        /// Indicates whether the plug-in is able to parse the specified file
        /// </summary>
        /// <param name="fileName">full path to the file</param>
        /// <remarks>
        /// Called whenever a file is being opened to give the plug-in a chance to handle it;
        /// ideally the result can be returned based solely on the file name, but it is
        /// acceptable to open, read, and close the file if necessary
        /// </remarks>
        /// <returns>true iff the file is supported</returns>
        bool IsFileTypeSupported(string fileName);

        /// <summary>
        /// Returns a TextReader instance that will be used to read the specified file
        /// </summary>
        /// <param name="fileName">full path to the file</param>
        /// <remarks>
        /// The only methods that will be called (and therefore need to be implemented) are
        /// TextReader.ReadLine() and IDisposable.Dispose()
        /// </remarks>
        /// <returns>TextReader instance</returns>
        System.IO.TextReader GetReaderForFile(string fileName);
    }
}

Disclaimer: I wrote TextAnalysisTool.NET many years ago as a way to learn the (then) newly-released .NET 1.0 Framework. Extensibility frameworks like MEF weren't available yet, so please forgive the omission! :)

As you can see, the plug-in interface is simple, straightforward, automatically integrates into the standard File|Open UI, and leaves a great deal of freedom around implementation and function. Specifically, the TextReader instance returned by GetReaderForFile can do pretty much whatever you want. For example:

Simple tweaks to the input (ex: normalizing time stamps)
Filtering of the input (ex: to remove irrelevant lines)
Complex transformations of the input (ex: format conversions)
Completely unrelated data (ex: input from a network socket)

There's a lot of flexibility, and maybe the open-endedness is daunting? :) To make things concrete, I've packaged two of the samples I came up with during the original plug-in definition.

TATPlugin_SampleData

Loads files named like 3.lines and renders that many lines of sample text into the display.

Input (file name):

3.lines

Output:

1: The quick brown fox jumps over a lazy dog.
2: The quick brown fox jumps over a lazy dog.
3: The quick brown fox jumps over a lazy dog.

TATPlugin_XMLFormatter

Loads well-formed XML and pretty-prints it for easier reading.

Input:

<root><element><nested>value</nested></element><element><shallow><deep>value</deep></shallow></element></root>

Output:

<root>
  <element>
    <nested>value</nested>
  </element>
  <element>
    <shallow>
      <deep>value</deep>
    </shallow>
  </element>
</root>

[Click here to download the source code and supporting files for the sample TextAnalysisTool.NET plug-ins]

The download ZIP also includes Plugin.cs (the file defining the above interface), a few sample data files, and some trivial Build.cmd scripts to compile everything from a Visual Studio Developer Command Prompt (or similar environment where csc.exe and MSBuild.exe are available).

Note: When experimenting with the samples, remember that TextAnalysisTool.NET loads its plugins from the current directory at startup. So put a copy of TextAnalysisTool.NET (and its .config file) alongside the DLL outputs in the root of the samples directory and remember to re-start it if you change one of the samples. To check that plug-ins are loaded successfully, use the Help|Installed plug-ins menu item.

Aside: Plug-ins are generally UI-less, but they don't have to be - take a look at what Tomer did with the WPPFormatter plug-in for an example.

Tags: Technical TextAnalysisTool Utilities

It's not much of a prompt if the user can't see it [Raising awareness of window.prompt's troublesome behavior in Internet Explorer on Windows 8]

Friday, April 4, 2014

For as long as people have been running scripts on web pages, window.prompt has been a simple way to get user input. Although it rarely offers the ideal user experience, simplicity and ubiquity make window.prompt a good way ask the user a question and get a simple response. And as part of the HTML5 specification, you know you can trust window.prompt with all your cross-browser prompting needs!

Well, unless you're running the metro/immersive/Windows Store/non-desktop version of Internet Explorer on Windows 8.

Then you're in trouble.

You see, window.prompt has a strange implementation in immersive IE on Windows 8.x: it doesn't actually prompt the user. Instead, it silently returns undefined without doing anything! To make matters worse, this behavior doesn't seem to be documented.

The result is a certain amount of surprise and workarounds by people who rely on window.prompt behaving as specified.

Unfortunately, this is the kind of thing you don't know is broken until a customer reports a problem. :( But fortunately, once you know about this behavior, you can detect it and take steps to accommodate it. I got burned recently, so I've written this post to raise awareness in the hope that I can save some of you a bit of trouble.

In brief, there are three classes of behavior for window.prompt in Internet Explorer on Windows 8:

User Action	Return value
Provide a response	`string` (possibly empty)
Cancel the prompt	`null`
Running immersive IE	`undefined` (no prompt)

If this post helps just one person, I'll consider it a success. :)

Aside: While the choice of undefined for the immersive IE scenario is a reasonable one (it being a "falsy" value like null), I first thought it was at odds with the HTML5 specification. But on closer reading I think this behavior falls outside the specification which only concerns itself with scenarios where the prompt is displayed. So while I'd say immersive IE is violating the spec by not showing the prompt, its use of undefined as a return value is sensible and valid. :)

Further aside: If you think the chart above is all you need, you may still be in for a surprise. This page claims the Opera browser returns undefined when a user cancels the prompt - a violation of (my understanding of) the HTML5 specification.

Tags: Technical Web

Is it still cheating if you come up with the cheat yourself? [Simple code to solve a "sliding pieces" puzzle]

Tuesday, July 16, 2013

I was playing around with one of those "rearrange the pieces on the board" puzzles recently and realized I was stumped after unsuccessfully making the same moves over and over and over...:|

Aside: If you're not familiar with sliding puzzles, the 15-puzzle and Klotski puzzle are both classic examples.

For the particular puzzle I was stuck on, the board looks like:

  ######
 #      #
 #      #
 #      #
#        #
 #  ##  #
 #  ##  #
  ##  ##

And uses the following seven pieces:

AA  BB  CC  DD   E  FF   G
AA  BB  CC  D   EE   F   GG

The challenge is to get the 'A' piece from here:

  ######
 #      #
 #      #
 #      #
#        #
 #AA##  #
 #AA##  #
  ##  ##

To here:

  ######
 #      #
 #      #
 #      #
#        #
 #  ##AA#
 #  ##AA#
  ##  ##

With the starting state:

  ######
 #DDBBFF#
 #DEBBGF#
 #EECCGG#
#   CC   #
 #AA##  #
 #AA##  #
  ##  ##

Go ahead and give it a try if you want a challenge! You can make your own puzzle out of paper cutouts or build something with interlocking cubes (which I can say from experience works quite well).

Aside: I'm not going to reveal where the original puzzle came from because I don't want to spoil it for anyone. But feel free to leave a comment if it looks familiar!

Now, maybe the puzzle's solution is/was obvious to you, but like I said, I was stuck. As it happens, I was in a stubborn mood and didn't want to admit defeat, so I decided to write a simple program to solve the puzzle for me!

I'd done this before (many years ago), and had a decent sense of what was involved; I managed to bang out the solution below pretty quickly. My goal was to solve the puzzle with minimal effort on my part - so the implementation favors simplicity over performance and there's a lot of room for improvement. Still, it finds the solution in about a second and that's more than quick enough for my purposes.:)

I've included the complete implementation below. The code should be fairly self-explanatory, so read on if you're interested in one way to solve something like this. Two things worth calling out are that this approach is guaranteed to find the solution with the fewest number of moves and it can handle arbitrarily-shaped pieces and boards - both of which flow pretty naturally from the underlying design.

There's not much more to say - except that you should feel free to reuse the code for your own purposes!

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Quick (minimally-optimized) solver for a typical "slide the pieces" puzzle.
// * Implements breadth-first search to guarantee it finds the optimal solution
// * Detects already-seen states and avoids analyzing them again
// * Handles arbitrarily shaped pieces and irregular/non-contiguous boards
// Performance notes:
// * Board.GetHashCode is called *very* often; keeping it fast is ideal
// * Board.IsValid is also called frequently and should be quick to run
// * Board._locations could have the board location removed (it's always 0)
// * It may be more efficient check validity before creating Board instances
class SlidePuzzleSolver
{
    public static void Main()
    {
#if DEBUG
        // Quick sanity check of the Board class
        var test = new Board();
        Debug.Assert(test.IsValid);
        Debug.Assert(!test.IsSolved);
#endif

        // Stopwatch is handy for measuring/improving performance
        var stopwatch = new Stopwatch();
        stopwatch.Start();

        // Initialize
        Board start = new Board();
        Board solved = null;
        var seen = new HashSet<Board>(start); // IEqualityComparer<Board>
        var todo = new Queue<Board>();
        todo.Enqueue(start);
        seen.Add(start);

        // Keep going as long as there are unseen states...
        while (0 < todo.Count)
        {
            // Get the next board and process its moves
            var board = todo.Dequeue();
            foreach (var move in board.GetMoves())
            {
                if (move.IsSolved)
                {
                    // Solved!
                    solved = move;
                    todo.Clear();
                    break;
                }
                if (!seen.Contains(move))
                {
                    // Enqueue the new state
                    todo.Enqueue(move);
                    seen.Add(move);
                }
            }
        }

        // Write elapsed time to debug window
        stopwatch.Stop();
        Debug.WriteLine("Elapsed time: {0}ms", stopwatch.ElapsedMilliseconds);

        // Reverse the solved->start parent chain
        Debug.Assert(null != solved);
        var solution = new Stack<Board>();
        while (null != solved)
        {
            solution.Push(solved);
            solved = solved.Parent;
        }

        // Display the solution start->solved
        foreach (var board in solution)
        {
            board.Show();
        }
    }

    // Representation of a board and the arrangement of pieces on it
    // (IEqualityComparer<Board> is used by HashSet<Board> above)
    private class Board : IEqualityComparer<Board>
    {
        // Constants for board size
        private const int Width = 10;
        private const int Height = 8;

        // Statics for pieces and moves
        private static readonly IReadOnlyList<Piece> _pieces;
        private static readonly IReadOnlyList<int> _deltas;

        static Board()
        {
            // Initialize (shared) piece instances
            // Pieces are defined by cells they occupy as deltas from their origin (top-left)
            // Cells are numbered 0..N going from top-left to bottom-right
            // The board is treated as a piece, but never allowed to move
            var pieces = new List<Piece>();
            pieces.Add(new Piece('A', 0, 1, 10, 11)); // Square
            pieces.Add(new Piece('B', 0, 1, 10, 11)); // Square
            pieces.Add(new Piece('C', 0, 1, 10, 11)); // Square
            pieces.Add(new Piece('D', 0, 1, 10));     // 'L'
            pieces.Add(new Piece('E', 1, 10, 11));    // 'L'
            pieces.Add(new Piece('F', 0, 1, 11));     // 'L'
            pieces.Add(new Piece('G', 0, 10, 11));    // 'L'
            pieces.Add(new Piece('#', 2, 3, 4, 5, 6, 7, 11, 18, 21, 28, 31, 38, 40, 49, 51, 54, 55, 58, 61, 64, 65, 68, 72, 73, 76, 77)); // Irregular board shape
            _pieces = pieces.AsReadOnly();
            // Initialize move deltas (each represents one cell left/right/up/down)
            _deltas = new int[] { -1, 1, -Width, Width };
        }

        // Piece locations
        private readonly int[] _locations;

        // Parent board
        public Board Parent { get; private set; }

        // Create starting state of the puzzle
        public Board()
        {
            // Board piece (last element) is always at offset 0
            _locations = new int[] { 52, 14, 34, 12, 22, 16, 26, 0 };
        }

        // Create a board from its parent
        private Board(Board parent, int[] locations)
        {
            Parent = parent;
            _locations = locations;
        }

        // Get the valid moves from the current board state
        public IEnumerable<Board> GetMoves()
        {
            // Try to move each piece (except for the board)...
            for (var p = 0; p < _pieces.Count - 1; p++)
            {
                // ... in each direction...
                foreach (var delta in _deltas)
                {
                    // ... to create the corresponding board...
                    var locations = (int[])_locations.Clone();
                    locations[p] += delta;
                    var board = new Board(this, locations);
                    // ... and return it if it's valid
                    if (board.IsValid)
                    {
                        yield return board;
                    }
                }
            }
        }

        // Checks whether a board is valid (i.e., has no overlapping cells)
        public bool IsValid
        {
            get
            {
                // Array to track occupied cells
                var locations = new bool[Width * Height];
                // For each piece (including the board)...
                for (var p = 0; p < _pieces.Count; p++)
                {
                    var piece = _pieces[p];
                    var offsets = piece.Offsets;
                    // ... for each cell it occupies...
                    for (var o = 0; o < offsets.Length; o++)
                    {
                        // ... check if the cell is occupied...
                        var location = _locations[p] + offsets[o];
                        if (locations[location])
                        {
                            // Already occupied; invalid board
                            return false;
                        }
                        // ... and mark it occupied
                        locations[location] = true;
                    }
                }
                return true;
            }
        }

        // Checks if the board is solved
        public bool IsSolved
        {
            get
            {
                // All that matters in *this* puzzle is whether the 'A' piece is at its destination
                return (56 == _locations[0]);
            }
        }

        // Show the board to the user
        public void Show()
        {
            // Clear the console
            Console.Clear();
            // For each piece (including the board)...
            for (var p = 0; p < _pieces.Count; p++)
            {
                var piece = _pieces[p];
                // ... for each offset...
                foreach (var offset in piece.Offsets)
                {
                    // ... determine the x,y of the cell...
                    var location = _locations[p] + offset;
                    var x = location % Width;
                    var y = location / Width;
                    // ... and plot it on the console
                    Console.SetCursorPosition(x, y);
                    Console.Write(piece.Marker);
                }
            }
            // Send the cursor to the bottom and wait for a key
            Console.SetCursorPosition(0, Height);
            Console.ReadKey();
        }

        // IEqualityComparer<Board> implemented on this class for convenience

        // Checks if two boards are identical
        public bool Equals(Board x, Board y)
        {
            return Enumerable.SequenceEqual(x._locations, y._locations);
        }

        // Gets a unique-ish hash code for the board
        // XORs the shifted piece locations into an int
        public int GetHashCode(Board b)
        {
            var hash = 0;
            var shift = 0;
            foreach (var i in b._locations)
            {
                hash ^= (i << shift);
                shift += 4;
            }
            return hash;
        }
    }

    // Representation of a piece, its visual representation, and the cells it occupies
    private class Piece
    {
        public char Marker { get; private set; }
        public int[] Offsets { get; private set; }

        public Piece(char marker, params int[] offsets)
        {
            Marker = marker;
            Offsets = offsets;
        }
    }
}

Tags: Miscellaneous Technical

64 bits ought to be enough for everybody [TextAnalysisTool.NET update for .NET 2.0 and 64-bit enables the analysis of larger files!]

Monday, May 20, 2013

TextAnalysisTool.NET is a free program designed to excel at viewing, searching, and navigating large files quickly and efficiently.

I wrote the first version of TextAnalysisTool back in 2000 using C++ and Win32. In 2003, I rewrote it using the new .NET 1.0 Framework - and upgraded to .NET 1.1 later that year. There were a variety of improvements between then and 2006, the date of the last update to TextAnalysisTool.NET. Along the way, I've heard from a lot of people who use this tool to simplify their daily workflow! In the past year, I've started getting requests for 64-bit support from folks working with extremely large files that don't fit in the 4GB virtual address space limits of a normal 32-bit process on Windows. Although .NET 1.1 didn't support 64-bit processes, .NET 2.0 does, and I've decided it's finally time to take the plunge.:)

Animated GIF showing basic TextAnalysisTool.NET functionality

With this release, TextAnalysisTool.NET has been compiled using the .NET 2.0 toolset and the AnyCPU option which automatically matches a process to the architecture of its host operating system. On 32-bit OSes, TextAnalysisTool.NET will continue to run as a 32-bit process, but on 64-bit OSes, it will run as a 64-bit process and have access to a significantly larger address space. This makes it possible to work with larger log files without the risk of crashing into the 4GB memory limit (which can end up being as low as 1.7 GB in practice)!

Other than a few exceedingly minor string updates, I have made no changes to the way TextAnalysisTool.NET behaves - so the new version should feel just like the previous one. The framework update means .NET 1.1 is no longer a supported platform and .NET 2.0 is now natively supported. The included .config file allows the same executable to run under .NET 4.0 as-is (for example on Windows 8 without the optional ".NET Framework 3.5" feature installed).

If you've ever run out of memory using TextAnalysisTool.NET, please give this new version a try! And if not, go ahead and continue using the previous version without worrying that you're missing out on anything.:)

Click here to download the latest version of TextAnalysisTool.NET

Click here to visit the TextAnalysisTool.NET web page for more information

Many thanks to everyone for all the great feedback - I love getting messages from people around the world who are using TextAnalysisTool.NET to make their lives easier!

Aside: As a matter of technical interest, details on the one bug I found with 64-bit TextAnalysisTool.NET: the following code had worked fine for the last decade (message.WParam is an IntPtr via Form.WndProc for WM_MOUSEWHEEL):
int wheelDelta = ((int)message.WParam)>>16; // HIWORD(WPARAM)
However, when that assignment ran under .NET 2.0 on a 64-bit OS, it quickly threw OverflowException! I was surprised, but it turns out this is documented behavior. Because that line interoperates with the Windows API, I couldn't change the types involved - but I could tweak the code to avoid the exception by avoiding the problematic explicit conversion:
int wheelDelta = ((int)((long)message.WParam))>>16; // HIWORD(WPARAM)
Yep, the proverbial one-line fix saves the day!

Tags: Technical TextAnalysisTool Utilities

"If I have seen further, it is by standing on the shoulders of giants" [An alternate implementation of HTTP gzip decompression for Windows Phone]

Thursday, April 19, 2012

The HTTP protocol supports data compression of network traffic via the Content-Encoding header. Compressing network traffic is beneficial because it reduces the amount of data that needs to be transmitted over the network - and sending fewer bytes obviously takes less time! The tradeoff is that it takes a bit of extra work to decompress the data, but because the bottleneck is nearly always network, HTTP compression should be a win pretty much every time. And when you're dealing with comparatively slow, unreliable networks like the ones used by cell phones, the advantages of compression are even more significant.

Aside: Because most networks are lossy and transfer data in packets, sending just one fewer byte can be meaningful if it reduces the number of packets.

You might reasonably expect that enabling compression for Windows Phone web requests is as simple as setting the HttpWebRequest.AutomaticDecompression property available since .NET 2.0. Unfortunately, this property is not supported by current versions of the Windows Phone platform, so it's up to application developers to add HTTP compression support themselves. Consequently, a number of home-grown solutions have cropped up.

One popular example is Morten Nielsen's GZipWebClient. Naturally, Morten didn't want to implement his own compression library, so he's using SharpZipLib to do the heavy lifting. This is a great example of reuse, but it's important to note that SharpZipLib is licensed under the GNU General Public License (GPL) and some developers won't be comfortable with the implications of using GPL code in their own project. (For more on what those implications are, the Wikipedia article for GPL has a fairly detailed overview which includes mention of ambiguities around the definition of "derivative works".)

Aside: Of course, there are other options. Another popular library is DotNetZip which claims to be under the MS-PL (Microsoft Public License), the same permissive "do pretty much whatever you want with the code" license I use for this blog. However, a brief look at the license files it comes with suggests maybe that's not the whole story because there are at least four other licenses called out there.

As you can probably tell, I'm not the biggest fan of lawyers, ambiguity, or unnecessary risk [ :) ], so I'm always happy to have a simple, "no restrictions" solution - even if that means I have to create one myself. In this case, what I wanted was a way to decompress gzip data on Windows Phone without needing a separate library. That's when I remembered another of Morten's posts, this one about unzipping files under Silverlight. I figured that maybe if I put the chocolate in the peanut butter, I could come up with a simple, dependency-free solution to the gzip problem on Windows Phone!

The basic idea is to create a bit of code that customizes the user's WebClient/HttpWebRequest to add the Accept-Encoding header. Once that's in place, servers that support gzip automatically compress their response bodies. To decompress the downloaded response on the phone, another bit of code is used to wrap the compressed response stream in a ZIP archive and hand it off to Application.GetResourceStream which does the heavy lifting and provides access to the decompressed response stream. What's nice about this technique is that the decompression implementation is part of the Silverlight framework - meaning applications don't need to pull in a bunch of external code or increase their size!

I wanted this to be easy, so I've created a WebClient subclass and all you have to do is change this:

client = new WebClient();

Into this:

client = new GzipWebClient();

After which your application's HTTP requests will automatically benefit from gzip compression!

Of course, some people like to work a little closer to the metal and I've done a similar thing for the HttpWebRequest/HttpWebResponse crowd. Change this:

request = WebRequest.CreateHttp(_uri);
request.BeginGetResponse(callback, request);
// ... response = (HttpWebResponse)request.EndGetResponse(result);
stream = response.GetResponseStream();
// ...

Into this:

request = WebRequest.CreateHttp(_uri);
request.BeginGetCompressedResponse(callback, request);
// ... response = (HttpWebResponse)request.EndGetResponse(result);
stream = response.GetCompressedResponseStream();
// ...

And you're set!

Great, that's all well and good, but does any of this really matter? Less is clearly more, but is there actually a noticeable difference when using gzip? I wanted to answer that question for myself, so the sample application for this post not only exercises the code I've written, it also acts as a simple, real-time performance report! The sample makes continuous requests for http://microsoft.com/ using WebClient, standard HttpWebRequest/HttpWebResponse, my custom GzipWebClient, my Compressed helpers for HttpWebRequest/HttpWebResponse, and (optionally) Morten's GZipWebClient (for comparison purposes). As the data is collected, it's charted via the Silverlight Toolkit's Data Visualization library (which I've previously shown running on Windows Phone).

Here's what it looks like running in the emulator using a wired connection:

In the chart above, you can clearly see which requests are using gzip and which aren't! Not only are the gzip-enabled requests noticeably faster on average, they're nearly always faster even in the worst case. What's more, while there's variability for both kinds of requests (that's part of how the internet works), the delta between best/worst times of gzip-compressed requests is smaller (i.e., they're more consistent).

That's pretty compelling data, but the real benefit comes when the phone's data connection is used. Here's the output from the same app using the cell network (via Excel this time):

All our previous observations remain true - and are even more pronounced in this scenario. Gzip-compressed HTTP requests are significantly faster (taking less than half the time) and more predictable than traditional requests for the same data.

Awesome!

Aside: Based on the chart above, it seems reasonable to claim all three gzip solutions are equivalent. That said, if you squint just right, it looks like using HttpWebRequest is - on average - marginally quicker than using WebClient (as you'd expect; WebClient calls HttpWebRequest under the covers). Additionally, SharpGIS.GZipWebClient appears to be - on average - very slightly quicker than Delay.GzipWebClient (which is also not surprising when you consider the hoops my code jumps through to avoid the external dependency). Bear in mind, though, that these differences only really show up at the millisecond level, and seem unlikely to be significant for most real-world scenarios.

[Click here to download the source code for the gzip helper classes and the sample application shown above]

[Click here to visit the NuGet gallery page for Delay.GzipWebClient which contains the code for both gzip helper classes]

Notes:

Using the code I've authored is quite easy:
- If you want to use GzipWebClient, simply add the GzipWebClient.cs and GzipExtensions.cs files to your project, reference the Delay namespace, and replace any instances of WebClient.
- If you're a HttpWebRequest/HttpWebResponse fan, you only need to add GzipExtensions.cs to your project, reference the Delay namespace, and invoke the extension methods HttpWebRequest.BeginGetCompressedResponse and HttpWebResponse.GetCompressedResponseStream (as shown above).
By default, the sample application does not include Morten's GZipWebClient because I don't want to get into the business of distributing someone else's code. However it's easy to include - the following comment in MainPage.xaml.cs tells you how:
```
// For an additional scenario, un-comment the following line and install the "SharpGIS.GZipWebClient" NuGet package
//#define SHARPGIS
```
The code to use that assembly is already in the sample application; you just need to provide the bits if you want to try it out. :)
The implementation of the gzip-to-ZIP wrapper is fairly straightforward: it reads the gzip-compressed response from the server, wraps it in a stream that represents a valid ZIP archive with a single file compressed using the deflate algorithm which both specifications share, and hands that off to the Silverlight framework to return a decompressed stream for that file. To be clear, the compressed data isn't altered - it's simply re-packaged into a format that's more useful. :) If you're interested in the specifics, you'll probably want to familiarize yourself with the gzip specification and the ZIP file specification and then have a look at the code.
There are two minor inefficiencies in my code, one of which seems unavoidable and the other of which could be dealt with.
- The unavoidable one - and the reason I think SharpZipLib might be a smidge quicker - is that the size/checksum data for the gzip data is provided at the end of the download stream, but is needed at the beginning of the wrapper stream. If the correct values aren't used, the ZIP wrapper will be rejected - but those values are not known until the entire response has been downloaded. Therefore, it's not possible for my implementation to proactively process the data while it is being downloaded; the download must be buffered instead. A decompression library won't suffer from this limitation and ought to be quicker as a result.
- The avoidable inefficiency is that a single copy of the input data is made when creating the ZIP wrapper stream. To be clear, this is the only time data is copied (I've structured the code so there isn't any buffer resizing, etc.), but it's not technically necessary because the ZIP wrapper stream could operate directly from the download buffers. I may make this improvement in the future, but expect the performance difference to be negligible.

Compressing HTTP traffic is a pretty clear win for desktop applications - and an even bigger benefit for mobile apps communicating over slower, less reliable cellular networks. The Windows Phone platform doesn't make using gzip easy, but it's possible if you're sufficiently motivated. For people who aren't intimidated by licenses, there are already some good options for gzip - but for those who aren't comfortable with what's out there, I'm offering a new option under one of the most permissive licenses around.

I hope you find it useful!

Tags: Technical Silverlight Silverlight Toolkit Windows Phone

"I never did mind about the little things..." [Free HideShowDesktopButtonOnTaskbar utility hides the "Show desktop" button on the Windows taskbar]

Tuesday, January 24, 2012

Over the holidays, a relative emailed me the following rant:

Most of my conversation with technical support was to find out how to turn off the "Show desktop" icon in the lower right corner of the screen. This icon can be engaged by clicking on it or hovering over it (if you select Peek). On a desktop or a laptop this is NO PROBLEM. But on a tablet, there is no difference between clicking and hovering so Peek is meaningless. VERY OFTEN when using and holding the slate/tablet the heel of my thumb ever so gently touches the "Show desktop" icon and I am taken to the desktop in the middle of what I am doing. As I work, I am in constant fear of touching this icon. I know that everything on my hard drive will not be erased if I do, but it is very annoying. I went on the internet and found a program that turns the icon off and it worked (so it can be done), but then my antivirus program said that it was a THREAT AND ADVISED ME TO PUT IT IN A VAULT WHERE IT COULD NOT HARM MY COMPUTER. I agreed and the program was deleted. I'm not the only one with this problem. The internet has many complaining about the "Show desktop" icon. MS allows me to turn the Clock and Volume icon off; why not the "Show Desktop" icon?????

If you want a somewhat more positive spin, "Aero Peek" and the "Show desktop" button are described in this article about new Windows 7 taskbar features. I don't use either myself, but a bit of internet searching confirmed some people really don't like these features.

The best way to disable a feature is to use an officially supported mechanism - and the good news is that disabling Aero Peek is easy to do by following the directions "To turn off desktop previews" near the bottom of this article. However, I was not able to find similar support for disabling the "Show desktop" button, so I resorted to looking for the next best thing: a group policy setting or documented registry key. Unfortunately, I struck out there, too. But I did find this snippet from a Channel 9 video where Gov Maharaj confirms there's no built-in way to disable the button.

Aside: The relevant discussion is interesting, so maybe have a look even if you're not opposed to the feature itself!

Well, if there's no official way to remove the "Show desktop" button and enough people want to do it, then it's time to start considering other solutions. According to the original rant (and the video discussion), there already are third-party utilities for this purpose - although they're not supported by Microsoft. But it sounds like at least one of these tools might be malware, so I'm not super enthusiastic about trying them out...

Fortunately, I have a rudimentary understanding of both Windows and programming [ :) ], so I wondered if the simplest, most obvious trick would work here. I coded it up one night while waiting for something to compile and was pleased to find that it worked! So I've gone ahead and prettied the code up and am sharing a simple utility to get rid of the Windows 7 taskbar's "Show desktop" button:

[Click here to download the HideShowDesktopButtonOnTaskbar utility and its complete source code.]

Of course, HideShowDesktopButtonOnTaskbar is just as unsupported and unofficial as the other utilities out there - so why choose it?

I've included the complete source code for HideShowDesktopButtonOnTaskbar, so you can review everything and re-compile it yourself if you're paranoid. Also, you can be pretty confident I'm not a 1337 h4x0r trying to root your box. :)
HideShowDesktopButtonOnTaskbar makes no persistent changes to the machine, so there are no lingering effects and a simple logoff is all it takes to restore everything to the way it was.
HideShowDesktopButtonOnTaskbar is simple, small, unobtrusive, and easy to use - just add it to your Startup group to have it run every time you log into Windows!

Okay, enough with the goofy sales pitch... how about some developer notes?

As I mentioned, HideShowDesktopButtonOnTaskbar works the simplest way you can imagine: it finds the window corresponding to the "Show desktop" button and hides it. Obviously, hidden windows aren't visible - but they also don't receive input (clicks or hover status), so although the taskbar is still listening for input messages, they don't get sent. Running HideShowDesktopButtonOnTaskbar a second time finds and unhides the "Show desktop" button, restoring things back to how they started. Logging off disposes of the entire taskbar and logging back on creates a new one from scratch, so HideShowDesktopButtonOnTaskbar's changes don't persist.
I figured out the right window class to target by using the handy-dandy Spy++ Windows development tool. Specifically, I ran Spy++, clicked the "Find Window" tool, dragged the crosshairs over the "Show desktop" button, and hit OK. That showed the window hierarchy to the right beginning with the desktop window at the top and going down to the "Show desktop" button. Translating that into three nested calls to FindWindowEx was trivial, as was adding a call to ShowWindow to hide the window. Using IsWindowVisible to unhide the button when it was already hidden was just icing on the cake. :)

Aside: If the window hierarchy changes or if any of the hard-coded class names is different in a newer version of Windows, this will stop working... Yep, that's how it is with simple hacks like this - if it matters to anyone, I can always tweak things to accommodate.
Because HideShowDesktopButtonOnTaskbar ended up being easy to write, I was looking for a bit more challenge and set out to make it small. Specifically, the executable is just 11,776 bytes - and 7,015 bytes of that is due to the icon! While I could have squeezed a few more bytes out if I felt like it, the big win was by not linking to the standard C Run-Time library (the use of which results in a 41,472 byte file assuming static linking (i.e., /MT) is used to remove the dependency on MSVCR100.dll). Eschewing the CRT is an advanced scenario, but it was easy to do for HideShowDesktopButtonOnTaskbar because it's so small and simple.

Aside: This is why the code's entry point is named WinMainCRTStartup instead of the usual WinMain. And by the way: if you're interested in reading more about what it means to get rid of the default CRT, Matt Pietrek's classic "Reduce EXE and DLL Size with LIBCTINY.LIB" is a good place to start.
Of course, everything has a price, and I needed to make two other changes as a result of omitting the default CRT. Both are found in Visual Studio's project Properties, Configuration Properties, C/C++, Code Generation settings: changing Basic Runtime Checks (known as <BasicRuntimeChecks> in the .vcxproj file) to Default (i.e., no /RTC? option) and changing Buffer Security Check (<BufferSecurityCheck>) to false (i.e., /GS-). Disabling these checks isn't something you should do in general (especially the latter), but HideShowDesktopButtonOnTaskbar takes no input, has no buffers, and is so simple that I'm (tentatively!) okay sacrificing these two security/resiliency measures.
The call to HeapSetInformation / HeapEnableTerminationOnCorruption is almost definitely overkill - but it's good practice and so easy to add that I did so anyway (FYI that the default CRT call this automatically). Plus, doing this made me feel a little better about losing /GS. :)

If you're bothered by Windows 7's "Show desktop" button and are looking for a solution, maybe HideShowDesktopButtonOnTaskbar is the answer. If you're curious how a hack like this works or are looking to dabble with replacing the CRT in your own programs, there might be something of interest here. Either way, HideShowDesktopButtonOnTaskbar was a fun side project - I hope you like it! :)

The code:

// Include system headers (at warning level 3 because they're not /Wall-friendly) #pragma warning(push, 3)
#include <windows.h> #pragma warning(pop)

// Disable harmless /Wall warning C4514 "unreferenced inline function has been removed" #pragma warning(disable: 4514)

// Default entry point function int __stdcall WinMainCRTStartup()
{
    // Enable "terminate-on-corruption"
    (void)HeapSetInformation(NULL, HeapEnableTerminationOnCorruption, NULL, 0);

    // Find the "Show desktop" button/window (starting from the Desktop window)    const HWND hwndTaskbar = FindWindowEx(NULL, NULL, TEXT("Shell_TrayWnd"), NULL);
    if (NULL != hwndTaskbar)
    {
        const HWND hwndNotify = FindWindowEx(hwndTaskbar, NULL, TEXT("TrayNotifyWnd"), NULL);
        if (NULL != hwndNotify)
        {
            const HWND hwndShowDesktopButton = FindWindowEx(hwndNotify, NULL, TEXT("TrayShowDesktopButtonWClass"), NULL);
            if (NULL != hwndShowDesktopButton)
            {
                // Toggle the visibility of the button                const int nCmdShow = IsWindowVisible(hwndShowDesktopButton) ? SW_HIDE : SW_SHOW;
                (void)ShowWindow(hwndShowDesktopButton, nCmdShow);
            }
        }
    }

    // Return 0 because a message loop wasn't entered    return 0;
}

Tags: Technical Utilities