Posts tagged "Technical"

Free hash [A reusable CRC-32 HashAlgorithm implementation for .NET]

Wednesday, January 14, 2009

In the notes for yesterday's release of the ComputeFileHashes tool (and source code), I mentioned that I'd written my own .NET HashAlgorithm class to compute CRC-32 hash values. The complete implementation can be found below and should behave just like every other HashAlgorithm subclass (ex: MD5 or SHA1). The code here is based on the CRC-32 reference implementation provided in Annex D of the PNG specification and pretty much "just worked". It implements the necessary Initialize, HashCore, and HashFinal methods as well as the technically optional (but practically necessary) Hash and HashSize properties. There's no test code to speak of, though it's worth pointing out that I've run tens of gigabytes of data through my ComputeFileHashes tool and have verified the correctness of the computed CRC-32 value for each test file. :)

Without further ado:

using System;
using System.Security.Cryptography;

namespace Delay
{
    /// <summary>
    /// HashAlgorithm implementation for CRC-32.
    /// </summary>
    [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Naming",
        "CA1709:IdentifiersShouldBeCasedCorrectly", MessageId = "CRC",
        Justification = "Matching algorithm acronym.")]
    public class CRC32 : HashAlgorithm
    {
        // Shared, pre-computed lookup table for efficiency
        private static readonly uint[] _crc32Table;

        /// <summary>
        /// Initializes the shared lookup table.
        /// </summary>
        [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Performance",
            "CA1810:InitializeReferenceTypeStaticFieldsInline", Justification =
            "Table values must be computed; not possible to remove the static constructor.")]
        static CRC32()
        {
            // Allocate table
            _crc32Table = new uint[256];

            // For each byte
            for (uint n = 0; n < 256; n++)
            {
                // For each bit
                uint c = n;
                for (int k = 0; k < 8; k++)
                {
                    // Compute value
                    if (0 != (c & 1))
                    {
                        c = 0xedb88320 ^ (c >> 1);
                    }
                    else
                    {
                        c = c >> 1;
                    }
                }

                // Store result in table
                _crc32Table[n] = c;
            }
        }

        // Current hash value
        private uint _crc32Value;

        // True if HashCore has been called
        private bool _hashCoreCalled;

        // True if HashFinal has been called
        private bool _hashFinalCalled;

        /// <summary>
        /// Initializes a new instance.
        /// </summary>
        public CRC32()
        {
            InitializeVariables();
        }

        /// <summary>
        /// Initializes internal state.
        /// </summary>
        public override void Initialize()
        {
            InitializeVariables();
        }

        /// <summary>
        /// Initializes variables.
        /// </summary>
        private void InitializeVariables()
        {
            _crc32Value = uint.MaxValue;
            _hashCoreCalled = false;
            _hashFinalCalled = false;
        }

        /// <summary>
        /// Updates the hash code for the provided data.
        /// </summary>
        /// <param name="array">Data.</param>
        /// <param name="ibStart">Start position.</param>
        /// <param name="cbSize">Number of bytes.</param>
        protected override void HashCore(byte[] array, int ibStart, int cbSize)
        {
            if (null == array)
            {
                throw new ArgumentNullException("array");
            }

            if (_hashFinalCalled)
            {
                throw new CryptographicException(
                    "Hash not valid for use in specified state.");
            }
            _hashCoreCalled = true;

            for (int i = ibStart; i < ibStart + cbSize; i++)
            {
                byte index = (byte)(_crc32Value ^ array[i]);
                _crc32Value = _crc32Table[index] ^ ((_crc32Value >> 8) & 0xffffff);
            }
        }

        /// <summary>
        /// Finalizes the hash code and returns it.
        /// </summary>
        /// <returns></returns>
        protected override byte[] HashFinal()
        {
            _hashFinalCalled = true;
            return Hash;
        }

        /// <summary>
        /// Returns the hash as an array of bytes.
        /// </summary>
        [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Design",
            "CA1065:DoNotRaiseExceptionsInUnexpectedLocations", Justification =
            "Matching .NET behavior by throwing here.")]
        [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Usage",
            "CA2201:DoNotRaiseReservedExceptionTypes", Justification =
            "Matching .NET behavior by throwing NullReferenceException.")]
        public override byte[] Hash
        {
            get
            {
                if (!_hashCoreCalled)
                {
                    throw new NullReferenceException();
                }
                if (!_hashFinalCalled)
                {
                    // Note: Not CryptographicUnexpectedOperationException because
                    // that can't be instantiated on Silverlight 4
                    throw new CryptographicException(
                        "Hash must be finalized before the hash value is retrieved.");
                }

                // Convert complement of hash code to byte array
                byte[] bytes = BitConverter.GetBytes(~_crc32Value);

                // Reverse for proper endianness, and return
                Array.Reverse(bytes);
                return bytes;
            }
        }

        // Return size of hash in bits.
        public override int HashSize
        {
            get
            {
                return 4 * 8;
            }
        }
    }
}

The CRC32 class presented here is nothing fancy, but it should be a pretty solid implementation of the once-popular CRC-32 algorithm that's ripe for reuse. Thanks to .NET's HashAlgorithm base class, it's easy to drop in CRC32 anywhere hashes are already being computed. I hope you find it useful!

Updated 2009-01-22: Corrected implementation of HashSize to return hash size in bits instead of bytes.

Updated 2009-02-16: Initialize the _crc32Value variable to its starting value for consistency with the Framework's HashAlgorithm classes where a call to Initialize is not necessary for a newly constructed instance.

Updated 2010-12-06: Added missing ibStart offset to HashCore loop.

Tags: Technical

Trust, but verify [Free tool (and source code) for computing commonly used hash codes!]

Tuesday, January 13, 2009

Internet downloads (particularly large ones) are often published with an associated checksum that can be used to verify that the file was successfully downloaded. While transmission errors are relatively rare, the popularity of file sharing and malware introduce the possibility that what you get isn't always exactly what you wanted. The posting of checksums by the original content publisher attempts to solve this problem by giving the end user an easy way to validate the downloaded file. Checksums are typically computed by applying a cryptographic hash function to the original file; the hash function computes a small (10-30 character) textual "snapshot" of the file that satisfies the following properties (quoting from Wikipedia):

It is easy to compute the hash for any given data

It is extremely difficult to construct a [file] that has a given hash

It is extremely difficult to modify a given [file] without changing its hash

It is extremely unlikely that two different [files] will have the same hash

Because of these four properties, a user can be fairly confident a file has not been tampered with or garbled as long as the checksum computed on their own machine matches what the publisher posted. I say fairly confident because there's always the possibility of a hash collision - two different files with the same checksum. No matter how good the hash function is, the pigeonhole principle guarantees there will ALWAYS be the possibility of collisions. The point of a good hash function is to make this possibility so unlikely as to be impossible for all intents and purposes.

Popular hash functions in use today are MD5 and SHA-1, with CRC-32 rapidly losing favor. Speaking in very broad terms, one might say that the quality of CRC-32 is "not good", MD5 is "good", and SHA-1 is "very good". For now, that is; research is always under way that could render any of these algorithms useless tomorrow... (For more information about the weaknesses of each algorithm, refer to the links above.)

In order for published checksums to be useful, the user needs an easy way to calculate them. I looked around a bit and didn't a lot of free tools for computing these popular hash functions that I was comfortable with, so I wrote my own using .NET. Here it is:

ComputeFileHashes
   Computes CRC32, MD5, and SHA1 hashes for the specified file(s)
   Version 2009-01-12
   http://blogs.msdn.com/Delay/

Syntax: ComputeFileHashes FileOrPattern [FileOrPattern [...]]
   Each FileOrPattern can specify a single file or a set of files

Examples:
   ComputeFileHashes Image.iso
   ComputeFileHashes *.iso
   ComputeFileHashes *.iso *.vhd

[Click here to download ComputeFileHashes along with its complete source code.]

Here's the output of running ComputeFileHashes on the recently released Windows 7 Beta ISO images. Anyone can verify the checksums below on the MSDN Subscriber Downloads site. (Note: You need to be a subscriber to download files from there; everyone else can download from the official Windows 7 link.)

C:\T>ComputeFileHashes.exe "M:\Windows 7\*"

M:\Windows 7\en_windows_7_beta_dvd_x64_x15-29074.iso
CRC32: 8E2FAD39
MD5: 773FC9CC60338C612AF716A2A14F177D
SHA1: E09FDBC1CB3A92CF6CC872040FDAF65553AB62A5

M:\Windows 7\en_windows_7_beta_dvd_x86_x15-29073.iso
CRC32: AABA5A48
MD5: F9DCE6EBD0A63930B44D8AE802B63825
SHA1: 6071184282B2156FF61CDC5260545C078CCA31EE

One of the things that was important to me when writing ComputeFileHashes was performance. Nobody likes to wait, and I'm probably even less patient than the average bear. One of the things I wanted my program to do was take advantage of multi-processing and the multi-core CPUs that are so prevalent these days. So ComputeFileHashes runs the three hash functions in parallel with each other and with loading the next bytes of the file. Theoretically, this can take advantage of four different cores - though in practice my limited testing suggests there's just not enough work to saturate them all. :)

While I paid attention to performance at a macro level (i.e., algorithm design), I didn't worry about it at a micro level (i.e., focused optimization). And I used .NET of all things. (Cue the doubters: "ZOMG, EPIC FAIL!!") If it uses .NET, it must be slow, right? Well, let's do the numbers:

I took some informal measurements of the time to compute hashes for the Windows 7 Beta x64 ISO image. ComputeFileHashes took just under 60 seconds to compute the CRC-32, MD5, and SHA-1 hashes of the 3.15 GB file on my home machine. Not instantaneous by any means, but also pretty reasonable for that amount of data.
Next up was my favorite program ever, Altap Salamander. I use Salamander for all my file management and it comes with a handy plugin for calculating/verifying checksums. Salamander is fully native code, so it should have a distinct performance advantage. Salamander took about 40 seconds to compute the CRC-32 and MD5 checksums in a single pass. This performance is noticeably better than ComputeFileHashes's, but it also doesn't include SHA-1 which is probably the most computationally intensive algorithm. I doubt adding SHA-1 would double the time, but it would almost certainly take longer. It's hard to balance missing features against better performance, so maybe this one is too close to call. :)
Last up was an internal tool I had called gethash. gethash is also native code and supports MD5, SHA-1, and some other hash types - but not CRC-32. However, gethash computes only one checksum a time. So I ran it once for SHA-1 in a time of just over 35 seconds and again for MD5 for another time of just over 35 seconds. The total time for gethash was therefore about 70 seconds and just like before we're missing one of the checksum types. In this case, one could argue that ComputeFileHashes wins the race because it's 10 seconds faster and delivers more value. Of course, if you only need one type of checksum, then you don't care that ComputeFileHashes generated others and gethash can do a single checksum in close to half the time it takes ComputeFileHashes. Again, no clear winner for me.

So depending on how you want to look at things, ComputeFileHashes is either the best or the worst of the lot. :) But recall that we're pitching an unoptimized .NET application against two solid native-code implementations. ComputeFileHashes pretty clearly held its ground and I'd say the doubters don't have much to complain about here.

Implementation notes:

.NET includes the HashAlgorithm abstract class and a variety of concrete subclasses for calculating cryptographic hashes. MD5 and SHA-1 are implemented by the Framework along with a handful of other useful hash functions. (In fact, it's trivial to add support for another HashAlgorithm to ComputeFileHashes simply by editing a table in the source code.) CRC-32 is not part of the Framework (due, I suspect, to its weaknesses), so I wrote my own CRC-32 HashAlgorithm subclass. I'll post more about this tomorrow.
The most efficient multi-hashing implementation would probably update each algorithm together as bytes are read from the file. However, the overhead of doing that with the .NET HashAlgorithm implementation (which sensibly expects blocks of data) seemed prohibitively expensive - and impractical to parallelize. An alternative would be to let different threads hash the complete file on their own. This parallelizes easily, but because the algorithms operate at different speeds, they'd soon be reading different parts of the file - and the disk thrashing would probably ruin performance. So I went with a compromise: ComputeFileHashes loads a 1 MB chunk of the file into memory, lets each of the algorithms process that block to completion on separate threads, then loads the next 1 MB of data and repeats. There's going to be some wasted time after the fastest algorithm completes and the slowest is still working, and there's probably some memory cache thrashing (for similar reasons as the disk) - but as the numbers above indicate, the performance of this approach is really quite reasonable. Of course, it's entirely possible that a different buffer size that would be even faster - so if you do some testing and find something better, please let me know! :)
ComputeFileHashes creates dedicated Threads for each of the hash functions it uses. Those threads stay around for the life of the application and are synchronized by a general-purpose helper class I wrote, WaitingRoom. I'll blog more about this in a couple of days.

ComputeFileHashes is a simple tool intended to make verifying checksums easy for anyone. I've wanted a good, fast, free, all-in-one solution I could trust for some time now, and the recent release of Windows 7 finally prompted me to write my own. I hope others to put ComputeFileHashes to use for themselves - perhaps even for the Windows 7 Beta! :)

Tags: Technical Utilities

The source code IS the executable [Releasing CSI, a C# interpreter (with source and tests) for .NET]

Wednesday, January 7, 2009

A few years ago I found myself spending a lot of time writing batch files to perform a variety of relatively simple tasks. For those who aren't familiar, you can do some surprisingly powerful things with batch files - but it usually involves a lot of trickery and arcane knowledge. I wanted a simpler way of doing things, but I didn't want to create a bunch of compiled executables and lose the elegant, easy-to-modify transparency that batch files offer. These days, I'd probably look to PowerShell for the solution, but back then there was no such thing...

What I did have was the power of .NET and some inspiration from a coworker's tool that was able to take .NET 1.1 source code and execute it "on the fly" without the need for compilation. What I did not have was something similar for .NET 2.0 or access to the source code for that tool (because the author had left the company). So I wrote my own:

============================================
==  CSI: C# Interpreter                   ==
==  Delay (http://blogs.msdn.com/Delay/)  ==
============================================


Summary
=======
CSI: C# Interpreter
     Version 2009-01-06 for .NET 3.5
     http://blogs.msdn.com/Delay/

Enables the use of C# as a scripting language by executing source code files
directly. The source code IS the executable, so it is easy to make changes and
there is no need to maintain a separate EXE file.

CSI (CodeFile)+ (-d DEFINE)* (-r Reference)* (-R)? (-q)? (-c)? (-a Arguments)?
   (CodeFile)+      One or more C# source code files to execute (*.cs)
   (-d DEFINE)*     Zero or more symbols to #define
   (-r Reference)*  Zero or more assembly files to reference (*.dll)
   (-R)?            Optional 'references' switch to include common references
   (-q)?            Optional 'quiet' switch to suppress unnecessary output
   (-c)?            Optional 'colorless' switch to suppress output coloring
   (-a Arguments)?  Zero or more optional arguments for the executing program

The list of common references included by the -R switch is:
   System.dll
   System.Data.dll
   System.Drawing.dll
   System.Windows.Forms.dll
   System.Xml.dll
   PresentationCore.dll
   PresentationFramework.dll
   WindowsBase.dll
   System.Core.dll
   System.Data.DataSetExtensions.dll
   System.Xml.Linq.dll

CSI's return code is 2147483647 if it failed to execute the program or 0 (or
whatever value the executed program returned) if it executed successfully.

Examples:
   CSI Example.cs
   CSI Example.cs -r System.Xml.dll -a ArgA ArgB -Switch
   CSI ExampleA.cs ExampleB.cs -d DEBUG -d TESTING -R


Notes
=====
CSI was inspired by net2bat, an internal .NET 1.1 tool whose author had left
Microsoft. CSI initially added support for .NET 2.0 and has now been extended
to support .NET 3.0 and .NET 3.5. Separate executables are provided to
accommodate environments where the latest version of .NET is not available.


Version History
===============

Version 2009-01-06
Initial public release

Version 2005-12-15
Initial internal release

[Click here to download CSI for .NET 3.5, 3.0, 2.0, and 1.1 - along with the complete source code and test suite.]

Using CSI is quite simple. Here's an example from the release package:

C:\T\CSI>TYPE Samples\HelloWorld.cs
using System;

public class HelloWorld
{
    public static void Main(string[] args)
    {
        Console.WriteLine("Hello world.");
    }
}

C:\T\CSI>CSI.exe Samples\HelloWorld.cs
Hello world.

And if you use the included batch files to register the .CSI file type (more on this in the notes below), it's even easier:

C:\T\CSI>TYPE Samples\Greetings.csi
using System;

public class Greetings
{
    public static void Main(string[] args)
    {
        Console.WriteLine("Hello {0}", string.Join(" ", args));
    }
}

C:\T\CSI>Samples\Greetings.csi out there world.
Hello out there world.

Notes:

Surprisingly, it's fairly straightforward to do what CSI does thanks to the CSharpCodeProvider class that's part of .NET. (In fact, the majority of the code for CSI is related to processing arguments, managing the display, and handling errors!) Behind the scenes, there's no real magic: CSharpCodeProvider is just a wrapper for the C# compiler which generates a temporary assembly that CSI calls into. Everything should work exactly the same under CSI as it would if compiled normally - the only difference is the convenience. :)
I hinted at the benefits of avoiding compilation above, but wanted to explain a bit more. In the usual way of doing things, there is typically (at least) a .CS file and a .EXE file for each tool used by, say a build process. The build process runs the .EXE file, and people refer to the .CS file. If anything ever needs to be changed, someone updates the .CS file, recompiles to generate a new .EXE, and checks both files in together. Usually... But sometimes they forget and only check in the .EXE - or there's some other .EXE-only modification that throws the synchronization of the two files out of whack and then it's just a mess to understand what's going on because the source code no longer matches the executable. CSI avoids that problem by allowing the build process to "run" the .CS file directly. Because there's only one file to manage, there's no chance of it getting out of sync!
Two simple scripts are included that can be used to RegisterCSI.cmd or UnregisterCSI.cmd. When run with administrative privileges, these scripts set up (or remove) a file association for .CSI files (which are just renamed .CS files) that automatically invokes CSI for the specified file. So, as shown in the earlier example, it's possible to run C# code files by name in a command window or by double-clicking them in the Windows Explorer!
There's a separate version of CSI compiled for each major .NET version that's in use today: 1.1, 2.0, 3.0, and 3.5. Aside from some minor functional limitations with the 1.1 version (due to non-existent features in that version of .NET), they're all the same CSI with the same options, etc.. The most obvious difference is the list of default assemblies included by the -R option: each version matches what Visual Studio 2008 provides as the default set of assemblies for a new project targeting the corresponding version of the Framework. The other big difference is that the .NET 3.5 version of CSI uses the C# 3.0 compiler (which can be a little tricky to do until someone shows you how), so it also supports new language features like var and all the LINQy syntax goodness that's available in .NET 3.5.
Editing standalone .CS files can be a bit annoying because the lack of an associated .CSPROJ file means that Visual Studio's IntelliSense isn't available. So I'll usually begin by creating a project in Visual Studio, taking advantage of its rich editing and debugging facilities to get everything working the way I want - then I pull out the resulting .CS file and run it with CSI. If there are any subsequent modifications to the code, they're usually trivial enough that the lack of IntelliSense isn't an issue. Alternatively, you could create a new project and add a link to the existing file in order to enable IntelliSense. OR you could use something like Snippet Compiler which offers a surprisingly rich Visual Studio-like editing and execution environment (with IntelliSense!) and works well with standalone .CS files.
Given that I've already mentioned PowerShell, what's the point of CSI? Well, maybe there isn't one... :) But I think there's still value here - even in a PowerShell world. (In fact, what prompted me to release CSI was an internal request to add support for .NET 3.5, so it seems at least one other person agrees with me.) Aside from knowing a bit about PowerShell and how it works, I haven't used it, so I won't try to do a feature comparison here. Some of the things PowerShell does are very cool and quite useful - and I think it's considerably better than .CMD files for automating typical tasks. Also, I love that it's tightly integrated with the .NET Framework. That said, it's another "language" to learn and not everybody has the time for that. Additionally, I don't think the current development and debugging options are quite as rich as Visual Studio already is. And PowerShell isn't available everywhere, whereas CSI offers a no-touch, install-free solution anywhere the .NET Framework can be found. Lastly, CSI is small (really small!) and very simple.
Though its name bears passing resemblance to a popular TV series, that's unintentional: once you know the C# compiler is named CSC.exe, it's obvious the C# interpreter should be named CSI.exe.

CSI was a fun project in its day and I've done quite a bit with it that would have been fairly tedious otherwise. Today's world offers plenty of alternatives - but if you're comfortable with C# and prefer to stick to one language for all your programming needs, CSI is pretty handy to have around. Because you never know when the urge to code will strike! :)

Tags: Technical Utilities

Easily create an ISO from a CD/DVD [Releasing ExtractISO tool and source]

Thursday, November 6, 2008

A few years back I found myself in need of a tool to create an ISO image from a CD I owned. I searched around a bit, but the only tools I could find to do that were part of much larger programs and cost money I didn't want to spend. The concept seemed simple enough that I figured it would be fairly easy to write my own tool to do this. So I did. :)

ExtractISO is a native, console-mode application that does what its name suggests: extracts the contents of a CD or DVD and saves it to a file. The tool and complete source code are attached to this post as ExtractISO.zip. Here's the documentation that comes with it:

====================================
==  ExtractISO                    ==
==  http://blogs.msdn.com/Delay/  ==
====================================


Summary
=======

ExtractISO - Creates an ISO image file from a CD/DVD

ExtractISO reads raw sectors from the specified CD/DVD volume and writes them
to the specified ISO image file.  The resulting file can be burned to a blank
CD/DVD disc, mounted in a virtual CD/DVD drive, or opened in an ISO viewer.

For a discussion of the limitations of this process, please refer to:
   http://www.cdrfaq.org/faq02.html#S2-28

Note that direct access to a CD/DVD volume requires administrative privileges.

ExtractISO attempts to read damaged media by retrying failed reads a few times
before giving up. In the event of a persistent read failure, the partial output
file is not deleted so that subsequent extraction attempts (possibly after
cleaning the media and/or using a different drive to read it) can be attempted
without the loss of any successfully extracted data.

Syntax: ExtractISO D: File.iso
   D: is the CD/DVD drive to extract from
   File.iso is the file to write to


Version History
===============

Version 1.11, 2008-11-04
Improved formatting of "bytes extracted" display value
Initial public release

Version 1.10, 2006-03-12
Added support for damaged media recovery (see above)

Version 1.01, 2005-05-18
Fixed silly integer overflow problem with "bytes extracted" display
Note: Problem affected display only; no functional impact

Version 1.0, 2005-04-27
Initial release

ExtractISO has been available to anyone inside Microsoft since I wrote it. Last week, I got an unexpected request to make it public so certain customers could use it. I was happy to do so, but one thing had been bothering me for a while: the status display of the bytes extracted so far didn't group by thousands. Instead of "1,000,000", the counter displayed as "1000000" which I find more difficult to parse. This was such a small issue, I never got around to fixing it - but now seemed like the ideal time to do so!

Unfortunately, this formatting task which is so simple in .NET seems to be quite a bit more involved with the Win32 API. The first problem is that printf doesn't support outputting the thousands separator. I considered using something like the sample code the previous link suggests, but wanted something simpler and easier to understand. A bit of research turned up the GetNumberFormat API which seemed like the perfect solution. However, the GetNumberFormat API suffers from at least two notable shortcomings: it's not easy to customize the output and the input needs to be a string. I did a quick test and found that the output of GetNumberFormat for "1000000" is "1,000,000.00" (on my machine using the default United States settings). This is close to what I wanted, but displaying the integral byte count with two digits of decimal precision just seems silly to me. So I looked for an easy way to customize the output via NUMBERFMT, and ran into the same issues Michael talks about in the post I linked to. When further research turned up no better alternatives, I decided to use GetNumberFormat and then "fix" its output by calling GetLocaleInfo(LOCALE_SDECIMAL) and removing everything after the localized decimal character(s). I like this approach because it is almost all platform code (i.e., code I don't have to write/test) and should be correct for all cultures where numbers are written as "1,000,000.00", "1.000.000,00", etc..

Other than the formatting issue I've just discussed, I made no other changes to the ExtractISO implementation. The only things I did were to update some of the VERSIONINFO settings, tweak the Build.cmd script, and recompile with the latest Visual Studio 2008.

ExtractISO works reasonably quickly, but it's worth mentioning that performance was specifically not a goal when I wrote it. The code implements a simple read/write loop with a 1 MB buffer and makes no attempt to interleave the two operations. The large-ish buffer should help minimize the overhead of the read/write calls, but I suspect a different implementation that issues the reads and writes in parallel would be faster. However, achieving that parallelism comes at the price of complexity - and I didn't feel the time it would have taken to write and debug that code was justified here. Even at top speed, the extraction operation is going to take a minute or two - an extra minute on top of that seems a small price to pay for simpler, easier to maintain code. You are welcome to disagree, of course. :) If you develop a faster implementation, I'd love to hear about it!

ExtractISO is a simple tool with a simple purpose - and one that I've used quite happily for a number of years now. If you've got a hankering for ISOs and like free stuff, please have a look at ExtractISO!

PS - ExtractISO cannot be used to duplicate audio CDs or copy-protected DVDs: audio CDs use a different storage scheme than data CDs and copy-protected DVDs store their encryption key in an "inaccessible" location of the disk.

[ExtractISO.zip]

Tags: Technical Utilities

Just a little too eager with the clicking... [Updated binaries and source for MouseButtonClicker]

Friday, October 10, 2008

I described the idea behind my MouseButtonClicker utility in the introductory post for MouseButtonClicker. As you might expect, I've been using this tool ever since I released it a few weeks ago.

All was well - except that every so often I'd get an automatic click that shouldn't have happened. Not frequently enough to stop me from using MouseButtonClicker, but enough to be annoying. I assumed this was simply "jitter" (see the original post for background) beyond what the jitter-prevention code was already filtering out, but I didn't think so because I'd watched the input events and the 2-unit threshold really seemed like it should be large enough.

So one evening when I had a bit of time, I compiled an instrumented version of MouseButtonClicker, started it up, and went about my work. Sure enough, after a half-hour or so, I got an unexpected click! So I had a look at the debug output and found that the bogus click had occurred after a 1-unit jitter movement. But that was below the 2 unit threshold, so the jitter filter should have suppressed the click. To my dismay, it seemed my code had a bug...

And sure enough, once I knew what to look for, I spotted it right off. :) The jitter filter was working fine for the general case - but there was a special case that wasn't handled properly. Specifically, the jitter threshold is always reset when MouseButtonClicker clicks the mouse - because the mouse has stopped moving is likely to stay at rest for a while. However, if the user moved the mouse and clicked the mouse button manually, MouseButtonClicker would suppress its own click (correct behavior) but was neglecting to reset the jitter filter! This problem wasn't immediately obvious to the user because the observable behavior so far was 100% correct - it was only if the mouse was left alone for about 10 minutes and jittered again that the jitter threshold wouldn't do its job and a bogus click would be made.

So I fixed the bug, compiled version 1.01 of MouseButtonClicker, ran with the new bits for a few days to verify the problem was solved (it was!), and updated the public download:

Click here to download the MouseButtonClicker executables (32- and 64-bit) and the complete Visual Studio 2008 source code.

I'm sorry for any trouble this may have caused - I hope the new bits prove even more helpful than the last! :)

Tags: Technical Utilities

"MouseButtonClicker clicks the mouse so you don't have to!" [Releasing binaries and source for a nifty mouse utility]

Thursday, September 25, 2008

I first came across the notion of automatic mouse button clicking some time ago. The basic idea can be summarized as follows:

The nearly universal pattern of mouse use is: move/click/wait... move/click/wait... move/click/wait.... In the overwhelming majority of cases, the only reason the mouse gets moved is to position the pointer over the next user interface element that needs to be clicked. Because every move is immediately followed by a click, it should be possible to simplify the process by performing the click automatically when the mouse stops moving (i.e., moves to a new location and stays still for a few moments). This automatic click saves the user only a tiny bit of effort each time it happens, but it eliminates a conceptually unnecessary, repetitive motion that's carried out many, many times over the course of every day. As an additional ergonomic benefit, automatic clicking enables the user to hold the mouse in a variety of new ways now that it's no longer necessary to keep a finger on the mouse button.

My first reaction was that the automatic clicking behavior would be nearly impossible to live with - and for the first couple of minutes I tried using it, it was. :) But, once I got out of the habit of wiggling the mouse needlessly and got into the habit of moving it where I needed when I needed, I found I quite liked the behavior after all. As I became more comfortable, I got used to the timing and found I could manage double clicks with just a single click by timing that click to happen right after the automatic click! The only challenge was to learn what parts of the user interface do nothing when clicked on - because these are the "safe areas" where the mouse can be "parked" if you start moving it and then decide you don't really want to click anything after all.

I found I liked automatic clicking so much, I wrote my own tool to implement it - and I've used that tool happily for nearly a decade. But my tool has a specific hardware dependency as a consequence of some other functionality it implements and that became a problem a few days ago when I changed hardware. So I decided this was a great opportunity to extract the clicking functionality into its own, dedicated, hardware-agnostic tool - and then blog about it!

The new tool I wrote is called MouseButtonClicker and does exactly what I describe above. It's written completely in native code - partly because I saw no need for the overhead of managed code for a simple "always-on" scenario like this one - and partly because it was a good excuse to brush up on my (rusty) native coding skills. MouseButtonClicker is a UI-less application - which means it doesn't have a window, or a notification icon, or anything to look at - it just runs invisibly and does its job. Of course, it wouldn't be hard to add a simple notification icon, but that's unnecessary for my purposes: MouseButtonClicker auto-starts with my computer and never gets closed. If you find you need to exit MouseButtonClicker, simply kill it with Task Manager or via TaskKill /IM MouseButtonClicker.exe - MouseButtonClicker maintains no persistent state and terminating it is completely safe to do at any time. And while the standard 32-bit executable works just fine on 64-bit operating systems, I've also provided a 64-bit version for those who prefer not to mix their bits. :)

Click here to download the MouseButtonClicker executables (32- and 64-bit) and the complete Visual Studio 2008 source code.

Implementation notes:

MouseButtonClicker finds out about mouse activity via the raw input API which provides low-level access to input devices like mice and keyboards in a simple, flexible manner. Using the raw input API makes it easy to distinguish between mouse input (where automatic clicking makes sense) and input from a tablet pen or digitizer device (where it does not make sense). This is of particular interest to me because I'm a long-time user of Wacom's fine tablet products and frequently switch between mouse and pen while using my computer; having the wrong behavior for the pen would be simply unacceptable.
MouseButtonClicker generates the automatic click with the SendInput function which provides an easy way to inject input events into the system. This works quite nicely, but it's worth noting that on recent operating systems like Vista and Server 2008, the SendInput API is restricted by User Interface Privilege Isolation. Specifically, MSDN notes that "Applications are permitted to inject input only into applications that are at an equal or lesser integrity level". It's my experience that this restriction usually exhibits itself as an inability to click processes that are running in an elevated security context (i.e., as Administrator) or to click in Command Windows that aren't currently active.
There are a couple of cases where MouseButtonClicker suppresses the automatic click. For example, there will be no automatic click if any mouse button has been manually clicked since the mouse stopped moving or if any of the mouse buttons is currently pressed. As such, the automatic click doesn't interfere with right clicking to get a context menu, pausing while dragging a window around the screen, or simply clicking the button yourself when you're in too much of a hurry to let MouseButtonClicker do so for you!
Some mouse hardware exhibits a small amount of "jitter" that causes single unit move messages from time to time (note: the size of a unit is less than that of a pixel, so these jitters aren't usually visible). My thinking is that this jitter is due to the mouse being *just* on the threshold of moving and then getting nudged slightly (perhaps the table gets bumped, a butterfly flaps its wings, etc.). Whatever the cause, this occasional jitter is an annoyance because it results in occasional - seemingly unprovoked - clicks wherever the mouse happens to be. So MouseButtonClicker has a small jitter-prevention measure that requires the mouse to move at least a little bit (2 units) before an automatic click will be performed.
MouseButtonClicker deliberately does not clean up its application-level resources (like the window class it registers) because Windows contractually promises to clean such things up during process exit. I formerly believed this to be lazy, but eventually decided that the best code is code I don't have to write, debug, and maintain. :) For cases like this where someone else is already going to handle resource clean-up - probably more quickly and efficiently than I could - it seems prudent to let them do so. This approach avoids the possibility of bugs in clean-up code, keeps the executable size down (which avoids unnecessary disk access), and keeps the source code just a little bit simpler.
The COMPILE_TIME_ASSERT macro provides a handy way of implementing a compile-time assert (also known as a static assert). Compile-time asserts are handy because they do the same thing normal asserts do, but they take effect at compile time and will actually fail the build process if the specified condition is not true. So instead of having to hope that the test cases exercise all the assertions in the code, the simple presence of a successful compile guarantees that all compile-time asserts are valid!
The variably-sized buffer for the WM_INPUT message is allocated on the stack via the _malloca/_freea CRT library calls. Allocating memory on the stack isn't usually appropriate, but for situations like this where a small, short-lived buffer of unknown size is needed, allocating on the stack can be more efficient than allocating on the heap.
The delay before the automatic click is determined by the result of the GetDoubleClickTime() API. Though the delay MouseButtonClicker implements is not technically for a double-click, the result of this call is the unofficial basis for a variety of different UI timings across the system.
Yes, I drew the icon myself; no, I don't have any artistic skill. :) It's supposed to be a robotic finger clicking a mouse button, but I agree it's probably more of a Rorschach inkblot test... If you can do better and want to send me an ICO file with 32x32 and 16x16 images I can freely redistribute, I will pick my favorite submission, include it with the next version of MouseButtonClicker, and credit you publically for your contribution.

So if you're in the mood to try something new and you don't mind a bit of a learning curve, give MouseButtonClicker a try for an hour or two. If you're lucky, you may never need to click again... :)

PS - For the benefit of casual readers and search engines, here is MouseButtonClicker.cpp:

// MouseButtonClicker - http://blogs.msdn.com/Delay
//
// "MouseButtonClicker clicks the mouse so you don't have to!"
//
// Simple Windows utility to click the primary mouse button whenever the mouse
// stops moving as a usability convenience/enhancement.


#include <windows.h>
#include <tchar.h>
#include <malloc.h>
#include <strsafe.h>

// Macro for compile-time assert
#define COMPILE_TIME_ASSERT(e) typedef int CTA[(e) ? 1 : -1]

// Constants for code simplification
#define RI_ALL_MOUSE_BUTTONS_DOWN (RI_MOUSE_BUTTON_1_DOWN | RI_MOUSE_BUTTON_2_DOWN | RI_MOUSE_BUTTON_3_DOWN | RI_MOUSE_BUTTON_4_DOWN | RI_MOUSE_BUTTON_5_DOWN)
#define RI_ALL_MOUSE_BUTTONS_UP (RI_MOUSE_BUTTON_1_UP | RI_MOUSE_BUTTON_2_UP | RI_MOUSE_BUTTON_3_UP | RI_MOUSE_BUTTON_4_UP | RI_MOUSE_BUTTON_5_UP)

// Check that bit-level assumptions are correct
COMPILE_TIME_ASSERT(RI_MOUSE_BUTTON_1_DOWN == (RI_MOUSE_BUTTON_1_UP >> 1));
COMPILE_TIME_ASSERT(RI_MOUSE_BUTTON_2_DOWN == (RI_MOUSE_BUTTON_2_UP >> 1));
COMPILE_TIME_ASSERT(RI_MOUSE_BUTTON_3_DOWN == (RI_MOUSE_BUTTON_3_UP >> 1));
COMPILE_TIME_ASSERT(RI_MOUSE_BUTTON_4_DOWN == (RI_MOUSE_BUTTON_4_UP >> 1));
COMPILE_TIME_ASSERT(RI_MOUSE_BUTTON_5_DOWN == (RI_MOUSE_BUTTON_5_UP >> 1));
COMPILE_TIME_ASSERT(RI_ALL_MOUSE_BUTTONS_DOWN == (RI_ALL_MOUSE_BUTTONS_UP >> 1));

// Macro for absolute value
#define ABS(v) ((0 <= v) ? v : -v)

// Application-level constants
#define APPLICATION_NAME (TEXT("MouseButtonClicker"))
#define TIMER_EVENT_ID (1)
#define MOUSE_MOVE_THRESHOLD (2)

// Window procedure
LRESULT CALLBACK WndProc(const HWND hWnd, const UINT message, const WPARAM wParam, const LPARAM lParam)
{
    // Tracks the mouse move delta threshold across calls to WndProc
    static LONG lLastClickDeltaX = 0;
    static LONG lLastClickDeltaY = 0;
    static bool fOkToClick = false;

    switch (message)
    {
    // Raw input message
    case WM_INPUT:
        {
            // Query for required buffer size
            UINT cbSize = 0;
            if(0 == GetRawInputData(reinterpret_cast<HRAWINPUT>(lParam), RID_INPUT, NULL, &cbSize, sizeof(RAWINPUTHEADER)))
            {
                // Allocate buffer on stack (falls back to heap)
                const LPVOID pData = _malloca(cbSize);
                if(NULL != pData)
                {
                    // Get raw input data
                    if(cbSize == GetRawInputData(reinterpret_cast<HRAWINPUT>(lParam), RID_INPUT, pData, &cbSize, sizeof(RAWINPUTHEADER)))
                    {
                        // Only interested in mouse input
                        const RAWINPUT* const pRawInput = static_cast<LPRAWINPUT>(pData);
                        if (RIM_TYPEMOUSE == pRawInput->header.dwType)
                        {
                            // Only interested in devices that use relative coordinates
                            // Specifically, input from pens/tablets is ignored
                            if(0 == (pRawInput->data.mouse.usFlags & MOUSE_MOVE_ABSOLUTE))
                            {
                                // Tracks the state of the mouse buttons across calls to WndProc
                                static UINT usMouseButtonsDown = 0;

                                // Update mouse delta variables
                                lLastClickDeltaX += pRawInput->data.mouse.lLastX;
                                lLastClickDeltaY += pRawInput->data.mouse.lLastY;

                                // Enable clicking once the mouse has exceeded the threshold in any direction
                                fOkToClick |= ((MOUSE_MOVE_THRESHOLD < ABS(lLastClickDeltaX)) || (MOUSE_MOVE_THRESHOLD < ABS(lLastClickDeltaY)));

                                // Determine the input type
                                const UINT usButtonFlags = pRawInput->data.mouse.usButtonFlags;
                                if(0 == (usButtonFlags & (RI_ALL_MOUSE_BUTTONS_DOWN | RI_ALL_MOUSE_BUTTONS_UP | RI_MOUSE_WHEEL)))
                                {
                                    // Mouse move: (Re)set click timer if no buttons down and mouse moved enough to avoid jitter
                                    if((0 == usMouseButtonsDown) && fOkToClick)
                                    {
                                        // Use double-click time as an indication of the user's responsiveness preference
                                        (void)SetTimer(hWnd, TIMER_EVENT_ID, GetDoubleClickTime(), NULL);
                                    }
                                }
                                else
                                {
                                    // Mouse button down/up or wheel rotation: Cancel click timer
                                    (void)KillTimer(hWnd, TIMER_EVENT_ID);

                                    // Update mouse button state variable (asserts above ensure the bit manipulations are correct)
                                    usMouseButtonsDown |= (usButtonFlags & RI_ALL_MOUSE_BUTTONS_DOWN);
                                    usMouseButtonsDown &= ~((usButtonFlags & RI_ALL_MOUSE_BUTTONS_UP) >> 1);
                                }
                            }
                        }
                    }
                    // Free buffer
                    (void)_freea(pData);
                }
            }
        }
        break;
    // Timer message
    case WM_TIMER:
        {
            // Timeout, stop timer and click primary button
            (void)KillTimer(hWnd, TIMER_EVENT_ID);
            INPUT pInputs[2] = {0};
            pInputs[0].type = INPUT_MOUSE;
            pInputs[0].mi.dwFlags = MOUSEEVENTF_LEFTDOWN;
            pInputs[1].type = INPUT_MOUSE;
            pInputs[1].mi.dwFlags = MOUSEEVENTF_LEFTUP;
            (void)SendInput(2, pInputs, sizeof(INPUT));

            // Reset mouse delta and threshold variables
            lLastClickDeltaX = 0;
            lLastClickDeltaY = 0;
            fOkToClick = false;
        }
        break;
    // Close message
    case WM_DESTROY:
        (void)PostQuitMessage(0);
        break;
    // Unhandled message
    default:
        return DefWindowProc(hWnd, message, wParam, lParam);
    }
    // Return value 0 indicates message was processed
    return 0;
}

// WinMain entry point
int APIENTRY _tWinMain(const HINSTANCE hInstance, const HINSTANCE hPrevInstance, const LPTSTR lpCmdLine, const int nCmdShow)
{
    // Avoid compiler warnings for unreferenced parameters
    UNREFERENCED_PARAMETER(hPrevInstance);
    UNREFERENCED_PARAMETER(lpCmdLine);
    UNREFERENCED_PARAMETER(nCmdShow);

    // Create a mutex to prevent running multiple simultaneous instances
    const HANDLE mutex = CreateMutex(NULL, FALSE, APPLICATION_NAME);
    if((NULL != mutex) && (ERROR_ALREADY_EXISTS != GetLastError()))
    {
        // Register the window class
        WNDCLASS wc = {0};
        wc.lpfnWndProc = WndProc;
        wc.hInstance = hInstance;
        wc.lpszClassName = APPLICATION_NAME;
        if(0 != RegisterClass(&wc))
        {
            // Create a message-only window to receive WM_INPUT and WM_TIMER
            const HWND hWnd = CreateWindow(APPLICATION_NAME, APPLICATION_NAME, 0, CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT, HWND_MESSAGE, NULL, hInstance, NULL);
            if (NULL != hWnd)
            {
                // Register for the mouse's raw input data
                RAWINPUTDEVICE rid = {0};
                rid.usUsagePage = 1;  // HID_DEVICE_SYSTEM_MOUSE
                rid.usUsage = 2;  // HID_DEVICE_SYSTEM_MOUSE
                rid.dwFlags = RIDEV_INPUTSINK;
                rid.hwndTarget = hWnd;
                if(RegisterRawInputDevices(&rid, 1, sizeof(rid)))
                {
                    // Pump Windows messages
                    MSG msg = {0};
                    while (GetMessage(&msg, NULL, 0, 0))
                    {
                        TranslateMessage(&msg);
                        DispatchMessage(&msg);
                    }
                    // Return success
                    return static_cast<int>(msg.wParam);
                }
            }
        }
        // Failed to initialize, output a diagnostic message (which is not more
        // friendly because it represents a scenario that should never occur)
        TCHAR szMessage[64];
        if(SUCCEEDED(StringCchPrintf(szMessage, sizeof(szMessage)/sizeof(szMessage[0]), TEXT("Initialization failure. GetLastError=%d\r\n"), GetLastError())))
        {
            (void)MessageBox(NULL, szMessage, APPLICATION_NAME, MB_OK | MB_ICONERROR);
        }
    }
    // Return failure
    return 0;
    // By contract, Windows frees all resources as part of process exit
}

Tags: Technical Utilities

Blogging code samples a tad more easily [Updated free ConvertClipboardRtfToHtmlText tool and source code!]

Thursday, April 3, 2008

Kind readers gave some great feedback on my previous post of the ConvertClipboardRtfToHtmlText tool and source code. Accordingly, I have made three small tweaks to the tool:

The code to detect the start of the RTF text worked only if Visual Studio's font size was set to 8pt. That's what I use, but it's not the default, so this would cause problems for most people who tried the tool. The relevant code no longer looks for a specific font-size.
Tab characters in the RTF text were ignored, causing layout problems for code with tabs (vs. spaces). Tabs are now auto-expanded to the mostly-standard value of 4 spaces.
The use of the private Color class was unnecessary because it added nothing over System.Drawing.Color. System.Drawing.Color is now used to save a few lines of code.

The sample code and tool in the previous post have been updated with these changes, so please go there to get the latest version.

Tags: Technical Utilities

Blogging code samples should be easy [Free ConvertClipboardRtfToHtmlText tool and source code!]

Friday, March 14, 2008

I've been including a lot of source code examples in my blog lately and needed a good way to paste properly formatted code in blog posts. I write my posts in HTML and wanted a tool that was easy to use, simple to install, produced concise HTML, and worked well with Visual Studio 2008. I was aware of a handful of tools for this purpose but none of them were quite what I was looking for, so I wrote my own one evening. :)

Caveat: ConvertClipboardRtfToHtmlText is a simple utility written for my specific scenario. I'm releasing the tool and code here in case anyone wants to use it, enhance it, or whatever. I have NOT attempted to write a solid, general-purpose RTF-to-HTML converter. Instead, ConvertClipboardRtfToHtmlText assumes its input is in the exact format used by Visual Studio 2008 (and probably VS 2005).

Using ConvertClipboardRtfToHtmlText is simple:

Copy code (C#, XAML, etc.) to clipboard from Visual Studio 2008
Run ConvertClipboardRtfToHtmlText to convert the RTF representation of the code on the clipboard to HTML as text (there's no UI because the tool does the conversion and immediately exits)
Paste the HTML code on the clipboard into your blog post, web page, etc.

I'm the first to acknowledge there's a lot of room for improvement in step #2. :) While the current implementation is good enough for my purposes, I encourage interested parties to consider turning this into a real application - maybe with a notify icon and a system-wide hotkey. If you do so, please let me know because I'd love to check it out!

The compiled tool and code are available in a ZIP file attached to this post. The complete source code is also available below. Naturally, I used ConvertClipboardRtfToHtmlText to post its own source code. :)

Enjoy!

Updated 2008-04-02: Minor changes to code and tool

using System;
using System.Collections.Generic;
using System.Drawing;
using System.Text;
using System.Windows.Forms;

// Convert Visual Studio 2008 RTF clipboard format into HTML by replacing the
// clipboard contents with its HTML representation in text format suitable for
// pasting into a web page or blog.
// USE: Copy to clipboard in VS, run this app (no UI), paste converted text
// NOTE: This is NOT a general-purpose RTF-to-HTML converter! It works well
// enough on the simple input I've tried, but may break for other input.
// TODO: Convert into a real application with a notify icon and hotkey.
namespace ConvertClipboardRtfToHtmlText
{
    static class ConvertClipboardRtfToHtmlText
    {
        private const string colorTbl = "\\colortbl;\r\n";
        private const string colorFieldTag = "cf";
        private const string tabExpansion = "    ";

        [STAThread]
        static void Main()
        {
            if (Clipboard.ContainsText(TextDataFormat.Rtf))
            {
                // Create color table, populate with default color
                List<Color> colors = new List<Color>();
                Color defaultColor = Color.FromArgb(0, 0, 0);
                colors.Add(defaultColor);

                // Get RTF
                string rtf = Clipboard.GetText(TextDataFormat.Rtf);

                // Parse color table
                int i = rtf.IndexOf(colorTbl);
                if (-1 != i)
                {
                    i += colorTbl.Length;
                    while ((i < rtf.Length) && ('}' != rtf[i]))
                    {
                        // Add color to color table
                        SkipExpectedText(rtf, ref i, "\\red");
                        byte red = (byte)ParseNumericField(rtf, ref i);
                        SkipExpectedText(rtf, ref i, "\\green");
                        byte green = (byte)ParseNumericField(rtf, ref i);
                        SkipExpectedText(rtf, ref i, "\\blue");
                        byte blue = (byte)ParseNumericField(rtf, ref i);
                        colors.Add(Color.FromArgb(red, green, blue));
                        SkipExpectedText(rtf, ref i, ";");
                    }
                }
                else
                {
                    throw new NotSupportedException("Missing/unknown colorTbl.");
                }

                // Find start of text and parse
                i = rtf.IndexOf("\\fs");
                if (-1 != i)
                {
                    // Skip font size tag
                    while ((i < rtf.Length) && (' ' != rtf[i]))
                    {
                        i++;
                    }
                    i++;

                    // Begin building HTML text
                    StringBuilder sb = new StringBuilder();
                    sb.AppendFormat("<pre><span style='color:#{0:x2}{1:x2}{2:x2}'>",
                        defaultColor.R, defaultColor.G, defaultColor.B);
                    while (i < rtf.Length)
                    {
                        if ('\\' == rtf[i])
                        {
                            // Parse escape code
                            i++;
                            if ((i < rtf.Length) &&
                                (('{' == rtf[i]) || ('}' == rtf[i]) || ('\\' == rtf[i])))
                            {
                                // Escaped '{' or '}' or '\'
                                sb.Append(rtf[i]);
                            }
                            else
                            {
                                // Parse tag
                                int tagEnd = rtf.IndexOf(' ', i);
                                if (-1 != tagEnd)
                                {
                                    if (rtf.Substring(i, tagEnd - i).StartsWith(colorFieldTag))
                                    {
                                        // Parse color field tag
                                        i += colorFieldTag.Length;
                                        int colorIndex = ParseNumericField(rtf, ref i);
                                        if ((colorIndex < 0) || (colors.Count <= colorIndex))
                                        {
                                            throw new NotSupportedException("Bad color index.");
                                        }

                                        // Change to new color
                                        sb.AppendFormat(
                                            "</span><span style='color:#{0:x2}{1:x2}{2:x2}'>",
                                            colors[colorIndex].R, colors[colorIndex].G,
                                            colors[colorIndex].B);
                                    }
                                    else if("tab" == rtf.Substring(i, tagEnd-i))
                                    {
                                        sb.Append(tabExpansion);
                                    }

                                    // Skip tag
                                    i = tagEnd;
                                }
                                else
                                {
                                    throw new NotSupportedException("Malformed tag.");
                                }
                            }
                        }
                        else if ('}' == rtf[i])
                        {
                            // Terminal curly; done
                            break;
                        }
                        else
                        {
                            // Normal character; HTML-escape '<', '>', and '&'
                            switch (rtf[i])
                            {
                                case '<':
                                    sb.Append("&lt;");
                                    break;
                                case '>':
                                    sb.Append("&gt;");
                                    break;
                                case '&':
                                    sb.Append("&amp;");
                                    break;
                                default:
                                    sb.Append(rtf[i]);
                                    break;
                            }
                        }
                        i++;
                    }

                    // Finish building HTML text
                    sb.Append("</span></pre>");

                    // Update the clipboard text
                    Clipboard.SetText(sb.ToString());
                }
                else
                {
                    throw new NotSupportedException("Missing text section.");
                }
            }
        }

        // Skip the specified text
        private static void SkipExpectedText(string s, ref int i, string text)
        {
            foreach (char c in text)
            {
                if ((s.Length <= i) || (c != s[i]))
                {
                    throw new NotSupportedException("Expected text missing.");
                }
                i++;
            }
        }

        // Parse a numeric field
        private static int ParseNumericField(string s, ref int i)
        {
            int value = 0;
            while ((i < s.Length) && char.IsDigit(s[i]))
            {
                value *= 10;
                value += s[i] - '0';
                i++;
            }
            return value;
        }
    }
}

[ConvertClipboardRtfToHtmlText.zip]

Tags: Technical Utilities

An easy way to keep your windows where you want them [Releasing WindowPlacementTool with source code!]

Tuesday, July 3, 2007

I wrote WindowPlacementTool in December of 2000 to solve a problem I had after beginning to use Terminal Services/Remote Desktop regularly. I made WindowPlacementTool available internally in 2001. Last week someone asked about getting access the source code to make some customizations and I figured I'd post the tool and its source here for anyone to use.

Download WindowPlacementTool and its source code by clicking here.

Details:

Summary
=======

If you're picky about the layout of the windows on your desktop or if you
connect to your machine with Terminal Services at differing resolutions, you're
probably annoyed by having to re-layout your windows on a regular basis.  It
seems like something (or someone!) is always coming along and messing with your
layout.  But now that's a problem of the past; WindowPlacementTool can do all
the work for you!  Just run it once to capture the layout you like, and then
run it again whenever you need to restore that layout.  And because you can
save multiple layouts, switching resolutions is a breeze.  Yep, it's that easy!


Command Line Help
=================

WindowPlacementTool
Copyright (c) 2000 by David Anson (DavidAns@Microsoft.com)

[-h | -help | -?]
        This help screen

[-c | -capture] [Capture_file_name.txt]
        Capture the current window positions to a file (or standard output if
        no file name is given)

[Restore_file_name_1.txt] [Restore_file_name_2.txt] ...
        Restore the window positions from the data previously captured in the
        specified file(s) (or standard input if no file name is given)


Example Setup
=============

[Layout your windows however you'd like them]

[Capture the current layout to a file]
C:\Temp>WindowPlacementTool.exe -c 800x600.txt

[Optional: Edit the file to remove any programs you don't care about]
C:\Temp>notepad 800x600.txt

[Optional: Create a shortcut on your desktop for easy access to this layout]
[Here, the shortcut would run "WindowPlacementTool.exe C:\Temp\800x600.txt"]


Example Use
===========

[Run the shortcut you created above or run WindowPlacementTool manually]
C:\Temp>WindowPlacementTool.exe 800x600.txt


Notes
=====

* WindowPlacementTool saves the RESTORED locations of windows.  If a window is
  currently maximized or minimized, its restored location will be adjusted, but
  the window will not be un-maximized or un-minimized by WindowPlacementTool.
* I have placed a shortcut to a layout for each resolution I use on my taskbar
  so that it's always available when I need it.

Additional notes:

When restoring window positions, the window class name must match the saved value exactly, but the window title comparison is just a substring match. This makes it easy to target windows that change their window title depending on the current document. (Example: "Untitled - Notepad" and "File.txt - Notepad" will both match the window title string "Notepad".)
The code for WindowPlacementTool was written many years ago and may not follow conventional coding standards. Feel free to reformat it in your favorite editor. :)
I've made only the necessary modifications to get the original code compiling successfully under Visual Studio 2005 SP1. I specifically did NOT spend time resolving the four new compiler warnings.
While I've always tried to take security and correctness seriously, the original code was written well before security was as big an issue as it is today. Recent modifications to Microsoft's CRT have deprecated some functions in favor of safer alternatives. This is the source of 3 of the 4 compiler warnings; read more about security enhancements in the CRT for additional details. The 4th compiler warning is due to improved CRT warnings associated with the increasing popularity of 64-bit computing (specifically compiler warning C4267). My code is typically error and warning free at warning level 4, but it's hard to be immune to future enhancements to the compiler/CRT. :)
Parts of the original code (ASSERT, VERIFY, ARRAY_LENGTH, WideString/AnsiString, _sntprintfz) used a helper library I wrote - I've pulled the relevant bits of that library into the main CPP file to remove the external dependency. Strsafe.h wasn't around at the time, but some of the code in this helper library of mine was already doing similar things. For example, the *z string functions improved upon the * versions by preventing buffer overflows and guaranteeing null-termination of the target string (i.e., strcpyz vs. strcpy).
While the source code compiles under VS 2005, the EXE included in the ZIP archive contains the bits as they were compiled back in 2000 with whatever compiler was current at the time.

WindowPlacementTool has served me well over the years - I hope others find it useful and/or educational!

Tags: Technical Utilities

Powerful log file analysis for everyone [Releasing TextAnalysisTool.NET!]

Friday, June 22, 2007

A number of years ago, the product team I was on spent a lot of team analyzing large log files. These log files contained thousands of lines of output tracing what the code was doing, what its current state was, and gobs of other diagnostic information. Typically, we were only interested in a handful of lines - but had no idea which ones at first. Often one would start by searching for a generic error message, get some information from that, search for some more specific information, obtain more context, and continue on in that manner until the problem was identified. It was usually the case that interesting lines were spread across the entire file and could only really be understood when viewed together - but gathering them all could be inconvenient. Different people had different tricks and tools to make different aspects of the search more efficient, but nothing really addressed the end-to-end scenario and I decided I'd try to come up with something better.

TextAnalysisTool was first released to coworkers in July of 2000 as a native C++ application written from scratch. It went through a few revisions over the next year and a half and served myself and others well during that time. Later, as the .NET Framework became popular, I decided it would be a good learning exercise to rewrite TextAnalysisTool to run on .NET as a way to learn the Framework and make some architectural improvements to the application. TextAnalysisTool.NET was released in February of 2003 as a fully managed .NET 1.0 C# application with the same functionality of the C++ application it replaced. TextAnalysisTool.NET has gone through a few revisions since then and has slowly made its way across parts of the company. (It's always neat to get an email from someone in a group I had no idea was using TextAnalysisTool.NET!) TextAnalysisTool.NET is popular enough among its users that I started getting requests to make it available outside the company so that customers could use it to help with investigations.

The effort of getting something posted to Microsoft.com seemed overwhelming at the time, so TextAnalysisTool.NET stayed internal until now. With the latest request, I realized my blog would be a great way to help internal groups and customers by making TextAnalysisTool.NET available to the public!

You can download the latest version of TextAnalysisTool.NET by clicking here (or on the image above).

In the above demonstration of identifying the errors and warnings from sample build output, note how the use of regular expression text filters and selective hiding of surrounding content make it easy to zoom in on the interesting parts of the file - and then zoom out to get context.

Additional information can be found in the TextAnalysisTool.NET.txt file that's included in the ZIP download (or from within the application via Help | Documentation). The first section of that file is a tutorial and the second section gives a more detailed overview of TextAnalysisTool.NET (excerpted below). The download also includes a ReadMe.txt with release notes and a few other things worth reading.

The Problem: For those times when you have to analyze a large amount of textual data, picking out the relevant line(s) of interest can be quite difficult. Standard text editors usually provide a generic "find" function, but the limitations of that simple approach quickly become apparent (e.g., when it is necessary to compare two or more widely separated lines). Some more sophisticated editors do better by allowing you to "bookmark" lines of interest; this can be a big help, but is often not enough.

The Solution: TextAnalysisTool.NET - a program designed from the start to excel at viewing, searching, and navigating large files quickly and efficiently. TextAnalysisTool.NET provides a view of the file that you can easily manipulate (through the use of various filters) to display exactly the information you need - as you need it.

Filters: Before displaying the lines of a file, TextAnalysisTool.NET passes the lines of that file through a set of user-defined filters, dimming or hiding all lines that do not satisfy any of the filters. Filters can select only the lines that contain a sub-string, those that have been marked with a particular marker type, or those that match a regular expression. A color can be associated with each filter so lines matching a particular filter stand out and so lines matching different filters can be easily distinguished. In addition to the normal "including" filters that isolate lines of text you DO want to see, there are also "excluding" filters that can be used to suppress lines you do NOT want to see. Excluding filters are configured just like including filters but are processed afterward and remove all matching lines from the set. Excluding filters allow you to easily refine your search even further.

Markers: Markers are another way that TextAnalysisTool.NET makes it easy to navigate a file; you can mark any line with one or more of eight different marker types. Once lines have been marked, you can quickly navigate between similarly marked lines - or add a "marked by" filter to view only those lines.

Find: TextAnalysisTool.NET also provides a flexible "find" function that allows you to search for text anywhere within a file. This text can be a literal string or a regular expression, so it's easy to find a specific line. If you decide to turn a find string into a filter, the history feature of both dialogs makes it easy.

Summary: TextAnalysisTool.NET was written with speed and ease of use in mind throughout. It saves you time by allowing you to save and load filter sets; it lets you import text by opening a file, dragging-and-dropping a file or text from another application, or by pasting text from the clipboard; and it allows you to share the results of your filters by copying lines to the clipboard or by saving the current lines to a file. TextAnalysisTool.NET supports files encoded with ANSI, UTF-8, Unicode, and big-endian Unicode and is designed to handle large files efficiently.

I maintain a TODO list with a variety of user requests, but I thought I'd see what kind of feedback I got from releasing TextAnalysisTool.NET to the public before I decide where to go with the next release. I welcome suggestions - and problem reports - so please share them with me if you've got any!

I hope you find TextAnalysisTool.NET useful as I have!

Tags: Technical TextAnalysisTool Utilities