Posts tagged "Utilities"

The proverbial "one line fix" [ComputeFileHashes works around a troublesome Silverlight-on-Mac issue]

Sunday, February 1, 2009

When I achieved cross-platform parity by adding MD5 support to the Silverlight version of ComputeFileHashes, I thought I was done for a while. But then I got an email from a coworker reporting that the Silverlight version of ComputeFileHashes running on a Mac under Safari presented an "Add Files" dialog that did not actually let the user select any files. Ouch, that's no good...

I started investigating with a quick web search; the top hit for "OpenFileDialog Mac" showed that others had experienced similar problems and the Silverlight team confirmed a bug. So at least my application wasn't totally broken. :) I wanted to understand the scenario better, but I don't own a Mac (which is why this problem escaped my notice in the first place). Fortunately, I found one at work that I could borrow some cycles on and I wrote a simple test application to invoke the OpenFileDialog with a few different values for the Filter property. ComputeFileHashes was initially passing the value "All Files (*)|*" - effectively just "*" - which was intended to match all files. And, indeed, it does so in WPF and Silverlight/PC. However, on Silverlight/Mac that value seems to match no files. Someone suggested "*.*", but to me that matches all files with a '.' in their name and I didn't want to exclude files that don't happen to have an extension. So I tried "" instead, and that did exactly what I wanted on Silverlight/Mac and Silverlight/PC. I thought I'd found the solution - until I tried the new value on WPF and it caused an exception...

At this point I was tired of cross-platform trial-and-error, and I decided I was inviting trouble by passing any filter string at all! The default behavior of OpenFileDialog is to allow the selection of all files, so I wasn't really adding much value by passing a custom filter that did the same thing. Well, I was providing more explicit filter text in the drop-down of the dialog, but it wasn't worth the compatibility problems I was dealing with. So I removed the line of code that set the Filter property, recompiled, republished, and called it done. :)

The latest version of ComputeFileHashes is now 2009-01-30. I've updated all the binaries in order to avoid version number confusion, but the only real change here is the filter string and the improvement is only visible on Silverlight/Mac. (Note: I did not update the screenshots below, so the versions shown there are out of date.)

If you're using Silverlight to run ComputeFileHashes, you'll automatically get the new version next time you run ComputeFileHashes.
If you're using ClickOnce to run ComputeFileHashes, the application will automatically update itself after you run it a couple of times.
If you're using the WPF or command-line versions, you'll need to download the new binaries and update manually.

Please refer to the original release announcement for more information about supported platforms, source code, implementation, etc..

Click here or on the image below to run the Silverlight version of ComputeFileHashes in your browser.

Click here or on the image below to download the command-line and WPF versions of ComputeFileHashes - along with the ClickOnce and Silverlight versions AND the complete source code for everything!

Seamless cross-platform support is a tricky matter that's usually best left to others who have the time and resources to do it right. I didn't realize I was introducing a platform dependency by specifying a filter string, but I was... and I got burned by it. That's why it's important to test an application on all the supported configurations: you never know what problem might show up where you least expect it! That said, I'm probably not going to run out and buy myself a Mac just because of this incident - so please accept my apologies in advance should I fall victim to a similar problem in the future. :)

Cross-platform feature parity: achieved [Silverlight version of ComputeFileHashes now includes MD5!]

Tuesday, January 27, 2009

I was very happy with last week's release of ComputeFileHashes supporting the command-line, WPF, Silverlight, *and* ClickOnce. Only one thing bothered me: the Silverlight version didn't do MD5 due to the lack of support for that type of checksum by Silverlight 2. Recall that I'd fairly happily implemented my own CRC-32 class because none of the platforms supported it. [Also, it was relatively simple and had a good reference implementation. :) ] But because MD5 is a more complex algorithm and was only missing on Silverlight, I was reluctant to do the same thing for MD5...

What I really wanted was a freely available, Silverlight compatible HashAlgorithm-based MD5 implementation that I could trivially drop into my code and use on Silverlight. So I was excited when kind reader (and teammate!) Jeff Wilcox left a comment pointing to something that sounded perfect for my needs. I told Jeff I'd add MD5 for Silverlight and mentally breathed a sigh of relief that all four of ComputeFileHashes's supported platforms would provide the same set of checksums.

As it turns out, after a bit of research I decided not to use that MD5 implementation. (I'll explain why in my next post.) However, now that I'd fully bought in to the idea of MD5 on Silverlight, I was reluctant to let it go... So I spent some time working on an alternate solution and developed something I'm quite happy with. So I'm able to release an update to ComputeFileHashes that offers MD5 support on Silverlight!

The latest version of ComputeFileHashes is now 2009-01-26. I've updated all the binaries in order to avoid version number confusion - but the only real change here is the addition of MD5 for Silverlight. (FYI, I only updated the screenshot of the Silverlight version below.)

If you're using Silverlight to run ComputeFileHashes, you'll automatically get the new version next time you run ComputeFileHashes.
If you're using ClickOnce to run ComputeFileHashes, the application will automatically update itself after you run it a couple of times.
If you're using the WPF or command-line versions, you'll need to download the new binaries and update manually.

Please refer to the original release announcement for more information about supported platforms, source code, implementation, etc..

Click here or on the image below to run the Silverlight version of ComputeFileHashes in your browser.

I've said that "ComputeFileHashes is a simple tool intended to make verifying checksums easy for anyone.". And in some ways, I think the Silverlight version is the easiest option of all because there's no need to install it on your machine and it runs everywhere Silverlight 2 does (PC, Mac, (Linux soon!), Internet Explorer, Firefox, Safari, ...). So I'm really glad to add MD5 support to ComputeFileHashes for Silverlight - I hope you enjoy the new functionality!

Tags: Technical Silverlight WPF Utilities

Gratuitous platform support [ComputeFileHashes works on the command-line, on WPF, on Silverlight, and via ClickOnce!]

Wednesday, January 21, 2009

Last week, I released the ComputeFileHashes tool for calculating file checksums. (To read more about what checksums are and why they're useful, please refer to that post.) ComputeFileHashes is a fairly simple .NET command-line application for calculating the MD5, SHA-1, and CRC-32 hashes of one or more files. It takes advantage of the multi-processing capabilities of today's hardware to complete that task quickly - roughly on par with native-code implementations. ComputeFileHashes works quite well and I happily used it to verify the recently released Windows 7 Beta ISO images I'd downloaded.

Because not everybody is a fan of command-line tools, I thought it would be nice to use WPF to create a more user-friendly version of ComputeFileHashes. Once I'd done that, I knew it would be a trivial matter to publish the WPF version via ClickOnce to enable an absurdly easy install scenario. From there, porting to Silverlight would be straightforward and would offer an install-free, completely web-based solution with cross-platform (ex: PC/Mac), cross-browser (ex: IE/Firefox/Safari) appeal. What's more, because all of these platforms are built on .NET, so I expected to be able to take significant advantage of code sharing!

Click here or on the image below to run the Silverlight version of ComputeFileHashes in your browser.

Implementation notes:

The command-line version of ComputeFileHashes is a standard .NET 2.0 application and should work pretty much everywhere. The Silverlight version requires Silverlight 2 which is tiny and can be completely installed in less than a minute start-to-finish. The WPF/ClickOnce versions are a little more advanced and require .NET 3.5 SP1 (conveniently pre-installed on all Windows 7 machines!). If you don't already have .NET 3.5 SP1 (and you may not because Windows Update still doesn't seem to offer it), you can get the .NET 3.5 SP1 installer from here. Unfortunately, the only indication of .NET 3.5 SP1 not being installed seems to be an application crash immediately after starting the stand-alone WPF version. :( Fortunately, the ClickOnce version knows about the .NET 3.5 SP1 prerequisite and should offer to install it automatically if it's not already present.
None of the computation or file processing is performed on the main user interface thread under WPF or Silverlight, so ComputeFileHashes remains responsive even when working on a large file. Additional files can be queued for processing or the application/browser can be closed without the user having to wait.
As I hoped, I was able to achieve a very high degree of code sharing. By refactoring the original ComputeFileHashes code slightly, I pulled the core implementation out into a common class/file that everything shares. Then I put nearly all of the user interface functionality into another class/file that the WPF and Silverlight implementations both share. The XAML for the WPF and Silverlight versions is separate, but very similar. (There are enough slight differences between the two versions that I deliberately did not attempt to share the same XAML file.)

The source code structure looks like this:

`ComputeFileHashesCore.cs`	Core implementation of the file hashing code shared by all implementations. Makes use of multiple threads to perform hash calculations in parallel.
`ComputeFileHashesUI.cs`	User interface code shared by the WPF and Silverlight implementations. Makes use of a worker thread to push all computation off of the user interface thread and keep the application responsive. Defers to ComputeFileHashesCore for hashing functionality.
`CRC32.cs WaitingRoom.cs`	Custom CRC-32 HashAlgorithm implementation and synchronization object shared by all implementations.
`HashFileInfo.cs BlockingQueue.cs`	Data object for tracking state and custom Queue subclass that are shared by the WPF and Silverlight implementations.
`ComputeFileHashesCL\` `ComputeFileHashesCL.cs`	Command-line interface for handling arguments and displaying progress. Defers to ComputeFileHashesCore for hashing functionality.
`ComputeFileHashesWPF\` `Window1.xaml` `Window1.xaml.cs`	WPF definition of the application window. Defers to ComputeFileHashesUI for nearly all functionality.
`ComputeFileHashesSL\` `Page.xaml` `Page.xaml.cs`	Silverlight definition of the application window. Defers to ComputeFileHashesUI for nearly all functionality.

A look at the screenshots above reveals a few differences between the WPF and Silverlight implementations:
- The DataGrids look different. On WPF, ComputeFileHashes makes use of the WPFToolkit's DataGrid; on Silverlight it uses the DataGrid in the SDK. The two are very similar to use from a developer perspective, but they draw themselves differently and have some slightly user-level functionality changes due to platform differences. This was actually my first experience with either DataGrid and I was happy to find that they both worked the same - and pretty much the way I expected them to!
- The Silverlight version does not calculate the MD5 hash. This is because Silverlight's .NET doesn't implement the MD5 HashAlgorithm subclass while the desktop's .NET does. MD5 is not trivial to write, so I wasn't too interested in developing my own implementation like I did for CRC-32 (which isn't supported on any .NET platform).
- The Silverlight version includes a "Details" column that's not present on the WPF version. On WPF, it's trivial to create a ToolTip for a DataGrid column, but on Silverlight the ToolTipService class must be used and my attempts to set its attached property with a Style were met with ... resistance. So if an exception is thrown when processing a file, the WPF version will show the exception text in a ToolTip while the Silverlight version shows it in the Details column.
- The Silverlight version does not support drag-and-drop from the Windows Explorer. Running within a browser imposes certain limitations on Silverlight; an inability to integrate quite as richly with the operating system is one of them.
- The hyperlinks aren't quite the same. Silverlight ships with HyperlinkButton which is exactly the right control for the job here. WPF doesn't have that control, so the similar Hyperlink control is made to behave as desired with a small bit of code.

In the original release announcement, I wrote that "ComputeFileHashes is a simple tool intended to make verifying checksums easy for anyone.". Well, that's still the case - and adding support for WPF, ClickOnce, and Silverlight should make it even easier for everyone to use. Just decide what kind of user interface you prefer, and start using that version of ComputeFileHashes for all your checksumming needs! :)

Tags: Technical Silverlight WPF Utilities

Trust, but verify [Free tool (and source code) for computing commonly used hash codes!]

Tuesday, January 13, 2009

Internet downloads (particularly large ones) are often published with an associated checksum that can be used to verify that the file was successfully downloaded. While transmission errors are relatively rare, the popularity of file sharing and malware introduce the possibility that what you get isn't always exactly what you wanted. The posting of checksums by the original content publisher attempts to solve this problem by giving the end user an easy way to validate the downloaded file. Checksums are typically computed by applying a cryptographic hash function to the original file; the hash function computes a small (10-30 character) textual "snapshot" of the file that satisfies the following properties (quoting from Wikipedia):

It is easy to compute the hash for any given data

It is extremely difficult to construct a [file] that has a given hash

It is extremely difficult to modify a given [file] without changing its hash

It is extremely unlikely that two different [files] will have the same hash

Because of these four properties, a user can be fairly confident a file has not been tampered with or garbled as long as the checksum computed on their own machine matches what the publisher posted. I say fairly confident because there's always the possibility of a hash collision - two different files with the same checksum. No matter how good the hash function is, the pigeonhole principle guarantees there will ALWAYS be the possibility of collisions. The point of a good hash function is to make this possibility so unlikely as to be impossible for all intents and purposes.

Popular hash functions in use today are MD5 and SHA-1, with CRC-32 rapidly losing favor. Speaking in very broad terms, one might say that the quality of CRC-32 is "not good", MD5 is "good", and SHA-1 is "very good". For now, that is; research is always under way that could render any of these algorithms useless tomorrow... (For more information about the weaknesses of each algorithm, refer to the links above.)

In order for published checksums to be useful, the user needs an easy way to calculate them. I looked around a bit and didn't a lot of free tools for computing these popular hash functions that I was comfortable with, so I wrote my own using .NET. Here it is:

ComputeFileHashes
   Computes CRC32, MD5, and SHA1 hashes for the specified file(s)
   Version 2009-01-12
   http://blogs.msdn.com/Delay/

Syntax: ComputeFileHashes FileOrPattern [FileOrPattern [...]]
   Each FileOrPattern can specify a single file or a set of files

Examples:
   ComputeFileHashes Image.iso
   ComputeFileHashes *.iso
   ComputeFileHashes *.iso *.vhd

[Click here to download ComputeFileHashes along with its complete source code.]

Here's the output of running ComputeFileHashes on the recently released Windows 7 Beta ISO images. Anyone can verify the checksums below on the MSDN Subscriber Downloads site. (Note: You need to be a subscriber to download files from there; everyone else can download from the official Windows 7 link.)

C:\T>ComputeFileHashes.exe "M:\Windows 7\*"

M:\Windows 7\en_windows_7_beta_dvd_x64_x15-29074.iso
CRC32: 8E2FAD39
MD5: 773FC9CC60338C612AF716A2A14F177D
SHA1: E09FDBC1CB3A92CF6CC872040FDAF65553AB62A5

M:\Windows 7\en_windows_7_beta_dvd_x86_x15-29073.iso
CRC32: AABA5A48
MD5: F9DCE6EBD0A63930B44D8AE802B63825
SHA1: 6071184282B2156FF61CDC5260545C078CCA31EE

One of the things that was important to me when writing ComputeFileHashes was performance. Nobody likes to wait, and I'm probably even less patient than the average bear. One of the things I wanted my program to do was take advantage of multi-processing and the multi-core CPUs that are so prevalent these days. So ComputeFileHashes runs the three hash functions in parallel with each other and with loading the next bytes of the file. Theoretically, this can take advantage of four different cores - though in practice my limited testing suggests there's just not enough work to saturate them all. :)

While I paid attention to performance at a macro level (i.e., algorithm design), I didn't worry about it at a micro level (i.e., focused optimization). And I used .NET of all things. (Cue the doubters: "ZOMG, EPIC FAIL!!") If it uses .NET, it must be slow, right? Well, let's do the numbers:

I took some informal measurements of the time to compute hashes for the Windows 7 Beta x64 ISO image. ComputeFileHashes took just under 60 seconds to compute the CRC-32, MD5, and SHA-1 hashes of the 3.15 GB file on my home machine. Not instantaneous by any means, but also pretty reasonable for that amount of data.
Next up was my favorite program ever, Altap Salamander. I use Salamander for all my file management and it comes with a handy plugin for calculating/verifying checksums. Salamander is fully native code, so it should have a distinct performance advantage. Salamander took about 40 seconds to compute the CRC-32 and MD5 checksums in a single pass. This performance is noticeably better than ComputeFileHashes's, but it also doesn't include SHA-1 which is probably the most computationally intensive algorithm. I doubt adding SHA-1 would double the time, but it would almost certainly take longer. It's hard to balance missing features against better performance, so maybe this one is too close to call. :)
Last up was an internal tool I had called gethash. gethash is also native code and supports MD5, SHA-1, and some other hash types - but not CRC-32. However, gethash computes only one checksum a time. So I ran it once for SHA-1 in a time of just over 35 seconds and again for MD5 for another time of just over 35 seconds. The total time for gethash was therefore about 70 seconds and just like before we're missing one of the checksum types. In this case, one could argue that ComputeFileHashes wins the race because it's 10 seconds faster and delivers more value. Of course, if you only need one type of checksum, then you don't care that ComputeFileHashes generated others and gethash can do a single checksum in close to half the time it takes ComputeFileHashes. Again, no clear winner for me.

So depending on how you want to look at things, ComputeFileHashes is either the best or the worst of the lot. :) But recall that we're pitching an unoptimized .NET application against two solid native-code implementations. ComputeFileHashes pretty clearly held its ground and I'd say the doubters don't have much to complain about here.

Implementation notes:

.NET includes the HashAlgorithm abstract class and a variety of concrete subclasses for calculating cryptographic hashes. MD5 and SHA-1 are implemented by the Framework along with a handful of other useful hash functions. (In fact, it's trivial to add support for another HashAlgorithm to ComputeFileHashes simply by editing a table in the source code.) CRC-32 is not part of the Framework (due, I suspect, to its weaknesses), so I wrote my own CRC-32 HashAlgorithm subclass. I'll post more about this tomorrow.
The most efficient multi-hashing implementation would probably update each algorithm together as bytes are read from the file. However, the overhead of doing that with the .NET HashAlgorithm implementation (which sensibly expects blocks of data) seemed prohibitively expensive - and impractical to parallelize. An alternative would be to let different threads hash the complete file on their own. This parallelizes easily, but because the algorithms operate at different speeds, they'd soon be reading different parts of the file - and the disk thrashing would probably ruin performance. So I went with a compromise: ComputeFileHashes loads a 1 MB chunk of the file into memory, lets each of the algorithms process that block to completion on separate threads, then loads the next 1 MB of data and repeats. There's going to be some wasted time after the fastest algorithm completes and the slowest is still working, and there's probably some memory cache thrashing (for similar reasons as the disk) - but as the numbers above indicate, the performance of this approach is really quite reasonable. Of course, it's entirely possible that a different buffer size that would be even faster - so if you do some testing and find something better, please let me know! :)
ComputeFileHashes creates dedicated Threads for each of the hash functions it uses. Those threads stay around for the life of the application and are synchronized by a general-purpose helper class I wrote, WaitingRoom. I'll blog more about this in a couple of days.

ComputeFileHashes is a simple tool intended to make verifying checksums easy for anyone. I've wanted a good, fast, free, all-in-one solution I could trust for some time now, and the recent release of Windows 7 finally prompted me to write my own. I hope others to put ComputeFileHashes to use for themselves - perhaps even for the Windows 7 Beta! :)

Tags: Technical Utilities

The source code IS the executable [Releasing CSI, a C# interpreter (with source and tests) for .NET]

Wednesday, January 7, 2009

A few years ago I found myself spending a lot of time writing batch files to perform a variety of relatively simple tasks. For those who aren't familiar, you can do some surprisingly powerful things with batch files - but it usually involves a lot of trickery and arcane knowledge. I wanted a simpler way of doing things, but I didn't want to create a bunch of compiled executables and lose the elegant, easy-to-modify transparency that batch files offer. These days, I'd probably look to PowerShell for the solution, but back then there was no such thing...

What I did have was the power of .NET and some inspiration from a coworker's tool that was able to take .NET 1.1 source code and execute it "on the fly" without the need for compilation. What I did not have was something similar for .NET 2.0 or access to the source code for that tool (because the author had left the company). So I wrote my own:

============================================
==  CSI: C# Interpreter                   ==
==  Delay (http://blogs.msdn.com/Delay/)  ==
============================================


Summary
=======
CSI: C# Interpreter
     Version 2009-01-06 for .NET 3.5
     http://blogs.msdn.com/Delay/

Enables the use of C# as a scripting language by executing source code files
directly. The source code IS the executable, so it is easy to make changes and
there is no need to maintain a separate EXE file.

CSI (CodeFile)+ (-d DEFINE)* (-r Reference)* (-R)? (-q)? (-c)? (-a Arguments)?
   (CodeFile)+      One or more C# source code files to execute (*.cs)
   (-d DEFINE)*     Zero or more symbols to #define
   (-r Reference)*  Zero or more assembly files to reference (*.dll)
   (-R)?            Optional 'references' switch to include common references
   (-q)?            Optional 'quiet' switch to suppress unnecessary output
   (-c)?            Optional 'colorless' switch to suppress output coloring
   (-a Arguments)?  Zero or more optional arguments for the executing program

The list of common references included by the -R switch is:
   System.dll
   System.Data.dll
   System.Drawing.dll
   System.Windows.Forms.dll
   System.Xml.dll
   PresentationCore.dll
   PresentationFramework.dll
   WindowsBase.dll
   System.Core.dll
   System.Data.DataSetExtensions.dll
   System.Xml.Linq.dll

CSI's return code is 2147483647 if it failed to execute the program or 0 (or
whatever value the executed program returned) if it executed successfully.

Examples:
   CSI Example.cs
   CSI Example.cs -r System.Xml.dll -a ArgA ArgB -Switch
   CSI ExampleA.cs ExampleB.cs -d DEBUG -d TESTING -R


Notes
=====
CSI was inspired by net2bat, an internal .NET 1.1 tool whose author had left
Microsoft. CSI initially added support for .NET 2.0 and has now been extended
to support .NET 3.0 and .NET 3.5. Separate executables are provided to
accommodate environments where the latest version of .NET is not available.


Version History
===============

Version 2009-01-06
Initial public release

Version 2005-12-15
Initial internal release

[Click here to download CSI for .NET 3.5, 3.0, 2.0, and 1.1 - along with the complete source code and test suite.]

Using CSI is quite simple. Here's an example from the release package:

C:\T\CSI>TYPE Samples\HelloWorld.cs
using System;

public class HelloWorld
{
    public static void Main(string[] args)
    {
        Console.WriteLine("Hello world.");
    }
}

C:\T\CSI>CSI.exe Samples\HelloWorld.cs
Hello world.

And if you use the included batch files to register the .CSI file type (more on this in the notes below), it's even easier:

C:\T\CSI>TYPE Samples\Greetings.csi
using System;

public class Greetings
{
    public static void Main(string[] args)
    {
        Console.WriteLine("Hello {0}", string.Join(" ", args));
    }
}

C:\T\CSI>Samples\Greetings.csi out there world.
Hello out there world.

Notes:

Surprisingly, it's fairly straightforward to do what CSI does thanks to the CSharpCodeProvider class that's part of .NET. (In fact, the majority of the code for CSI is related to processing arguments, managing the display, and handling errors!) Behind the scenes, there's no real magic: CSharpCodeProvider is just a wrapper for the C# compiler which generates a temporary assembly that CSI calls into. Everything should work exactly the same under CSI as it would if compiled normally - the only difference is the convenience. :)
I hinted at the benefits of avoiding compilation above, but wanted to explain a bit more. In the usual way of doing things, there is typically (at least) a .CS file and a .EXE file for each tool used by, say a build process. The build process runs the .EXE file, and people refer to the .CS file. If anything ever needs to be changed, someone updates the .CS file, recompiles to generate a new .EXE, and checks both files in together. Usually... But sometimes they forget and only check in the .EXE - or there's some other .EXE-only modification that throws the synchronization of the two files out of whack and then it's just a mess to understand what's going on because the source code no longer matches the executable. CSI avoids that problem by allowing the build process to "run" the .CS file directly. Because there's only one file to manage, there's no chance of it getting out of sync!
Two simple scripts are included that can be used to RegisterCSI.cmd or UnregisterCSI.cmd. When run with administrative privileges, these scripts set up (or remove) a file association for .CSI files (which are just renamed .CS files) that automatically invokes CSI for the specified file. So, as shown in the earlier example, it's possible to run C# code files by name in a command window or by double-clicking them in the Windows Explorer!
There's a separate version of CSI compiled for each major .NET version that's in use today: 1.1, 2.0, 3.0, and 3.5. Aside from some minor functional limitations with the 1.1 version (due to non-existent features in that version of .NET), they're all the same CSI with the same options, etc.. The most obvious difference is the list of default assemblies included by the -R option: each version matches what Visual Studio 2008 provides as the default set of assemblies for a new project targeting the corresponding version of the Framework. The other big difference is that the .NET 3.5 version of CSI uses the C# 3.0 compiler (which can be a little tricky to do until someone shows you how), so it also supports new language features like var and all the LINQy syntax goodness that's available in .NET 3.5.
Editing standalone .CS files can be a bit annoying because the lack of an associated .CSPROJ file means that Visual Studio's IntelliSense isn't available. So I'll usually begin by creating a project in Visual Studio, taking advantage of its rich editing and debugging facilities to get everything working the way I want - then I pull out the resulting .CS file and run it with CSI. If there are any subsequent modifications to the code, they're usually trivial enough that the lack of IntelliSense isn't an issue. Alternatively, you could create a new project and add a link to the existing file in order to enable IntelliSense. OR you could use something like Snippet Compiler which offers a surprisingly rich Visual Studio-like editing and execution environment (with IntelliSense!) and works well with standalone .CS files.
Given that I've already mentioned PowerShell, what's the point of CSI? Well, maybe there isn't one... :) But I think there's still value here - even in a PowerShell world. (In fact, what prompted me to release CSI was an internal request to add support for .NET 3.5, so it seems at least one other person agrees with me.) Aside from knowing a bit about PowerShell and how it works, I haven't used it, so I won't try to do a feature comparison here. Some of the things PowerShell does are very cool and quite useful - and I think it's considerably better than .CMD files for automating typical tasks. Also, I love that it's tightly integrated with the .NET Framework. That said, it's another "language" to learn and not everybody has the time for that. Additionally, I don't think the current development and debugging options are quite as rich as Visual Studio already is. And PowerShell isn't available everywhere, whereas CSI offers a no-touch, install-free solution anywhere the .NET Framework can be found. Lastly, CSI is small (really small!) and very simple.
Though its name bears passing resemblance to a popular TV series, that's unintentional: once you know the C# compiler is named CSC.exe, it's obvious the C# interpreter should be named CSI.exe.

CSI was a fun project in its day and I've done quite a bit with it that would have been fairly tedious otherwise. Today's world offers plenty of alternatives - but if you're comfortable with C# and prefer to stick to one language for all your programming needs, CSI is pretty handy to have around. Because you never know when the urge to code will strike! :)

Tags: Technical Utilities

Easily create an ISO from a CD/DVD [Releasing ExtractISO tool and source]

Thursday, November 6, 2008

A few years back I found myself in need of a tool to create an ISO image from a CD I owned. I searched around a bit, but the only tools I could find to do that were part of much larger programs and cost money I didn't want to spend. The concept seemed simple enough that I figured it would be fairly easy to write my own tool to do this. So I did. :)

ExtractISO is a native, console-mode application that does what its name suggests: extracts the contents of a CD or DVD and saves it to a file. The tool and complete source code are attached to this post as ExtractISO.zip. Here's the documentation that comes with it:

====================================
==  ExtractISO                    ==
==  http://blogs.msdn.com/Delay/  ==
====================================


Summary
=======

ExtractISO - Creates an ISO image file from a CD/DVD

ExtractISO reads raw sectors from the specified CD/DVD volume and writes them
to the specified ISO image file.  The resulting file can be burned to a blank
CD/DVD disc, mounted in a virtual CD/DVD drive, or opened in an ISO viewer.

For a discussion of the limitations of this process, please refer to:
   http://www.cdrfaq.org/faq02.html#S2-28

Note that direct access to a CD/DVD volume requires administrative privileges.

ExtractISO attempts to read damaged media by retrying failed reads a few times
before giving up. In the event of a persistent read failure, the partial output
file is not deleted so that subsequent extraction attempts (possibly after
cleaning the media and/or using a different drive to read it) can be attempted
without the loss of any successfully extracted data.

Syntax: ExtractISO D: File.iso
   D: is the CD/DVD drive to extract from
   File.iso is the file to write to


Version History
===============

Version 1.11, 2008-11-04
Improved formatting of "bytes extracted" display value
Initial public release

Version 1.10, 2006-03-12
Added support for damaged media recovery (see above)

Version 1.01, 2005-05-18
Fixed silly integer overflow problem with "bytes extracted" display
Note: Problem affected display only; no functional impact

Version 1.0, 2005-04-27
Initial release

ExtractISO has been available to anyone inside Microsoft since I wrote it. Last week, I got an unexpected request to make it public so certain customers could use it. I was happy to do so, but one thing had been bothering me for a while: the status display of the bytes extracted so far didn't group by thousands. Instead of "1,000,000", the counter displayed as "1000000" which I find more difficult to parse. This was such a small issue, I never got around to fixing it - but now seemed like the ideal time to do so!

Unfortunately, this formatting task which is so simple in .NET seems to be quite a bit more involved with the Win32 API. The first problem is that printf doesn't support outputting the thousands separator. I considered using something like the sample code the previous link suggests, but wanted something simpler and easier to understand. A bit of research turned up the GetNumberFormat API which seemed like the perfect solution. However, the GetNumberFormat API suffers from at least two notable shortcomings: it's not easy to customize the output and the input needs to be a string. I did a quick test and found that the output of GetNumberFormat for "1000000" is "1,000,000.00" (on my machine using the default United States settings). This is close to what I wanted, but displaying the integral byte count with two digits of decimal precision just seems silly to me. So I looked for an easy way to customize the output via NUMBERFMT, and ran into the same issues Michael talks about in the post I linked to. When further research turned up no better alternatives, I decided to use GetNumberFormat and then "fix" its output by calling GetLocaleInfo(LOCALE_SDECIMAL) and removing everything after the localized decimal character(s). I like this approach because it is almost all platform code (i.e., code I don't have to write/test) and should be correct for all cultures where numbers are written as "1,000,000.00", "1.000.000,00", etc..

Other than the formatting issue I've just discussed, I made no other changes to the ExtractISO implementation. The only things I did were to update some of the VERSIONINFO settings, tweak the Build.cmd script, and recompile with the latest Visual Studio 2008.

ExtractISO works reasonably quickly, but it's worth mentioning that performance was specifically not a goal when I wrote it. The code implements a simple read/write loop with a 1 MB buffer and makes no attempt to interleave the two operations. The large-ish buffer should help minimize the overhead of the read/write calls, but I suspect a different implementation that issues the reads and writes in parallel would be faster. However, achieving that parallelism comes at the price of complexity - and I didn't feel the time it would have taken to write and debug that code was justified here. Even at top speed, the extraction operation is going to take a minute or two - an extra minute on top of that seems a small price to pay for simpler, easier to maintain code. You are welcome to disagree, of course. :) If you develop a faster implementation, I'd love to hear about it!

ExtractISO is a simple tool with a simple purpose - and one that I've used quite happily for a number of years now. If you've got a hankering for ISOs and like free stuff, please have a look at ExtractISO!

PS - ExtractISO cannot be used to duplicate audio CDs or copy-protected DVDs: audio CDs use a different storage scheme than data CDs and copy-protected DVDs store their encryption key in an "inaccessible" location of the disk.

[ExtractISO.zip]

Tags: Technical Utilities

Just a little too eager with the clicking... [Updated binaries and source for MouseButtonClicker]

Friday, October 10, 2008

I described the idea behind my MouseButtonClicker utility in the introductory post for MouseButtonClicker. As you might expect, I've been using this tool ever since I released it a few weeks ago.

All was well - except that every so often I'd get an automatic click that shouldn't have happened. Not frequently enough to stop me from using MouseButtonClicker, but enough to be annoying. I assumed this was simply "jitter" (see the original post for background) beyond what the jitter-prevention code was already filtering out, but I didn't think so because I'd watched the input events and the 2-unit threshold really seemed like it should be large enough.

So one evening when I had a bit of time, I compiled an instrumented version of MouseButtonClicker, started it up, and went about my work. Sure enough, after a half-hour or so, I got an unexpected click! So I had a look at the debug output and found that the bogus click had occurred after a 1-unit jitter movement. But that was below the 2 unit threshold, so the jitter filter should have suppressed the click. To my dismay, it seemed my code had a bug...

And sure enough, once I knew what to look for, I spotted it right off. :) The jitter filter was working fine for the general case - but there was a special case that wasn't handled properly. Specifically, the jitter threshold is always reset when MouseButtonClicker clicks the mouse - because the mouse has stopped moving is likely to stay at rest for a while. However, if the user moved the mouse and clicked the mouse button manually, MouseButtonClicker would suppress its own click (correct behavior) but was neglecting to reset the jitter filter! This problem wasn't immediately obvious to the user because the observable behavior so far was 100% correct - it was only if the mouse was left alone for about 10 minutes and jittered again that the jitter threshold wouldn't do its job and a bogus click would be made.

So I fixed the bug, compiled version 1.01 of MouseButtonClicker, ran with the new bits for a few days to verify the problem was solved (it was!), and updated the public download:

Click here to download the MouseButtonClicker executables (32- and 64-bit) and the complete Visual Studio 2008 source code.

I'm sorry for any trouble this may have caused - I hope the new bits prove even more helpful than the last! :)

Tags: Technical Utilities

"MouseButtonClicker clicks the mouse so you don't have to!" [Releasing binaries and source for a nifty mouse utility]

Thursday, September 25, 2008

I first came across the notion of automatic mouse button clicking some time ago. The basic idea can be summarized as follows:

The nearly universal pattern of mouse use is: move/click/wait... move/click/wait... move/click/wait.... In the overwhelming majority of cases, the only reason the mouse gets moved is to position the pointer over the next user interface element that needs to be clicked. Because every move is immediately followed by a click, it should be possible to simplify the process by performing the click automatically when the mouse stops moving (i.e., moves to a new location and stays still for a few moments). This automatic click saves the user only a tiny bit of effort each time it happens, but it eliminates a conceptually unnecessary, repetitive motion that's carried out many, many times over the course of every day. As an additional ergonomic benefit, automatic clicking enables the user to hold the mouse in a variety of new ways now that it's no longer necessary to keep a finger on the mouse button.

My first reaction was that the automatic clicking behavior would be nearly impossible to live with - and for the first couple of minutes I tried using it, it was. :) But, once I got out of the habit of wiggling the mouse needlessly and got into the habit of moving it where I needed when I needed, I found I quite liked the behavior after all. As I became more comfortable, I got used to the timing and found I could manage double clicks with just a single click by timing that click to happen right after the automatic click! The only challenge was to learn what parts of the user interface do nothing when clicked on - because these are the "safe areas" where the mouse can be "parked" if you start moving it and then decide you don't really want to click anything after all.

I found I liked automatic clicking so much, I wrote my own tool to implement it - and I've used that tool happily for nearly a decade. But my tool has a specific hardware dependency as a consequence of some other functionality it implements and that became a problem a few days ago when I changed hardware. So I decided this was a great opportunity to extract the clicking functionality into its own, dedicated, hardware-agnostic tool - and then blog about it!

The new tool I wrote is called MouseButtonClicker and does exactly what I describe above. It's written completely in native code - partly because I saw no need for the overhead of managed code for a simple "always-on" scenario like this one - and partly because it was a good excuse to brush up on my (rusty) native coding skills. MouseButtonClicker is a UI-less application - which means it doesn't have a window, or a notification icon, or anything to look at - it just runs invisibly and does its job. Of course, it wouldn't be hard to add a simple notification icon, but that's unnecessary for my purposes: MouseButtonClicker auto-starts with my computer and never gets closed. If you find you need to exit MouseButtonClicker, simply kill it with Task Manager or via TaskKill /IM MouseButtonClicker.exe - MouseButtonClicker maintains no persistent state and terminating it is completely safe to do at any time. And while the standard 32-bit executable works just fine on 64-bit operating systems, I've also provided a 64-bit version for those who prefer not to mix their bits. :)

Click here to download the MouseButtonClicker executables (32- and 64-bit) and the complete Visual Studio 2008 source code.

Implementation notes:

MouseButtonClicker finds out about mouse activity via the raw input API which provides low-level access to input devices like mice and keyboards in a simple, flexible manner. Using the raw input API makes it easy to distinguish between mouse input (where automatic clicking makes sense) and input from a tablet pen or digitizer device (where it does not make sense). This is of particular interest to me because I'm a long-time user of Wacom's fine tablet products and frequently switch between mouse and pen while using my computer; having the wrong behavior for the pen would be simply unacceptable.
MouseButtonClicker generates the automatic click with the SendInput function which provides an easy way to inject input events into the system. This works quite nicely, but it's worth noting that on recent operating systems like Vista and Server 2008, the SendInput API is restricted by User Interface Privilege Isolation. Specifically, MSDN notes that "Applications are permitted to inject input only into applications that are at an equal or lesser integrity level". It's my experience that this restriction usually exhibits itself as an inability to click processes that are running in an elevated security context (i.e., as Administrator) or to click in Command Windows that aren't currently active.
There are a couple of cases where MouseButtonClicker suppresses the automatic click. For example, there will be no automatic click if any mouse button has been manually clicked since the mouse stopped moving or if any of the mouse buttons is currently pressed. As such, the automatic click doesn't interfere with right clicking to get a context menu, pausing while dragging a window around the screen, or simply clicking the button yourself when you're in too much of a hurry to let MouseButtonClicker do so for you!
Some mouse hardware exhibits a small amount of "jitter" that causes single unit move messages from time to time (note: the size of a unit is less than that of a pixel, so these jitters aren't usually visible). My thinking is that this jitter is due to the mouse being *just* on the threshold of moving and then getting nudged slightly (perhaps the table gets bumped, a butterfly flaps its wings, etc.). Whatever the cause, this occasional jitter is an annoyance because it results in occasional - seemingly unprovoked - clicks wherever the mouse happens to be. So MouseButtonClicker has a small jitter-prevention measure that requires the mouse to move at least a little bit (2 units) before an automatic click will be performed.
MouseButtonClicker deliberately does not clean up its application-level resources (like the window class it registers) because Windows contractually promises to clean such things up during process exit. I formerly believed this to be lazy, but eventually decided that the best code is code I don't have to write, debug, and maintain. :) For cases like this where someone else is already going to handle resource clean-up - probably more quickly and efficiently than I could - it seems prudent to let them do so. This approach avoids the possibility of bugs in clean-up code, keeps the executable size down (which avoids unnecessary disk access), and keeps the source code just a little bit simpler.
The COMPILE_TIME_ASSERT macro provides a handy way of implementing a compile-time assert (also known as a static assert). Compile-time asserts are handy because they do the same thing normal asserts do, but they take effect at compile time and will actually fail the build process if the specified condition is not true. So instead of having to hope that the test cases exercise all the assertions in the code, the simple presence of a successful compile guarantees that all compile-time asserts are valid!
The variably-sized buffer for the WM_INPUT message is allocated on the stack via the _malloca/_freea CRT library calls. Allocating memory on the stack isn't usually appropriate, but for situations like this where a small, short-lived buffer of unknown size is needed, allocating on the stack can be more efficient than allocating on the heap.
The delay before the automatic click is determined by the result of the GetDoubleClickTime() API. Though the delay MouseButtonClicker implements is not technically for a double-click, the result of this call is the unofficial basis for a variety of different UI timings across the system.
Yes, I drew the icon myself; no, I don't have any artistic skill. :) It's supposed to be a robotic finger clicking a mouse button, but I agree it's probably more of a Rorschach inkblot test... If you can do better and want to send me an ICO file with 32x32 and 16x16 images I can freely redistribute, I will pick my favorite submission, include it with the next version of MouseButtonClicker, and credit you publically for your contribution.

So if you're in the mood to try something new and you don't mind a bit of a learning curve, give MouseButtonClicker a try for an hour or two. If you're lucky, you may never need to click again... :)

PS - For the benefit of casual readers and search engines, here is MouseButtonClicker.cpp:

// MouseButtonClicker - http://blogs.msdn.com/Delay
//
// "MouseButtonClicker clicks the mouse so you don't have to!"
//
// Simple Windows utility to click the primary mouse button whenever the mouse
// stops moving as a usability convenience/enhancement.


#include <windows.h>
#include <tchar.h>
#include <malloc.h>
#include <strsafe.h>

// Macro for compile-time assert
#define COMPILE_TIME_ASSERT(e) typedef int CTA[(e) ? 1 : -1]

// Constants for code simplification
#define RI_ALL_MOUSE_BUTTONS_DOWN (RI_MOUSE_BUTTON_1_DOWN | RI_MOUSE_BUTTON_2_DOWN | RI_MOUSE_BUTTON_3_DOWN | RI_MOUSE_BUTTON_4_DOWN | RI_MOUSE_BUTTON_5_DOWN)
#define RI_ALL_MOUSE_BUTTONS_UP (RI_MOUSE_BUTTON_1_UP | RI_MOUSE_BUTTON_2_UP | RI_MOUSE_BUTTON_3_UP | RI_MOUSE_BUTTON_4_UP | RI_MOUSE_BUTTON_5_UP)

// Check that bit-level assumptions are correct
COMPILE_TIME_ASSERT(RI_MOUSE_BUTTON_1_DOWN == (RI_MOUSE_BUTTON_1_UP >> 1));
COMPILE_TIME_ASSERT(RI_MOUSE_BUTTON_2_DOWN == (RI_MOUSE_BUTTON_2_UP >> 1));
COMPILE_TIME_ASSERT(RI_MOUSE_BUTTON_3_DOWN == (RI_MOUSE_BUTTON_3_UP >> 1));
COMPILE_TIME_ASSERT(RI_MOUSE_BUTTON_4_DOWN == (RI_MOUSE_BUTTON_4_UP >> 1));
COMPILE_TIME_ASSERT(RI_MOUSE_BUTTON_5_DOWN == (RI_MOUSE_BUTTON_5_UP >> 1));
COMPILE_TIME_ASSERT(RI_ALL_MOUSE_BUTTONS_DOWN == (RI_ALL_MOUSE_BUTTONS_UP >> 1));

// Macro for absolute value
#define ABS(v) ((0 <= v) ? v : -v)

// Application-level constants
#define APPLICATION_NAME (TEXT("MouseButtonClicker"))
#define TIMER_EVENT_ID (1)
#define MOUSE_MOVE_THRESHOLD (2)

// Window procedure
LRESULT CALLBACK WndProc(const HWND hWnd, const UINT message, const WPARAM wParam, const LPARAM lParam)
{
    // Tracks the mouse move delta threshold across calls to WndProc
    static LONG lLastClickDeltaX = 0;
    static LONG lLastClickDeltaY = 0;
    static bool fOkToClick = false;

    switch (message)
    {
    // Raw input message
    case WM_INPUT:
        {
            // Query for required buffer size
            UINT cbSize = 0;
            if(0 == GetRawInputData(reinterpret_cast<HRAWINPUT>(lParam), RID_INPUT, NULL, &cbSize, sizeof(RAWINPUTHEADER)))
            {
                // Allocate buffer on stack (falls back to heap)
                const LPVOID pData = _malloca(cbSize);
                if(NULL != pData)
                {
                    // Get raw input data
                    if(cbSize == GetRawInputData(reinterpret_cast<HRAWINPUT>(lParam), RID_INPUT, pData, &cbSize, sizeof(RAWINPUTHEADER)))
                    {
                        // Only interested in mouse input
                        const RAWINPUT* const pRawInput = static_cast<LPRAWINPUT>(pData);
                        if (RIM_TYPEMOUSE == pRawInput->header.dwType)
                        {
                            // Only interested in devices that use relative coordinates
                            // Specifically, input from pens/tablets is ignored
                            if(0 == (pRawInput->data.mouse.usFlags & MOUSE_MOVE_ABSOLUTE))
                            {
                                // Tracks the state of the mouse buttons across calls to WndProc
                                static UINT usMouseButtonsDown = 0;

                                // Update mouse delta variables
                                lLastClickDeltaX += pRawInput->data.mouse.lLastX;
                                lLastClickDeltaY += pRawInput->data.mouse.lLastY;

                                // Enable clicking once the mouse has exceeded the threshold in any direction
                                fOkToClick |= ((MOUSE_MOVE_THRESHOLD < ABS(lLastClickDeltaX)) || (MOUSE_MOVE_THRESHOLD < ABS(lLastClickDeltaY)));

                                // Determine the input type
                                const UINT usButtonFlags = pRawInput->data.mouse.usButtonFlags;
                                if(0 == (usButtonFlags & (RI_ALL_MOUSE_BUTTONS_DOWN | RI_ALL_MOUSE_BUTTONS_UP | RI_MOUSE_WHEEL)))
                                {
                                    // Mouse move: (Re)set click timer if no buttons down and mouse moved enough to avoid jitter
                                    if((0 == usMouseButtonsDown) && fOkToClick)
                                    {
                                        // Use double-click time as an indication of the user's responsiveness preference
                                        (void)SetTimer(hWnd, TIMER_EVENT_ID, GetDoubleClickTime(), NULL);
                                    }
                                }
                                else
                                {
                                    // Mouse button down/up or wheel rotation: Cancel click timer
                                    (void)KillTimer(hWnd, TIMER_EVENT_ID);

                                    // Update mouse button state variable (asserts above ensure the bit manipulations are correct)
                                    usMouseButtonsDown |= (usButtonFlags & RI_ALL_MOUSE_BUTTONS_DOWN);
                                    usMouseButtonsDown &= ~((usButtonFlags & RI_ALL_MOUSE_BUTTONS_UP) >> 1);
                                }
                            }
                        }
                    }
                    // Free buffer
                    (void)_freea(pData);
                }
            }
        }
        break;
    // Timer message
    case WM_TIMER:
        {
            // Timeout, stop timer and click primary button
            (void)KillTimer(hWnd, TIMER_EVENT_ID);
            INPUT pInputs[2] = {0};
            pInputs[0].type = INPUT_MOUSE;
            pInputs[0].mi.dwFlags = MOUSEEVENTF_LEFTDOWN;
            pInputs[1].type = INPUT_MOUSE;
            pInputs[1].mi.dwFlags = MOUSEEVENTF_LEFTUP;
            (void)SendInput(2, pInputs, sizeof(INPUT));

            // Reset mouse delta and threshold variables
            lLastClickDeltaX = 0;
            lLastClickDeltaY = 0;
            fOkToClick = false;
        }
        break;
    // Close message
    case WM_DESTROY:
        (void)PostQuitMessage(0);
        break;
    // Unhandled message
    default:
        return DefWindowProc(hWnd, message, wParam, lParam);
    }
    // Return value 0 indicates message was processed
    return 0;
}

// WinMain entry point
int APIENTRY _tWinMain(const HINSTANCE hInstance, const HINSTANCE hPrevInstance, const LPTSTR lpCmdLine, const int nCmdShow)
{
    // Avoid compiler warnings for unreferenced parameters
    UNREFERENCED_PARAMETER(hPrevInstance);
    UNREFERENCED_PARAMETER(lpCmdLine);
    UNREFERENCED_PARAMETER(nCmdShow);

    // Create a mutex to prevent running multiple simultaneous instances
    const HANDLE mutex = CreateMutex(NULL, FALSE, APPLICATION_NAME);
    if((NULL != mutex) && (ERROR_ALREADY_EXISTS != GetLastError()))
    {
        // Register the window class
        WNDCLASS wc = {0};
        wc.lpfnWndProc = WndProc;
        wc.hInstance = hInstance;
        wc.lpszClassName = APPLICATION_NAME;
        if(0 != RegisterClass(&wc))
        {
            // Create a message-only window to receive WM_INPUT and WM_TIMER
            const HWND hWnd = CreateWindow(APPLICATION_NAME, APPLICATION_NAME, 0, CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT, CW_USEDEFAULT, HWND_MESSAGE, NULL, hInstance, NULL);
            if (NULL != hWnd)
            {
                // Register for the mouse's raw input data
                RAWINPUTDEVICE rid = {0};
                rid.usUsagePage = 1;  // HID_DEVICE_SYSTEM_MOUSE
                rid.usUsage = 2;  // HID_DEVICE_SYSTEM_MOUSE
                rid.dwFlags = RIDEV_INPUTSINK;
                rid.hwndTarget = hWnd;
                if(RegisterRawInputDevices(&rid, 1, sizeof(rid)))
                {
                    // Pump Windows messages
                    MSG msg = {0};
                    while (GetMessage(&msg, NULL, 0, 0))
                    {
                        TranslateMessage(&msg);
                        DispatchMessage(&msg);
                    }
                    // Return success
                    return static_cast<int>(msg.wParam);
                }
            }
        }
        // Failed to initialize, output a diagnostic message (which is not more
        // friendly because it represents a scenario that should never occur)
        TCHAR szMessage[64];
        if(SUCCEEDED(StringCchPrintf(szMessage, sizeof(szMessage)/sizeof(szMessage[0]), TEXT("Initialization failure. GetLastError=%d\r\n"), GetLastError())))
        {
            (void)MessageBox(NULL, szMessage, APPLICATION_NAME, MB_OK | MB_ICONERROR);
        }
    }
    // Return failure
    return 0;
    // By contract, Windows frees all resources as part of process exit
}

Tags: Technical Utilities

Blogging code samples a tad more easily [Updated free ConvertClipboardRtfToHtmlText tool and source code!]

Thursday, April 3, 2008

Kind readers gave some great feedback on my previous post of the ConvertClipboardRtfToHtmlText tool and source code. Accordingly, I have made three small tweaks to the tool:

The code to detect the start of the RTF text worked only if Visual Studio's font size was set to 8pt. That's what I use, but it's not the default, so this would cause problems for most people who tried the tool. The relevant code no longer looks for a specific font-size.
Tab characters in the RTF text were ignored, causing layout problems for code with tabs (vs. spaces). Tabs are now auto-expanded to the mostly-standard value of 4 spaces.
The use of the private Color class was unnecessary because it added nothing over System.Drawing.Color. System.Drawing.Color is now used to save a few lines of code.

The sample code and tool in the previous post have been updated with these changes, so please go there to get the latest version.

Tags: Technical Utilities

Blogging code samples should be easy [Free ConvertClipboardRtfToHtmlText tool and source code!]

Friday, March 14, 2008

I've been including a lot of source code examples in my blog lately and needed a good way to paste properly formatted code in blog posts. I write my posts in HTML and wanted a tool that was easy to use, simple to install, produced concise HTML, and worked well with Visual Studio 2008. I was aware of a handful of tools for this purpose but none of them were quite what I was looking for, so I wrote my own one evening. :)

Caveat: ConvertClipboardRtfToHtmlText is a simple utility written for my specific scenario. I'm releasing the tool and code here in case anyone wants to use it, enhance it, or whatever. I have NOT attempted to write a solid, general-purpose RTF-to-HTML converter. Instead, ConvertClipboardRtfToHtmlText assumes its input is in the exact format used by Visual Studio 2008 (and probably VS 2005).

Using ConvertClipboardRtfToHtmlText is simple:

Copy code (C#, XAML, etc.) to clipboard from Visual Studio 2008
Run ConvertClipboardRtfToHtmlText to convert the RTF representation of the code on the clipboard to HTML as text (there's no UI because the tool does the conversion and immediately exits)
Paste the HTML code on the clipboard into your blog post, web page, etc.

I'm the first to acknowledge there's a lot of room for improvement in step #2. :) While the current implementation is good enough for my purposes, I encourage interested parties to consider turning this into a real application - maybe with a notify icon and a system-wide hotkey. If you do so, please let me know because I'd love to check it out!

The compiled tool and code are available in a ZIP file attached to this post. The complete source code is also available below. Naturally, I used ConvertClipboardRtfToHtmlText to post its own source code. :)

Enjoy!

Updated 2008-04-02: Minor changes to code and tool

using System;
using System.Collections.Generic;
using System.Drawing;
using System.Text;
using System.Windows.Forms;

// Convert Visual Studio 2008 RTF clipboard format into HTML by replacing the
// clipboard contents with its HTML representation in text format suitable for
// pasting into a web page or blog.
// USE: Copy to clipboard in VS, run this app (no UI), paste converted text
// NOTE: This is NOT a general-purpose RTF-to-HTML converter! It works well
// enough on the simple input I've tried, but may break for other input.
// TODO: Convert into a real application with a notify icon and hotkey.
namespace ConvertClipboardRtfToHtmlText
{
    static class ConvertClipboardRtfToHtmlText
    {
        private const string colorTbl = "\\colortbl;\r\n";
        private const string colorFieldTag = "cf";
        private const string tabExpansion = "    ";

        [STAThread]
        static void Main()
        {
            if (Clipboard.ContainsText(TextDataFormat.Rtf))
            {
                // Create color table, populate with default color
                List<Color> colors = new List<Color>();
                Color defaultColor = Color.FromArgb(0, 0, 0);
                colors.Add(defaultColor);

                // Get RTF
                string rtf = Clipboard.GetText(TextDataFormat.Rtf);

                // Parse color table
                int i = rtf.IndexOf(colorTbl);
                if (-1 != i)
                {
                    i += colorTbl.Length;
                    while ((i < rtf.Length) && ('}' != rtf[i]))
                    {
                        // Add color to color table
                        SkipExpectedText(rtf, ref i, "\\red");
                        byte red = (byte)ParseNumericField(rtf, ref i);
                        SkipExpectedText(rtf, ref i, "\\green");
                        byte green = (byte)ParseNumericField(rtf, ref i);
                        SkipExpectedText(rtf, ref i, "\\blue");
                        byte blue = (byte)ParseNumericField(rtf, ref i);
                        colors.Add(Color.FromArgb(red, green, blue));
                        SkipExpectedText(rtf, ref i, ";");
                    }
                }
                else
                {
                    throw new NotSupportedException("Missing/unknown colorTbl.");
                }

                // Find start of text and parse
                i = rtf.IndexOf("\\fs");
                if (-1 != i)
                {
                    // Skip font size tag
                    while ((i < rtf.Length) && (' ' != rtf[i]))
                    {
                        i++;
                    }
                    i++;

                    // Begin building HTML text
                    StringBuilder sb = new StringBuilder();
                    sb.AppendFormat("<pre><span style='color:#{0:x2}{1:x2}{2:x2}'>",
                        defaultColor.R, defaultColor.G, defaultColor.B);
                    while (i < rtf.Length)
                    {
                        if ('\\' == rtf[i])
                        {
                            // Parse escape code
                            i++;
                            if ((i < rtf.Length) &&
                                (('{' == rtf[i]) || ('}' == rtf[i]) || ('\\' == rtf[i])))
                            {
                                // Escaped '{' or '}' or '\'
                                sb.Append(rtf[i]);
                            }
                            else
                            {
                                // Parse tag
                                int tagEnd = rtf.IndexOf(' ', i);
                                if (-1 != tagEnd)
                                {
                                    if (rtf.Substring(i, tagEnd - i).StartsWith(colorFieldTag))
                                    {
                                        // Parse color field tag
                                        i += colorFieldTag.Length;
                                        int colorIndex = ParseNumericField(rtf, ref i);
                                        if ((colorIndex < 0) || (colors.Count <= colorIndex))
                                        {
                                            throw new NotSupportedException("Bad color index.");
                                        }

                                        // Change to new color
                                        sb.AppendFormat(
                                            "</span><span style='color:#{0:x2}{1:x2}{2:x2}'>",
                                            colors[colorIndex].R, colors[colorIndex].G,
                                            colors[colorIndex].B);
                                    }
                                    else if("tab" == rtf.Substring(i, tagEnd-i))
                                    {
                                        sb.Append(tabExpansion);
                                    }

                                    // Skip tag
                                    i = tagEnd;
                                }
                                else
                                {
                                    throw new NotSupportedException("Malformed tag.");
                                }
                            }
                        }
                        else if ('}' == rtf[i])
                        {
                            // Terminal curly; done
                            break;
                        }
                        else
                        {
                            // Normal character; HTML-escape '<', '>', and '&'
                            switch (rtf[i])
                            {
                                case '<':
                                    sb.Append("&lt;");
                                    break;
                                case '>':
                                    sb.Append("&gt;");
                                    break;
                                case '&':
                                    sb.Append("&amp;");
                                    break;
                                default:
                                    sb.Append(rtf[i]);
                                    break;
                            }
                        }
                        i++;
                    }

                    // Finish building HTML text
                    sb.Append("</span></pre>");

                    // Update the clipboard text
                    Clipboard.SetText(sb.ToString());
                }
                else
                {
                    throw new NotSupportedException("Missing text section.");
                }
            }
        }

        // Skip the specified text
        private static void SkipExpectedText(string s, ref int i, string text)
        {
            foreach (char c in text)
            {
                if ((s.Length <= i) || (c != s[i]))
                {
                    throw new NotSupportedException("Expected text missing.");
                }
                i++;
            }
        }

        // Parse a numeric field
        private static int ParseNumericField(string s, ref int i)
        {
            int value = 0;
            while ((i < s.Length) && char.IsDigit(s[i]))
            {
                value *= 10;
                value += s[i] - '0';
                i++;
            }
            return value;
        }
    }
}

[ConvertClipboardRtfToHtmlText.zip]

Tags: Technical Utilities