The blog of dlaa.me

My new home page, expanded [Updated collection of great Silverlight Charting resources!]

Friday, February 6, 2009

It's been a couple of months since I shared my semi-comprehensive page of Charting resources on the web. During that time, the Silverlight Toolkit's December release came out with some great new Charting features (Woot!) and there have been a number of fantastic Charting posts that I'd like people to be aware of. So I've updated my previous list of links (FYI: old links are grayed-out) with all the new content that caught my eye:

Overviews (100 level)

Silverlight Toolkit Released – More controls! - Tim Heuer's during the PDC keynote overview set the stage for good Charting content.
Silverlight Toolkit (Silverlight 2 Control Pack) - Charting - Pete Brown followed up minutes later with another good overview. (Party trivia: The styles seen in his blog offer a rare glimpse of the pre-release Charting styles.)
Silverlight 2 - Introducing the Chart Control - Chad Campbell was also ready with a good "zero-day" overview and code samples.
The Weekly Source Code 36 - PDC, BabySmash and Silverlight Charting - Scott Hanselman's post includes a BabySmash tie-in and a smidge of flattery. :)
Graphing – Silverlight Toolkit - Jesse Liberty introduces Charting and covers some of the basics.
Bubble chart - Jesse Liberty introduces the BubbleSeries class and describes some interesting use-cases.

Scenarios (200 level)

Zooming into a chart - Ruurd Boeke did the "zooming" scenario sample for the live Charting samples page - here's how he did it.
Silverlight 2 - Getting Started with the Chart Control - Chad Campbell again - a thorough walkthrough of creating your first chart.
Silverlight 2 Custom Stock Charts With Silverlight Toolkit - Peter Bromberg with his own walkthrough - creating a stock chart with plenty of code.
Styling the Charts in the Silverlight Toolkit - Pete Brown again - this time with a good styling overview for designers.
Designer’s Guide to Styling Silverlight Toolkit Charting Controls - Mehdi Slaoui Andaloussi goes over the top with a "soup to nuts" designer-oriented guide to styling everything. It probably helps that Mehdi works on my team and authored the shipping styles for Charting as well. :)
Pie Chart Styling in the Silverlight Toolkit - Cross-Slice Gradients - Pete Brown takes a proof-of-concept for holistic pie chart styling and uses it to great effect.
Bubbles – 1 Chart – three Axes - Jesse Liberty takes advantage of BubbleSeries to visualize historical data in an interesting way.
Customizing the Labels on an Axis of the Silverlight Toolkit Chart - Kevin Dockx calls out the AxisLabelStyle property which allows designers to easily customize the labels of an axis.
Dazzling Silverlight Toolkit Pie Charts with Overlays - Pete Brown continues working with pie charts and creates a further level of polish by adding some shiny overlays that make the visuals "pop".
Silverlight Charting: Creating rich data point tooltips - Jeff Wilcox shows how he customized the ToolTips of a LineSeries to show a wealth of information relevant information in a very user-friendly manner.
Pie Chart and Column Chart videos post - Jesse Liberty demonstrates the use of PieSeries and ColumnSeries in a set of "How Do I" videos.
Styling Charts with the Silverlight Toolkit - Pete Brown's comprehensive Expression Newsletter article provides some of the best information available anywhere about Charting styling!

Internals (300 level)

Building an Observable Model in Silverlight - Jafar Husain goes into deep technical detail about the observable model that Charting uses to enable its rich dynamic data support.

My own Charting posts (Ego level)

Many, many thanks to everyone who has spent time helping others learn how to use Silverlight Charting!

PS - If I've missed any good resources, please leave a comment with a link - I'm always happy to find more quality Charting content! :)

Tags: Silverlight Silverlight Toolkit

Columns of a different color [Customizing the appearance of Silverlight charts with re-templating and MVVM]

Wednesday, February 4, 2009

When we created Silverlight Charting (background reading here and here), we tried to make things as designer-friendly as possible. So friendly, in fact, that it would be possible for someone to take the default look-and-feel of what we'd released and significantly enhance it without changing the Charting framework at all. :) That said, it's worth noting that Charting controls are a little different than typical WPF/Silverlight controls: while it might make sense to completely change how a ListBox looks, there are certain aspects of a chart that can't be changed without rendering the visualization meaningless. And so there are certain assumptions behind our Charting implementation around things we didn't expect users to want to change. But that's the great thing about users: they want to change these things anyway! :)

One of the fundamentals of column/bar charts is that the columns/bars of a single series are all drawn the same; that's what ties them together and makes it clear they represent a single series. If you create a column chart in Excel, the default color for the columns is blue. It's easy to change that color to orange or green or plaid, but by default all of the columns of the series change together because they're all part of the same series. (Incidentally, it is possible to change the colors of an individual column in Excel, but it's not entirely obvious how to do so and it's clearly not a mainline scenario.) With that in mind, it's no surprise that our charts behave similarly: you can provide whatever look you want for the columns and bars (via the ColumnSeries.DataPointStyle property, perhaps), but the columns and bars of a particular series always look the same.

But what if your scenario is such that you want to do things a little differently and you want more control over the colors of individual columns and bars? Well, you take advantage of re-templating and Model-View-ViewModel (MVVM), that's what! :) You're reading this blog, so I'll assume you already know what re-templating is - if not, here's a good place to start. Model-View-ViewModel (MVVM) is probably less well known to date - it's an approach to application development commonly used with WPF and Silverlight where simple wrapper classes are used to expose aspects of the underlying data types in a manner that's easy for the UI layer to deal with. You can read lots more about MVVM on John Gossman's blog or this recent MSDN article by Josh Smith. But I'm not here to teach you what re-templating or MVVM are - I'm here to show you how to use them with Charting to implement the multi-colored column scenario!

[Click here to download the complete Silverlight 2 source code for the sample application shown/discussed below.]

Imagine that you're a teacher and you want to chart the grades of your students. You've already got a basic Student class that exposes some basic properties and you can create instances from a database or a file or something. The Student class probably looks like this:

// Standard data object representing a Student
public class Student : INotifyPropertyChanged
{
    // Student's name
    public string Name { get; private set; }

    // Student's favorite color
    public Brush FavoriteColor { get; private set; }

    // Student's grade
    public double Grade
    {
        get { return _grade; }
        set
        {
            _grade = value;
            Helpers.InvokePropertyChanged(PropertyChanged, this, "Grade");
        }
    }
    private double _grade;

    // Student constructor
    public Student(string name, Brush favoriteColor)
    {
        Name = name;
        FavoriteColor = favoriteColor;
    }

    // INotifyPropertyChanged event
    public event PropertyChangedEventHandler PropertyChanged;
}

The class above exposes a name and a favorite color (which I've implemented here as a Brush for convenience). There's also a grade, but we'll come back to that shortly... The goal is for each column representing a student to be drawn using that student's favorite color. To accomplish this, all we need to do is re-template. Using a designer tool like Blend or something simple like my SilverlightDefaultStyleBrowser, we can copy the default Style for ColumnDataPoint and paste it into our project's resources. By removing stuff that's not relevant to the demonstration and making a single change (highlighted below), we arrive at something like the following:

<Style
    x:Key="ColorByPreferenceColumn"
    TargetType="charting:ColumnDataPoint">
    <Setter Property="Background" Value="DarkGray"/>
    <Setter Property="Template">
        <Setter.Value>
            <ControlTemplate
                TargetType="charting:ColumnDataPoint">
                <Border
                    BorderBrush="{TemplateBinding BorderBrush}"
                    BorderThickness="{TemplateBinding BorderThickness}">
                    <Grid Background="{Binding FavoriteColor}">
                        <Rectangle>
                            <Rectangle.Fill>
                                <LinearGradientBrush>
                                    <GradientStop Color="#77ffffff" Offset="0"/>
                                    <GradientStop Color="#00ffffff" Offset="1"/>
                                </LinearGradientBrush>
                            </Rectangle.Fill>
                        </Rectangle>
                        <Border BorderBrush="#ccffffff" BorderThickness="1">
                            <Border BorderBrush="#77ffffff" BorderThickness="1"/>
                        </Border>
                    </Grid>
                </Border>
            </ControlTemplate>
        </Setter.Value>
    </Setter>
</Style>

This is just a tweak of the default template so that each column pulls its Background Brush from the FavoriteColor property of the underlying data object. Hook that up to a Chart/ColumnSeries in XAML, and that's all there is to it:

By the way, here's the XAML for that Chart:

<charting:Chart
    x:Name="FavoriteColorColumnChart"
    Title="Grades - By Favorite Color"
    Grid.Column="0">
    <charting:ColumnSeries
        DependentValueBinding="{Binding Grade}"
        IndependentValueBinding="{Binding Name}"
        DataPointStyle="{StaticResource ColorByPreferenceColumn}">
        <charting:ColumnSeries.DependentRangeAxis>
            <charting:LinearAxis
                Minimum="0"
                Maximum="100"
                Title="Grade"
                ShowGridLines="True"/>
        </charting:ColumnSeries.DependentRangeAxis>
    </charting:ColumnSeries>
</charting:Chart>

Aside: This process is even easier on WPF! (Assuming I had access to something like daily builds of Charting for WPF, I might have even mocked this up quickly to prove it to myself...) Unfortunately, the necessary "Binding in a Setter" capability is not supported by Silverlight 2 in XAML or code:
<Style
    x:Key="ColorByPreferenceColumn"
    TargetType="charting:ColumnDataPoint">
    <Setter Property="Background" Value="{Binding FavoriteColor}"/>
</Style>

So that's how easy it is to get custom column and bar colors if your data objects already expose the information you need!

But what if you want to base the custom colors on something that's not directly available on the data objects and you also don't have the freedom to change the data objects themselves? In other words - continuing the example above - let's say we decided to change things so the columns are colored according to each student's current grade: great grades get green columns, satisfactory grades get yellow columns, and unsatisfactory grades get red columns.

The first thing to consider when faced with a problem like this is whether an IValueConverter will work. I've written about the usefulness of IValueConverter before, so I won't spend more time on that here. IValueConverter is great if you want to take a single property and mutate it as part of a Binding. But what if you want to do something more complicated than that? Well, on WPF there's IMultiValueConverter which might do the trick, but that's not available on Silverlight and it's not always the answer anyway. So let's take advantage of MVVM to wrap our existing Student data objects with an object that's more view-friendly: StudentViewModel. Here's a trivial StudentViewModel class that exposes a Student and a Brush that's colored according to the Student's Grade property. Because Student implements INotifyPropertyChanged (like a well behaved class should), StudentViewModel can listen for changes to the Grade property and update its Brush automatically. StudentViewModel also implements INotifyPropertyChanged - so that anything referencing it will be notified about changes to the GradeColor property it exposes. Here's how it looks in code:

// Custom data object to wrap a Student object for the view model
public class StudentViewModel : INotifyPropertyChanged
{
    // Student object
    public Student Student { get; private set; }

    // Color representing Student's Grade
    public Brush GradeColor { get; private set; }

    // StudentViewModel constructor
    public StudentViewModel(Student student)
    {
        Student = student;
        student.PropertyChanged += new PropertyChangedEventHandler(HandleStudentPropertyChanged);
    }

    // Detect changes to the Student's grade and update GradeColor
    void HandleStudentPropertyChanged(object sender, PropertyChangedEventArgs e)
    {
        if ("Grade" == e.PropertyName)
        {
            if (Student.Grade < 50)
            {
                GradeColor = new SolidColorBrush { Color = Colors.Red };
            }
            else if (Student.Grade < 80)
            {
                GradeColor = new SolidColorBrush { Color = Colors.Yellow };
            }
            else
            {
                GradeColor = new SolidColorBrush { Color = Colors.Green };
            }
            Helpers.InvokePropertyChanged(PropertyChanged, this, "GradeColor");
        }
    }

    // INotifyPropertyChanged event
    public event PropertyChangedEventHandler PropertyChanged;
}

Aside: I've typically seen view model classes implemented by re-exposing each of the interesting data object properties - so for each property Foo on the data object, there will be a property Foo' on the view model object (which is either identical to the original property or some derivative of it). While I can see the value of this approach in some cases, the duplication of properties always bothers me and so I've instead exposed the entire Student object from the StudentViewModel object as a property (along with the new GradeColor property). This saves me from duplicating any existing properties, exposes the entire Student object to users of the StudentViewModel object, and is completely future-proof because any updates to the Student implementation will automatically show up for users of StudentViewModel.

Now that we've got a view model class that exposes a view-friendly property that is exactly what we need, our job is easy: change the chart to use StudentViewModels and change the custom template to reference the GradeColor property. Here's the new template (with the same kind of change as before):

<Style
    x:Key="ColorByGradeColumn"
    TargetType="charting:ColumnDataPoint">
    <Setter Property="Background" Value="DarkGray"/>
    <Setter Property="Template">
        <Setter.Value>
            <ControlTemplate
                TargetType="charting:ColumnDataPoint">
                <Border
                    BorderBrush="{TemplateBinding BorderBrush}"
                    BorderThickness="{TemplateBinding BorderThickness}">
                    <Grid Background="{Binding GradeColor}">
                        <Rectangle>
                            <Rectangle.Fill>
                                <LinearGradientBrush>
                                    <GradientStop Color="#77ffffff" Offset="0"/>
                                    <GradientStop Color="#00ffffff" Offset="1"/>
                                </LinearGradientBrush>
                            </Rectangle.Fill>
                        </Rectangle>
                        <Border BorderBrush="#ccffffff" BorderThickness="1">
                            <Border BorderBrush="#77ffffff" BorderThickness="1"/>
                        </Border>
                    </Grid>
                </Border>
            </ControlTemplate>
        </Setter.Value>
    </Setter>
</Style>

The XAML for this chart is nearly identical and the end result looks just how we wanted it to:

Having shown off how re-templating and MVVM enable more advanced Charting customization scenarios, I've accomplished what I set out to do and could have stopped here... But there was still one customer scenario I wanted to address: synchronizing the colors of pie slices in a pie chart with the colors of columns in a column chart. Given what we've just discussed, the solution is easy: just repeat the re-templating process for a second chart with a PieSeries and PieDataPoints. Because the column/slice colors come from the data objects and because both charts are sharing the same data objects, the color for every data object (student) will naturally be the same across both charts. The re-templated XAML is the same as before and the final result is exactly what we want:

Well, actually, that's not entirely true; Getting the pie slices right was trivial - but there was a bit of additional effort required to synchronize the colors of the pie chart's legend items with the pie slices...

The way things work is that the Series creates whatever LegendItems it needs. As part of that creation, it also creates a "fake" DataPoint that's styled just like the "real" ones displayed in the chart. This fake data point exists so that the LegendItem's default Template can create Bindings for things like the Background and BorderBrush properties. (Recall that users can completely change the look of a DataPoint, so the only way we have to know how something will look is to create it and see.) This approach works out pretty well, but there was an oversight that caused problems for me when I tried to provide my own PieSeries.LegendItemStyle: the DataContext of the fake PieDataPoint wasn't set to the corresponding slice's data object. Normally, that's no big deal because it's unused - however in this case it's a problem because the custom Template we created above gets its color from the data object. Without a bound data object to provide context, the legend items weren't using the right colors. :(

I thought about a few ways to work around this, but eventually decided the fix (the setting of the DataContext property for the fake PieDataPoint) belonged in the Charting code itself. Fortunately, Charting is open source, so it's easy for anybody to make such changes if/when the need arises! I've included a copy of the relevant source file with the one-line change I made (Changes\PieSeries.cs, line 317) and changed the sample project to use a custom build of Charting's Microsoft.Windows.Controls.DataVisualization.dll assembly that includes this change.

And because I'm a nice guy, I also made the same change to the actual charting source code that's under development, got it reviewed, and submitted it (along with an associated unit test) for inclusion in the next official release of the Silverlight Toolkit! After all, if I needed this to work for my sample, chances are good that someone else might need it to work for their application as well. :)

With that fix in place, here is the Style that applies the proper color to the LegendItems:

<Style
    x:Key="ColorByPreferenceLegendItem"
    TargetType="charting:LegendItem">
    <Setter Property="Template">
        <Setter.Value>
            <ControlTemplate TargetType="charting:LegendItem">
                <StackPanel Orientation="Horizontal">
                    <Rectangle
                        Width="8" Height="8"
                        Fill="{Binding DataContext.FavoriteColor}"
                        Stroke="{Binding BorderBrush}"
                        StrokeThickness="1" Margin="0,0,3,0"/>
                    <datavis:Title Content="{TemplateBinding Content}"/>
                </StackPanel>
            </ControlTemplate>
        </Setter.Value>
    </Setter>
</Style>

And there you have it: a few simple ways to take Charting and extend it to do exactly what you want! The examples here are fairly simple, but re-templating and MVVM are very powerful concepts which enable a high degree of customization for Silverlight and WPF applications that's pretty hard to come by in other platforms. If you're trying to do something unique and you're not having any luck the "normal" way, please take a few moments to consider the techniques discussed here - you may find that your problem has an easy solution after all!

Tags: Silverlight Silverlight Toolkit

The proverbial "one line fix" [ComputeFileHashes works around a troublesome Silverlight-on-Mac issue]

Sunday, February 1, 2009

When I achieved cross-platform parity by adding MD5 support to the Silverlight version of ComputeFileHashes, I thought I was done for a while. But then I got an email from a coworker reporting that the Silverlight version of ComputeFileHashes running on a Mac under Safari presented an "Add Files" dialog that did not actually let the user select any files. Ouch, that's no good...

I started investigating with a quick web search; the top hit for "OpenFileDialog Mac" showed that others had experienced similar problems and the Silverlight team confirmed a bug. So at least my application wasn't totally broken. :) I wanted to understand the scenario better, but I don't own a Mac (which is why this problem escaped my notice in the first place). Fortunately, I found one at work that I could borrow some cycles on and I wrote a simple test application to invoke the OpenFileDialog with a few different values for the Filter property. ComputeFileHashes was initially passing the value "All Files (*)|*" - effectively just "*" - which was intended to match all files. And, indeed, it does so in WPF and Silverlight/PC. However, on Silverlight/Mac that value seems to match no files. Someone suggested "*.*", but to me that matches all files with a '.' in their name and I didn't want to exclude files that don't happen to have an extension. So I tried "" instead, and that did exactly what I wanted on Silverlight/Mac and Silverlight/PC. I thought I'd found the solution - until I tried the new value on WPF and it caused an exception...

At this point I was tired of cross-platform trial-and-error, and I decided I was inviting trouble by passing any filter string at all! The default behavior of OpenFileDialog is to allow the selection of all files, so I wasn't really adding much value by passing a custom filter that did the same thing. Well, I was providing more explicit filter text in the drop-down of the dialog, but it wasn't worth the compatibility problems I was dealing with. So I removed the line of code that set the Filter property, recompiled, republished, and called it done. :)

The latest version of ComputeFileHashes is now 2009-01-30. I've updated all the binaries in order to avoid version number confusion, but the only real change here is the filter string and the improvement is only visible on Silverlight/Mac. (Note: I did not update the screenshots below, so the versions shown there are out of date.)

If you're using Silverlight to run ComputeFileHashes, you'll automatically get the new version next time you run ComputeFileHashes.
If you're using ClickOnce to run ComputeFileHashes, the application will automatically update itself after you run it a couple of times.
If you're using the WPF or command-line versions, you'll need to download the new binaries and update manually.

Please refer to the original release announcement for more information about supported platforms, source code, implementation, etc..

Click here or on the image below to run the Silverlight version of ComputeFileHashes in your browser.

Click here or on the image below to download the command-line and WPF versions of ComputeFileHashes - along with the ClickOnce and Silverlight versions AND the complete source code for everything!

Seamless cross-platform support is a tricky matter that's usually best left to others who have the time and resources to do it right. I didn't realize I was introducing a platform dependency by specifying a filter string, but I was... and I got burned by it. That's why it's important to test an application on all the supported configurations: you never know what problem might show up where you least expect it! That said, I'm probably not going to run out and buy myself a Mac just because of this incident - so please accept my apologies in advance should I fall victim to a similar problem in the future. :)

Tags: Technical Silverlight WPF Utilities

Thank goodness for reference implementations [Low-overhead .NET MD5 implementation (source code and tests) works great on Silverlight!]

Wednesday, January 28, 2009

In yesterday's post announcing ComputeFileHashes's new support for MD5 on Silverlight, I promised to share some details about my experience getting an MD5 HashAlgorithm implementation for Silverlight. Recall that an MD5 class is available in the desktop .NET Framework, but is not part of Silverlight 2's subset of the .NET Framework. (Probably in order to save space by excluding one of the less-secure cryptographic hash functions - a completely sensible tradeoff.) Because I didn't want to write my own code for MD5 (it's a non-trivial algorithm), the challenge was to find something freely available that I could just drop in and take advantage of. So I was very interested when I found out about Reid Borsuk's managed implementation of an MD5 HashAlgorithm for Silverlight because it sounded perfect for my needs.

The first step of incorporating something like this is to check the license: this code is under Ms-PL, so there were no problems there. The next step is to skim the code and get a general feel for how it works - and it was while doing this that I realized I wouldn't be able to use this code as-is...

To understand why, a little background is required:

The HashAlgorithm abstract class requires that derived classes implement the following methods: Initialize, HashCore, and HashFinal. Initialize gets called once at the start of hashing, then HashCore is called many times (being passed a different block of data each time), and then HashFinal is called once at the end of hashing to finalize any computations and return the computed hash value. It's a straightforward model and is flexible enough to accommodate a wide variety of checksum algorithms. Other than maintaining a few bytes of internal state across calls, there's no need for the hash algorithm to allocate anything: the data flows in from the user, gets processed, and is immediately forgotten about.

Or at least that's how it's supposed to work...

What I found when I looked at the aforementioned MD5 implementation was that it would allocate an internal buffer during Initialize, repeatedly re-allocate that buffer and append new data to the end of it during every call to HashCore, and then process the entire buffer all at once in HashFinal. While this approach works fine for fairly small inputs, it was completely impractical for ComputeFileHashes which is expected to process multi-gigabyte files as a matter of course. All those reallocations and the large internal buffer would quickly exhaust the physical memory of virtually any system in use today on something like the Windows 7 Beta ISO images I've been using for my examples. (In fact, it's a bit more dire than it initially seems: this technique requires twice the memory of the original data: that last call to HashCore needs to copy the nearly-full-sized buffer into a new full-sized buffer.)

Okay, so if I couldn't use the code as-is; the next step is to see what it would take to modify it so that it would work for my scenario. Well, this implementation uses a small HashAlgorithm wrapper class around a core MD5 implementation class, I wondered if it would be a simple matter of changing the way the wrapper called into the core. But looking at things a bit more closely, it seemed the core was not structured for that - and separating things like I wanted might not be trivial.

It's decision time: Do I start changing the structure of this code to work the way I need it to, or do I investigate other options? I considered the implications of both approaches, but it was something a coworker said that convinced me to spend a bit of time looking elsewhere. He asked, "If you don't like a fundamental part of the implementation and feel the need to fix it, why do you think you won't be compelled to make changes to the rest of it as well?" That question expressed my concerns pretty well, so I decided to look into other options for a bit. After all, I could always come back to this if nothing panned out.

The obvious place to start was the MD5 specification: RFC1321, The MD5 Message-Digest Algorithm. The body of this document describes the algorithm in great detail and would be a great place to start writing my own implementation if I was willing to spend a considerable amount of time developing and testing. But the real gem is in the appendix: a reference implementation of MD5 written in C! Fortunately, C's not so different from C# - and I've ported things before - so I had a decent idea what to expect. And it sure is hard to beat the reference implementation from the point of view of obtaining an accurate, (typically) bug-free, chunk of code. There is an accompanying license, but it's open (this is a public specification, after all) and primarily requires that derivative works identify themselves as being "derived from the RSA Data Security, Inc. MD5 Message-Digest Algorithm." (like I just did). So things seemed promising!

I decided to spend a bus ride porting the reference implementation and see how far I got. As it happens (and I'm sure this is no accident), the reference implementation uses exactly the Initialize/HashCore/HashFinal pattern that HashAlgorithm expects. Consequently, each of my own HashAlgorithm wrapper methods simply makes a single call into the ported reference implementation - and all of a sudden concerns about memory exhaustion are a thing of the past! By the end of the bus ride, I had successfully ported the reference implementation to C# and had it passing the seven test cases that are part of the specification.

My mind was pretty much made up at this point: I'd use my port of the MD5 reference implementation for the Silverlight version of ComputeFileHashes. This was code from a reliable source, code I had become familiar with, and code that I'd feel comfortable debugging or tuning if necessary. I beefed up the test cases a bit by exercising all of them for all the possible chunk sizes, addressed a couple of code analysis warnings, and had something ready in a jiffy. I added the MD5Managed class to the Silverlight build of ComputeFileHashes and - yep - it just worked. :)

So here's a(nother) completely managed MD5 implementation that anyone is free to use (subject to the reference implementation's license). I haven't spent time optimizing it - but that was kind of the point (see the class comments below for more). I'd started out trying to avoid writing my own MD5 implementation and I only partly succeeded - but I'm glad with how this worked out and maybe some of you can benefit from what I've done. Even if all you do is run the Silverlight version of ComputeFileHashes from time to time, I feel like my relatively minimal investment was worthwhile! :)

[Click here to download a Visual Studio solution containing the source code for a Silverlight-ready managed implementation of MD5 along with the simple test cases discussed above.]

using System;
using System.Diagnostics.CodeAnalysis;
using System.Security.Cryptography;

namespace Delay
{
    /// <summary>
    /// MD5Managed: A HashAlgorithm implementation that acts as a thin wrapper
    /// around a C# translation of the MD5 reference implementation. The C code
    /// has been translated as closely as possible so that most of the original
    /// structure remains and comparisons between the two are straightforward.
    /// </summary>
    /// <remarks>
    /// Derived from the RSA Data Security, Inc. MD5 Message-Digest Algorithm.
    /// 
    /// Specification:
    /// RFC1321 - The MD5 Message-Digest Algorithm
    /// http://www.faqs.org/rfcs/rfc1321.html
    /// 
    /// Original license:
    /// Copyright (C) 1991-2, RSA Data Security, Inc. Created 1991. All
    /// rights reserved.
    /// 
    /// License to copy and use this software is granted provided that it
    /// is identified as the "RSA Data Security, Inc. MD5 Message-Digest
    /// Algorithm" in all material mentioning or referencing this software
    /// or this function.
    /// 
    /// License is also granted to make and use derivative works provided
    /// that such works are identified as "derived from the RSA Data
    /// Security, Inc. MD5 Message-Digest Algorithm" in all material
    /// mentioning or referencing the derived work.
    /// 
    /// RSA Data Security, Inc. makes no representations concerning either
    /// the merchantability of this software or the suitability of this
    /// software for any particular purpose. It is provided "as is"
    /// without express or implied warranty of any kind.
    /// 
    /// These notices must be retained in any copies of any part of this
    /// documentation and/or software.
    /// </remarks>
    public class MD5Managed : HashAlgorithm
    {
        // Current context
        private readonly MD5_CTX _context = new MD5_CTX();
        // Last hash result
        private readonly byte[] _digest = new byte[16];
        // True if HashCore has been called
        private bool _hashCoreCalled;
        // True if HashFinal has been called
        private bool _hashFinalCalled;

        /// <summary>
        /// Initializes a new instance.
        /// </summary>
        public MD5Managed()
        {
            InitializeVariables();
        }

        /// <summary>
        /// Initializes internal state.
        /// </summary>
        public override void Initialize()
        {
            InitializeVariables();
        }

        /// <summary>
        /// Initializes variables.
        /// </summary>
        private void InitializeVariables()
        {
            MD5Init(_context);
            _hashCoreCalled = false;
            _hashFinalCalled = false;
        }

        /// <summary>
        /// Updates the hash code with the data provided.
        /// </summary>
        /// <param name="array">Data to hash.</param>
        /// <param name="ibStart">Start position.</param>
        /// <param name="cbSize">Number of bytes.</param>
        protected override void HashCore(byte[] array, int ibStart, int cbSize)
        {
            if (null == array)
            {
                throw new ArgumentNullException("array");
            }

            if (_hashFinalCalled)
            {
                throw new CryptographicException("Hash not valid for use in specified state.");
            }
            _hashCoreCalled = true;

            MD5Update(_context, array, (uint)ibStart, (uint)cbSize);
        }

        /// <summary>
        /// Finalizes the hash code and returns it.
        /// </summary>
        /// <returns></returns>
        protected override byte[] HashFinal()
        {
            _hashFinalCalled = true;
            MD5Final(_digest, _context);
            return Hash;
        }

        /// <summary>
        /// Returns the hash as an array of bytes.
        /// </summary>
        [SuppressMessage("Microsoft.Design", "CA1065:DoNotRaiseExceptionsInUnexpectedLocations", Justification = "Matching .NET behavior by throwing here.")]
        [SuppressMessage("Microsoft.Usage", "CA2201:DoNotRaiseReservedExceptionTypes", Justification = "Matching .NET behavior by throwing NullReferenceException.")]
        public override byte[] Hash
        {
            get
            {
                if (!_hashCoreCalled)
                {
                    throw new NullReferenceException();
                }
                if (!_hashFinalCalled)
                {
                    // Note: Not CryptographicUnexpectedOperationException because that can't be instantiated on Silverlight 4
                    throw new CryptographicException("Hash must be finalized before the hash value is retrieved.");
                }

                return _digest;
            }
        }

        // Return size of hash in bits.
        public override int HashSize
        {
            get
            {
                return _digest.Length * 8;
            }
        }

        ///////////////////////////////////////////////
        // MD5 reference implementation begins here. //
        ///////////////////////////////////////////////

        /* MD5 context. */
        private class MD5_CTX
        {
            public readonly uint[] state;   /* state (ABCD) */
            public readonly uint[] count;   /* number of bits, modulo 2^64 (lsb first) */
            public readonly byte[] buffer;  /* input buffer */

            public MD5_CTX()
            {
                state = new uint[4];
                count = new uint[2];
                buffer = new byte[64];
            }

            public void Clear()
            {
                Array.Clear(state, 0, state.Length);
                Array.Clear(count, 0, count.Length);
                Array.Clear(buffer, 0, buffer.Length);
            }
        }

        /* Constants for MD5Transform routine. */
        private const int S11 = 7;
        private const int S12 = 12;
        private const int S13 = 17;
        private const int S14 = 22;
        private const int S21 = 5;
        private const int S22 = 9;
        private const int S23 = 14;
        private const int S24 = 20;
        private const int S31 = 4;
        private const int S32 = 11;
        private const int S33 = 16;
        private const int S34 = 23;
        private const int S41 = 6;
        private const int S42 = 10;
        private const int S43 = 15;
        private const int S44 = 21;

        private static byte[] PADDING;

        [SuppressMessage("Microsoft.Performance", "CA1810:InitializeReferenceTypeStaticFieldsInline", Justification = "More compact this way")]
        static MD5Managed()
        {
            PADDING = new byte[64];
            PADDING[0] = 0x80;
        }

        /* F, G, H and I are basic MD5 functions. */
        private static uint F(uint x, uint y, uint z) { return (((x) & (y)) | ((~x) & (z))); }
        private static uint G(uint x, uint y, uint z) { return (((x) & (z)) | ((y) & (~z))); }
        private static uint H(uint x, uint y, uint z) { return ((x) ^ (y) ^ (z)); }
        private static uint I(uint x, uint y, uint z) { return ((y) ^ ((x) | (~z))); }

        /* ROTATE_LEFT rotates x left n bits. */
        private static uint ROTATE_LEFT(uint x, int n) { return (((x) << (n)) | ((x) >> (32 - (n)))); }

        /* FF, GG, HH, and II transformations for rounds 1, 2, 3, and 4.
           Rotation is separate from addition to prevent recomputation. */
        private static void FF(ref uint a, uint b, uint c, uint d, uint x, int s, uint ac)
        {
            (a) += F((b), (c), (d)) + (x) + (uint)(ac);
            (a) = ROTATE_LEFT((a), (s));
            (a) += (b);
        }
        private static void GG(ref uint a, uint b, uint c, uint d, uint x, int s, uint ac)
        {
            (a) += G((b), (c), (d)) + (x) + (uint)(ac);
            (a) = ROTATE_LEFT((a), (s));
            (a) += (b);
        }
        private static void HH(ref uint a, uint b, uint c, uint d, uint x, int s, uint ac)
        {
            (a) += H((b), (c), (d)) + (x) + (uint)(ac);
            (a) = ROTATE_LEFT((a), (s));
            (a) += (b);
        }
        private static void II(ref uint a, uint b, uint c, uint d, uint x, int s, uint ac)
        {
            (a) += I((b), (c), (d)) + (x) + (uint)(ac);
            (a) = ROTATE_LEFT((a), (s));
            (a) += (b);
        }

        /* MD5 initialization. Begins an MD5 operation, writing a new context. */
        private static void MD5Init(MD5_CTX context)  /* context */
        {
            context.count[0] = context.count[1] = 0;

            /* Load magic initialization constants. */
            context.state[0] = 0x67452301;
            context.state[1] = 0xefcdab89;
            context.state[2] = 0x98badcfe;
            context.state[3] = 0x10325476;
        }

        /* MD5 block update operation. Continues an MD5 message-digest
           operation, processing another message block, and updating the
           context. */
        private static void MD5Update(MD5_CTX context,  /* context */
                                      byte[] input,     /* input block */
                                      uint inputIndex,  // Starting index for input block
                                      uint inputLen)    /* length of input block */
        {
            /* Compute number of bytes mod 64 */
            uint index = (uint)((context.count[0] >> 3) & 0x3F);

            /* Update number of bits */
            if ((context.count[0] += ((uint)inputLen << 3)) < ((uint)inputLen << 3))
            {
                context.count[1]++;
            }
            context.count[1] += ((uint)inputLen >> 29);

            uint partLen = 64 - index;

            /* Transform as many times as possible. */
            uint i = 0;
            if (inputLen >= partLen)
            {
                Buffer.BlockCopy(input, (int)inputIndex, context.buffer, (int)index, (int)partLen);
                MD5Transform(context.state, context.buffer, 0);

                for (i = partLen; i + 63 < inputLen; i += 64)
                {
                    MD5Transform(context.state, input, inputIndex + i);
                }

                index = 0;
            }

            /* Buffer remaining input */
            Buffer.BlockCopy(input, (int)(inputIndex + i), context.buffer, (int)index, (int)(inputLen - i));
        }

        /* MD5 finalization. Ends an MD5 message-digest operation, writing the
           the message digest and zeroizing the context. */
        private static void MD5Final(byte[] digest,    /* message digest */
                                     MD5_CTX context)  /* context */
        {
            byte[] bits = new byte[8];

            /* Save number of bits */
            Encode(bits, context.count, 8);

            /* Pad out to 56 mod 64. */
            uint index = (uint)((context.count[0] >> 3) & 0x3f);
            uint padLen = (index < 56) ? (56 - index) : (120 - index);
            MD5Update(context, PADDING, 0, padLen);

            /* Append length (before padding) */
            MD5Update(context, bits, 0, 8);

            /* Store state in digest */
            Encode(digest, context.state, 16);

            /* Zeroize sensitive information. */
            context.Clear();
        }

        /* MD5 basic transformation. Transforms state based on block. */
        private static void MD5Transform(uint[] state,
                                         byte[] block,
                                         uint blockIndex)
        {
            uint a = state[0], b = state[1], c = state[2], d = state[3];
            uint[] x = new uint[16];

            Decode(x, block, blockIndex, 64);

            /* Round 1 */
            FF(ref a, b, c, d, x[0],  S11, 0xd76aa478); /* 1 */
            FF(ref d, a, b, c, x[1],  S12, 0xe8c7b756); /* 2 */
            FF(ref c, d, a, b, x[2],  S13, 0x242070db); /* 3 */
            FF(ref b, c, d, a, x[3],  S14, 0xc1bdceee); /* 4 */
            FF(ref a, b, c, d, x[4],  S11, 0xf57c0faf); /* 5 */
            FF(ref d, a, b, c, x[5],  S12, 0x4787c62a); /* 6 */
            FF(ref c, d, a, b, x[6],  S13, 0xa8304613); /* 7 */
            FF(ref b, c, d, a, x[7],  S14, 0xfd469501); /* 8 */
            FF(ref a, b, c, d, x[8],  S11, 0x698098d8); /* 9 */
            FF(ref d, a, b, c, x[9],  S12, 0x8b44f7af); /* 10 */
            FF(ref c, d, a, b, x[10], S13, 0xffff5bb1); /* 11 */
            FF(ref b, c, d, a, x[11], S14, 0x895cd7be); /* 12 */
            FF(ref a, b, c, d, x[12], S11, 0x6b901122); /* 13 */
            FF(ref d, a, b, c, x[13], S12, 0xfd987193); /* 14 */
            FF(ref c, d, a, b, x[14], S13, 0xa679438e); /* 15 */
            FF(ref b, c, d, a, x[15], S14, 0x49b40821); /* 16 */

            /* Round 2 */
            GG(ref a, b, c, d, x[1],  S21, 0xf61e2562); /* 17 */
            GG(ref d, a, b, c, x[6],  S22, 0xc040b340); /* 18 */
            GG(ref c, d, a, b, x[11], S23, 0x265e5a51); /* 19 */
            GG(ref b, c, d, a, x[0],  S24, 0xe9b6c7aa); /* 20 */
            GG(ref a, b, c, d, x[5],  S21, 0xd62f105d); /* 21 */
            GG(ref d, a, b, c, x[10], S22, 0x02441453); /* 22 */
            GG(ref c, d, a, b, x[15], S23, 0xd8a1e681); /* 23 */
            GG(ref b, c, d, a, x[4],  S24, 0xe7d3fbc8); /* 24 */
            GG(ref a, b, c, d, x[9],  S21, 0x21e1cde6); /* 25 */
            GG(ref d, a, b, c, x[14], S22, 0xc33707d6); /* 26 */
            GG(ref c, d, a, b, x[3],  S23, 0xf4d50d87); /* 27 */
            GG(ref b, c, d, a, x[8],  S24, 0x455a14ed); /* 28 */
            GG(ref a, b, c, d, x[13], S21, 0xa9e3e905); /* 29 */
            GG(ref d, a, b, c, x[2],  S22, 0xfcefa3f8); /* 30 */
            GG(ref c, d, a, b, x[7],  S23, 0x676f02d9); /* 31 */
            GG(ref b, c, d, a, x[12], S24, 0x8d2a4c8a); /* 32 */

            /* Round 3 */
            HH(ref a, b, c, d, x[5],  S31, 0xfffa3942); /* 33 */
            HH(ref d, a, b, c, x[8],  S32, 0x8771f681); /* 34 */
            HH(ref c, d, a, b, x[11], S33, 0x6d9d6122); /* 35 */
            HH(ref b, c, d, a, x[14], S34, 0xfde5380c); /* 36 */
            HH(ref a, b, c, d, x[1],  S31, 0xa4beea44); /* 37 */
            HH(ref d, a, b, c, x[4],  S32, 0x4bdecfa9); /* 38 */
            HH(ref c, d, a, b, x[7],  S33, 0xf6bb4b60); /* 39 */
            HH(ref b, c, d, a, x[10], S34, 0xbebfbc70); /* 40 */
            HH(ref a, b, c, d, x[13], S31, 0x289b7ec6); /* 41 */
            HH(ref d, a, b, c, x[0],  S32, 0xeaa127fa); /* 42 */
            HH(ref c, d, a, b, x[3],  S33, 0xd4ef3085); /* 43 */
            HH(ref b, c, d, a, x[6],  S34, 0x04881d05); /* 44 */
            HH(ref a, b, c, d, x[9],  S31, 0xd9d4d039); /* 45 */
            HH(ref d, a, b, c, x[12], S32, 0xe6db99e5); /* 46 */
            HH(ref c, d, a, b, x[15], S33, 0x1fa27cf8); /* 47 */
            HH(ref b, c, d, a, x[2],  S34, 0xc4ac5665); /* 48 */

            /* Round 4 */
            II(ref a, b, c, d, x[0],  S41, 0xf4292244); /* 49 */
            II(ref d, a, b, c, x[7],  S42, 0x432aff97); /* 50 */
            II(ref c, d, a, b, x[14], S43, 0xab9423a7); /* 51 */
            II(ref b, c, d, a, x[5],  S44, 0xfc93a039); /* 52 */
            II(ref a, b, c, d, x[12], S41, 0x655b59c3); /* 53 */
            II(ref d, a, b, c, x[3],  S42, 0x8f0ccc92); /* 54 */
            II(ref c, d, a, b, x[10], S43, 0xffeff47d); /* 55 */
            II(ref b, c, d, a, x[1],  S44, 0x85845dd1); /* 56 */
            II(ref a, b, c, d, x[8],  S41, 0x6fa87e4f); /* 57 */
            II(ref d, a, b, c, x[15], S42, 0xfe2ce6e0); /* 58 */
            II(ref c, d, a, b, x[6],  S43, 0xa3014314); /* 59 */
            II(ref b, c, d, a, x[13], S44, 0x4e0811a1); /* 60 */
            II(ref a, b, c, d, x[4],  S41, 0xf7537e82); /* 61 */
            II(ref d, a, b, c, x[11], S42, 0xbd3af235); /* 62 */
            II(ref c, d, a, b, x[2],  S43, 0x2ad7d2bb); /* 63 */
            II(ref b, c, d, a, x[9],  S44, 0xeb86d391); /* 64 */

            state[0] += a;
            state[1] += b;
            state[2] += c;
            state[3] += d;

            /* Zeroize sensitive information. */
            Array.Clear(x, 0, x.Length);
        }

        /* Encodes input (UINT4) into output (unsigned char). Assumes len is
           a multiple of 4. */
        private static void Encode(byte[] output,
                                   uint[] input,
                                   uint len)
        {
            for (uint i = 0, j = 0; j < len; i++, j += 4)
            {
                output[j] = (byte)(input[i] & 0xff);
                output[j + 1] = (byte)((input[i] >> 8) & 0xff);
                output[j + 2] = (byte)((input[i] >> 16) & 0xff);
                output[j + 3] = (byte)((input[i] >> 24) & 0xff);
            }
        }

        /* Decodes input (unsigned char) into output (UINT4). Assumes len is
           a multiple of 4. */
        private static void Decode(uint[] output,
                                   byte[] input,
                                   uint inputIndex,
                                   uint len)
        {
            for (uint i = 0, j = 0; j < len; i++, j += 4)
            {
                output[i] = ((uint)input[inputIndex + j]) |
                    (((uint)input[inputIndex + j + 1]) << 8) |
                    (((uint)input[inputIndex + j + 2]) << 16) |
                    (((uint)input[inputIndex + j + 3]) << 24);
            }
        }
    }
}

Updated 2009-02-16: Call MD5Init from the constructor for consistency with the Framework's HashAlgorithm classes where a call to Initialize is not necessary for a newly constructed instance.

Updated 2010-12-06: Added missing inputIndex offset to MD5Update method.

Tags: Technical Silverlight

Cross-platform feature parity: achieved [Silverlight version of ComputeFileHashes now includes MD5!]

Tuesday, January 27, 2009

I was very happy with last week's release of ComputeFileHashes supporting the command-line, WPF, Silverlight, *and* ClickOnce. Only one thing bothered me: the Silverlight version didn't do MD5 due to the lack of support for that type of checksum by Silverlight 2. Recall that I'd fairly happily implemented my own CRC-32 class because none of the platforms supported it. [Also, it was relatively simple and had a good reference implementation. :) ] But because MD5 is a more complex algorithm and was only missing on Silverlight, I was reluctant to do the same thing for MD5...

What I really wanted was a freely available, Silverlight compatible HashAlgorithm-based MD5 implementation that I could trivially drop into my code and use on Silverlight. So I was excited when kind reader (and teammate!) Jeff Wilcox left a comment pointing to something that sounded perfect for my needs. I told Jeff I'd add MD5 for Silverlight and mentally breathed a sigh of relief that all four of ComputeFileHashes's supported platforms would provide the same set of checksums.

As it turns out, after a bit of research I decided not to use that MD5 implementation. (I'll explain why in my next post.) However, now that I'd fully bought in to the idea of MD5 on Silverlight, I was reluctant to let it go... So I spent some time working on an alternate solution and developed something I'm quite happy with. So I'm able to release an update to ComputeFileHashes that offers MD5 support on Silverlight!

The latest version of ComputeFileHashes is now 2009-01-26. I've updated all the binaries in order to avoid version number confusion - but the only real change here is the addition of MD5 for Silverlight. (FYI, I only updated the screenshot of the Silverlight version below.)

If you're using Silverlight to run ComputeFileHashes, you'll automatically get the new version next time you run ComputeFileHashes.
If you're using ClickOnce to run ComputeFileHashes, the application will automatically update itself after you run it a couple of times.
If you're using the WPF or command-line versions, you'll need to download the new binaries and update manually.

Please refer to the original release announcement for more information about supported platforms, source code, implementation, etc..

Click here or on the image below to run the Silverlight version of ComputeFileHashes in your browser.

I've said that "ComputeFileHashes is a simple tool intended to make verifying checksums easy for anyone.". And in some ways, I think the Silverlight version is the easiest option of all because there's no need to install it on your machine and it runs everywhere Silverlight 2 does (PC, Mac, (Linux soon!), Internet Explorer, Firefox, Safari, ...). So I'm really glad to add MD5 support to ComputeFileHashes for Silverlight - I hope you enjoy the new functionality!

Tags: Technical Silverlight WPF Utilities

Math is hard, let's go shopping. [Minor bug fix for free CRC-32 HashAlgorithm implementation for .NET]

Friday, January 23, 2009

While working on code for an upcoming blog post, I found myself dealing with the HashAlgorithm.HashSize property again and realized I'd made a silly mistake a few days ago... :(

I'm pretty sure I remember consulting the documentation when implementing this method for my free CRC-32 HashAlgorithm implementation, and I obviously believed the correct behavior was to return the size in bytes because that's what my comment says and that's what my code does. However, the documentation seems pretty clear on the matter: Gets the size, in bits, of the computed hash code." (emphasis mine). So my initial implementation of this property was wrong. Fortunately, none of the four implementations of ComputeFileHashes (command-line, WPF, ClickOnce, Silverlight) make use of HashSize, so they're not affected by this bug. Unfortunately, anyone who decided to use my CRC-32 implementation and referenced this property would see the wrong value. For their sake, I've just made the trivial fix to the code (multiplying the byte count by 8 to get bit count), updated the comment for that property, added a note about the update to the bottom of the original post, and republished it.

I'm very sorry for the error and any trouble this may have caused.

PS - The version of CRC32.cs in the ComputeFileHashes source code download has not been updated - but will be in a few days as part of an upcoming post.

Tags: Technical

Gratuitous platform support [ComputeFileHashes works on the command-line, on WPF, on Silverlight, and via ClickOnce!]

Wednesday, January 21, 2009

Last week, I released the ComputeFileHashes tool for calculating file checksums. (To read more about what checksums are and why they're useful, please refer to that post.) ComputeFileHashes is a fairly simple .NET command-line application for calculating the MD5, SHA-1, and CRC-32 hashes of one or more files. It takes advantage of the multi-processing capabilities of today's hardware to complete that task quickly - roughly on par with native-code implementations. ComputeFileHashes works quite well and I happily used it to verify the recently released Windows 7 Beta ISO images I'd downloaded.

Because not everybody is a fan of command-line tools, I thought it would be nice to use WPF to create a more user-friendly version of ComputeFileHashes. Once I'd done that, I knew it would be a trivial matter to publish the WPF version via ClickOnce to enable an absurdly easy install scenario. From there, porting to Silverlight would be straightforward and would offer an install-free, completely web-based solution with cross-platform (ex: PC/Mac), cross-browser (ex: IE/Firefox/Safari) appeal. What's more, because all of these platforms are built on .NET, so I expected to be able to take significant advantage of code sharing!

Click here or on the image below to run the Silverlight version of ComputeFileHashes in your browser.

Implementation notes:

The command-line version of ComputeFileHashes is a standard .NET 2.0 application and should work pretty much everywhere. The Silverlight version requires Silverlight 2 which is tiny and can be completely installed in less than a minute start-to-finish. The WPF/ClickOnce versions are a little more advanced and require .NET 3.5 SP1 (conveniently pre-installed on all Windows 7 machines!). If you don't already have .NET 3.5 SP1 (and you may not because Windows Update still doesn't seem to offer it), you can get the .NET 3.5 SP1 installer from here. Unfortunately, the only indication of .NET 3.5 SP1 not being installed seems to be an application crash immediately after starting the stand-alone WPF version. :( Fortunately, the ClickOnce version knows about the .NET 3.5 SP1 prerequisite and should offer to install it automatically if it's not already present.
None of the computation or file processing is performed on the main user interface thread under WPF or Silverlight, so ComputeFileHashes remains responsive even when working on a large file. Additional files can be queued for processing or the application/browser can be closed without the user having to wait.
As I hoped, I was able to achieve a very high degree of code sharing. By refactoring the original ComputeFileHashes code slightly, I pulled the core implementation out into a common class/file that everything shares. Then I put nearly all of the user interface functionality into another class/file that the WPF and Silverlight implementations both share. The XAML for the WPF and Silverlight versions is separate, but very similar. (There are enough slight differences between the two versions that I deliberately did not attempt to share the same XAML file.)

The source code structure looks like this:

`ComputeFileHashesCore.cs`	Core implementation of the file hashing code shared by all implementations. Makes use of multiple threads to perform hash calculations in parallel.
`ComputeFileHashesUI.cs`	User interface code shared by the WPF and Silverlight implementations. Makes use of a worker thread to push all computation off of the user interface thread and keep the application responsive. Defers to ComputeFileHashesCore for hashing functionality.
`CRC32.cs WaitingRoom.cs`	Custom CRC-32 HashAlgorithm implementation and synchronization object shared by all implementations.
`HashFileInfo.cs BlockingQueue.cs`	Data object for tracking state and custom Queue subclass that are shared by the WPF and Silverlight implementations.
`ComputeFileHashesCL\` `ComputeFileHashesCL.cs`	Command-line interface for handling arguments and displaying progress. Defers to ComputeFileHashesCore for hashing functionality.
`ComputeFileHashesWPF\` `Window1.xaml` `Window1.xaml.cs`	WPF definition of the application window. Defers to ComputeFileHashesUI for nearly all functionality.
`ComputeFileHashesSL\` `Page.xaml` `Page.xaml.cs`	Silverlight definition of the application window. Defers to ComputeFileHashesUI for nearly all functionality.

A look at the screenshots above reveals a few differences between the WPF and Silverlight implementations:
- The DataGrids look different. On WPF, ComputeFileHashes makes use of the WPFToolkit's DataGrid; on Silverlight it uses the DataGrid in the SDK. The two are very similar to use from a developer perspective, but they draw themselves differently and have some slightly user-level functionality changes due to platform differences. This was actually my first experience with either DataGrid and I was happy to find that they both worked the same - and pretty much the way I expected them to!
- The Silverlight version does not calculate the MD5 hash. This is because Silverlight's .NET doesn't implement the MD5 HashAlgorithm subclass while the desktop's .NET does. MD5 is not trivial to write, so I wasn't too interested in developing my own implementation like I did for CRC-32 (which isn't supported on any .NET platform).
- The Silverlight version includes a "Details" column that's not present on the WPF version. On WPF, it's trivial to create a ToolTip for a DataGrid column, but on Silverlight the ToolTipService class must be used and my attempts to set its attached property with a Style were met with ... resistance. So if an exception is thrown when processing a file, the WPF version will show the exception text in a ToolTip while the Silverlight version shows it in the Details column.
- The Silverlight version does not support drag-and-drop from the Windows Explorer. Running within a browser imposes certain limitations on Silverlight; an inability to integrate quite as richly with the operating system is one of them.
- The hyperlinks aren't quite the same. Silverlight ships with HyperlinkButton which is exactly the right control for the job here. WPF doesn't have that control, so the similar Hyperlink control is made to behave as desired with a small bit of code.

In the original release announcement, I wrote that "ComputeFileHashes is a simple tool intended to make verifying checksums easy for anyone.". Well, that's still the case - and adding support for WPF, ClickOnce, and Silverlight should make it even easier for everyone to use. Just decide what kind of user interface you prefer, and start using that version of ComputeFileHashes for all your checksumming needs! :)

Tags: Technical Silverlight WPF Utilities

The doctor will see you now... [WaitingRoom is a reusable synchronization object for .NET]

Thursday, January 15, 2009

In my notes for the release of the ComputeFileHashes tool (and source code), I mentioned that I'd written a synchronization object for managing the parallel computation of checksums across multiple threads. I called this class WaitingRoom (after failing to come up with anything better) and thought I'd write about the pattern it implements so others might use it as well.

To understand the motivation behind the WaitingRoom class, it's helpful to understand a bit about how ComputeFileHashes works. For those who haven't read the implementation notes, the basic goal is to enable parallel computation of multiple checksum algorithms for sequential chunks of a file. So there's a primary thread which is responsible for opening the file and making consecutive blocks of it available for processing and there are one or more worker threads which are responsible for processing the data in each block that becomes available. For performance reasons, there are two buffers for these blocks and they're swapped repeatedly: one is the "current" block (which has valid data) and the other is the "next" block (which is being filled-in). It's important to ensure the worker threads only access a block when it's valid, so the primary thread needs a way to tell the worker threads when it's safe to start as well as a way to find out when they've all finished with a block.

The WaitingRoom class makes this synchronization process easy by exposing three methods: Wait, Release, and Arrive. Both Wait and Release are used by the primary thread: it calls Wait to block until all worker threads are "in the waiting room" (during which time they're all blocked) and then calls Release to "open the waiting room door" and unblock the worker threads. [Okay, the analogy breaks down a bit here because most doctors admit only *one* patient at a time. :) ] The worker threads call Arrive to signal they've "entered the waiting room" and automatically block until the primary thread releases them. What's great about using WaitingRoom is that the order of the threads doesn't matter: the primary thread can be ready first, or the worker threads can be ready first, or the primary thread can be ready after some - but not all - of the worker threads are ready. Whatever the order, WaitingRoom coordinates things smoothly!

As it happens, ComputeFileHashes uses two WaitingRoom instances. The first instance is used to wait for the worker threads to be ready to process a new file. When this is going on, the primary thread has nothing else to do, so it calls Wait and then immediately calls Release when the wait completes. The second instance is used to wait for the worker threads to finish processing a block of data. In this case, the primary thread has some work it can do in parallel with the workers: it reads the next block of data from disk and updates the status display. So it makes a call to Release (which gets the worker threads going), does its own work while they're busy, then calls Wait to wait for them to finish. After that, it's safe to swap the "current" and "next" buffers and repeat the same steps with the next block.

The complete implementation of WaitingRoom is below. It uses only the standard .NET Monitor class - along with the C# lock statement to simplify the syntax a bit. There are a few Debug.Asserts in there to try to keep callers honest, but it's up to the developer to ensure that Wait and Release are only called by the primary thread and that Arrive is only called by the worker threads.

Here's what it looks like:

using System.Diagnostics;
using System.Threading;

namespace ComputeFileHashes
{
    /// <summary>
    /// Implements a synchronization object that allows an owner
    /// thread to synchronize with its worker threads.
    /// </summary>
    class WaitingRoom
    {
        // Number of worker threads
        private readonly int _capacity;
        // Object on which to lock for entry
        private readonly object _entryLock = new object();
        // Object on which to lock for exit
        private readonly object _exitLock = new object();
        // Current count of worker threads
        private int _count;
        // "Sign" of owner/worker threads
        private bool _sign;

        /// <summary>
        /// Initializes a new instance.
        /// </summary>
        /// <param name="capacity">Number of worker threads.</param>
        public WaitingRoom(int capacity)
        {
            Debug.Assert(0 < capacity);
            _capacity = capacity;
        }

        /// <summary>
        /// Waits for all worker threads to call the Arrive method.
        /// </summary>
        public void Wait()
        {
            // Claim entry lock
            lock (_entryLock)
            {
                Debug.Assert((0 <= _count) && (_count <= _capacity));

                // Block if all worker threads have not arrived
                if (_count < _capacity)
                {
                    Monitor.Wait(_entryLock);
                }
            }
        }

        /// <summary>
        /// Signals the presence/availability of a worker thread.
        /// </summary>
        public void Arrive()
        {
            // Claim the entry lock
            bool sign;
            lock (_entryLock)
            {
                Debug.Assert((0 <= _count) && (_count < _capacity));

                // Capture sign
                sign = _sign;

                // Wake owner thread if all worker threads present
                _count++;
                if (_count == _capacity)
                {
                    Monitor.Pulse(_entryLock);
                }
            }

            // Claim the exit lock
            lock (_exitLock)
            {
                // Block if owner has not yet released the worker threads
                if (sign == _sign)
                {
                    Monitor.Wait(_exitLock);
                }
            }
        }

        public void Release()
        {
            // Claim the exit lock
            lock (_exitLock)
            {
                Debug.Assert(_count == _capacity);

                // Reset count and flip sign
                _count = 0;
                _sign = !_sign;

                // Wake worker threads
                Monitor.PulseAll(_exitLock);
            }
        }
    }
}

There's probably an official name for this kind of synchronization primitive, but I don't know what it is. :| I poked around a bit on Wikipedia's "Currency Control" page just now and found something called room synchronization that sounds close... I think WaitingRoom may be a type of room synchronization - with some specialized restrictions for the specifics of the ComputeFileHashes scenario. But whatever you call it, WaitingRoom is a pretty handy class to work with. Feel free to use WaitingRoom in your own code - maybe even drop me a note if you find other interesting uses. And if anyone reading knows the real name for what it's doing, please let me know! :)

Tags: Technical

Free hash [A reusable CRC-32 HashAlgorithm implementation for .NET]

Wednesday, January 14, 2009

In the notes for yesterday's release of the ComputeFileHashes tool (and source code), I mentioned that I'd written my own .NET HashAlgorithm class to compute CRC-32 hash values. The complete implementation can be found below and should behave just like every other HashAlgorithm subclass (ex: MD5 or SHA1). The code here is based on the CRC-32 reference implementation provided in Annex D of the PNG specification and pretty much "just worked". It implements the necessary Initialize, HashCore, and HashFinal methods as well as the technically optional (but practically necessary) Hash and HashSize properties. There's no test code to speak of, though it's worth pointing out that I've run tens of gigabytes of data through my ComputeFileHashes tool and have verified the correctness of the computed CRC-32 value for each test file. :)

Without further ado:

using System;
using System.Security.Cryptography;

namespace Delay
{
    /// <summary>
    /// HashAlgorithm implementation for CRC-32.
    /// </summary>
    [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Naming",
        "CA1709:IdentifiersShouldBeCasedCorrectly", MessageId = "CRC",
        Justification = "Matching algorithm acronym.")]
    public class CRC32 : HashAlgorithm
    {
        // Shared, pre-computed lookup table for efficiency
        private static readonly uint[] _crc32Table;

        /// <summary>
        /// Initializes the shared lookup table.
        /// </summary>
        [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Performance",
            "CA1810:InitializeReferenceTypeStaticFieldsInline", Justification =
            "Table values must be computed; not possible to remove the static constructor.")]
        static CRC32()
        {
            // Allocate table
            _crc32Table = new uint[256];

            // For each byte
            for (uint n = 0; n < 256; n++)
            {
                // For each bit
                uint c = n;
                for (int k = 0; k < 8; k++)
                {
                    // Compute value
                    if (0 != (c & 1))
                    {
                        c = 0xedb88320 ^ (c >> 1);
                    }
                    else
                    {
                        c = c >> 1;
                    }
                }

                // Store result in table
                _crc32Table[n] = c;
            }
        }

        // Current hash value
        private uint _crc32Value;

        // True if HashCore has been called
        private bool _hashCoreCalled;

        // True if HashFinal has been called
        private bool _hashFinalCalled;

        /// <summary>
        /// Initializes a new instance.
        /// </summary>
        public CRC32()
        {
            InitializeVariables();
        }

        /// <summary>
        /// Initializes internal state.
        /// </summary>
        public override void Initialize()
        {
            InitializeVariables();
        }

        /// <summary>
        /// Initializes variables.
        /// </summary>
        private void InitializeVariables()
        {
            _crc32Value = uint.MaxValue;
            _hashCoreCalled = false;
            _hashFinalCalled = false;
        }

        /// <summary>
        /// Updates the hash code for the provided data.
        /// </summary>
        /// <param name="array">Data.</param>
        /// <param name="ibStart">Start position.</param>
        /// <param name="cbSize">Number of bytes.</param>
        protected override void HashCore(byte[] array, int ibStart, int cbSize)
        {
            if (null == array)
            {
                throw new ArgumentNullException("array");
            }

            if (_hashFinalCalled)
            {
                throw new CryptographicException(
                    "Hash not valid for use in specified state.");
            }
            _hashCoreCalled = true;

            for (int i = ibStart; i < ibStart + cbSize; i++)
            {
                byte index = (byte)(_crc32Value ^ array[i]);
                _crc32Value = _crc32Table[index] ^ ((_crc32Value >> 8) & 0xffffff);
            }
        }

        /// <summary>
        /// Finalizes the hash code and returns it.
        /// </summary>
        /// <returns></returns>
        protected override byte[] HashFinal()
        {
            _hashFinalCalled = true;
            return Hash;
        }

        /// <summary>
        /// Returns the hash as an array of bytes.
        /// </summary>
        [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Design",
            "CA1065:DoNotRaiseExceptionsInUnexpectedLocations", Justification =
            "Matching .NET behavior by throwing here.")]
        [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Usage",
            "CA2201:DoNotRaiseReservedExceptionTypes", Justification =
            "Matching .NET behavior by throwing NullReferenceException.")]
        public override byte[] Hash
        {
            get
            {
                if (!_hashCoreCalled)
                {
                    throw new NullReferenceException();
                }
                if (!_hashFinalCalled)
                {
                    // Note: Not CryptographicUnexpectedOperationException because
                    // that can't be instantiated on Silverlight 4
                    throw new CryptographicException(
                        "Hash must be finalized before the hash value is retrieved.");
                }

                // Convert complement of hash code to byte array
                byte[] bytes = BitConverter.GetBytes(~_crc32Value);

                // Reverse for proper endianness, and return
                Array.Reverse(bytes);
                return bytes;
            }
        }

        // Return size of hash in bits.
        public override int HashSize
        {
            get
            {
                return 4 * 8;
            }
        }
    }
}

The CRC32 class presented here is nothing fancy, but it should be a pretty solid implementation of the once-popular CRC-32 algorithm that's ripe for reuse. Thanks to .NET's HashAlgorithm base class, it's easy to drop in CRC32 anywhere hashes are already being computed. I hope you find it useful!

Updated 2009-01-22: Corrected implementation of HashSize to return hash size in bits instead of bytes.

Updated 2009-02-16: Initialize the _crc32Value variable to its starting value for consistency with the Framework's HashAlgorithm classes where a call to Initialize is not necessary for a newly constructed instance.

Updated 2010-12-06: Added missing ibStart offset to HashCore loop.

Tags: Technical

Trust, but verify [Free tool (and source code) for computing commonly used hash codes!]

Tuesday, January 13, 2009

Internet downloads (particularly large ones) are often published with an associated checksum that can be used to verify that the file was successfully downloaded. While transmission errors are relatively rare, the popularity of file sharing and malware introduce the possibility that what you get isn't always exactly what you wanted. The posting of checksums by the original content publisher attempts to solve this problem by giving the end user an easy way to validate the downloaded file. Checksums are typically computed by applying a cryptographic hash function to the original file; the hash function computes a small (10-30 character) textual "snapshot" of the file that satisfies the following properties (quoting from Wikipedia):

It is easy to compute the hash for any given data

It is extremely difficult to construct a [file] that has a given hash

It is extremely difficult to modify a given [file] without changing its hash

It is extremely unlikely that two different [files] will have the same hash

Because of these four properties, a user can be fairly confident a file has not been tampered with or garbled as long as the checksum computed on their own machine matches what the publisher posted. I say fairly confident because there's always the possibility of a hash collision - two different files with the same checksum. No matter how good the hash function is, the pigeonhole principle guarantees there will ALWAYS be the possibility of collisions. The point of a good hash function is to make this possibility so unlikely as to be impossible for all intents and purposes.

Popular hash functions in use today are MD5 and SHA-1, with CRC-32 rapidly losing favor. Speaking in very broad terms, one might say that the quality of CRC-32 is "not good", MD5 is "good", and SHA-1 is "very good". For now, that is; research is always under way that could render any of these algorithms useless tomorrow... (For more information about the weaknesses of each algorithm, refer to the links above.)

In order for published checksums to be useful, the user needs an easy way to calculate them. I looked around a bit and didn't a lot of free tools for computing these popular hash functions that I was comfortable with, so I wrote my own using .NET. Here it is:

ComputeFileHashes
   Computes CRC32, MD5, and SHA1 hashes for the specified file(s)
   Version 2009-01-12
   http://blogs.msdn.com/Delay/

Syntax: ComputeFileHashes FileOrPattern [FileOrPattern [...]]
   Each FileOrPattern can specify a single file or a set of files

Examples:
   ComputeFileHashes Image.iso
   ComputeFileHashes *.iso
   ComputeFileHashes *.iso *.vhd

[Click here to download ComputeFileHashes along with its complete source code.]

Here's the output of running ComputeFileHashes on the recently released Windows 7 Beta ISO images. Anyone can verify the checksums below on the MSDN Subscriber Downloads site. (Note: You need to be a subscriber to download files from there; everyone else can download from the official Windows 7 link.)

C:\T>ComputeFileHashes.exe "M:\Windows 7\*"

M:\Windows 7\en_windows_7_beta_dvd_x64_x15-29074.iso
CRC32: 8E2FAD39
MD5: 773FC9CC60338C612AF716A2A14F177D
SHA1: E09FDBC1CB3A92CF6CC872040FDAF65553AB62A5

M:\Windows 7\en_windows_7_beta_dvd_x86_x15-29073.iso
CRC32: AABA5A48
MD5: F9DCE6EBD0A63930B44D8AE802B63825
SHA1: 6071184282B2156FF61CDC5260545C078CCA31EE

One of the things that was important to me when writing ComputeFileHashes was performance. Nobody likes to wait, and I'm probably even less patient than the average bear. One of the things I wanted my program to do was take advantage of multi-processing and the multi-core CPUs that are so prevalent these days. So ComputeFileHashes runs the three hash functions in parallel with each other and with loading the next bytes of the file. Theoretically, this can take advantage of four different cores - though in practice my limited testing suggests there's just not enough work to saturate them all. :)

While I paid attention to performance at a macro level (i.e., algorithm design), I didn't worry about it at a micro level (i.e., focused optimization). And I used .NET of all things. (Cue the doubters: "ZOMG, EPIC FAIL!!") If it uses .NET, it must be slow, right? Well, let's do the numbers:

I took some informal measurements of the time to compute hashes for the Windows 7 Beta x64 ISO image. ComputeFileHashes took just under 60 seconds to compute the CRC-32, MD5, and SHA-1 hashes of the 3.15 GB file on my home machine. Not instantaneous by any means, but also pretty reasonable for that amount of data.
Next up was my favorite program ever, Altap Salamander. I use Salamander for all my file management and it comes with a handy plugin for calculating/verifying checksums. Salamander is fully native code, so it should have a distinct performance advantage. Salamander took about 40 seconds to compute the CRC-32 and MD5 checksums in a single pass. This performance is noticeably better than ComputeFileHashes's, but it also doesn't include SHA-1 which is probably the most computationally intensive algorithm. I doubt adding SHA-1 would double the time, but it would almost certainly take longer. It's hard to balance missing features against better performance, so maybe this one is too close to call. :)
Last up was an internal tool I had called gethash. gethash is also native code and supports MD5, SHA-1, and some other hash types - but not CRC-32. However, gethash computes only one checksum a time. So I ran it once for SHA-1 in a time of just over 35 seconds and again for MD5 for another time of just over 35 seconds. The total time for gethash was therefore about 70 seconds and just like before we're missing one of the checksum types. In this case, one could argue that ComputeFileHashes wins the race because it's 10 seconds faster and delivers more value. Of course, if you only need one type of checksum, then you don't care that ComputeFileHashes generated others and gethash can do a single checksum in close to half the time it takes ComputeFileHashes. Again, no clear winner for me.

So depending on how you want to look at things, ComputeFileHashes is either the best or the worst of the lot. :) But recall that we're pitching an unoptimized .NET application against two solid native-code implementations. ComputeFileHashes pretty clearly held its ground and I'd say the doubters don't have much to complain about here.

Implementation notes:

.NET includes the HashAlgorithm abstract class and a variety of concrete subclasses for calculating cryptographic hashes. MD5 and SHA-1 are implemented by the Framework along with a handful of other useful hash functions. (In fact, it's trivial to add support for another HashAlgorithm to ComputeFileHashes simply by editing a table in the source code.) CRC-32 is not part of the Framework (due, I suspect, to its weaknesses), so I wrote my own CRC-32 HashAlgorithm subclass. I'll post more about this tomorrow.
The most efficient multi-hashing implementation would probably update each algorithm together as bytes are read from the file. However, the overhead of doing that with the .NET HashAlgorithm implementation (which sensibly expects blocks of data) seemed prohibitively expensive - and impractical to parallelize. An alternative would be to let different threads hash the complete file on their own. This parallelizes easily, but because the algorithms operate at different speeds, they'd soon be reading different parts of the file - and the disk thrashing would probably ruin performance. So I went with a compromise: ComputeFileHashes loads a 1 MB chunk of the file into memory, lets each of the algorithms process that block to completion on separate threads, then loads the next 1 MB of data and repeats. There's going to be some wasted time after the fastest algorithm completes and the slowest is still working, and there's probably some memory cache thrashing (for similar reasons as the disk) - but as the numbers above indicate, the performance of this approach is really quite reasonable. Of course, it's entirely possible that a different buffer size that would be even faster - so if you do some testing and find something better, please let me know! :)
ComputeFileHashes creates dedicated Threads for each of the hash functions it uses. Those threads stay around for the life of the application and are synchronized by a general-purpose helper class I wrote, WaitingRoom. I'll blog more about this in a couple of days.

ComputeFileHashes is a simple tool intended to make verifying checksums easy for anyone. I've wanted a good, fast, free, all-in-one solution I could trust for some time now, and the recent release of Windows 7 finally prompted me to write my own. I hope others to put ComputeFileHashes to use for themselves - perhaps even for the Windows 7 Beta! :)

Tags: Technical Utilities