The blog of dlaa.me

Sometimes all it takes is a little encouragement [How to: Automatically update the widths of ListView columns]

I was working on a WPF project the other day and wanted an easy way to display data in a simple tabular format: a few columns (with headers) that would automatically size to fit their contents. The obvious choice was the ListView control with a GridView as its View, which does exactly this. As you might expect, using this control was straightforward and it worked just like I wanted. Well, almost... There was one small catch: my ListView was hooked up to a data source that changed dynamically (via a Binding on its ItemsSource property) and I noticed that when the data source was updated, the widths of the columns were not automatically adjusted to fit the new content.

Here's a sample application I wrote for this post - notice how the text in the first ListView's "Value" column is truncated because the column widths have not been updated:

ListViewColumnWidthAutoUpdate sample

This behavior was kind of annoying - and a brief web search showed that I'm hardly the first person to want to change it. There are a few different ways to tell the ListView to update its columns - the one I prefer looks something like this:

// Technique for updating column widths of a ListView's GridView manually
public static void UpdateColumnWidths(GridView gridView)
{
    // For each column...
    foreach (var column in gridView.Columns)
    {
        // If this is an "auto width" column...
        if (double.IsNaN(column.Width))
        {
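            // Briefly set a fixed Width, then restore NaN so the column auto-sizes again
            // (assigning NaN when the Width is already NaN isn't treated as a change)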
            column.Width = 0;
            column.Width = double.NaN;
        }
    }
}

Calling this method after a ListView's ItemsSource property changes is simple and does exactly what we want. So if your scenario is such that you always know exactly when your data changes, you can add a call to this method after that happens and stop reading now because your problem is already solved. :)
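Why the double.IsNaN check and the detour through 0? An auto-sized GridViewColumn has a Width of double.NaN, and NaN has unusual equality semantics in .NET. The small standalone sketch below (NaNWidthDemo is a purely illustrative class, not part of WPF) shows that the == operator never considers NaN equal to itself, while Double.Equals does. That is presumably why UpdateColumnWidths tests with double.IsNaN and assigns 0 before NaN: if the property system compares old and new values with Equals, re-assigning NaN over NaN wouldn't count as a change.

```csharp
using System;

// Illustrative only: the NaN comparison behavior that matters for auto-width columns
public static class NaNWidthDemo
{
    // The == operator never treats NaN as equal to anything, including itself
    public static bool OperatorEquals(double width)
    {
        return width == double.NaN;
    }

    // Double.Equals, by contrast, treats two NaN values as equal
    public static bool MethodEquals(double width)
    {
        return width.Equals(double.NaN);
    }

    // The reliable way to recognize an "auto width" column
    public static bool IsAutoWidth(double width)
    {
        return double.IsNaN(width);
    }
}
```

In other words, a test like `column.Width == double.NaN` would never be true, even for a column that really is auto-sized.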

Okay, so if you're still reading, then your scenario is probably like mine: changes to the data source can occur without the application explicitly knowing about it. That last bit may not make a lot of sense until you realize that it's possible to implement a great deal of an application's functionality entirely in XAML. Specifically, it's quite easy to connect a ListView to the SelectedItem property of a ListBox so that changes to the selected item of the ListBox automatically re-populate the data in the ListView. Because this can be done entirely in XAML, these updates aren't automatically visible to the application.

The solution for the slightly more complicated scenario begins by realizing that it's necessary to know when the ItemsSource Binding updates. Fortunately, this is quite easy in WPF! :) By setting the NotifyOnTargetUpdated property of the ItemsSource Binding to true and handling the Binding.TargetUpdated attached event on the ListView, we have a fairly simple way of generating an event that can run a bit of code that calls the above method to update the column widths. What's more, this technique is fairly designer-friendly because it gives the designer complete freedom to set up such cross-control Bindings in their XAML without having to be intimately involved with the developer responsible for the application's code. Granted, the developer needs to implement the handler for the generated event, but that code is completely general and can be reused across multiple different ListViews.

The second ListView of the sample application uses this approach; notice how the column widths are correct in the image above. The XAML looks like this:

<ListView
    ItemsSource="{Binding Details, NotifyOnTargetUpdated=True}"
    Binding.TargetUpdated="ListViewTargetUpdated"
    ...

And the code for the event handler looks like this:

// Handler for the ListView's TargetUpdated event
private void ListViewTargetUpdated(object sender, DataTransferEventArgs e)
{
    // Get a reference to the ListView's GridView...
    var listView = sender as ListView;
    if (null != listView)
    {
        var gridView = listView.View as GridView;
        if (null != gridView)
        {
            // ... and update its column widths
            ListViewBehaviors.UpdateColumnWidths(gridView);
        }
    }
}

I'd arrived at the above solution and was going to consider the problem solved - and that's when Dr. WPF suggested I could use an attached behavior to encapsulate what I'd done into something that would be even simpler to use from XAML and wouldn't require the developer's involvement at all (aside from referencing the code that implements the attached behavior, of course). Attached behaviors are a powerful technique that lets you change how a control works simply by setting an attached property on it. (If you're not familiar with attached behaviors, you can read more about them in this post by John Gossman or this article by Josh Smith.)

In the case of the attached behavior solution to this problem, we make use of the DependencyPropertyDescriptor class to attach a change handler to the ItemsSource property of the ListView - and then call the method above to actually update the widths of the columns. There end up being a few more lines of code with this solution because of what it takes to create an attached DependencyProperty and attach/remove a handler for it, but that code is completely self-contained and can live entirely in its own dedicated class (whereas the method used by the previous solution needs to be part of one of the application's classes). More importantly, the number of XAML edits drops to just one and it's no longer even necessary that a Binding changes the data source - even direct assignments to the ItemsSource property will do!

The third ListView of the sample application uses this approach; the XAML looks like this:

<ListView
    ItemsSource="{Binding Details}"
    local:ListViewBehaviors.IsAutoUpdatingColumnWidths="true"
    ...

And here's the complete implementation of the attached DependencyProperty:

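using System;
using System.ComponentModel;
using System.Diagnostics.CodeAnalysis;
using System.Windows;
using System.Windows.Controls;

// Class implementing handy behaviors for the ListView control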
public static class ListViewBehaviors
{
    // Technique for updating column widths of a ListView's GridView manually
    public static void UpdateColumnWidths(GridView gridView)
    {
        // For each column...
        foreach (var column in gridView.Columns)
        {
            // If this is an "auto width" column...
            if (double.IsNaN(column.Width))
            {
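                // Briefly set a fixed Width, then restore NaN so the column auto-sizes again
                // (assigning NaN when the Width is already NaN isn't treated as a change)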
                column.Width = 0;
                column.Width = double.NaN;
            }
        }
    }

    // Definition of the IsAutoUpdatingColumnWidthsProperty attached DependencyProperty
    public static readonly DependencyProperty IsAutoUpdatingColumnWidthsProperty =
        DependencyProperty.RegisterAttached(
            "IsAutoUpdatingColumnWidths",
            typeof(bool),
            typeof(ListViewBehaviors),
            new UIPropertyMetadata(false, OnIsAutoUpdatingColumnWidthsChanged));

    // Get/set methods for the attached DependencyProperty
    [SuppressMessage("Microsoft.Design", "CA1011:ConsiderPassingBaseTypesAsParameters",
        Justification = "Only applies to ListView instances.")]
    public static bool GetIsAutoUpdatingColumnWidths(ListView listView)
    {
        return (bool)listView.GetValue(IsAutoUpdatingColumnWidthsProperty);
    }
    [SuppressMessage("Microsoft.Design", "CA1011:ConsiderPassingBaseTypesAsParameters",
        Justification = "Only applies to ListView instances.")]
    public static void SetIsAutoUpdatingColumnWidths(ListView listView, bool value)
    {
        listView.SetValue(IsAutoUpdatingColumnWidthsProperty, value);
    }

    // Change handler for the attached DependencyProperty
    private static void OnIsAutoUpdatingColumnWidthsChanged(DependencyObject o, DependencyPropertyChangedEventArgs e)
    {
        // Get the ListView instance and new bool value
        var listView = o as ListView;
        if ((null != listView) && (e.NewValue is bool))
        {
            // Get a descriptor for the ListView's ItemsSource property
            var descriptor = DependencyPropertyDescriptor.FromProperty(ListView.ItemsSourceProperty, typeof(ListView));
            if ((bool)e.NewValue)
            {
                // Enabling the feature, so add the change handler
                descriptor.AddValueChanged(listView, OnListViewItemsSourceValueChanged);
            }
            else
            {
                // Disabling the feature, so remove the change handler
                descriptor.RemoveValueChanged(listView, OnListViewItemsSourceValueChanged);
            }
        }
    }

    // Handler for changes to the ListView's ItemsSource updates the column widths
    private static void OnListViewItemsSourceValueChanged(object sender, EventArgs e)
    {
        // Get a reference to the ListView's GridView...
        var listView = sender as ListView;
        if (null != listView)
        {
            var gridView = listView.View as GridView;
            if (null != gridView)
            {
                // And update its column widths
                UpdateColumnWidths(gridView);
            }
        }
    }
}

[Click here to download the sample application demonstrating everything described here.]

What's neat is how something that started out as a minor annoyance turned into a great learning opportunity! I haven't needed to use DependencyPropertyDescriptor before now, but it's definitely something I'll keep in mind next time something like this comes up. And while I've made use of attached behaviors in the past, I didn't initially think to use one here - my thanks go out to Marlon Grech and Dr. WPF for encouraging me to do so. As it turns out, I like the attached behavior solution best of all for its simplicity, clarity, and separation of concerns. I've incorporated this change into my project and now my ListViews are behaving exactly how I want them to!

Tags: WPF

My new home page, expanded [Updated collection of great Silverlight Charting resources!]

It's been a couple of months since I shared my semi-comprehensive page of Charting resources on the web. During that time, the Silverlight Toolkit's December release came out with some great new Charting features (Woot!) and there have been a number of fantastic Charting posts that I'd like people to be aware of. So I've updated my previous list of links (FYI: old links are grayed-out) with all the new content that caught my eye:

Overviews (100 level)

Scenarios (200 level)

Internals (300 level)

My own Charting posts (Ego level)

Many, many thanks to everyone who has spent time helping others learn how to use Silverlight Charting!

PS - If I've missed any good resources, please leave a comment with a link - I'm always happy to find more quality Charting content! :)

Columns of a different color [Customizing the appearance of Silverlight charts with re-templating and MVVM]

When we created Silverlight Charting (background reading here and here), we tried to make things as designer-friendly as possible. So friendly, in fact, that it would be possible for someone to take the default look-and-feel of what we'd released and significantly enhance it without changing the Charting framework at all. :) That said, it's worth noting that Charting controls are a little different than typical WPF/Silverlight controls: while it might make sense to completely change how a ListBox looks, there are certain aspects of a chart that can't be changed without rendering the visualization meaningless. And so there are certain assumptions behind our Charting implementation around things we didn't expect users to want to change. But that's the great thing about users: they want to change these things anyway! :)

One of the fundamentals of column/bar charts is that the columns/bars of a single series are all drawn the same; that's what ties them together and makes it clear they represent a single series. If you create a column chart in Excel, the default color for the columns is blue. It's easy to change that color to orange or green or plaid, but by default all of the columns of the series change together because they're all part of the same series. (Incidentally, it is possible to change the colors of an individual column in Excel, but it's not entirely obvious how to do so and it's clearly not a mainline scenario.) With that in mind, it's no surprise that our charts behave similarly: you can provide whatever look you want for the columns and bars (via the ColumnSeries.DataPointStyle property, perhaps), but the columns and bars of a particular series always look the same.

But what if your scenario is such that you want to do things a little differently and you want more control over the colors of individual columns and bars? Well, you take advantage of re-templating and Model-View-ViewModel (MVVM), that's what! :) You're reading this blog, so I'll assume you already know what re-templating is - if not, here's a good place to start. Model-View-ViewModel (MVVM) is probably less well known to date - it's an approach to application development commonly used with WPF and Silverlight where simple wrapper classes are used to expose aspects of the underlying data types in a manner that's easy for the UI layer to deal with. You can read lots more about MVVM on John Gossman's blog or this recent MSDN article by Josh Smith. But I'm not here to teach you what re-templating or MVVM are - I'm here to show you how to use them with Charting to implement the multi-colored column scenario!

[Click here to download the complete Silverlight 2 source code for the sample application shown/discussed below.]

Imagine that you're a teacher and you want to chart the grades of your students. You've already got a basic Student class that exposes a few properties, and you can create instances from a database or a file or something. The Student class probably looks something like this:

using System.ComponentModel;
using System.Windows.Media;

// Standard data object representing a Student
public class Student : INotifyPropertyChanged
{
    // Student's name
    public string Name { get; private set; }

    // Student's favorite color
    public Brush FavoriteColor { get; private set; }

    // Student's grade
    public double Grade
    {
        get { return _grade; }
        set
        {
            _grade = value;
            Helpers.InvokePropertyChanged(PropertyChanged, this, "Grade");
        }
    }
    private double _grade;

    // Student constructor
    public Student(string name, Brush favoriteColor)
    {
        Name = name;
        FavoriteColor = favoriteColor;
    }

    // INotifyPropertyChanged event
    public event PropertyChangedEventHandler PropertyChanged;
}

The class above exposes a name and a favorite color (which I've implemented here as a Brush for convenience). There's also a grade, but we'll come back to that shortly... The goal is for each column representing a student to be drawn using that student's favorite color. To accomplish this, all we need to do is re-template. Using a designer tool like Blend or something simple like my SilverlightDefaultStyleBrowser, we can copy the default Style for ColumnDataPoint and paste it into our project's resources. By removing stuff that's not relevant to the demonstration and making a single change (binding the Grid's Background to FavoriteColor), we arrive at something like the following:

<Style
    x:Key="ColorByPreferenceColumn"
    TargetType="charting:ColumnDataPoint">
    <Setter Property="Background" Value="DarkGray"/>
    <Setter Property="Template">
        <Setter.Value>
            <ControlTemplate
                TargetType="charting:ColumnDataPoint">
                <Border
                    BorderBrush="{TemplateBinding BorderBrush}"
                    BorderThickness="{TemplateBinding BorderThickness}">
                    <Grid Background="{Binding FavoriteColor}">
                        <Rectangle>
                            <Rectangle.Fill>
                                <LinearGradientBrush>
                                    <GradientStop Color="#77ffffff" Offset="0"/>
                                    <GradientStop Color="#00ffffff" Offset="1"/>
                                </LinearGradientBrush>
                            </Rectangle.Fill>
                        </Rectangle>
                        <Border BorderBrush="#ccffffff" BorderThickness="1">
                            <Border BorderBrush="#77ffffff" BorderThickness="1"/>
                        </Border>
                    </Grid>
                </Border>
            </ControlTemplate>
        </Setter.Value>
    </Setter>
</Style>

This is just a tweak of the default template so that each column pulls its Background Brush from the FavoriteColor property of the underlying data object. Hook that up to a Chart/ColumnSeries in XAML, and that's all there is to it:

Color from Student

By the way, here's the XAML for that Chart:

<charting:Chart
    x:Name="FavoriteColorColumnChart"
    Title="Grades - By Favorite Color"
    Grid.Column="0">
    <charting:ColumnSeries
        DependentValueBinding="{Binding Grade}"
        IndependentValueBinding="{Binding Name}"
        DataPointStyle="{StaticResource ColorByPreferenceColumn}">
        <charting:ColumnSeries.DependentRangeAxis>
            <charting:LinearAxis
                Minimum="0"
                Maximum="100"
                Title="Grade"
                ShowGridLines="True"/>
        </charting:ColumnSeries.DependentRangeAxis>
    </charting:ColumnSeries>
</charting:Chart>
Aside: This process is even easier on WPF, where a Style like the one below would be all it takes! (If I had access to something like daily builds of Charting for WPF, I might even have mocked this up quickly to prove it to myself...) Unfortunately, the necessary "Binding in a Setter" capability is not supported by Silverlight 2, in either XAML or code:
<Style
    x:Key="ColorByPreferenceColumn"
    TargetType="charting:ColumnDataPoint">
    <Setter Property="Background" Value="{Binding FavoriteColor}"/>
</Style>

So that's how easy it is to get custom column and bar colors if your data objects already expose the information you need!

But what if you want to base the custom colors on something that's not directly available on the data objects and you also don't have the freedom to change the data objects themselves? In other words - continuing the example above - let's say we decided to change things so the columns are colored according to each student's current grade: great grades get green columns, satisfactory grades get yellow columns, and unsatisfactory grades get red columns.

The first thing to consider when faced with a problem like this is whether an IValueConverter will work. I've written about the usefulness of IValueConverter before, so I won't spend more time on that here. IValueConverter is great if you want to take a single property and transform it as part of a Binding. But what if you want to do something more complicated than that? Well, on WPF there's IMultiValueConverter which might do the trick, but that's not available on Silverlight and it's not always the answer anyway. So let's take advantage of MVVM to wrap our existing Student data objects with an object that's more view-friendly: StudentViewModel. Here's a trivial StudentViewModel class that exposes a Student and a Brush that's colored according to the Student's Grade property. Because Student implements INotifyPropertyChanged (like a well-behaved class should), StudentViewModel can listen for changes to the Grade property and update its Brush automatically. StudentViewModel also implements INotifyPropertyChanged so that anything referencing it will be notified about changes to the GradeColor property it exposes. Here's how it looks in code:

using System.ComponentModel;
using System.Windows.Media;

// Custom data object to wrap a Student object for the view model
public class StudentViewModel : INotifyPropertyChanged
{
    // Student object
    public Student Student { get; private set; }

    // Color representing Student's Grade
    public Brush GradeColor { get; private set; }

    // StudentViewModel constructor
    public StudentViewModel(Student student)
    {
        Student = student;
        // Compute an initial color so GradeColor isn't null before the first Grade change
        GradeColor = CreateGradeBrush(student.Grade);
        student.PropertyChanged += HandleStudentPropertyChanged;
    }

    // Detect changes to the Student's grade and update GradeColor
    private void HandleStudentPropertyChanged(object sender, PropertyChangedEventArgs e)
    {
        if ("Grade" == e.PropertyName)
        {
            GradeColor = CreateGradeBrush(Student.Grade);
            Helpers.InvokePropertyChanged(PropertyChanged, this, "GradeColor");
        }
    }

    // Map a grade to a Brush: red below 50, yellow below 80, green otherwise
    private static Brush CreateGradeBrush(double grade)
    {
        if (grade < 50)
        {
            return new SolidColorBrush { Color = Colors.Red };
        }
        else if (grade < 80)
        {
            return new SolidColorBrush { Color = Colors.Yellow };
        }
        else
        {
            return new SolidColorBrush { Color = Colors.Green };
        }
    }

    // INotifyPropertyChanged event
    public event PropertyChangedEventHandler PropertyChanged;
}
Aside: I've typically seen view model classes implemented by re-exposing each of the interesting data object properties - so for each property Foo on the data object, there will be a property Foo' on the view model object (which is either identical to the original property or some derivative of it). While I can see the value of this approach in some cases, the duplication of properties always bothers me and so I've instead exposed the entire Student object from the StudentViewModel object as a property (along with the new GradeColor property). This saves me from duplicating any existing properties, exposes the entire Student object to users of the StudentViewModel object, and is completely future-proof because any updates to the Student implementation will automatically show up for users of StudentViewModel.
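Both Student and StudentViewModel call a Helpers.InvokePropertyChanged method that isn't shown in this post. For anyone who wants to try the wrapping pattern outside Silverlight, here's a minimal console-friendly sketch. The Helpers implementation is my hypothetical guess at what such a helper does (raise the event only if someone is subscribed), and PlainStudent/PlainStudentViewModel are simplified, hypothetical stand-ins that use a color name string instead of a Brush so no UI assemblies are needed.

```csharp
using System;
using System.ComponentModel;

// Hypothetical helper: raises PropertyChanged only when there are subscribers
public static class Helpers
{
    public static void InvokePropertyChanged(PropertyChangedEventHandler handler, object sender, string propertyName)
    {
        if (null != handler)
        {
            handler(sender, new PropertyChangedEventArgs(propertyName));
        }
    }
}

// Simplified Student: just a name and a change-notifying grade
public class PlainStudent : INotifyPropertyChanged
{
    private double _grade;

    public PlainStudent(string name) { Name = name; }

    public string Name { get; private set; }

    public double Grade
    {
        get { return _grade; }
        set
        {
            _grade = value;
            Helpers.InvokePropertyChanged(PropertyChanged, this, "Grade");
        }
    }

    public event PropertyChangedEventHandler PropertyChanged;
}

// View model: exposes the whole student plus one derived, view-friendly value
public class PlainStudentViewModel : INotifyPropertyChanged
{
    public PlainStudentViewModel(PlainStudent student)
    {
        Student = student;
        GradeColorName = ColorNameForGrade(student.Grade);
        student.PropertyChanged += HandleStudentPropertyChanged;
    }

    public PlainStudent Student { get; private set; }

    public string GradeColorName { get; private set; }

    // Same cutoffs as StudentViewModel: red below 50, yellow below 80, green otherwise
    public static string ColorNameForGrade(double grade)
    {
        if (grade < 50) return "Red";
        if (grade < 80) return "Yellow";
        return "Green";
    }

    // Recompute the derived value and forward a change notification
    private void HandleStudentPropertyChanged(object sender, PropertyChangedEventArgs e)
    {
        if ("Grade" == e.PropertyName)
        {
            GradeColorName = ColorNameForGrade(Student.Grade);
            Helpers.InvokePropertyChanged(PropertyChanged, this, "GradeColorName");
        }
    }

    public event PropertyChangedEventHandler PropertyChanged;
}
```

Changing a student's Grade flows through the student's PropertyChanged event into the view model, which recomputes its one derived property and raises its own notification. That chain is what keeps the chart's colors in sync without the view knowing anything about grade cutoffs.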

Now that we've got a view model class that exposes a view-friendly property that is exactly what we need, our job is easy: change the chart to use StudentViewModels and change the custom template to reference the GradeColor property. Here's the new template (with the same kind of change as before):

<Style
    x:Key="ColorByGradeColumn"
    TargetType="charting:ColumnDataPoint">
    <Setter Property="Background" Value="DarkGray"/>
    <Setter Property="Template">
        <Setter.Value>
            <ControlTemplate
                TargetType="charting:ColumnDataPoint">
                <Border
                    BorderBrush="{TemplateBinding BorderBrush}"
                    BorderThickness="{TemplateBinding BorderThickness}">
                    <Grid Background="{Binding GradeColor}">
                        <Rectangle>
                            <Rectangle.Fill>
                                <LinearGradientBrush>
                                    <GradientStop Color="#77ffffff" Offset="0"/>
                                    <GradientStop Color="#00ffffff" Offset="1"/>
                                </LinearGradientBrush>
                            </Rectangle.Fill>
                        </Rectangle>
                        <Border BorderBrush="#ccffffff" BorderThickness="1">
                            <Border BorderBrush="#77ffffff" BorderThickness="1"/>
                        </Border>
                    </Grid>
                </Border>
            </ControlTemplate>
        </Setter.Value>
    </Setter>
</Style>

The XAML for this chart is nearly identical and the end result looks just how we wanted it to:

Color from Grade

Having shown off how re-templating and MVVM enable more advanced Charting customization scenarios, I've accomplished what I set out to do and could have stopped here... But there was still one customer scenario I wanted to address: synchronizing the colors of pie slices in a pie chart with the colors of columns in a column chart. Given what we've just discussed, the solution is easy: just repeat the re-templating process for a second chart with a PieSeries and PieDataPoints. Because the column/slice colors come from the data objects and because both charts are sharing the same data objects, the color for every data object (student) will naturally be the same across both charts. The re-templated XAML is the same as before and the final result is exactly what we want:

Color from Student as Pie

Well, actually, that's not entirely true: getting the pie slices right was trivial, but there was a bit of additional effort required to synchronize the colors of the pie chart's legend items with the pie slices...

The way things work is that the Series creates whatever LegendItems it needs. As part of that creation, it also creates a "fake" DataPoint that's styled just like the "real" ones displayed in the chart. This fake data point exists so that the LegendItem's default Template can create Bindings for things like the Background and BorderBrush properties. (Recall that users can completely change the look of a DataPoint, so the only way we have to know how something will look is to create it and see.) This approach works out pretty well, but there was an oversight that caused problems for me when I tried to provide my own PieSeries.LegendItemStyle: the DataContext of the fake PieDataPoint wasn't set to the corresponding slice's data object. Normally, that's no big deal because it's unused - however in this case it's a problem because the custom Template we created above gets its color from the data object. Without a bound data object to provide context, the legend items weren't using the right colors. :(

I thought about a few ways to work around this, but eventually decided the fix (the setting of the DataContext property for the fake PieDataPoint) belonged in the Charting code itself. Fortunately, Charting is open source, so it's easy for anybody to make such changes if/when the need arises! I've included a copy of the relevant source file with the one-line change I made (Changes\PieSeries.cs, line 317) and changed the sample project to use a custom build of Charting's Microsoft.Windows.Controls.DataVisualization.dll assembly that includes this change.

And because I'm a nice guy, I also made the same change to the actual charting source code that's under development, got it reviewed, and submitted it (along with an associated unit test) for inclusion in the next official release of the Silverlight Toolkit! After all, if I needed this to work for my sample, chances are good that someone else might need it to work for their application as well. :)

With that fix in place, here is the Style that applies the proper color to the LegendItems:

<Style
    x:Key="ColorByPreferenceLegendItem"
    TargetType="charting:LegendItem">
    <Setter Property="Template">
        <Setter.Value>
            <ControlTemplate TargetType="charting:LegendItem">
                <StackPanel Orientation="Horizontal">
                    <Rectangle
                        Width="8" Height="8"
                        Fill="{Binding DataContext.FavoriteColor}"
                        Stroke="{Binding BorderBrush}"
                        StrokeThickness="1" Margin="0,0,3,0"/>
                    <datavis:Title Content="{TemplateBinding Content}"/>
                </StackPanel>
            </ControlTemplate>
        </Setter.Value>
    </Setter>
</Style>

And there you have it: a few simple ways to take Charting and extend it to do exactly what you want! The examples here are fairly simple, but re-templating and MVVM are very powerful concepts which enable a high degree of customization for Silverlight and WPF applications that's pretty hard to come by on other platforms. If you're trying to do something unique and you're not having any luck the "normal" way, please take a few moments to consider the techniques discussed here - you may find that your problem has an easy solution after all!

The proverbial "one line fix" [ComputeFileHashes works around a troublesome Silverlight-on-Mac issue]

When I achieved cross-platform parity by adding MD5 support to the Silverlight version of ComputeFileHashes, I thought I was done for a while. But then I got an email from a coworker reporting that the Silverlight version of ComputeFileHashes running on a Mac under Safari presented an "Add Files" dialog that did not actually let the user select any files. Ouch, that's no good...

I started investigating with a quick web search; the top hit for "OpenFileDialog Mac" showed that others had experienced similar problems and that the Silverlight team had confirmed a bug. So at least my application wasn't totally broken. :) I wanted to understand the scenario better, but I don't own a Mac (which is why this problem escaped my notice in the first place). Fortunately, I found one at work that I could borrow some cycles on and I wrote a simple test application to invoke the OpenFileDialog with a few different values for the Filter property. ComputeFileHashes was initially passing the value "All Files (*)|*" - effectively just "*" - which was intended to match all files. And, indeed, it does so in WPF and Silverlight/PC. However, on Silverlight/Mac that value seems to match no files. Someone suggested "*.*", but to me that matches only files with a '.' in their name and I didn't want to exclude files that don't happen to have an extension. So I tried "" instead, and that did exactly what I wanted on Silverlight/Mac and Silverlight/PC. I thought I'd found the solution - until I tried the new value on WPF and it caused an exception...

At this point I was tired of cross-platform trial-and-error, and I decided I was inviting trouble by passing any filter string at all! The default behavior of OpenFileDialog is to allow the selection of all files, so I wasn't really adding much value by passing a custom filter that did the same thing. Well, I was providing more explicit filter text in the drop-down of the dialog, but it wasn't worth the compatibility problems I was dealing with. So I removed the line of code that set the Filter property, recompiled, republished, and called it done. :)

The latest version of ComputeFileHashes is now 2009-01-30. I've updated all the binaries in order to avoid version number confusion, but the only real change here is the filter string and the improvement is only visible on Silverlight/Mac. (Note: I did not update the screenshots below, so the versions shown there are out of date.)

  • If you're using Silverlight to run ComputeFileHashes, you'll automatically get the new version next time you run ComputeFileHashes.
  • If you're using ClickOnce to run ComputeFileHashes, the application will automatically update itself after you run it a couple of times.
  • If you're using the WPF or command-line versions, you'll need to download the new binaries and update manually.

Please refer to the original release announcement for more information about supported platforms, source code, implementation, etc.

ClickOnce ComputeFileHashes

Click here or on the image below to run the Silverlight version of ComputeFileHashes in your browser.

Silverlight ComputeFileHashes

Click here or on the image below to download the command-line and WPF versions of ComputeFileHashes - along with the ClickOnce and Silverlight versions AND the complete source code for everything!

Command-line ComputeFileHashes

Seamless cross-platform support is a tricky matter that's usually best left to others who have the time and resources to do it right. I didn't realize I was introducing a platform dependency by specifying a filter string, but I was... and I got burned by it. That's why it's important to test an application on all the supported configurations: you never know what problem might show up where you least expect it! That said, I'm probably not going to run out and buy myself a Mac just because of this incident - so please accept my apologies in advance should I fall victim to a similar problem in the future. :)

Thank goodness for reference implementations [Low-overhead .NET MD5 implementation (source code and tests) works great on Silverlight!]

In yesterday's post announcing ComputeFileHashes's new support for MD5 on Silverlight, I promised to share some details about my experience getting an MD5 HashAlgorithm implementation for Silverlight. Recall that an MD5 class is available in the desktop .NET Framework, but is not part of Silverlight 2's subset of the .NET Framework. (Probably in order to save space by excluding one of the less-secure cryptographic hash functions - a completely sensible tradeoff.) Because I didn't want to write my own code for MD5 (it's a non-trivial algorithm), the challenge was to find something freely available that I could just drop in and take advantage of. So I was very interested when I found out about Reid Borsuk's managed implementation of an MD5 HashAlgorithm for Silverlight because it sounded perfect for my needs.

The first step of incorporating something like this is to check the license: this code is under Ms-PL, so there were no problems there. The next step is to skim the code and get a general feel for how it works - and it was while doing this that I realized I wouldn't be able to use this code as-is...

To understand why, a little background is required:

The HashAlgorithm abstract class requires that derived classes implement the following methods: Initialize, HashCore, and HashFinal. Initialize gets called once at the start of hashing, then HashCore is called many times (being passed a different block of data each time), and then HashFinal is called once at the end of hashing to finalize any computations and return the computed hash value. It's a straightforward model and is flexible enough to accommodate a wide variety of checksum algorithms. Other than maintaining a few bytes of internal state across calls, there's no need for the hash algorithm to allocate anything: the data flows in from the user, gets processed, and is immediately forgotten about.
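To make that calling pattern concrete, here's a minimal sketch. It assumes a modern desktop .NET runtime (.NET 5 or later) and uses the built-in MD5 class as a stand-in for any HashAlgorithm, because that class isn't available on Silverlight 2. The HashInChunks helper is hypothetical. It drives the algorithm through the public TransformBlock/TransformFinalBlock methods, which are what end up invoking a derived class's HashCore and HashFinal, and then checks that the chunked result matches a one-shot ComputeHash call.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: feeds `data` to `algorithm` in pieces of `chunkSize` bytes.
// Each TransformBlock call ends up in the algorithm's HashCore override, and
// TransformFinalBlock makes one last HashCore call before invoking HashFinal.
static byte[] HashInChunks(HashAlgorithm algorithm, byte[] data, int chunkSize)
{
    algorithm.Initialize();
    int offset = 0;
    while (data.Length - offset > chunkSize)
    {
        algorithm.TransformBlock(data, offset, chunkSize, null, 0);
        offset += chunkSize;
    }
    // The last (possibly short or empty) piece finishes the computation.
    algorithm.TransformFinalBlock(data, offset, data.Length - offset);
    return algorithm.Hash ?? throw new InvalidOperationException("No hash computed.");
}

byte[] message = Encoding.ASCII.GetBytes("The quick brown fox jumps over the lazy dog");
using MD5 md5 = MD5.Create(); // stand-in only: not available on Silverlight 2
string chunked = Convert.ToHexString(HashInChunks(md5, message, 7)).ToLowerInvariant();
string oneShot = Convert.ToHexString(md5.ComputeHash(message)).ToLowerInvariant();
Console.WriteLine(chunked);            // 9e107d9d372bb6826bd81d3542a419d6
Console.WriteLine(chunked == oneShot); // True
```

Nothing in that loop needs more than one chunk of input in memory at a time, which is exactly the property a well-behaved HashAlgorithm is supposed to preserve.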

Or at least that's how it's supposed to work...

What I found when I looked at the aforementioned MD5 implementation was that it would allocate an internal buffer during Initialize, repeatedly re-allocate that buffer and append new data to the end of it during every call to HashCore, and then process the entire buffer all at once in HashFinal. While this approach works fine for fairly small inputs, it was completely impractical for ComputeFileHashes, which is expected to process multi-gigabyte files as a matter of course. For something like the Windows 7 Beta ISO images I've been using for my examples, all those reallocations and the large internal buffer would quickly exhaust the physical memory of virtually any system in use today. (In fact, it's a bit more dire than it initially seems: this technique requires twice the memory of the original data, because that last call to HashCore needs to copy the nearly-full-sized buffer into a new full-sized buffer.)
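To put rough numbers on that parenthetical, here's a back-of-the-envelope model written as a small C# function. It assumes (hypothetically) that each HashCore call allocates a new buffer exactly one chunk larger and copies the old contents plus the new chunk into it; the real implementation's growth policy may differ. AppendingBufferCost is an illustrative name. It reports the peak number of bytes alive during any one copy (old buffer plus new buffer) and the total number of bytes copied.

```csharp
using System;

// Hypothetical model of an "append every chunk to a growing buffer" HashCore:
// call k allocates a buffer of k * chunkSize bytes (capped at totalBytes) and
// copies the previous contents plus the new chunk into it.
static (long PeakBytes, long BytesCopied) AppendingBufferCost(long totalBytes, long chunkSize)
{
    long chunks = (totalBytes + chunkSize - 1) / chunkSize;
    long peak = 0, copied = 0, oldSize = 0;
    for (long k = 1; k <= chunks; k++)
    {
        long newSize = Math.Min(k * chunkSize, totalBytes);
        peak = Math.Max(peak, oldSize + newSize); // old and new buffers coexist during the copy
        copied += newSize;                         // previous contents + the new chunk
        oldSize = newSize;
    }
    return (peak, copied);
}

long fileSize = 4L * 1024 * 1024 * 1024; // hypothetical 4 GB input
long chunk = 64 * 1024;                  // hypothetical 64 KB per HashCore call
var cost = AppendingBufferCost(fileSize, chunk);
Console.WriteLine($"Peak memory:  {cost.PeakBytes / (1024.0 * 1024 * 1024):F2} GB");
Console.WriteLine($"Bytes copied: {cost.BytesCopied / 1e12:F1} TB");
```

Under these assumptions the peak is 2N minus one chunk (just under 8 GB for a 4 GB input) and the copying adds up to roughly 140 TB. A streaming implementation, by contrast, only keeps MD5's fixed 64-byte block buffer around between calls.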

Okay, so if I couldn't use the code as-is, the next step was to see what it would take to modify it for my scenario. Because this implementation uses a small HashAlgorithm wrapper class around a core MD5 implementation class, I wondered if it would be a simple matter of changing the way the wrapper called into the core. But looking at things a bit more closely, it seemed the core was not structured for that - and separating things like I wanted might not be trivial.

It's decision time: Do I start changing the structure of this code to work the way I need it to, or do I investigate other options? I considered the implications of both approaches, but it was something a coworker said that convinced me to spend a bit of time looking elsewhere. He asked, "If you don't like a fundamental part of the implementation and feel the need to fix it, why do you think you won't be compelled to make changes to the rest of it as well?" That question expressed my concerns pretty well, so I decided to look into other options for a bit. After all, I could always come back to this if nothing panned out.

The obvious place to start was the MD5 specification: RFC1321, The MD5 Message-Digest Algorithm. The body of this document describes the algorithm in great detail and would be a great place to start writing my own implementation if I were willing to spend a considerable amount of time developing and testing. But the real gem is in the appendix: a reference implementation of MD5 written in C! Fortunately, C's not so different from C# - and I've ported things before - so I had a decent idea what to expect. And it sure is hard to beat the reference implementation as a source of accurate, (typically) bug-free code. There is an accompanying license, but it's open (this is a public specification, after all) and primarily requires that derivative works identify themselves as being "derived from the RSA Data Security, Inc. MD5 Message-Digest Algorithm." (like I just did). So things seemed promising!

I decided to spend a bus ride porting the reference implementation and see how far I got. As it happens (and I'm sure this is no accident), the reference implementation uses exactly the Initialize/HashCore/HashFinal pattern that HashAlgorithm expects. Consequently, each of my own HashAlgorithm wrapper methods simply makes a single call into the ported reference implementation - and all of a sudden concerns about memory exhaustion are a thing of the past! By the end of the bus ride, I had successfully ported the reference implementation to C# and had it passing the seven test cases that are part of the specification.
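The seven test cases are the test suite in Appendix A.5 of RFC 1321. Here's a minimal sketch of a harness for them. It assumes .NET 5 or later and uses the built-in MD5 class as the implementation under test; checking the port itself would mean substituting an MD5Managed instance (assuming that class is compiled into the same project).

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Inputs and expected digests from the RFC 1321 test suite (Appendix A.5).
var testSuite = new (string Input, string ExpectedDigest)[]
{
    ("", "d41d8cd98f00b204e9800998ecf8427e"),
    ("a", "0cc175b9c0f1b6a831c399e269772661"),
    ("abc", "900150983cd24fb0d6963f7d28e17f72"),
    ("message digest", "f96b697d7cb7938d525a2f31aaf161d0"),
    ("abcdefghijklmnopqrstuvwxyz", "c3fcd3d76192e4007dfb496cca67e13b"),
    ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789", "d174ab98d277d9f5a5611c2c9f419d9f"),
    (string.Concat(Enumerable.Repeat("1234567890", 8)), "57edf4a22be3c955ac49da2e2107b67a"),
};

int failures = 0;
using (HashAlgorithm md5 = MD5.Create()) // swap in new Delay.MD5Managed() to test the port
{
    foreach (var (input, expected) in testSuite)
    {
        byte[] digest = md5.ComputeHash(Encoding.ASCII.GetBytes(input));
        string actual = Convert.ToHexString(digest).ToLowerInvariant();
        bool passed = actual == expected;
        if (!passed) failures++;
        Console.WriteLine($"{(passed ? "PASS" : "FAIL")} MD5(\"{input}\") = {actual}");
    }
}
Console.WriteLine($"{testSuite.Length - failures} of {testSuite.Length} passed");
```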

My mind was pretty much made up at this point: I'd use my port of the MD5 reference implementation for the Silverlight version of ComputeFileHashes. This was code from a reliable source, code I had become familiar with, and code that I'd feel comfortable debugging or tuning if necessary. I beefed up the test cases a bit by exercising all of them for all the possible chunk sizes, addressed a couple of code analysis warnings, and had something ready in a jiffy. I added the MD5Managed class to the Silverlight build of ComputeFileHashes and - yep - it just worked. :)
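Feeding the same input through at every possible chunk size exercises the paths in MD5Update that buffer partial 64-byte blocks and honor a non-zero starting offset. Here's a minimal sketch of that idea, assuming .NET 5 or later with the built-in MD5 class standing in for MD5Managed. The extra inputs (such as the 150-character run of x) are arbitrary choices of mine, picked so some inputs span more than one block.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

// Inputs of assorted lengths, including some longer than one 64-byte MD5 block.
string[] inputs =
{
    "", "a", "abc", "message digest", "abcdefghijklmnopqrstuvwxyz",
    string.Concat(Enumerable.Repeat("1234567890", 8)),
    new string('x', 150),
};

int comparisons = 0, mismatches = 0;
using HashAlgorithm md5 = MD5.Create(); // stand-in for MD5Managed
foreach (string input in inputs)
{
    byte[] data = Encoding.ASCII.GetBytes(input);
    byte[] expected = md5.ComputeHash(data);

    // Every chunk size from 1 byte up to the whole input (one pass for "").
    for (int chunkSize = 1; chunkSize <= Math.Max(1, data.Length); chunkSize++)
    {
        md5.Initialize();
        int offset = 0;
        for (; data.Length - offset > chunkSize; offset += chunkSize)
        {
            md5.TransformBlock(data, offset, chunkSize, null, 0);
        }
        md5.TransformFinalBlock(data, offset, data.Length - offset);

        comparisons++;
        if (!md5.Hash.SequenceEqual(expected)) mismatches++;
    }
}
Console.WriteLine($"{comparisons} comparisons, {mismatches} mismatches");
```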

So here's a(nother) completely managed MD5 implementation that anyone is free to use (subject to the reference implementation's license). I haven't spent time optimizing it - but that was kind of the point (see the class comments below for more). I'd started out trying to avoid writing my own MD5 implementation and I only partly succeeded - but I'm happy with how this worked out, and maybe some of you can benefit from what I've done. Even if all you do is run the Silverlight version of ComputeFileHashes from time to time, I feel like my relatively minimal investment was worthwhile! :)

 

[Click here to download a Visual Studio solution containing the source code for a Silverlight-ready managed implementation of MD5 along with the simple test cases discussed above.]

 

using System;
using System.Diagnostics.CodeAnalysis;
using System.Security.Cryptography;

namespace Delay
{
    /// <summary>
    /// MD5Managed: A HashAlgorithm implementation that acts as a thin wrapper
    /// around a C# translation of the MD5 reference implementation. The C code
    /// has been translated as closely as possible so that most of the original
    /// structure remains and comparisons between the two are straightforward.
    /// </summary>
    /// <remarks>
    /// Derived from the RSA Data Security, Inc. MD5 Message-Digest Algorithm.
    /// 
    /// Specification:
    /// RFC1321 - The MD5 Message-Digest Algorithm
    /// http://www.faqs.org/rfcs/rfc1321.html
    /// 
    /// Original license:
    /// Copyright (C) 1991-2, RSA Data Security, Inc. Created 1991. All
    /// rights reserved.
    /// 
    /// License to copy and use this software is granted provided that it
    /// is identified as the "RSA Data Security, Inc. MD5 Message-Digest
    /// Algorithm" in all material mentioning or referencing this software
    /// or this function.
    /// 
    /// License is also granted to make and use derivative works provided
    /// that such works are identified as "derived from the RSA Data
    /// Security, Inc. MD5 Message-Digest Algorithm" in all material
    /// mentioning or referencing the derived work.
    /// 
    /// RSA Data Security, Inc. makes no representations concerning either
    /// the merchantability of this software or the suitability of this
    /// software for any particular purpose. It is provided "as is"
    /// without express or implied warranty of any kind.
    /// 
    /// These notices must be retained in any copies of any part of this
    /// documentation and/or software.
    /// </remarks>
    public class MD5Managed : HashAlgorithm
    {
        // Current context
        private readonly MD5_CTX _context = new MD5_CTX();
        // Last hash result
        private readonly byte[] _digest = new byte[16];
        // True if HashCore has been called
        private bool _hashCoreCalled;
        // True if HashFinal has been called
        private bool _hashFinalCalled;

        /// <summary>
        /// Initializes a new instance.
        /// </summary>
        public MD5Managed()
        {
            InitializeVariables();
        }

        /// <summary>
        /// Initializes internal state.
        /// </summary>
        public override void Initialize()
        {
            InitializeVariables();
        }

        /// <summary>
        /// Initializes variables.
        /// </summary>
        private void InitializeVariables()
        {
            MD5Init(_context);
            _hashCoreCalled = false;
            _hashFinalCalled = false;
        }

        /// <summary>
        /// Updates the hash code with the data provided.
        /// </summary>
        /// <param name="array">Data to hash.</param>
        /// <param name="ibStart">Start position.</param>
        /// <param name="cbSize">Number of bytes.</param>
        protected override void HashCore(byte[] array, int ibStart, int cbSize)
        {
            if (null == array)
            {
                throw new ArgumentNullException("array");
            }

            if (_hashFinalCalled)
            {
                throw new CryptographicException("Hash not valid for use in specified state.");
            }
            _hashCoreCalled = true;

            MD5Update(_context, array, (uint)ibStart, (uint)cbSize);
        }

        /// <summary>
        /// Finalizes the hash code and returns it.
        /// </summary>
        /// <returns>The computed hash value.</returns>
        protected override byte[] HashFinal()
        {
            _hashFinalCalled = true;
            MD5Final(_digest, _context);
            return Hash;
        }

        /// <summary>
        /// Returns the hash as an array of bytes.
        /// </summary>
        [SuppressMessage("Microsoft.Design", "CA1065:DoNotRaiseExceptionsInUnexpectedLocations", Justification = "Matching .NET behavior by throwing here.")]
        [SuppressMessage("Microsoft.Usage", "CA2201:DoNotRaiseReservedExceptionTypes", Justification = "Matching .NET behavior by throwing NullReferenceException.")]
        public override byte[] Hash
        {
            get
            {
                if (!_hashCoreCalled)
                {
                    throw new NullReferenceException();
                }
                if (!_hashFinalCalled)
                {
                    // Note: Not CryptographicUnexpectedOperationException because that can't be instantiated on Silverlight 4
                    throw new CryptographicException("Hash must be finalized before the hash value is retrieved.");
                }

                return _digest;
            }
        }

        /// <summary>
        /// Gets the size of the computed hash, in bits.
        /// </summary>
        public override int HashSize
        {
            get
            {
                return _digest.Length * 8;
            }
        }

        ///////////////////////////////////////////////
        // MD5 reference implementation begins here. //
        ///////////////////////////////////////////////

        /* MD5 context. */
        private class MD5_CTX
        {
            public readonly uint[] state;   /* state (ABCD) */
            public readonly uint[] count;   /* number of bits, modulo 2^64 (lsb first) */
            public readonly byte[] buffer;  /* input buffer */

            public MD5_CTX()
            {
                state = new uint[4];
                count = new uint[2];
                buffer = new byte[64];
            }

            public void Clear()
            {
                Array.Clear(state, 0, state.Length);
                Array.Clear(count, 0, count.Length);
                Array.Clear(buffer, 0, buffer.Length);
            }
        }

        /* Constants for MD5Transform routine. */
        private const int S11 = 7;
        private const int S12 = 12;
        private const int S13 = 17;
        private const int S14 = 22;
        private const int S21 = 5;
        private const int S22 = 9;
        private const int S23 = 14;
        private const int S24 = 20;
        private const int S31 = 4;
        private const int S32 = 11;
        private const int S33 = 16;
        private const int S34 = 23;
        private const int S41 = 6;
        private const int S42 = 10;
        private const int S43 = 15;
        private const int S44 = 21;

        private static byte[] PADDING;

        [SuppressMessage("Microsoft.Performance", "CA1810:InitializeReferenceTypeStaticFieldsInline", Justification = "More compact this way")]
        static MD5Managed()
        {
            PADDING = new byte[64];
            PADDING[0] = 0x80;
        }

        /* F, G, H and I are basic MD5 functions. */
        private static uint F(uint x, uint y, uint z) { return (((x) & (y)) | ((~x) & (z))); }
        private static uint G(uint x, uint y, uint z) { return (((x) & (z)) | ((y) & (~z))); }
        private static uint H(uint x, uint y, uint z) { return ((x) ^ (y) ^ (z)); }
        private static uint I(uint x, uint y, uint z) { return ((y) ^ ((x) | (~z))); }

        /* ROTATE_LEFT rotates x left n bits. */
        private static uint ROTATE_LEFT(uint x, int n) { return (((x) << (n)) | ((x) >> (32 - (n)))); }

        /* FF, GG, HH, and II transformations for rounds 1, 2, 3, and 4.
           Rotation is separate from addition to prevent recomputation. */
        private static void FF(ref uint a, uint b, uint c, uint d, uint x, int s, uint ac)
        {
            (a) += F((b), (c), (d)) + (x) + (uint)(ac);
            (a) = ROTATE_LEFT((a), (s));
            (a) += (b);
        }
        private static void GG(ref uint a, uint b, uint c, uint d, uint x, int s, uint ac)
        {
            (a) += G((b), (c), (d)) + (x) + (uint)(ac);
            (a) = ROTATE_LEFT((a), (s));
            (a) += (b);
        }
        private static void HH(ref uint a, uint b, uint c, uint d, uint x, int s, uint ac)
        {
            (a) += H((b), (c), (d)) + (x) + (uint)(ac);
            (a) = ROTATE_LEFT((a), (s));
            (a) += (b);
        }
        private static void II(ref uint a, uint b, uint c, uint d, uint x, int s, uint ac)
        {
            (a) += I((b), (c), (d)) + (x) + (uint)(ac);
            (a) = ROTATE_LEFT((a), (s));
            (a) += (b);
        }

        /* MD5 initialization. Begins an MD5 operation, writing a new context. */
        private static void MD5Init(MD5_CTX context)  /* context */
        {
            context.count[0] = context.count[1] = 0;

            /* Load magic initialization constants. */
            context.state[0] = 0x67452301;
            context.state[1] = 0xefcdab89;
            context.state[2] = 0x98badcfe;
            context.state[3] = 0x10325476;
        }

        /* MD5 block update operation. Continues an MD5 message-digest
           operation, processing another message block, and updating the
           context. */
        private static void MD5Update(MD5_CTX context,  /* context */
                                      byte[] input,     /* input block */
                                      uint inputIndex,  // Starting index for input block
                                      uint inputLen)    /* length of input block */
        {
            /* Compute number of bytes mod 64 */
            uint index = (uint)((context.count[0] >> 3) & 0x3F);

            /* Update number of bits */
            if ((context.count[0] += ((uint)inputLen << 3)) < ((uint)inputLen << 3))
            {
                context.count[1]++;
            }
            context.count[1] += ((uint)inputLen >> 29);

            uint partLen = 64 - index;

            /* Transform as many times as possible. */
            uint i = 0;
            if (inputLen >= partLen)
            {
                Buffer.BlockCopy(input, (int)inputIndex, context.buffer, (int)index, (int)partLen);
                MD5Transform(context.state, context.buffer, 0);

                for (i = partLen; i + 63 < inputLen; i += 64)
                {
                    MD5Transform(context.state, input, inputIndex + i);
                }

                index = 0;
            }

            /* Buffer remaining input */
            Buffer.BlockCopy(input, (int)(inputIndex + i), context.buffer, (int)index, (int)(inputLen - i));
        }

        /* MD5 finalization. Ends an MD5 message-digest operation, writing the
           the message digest and zeroizing the context. */
        private static void MD5Final(byte[] digest,    /* message digest */
                                     MD5_CTX context)  /* context */
        {
            byte[] bits = new byte[8];

            /* Save number of bits */
            Encode(bits, context.count, 8);

            /* Pad out to 56 mod 64. */
            uint index = (uint)((context.count[0] >> 3) & 0x3f);
            uint padLen = (index < 56) ? (56 - index) : (120 - index);
            MD5Update(context, PADDING, 0, padLen);

            /* Append length (before padding) */
            MD5Update(context, bits, 0, 8);

            /* Store state in digest */
            Encode(digest, context.state, 16);

            /* Zeroize sensitive information. */
            context.Clear();
        }

        /* MD5 basic transformation. Transforms state based on block. */
        private static void MD5Transform(uint[] state,
                                         byte[] block,
                                         uint blockIndex)
        {
            uint a = state[0], b = state[1], c = state[2], d = state[3];
            uint[] x = new uint[16];

            Decode(x, block, blockIndex, 64);

            /* Round 1 */
            FF(ref a, b, c, d, x[0],  S11, 0xd76aa478); /* 1 */
            FF(ref d, a, b, c, x[1],  S12, 0xe8c7b756); /* 2 */
            FF(ref c, d, a, b, x[2],  S13, 0x242070db); /* 3 */
            FF(ref b, c, d, a, x[3],  S14, 0xc1bdceee); /* 4 */
            FF(ref a, b, c, d, x[4],  S11, 0xf57c0faf); /* 5 */
            FF(ref d, a, b, c, x[5],  S12, 0x4787c62a); /* 6 */
            FF(ref c, d, a, b, x[6],  S13, 0xa8304613); /* 7 */
            FF(ref b, c, d, a, x[7],  S14, 0xfd469501); /* 8 */
            FF(ref a, b, c, d, x[8],  S11, 0x698098d8); /* 9 */
            FF(ref d, a, b, c, x[9],  S12, 0x8b44f7af); /* 10 */
            FF(ref c, d, a, b, x[10], S13, 0xffff5bb1); /* 11 */
            FF(ref b, c, d, a, x[11], S14, 0x895cd7be); /* 12 */
            FF(ref a, b, c, d, x[12], S11, 0x6b901122); /* 13 */
            FF(ref d, a, b, c, x[13], S12, 0xfd987193); /* 14 */
            FF(ref c, d, a, b, x[14], S13, 0xa679438e); /* 15 */
            FF(ref b, c, d, a, x[15], S14, 0x49b40821); /* 16 */

            /* Round 2 */
            GG(ref a, b, c, d, x[1],  S21, 0xf61e2562); /* 17 */
            GG(ref d, a, b, c, x[6],  S22, 0xc040b340); /* 18 */
            GG(ref c, d, a, b, x[11], S23, 0x265e5a51); /* 19 */
            GG(ref b, c, d, a, x[0],  S24, 0xe9b6c7aa); /* 20 */
            GG(ref a, b, c, d, x[5],  S21, 0xd62f105d); /* 21 */
            GG(ref d, a, b, c, x[10], S22, 0x02441453); /* 22 */
            GG(ref c, d, a, b, x[15], S23, 0xd8a1e681); /* 23 */
            GG(ref b, c, d, a, x[4],  S24, 0xe7d3fbc8); /* 24 */
            GG(ref a, b, c, d, x[9],  S21, 0x21e1cde6); /* 25 */
            GG(ref d, a, b, c, x[14], S22, 0xc33707d6); /* 26 */
            GG(ref c, d, a, b, x[3],  S23, 0xf4d50d87); /* 27 */
            GG(ref b, c, d, a, x[8],  S24, 0x455a14ed); /* 28 */
            GG(ref a, b, c, d, x[13], S21, 0xa9e3e905); /* 29 */
            GG(ref d, a, b, c, x[2],  S22, 0xfcefa3f8); /* 30 */
            GG(ref c, d, a, b, x[7],  S23, 0x676f02d9); /* 31 */
            GG(ref b, c, d, a, x[12], S24, 0x8d2a4c8a); /* 32 */

            /* Round 3 */
            HH(ref a, b, c, d, x[5],  S31, 0xfffa3942); /* 33 */
            HH(ref d, a, b, c, x[8],  S32, 0x8771f681); /* 34 */
            HH(ref c, d, a, b, x[11], S33, 0x6d9d6122); /* 35 */
            HH(ref b, c, d, a, x[14], S34, 0xfde5380c); /* 36 */
            HH(ref a, b, c, d, x[1],  S31, 0xa4beea44); /* 37 */
            HH(ref d, a, b, c, x[4],  S32, 0x4bdecfa9); /* 38 */
            HH(ref c, d, a, b, x[7],  S33, 0xf6bb4b60); /* 39 */
            HH(ref b, c, d, a, x[10], S34, 0xbebfbc70); /* 40 */
            HH(ref a, b, c, d, x[13], S31, 0x289b7ec6); /* 41 */
            HH(ref d, a, b, c, x[0],  S32, 0xeaa127fa); /* 42 */
            HH(ref c, d, a, b, x[3],  S33, 0xd4ef3085); /* 43 */
            HH(ref b, c, d, a, x[6],  S34, 0x04881d05); /* 44 */
            HH(ref a, b, c, d, x[9],  S31, 0xd9d4d039); /* 45 */
            HH(ref d, a, b, c, x[12], S32, 0xe6db99e5); /* 46 */
            HH(ref c, d, a, b, x[15], S33, 0x1fa27cf8); /* 47 */
            HH(ref b, c, d, a, x[2],  S34, 0xc4ac5665); /* 48 */

            /* Round 4 */
            II(ref a, b, c, d, x[0],  S41, 0xf4292244); /* 49 */
            II(ref d, a, b, c, x[7],  S42, 0x432aff97); /* 50 */
            II(ref c, d, a, b, x[14], S43, 0xab9423a7); /* 51 */
            II(ref b, c, d, a, x[5],  S44, 0xfc93a039); /* 52 */
            II(ref a, b, c, d, x[12], S41, 0x655b59c3); /* 53 */
            II(ref d, a, b, c, x[3],  S42, 0x8f0ccc92); /* 54 */
            II(ref c, d, a, b, x[10], S43, 0xffeff47d); /* 55 */
            II(ref b, c, d, a, x[1],  S44, 0x85845dd1); /* 56 */
            II(ref a, b, c, d, x[8],  S41, 0x6fa87e4f); /* 57 */
            II(ref d, a, b, c, x[15], S42, 0xfe2ce6e0); /* 58 */
            II(ref c, d, a, b, x[6],  S43, 0xa3014314); /* 59 */
            II(ref b, c, d, a, x[13], S44, 0x4e0811a1); /* 60 */
            II(ref a, b, c, d, x[4],  S41, 0xf7537e82); /* 61 */
            II(ref d, a, b, c, x[11], S42, 0xbd3af235); /* 62 */
            II(ref c, d, a, b, x[2],  S43, 0x2ad7d2bb); /* 63 */
            II(ref b, c, d, a, x[9],  S44, 0xeb86d391); /* 64 */

            state[0] += a;
            state[1] += b;
            state[2] += c;
            state[3] += d;

            /* Zeroize sensitive information. */
            Array.Clear(x, 0, x.Length);
        }

        /* Encodes input (UINT4) into output (unsigned char). Assumes len is
           a multiple of 4. */
        private static void Encode(byte[] output,
                                   uint[] input,
                                   uint len)
        {
            for (uint i = 0, j = 0; j < len; i++, j += 4)
            {
                output[j] = (byte)(input[i] & 0xff);
                output[j + 1] = (byte)((input[i] >> 8) & 0xff);
                output[j + 2] = (byte)((input[i] >> 16) & 0xff);
                output[j + 3] = (byte)((input[i] >> 24) & 0xff);
            }
        }

        /* Decodes input (unsigned char) into output (UINT4). Assumes len is
           a multiple of 4. */
        private static void Decode(uint[] output,
                                   byte[] input,
                                   uint inputIndex,
                                   uint len)
        {
            for (uint i = 0, j = 0; j < len; i++, j += 4)
            {
                output[i] = ((uint)input[inputIndex + j]) |
                    (((uint)input[inputIndex + j + 1]) << 8) |
                    (((uint)input[inputIndex + j + 2]) << 16) |
                    (((uint)input[inputIndex + j + 3]) << 24);
            }
        }
    }
}

 

Updated 2009-02-16: Called MD5Init from the constructor for consistency with the Framework's HashAlgorithm classes, where a call to Initialize is not necessary for a newly constructed instance.

Updated 2010-12-06: Added missing inputIndex offset to MD5Update method.

Cross-platform feature parity: achieved [Silverlight version of ComputeFileHashes now includes MD5!]

I was very happy with last week's release of ComputeFileHashes supporting the command-line, WPF, Silverlight, *and* ClickOnce. Only one thing bothered me: the Silverlight version didn't do MD5 because Silverlight 2 doesn't support that type of checksum. Recall that I'd fairly happily implemented my own CRC-32 class because none of the platforms supported it. [Also, it was relatively simple and had a good reference implementation. :) ] But because MD5 is a more complex algorithm and was only missing on Silverlight, I was reluctant to do the same thing for MD5...

What I really wanted was a freely available, Silverlight-compatible, HashAlgorithm-based MD5 implementation that I could trivially drop into my code and use on Silverlight. So I was excited when kind reader (and teammate!) Jeff Wilcox left a comment pointing to something that sounded perfect for my needs. I told Jeff I'd add MD5 for Silverlight and mentally breathed a sigh of relief that all four of ComputeFileHashes's supported platforms would provide the same set of checksums.

As it turns out, after a bit of research I decided not to use that MD5 implementation. (I'll explain why in my next post.) However, now that I'd fully bought into the idea of MD5 on Silverlight, I was reluctant to let it go... So I spent some time working on an alternate solution and developed something I'm quite happy with. As a result, I'm able to release an update to ComputeFileHashes that offers MD5 support on Silverlight!

The latest version of ComputeFileHashes is now 2009-01-26. I've updated all the binaries in order to avoid version number confusion - but the only real change here is the addition of MD5 for Silverlight. (FYI, I only updated the screenshot of the Silverlight version below.)

  • If you're using Silverlight to run ComputeFileHashes, you'll automatically get the new version next time you run ComputeFileHashes.
  • If you're using ClickOnce to run ComputeFileHashes, the application will automatically update itself after you run it a couple of times.
  • If you're using the WPF or command-line versions, you'll need to download the new binaries and update manually.

Please refer to the original release announcement for more information about supported platforms, source code, implementation, etc.

 

ClickOnce ComputeFileHashes

Click here or on the image below to run the Silverlight version of ComputeFileHashes in your browser.

Silverlight ComputeFileHashes

Click here or on the image below to download the command-line and WPF versions of ComputeFileHashes - along with the ClickOnce and Silverlight versions AND the complete source code for everything!

Command-line ComputeFileHashes

 

I've said that "ComputeFileHashes is a simple tool intended to make verifying checksums easy for anyone." And in some ways, I think the Silverlight version is the easiest option of all because there's no need to install it on your machine and it runs everywhere Silverlight 2 does (PC, Mac, (Linux soon!), Internet Explorer, Firefox, Safari, ...). So I'm really glad to add MD5 support to ComputeFileHashes for Silverlight - I hope you enjoy the new functionality!

Math is hard, let's go shopping. [Minor bug fix for free CRC-32 HashAlgorithm implementation for .NET]

While working on code for an upcoming blog post, I found myself dealing with the HashAlgorithm.HashSize property again and realized I'd made a silly mistake a few days ago... :(

I'm pretty sure I remember consulting the documentation when implementing this property for my free CRC-32 HashAlgorithm implementation, and I obviously believed the correct behavior was to return the size in bytes because that's what my comment says and that's what my code does. However, the documentation seems pretty clear on the matter: "Gets the size, in bits, of the computed hash code." (emphasis mine). So my initial implementation of this property was wrong. Fortunately, none of the four implementations of ComputeFileHashes (command-line, WPF, ClickOnce, Silverlight) make use of HashSize, so they're not affected by this bug. Unfortunately, anyone who decided to use my CRC-32 implementation and referenced this property would see the wrong value. For their sake, I've just made the trivial fix to the code (multiplying the byte count by 8 to get the bit count), updated the comment for that property, added a note about the update to the bottom of the original post, and republished it.
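A quick sanity check of the bits-versus-bytes rule, sketched against the built-in MD5 and SHA1 classes on a modern .NET runtime (an assumption on my part; CRC-32 isn't built into .NET, so it only appears in a comment): HashSize should always be eight times the number of bytes ComputeHash returns.

```csharp
using System;
using System.Security.Cryptography;

// HashSize is documented in bits, so it should be 8 times the number of
// bytes that ComputeHash returns.
using HashAlgorithm md5 = MD5.Create();
using HashAlgorithm sha1 = SHA1.Create();
foreach (HashAlgorithm algorithm in new[] { md5, sha1 })
{
    int hashBytes = algorithm.ComputeHash(Array.Empty<byte>()).Length;
    Console.WriteLine($"{algorithm.GetType().Name}: {hashBytes} bytes, HashSize = {algorithm.HashSize} bits");
}
// By the same rule, a 4-byte CRC-32 checksum should report HashSize = 4 * 8 = 32.
```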

I'm very sorry for the error and any trouble this may have caused.

 

PS - The version of CRC32.cs in the ComputeFileHashes source code download has not been updated - but will be in a few days as part of an upcoming post.

Tags: Technical

Gratuitous platform support [ComputeFileHashes works on the command-line, on WPF, on Silverlight, and via ClickOnce!]

Last week, I released the ComputeFileHashes tool for calculating file checksums. (To read more about what checksums are and why they're useful, please refer to that post.) ComputeFileHashes is a fairly simple .NET command-line application for calculating the MD5, SHA-1, and CRC-32 hashes of one or more files. It takes advantage of the multi-processing capabilities of today's hardware to complete that task quickly - roughly on par with native-code implementations. ComputeFileHashes works quite well and I happily used it to verify the recently released Windows 7 Beta ISO images I'd downloaded.
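This post doesn't show how ComputeFileHashes spreads its work across processors, so here's one possible shape for it, purely as an illustrative sketch: read each block of the input once and hand it to every hash algorithm concurrently. The ComputeHashesInParallel helper is hypothetical and is not the actual ComputeFileHashesCore code. It uses Parallel.ForEach (which requires .NET 4 or later, unlike the .NET 2.0 command-line tool) plus .NET 5+ helpers in the usage, and it covers only MD5 and SHA-1 because .NET has no built-in CRC-32 HashAlgorithm.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;

// Hypothetical helper: reads `stream` once, block by block, and lets every
// algorithm consume each block concurrently (one worker per algorithm).
static byte[][] ComputeHashesInParallel(Stream stream, HashAlgorithm[] algorithms, int blockSize = 64 * 1024)
{
    foreach (HashAlgorithm algorithm in algorithms)
    {
        algorithm.Initialize();
    }

    byte[] buffer = new byte[blockSize];
    int bytesRead;
    while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
    {
        // Each HashAlgorithm instance is used by only one thread here, and
        // Parallel.ForEach returns only after all of them have consumed the
        // block, so the buffer can be safely reused by the next Read.
        Parallel.ForEach(algorithms, hasher => hasher.TransformBlock(buffer, 0, bytesRead, null, 0));
    }

    var hashes = new byte[algorithms.Length][];
    for (int i = 0; i < algorithms.Length; i++)
    {
        algorithms[i].TransformFinalBlock(Array.Empty<byte>(), 0, 0);
        hashes[i] = algorithms[i].Hash ?? throw new InvalidOperationException("No hash computed.");
    }
    return hashes;
}

// Usage with in-memory data standing in for a file.
byte[] data = new byte[200_000];
new Random(42).NextBytes(data);
using MD5 md5 = MD5.Create();
using SHA1 sha1 = SHA1.Create();
byte[][] results = ComputeHashesInParallel(new MemoryStream(data), new HashAlgorithm[] { md5, sha1 });
Console.WriteLine($"MD5:   {Convert.ToHexString(results[0])}");
Console.WriteLine($"SHA-1: {Convert.ToHexString(results[1])}");
```

The design point is that the expensive part (disk I/O) happens once per block, while the independent hash computations overlap on separate cores.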

Because not everybody is a fan of command-line tools, I thought it would be nice to use WPF to create a more user-friendly version of ComputeFileHashes. Once I'd done that, I knew it would be a trivial matter to publish the WPF version via ClickOnce to enable an absurdly easy install scenario. From there, porting to Silverlight would be straightforward and would offer an install-free, completely web-based solution with cross-platform (ex: PC/Mac), cross-browser (ex: IE/Firefox/Safari) appeal. What's more, because all of these platforms are built on .NET, I expected to be able to take significant advantage of code sharing!

 

ClickOnce ComputeFileHashes

Click here or on the image below to run the Silverlight version of ComputeFileHashes in your browser.

Silverlight ComputeFileHashes

Click here or on the image below to download the command-line and WPF versions of ComputeFileHashes - along with the ClickOnce and Silverlight versions AND the complete source code for everything!

Command-line ComputeFileHashes

 

Implementation notes:

  • The command-line version of ComputeFileHashes is a standard .NET 2.0 application and should work pretty much everywhere. The Silverlight version requires Silverlight 2, which is tiny and can be completely installed in less than a minute start-to-finish. The WPF/ClickOnce versions are a little more advanced and require .NET 3.5 SP1 (conveniently pre-installed on all Windows 7 machines!). If you don't already have .NET 3.5 SP1 (and you may not, because Windows Update still doesn't seem to offer it), you can get the .NET 3.5 SP1 installer from here. Unfortunately, the only indication of .NET 3.5 SP1 not being installed seems to be an application crash immediately after starting the stand-alone WPF version. :( Fortunately, the ClickOnce version knows about the .NET 3.5 SP1 prerequisite and should offer to install it automatically if it's not already present.
  • None of the computation or file processing is performed on the main user interface thread under WPF or Silverlight, so ComputeFileHashes remains responsive even when working on a large file. Additional files can be queued for processing or the application/browser can be closed without the user having to wait.
  • As I hoped, I was able to achieve a very high degree of code sharing. By refactoring the original ComputeFileHashes code slightly, I pulled the core implementation out into a common class/file that everything shares. Then I put nearly all of the user interface functionality into another class/file that the WPF and Silverlight implementations both share. The XAML for the WPF and Silverlight versions is separate, but very similar. (There are enough slight differences between the two versions that I deliberately did not attempt to share the same XAML file.)
  • The source code structure looks like this:
    • ComputeFileHashesCore.cs: Core implementation of the file hashing code shared by all implementations. Makes use of multiple threads to perform hash calculations in parallel.
    • ComputeFileHashesUI.cs: User interface code shared by the WPF and Silverlight implementations. Makes use of a worker thread to push all computation off of the user interface thread and keep the application responsive. Defers to ComputeFileHashesCore for hashing functionality.
    • CRC32.cs, WaitingRoom.cs: Custom CRC-32 HashAlgorithm implementation and synchronization object shared by all implementations.
    • HashFileInfo.cs, BlockingQueue.cs: Data object for tracking state and custom Queue subclass that are shared by the WPF and Silverlight implementations.
    • ComputeFileHashesCL\ComputeFileHashesCL.cs: Command-line interface for handling arguments and displaying progress. Defers to ComputeFileHashesCore for hashing functionality.
    • ComputeFileHashesWPF\Window1.xaml, ComputeFileHashesWPF\Window1.xaml.cs: WPF definition of the application window. Defers to ComputeFileHashesUI for nearly all functionality.
    • ComputeFileHashesSL\Page.xaml, ComputeFileHashesSL\Page.xaml.cs: Silverlight definition of the application window. Defers to ComputeFileHashesUI for nearly all functionality.
  • A look at the screenshots above reveals a few differences between the WPF and Silverlight implementations:
    • The DataGrids look different. On WPF, ComputeFileHashes makes use of the WPF Toolkit's DataGrid; on Silverlight it uses the DataGrid in the Silverlight SDK. The two are very similar to use from a developer perspective, but they draw themselves differently and behave slightly differently for users because of platform differences. This was actually my first experience with either DataGrid, and I was happy to find that they both worked the same - and pretty much the way I expected them to!
    • The Silverlight version does not calculate the MD5 hash. This is because Silverlight's .NET doesn't implement the MD5 HashAlgorithm subclass while the desktop's .NET does. MD5 is not trivial to write, so I wasn't too interested in developing my own implementation like I did for CRC-32 (which isn't supported on any .NET platform).
    • The Silverlight version includes a "Details" column that's not present on the WPF version. On WPF, it's trivial to create a ToolTip for a DataGrid column, but on Silverlight the ToolTipService class must be used and my attempts to set its attached property with a Style were met with ... resistance. So if an exception is thrown when processing a file, the WPF version will show the exception text in a ToolTip while the Silverlight version shows it in the Details column.
    • The Silverlight version does not support drag-and-drop from Windows Explorer. Running within a browser imposes certain limitations on Silverlight; an inability to integrate quite as richly with the operating system is one of them.
    • The hyperlinks aren't quite the same. Silverlight ships with HyperlinkButton which is exactly the right control for the job here. WPF doesn't have that control, so the similar Hyperlink control is made to behave as desired with a small bit of code.
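The worker-thread arrangement described in these notes boils down to a producer/consumer queue. The UI thread adds files, and a worker thread blocks until one is available. Here's a minimal sketch of such a queue built on Monitor. SimpleBlockingQueue is a hypothetical name, and it wraps a Queue<T> instead of subclassing Queue the way the shipped BlockingQueue.cs is described. Treat it as an illustration of the idea, not as that file's contents.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical minimal blocking queue: Dequeue waits until an item is available.
class SimpleBlockingQueue<T>
{
    private readonly Queue<T> _queue = new Queue<T>();

    // Adds an item and wakes one waiting consumer (if any); never blocks for long
    public void Enqueue(T item)
    {
        lock (_queue)
        {
            _queue.Enqueue(item);
            Monitor.Pulse(_queue);
        }
    }

    // Removes the oldest item, blocking while the queue is empty
    public T Dequeue()
    {
        lock (_queue)
        {
            // Loop (rather than "if") so a consumer that wakes up and finds
            // the queue already drained simply goes back to waiting
            while (_queue.Count == 0)
            {
                Monitor.Wait(_queue);
            }
            return _queue.Dequeue();
        }
    }
}
```

A worker thread would typically loop on Dequeue and process each file it receives. A null item could serve as a hypothetical "stop" signal. The UI thread only ever calls Enqueue, so it stays responsive.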

 

In the original release announcement, I wrote that "ComputeFileHashes is a simple tool intended to make verifying checksums easy for anyone." Well, that's still the case - and adding support for WPF, ClickOnce, and Silverlight should make it even easier for everyone to use. Just decide what kind of user interface you prefer, and start using that version of ComputeFileHashes for all your checksumming needs! :)

The doctor will see you now... [WaitingRoom is a reusable synchronization object for .NET]

In my notes for the release of the ComputeFileHashes tool (and source code), I mentioned that I'd written a synchronization object for managing the parallel computation of checksums across multiple threads. I called this class WaitingRoom (after failing to come up with anything better) and thought I'd write about the pattern it implements so others might use it as well.

To understand the motivation behind the WaitingRoom class, it's helpful to understand a bit about how ComputeFileHashes works. For those who haven't read the implementation notes, the basic goal is to enable parallel computation of multiple checksum algorithms for sequential chunks of a file. So there's a primary thread which is responsible for opening the file and making consecutive blocks of it available for processing, and there are one or more worker threads which are responsible for processing the data in each block that becomes available. For performance reasons, there are two buffers for these blocks and they're swapped repeatedly: one is the "current" block (which has valid data) and the other is the "next" block (which is being filled in). It's important to ensure the worker threads only access a block when it's valid, so the primary thread needs a way to tell the worker threads when it's safe to start as well as a way to find out when they've all finished with a block.

The WaitingRoom class makes this synchronization process easy by exposing three methods: Wait, Release, and Arrive. Both Wait and Release are used by the primary thread: it calls Wait to block until all worker threads are "in the waiting room" (during which time they're all blocked) and then calls Release to "open the waiting room door" and unblock the worker threads. [Okay, the analogy breaks down a bit here because most doctors admit only *one* patient at a time. :) ] The worker threads call Arrive to signal they've "entered the waiting room" and automatically block until the primary thread releases them. What's great about using WaitingRoom is that the order of the threads doesn't matter: the primary thread can be ready first, or the worker threads can be ready first, or the primary thread can be ready after some - but not all - of the worker threads are ready. Whatever the order, WaitingRoom coordinates things smoothly!

As it happens, ComputeFileHashes uses two WaitingRoom instances. The first instance is used to wait for the worker threads to be ready to process a new file. When this is going on, the primary thread has nothing else to do, so it calls Wait and then immediately calls Release when the wait completes. The second instance is used to wait for the worker threads to finish processing a block of data. In this case, the primary thread has some work it can do in parallel with the workers: it reads the next block of data from disk and updates the status display. So it makes a call to Release (which gets the worker threads going), does its own work while they're busy, then calls Wait to wait for them to finish. After that, it's safe to swap the "current" and "next" buffers and repeat the same steps with the next block.
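For comparison, the same two-step handshake can be sketched with System.Threading.Barrier. That class was added in .NET 4 (after this post), and the sketch swaps it in where ComputeFileHashes uses WaitingRoom. The owner counts as one extra participant, and each block then takes two barrier phases. The first phase plays the role of Release, and the second plays the role of Wait. The class name, the integer "blocks", and the summing "work" are hypothetical stand-ins for file data and hash algorithms.

```csharp
using System.Threading;

// Hypothetical sketch: Barrier stands in for WaitingRoom; summing stands in for hashing.
class DoubleBufferSketch
{
    private volatile int[] _current;   // Block the workers may read
    private volatile bool _done;       // Set by the owner when no blocks remain

    public long[] Run(int[][] blocks, int workerCount)
    {
        var totals = new long[workerCount];
        using (var barrier = new Barrier(workerCount + 1))   // workers + owner
        {
            var threads = new Thread[workerCount];
            for (int w = 0; w < workerCount; w++)
            {
                int id = w;
                threads[w] = new Thread(() =>
                {
                    while (true)
                    {
                        barrier.SignalAndWait();   // Wait for the owner to publish a block
                        if (_done) return;
                        foreach (int value in _current) totals[id] += value;
                        barrier.SignalAndWait();   // Report "finished with this block"
                    }
                });
                threads[w].Start();
            }

            // Stand-in for reading the first block from disk
            int[] next = blocks.Length > 0 ? blocks[0] : null;
            for (int i = 0; i < blocks.Length; i++)
            {
                _current = next;                   // Swap: "next" becomes "current"
                barrier.SignalAndWait();           // Like Release: workers start
                next = i + 1 < blocks.Length ? blocks[i + 1] : null;   // Owner "reads" ahead
                barrier.SignalAndWait();           // Like Wait: all workers are done
            }

            _done = true;
            barrier.SignalAndWait();               // Let workers observe _done and exit
            foreach (Thread t in threads) t.Join();
        }
        return totals;
    }
}
```

Every participant calls the same SignalAndWait method, so Barrier doesn't distinguish the owner from the workers. That is the main difference from WaitingRoom's separate Wait/Release/Arrive roles.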

The complete implementation of WaitingRoom is below. It uses only the standard .NET Monitor class - along with the C# lock statement to simplify the syntax a bit. There are a few Debug.Asserts in there to try to keep callers honest, but it's up to the developer to ensure that Wait and Release are only called by the primary thread and that Arrive is only called by the worker threads.

Here's what it looks like:

using System.Diagnostics;
using System.Threading;

namespace ComputeFileHashes
{
    /// <summary>
    /// Implements a synchronization object that allows an owner
    /// thread to synchronize with its worker threads.
    /// </summary>
    class WaitingRoom
    {
        // Number of worker threads
        private readonly int _capacity;
        // Object on which to lock for entry
        private readonly object _entryLock = new object();
        // Object on which to lock for exit
        private readonly object _exitLock = new object();
        // Current count of worker threads
        private int _count;
        // "Sign" of owner/worker threads
        private bool _sign;

        /// <summary>
        /// Initializes a new instance.
        /// </summary>
        /// <param name="capacity">Number of worker threads.</param>
        public WaitingRoom(int capacity)
        {
            Debug.Assert(0 < capacity);
            _capacity = capacity;
        }

        /// <summary>
        /// Waits for all worker threads to call the Arrive method.
        /// </summary>
        public void Wait()
        {
            // Claim entry lock
            lock (_entryLock)
            {
                Debug.Assert((0 <= _count) && (_count <= _capacity));

                // Block if all worker threads have not arrived
                if (_count < _capacity)
                {
                    Monitor.Wait(_entryLock);
                }
            }
        }

        /// <summary>
        /// Signals the presence/availability of a worker thread.
        /// </summary>
        public void Arrive()
        {
            // Claim the entry lock
            bool sign;
            lock (_entryLock)
            {
                Debug.Assert((0 <= _count) && (_count < _capacity));

                // Capture sign
                sign = _sign;

                // Wake owner thread if all worker threads present
                _count++;
                if (_count == _capacity)
                {
                    Monitor.Pulse(_entryLock);
                }
            }

            // Claim the exit lock
            lock (_exitLock)
            {
                // Block if owner has not yet released the worker threads
                if (sign == _sign)
                {
                    Monitor.Wait(_exitLock);
                }
            }
        }

        /// <summary>
        /// Releases all worker threads blocked in the Arrive method.
        /// </summary>
        public void Release()
        {
            // Claim the exit lock
            lock (_exitLock)
            {
                Debug.Assert(_count == _capacity);

                // Reset count and flip sign
                _count = 0;
                _sign = !_sign;

                // Wake worker threads
                Monitor.PulseAll(_exitLock);
            }
        }
    }
}

There's probably an official name for this kind of synchronization primitive, but I don't know what it is. :| I poked around a bit on Wikipedia's "Concurrency control" page just now and found something called room synchronization that sounds close... I think WaitingRoom may be a type of room synchronization - with some specialized restrictions for the specifics of the ComputeFileHashes scenario. But whatever you call it, WaitingRoom is a pretty handy class to work with. Feel free to use WaitingRoom in your own code - maybe even drop me a note if you find other interesting uses. And if anyone reading knows the real name for what it's doing, please let me know! :)

Tags: Technical

Free hash [A reusable CRC-32 HashAlgorithm implementation for .NET]

In the notes for yesterday's release of the ComputeFileHashes tool (and source code), I mentioned that I'd written my own .NET HashAlgorithm class to compute CRC-32 hash values. The complete implementation can be found below and should behave just like every other HashAlgorithm subclass (ex: MD5 or SHA1). The code here is based on the CRC-32 reference implementation provided in Annex D of the PNG specification and pretty much "just worked". It implements the necessary Initialize, HashCore, and HashFinal methods as well as the technically optional (but practically necessary) Hash and HashSize properties. There's no test code to speak of, though it's worth pointing out that I've run tens of gigabytes of data through my ComputeFileHashes tool and have verified the correctness of the computed CRC-32 value for each test file. :)

Without further ado:

using System;
using System.Security.Cryptography;

namespace Delay
{
    /// <summary>
    /// HashAlgorithm implementation for CRC-32.
    /// </summary>
    [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Naming",
        "CA1709:IdentifiersShouldBeCasedCorrectly", MessageId = "CRC",
        Justification = "Matching algorithm acronym.")]
    public class CRC32 : HashAlgorithm
    {
        // Shared, pre-computed lookup table for efficiency
        private static readonly uint[] _crc32Table;

        /// <summary>
        /// Initializes the shared lookup table.
        /// </summary>
        [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Performance",
            "CA1810:InitializeReferenceTypeStaticFieldsInline", Justification =
            "Table values must be computed; not possible to remove the static constructor.")]
        static CRC32()
        {
            // Allocate table
            _crc32Table = new uint[256];

            // For each byte
            for (uint n = 0; n < 256; n++)
            {
                // For each bit
                uint c = n;
                for (int k = 0; k < 8; k++)
                {
                    // Compute value
                    if (0 != (c & 1))
                    {
                        c = 0xedb88320 ^ (c >> 1);
                    }
                    else
                    {
                        c = c >> 1;
                    }
                }

                // Store result in table
                _crc32Table[n] = c;
            }
        }

        // Current hash value
        private uint _crc32Value;

        // True if HashCore has been called
        private bool _hashCoreCalled;

        // True if HashFinal has been called
        private bool _hashFinalCalled;

        /// <summary>
        /// Initializes a new instance.
        /// </summary>
        public CRC32()
        {
            InitializeVariables();
        }

        /// <summary>
        /// Initializes internal state.
        /// </summary>
        public override void Initialize()
        {
            InitializeVariables();
        }

        /// <summary>
        /// Initializes variables.
        /// </summary>
        private void InitializeVariables()
        {
            _crc32Value = uint.MaxValue;
            _hashCoreCalled = false;
            _hashFinalCalled = false;
        }

        /// <summary>
        /// Updates the hash code for the provided data.
        /// </summary>
        /// <param name="array">Data.</param>
        /// <param name="ibStart">Start position.</param>
        /// <param name="cbSize">Number of bytes.</param>
        protected override void HashCore(byte[] array, int ibStart, int cbSize)
        {
            if (null == array)
            {
                throw new ArgumentNullException("array");
            }

            if (_hashFinalCalled)
            {
                throw new CryptographicException(
                    "Hash not valid for use in specified state.");
            }
            _hashCoreCalled = true;

            for (int i = ibStart; i < ibStart + cbSize; i++)
            {
                byte index = (byte)(_crc32Value ^ array[i]);
                _crc32Value = _crc32Table[index] ^ ((_crc32Value >> 8) & 0xffffff);
            }
        }

        /// <summary>
        /// Finalizes the hash code and returns it.
        /// </summary>
        /// <returns>Hash value as an array of bytes.</returns>
        protected override byte[] HashFinal()
        {
            _hashFinalCalled = true;
            return Hash;
        }

        /// <summary>
        /// Returns the hash as an array of bytes.
        /// </summary>
        [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Design",
            "CA1065:DoNotRaiseExceptionsInUnexpectedLocations", Justification =
            "Matching .NET behavior by throwing here.")]
        [System.Diagnostics.CodeAnalysis.SuppressMessage("Microsoft.Usage",
            "CA2201:DoNotRaiseReservedExceptionTypes", Justification =
            "Matching .NET behavior by throwing NullReferenceException.")]
        public override byte[] Hash
        {
            get
            {
                if (!_hashCoreCalled)
                {
                    throw new NullReferenceException();
                }
                if (!_hashFinalCalled)
                {
                    // Note: Not CryptographicUnexpectedOperationException because
                    // that can't be instantiated on Silverlight 4
                    throw new CryptographicException(
                        "Hash must be finalized before the hash value is retrieved.");
                }

                // Convert complement of hash code to byte array
                byte[] bytes = BitConverter.GetBytes(~_crc32Value);

                // Reverse for proper endianness, and return
                Array.Reverse(bytes);
                return bytes;
            }
        }

        /// <summary>
        /// Returns the size of the hash in bits.
        /// </summary>
        public override int HashSize
        {
            get
            {
                return 4 * 8;
            }
        }
    }
}
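One quick way to sanity-check a CRC-32 implementation like this is the standard check value: the CRC-32 of the ASCII string "123456789" is 0xCBF43926. The standalone sketch below uses a hypothetical Crc32Check helper that is not part of the CRC32 class. It uses the bitwise form of the same algorithm: reflected polynomial 0xEDB88320, an all-ones starting value, and a final complement. It then converts the result to most-significant-byte-first order. On little-endian machines, the Hash property's Array.Reverse produces that same order.

```csharp
using System;

// Hypothetical standalone helper for checking CRC-32 results.
static class Crc32Check
{
    public static uint Compute(byte[] data)
    {
        uint crc = uint.MaxValue;   // Starting value (all ones), as in InitializeVariables
        foreach (byte b in data)
        {
            crc ^= b;
            for (int k = 0; k < 8; k++)
            {
                // Bitwise equivalent of one table lookup: shift right and
                // XOR in the reflected polynomial when the low bit was set
                crc = (crc & 1) != 0 ? 0xEDB88320 ^ (crc >> 1) : crc >> 1;
            }
        }
        return ~crc;                // Final complement, as in the Hash property
    }

    // Most significant byte first; matches CRC32.Hash on little-endian machines
    public static byte[] ToHashBytes(uint crc)
    {
        byte[] bytes = BitConverter.GetBytes(crc);
        if (BitConverter.IsLittleEndian)
        {
            Array.Reverse(bytes);
        }
        return bytes;
    }
}
```

Comparing the CRC32 class's ComputeHash output for the same input against these bytes (CB F4 39 26) makes a cheap smoke test.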

The CRC32 class presented here is nothing fancy, but it should be a pretty solid implementation of the once-popular CRC-32 algorithm that's ripe for reuse. Thanks to .NET's HashAlgorithm base class, it's easy to drop in CRC32 anywhere hashes are already being computed. I hope you find it useful!

Updated 2009-01-22: Corrected implementation of HashSize to return hash size in bits instead of bytes.

Updated 2009-02-16: Initialize the _crc32Value variable to its starting value for consistency with the Framework's HashAlgorithm classes where a call to Initialize is not necessary for a newly constructed instance.

Updated 2010-12-06: Added missing ibStart offset to HashCore loop.
