Hash for the holidays [Managed implementation of CRC32 and MD5 algorithms updated; new release of ComputeFileHashes for Silverlight, WPF, and the command-line!]
It feels like a long time since I last wrote about hash functions (though certain curmudgeonly coworkers would say not long enough!), and there were a few loose ends I've been meaning to deal with...
Aside: If my hashing efforts are new to you, more information can be found in my introduction to the ComputeFileHashes command-line tool and the subsequent release of ComputeFileHashes versions for the WPF and Silverlight platforms.
When I first needed a managed implementation of the CRC-32 algorithm a while back, I ended up creating one from the reference implementation. Thanks to the strong similarities between C and C#, the algorithm itself required only minimal tweaks and the majority of my effort was packaging it up as a .NET HashAlgorithm. Because HashAlgorithm
is the base class of all .NET hash functions, the CRC32
class ends up being trivial to drop into any .NET application that already deals with hashing.
The Silverlight platform doesn't include an implementation of the MD5 algorithm like "desktop" .NET does, and I soon ended up creating an MD5 implementation from the reference code so I could support that algorithm on Silverlight (and now Windows Phone, too). Again, the C algorithm translated to C# fairly easily - though there's quite a lot more code for MD5 than CRC32 - and the HashAlgorithm
base class makes it easy to reuse. Over the next few days, I made a couple of minor revisions to the CRC32
and MD5Managed
classes, but have otherwise left things alone. I've used ComputeFileHashes successfully ever since, and things seemed to be in a pretty good state.
Then one day kind reader Maurizio contacted me (from Italy!) to report a bug in my CRC32 wrapper: there was a missing variable in a loop that could lead to problems if someone passed a non-0 value as the inputOffset
parameter of TransformBlock. Fortunately, this isn't a particularly common scenario - the "primary" overload of ComputeHash doesn't do it, none of my ComputeFileHashes code does it, and most typical scenarios probably won't do it, either. That said, a bug is a bug (is a bug), and I made a note to fix it when I got a chance... And I finally had that chance last week! :)
C:\T>ComputeFileHashesCL.exe ZeroByteFile.txt C:\T\ZeroByteFile.txt 100.0% CRC32: 00000000 MD5: D41D8CD98F00B204E9800998ECF8427E SHA1: DA39A3EE5E6B4B0D3255BFEF95601890AFD80709 SHA256: E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855 SHA384: 38B060A751AC96384CD9327EB1B1E36A21FDB71114BE07434C0CC7BF63F6E1DA274EDEBFE76F65FBD51AD2F14898B95B SHA512: CF83E1357EEFB8BDF1542850D66D8007D620E4050B5715DC83F4A921D36CE9CE47D0D13C5D85F2B0FF8318D2877EEC2F63B931BD47417A81A538327AF927DA3E RIPEMD160: 9C1185A5C5E9FC54612808977EE8F548B2258D31
And as long as I was already messing with the ComputeFileHashes code (which meant recompiling for each platform, re-packaging, uploading, etc.), there were a few other things I decided to take care of at the same time. And it's just as well - in the process of doing so, I discovered (and fixed) a seemingly obscure bug in MD5Managed
(which I suspect has never been hit in real life). Along the way, I added the complete suite of .NET HashAlgorithm
s to each tool so you'll automatically get the results of every supported algorithm when you hash a file!
ComputeFileHashes has been a fun project and a nice demonstration of how .NET lets you run the same code across a wide variety of environments and platforms. And the comprehensive automated test framework I added this time around makes me feel better about the correctness of these two HashAlgorithm
s. I deal with hashes regularly and have found all flavors of ComputeFileHashes to be handy tools to have around - especially the Silverlight version which brings simple, lightweight, install-free hashing to nearly every machine in the world. :)
Click here or on the image below to run ComputeFileHashes in your browser with Silverlight 4:
Note: Bookmark the link above for easy access to hashing anytime, anywhere, on any machine!
Here are the major changes since last time:
-
CRC32
bug fix: HashCore was not adding the ibStart offset in its for loop. This is the issue Maurizio reported and would affect all scenarios where a non-0 value was passed foribStart
. -
MD5Managed
bug fix: MD5Update was not adding the inputIndex offset in its call to MD5Transform. This is the obscure issue found by the new test framework - in certain fairly specific circumstances (mostly around odd offsets and buffer sizes), the incorrect offset could result in an invalid hash result. -
All supported algorithms are run on each file. Initially, only CRC32, MD5, and SHA1 were supported because they were the most common at the time and because I didn't want to waste CPU cycles on obscure algorithms that were hardly ever used. But since then, some of the "obscure" algorithms have become more common (ex: SHA256) and multi-core CPUs have become much more widespread. Because the ComputeFileHashes tools are already multi-processor friendly and because quad-processor CPUs are now commonplace, I've decided not to limit them due to CPU cycles. Most files hash instantaneously anyway, so the additional algorithms won't slow things down there; for longer files where CPU might start to dominate over disk access, the additional overhead shouldn't be that big of a deal. With this release, the bias is for convenience, and I'm optimistic that's the right tradeoff most of the time. :)
Here's what each tool supports:
- ComputeFileHashesCL, ComputeFileHashesWPF: CRC32, MD5, SHA1, SHA256, SHA384, SHA512, RIPEMD160
- ComputeFileHashesSL: CRC32, MD5, SHA1, SHA256
(Note: The Silverlight platform doesn't provide SHA384/SHA512/RIPEMD160. (And I haven't done my own implementation. (Yet...)))
-
There's a comprehensive set of automated tests for
CRC32
andMD5Managed
. I'd done some basic testing of this code in the past, but hadn't covered the edge cases - and a few bugs slipped by because of that. So I wanted to create a thorough automated test suite this time around and do what I could to cover all the bases.- The automated tests have a small library of different inputs and known-good hashes and process it in lots of different ways: all at once, in all different chunk sizes, with and without Initialize, etc.. If any of these techniques generates the wrong hash, the test suite reports the failure.
- The new tests are thorough enough to yield 100% code coverage on both the
CRC32
andMD5Managed
implementations. - In addition to testing my
CRC32
andMD5Managed
classes, the automated tests also test the .NET MD5CryptoServiceProvider class - not because I expect to find errors in it, but so I can ensure both MD5 implementations behave the same in each scenario. - Consequently (and as a result of the more thorough coverage)
CRC32
andMD5Managed
now behave the same as the .NET implementations for invalid scenarios, too - all the way to exception-level compatibility for misuse of the API! - The only difference is for CryptographicUnexpectedOperationException which can't be constructed on Silverlight - its base class CryptographicException is thrown instead in cases where the hash value is retrieved before the process has been finalized.
-
The Visual Studio solution and project files have been upgraded to the Visual Studio 2010 format. This makes developing ComputeFileHashes in Visual Studio 2010 easy and enables the use of its new and improved feature set.
-
The
CRC32
/MD5Managed
classes and the three ComputeFileHashes programs are now code analysis-clean for the complete Visual Studio 2010 rule set. Additional code analysis rules were introduced with VS 2010 and they reported some new violations in the code. These have all been fixed (or suppressed where appropriate). -
The namespace of the
CRC32
andMD5Managed
classes has been changed to "Delay". This change brings these classes inline with the rest of the sample code I publish and makes their generality a bit clearer. -
ComputeFileHashesCL remains a .NET 2.0 application for maximum versatility. By targeting .NET 2.0, ComputeFileHashesCL runs nearly everywhere .NET does.
-
ComputeFileHashesWPF is now a .NET 4 application for compactness and ease of distribution. ComputeFileHashesWPF used to have a dependency on the WPFToolkit for its DataGrid control. Because that control is part of the .NET 4 Framework, the new ComputeFileHashesWPF no longer depends on any non-Framework assemblies and can be distributed as a single file.
-
ComputeFileHashesSL is now a Silverlight 4 application to make use of new features in that platform. Most notably, ComputeFileHashesSL uses Silverlight's drag+drop support to enable the handy scenario of dragging a file directly from Windows Explorer and dropping it onto the ComputeFileHashesSL window to hash it (just like ComputeFileHashesWPF already supported). Additionally, I'm making use of my SetterValueBindingHelper class to use Bindings in a Setter and create the same ToolTip experience that ComputeFileHashesWPF already had: hovering over a hash failure shows the reason for the failure (typically because the file was locked by another process). Consequently, the "Details" column is no longer necessary in ComputeFileHashesSL and has been removed.
-
The ClickOnce flavor of ComputeFileHashesWPF is no longer supported. With ComputeFileHashesSL's functionality getting closer to that of ComputeFileHashesWPF and the elimination of the
WPFToolkit.dll
dependency from ComputeFileHashesWPF, the need for a ClickOnce install seems minimal and has been removed. -
The version number has been updated to 2010-11-30.