The blog of dlaa.me

Posts from May 2011

Safe X (ml parsing with XLINQ) [XLinqExtensions helps make XML parsing with .NET's XLINQ a bit safer and easier]

XLINQ (aka LINQ-to-XML) is a set of classes that make it simple to work with XML by exposing the element tree in a way that's easy to manipulate using standard LINQ queries. So, for example, it's trivial to write code to select specific nodes for reading, create well-formed XML fragments, or transform an entire document. Because of its query-oriented nature, XLINQ makes it easy to ignore parts of a document that aren't relevant: if you don't query for them, they don't show up! Because it's so handy and powerful, I encourage folks who aren't already familiar to find out more.

Aside: As usual, flexibility comes with a cost and it is often more efficient to read and write XML with the underlying XmlReader and XmlWriter classes because they don't expose the same high-level abstractions. However, I'll suggest that the extra productivity of developing with XLINQ will often outweigh the minor computational cost it incurs.

 

When I wrote the world's simplest RSS reader as a sample for my post on WebBrowserExtensions, I needed some code to parse the RSS feed for my blog and dashed off the simplest thing possible using XLINQ. Here's a simplified version of that RSS feed for reference:

<rss version="2.0">
  <channel>
    <title>Delay's Blog</title>
    <item>
      <title>First Post</title>
      <pubDate>Sat, 21 May 2011 13:00:00 GMT</pubDate>
      <description>Post description.</description>
    </item>
    <item>
      <title>Another Post</title>
      <pubDate>Sun, 22 May 2011 14:00:00 GMT</pubDate>
      <description>Another post description.</description>
    </item>
  </channel>
</rss>

The code I wrote at the time looked a lot like the following:

private static void NoChecking(XElement feedRoot)
{
    var version = feedRoot.Attribute("version").Value;
    var title = feedRoot.Element("channel").Element("title").Value;
    ShowFeed(version, title);
    foreach (var item in feedRoot.Element("channel").Elements("item"))
    {
        title = item.Element("title").Value;
        var publishDate = DateTime.Parse(item.Element("pubDate").Value);
        var description = item.Element("description").Value;
        ShowItem(title, publishDate, description);
    }
}

Not surprisingly, running it on the XML above leads to the following output:

Delay's Blog (RSS 2.0)
  First Post
    Date: 5/21/2011
    Characters: 17
  Another Post
    Date: 5/22/2011
    Characters: 25

 

That code is simple, easy to read, and obvious in its intent. However (as is typical for sample code tangential to the topic of interest), there's no error checking or handling of malformed data. If anything within the feed changes, it's quite likely the code I show above will throw an exception (for example: because the result of the Element method is null when the named element can't be found). And although I don't expect changes to the format of this RSS feed, I'd be wary of shipping code like that because it's so fragile.

Aside: Safely parsing external data is a challenging task; many exploits take advantage of parsing errors to corrupt a process's state. In the discussion here, I'm focusing mainly on "safety" in the sense of "resiliency": the ability of code to continue to work (or at least not throw an exception) despite changes to the format of the data it's dealing with. Naturally, more resilient parsing code is likely to be less vulnerable to hacking, too - but I'm not specifically concerned with making code hack-proof here.

 

Adding the necessary error-checking to get the above snippet into shape for real-world use isn't particularly hard - but it does add a lot more code. Consequently, readability suffers; although the following method performs exactly the same task, its implementation is decidedly harder to follow than the original:

private static void Checking(XElement feedRoot)
{
    var version = "";
    var versionAttribute = feedRoot.Attribute("version");
    if (null != versionAttribute)
    {
        version = versionAttribute.Value;
    }
    var channelElement = feedRoot.Element("channel");
    if (null != channelElement)
    {
        var title = "";
        var titleElement = channelElement.Element("title");
        if (null != titleElement)
        {
            title = titleElement.Value;
        }
        ShowFeed(version, title);
        foreach (var item in channelElement.Elements("item"))
        {
            title = "";
            titleElement = item.Element("title");
            if (null != titleElement)
            {
                title = titleElement.Value;
            }
            var publishDate = DateTime.MinValue;
            var pubDateElement = item.Element("pubDate");
            if (null != pubDateElement)
            {
                if (!DateTime.TryParse(pubDateElement.Value, out publishDate))
                {
                    publishDate = DateTime.MinValue;
                }
            }
            var description = "";
            var descriptionElement = item.Element("description");
            if (null != descriptionElement)
            {
                description = descriptionElement.Value;
            }
            ShowItem(title, publishDate, description);
        }
    }
}

 

It would be nice if we could somehow combine the two approaches to arrive at something that reads easily while also handling malformed content gracefully... And that's what the XLinqExtensions extension methods are all about!

Using the naming convention SafeGet* where "*" can be Element, Attribute, StringValue, or DateTimeValue, these methods are simple wrappers that avoid problems by always returning a valid object - even if they have to create an empty one themselves. In this manner, calls that are expected to return an XElement always do; calls that are expected to return a DateTime always do (with a user-provided fallback value for scenarios where the underlying string doesn't parse successfully). To be clear, there's no magic here - all the code is very simple - but by pushing error handling into the accessor methods, the overall experience feels much nicer.

To see what I mean, here's what the same code looks like after it has been changed to use XLinqExtensions - note how similar it looks to the original implementation that used the simple "write it the obvious way" approach:

private static void Safe(XElement feedRoot)
{
    var version = feedRoot.SafeGetAttribute("version").SafeGetStringValue();
    var title = feedRoot.SafeGetElement("channel").SafeGetElement("title").SafeGetStringValue();
    ShowFeed(version, title);
    foreach (var item in feedRoot.SafeGetElement("channel").Elements("item"))
    {
        title = item.SafeGetElement("title").SafeGetStringValue();
        var publishDate = item.SafeGetElement("pubDate").SafeGetDateTimeValue(DateTime.MinValue);
        var description = item.SafeGetElement("description").SafeGetStringValue();
        ShowItem(title, publishDate, description);
    }
}

Not only is the XLinqExtensions version almost as easy to read as the simple approach, it has all the resiliancy benefits of the complex one! What's not to like?? :)

 

[Click here to download the XLinqExtensions sample application containing everything shown here.]

 

I've found the XLinqExtensions approach helpful in my own projects because it enables me to parse XML with ease and peace of mind. The example I've provided here only scratches the surface of what's possible (ex: SafeGetIntegerValue, SafeGetUriValue, etc.), and is intended to set the stage for others to adopt a more robust approach to XML parsing. So if you find yourself parsing XML, please consider something similar!

 

PS - The complete set of XLinqExtensions methods I use in the sample is provided below. Implementation of additional methods to suit custom scenarios is left as an exercise to the reader. :)

/// <summary>
/// Class that exposes a variety of extension methods to make parsing XML with XLINQ easier and safer.
/// </summary>
static class XLinqExtensions
{
    /// <summary>
    /// Gets the named XElement child of the specified XElement.
    /// </summary>
    /// <param name="element">Specified element.</param>
    /// <param name="name">Name of the child.</param>
    /// <returns>XElement instance.</returns>
    public static XElement SafeGetElement(this XElement element, XName name)
    {
        Debug.Assert(null != element);
        Debug.Assert(null != name);
        return element.Element(name) ?? new XElement(name, "");
    }

    /// <summary>
    /// Gets the named XAttribute of the specified XElement.
    /// </summary>
    /// <param name="element">Specified element.</param>
    /// <param name="name">Name of the attribute.</param>
    /// <returns>XAttribute instance.</returns>
    public static XAttribute SafeGetAttribute(this XElement element, XName name)
    {
        Debug.Assert(null != element);
        Debug.Assert(null != name);
        return element.Attribute(name) ?? new XAttribute(name, "");
    }

    /// <summary>
    /// Gets the string value of the specified XElement.
    /// </summary>
    /// <param name="element">Specified element.</param>
    /// <returns>String value.</returns>
    public static string SafeGetStringValue(this XElement element)
    {
        Debug.Assert(null != element);
        return element.Value;
    }

    /// <summary>
    /// Gets the string value of the specified XAttribute.
    /// </summary>
    /// <param name="attribute">Specified attribute.</param>
    /// <returns>String value.</returns>
    public static string SafeGetStringValue(this XAttribute attribute)
    {
        Debug.Assert(null != attribute);
        return attribute.Value;
    }

    /// <summary>
    /// Gets the DateTime value of the specified XElement, falling back to a provided value in case of failure.
    /// </summary>
    /// <param name="element">Specified element.</param>
    /// <param name="fallback">Fallback value.</param>
    /// <returns>DateTime value.</returns>
    public static DateTime SafeGetDateTimeValue(this XElement element, DateTime fallback)
    {
        Debug.Assert(null != element);
        DateTime value;
        if (!DateTime.TryParse(element.Value, out value))
        {
            value = fallback;
        }
        return value;
    }
}
Tags: Technical

"Sort" of a follow-up post [IListExtensions class enables easy sorting of .NET list types; today's updates make some scenarios faster or more convenient]

Recently, I wrote a post about the IListExtensions collection of extension methods I created to make it easy to maintain a sorted list based on any IList(T) implementation without needing to create a special subclass. In that post, I explained why I implemented IListExtensions the way I did and outlined some of the benefits for scenarios like using ObservableCollection(T) for dynamic updates on Silverlight, WPF, and Windows Phone where the underlying class doesn't intrinsically support sorting. A couple of readers followed up with some good questions and clarifications which I'd encourage having a look for additional context.

 

During the time I've been using IListExtensions in a project of my own, I have noticed two patterns that prompted today's update:

  1. It's easy to get performant set-like behavior from a sorted list. Recall that a set is simply a collection in which a particular item appears either 0 or 1 times (i.e., there are no duplicates in the collection). While this invariant can be easily maintained with any sorted list by performing a remove before each add (recall that ICollection(T).Remove (and therefore IListExtensions.RemoveSorted) doesn't throw if an element is not present), it also means there are two searches of the list every time an item is added: one for the call to RemoveSorted and another for the call to AddSorted. While it's possible to be a bit more clever and avoid the extra search sometimes, the API doesn't let you to "remember" the right index between calls to *Sorted methods, so you can't get rid of the redundant search every time.

    Therefore, I created the AddOrReplaceSorted method which has the same signature as AddSorted (and therefore ICollection(T).Add) and implements the set-like behavior of ensuring there is at most one instance of a particular item (i.e., the IComparable(T) search key) present in the collection at any time. Because this one method does everything, it only ever needs to perform a single search of the list and can help save a few CPU cycles in relevant scenarios.

  2. It's convenient to be able to call RemoveSorted/IndexOfSorted/ContainsSorted with an instance of the search key. Recall from the original post that IListExtensions requires items in the list to implement the IComparable(T) interface in order to define their sort order. This is fine most of the time, but can require a bit of extra overhead in situations where the items' sort order depends on only some (or commonly just one) of their properties.

    For example, note that the sort order the Person class below depends only on the Name property:

    class Person : IComparable<Person>
    {
        public string Name { get; set; }
        public string Details { get; set; }
    
        public int CompareTo(Person other)
        {
            return Name.CompareTo(other.Name);
        }
    }

    In this case, using ContainsSorted on a List(Person) to search for a particular name would require the creation of a fake Person instance to pass as the parameter to ContainsSorted in order to match the type of the underlying collection. This isn't usually a big deal (though it can be if the class doesn't have a public constructor!), but it complicates the code and seems like it ought to be unnecessary.

    Therefore, I've added new versions of RemoveSorted/IndexOfSorted/ContainsSorted that take a key parameter and a keySelector Func(T, K). The selector is passed an item from the list and needs to return that item's sort key (the thing that its IComparable(T).CompareTo operates on). Not surprisingly, the underlying type of the keys must implement IComparable(T); keys are then compared directly (instead of indirectly via the containing items). In this way, it's possible to look up (or remove) a Person in a List(Person) by passing only the person's name and not having to bother with the temporary Person object at all!

 

In addition to the code changes discussed above, I've updated the automated test project that comes with IListExtensions to cover all the new scenarios. Conveniently, the new implementation of AddOrReplaceSorted is nearly identical to that of AddSorted and can be easily validated with SortedSet(T). Similarly, the three new key-based methods have all been implemented as variations of the pre-existing methods and those have been modified to call directly into the new methods. Aside from a bit of clear, deliberate redundancy for AddOrReplaceSorted, there's hardly any more code in this release than there was in the previous one - yet refactoring the implementation slightly enabled some handy new scenarios!

 

[Click here to download the IListExtensions implementation and its complete unit test project.]

 

Proper sorting libraries offer a wide variety of ways to sort, compare, and work with sorted lists. IListExtensions is not a proper sorting library - nor does it aspire to be one. :) Rather, it's a small collection of handy methods that make it easy to incorporate sorting into some common Silverlight, WPF, and Windows Phone scenarios. Sometimes you're forced to use a collection (like ObservableCollection(T)) that doesn't do everything you want - but if all you're missing is basic sorting functionality, then IListExtensions just might be the answer!

When you live on the bleeding edge, be prepared for a few nicks [Minor update to the Delay.Web.Helpers ASP.NET assembly download to avoid a NuGet packaging bug affecting Razor installs]

I had a surprise last week when coworker Bilal Aslam mentioned he was using my Delay.Web.Helpers assembly but wasn't able to install the 1.1.0 version with the ASP.NET Razor "_Admin" control panel. (Fortunately, the previous version (1.0.0) did install and had the functionality he needed, so he was using that one for the time being.) I quickly told Bilal he was crazy because I remembered testing with the NuGet plugin for Visual Studio and knew it installed successfully. At which point he demonstrated the problem for me - and I was forced to admit defeat. :)

Aside: Delay.Web.Helpers is a collection of ASP.NET web helpers that provide access to Amazon Simple Storage Service (S3) buckets and blobs as well as easy ways to create "data URIs". (And eventually more stuff as I get time to add it...)

Naturally, the first thing I did was to repeat my previous testing in Visual Studio - and it worked fine just like I remembered. So I tried with the Razor administration interface and it failed just like Bilal showed me: "System.InvalidOperationException: The 'schemaVersion' attribute is not declared.". Because the previous version (1.0.0) didn't have this problem, I was a little confused; I'd built everything from the same .nuspec file, so it wasn't clear why the Razor/1.1.0 scenario would be uniquely broken.

At that point, I contacted a couple folks on the NuGet team and got a quick answer: for some (short) period of time, the official version of nuget.exe created packages with a schemaVersion attribute on the package/metadata element of the embedded .nuspec file and the presence of this attribute causes the Razor install implementation to fail with the exception we were seeing. I'd created 1.0.0 with a good version of nuget.exe, but apparantly created 1.1.0 with the broken version. :(

The team's recommendedation was to re-create my packages with the current nuget.exe and re-deploy them to the NuGet servers. I did that and the result is version 1.1.1 of the Delay.Web.Helpers package and its associated Delay.Web.Helpers.SampleWebSite package. "Once bitten, twice shy", so I verified the install in both Visual Studio and Razor now that I know they're different and can fail independently.

Aside: There are no changes to the Delay.Web.Helpers assembly or samples in this release. The only changes are the necessary tweaks to the NuGet metadata for both packages to install successfully under Razor. Therefore, if you've already installed 1.1.0 successfully, there's no need to upgrade.
Further aside: The standalone ZIP file with the assembly, source code, automated tests, and sample web site is unaffected by this update.

 

To sum things up, if you created a NuGet package sometime around early April and you expect it to be installable with the Razor administration panel, I'd highly recommend trying it out to be sure! :)

Something "sort" of handy... [IListExtensions adds easy sorting to .NET list types - enabling faster search and removal, too!]

If you want to display a dynamically changing collection of items in WPF, Silverlight, or Windows Phone, there are a lot of collection classes to pick from - but there's really just one good choice: ObservableCollection(T). Although nearly all the IList(T)/ICollection(T)/IEnumerable(T) implementations work well for static data, dynamic data only updates automatically when it's in a collection that implements INotifyCollectionChanged. And while it's possible to write your own INotifyCollectionChanged code, doing a good job takes a fair amount of work. Fortunately, ObservableCollection(T) does nearly everything you'd want and is a great choice nearly all of the time.

Unless you want your data sorted...

By design, ObservableCollection(T) doesn't sort data - that's left to the CollectionView class which is the officially recommended way to sort lists for display (for more details, please refer to the Data Binding Overview's "Binding to Collections" section). The way CollectionView works is to add an additional layer of indirection on top of your list. That gets sorted and the underlying collection isn't modified at all. This is a fine, flexible design (it enables a variety of other scenarios like filtering, grouping, and multiple views), but sometimes it'd be easier if the actual collection were sorted and the extra layer wasn't present (in addition to imposing a bit of overhead, working with CollectionView requires additional code to account for the indirection).

 

So it would be nice if there were a handy way to sort an ObservableCollection(T) - something like the List(T).Sort method. Unfortunately, ObservableCollection(T) doesn't derive from List(T), so it doesn't have that method... Besides, it'd be better if adding items to the list put them in the right place to begin with - instead of adding them to the wrong place and then re-sorting the entire list after the fact. Along the same lines, scenarios that could take advantage of sorting for faster look-ups would benefit from something like List(T).BinarySearch - which also doesn't exist on ObservableCollection(T).

All we really need to do here is provide custom implementations of add/remove/contains/index-of for ObservableCollection(T) and we'd have the best of both worlds. One way of doing that is to subclass - but that ties the code to a specific base class and limits its usefulness somewhat (just like Sort and BinarySearch for List(T) above). What we can do instead is implement these helper methods in a standalone class and enable them to target the least common denominator, IList(T), and therefore apply in a variety of scenarios (i.e., all classes that implement that interface). What's more, these helpers can be trivially written as extension methods so they'll look just like APIs on the underlying classes!

 

This sounds promising - let's see how it might work by considering the complete IList(T) interface hierarchy:

public interface IList<T> : ICollection<T>, IEnumerable<T>, IEnumerable
{
    T this[int index] { get; set; }         // Good as-is
    int IndexOf(T item);                    // Okay as-is; could be faster if sorted
    void Insert(int index, T item);         // Should NOT be used with a sorted collection (might un-sort it)
    void RemoveAt(int index);               // Good as-is
}
public interface ICollection<T> : IEnumerable<T>, IEnumerable
{
    int Count { get; }                      // Good as-is
    bool IsReadOnly { get; }                // Good as-is
    void Add(T item);                       // Needs custom implementation that preserves sort order
    void Clear();                           // Good as-is
    bool Contains(T item);                  // Okay as-is; could be faster if sorted
    void CopyTo(T[] array, int arrayIndex); // Good as-is
    bool Remove(T item);                    // Okay as-is; could be faster if sorted
}
public interface IEnumerable<T> : IEnumerable
{
    IEnumerator<T> GetEnumerator();         // Good as-is
}
public interface IEnumerable
{
    IEnumerator GetEnumerator();            // Good as-is
}

To create a sorted IList(T), there's only one method that needs to be written (add) and three others that should be written to take advantage of the sorted collection for better performance (remove, contains, and index-of). (Aside: If you know a list is sorted, finding the right location changes from an O(n) problem to an O(log n) problem. Read more about "big O" notation here.) The only additional requirement we'll impose is that the elements of the collection must have a natural order. One way this is commonly done is by implementing the IComparable(T) interface on the item class. Basic .NET types already do this, as do other classes in the framework (ex: DateTime, Tuple, etc.). Because this interface has just one method, it's easy to add - and can often be implemented in terms of IComparable(T) for its constituent parts!

 

So here's what the IListExtensions class I've created looks like:

static class IListExtensions
{
    public static void AddSorted<T>(this IList<T> list, T item) where T : IComparable<T> { ... }
    public static bool RemoveSorted<T>(this IList<T> list, T item) where T : IComparable<T> { ... }
    public static int IndexOfSorted<T>(this IList<T> list, T item) where T : IComparable<T> { ... }
    public static bool ContainsSorted<T>(this IList<T> list, T item) where T : IComparable<T> { ... }
}

You can use it to create and manage a sorted ObservableCollection(T) simply by adding "Sorted" to the code you already have!

 

[Click here to download the IListExtensions implementation and its complete unit test project.]

 

One downside to the extension method approach is that the existing List(T) methods remain visible and can be called by code that doesn't know to use the *Sorted versions instead. For Contains, IndexOf, and Remove, this is inefficient, but will still yield the correct answer - but for Add and Insert it's a bug because these two methods are likely to ruin the sorted nature of the list when used without care. Once a list becomes unsorted, the *Sorted methods will return incorrect results because they optimize searches based on the assumption that the list is correctly sorted. Subclassing would be the obvious "solution" to this problem, but it's not a good option here because the original methods aren't virtual on ObservableCollection(T)...

I'm not aware of a good way to make things foolproof without giving up on the nice generality benefits of the current approach, so this seems like one of those times where you just need to be careful about what you're doing. Fortunately, most programs probably only call the relevant methods a couple of times, so it's pretty easy to visit all the call sites and change them to use the corresponding *Sorted method instead. [Trust me, I've done this myself. :) ]

Aside: There's a subtle ambiguity regarding what to do if the collection contains duplicate items (i.e., multiple items that sort to the same location). It doesn't seem like it will matter most of the time, so IListExtensions takes the performant way out and returns the first correct answer it finds. It's important to note this is not necessarily the first of a group of duplicate items, nor the last of them - nor will it always be the same one of them! Basically, if the items' IComparable(T) implementation says two items are equivalent, then IListExtensions assumes they are and that they're equally valid answers. If the distinction matters in your scenario, please feel free to tweak this code and take the corresponding performance hit. :) (Alternatively, if the items' IComparable(T) implementation can be modified to distinguish between otherwise "identical" items, the underlying ambiguity will be resolved and things will be deterministic again.)

 

It's usually best to leverage platform support for something when it's available, so please look to CollectionView for your sorting needs in WPF, Silverlight, and Windows Phone applications. But if you end up in a situation where it'd be better to maintain a sorted list yourself, maybe IListExtensions is just what you need!