When framework designers outsmart themselves [How to: Perform streaming HTTP uploads with .NET]
As part of a personal project, I had a scenario where I expected to be doing large HTTP uploads (ex: PUT) over a slow network connection. The typical user experience here is to show a progress bar, and that's exactly what I wanted to do. So I wrote some code to start the upload and then write to the resulting NetworkStream in small chunks, updating the progress bar UI after each chunk was sent. In theory (and my test harness), this approach worked perfectly; in practice, it did not...
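The chunked-write pattern I had in mind looks roughly like this - a sketch against a generic Stream, where the 4 KB chunk size, the helper name, and the progress-callback shape are my own illustrative choices, not code from the actual app:

```csharp
using System;
using System.IO;

static class ChunkedUploadSketch
{
    // Write data to a stream in small chunks, reporting fractional progress
    // after each chunk; the 4 KB chunk size is an arbitrary choice
    public static void WriteChunked(Stream stream, byte[] data, Action<double> onProgress)
    {
        const int chunkSize = 4096;
        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, data.Length - offset);
            stream.Write(data, offset, count);
            onProgress((double)(offset + count) / data.Length);
        }
    }

    static void Main()
    {
        // MemoryStream stands in for the HTTP request stream here; a real
        // app would update the progress bar UI inside the callback
        using (var stream = new MemoryStream())
        {
            WriteChunked(stream, new byte[10000],
                p => Console.WriteLine("Progress: {0:P0}", p));
        }
    }
}
```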
What I saw instead was that the progress bar would quickly go from 0% to 100% - then the application would stall for a long time before completing the upload. Which is not a good user experience, I'm afraid. I'll show what I did wrong in a bit, but first let's take a step back to look at the sample application I've written for this post.
The core of the sample app is a simple HttpListener that logs a message whenever it begins an operation, reads an uploaded byte, and finishes reading a request:
// Create a simple HTTP listener
using (var listener = new HttpListener())
{
    listener.Prefixes.Add(_uri);
    listener.Start();

    // ...

    // Trivially handle each action's incoming request
    for (int i = 0; i < actions.Length; i++)
    {
        var context = listener.GetContext();
        var request = context.Request;
        Log('S', "Got " + request.HttpMethod + " request");
        using (var stream = request.InputStream)
        {
            while (-1 != stream.ReadByte())
            {
                Log('S', "Read request byte");
            }
            Log('S', "Request complete");
        }
        context.Response.Close();
    }
}
Aside: The code is straightforward, but it's important to note that HttpListener is only able to start listening when run with Administrator privileges (otherwise it throws "HttpListenerException: Access is denied"). So if you're going to try the sample yourself, please remember to run it from an elevated Visual Studio or Command Prompt instance.
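Alternatively, you can register the URL reservation once from an elevated prompt and then run the sample unelevated; the URL and account below are placeholders you'd replace with whatever prefix the sample actually uses:

```shell
netsh http add urlacl url=http://+:8080/test/ user=Everyone
```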
With our test harness in place, let's start with the simplest possible code to upload some data:
/// <summary>
/// Test action that uses WebClient's UploadData to do the PUT.
/// </summary>
private static void PutWithWebClient()
{
    using (var client = new WebClient())
    {
        Log('C', "Start WebClient.UploadData");
        client.UploadData(_uri, "PUT", _data);
        Log('C', "End WebClient.UploadData");
    }
}
Here's the resulting output (from the Client and Server pieces):
09:27:07.72 <C> Start WebClient.UploadData
09:27:07.76 <S> Got PUT request
09:27:07.76 <S> Read request byte
09:27:07.76 <S> Read request byte
09:27:07.76 <S> Read request byte
09:27:07.76 <S> Read request byte
09:27:07.76 <S> Read request byte
09:27:07.76 <S> Request complete
09:27:07.76 <C> End WebClient.UploadData
WebClient's UploadData method offers a super-simple way of performing an upload that's a great choice when it works for your scenario. However, all the upload data must be passed as a parameter to the method call, and that's not always desirable (especially for large amounts of data like I was dealing with). Furthermore, it's all sent to the server in arbitrarily large chunks, so our attempt at frequent progress updates isn't likely to work out very well. And while there's the UploadProgressChanged event for getting status information about an upload, WebClient doesn't offer the granular level of control that's often nice to have.
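For reference, that coarser-grained reporting is available from WebClient's asynchronous upload methods via the UploadProgressChanged event. A minimal sketch, assuming a server like the one above is listening at the placeholder URI:

```csharp
using System;
using System.Net;
using System.Threading;

static class WebClientProgressSketch
{
    static void Main()
    {
        var done = new ManualResetEvent(false);
        using (var client = new WebClient())
        {
            // Raised periodically as the buffered request data is sent
            client.UploadProgressChanged += (s, e) =>
                Console.WriteLine("Sent {0} of {1} bytes", e.BytesSent, e.TotalBytesToSend);
            client.UploadDataCompleted += (s, e) => done.Set();
            // Placeholder URI - point this at a listening server
            client.UploadDataAsync(new Uri("http://localhost:8080/test/"), "PUT", new byte[1000]);
            done.WaitOne();
        }
    }
}
```

Note that the framework decides when the event fires, which is exactly the lack of control being discussed.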
So WebClient is a great entry-level API for uploading - but if you're looking for more control, you probably want to upgrade to HttpWebRequest:
/// <summary>
/// Test action that uses a normal HttpWebRequest to do the PUT.
/// </summary>
private static void PutWithNormalHttpWebRequest()
{
    var request = (HttpWebRequest)(WebRequest.Create(_uri));
    request.Method = "PUT";
    Log('C', "Start normal HttpWebRequest");
    using (var stream = request.GetRequestStream())
    {
        foreach (var b in _data)
        {
            Thread.Sleep(1000);
            Log('C', "Writing byte");
            stream.WriteByte(b);
        }
    }
    Log('C', "End normal HttpWebRequest");
    ((IDisposable)(request.GetResponse())).Dispose();
}
Aside from the Sleep call I've added to simulate client-side processing delays, this is quite similar to the code I wrote for my original scenario. Here's the output:
09:27:08.78 <C> Start normal HttpWebRequest
09:27:09.79 <C> Writing byte
09:27:10.81 <C> Writing byte
09:27:11.82 <C> Writing byte
09:27:12.83 <C> Writing byte
09:27:13.85 <C> Writing byte
09:27:13.85 <C> End normal HttpWebRequest
09:27:13.85 <S> Got PUT request
09:27:13.85 <S> Read request byte
09:27:13.85 <S> Read request byte
09:27:13.85 <S> Read request byte
09:27:13.85 <S> Read request byte
09:27:13.85 <S> Read request byte
09:27:13.85 <S> Request complete
Although I've foreshadowed this unsatisfactory result, maybe you can try to act a little surprised that it didn't work the way we wanted...
But what in the world is going on here? Why is that data sitting around on the client for so long?
The answer lies in the documentation for the AllowWriteStreamBuffering property (default value: True):

Remarks

When AllowWriteStreamBuffering is true, the data is buffered in memory so it is ready to be resent in the event of redirections or authentication requests.

Notes to Implementers:

Setting AllowWriteStreamBuffering to true might cause performance problems when uploading large datasets because the data buffer could use all available memory.
In trying to save me from the hassle of redirects and authentication requests, HttpWebRequest has broken my cool streaming scenario. So the fix is as simple as setting AllowWriteStreamBuffering to False, right?

Wrong; that'll get you one of these:
ProtocolViolationException: When performing a write operation with AllowWriteStreamBuffering set to false, you must either set ContentLength to a non-negative number or set SendChunked to true.
Okay, so we need to set one more property before we're done. Fortunately, the choice was easy for me - my target server didn't support chunked transfer encoding, so setting ContentLength was a no-brainer. The only catch is that you need to know how much data you're going to upload before you start - but that's probably true most of the time anyway! And I think ContentLength is a better choice in general, because the average web server is more likely to support it than chunked encoding.
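For completeness, here's roughly what the chunked alternative would look like - a sketch only (with a placeholder URI), since my server didn't support it:

```csharp
using System;
using System.Net;

static class ChunkedRequestSketch
{
    static void Main()
    {
        // Placeholder URI - substitute your own endpoint
        var request = (HttpWebRequest)WebRequest.Create("http://localhost:8080/test/");
        request.Method = "PUT";
        request.AllowWriteStreamBuffering = false;
        // With SendChunked, each write can go out as its own chunk and the
        // total request length need not be known in advance
        request.SendChunked = true;
        using (var stream = request.GetRequestStream())
        {
            stream.WriteByte(42);
        }
        using (request.GetResponse()) { }
    }
}
```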
Making the changes below (disabling buffering and setting the content length) gives the streaming upload behavior we've been working toward:
/// <summary>
/// Test action that uses an unbuffered HttpWebRequest to do the PUT.
/// </summary>
private static void PutWithUnbufferedHttpWebRequest()
{
    var request = (HttpWebRequest)(WebRequest.Create(_uri));
    request.Method = "PUT";
    // Disabling AllowWriteStreamBuffering allows the request bytes to be sent immediately
    request.AllowWriteStreamBuffering = false;
    // Doing nothing else will result in "ProtocolViolationException: When performing
    // a write operation with AllowWriteStreamBuffering set to false, you must either
    // set ContentLength to a non-negative number or set SendChunked to true."
    // The most widely supported approach is to set the ContentLength property
    request.ContentLength = _data.Length;
    Log('C', "Start unbuffered HttpWebRequest");
    using (var stream = request.GetRequestStream())
    {
        foreach (var b in _data)
        {
            Thread.Sleep(1000);
            Log('C', "Writing byte");
            stream.WriteByte(b);
        }
    }
    Log('C', "End unbuffered HttpWebRequest");
    ((IDisposable)(request.GetResponse())).Dispose();
}
Here's the proof - note how each byte gets uploaded to the server as soon as it's written:
09:27:14.86 <C> Start unbuffered HttpWebRequest
09:27:14.86 <S> Got PUT request
09:27:15.88 <C> Writing byte
09:27:15.88 <S> Read request byte
09:27:16.89 <C> Writing byte
09:27:16.89 <S> Read request byte
09:27:17.90 <C> Writing byte
09:27:17.90 <S> Read request byte
09:27:18.92 <C> Writing byte
09:27:18.92 <S> Read request byte
09:27:19.93 <C> Writing byte
09:27:19.93 <C> End unbuffered HttpWebRequest
09:27:19.93 <S> Read request byte
09:27:19.93 <S> Request complete
Like many things in life, it's easy once you know the answer! So if your HTTP uploads aren't hitting the network as soon as you expect, take a look at the AllowWriteStreamBuffering property and see if maybe that's the cause of your problems.
[Please click here to download a sample application demonstrating everything shown here.]