The blog of dlaa.me

Video frame grabbing made easy [How to: Quickly capture multiple video frames with WPF]

When I transfer clips from my digital camera to my computer, I briefly check that the video transferred successfully. For images, this is usually a matter of opening a few of them - for video files I usually load them in Media Player and seek to a couple of different points. Mine is hardly a fail proof system, but it's fairly quick and it makes me feel a little safer about deleting the originals from the camera. :) One day I got to thinking that it would be pretty easy to automatically generate a set of video thumbnails instead of seeking around manually...

So I did some research into video frame capture with WPF and found that it boiled down to a few main ideas:

  1. All it takes to capture a frame of video is an instance of MediaPlayer and a call to RenderTargetBitmap
  2. Setting the ScrubbingEnabled property tells MediaPlayer to update its content dynamically when changes to the Position property are made
  3. The video updates are asynchronous, so it's necessary to delay the capture a short while after changing the Position property in order to capture the new frame
  4. Performing a capture in response to the MediaOpened event is a handy way to capture the first frame reliably
  5. However, nothing I found demonstrated a good way to capture additional frames reliably...

Because of the asynchronous nature of the video frame updates, attempting to capture multiple frames in rapid succession (ex: seek, capture, seek, capture, ...) resulted in duplicate frames every time the capture occurred too soon for the frame's content to update (which was most of the time!). This was clearly unsuitable for my scenario because I wanted my application to generate its thumbnails as quickly as possible. After a bit of experimentation and playing around, I arrived at a technique that seems to work quite reliably in practice.

I wrote a simple WPF application to show how it works - here's an image of VideoThumbnailer displaying a summary of a video of someone tubing in the snow:

VideoThumbnailer sample frame capture application

(The complete source code to VideoThumbnailer is available as an attachment to this post.)

The technique itself is fairly simple, if not entirely as straightforward as it would be if changes to the Position property were synchronous:

  1. Create a MediaPlayer instance and hook the MediaOpened and Changed events
  2. Set ScrubbingEnabled and open the media file
  3. When the MediaOpened handler fires, capture the initial frame and set Position to the first location of interest
  4. When the Changed handler fires, capture the current frame
  5. If that frame is new/different than before, seek to the next position of interest and repeat as necessary; otherwise wait for the Changed handler to fire again

There's nothing particularly clever going on here, but there are a few complicating circumstances worth mentioning:

  • The Changed event fires whenever the MediaPlayer instance gets a new frame from its underlying implementation. However, a single Position change results in multiple invocations of the Changed event (2-5 seems to be the usual range). Typically, the first Changed event doesn't actually correspond to new data, but rather the same frame as before - so simply listening for the first Changed event is not sufficient. What's most desirable is to listen for the last Changed event - but without knowing how many there will be in advance, it's difficult to know which one is the last frame...
  • Fortunately, in practice the first Changed event corresponding to different content (as determined by a pixel-by-pixel comparison of the captured frames) seems to also correspond to the final, desired content. So a very reasonable solution is to wait for the first changed content, call it "good enough", and move on by seeking to the next position in the file.
  • So now the application is quickly seeking to arbitrary positions and capturing the desired frame with a high degree of reliability. We're done, right? Well, not quite... What if a video file contains the same exact content at two different positions? (Note: This is uncommon in motion video, but VERY common at the beginning and ending of clips (ex: 100% black frame) or in generated content (ex: title frame).) The approach described so far will spend forever waiting for a new frame that will never arrive...
  • What's needed here is a separate watchdog timer that fires after a reasonable time has elapsed and aborts the infinite wait. The catch is that this timer should be long enough to give the new video content sufficient time to be made available, but also short enough that it's not onerous if it fires many times during the generation of thumbnails. I experimentally settled on 1 second for the watchdog timeout because it seems to strike a good balance between these competing goals. With the watchdog timer in place, the process of generating a video summary should never hang - and we're done!

Having implemented the technique above, we should have a fast, reliable solution for video file thumbnail generation that works with a wide variety of file types by virtue of the underlying operating system's support. And, indeed, VideoThumbnailer works fine with all the WMV, DVR-MS, MPEG, and AVI files I've tried (and probably supports some other types I didn't try). There are plenty of changes one could make to VideoThumbnailer to make it a more useful application, but I hope it serves pretty well as a concrete example of the concepts discussed here.

Happy frame grabbing!

[VideoThumbnailer.zip]

Tags: WPF