The blog of dlaa.me
Plug it in, plug it in [Sample code for two TextAnalysisTool.NET plug-ins demonstrates support for custom file types]
Tuesday, April 29th 2014

A few days ago, @_yabloki tweeted asking how to write a TextAnalysisTool.NET plug-in. I've answered this question a few times in email, but never blogged it before now.

To understand the basis of the question, you need to know what TextAnalysisTool.NET is; for that, I refer you to the TextAnalysisTool.NET page for an overview.

Animated GIF showing basic TextAnalysisTool.NET functionality

To understand the rest of the question, you need to know what a plug-in is; for that, there's the following paragraph from the documentation:

TextAnalysisTool.NET's support for plug-ins allows users to add in their own
code that understands specialized file types.  Every time a file is opened,
each plug-in is given a chance to take responsibility for parsing that file.
When a plug-in takes responsibility for parsing a file, it becomes that plug-
in's job to produce a textual representation of the file for display in the
usual line display.  If no plug-in supports a particular file, then it gets
opened using TextAnalysisTool.NET's default parser (which displays the file's
contents directly).  One example of what a plug-in could do is read a binary
file format and produce meaningful textual output from it (e.g., if the file is
compressed or encrypted).  Another plug-in might add support for the .zip
format and display a list of the files within the archive.  A particularly
ambitious plug-in might translate text files from one language to another.  The
possibilities are endless!

 

Armed with an understanding of TextAnalysisTool.NET and its support for plug-ins, we're ready to look at the interface plug-ins must implement:

namespace TextAnalysisTool.NET.Plugin
{
    /// <summary>
    /// Interface that all TextAnalysisTool.NET plug-ins must implement
    /// </summary>
    internal interface ITextAnalysisToolPlugin
    {
        /// <summary>
        /// Gets a meaningful string describing the type of file supported by the plug-in
        /// </summary>
        /// <remarks>
        /// Used to populate the "Files of type" combo box in the Open file dialog
        /// </remarks>
        /// <example>
        /// "XML Files"
        /// </example>
        /// <returns>descriptive string</returns>
        string GetFileTypeDescription();

        /// <summary>
        /// Gets the file type pattern describing the type(s) of file supported by the plug-in
        /// </summary>
        /// <remarks>
        /// Used to populate the "Files of type" combo box in the Open file dialog
        /// </remarks>
        /// <example>
        /// "*.xml"
        /// </example>
        /// <returns>file type pattern</returns>
        string GetFileTypePattern();

        /// <summary>
        /// Indicates whether the plug-in is able to parse the specified file
        /// </summary>
        /// <param name="fileName">full path to the file</param>
        /// <remarks>
        /// Called whenever a file is being opened to give the plug-in a chance to handle it;
        /// ideally the result can be returned based solely on the file name, but it is
        /// acceptable to open, read, and close the file if necessary
        /// </remarks>
        /// <returns>true iff the file is supported</returns>
        bool IsFileTypeSupported(string fileName);

        /// <summary>
        /// Returns a TextReader instance that will be used to read the specified file
        /// </summary>
        /// <param name="fileName">full path to the file</param>
        /// <remarks>
        /// The only methods that will be called (and therefore need to be implemented) are
        /// TextReader.ReadLine() and IDisposable.Dispose()
        /// </remarks>
        /// <returns>TextReader instance</returns>
        System.IO.TextReader GetReaderForFile(string fileName);
    }
}

Disclaimer: I wrote TextAnalysisTool.NET many years ago as a way to learn the (then) newly-released .NET 1.0 Framework. Extensibility frameworks like MEF weren't available yet, so please forgive the omission! :)

 

As you can see, the plug-in interface is simple, straightforward, automatically integrates into the standard File|Open UI, and leaves a great deal of freedom around implementation and function. Specifically, the TextReader instance returned by GetReaderForFile can do pretty much whatever you want. For example:

  • Simple tweaks to the input (ex: normalizing time stamps)
  • Filtering of the input (ex: to remove irrelevant lines)
  • Complex transformations of the input (ex: format conversions)
  • Completely unrelated data (ex: input from a network socket)

There's a lot of flexibility, and maybe the open-endedness is daunting? :) To make things concrete, I've packaged two of the samples I came up with during the original plug-in definition.

 

TATPlugin_SampleData

Loads files named like 3.lines and renders that many lines of sample text into the display.

Input (file name):

3.lines

Output:

1: The quick brown fox jumps over a lazy dog.
2: The quick brown fox jumps over a lazy dog.
3: The quick brown fox jumps over a lazy dog.

 

TATPlugin_XMLFormatter

Loads well-formed XML and pretty-prints it for easier reading.

Input:

<root><element><nested>value</nested></element><element><shallow><deep>value</deep></shallow></element></root>

Output:

<root>
  <element>
    <nested>value</nested>
  </element>
  <element>
    <shallow>
      <deep>value</deep>
    </shallow>
  </element>
</root>

 

[Click here to download the source code and supporting files for the sample TextAnalysisTool.NET plug-ins]

The download ZIP also includes Plugin.cs (the file defining the above interface), a few sample data files, and some trivial Build.cmd scripts to compile everything from a Visual Studio Developer Command Prompt (or similar environment where csc.exe and MSBuild.exe are available).

Note: When experimenting with the samples, remember that TextAnalysisTool.NET loads its plugins from the current directory at startup. So put a copy of TextAnalysisTool.NET (and its .config file) alongside the DLL outputs in the root of the samples directory and remember to re-start it if you change one of the samples. To check that plug-ins are loaded successfully, use the Help|Installed plug-ins menu item.

Aside: Plug-ins are generally UI-less, but they don't have to be - take a look at what Tomer did with the WPPFormatter plug-in for an example.

Tags: Technical TextAnalysisTool Utilities
Comments powered by Disqus.