Plug it in, plug it in [Sample code for two TextAnalysisTool.NET plug-ins demonstrates support for custom file types]
A few days ago, @_yabloki tweeted asking how to write a TextAnalysisTool.NET plug-in. I've answered this question a few times in email, but never blogged it before now.
To understand the basis of the question, you need to know what TextAnalysisTool.NET is; for that, I refer you to the TextAnalysisTool.NET page for an overview.
To understand the rest of the question, you need to know what a plug-in is; for that, there's the following paragraph from the documentation:
TextAnalysisTool.NET's support for plug-ins allows users to add in their own
code that understands specialized file types. Every time a file is opened,
each plug-in is given a chance to take responsibility for parsing that file.
When a plug-in takes responsibility for parsing a file, it becomes that plug-
in's job to produce a textual representation of the file for display in the
usual line display. If no plug-in supports a particular file, then it gets
opened using TextAnalysisTool.NET's default parser (which displays the file's
contents directly). One example of what a plug-in could do is read a binary
file format and produce meaningful textual output from it (e.g., if the file is
compressed or encrypted). Another plug-in might add support for the .zip
format and display a list of the files within the archive. A particularly
ambitious plug-in might translate text files from one language to another. The
possibilities are endless!
Armed with an understanding of TextAnalysisTool.NET and its support for plug-ins, we're ready to look at the interface plug-ins must implement:
namespace TextAnalysisTool.NET.Plugin
{
    /// <summary>
    /// Interface that all TextAnalysisTool.NET plug-ins must implement
    /// </summary>
    internal interface ITextAnalysisToolPlugin
    {
        /// <summary>
        /// Gets a meaningful string describing the type of file supported by the plug-in
        /// </summary>
        /// <remarks>
        /// Used to populate the "Files of type" combo box in the Open file dialog
        /// </remarks>
        /// <example>
        /// "XML Files"
        /// </example>
        /// <returns>descriptive string</returns>
        string GetFileTypeDescription();

        /// <summary>
        /// Gets the file type pattern describing the type(s) of file supported by the plug-in
        /// </summary>
        /// <remarks>
        /// Used to populate the "Files of type" combo box in the Open file dialog
        /// </remarks>
        /// <example>
        /// "*.xml"
        /// </example>
        /// <returns>file type pattern</returns>
        string GetFileTypePattern();

        /// <summary>
        /// Indicates whether the plug-in is able to parse the specified file
        /// </summary>
        /// <param name="fileName">full path to the file</param>
        /// <remarks>
        /// Called whenever a file is being opened to give the plug-in a chance to handle it;
        /// ideally the result can be returned based solely on the file name, but it is
        /// acceptable to open, read, and close the file if necessary
        /// </remarks>
        /// <returns>true iff the file is supported</returns>
        bool IsFileTypeSupported(string fileName);

        /// <summary>
        /// Returns a TextReader instance that will be used to read the specified file
        /// </summary>
        /// <param name="fileName">full path to the file</param>
        /// <remarks>
        /// The only methods that will be called (and therefore need to be implemented) are
        /// TextReader.ReadLine() and IDisposable.Dispose()
        /// </remarks>
        /// <returns>TextReader instance</returns>
        System.IO.TextReader GetReaderForFile(string fileName);
    }
}
Disclaimer: I wrote TextAnalysisTool.NET many years ago as a way to learn the (then) newly-released .NET 1.0 Framework. Extensibility frameworks like MEF weren't available yet, so please forgive the omission! :)
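To show how little is needed to satisfy that interface, here is a minimal pass-through sketch. It is not one of the samples from the download: the PassThroughPlugin class name, the HypotheticalPlugins namespace, and the ".sample" extension are made up for illustration, and making the class public is a guess about what the host needs to find it. The interface is repeated in trimmed form only so the snippet compiles on its own; a real plug-in would compile against Plugin.cs instead.

```csharp
using System;
using System.IO;

namespace TextAnalysisTool.NET.Plugin
{
    // Trimmed copy of the interface shown above, included only so this sketch
    // compiles by itself. A real plug-in would use Plugin.cs from the download.
    internal interface ITextAnalysisToolPlugin
    {
        string GetFileTypeDescription();
        string GetFileTypePattern();
        bool IsFileTypeSupported(string fileName);
        TextReader GetReaderForFile(string fileName);
    }
}

namespace HypotheticalPlugins
{
    using TextAnalysisTool.NET.Plugin;

    // Hypothetical plug-in that claims "*.sample" files and shows them unchanged.
    public class PassThroughPlugin : ITextAnalysisToolPlugin
    {
        // Text for the "Files of type" combo box.
        public string GetFileTypeDescription() { return "Sample Files"; }

        // Pattern for the "Files of type" combo box.
        public string GetFileTypePattern() { return "*.sample"; }

        // Decide from the file name alone, as the interface remarks suggest.
        public bool IsFileTypeSupported(string fileName)
        {
            return fileName.EndsWith(".sample", StringComparison.OrdinalIgnoreCase);
        }

        // StreamReader already provides ReadLine() and Dispose(),
        // which are the only members the host is documented to call.
        public TextReader GetReaderForFile(string fileName)
        {
            return new StreamReader(fileName);
        }
    }
}
```

Anything more interesting than pass-through comes down to returning a different TextReader from GetReaderForFile.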
As you can see, the plug-in interface is simple and straightforward, integrates automatically into the standard File|Open UI, and leaves a great deal of freedom around implementation and function. Specifically, the TextReader instance returned by GetReaderForFile can do pretty much whatever you want.
For example:
- Simple tweaks to the input (ex: normalizing time stamps)
- Filtering of the input (ex: to remove irrelevant lines)
- Complex transformations of the input (ex: format conversions)
- Completely unrelated data (ex: input from a network socket)
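To make the filtering case from the list above concrete, here is a rough sketch of a TextReader that wraps another reader and drops lines a predicate rejects. FilteringReader is a hypothetical helper, not something shipped with TextAnalysisTool.NET or the samples. It overrides only ReadLine and Dispose, since the interface remarks say those are the only members that get called.

```csharp
using System;
using System.IO;

// Hypothetical helper: forwards only the lines for which keep(line) is true.
public sealed class FilteringReader : TextReader
{
    private readonly TextReader _inner;
    private readonly Func<string, bool> _keep;

    public FilteringReader(TextReader inner, Func<string, bool> keep)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        if (keep == null) throw new ArgumentNullException("keep");
        _inner = inner;
        _keep = keep;
    }

    // Skip rejected lines; return null once the wrapped reader is exhausted.
    public override string ReadLine()
    {
        string line;
        while ((line = _inner.ReadLine()) != null)
        {
            if (_keep(line))
            {
                return line;
            }
        }
        return null;
    }

    // Release the wrapped reader together with this one.
    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _inner.Dispose();
        }
        base.Dispose(disposing);
    }
}
```

A plug-in's GetReaderForFile could then return something like new FilteringReader(new StreamReader(fileName), line => !line.StartsWith("DEBUG", StringComparison.Ordinal)), where the "DEBUG" prefix is just an example of an irrelevant line.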
There's a lot of flexibility, and maybe the open-endedness is daunting? :) To make things concrete, I've packaged two of the samples I came up with during the original plug-in definition.
TATPlugin_SampleData
Loads files named like 3.lines and renders that many lines of sample text into the display.
Input (file name):
3.lines
Output:
1: The quick brown fox jumps over a lazy dog.
2: The quick brown fox jumps over a lazy dog.
3: The quick brown fox jumps over a lazy dog.
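For a sense of how a reader like that might work, here is a rough sketch that produces the output above from nothing but the file name. It is not the code in the download: SampleLinesReader is a made-up name, and treating an unparseable name as zero lines is my own assumption for the sketch.

```csharp
using System.Globalization;
using System.IO;

// Hypothetical reader: "3.lines" yields three numbered lines of sample text.
// The file's contents are never read; only its name matters.
public sealed class SampleLinesReader : TextReader
{
    private readonly int _total;
    private int _next = 1;

    public SampleLinesReader(string fileName)
    {
        // "3.lines" -> "3"
        string stem = Path.GetFileNameWithoutExtension(fileName);
        int count;
        // NumberStyles.None accepts digits only (no sign, no whitespace).
        // Assumption for this sketch: anything else means zero lines.
        if (!int.TryParse(stem, NumberStyles.None, CultureInfo.InvariantCulture, out count))
        {
            count = 0;
        }
        _total = count;
    }

    // Return the next numbered line, or null when all lines were produced.
    public override string ReadLine()
    {
        if (_next > _total)
        {
            return null;
        }
        string line = _next + ": The quick brown fox jumps over a lazy dog.";
        _next++;
        return line;
    }
}
```

There is nothing to clean up here, so the Dispose inherited from TextReader is enough.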
TATPlugin_XMLFormatter
Loads well-formed XML and pretty-prints it for easier reading.
Input:
<root><element><nested>value</nested></element><element><shallow><deep>value</deep></shallow></element></root>
Output:
<root>
  <element>
    <nested>value</nested>
  </element>
  <element>
    <shallow>
      <deep>value</deep>
    </shallow>
  </element>
</root>
The download ZIP also includes Plugin.cs (the file defining the above interface), a few sample data files, and some trivial Build.cmd scripts to compile everything from a Visual Studio Developer Command Prompt (or similar environment where csc.exe and MSBuild.exe are available).
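If you'd rather not open the ZIP just to see the idea behind TATPlugin_XMLFormatter, pretty-printing can be sketched in a few lines. This is not the code from the download: XmlPrettyPrinter is a made-up name, and it leans on System.Xml.Linq (which needs .NET 3.5 or later, so it postdates the original tool). XDocument.ToString() indents its output by default and leaves out the XML declaration.

```csharp
using System.IO;
using System.Xml.Linq;

// Hypothetical helper: turns well-formed XML into an indented, line-by-line view.
public static class XmlPrettyPrinter
{
    // Formats XML that is already in memory.
    public static TextReader CreateReader(string xml)
    {
        // Parse drops insignificant whitespace; ToString() re-indents the tree.
        string formatted = XDocument.Parse(xml).ToString();
        return new StringReader(formatted);
    }

    // What a plug-in's GetReaderForFile might call for a file on disk.
    public static TextReader CreateReaderForFile(string fileName)
    {
        string formatted = XDocument.Load(fileName).ToString();
        return new StringReader(formatted);
    }
}
```

Both methods throw an XmlException on malformed input, so a plug-in built this way would probably want IsFileTypeSupported to decline files it can't parse.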
Note: When experimenting with the samples, remember that TextAnalysisTool.NET loads its plug-ins from the current directory at startup. So put a copy of TextAnalysisTool.NET (and its .config file) alongside the DLL outputs in the root of the samples directory and remember to restart it if you change one of the samples. To check that plug-ins are loaded successfully, use the Help|Installed plug-ins menu item.
Aside: Plug-ins are generally UI-less, but they don't have to be - take a look at what Tomer did with the WPPFormatter plug-in for an example.