The blog of dlaa.me

If one is good, two must be better [markdownlint-cli2 is a new kind of command-line interface for markdownlint]

About 5 years ago, Igor Shubovych and I discussed the idea of writing a CLI for markdownlint. I wasn't ready at the time, so Igor created markdownlint-cli and it has been a tremendous help for the popularity of the library. I didn't do much with it at first, but for the past 3+ years I have been the primary contributor to the project. This CLI is the primary way that many users interact with the markdownlint library, so I think it is important to maintain it.

However, I've always felt a little bit like a guest in someone else's home and while I have added new features, there were always some things I wasn't comfortable changing. A few months ago, I decided to address this by creating my own CLI - and approaching the problem from a slightly different/unusual perspective so as not to duplicate the fine work that had already been done. My implementation is named markdownlint-cli2 and you can find it here:

markdownlint-cli2 on GitHub
markdownlint-cli2 on npm

markdownlint-cli2 has a few principles that motivate its interface and behavior:

  • Faster is better. There are three phases of execution: globbing/configuration parsing, linting of each configuration set, and summarizing results. Each of these phases takes full advantage of asynchronous function calls to execute operations concurrently and make the best use of Node.js's single-threaded architecture. Because it's inefficient to enumerate files and directories that end up being ignored by a filter, all glob patterns for input (inclusive and exclusive) are expected to be passed on the command line so they can be used by the glob library to optimize file system access. (A sketch of the concurrency idea follows this list.)

    How much faster does it run? Well, it depends. :) In many cases, probably only a little bit faster - all the same Markdown files need to be processed by the same library code. That said, linting is done concurrently, so slow disk scenarios offer one opportunity for speed-ups. In testing, an artificial 5 millisecond delay for every file access was completely overcome by this concurrency. In situations that play to the strengths of the new implementation - such as with many ignored files and few Markdown files (common for Node.js packages with deep node_modules) - the difference can be significant. One early user reported times exceeding 100 seconds dropped to less than 1 second.

  • Configuration should be flexible. Command-line arguments are never as expressive as data or code, so all configuration for markdownlint-cli2 is specified via appropriately-named JSON, YAML, or JavaScript files. These options files can live anywhere and automatically apply to their part of the directory tree. Settings cascade and inherit, so it's easy to customize a particular scenario without repeating yourself. Other than two necessary exceptions, all options (including custom rules and parser plugins) can be set or changed in any directory being linted. (An example options file appears after this list.)

    It's unconventional for a command-line tool not to allow configuration via command-line arguments, but this model keeps the input (glob patterns) separate from the configuration (files) and allows easier sharing of settings across tools (like the markdownlint extension for VS Code). It's also good for scenarios where the user may not have the ability to alter the command line (such as GitHub's Super-Linter action).

    In addition to support for custom rules, it's possible to provide custom markdown-it plugins - for each directory if desired. This can be necessary for scenarios that involve custom rendering and make use of non-standard CommonMark syntax. By using an appropriate plugin, the custom syntax gets parsed correctly and the linting rules can work with the intended structure of the document. A common scenario is when embedding TeX math equations with the $ math $ or $$ math $$ syntax and a plugin such as markdown-it-texmath.

    Although the default output format to stderr is identical to that of markdownlint-cli (making it easy to switch between CLIs), there are lots of ways to display results, so any number of output formatters can be configured to run after linting. I've provided stock implementations for default, JSON, JUnit, and summarized results, but anyone can provide their own formatter if they want something else.

  • Dependencies should be few. As with the markdownlint library itself, package dependencies are kept to a minimum. Fewer dependencies mean less code to install, parse, audit, and maintain - which makes everything easier.
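
To put the concurrency point in code: the sketch below is the idea, not markdownlint-cli2's actual implementation. It starts work for every file immediately and awaits the results together, so slow file access overlaps instead of serializing; it assumes the promise-based API of recent markdownlint versions.

// Sketch: lint many files concurrently rather than one at a time
const { promises: fs } = require("fs");
const markdownlint = require("markdownlint");

async function lintConcurrently(files) {
  return Promise.all(files.map(async (file) => {
    // Every read starts right away; a slow disk delays all reads in parallel
    const content = await fs.readFile(file, "utf8");
    return markdownlint.promises.markdownlint({ "strings": { [file]: content } });
  }));
}

With this shape, an artificial per-file delay like the 5 millisecond experiment above is paid roughly once overall instead of once per file.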
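
To make the configuration model concrete, here is a sketch of a directory-level options file using the JavaScript flavor. The option names reflect my reading of the project's documentation; the specific plugin and formatter packages are illustrative, not required.

// .markdownlint-cli2.cjs - applies to this directory and everything below it
module.exports = {
  // Standard markdownlint configuration object (rule settings)
  "config": {
    "MD013": false // disable line-length checking in this part of the tree
  },
  // markdown-it plugins to use when parsing (e.g., for TeX math syntax)
  "markdownItPlugins": [
    [ "markdown-it-texmath" ]
  ],
  // Output formatters to run after linting
  "outputFormatters": [
    [ "markdownlint-cli2-formatter-json" ]
  ]
};

Because settings cascade, a nested directory can supply its own options file that overrides only the pieces it needs.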

So, which CLI should you use? Well, whichever you want! If you're happy with markdownlint-cli, there's no need to change. If you're looking for a bit more flexibility or want to see if markdownlint-cli2 is faster in your scenario, give it a try. At this point, markdownlint-cli2 supports pretty much everything markdownlint-cli does, so you're free to experiment and shouldn't need to give up any features if you switch.

What does this mean for the future of the original markdownlint-cli? Nothing! It's a great tool and it's used by many projects. I will continue to update it as I release new versions of the markdownlint library. However, I expect that my own time working on new features will be focused on markdownlint-cli2 for now.

Whether you use markdownlint-cli2 or markdownlint-cli, I hope you find it useful!

Don't just complain - offer solutions! [Enabling markdownlint rules to fix the violations they report]

In October of 2017, an issue was opened in the markdownlint repository on GitHub asking for the ability to automatically fix rule violations. (Background: markdownlint is a Node.js style checker and lint tool for Markdown/CommonMark files.) I liked the idea, but had some concerns about how to implement it effectively. I had recently added the ability to fix simple violations to the vscode-markdownlint extension for VS Code, based entirely on regular expressions; it was primitive, but mostly sufficient.

Such was the state of things for about two years, with 15 of the 44 linting rules having regular expression-based fixes in VS Code that usually worked. Then, in August of 2019, I overcame my reservations about the feature and added fix information as one of the things a rule can report with a linting violation. Doing so paved the road for an additional 9 rules to become auto-fixable. What's more, it became possible for custom rules written by others to offer fixes as well.

Implementation notes

The way a rule reports fix information for a violation is via an object that looks like this in TypeScript:

/**
 * Fix information for RuleOnErrorInfo.
 */
type RuleOnErrorFixInfo = {
    /**
     * Line number (1-based).
     */
    lineNumber?: number;
    /**
     * Column of the fix (1-based).
     */
    editColumn?: number;
    /**
     * Count of characters to delete.
     */
    deleteCount?: number;
    /**
     * Text to insert (after deleting).
     */
    insertText?: string;
};

Aside: markdownlint now includes a TypeScript declaration file for all public APIs and objects!

The "fix information" object identifies a single edit that fixes the corresponding violation. All the properties shown above are optional, but in practice there will always be 2 or 3. lineNumber defaults to the line of the corresponding violation and almost never needs to be set. editColumn points to the location in the line to edit. deleteCount says how many characters to delete (the value -1 means to delete the entire line); insertText provides the characters to add. If delete and insert are both specified, the delete is applied before the insert. This simple format is easy for callers of the markdownlint API to apply, so the structure is proxied to them pretty much as-is when returning violations.
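
To illustrate those semantics, here is a minimal sketch of applying one fix to its line. applyFixInfo is a hypothetical name used for illustration, not the library's helper (the real helpers are described below):

// Hypothetical helper demonstrating the fix semantics described above
function applyFixInfo(line, fixInfo) {
  const { editColumn = 1, deleteCount = 0, insertText = "" } = fixInfo;
  if (deleteCount === -1) {
    // A deleteCount of -1 means the entire line should be deleted
    return null;
  }
  const index = editColumn - 1; // editColumn is 1-based
  // Delete first, then insert at the same position
  return line.slice(0, index) + insertText + line.slice(index + deleteCount);
}

// Example: remove two trailing spaces from "text  "
// applyFixInfo("text  ", { "editColumn": 5, "deleteCount": 2 }) === "text"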

Aside: With the current design, a violation can only include a single fixInfo object. This could be limiting, but has proven adequate for all scenarios so far.

Practical matters

Considered in isolation, a single fix is easy to reason about and apply. However, when dealing with an entire document, there can be multiple violations for a line and therefore multiple fixes with the potential to overlap and conflict. The first strategy for dealing with this is to make fixes simple: the change represented by a fix should alter as little as possible. The second strategy is to apply fixes in the order least likely to create conflicts - that is, right-to-left within a line, detecting overlaps that may cause the second fix to be skipped. Finally, overlapping edits of different kinds that don't conflict are merged into one. This process isn't especially tricky, but there are some subtleties, so the markdownlint-rule-helpers package includes helper methods for applying a single fix (applyFix) or multiple fixes (applyFixes).

Aside: markdownlint-rule-helpers is an undocumented, unsupported collection of functions and variables that helps with authoring rules and utilities for markdownlint. The API for this package is ad hoc, but everything in it is used by the core library and covered by that project's 100% test coverage.
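
As a sketch of how a caller might put this together (assuming applyFixes takes the original content plus the reported violations and returns the fixed content):

const markdownlint = require("markdownlint");
const { applyFixes } = require("markdownlint-rule-helpers");

// Lint a string, then apply whatever fixes the rules reported
const content = "# Heading\n\nText with trailing spaces  \n";
const errors = markdownlint.sync({ "strings": { "content": content } }).content;
const fixed = applyFixes(content, errors);
console.log(fixed); // the trailing spaces are gone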

Availability

Automatic fix behavior is available in markdownlint-cli, markdownlint-cli2, and the vscode-markdownlint extension for VS Code. Both CLIs can fix multiple files at once; the VS Code extension limits fixes to the current file (and includes undo support). Fixability is also available to consumers of the library via the markdownlint-rule-helpers package mentioned earlier. Not all rules are automatically fixable - in some cases the resolution is ambiguous and needs human intervention. However, rules that offer fixes can dramatically improve the quality of a document without the user having to do any work!

Further reading

For more about markdownlint and related topics, search for "markdownlint" on this blog.

Lint-free documentation [markdownlint is a Node.js style checker and lint tool for Markdown files]

I'm a strong believer in using static analysis tools to identify problems and catch mistakes. The Node.js/io.js community has some great options for linting JavaScript code (ex: JSHint and ESLint), and I use them regularly. But code isn't the only important asset - documentation can be just as important to a project's success.

The open-source community has pretty much standardized on Markdown for documentation, which is a great choice because it's easy to read, write, and understand. That said, Markdown has a syntax, so there are "right" and "wrong" ways to do things - and not all parsers handle nuances the same way (though the CommonMark effort is trying to standardize). In particular, there are constructs that can lead to missing/broken text in some parsers but are not obviously wrong in the original Markdown.

To show what I mean, I created a Gist of common Markdown mistakes. If you're not a Markdown expert, you might learn something by comparing the source and output. :)

Aside: The Markdown parser used by GitHub is quite good - but many issues are user error and it can't (yet) read your mind.


You shouldn't need to be a Markdown expert to avoid silly mistakes - that's what we have computers for. When I looked around for a Node-based linter, I didn't see anything - but I did find a very nice implementation for Ruby by Mark Harrison. I don't tend to have Ruby available in my development environment, but I had an itch to scratch, so I installed it and added a couple of rules to Mark's tool for the checks I wanted. Mark kindly accepted the corresponding pull requests, and all was well.

Except that once I'd tasted of the fruit of Markdown linting, I wanted to integrate it into other workflows - many of which are exclusively Node-based. I briefly entertained the idea of creating a Node package to install Ruby then use it to install and run a Ruby gem - but that made my head hurt...


So I prototyped a Node version of markdownlint by porting a few rules over and then ran the idea by Mark. He was supportive (and raised some great points!), so I gradually ported the rest of the rules to JavaScript with the same numbering/naming system to make it easy for people to migrate between the two tools. Mark already had a fantastic test infrastructure and great documentation for rules, so I shamelessly reused both in the Node version. Configuration for JavaScript tools is typically JSON, so the Node version uses a slightly different format than Ruby (though both are simple/obvious). I started with a fully asynchronous API for efficiency, but ended up adding a synchronous version for scenarios where that's more convenient. I strived to achieve functional parity with the Ruby implementation (and continue to do so as Mark makes updates!), but duplicating the CLI was a non-goal (please have a look at the mdl gem if that's what you need).

If this sounds interesting, please have a look at markdownlint on GitHub. As of this writing, it supports the same set of ~40 rules that the Ruby implementation does - you can read all about them in Mark's fantastic Rules.md. markdownlint exposes a single API which can be called in an asynchronous or synchronous manner and accepts an options object to identify the files/strings to lint and the set of rules to apply. It returns a simple object that lists the items that were checked along with the line numbers for any violations. The documentation shows all of this and includes examples of calling markdownlint from both gulp and Grunt.
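
As a quick sketch of what calling the API looks like (the file name and string content here are illustrative):

const markdownlint = require("markdownlint");

const options = {
  "files": [ "README.md" ],
  "strings": { "snippet": "#Heading without a space\n" },
  "config": { "default": true }
};

// Asynchronous usage
markdownlint(options, (err, result) => {
  if (!err) {
    console.log(result.toString()); // one line per violation
  }
});

// Synchronous usage
const result = markdownlint.sync(options);
console.log(result.toString());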


To make sure markdownlint works well, I've integrated it into some of my own projects, including this blog which I wrote specifically to allow authoring in Markdown. That's a nice start, but it doesn't prove markdownlint can handle larger projects with significant documentation written by different people at different times. For that you'd need to integrate with a project like ESLint which has extensive documentation that's entirely Markdown-based.

So I did. :) Supporting ESLint was one of the motivating factors behind porting markdownlint to Node in the first place: I love the tool and use it in all my projects. The documentation is excellent, but every now and then I'd come across weird or broken text. After submitting a couple of pull requests with fixes, I decided adding a Markdown linter to their test script would be a better way to keep typos out of the documentation. It turns out this was on the team's radar as well, and they - especially project owner Nicholas - were very helpful and accommodating as I introduced markdownlint and tweaked things to satisfy some of the rules.


At this point, maybe I've convinced you markdownlint works for my own purposes and that it works for some other purposes, but it's likely you have special requirements or would like to "try before you buy". (Which seems an ironic thing to say about free software, but there's a cost to everything, so maybe it's not that unreasonable after all.) Well, I have just the thing for you:

An interactive markdownlint demo that runs in the browser!

Although browser support was not (is not!) a goal, the relevant code is all JavaScript with just one dependency (that itself offers browser support) and only two methods that need polyfills (trimLeft/trimRight). So it was actually fairly straightforward (with some help from Browserify) to create a standalone, offline-enabled web page that lets anyone use a (modern) browser to experiment with markdownlint and validate arbitrary content. To make it super easy to get started, I made some deliberate mistakes in the sample content for the demo - feel free to fix them for me. :)


In summary:

  • Markdown is great
  • It's easy to read and write
  • Sometimes it doesn't do what you think
  • There are tools to help
  • markdownlint is one of them
  • Get it for Ruby or Node
  • Or try it in the browser

"If you can't measure it, you can't manage it." [A brief analysis of markdownlint rule popularity]

From time to time, discussions of a markdownlint rule come up where the popularity of one of the rules is questioned. There are about 45 rules right now, so there's a lot of room for debate about whether a particular one goes too far or isn't generally applicable. By convention, all rules are enabled for linting by default, though it is easy to disable any rules that you disagree with or that don't fit with a project's approach. But until recently, I had no good way of knowing what the popularity of these rules was in practice.

If only there were an easy way to collect the configuration files for the most popular repositories in GitHub, I could do some basic analysis to get an idea what rules were used or ignored in practice. Well, that's where Google's BigQuery comes in - specifically its database of public GitHub repositories that is available for anyone to query. I developed a basic understanding of the database and came up with the following query to list the most popular repositories with a markdownlint configuration file:

SELECT files.repo_name
FROM `bigquery-public-data.github_repos.files` as files
INNER JOIN `bigquery-public-data.github_repos.sample_repos` as repos
  ON files.repo_name = repos.repo_name
WHERE files.path = ".markdownlint.json" OR files.path = ".markdownlint.yaml"
ORDER BY repos.watch_count DESC
LIMIT 100

Aside: While this resource was almost exactly what I needed, it turns out the data that's available is off by an order of magnitude, so this analysis is not perfect. However, it's an approximation anyway, so this should not be a problem. (For context, follow this Twitter thread with @JustinBeckwith.)

The query above returns about 60 repository names. The next step was to download and process the relevant configuration files and output a simple CSV file recording which repositories used which rules (based on configuration defaults, customizations, and deprecations). This was fairly easily accomplished with a bit of code I've published in the markdownlint-analyze-config repository. You can run it if you'd like, but I captured the output as of early October, 2020 in the file analyze-config.csv for convenience.

Importing that data into the Numbers app and doing some simple aggregation produced the following representation of how common each rule is across the data set:

Bar chart showing how common each rule is

Some observations:

  • There are 8 rules that are used by every project whose data is represented here. These should be uncontroversial. Good job, rules!
  • The two rules at the bottom with less than 5% use are both deprecated because they have been replaced by more capable rules. It's interesting that some projects have explicitly enabled them, but they can be safely ignored.
  • More than half of the rules are used in at least 95% of the scenarios. These seem pretty solid as well, and are probably not going to see much protest.
  • All but 4 (of the non-deprecated) rules are used in at least 80% of the scenarios. Again, pretty strong, though there is some room for discussion in the lower ranges of this category.
  • Of those 4 least-popular rules that are active, 3 are used between 70% and 80% of the time. That's not shabby, but it's clear these rules are checking for things that are less universally applicable and/or somewhat controversial.
  • The least popular (non-deprecated) rule is MD013/line-length at about 45% popularity. This is not surprising, as there are definitely good arguments for and against manually wrapping lines at an arbitrary column. This rule is already disabled by default for the VS Code markdownlint extension because it is noisy in projects that use long lines (where nearly every line could trigger a violation).

Overall, this was a very informative exercise. The data source isn't perfect, but it's a good start and I can always rerun the numbers if I get a better list of repositories. Rules seem to be disabled less often in practice than I would have guessed. This is nice to see - and a good reminder to be careful about introducing controversial rules that many people end up wanting to turn off. The next time a discussion about rule popularity comes up, I'll be sure to reference this post!

Catch common Markdown mistakes as you make them [markdownlint is a Visual Studio Code extension to lint Markdown files]

The lightweight, cross-platform Visual Studio Code editor recently gained support for extensions, third-party packages that add or enhance capabilities of the tool. Of particular interest to me are linters, syntax checkers that help avoid mistakes and maintain consistency when working with a language (either code or markup). I've previously written about markdownlint, a Node.js linter for the Markdown markup language. After looking at the VS Code API, it seemed straightforward to create a markdownlint extension for Code. I did so and published markdownlint to the extension gallery, where it can be installed via the command ext install markdownlint. What's nice about editor integration for a linter is that feedback is immediate and interactive: mistakes are highlighted as they're made and it's easy to click a link for information about any rule violation.

If linting Markdown is something that interests you, please try the markdownlint extension for VS Code and share your feedback!

"DRINK ME" [Why I do not include npm-shrinkwrap.json in Node.js tool packages]

I maintain a few open source projects and get asked some of the same questions from time to time. I wrote the explanation below in August of 2023 and posted it as a GitHub Gist; I am capturing it here for easier reference.

For historical purposes and possible future reference, here are my notes on why I backed out a change to use npm-shrinkwrap.json in markdownlint-cli2.

The basic problem is that npm will include platform-specific packages in npm-shrinkwrap.json. Specifically, if one generates npm-shrinkwrap.json on Mac, it may include components (like fsevents) that are only supported on Mac. Attempts to use a published package with such an npm-shrinkwrap.json on a different platform like Linux or Windows fail with EBADPLATFORM. This seems (to me, currently) like a fundamental and fatal flaw in the way npm implements npm-shrinkwrap.json. And while there are ways npm might address this problem, the current state of things seems unusably broken.

To make this concrete, the result of running rm npm-shrinkwrap.json && npm install && npm shrinkwrap for this project on macOS can be found here: https://github.com/DavidAnson/markdownlint-cli2/blob/v0.9.0/npm-shrinkwrap.json. Note that fsevents is an optional Mac-only dependency: https://github.com/DavidAnson/markdownlint-cli2/blob/66b36d1681566451da8d56dcef4bb7a193cdf302/npm-shrinkwrap.json#L1955-L1958. Including it is not wrong per se, but sets the stage for failure as reproduced via GitHub Codespaces:

@DavidAnson > /workspaces/temp (main) $ ls
@DavidAnson > /workspaces/temp (main) $ node --version
v20.5.1
@DavidAnson > /workspaces/temp (main) $ npm --version
9.8.0
@DavidAnson > /workspaces/temp (main) $ npm install markdownlint-cli2@v0.9.0
npm WARN deprecated date-format@0.0.2: 0.x is no longer supported. Please upgrade to 4.x or higher.

added 442 packages in 4s

9 packages are looking for funding
  run `npm fund` for details
@DavidAnson > /workspaces/temp (main) $ npm clean-install
npm ERR! code EBADPLATFORM
npm ERR! notsup Unsupported platform for fsevents@2.3.3: wanted {"os":"darwin"} (current: {"os":"linux"})
npm ERR! notsup Valid os:  darwin
npm ERR! notsup Actual os: linux

npm ERR! A complete log of this run can be found in: /home/codespace/.npm/_logs/2023-08-27T18_24_58_585Z-debug-0.log
@DavidAnson > /workspaces/temp (main) $

Note that the initial package install succeeded, but the subsequent attempt to use clean-install failed due to the platform mismatch. This is a basic scenario and the user is completely blocked at this point.

Because this is a second-level failure, it is not caught by most reasonable continuous integration configurations, which work from the current project directory instead of installing and testing via the packed .tgz file. However, attempts to reproduce this failure in CI via .tgz were unsuccessful: https://github.com/DavidAnson/markdownlint-cli2/commit/f9bcd599b3e6dbc8d2ebc631b13e922c5d0df8c0. From what I can tell, npm install of a local .tgz file is handled differently than installing that same (identical) file from the package registry.

While there are some efforts to test the .tgz scenario better (for example: https://github.com/boneskull/midnight-smoker), better testing does not solve the fundamental problem that npm-shrinkwrap.json is a platform-specific file that gets used by npm in a cross-platform manner.


Unrelated, but notable: npm installs ALL package dependencies when npm-shrinkwrap.json is present - even in a context where it would normally NOT install devDependencies. Contrast the 442 packages installed above with the 40 installed when --omit=dev is used explicitly:

@DavidAnson > /workspaces/temp (main) $ npm install markdownlint-cli2@v0.9.0 --omit=dev

added 40 packages in 1s

9 packages are looking for funding
  run `npm fund` for details
@DavidAnson > /workspaces/temp (main) $

But the default behavior of a dependency install in this manner is not to include devDependencies, as seen when installing a version of this package without npm-shrinkwrap.json:

@DavidAnson > /workspaces/temp (main) $ npm install markdownlint-cli2@v0.9.2

added 35 packages in 2s

7 packages are looking for funding
  run `npm fund` for details
@DavidAnson > /workspaces/temp (main) $

En Provence [Some thoughts about npm package provenance - and why I have not enabled it]

Last year, the GitHub blog outlined efforts to secure the Node.js package repository by Introducing npm package provenance. They write:

In order to increase the level of trust you have in the npm packages you download from the registry you must have visibility into the process by which the source was translated into the published artifact.

This requirement is addressed by npm package provenance:

What we need is a way to draw a direct line from the npm package back to the exact source code commit from which it was derived.

The npm documentation on Generating provenance statements explains the implications:

When a package in the npm registry has established provenance, it does not guarantee the package has no malicious code. Instead, npm provenance provides a verifiable link to the package's source code and build instructions, which developers can then audit and determine whether to trust it or not.

It is important to call out that provenance does NOT:

  • Establish trustworthiness: A package published with provenance could contain code to delete personal files when installed or imported. And if a package did so, it would be inappropriate to remove provenance, as that is not a statement about trust.
  • Enable reproducible builds: The process used to produce a package can (and frequently does) reference artifacts outside its own repository when creating the package. A common scenario is installing other npm packages without a lock file. Even using a lock file offers no guarantee because dependent packages could reference ephemeral content like a URL, local state, randomness, etc.
  • Avoid the need to audit package contents: Knowing which repository commit was used to generate a package offers no guarantee about what is actually in the package. So it is necessary to manually audit every file in the package if security is important and trustworthiness needs to be established.
  • Avoid the need to audit package contents for every new version: Similarly, just because one version of a package was found to be trustworthy, there is no guarantee the next version will be. Therefore, it is necessary to perform a complete audit for every update.

It's also notable that provenance DOES:

  • Require giving an npm publish token to GitHub: As outlined in the documentation on Publishing packages to the npm registry, provenance requires granting GitHub permission to publish packages on your behalf. This creates a new opportunity for integrity to be compromised (and makes GitHub an attractive target for attackers).
  • Require bypassing two-factor authentication (2FA) for npm package publish: Multi-factor authentication is widely considered a baseline security practice. As outlined in the documentation on Requiring 2FA for package publishing and settings modification, this security measure must be disabled to give GitHub permission to publish on your behalf (because GitHub does not have access to the second factor).
  • Require defining and maintaining a GitHub Actions workflow for package publish: This additional effort should not be overwhelming for a package maintainer, but it represents ongoing time and attention that does not add (direct) value for a package's users. Furthermore, this workflow must be replicated across each package.

Considering the advantages and disadvantages, it does not seem to me that introducing npm provenance offers compelling enough benefits (for package consumers or for producers) to offset the cost for a maintainer. (Especially for a maintainer like myself with multiple packages and limited free time.) For now, my intent is to continue using git tags to identify which repository commit is associated with published package versions (e.g., markdownlint tags).