manpage2html.bash converts man(7) pages to HTML with a keyword index, with a varying success rate. It’s used to create the xz(1) man page for the XZ Utils home page.
Background
Unlike the mdoc(7) format, man(7) doesn’t encode any semantic information. However, there are common conventions for when to use bold or italic/underline, and these can be used to infer some meaning.
This script takes the output from GNU groff and tries to determine where keywords (like command line options) are documented. Then the references to those keywords are turned into links to the matching documentation. Additionally, a table of contents (links to each section heading) is added at the beginning of the HTML file and a keyword index is added at the end of the file.
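The general idea can be illustrated with a much-simplified sketch. It is not the logic of manpage2html.bash: it assumes the input already marks bold text as <b>...</b>, it assumes that a line beginning with a bold option name is the place where that option is documented, and the function names list_keywords and link_keywords are made up for this illustration.

```shell
# Illustrative sketch only; NOT taken from manpage2html.bash.
# Assumption: bold text is marked as <b>...</b> and a line that starts
# with a bold option name (e.g. "<b>--keep</b>") documents that option.

# list_keywords: read text on stdin, print each such option name once.
list_keywords() {
    sed -n 's/^<b>\(--\{0,1\}[A-Za-z0-9][A-Za-z0-9-]*\)<\/b>.*/\1/p' | sort -u
}

# link_keywords FILE: print FILE with an id on each documenting line and
# with every other bold occurrence of a keyword turned into a link to it.
link_keywords() {
    file=$1
    # Build two sed commands per keyword. The keywords contain only
    # letters, digits and "-", so they are safe inside the sed script.
    script=$(list_keywords < "$file" | while IFS= read -r kw; do
        printf 's|^<b>%s</b>|<b id="%s">%s</b>|\n' "$kw" "$kw" "$kw"
        printf 's|<b>%s</b>|<a href="#%s"><b>%s</b></a>|g\n' "$kw" "$kw" "$kw"
    done)
    if [ -n "$script" ]; then
        sed -e "$script" -- "$file"
    else
        cat -- "$file"
    fi
}
```

Even this toy version shows why the results vary: a bold word at the start of a line is not always a definition, and an option that is documented in two places would get the same id twice.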
On some man(7) (and also mdoc(7)) pages of command line tools the approach works fairly well; on others it doesn’t. It could likely be improved further, but tweaking the heuristics can also introduce unwanted false matches.
The script was originally written in 2010 but it was then forgotten for years until being released here in late 2023. It was written for fun and the implementation is ugly. One shouldn’t take the approach too seriously; there are saner methods to do a comparable thing. However, the script happens to work well enough for certain files and thus it seemed worth making it available for those who are curious.
Licensing
manpage2html.bash was written by Lasse Collin. From version 2024-01-16 onwards, manpage2html.bash is under the BSD Zero Clause License.
Download
The script needs GNU groff, GNU bash, and GNU sed.
The name of the (possibly gzipped) file to convert can be given as an argument. Otherwise it is read from the standard input. The HTML output is always written to standard output.
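The two ways of running the script described above could be wrapped as follows. This is only a sketch: the function names, the MANPAGE2HTML variable, and the assumption that the script has been saved as ./manpage2html.bash are illustrative and not part of the script itself.

```shell
# Assumption: the downloaded script is saved as ./manpage2html.bash.
# Set MANPAGE2HTML beforehand to point somewhere else.
MANPAGE2HTML=${MANPAGE2HTML:-./manpage2html.bash}

# convert_page INPUT OUTPUT
# Give the (possibly gzipped) man page as an argument and redirect the
# HTML from standard output into OUTPUT.
convert_page() {
    bash "$MANPAGE2HTML" "$1" > "$2"
}

# convert_stdin OUTPUT
# Let the script read the man page from standard input instead.
convert_stdin() {
    bash "$MANPAGE2HTML" > "$1"
}
```

For example, `convert_page xz.1 xz.html` passes the file name as an argument, while `gzip -dc xz.1.gz | convert_stdin xz.html` feeds the page through standard input (the file names here are placeholders).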
manpage2html.bash (2024-01-28)