A couple of years ago, I had a fun idea for a “weekend” hack: writing an HTML parser in LaTeX macros. This way, one can write HTML directly inside a LaTeX document and compile as usual:
\documentclass{article}
\usepackage{hypertex}
\begin{document}
\begin{html}
<h1>Title</h1>
<p>% HTML goes here...
</p>\end{html}
\end{document}
The idea sounded at least mildly funny to me when I first thought of it, but I ended up shelving it for a while. Now that I’ve graduated and don’t have much to do this summer, I decided to revisit it and finally implement it. You can check it out on GitHub.
The parser itself is not super sophisticated—just a simple stack machine—and it only handles a small subset of HTML, translating tags to their TeX counterparts. Currently, it supports:
<h1>, <h2>, and <h3>, via \section, \subsection, and \subsubsection;
<em>, <strong>, <tt>, and <s>, via \emph, \textbf, \texttt, and the ulem package;
<ul>, <ol>, and <li>, via the itemize and enumerate environments;
<p>, via inserting \par whenever we see a closing </p>.
Here’s a slightly more involved usage example:
\documentclass{article}
\usepackage{hypertex}
\begin{document}
\begin{html}
<h1>Lorem ipsum in Hyper\TeX{}</h1>
<p>
Dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. <em>Ut enim ad minim
veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip
ex ea commodo consequat.</em>
<ol>
<li>
Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur.
</li>
<li>
Excepteur sint occaecat cupidatat non proident, sunt in culpa
qui officia deserunt mollit anim id est laborum.
</li>
</ol>
Ullamcorper eget nulla facilisi etiam dignissim diam. In eu mi
bibendum neque egestas congue quisque egestas. <strong>Consequat
id porta nibh venenatis cras sed.</strong> Feugiat nisl pretium
fusce id velit. Dictumst quisque sagittis purus sit amet volutpat.
</p>
<p>
Pellentesque pulvinar pellentesque habitant morbi tristique
senectus et. <s>At varius vel pharetra vel turpis nunc eget. Velit
egestas dui id ornare arcu odio ut.</s> Proin sagittis nisl
rhoncus mattis rhoncus urna neque.
</p>\end{html}
\end{document}
How it renders is shown below. Note that this is slightly different from just using Pandoc or something similar: this works with any LaTeX compiler, so you could even do this in, say, Overleaf if you really wanted to.
The HyperTeX parser also has some degree of robustness against errors: it can detect and recover from basic errors like mismatched tags and unrecognized tag names, inserting an error message into the rendered document.
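To give a flavor of what a stack machine with this kind of error recovery can look like in expl3, here is a minimal sketch. It is not HyperTeX's actual code: the names \SketchOpen, \SketchClose, \l_sketch_open_tags_seq, and \l_sketch_top_tl are made up for illustration, and the tags are passed in explicitly instead of being parsed out of HTML text. It assumes a LaTeX kernel from 2020-10 or later, so that \ExplSyntaxOn and \NewDocumentCommand are available without loading extra packages.

```latex
\documentclass{article}

\ExplSyntaxOn
% Hypothetical names throughout; a sketch, not HyperTeX's implementation.
\seq_new:N \l_sketch_open_tags_seq  % stack of currently open tag names
\tl_new:N  \l_sketch_top_tl         % receives the name popped off the stack

% Opening tag: push its name onto the stack.
\NewDocumentCommand \SketchOpen { m }
  { \seq_push:Nn \l_sketch_open_tags_seq {#1} }

% Closing tag: pop the stack and compare with the name being closed.
% On a mismatch, or when nothing is open, typeset an error message.
\NewDocumentCommand \SketchClose { m }
  {
    \seq_pop:NNTF \l_sketch_open_tags_seq \l_sketch_top_tl
      {
        \str_if_eq:VnF \l_sketch_top_tl {#1}
          { \textbf{[mismatched~closing~tag~#1]} }
      }
      { \textbf{[unexpected~closing~tag~#1]} }
  }
\ExplSyntaxOff

\begin{document}
\SketchOpen{p}\SketchOpen{em}%
\SketchClose{em}% em is on top of the stack: prints nothing
\SketchClose{h1}% p is on top instead: prints the mismatch message
\SketchClose{p}%  the stack is empty by now: prints the unexpected message
\end{document}
```

A real parser would additionally map each tag name to the macro or environment it stands for, and would need a smarter recovery strategy than simply discarding the popped tag.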
It was fun exploring the expl3 programming interface for LaTeX. (Thoughts: it’s a big improvement over traditional macro programming, even if the arcane naming convention reminds one of the old Hungarian notation.) I also think I found a bug in the soul package while trying to abuse it to tokenize text. Anyway, the experience has taught me that, despite how good it is at typesetting mathematics, TeX is an absolutely awful programming language.