The HyperTeX Markup Language

A couple of years ago, I had a fun idea for a “weekend” hack: writing an HTML parser in LaTeX macros. This way, one can write HTML directly inside a LaTeX document and compile it as usual:

\documentclass{article}
\usepackage{hypertex}

\begin{document}
  \begin{html}
    <h1>Title</h1>
    <p>
      % HTML goes here...
    </p>
  \end{html}
\end{document}

The idea sounded at least mildly funny to me when I first thought of it, but I ended up shelving it for a while. Now that I’ve graduated and don’t have much to do this summer, I decided to revisit it and finally implement it. You can check it out on GitHub.

The parser itself is not super sophisticated (just a simple stack machine), and it only handles a small subset of HTML, translating tags to their LaTeX counterparts. Currently, it supports:

Here’s a slightly more involved usage example:

\documentclass{article}
\usepackage{hypertex}

\begin{document}
  \begin{html}
    <h1>Lorem ipsum in Hyper\TeX{}</h1>
    <p>
      Dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor
      incididunt ut labore et dolore magna aliqua. <em>Ut enim ad minim
      veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip
      ex ea commodo consequat.</em>
      <ol>
        <li>
          Duis aute irure dolor in reprehenderit in voluptate velit esse
          cillum dolore eu fugiat nulla pariatur.
        </li>
        <li>
          Excepteur sint occaecat cupidatat non proident, sunt in culpa
          qui officia deserunt mollit anim id est laborum.
        </li>
      </ol>
      Ullamcorper eget nulla facilisi etiam dignissim diam. In eu mi
      bibendum neque egestas congue quisque egestas. <strong>Consequat
      id porta nibh venenatis cras sed.</strong> Feugiat nisl pretium
      fusce id velit. Dictumst quisque sagittis purus sit amet volutpat.
    </p>
    <p>
      Pellentesque pulvinar pellentesque habitant morbi tristique
      senectus et. <s>At varius vel pharetra vel turpis nunc eget. Velit
      egestas dui id ornare arcu odio ut.</s> Proin sagittis nisl
      rhoncus mattis rhoncus urna neque.
    </p>
  \end{html}
\end{document}

The rendered result is shown below. Note that this is slightly different from just using Pandoc or something similar: this works with any LaTeX compiler, so you could even do this in, say, Overleaf if you really wanted to.

Figure: sample document with placeholder text (the previous example compiled with pdflatex).

The HyperTeX parser also has some degree of robustness against errors: it can detect and recover from basic errors like mismatched tags and unrecognized tag names, inserting an error message into the rendered document.
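To make the stack-machine idea concrete, here is a minimal expl3 sketch of a tag stack with this kind of error recovery. It is not the hypertex source. Every name in it (`\OpenTag`, `\CloseTag`, the `demo` module) is hypothetical, and it uses explicit commands instead of parsing angle brackets, which is the harder part. It assumes a LaTeX release from October 2020 or later, so that `\ExplSyntaxOn` and `\NewDocumentCommand` are available without loading extra packages.

```latex
\documentclass{article}

\ExplSyntaxOn
% Hypothetical "demo" module: a stack of the currently open tag names.
\seq_new:N \g__demo_open_tags_seq
\tl_new:N  \l__demo_top_tl

% Opening a known tag starts a group, switches the font, and pushes the
% tag name. An unknown tag prints a message and is otherwise ignored.
\cs_new_protected:Npn \demo_open:n #1
  {
    \str_case:nnTF {#1}
      {
        { em }     { \group_begin: \itshape }
        { strong } { \group_begin: \bfseries }
      }
      { \seq_gpush:Nn \g__demo_open_tags_seq {#1} }
      { \textbf { [unknown~tag~\texttt{#1}] } }
  }

% Closing a tag compares it with the top of the stack. On a match the
% entry is popped and the group ends; otherwise a message is printed and
% the stack is left untouched.
\cs_new_protected:Npn \demo_close:n #1
  {
    \seq_get:NNTF \g__demo_open_tags_seq \l__demo_top_tl
      {
        \str_if_eq:VnTF \l__demo_top_tl {#1}
          {
            \seq_gpop:NN \g__demo_open_tags_seq \l__demo_top_tl
            \group_end:
          }
          { \textbf { [mismatched~closing~tag~\texttt{#1}] } }
      }
      { \textbf { [closing~tag~\texttt{#1}~without~opening~tag] } }
  }

\NewDocumentCommand \OpenTag  { m } { \demo_open:n  {#1} }
\NewDocumentCommand \CloseTag { m } { \demo_close:n {#1} }
\ExplSyntaxOff

\begin{document}
Plain, \OpenTag{em}emphasised \OpenTag{strong}and bold\CloseTag{strong}\CloseTag{em}, plain again.

\OpenTag{blink}Unknown tag\CloseTag{em}
\end{document}
```

The second paragraph exercises both error paths: `blink` is not in the tag table, and the stray `\CloseTag{em}` finds an empty stack. The sketch leaves a mismatched tag open rather than guessing which element the author meant to close, and it does not check for tags still open at the end of the document. A real parser would need a policy for both.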

It was fun exploring the expl3 programming interface for LaTeX. (Thoughts: it’s a big improvement over traditional macro programming, even if the arcane naming convention reminds one of the old Hungarian notation.) I also think I found a bug in the soul package while trying to abuse it to tokenize text. Anyway, the experience has taught me that, despite how good it is at typesetting mathematics, LaTeX is an absolutely awful programming language.

