I’ve Been to the Mountaintop

Having recently graduated college, I’m in a bit of a contemplative mood. So, I’ve decided to write up a post idea that I’ve had for a while: a sampling of some of the best “aha” moments that I’ve had over college.

To use a bit of a cliché analogy, CMU is like a long, painful hike in the mountains. Most of the time, you’re just trudging along what seems like an endless trail. But every once in a while, you break through the treeline and suddenly find yourself at the summit. From this glorious vantage point, you can see where all of your previous hiking was leading, and the whole world seems to come together in one beautiful moment. Was this worth the painful hike? I guess that depends on your personality.

What follows is a sampling of what I consider my favorite “mountaintop” moments over the course of my undergrad. I think a lot of these moments arise from the interaction between concepts in different classes, when learning one thing made me see something I’d known before in a new light.

Spectral graph theory: eigenvalues and connectivity

In 15-252 More Great Ideas in Theoretical Computer Science, we spent our time going over a lot of really cool stuff, but probably my favorite was spectral graph theory. In brief, spectral graph theory tries to relate combinatorial properties of graphs with properties of their algebraic representations.

Here’s one very accessible example: if you’ve studied computer science before, you’re probably familiar with the adjacency matrix representation of a graph. Given some graph $G = (V, E)$, the adjacency matrix $A$ is the matrix indexed by $(u,v) \in V^2$ with $$A_{u,v} = \begin{cases} 1 & (u,v) \in E \\ 0 & (u,v) \notin E. \end{cases}$$ What’s nice is that this is also the representation used practically in many programs; it’s not just some abstract theoretical construct. Let’s refer to the eigenvalues of $A$ as $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$.1
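
For concreteness, here is a small example of my own (not from the course): take the 4-cycle on vertices $\{1, 2, 3, 4\}$ with edges $\{1,2\}, \{2,3\}, \{3,4\}, \{4,1\}$. Its adjacency matrix is

$$A = \begin{pmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{pmatrix}.$$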

I used to think of the adjacency matrix as just an implementation trick, not really related to any interesting theoretical properties of graphs. But it turns out that the eigenvalues $\lambda_i$ actually tell you quite a lot about the graph! For example, here are some really cool theorems:2

  1. If $G$ is $d$-regular (i.e. all vertices have degree $d$), then $G$ is connected if and only if $\lambda_2 < d$. (More generally, you can relate the number of eigenvalues equal to $\lambda_1$ to the number of connected components of $G$.)
  2. Let $G$ be connected. Then $\lambda_n = -\lambda_1$ if and only if $G$ is bipartite. (A version of this was one of our homework problems!)
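
To see these at work on the 4-cycle from before (which is 2-regular), its eigenvalues are

$$\lambda_1 = 2, \quad \lambda_2 = \lambda_3 = 0, \quad \lambda_4 = -2.$$

Indeed, $\lambda_2 = 0 < 2 = d$ and the 4-cycle is connected; and since it is bipartite (color the vertices alternately), we also have $\lambda_4 = -\lambda_1$, just as the second theorem promises.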

Adjacency matrices also pop up elsewhere; for example, Markov chains can be modeled in terms of powers of the transition matrix (for random walks on graphs, this is essentially a weighted adjacency matrix), and the eigenvector with $\lambda = 1$ of the transition matrix gives the stationary distribution.3 I think this result is much more intuitive than the previous one about connectivity.
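
As a tiny illustration (the numbers here are mine): consider the two-state chain with transition matrix and distribution

$$P = \begin{pmatrix} 0.9 & 0.1 \\ 0.5 & 0.5 \end{pmatrix}, \qquad \pi = \begin{pmatrix} \tfrac{5}{6} & \tfrac{1}{6} \end{pmatrix}.$$

A quick computation checks that $\pi P = \pi$, i.e. $\pi$ is a (left) eigenvector of $P$ with eigenvalue $1$, and $\pi$ is exactly the stationary distribution of this chain.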

There’s a lot more that could be said about spectral graph theory, especially relating to expander graphs (basically sparse graphs with good connectivity), but the connection between eigenvalues and connectivity was what initially wowed me so much, because it linked two fairly elementary things in an unintuitive way.

Regular expressions: connections to abstract algebra

Regular expressions are another very practical part of computing; most programmers have probably used them at some point or another. I’m particularly fond of them because they’re a place where good theory and good implementation meet; for a classic tour of this intersection, see Russ Cox’s blog post on implementing regular expression matching using nondeterministic finite automata.

But the connections between math and regular expressions go far deeper than that. There’s a lot of beautiful algebraic structure behind regular expressions, captured by Kleene algebras. What do I mean by this? Well, in the spirit of “a monad is just a monoid in the category of endofunctors,”4 I’d like to offer: a Kleene algebra is just an idempotent semiring endowed with a closure operator.

To make this idea clear, let me start by giving a mathematically suggestive syntax5 for regular expressions $\mathcal{R}$ over $\Sigma$ and a denotational semantics $\llbracket \cdot \rrbracket : \mathcal{R} \to \mathcal{P}(\Sigma^\star)$. (If you’ve taken a course on automata theory before, you can probably skip this part unless you want a refresher.) To keep things simple, we will restrict ourselves to the alphabet $\Sigma = \{a, b\}$, i.e. we will only consider strings containing $a$ and $b$. For a regular expression $r \in \mathcal{R}$, let $\llbracket r \rrbracket$ denote the set of strings matched by $r$. In this case, a regular expression is one of:

  1. $0$, which matches nothing: $\llbracket 0 \rrbracket = \emptyset$.
  2. $1$, which matches only the empty string: $\llbracket 1 \rrbracket = \{\varepsilon\}$.
  3. A literal $a$ or $b$, which matches exactly that one-letter string: $\llbracket a \rrbracket = \{a\}$ and $\llbracket b \rrbracket = \{b\}$.
  4. A sum $r + r'$, which matches anything matched by either: $\llbracket r + r' \rrbracket = \llbracket r \rrbracket \cup \llbracket r' \rrbracket$.
  5. A product $r \times r'$, which matches a string from each, concatenated: $\llbracket r \times r' \rrbracket = \{\,xy : x \in \llbracket r \rrbracket,\ y \in \llbracket r' \rrbracket\,\}$.
  6. A closure $r^\star$, which matches zero or more strings matched by $r$, concatenated: $\llbracket r^\star \rrbracket = \bigcup_{n \ge 0} \llbracket r \rrbracket^n$.

To give an example, the regular expression a(a|b)b* matches any string starting with $a$, then any letter, then any number of $b$’s, such as "aabbb", "aab", and "ab". In our syntax, this would be written as $a \times (a + b) \times b^\star$.

Now why did we go about defining this unorthodox syntax? Because it draws attention to something remarkable: viewed this way, $(\mathcal{R}, +, \times, 0, 1)$ is an idempotent6 semiring, endowed with an extra unary closure7 operator $\star$. (At least, up to extensional equivalence—let’s agree to say $r \simeq r'$ if $\llbracket r \rrbracket = \llbracket r' \rrbracket$. To make this rigorous, one ought really to consider the quotient structure $\mathcal{R}/\mathord{\simeq}$, with operations extended over congruence classes in the natural way. But I think I’ve already been pedantic enough here.)
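
To spell out a few of the semiring laws in terms of the semantics above (each is easy to verify directly from the definitions):

$$r + s \simeq s + r, \qquad r + 0 \simeq r, \qquad r + r \simeq r,$$
$$r \times 1 \simeq r, \qquad r \times 0 \simeq 0, \qquad r \times (s + t) \simeq r \times s + r \times t,$$

and the closure operator satisfies the unfolding law $r^\star \simeq 1 + r \times r^\star$.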

This insight neatly frames a lot of things you can do with regular expressions—such as taking derivatives,8 solving equations over $\mathcal{R}$, etc.—but I’ll just end with a particularly funny, almost coincidental application. (I actually briefly went over this example in a previous post, without all this explanation.) Here’s a not-particularly-intuitive theorem of abstract algebra: given a (not necessarily commutative) ring where $1 - ab$ is invertible, $(1 - ba)^{-1}$ also exists, and is given by the formula $$(1 - ba)^{-1} = 1 + b (1 - ab)^{-1} a.$$ Let’s try to recast this in terms of regular expressions. In a way, we can identify $(1 - r)^{-1}$ with $r^\star$, if you’ll allow some abuse of notation: $$\frac{1}{1 - r} \simeq \sum_{n=0}^{\infty} r^n \simeq r^\star.$$ That is, we can think of $r^\star$ as a formal power series in the semiring of regular expressions. But then this theorem just says $$(b \times a)^\star \simeq 1 + b \times (a \times b)^\star \times a,$$ which is actually quite obvious! Put in more conventional notation, (ba)* matches the same strings as () | (b (ab)* a).
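
If you’d like to poke at this identity concretely, here is a minimal SML sketch (the datatype and the derivative-based matcher are my own naming, not code from any course) of the syntax above, together with a spot check that (ba)* and () | (b (ab)* a) agree on a handful of strings:

datatype regex =
    Zero                   (* 0: matches nothing *)
  | One                    (* 1: matches only the empty string *)
  | Chr of char            (* a single letter *)
  | Sum of regex * regex   (* r + r': union *)
  | Prod of regex * regex  (* r x r': concatenation *)
  | Star of regex          (* r*: closure *)

(* nullable r is true iff r matches the empty string *)
fun nullable Zero = false
  | nullable One = true
  | nullable (Chr _) = false
  | nullable (Sum (r, s)) = nullable r orelse nullable s
  | nullable (Prod (r, s)) = nullable r andalso nullable s
  | nullable (Star _) = true

(* deriv c r matches w exactly when r matches c followed by w *)
fun deriv _ Zero = Zero
  | deriv _ One = Zero
  | deriv c (Chr c') = if c = c' then One else Zero
  | deriv c (Sum (r, s)) = Sum (deriv c r, deriv c s)
  | deriv c (Prod (r, s)) =
      if nullable r
      then Sum (Prod (deriv c r, s), deriv c s)
      else Prod (deriv c r, s)
  | deriv c (Star r) = Prod (deriv c r, Star r)

(* matches r s: differentiate by each character of s in turn, then ask
   whether the residual expression accepts the empty string *)
fun matches r s = nullable (List.foldl (fn (c, r') => deriv c r') r (explode s))

val lhs = Star (Prod (Chr #"b", Chr #"a"))                                (* (ba)* *)
val rhs = Sum (One, Prod (Chr #"b", Prod (Star (Prod (Chr #"a", Chr #"b")), Chr #"a")))
                                                                          (* 1 + b(ab)*a *)
val agree = List.all (fn s => matches lhs s = matches rhs s)
              ["", "ba", "baba", "ab", "b", "a", "bbaa"]                  (* evaluates to true *)

(Of course, agreeing on finitely many test strings doesn’t prove the equivalence; the point is just to make the algebra feel tangible.)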

Functional programming: first-class continuations and contradictions

Now we move on to something slightly more abstract: first-class continuations and their relation to classical logic. I’ll use SML as my example language; in SML/NJ, first-class continuations are implemented in the Cont module:

type 'a cont = 'a SMLofNJ.Cont.cont                        (* a reified control stack expecting an 'a *)
val callcc : ('a cont -> 'a) -> 'a = SMLofNJ.Cont.callcc   (* capture the current continuation *)
val throw : 'a cont -> 'a -> 'b = SMLofNJ.Cont.throw       (* jump to a continuation; never returns normally *)

This is similar to the Scheme call/cc, if you’re familiar with that. The idea is that a continuation represents a reified control stack, allowing for nonlocal jumps in program execution, almost like a goto in imperative programming. An explanation of how this works is beyond the scope of this blog post; if you’ve taken a functional programming class before, you might want to read these old lecture notes or Matt Might’s blog post on the topic.
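
As a tiny illustration of that “nonlocal jump” flavor (this example is mine, using the bindings above, and is in no way a full explanation): callcc lets you bail out of the middle of a computation, much like raising an exception would.

(* Multiply a list of ints, but jump straight out with 0 as soon as a 0 is
   seen, abandoning the rest of the fold entirely. *)
fun product (xs : int list) : int =
  callcc (fn (exit : int cont) =>
    List.foldl (fn (x, acc) => if x = 0 then throw exit 0 else x * acc) 1 xs)

val six  = product [1, 2, 3]   (* = 6 *)
val zero = product [1, 0, 3]   (* = 0, via the jump *)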

One particularly beautiful thing: these continuations correspond precisely to contradictions in classical logic,9 in the sense of the Curry–Howard correspondence.10

Now ordinarily, a value of type $\tau$ is just a constructive proof of the corresponding proposition $t$.11 Here’s the thing: with constructive (or intuitionistic) logic, one does not have the law of the excluded middle, i.e. $\forall a.\, a \vee \neg a$. Of course, such a thing is absolutely permissible in classical logic; Aristotle is said to be the first one to come up with this rule. But in constructive logic, we demand that everything come with a proof that tells us how to construct it. If I claim that $A \vee B$, I need to verify this claim by showing you either an $A$ or a $B$. It’s not enough for me to wave my hands and say, “well, one of them is true!”12

But here’s the catch: if we augment our programming language with first-class continuations (where the type $\tau\,\textsf{cont}$ corresponds13 to the contradiction $\neg t$), then we recover the law of the excluded middle! This can be shown by exhibiting a value lem of type $\tau + \tau\,\textsf{cont}$,14 representing the proposition $\forall t.\, t \vee \neg t$.

datatype ('a, 'b) sum = Left of 'a | Right of 'b

(* We wrap the ['a + 'a cont] in a thunk to circumvent the value
   restriction, but this is not essential to our example. *)
val lem : unit -> ('a, 'a cont) sum =
  fn () =>
    callcc (fn ret => Left (callcc (fn k => throw ret (Right k))))

There’s actually an amazingly intuitive interpretation of this construction:

  1. If you ask me for a $\tau + \tau\,\textsf{cont}$, I just lie to you and say I have a $\tau\,\textsf{cont}$.
  2. Now you might never call my bluff, in which case I’m in the clear. However, if you do challenge me, the only way you can do it is by throwing a $\tau$ to my continuation, in which case I backtrack to where I lied to you and change my mind, giving you back the $\tau$ that you just gave me.
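
To see this bluffing-and-backtracking happen in code, here is a small made-up client of lem, instantiated at int:

(* Ask lem for an [int + int cont]. The first time through we get Right k
   (the "lie"); calling the bluff by throwing 42 to k rewinds execution, so
   the same case expression re-runs with Left 42 and the result is 42. *)
fun demo () : int =
  case (lem () : (int, int cont) sum) of
      Left n  => n
    | Right k => throw k 42

val forty_two = demo ()   (* = 42 *)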

This accords very well with the idea of a continuation as a form of nonlocal control flow. I call this “intuitive,” but the idea actually took me several years to grasp. I first came upon the idea of continuations as a freshman in 15-150 Principles of Functional Programming. I then learned of the connection with classical logic in 98-317 Hype for Types, but it wasn’t until I took 15-312 Foundations of Programming Languages as a senior that it really clicked for me.

The idea of continuations also pops up in quite a few other places, albeit in slightly different forms.

Retrospective

This list is certainly not exhaustive, and there are many worthwhile things that I learned that aren’t so easily distilled into bloggable bites. I do find it interesting that this list somewhat captures my progression from “theory A” (algorithms and complexity) to “theory B” (logic and languages). I may revisit this topic to write a part 2 if I think of more things to write. In general, I’m becoming more and more convinced of the value of posts that summarize what I’ve learned rather than trying to present dazzling new insights that others haven’t thought of before.


  1. Notice that if $G$ is an undirected graph, then $A$ is a real, symmetric matrix, and so the spectral theorem guarantees the existence of these eigenvalues.↩︎

  2. Actually, most of these results are more cleanly stated using the Laplacian rather than the adjacency matrix, but this is less frequently used as a real graph representation in programs.↩︎

  3. I say the stationary distribution because various ergodicity theorems provide sufficient conditions for its uniqueness. In fact, such results can be extended to infinite Markov chains, with some care.↩︎

  4. For the uninitiated, this is a widely-memed quote from Mac Lane’s textbook Categories for the Working Mathematician. The original quote reads: “All told, a monad in $X$ is just a monoid in the category of endofunctors of $X$, with product $\times$ replaced by composition of endofunctors and unit set by the identity endofunctor.”↩︎

  5. For those at CMU, this is the syntax used in 15-150. Generally speaking, “weird things” done by 15-150 staff are done that way because they point to something deeper, and it’s often quite satisfying to finally realize why something was initially presented one way. Also, if you’re interested in an algebraic treatment of computation more generally, be sure to check out 15-354 Computational Discrete Math; I hear that next fall might be the last offering.↩︎

  6. We say it’s idempotent because $\llbracket r + r \rrbracket = \llbracket r \rrbracket$ for any regular expression $r$.↩︎

  7. So-called because $\llbracket r^\star \rrbracket$ is the least set containing $\llbracket r \rrbracket$ and $\varepsilon$ which is closed under string concatenation.↩︎

  8. Note that these are also referred to as “left quotients,” which confusingly can be viewed as “right actions.” Nomenclature doesn’t always make sense…↩︎

  9. In this view, the name $\tau\,\textsf{cont}$ is a bit of a pun between continuation and contradiction.↩︎

  10. This is sometimes called the propositions-as-types correspondence. There’s also a deep connection between types and categories, giving rise to the notion of a computational Trinity. In fact, I would say that the beautiful correspondence between types, propositions, and categories is the strongest argument for functional programming.↩︎

  11. One does have to be slightly careful with divergence.↩︎

  12. An example of such a proof is the following classical argument showing that “an irrational to an irrational power may be rational.” Either $\sqrt{2}^{\sqrt{2}}$ is rational or it’s not. In the former case, we are done. In the latter case, note $(\sqrt{2}^{\sqrt{2}})^{\sqrt{2}} = 2$.↩︎

  13. Actually, we should be a bit careful with this. Another way to construct a “contradiction” is the type $\tau \to \textsf{void}$, where $\textsf{void}$ is the type inhabited by no values. Intuitively, a function of type $\tau \to \textsf{void}$ would be able to produce a value of type $\textsf{void}$ if you ever gave it a value of type $\tau$, but there don’t exist values of type $\textsf{void}$! This is slightly different from the formulation using continuations.↩︎

  14. In SML, there is a slight inconvenience known as the value restriction, which determines which expressions may be polymorphic at the top level. In particular, only syntactic values are allowed, to avoid bad interactions with mutable references. We thus actually prove the proposition $\forall t.\, \top \implies (t \vee \neg t)$, where $\top$ is trivial truth and corresponds to the unit type. But this doesn’t really matter for the purposes of this post.↩︎