Types Aren’t Sets

(Fair disclaimer: I took a couple of undergraduate classes in programming language semantics, category theory, and such, but I by no means consider myself to be an expert on type theory. What follows are my own musings and should be taken cum grano salis.)

One intuitive interpretation of a type t is that it represents a set inhabited by some values. This seems perfectly sensible for many commonly-encountered types; to hash this out more formally, let’s say that t\llbracket t \rrbracket is the set of values inhabiting the type t. So for some common OCaml types, for instance:

We can even extend this interpretation very naturally to (monomorphic, finite) algebraic data types:

To work out an example, consider the following type t.

type t =
  | A of unit * bool
  | B of (bool -> unit)

We have t={(A,((),true)),(A,((),false)),(B,{(true,()),(false,())})}\llbracket t \rrbracket = \{ (A, ((), \textsf{true})), (A, ((), \textsf{false})), (B, \{ (\textsf{true}, ()), (\textsf{false}, ()) \}) \}. One neat application of the above is that for any type t, we can alternatively represent the type t -> bool as the power set P(t)\mathcal{P}(\llbracket t \rrbracket), since tbool=boolt\llbracket t \to \textsf{bool} \rrbracket = \llbracket \textsf{bool} \rrbracket^{\llbracket t \rrbracket}, and an indicator function f:A{0,1}f : A \to \{0,1\} identifies a particular subset of AA.

In fact, this set-theoretic “denotational” semantics of types is slightly more powerful than advertised: it can handle some rather rudimentary infinite types as well, such as the classic recursively-defined natural numbers nat.

type nat =
  | Zero of unit
  | Succ of nat

We just have nat={(Zero,()),(Succ,(Zero,())),(Succ,(Succ,(Zero,()))),}\llbracket \textsf{nat} \rrbracket = \{ (\textsf{Zero}, ()), (\textsf{Succ}, (\textsf{Zero}, ())), (\textsf{Succ}, (\textsf{Succ}, (\textsf{Zero}, ()))), \dots \}. Now if we want to be really careful, we do need to redefine the semantics of variant types here, because we’re not allowed to define a set in terms of itself. It would be silly to define nat=unit+nat\llbracket \textsf{nat} \rrbracket = \llbracket \textsf{unit} \rrbracket + \llbracket \textsf{nat} \rrbracket, since this is a circular definition! We should instead think of nat\llbracket \textsf{nat} \rrbracket as the least fixed point of the equation N=unit+NN = \llbracket \textsf{unit} \rrbracket + N. This can be done without too much difficulty as the limit of a chain of approximations:

Finally, we let nat=kNnatk. \llbracket \textsf{nat} \rrbracket = \bigcup_{k \in \mathbb{N}} \llbracket \textsf{nat} \rrbracket_k. Notice that there’s a very obvious bijection between nat\llbracket \textsf{nat} \rrbracket and N\mathbb{N}, the good ol’ natural numbers. See, no problems thus far!

Type Isomorphisms

Now let us recall the notion of a type isomorphism. We say that two types t and t' are isomorphic (written ttt \simeq t'), if there exist two functions f : t -> t' and f' : t' -> t such that ff=idtf \circ f' = \textsf{id}_{t'} and ff=idtf' \circ f = \textsf{id}_{t}. That is, f and f' are “inverses” of each other.4

To give an example, bool×boolboolbool\textsf{bool} \times \textsf{bool} \simeq \textsf{bool} \to \textsf{bool}, witnessed by the following conversion functions:

let f : (bool * bool) -> (bool -> bool) =
  fun (t, f) ->
    function | true -> t | false -> f

let f' : (bool -> bool) -> (bool * bool) =
  fun f ->
    (f true, f false)

There are tons of interesting isomorphisms that can be proven; for instance, it’s not too hard to show that unittunit\textsf{unit} \simeq t \to \textsf{unit} for any type t, which is to say that there is exactly one function of type t -> unit, namely fun _ -> ().5

One notable point: under the set-theoretic semantics \llbracket \cdot \rrbracket, there’s a natural interpretation of the isomorphism ttt \simeq t' as a bijection between t\llbracket t \rrbracket and t\llbracket t' \rrbracket. In particular, bijections are the way set theorists compare the sizes of sets; we should have t=t|\llbracket t \rrbracket| = |\llbracket t' \rrbracket| whenever ttt \simeq t'.

Trouble with Cardinalities

Now that we’ve proposed a seemingly-reasonable model for thinking about types, let’s proceed to poke a hole it in. Consider the following mysterious type cantor:

type cantor = T of cantor -> bool

Now this is not a bogus type; you can actually construct values of type cantor, like T (fun _ -> true). But it is a tad bit fishy. There is, of course, the trivial isomorphism cantorcantorbool\textsf{cantor} \simeq \textsf{cantor} \to \textsf{bool}, witnessed by

let f : cantor -> (cantor -> bool) =
  fun (T x) -> x

let f' : (cantor -> bool) -> cantor =
  fun x -> T x

This is to say that there is a very natural bijection between cantor\llbracket \textsf{cantor} \rrbracket and cantorbool\llbracket \textsf{cantor} \to \textsf{bool} \rrbracket. But recall that cantorbool\llbracket \textsf{cantor} \to \textsf{bool} \rrbracket is likewise in bijection with P(cantor)\mathcal{P}(\llbracket \textsf{cantor} \rrbracket). This then implies that cantor=boolcantor=P(cantor), |\llbracket \textsf{cantor} \rrbracket| = \left\lvert \llbracket \textsf{bool} \rrbracket^{\llbracket \textsf{cantor} \rrbracket} \right\rvert = | \mathcal{P}(\llbracket \textsf{cantor} \rrbracket)|, and we have a real problem on our hands: Cantor once famously proved that no set can have cardinality equal to its own power set, in a theorem now named after him! Our seemingly-reasonable set-theoretic interpretation of types has gone astray and given us an answer that we know cannot be correct. I think the punchline here is that, counter-intuitively, sets and functions are not a good way to model the semantics of the pared-down version of OCaml that we’re considering in this post,6 because they can’t quite capture recursive types. In fact, mathematicians studying the categorical semantics of such languages often do so using domains and Scott-continuous functions, rather than sets and ordinary functions.

In fact, there is another (somewhat subtle) cardinality issue with the model that we’ve been discussing so far. For this, we don’t even need the tricky cantor example; consider the following type real:

type real = nat -> bool

This is not a recursive type, though of course it is defined in terms of a recursive type, namely nat. Yet there is still a bit of a problem here. Our set-theoretic semantics of types would tell us that we have the following bijections: real=P(nat)=R, |\llbracket \textsf{real} \rrbracket| = |\mathcal{P}(\llbracket \textsf{nat} \rrbracket)| = |\mathbb{R}|, which is perhaps what we’d naively expect. But then something weird is going on: there cannot possibly be uncountably many values of type real, because there are only countably many OCaml programs! (The set of all strings over a finite alphabet is enumerable, after all.) This is to say that the majority of mathematical functions natbool\llbracket \textsf{nat} \rrbracket \to \llbracket \textsf{bool} \rrbracket are not representable as values in our programming language.

Aside: Cantor Isn’t Real, He Can’t Hurt You

As an aside, let’s leave aside the semantics for a moment and go back to type isomorphisms, which we didn’t need \llbracket \cdot \rrbracket to define. There’s a somewhat cute way of showing that cantor≄real\textsf{cantor} \not\simeq \textsf{real} by making use of what is, in my opinion, one of the most surprising facts about types.

Suppose for a moment that cantorreal\textsf{cantor} \simeq \textsf{real}. Then realrealbool\textsf{real} \simeq \textsf{real} \to \textsf{bool} (exercise for the reader). However, this cannot be true, for the reason that \simeq preserves the decidability of extensional equivalence (another exercise for the reader), yet:

In general, I love finding silly uses of various mathematical results; one of my favorite posts that I’ve ever written is on using Fermat’s last theorem to solve a problem in an undergraduate theory class.

  1. Due to an obscure optimization, OCaml integers are actually 63-bit. This is so that you can store nn as 2n+12n + 1, so as to distinguish integers from pointers (which must always be even numbers due to memory alignment requirements, at least on most modern systems).↩︎

  2. For those unfamiliar with the term, this is sometimes called the tagged union (or the “coproduct in the category of sets,” if you want to sound fancy) A+B={(0,a)aA}{(1,b)bB}A + B = \{(0,a) \mid a \in A \} \cup \{(1,b) \mid b \in B\}. Of course, we can replace the labels 00 and 11 with anything we like, such as the constructor names of the variant in question.↩︎

  3. Here, I used the notation BA={ff:AB}B^A = \{ f \mid f : A \to B \}. The “exponential” notation is somewhat suggestive.↩︎

  4. There is a sneaky thing I did here, which is to slip in the assumption that the notion of “equality” between values makes sense. But this is not necessarily a given: what does it mean for two functions to be equal, for instance? In this case, the natural choice is to restrict ourselves to pure, total functions and to say that by equality, we really mean extensional equivalence. Note that it is not the case that equality between values is decidable on all types!↩︎

  5. From a categorical perspective, this is to say that unit is final in the category of types.↩︎

  6. This pared-down OCaml is very similar in spirit to PCF, which is essentially the simply-typed lambda calculus enriched with recursion.↩︎

  7. See this guest post by Martín Escardó on Andrej Bauer’s blog for details.↩︎