Types Aren’t Sets

(Fair disclaimer: I took a couple of undergraduate classes in programming language semantics, category theory, and such, but I by no means consider myself to be an expert on type theory. What follows are my own musings and should be taken cum grano salis.)

One intuitive interpretation of a type t is that it represents a set inhabited by some values. This seems perfectly sensible for many commonly-encountered types; to hash this out more formally, let’s say that

\llbracket t \rrbracket

is the set of values inhabiting the type t. So for some common OCaml types, for instance:

We can even extend this interpretation very naturally to (monomorphic, finite) algebraic data types:

We have

\llbracket t \rrbracket = \{ (A, ((), \textsf{true})), (A, ((), \textsf{false})), (B, \{ (\textsf{true}, ()), (\textsf{false}, ()) \}) \}

. One neat application of the above is that for any type t, we can alternatively represent the type t -> bool as the power set

\mathcal{P}(\llbracket t \rrbracket)

, since

\llbracket t \to \textsf{bool} \rrbracket = \llbracket \textsf{bool} \rrbracket^{\llbracket t \rrbracket}

, and an indicator function

f : A \to \{0,1\}

identifies a particular subset of

A

In fact, this set-theoretic “denotational” semantics of types is slightly more powerful than advertised: it can handle some rather rudimentary infinite types as well, such as the classic recursively-defined natural numbers nat.

We just have

\llbracket \textsf{nat} \rrbracket = \{ (\textsf{Zero}, ()), (\textsf{Succ}, (\textsf{Zero}, ())), (\textsf{Succ}, (\textsf{Succ}, (\textsf{Zero}, ()))), \dots \}

. Now if we want to be really careful, we do need to redefine the semantics of variant types here, because we’re not allowed to define a set in terms of itself. It would be silly to define

\llbracket \textsf{nat} \rrbracket = \llbracket \textsf{unit} \rrbracket + \llbracket \textsf{nat} \rrbracket

, since this is a circular definition! We should instead think of

\llbracket \textsf{nat} \rrbracket

as the least fixed point of the equation

N = \llbracket \textsf{unit} \rrbracket + N

. This can be done without too much difficulty as the limit of a chain of approximations:

Finally, we let

\llbracket \textsf{nat} \rrbracket = \bigcup_{k \in \mathbb{N}} \llbracket \textsf{nat} \rrbracket_k.

Notice that there’s a very obvious bijection between

\llbracket \textsf{nat} \rrbracket

and

\mathbb{N}

, the good ol’ natural numbers. See, no problems thus far!

Type Isomorphisms

Now let us recall the notion of a type isomorphism. We say that two types t and t' are isomorphic (written

t \simeq t'

), if there exist two functions f : t -> t' and f' : t' -> t such that

f \circ f' = \textsf{id}_{t'}

and

f' \circ f = \textsf{id}_{t}

. That is, f and f' are “inverses” of each other.⁴

To give an example,

\textsf{bool} \times \textsf{bool} \simeq \textsf{bool} \to \textsf{bool}

, witnessed by the following conversion functions:

There are tons of interesting isomorphisms that can be proven; for instance, it’s not too hard to show that

\textsf{unit} \simeq t \to \textsf{unit}

for any type t, which is to say that there is exactly one function of type t -> unit, namely fun _ -> ().⁵

One notable point: under the set-theoretic semantics

\llbracket \cdot \rrbracket

, there’s a natural interpretation of the isomorphism

t \simeq t'

as a bijection between

\llbracket t \rrbracket

and

\llbracket t' \rrbracket

. In particular, bijections are the way set theorists compare the sizes of sets; we should have

|\llbracket t \rrbracket| = |\llbracket t' \rrbracket|

whenever

t \simeq t'

Trouble with Cardinalities

Now that we’ve proposed a seemingly-reasonable model for thinking about types, let’s proceed to poke a hole it in. Consider the following mysterious type cantor:

Now this is not a bogus type; you can actually construct values of type cantor, like T (fun _ -> true). But it is a tad bit fishy. There is, of course, the trivial isomorphism

\textsf{cantor} \simeq \textsf{cantor} \to \textsf{bool}

, witnessed by

This is to say that there is a very natural bijection between

\llbracket \textsf{cantor} \rrbracket

and

\llbracket \textsf{cantor} \to \textsf{bool} \rrbracket

. But recall that

\llbracket \textsf{cantor} \to \textsf{bool} \rrbracket

is likewise in bijection with

\mathcal{P}(\llbracket \textsf{cantor} \rrbracket)

. This then implies that

|\llbracket \textsf{cantor} \rrbracket| = \left\lvert \llbracket \textsf{bool} \rrbracket^{\llbracket \textsf{cantor} \rrbracket} \right\rvert = | \mathcal{P}(\llbracket \textsf{cantor} \rrbracket)|,

and we have a real problem on our hands: Cantor once famously proved that no set can have cardinality equal to its own power set, in a theorem now named after him! Our seemingly-reasonable set-theoretic interpretation of types has gone astray and given us an answer that we know cannot be correct. I think the punchline here is that, counter-intuitively, sets and functions are not a good way to model the semantics of the pared-down version of OCaml that we’re considering in this post,⁶ because they can’t quite capture recursive types. In fact, mathematicians studying the categorical semantics of such languages often do so using domains and Scott-continuous functions, rather than sets and ordinary functions.

In fact, there is another (somewhat subtle) cardinality issue with the model that we’ve been discussing so far. For this, we don’t even need the tricky cantor example; consider the following type real:

This is not a recursive type, though of course it is defined in terms of a recursive type, namely nat. Yet there is still a bit of a problem here. Our set-theoretic semantics of types would tell us that we have the following bijections:

|\llbracket \textsf{real} \rrbracket| = |\mathcal{P}(\llbracket \textsf{nat} \rrbracket)| = |\mathbb{R}|,

which is perhaps what we’d naively expect. But then something weird is going on: there cannot possibly be uncountably many values of type real, because there are only countably many OCaml programs! (The set of all strings over a finite alphabet is enumerable, after all.) This is to say that the majority of mathematical functions

\llbracket \textsf{nat} \rrbracket \to \llbracket \textsf{bool} \rrbracket

are not representable as values in our programming language.

Aside: Cantor Isn’t Real, He Can’t Hurt You

As an aside, let’s leave aside the semantics for a moment and go back to type isomorphisms, which we didn’t need

\llbracket \cdot \rrbracket

to define. There’s a somewhat cute way of showing that

\textsf{cantor} \not\simeq \textsf{real}

by making use of what is, in my opinion, one of the most surprising facts about types.

Suppose for a moment that

\textsf{cantor} \simeq \textsf{real}

. Then

\textsf{real} \simeq \textsf{real} \to \textsf{bool}

(exercise for the reader). However, this cannot be true, for the reason that

\simeq

preserves the decidability of extensional equivalence (another exercise for the reader), yet:

In general, I love finding silly uses of various mathematical results; one of my favorite posts that I’ve ever written is on using Fermat’s last theorem to solve a problem in an undergraduate theory class.

Due to an obscure optimization, OCaml integers are actually 63-bit. This is so that you can store $n$ as $2n + 1$ , so as to distinguish integers from pointers (which must always be even numbers due to memory alignment requirements, at least on most modern systems).↩︎
For those unfamiliar with the term, this is sometimes called the tagged union (or the “coproduct in the category of sets,” if you want to sound fancy) $A + B = \{(0,a) \mid a \in A \} \cup \{(1,b) \mid b \in B\}$ . Of course, we can replace the labels $0$ and $1$ with anything we like, such as the constructor names of the variant in question.↩︎
Here, I used the notation $B^A = \{ f \mid f : A \to B \}$ . The “exponential” notation is somewhat suggestive.↩︎
There is a sneaky thing I did here, which is to slip in the assumption that the notion of “equality” between values makes sense. But this is not necessarily a given: what does it mean for two functions to be equal, for instance? In this case, the natural choice is to restrict ourselves to pure, total functions and to say that by equality, we really mean extensional equivalence. Note that it is not the case that equality between values is decidable on all types!↩︎
From a categorical perspective, this is to say that unit is final in the category of types.↩︎
This pared-down OCaml is very similar in spirit to PCF, which is essentially the simply-typed lambda calculus enriched with recursion.↩︎
See this guest post by Martín Escardó on Andrej Bauer’s blog for details.↩︎

Types Aren’t Sets October 29, 2023

Type Isomorphisms

Trouble with Cardinalities

Aside: Cantor Isn’t Real, He Can’t Hurt You

Comments

Submit a comment