
hasochistic containers (a first attempt)

June 6, 2015

There was a bit of chat on Twitter about polynomials and strictly positive functors, which provoked me to think a bit about how much of the theory of containers (in the sense of Abbott, Altenkirch and Ghani) I could cook up in modern day Haskell. It turns out that we can make a plausible stab at the basics, but the wheel falls off when we try to get to more advanced things.

What is a container?

Informally, a container is a functor with a “shapes and positions” presentation: the contained values are given as the image of a function from positions, but the type of positions depends on the choice of the shape. Finite lists of things, for example, can be seen as functions from an n-element set to things, once you’ve chosen the shape n, otherwise known as the length of the list. If a functor f is a container, then its shape set will be isomorphic to f (), or what you get when you choose boring elements that just mark their position. It’s the dependency of the position set on the shape that makes the concept a little tricky to express in Haskell, but if we turn on {-# LANGUAGE KitchenSink #-}, we can get some way, at least.

I define a datatype whose only purpose is to pack up the type-level components of a container.

data (<|) (s :: i -> *) (p :: i -> *) = Dull

where the existential i is the type-level version of shapes, implicitly chosen in each value. Now, s gives the value-level presentation of the type-level shape: it could be a singleton type, giving no more or less information than the choice of i, but it’s ok if some type-level shapes are not represented, or if shapes contain some extra information upon which positions do not depend. What’s important is that indexing enforces compatibility between the shape choice (made by the producer of the container) and the position choices (made by the consumer of the container when elements get projected) in p.

It’s a bit of a hack. I’d like to pack up those pieces as a kind of containers, but I can’t do it yet, because new kinds without promotion hasn’t happened yet. I’ll have to work with types of kind * which happen to be given by <|, saying what components specify a container. Let us now say which container is thus specified.

data family Con (c :: *) :: * -> *
data instance Con (s <| p) x = forall i. s i :<: (p i -> x)

It’s not hard to see why these things are functors. If the container’s element-projector gives one sort of thing, you can make them another sort of thing by postcomposing a one-to-another function.

instance Functor (Con (s <| p)) where
  fmap h (s :<: e) = s :<: (h . e)

Given that fmap acts by composition, it’s easy to see that it respects identity and composition.

Pause a moment and think what Con (s <| p) is giving you. Informally, we get ∃i. (s i) × x^(p i), writing the GADT’s lurking existential explicitly and writing the function type in exponential notation. Reading ∃ as summation, shapes as coefficients and positions as exponents, we see that containers are just power series, generalized to sum over some kind i of type-level things. Polynomials are just boring power series.

Nat, Fin and the ListC container

Let’s just make sure of the list example. We’ll need natural numbers and their singletons to make the shapes…

data Nat = Z | S Nat
data Natty :: Nat -> * where
  Zy :: Natty Z
  Sy :: Natty n -> Natty (S n)

…and the finite set family to make the positions.

data Fin :: Nat -> * where
  Fz :: Fin (S n)
  Fs :: Fin n -> Fin (S n)

The idea is that Fin n is a type with n values. A function in Fin n -> x is like an n-tuple of x-values. Think of it as a flat way of writing the exponential x^n. We have an empty tuple

void :: Fin Z -> x
void z = z `seq` error "so sue me!"

and a way to grow tuples

($:) :: x -> (Fin n -> x) -> Fin (S n) -> x
(x $: xs) Fz      = x
(x $: xs) (Fs n)  = xs n

And now we’re ready to go with our list container.

type ListC = Natty <| Fin
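
For instance (a small example of mine, not from the post), the string "abc" becomes a container whose shape is its length and whose element function tabulates the characters:

eg :: Con ListC Char
eg = Sy (Sy (Sy Zy)) :<: ('a' $: ('b' $: ('c' $: void)))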

Let’s show how to get between these lists and traditional lists. When I’m working at the functor level, I like to be explicit about constructing natural transformations.

type f :-> g = forall x. f x -> g x

Now we can define, recursively,

listIsoOut :: Con ListC :-> []
listIsoOut (Zy   :<: _) = []
listIsoOut (Sy n :<: e) = e Fz : listIsoOut (n :<: (e . Fs))

If the length is zero, the list must be empty. Otherwise, separate the element in position 0 from the function which gives all the elements in positive positions. To go the other way, give a fold which makes use of our functions-as-tuples kit.

listIsoIn :: [] :-> Con ListC
listIsoIn = foldr cons nil where
  nil               = Zy   :<: void
  cons x (n :<: e)  = Sy n :<: (x $: e)

Container Morphisms

A polymorphic function between containers has to work with an arbitrary element type, so there’s nowhere the output container can get its elements from except the input container. What can such a function do? Firstly, it can look at the input shape in order to choose the output shape; secondly, it should say where in the input container each output position will find its element. We obtain a representation of these polymorphic functions in terms of shapes and positions, without trading in elements at all.

data family Morph (f :: *) (g :: *) :: *
data instance Morph (s <| p) (s' <| p')
  = Morph (forall i. s i -> Con (s' <| p') (p i))

That is, each input shape maps to an output container whose elements are input positions, like a kind of plan for how to build some output given some input. To deploy such a morphism, we need only map input positions to input elements.

($<$) :: Morph (s <| p) (s' <| p') ->
         Con (s <| p) :-> Con (s' <| p')
Morph m $<$ (s :<: e) = fmap e (m s)

The representation theorem for container morphisms asserts that the polymorphic functions between containers are given exactly by the container morphisms. That is, the above has an inverse.

morph :: (Con (s <| p) :-> Con (s' <| p')) -> Morph (s <| p) (s' <| p')
morph f = Morph $ \ s -> f (s :<: id)

Note that if s :: s i, then s :<: id :: Con (s <| p) (p i) is the container storing in every position exactly that position. You can check…

  morph f $<$ (s :<: e)
= {- definition -}
  fmap e ((\ s -> f (s :<: id)) s)
= {- definition -}
  fmap e (f (s :<: id))
= {- naturality -}
  f (fmap e (s :<: id))
= {- definition -}
  f (s :<: (e . id))
= {- right identity -}
  f (s :<: e)

…and…

  morph (Morph m $<$)
= {- definition -}
  Morph $ \ s -> Morph m $<$ (s :<: id)
= {- definition -}
  Morph $ \ s -> fmap id (m s)
= {- functor preserves identity -}
  Morph $ \ s -> m s
= {- eta contraction -}
  Morph m

…or you can deploy the Yoneda lemma.

  (s <| p) :-> (s' <| p')
= {- type synonym -}
  forall x. (s <| p) x -> (s' <| p') x
~= {- data definition -}
  forall x. (exists i. (s i, p i -> x)) -> (s' <| p') x
~= {- curry -}
  forall x. forall i. s i -> (p i -> x) -> (s' <| p') x
~= {- reorder arguments -}
  forall i. s i -> forall x. (p i -> x) -> (s' <| p') x
~= {- Yoneda -}
  forall i. s i -> (s' <| p') (p i)
= {- data family -}
  Morph (s <| p) (s' <| p')

It’s a fun exercise to show that reverse can be expressed as a Morph ListC ListC without going via the representation theorem.
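
Here is one way that might go (a sketch of mine, not necessarily the post’s intended solution; flipFin, top and emb are hypothetical helpers): keep the length, and let each output position fetch its element from the mirrored input position.

revC :: Morph ListC ListC
revC = Morph $ \ n -> n :<: flipFin n

-- flipFin n i is the mirror image of i, i.e. (n - 1) - i.
flipFin :: Natty n -> Fin n -> Fin n
flipFin Zy     i       = void i
flipFin (Sy n) Fz      = top n              -- position 0 goes to the far end
flipFin (Sy n) (Fs i)  = emb (flipFin n i)  -- later positions mirror one step earlier

-- top n is the last position in Fin (S n).
top :: Natty n -> Fin (S n)
top Zy      = Fz
top (Sy n)  = Fs (top n)

-- emb embeds Fin n into Fin (S n), preserving the numeric value.
emb :: Fin n -> Fin (S n)
emb Fz      = Fz
emb (Fs i)  = Fs (emb i)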

Closure Under the Polynomial Kit

We can define the kit of polynomial functor constructors as follows.

newtype I         x = I {unI :: x}
newtype K a       x = K {unK :: a}
newtype (:+:) f g x = Sum {muS :: Either (f x) (g x)}
newtype (:*:) f g x = Prod {dorP :: (f x , g x)}

They are Functor-preserving in the only sensible way.

instance Functor I where
  fmap h (I x) = I (h x)
instance Functor (K a) where
  fmap h (K a) = K a
instance (Functor f, Functor g) => Functor (f :+: g) where
  fmap h = Sum . either (Left . fmap h) (Right . fmap h) . muS
instance (Functor f, Functor g) => Functor (f :*: g) where
  fmap h = Prod . (fmap h *** fmap h) . dorP

But we can also show that containers are closed under the same operations.

For the identity, there is one shape and one position, so we need the unit singleton family.

data US :: () -> * where
  VV :: US '()
type IC = US <| US

Wrapping up an element in a container can happen in just one way.

iWrap :: x -> Con IC x
iWrap x = VV :<: const x

It is now easy to show that Con IC is isomorphic to I.

iIsoIn :: I :-> Con IC
iIsoIn (I x) = iWrap x

iIsoOut :: Con IC :-> I
iIsoOut (VV :<: e) = I (e VV)

For constant polynomials, there are no positions for elements, but there is useful information in the shape. Abbott, Altenkirch and Ghani take the shape type to be the constant and the position set to be everywhere empty. To follow suit, we’d need to use the singleton type for the constant, but that’s more Haskell work than necessary (unless you import Richard Eisenberg’s excellent library for that purpose). We can use the unit type () as the type-level shape and store the constant only at the value level.

data KS :: * -> () -> * where
  KS :: a -> KS a '()

Again, the position set must be empty.

data KP :: u -> * where

kapow :: KP u -> b
kapow z = z `seq` error "so sue me!"

type KC a = KS a <| KP

We can put an element of the constant type into its container.

kon :: a -> Con (KC a) x
kon a = KS a :<: kapow

We thus obtain the isomorphism.

kIsoIn :: K a :-> Con (KC a)
kIsoIn (K a) = kon a

kIsoOut :: Con (KC a) :-> K a
kIsoOut (KS a :<: _) = K a

For sums, you pick a branch of the sum and give a shape for that branch. The positions must then come from the same branch and fit with the shape. So we need the type-level shape information to be an Either and make value-level things consistent with the type-level choice. That’s a job for this GADT.

data Case :: (i -> *) -> (j -> *) -> (Either i j) -> * where
  LL :: ls i -> Case ls rs (Left i)
  RR :: rs j -> Case ls rs (Right j)

Now, the sum of containers is given by consistent choices of shape and position.

type family SumC c c' :: * where
  SumC (s <| p) (s' <| p') = Case s s' <| Case p p'

That is, the choice of value-level shape fixes the type-level shape, and then the positions have to follow suit. If you know which choice has been made at the type level, you can project safely.

unLL :: Case s s' (Left i) -> s i
unLL (LL s) = s
unRR :: Case s s' (Right j) -> s' j
unRR (RR s') = s'

In turn, that allows us to define the injections of the sum as container morphisms.

inlC :: Morph (s <| p) (SumC (s <| p) (s' <| p'))
inlC = Morph $ \ s -> LL s :<: unLL
inrC :: Morph (s' <| p') (SumC (s <| p) (s' <| p'))
inrC = Morph $ \ s' -> RR s' :<: unRR

Now we’re ready to show that the container sum is isomorphic to the functorial sum of the two containers.

sumIsoIn :: (Con (s <| p) :+: Con (s' <| p')) :-> Con (SumC (s <| p) (s' <| p'))
sumIsoIn = either (inlC $<$) (inrC $<$) . muS

sumIsoOut :: Con (SumC (s <| p) (s' <| p')) :-> (Con (s <| p) :+: Con (s' <| p'))
sumIsoOut (LL s  :<: e) = Sum (Left (s :<: (e . LL)))
sumIsoOut (RR s' :<: e) = Sum (Right (s' :<: (e . RR)))

Now, for products of containers, you need a pair of shapes, one for each component, so the type-level shape also needs to be a pair.

data ProdS :: (i -> *) -> (j -> *) -> (i, j) -> * where
  (:&:) :: ls i -> rs j -> ProdS ls rs '(i, j)

An element position in such a container is either on the left or on the right, and then you need to know the position within that component.

data ProdP :: (i -> *) -> (j -> *) -> (i, j) -> * where
  PP :: Either (lp i) (rp j) -> ProdP lp rp '(i , j)
unPP :: ProdP lp rp '(i , j) -> Either (lp i) (rp j)
unPP (PP e) = e

The product is then given by those pieces, and the projections are container morphisms.

type family ProdC c c' :: * where
  ProdC (s <| p) (s' <| p') = ProdS s s' <| ProdP p p'
outlC :: Morph (ProdC (s <| p) (s' <| p')) (s <| p)
outlC = Morph $ \ (s :&: _) -> s :<: (PP . Left)
outrC :: Morph (ProdC (s <| p) (s' <| p')) (s' <| p')
outrC = Morph $ \ (_ :&: s') -> s' :<: (PP . Right)

Pairing is implemented by either on positions.

pairC :: Con (s <| p) x -> Con (s' <| p') x -> Con (ProdC (s <| p) (s' <| p')) x
pairC (s :<: e) (s' :<: e') = (s :&: s') :<: (either e e' . unPP)

Again, we get an isomorphism with functorial products.

prodIsoIn :: (Con (s <| p) :*: Con (s' <| p')) :-> Con (ProdC (s <| p) (s' <| p'))
prodIsoIn (Prod (c, c')) = pairC c c'

prodIsoOut :: Con (ProdC (s <| p) (s' <| p')) :-> (Con (s <| p) :*: Con (s' <| p'))
prodIsoOut c = Prod (outlC $<$ c, outrC $<$ c)

So, the polynomials are, as expected, containers.

W-types

The least fixpoint of a container is what Per Martin-Löf calls a W-type.

newtype W c = In (Con c (W c))

Lots of our favourite datatypes are W-types. E.g., unlabelled binary trees:

 
type Tree = W (SumC (KC ()) (ProdC IC IC))

Define the constructors like this.

leaf :: Tree
leaf = In (inlC $<$ kon ())
node :: Tree -> Tree -> Tree
node l r = In (inrC $<$ pairC (iWrap l) (iWrap r))
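
By way of a test drive (a sketch of mine, not from the post), the generic fold for W-types lets us count the constructors in a Tree, using the isomorphisms above to take each layer apart.

-- needs FlexibleContexts for the Functor (Con c) constraint
foldW :: Functor (Con c) => (Con c x -> x) -> W c -> x
foldW alg (In t) = alg (fmap (foldW alg) t)

-- size counts every leaf and node in a Tree.
size :: Tree -> Int
size = foldW $ \ t -> case sumIsoOut t of
  Sum (Left _)   -> 1
  Sum (Right lr) -> case prodIsoOut lr of
    Prod (l, r)  -> 1 + unI (iIsoOut l) + unI (iIsoOut r)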

But there are functors which are not containers: the continuation monad is the classic example. In a container, the element type always stays to the right of the arrow; in (x -> r) -> r, it strays to the left of one. Some people like to classify the polarity of parameter occurrences in a type operator as “positive” or “negative”. A top level occurrence is positive. Sum and product preserve polarity. Function types preserve polarity in the target but flip polarity in the domain. A type operator whose parameter occurs only positively will be a covariant functor; if the parameter occurs only negatively, it will be a contravariant functor. A “strictly positive” occurrence is not merely positive: the (necessarily even) number of times its polarity has been flipped is zero, so it never sits to the left of an arrow at all. A type operator whose parameter occurs only strictly positively will be a container. Least fixpoints of functors have recursive “fold operators”, but least fixpoints of containers guarantee the existence of induction principles: the difference between the two matters when you’re dependently typed.
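
For the record (a standard example, not code from the post), here is the continuation functor: its parameter occurs positively, but to the left of an arrow, so it is not strictly positive and Cont r is no container.

newtype Cont r x = Cont {runCont :: (x -> r) -> r}

instance Functor (Cont r) where
  fmap h (Cont k) = Cont $ \ c -> k (c . h)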

Hancock’s Tensor

Here’s an operation you can define on containers, but not on Haskell functors more generally. Peter Hancock defines the tensor of two containers thus

type family TensorC c c' :: * where
  TensorC (s <| p) (s' <| p') = ProdS s s' <| ProdS p p'

It’s a bit like a product, in that shapes pair up, but when we look at the positions, we don’t make a choice, we pick a pair. Think of the two components as coordinates in some sort of grid. Indeed, consider what TensorC ListC ListC might be. It’s the container which gives you the type of rectangular matrices: “lists of lists-all-the-same-length”.
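
Concretely (an illustration of mine):

type MatrixC = TensorC ListC ListC
-- a shape is a pair of lengths; a position is a pair of coordinates, so
-- every row necessarily has the same length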

Roland Backhouse wrote a paper a while back deriving properties of natural transformations on “F-structures of G-structures-all-the-same-shape”, but he couldn’t give a direct mathematical translation of that idea as an operation on functors, only by restricting the composition F.G to the unraggedy case. Hancock’s tensor gives us exactly that notion for containers.

You can degenerate tensor into functor composition…

newtype (f :.: g) x = C {unC :: f (g x)}

layers :: Con (TensorC (s <| p) (s' <| p')) :-> (Con (s <| p) :.: Con (s' <| p'))
layers ((s :&: s') :<: e) = C (s :<: \ p -> s' :<: \ p' -> e (p :&: p'))

…but you don’t have to do it that way around, because you can transpose a tensor, thanks to its regularity:

xpose :: Morph (TensorC (s <| p) (s' <| p')) (TensorC (s' <| p') (s <| p))
xpose = Morph $ \ (s :&: s') -> (s' :&: s) :<: \ (p' :&: p) -> (p :&: p')

Fans of free monads may enjoy thinking of them as the least fixpoint of the functorial equation

Free f = I :+: (f :.: Free f)

If f is a container Con (s <| p), you can think of s as describing the commands you can issue and p as the responses appropriate to a given command. The free monad thus represents an interactive mode session where at each step you decide whether to stop and report your result or to issue another command, then continue with your session once you have the response.
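
As a concrete rendering of that reading (a sketch of mine; the names are not from the post), the free monad over a container might look like this:

data Interact c x
  = Done x                      -- stop and report a result
  | Ask (Con c (Interact c x))  -- issue a command (a shape), then continue,
                                -- knowing the response (a position)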

What’s not so well known is that the free applicative is given exactly by replacing composition with tensor. The free applicative gives you a batch mode session, where your commands are like a deck of punch cards: the sequence is fixed in advance, and you report your result once you have collected your lineprinter output, consisting of all the responses to the commands.

Container Composition?

We have tensor for containers, but what about composition? Abbott, Altenkirch and Ghani have no difficulty defining it. The shape of a composite container is given exactly by an “outer” container whose elements are “inner” shapes. That way, we know the shape of the outer structure, and also the shape of each inner structure sitting at a given position in the outer structure. A composite position is a dependent pair: we have to find our way to an inner element, so we first pick an outer position, where we will find an inner structure (whose shape we know), and then we pick an inner position in that structure.

So now, we’re Haskelly stuffed. We need to promote Con itself (functions inside!). And we need its singletons. GHC stops playing.

How will the situation look when we have Π-types (eliminating the need for singletons) and the ability to promote GADTs? I don’t know. We’ll still need some higher-order functions at the type level.

Winding Up

Containers are an abstraction of a particularly well behaved class of functors, characterized in a way which is very flexible, but makes essential use of dependent types. They’re a rubbish representation of actual data, but they allow us to specify many generic operations in a parametric way. Rather than working by recursion over the sum-of-products structure of a datatype, we need only abstract over “shapes” and “positions”.

E.g., when positions have decidable equality, a container is (infinitely) differentiable (smooth?): you just use the usual rule for differentiating a power series, so that the shape of the derivative is a shape paired with a position for the “hole”, and the positions in the derivative are the positions apart from that of the hole. When you push that definition through our various formulae for sums and products, etc, the traditional rules of the calculus appear before your eyes.
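
As a tiny check (my sketch, leaning on the familiar result rather than deriving it here), the derivative of the list container is a pair of lists, the elements before the hole and the elements after it, which the kit above can already express:

type ListC' = ProdC ListC ListC   -- one-hole contexts for lists: prefix and suffix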

Similarly, a traversable container is one whose position sets are always finite, and hence linearly orderable. One way to achieve that is to factor positions through Fin: effectively, shape determines size, and you can swap out the functional storage of elements for a vector.
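
For instance (a sketch of mine, not from the post), the list container can be traversed by visiting its Fin positions in order, rebuilding the element function with ($:):

travListC :: Applicative g => (a -> g b) -> Con ListC a -> g (Con ListC b)
travListC h (n :<: e) = (n :<:) <$> travFun h n e

-- travFun acts on position 0 first, then on the rest, reassembling with ($:).
travFun :: Applicative g => (a -> g b) -> Natty n -> (Fin n -> a) -> g (Fin n -> b)
travFun _ Zy     _ = pure void
travFun h (Sy n) e = ($:) <$> h (e Fz) <*> travFun h n (e . Fs)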

I was quite surprised at how far I got turning the theory of containers into somewhat clunky Haskell, before the limits of our current dependently typed capabilities defeated me. I hope it’s been of some use in helping you see the shapes-and-positions structure of the data you’re used to.


One Herald Layout

May 16, 2015

Layout is a source of violent disagreement in programming languages. I’ve written about it before, in the context of Epigram. But now I’m even more overwhelmed than I was then, and I’m thinking about working on several languages, which makes me less inclined to dwell on their individual properties and more inclined to concentrate on what I need. I’m certainly not pitching to solve everybody’s layout problems once and for all: I’ll be lucky if I can even manage my own. Let’s try to boil the issues down.

Some lines are long. I grew up amongst the paraphernalia of the punchcard era, and for the most part, I used 80-column displays. To this day, when I’m hacking, I get uncomfortable if a line of code is longer than 78 characters, and I enjoy the way keeping my code narrow allows me to put more buffers of it on my screen. But however you play it, it’s far from odd to find that a logical line of code stretches wider than your window, so that it might be visually more helpful if it made more use of the vertical dimension. Indenting ‘continuation’ lines more than the ‘header’ line is a standard way to break the latter into pieces which fit.

Some lines are subordinate. Whether they are sublists of a list, or the equations of a locally defined function, or whatever, a textual construct sometimes requires a subordinate block of lines. It’s kind of usual to indent the lines which make up a subordinate block.

How do you tell whether an indented line is a continuation line or a header line within a subordinate block?

I’m trying to find a simple way to answer that question, and what I’m thinking is that I’d like a symbol which marks the end of ‘horizontal mode’, where indented lines continue the header, and the beginning of ‘vertical mode’, where indented lines (each in their own horizontal mode) belong to a subordinate block. My candidate for this symbol is -: just because it looks like a horizontal thing then some vertical things. I’m going to try to formulate sensible rules to identify the continuation and subordination structure.

An indentation level, or Dent, is an element of the set of natural numbers extended by bottom and top, with bottom < 0 < 1 < 2 < … < top. A j-Chunk is a line indented by exactly j (its header), together with the following lines indented by more than j. An i-Block is a possibly empty sequence of j-Chunks, each for some j > i. Within a given j-Chunk, each line is considered a continuation of the header until the first occurrence of -:, at which point the remainder of the j-Chunk is interpreted as a subordinated j-Block, with any text to the right of -: treated as a top-Line. A document is a bottom-Block.

And, er, that’s it. At least for the basic picture.

Higgledy piggledy
  boggle bump splat
Most of the post
  clusters close on the mat -:
  the phone bill
  the gas bill
  the lecce
  the junk
  the bags to dispose of
    old clothes from your trunk
The tide you divide
  to get into your flat
Will just gather dust
  if you leave it like that.

means

{Higgledy piggledy boggle bump splat; Most of the post clusters close on the mat {the phone bill; the gas bill; the lecce; the junk; the bags to dispose of old clothes from your trunk}; The tide you divide to get into your flat; Will just gather dust if you leave it like that.}
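
Here is a rough executable rendering of the basic grouping rule (a sketch of mine in Haskell, ignoring the -: herald): a chunk is a header line together with the more-indented lines beneath it.

type Line = (Int, String)             -- (indentation, content)

-- Group a block of lines into chunks: each chunk pairs a header with the
-- more-indented lines that follow it.
chunks :: [Line] -> [(Line, [Line])]
chunks []            = []
chunks ((i, s) : ls) = ((i, s), body) : chunks rest
  where (body, rest) = span (\ (j, _) -> j > i) ls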

(Actually, it might make sense to allow a matching :- to act as an ‘unlayout herald’. The idea is that a Block is a bunch of Chunks and a Chunk is a bunch of Components, and a Component is either a lexical token or a subordinated Block. If a -: has no matching :-, it’s a subordinated Block Component at the end of its enclosing Chunk; the matching :- indicates the end of the subordinated Block Component, after which the Chunk continues.)

By way of an afterthought, why not take Dent to be the integers extended by bottom and top? A line which looks like this (with at least 3 dashes and any amount of whitespace either side)

/--------/

shifts the indentation origin to the left by number-of-dashes-plus-2, thus increasing the indentation of the leftmost physical column by the corresponding amount. A line like

\--------\

shifts the origin the other way, and if you overdo it, the leftmost physical column will have negative indentation, but not as negative as bottom. That’s one way to keep your subordinates from drifting too far to the right.

model the world; view your data; control their chaos

April 9, 2015

We’re heading into the time of year where institutional data integration miseries make a mockery of academic productivity as we scrabble to assemble the outcome of a variety of assessment processes into something that might resemble the basis for a judgment.

I share a module with a colleague (at Strathclyde we use the word “class”, but that might become confusing, given what follows). I do a lot more online assessment than he currently does, so it suits me to key all my student data by username. My colleague keys all his assessment data by registration number. Our institution’s Virtual Learning Environment keys students differently again, for exercises involving anonymous marking. All of these keys are just strings. How do we achieve coherence? Laboriously.

My part of the module is chopped up into topics. Each topic has associated classroom-delivered paper tests and some online materials. The information about how students have performed in these various components is managed rather heterogeneously. There might be one file for each paper test. Meanwhile, students each have their own directory recording their interaction with online materials, with a subdirectory for each topic, containing files which relate to their performance in individual assessment items. Some of these files have formats for which I am to blame; other file formats I have thrust upon me. I need to be able to find out who did how well in what by when. I need logic.

And I’m asking myself what I usually ask myself when I need logic: ‘How much of the logic I need can I get from types?’. I’m fond of decidable typechecking, and of various kinds of type-directed program construction (which I much prefer to program-directed type construction). Can we have types for data which help us to audit, integrate and transform them in semantically sensible ways? That’s the kind of problem that we dependent type theorists ought to be able to get our teeth into. But these everyday spreadsheet-this, database-that, log-file-the-other data are really quite unlike the indexed inductive tree-like datatypes which we are used lovingly to be crafting. “Beautiful Abstract Syntax Trees Are Readily Definable” was one of the names we thought about calling Epigram, until we checked the acronym. Dependent type theory is not just sitting on a canned solution to these real world data problems, ready to deploy. Quite a lot of headscratching will be necessary.

‘What’s a good type for a spreadsheet?’ is a reasonable question. ‘What’s a good dependent type for a spreadsheet?’ is a better question. ‘Upon what might a dependent type for a spreadsheet depend, and how much would that really have to do with spreadsheets per se?’ is a question which might lead to an idea. When you have diverse files and online sources all contributing information to some larger resource, we need to establish a broader conceptual framework if we are to work accurately with the individual components. The spreadsheets, database records, forms, etc, are all views or lenses into a larger system which may never exist as a single bucket of bits, but which we might seek to model.

So what I’m looking for is a dependently typed language of metadata. We should be able write a model of the information which ought to exist, and we should be able to write views which describe a particular presentation of some of the data. A machine should then be able to check that a model makes sense, that a view conforms to the model, and that the data is consistent with the view. Given a bunch of views, we should be able to compute whether they cover the model: which data are missing and which are multiply represented. The computational machinery to check, propagate or demand the actual data can then be constructed.

I had a thought about this last summer. Picking some syntax out of thin air, I began to write things like

class Student

class Module

for Module -:
  class Test

for Student, Module -:
  prop Participant

What’s going on? I’ve made four declarations: three “classes”, and one relation. A “class” is a conceptual variety of individuals. Classes can be (relatively) global, such as students or modules. Classes can be localized to a context, so that each module has its own class of tests.

The “for” construct localizes the declarations which are subordinated by indentation after the layout herald “-:”. It’s tidier to say that each module has a bunch of tests than that tests exist globally but each test maps to a module. Moreover, it means that tests in different modules need not share a keyspace.

A class is a finite enumeration whose elements are not known at declaration time. A prop is a finite enumeration whose elements are not known at declaration time, but it is known that there’s at most one element. There’s at most one way in which a student can be a participant in a module.

So far, I haven’t said anything about what these wretched individuals might look like. So,

for Student -:
  email     ! String
  username  ! String
  regNo     ! String
  surname   : String
  forenames : String

I’ve declared a bunch of things which ought to exist in the context of an individual student. The ones with “!” are intended to be keys for students. That’s to say any sensible view of student data should include at least one student key, but it doesn’t really matter which. Of course, with a little more dependently typed goodness, we could enforce formatting properties of email addresses, usernames and registration numbers…some other time. The point is that by introducing abstract notions of individual, outside of the way that those individuals can be keyed, we provide a hook for data integration.

I’m kind of saying what stuff should be stored in a “master record” for each student, but I don’t expect to know all the fields when I introduce the concept of a student.

Another thing that’s fun about bothering to introduce abstract classes of individual is that contextualization can be much more implicit. We do not need to name individuals to talk about stuff that’s pertinent to a typical individual, which means we can write higher-order things in a first order way and handle more of the plumbing by type-based lookup.

class Department

for Module -:
  department : Department
  moduleId   ! String
  class Test -:
    item    ! String
    max     : [0..]
    weight  : [0..]

for Student, Module, Participant, Test -:
  prop Present -:
    score : [0..max]

Here, I show how to associate a department with a module, after the fact. I also introduce tests for each module, each with a maximum score: the use of “-:” in the “class Test” declaration just elides an immediately subsequent “for Test -:”.

Correspondingly, for each student participating in a module (and those students might not be from the same department as the module), and for each test, it makes sense to wonder if the student showed up to the test and where in the range of possible marks they scored.

I should be able to write something like

Module/department

to mean, given a contextualizing department, just the modules for that department.

What’s a view of this information? It might be something like

one Module [moduleId]       | for Test [item
                            |           ----
                            |           max ]
----------------------------+--------------------------------
for Student, Participant    | val : Percentage
  [surname|forenames|regNo] | if Present -:
                            |   [score]
                            |   val = weight * score / max
                            | else -:
                            |   ["A"]
                            |   val = 0

I’m sure we can negotiate over the two-dimensionality of the syntax (as long as we prioritise reading over writing), but that’s the picture. Scoping goes downward and rightward. The brackets show you what you actually see, which must exist in the given scope. The keyword “one” indicates that we are working within just one module, keyed by the given code. Meanwhile “for” requires us to tabulate all the individuals for whom an environment can be constructed to match the given context.

Meanwhile, the cells in the middle of the table will enable the computation of new local information, “val”. Presence or absence is signified by a score or the constant “A” (which is checkably distinct from all allowable scores), and the definition of “val” is given accordingly.

Note that I have not indicated whether this view is a query or a form. In fact, I have made sure it is valid in both roles. I’d like to be able to instruct the computer to initialize a spreadsheet with as much of this information as is available from other sources. I usually have to do that with cut and paste! After the fashion of pivot tables, I should be able to specify aggregations over rows and columns which are appropriate (e.g., the average score for each test, the total score for each student).

Lots of the ingredients for these tools are in existence already, and it’s clear that my knowledge of them is sadly lacking, having stumbled into the direction of data from the direction of dependent type theory. I seek to educate myself, and help in that regard is always appreciated. Of course, informally, I’m taught about some of the problems by the poor technology with which I face the mundane realities of my existence, and I understand that I can change me more easily than I can change the world. I don’t expect institutional buy-in (I’ll have a go, right enough), but I don’t need it. The point of the modelling language is to build myself a bubble with a permeable membrane: the things from outside the bubble can have sense made of them (by giving a view which describes the role of external data); the things constructed inside the bubble make sense intrinsically (because they were constructed in a model-directed way). Fewer strings! More things!

Edit: I should have included a link to my slides on this topic, for a talk delivered at Microsoft Research and at York.

Warming up to Homotopy Type Theory

April 1, 2015

“Why do you hate homotopy type theory?” is a question I am sometimes asked, but I never answer it, because the question has an inaccurate presupposition. I am not happy when people forget that function extensionality, a key benefit of HoTT, was already available in Observational Type Theory. I am not happy when people disregard the convenience of having a clearly delimited fragment of one’s propositions where proofs can be identified definitionally. I am not happy when people act as if homotopy type theory already works when, without an internal notion of computation which gives canonical forms (like OTT has), it doesn’t…yet. But I’m pretty sure it will acquire such a notion. So, for the avoidance of doubt, I do not hate homotopy type theory: I hate homotopy type theorists almost as much as I hate myself, which is why I have decided to become a homotopy type theorist as a kind of therapy. I’ve been thinking quite a lot, of late, about how to make homotopy type theory go, being, as I am, an incorrigible tinkerer with computational mechanisms, and I think I’m onto something. By way of taking Ariadne’s advice, I thought I’d bruce out some ravings about where I think I’ve got to. I suspect I’m not going to be very helpful to people who aren’t already HoTT-headed, and probably not all that helpful to those who are, so let me manage expectations downwards: I’m not attempting pedagogy, I’m just thinking aloud.

As some of you may know, I’m from Northern Ireland, a place which naturally promotes homotopic (not that they would call it that, in case someone thought they were gay, given that malapropism is the fourth most popular national pastime after (in reverse order) homophobia, sectarianism and emigration) considerations as a consequence of the inconveniently large lake in the middle of it. “Who lives in the big blue bit?”(*), asked former Secretary of State, Sir Humphrey Atkins, when presented with a map of the place, shaded in accordance with sectarian affiliation. But I digress. Lough Neagh (which is pronounced something roughly like “Loch Nay”, giving us Northern Irish one more “ough” than the rest of yous) is the hole where the Isle of Man used to be until Fionn mac Cumhaill threw it at someone and missed.

Northern Ireland map

But the point is that if you’re going from Antrim to Enniskillen, you’ve got to go round Lough Neagh one way or the other, and no matter how much you stretch or divert your route, if you stay dry, you won’t deform one way into the other. And indeed, if you happen to be in Antrim and you ask for directions to Enniskillen, they’ll most likely tell you “If I was going to Enniskillen, I wouldn’t start from here.”. Much in the same way (up to deformation, I hope), if I was going to Homotopy Type Theory, I wouldn’t start from the Calculus of Inductive Constructions.

Why not? Because we start from the strange idea that equality is some sort of inductive definition

  Id (X : *)(x : X)(y : X) : *
  refl (X : *)(x : X) : Id X x x

which already places too much faith in the disappointing accident that is the definitional equality of an intensional type theory, and then we add an eliminator with a computation rule which nails our moving parts to said accident…

  J (X : *)(x : X)(P (y : X)(q : Id X x y) : *)(m : P x (refl X x))
    (y : X)(q : Id X x y) : P y q
  J _ _ _ m _ (refl _ _) = m

…right? What does that do? It says that whenever the proof of the equation is by reflexivity, the value m to be transported is already of the right type, so we don’t even need to think about what we have to do to it. If we are willing to consider only do-no-work transportation, we will never be able to escape from the definitional equality. (Note that the purpose of pattern matching q against refl is just to have a sound but not complete check that x is definitionally equal to y. If you like proof irrelevance (much more fun than K, for example), then you can just ignore q and decide definitional equality of x and y. I mean, if you’ve gone to the trouble of engineering a decidable definitional equality, you might as well get paid for it.)

But we don’t stick with definitional equality, and thank goodness for that. Observational Type Theory gives you structural equality on types and thus do-no-work-after-program-extraction transportation, but for open terms (and to be conservative over intensional type theory), we needed to refocus our efforts around the machinery of transportation, so that nontrivial explanations of equality result in nontrivial computations between types. That’s enough to get extensionality working. But univalence (the type of Antrim stuff is equal to the type of Enniskillen stuff if you have a way to transport it either way, such that a “there and back trip” makes no change to the stuff and orbits Lough Neagh a net total of zero times) is a whole other business, because now we can’t get away with just looking at the types to figure out what’s going on: we have to look at the particular route by which the types are connected.

(Local simile switch. Thorsten Altenkirch, Wouter Swierstra and I built an extensionality boat. We might imagine that one day there will be a fabulous univalence ship: the extensionality boat is just one of its lifeboats. But nobody’s built the ship yet, so don’t be too dismissive of our wee boat. You might learn something about building ships by thinking about that boat. I come from Belfast: we built the Titanic and then some prick sailed it into an iceberg because they made a valid deduction from a false hypothesis.)

So, what’s the plan? Firstly, decompose equality into two separate aspects: type equivalence and its refinement, value equality. The former is canonical.

  (X : *) {=} (Y : *)  :  *

I’m using braces rather than angle brackets only because I have to fight HTML.

The latter is computed by recursion over the former.

  (x : X) =[ (Q : X {=} Y) ]= (y : Y)  :  *

That is, the somewhat annotated mixfix operator =[…]= interprets a type isomorphism between types as the value equality relation thus induced on those types. I shall HoTT in Rel for this.

Value equality is thus heterogeneous in a way which necessarily depends on the type isomorphism which documents how to go about considering the values comparable. Let’s be quite concrete about that dependency. We get to look at Q to figure out how to relate x and y.

Reflexivity is not a constructor of {=}. Rather, every canonical type former induces a canonical constructor of {=}. In particular

  *^            :  * {=} *
  X =[ *^ ]= Y  =  X {=} Y

We may add

  sym (Q : X {=} Y)  :  Y {=} X
  y =[ sym Q ]= x    =  x =[ Q ]= y

  trans (Y : *)(XY : X {=} Y)(YZ : Y {=} Z) : X {=} Z
  x =[ trans Y XY YZ ]= z  =  Sigma Y \ y -> x =[ XY ]= y * y =[ YZ ]= z

Function extensionality becomes the value equality induced by the structural isomorphism for Pi-types. Types on which we depend turn into triples of two-things-and-a-path-between-them.

  Pi^ (S^ : S' {=} S`)
      (T^ : (s : Sigma (S' * S`) \ ss -> (s^ : ss car =[ S^ ]= ss cdr))
            -> T' (s car car) {=} T` (s car cdr))
    : Pi S' T' {=} Pi S` T`
  f' =[ Pi^ S^ T^ ]= f`  =  (s : Sigma (S' * S`) \ ss -> (s^ : ss car =[ S^ ]= ss cdr)) ->
    f' (s car car) =[ T^ s ]= f` (s car cdr)

Every elimination form must give rise to an elimination form for the corresponding equality proofs: if you eliminate equal things in equal ways, you get equal results, and these things have to compute when you get canonical proofs of equations between canonical things being eliminated. Consequently, reflexivity shows up as the translation from types to type isomorphisms, then from values to the equality induced by those type isomorphisms. In Observational Type Theory as we implemented it, reflexivity was an axiom, because by proof irrelevance (by which I mean by making sure never to look at the proof) it didn’t matter what it was: the half-built Death Star was fully operational. Here, we can’t get away with that dodge. Fortunately, I have at least some clue how to proceed. My less famous LICS rejectum, joint work with Thorsten, gives a vague sketch of the construction. The upshot is that every X : * has some X^ : X {=} X, and by way of a refinement, every x : X has some x^ : x =[ X^ ]= x.

Now, a type isomorphism is no use unless you can actually get from one side of it to the other. We shall need that type isomorphisms induce paths between values. That is, we shall need an eliminator

  path (S : *)(T : *)(Q : S {=} T)(s : S) : Sigma T \ t -> s =[ Q ]= t

and moreover, we shall need that paths are unique, in the sense that, for given inputs, every pair in the return type of path is equal to the thing that path returns. That is, we have a kind of propositional η-rule for paths. I’m not yet sure of the most ergonomic way to formulate that uniqueness. But consider, in particular, q : x =[ X^ ]= y. We will have that (x , x^) =[…]= (y , q) in the type of paths from x via X^. We thus recover more or less the J rule, seen as transportation between two path-dependent types.

  J (X : *)(x : X)
    (P : (Sigma X \ y -> x =[ X^ ]= y) -> *)
    (m : P (x , x^))
    (y : X)(q : x =[ X^ ]= y)
    : P (y , q)
  J X x P m y q =
    path (P (x , x^)) (P (y , q)) (P^ (((x , x^) , (y , q)) , ... path uniqueness ...))
      m car

To achieve the definitional equational theory we’re used to from the J rule, we will need to make sure that the reflexivity construction, x^, generates proofs which are recognizably of that provenance, and we shall have to ensure that being recognizably reflexive is preserved by elimination forms, e.g., that we can take

  f^ ((s , s) , s^) = (f s)^

so that we can make

  path X X X^ x = (x , x^)

If we can obtain that path uniqueness from x along X^ when applied to (x , x^) gives (x , x^)^, then we shall have

  J X x P m x x^
    = path (P (x , x^)) (P (x , x^)) (P^ (((x , x^) , (x , x^)) , (x , x^)^)) m car
    = path (P (x , x^)) (P (x , x^)) (P (x , x^))^ m car
    = (m , m^) car
    = m

That is, the computationally obscure J rule has been decomposed into in-your-face transportation and path uniqueness. Somehow, I’m not surprised. It would not be the first time that a dependent eliminator has been recast as a non-dependent eliminator fixed up by an η-law. That’s exactly how I obtained a dependent case analysis principle for coinductive data without losing subject reduction.

Of course, we shall need to define path by recursion over type isomorphisms. We shall thus need to say how to compute path Y X (sym XY) y, which amounts to delivering the path in the other direction (the htap?), and its uniqueness. Transitivity goes (I hope) by composition.

So what of univalence? It’s not an axiom. It’s a constructor for X {=} Y where you just give the implementations of both path directions and show their uniqueness, thus explaining how to implement the elimination behaviour. We then need something like

  x =[ Univalence X Y xy yx ... ]= y  =  xy x =[ Y^ ]= y

but that’s annoyingly lopsided. We also need to know when isomorphisms are equal. Something like

  Q =[ X {=} Y ]= Q'  =  (\ x -> path X Y Q x car) =[ (X -> Y)^ ]= (\ x -> path X Y Q' x car)

might be enough, but again annoyingly lopsided.

It’s late and I’m tired, so I suppose I should try to sum up what I’m getting at. I’m hoping we can get to a computational treatment of univalence by isolating the notion of type isomorphism in quite an intensional way. On the one hand, the structure of a type isomorphism tells us how to formulate the equality for values in the related types. On the other hand, the structure of a particular type isomorphism tells us how to compute the transportations of values across it, giving rise to unique paths. Univalence allows us to propose arbitrary isomorphisms, and somehow, univalence gives an η-long normal form for type isomorphism: every type isomorphism is provably equal to the packaging-by-univalence of its elimination behaviour.

However, hilariously, we have to make sure that the relations =[…]= induces between equivalent type isomorphisms are equivalent (i.e. pointwise isomorphic), in order to show that =[…]=, like all the other elimination forms, respects equality. As County Antrim folk say, “There’s nothing for nothing in Islandmagee.”. Islandmagee, by the way, is the peninsula on the east coast, across the narrow sea from Westeros (which is a rehabilitated landfill site between Whitehead and Larne), apparently containing nothing.

(*) Eels, mostly.

being and doing and ports and pegs

February 28, 2015

I’ve been thinking…

…about components of computations. A component *does* something, given i inputs, to produce o outputs. That is, a component has i ports and o pegs, ordered spatially “left-to-right”, and I shall think of data flowing downward, into ports at the top and out from pegs at the bottom. When working one dimensionally, I’ll put top on the right and bottom on the left, as per usual with functional notation, and I’ll be a Bird-style rebel about types. Suppose I have two components, o0 <- c0 i0 and o1 <- c1 i1.

Now define o0 + (o1 - i0) <- c0 c1 (i1 + (i0 - o1)), with truncated subtraction, to be the component constructed by plugging c0's ports with c1's pegs, left-to-right: any overright inputs (overright is the spatial dual of leftover) remain active inputs of the composite and any overright outputs become outputs of the composite. We have one of the two pictures, below.

   |..i1..| |.i0  |      |......i1......|
  [___c1___]|  -  |     [_______c1_______]
   |..o1..| |  o1.|      |..i0..| |.o1  |
  [_______c0_______]    [___c0___]|  -  |
   |......o0......|      |..o0..| |  i0.|

Entertainingly, this composition is associative, with neutral element the portless pegless blank space. We obtain a parenthesis-free notation for building computations which degenerates to prefix-Polish in the case where every component has one peg.

We can also consider the regular horizontal juxtaposition, (o0 + o1) <- (c0 + c1) (i0 + i1), which makes no connections. We do need some parentheses to delimit the extent of +. We might write

(List S) (List T) <- unzip (List (Pair S T))
unzip nil                    = nil nil
unzip (cons (pair s t) sts)  = (cons s + cons t) unzip sts

I have taken the liberty of elaborating the 2 <- 1 arity of unzip with types. Note that the pattern variables s, t, sts are *values* with arity 1 <- 0. I have also included gratuitous parentheses on the left.

If we want to write higher-order functions, we shall need to turn "doing" into "being". I'll write {..} to suspend computations (insert stock joke about braces being British for suspenders). Forcing a suspension requires some notation, and some means to identify the arity of the thing, which might be by type or by explicit annotation. A lot of the time, it might be more convenient to be explicit that a parameter to a function is for "doing", not "being". We might write

(List T) <- map {T <- (S)} (List S)
map f nil         = nil
map f (cons s ss) = cons (f s) (map f ss)

the point being that f need neither be forced when applying it to s, nor re-suspended when passing it recursively to map.

It’s kind of funny. If f and g are both 1 <- 1, then f g means their *composition*. To apply f to g, you write f {g}.

I think I'll stop for now. I don't think I've solved all the problems which are bugging me, but I think it's worth playing around with notational ideas which we can work with if we're just that little bit less reliant on inferring types and willing to check a bit more. I'm also trying to be more explicit about the value-computation distinction, in order to clean up the notation for managing effects. For example, that thing I called map just now. It's not what Haskellers call "map" (well, it is, but…); it's what Haskellers call "traverse". But that's another story.

compositional processes for dependent sessions

January 29, 2015

Back in this post, I formulated a notion of session type where the structure of later parts of the session were dependent on the traffic from, not the participants in, the earlier parts. I’m going to tweak the definition a little, just to chunk record-like pieces together, and to be explicit that traffic is a record.

  Record : Set                                        [(R : Record)]r : Set
  ::= ! (X : Signal)                                  [ ! X    ]r = [ X ]s
    | sg (S : Record) (T : [ S ]r -> Record)          [ sg S T ]r = (s : [ S ]r) * [ T s ]r

  Session : Set                                       [(S : Session)]t : Record
  ::= ! (R : Record)                                  [ ! R    ]t = R
    | sg (S : Session) (T : [ [ S ]t ]r -> Session)   [ sg S T ]t = sg [ S ]t \ s -> [ T s ]t
    | op (S : Session)                                [ op S   ]t = [ S ]t

The type Session is defined mutually with the interpretation [-]t which tells you of what the traffic for a session must consist. You can see that it’s a big dependent record type, with nothing higher-order in it, and that fits with our expectation that what goes over wires are finite sequences of bits.

Now that we know what the types are, what are the processes which communicate? I gave them a weakest precondition semantics, where G and H, below, are postconditions on the session traffic.

Play (! R)    G = (r : [ R ]r) * G r  -- choose a happy record
Play (sg S T) G = Play S \ s -> Play (T s) \ t -> G (s , t)
Play (op S)   G = Oppo S G

Oppo (! R)    H = (r : [ R ]r) -> H r  -- put up with whatever record
Oppo (sg S T) H = Oppo S \ s -> Oppo (T s) \ t -> H (s , t)
Oppo (op S)   H = Play S H

The trouble is that each of Player and Opponent is described by a right-nested sequence of Π- and Σ-types, quantifying over signals. The nesting structure of the session type gets flattened, and you pretty much have to work in continuation-passing style. I did just about manage to program a small example with it, but I wasn’t all that happy with how it worked out.

So I had a nice chat with Simon Gay and the ABCD gang at Glasgow University, and then I had another think. What I want is a more straightforward notion of participant.

Role ::= play | oppo
swap (r : Role) : Role
swap play = oppo
swap oppo = play

Party (r : Role)(S : Session) : Set

but when we try to define that, we hit this problem

Party r (sg S T) = (s : Party r S) * Party r (T (? : [ S ]t))

That is, we have one of the parties to S, but to see how to continue, we need the whole of the traffic generated when that party interacts.

So let’s get it. Let’s define

Party (r : Role)(S : Session) : Set
Reply (r : Role)(S : Session)(p : Party r S) : Record
traff (r : Role)(S : Session)(p : Party r S)(y : [ Reply r S p ]r) : [ [ S ]t ]r

which may remind some of you of an indexed container, but I digress. The plan is that parties are higher-order, reflecting the way behaviour is a function of input. But the replies are first-order. A reply is exactly the fragment of the traffic which is not chosen by the party, so we can construct the whole of the session traffic given one party and the reply to it.

So here goes. Base case:

Party play (! R) = [ R ]r         Party oppo (! R) = One
Reply play (! R) x = ! one        Reply oppo (! R) _ = R
traff play (! R) x _ = x          traff oppo (! X) _ x = x

And now we’re good for

Party r (sg S T) = (s : Party r S) * (s' : [ Reply r S s ]r) -> Party r (T (traff r S s s'))
Reply r (sg S T) (s , k) = sg (Reply r S s) \ s' -> Reply r (T (traff r S s s')) (k s')
traff r (sg S T) (s , k) (s' , t') =
  let ss' = traff r S s s' in ss' , traff r (T ss') (k s') t'

Changing direction is no big deal.

Party r (op T) = Party (swap r) T
Reply r (op T) t = Reply (swap r) T t
traff r (op T) t t' = traff (swap r) T t t'

But now we have a peculiar pickle. If we have both parties, we should be able to figure out both replies, and then we’ll have *two* versions of the traffic, and they’d better agree. We need a traffic equaliser.

talk (S : Session)(p : Party play S)(o : Party oppo S)
  : (p' : [ Reply play S p ]r) * (o' : [ Reply oppo S o ]r) *
    traff play S p p' == traff oppo S o o'
talk (! X) x _ = _ , x , refl
talk (sg S T) (ps , pk) (os , ok) =
  let ps' , os' , qs = talk S ps os
      pt' , ot' , qt = talk (T (traff play S ps ps')) (pk ps') (ok os')
  in  (ps' , pt') , (os' , ot') , (qs , qt)
talk (op T) p o = let o' , p' , q = talk T o p in p' , o' , sym q

I’ve lied, of course. That doesn’t quite typecheck.

  ok os' : Party oppo (T (traff oppo S os os'))

We need to use qs to fix it up. We can do that by carefully abstracting both versions of the traffic.

talk (S : Session)(p : Party play S)(o : Party oppo S)
  : (p' : [ Reply play S p ]r) * (o' : [ Reply oppo S o ]r) *
    traff play S p p' == traff oppo S o o'
talk (! X) x _ = _ , x , refl
talk (sg S T) (ps , pk) (os , ok) =
  let ps' , os' , qs = talk S ps os
      talk' (pss : [ S ]t)(oss : [ S ]t)(qs : pss == oss)
            (pt : Party play (T pss))(ot : Party oppo (T oss))
          : (pt' : [ Reply play (T pss) pt ]r) * (ot' : [ Reply oppo (T oss) ot ]r) *
            traff play (T pss) pt pt' == traff oppo (T oss) ot ot'
      talk' ss ss refl pt ot = talk (T ss) pt ot
      pt' , ot' , qt = talk' (traff play S ps ps') (traff oppo S os os') qs (pk ps') (ok os')
  in  (ps' , pt') , (os' , ot') , (qs , qt)
talk (op T) p o = let o' , p' , q = talk T o p in p' , o' , sym q

(The need to build that particular helper function has me wondering if we could come up with a better notation for programs which build tuples a bit at a time.)

So we end up with a notion of parties to sessions which chop up more compositionally in accordance with the structure of the sessions themselves. It’s still a two-party session story with quite a sequential flavour.

universe hierarchies

January 9, 2015

Back in 2011, at the Hotel Erica, Berg en Dal, I gave this talk about cumulative hierarchies of universes in type theory. It was a first attempt at simplifying their treatment. But what’s it all about, anyway?

In many ways, cumulative hierarchies of universes are the motivation for type theories in the first place. Bertrand Russell, having delivered the troublesome ‘set of all sets which don’t contain themselves’ paradox, sought to fix the problem by classifying sets into layers. Type 0 sets are boring ordinary sets which are defined without quantifying over any universal set. Type 1 sets include all the type 0 sets, but also the set of all type 0 sets and those which quantify over it. Type 2 sets yadayada. The paradox fails because the problematic set cannot belong to the type of sets that it talks about.

Per Martin-Löf’s 1971 type theory had one sort, Type, and took Type : Type. Jean-Yves Girard showed this theory inconsistent by translating a non-normalising term from his own inconsistent System U, which had two layers, both impredicative (admitting universal quantification over anything). Various systems of layering have been introduced to fix this problem, juggling power, flexibility and complexity. The basis for what we tend to use today is Zhaohui Luo’s Extended Calculus of Constructions, which has one impredicative layer, Prop, at the bottom (but can it really be the bottom if it’s impredicative?) and a sequence of predicative layers, Type_0, Type_1, and so ad infinitum. The rules make

  Prop : Type_0 : Type_1 : ...

but also

  Prop ≤ Type_0 ≤ Type_1 ≤ ...

The impredicativity of Prop and predicativity of Type_n is clear from the rules for forming function types

    G, x : A |- P : Prop             G |- A : Type_n     G, x : A |- B : Type_n
  ----------------------------     ------------------------------------------
    G |- (x : A) -> P : Prop         G |- (x : A) -> B : Type_n

So you can form propositions by quantifying over anything, allowing second-order definitions of propositional connectives, e.g. the following are in Prop if P and Q are

  P /\ Q  =  (R : Prop) -> (P -> Q -> R) -> R
  P \/ Q  =  (R : Prop) -> (P -> R) -> (Q -> R) -> R

but the domain and range of a function type must exist at the same level as the function type itself.
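
(To feel the force of that impredicativity, it may help to see what happens without it. Here is a minimal Agda sketch of the same encodings, with my own names And and Or for the connectives: because Set is predicative, quantifying over R : Set pushes the connectives up to Set1, which is exactly the bump that Prop avoids.)

  -- predicative Church-style encodings: quantifying over R : Set forces
  -- the connectives up a level, into Set1
  And : Set -> Set -> Set1
  And P Q = (R : Set) -> (P -> Q -> R) -> R

  Or : Set -> Set -> Set1
  Or P Q = (R : Set) -> (P -> R) -> (Q -> R) -> R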

The ≤ relation is a subtyping relation, known as ‘cumulativity’, and it is used to extend the conversion rule with a kind of subsumption.

    G |- s : S     G |- T : Type_n     S ≤ T
  -----------------------------------------------
    G |- s : T

Cumulativity is thus a preorder which includes conversion and level inclusion. Luo also closes cumulativity equi-co-variantly for functions.

    S = S'     T ≤ T'
  -----------------------------------
    (x : S) -> T  ≤  (x : S') -> T'

He does this not because anything goes horribly wrong if you treat function types as contravariant in their domain, but to ensure that ≤ can be modelled by inclusion in a set-theoretic semantics. If you see functions as relations — subsets of input-output pairs — then growing the notional codomain keeps functions the same, but shrinking the domain requires throwing pairs out of the function as inputs cease to be available.

Formally, universe levels allow us to police paradox, but informally, if we have to write universe levels all over the place, that’s a nuisance. Many systems (notably not Agda) allow us to write Type (or Set, or whatever it’s called) and hope. There are lots of ways to interpret such a thing. Bob Harper and Randy Pollack allow “anonymous universes”, where we read each usage of Type as meaning Type_n for some n which we couldn’t be bothered to write: it is not difficult to collect all the constraints on such n’s and ensure a solution exists. It’s slightly unpleasant to have programs which are piecewise accepted although collectively damned, but at least it’s clear what’s going on. And why it’s not enough. Bob and Randy note that we might write

  id {X : Type} (x : X) : X
  id x = x

(where the curly braces indicate an argument to be kept hidden) but if we treat that Type as fixed for all uses of the definition, then

  id id

is an error. The type (X : Type_n) -> X -> X lives at Type_(n+1) and above and so cannot instantiate X : Type_n. The trouble is that we have said some layer has a polymorphic identity function, when we meant to say that every layer has a polymorphic identity function. The fix is to treat each usage of a definition as if its anonymous Type occurrences are fresh: the two ids get different Types, and we learn that the first must be bigger than the second.

We get the same situation with datatypes. I can write

  data List (X : Type) : Type  ::=  nil  |  cons (x : X) (xs : List X)

Let’s say that X : Type_m and List X : Type_l. Because List X can store elements of X, we need m ≤ l. If I write cons Nat nil, where Nat : Type_n, then we have Type_n instantiating X, so n < m ≤ l, which might be ok. But if I write cons (List Nat) nil, the game is up: I now have l < m ≤ l. Just as every level is closed under the formation of function types, so I want every level to be closed under the formation of list types. Ye olde Lego system treated datatypes as declared rather than defined, and thus did not extend anonymous Type generativity to them. These days, Coq does something sensible (i.e., generativity in more places) to allow cons (List Nat) nil, but I’m not entirely sure what: I think this is the relevant document, and it seems to suggest that inductive types are treated as a form of definition.
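
(Looking ahead to the explicit level polymorphism described in the next paragraph: if the level is a parameter of the definition, list formation exists at every level, and the troublesome cons (List Nat) nil goes through because the outer list simply lives one level up. A minimal Agda sketch, with my own names:)

  open import Agda.Primitive using (Level)

  data Nat : Set where
    zero : Nat
    suc  : Nat -> Nat

  -- the level is a parameter, so every level is closed under list formation
  data List {l : Level} (X : Set l) : Set l where
    nil  : List X
    cons : X -> List X -> List X

  -- the outer list lives at the next level up: List {lsuc lzero} Set
  ok : List Set
  ok = cons (List Nat) nil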

Agda does something entirely different. Ulf Norell once told me that he’d tried implementing the full Harper-Pollack story, but that it had proven computationally too expensive. (Matthieu Sozeau has implemented Harper-Pollack in a version of Coq: I get the impression from him that it is also quite compute-intensive, but that these days the engines can take it.) Agda removes cumulativity in favour of explicit universe polymorphism. Lack of cumulativity means that Set0 : Set1, but Set0 /: Set2. The level of a function type is the maximum of its domain and codomain levels, so Set0 -> Set0 : Set1, Set1 -> Set0 : Set2, etc. The identity function is

  id : {l : Level}{X : Set l} -> X -> X
  id x = x

and you get anonymous Type generativity by explicitly quantifying over the level variables involved. Lack of cumulativity is a nuisance: if you are working polymorphically over some X : Set0, you do not get X : Set1 — instead, you must wrap X in a gratuitous record type. Levels are just that bit more explicit, and uses of ‘max’ are hard to avoid.
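
To make that concrete, here is the story with the levels spelt out, a minimal sketch in which Lift is my own cut-down version of the standard library's record: the outer use of id sits one level above the inner one, and the record wrapper is what stands in for cumulativity.

  open import Agda.Primitive using (Level; lzero; lsuc)

  id : {l : Level} {X : Set l} -> X -> X
  id x = x

  SmallId : Set1
  SmallId = {X : Set} -> X -> X

  -- the 'id id' of the anonymous-Type story: the outer id is used at
  -- level 1, the inner at level 0
  idid : SmallId -> SmallId
  idid = id {lsuc lzero} {SmallId}

  -- no cumulativity: to use X : Set where a Set1 is wanted, wrap it up
  record Lift (X : Set) : Set1 where
    constructor lift
    field lower : X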

What would I do? I’d think about the Central Bank of Ireland, a magnificent work of performance architecture by the late Sam Stephenson. The building was constructed around a pair of reinforced concrete lift shafts. Working at ground level, they built one floor of the building, and then they winched it up to the top of the lift shafts and pinned it into place. The building appeared to be growing downward from the Dublin sky. At each step, they did the construction at the lowest level possible, then shifted it to where they needed it.

What do we need? For starters, we need some notion of level, equipped with a relation < so we can say

         i < j
  --------------------
    Type_j :> Type_i

If < is well founded (i.e., call a level ‘accessible’, inductively, if every level below it is accessible; the hierarchy is well founded if all levels are accessible), we should be able to build a model. We need antisymmetry: i < j contradicts j < i. We do not need distinct levels to be comparable with <. Working bidirectionally, we should just be able to close levels under constructions like function type:

    Type_i :> S     x : S |- Type_i :> T
  ----------------------------------------
    Type_i :> (x : S) -> T
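
As an aside, that accessibility condition is easy to write down directly. A minimal Agda sketch, generic in the type of levels and the order (the names are mine):

  -- a level is accessible if every level below it is accessible
  data Acc {L : Set} (_<_ : L -> L -> Set) (x : L) : Set where
    acc : ((y : L) -> y < x -> Acc _<_ y) -> Acc _<_ x

  -- the hierarchy is well founded if every level is accessible
  WellFounded : {L : Set} -> (L -> L -> Set) -> Set
  WellFounded {L} _<_ = (x : L) -> Acc _<_ x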

We then need a partial order from which we generate the cumulativity relation (for my taste, treating function types contravariantly in the input and covariantly in the output). To be compatible, < must imply ≤, but it need not be the case that i ≤ j for distinct i and j implies i < j. A construction which relies on i ≤ j but not i < j makes use of cumulativity but not membership, and can thus be remapped to a setting where i and j merge or to a setting where they separate.

A specific use case might be in the formalization of category theory, where we might want to say that objects tend to be at least as large as arrows without enforcing whether or not that inequality is strict. Types-and-functions has large objects and small arrows. Functors-and-natural-transformations has large objects and large arrows. (I'm grateful to Paolo Capriotti for that example.)

I'm very happy to be explicit about whether things are ‘large’ or ‘small’, provided I can shift constructions to other settings which respect the constraints upon which I rely. That's to say, I would prefer not to have anonymous Type. I'd like to be locally explicit about universe levels, so that the level constraints the typechecker encounters contain no unknown quantities. But then I'd like to winch constructions into any level system which is compatible with the one they come from. The simplest version of this story just amounts to marking each usage of a definition with its remapping; we engineer the syntax of remappings so that trivial embeddings are marked by the absence of a more explicit shift. However, we might be able to do better than that.

For one thing, there will be situations when the typechecking discipline makes it obvious which shift is happening. Just as the level at which you check a function type is the level at which you check its domain and codomain, so we might like to be clear that the level at which we check List X is the level at which we check X. We should be able to define things which play the role of type constructors and are subject to type checking, rather than type synthesis. We may thus specify how the levels used in the arguments and in the definiens relate to the level at which the type is being checked. Partial applications of these type constructors typecheck exactly when their eta-long forms do. So, when we write

  data List (X : Type) : Type  ::=  nil  |  cons (x : X) (xs : List X)

we’re really treating Type as a variable standing for the universe at which we’re checking list type formation. Meanwhile, if our local hierarchy has a successor structure, we might write

  data Desc : Suc Type  ::=  pi (S : Type) (T : S -> Desc)  |  ...

to indicate that such descriptions of types always form a relatively large type.
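
Restricting to a two-level fragment for illustration, a minimal Agda sketch of such a Desc (with constructors chosen just to make the point) lands at Set1 precisely because its pi constructor stores a small type:

  -- descriptions of small types form a large type
  data Desc : Set1 where
    ret : Desc
    pi  : (S : Set) -> (S -> Desc) -> Desc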

For another thing, there are situations in which there’s always a level that will do and it doesn’t matter what it is. Let’s revisit the identity function

  id {X : Type} (x : X) : X
  id x = x

Whatever universe X inhabits, we can always shift id to it. Similarly, if we define composition

  {A : Type} {B (a : A) : Type} {C (a : A) (b : B a) : Type}
  (f {a : A} (b : B a) : C a b) . (g (a : A) : B a)  :  (a : A) -> C a (g a)
  f . g = \ a -> f (g a)

then that Type could be anything. If our local universe hierarchy always has least upper bounds, the relevant A, B and C will always tell us which Type to choose. I’m stumbling towards the intuition that here, the usage sites of A, B and C see them as classifiers, rather than the classified. We never need them to be small, just large enough. One clue is that the constructions of id and . can happen in the singleton hierarchy (always assuming that their types don’t themselves need to have types).
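
For comparison, here is the same composition in Agda's explicit-level style, a minimal sketch along the lines of the standard library's dependent composition (I call it comp to keep the ASCII): each Type in the informal version becomes a Set at its own level variable, and the levels of A, B and C are exactly the data the checker has to juggle.

  open import Agda.Primitive using (Level)

  -- dependent composition, polymorphic in the three levels involved
  comp : {a b c : Level} {A : Set a} {B : A -> Set b}
         {C : (x : A) -> B x -> Set c} ->
         ({x : A} (y : B x) -> C x y) -> (g : (x : A) -> B x) ->
         (x : A) -> C x (g x)
  comp f g x = f (g x)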

And that brings me to another thought. In smash-the-syntactic-phase-distinction dependent type theory land, we sometimes forget to distinguish the twin roles of types: being valid classifiers and being members of some universe. When we declare a thing, we need to check that its given type is a valid classifier, and we tend not to say which level that type belongs to. We often neglect to consider systems where there are types which can classify but not be classified, so that ‘type’ means ‘type at some level’. If we’re interested in building a powerful hierarchy in which to construct everything, it makes sense to have skyhooks all the way up. But if we’re in the business of winching constructions to wherever we need them, then the weaker the system we start from, the more systems we can target.

So I’m certainly not saying I have a fully formed proposal all worked out and ready to go. What I am saying is that the current implementations of cumulative hierarchies complicate the construction of things in order to facilitate a diversity of ways in which they can be used. Sam Stephenson showed us that we could build all the floors without leaving the ground. Yet again, it’s that key lesson which I learned from James McKinna: never resort to polymorphism when initiality will do.