## being and doing and ports and pegs

I’ve been thinking…

…about components of computations. A component *does* something, given i inputs, to produce o outputs. That is, a component has i ports and o pegs, ordered spatially “left-to-right”, and I shall think of data flowing downward, into ports at the top and out from pegs at the bottom. When working one dimensionally, I’ll put top on the right and bottom on the left, as per usual with functional notation, and I’ll be a Bird-style rebel about types. Suppose I have two components, o0 <- c0 i0 and o1 = x.

Now define o0 + (o1 – i0) <- c0 c1 (i1 + (i0 – o1)) to be the component constructed by plugging c0's ports with c1's pegs, left-to-right: any overright inputs (overright is the spatial dual of leftover) remain active inputs of the composite and any overright output become outputs of the composite. We have one of the two pictures, below.

|..i1..| |.i0 | |......i1......| [___c1___]| - | [_______c1_______] |..o1..| | o1.| |..i0..| |.o1 | [_______c0_______] [___c0___]| - | |......o0......| |..o0..| | i0.|

Entertainingly, this composition is associative, with neutral element the portless pegless blank space. We obtain a parenthesis-free notation for building computations which degenerates to prefix-Polish in the case where every component has one peg.

We can also consider the regular horizontal juxtaposition, (o0 + o1) <- (c0 + c1) (i0 + i1), which makes no connections. We do need some parentheses to delimit the extent of +. We might write

(List S) (List T) <- unzip (List (Pair S T)) unzip nil = nil nil unzip (cons (pair s t) sts) = (cons s + cons t) unzip sts

I have taken the liberty of elaborating the 2 <- 1 arity of unzip with types. Note that the pattern variables s, t, sts are *values* with arity 1 <- 0. I have also included gratuitous parentheses on the left.

If we want to write higher-order functions, we shall need turn "doing" into "being". I'll write {..} to suspend computations (insert stock joke about braces being British for suspenders). Forcing a suspension requires some notation, and some means to identify the arity of the thing, which might be by type or by explicit annotation. A lot of the time, it might be more convenient to be explicit that a parameter to a function is for "doing", not "being". We might write

(List T) <- map {T <- (S)} (List S) map f nil = nil map f (cons s ss) = cons (f s) (map f ss)

the point being that f need neither be forced when applying it to s, nor re-suspended when passing it recursively to map.

It’s kind of funny. If f and g are both 1 <- 1, then f g means their *composition*. To apply f to g, you write f {g}.

I think I'll stop for now. I don't think I've solved all the problems which are bugging me, but I think it's worth playing around with notational ideas which we can work with if we're just that little bit less reliant on inferring types and willing to check a bit more. I'm also trying to be more explicit about the value-computation distinction, in order to clean up the notation for managing effects. For example, that thing I called map just now. It's not what Haskellers call "map" (well, it is, but…); it's what Haskellers call "traverse". But that's another story.

## compositional processes for dependent sessions

Back in this post, I formulated a notion of session type where the structure of later parts of the session were dependent on the *traffic* from, not the *participants* in, the earlier parts. I’m going to tweak the definition a little, just to chunk record-like pieces together, and to be explicit that traffic is a record.

Record : Set [(R : Record)]r : Set ::= ! (X : Signal) [ ! X ]r = [ X ]s | sg (S : Session) (T : [ S ]r -> Session) [ sg S T ]r = (s : [ S ]r) * [ T s ]r Session : Set [(S : Session)]t : Record ::= ! (R : Record) [ ! R ]t = R | sg (S : Session) (T : [ [ S ]t ]r -> Session) [ sg S T ]t = sg [ S ]t \ s -> [ T s ]t | op (S : Session) [ op S ]t = [ S ]t

The type Session is defined mutually with the interpretation [-]t which tells you of what the traffic for a session must consist. You can see that it’s a big dependent record type, with nothing higher-order in it, and that fits with our expectation that what goes over wires are finite sequences of bits.

Now we know what the types are, what are the processes which communicate? I gave them a *weakest precondition* semantics, where G and H, below are postconditions on the session traffic.

Play (! R) G = (r : [ R ]r) * G r -- choose a happy record Play (sg S T) G = Play S \ s -> Play (T s) \ t -> G (s , t) Play (op S) G = Oppo S G Oppo (! R) H = (r : [ R ]r) -> H r -- put up with whatever record Oppo (sg S T) H = Oppo S \ s -> Oppo (T s) \ t -> H (s , t) Oppo (op S) H = Play S H

The trouble is that each of Player and Opponent are described by a right-nested sequence of Π- and Σ-types, quantifying over signals. The nesting structure of the session type gets flattened, and you pretty much have to work in continuation-passing style. I did just about manage to program a small example with it, but I wasn’t all that happy with how it worked out.

So I had a nice chat with Simon Gay and the ABCD gang at Glasgow University, and then I had another think. What I want is a more straightforward notion of participant.

Role ::= play | oppo swap (r : Role) : Role swap play = oppo swap oppo = play Party (r : Role)(S : Session) : Set

but when we try to define that, we hit this problem

Party r (sg S T) = (s : Party r S) * Party r (T (? : [ S ]t))

That is, we have one of the parties to S, but to see how to continue, we need the whole of the traffic generated when that party interacts.

So let’s get it. Let’s define

Party (r : Role)(S : Session) : Set Reply (r : Role)(S : Session)(p : Party r S) : Record traff (r : Role)(S : Session)(p : Party r S)(y : [ Reply r S p ]r) : [ [ S ]t ]r

which may remind some of you of an indexed container, but I digress. The plan is that parties are higher-order, reflecting the way behaviour is a function of input. But the replies are first-order. A reply is exactly the fragment of the traffic which is not chosen by the party, so we can construct the whole of the session traffic given one party and the reply to it.

So here goes. Base case:

Party play (! R) = [ R ]r Party oppo (! R) = One Reply play (! R) x = ! one Reply oppo (! R) _ = R traff play (! R) x _ = x traff oppo (! X) _ x = x

And now we’re good for

Party r (sg S T) = (s : Party r S) * (s' : [ Reply r S s ]r) -> Party r (T (traff r S s s')) Reply r (sg S T) (s , k) = sg (Reply r S s) \ s' -> Reply r (T (traff r S s s')) (k s) traff r (sg S T) (s , k) (s' , t') = let ss' = traff r S s s' in ss' , traff r (T ss') (k s') t'

Changing direction is no big deal.

Party r (op T) = Party (swap r) T Reply r (op T) t = Reply (swap r) T t traff r (op T) t t' = traff (swap r) T t t'

But now we have a peculiar pickle. If we have both parties, we should be able to figure out both replies, and then we’ll have *two* versions of the traffic, and they’d better agree. We need a traffic equaliser.

talk (S : Session)(p : Party play S)(o : Party oppo S) : (p' : [ Reply play S p ]r) * (o' : [ Reply oppo S o ]r) * traff play S p p' == traff oppo S o o' talk (! X) x _ = _ , x , refl talk (sg S T) (ps , pk) (os , ok) = let ps' , os' , qs = talk S ps os pt' , ot' , qt = talk (T (traff play S ps ps')) (pk ps') (ok os') in (ps' , pt') , (os' , ot') , (qs , qt) talk (op T) p o = let o' , p' , q = talk T o p in p' , o' , sym q

I’ve lied, of course. That doesn’t quite typecheck.

ok os' : Party oppo (T (traff oppo S os os'))

We need to use qs to fix it up. We can do that by carefully abstracting both versions of the traffic.

talk (S : Session)(p : Party play S)(o : Party oppo S) : (p' : [ Reply play S p ]r) * (o' : [ Reply oppo S o ]r) * traff play S p p' == traff oppo S o o' talk (! X) x _ = _ , x , refl talk (sg S T) (ps , pk) (os , ok) = let ps' , os' , qs = talk S ps os talk' (pss : [ S ]t)(oss : [ S ]t)(qs : pss == oss) (pt : Party play (T pss))(ot : Party oppo (T oss)) : (pt' : [ Reply play (T pss) pt ]r) * (ot' : [ Replay oppo (T oss) ot ]r) * traff play (T pss) pt pt' == traff oppo (T oss) ot ot' talk' ss ss refl pt ot = talk (T ss) pt ot pt' , ot' , qt = talk' (traff play S ps ps') (traff oppo S os os') qs (pk ps') (ok os') in (ps' , pt') , (os' , ot') , (qs , qt) talk (op T) p o = let o' , p' , q = talk T o p in p' , o' , sym q

(The need to build that particular helper function has me wondering if we could come up with a better notation for programs which build tuples a bit at a time.)

So we end up with a notion of parties to sessions which chop up more compositionally in accordance with the structure of the sessions themselves. It’s still a two-party session story with quite a sequential flavour.

## universe hierarchies

Back in 2011, at the Hotel Erica, Berg en Dal, I gave this talk about cumulative hierarchies of universes in type theory. It was a first attempt at simplifying their treatment. But what’s it all about, anyway?

In many ways, cumulative hierarchies of universes are the motivation for type theories in the first place. Bertrand Russell, having delivered the troublesome ‘set of all sets which don’t contain themselves’ paradox, sought to fix the problem by classifying sets into layers. Type 0 sets are boring ordinary sets which are defined without quantifying over any universal set. Type 1 sets include all the type 0 sets, but also the set of all type 0 sets and those which quantify over it. Type 2 sets yadayada. The paradox fails because the problematic set cannot belong to the type of sets that it talks about.

Per Martin-Löf’s 1971 type theory had one sort, Type, and took Type : Type. Jean-Yves Girard showed this theory inconsistent by translating a non-normalising term from his own inconsistent System U, which had two layers, both impredicative (admitting universal quantification over anything). Various systems of layering have been introduced to fix this problem, juggling power, flexibility and complexity. The basis for what we tend to use today is Zhaohui Luo’s Extended Calculus of Constructions, which has one impredicative layer, Prop, at the bottom (but can it really be the bottom if it’s impredicative?) and a sequence of predicative layers, Type_0, Type_1, and so ad infinitum. The rules make

Prop : Type_0 : Type_1 : ...

but also

Prop ≤ Type_0 ≤ Type_1 : ...

The impredicativity of Prop and predicativity of Type_n is clear from the rules for forming function types

G, x : A |- P : Prop G |- A : Type_n G, x : A |- Type_n ---------------------------- ------------------------------------------ G |- (x : A) -> P : Prop G |- (x : A) -> B : Type_n

So you can form propositions by quantifying over anything, allowing second-order definitions of propositional connectives, e.g. the following are in Prop if P and Q are

P /\ Q = (R : Prop) -> (P -> Q -> R) -> R P \/ Q = (R : Prop) -> (P -> R) -> (Q -> R) -> R

but the domain and range of a function type must exist at the same level as the function type itself.

The ≤ relation is a subtyping relation, known as ‘cumulativity’ and it’s used to extend the conversion rule with a kind of subsumption.

G |- s : S G |- T : Type_n S ≤ T ----------------------------------------------- G |- s : T

Cumulativity is thus a preorder which includes conversion and level inclusion. Luo also closes cumulativity equi-co-variantly for functions.

S = S' T ≤ T' ----------------------------------- (x : S) -> T ≤ (x : S') -> T'

He does this not because anything goes horribly wrong if you treat function types as contravariant in their domain, but to ensure that ≤ can be modelled by inclusion in a set-theoretic semantics. If you see functions as relations — subsets of input-output pairs — then growing the notional codomain keeps functions the same, but shrinking the domain requires throwing pairs out of the function as inputs cease to be available.

Formally, universe levels allow us to police paradox, but informally, if we have to write universe levels all over the place, that’s a nuisance. Many systems (notably not Agda) allow us to write Type (or Set, or whatever it’s called) and hope. There are lots of ways to interpret such a thing. Bob Harper and Randy Pollack allow “anonymous universes”, where we read each usage of Type as meaning Type_n for some n which we couldn’t be bothered to write: it is not difficult to collect all the constraints on such n’s and ensure a solution exists. It’s slightly unpleasant to have programs which are piecewise accepted although collectively damned, but at least it’s clear what’s going on. And why it’s not enough. Bob and Randy note that we might write

id {X : Type} (x : X) : X id x = x

(where the curly braces indicate an argument to be kept hidden) but if we treat that Type as fixed for all uses of the definition, then

id id

is an error. The type (X : Type_n) -> X -> X lives at Type_(n+1) and above and so cannot instantiate X : Type_n. The trouble is that we have said *some* layer has a polymorphic identity function, when we meant to say that *every* layer has a polymorphic identity function. The fix is to treat each usage of a definition as if its anonymous Type occurrences are fresh: the two ids get different Types, and we learn that the first must be bigger than the second.

We get the same situation with datatypes. I can write

data List (X : Type) : Type := nil | cons (x : X) (xs : List X)

Let’s say that X : Type_m and List X : Type_l. Because List X can store elements of X, we need m ≤ l. If I write cons Nat nil, where Nat : Type_n, then we have Type_n instantiating X, so n < m ≤ l, which might be ok. But if I write cons (List Nat) nil, the game is up: I now have l < m ≤ l. Just as every level is closed under the formation of function types, so I want every level to be closed under the formation of list types. Ye olde Lego system treated datatypes as *declared* rather than defined, and thus did not extend anonymous Type generativity to them. These days, Coq does something sensible (i.e., generativity in more places) to allow cons (List Nat) nil, but I’m not entirely sure what: I think this is the relevant document, and it seems to suggest that inductive types are treated as a form of definition.

Agda does something entirely different. Ulf Norell once told me that he’d tried implementing the full Harper-Pollack story, but that it had proven computationally too expensive. (Matthieu Sozeau has implemented Harper-Pollack in a version of Coq: I get the impression from him that it is also quite compute-intensive, but that these days the engines can take it.) Agda removes cumulativity in favour of explicit universe polymorphism. Lack of cumulativity means that Set0 : Set1, but Set0 /: Set2. The level of a function type is the maximum of its domain and codomain levels, so Set0 -> Set0 : Set1, Set1 -> Set0 : Set2, etc. The identity function is

id : {l : Level}{X : Set l} -> X -> X id x = x

and you get anonymous Type generativity by explicitly quantifying over the level variables involved. Lack of cumulativity is a nuisance: if you are working polymorphically over some X : Set0, you do not get X : Set1 — instead, you must wrap X in gratuitous record type. Levels are just that bit more explicit, and uses of ‘max’ are hard to avoid.

What would I do? I’d think about the Central Bank of Ireland, a magnificent work of performance architecture by the late Sam Stephenson. The building was constructed around a pair of reinforced concrete lift shafts. Working at ground level, they built one floor of the building, and then they winched it up to the top of the lift shafts and pinned it into place. The building appeared to be growing downward from the Dublin sky. At each step, they did the construction at the lowest level possible, then shifted it to where they needed it.

What do we need? For starters, we need some notion of level, equipped with a relation < so we can say

i < j -------------------- Type_j :> Type_i

If < is well founded (i.e., call a level ‘accessible’, inductively, if every level below it is accessible; the hierarchy is well founded if all levels are accessible), we should be able to build a model. Working bidirectionally, we should just be able to close levels under constructions like function type. We need antisymmetry: i < j contradicts j < i. We do not need distinct levels to be comparable with <

Type_i :> S x : S |- Type_i :> T ---------------------------------------- Type_i :> (x : S) -> T

We then need a partial order from which we generate the cumulativity relation (for my taste, treating function types contravariantly in the input and covariantly in the output). To be compatible, < must imply ≤, but it need not be the case that i ≤ j for distinct i and j implies i < j. A construction which relies on i ≤ j but not i < j makes use of cumulativity but not membership, and can thus be remapped to a setting where i and j merge or to a setting where they separate.

A specific use case might be in the formalization of category theory, where we might want to say that objects tend to be at least as large as arrows without enforcing whether or not that inequality is strict. Types-and-functions has large objects and small arrows. Functors-and-natural-transformations has large objects and large arrows. (I'm grateful to Paolo Capriotti for that example.)

I'm very happy to be explicit about whether things are ‘large’ or ‘small’, provided I can shift constructions to other settings which respect the constraints upon which I rely. That's to say, I would prefer not to have anonymous Type. I'd like to be locally explicit about universe levels, so that the level constraints the typechecker encounters contain no unknown quantities. But then I'd like to winch constructions into any level system which is compatible with the one they come from. The simplest version of this story just amounts to marking each usage of a definition with its remapping; we engineer the syntax of remappings so that trivial embeddings are marked by the absence of a more explicit shift. However, we might be able to do better than that.

For one thing, there will be situations when the typechecking discipline makes it obvious which shift is happening. Just as the level at which you check a function type is the level at which you check its domain and codomain, so we might like to be clear that the level at which we check List X is the level at which we check X. We should be able to define things which play the role of type constructors and are subject to type checking, rather than type synthesis. We may thus specify how the levels used in the arguments and in the definiens relate to the level at which the type is being checked. Partial applications of these type constructors typecheck exactly when their eta-long forms do. So, when we write

data List (X : Type) : Type ::= nil | cons (x : X) (xs : List X)

we’re really treating Type as a variable standing for the universe at which we’re checking list type formation. Meanwhile, if our local hierarchy has a successor structure, we might write.

data Desc : Suc Type ::= pi (S : Type) (T : S -> Desc) | ...

to indicate that such descriptions of types always form a relatively large type.

For another thing, there are situations in which there’s always a level that will do and it doesn’t matter what it is. Let’s revisit the identity function

id {X : Type} (x : X) : X id x = x

Whatever universe X inhabits, we can always shift id to it. Similarly, if we define composition

{A : Type} {B (a : A) : Type} {C (a : A) (b : B a) : Type} (f {a : A} (b : B a) : C a b) . (g (a : A) : B a) : (a : A) -> C a (g a) f . g = \ x -> f (g a)

then that Type could be anything. If our local universe hierarchy always has least upper bounds, the relevant A, B and C will always tell us which Type to choose. I’m stumbling towards the intuition that here, the usage sites of A, B and C see them as classifiers, rather than the classified. We never need them to be small, just large enough. One clue is that the constructions of id and . can happen in the singleton hierarchy (always assuming that their types don’t themselves need to have types).

And that brings me to another thought. In smash-the-syntactic-phase-distinction dependent type theory land, we sometimes forget to distinguish the twin roles of types: being valid classifiers and being members of some universe. When we declare a thing, we need to check that its given type is a valid classifier, and we tend not to say which level that type belongs to. We often neglect to consider systems where there are types which can classify but not be classified, so that ‘type’ means ‘type at some level’. If we’re interested in building a powerful hierarchy in which to construct everything, it makes sense to have skyhooks all the way up. But if we’re in the business of winching constructions to wherever we need them, then the weaker the system we start from, the more systems we can target.

So I’m certainly not saying I have a fully formed proposal all worked out and ready to go. What I am saying is that the current implementations of cumulative hierarchies complicate the construction of things in order to facilite a diversity of ways in which they can be used. Sam Stephenson showed us that we could build all the floors without leaving the ground. Yet again, it’s that key lesson which I learned from James McKinna: never resort to polymorphism when initiality will do.

## observational type theory (delivery)

There is quite some diversity in views about the purposes that types serve. At some level, they describe the layout of data in memory. At another level, they describe mathematical notions of stuff. Sometimes, they describe a representation of data (which might serve multiple purposes) and document the particular purpose for which they are intended. E.g., Haskell’s (Maybe x) type extends x with an extra value whose purpose is to indicate failure to compute an x; to extend some semigroup x with a neutral element thus forming a monoid is a different purpose, and thus it is represented in Haskell by a different type. The essential tension is that it is both valuable to be able to work up to isomorphism, but at the same time to make administrative distinctions between isomorphic types in order to make additional administrative distinctions or give a key to particular associated structure. Is the purpose of types to determine structure (up to structure preserving maps) or to give a concrete taxonomy of structure?

Homotopy Type Theory takes the former point of view, but Observational Type Theory takes the latter. HoTT equality characterises the structure preserving maps which translate between views of the same structure, requiring nontrivial computation, even when the underlying data have the same representation: as Augustus de Morgan would have told you, negation is a structure preserving map from the Booleans to the Booleans. When OTT makes two types equal, their closed values are already in some “run-time” sense compatible: there is no information required to explain how to transport values between equal types because equality enforces coincidence of type *description*.

The essence of observational type theory is to define compatibility of types, and the associated notion of coercion.

(S : Type) (T : Type) : Prop coe (S : Type) (T : Type) [S <-> T] (s : S) : T

There’s more to it than that, of course, but that’s the intersection between the ‘theorem-proving’ world of equality and the ‘programming’ world of types. The OTT interpretation of types makes proofs *irrelevant* in a strong and lazy sense. In a moment, I’ll tell you what this Prop business and these square brackets are about. But let me at least document the intention that at run-time, when computing with closed values only, provably equal types should be sufficiently compatible that every coercion can just be treated as the identity. (HoTT equality sometimes contains genuinely useful information about how to compute structure preserving maps, and thus cannot be so readily disposed of. That’s not to say we could never detect and exploit dullness of coercion in HoTT. In our haste towards the univalent future, I am anxious not to give any of the ground that I have gained.)

What is Prop? Is it the subuniverse of Type consisting of what HoTT people call “mere propositions”? No. It’s a strictly (or rather, lazily) smaller subuniverse than that. **You are never permitted to ask if an inhabitant of a Prop is canonical.** That’s how we get to treat all open values in Props as judgmentally equal. We don’t just declare that proofs are irrelevant: we make damn sure that they actually are. It is never necessary to normalize proofs. Think about how good old eqJ worked in the previous post: operationally, it fires only if its proof argument normalizes to refl, and that’s too strict for the new regime. (One could (and once upon a time, I did) make a proof-irrelevant but intensional version of eqJ which tests whether the proposed equation holds judgmentally, rather than whether its proof is specifically refl. But the whole point is to escape the judgmental equality.)

Which things are Props? Fundamentally, these things

Type :> S x : S |- Prop :> P ---------------- --------------- ------------------------------------ Prop :> Zero Prop :> One Prop :> (x : S) -> P

We can, of course, construct a binary product,

P * Q = (x : Two) -> if x then P else Q

and pragmatically, it might be nice to make it primitive and use tuples rather than functions. But that’s irrelevant. We have 0 and 1 closed under arbitrary products, so we’re never going to scale the dizzy heights of having a bit. And we’re never going to have any operations which compute strictly. We do, however, have absurdity elimination

naughtE [Zero] : T

which strictly *refuses* to compute. Proofs thus play the role of το δαιμωνο&endsigma; of Socrates, the ‘divine sign’ which never told him what to do, but intervened to advise him against action he might otherwise have undertaken (notably, and ineffectually, the execution of the ten generals after the Battle of Arginusae, a tragic tale of tabloidesque kneejerkery which rings painfully true today, but I digress).

What about these square brackets? You’ll note that it is my habit to name value arguments in round brackets. Well, in the declarations of functions, I prefer to write propositions in square brackets without naming the proofs, and I prefer not to give those proofs explicitly. I feel entitled to let proofs of One remain trivial and proofs of P * Q be found componentwise (another reason for making conjunction primitive). I expect a machine to discover the correct way to instantiate universal quantifiers with computationally inert terms by matching propositions by, then matching variables with other inert things. (It’s not reasonable to guess concrete values for variables, as that requires running computation backwards: in the inert case, the computation didn’t run forwards in the first place.) In effect, I expect *elaboration* to do simple finitary proof search.

Now, what is this <-> relation? Let’s figure it out a bit at a time. We’ll see what we need it to be when we try to implement coe.

((x : S) -> T) <-> ((x' : S') -> T') == ? coe ((x : S) -> T)) ((x' : S') -> T') f = \ x' -> ?

We’re going to need to ship x’ from S’ to S, apply f, then ship the result back to T’. That’s a clue.

((x : S) -> T) <-> ((x' : S') -> T') == (S' <-> S) * ?T coe ((x : S) -> T)) ((x' : S') -> T') f = \ x' -> let x = coe S' S x' in coe T T' (f x)

However, we need to make sure that ?T justifies the second coercion. That is, we need to establish that x : S and x’ : S’ are basically the same value, even though they have different types. It is not inconsiderably unexpected, and yet entirely consistent with our prospectus, that we did not need a notion of value equality at all until the point at which it became clear that we needed a *heterogen eous* notion of value equality. (Spelling note: ‘heterogenous’ is the straight special case of ‘erogenous’.) We shall need

(a : A) == (b : B) : Prop [coe A B a == a] -- it doesn't matter what the proof of this is

and then we can take

((x : S) -> T) <-> ((x' : S') -> T') = (S' <-> S) * (x : S) -> (x' : S') -> [x == x'] -> (T <-> T') coe ((x : S) -> T)) ((x' : S') -> T') f = \ x' -> let x = coe S' S x' in coe T T' (f x)

Of course, we must now define a == b. We have some choices. Does it mean

a and b have equal types and equal values

? That’s what I said in my thesis. And that’s what you get if you define heterogeneous equality ‘inductively’. And it does hang together on that basis. But I’ve changed my mind about equality (see what I did there?). I now think a == b means

as soon as you learn that a and b have equal types, you will know they have equal values

That means we don’t have to make == pack up the evidence for <->.

There is a third alternative. Take

[A <-> B] (a : A) == (b : B): Prop

That is, we could make the formation of a heterogeneous equation require some a priori explanation of value compatibility. That’s the dependent version of the second, conditional interpretation given above. And it’s the only thing to do for HoTT, where the ‘basis for comparison’ itself contains nontrivial information about how to bring two values into a comparable representation. Why not do this for OTT? Proof-irrelevance. What? **Proof-irrelevance allows non-dependent constructions the same effectiveness as their dependent counterparts.** It’s enough to be logically consistent. Who cares which proof justifies what as long as there are enough of them about? But it’s good to be aware that the proof of a == b will never do you any good unless you know **that** A <-> B; HoTT strengthens that “that” to a **why** and necessitates the bookkeeping of the evidence.

(*Historical note about my 1998 implementation of dependent pattern matching.* Back then, I used a homogeneous intensional equality, equipped with both J and K rules. Heterogeneous equality was implemented as homogeneous equality after coercion. The unification step for reflexive equations required K, as the dependent equations needed those proofs to be refl. I demonstrated it at the TYPES98 meeting, organised by Thorsten and friends at Kloster Irsee (clear evidence that Thorsten can organise a pissup in a brewery — I was there and it was good), with all our favourite vector-zip examples working…eventually. I never wrote it up. When I started writing it up, I found myself trying to justify why comparability should be an accident of judgmental equality in the first place, and I just couldn’t. My conscience compelled me to invent John Major (recently defeated) equality, which allowed me to give a much simpler non-proof-dependent presentation of dependent pattern matching. Note to self: not all of the acetates for that talk were cut up to glaze a tomato-growing indoor greenhouse; maybe scan the survivors (the graphical shtick was based on the Saul Bass credits for Otto Preminger’s *Anatomy of a Murder* (James Stewart, Lee Remick, Ben Gazzara); Bass’s cover art for the Duke Ellington soundtrack LP of that movie has influenced my walls ever since)?)

So, going with the easy option, it’s

(f : (x : S) -> T) == (f' : (x' : S') -> T') = (x : S) -> (x' : S') -> [x == x'] -> f x == f' x'

That is, functions are equal if they yield equal outputs from equal inputs. Extensionality!

And you’ll notice that, as long as we’re shipping functions from one function type to another, it’s obvious how to get going with a λ. The moment it’s clear from the types that we’re in trouble, we arrange to give up:

((x : S) -> T) <-> Type = Zero

etc.

We do, of course, take

(A : Type) == (B : Type) = (A <-> B)

but we’re quite free to allow

(P : Prop) == (Q : Prop) = ([P] -> Q) * ([Q] -> P)

as the (nonexistent) run-time representations of P-proofs and Q-proofs are trivially compatible.

In order to explain the ‘run-time’ compatibility intuition behind observational equality, it may be necessary to make the what-time distinction clearer, e.g., via world systems. Have I done these things in the wrong order? Is that the problem?

Of course, functions (with only non-dependent elimination, but hence extensionality) and types (with no typecase) do not put any grit in the oyster. Throw in the Booleans.

Two <-> Two = One coe Two Two b = b (tt : Two) (tt : Two) = One (tt : Two) (ff : Two) = Zero (ff : Two) (tt : Two) = Zero (ff : Two) (ff : Two) = One

But that’s not awfully exciting. Throw in the naturals.

Nat <-> Nat = One coe Nat Nat n = n (zero : Nat) <-> (zero : Nat) = One (zero : Nat) <-> (suc _ : Nat) = Zero (suc _ : Nat) <-> (zero : Nat) = Zero (suc m : Nat) <-> (suc n : Nat) = (m : Nat) == (n : Nat)

The Leibniz property of equality that you know and love (if x equals y, replacing x by y preserves constructability) is a matter of function extensionality. If P : X -> Type, then equal elements x and x’ of X give us equal types P x and P x’, hence transportability between the two.

But where are the dependent datatypes? Well, there’s a problem if you define overly intensional predicates like

data Favourite : (Nat -> Nat) -> Set where fave : Favourite (const 42)

in that we might define

deep : Nat -> Nat deep zero = 42 deep (suc n) = deep n

so that deep is not judgmentally (const 42), although the two can be proven to agree on all inputs, by induction. There is a canonical value of type Favourite (const 42) but not one of type Favourite deep, so there is no way that an extensional theory can have the canonicity property.

The fix, of course, is to do what the GADT people did, and define inductive families explicitly upto propositional equality. And then of course you can transport them, by transisitivity.

But it’s tomorrow, already, and this was supposed to be today’s blogpost. It’s so over.

## observational type theory (the motivation)

The history of typed programming is the history of the struggle for equality. Raising the topic of equality is the single most effective way to start a fight in a room full of type theorists, and you can do it just by casually using the verb “is” — someone will ask you what “is” is, and then the air will fill with custard pies.

Never trust a type theorist who has not changed their mind about equality.

Let’s muck in. As soon as you have any notion of *computation* in types, you acquire the nontrivial problem of determining when two types are sufficiently alike that a value of one passes as a value of the other. Details vary, but most type theories ensure that a rule like the following is at the very least admissible

G |- s : S G |- S = T : Type ------------------------------------ G |- s : T

where the notion of equality is presented as a judgment. The circumstances under which that judgment holds vary wildly. Different people approach the problem with different priorities and different prejudices. I am no exception.

When you arrive as a foreigner to someone’s local variety of type theory, a key thing to say is “I want to know what the Language Of Verifiable Evidence is. I want you to show me.”. Given a judgment, what is the decidably checkable document that can certify it? I need to know the answer to this question, because I don’t want to have to trust you. Given the set of judgments J, there should be a set E of evidence and I should be able to implement my own total function check : J -> E -> 2 such that

- if check j e = tt, then j holds
- if j holds, then for some e, check j e = tt

Now, there is one obvious candidate for E — the set of derivations for judgments. In this case, check j e need merely confirm that each step of e is valid and that its conclusion is j. However, derivations can be large, especially when they explain every step of computation that has been used to identify types. Systems (e.g., NuPRL) which use derivations as evidence tend to work in an LCF style, with an abstract type of derived judgments, so that E is never reified as a type of concrete objects: it’s emergent from the execution traces of an ML program. There is no “check”: derivations are presented as arbitrary ML programs which can do many worse things than loop; to see if they’re fit for purpose, you need to evaluate them and see if the judgment derived is the one that you wanted.

I belong to the religion of the other extreme, which holds that the only acceptable candidate for E is 1. That is, I think judgments should be *decidable* without the need for further certification. In some sense, the division of labour between humans and computers is that humans propose judgments for computers to decide. That’s to say I prefer a system with propositions-as-types and proofs-as-programs. If there’s some certifying information required, I want it in the program, not in the derivation of the programs well-typedness.

A digression from this overlong introduction. Different people use the term ‘core’ or ‘kernel’ theory meaning different things, often with separately valid motivations. I expect the “program”, described above, to be an information-rich document, internal to the implementation of the system. I expect this program to be generated by *elaboration* from user-supplied source code which is rather more terse, e.g., by methods of constraint-propagation in the tradition of Milner. I also expect this program to contain far more information than is needed for its closed run-time execution, necessitating an *extraction* process, in the tradition of Coq. Moreover, I expect the information added by elaboration to differ significantly from the information removed by extraction: if we are using types effectively, they should drive the elaboration process to deliver details for the extracted code with only a little explicit nudging from us. I *am* willing to contemplate type systems in which the programs are the post-extraction ‘realisers’ and the evidence which justifies their safety sits in the derivations, but I want to be sure I can generate those derivations mechanically from the well typed terms of a richer language.

But anyway, different approaches to the role and status of the judgments motivate different approaches to equality. If we take a propositions-as-judgments approach, we might incline to want the equality judgment to capture a more ‘mathematical’ equality. If we want judgments to be decidable and think of propositions as types, then it is inevitable that the equality judgment will be both

- a
**disappointment**, in that to remain decidable, it must fail to recognize some equations, and - an
**accident**, in that it relies on whatever varieties of computational jiggery-pokery have been implemented.

I’m ok with that because I know computers can’t do everything, but I still like them to work as hard for me as I can think to make them.

Given these differing views of the equality judgment, what happens to the equality proposition, by which I mean the equality type? I’ll be clear to write the equality type as ==. If you think of evidence as living in derivations, you can have

G |- a = b : T G |- q : a == b ---------------------- ------------------- G |- refl : a == b G |- a = b : T

meaning that any algorithm seeking to guess a derivation for an equality judgment must invent a term q from nothing — but of course, things don’t work that way and they’re not meant to. (Of course, refl is usually given just as the proof of a == a, but I fused in an appeal to the conversion rule in order to show the symmetry between introduction and elimination.) This is the approach taken in “Extensional” Type Theory, which is so-called because it allows the derivation of extensional equality between functions — functions are equal if they provably agree on all inputs — but it is not the only way to build a type theory with extensional propositional equality, so as a naming, it doesn’t really characterize the key design choice. The rule on the right is often called the *equality reflection* rule, and that’s the key characteristic.

The other popular approach, taken by “Intensional” Type Theory, makes == inductively generated by refl, which forces == to characterize whatever disappointing accident happens to be implemented by =. The elimination rule for == and its computational behaviour compound this reliance on whatever it is the equality judgment will accept.

eqJ (A : Type) (a : A) (b : A) (q : a == b) (P (c : A) (q : a == c) : Type) (p : P a refl) : P b q eqJ _ _ _ refl _ p = p

The point is that the output, p, if it ever shows up, will be just exactly the input p, without adjustment. For that to be type-preserving, we must ensure that p : P b q, and hence that P a refl = P b q. Fortunately, matching q with refl is a sufficient condition to make q = refl and b = a. (With equality reflection, the existence of q is enough to make the equation typecheck, which means that any old rubbish typechecks when you hypothesize the proof of a false equation, which means that it is not safe to evaluate open terms.) However, the no-adjustment method of computing with equality proofs forces us to rely on the fact that == ultimately boils down to =. Closed values cannot be made equal unless they are judgmentally so. We have closed equality reflection as an admissible rule.

. |- q : a == b ------------------- . |- a = b : T

By canonicity (the fact that closed expressions compute to canonical values), q must compute to refl. By subject reduction, refl typechecks at type a == b. By inversion, a = b must hold. As we cannot decide the extensional equality of functions, to keep = decidable, we cannot make == extensional. Functions are identified up to their *implementations*, not their observable behaviour.

Or, put a different way, if we want = to be decidable and == to be extensional, then we must give up on the no-adjustment method of computing with equality proofs, and consider instead a mode of computation which transports values between provably equal types even when they are not necessarily equal judgmentally.

In Observational Type Theory, our mission was to introduce just such a mode of computation with proofs of ==, and then to liberalise == with function extensionality, without losing canonicity. We succeeded. I think it should be possible to achieve the same success for Homotopy Type Theory, with its many rich sources of extrajudicial equality. More about OTT tomorrow. I’m currently visiting Nottingham, which had OTT in it back in the day, so it’s on my mind.

## linear dependent types

Just as the Deutsche Demokratische Republik was named after exactly what it was not, so there are a number of papers whose titles emphasize the combination of linear types and dependent types which are actually about keeping the two very carefully separated, in spite of their presence within the same system. Let me be at pains to approve of the motivations behind that work and the systems developed therein, which are entirely fit for their intended purposes. But my purposes are different.

In particular, I wish to develop a linear dependent type system in which the use-exactly-once policy is guaranteed for the run-time behaviour, but the types are still free to mention linear values arbitrarily. Types aren’t around at run-time. Usage in types doesn’t count as *consumption*: rather, it is *contemplation*. I can contemplate the pint of Brixton Porter I enjoyed last night and tell you how much I enjoyed that pint, even though I cannot drink that pint again. This is nothing new: researchers in program logics have traded in named quantities which are not the current values of actual program variables for decades. Let’s get us some of that action. And as Neel wisely suggested, the key is to use monoidally structured worlds. I doubt Dominic Orchard, Tomas Petricek and Alan Mycroft will be fainting with amazement, either. Nor will Marco Gaboardi.

Where I do need to be a bit of a religious dependent type theorist is in never divvying up the variables in the context: you just can’t presume to do that sort of thing when the types of more local variables depend on more global variables. However, what you’re (I hope I don’t assume too much) used to calling a context, I now call a *pre*-context.

G ::= . | G, x : T

Now, each precontext G has an associated set of G-contexts, annotating each variable with some resource w (because a resource is sort of a world).

. is a .-context W, wx : T is a (G, x : T)-context if W is a G context

Now, w inhabits the set {0, 1, *} where * means “lots”, or “hrair” if you speak rabbit (for once the reference is to *Richard* Adams), and they’re monoidal with rabbit addition.

+ | 0 1 * -- ------- 0 | 0 1 * 1 | 1 * * * | * * *

Pre-contexts G tell you which things may be contemplated. G-contexts tell you how many of each G-thing is available to be consumed. Note that G-contexts are monoidal pointwise, and we may write 0G for the neutral element.

Typing judgments are also marked with a resource, saying how many of the terms we’re consuming.

Here’s pre-context validity. I write :> and <: respectively for reverse and

forward membership epsilons.

0G |- 0 Type :> S -------------- --------------------- . |- valid G, x : S |- valid

Any resource marking of a valid pre-context G yields a valid G-context.

Now, guess what! There’s a set of quantifiers, which today I’ll write as funny infix arrows -q

**-x**intersection, or ‘for an unknown…’**-o**lollipop, or ‘given just one…’**->**dependent function (a.k.a. Π), or ‘given plenty…’

And, following my modus operandi from *worlds*, we have an action which I’ll write prefix -qw

-qw | 0 1 * ---- ------- -x | 0 0 0 -o | 0 1 * -> | 0 * *

which tells you how many copies of the inputs you need to make however many of the outputs you

want. I know, it looks a bit like multiplication.

Now we can give the (deliberately 1971-inconsistent, to be repaired in some other post) rules. As ever, I work bidirectionally. Today, for convenience, I don’t bother with beta-redexes and presume substitution acts hereditarily. S <= T is a placeholder for the notion of subtyping induced by cumulativity, but for now it should just amount to beta-eta-equality.

G, x : S, G' |- valid W |- w f <: (x : S) -q T W' |- -qw S :> s ------------------------------- -------------------------------------------------- 0G, wx : S, 0G' |- w x <: S W + W' |- w f s <: t W, -qwx : S |- w T :> t W |- w e <: S S <= T ------------------------------------ ---------------------------- W |- w (x : S) -q T :> \ x -> t W |- w T :> s G |- valid 0G, 0x : S |- 0 Type :> T ------------------------ -------------------------------- 0G |- 0 Type :> Type 0G |- 0 Type :> (x : S) -q T

Key observations:

- Type construction and related contemplative acts require no run-time resource.
- The only two-premise rule requires you to split your resources two ways, but all the variables remain in the context, hence available for contemplation (which costs nothing).
- If W is a G-context and W |- w T :> t, then 0G |- 0 T :> t, and similarly with e <: S.

Think of 0 as the ethereal world where the types dwell in eternity. All constructions, however constrained on this earth, have intuitionistic λ-calculus souls which dwell amongst the types, where the police dogs all have rubber teeth and the jails are all made of tin, and you can bust right out again just as soon as they throw you in. And when the application rule substitutes T[s/x], it is the *soul* of s which replaces the bound x in the type T, even if s lives in world 1.

There’s plenty more work to do, of course. I need to think about datatypes, too. I’m fond of the linear paramorphism (the tool of choice for sorting algorithms)

para : (F (mu F & T) -o T) -> mu F -o T

which says that you may either leave each child untouched or turn it into a T, but not both, when you are turning a whole node into a T. What the hell does that become in the dependent case? We’ll have a dependent version of &, so we can write

indn : ((fd : F ((d : mu F) & P d)) -o P (in (F fst fd))) -> (d : mu F) -o P d

It’s ok for the type of the snd projection to depend on the fst, because even when you choose to take snd on earth, the fst that could have been still lives ethereally.

Interesting times. Oh, is that my dinner? I’d better stop contemplating this and consume that.

Edit: hopefully, repaired typing rules mangled by wordpress seeing tags where there weren’t any

## traffic-dependent session types

This lump of Agda is a talk I gave at the Scottish Programming Languages Seminar, some Wednesday last term at Heriot-Watt, in a brief oasis of research activity. I originally wrote it in my notebook coming back from Arran on the ferry, sometime in 2013 (or was it earlier?), chatting with Peter Hancock. As for how to talk about the idea, that took a little longer. It crystallised during Phil Wadler’s lecture on session types at last summer’s Advances in Programming Languages summer school, also at Heriot-Watt. It’s often a matter of the right question, and as so often in my professional life, the right question was asked by James McKinna. James asked something like

Given that types are used to classify things, what are the things that session types classify?

If you’ve heard Phil give his Propositions-as-Sessions talk (or if you are Phil, and you’ve been in the habit of giving it), you may have noticed that at the level of the grammar in Phil’s chat (e.g., “A (x) B means send A, then behave like B”) the answer to James’s question isn’t thunderingly concrete (is it the things which do the behaving that are the things which can be sent?). I don’t want to get at Phil: I can hardly throw stones at informality in communication style. Formally (and it wasn’t hard to cajole this from Phil), the session types are classifying *channels*: “send A” means send (on my channel) the identity of another channel which will act with type A; “then behave like B” means that after we send the channel, we should act, on our own channel like type B. Although they look as if they should be symmetrical, tensor (asciified (x)) and its dual, par (asciified >8), are askew with Phil’s interpretation in a way which makes a lot of sense when you think about the communication involved.

To misquote Robert Wyatt, ‘Well I askew…’ in that, as a dependent type theorist, I’m used to dependency pushing rightward in the sigma-types and pi-types we use for dependent pairs and functions. I write :> to mean ‘admits’ (I’d use \ni in LaTeX).

A :> a B[a/x] :> b x : A |- B :> b ------------------------- ---------------------------- (x : A) * B :> a, b (x : A) -> B :> \ x -> b

To give a value in (x : A) * B, you give a value in A, then you give a value in the appropriate instance of B (as chosen by yourself). To give a value in (x : A) -> B, you wait to receive a value in A, then you reply with a value in the appropriate instance of B (as chosen by your caller). Sounds a lot like session types to me. Moreover, the skew of dependency chimes with the idea that structure of a communication depends in some way on the choices which gets made as it progresses. Are Phil’s tensor and par the degenerate non-dependent special cases of some sigma and pi? If so, that might be good, because we could recover the other two operators, + and &, in the usual (if you’re a dependently typed programmer) way, namely

A + B = (x : Two) * if x then A else B A & B = (x : Two) -> if x then A else B

That is, respectively, “I send a bit, telling you which of A or B I have chosen to follow.” and “You send a bit, telling me which of A or B you have decided that I must follow.”. Foundationally, “transmit a bit and fork on it” is clearly minimal and generic; pragmatically, the requirement to fork on data one bit at a time and at the moment of transmission is a bit steep. I might like to send, e.g., the dimensions of a bitmap and then the correctly-sized matrix of the bits. Dependent types and session types seem made for each other. I have in the past been so bold as to say

If you’re not doing session types with dependent types, then you’re doing them wrong.

and these days, I have even more of an idea what I mean by that. Session types are supposed to capture some patterns of dependency. There has to be a connection. but what?

The trouble is, if you just fiddle about on the whiteboard trying to ‘dependent-up’ the non-dependent rules, it’s not so easy. Look at the rule for forming a pair: we need to understand what B[a/x] means. The same thing will happen when we apply a function. Now we really need the answer to James’s question. When a dependent type theorist asks “what are the things that session classify?”, they mean ‘what are the a’s which gets substituted for the x in B?’.

One way to not answer this question is to decide that, because we know how to make types depend on values, we will restrict the domains of dependency to value types. In linear logic terms, that requires some sort of bangability restriction. In some dependently typed models of sessions (e.g., Edwin Brady’s), the domain of the sigma and pi operators are not session types. That’s a pragmatic way to avoid thinking, but it doesn’t get us tensor and par, and it makes the technology more complicated, not less. (People who expect dependent types to be a source of complexity have not understood them: we introduce dependent types to simplify the task of *coping* with complexity.) Let’s answer the damn question.

What does x stand for, in B? Is it…

- a
**channel**for communicating according to A? - a
**process**which communicates according to A?

or what? What should session types depend on?

It’s not clear how to substitute a channel for a variable, or how that might mean we get dependency on values when in the special cases where it’s clear values are what’s transmitted. It’s also a bit worrying to consider session types depending on the parties to the communication. The key thing is that rest of the protocol can depend on what has been transmitted so far. So it’s neither of the above. It is

- the
**traffic**of a communication according to A?

I’ve told you a bit about induction-recursion, so I can cut to the chase. The traffic of a session is exactly the record of its transmissions. That is, we need at least a universe of types which close up the type of signals over the formation of dependent pairs. But we don’t just need to know what the traffic consists of — it’s not just a dependent record type — we need to know which party is responsible for which bits of the traffic. In a two-party protocol, we just need to know when the roles swap. I presume that (Signal, [_]s) is a universe of types with transmissible values and define Sessions and their traffic as follows.

Session : Set [(S : Session)]t : Set ::= ! (X : Signal) [ ! X ]t = [ X ]s | sg (S : Session) (T : [ S ]t -> Session) [ sg S T ]t = (s : [ S ]t) * [ T s ]t | op (S : Session) [ op S ]t = [ S ]t

The session type is a record type marked up with role-swapping; the traffic consists of records, ignoring the role-swapping.

Now we need to say what it is to communicate in accordance with such a session type. I define the two roles.

Play (S : Session) (G (s : [ S ]t) : Set) : Set Oppo (S : Session) (H (s : [ S ]t) : Set) : Set

Crucially, I cut myself some slack. The above G and H are traffic-dependent postconditions which the player and opponent must achieve. My definition calculates the precondition for achieving those postconditions as a result of the communication.

Play (! X) G = (x : [ X ]s) * G x -- choose a happy signal Play (sg S T) G = Play S \ s -> Play (T s) \ t -> G (s , t) Play (op S) G = Oppo S G Oppo (! X) H = (x : [ X ]s) -> H x -- put up with whatever signal Oppo (sg S T) H = Oppo S \ s -> Oppo (T s) \ t -> H (s , t) Oppo (op S) H = Play S H

Traffic-dependent postconditions are just enough to let us say that our goal for the first component of a pair is to be able to carry on with the second (with the goal of achieving our original postcondition). I’ve had my basic training in Bundy-jumping, and I know how to get an induction to happen by generalizing a goal with “sink”-variables (G and H, respectively), whose job is to absorb the difference between the induction hypotheses and the induction conclusion.

Let there be communication and happiness!

traffic (S : Session) (p : Play S G) (o : Oppo S H) : (s : [ S ]t) * G s * H s traffic (! X) (x , g) h = x , g , h x traffic (sg S T) p o with traffic S p o ... | s , p' , o' with traffic (T s) p' o' ... | t , g , h = (s , t) , g , h traffic (op S) p o with traffic S o p ... | s , h , g = s , g , h

The file I linked at the top gives an example development of a matrix-transmitting session. There’s plenty of head scratching still to do.

If we’re interested in multi-party sessions, the idea that traffic is a record does not change, but the types of the parties do. As well as sigma for sending and pi for receiving, we need an intersection type to quantify over signals which we neither send nor receive. So we need something like worlds.

But where have we reached? We can indeed define

A (x) B = sg A \ _ -> B A + B = sg Two \ x -> if x then A else B pi A B = sg (op A) B A >8 B = pi A \ _ -> B A & B = pi Two \ x -> if x then A else B

The linearity has vanished into the characterization of processes as tree-like strategies for communication: paths in trees are linear. We’ve doubled down on Phil’s askew interpretation of the linear operators by making them dependent quantifiers.

But the processes classified by the session types are not the things which the variables stand for. This is most certainly not a linear dependent type theory. I’ve also been thinking about what that might mean, with dependency on the linearly typed things (like, how would you use types as a language of specification for linearly typed functions?). Session types may correspond to linear types, but dependent session types (dependency on traffic) are not the same thing as linear dependent types (dependency on the programs themselves). Realising that these concepts are potentially distinct is crucial to figuring out what they might entail.