Skip to content

observational type theory (the motivation)

January 6, 2015

The history of typed programming is the history of the struggle for equality. Raising the topic of equality is the single most effective way to start a fight in a room full of type theorists, and you can do it just by casually using the verb “is” — someone will ask you what “is” is, and then the air will fill with custard pies.

Never trust a type theorist who has not changed their mind about equality.

Let’s muck in. As soon as you have any notion of computation in types, you acquire the nontrivial problem of determining when two types are sufficiently alike that a value of one passes as a value of the other. Details vary, but most type theories ensure that a rule like the following is at the very least admissible

    G |- s : S     G |- S = T : Type
    G |- s : T

where the notion of equality is presented as a judgment. The circumstances under which that judgment holds vary wildly. Different people approach the problem with different priorities and different prejudices. I am no exception.

When you arrive as a foreigner to someone’s local variety of type theory, a key thing to say is “I want to know what the Language Of Verifiable Evidence is. I want you to show me.”. Given a judgment, what is the decidably checkable document that can certify it? I need to know the answer to this question, because I don’t want to have to trust you. Given the set of judgments J, there should be a set E of evidence and I should be able to implement my own total function check : J -> E -> 2 such that

  • if check j e = tt, then j holds
  • if j holds, then for some e, check j e = tt

Now, there is one obvious candidate for E — the set of derivations for judgments. In this case, check j e need merely confirm that each step of e is valid and that its conclusion is j. However, derivations can be large, especially when they explain every step of computation that has been used to identify types. Systems (e.g., NuPRL) which use derivations as evidence tend to work in an LCF style, with an abstract type of derived judgments, so that E is never reified as a type of concrete objects: it’s emergent from the execution traces of an ML program. There is no “check”: derivations are presented as arbitrary ML programs which can do many worse things than loop; to see if they’re fit for purpose, you need to evaluate them and see if the judgment derived is the one that you wanted.

I belong to the religion of the other extreme, which holds that the only acceptable candidate for E is 1. That is, I think judgments should be decidable without the need for further certification. In some sense, the division of labour between humans and computers is that humans propose judgments for computers to decide. That’s to say I prefer a system with propositions-as-types and proofs-as-programs. If there’s some certifying information required, I want it in the program, not in the derivation of the programs well-typedness.

A digression from this overlong introduction. Different people use the term ‘core’ or ‘kernel’ theory meaning different things, often with separately valid motivations. I expect the “program”, described above, to be an information-rich document, internal to the implementation of the system. I expect this program to be generated by elaboration from user-supplied source code which is rather more terse, e.g., by methods of constraint-propagation in the tradition of Milner. I also expect this program to contain far more information than is needed for its closed run-time execution, necessitating an extraction process, in the tradition of Coq. Moreover, I expect the information added by elaboration to differ significantly from the information removed by extraction: if we are using types effectively, they should drive the elaboration process to deliver details for the extracted code with only a little explicit nudging from us. I am willing to contemplate type systems in which the programs are the post-extraction ‘realisers’ and the evidence which justifies their safety sits in the derivations, but I want to be sure I can generate those derivations mechanically from the well typed terms of a richer language.

But anyway, different approaches to the role and status of the judgments motivate different approaches to equality. If we take a propositions-as-judgments approach, we might incline to want the equality judgment to capture a more ‘mathematical’ equality. If we want judgments to be decidable and think of propositions as types, then it is inevitable that the equality judgment will be both

  • a disappointment, in that to remain decidable, it must fail to recognize some equations, and
  • an accident, in that it relies on whatever varieties of computational jiggery-pokery have been implemented.

I’m ok with that because I know computers can’t do everything, but I still like them to work as hard for me as I can think to make them.

Given these differing views of the equality judgment, what happens to the equality proposition, by which I mean the equality type? I’ll be clear to write the equality type as ==. If you think of evidence as living in derivations, you can have

    G |- a = b : T             G |- q : a == b
  ----------------------     -------------------
    G |- refl : a == b         G |- a = b : T

meaning that any algorithm seeking to guess a derivation for an equality judgment must invent a term q from nothing — but of course, things don’t work that way and they’re not meant to. (Of course, refl is usually given just as the proof of a == a, but I fused in an appeal to the conversion rule in order to show the symmetry between introduction and elimination.) This is the approach taken in “Extensional” Type Theory, which is so-called because it allows the derivation of extensional equality between functions — functions are equal if they provably agree on all inputs — but it is not the only way to build a type theory with extensional propositional equality, so as a naming, it doesn’t really characterize the key design choice. The rule on the right is often called the equality reflection rule, and that’s the key characteristic.

The other popular approach, taken by “Intensional” Type Theory, makes == inductively generated by refl, which forces == to characterize whatever disappointing accident happens to be implemented by =. The elimination rule for == and its computational behaviour compound this reliance on whatever it is the equality judgment will accept.

eqJ (A : Type) (a : A) (b : A) (q : a == b) (P (c : A) (q : a == c) : Type) (p : P a refl) : P b q
eqJ _ _ _ refl _ p = p

The point is that the output, p, if it ever shows up, will be just exactly the input p, without adjustment. For that to be type-preserving, we must ensure that p : P b q, and hence that P a refl = P b q. Fortunately, matching q with refl is a sufficient condition to make q = refl and b = a. (With equality reflection, the existence of q is enough to make the equation typecheck, which means that any old rubbish typechecks when you hypothesize the proof of a false equation, which means that it is not safe to evaluate open terms.) However, the no-adjustment method of computing with equality proofs forces us to rely on the fact that == ultimately boils down to =. Closed values cannot be made equal unless they are judgmentally so. We have closed equality reflection as an admissible rule.

    . |- q : a == b
    . |- a = b : T

By canonicity (the fact that closed expressions compute to canonical values), q must compute to refl. By subject reduction, refl typechecks at type a == b. By inversion, a = b must hold. As we cannot decide the extensional equality of functions, to keep = decidable, we cannot make == extensional. Functions are identified up to their implementations, not their observable behaviour.

Or, put a different way, if we want = to be decidable and == to be extensional, then we must give up on the no-adjustment method of computing with equality proofs, and consider instead a mode of computation which transports values between provably equal types even when they are not necessarily equal judgmentally.

In Observational Type Theory, our mission was to introduce just such a mode of computation with proofs of ==, and then to liberalise == with function extensionality, without losing canonicity. We succeeded. I think it should be possible to achieve the same success for Homotopy Type Theory, with its many rich sources of extrajudicial equality. More about OTT tomorrow. I’m currently visiting Nottingham, which had OTT in it back in the day, so it’s on my mind.


linear dependent types

January 5, 2015

Just as the Deutsche Demokratische Republik was named after exactly what it was not, so there are a number of papers whose titles emphasize the combination of linear types and dependent types which are actually about keeping the two very carefully separated, in spite of their presence within the same system. Let me be at pains to approve of the motivations behind that work and the systems developed therein, which are entirely fit for their intended purposes. But my purposes are different.

In particular, I wish to develop a linear dependent type system in which the use-exactly-once policy is guaranteed for the run-time behaviour, but the types are still free to mention linear values arbitrarily. Types aren’t around at run-time. Usage in types doesn’t count as consumption: rather, it is contemplation. I can contemplate the pint of Brixton Porter I enjoyed last night and tell you how much I enjoyed that pint, even though I cannot drink that pint again. This is nothing new: researchers in program logics have traded in named quantities which are not the current values of actual program variables for decades. Let’s get us some of that action. And as Neel wisely suggested, the key is to use monoidally structured worlds. I doubt Dominic Orchard, Tomas Petricek and Alan Mycroft will be fainting with amazement, either. Nor will Marco Gaboardi.

Where I do need to be a bit of a religious dependent type theorist is in never divvying up the variables in the context: you just can’t presume to do that sort of thing when the types of more local variables depend on more global variables. However, what you’re (I hope I don’t assume too much) used to calling a context, I now call a pre-context.

  G ::=  .  | G, x : T

Now, each precontext G has an associated set of G-contexts, annotating each variable with some resource w (because a resource is sort of a world).

  . is a .-context
  W, wx : T is a (G, x : T)-context if W is a G context

Now, w inhabits the set {0, 1, *} where * means “lots”, or “hrair” if you speak rabbit (for once the reference is to Richard Adams), and they’re monoidal with rabbit addition.

+ | 0 1 *
-- -------
0 | 0 1 *
1 | 1 * *
* | * * *

Pre-contexts G tell you which things may be contemplated. G-contexts tell you how many of each G-thing is available to be consumed. Note that G-contexts are monoidal pointwise, and we may write 0G for the neutral element.

Typing judgments are also marked with a resource, saying how many of the terms we’re consuming.

Here’s pre-context validity. I write :> and <: respectively for reverse and
forward membership epsilons.

                     0G |- 0 Type :> S
--------------     ---------------------
  . |- valid         G, x : S |- valid

Any resource marking of a valid pre-context G yields a valid G-context.

Now, guess what! There’s a set of quantifiers, which today I’ll write as funny infix arrows -q

  • -x intersection, or ‘for an unknown…’
  • -o lollipop, or ‘given just one…’
  • -> dependent function (a.k.a. Π), or ‘given plenty…’

And, following my modus operandi from worlds, we have an action which I’ll write prefix -qw

-qw | 0 1 *
---- -------
-x  | 0 0 0
-o  | 0 1 *
->  | 0 * *

which tells you how many copies of the inputs you need to make however many of the outputs you
want. I know, it looks a bit like multiplication.

Now we can give the (deliberately 1971-inconsistent, to be repaired in some other post) rules. As ever, I work bidirectionally. Today, for convenience, I don’t bother with beta-redexes and presume substitution acts hereditarily. S <= T is a placeholder for the notion of subtyping induced by cumulativity, but for now it should just amount to beta-eta-equality.

    G,  x : S,  G' |- valid            W |- w f <: (x : S) -q T     W' |- -qw S :> s
 -------------------------------     --------------------------------------------------
   0G, wx : S, 0G' |- w x <: S         W + W' |- w f s <: t

   W, -qwx : S |- w T :> t                  W |- w e <: S     S <= T
 ------------------------------------     ----------------------------
   W |- w (x : S) -q T :> \ x -> t          W |- w T :> s

   G |- valid                   0G, 0x : S |- 0 Type :> T
 ------------------------     --------------------------------
   0G |- 0 Type :> Type         0G |- 0 Type :> (x : S) -q T

Key observations:

  • Type construction and related contemplative acts require no run-time resource.
  • The only two-premise rule requires you to split your resources two ways, but all the variables remain in the context, hence available for contemplation (which costs nothing).
  • If W is a G-context and W |- w T :> t, then 0G |- 0 T :> t, and similarly with e <: S.

Think of 0 as the ethereal world where the types dwell in eternity. All constructions, however constrained on this earth, have intuitionistic λ-calculus souls which dwell amongst the types, where the police dogs all have rubber teeth and the jails are all made of tin, and you can bust right out again just as soon as they throw you in. And when the application rule substitutes T[s/x], it is the soul of s which replaces the bound x in the type T, even if s lives in world 1.

There’s plenty more work to do, of course. I need to think about datatypes, too. I’m fond of the linear paramorphism (the tool of choice for sorting algorithms)

  para : (F (mu F & T) -o T) -> mu F -o T

which says that you may either leave each child untouched or turn it into a T, but not both, when you are turning a whole node into a T. What the hell does that become in the dependent case? We’ll have a dependent version of &, so we can write

  indn : ((fd : F ((d : mu F) & P d)) -o P (in (F fst fd))) -> (d : mu F) -o P d

It’s ok for the type of the snd projection to depend on the fst, because even when you choose to take snd on earth, the fst that could have been still lives ethereally.

Interesting times. Oh, is that my dinner? I’d better stop contemplating this and consume that.

Edit: hopefully, repaired typing rules mangled by wordpress seeing tags where there weren’t any

traffic-dependent session types

January 4, 2015

This lump of Agda is a talk I gave at the Scottish Programming Languages Seminar, some Wednesday last term at Heriot-Watt, in a brief oasis of research activity. I originally wrote it in my notebook coming back from Arran on the ferry, sometime in 2013 (or was it earlier?), chatting with Peter Hancock. As for how to talk about the idea, that took a little longer. It crystallised during Phil Wadler’s lecture on session types at last summer’s Advances in Programming Languages summer school, also at Heriot-Watt. It’s often a matter of the right question, and as so often in my professional life, the right question was asked by James McKinna. James asked something like

Given that types are used to classify things, what are the things that session types classify?

If you’ve heard Phil give his Propositions-as-Sessions talk (or if you are Phil, and you’ve been in the habit of giving it), you may have noticed that at the level of the grammar in Phil’s chat (e.g., “A (x) B means send A, then behave like B”) the answer to James’s question isn’t thunderingly concrete (is it the things which do the behaving that are the things which can be sent?). I don’t want to get at Phil: I can hardly throw stones at informality in communication style. Formally (and it wasn’t hard to cajole this from Phil), the session types are classifying channels: “send A” means send (on my channel) the identity of another channel which will act with type A; “then behave like B” means that after we send the channel, we should act, on our own channel like type B. Although they look as if they should be symmetrical, tensor (asciified (x)) and its dual, par (asciified >8), are askew with Phil’s interpretation in a way which makes a lot of sense when you think about the communication involved.

To misquote Robert Wyatt, ‘Well I askew…’ in that, as a dependent type theorist, I’m used to dependency pushing rightward in the sigma-types and pi-types we use for dependent pairs and functions. I write :> to mean ‘admits’ (I’d use \ni in LaTeX).

    A :> a    B[a/x] :> b                 x : A  |-  B :> b
  -------------------------         ----------------------------
     (x : A) * B :> a, b              (x : A) -> B :> \ x -> b

To give a value in (x : A) * B, you give a value in A, then you give a value in the appropriate instance of B (as chosen by yourself). To give a value in (x : A) -> B, you wait to receive a value in A, then you reply with a value in the appropriate instance of B (as chosen by your caller). Sounds a lot like session types to me. Moreover, the skew of dependency chimes with the idea that structure of a communication depends in some way on the choices which gets made as it progresses. Are Phil’s tensor and par the degenerate non-dependent special cases of some sigma and pi? If so, that might be good, because we could recover the other two operators, + and &, in the usual (if you’re a dependently typed programmer) way, namely

  A + B  =  (x : Two) *   if x then A else B
  A & B  =  (x : Two) ->  if x then A else B

That is, respectively, “I send a bit, telling you which of A or B I have chosen to follow.” and “You send a bit, telling me which of A or B you have decided that I must follow.”. Foundationally, “transmit a bit and fork on it” is clearly minimal and generic; pragmatically, the requirement to fork on data one bit at a time and at the moment of transmission is a bit steep. I might like to send, e.g., the dimensions of a bitmap and then the correctly-sized matrix of the bits. Dependent types and session types seem made for each other. I have in the past been so bold as to say

If you’re not doing session types with dependent types, then you’re doing them wrong.

and these days, I have even more of an idea what I mean by that. Session types are supposed to capture some patterns of dependency. There has to be a connection. but what?

The trouble is, if you just fiddle about on the whiteboard trying to ‘dependent-up’ the non-dependent rules, it’s not so easy. Look at the rule for forming a pair: we need to understand what B[a/x] means. The same thing will happen when we apply a function. Now we really need the answer to James’s question. When a dependent type theorist asks “what are the things that session classify?”, they mean ‘what are the a’s which gets substituted for the x in B?’.

One way to not answer this question is to decide that, because we know how to make types depend on values, we will restrict the domains of dependency to value types. In linear logic terms, that requires some sort of bangability restriction. In some dependently typed models of sessions (e.g., Edwin Brady’s), the domain of the sigma and pi operators are not session types. That’s a pragmatic way to avoid thinking, but it doesn’t get us tensor and par, and it makes the technology more complicated, not less. (People who expect dependent types to be a source of complexity have not understood them: we introduce dependent types to simplify the task of coping with complexity.) Let’s answer the damn question.

What does x stand for, in B? Is it…

  • a channel for communicating according to A?
  • a process which communicates according to A?

or what? What should session types depend on?

It’s not clear how to substitute a channel for a variable, or how that might mean we get dependency on values when in the special cases where it’s clear values are what’s transmitted. It’s also a bit worrying to consider session types depending on the parties to the communication. The key thing is that rest of the protocol can depend on what has been transmitted so far. So it’s neither of the above. It is

  • the traffic of a communication according to A?

I’ve told you a bit about induction-recursion, so I can cut to the chase. The traffic of a session is exactly the record of its transmissions. That is, we need at least a universe of types which close up the type of signals over the formation of dependent pairs. But we don’t just need to know what the traffic consists of — it’s not just a dependent record type — we need to know which party is responsible for which bits of the traffic. In a two-party protocol, we just need to know when the roles swap. I presume that (Signal, [_]s) is a universe of types with transmissible values and define Sessions and their traffic as follows.

  Session : Set                                   [(S : Session)]t : Set
  ::= ! (X : Signal)                              [ ! X    ]t = [ X ]s
    | sg (S : Session) (T : [ S ]t -> Session)    [ sg S T ]t = (s : [ S ]t) * [ T s ]t
    | op (S : Session)                            [ op S   ]t = [ S ]t

The session type is a record type marked up with role-swapping; the traffic consists of records, ignoring the role-swapping.

Now we need to say what it is to communicate in accordance with such a session type. I define the two roles.

Play (S : Session) (G (s : [ S ]t) : Set) : Set
Oppo (S : Session) (H (s : [ S ]t) : Set) : Set

Crucially, I cut myself some slack. The above G and H are traffic-dependent postconditions which the player and opponent must achieve. My definition calculates the precondition for achieving those postconditions as a result of the communication.

Play (! X)    G = (x : [ X ]s) * G x  -- choose a happy signal
Play (sg S T) G = Play S \ s -> Play (T s) \ t -> G (s , t)
Play (op S)   G = Oppo S G

Oppo (! X)    H = (x : [ X ]s) -> H x  -- put up with whatever signal
Oppo (sg S T) H = Oppo S \ s -> Oppo (T s) \ t -> H (s , t)
Oppo (op S)   H = Play S H

Traffic-dependent postconditions are just enough to let us say that our goal for the first component of a pair is to be able to carry on with the second (with the goal of achieving our original postcondition). I’ve had my basic training in Bundy-jumping, and I know how to get an induction to happen by generalizing a goal with “sink”-variables (G and H, respectively), whose job is to absorb the difference between the induction hypotheses and the induction conclusion.

Let there be communication and happiness!

traffic (S : Session) (p : Play S G) (o : Oppo S H) : (s : [ S ]t) * G s * H s
traffic (! X)     (x , g) h = x , g , h x
traffic (sg S T) p       o with traffic S p o
... | s , p' , o' with traffic (T s) p' o'
... | t , g , h = (s , t) , g , h
traffic (op S) p o with traffic S o p
... | s , h , g = s , g , h

The file I linked at the top gives an example development of a matrix-transmitting session. There’s plenty of head scratching still to do.

If we’re interested in multi-party sessions, the idea that traffic is a record does not change, but the types of the parties do. As well as sigma for sending and pi for receiving, we need an intersection type to quantify over signals which we neither send nor receive. So we need something like worlds.

But where have we reached? We can indeed define

  A (x) B  =  sg A \ _ -> B
  A  +  B  =  sg Two \ x -> if x then A else B
  pi A B   =  sg (op A) B
  A  >8 B  =  pi A \ _ -> B
  A  &  B  =  pi Two \ x -> if x then A else B

The linearity has vanished into the characterization of processes as tree-like strategies for communication: paths in trees are linear. We’ve doubled down on Phil’s askew interpretation of the linear operators by making them dependent quantifiers.

But the processes classified by the session types are not the things which the variables stand for. This is most certainly not a linear dependent type theory. I’ve also been thinking about what that might mean, with dependency on the linearly typed things (like, how would you use types as a language of specification for linearly typed functions?). Session types may correspond to linear types, but dependent session types (dependency on traffic) are not the same thing as linear dependent types (dependency on the programs themselves). Realising that these concepts are potentially distinct is crucial to figuring out what they might entail.

theories of testing

January 2, 2015

Here’s a fun fact for you. Consider the type of unlabelled binary trees, with its catamorphism/iterator/fold thingy.

  data Tree = Leaf | Tree :^: Tree

  cata :: t -> (t -> t -> t) -> Tree -> t
  cata leaf node Leaf        = leaf
  cata leaf node (Node l r)  = node (cata leaf node l) (cata leaf node r)

I have in mind a particular function

  myf :: Tree -> Tree
  myf = cata myl myn where
    myl :: Tree
    myl = ... -- built with constructors
    myn :: Tree -> Tree -> Tree
    myn myf_l myf_r = ... -- built with constructors, myf_l and myf_r

and I describe to you what it does. Perhaps you think you can do the same job. You hand me yrf, which you promise you implemented in the same way.

  yrf :: Tree -> Tree
  yrf = cata yrl yrn where
    yrl :: Tree
    yrl = ... -- built with constructors
    yrn :: Tree -> Tree -> Tree
    yrn yrf_l yrf_r = ... -- built with constructors, yrf_l and yrf_r

Assuming your promise about the implementation is in good faith, I test

  and [ myf t == yrf t
      | t <- [Leaf, Node Leaf Leaf, Node (Node Leaf Leaf) Leaf, Node Leaf (Node Leaf Leaf)]

and the result is True. Congratulations! You got it right!

I don’t need to be given yrl and yrn. I just need to know that you built yrf that way. Of course, I can obtain yrl by evaluating yrf Leaf. Now that I know what yrl is, evaluating yrf (Node Leaf Leaf) gives me a bunch of candidates for yrn. I just look for occurrences of yrl as a subterm of yrf (Node Leaf Leaf), because I know that yrf_l and yrf_r will both have been yrf Leaf = yrl when yrn was called. The trouble is that I don’t know which occurrences of yrl are yrf_l, which are yrf_r, and which are just constants by an astounding coincidence. So I perturb my experiment. When I evaluate yrf (Node (Node Leaf Leaf) Leaf) and compare, I can see which yrl occurrences have become something else, and those must correspond to the only thing which can have changed, yrf_l. Similarly, yrf (Node Leaf (Node Leaf Leaf)) tells me which yrl occurrences must have been yrf_r. If you were telling the truth about the simplicity of your program’s construction, I’ve checked its coincidence with mine on all inputs just by testing it on four.

I’ve been fascinated with such results for a while now. This draft, A Polynomial Testing Principle, formalises the result that extensional equality for a language of polynomials and summation can be decided by testing just for the input 0..n, where n is an upper bound on the degree of the polynomials. Classic inductive proofs such as (Sum{i=1..n}i)^2 = Sum{i=1..n}i^3 hold just by checking we get 0,1,9,36,100. What’s fun about the proof is that it doesn’t step beyond the natural numbers. No subtractions, no fractions, I do the words and Agda does the actions. The method is to define the forward difference operator, symbolically, check that it reduces degree (which amounts merely to being well typed), then prove that it satisfies the specification

  f n + diff f n = f (suc n)

for all n, which is an easy structural induction on the code for f. Now, if we know f 0 = g 0 and diff f agrees everywhere with diff g, then f must agree with g. If f and g agree on {0..suc n}, then f 0 = g 0 and
diff f agrees with diff g on {0..n}, hence everywhere by induction.

Thinking back to the result on trees, the argument has a similar character, deducing the dependency of a function on its input by making strategic perturbations to them and computing differences. So, is there a more general theory of differential testing?

It’s funny, of course, because I’m used to having a partial evaluator handy. I can happily run programs on open terms. The essence of proof by induction is that you prove a result for all inputs by testing a cover of open constructor forms which allow just enough computation to reduce the problem for the whole inputs to the problem for their substructures. The challenge here is voluntarily to give up partial evaluation and achieve results of the same strength by ‘dark grey box’ testing. We don’t get to see the implementation, but we do get a bound on how weird it is, and from that we deduce not merely a test suite which has 100% code coverage, but which actually allows us to establish the absence of bugs. That’s got to be worth thinking about, right?


January 2, 2015

This is a topic I need to page back in, so I thought I’d pick it up today, even though my thinking on the topic is linked both with observational type theory and with local time for recursion and corecursion. Am I writing these posts in a feasible order? Let’s find out.

Firstly, what is coinduction? It’s a process for generating tree-like structures in which some paths may be infinite. And it’s named by duality to induction, which is a process for consuming tree-like structures in which all paths must be finite. What’s a good example of a coinductive structure? Not bloody streams, that’s for sure: they’re like picking natural numbers as an example of inductive datatypes — disingenuously simple in a way that must be overcome to achieve learning. So let’s not do that. I prefer the labelled, possibly infinite binary trees. The Strathclyde Haskell Enhancement (or rather, a text editor) lets me write

  codata Tree x = Tree {tree :: (x, Maybe (Tree x, Tree x))}

Exploring such a tree, you can always ask “What’s here?”, and you will be presented with some number (zero or two) possible ways to move deeper into the tree. We can build trees which always let us go on forever.

calculateWilfully :: Tree (Integer, Integer)
calculateWilfully = grow (1,1) where
  grow x@(n,d) = Tree (x, Just (grow (n, n + d), grow (n + d, d)))

We can build trees which sometimes stop.

huntDiag :: (Integer, Integer) -> Tree (Integer, Integer)
huntDiag x@(a, b) = Tree (x,
   if a == b then Nothing else Just (huntDiag (a - b, b), huntDiag (a, b - a)))

Your basic primitive construction tool for coinductive data is to unfold a coalgebra. Let’s see what that amounts to here.

  unfoldTree :: (s -> (x, Maybe (s, s))) -> s -> Tree x
  unfoldTree f s = Tree (x, case m of
      Nothing       -> Nothing
      Just (ls, rs) -> Just (unfoldTree f ls, unfoldTree f rs))
    where (x, m) = f s

The coalgebra, f, tells us how to grow one layer of structure from a seed. In the place for each subtree, we get the seed from which it should be grown. It’s hopefully easy to see that my two examples, above, are given by unfolding coalgebras whose seeds are pairs of integers — the corecursive calls happen just where the corresponding coalgebra needs to put a seed.

Note that the projection function, tree, is a coalgebra…

  unfoldTree tree :: (x, Maybe (Tree x, Tree x)) -> Tree x

…with the same type as the coconstructor, Tree. It’s not hard to see, informally, that the two will do essentially the same job, in a handwavy sort of a way. But how to be precise…?

Meanwhile, in jolly old dishonest Haskell, we can write all sorts of thrilling (in that “is this safe?” sort of way) programs which fool about with coinductive data. For this example, I need streams and trees.

  codata Stream x = x :> Stream x

Note that I reserve ‘stream’ for the definitely infinite sequences: those which may (but don’t have to) stop are ‘colists’. The example I’m thinking of is good old Gibbons-and-Jones breadth-first labelling. The task is to take a Tree () and deliver a Tree Integer, labelled with 0 at the top, 1 and 2 the next layer down, and so on. Step one, boggle at this.

  spiral :: (Stream a -> (b, Stream a)) -> a -> b
  spiral f a = b where (b, as) = f (a :> as)

I call these ‘Oxford’ programs, because that’s where a lot of them get written: they’re hilariously circular in a somewhat underexplained manner. In the case of spiral, the type falls a long way short of explaining which f’s can possibly achieve a useful outcome: they’re rather special. But now, the main deal.

  label :: Tree () -> Tree Integer
  label t = spiral (help t) 0 where
    help :: Tree () -> Stream Integer -> (Tree Integer, Stream Integer)
    help (Tree ((), m)) (i :> is0) = case m of
      Nothing -> (Tree (i, Nothing), (i + 1) :> is0)
      Just (lt, rt) ->
        let (lt', is1) = help lt is0
            (rt', is2) = help rt is1
        in  (Tree (i, Just (lt', rt')), (i + 1) :> is2)

What’s going on? The labelling task is easy if we know which number we need to kick off each layer. So just imagine we already know that: we give ourselves a stream of ‘initialisers’ — integers telling us which numbers to use down the left spine of the tree. The head initialiser labels the top node, and the tail tells us how to label any subtrees which might exist. We give back not only the tree, but also the stream of ‘leftovers’ — the successors of the labels used down the right spine of the tree, ready to be used to label the next layer down. The spiral function ties the knot, using the leftovers stream as the tail of the initialisers. The program is all about how the data are used spatially, regardless of what happens when. Haskell’s laziness ensures that if there is away to flatten the data dependency graph into a temporal order, some such order will be found, and in this case, more by luck than the typing judgment, we do indeed win. It’s not remotely obvious how to render label as an unfold, because the key information we use when growing the tree is threaded, not just distributed from the root. We don’t learn the leftover labels to kick off the right subtree ‘until’ we have built the left.

So, how on earth are we to make this sort of stuff make sense in an honest language? How are we to reason about it in a proof language? The state of the art is kind of painful in several different ways. The moment you say ‘productivity checking’, you’re dead. Functions like spiral show us that productivity is only the tip of the continuity iceberg: it’s not enough for spiral’s argument to be a function which is productive on streams; the function needs the rather particular property that it does not inspect the tail of its input in the course of computing the head of its output.

In, Productive Corecursion via Guarded Recursion, Bob and I used a modal operator to enforce such continuity properties in a toy language. How do we integrate such a technology with type theory as we know it? One thing’s for sure, none of the currently popular languages based on type theory has anything to say about continuity, with the honourable and significant exception of Andreas Abel’s “sized types”, delivered as an extension in Agda. Certainly, Andreas can write breadth-first tree-labelling, but can he write the generic spiral? How easy is it to negotiate a variety of continuity disciplines, locally? His work is certainly part of the solution.

The reasoning problem is, of course, a manifestation of our old bugbear, equality. In honest type theory, we can equip a coinductive type with construction by unfolding (e.g., unfoldTree) and destruction by projection (e.g., tree). We do not need to take coconstructors as primitive, because they are derivable. Agda gives you the projection, but replaces the formal unfold by recursion-with-productivity-checking. Agda is, moreover, especially careful to forbid dependent case analysis of coinductive data (is Idris?), effectively ensuring that nondependent projection is the only way to take things apart. Indeed, the new ‘copattern’ notation capitalises on the idea of projection-only access by making you present codata in terms of their projections: all the examples in the relevant publications use streams, which is lucky, because streams have no coproduct structure to make you regret giving up case analysis.

What’s wrong with dependent case analysis? Nothing. But it’s incompatible with rubbish intensional propositional equality — that’s what’s wrong. The thing is, unfolds compute lazily: they only actually unfold when you attempt to project from them. If Haskell had copatterns, I’d have written

  unfoldTree :: (s -> (x, Maybe (s, s))) -> s -> Tree x
  tree (unfoldTree f s) = (x, case m of
      Nothing       -> Nothing
      Just (ls, rs) -> Just (unfoldTree f ls, unfoldTree f rs))
    where (x, m) = f s

because that explains when and how to run the program. And nobody used a coconstructor.

In particular, computation does not give us that

  Tree (tree t) = t        -- makes no demand, does not compute

but if we do take a primitive coconstructor, we can have that

  tree (Tree ts) = ts      -- is a demand, can compute

The dependent case analysis principle says that all codata are given by the coconstructor (whether that’s a primitive, or derived by unfolding the projection). Excuse the notation, which is Haskell but not as we
know it.

  treeCase :: forall (c :: Tree x -> *) ->
              (pi (ts :: (x, Maybe (Tree x, Tree x))) -> c (Tree ts)) ->
              pi (t :: Tree x) -> c t
  treeCase c p (Tree ts) = p ts

Coq lets us have that. But now I can prove the above law which does not compute. By case analysis, it holds if

  Tree (tree (Tree ts)) = Tree ts

and that is indeed the case by reflexivity, because the tree in the middle annihilates the inner Tree.

Pick any particular ts, and we get closed proof of an equation which does not hold by computation alone. Indeed, the proof will compute to reflexivity, but reflexivity does not typecheck as a proof of the equation, because the equation doesn’t hold by computation alone. Subject reduction fails. Consistency doesn’t. But it’s still disturbing that typability is unstable in that way. It’s also annoying that we pay the cost without getting the whole of the benefit: although that equation between codata is provable, plenty of other simulations are not.

The trouble, of course, is that intensionally, treeCase isn’t telling the whole truth: not every coinductive structure is implemented by coconstructor; rather, every coinductive structure can be simulated up to observation by one that’s implemented by coconstructor. If propositional equality captured not the intensional equality of codata, but rather their bisimilarity, we could derive treeCase given only the nondependent tree.

  treeCase c p t = fixup (p (tree t)) where
    fixup :: c (Tree (tree t)) -> c t
    fixup -- uses substitutivity of equality, given simulation Tree (tree t) = t

In Let’s See How Things Unfold, an invited talk abstract which got rather out of hand, I sketched how to get equality up to bisimulation in the context of an observational type theory. A similar story ought to work in homotopy type theory. I started writing a journal version (which would have been fully refereed, unlike my talk abstract), but whizz went the deadline and spike went the draft.

I think we’re in a bit of a mess. We don’t have the right reasoning tools. We don’t have compositional coprogramming tools. It might be clear, in principle, that coinduction (not general bloody recursion) is the way to make software whose job is to persist and respond to user demands, but there’s a serious need to re-examine both the foundations and the pragmatics of the whole coinduction business, before we can make that principled observation a sustainable reality.

irish induction-recursion

January 1, 2015

This is a bit of a shaggy dog story. Where to begin? I suppose I should remind you what definition by induction-recursion (IR) lets you accomplish. I’ll show you the usual coding scheme for IR definitions, and then I’ll show you the ‘Irish’ coding scheme which I came up with, not realising at the time that I might be doing more than bureaucratic shuffling of parts. It turns out that the status of Irish IR is not obvious.

The idea behind IR is that we can make an inductive definition of a datatype at the same time as we define a function which computes by recursion over that type: the recursive function can be involved in the types of the data constructors. E.g., defined contexts and environments as follows

  Context : Set1                                 Env (G : Context) : Set
  ::= nil                                        Env nil         = One
    | snoc (G : Context)(S : Env G -> Set)       Env (snoc G S)  = (g : Env G) * S g

Each successive context entry is a type depending on the prior context — given as a function from environments. The notion of environment is essential to the datatype declaration, even though it is being defined just in time. An environment for the empty context is trivial. An environment for a context extended is the dependent pair of an environment for the original context and a value in the type that extends the context ((s , t) : (x : S) * T whenever s : S and t : T[s/x]).

The Context type above is large, in that its values pack up sets, but the real power of IR lies in its ability to make small encodings of large things. Here, for example, is the universe closed under the Boolean type and the dependent function space:

  U : Set                                    El (T : U) : Set
  ::= two                                    El two       = Two
    | pi (S : U)(T : El S -> U)              El (pi S T)  = (s : El S) -> El (T s)

Here, U is a small set, but its values encode sets, with El telling you of what the elements of U’s encoded sets consist. We need to use El when accounting for the value-dependency in pi-types. Without IR, you can still give an encoding of such a universe, but it’ll be large.

In fact, if you know what you’re doing, you can give a small encoding of an entire predicative hierarchy using IR: it’s an astonishingly powerful means of definition. But what is it? Which definitions are allowed?

As ever, the constructors of an IR type amount to the initial algebra for an endofunctor, but not in Set. Borrowing Peter Hancock’s terminology, let Fam X = (I : Set) * I -> X. Fam X is the type of small coding schemes for Xs. Above, (U, El) : Fam Set. IR types are given as initial algebras for an endofunctor on some Fam X. For that to make sense, I have to equip Fam X with morphisms. A morphism from (I, f) to (J, g) is some h : I -> J such that f = g . h. So, if X is small, then Fam X amounts to the slice category, Set/X, but the whole sodding point is that X usually isn’t small. I’ll make use of

  Code (F : Fam X) : Set                   decode (F : Fam X)(i : Code F) : X
  Code (I, f) = I                          decode (I, f) i = f i

Now, the question is which functors will give rise to IR definitions. In 1999, Peter Dybjer and Anton Setzer gave a coding scheme for a collection of functors which do just that. Below, DS X Y is an encoding of functors from Fam X to Fam Y. (It’s a little easier to see what’s going on if we consider Fam-functors in general, not just endofunctors.)

  DS (X : Set1)(Y : Set1) : Set1
  ::= iota (y : Y)                                 -- end of data, deliver y
    | sigma (S : Set) (T : S -> DS X Y)            -- non-recursive field of type S
    | delta (H : Set) (T : (H -> X) -> DS X Y)     -- substructures branching over H

Let me encode the above example universe:

  UEl : DS Set Set
  UEl = sigma {two, pi}
          \ two -> iota Two
          | pi  -> delta One \ S -> delta (S ()) \ T -> iota ((s : S ) -> T s)

That is, we first offer the choice of “two” or “pi”. If “two” is chosen, the encoded type is Two. If “pi” is chosen, we ask for one recursively chosen code for the pi-type domain, and we get S : One -> Set as its decoding, so we can ask for an (S ())-indexed family of codes to give the dependent range of the pi-type, then deliver the set (s : S ()) -> T s. Or so I say…

We need to say which functors are encoded and make sure the above code really gets the intended interpretation. To do so, it will help to define the corresponding operations on Fam Y.
The constant family encodes just one element of Y with a trivial code.

  ONE (y : Y) : Fam Y
  ONE y = One, \ _ -> y

We can build up right-nested Fam Y encodings by dependent pairing over some set S. Once you have chosen an S, you can see the structure of the rest of the codes and how to decode them.

  SIGMA (S : Set) (T : S -> Fam Y) : Fam Y
  SIGMA S T = ((s : S) * Code (T s)), \ (s, t) -> decode (T s) t

Now, if you have some (I, f) : Fam X, you can use values in I as codes for values in X. Any old function g : H -> I gives us a bunch of codes, and the composition f . g : H -> X gives us the corresponding bunch of X-values.

  DELTA  (F : Fam X) (H : Set) (T : (H -> X) -> Fam Y) : Fam Y
  DELTA (I, f) H T = ((g : H -> I) * Code (T (f . g))), \ (g, t) -> decode (T (f . g)) t

Crucially, whatever Fam X we plug in, whatever its type of codes, we know we’ll be able to decode them to X.

Now we can give the action on objects.

  [(T : DS X Y)] : Fam X -> Fam Y
  [iota y]    F  =  ONE y
  [sigma S T] F  =  SIGMA S \ s -> [T s] F
  [delta H T] F  =  DELTA F H \ k -> [T k] F

The action on morphisms is dictated by the structures thus given: given a translation between two encodings of X, we can lift it to a translation between the corresponding encodings of Y.

Cranking through our example, we get

  Code ([UEl] (U, El)) = (x : {two, pi}) * case
    two -> One
    pi  -> (S : One -> Set) * (T : S () -> Set) * One

  decode ([UEl] (U, El)) (two, ())       = Two
  decode ([UEl] (U, El)) (pi, S, T, ())  = (s : El (S ())) -> El (T s)

so our definition has worked out the way we planned.

The eagle-eyed will have spotted that DS X is a monad, indeed it’s a free monad with iota as return and a >>= operator which grafts more node structure on at each iota.

Moreover, we can define

  dsI : DS X X
  dsI = delta One \ f -> iota (f ())

such that

Code ([dsI] (I, f)) = (i : I) * One
decode ([dsI] (I, f)) (i, ()) = f i

i.e., [dsI] is isomorphic to the identity functor on Fam X.

But here’s the biggie. Is there a composition operator on DS

  dsC (F : DS Y Z) (G : DS X Y) : DS X Z

such that [dsC F G] is isomorphic to the functor composition [F].[G]? The usual suspects (Ghani, Hancock, Altenkirch, me…) have been banging our heads off this for a few years now. We just don’t know. We suspect that there isn’t such a thing, but we haven’t been able to find a concrete example of codes F and G whose composite definitely cannot be encoded.

Now, once upon a time, I was trying to make the presentation of DS X Y more like the universe of functors Chapman, Dagand, Morris and I built in The Gentle Art of Levitation. I started out like this — it’s an IR-definition itself.

  Irish (X : Set1) : Set1                       Info : Irish X -> Set1
  ::= idi                                       Info idi          = X
    | con (A : Set)                             Info (con A)      = A
    | sigma (S : Irish X)                       Info (sigma S T)  = (s : Info S) * 
        (T : Info S -> Irish X)                                     Info (T s)
    | pi (S : Set)(T : S -> Irish X)            Info (pi S T)     = (s : S) -> Info (T s)

Irish is a set of codes for structures involving a Fam X. Info tells us the information we are sure to have in those structures, whatever Fam X we plug in: we know more than nothing about them, because we know they decode to X. Info has to be large, because X is large.

Of course, I have to tell you what structure you get when you plug in a Fam X. We get node structures, small sets, from which we can compute Info.

  Node (T : Irish X) (F : Fam X) : Set
  info (T : Irish X) (F : Fam X) (t : Node T F) : Info T

  Node id      F = Code F                  info idi     F i = decode F i
  Node (con A) _ = A                       info (con A) _ a = a
  Node (sigma S T) F = (s : Node S F) *    info (sigma S T) F (s, t) = s', info (T s') F t 
                Node (T (info S F s)) F      where s' = info S F s
  Node (pi S T) F = (s : S) ->             info (pi S T) F f = \ s -> info (T s) F (f s)
                    Node (T s) F

Node can be small, because we now use our Fam X to give us a small encoding for X-values. We compute Info from Node by deploying the decoder which comes with the Fam X. We now have that

  F : Fam X   =>   (Node T F, info T F) : Fam (Info T)

But we want a Fam Y. All we need for that is a function from Info T to Y. That is

  irish (T : Irish X) (d : Info T -> Y) (F : Fam X) : Fam Y
  irish T d F = Node T F, (d . info T F)

In fact, it’s quite a lot like the way we write IR definitions informally. We give a description of the data, and we write a decoder for it. I should, of course, give the action on morphisms (which is postcomposition in the relevant places), and check that we actually get a functor (which of course we do, because postcomposition respects identity and composition).

Let me work through the running example again

  U : Irish Set
  U = sigma (con {two, pi})
        \ two -> con One
        | pi  -> sigma idi \ S -> pi S \ _ -> idi

  El : Info U -> Set
  El (two, ())   = Two
  El (pi, S, T)  = (s : S) -> T s

and irish U El is the endofunctor we need.

It’s not hard to see that irish idi id is exactly the identity functor on Fam X. But there’s more.
If we have F : Irish Y, that’s a description of node structures that a Fam Y plugs into, but if we have the bits for a functor from Fam X to Fam Y, we should be able to get a structure that a Fam X plugs into, just
by grafting stuff in place of F’s idi markers. At the same time, we need the forgetful map which discards the extra detail from the refined structure, giving us just the Info for the outer layer that corresponds to F.

  subC (F : Irish Y)(G : Irish X)(d : Info G -> Y) : Irish X
  fogC (F : Irish Y)(G : Irish X)(d : Info G -> Y) : Info (subC F G d) -> Info F

  subC idi G d = G                            fogC idi G d g = d g
  subC (con A) G d = con A                    fogC (con A) G d a = a
  subC (sigma S T) G d =                      fogC (sigma S T) G d (s, t) =
    sigma (subC S G d) \ s' ->                  s', fogC (T s') G d t
      subC (T (fogC S G d s)) G d               where s' = fogC S G d s
  subC (pi S T) G d =                         fogC (pi S T) G d f = \ s ->
    pi S \ s -> subC (T s) G d                  fogC (T s) G d (f s)

Putting these pieces together, we see that

  irC0 (F : Irish Y)(f : Info F -> Z)(G : Irish X)(g : Info G -> Y) : Irish X
  irC1 (F : Irish Y)(f : Info F -> Z)(G : Irish X)(g : Info G -> Y) : Info (irC0 F f G g) -> Z
  irC0 F f G g = subC F G g
  irc1 F f G g = f . fogC F G g

gives us the tools to compose our functors. Of course, one must do the work to show that we don’t just get something with the right type to be composition. It’s not that difficult. Intuitively, substituting the placeholder for substructures in the node description is exactly how to design a composite structure.

So, Irish IR corresponds quite neatly to the way we write IR definitions anyway, and it describes a class of functors between Fams which is readily closed under composition. It’s very easy to check that every DS X Y gives rise to an (F : Irish X) * Info F -> Y. Dybjer-Setzer iota gives Irish con One, Dybjer-Setzer sigma gives Irish sigma-con, Dybjer-Setzer delta gives Irish sigma-pi-idi. What’s the problem?

Well, when you try to translate Irish IR codes to Dybjer-Setzer codes, it’s far from obvious how to do it. If we could do it, we’d be able to compose DS functors, via the Irish composition. But there’s an alternative possibility, which seems more likely: it may well be the case that Irish IR codes capture a larger class of functor than Dybjer-Setzer codes. That might even be a good thing, but it would certainly prevent us appealing to Dybjer and Setzer’s set-theoretic model to show consistency. We need a pile of fresh work to show the more generous notion makes sense. It could be too good to be true!

Although I’ve been using an informal notation in this blogpost, I’ve managed to do the whole construction in Agda. Can’t for the life of me remember where I put it. But it raises the question of what notion of IR Agda actually implements. Entertainingly, Irish/Info has a perfectly sensible Dybjer-Setzer encoding as an inductive-recursive definition in Set1 (which means it also has an Irish IR encoding, so we can play the levitation game). Agda’s positivity checker doesn’t complain when we tie the knot. I don’t think we can really deduce much about the consistency of such constructions from the fact that Agda fails to forbid them.

Zooming out a bit, let’s see what we’ve got. We have a compositional first-class presentation of inductive-recursive definition. It would fit neatly in a closed kernel for some dependently typed programming language. But we still have nagging questions about its status. And, of course, there are other things we might like to do with datatypes.


December 31, 2014

It’s an obvious but not often considered fact that if you erase all the terms from a valid construction in System F, you get a bunch of perfectly sensible types. (When you erase a Λ, you need to extend the context in which the types you expose will be judged.) Erasing the types is more likely to happen, yielding perfectly sensible terms which we might have some motivation to evaluate, but erasing the terms works, too. The thing about System F is that types and terms separate like oil and water: the context may build up term variables with λ and type variables with Λ, but when you’re making a type you can’t give a term variable, and when you’re making a term, you can’t give a type variable. Worlds apart. The separation of worlds means you can erase either terms or types without damaging the residue.

GCHQ read all my communications (hi, Richard!), but I’m not entitled to read theirs. Fortunately for me, I don’t need to know what they think in order to go about my business. Without the inward flow of unprivileged information, however, GHCQ would be able to monitor only one another, which might increase the availability of office space in Cheltenham. I hasten to add (in case Richard’s getting twitchy) that I am not proposing “Cheltenham erasure”, but the fact remains that my function depends in no way on the outflow of privileged information from GCHQ. We’re worlds apart, but there’s a directed information flow between those worlds.

Meanwhile, at GHC HQ, we’ve been trying to fight our way out of the mess caused by the separation of worlds in System F and its extensions: in each world, we find ourselves building avatars for things which live in the other, awkwardly simulating dependent types. We can’t just impose dependent types, because in their absence from the type world, the terms have learned to behave rather badly. What we can do, however, is install a shared world, for things which behave perfectly sensibly in types and at run time: information may flow from the shared world to both. During typechecking, we may have to run programs from the shared world, but we don’t need to run chaotic run-time-only code. Information does not flow downwards from the types-only world, so when generating run-time code, we can still erase everything we used to, leaving the shared and the run-time-only things.

Once you start to see systems in terms of information flow, it’s hard to stop. The question is really this: which communications of data are permitted by the notion of variable scope? Perhaps not all variables are in scope for all constructions. Note that if we prevent communication by being in scope, we may not have excluded explicit communication, (cost?-)modelled as an effect. Here’s a laundry list of possibilities and wheezes.

  • stage We cut execution into stages. Values persist from earlier to later stages, but dependency is forbidden the other way around. Stages are worlds.
  • timing productive corecursion In Productive Corecursion via Guarded Recursion, Bob Atkey and I used a notion of time to ensure that corecursive processes can always give you ‘today’s’ behaviour before they loop. Days are worlds. At least for streams, Paula Severi is there already (e.g., her ICFP 2012 paper with Fer-Jan de Vries).
  • detailsI saw Jeremy Singer give a talk about ‘AnyScale apps’ — programs (e.g. games) which move between computing environments and react dynamically to the available resources (processing power, screen resolution, etc). To make them work, you have to be able to cut off computation below a dynamically chosen level of detail. Clearly, the cruder parts had better not depend on the optional refinements. Detail levels are worlds.
  • security As touched on, above, access to information can be managed via a preorder of privilege. Vincent Simonet showed the way with Flow Caml. Security levels are worlds.
  • distribution I have a bunch of computers. Each of them has access to some shared libraries, but also their own private memory. Communication between computers is expensive and requires an explicit effect. Types model which knows what. Computing resources are worlds.
  • Hoare logic Specifications can talk about program variables, but program code cannot talk about logic variables. Specfication and program live in different worlds.
  • phase Typechecking and compilation can involve partial evaluation of arbitrary code. Program execution does not involve types. Information flows from run-time code to the typechecker, even though the temporal order is the other way around. Note that run-time values do not flow to the typechecker, just the symbols which stand for them.

It’s time we stopped thinking of these issues as separate. They have a great deal in common.

Dependent type theory has a bit of a reputation as a mish-mash: all the terms and types jumbled up together in the same syntactic category. The flow of information from terms to types is exactly the source of its power and the basis for confusion amongst casual observers. A common misconception is (or at least used to be) that the flow of information from terms to types prevents the erasure of types from terms: that’s manifest nonsense — it prevents the erasure of terms from types, so no big deal. The absence of typecase means that terms depend only parametrically on types, so type erasure still works. It would be good, however, to nail that down by a separation of worlds. To a large extent, that’s what Sheard and Mishra-Linger did, what Barras and Bernardo did (building on work by Miquel), and what Bernardy and Moulin do in spades: all in different ways. This needs said:

Parametricity arises by quantification against the flow of information.

Quantifying over information you can’t access had better mean you don’t need to care what it is. Dependent type theory needs both parametric quantification (intersection) and non-parametric quantification (product). Moreover, which you choose shouldn’t depend on the domain. I can agree with the claim that it is useful to be able to quantify parametrically over types without believing that all quantification over types should be parametric. With a clear separation of worlds, we could write datatype-generic code when we want to (and stage its execution, too), but still make it clear when parametricity applies. Too many people are trapped into believing nonsense like ‘typecase breaks parametricity’ without interrogating the unstable presumptions that make it wise in a narrow locality.

What do I propose to do about this business of worlds? I’d like to put it on a more general footing. I don’t imagine there’s one holy and perfect system of worlds. I think it’s more important to know when a construction can be transported between systems of worlds. As usual, the morphisms are more important than the objects. So we need to find the general structure. This draft is the beginning of that story, but there’s much more to dig out. I’ll sketch the basic picture here.

Start from a vanilla dependent type theory, where contexts assign types to variables and judgments explain which terms a type admits. Now put everything in a world. Worlds live in a preorder W. Every context entry has a ‘source’ world: judgments construct terms in a ‘target’ world. The variable rule, ‘a variable in the context is a term’ acquires the side-condition that information can flow from source to target.

What about quantification and functions? For a long time, I thought that each world should have its own quantifier, abstraction and application, explaining what it is to be a function over stuff in that world. We’re used to that in System F, right? We have ∀, Λ and type application for functions over types, and →, λ and term application for functions over values: the lack of information flow from terms to types means → is reliably degenerate, and the lack of information flow from types to terms means that ∀ is parametric, allowing the Church-to-Curry erasure. That story works, but it’s not good enough.

Suppose I have a world for library code which can flow into any of a number of discrete computing facilities. I want each local computation to be able to apply library functionality to its own local data. So the type of a library function had better not fix the world its argument comes from. I spent a lot of time staring at Microsoft whiteboards trying to figure out how to recompute the types of functions as they moved between worlds, renegotiating their domains. My brain hurt. Then, the penny dropped.

We need portable quantifiers, e.g., capturing what ‘locally’ means, wherever we happen to be. Let us have a set of quantifiers Q, with a monotonic action, qw, on W. Now, types mention quantifiers, never worlds; worlds live in judgments. When we check a q-λ in world w, we bind the variable in world qw; when we check an application of a q-function in world w, we check the argument in world qw. E.g., the ‘locally’ quantifier has the identity action, but we might also have an ‘intersection’ which maps every w to the world where types live. As we work through a typing derivation, the action of Q on a given w will be iterated, giving each w a local subsystem of worlds. As long as the world preorder preserves the structure of those local subsystems, we can transport constructions whenever the preorder permits.

Here is a hasty Agda formalisation of the world transportation lemma which I cooked up one Sunday morning in the middle of a fen. The less said about the rest of that day, the better.

I think world systems have a considerable potential to change the way we manage information flow (I’m reminded of Douglas Adams’s glorious phrase “rigidly defined areas of doubt and uncertainty”) across a whole bunch of application domains. At the same time, I don’t think they will disturb the way we think about the semantics of type theories: we need a Kripke semantics to manage the context inclusion preorder with which we account for open computation (values are used not only in the empty context, but rather in contexts with at least as much information as the contexts in which they are constructed), so it might as well be an interesting Kripke semantics.

Away from the blackboard, they’ll change the way we do code generation, too. A bunch of these thoughts germinated from a beer mat Edwin Brady and I once scribbled on in the Basque and Carol, and they’re informing what’s happening with erasure in Idris today.

Morphisms between local world systems are important. In our work on corecursion, Bob and I are most particular about ensuring that the time-slicing which justifies the productivity of codata need not be negotiated by its consumer: you work with more of the detail visible, it checks out, then you rub out the pencil.

There’s a lot still to think about, but I’m excited and optimistic at the prospect. The ‘worlds’ idea is altering my perception of lots of other things I work on (i.e., it’s a threshold concept), so forgive me if I’m keen to pass it on.