theories of testing

January 2, 2015

Here’s a fun fact for you. Consider the type of unlabelled binary trees, with its catamorphism/iterator/fold thingy.

  data Tree = Leaf | Node Tree Tree deriving Eq

  cata :: t -> (t -> t -> t) -> Tree -> t
  cata leaf node Leaf        = leaf
  cata leaf node (Node l r)  = node (cata leaf node l) (cata leaf node r)

I have in mind a particular function

  myf :: Tree -> Tree
  myf = cata myl myn where
    myl :: Tree
    myl = ... -- built with constructors
    myn :: Tree -> Tree -> Tree
    myn myf_l myf_r = ... -- built with constructors, myf_l and myf_r

and I describe to you what it does. Perhaps you think you can do the same job. You hand me yrf, which you promise you implemented in the same way.

  yrf :: Tree -> Tree
  yrf = cata yrl yrn where
    yrl :: Tree
    yrl = ... -- built with constructors
    yrn :: Tree -> Tree -> Tree
    yrn yrf_l yrf_r = ... -- built with constructors, yrf_l and yrf_r

Assuming your promise about the implementation is in good faith, I test

  and [ myf t == yrf t
      | t <- [Leaf, Node Leaf Leaf, Node (Node Leaf Leaf) Leaf, Node Leaf (Node Leaf Leaf)]
      ]

and the result is True. Congratulations! You got it right!

I don’t need to be given yrl and yrn. I just need to know that you built yrf that way. Of course, I can obtain yrl by evaluating yrf Leaf. Now that I know what yrl is, evaluating yrf (Node Leaf Leaf) gives me a bunch of candidates for yrn. I just look for occurrences of yrl as a subterm of yrf (Node Leaf Leaf), because I know that yrf_l and yrf_r will both have been yrf Leaf = yrl when yrn was called. The trouble is that I don’t know which occurrences of yrl are yrf_l, which are yrf_r, and which are just constants by an astounding coincidence. So I perturb my experiment. When I evaluate yrf (Node (Node Leaf Leaf) Leaf) and compare, I can see which yrl occurrences have become something else, and those must correspond to the only thing which can have changed, yrf_l. Similarly, yrf (Node Leaf (Node Leaf Leaf)) tells me which yrl occurrences must have been yrf_r. If you were telling the truth about the simplicity of your program’s construction, I’ve checked its coincidence with mine on all inputs just by testing it on four.
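
For concreteness, here is one such pair of definitions together with the four probes, in runnable Haskell. The function mirror is my own example of a fold of the promised shape, not necessarily the myf I had in mind.

  -- a hypothetical instance of the scheme: built only from constructors
  -- and the two recursive results
  mirror :: Tree -> Tree
  mirror = cata myl myn where
    myl :: Tree
    myl = Leaf
    myn :: Tree -> Tree -> Tree
    myn myf_l myf_r = Node myf_r myf_l

  -- the four probes from the test above; for folds of this shape,
  -- agreement on these is agreement everywhere
  probes :: [Tree]
  probes = [ Leaf
           , Node Leaf Leaf
           , Node (Node Leaf Leaf) Leaf
           , Node Leaf (Node Leaf Leaf)
           ]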

I’ve been fascinated with such results for a while now. This draft, A Polynomial Testing Principle, formalises the result that extensional equality for a language of polynomials and summation can be decided by testing just the inputs 0..n, where n is an upper bound on the degree of the polynomials. Classic inductive proofs such as (Sum{i=1..n}i)^2 = Sum{i=1..n}i^3 hold just by checking we get 0,1,9,36,100. What’s fun about the proof is that it doesn’t step beyond the natural numbers. No subtractions, no fractions, I do the words and Agda does the actions. The method is to define the forward difference operator, symbolically, check that it reduces degree (which amounts merely to being well typed), then prove that it satisfies the specification

  f n + diff f n = f (suc n)

for all n, which is an easy structural induction on the code for f. Now, if we know f 0 = g 0 and diff f agrees everywhere with diff g, then f must agree with g. If f and g agree on {0..suc n}, then f 0 = g 0 and
diff f agrees with diff g on {0..n}, hence everywhere by induction.
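
To see the shape of the claim in code, here is a small Haskell rendering of the classic example. This is my sketch, not the draft’s Agda; in particular my diff is the semantic forward difference, using subtraction for brevity, where the draft’s is symbolic and never leaves the naturals.

  -- both sides are polynomials of degree 4 in n, so agreement on the
  -- five inputs 0..4 decides agreement everywhere
  lhs, rhs :: Integer -> Integer
  lhs n = sum [1..n] ^ 2
  rhs n = sum [ i ^ 3 | i <- [1..n] ]

  smallTest :: Bool
  smallTest = and [ lhs n == rhs n | n <- [0..4] ]   -- both give 0, 1, 9, 36, 100

  -- forward difference, satisfying  f n + diff f n = f (n + 1);
  -- it strictly reduces degree, which is what drives the induction
  diff :: (Integer -> Integer) -> Integer -> Integer
  diff f n = f (n + 1) - f n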

Thinking back to the result on trees, the argument has a similar character, deducing the dependency of a function on its input by making strategic perturbations to it and computing differences. So, is there a more general theory of differential testing?

It’s funny, of course, because I’m used to having a partial evaluator handy. I can happily run programs on open terms. The essence of proof by induction is that you prove a result for all inputs by testing a cover of open constructor forms which allow just enough computation to reduce the problem for the whole inputs to the problem for their substructures. The challenge here is voluntarily to give up partial evaluation and achieve results of the same strength by ‘dark grey box’ testing. We don’t get to see the implementation, but we do get a bound on how weird it is, and from that we deduce not merely a test suite which has 100% code coverage, but one which actually allows us to establish the absence of bugs. That’s got to be worth thinking about, right?

coinduction

January 2, 2015

This is a topic I need to page back in, so I thought I’d pick it up today, even though my thinking on the topic is linked both with observational type theory and with local time for recursion and corecursion. Am I writing these posts in a feasible order? Let’s find out.

Firstly, what is coinduction? It’s a process for generating tree-like structures in which some paths may be infinite. And it’s named by duality to induction, which is a process for consuming tree-like structures in which all paths must be finite. What’s a good example of a coinductive structure? Not bloody streams, that’s for sure: they’re like picking natural numbers as an example of inductive datatypes — disingenuously simple in a way that must be overcome to achieve learning. So let’s not do that. I prefer the labelled, possibly infinite binary trees. The Strathclyde Haskell Enhancement (or rather, a text editor) lets me write

  codata Tree x = Tree {tree :: (x, Maybe (Tree x, Tree x))}

Exploring such a tree, you can always ask “What’s here?”, and you will be presented with some number (zero or two) of possible ways to move deeper into the tree. We can build trees which always let us go on forever.

  calculateWilfully :: Tree (Integer, Integer)
  calculateWilfully = grow (1,1) where
    grow x@(n,d) = Tree (x, Just (grow (n, n + d), grow (n + d, d)))

We can build trees which sometimes stop.

  huntDiag :: (Integer, Integer) -> Tree (Integer, Integer)
  huntDiag x@(a, b) = Tree (x,
     if a == b then Nothing else Just (huntDiag (a - b, b), huntDiag (a, b - a)))

Your basic primitive construction tool for coinductive data is to unfold a coalgebra. Let’s see what that amounts to here.

  unfoldTree :: (s -> (x, Maybe (s, s))) -> s -> Tree x
  unfoldTree f s = Tree (x, case m of
      Nothing       -> Nothing
      Just (ls, rs) -> Just (unfoldTree f ls, unfoldTree f rs))
    where (x, m) = f s

The coalgebra, f, tells us how to grow one layer of structure from a seed. In the place for each subtree, we get the seed from which it should be grown. It’s hopefully easy to see that my two examples, above, are given by unfolding coalgebras whose seeds are pairs of integers — the corecursive calls happen just where the corresponding coalgebra needs to put a seed.
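
Spelling that out, here are the two examples rewritten as unfolds. The primed names are mine, and I am treating the SHE codata declaration as ordinary lazy Haskell data for the purposes of the sketch.

  calculateWilfully' :: Tree (Integer, Integer)
  calculateWilfully' = unfoldTree grow (1, 1) where
    -- the seeds are pairs of integers, placed exactly where the
    -- corecursive calls used to be
    grow x@(n, d) = (x, Just ((n, n + d), (n + d, d)))

  huntDiag' :: (Integer, Integer) -> Tree (Integer, Integer)
  huntDiag' = unfoldTree step where
    step x@(a, b) = (x, if a == b then Nothing
                                  else Just ((a - b, b), (a, b - a)))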

Note that the projection function, tree, is a coalgebra…

  unfoldTree tree :: Tree x -> Tree x

…with the same type as the identity function. It’s not hard to see, informally, that the two will do essentially the same job, in a handwavy sort of a way. But how to be precise…?

Meanwhile, in jolly old dishonest Haskell, we can write all sorts of thrilling (in that “is this safe?” sort of way) programs which fool about with coinductive data. For this example, I need streams and trees.

  codata Stream x = x :> Stream x

Note that I reserve ‘stream’ for the definitely infinite sequences: those which may (but don’t have to) stop are ‘colists’. The example I’m thinking of is good old Gibbons-and-Jones breadth-first labelling. The task is to take a Tree () and deliver a Tree Integer, labelled with 0 at the top, 1 and 2 the next layer down, and so on. Step one, boggle at this.

  spiral :: (Stream a -> (b, Stream a)) -> a -> b
  spiral f a = b where (b, as) = f (a :> as)

I call these ‘Oxford’ programs, because that’s where a lot of them get written: they’re hilariously circular in a somewhat underexplained manner. In the case of spiral, the type falls a long way short of explaining which f’s can possibly achieve a useful outcome: they’re rather special. But now, the main deal.

  label :: Tree () -> Tree Integer
  label t = spiral (help t) 0 where
    help :: Tree () -> Stream Integer -> (Tree Integer, Stream Integer)
    help (Tree ((), m)) (i :> is0) = case m of
      Nothing -> (Tree (i, Nothing), (i + 1) :> is0)
      Just (lt, rt) ->
        let (lt', is1) = help lt is0
            (rt', is2) = help rt is1
        in  (Tree (i, Just (lt', rt')), (i + 1) :> is2)

What’s going on? The labelling task is easy if we know which number we need to kick off each layer. So just imagine we already know that: we give ourselves a stream of ‘initialisers’ — integers telling us which numbers to use down the left spine of the tree. The head initialiser labels the top node, and the tail tells us how to label any subtrees which might exist. We give back not only the tree, but also the stream of ‘leftovers’ — the successors of the labels used down the right spine of the tree, ready to be used to label the next layer down. The spiral function ties the knot, using the leftovers stream as the tail of the initialisers. The program is all about how the data are used spatially, regardless of what happens when. Haskell’s laziness ensures that if there is a way to flatten the data dependency graph into a temporal order, some such order will be found, and in this case, more by luck than by the typing judgment, we do indeed win. It’s not remotely obvious how to render label as an unfold, because the key information we use when growing the tree is threaded, not just distributed from the root. We don’t learn the leftover labels to kick off the right subtree ‘until’ we have built the left.
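
A tiny example, to watch the knot being tied; the tree is mine, not from the paper.

  -- one node at the root, then a leaf and a node, then two leaves
  unlabelled :: Tree ()
  unlabelled = Tree ((), Just ( Tree ((), Nothing)
                              , Tree ((), Just ( Tree ((), Nothing)
                                               , Tree ((), Nothing) )) ))

  labelled :: Tree Integer
  labelled = label unlabelled
  -- 0 at the root, then 1 and 2, then 3 and 4: breadth-first, as required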

So, how on earth are we to make this sort of stuff make sense in an honest language? How are we to reason about it in a proof language? The state of the art is kind of painful in several different ways. The moment you say ‘productivity checking’, you’re dead. Functions like spiral show us that productivity is only the tip of the continuity iceberg: it’s not enough for spiral’s argument to be a function which is productive on streams; the function needs the rather particular property that it does not inspect the tail of its input in the course of computing the head of its output.

In Productive Coprogramming with Guarded Recursion, Bob and I used a modal operator to enforce such continuity properties in a toy language. How do we integrate such a technology with type theory as we know it? One thing’s for sure, none of the currently popular languages based on type theory has anything to say about continuity, with the honourable and significant exception of Andreas Abel’s “sized types”, delivered as an extension in Agda. Certainly, Andreas can write breadth-first tree-labelling, but can he write the generic spiral? How easy is it to negotiate a variety of continuity disciplines, locally? His work is certainly part of the solution.

The reasoning problem is, of course, a manifestation of our old bugbear, equality. In honest type theory, we can equip a coinductive type with construction by unfolding (e.g., unfoldTree) and destruction by projection (e.g., tree). We do not need to take coconstructors as primitive, because they are derivable. Agda gives you the projection, but replaces the formal unfold by recursion-with-productivity-checking. Agda is, moreover, especially careful to forbid dependent case analysis of coinductive data (is Idris?), effectively ensuring that nondependent projection is the only way to take things apart. Indeed, the new ‘copattern’ notation capitalises on the idea of projection-only access by making you present codata in terms of their projections: all the examples in the relevant publications use streams, which is lucky, because streams have no coproduct structure to make you regret giving up case analysis.

What’s wrong with dependent case analysis? Nothing. But it’s incompatible with rubbish intensional propositional equality — that’s what’s wrong. The thing is, unfolds compute lazily: they only actually unfold when you attempt to project from them. If Haskell had copatterns, I’d have written

  unfoldTree :: (s -> (x, Maybe (s, s))) -> s -> Tree x
  tree (unfoldTree f s) = (x, case m of
      Nothing       -> Nothing
      Just (ls, rs) -> Just (unfoldTree f ls, unfoldTree f rs))
    where (x, m) = f s

because that explains when and how to run the program. And nobody used a coconstructor.

In particular, computation does not give us that

  Tree (tree t) = t        -- makes no demand, does not compute

but if we do take a primitive coconstructor, we can have that

  tree (Tree ts) = ts      -- is a demand, can compute

The dependent case analysis principle says that all codata are given by the coconstructor (whether that’s a primitive, or derived by unfolding the projection). Excuse the notation, which is Haskell but not as we
know it.

  treeCase :: forall (c :: Tree x -> *) ->
              (pi (ts :: (x, Maybe (Tree x, Tree x))) -> c (Tree ts)) ->
              pi (t :: Tree x) -> c t
  treeCase c p (Tree ts) = p ts

Coq lets us have that. But now I can prove the above law which does not compute. By case analysis, it holds if

  Tree (tree (Tree ts)) = Tree ts

and that is indeed the case by reflexivity, because the tree in the middle annihilates the inner Tree.

Pick any particular ts, and we get a closed proof of an equation which does not hold by computation alone. Indeed, the proof will compute to reflexivity, but reflexivity does not typecheck as a proof of the equation, because the equation doesn’t hold by computation alone. Subject reduction fails. Consistency doesn’t. But it’s still disturbing that typability is unstable in that way. It’s also annoying that we pay the cost without getting the whole of the benefit: although that equation between codata is provable, plenty of other simulations are not.

The trouble, of course, is that intensionally, treeCase isn’t telling the whole truth: not every coinductive structure is implemented by coconstructor; rather, every coinductive structure can be simulated up to observation by one that’s implemented by coconstructor. If propositional equality captured not the intensional equality of codata, but rather their bisimilarity, we could derive treeCase given only the nondependent tree.

  treeCase c p t = fixup (p (tree t)) where
    fixup :: c (Tree (tree t)) -> c t
    fixup -- uses substitutivity of equality, given simulation Tree (tree t) = t

In Let’s See How Things Unfold, an invited talk abstract which got rather out of hand, I sketched how to get equality up to bisimulation in the context of an observational type theory. A similar story ought to work in homotopy type theory. I started writing a journal version (which would have been fully refereed, unlike my talk abstract), but whizz went the deadline and spike went the draft.

I think we’re in a bit of a mess. We don’t have the right reasoning tools. We don’t have compositional coprogramming tools. It might be clear, in principle, that coinduction (not general bloody recursion) is the way to make software whose job is to persist and respond to user demands, but there’s a serious need to re-examine both the foundations and the pragmatics of the whole coinduction business, before we can make that principled observation a sustainable reality.

irish induction-recursion

January 1, 2015

This is a bit of a shaggy dog story. Where to begin? I suppose I should remind you what definition by induction-recursion (IR) lets you accomplish. I’ll show you the usual coding scheme for IR definitions, and then I’ll show you the ‘Irish’ coding scheme which I came up with, not realising at the time that I might be doing more than bureaucratic shuffling of parts. It turns out that the status of Irish IR is not obvious.

The idea behind IR is that we can make an inductive definition of a datatype at the same time as we define a function which computes by recursion over that type: the recursive function can be involved in the types of the data constructors. E.g., we can define contexts and environments as follows

  Context : Set1                                 Env (G : Context) : Set
  ::= nil                                        Env nil         = One
    | snoc (G : Context)(S : Env G -> Set)       Env (snoc G S)  = (g : Env G) * S g

Each successive context entry is a type depending on the prior context — given as a function from environments. The notion of environment is essential to the datatype declaration, even though it is being defined just in time. An environment for the empty context is trivial. An environment for a context extended is the dependent pair of an environment for the original context and a value in the type that extends the context ((s , t) : (x : S) * T whenever s : S and t : T[s/x]).
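
For instance, here is a two-entry context in which the type of the second entry depends on the value of the first. The example is mine; it assumes the Boolean type Two (with elements tt and ff), a one-element type One, an empty type Zero, and a large conditional If.

  G : Context
  G = snoc (snoc nil (\ _ -> Two)) (\ (_, b) -> If b One Zero)

  Env G = (g : (_ : One) * Two) * (\ (_, b) -> If b One Zero) g

So ((() , tt) , ()) is an environment for G, while no environment for G can put ff in its Two entry, because Zero is empty.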

The Context type above is large, in that its values pack up sets, but the real power of IR lies in its ability to make small encodings of large things. Here, for example, is the universe closed under the Boolean type and the dependent function space:

  U : Set                                    El (T : U) : Set
  ::= two                                    El two       = Two
    | pi (S : U)(T : El S -> U)              El (pi S T)  = (s : El S) -> El (T s)

Here, U is a small set, but its values encode sets, with El telling you of what the elements of U’s encoded sets consist. We need to use El when accounting for the value-dependency in pi-types. Without IR, you can still give an encoding of such a universe, but it’ll be large.

In fact, if you know what you’re doing, you can give a small encoding of an entire predicative hierarchy using IR: it’s an astonishingly powerful means of definition. But what is it? Which definitions are allowed?

As ever, the constructors of an IR type amount to the initial algebra for an endofunctor, but not in Set. Borrowing Peter Hancock’s terminology, let Fam X = (I : Set) * I -> X. Fam X is the type of small coding schemes for Xs. Above, (U, El) : Fam Set. IR types are given as initial algebras for an endofunctor on some Fam X. For that to make sense, I have to equip Fam X with morphisms. A morphism from (I, f) to (J, g) is some h : I -> J such that f = g . h. So, if X is small, then Fam X amounts to the slice category, Set/X, but the whole sodding point is that X usually isn’t small. I’ll make use of

  Code (F : Fam X) : Set                   decode (F : Fam X)(i : Code F) : X
  Code (I, f) = I                          decode (I, f) i = f i

Now, the question is which functors will give rise to IR definitions. In 1999, Peter Dybjer and Anton Setzer gave a coding scheme for a collection of functors which do just that. Below, DS X Y is an encoding of functors from Fam X to Fam Y. (It’s a little easier to see what’s going on if we consider Fam-functors in general, not just endofunctors.)

  DS (X : Set1)(Y : Set1) : Set1
  ::= iota (y : Y)                                 -- end of data, deliver y
    | sigma (S : Set) (T : S -> DS X Y)            -- non-recursive field of type S
    | delta (H : Set) (T : (H -> X) -> DS X Y)     -- substructures branching over H

Let me encode the above example universe:

  UEl : DS Set Set
  UEl = sigma {two, pi}
          \ two -> iota Two
          | pi  -> delta One \ S -> delta (S ()) \ T -> iota ((s : S ()) -> T s)

That is, we first offer the choice of “two” or “pi”. If “two” is chosen, the encoded type is Two. If “pi” is chosen, we ask for one recursively chosen code for the pi-type domain, and we get S : One -> Set as its decoding, so we can ask for an (S ())-indexed family of codes to give the dependent range of the pi-type, then deliver the set (s : S ()) -> T s. Or so I say…

We need to say which functors are encoded and make sure the above code really gets the intended interpretation. To do so, it will help to define the corresponding operations on Fam Y.
The constant family encodes just one element of Y with a trivial code.

  ONE (y : Y) : Fam Y
  ONE y = One, \ _ -> y

We can build up right-nested Fam Y encodings by dependent pairing over some set S. Once you have chosen an S, you can see the structure of the rest of the codes and how to decode them.

  SIGMA (S : Set) (T : S -> Fam Y) : Fam Y
  SIGMA S T = ((s : S) * Code (T s)), \ (s, t) -> decode (T s) t

Now, if you have some (I, f) : Fam X, you can use values in I as codes for values in X. Any old function g : H -> I gives us a bunch of codes, and the composition f . g : H -> X gives us the corresponding bunch of X-values.

  DELTA  (F : Fam X) (H : Set) (T : (H -> X) -> Fam Y) : Fam Y
  DELTA (I, f) H T = ((g : H -> I) * Code (T (f . g))), \ (g, t) -> decode (T (f . g)) t

Crucially, whatever Fam X we plug in, whatever its type of codes, we know we’ll be able to decode them to X.

Now we can give the action on objects.

  [(T : DS X Y)] : Fam X -> Fam Y
  [iota y]    F  =  ONE y
  [sigma S T] F  =  SIGMA S \ s -> [T s] F
  [delta H T] F  =  DELTA F H \ k -> [T k] F

The action on morphisms is dictated by the structures thus given: given a translation between two encodings of X, we can lift it to a translation between the corresponding encodings of Y.
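
Spelling out what that lifting looks like, in the same informal style (the name dsMap is mine): given a Fam X morphism h : Code F -> Code G with decode F = decode G . h, the translation of codes goes structurally.

  dsMap (T : DS X Y) (h : Code F -> Code G) : Code ([T] F) -> Code ([T] G)
  dsMap (iota y)    h ()      = ()
  dsMap (sigma S T) h (s, t)  = (s, dsMap (T s) h t)
  dsMap (delta H T) h (k, t)  = (h . k, dsMap (T (decode F . k)) h t)

The delta case is fine exactly because decode G . h . k = decode F . k, so the translated substructure codes still decode to the same X-values; commuting with the decoders is then a routine induction on T.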

Cranking through our example, we get

  Code ([UEl] (U, El)) = (x : {two, pi}) * case
    two -> One
    pi  -> (S : One -> U) * (T : El (S ()) -> U) * One

  decode ([UEl] (U, El)) (two, ())       = Two
  decode ([UEl] (U, El)) (pi, S, T, ())  = (s : El (S ())) -> El (T s)

so our definition has worked out the way we planned.

The eagle-eyed will have spotted that DS X is a monad, indeed it’s a free monad with iota as return and a >>= operator which grafts more node structure on at each iota.
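
For the record, here is that grafting operator, spelled out in the same informal style (the name dsBind is mine): iota plays return, and dsBind replaces each iota y with K y, leaving the sigma and delta structure alone.

  dsBind (T : DS X Y) (K : Y -> DS X Z) : DS X Z
  dsBind (iota y)    K = K y
  dsBind (sigma S T) K = sigma S \ s -> dsBind (T s) K
  dsBind (delta H T) K = delta H \ k -> dsBind (T k) K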

Moreover, we can define

  dsI : DS X X
  dsI = delta One \ f -> iota (f ())

such that

  Code ([dsI] (I, f)) = (i : I) * One
  decode ([dsI] (I, f)) (i, ()) = f i

i.e., [dsI] is isomorphic to the identity functor on Fam X.

But here’s the biggie. Is there a composition operator on DS

  dsC (F : DS Y Z) (G : DS X Y) : DS X Z

such that [dsC F G] is isomorphic to the functor composition [F].[G]? The usual suspects (Ghani, Hancock, Altenkirch, me…) have been banging our heads off this for a few years now. We just don’t know. We suspect that there isn’t such a thing, but we haven’t been able to find a concrete example of codes F and G whose composite definitely cannot be encoded.

Now, once upon a time, I was trying to make the presentation of DS X Y more like the universe of functors Chapman, Dagand, Morris and I built in The Gentle Art of Levitation. I started out like this — it’s an IR-definition itself.

  Irish (X : Set1) : Set1                       Info : Irish X -> Set1
  ::= idi                                       Info idi          = X
    | con (A : Set)                             Info (con A)      = A
    | sigma (S : Irish X)                       Info (sigma S T)  = (s : Info S) * 
        (T : Info S -> Irish X)                                     Info (T s)
    | pi (S : Set)(T : S -> Irish X)            Info (pi S T)     = (s : S) -> Info (T s)

Irish is a set of codes for structures involving a Fam X. Info tells us the information we are sure to have in those structures, whatever Fam X we plug in: we know more than nothing about them, because we know they decode to X. Info has to be large, because X is large.

Of course, I have to tell you what structure you get when you plug in a Fam X. We get node structures, small sets, from which we can compute Info.

  Node (T : Irish X) (F : Fam X) : Set
  info (T : Irish X) (F : Fam X) (t : Node T F) : Info T

  Node idi     F = Code F                  info idi     F i = decode F i
  Node (con A) _ = A                       info (con A) _ a = a
  Node (sigma S T) F = (s : Node S F) *    info (sigma S T) F (s, t) = s', info (T s') F t 
                Node (T (info S F s)) F      where s' = info S F s
  Node (pi S T) F = (s : S) ->             info (pi S T) F f = \ s -> info (T s) F (f s)
                    Node (T s) F

Node can be small, because we now use our Fam X to give us a small encoding for X-values. We compute Info from Node by deploying the decoder which comes with the Fam X. We now have that

  F : Fam X   =>   (Node T F, info T F) : Fam (Info T)

But we want a Fam Y. All we need for that is a function from Info T to Y. That is

  irish (T : Irish X) (d : Info T -> Y) (F : Fam X) : Fam Y
  irish T d F = Node T F, (d . info T F)

In fact, it’s quite a lot like the way we write IR definitions informally. We give a description of the data, and we write a decoder for it. I should, of course, give the action on morphisms (which is postcomposition in the relevant places), and check that we actually get a functor (which of course we do, because postcomposition respects identity and composition).
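
To spell that out (the name nodeMap is mine): a Fam X morphism h : Code F -> Code G with decode F = decode G . h lifts to node structures as follows, preserving info on the nose.

  nodeMap (T : Irish X) (h : Code F -> Code G) : Node T F -> Node T G
  nodeMap idi         h i      = h i
  nodeMap (con A)     h a      = a
  nodeMap (sigma S T) h (s, t) = (nodeMap S h s, nodeMap (T (info S F s)) h t)
  nodeMap (pi S T)    h f      = \ s -> nodeMap (T s) h (f s)

The sigma case typechecks because info S G (nodeMap S h s) = info S F s, and the action of irish T d on morphisms is then just nodeMap T, with the decoders kept in line by postcomposition with d.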

Let me work through the running example again

  U : Irish Set
  U = sigma (con {two, pi})
        \ two -> con One
        | pi  -> sigma idi \ S -> pi S \ _ -> idi

  El : Info U -> Set
  El (two, ())   = Two
  El (pi, S, T)  = (s : S) -> T s

and irish U El is the endofunctor we need.
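
Cranking the handle as before, here is my calculation of what irish U El does to (U, El), for comparison with the Dybjer-Setzer version.

  Node U (U, El) = (x : {two, pi}) * case
    two -> One
    pi  -> (S : U) * (El S -> U)

  decode (irish U El (U, El)) (two, ())   = Two
  decode (irish U El (U, El)) (pi, S, T)  = (s : El S) -> El (T s)

So, again, the definition works out the way we planned.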

It’s not hard to see that irish idi id is exactly the identity functor on Fam X. But there’s more.
If we have F : Irish Y, that’s a description of node structures that a Fam Y plugs into, but if we have the bits for a functor from Fam X to Fam Y, we should be able to get a structure that a Fam X plugs into, just
by grafting stuff in place of F’s idi markers. At the same time, we need the forgetful map which discards the extra detail from the refined structure, giving us just the Info for the outer layer that corresponds to F.

  subC (F : Irish Y)(G : Irish X)(d : Info G -> Y) : Irish X
  fogC (F : Irish Y)(G : Irish X)(d : Info G -> Y) : Info (subC F G d) -> Info F

  subC idi G d = G                            fogC idi G d g = d g
  subC (con A) G d = con A                    fogC (con A) G d a = a
  subC (sigma S T) G d =                      fogC (sigma S T) G d (s, t) =
    sigma (subC S G d) \ s ->                   s', fogC (T s') G d t
      subC (T (fogC S G d s)) G d               where s' = fogC S G d s
  subC (pi S T) G d =                         fogC (pi S T) G d f = \ s ->
    pi S \ s -> subC (T s) G d                  fogC (T s) G d (f s)

Putting these pieces together, we see that

  irC0 (F : Irish Y)(f : Info F -> Z)(G : Irish X)(g : Info G -> Y) : Irish X
  irC1 (F : Irish Y)(f : Info F -> Z)(G : Irish X)(g : Info G -> Y) : Info (irC0 F f G g) -> Z
  irC0 F f G g = subC F G g
  irC1 F f G g = f . fogC F G g

gives us the tools to compose our functors. Of course, one must do the work to show that we don’t just get something with the right type to be composition. It’s not that difficult. Intuitively, substituting the placeholder for substructures in the node description is exactly how to design a composite structure.

So, Irish IR corresponds quite neatly to the way we write IR definitions anyway, and it describes a class of functors between Fams which is readily closed under composition. It’s very easy to check that every DS X Y gives rise to an (F : Irish X) * Info F -> Y. Dybjer-Setzer iota gives Irish con One, Dybjer-Setzer sigma gives Irish sigma-con, Dybjer-Setzer delta gives Irish sigma-pi-idi. What’s the problem?

Well, when you try to translate Irish IR codes to Dybjer-Setzer codes, it’s far from obvious how to do it. If we could do it, we’d be able to compose DS functors, via the Irish composition. But there’s an alternative possibility, which seems more likely: it may well be the case that Irish IR codes capture a larger class of functor than Dybjer-Setzer codes. That might even be a good thing, but it would certainly prevent us appealing to Dybjer and Setzer’s set-theoretic model to show consistency. We need a pile of fresh work to show the more generous notion makes sense. It could be too good to be true!

Although I’ve been using an informal notation in this blogpost, I’ve managed to do the whole construction in Agda. Can’t for the life of me remember where I put it. But it raises the question of what notion of IR Agda actually implements. Entertainingly, Irish/Info has a perfectly sensible Dybjer-Setzer encoding as an inductive-recursive definition in Set1 (which means it also has an Irish IR encoding, so we can play the levitation game). Agda’s positivity checker doesn’t complain when we tie the knot. I don’t think we can really deduce much about the consistency of such constructions from the fact that Agda fails to forbid them.

Zooming out a bit, let’s see what we’ve got. We have a compositional first-class presentation of inductive-recursive definition. It would fit neatly in a closed kernel for some dependently typed programming language. But we still have nagging questions about its status. And, of course, there are other things we might like to do with datatypes.

worlds

December 31, 2014

It’s an obvious but not often considered fact that if you erase all the terms from a valid construction in System F, you get a bunch of perfectly sensible types. (When you erase a Λ, you need to extend the context in which the types you expose will be judged.) Erasing the types is more likely to happen, yielding perfectly sensible terms which we might have some motivation to evaluate, but erasing the terms works, too. The thing about System F is that types and terms separate like oil and water: the context may build up term variables with λ and type variables with Λ, but when you’re making a type you can’t give a term variable, and when you’re making a term, you can’t give a type variable. Worlds apart. The separation of worlds means you can erase either terms or types without damaging the residue.

GCHQ read all my communications (hi, Richard!), but I’m not entitled to read theirs. Fortunately for me, I don’t need to know what they think in order to go about my business. Without the inward flow of unprivileged information, however, GCHQ would be able to monitor only one another, which might increase the availability of office space in Cheltenham. I hasten to add (in case Richard’s getting twitchy) that I am not proposing “Cheltenham erasure”, but the fact remains that my function depends in no way on the outflow of privileged information from GCHQ. We’re worlds apart, but there’s a directed information flow between those worlds.

Meanwhile, at GHC HQ, we’ve been trying to fight our way out of the mess caused by the separation of worlds in System F and its extensions: in each world, we find ourselves building avatars for things which live in the other, awkwardly simulating dependent types. We can’t just impose dependent types, because in their absence from the type world, the terms have learned to behave rather badly. What we can do, however, is install a shared world, for things which behave perfectly sensibly in types and at run time: information may flow from the shared world to both. During typechecking, we may have to run programs from the shared world, but we don’t need to run chaotic run-time-only code. Information does not flow downwards from the types-only world, so when generating run-time code, we can still erase everything we used to, leaving the shared and the run-time-only things.

Once you start to see systems in terms of information flow, it’s hard to stop. The question is really this: which communications of data are permitted by the notion of variable scope? Perhaps not all variables are in scope for all constructions. Note that if we prevent communication by being in scope, we may not have excluded explicit communication, (cost?-)modelled as an effect. Here’s a laundry list of possibilities and wheezes.

  • stage We cut execution into stages. Values persist from earlier to later stages, but dependency is forbidden the other way around. Stages are worlds.
  • timing productive corecursion In Productive Coprogramming with Guarded Recursion, Bob Atkey and I used a notion of time to ensure that corecursive processes can always give you ‘today’s’ behaviour before they loop. Days are worlds. At least for streams, Paula Severi is there already (e.g., her ICFP 2012 paper with Fer-Jan de Vries).
  • details I saw Jeremy Singer give a talk about ‘AnyScale apps’ — programs (e.g. games) which move between computing environments and react dynamically to the available resources (processing power, screen resolution, etc). To make them work, you have to be able to cut off computation below a dynamically chosen level of detail. Clearly, the cruder parts had better not depend on the optional refinements. Detail levels are worlds.
  • security As touched on, above, access to information can be managed via a preorder of privilege. Vincent Simonet showed the way with Flow Caml. Security levels are worlds.
  • distribution I have a bunch of computers. Each of them has access to some shared libraries, but also their own private memory. Communication between computers is expensive and requires an explicit effect. Types model which knows what. Computing resources are worlds.
  • Hoare logic Specifications can talk about program variables, but program code cannot talk about logic variables. Specification and program live in different worlds.
  • phase Typechecking and compilation can involve partial evaluation of arbitrary code. Program execution does not involve types. Information flows from run-time code to the typechecker, even though the temporal order is the other way around. Note that run-time values do not flow to the typechecker, just the symbols which stand for them.

It’s time we stopped thinking of these issues as separate. They have a great deal in common.

Dependent type theory has a bit of a reputation as a mish-mash: all the terms and types jumbled up together in the same syntactic category. The flow of information from terms to types is exactly the source of its power and the basis for confusion amongst casual observers. A common misconception is (or at least used to be) that the flow of information from terms to types prevents the erasure of types from terms: that’s manifest nonsense — it prevents the erasure of terms from types, so no big deal. The absence of typecase means that terms depend only parametrically on types, so type erasure still works. It would be good, however, to nail that down by a separation of worlds. To a large extent, that’s what Sheard and Mishra-Linger did, what Barras and Bernardo did (building on work by Miquel), and what Bernardy and Moulin do in spades: all in different ways. This needs said:

Parametricity arises by quantification against the flow of information.

Quantifying over information you can’t access had better mean you don’t need to care what it is. Dependent type theory needs both parametric quantification (intersection) and non-parametric quantification (product). Moreover, which you choose shouldn’t depend on the domain. I can agree with the claim that it is useful to be able to quantify parametrically over types without believing that all quantification over types should be parametric. With a clear separation of worlds, we could write datatype-generic code when we want to (and stage its execution, too), but still make it clear when parametricity applies. Too many people are trapped into believing nonsense like ‘typecase breaks parametricity’ without interrogating the unstable presumptions that make it wise in a narrow locality.

What do I propose to do about this business of worlds? I’d like to put it on a more general footing. I don’t imagine there’s one holy and perfect system of worlds. I think it’s more important to know when a construction can be transported between systems of worlds. As usual, the morphisms are more important than the objects. So we need to find the general structure. This draft is the beginning of that story, but there’s much more to dig out. I’ll sketch the basic picture here.

Start from a vanilla dependent type theory, where contexts assign types to variables and judgments explain which terms a type admits. Now put everything in a world. Worlds live in a preorder W. Every context entry has a ‘source’ world: judgments construct terms in a ‘target’ world. The variable rule, ‘a variable in the context is a term’ acquires the side-condition that information can flow from source to target.

What about quantification and functions? For a long time, I thought that each world should have its own quantifier, abstraction and application, explaining what it is to be a function over stuff in that world. We’re used to that in System F, right? We have ∀, Λ and type application for functions over types, and →, λ and term application for functions over values: the lack of information flow from terms to types means → is reliably degenerate, and the lack of information flow from types to terms means that ∀ is parametric, allowing the Church-to-Curry erasure. That story works, but it’s not good enough.

Suppose I have a world for library code which can flow into any of a number of discrete computing facilities. I want each local computation to be able to apply library functionality to its own local data. So the type of a library function had better not fix the world its argument comes from. I spent a lot of time staring at Microsoft whiteboards trying to figure out how to recompute the types of functions as they moved between worlds, renegotiating their domains. My brain hurt. Then, the penny dropped.

We need portable quantifiers, e.g., capturing what ‘locally’ means, wherever we happen to be. Let us have a set of quantifiers Q, with a monotonic action, qw, on W. Now, types mention quantifiers, never worlds; worlds live in judgments. When we check a q-λ in world w, we bind the variable in world qw; when we check an application of a q-function in world w, we check the argument in world qw. E.g., the ‘locally’ quantifier has the identity action, but we might also have an ‘intersection’ which maps every w to the world where types live. As we work through a typing derivation, the action of Q on a given w will be iterated, giving each w a local subsystem of worlds. As long as the world preorder preserves the structure of those local subsystems, we can transport constructions whenever the preorder permits.
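
To make the bookkeeping concrete, here is a toy Haskell sketch of just that discipline. It is entirely my own rendering with made-up names, hard-wiring one three-world system where the draft allows any preorder W and any monotonically acting set Q of quantifiers.

  -- a three-world system in the spirit of the shared/types-only/run-time story
  data World = RunTime | Shared | TypesOnly deriving (Eq, Show)

  flowsTo :: World -> World -> Bool       -- the preorder: Shared flows everywhere
  flowsTo u w = u == w || u == Shared

  data Quant = Locally | Intersection     -- hypothetical quantifier names

  act :: Quant -> World -> World          -- monotonic action of quantifiers on worlds
  act Locally      w = w                  -- 'locally': bind in the world we are in
  act Intersection _ = TypesOnly          -- 'intersection': bind where types live

  data Tm = Var String | Lam Quant String Tm | App Quant Tm Tm

  type Cxt = [(String, World)]            -- every context entry has a source world

  -- just the world-flow fragment of the judgment (types omitted):
  -- a variable is usable only if its source world flows to the target world;
  -- a q-lambda binds its variable at (act q w); a q-application checks its
  -- argument at (act q w)
  wellWorlded :: Cxt -> World -> Tm -> Bool
  wellWorlded g w (Var x)     = maybe False (`flowsTo` w) (lookup x g)
  wellWorlded g w (Lam q x b) = wellWorlded ((x, act q w) : g) w b
  wellWorlded g w (App q f a) = wellWorlded g w f && wellWorlded g (act q w) a

In this toy, a variable bound by Lam Intersection can never be used where the target world is RunTime, which is the erasure story from the shared-world discussion above.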

Here is a hasty Agda formalisation of the world transportation lemma which I cooked up one Sunday morning in the middle of a fen. The less said about the rest of that day, the better.

I think world systems have a considerable potential to change the way we manage information flow (I’m reminded of Douglas Adams’s glorious phrase “rigidly defined areas of doubt and uncertainty”) across a whole bunch of application domains. At the same time, I don’t think they will disturb the way we think about the semantics of type theories: we need a Kripke semantics to manage the context inclusion preorder with which we account for open computation (values are used not only in the empty context, but rather in contexts with at least as much information as the contexts in which they are constructed), so it might as well be an interesting Kripke semantics.

Away from the blackboard, they’ll change the way we do code generation, too. A bunch of these thoughts germinated from a beer mat Edwin Brady and I once scribbled on in the Basque and Carol, and they’re informing what’s happening with erasure in Idris today.

Morphisms between local world systems are important. In our work on corecursion, Bob and I are most particular about ensuring that the time-slicing which justifies the productivity of codata need not be negotiated by its consumer: you work with more of the detail visible, it checks out, then you rub out the pencil.

There’s a lot still to think about, but I’m excited and optimistic at the prospect. The ‘worlds’ idea is altering my perception of lots of other things I work on (i.e., it’s a threshold concept), so forgive me if I’m keen to pass it on.

brains splattered all over the walls

December 30, 2014

It’s nearly the new year, my lectures are all done, and the activities of War, Famine, Pestilence and Death in my daily life have settled down at least a bit. I wonder if I might find some of the discarded slops of my brain and scoop them back into my cranium, to see if they might restart. What was it that the person who used to be me used to think about? I hastily scribbled a list of things I think I think are worth thinking about, and as it got longer and longer, I got that thinking feeling. It occurred to me that maybe I should try to write just a little bit about where I’m at with some of them, as a way of examining the dependency relationships between them and planning some sort of strategy to address the mess.

Working through some of this stuff might also make it a bit clearer why I’m not in the mood for significant programming language design and implementation just now. I’m not promising that I’ll never be. I don’t want to make a bunch of short term decisions for the sake of ‘going live’ that I’m sure I’ll regret.

So, here’s the list that I scribbled, in the order in which I scribbled it (which is not a sensible order). I’ll try to give a one-line (of source, which seems to spill when rendered) summary of each.

  • worlds
    variables live in worlds; terms are built in worlds; worlds are preordered, controlling information flow
  • coinduction
    breaks subject reduction in Coq; has only non-dependent case in Agda; needs a better equality to repair
  • syntax-not-just-data
    an inductive datatype just gives the closed terms of a syntax with no binding; syntax should be primitive
  • irish induction-recursion
    my coding of IR functors is at least as rich as Dybjer and Setzer’s; they compose; too good to be true?
  • dependent types for data modelling
    types for spreadsheets or databases are no use unless they’re relative to a model of the underlying data, so make that model
  • reflection
    let’s just treat Set as a datatype (ahem, a syntax) and get on with it; get parametricity from worlds
  • universe hierarchies
    cumulativity matters; universe polymorphism is overrated; no fixed hierarchy but morphisms between them
  • problems-as-types
    using core types for the presentation of elaboration problems, not just the semantics of their solutions
  • ν-rules
    the equational theories of open terms can be generated more interestingly than by lifting closed computation
  • quotients
    what are they? what’s their proof-relevant counterpart? is that enough higher-dimensional structure?
  • OTT motivation delivery
    before we abandon Observational Type Theory, we should at least try to remember what it was and did
  • HoTT
    would really enjoy an internal notion of computation, a closed universe of HITs, the fun I’m used to
  • user interfaces in a darkened room
    a functional model of the structure of applications, abstracting away from appearance/hardware
  • traffic-dependent session types
    the protocol for future communication depends not on the prior processes, just on their traffic
  • linear dependent types
    with a dependent lollipop, rather than the current both-present-but-never-the-twain state of the art
  • theories of testing
    testing can reveal the absence of bugs if you can bound the weirdness of the programs being tested
  • differential structure and deep patterns
    building search (what strategy?) into pattern matching by allowing plugging-into-contexts on the left
  • mutation
    if we have sole ownership of data, we can mutate it, and maintain multiple access paths…consistently?
  • local time for recursion and corecursion
    use a temporal modality to demonstrate totality, but your client doesn’t care, so make sure it isn’t there
  • internal parametricity
    whatever I know about what I don’t know, I want a goddamn proof term which tells me that I know it
  • re-engineering failed definitions
    it’s all very well refining holes until you find out they’re the wrong holes; assisted head-scratching?
  • ornaments
    how do we work with ornaments in practice, both from the developer’s viewpoint and the compiler’s?
  • modularity
    how to put reference implementations in a signature and choose better implementations in the structure
  • effects-and-handlers
    pure function application is just the least nontrivial way for one computation to contextualize another

There are probably more that I’ve forgotten. I wonder where to begin. I’d be rash to promise a ‘24 days of unfinished half-ideas’, but I might get further than nowhere.

EDIT: I omitted ν-rules from the original version, quite by accident.

the first biscuit in the packet is always broken

May 2, 2011

I thought I might get myself a new blog. But then I found I had an old one that I didn’t use. Perhaps I’ll start using it. Don’t hold your breath. I’ll need to figure out how to add code and sums and stuff. Then things might get interesting. Sorry it isn’t yet. Interesting. Or much else for that matter. Publish and be damned.