I set many exercises which involve constructing sequential circuits. What’s a sequential circuit? It’s a “function” from i bits of input to o bits of output that can access and update s bits of internal state. That is to say, it’s a function from s+i bits to s+o bits, for some s. I need to tell when two such circuits are equal up to external observation, regardless of s.
We have 2^{s} states, x. Each determines an immediate behaviour, b(x):2^{i}→2^{o} and a continuation k(x):2^{i}→2^{s}. The behaviour is directly observable but the continuation is not. The continuation can be distinguished only by its behaviour.
My strategy is to analyse each circuit to identify the observational equivalence classes of concrete states, then try to put two such analyses into bisimulation. The first part of that amounts to finding the smallest set U of abstract states such that we have functions p:U→Pow(2^{s}) and f:2^{s}→U, together with an abstract behaviour u!:U→2^{i}→2^{o} and an abstract continuation u?:U→2^{i}→U, such that u!(f(x)) = b(x) and u?(f(x), m) = f(k(x)(m)) for every concrete state x and input m, where p(u) collects the concrete states mapped to u by f.
In other words, find the coarsest equivalence such that in equivalent states, the circuit must, for all inputs, produce equal outputs and evolve to equivalent states.
For a given (s,b,k), suppose we have such a (U,p,f,u!,u?). And for some corresponding (i.e., same i, same o) (t,c,l) let us have (V,q,g,v!,v?). Our mission is to establish an isomorphism, ~, between U and V such that whenever u ~ v, the immediate behaviours agree, u!(u) = v!(v), and the continuations stay related, u?(u, m) ~ v?(v, m), for all inputs m.
The analysis works by starting with the hope that all states are equivalent and then disabusing ourselves of this notion. We dump all states into a singleton partition (U = 1, p(*) = 2^{s}, f(x) = *), then refine by a bunch of observations. To refine a partition by an observation, replace each part by the partition you get by distinguishing its elements on the basis of the observation: you can tell them apart, so you need to separate them! Start refining by b, to separate the states which behave differently immediately. Then refine by x ↦ f(k(x)(m)) for each m: i.e., if you ever map source states to distinguished target states, you must distinguish the source states. Iterate to a fixpoint: as there are finitely many states, partitions cannot refine forever. We have grown our partition only when forced to by observable differences, so we have the smallest possible number of parts.
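This refinement loop is easy to prototype. Here is a minimal Haskell sketch, assuming a toy circuit record (the names Circuit, behave, contin, refineBy and minimise are mine, not from any library): it refines by immediate behaviour and by the class of each successor until the partition stabilises.

```haskell
import Data.List (sortOn, groupBy)
import Data.Function (on)

-- A hypothetical finite circuit: states, inputs, behaviour b, continuation k.
data Circuit st inp out = Circuit
  { states :: [st]
  , inputs :: [inp]
  , behave :: st -> inp -> out   -- b
  , contin :: st -> inp -> st    -- k
  }

-- Refine a partition: split every part by an observation, since
-- states that observably differ must be separated.
refineBy :: Ord obs => (st -> obs) -> [[st]] -> [[st]]
refineBy obs = concatMap (groupBy ((==) `on` obs) . sortOn obs)

-- Index of the part containing a state under the current partition.
classOf :: Eq st => [[st]] -> st -> Int
classOf parts x = length (takeWhile (notElem x) parts)

-- Start from the singleton partition; refine by b and by f . k; iterate
-- to a fixpoint (partitions only grow, so this terminates).
minimise :: (Ord out, Eq st) => Circuit st inp out -> [[st]]
minimise c = go [states c]
  where
    go parts
      | length parts' == length parts = parts'
      | otherwise                     = go parts'
      where
        parts' = refineBy obs parts
        obs x = ( map (behave c x) (inputs c)
                , map (classOf parts . contin c x) (inputs c) )
```

For instance, a three-state circuit in which states 0 and 1 behave alike and both step to state 2 is minimised to the two classes [[0,1],[2]].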
Now we need to grow our bisimulation, from the empty set of pairs. Start by inverting u! and v!, so that each immediate behaviour maps to the set of states which exhibit it: these must be pointwise isomorphic, which gives you a search space of candidates for pairs u~v. For every guess you make, you have to check that u?(u,m)~v?(v,m) for all m, so you learn more about what ~ must be.
The circuits are observationally equivalent, I claim, iff there exists such a bisimulation.
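The claim can be exercised directly if we fix start states (which the analysis above deliberately does not require): growing the set of state pairs that must be related, and failing on any behavioural mismatch, decides observational equivalence from those starts. A rough, standalone Haskell sketch, with all names mine:

```haskell
-- Decide observational equivalence of two circuits from given start
-- states: b1/b2 are behaviours, k1/k2 continuations, ins the shared
-- input alphabet. We grow the set of pairs that must be related.
equivFrom :: (Eq s, Eq t, Eq o)
          => [i] -> (s -> i -> o) -> (s -> i -> s)
                 -> (t -> i -> o) -> (t -> i -> t)
          -> s -> t -> Bool
equivFrom ins b1 k1 b2 k2 x0 y0 = go [] [(x0, y0)]
  where
    go _ [] = True                        -- no more obligations: bisimulation
    go seen ((x, y) : todo)
      | (x, y) `elem` seen = go seen todo -- already checked this pair
      | any (\m -> b1 x m /= b2 y m) ins = False  -- observable difference
      | otherwise =                       -- related states must co-evolve
          go ((x, y) : seen) ([(k1 x m, k2 y m) | m <- ins] ++ todo)
```

Since both state spaces are finite, the set of pairs is finite and the search terminates.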
Here is my attempt at a Haskell implementation.
The key basic choice is that the proof style is declarative. That is, the proof document shows you the decomposition of goals into subgoals which is but transient in systems like Coq and Agda. It’s intended to be used interactively, but my current plan is to facilitate that interaction by implementing a transducer for proof documents. So
rod doc1 .. docn
will read the given files and write new versions of them, updated by the machine’s take on what they say. In particular, a goal for which a rule has been suggested but no subderivations provided will be elaborated to a bunch of open subgoals if that rule applies.
E.g.
Show(imp(and(A, B), and(B, A))) by impI !
(where the ! is the user telling rod) becomes
Show(imp(and(A, B), and(B, A))) by impI {
Given(and(A, B)), Show(and(B, A)) ? }
(where the ? is rod asking the user)
I want the logic to be highly configurable, and I want to be able to express construction, where data unknown when a goal is stated get discovered as the proof progresses. Specifically, I want to be able to attach proof terms to a logic and watch them appear!
How on earth am I planning to do that? I’m writing this blogpost to try and answer that question for myself, apart from anyone else. They do say you should write the manual before the software…
To get anywhere, one must have stuff. Syntax is up for grabs, but I have limited energy for niceties. Stuff lives in sorts. There are four predefined sorts: Type, Goal, Hypo and Rule. Things of sort Type are also sorts. There is a strong whiff of the Logical Framework.
declare Prop Type
That is, we can populate our sorts by declaring constructors.
declare id kind
where
kind ::= scope sort
scope ::= (binding,..binding)
binding ::= kind | name kind
e.g., I might declare that I can form goals and hypotheses from propositions
declare Show(Prop) Goal
declare Given(Prop) Hypo
or propositions from connectives
declare and(Prop, Prop) Prop
declare imp(Prop, Prop) Prop
But I might also choose to introduce proof objects:
declare Proof Type
and declare a construction goal
declare Make(Proof, Prop) Goal
declare Have(Proof, Prop) Hypo
where the proof object will be synthesized by rod as we go.
To get anywhere with proving things, we need inference rules. To build them, first declare a constructor of sort Rule.
declare impI Rule
Then say how to deploy the rule.
rule Goal by Rule subgoals
where
subgoals ::= { subgoal ;.. }
subgoal ::= scope Hypo,..Hypo, Goal
You’ll notice I’m now writing kinds in grammars. They’re just terms that we check.
rule Show(imp(A, B)) by impI {
Given(A), Show(B) }
What are A and B? They’re metavariables. Metavariables are bound in patterns and used in expressions. The technical distinction between patterns and expressions must wait, but the idea is that we can solve for a metavariable in a pattern usage, and not in an expression usage. Likewise, we can infer the kind of a metavariable only from pattern usages. First-order metavariables have only pattern usages: here, we could figure out A and B either from the goal or from the subgoal.
We may express modus ponens as follows:
declare impE(Prop) Rule
rule Show(B) by impE(A) {
Show(imp(A, B)) ;
Show(A) }
Actually, rules have an extra option.
rule Goal by Rule conditions subgoals
where
conditions ::= when Hypo,..Hypo
which allows us to demand the presence of particular hypotheses to allow a rule to fire. As in,
declare hypo Rule
rule Show(P) by hypo when Given(P)
So we expect the goal to tell rod which hypothesis to check for.
Here is the introduction rule for conjunction.
declare andI Rule
rule Show(and(A, B)) by andI {
Show(A) ;
Show(B) }
I could define elimination as projection, but I’m not going to. Let me tackle that a different way.
Here’s another variation on rules, specifically geared towards elimination. These ones are anonymous! Let’s have some examples.
rule Show(P) from Given(imp(A, B)) {
Show(A) ;
Given(B), Show(P) }
rule Show(P) from Given(and(A, B)) {
Given(A), Given(B), Show(P) }
The idea is to establish the canonical means to exploit a hypothesis, preferably with as little prejudice as possible about the goal. In general, that’s
rule Goal from Hypo subgoals
On deployment, there had better be exactly one rule-from which fits the situation.
A proof is a tree of goals and rule invocations. We’ve seen enough to have an example.
Show(imp(and(A, B), and(B, A))) by impI {
Given(and(A, B)), Show(and(B, A)) from and(A, B) {
Given(A), Given(B), Show(and(B, A)) by andI {
Show(B) by hypo ;
Show(A) by hypo } } }
Checking the proof is a matter of checking that rule deployments match, then checking subproofs. Which raises the question of what matching might be. Let me park that question while I make it a bit more difficult.
Crucially, wherever the rule declaration has patterns, the rule invocation has expressions, and vice versa: we change direction as demanded by the kinds. But timing is everything…
You can build this proof interactively. You may write ? instead of a rule invocation, or give a rule and a !
Sez teacher: solve (A Prop, B Prop) Show(imp(and(A, B), and(B, A)))
Sez rod: Show(imp(and(A, B), and(B, A))) ?
Sez you: Show(imp(and(A, B), and(B, A))) by impI !
Sez rod: Show(imp(and(A, B), and(B, A))) by impI {
Given(and(A, B)), Show(and(B, A)) ? }
Let’s be kind about rules-from. You can mark a hypothesis with a !, then invoke it using from !.
Sez you: Show(imp(and(A, B), and(B, A))) by impI {
Given(and(A, B)) !, Show(and(B, A)) from !}
Sez rod: Show(imp(and(A, B), and(B, A))) by impI {
Given(and(A, B)), Show(and(B, A)) from and(A, B) {
Given(A), Given(B), Show(and(B, A)) ? } }
And so it goes.
Let’s go wild.
declare lam((Proof)Proof) Proof
declare app(Proof, Proof) Proof
The lam constructor can bind a variable. That means subgoals will need to bind variables, too.
declare abstract Rule
rule Make(lam(x. b[x]), imp(A, B)) by abstract {
(x Proof) Have(x, A[]), Make(b[x], B[]) }
A subgoal can be prefixed with a parenthesis containing comma-separated declarations. Note that the rule has a proof expression as its first argument, with lam binding x and using it as the parameter to metavariable b. The square bracket syntax is used to indicate permitted dependency. Here, we are saying that b may depend on x (in the rule and in the subgoal), but that A and B may not. If you omit the square brackets, then by default, you may depend on anything in scope, so the A[] and B[] are necessary. We deduce the kind of a metavariable by abstracting the kind being checked at its use site over its permitted dependencies. Here, we obtain b(Proof)Proof.
A pattern usage of a metavariable is some m[x0,..xk] which does not occur inside a metavariable expression usage. A metavariable expression usage is a regular application of a metavariable to terms m(s0,..sk).
Solving m[x0,..xk] = t for m amounts to checking whether t can be persuaded to depend only on x0,..xk: the only tolerable occurrence of some other variable, y, is in the dependencies of another metavariable pattern in t, i.e., if t contains some n[..y], then we can ensure m’s independence of y by forcing n to become independent of y, also. That’s the move Dale Miller calls “pruning”.
Solving m(s0,..sk) = t for m is just too hard. Our only chance is to wait for m to get solved elsewhere and substituted. That is, expressions force us to postpone constraints. It is forbidden for a metavariable to have exclusively expression usages, as there is no way ever to solve it!
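As a toy illustration of pruning, over a hypothetical first-order term representation (not rod’s actual one): to solve m[xs] = t we walk t, reject rigid occurrences of forbidden variables, and shrink the permitted dependencies of any metavariable that mentions them.

```haskell
-- Toy terms: variables, constants applied to arguments, and metavariable
-- pattern occurrences M name xs carrying their permitted variables.
data Tm = V String | C String [Tm] | M String [String]
  deriving (Eq, Show)

-- prune ok t: force t to depend only on the variables in ok.
-- A rigid occurrence of a forbidden variable is fatal (Nothing);
-- a forbidden variable among a metavariable's permitted dependencies
-- is simply pruned away, following Miller.
prune :: [String] -> Tm -> Maybe Tm
prune ok (V x) | x `elem` ok = Just (V x)
               | otherwise   = Nothing
prune ok (C c ts) = C c <$> traverse (prune ok) ts
prune ok (M m xs) = Just (M m (filter (`elem` ok) xs))
```

So solving m[x] = f(x, n[x,y]) succeeds by pruning n down to n[x], while m[x] = y fails outright.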
We can now have bidirectional information flow in rules! In the next rule, A and B flow down, but we should expect f and a to flow back up.
declare apply(Prop) Rule
rule Make(app(f, a), B) by apply(A) {
Make(f, imp(A, B)) ;
Make(a, A) }
declare use(Proof) Rule
rule Make(x, P) by use(x) when Have(x, P)
Or if you like normal forms, drop apply and use
rule Make(p, P) from Have(f, imp(A, B)) {
Make(a, A) ;
Have(app(f, a), B), Make(p, P) }
Sez teacher: solve (A Prop, B Prop, C Prop) Make(p, imp(imp(A, imp(B, C)), imp(imp(A, B), imp(A, C))))
Sez rod: Make(p, imp(imp(A, imp(B, C)), imp(imp(A, B), imp(A, C)))) ?
Sez you: Make(p, imp(imp(A, imp(B, C)), imp(imp(A, B), imp(A, C))))
by abstract !
Sez rod: Make(lam(x. q[x]),
imp(imp(A, imp(B, C)), imp(imp(A, B), imp(A, C))))
by abstract {
(x Proof)
Have(x, imp(A, imp(B, C))),
Make(q[x], imp(imp(A, B), imp(A, C))) ? }
I’ve got a rather illiberal C-style syntax, which I’d like to sweeten. I’ve cooked up a formalism for grammars-with-scoping which might fit here. More on that another time.
I’ve built the illiberal parser, checkers for declarations and rules, and the unification algorithm that the proof-checker will use. Let’s see how I get on with the rest. I’m still unsure whether hooliganistic unification is enough, or whether I will need to be more prescriptive about expected information flow in rules, goals and hypotheses. Perhaps a more careful approach to mode would make pattern matching enough.
I’m also a fan of small-step computation, because I work with dependent types, and I don’t like presuming awfully strong properties of computation while I’m still in the process of writing the rules down.
I separate my term languages into two syntactic categories.
checked s,t,S,T ::= c | [e]
synthed e,f ::= x | t : T | e d
The c stands for constructor and the d for destructor: more on them shortly.
Correspondingly, I have two typing judgments.
Γ ⊢ T ∋ t
Γ ⊢ e ∈ S
Contexts, Γ, assign types to variables.
context Γ,Δ ::= ε | Γ, x : S
If you bother me about variable freshness conditions, I shall respond by switching to a nameless notation: I’m perfectly comfortable living there, but I know I should use names when communicating with people who aren’t used to it.
There’s a small-step computation relation for each syntactic category, overloaded. Computation never changes syntactic category. It’s a little bit funny to include type ascriptions. Those t : T terms I refer to as radicals, because they’re the active things in a computation.
We have υ-contraction, which notes that a radical no longer capable of computation needs no type. That’s how things stop.
[t : T] ↝_{υ} t
We have another class of β-contractions which explain how things go.
(c : C) d ↝_{β} e
That’s to say computation is always a reaction between a constructor and a destructor at a given type. The ↝ relation is the closure of the contractions under all contexts.
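For concreteness, here is a small Haskell sketch of the two syntactic categories and their contractions, using de Bruijn indices and assuming, for simplicity, that substituted terms are closed (so no index shifting is needed); the data type names are mine, and υ is applied unconditionally rather than only to radicals that can no longer compute.

```haskell
-- Checked and synthesizable terms, de Bruijn style:
-- Em is the embedding [e]; Ann is the radical t : T.
data Chk = Star | Pi Chk Chk | Lam Chk | Em Syn deriving (Eq, Show)
data Syn = Var Int | Ann Chk Chk | App Syn Chk  deriving (Eq, Show)

-- Substitute e for variable n (assumes e is closed, so no shifting).
substC :: Int -> Syn -> Chk -> Chk
substC _ _ Star      = Star
substC n e (Pi s t)  = Pi (substC n e s) (substC (n+1) e t)
substC n e (Lam t)   = Lam (substC (n+1) e t)
substC n e (Em s)    = Em (substS n e s)

substS :: Int -> Syn -> Syn -> Syn
substS n e (Var m) | m == n    = e
                   | otherwise = Var m
substS n e (Ann t ty) = Ann (substC n e t) (substC n e ty)
substS n e (App f a)  = App (substS n e f) (substC n e a)

-- β-contraction at the root: constructor meets destructor at a type.
step :: Syn -> Maybe Syn
step (App (Ann (Lam t) (Pi s tT)) a) =
  Just (Ann (substC 0 r t) (substC 0 r tT)) where r = Ann a s
step _ = Nothing

-- υ-contraction: a spent radical needs no type (simplified here).
upsilon :: Chk -> Maybe Chk
upsilon (Em (Ann t _)) = Just t
upsilon _ = Nothing
```

The full relation ↝ is then the closure of these root contractions under all term contexts.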
Discipline 1. Do not refer to the context explicitly in a typing rule. Premises may give a local context extension.
The assertion
⊢ x : S
means that the (implicit) context ascribes S to x. This is used in precisely one rule.
⊢ x : S
——————— (var)
x ∈ S
Here’s another rule I enjoy, without even adding in any stuff to the type theory.
e ∈ S    S = T
———————— (embed)
T ∋ [e]
We can and will worry about what = means, but it’s ok to think of it as αequivalence, which those of us in the de Bruijnies just call syntactic identity.
I’m going to insist on the existence of one constructor, *, whose job is to classify anything (other than itself, possibly) which can be ascribed as a type.
With that, I add
* ∋ T    T ∋ t
———————— (ascribe)
t : T ∈ T
There are but two other rules that are common to all my bidirectional systems, allowing types to compute before checking, or after synthesis.
T ↝ T’    T’ ∋ t
————————– (pre)
T ∋ t
e ∈ S    S ↝ S’
————————– (post)
e ∈ S’
I get ahead of myself. I should talk a little about how the syntax of these calculi works, and more pressingly, how the metasyntax of the formulae which show up in typing rules works.
Our calculus has syntactic sorts
sort i ::= chk | syn | dst
and from syntactic sorts we generate the metasyntactic kinds, which are higherorder and uncurried, indicating the syntactic binding structure of things.
kind k ::= (k, .. k)i
where we drop empty ().
A signature assigns kinds to constants. A scope assigns kinds to variables and a problem assigns kinds to metavariables. I indicate kinding by juxtaposition. E.g.,
Π(chk, (syn)chk)chk λ((syn)chk)chk
We may then construct expressions from constants, variables and metavariables: each must be fully applied, in accordance with their kinds, to spines which, in each position, abstract according to the required kind before providing a subexpression. That is, our metalanguage keeps everything η-long with respect to kinds.
Π(S, x. T(x)) λ(x. t(x))
Note that in the above, the problem is
S()chk T(syn)chk t(syn)chk
Expressions admit simultaneous hereditary substitution (scope morphisms) for variables and thus substitutions (problem morphisms) for metavariables.
But there’s more, or rather, less. A pattern allows constants and variables to take spines, but the metavariables are instead given only the subset of the variables M[x,..] upon which they may depend. We may write patterns
Π(S, x. T[x]) λ(x. t[x])
If you think of type ascription as a reserved infix constructor whose kind is (chk, chk)syn, then you will see that a β-redex is a pattern
(λ(x. t[x]) : Π(S, x. T[x])) s
but that its reduct is an expression, not a pattern.
t(s : S) : T(s : S)
Which brings me to
Discipline 2. Judgments and relations are moded.
There are three modes: input, subject and output.
We have
input ⊢ input ∋ subject
input ⊢ subject ∈ output
input ⊢ x : output
input ↝ output
Discipline 3. Rule conclusions have patterns for inputs and subjects, but expressions for outputs. Rule premises have expressions for inputs and subjects, but patterns for outputs. Metavariable scope rotates clockwise, with patterns binding metavariables and expressions merely using them.
Observe that our β-rule exactly follows this discipline, as do the typing rules, so far.
One notationally helpful consequence of this discipline is that a metavariable always has one binding site t[x] in a pattern which identifies its dependencies, so at its use sites in expressions, we may write merely t(e), rather than t(e/x), because the binding site makes clear what is to be substituted. I apologize to anonymous reviewers I have confused in the past by following this convention without elucidating it.
Note that this discipline is not enough to make typing rules algorithmic in that inputs need not specify outputs. Indeed, one can imagine a judgment form which makes up an output from thin air, whose use in a premise allows us to bring a new metavariable into the problem. However, following this discipline takes us a great deal closer to an algorithm.
Pattern matching is the business of solving an equation between a pattern and an expression, yielding an instantiation of the metavariables in the pattern. It is stable under substitution in the sense that if a given expression matches a pattern, any substitution instance of the expression matches the pattern too, with the correspondingly substituted instantiation.
Let me double down on discipline 1.
Discipline 4. With the exception of the variable rule, it is forbidden to mention variables in the context, apart from those bound in the context extension of premises.
The consequence of these disciplines is that we cannot help but achieve stability under substitution. Our rules, by discipline, characterize only those properties of terms which are invariant under substitution.
Lemma 5. Substitution is admissible for all judgments and relations.
x : S ⊢ J[x]    e ∈ S
—————————————————————
J(e)
The variable rule is constructed exactly to ensure that we know how to substitute for its deployment in derivations (which are, by the by, also stable under thinning, by construction — Lemma 5ε).
There is a helpful refinement of discipline 4 which I took too long to invent, partly because a great many rule systems out there in the wild do not respect it, resulting in the need for CLEVERNESS in situations where STUPIDITY is perfectly effective.
Discipline 6. Each premise subject must be a distinct conclusion subject metavariable applied to distinct variables bound in that premise’s context extension. A conclusion subject metavariable becomes available for use in other expressions only after it has been used in such a subject. Every conclusion subject metavariable must be such a premise subject.
That is to say, a typing rule takes responsibility for validating its subject by determining how its parts must be validated.
But, moreover, a rule must never revalidate anything. Our moded discipline allows us to work contractually.
A rule is a server for its conclusion and a client for its premises. It is for clients to make promises about inputs and servers to make promises about outputs. The purpose of a typing derivation is to justify the promise about its subject asserted by its conclusion.
The syntactic constructors/destructors are constants with kinds of the form
((syn, .. syn)chk, .. (syn, .. syn)chk)chk
or, respectively,
((syn, .. syn)chk, .. (syn, .. syn)chk)dst
The constructors and destructors of our object language are fully applied uses of syntactic constructors and destructors, respectively. The application destructor has an empty name and kind (chk)dst, along with some liberal conventions about when parentheses are strictly necessary.
The checking rules for constructors, c or C, are subject to the following discipline.
Discipline 7. The conclusion of the typing rule for a constructor must have an outermost constructor in the pattern for its type and no use of embedding.
Our example rules all follow this discipline.
——– (type)
* ∋ *
* ∋ S    x : S ⊢ * ∋ T(x)
————————————————————————— (Π)
* ∋ Π(S, x. T[x])
* ∋ S    x : S ⊢ T(x) ∋ t(x)
———————————————————————————— (λ)
Π(S, x. T[x]) ∋ λ(x. t[x])
Meanwhile, the elimination rules follow a corresponding discipline.
Discipline 8. The conclusion of the typing rule for eliminating with a destructor must have a metavariable in the subject pattern for the synthed thing to be eliminated; its first premise must demand an outermost constructor in the pattern for the type synthesized for that metavariable and no use of embedding.
The application rule follows discipline 8.
f ∈ Π(S, x. T[x])    S ∋ s
——————————————— (application)
f s ∈ T(s : S)
Note that the kinds of the constants allowed in constructors and destructors ensure that all of their conclusion metavariables are checked.
The typing rules thus insist on constructor patterns for the types of things being constructed or destructed.
A β-redex is what you get when the constructor patterns required for the active types in a constructor checking rule and a destructor synthesis rule unify.
I forgot to mention, unification is the business of solving an equation between two patterns. Fortunately (thank you, Dale Miller; thank you, James McKinna, for telling me to read Dale Miller), this is a decidable problem which admits most general solutions. Note that the most general solution to m[x] = c(n[x,y]) is n[x,y] = n'[x], m[x] = c(n'[x]), i.e., in the presence of binding, solving a metavariable may require pruning permitted dependencies of other metavariables, but apart from that, everything is as in good old Robinson ’75.
If we follow
Discipline 9. Distinct constructor/destructor rules for a given type constructor have distinct constructor/destructor constants. There is exactly one β-rule for each unifying pair.
we obtain confluence by return of post.
Lemma 10. βυ-reduction is confluent by construction.
No redex is a redex for more than one contraction scheme. Every subexpression of a redex for pattern p with matching substitution σ which happens also to be a redex is a residual, i.e., deep enough within the outer redex to be left intact within σ. Takahashi’s proof goes through on the nod. That is to say, we may construct the notion of parallel reduction > which is closed under all syntactic constructs including nullary ones, but allows each redex to reduce to its contractum after the parallel reduction of its schematic variables. In other words, do anything you can see as long as they don’t interfere.
x ⊢ t(x) > t’[x]    S > S’    x ⊢ T(x) > T’[x]    s > s’
———————————————————————————————————————————————————————— (β)
(λ(x. t[x]) : Π(S, x. T[x])) s > t’(s’ : S’) : T’(s’ : S’)
But discipline 9 ensures that the notion of development is well defined. That’s the born to be wild operation of firing all of your guns at once. There is a function dev(t) such that
t > dev(t)    t > t’ ⇒ t’ > dev(t)
because > lets you fire all of your guns at once, but if you don’t, you can always fire just the remaining guns on your next move. So > has the diamond property.
Without difficulty, ↝ ⊆ > ⊆ ↝*, which means ↝* has the diamond property, too.
If we forgot, for a moment, the rules for pre- and post-computation, we could readily identify, by inversion, what we must know about the metavariables, all yielding sort chk, in any redex. For application, that would be
* ∋ S
x : S  * ∋ T(x)
x : S  T(x) ∋ t(x)
S ∋ s
and we can similarly figure out the type R mandated by the elim rule, which in our example is
T(s : S)
We may reduce our redex to some r : R’ such that R ↝^{*} R’ for which * ∋ R’ and R’ ∋ r are derivable from the above facts and substitution (which is known to be admissible), allowing r : R’ ∈ R’.
In our example, we choose r = t(s : S), and we can derive s : S ∈ S and hence, by stability, * ∋ T(s : S) and T(s : S) ∋ t(s : S).
Define optional reduction ↝^{?} = ↝ ∪ =. By construction, if something optionally reduces, its subformulae optionally reduce also. If something that optionally reduces actually itself contracts, its subformulae stay put.
Lemma 11. We have the following.
Γ ⊢ T ∋ t ∧ Γ ↝* Γ^{*} ∧ T ↝* T^{*} ∧ t ↝^{?} t’ ⇒ Γ^{*} ⊢ T^{*} ∋ t’
Γ ⊢ e ∈ S ∧ Γ ↝* Γ^{*} ∧ e ↝^{?} e’ ⇒ ∃S^{*}. S ↝* S^{*} ∧ Γ^{*} ⊢ e’ ∈ S^{*}
That is, if the client computes inputs as much as they like and subjects by at most one step, the server can compute outputs enough to recover the judgment.
Step 1. Transform the rules to syntax-directed form by adding arbitrary pre-computation as the first premise of every checking rule and arbitrary post-computation as the last premise of every synthesis rule.
Step 2. Induction on derivations. Discipline 6 ensures that the induction hypotheses are strong enough to cover all the structural rules.
Step 3. For each contraction scheme, we deploy the induction hypotheses, then patch the derivation justifying that contraction by appeal to confluence. Crucially, matching constructor patterns is preserved by reduction: disciplines 7 and 8 ensure that computation does not destroy the applicability of rules.
All my type systems enjoy confluence and subject reduction. It is not a thing to prove. It is a thing to not screw up, by knowing how to write down rules which follow disciplines ensuring that they’re not just any old rubbish. Andy Pitts once said “Type soundness proofs are two a penny.”, but I think they’re cheaper than that.
Add dependent pairs, following the discipline.
I expect I’ll have about 100 first-year students taking the class I pretend is an introduction to hardware but which is actually an introduction to functional programming and semantics. There’s a lab session each week in which I experiment on them, and processing the data resulting from those experiments is a major headache. Feedback needs to be rapid if it’s to inform remedial action.
My workflow is pretty poor. So far, the test questions have been done on paper, even though each student is at a computer. My comrades and I mark those pieces of paper, but then it’s not over, because I have to type in all the scores. It takes too long. It often goes wrong.
One method I adopted last year was to identify solution traits. Few mistakes are peculiar to one individual. Many good or bad properties of solutions, or traits, as I call them, are shared between many submissions. I learned to save time by giving traits a coding system. Markers just write trait codes on the scripts; the meaning of those codes is given once, on the web page associated with the assessment item. Markers also delivered a score for each part of the exercise and a total score. We tried to map traits to scores consistently, but that’s not always easy to do in one pass. Backtracking was sometimes required. If we had multiple markers, we’d share an office, and the shared trait description list would grow on the whiteboard as new things came up. I discovered that I could be bothered to type in both the score and the trait list for each student, but it was quite a bit of work.
The idea of solution traits as salient aspects of student submissions struck me as something from which we could extract more work. I ought to be mining trait data. Maybe later.
Markers should not score solutions directly. If we can be bothered to classify traits, we should merely identify the subset of recognized traits exhibited by each question part. Separately, we should give the algorithm for computing a score from that trait subset. That way, we apply a consistent scoring system, and we can tune it and see what happens. Here’s the plan. Each trait is a propositional variable. A scoring scheme is given by a maximum score and a list of propositional formulae each paired with a tariff (which may be positive or negative). The score is the total of the tariffs associated with formulae satisfied by the trait set, capped below by 0 and above by the indicated maximum. We should have some way to see which solutions gain or lose as we fiddle with the scheme.
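The scoring algorithm described above is tiny. A sketch in Haskell (the Form type and the example trait names are mine): each formula satisfied by the trait set contributes its tariff, and the total is capped below by 0 and above by the maximum.

```haskell
-- Propositional formulae over trait variables.
data Form t = TVar t | Not (Form t) | And (Form t) (Form t) | Or (Form t) (Form t)

-- Does the trait set satisfy the formula?
holds :: Eq t => [t] -> Form t -> Bool
holds ts (TVar t)  = t `elem` ts
holds ts (Not f)   = not (holds ts f)
holds ts (And f g) = holds ts f && holds ts g
holds ts (Or f g)  = holds ts f || holds ts g

-- A scoring scheme is a maximum plus (formula, tariff) pairs; tariffs may
-- be negative. The score is the total of satisfied tariffs, clamped.
score :: Eq t => Int -> [(Form t, Int)] -> [t] -> Int
score maxS scheme ts =
  max 0 (min maxS (sum [tariff | (f, tariff) <- scheme, holds ts f]))
```

Tuning the scheme is then just re-running score over the stored trait sets and diffing the results.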
Each marking job is presented to the marker as a web page showing the question, the candidate solution, the specimen solution, and the list of checkboxes corresponding to the traits identified so far. It may be that some magic pixie helpers (be they electronic or undergraduate) have precooked the initial trait assignment to something plausible. That is, online marking jobs have the advantage that we can throw compute power at them whenever solutions can be algorithmically assessed, but we don’t have to construct exercises entirely on that basis. If all is well, the marker need merely make any necessary adjustments to the trait assignment and ship it. Problems may include the need to add traits, so that should be an option, or to request a discussion with the question setter and flag the job for revisiting after that discussion. Concurrent trait addition may result in the need to merge traits: i.e., to define one trait as a propositional formula in terms of the others, with conflicting or cyclic definitions demanding discussion. Oh transactions.
How do I get these marking jobs online in the first place? Well, the fact that the students do the problems in labs where each has a computer is some help. Each exercise has a web page, and whenever it makes sense to request a solution which fits easily into a box in an HTML form, we can do that, whether it’s to be marked by machines or by people. But there may be components of solutions which are not so easily mediated: diagrams, mainly. I have previous at forcing students to type ASCII art diagrams in parsable formats, much to their irritation, but I would never dream of making such a requirement under time pressure. I need a way to get 100 students to draw part of an assignment on paper, then make that drawing appear as part of the online marking job with the minimum of fuss.
I prepare the paper on which they will draw. It has a printed banner across the top, which consists of three lines of text, each with one long word and sixteen three-letter words, lined up in seventeen columns of fixed-pitch text. The three long words in the left column vary with and identify the task. The 48 three-letter words are fixed for the whole year. Each student has an identity code given by one three-letter word from each line, and the web page for the exercise reminds them what this code is. Each position in each line stands for a distinct number in 0..15, and the sum of the three positions is 0 (mod 16), so a clear indication of the chosen word in just two of the lines is sufficient to identify the student. I can print out a master copy of the page, then photocopy it 100 times and hand the pages out. The student individuates their copy by obliterating their assigned words, e.g., by crossing them out (more than once, please).
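The arithmetic of the identity code is a one-liner: since the three positions sum to 0 (mod 16), any two legible lines determine the third. A hypothetical helper:

```haskell
-- Recover the third banner position from the other two, using the
-- invariant that the three positions sum to 0 (mod 16).
thirdPos :: Int -> Int -> Int
thirdPos p q = (16 - (p + q)) `mod` 16
```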
I collect in the pages at the end and I take them to the photocopy room, where the big fast photocopier can deploy its hidden talent: scanning. I get one enormous pdf of the lot. Unpacking the embedded images with pdfimages, I get a bitmap for each page. Using ImageMagick, I turn each bitmap into both a jpeg (for stuffing into the marking job) and a tiff (cropped to the banner) which I then shove through tesseract, a rather good and totally free piece of optical character recognition software, well trained at detecting printed words in tiffs. The long words present in the scanned text tell me which exercise the jpeg belongs to; the short words missing from the scanned text tell me (with high probability) whose solution it is. Solutions not individuated by this machinery are queued for humans to assess, but experiments so far leave me hopeful. The workflow should be: stack paper in the document feeder, select scan-to-email, select the relevant daemon’s email address, press go. And we’re ready to bash through the marking jobs online.
Once we markers have done all we can and it’s time to give the students their feedback, we push the release button which bangs the associated dinner gong. Online, the students are faced with a marking job, showing the question, their solution, the specimen solution, and an empty checkbox trait list, all in a form with a submit button. We oblige them to mark their own work in order to be told their given score. When they hit the submit button, they get to see their score, and on which traits their marker’s opinion diverges from their own. If they are at least two-thirds in agreement, a small donation is made to their reputation score (as distinct from their score in the topic of the assessment).
Tutorial homeworks can and should also give rise to online marking jobs in just the same way. Part of the exercise can be a web form and the rest done by uploading an image. There are scanners in the lab, but we can also arrange to allow submission of images by email from a phone or a tablet. Once a student has submitted a solution, marking jobs on the exercise become available to them, starting with their own, but also to other people. After the tutorial, each student should certainly confirm their self-assessment, but preferably also revisit any other marking they did prior to the tutorial. Reflection is reputation-enhancing. Peer-to-peer marking is reputation-enhancing. Note that tutors will have the ability to eyeball homework submissions (if only to detect their absence) but are not paid to spend time marking them.
Expert students who have nothing to gain by taking a test (because they already have full marks in the test’s topic) may, if they have free time at the right time, collect reputation by marking their colleagues’ test submissions. The results they deliver must be moderated, but they do at least help to precook marking jobs for the official markers. Of course, reasonable accuracy is necessary for payment.
An exercise (or parts thereof) should be associated with a topic and constitute evidence towards mastery of said topic. If some parts of an exercise conform to a standard scheme, that’s also useful information. Some traits will relate to the topic (e.g., typical misunderstandings) and some to the scheme, so we gain a good contribution to the trait set for a brand new question just by bothering to make those links. We might also seek to identify regions in the teaching materials for the relevant topic which either reinforce positive traits or help to counteract negative ones: crowdsourcing such associations might indeed be useful.
My main motivation is to try to improve the efficiency of the marking process in order to give feedback more rapidly with less effort but undiminished quality. Thinking about technological approaches to the management of marking is something I enjoy a great deal more than marking itself, so I often play the game despite the little that’s left of the candle. I should watch that. But the shift to marking as an online activity also opens up all sorts of other possibilities to generate useful data and involve students constructively in the process. It’s a bit of an adventure.
In a typical university curriculum, we arrange the classes (or modules or courses or whatever you call them) in a dependency graph: to do this class, you should already have done that class, and knowledge from that class will be assumed, etc. I find it useful to push the same sort of structure down a level, the better to organise the broad topics within a class, and even sometimes to structure the process of learning a topic. “Where am I in this picture?” is the question the students should ask themselves: we should make it easy for them to find the answer.
Suppose the hierarchical structure is given like a file system where each node is a directory. In a real file system, we might indeed represent a node as a directory, but not everything which is a directory will necessarily correspond to a node: each node will need a file which maps its internal curriculum, indicating which subdirectories are subnodes and, for each subnode, which other subnodes are immediate prerequisites. What do I mean by “A is a prerequisite of B”? I mean “Mastery of A is necessary for study of B to be sensible.” There should be no cycles in that graph: mutual relevance does not imply any prerequisite status.
Every such hierarchy can be flattened by giving each node an entry point (a prerequisite of all subnodes) and an exit point (whose prerequisites are the entry point and all subnodes), then linking a node’s entry point from the exit points of its prerequisites. It may also be helpful (but it’s certainly not vital) to indicate for each subnode its external prerequisites, being those nodes elsewhere (neither a sibling nor an ancestor) on which it depends. These should be consistent with the internal curricula, in the sense that the flattening must remain acyclic. Note that we can have D/A/X > D/B/Y and D/B/W > D/A/Z,
but if so neither A nor B may be considered a prerequisite of the other.
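The flattening just described can be sketched directly (hypothetical types and names; it assumes a subnode’s prerequisites are given by sibling node names):

```haskell
-- Hypothetical sketch of flattening a curriculum hierarchy: each node N
-- gets an entry point "N.in" and an exit point "N.out"; an edge (a, b)
-- reads "a must be mastered before b".
data Node = Node
  { name     :: String
  , prereqs  :: [String]  -- names of sibling subnodes that must come first
  , subnodes :: [Node]
  }

flatten :: Node -> [(String, String)]
flatten (Node n _ subs) =
  (inP n, outP n)                             -- entry precedes exit
  : concat [ (inP n, inP (name s))            -- entry precedes each subnode
           : (outP (name s), outP n)          -- each subnode precedes exit
           : [ (outP p, inP (name s)) | p <- prereqs s ]  -- sibling prereqs
           ++ flatten s
           | s <- subs ]
  where
    inP  m = m ++ ".in"
    outP m = m ++ ".out"
```

The acyclicity requirement then becomes the requirement that the resulting edge list, with any external prerequisite edges added, contains no directed cycle.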
Colour schemes: if you have mastered a node, it’s safety-blue; if you have mastered the prerequisites for a node but not the node itself, it’s activity-green; if you haven’t yet mastered the prerequisites for a node, it’s danger-red. Red/green distinctions are not ideal for colour-blind people, so we should write the captions on green nodes in a more emphatic font: they constitute the frontier where progress can be made.
If we plot the curriculum graph, to whatever level of detail we can locally muster, we give some structure to the businesses of teaching, learning and assessment. Learning is what the students do: we can have no direct effect on learning, except perhaps surgery, a fact which sometimes goes astray in discussions of ‘learning enhancement’. Teachers can have an impact on the environment in which learners learn, but ultimately we’re passive in their learning process, and they’ll take whatever they take from the experiences we give them. Learners are going to make a journey through the curriculum. Teaching is how we try to propel them on that journey. Assessment is how we and they determine where they have reached on that journey. If we associate teaching and assessment activities with nodes in the curriculum graph, we make clearer their specific utility in the learning process.
Crucially, by identifying the prerequisition structure, we focus attention in from the whole picture to the student’s frontier of green nodes where they can sensibly study but have not yet achieved mastery. Now, by aligning teaching materials to nodes, the students can identify those which are immediately relevant to their progress, and by aligning assessment to nodes, the students can tell whether progress is happening. My purpose is simply to make it as clear as possible where each student is stuck and what they can do about it.
Teaching takes a variety of forms. We might write notes. We might give lectures. We might offer lab sessions or tutorials. We can arrange ad hoc help sessions. We should expect the delivery of lectures to be an admissible linearisation of the prerequisition structure: that is, lectures should also have a frontier. The students should be able to compare their own frontier with the lecture frontier, and with a fuzzed out version of the cohort’s frontiers. It’s a danger sign if a student’s frontier is strictly behind a lecture: they aren’t best placed to get much from it, at least not at first. If notes are available online in advance of lectures, we acquire the means to cue advance reading and to detect it. The more tightly aligned assessment is to the curriculum, the easier it is to prescribe remedial activities.
Moreover, at the beginning of a lecture, I should very much like to know how the class divides as red/green/blue for that topic. Too much red (or, less likely, overwhelming blue) and I should maybe reconsider giving that lecture. Of course, you only get useful tips like that if you’re doing enough assessment (little and often is better than in overwhelming clumps) to gauge the frontier accurately.
When a student visits a node, they should see its hierarchy of ancestors and its immediate subnode structure. The status of all of these nodes should be clear. The rest of the information before them should answer as readily as possible the question ‘What can I usefully do?’. They should see a chronological programme of activities for that node, first into the future and then into the past: lectures may have reading associated; tutorials may have homework hand-ins associated. A node may have a number of assessment items associated, some of which may be active: the list of all assessment items at or below the current node should be readily accessible. Every piece of teaching material should be accompanied by the means to record engagement and comprehension, and to leave a note for the attention of staff.
I want to stop writing for now and publish the story so far. I haven’t said much about assessment mechanisms or what mastery might consist of: these issues I have also been thinking about. But I hope I’ve at least given some reason to believe that there is something to be gained by refining the graph structure of the curriculum when it comes to modelling the progress made by learners and promoting useful activity, for us and for them.
There are n sweets in a bag.
6 of the sweets are orange.
The rest of the sweets are yellow.
Hannah takes at random a sweet from the bag.
She eats the sweet.
Hannah then takes at random another sweet from the bag.
She eats the sweet.
The probability that Hannah eats two orange sweets is 1/3.
(a) Show that n^{2} – n – 90 = 0
(b) Solve n^{2} – n – 90 = 0 to find the value of n
What makes this question so hard? Might we ask it differently? What’s going on? At undergraduate level (first years, mostly) I’ve been setting and marking exams: my lot are not much older than the pupils puzzling with Hannah. It’s good to try to think about these things. (It’s just possible my niece sat that exam, which is an extra cue to be less reflexive and more reflective.)
Correspondingly, I have a variety of crackpot theories, but before I waste your time with them, let me trouble you with some more respectable theory from David Perkins, specifically his paper Beyond Understanding.
It’s not just a matter of what you know. What does what you know enable you to do? Perkins characterises an escalating scale of what I might call knowledge weaponisation (which reminds me of the old joke about solving, stating or colouring in the Navier–Stokes equations):
The “Hannah’s sweets” puzzle clearly demands more than possessive knowledge of basic probability, algebraic manipulations of fractions, and solving quadratics. Given part (a), part (b) should be a routine problem. It’s part (a) that sticks out as a bit peculiar, pulling an equation like a rabbit from a hat then asking the students to find a hatful of rabbit shit. It doesn’t demand anything they don’t know, but it isn’t a routine calculation.
If I take my specs off, metaphorically I mean, the detail becomes indistinct but the general story arc remains perceptible. The question goes like this
That’s like lots of questions. Once upon a time, there might have been a question like this.
Archie owns six black socks, some number of pink socks and no other socks. He is too lazy to pair them up before he puts them in his sock drawer all in a jumble. He always gets dressed in the dark, picking two socks at random. The day after laundry day, when all the socks are available, he typically wears a pair of black socks one third of the time. How many pink socks does Archie have?
My question, with its stereotyped male protagonist and bias against pink, probably needs a bit of rethinking before we can issue it to the youth of today. But it starts by introducing a mystery quantity, gives us some chat which constrains that quantity, then finishes by asking us to find that quantity. That is, it’s coded as an instance of that standard problem type. Moreover, it doesn’t name a variable, let alone confront us with a quadratic formula, and it doesn’t specifically invoke the concept of probability. Socks motivate the interest in “two the same” better than sweets, but they might predispose us to guess that they are found in even numbers. The latter is a good red herring, and besides, Archie has probably lost the odd sock here and there. But I digress. The issue I’d like to open is the relative difficulty, from a human perspective, of “Hannah’s sweets” and “Archie’s socks”.
You see, my crackpot theory is that “Hannah’s sweets” is knocked off an older question uncannily like “Archie’s socks”, but when revising it, the examiners named the number of sweets and added the intermediate goal to establish the quadratic constraint in an attempt to make the problem require less initiative. If so, I think they also made it more intimidating. However, they also made the last phase of the problem, part (b), completely routine, in the hope that people without the initiative for part (a) would still collect some marks. Whether students notice that they can do part (b) without a clue for part (a) is another matter. Many students stop doing a question at the first part which causes trouble and read only on demand, which means that in “problem” questions, they miss the clues in the later question parts for what might constitute useful information in the prose.
The media have, by and large, not printed part (b) of the question. They make it look like the question is “there are some sweets; prove an equation”, rather than “there are some sweets; find out how many”. See, for example, Alex Bellos’s piece in the Grauniad. Why is this? Crackpot theory time again.
Especially in its curtailed form, “Hannah’s sweets” looks superficially weirder than “Archie’s socks” because some algebra terrorism has been added and the purpose has been taken away.
What both questions have in common is that they are in code. They must be decoded before the algebra can begin. “Hannah’s sweets” uses weaker encryption than “Archie’s socks”. I’m afraid that’s because I wrote “Archie’s socks”: here are some more excerpts from my archive. I quote them selectively, not to give you whole questions, just a sense of my style of distraction.
Disco Mary is rummaging through a collection of old electronic spare parts. She finds a multicolour lamp which has three Boolean inputs, labelled red, green and blue. … The green input signal is connected to the output of a T flip-flop. The blue input signal is connected to the output of another T flip-flop. … Mary’s favourite song is “What A Blue Sunset” by Ray Dayglo and the Thunderclaps, so she decides to wire the lamp and flip-flops to make a repeating sequence of colours, changing with each clock cycle: cyan, blue, yellow, red, then back to cyan and round again.
Madame Arlene teaches the Viennese Waltz. When she is teaching beginners, she finds that she has to shout “1, 2, 3, 1, 2, 3, …” repeatedly for ever, to keep her pupils in time. When she teaches advanced classes, she doesn’t need to shout, because they listen to the music. … Madame Arlene builds a shouting machine and wires it into her music player: it gets a 2-bit unsigned binary number as its input and a clock signal generated by the player in time with the music. At each tick, the shouting machine checks its input: if it gets 0, it shouts nothing; otherwise it shouts the number it gets. … The challenge is to generate the input signal for the shouting machine, so that both counting and silent behaviours can happen.
A control panel has four switches on it, named S, T, U and V, respectively. Each switch sends a 0 signal when its handle points downward and a 1 signal when its handle points upward. This is the current setting: [S down, T up, U down, V up] The control panel is wired both to the lock of a safe and to a burglar alarm. The setting on the control panel represents a number in 4-bit two’s complement binary notation. … To open the safe, you need to construct a circuit which connects the switches S, T, U, V to the release control R which will set R = 1 if and only if the correct combination is entered. An informant has discovered that the correct combination is 6. … [alarm circuit diagram] … You can flick only one switch at a time. You need to flick switches in sequence to change from the current setting to the setting with the correct combination. You must not set off the alarm.
Professor Garble is a researcher in multicore programming techniques, attempting to explain a recent trend in processor performance. ‘Moore’s Law is finished! That’s why processor clock speed has levelled off. And that’s why processors have exponentially increasing numbers of cores.’ Tick the box or boxes for whichever of the following is true. … [] He is correct in none of the above ways.
That’s just the sort of thing that occurs to me in the bath.
All of these questions require you to extract a model of what is going on from some chat. That’s the skill I am trying to test. I make the chat blatantly spurious partly to be clear that it is a “decode the puzzle” question, but mostly because I am habitually facetious. It occurs to me that maybe exams should be no place for facetiousness from them or from me: why should I have a laugh when they’re not having quite so much fun? This question style is at least routinely visited upon them in the course of the class: if you have paid attention to past papers, it is exactly what you expect. Still, I can see how it could be exclusionary, like an in-joke that you detect but don’t get. I think perhaps that I should do less of that stuff in exams and more in class, where there’s less pressure and we can afford a bit of a laugh while learning to decode problems.
But I really do digress. The point about encoded problem questions is that you need to recognize when you are being told Something Important, and what that something is. In that respect, it’s a lot like doing a cryptic crossword: crossword clues use a stylised language that takes time and practice to acquire. I was taught to do crosswords by my father’s colleagues, who always appointed me the writer-inner for the lunchtime crossword and were happy to indulge my queries: what signalled an anagram, an inclusion, a pun, and so on. In the same way, I instinctively decode the declaration “The probability that Hannah eats two orange sweets is 1/3.” as the instruction “Write a formula for the probability that Hannah eats two orange sweets and set it equal to 1/3.”. It’s familiarity with this sort of code which pushes puzzles like “Hannah’s sweets” back down Perkins’s scale of Ps. And that’s teachable. When I see part (a), I’m a bit spooked, and I think “Why are they asking me to deduce this equation? I already have an equation? Why are they not just asking me what n is? Ah, that’s part (b). Oh well, I expect I should be able to deduce the part (a) equation from mine by doing a wee bit of algebra.”. I’m already on course, and part (a) threatens to throw me off it: that’s why I think “Archie’s socks” is easier. I’m reminded of Whitehead’s remark:
It is a profoundly erroneous truism, repeated by all copybooks and by eminent people when they are making speeches, that we should cultivate the habit of thinking of what we are doing. The precise opposite is the case. Civilization advances by extending the number of important operations which we can perform without thinking about them. Operations of thought are like cavalry charges in a battle — they are strictly limited in number, they require fresh horses, and must only be made at decisive moments.
I agree, sending for the thought-cavalry is a desperate measure, but it would be a shame if an examination intended to assess knowledge and reward the performative (or even proactive) were entirely devoid of decisive moments. For “Hannah’s sweets”, the thought-cavalry can be avoided if you recognize the way in which you are being instructed. Moreover, thought-cavalry tactics, when needed, are greatly assisted by the key extra exam-puzzle knowledge that the question contains a sufficiency of clues: we don’t get that luxury in real problem-solving. I often tell students that my role is to be simultaneously Blofeld and Q: the fact that they are in a James Bond movie means that there is necessarily a strategy to escape Blofeld’s menaces with Q’s gadgets. I do not expect them to die. In fact, I’m trying to arrange their survival. In that sense, “Hannah’s sweets” already comes with the expectation that whatever information is packed in the opening prose must be sufficient to ensure the equation demanded of us: the game is to unpack it.
We could present “Hannah’s sweets” in a more decoded form. Here’s a kind of stream-of-consciousness translation.
Hannah has a number of sweets. It doesn’t matter that they are sweets or that Hannah is called Hannah. What matters is that there are things and we are going to find out how many: call that n. 6 of the sweets are orange and the rest are yellow. There are two different sorts of thing: “orange” and “yellow” are arbitrary labels whose only role is to be distinct. You are told that there are 6 orange sweets, but not how many yellow sweets (perhaps call that y, so n = 6 + y). Hannah selects two sweets at random without replacement. It doesn’t matter whether she eats them or throws them at pigeons. It does matter that the two selections are random, and that the second selection is made from one fewer than the first. Their randomness tells you that you can base probability on proportion and that selections are independent, so you can compute the probability of a particular pair selection by multiplying the probabilities of the separate selections. You are told the probability of a particular outcome: she selects two orange sweets with probability 1/3. The probability of getting two oranges clearly depends on n: write down a formula for that probability and set it equal to 1/3. Rearrange that equation (by clearing fractions) to obtain the quadratic equation n^{2} – n – 90 = 0, then factorize the equation to obtain two candidate solutions, only one of which makes sense.
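The decoded arithmetic is easy to machine-check with exact rationals (a quick sketch; pTwoOrange is my name, not anything from the exam board):

```haskell
import Data.Ratio ((%))

-- With n sweets, 6 of them orange, drawn at random without replacement,
-- the probability of two oranges is (6/n) * (5/(n-1)).
pTwoOrange :: Integer -> Rational
pTwoOrange n = (6 % n) * (5 % (n - 1))

-- n = 10 satisfies both the probability statement and n^2 - n - 90 = 0;
-- the quadratic's other root, n = -9, makes no sense as a count of sweets.
```

Setting pTwoOrange n equal to 1/3 and clearing fractions gives 3 · 6 · 5 = n(n − 1), i.e., n² − n − 90 = 0, exactly part (a).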
I think it’s reasonable to teach that decoding skill and to expect school pupils to acquire it. The persistent complaint that they haven’t seen anything like it on a past paper seems wide of the mark and, more worryingly, bespeaks a perceived entitlement to be tested only on possessive knowledge. We should be clear in our rejection of that entitlement. But we should also acknowledge that the boundaries between Perkins’s kinds of knowledge are fluid, and that our responsibility as teachers is to rearrange their positions relative to the student by acting on both, in the direction marked “progress” by Whitehead.
Informally, a container is a functor with a “shapes and positions” presentation: the contained values are given as the image of a function from positions, but the type of positions depends on the choice of the shape. Finite lists of things, for example, can be seen as functions from an n-element set to things, once you’ve chosen the shape n, otherwise known as the length of the list. If a functor f is a container, then its shape set will be isomorphic to f (), or what you get when you choose boring elements that just mark their position. It’s the dependency of the position set on the shape that makes the concept a little tricky to express in Haskell, but if we turn on {-# LANGUAGE KitchenSink #-}, we can get some way, at least.
I define a datatype whose only purpose is to pack up the typelevel components of a container.
data (<) (s :: i -> *) (p :: i -> *) = Dull
where the existential i is the type-level version of shapes, implicitly chosen in each value. Now, s gives the value-level presentation of the type-level shape: it could be a singleton type, giving no more or less information than the choice of i, but it’s ok if some type-level shapes are not represented, or if shapes contain some extra information upon which positions do not depend. What’s important is that indexing enforces compatibility between the shape choice (made by the producer of the container) and the position choices (made by the consumer of the container when elements get projected) in p.
It’s a bit of a hack. I’d like to pack up those pieces as a kind of containers, but I can’t do it yet, because new kinds without promotion haven’t happened yet. I’ll have to work with types of kind * which happen to be given by <, saying what components specify a container. Let us now say which container is thus specified.
data family Con (c :: *) :: * -> *
data instance Con (s < p) x = forall i. s i :<: (p i -> x)
It’s not hard to see why these things are functors. If the container’s element-projector gives one sort of thing, you can make them another sort of thing by postcomposing a one-to-another function.
instance Functor (Con (s < p)) where
  fmap h (s :<: e) = s :<: (h . e)
Given that fmap
acts by composition, it’s easy to see that it respects identity and composition.
Pause a moment and think what Con (s < p)
is giving you. Informally, we get ∃i.(s i)*x^{(p i)}, writing the GADT’s lurking existential explicitly and writing the function type in exponential notation. Reading ∃ as summation, shapes as coefficients and positions as exponents, we see that containers are just power series, generalized to sum over some kind i
of type-level things. Polynomials are just boring power series.
Let’s just make sure of the list example. We’ll need natural numbers and their singletons to make the shapes…
data Nat = Z | S Nat

data Natty :: Nat -> * where
  Zy :: Natty Z
  Sy :: Natty n -> Natty (S n)
…and the finite set family to make the positions.
data Fin :: Nat -> * where
  Fz :: Fin (S n)
  Fs :: Fin n -> Fin (S n)
The idea is that Fin n is a type with n values. A function in Fin n -> x is like an n-tuple of x values. Think of it as a flat way of writing the exponential x^{n}. We have an empty tuple
void :: Fin Z -> x
void z = z `seq` error "so sue me!"
and a way to grow tuples
($:) :: x -> (Fin n -> x) -> Fin (S n) -> x
(x $: xs) Fz     = x
(x $: xs) (Fs n) = xs n
And now we’re ready to go with our list container.
type ListC = Natty < Fin
Let’s show how to get between these lists and traditional lists. When I’m working at the functor level, I like to be explicit about constructing natural transformations.
type f :> g = forall x. f x -> g x
Now we can define, recursively,
listIsoOut :: Con ListC :> []
listIsoOut (Zy :<: _)   = []
listIsoOut (Sy n :<: e) = e Fz : listIsoOut (n :<: (e . Fs))
If the length is zero, the list must be empty. Otherwise, separate the element in position 0 from the function which gives all the elements in positive positions. To go the other way, give a fold which makes use of our functionsastuples kit.
listIsoIn :: [] :> Con ListC
listIsoIn = foldr cons nil where
  nil              = Zy :<: void
  cons x (n :<: e) = Sy n :<: (x $: e)
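If you want to watch the round trip go, here is a compressed, self-contained restatement of the list container (specialised to dodge the data families, so it loads on its own; ListCon is my stand-in name for Con ListC):

```haskell
{-# LANGUAGE GADTs, DataKinds, KindSignatures, ExistentialQuantification #-}
-- Self-contained sketch of the list container and its round trip with [].
data Nat = Z | S Nat

data Natty :: Nat -> * where
  Zy :: Natty Z
  Sy :: Natty n -> Natty (S n)

data Fin :: Nat -> * where
  Fz :: Fin (S n)
  Fs :: Fin n -> Fin (S n)

-- a list container: a length (the shape) and an element for each position
data ListCon x = forall n. Natty n :<: (Fin n -> x)

listIsoOut :: ListCon x -> [x]
listIsoOut (Zy :<: _)   = []
listIsoOut (Sy n :<: e) = e Fz : listIsoOut (n :<: (e . Fs))

listIsoIn :: [x] -> ListCon x
listIsoIn = foldr cons (Zy :<: \ z -> z `seq` error "no positions") where
  cons x (n :<: e) = Sy n :<: \ i -> case i of
    Fz   -> x
    Fs j -> e j
```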
A polymorphic function between containers has to work with an arbitrary element type, so there’s nowhere the output container can get its elements from except the input container. What can such a function do? Firstly, it can look at the input shape in order to choose the output shape; secondly, it should say where in the input container each output position will find its element. We obtain a representation of these polymorphic functions in terms of shapes and positions, without trading in elements at all.
data family Morph (f :: *) (g :: *) :: *
data instance Morph (s < p) (s' < p') =
  Morph (forall i. s i -> Con (s' < p') (p i))
That is, each input shape maps to an output container whose elements are input positions, like a kind of plan for how to build some output given some input. To deploy such a morphism, we need only map input positions to input elements.
($<$) :: Morph (s < p) (s' < p') -> Con (s < p) :> Con (s' < p')
Morph m $<$ (s :<: e) = fmap e (m s)
The representation theorem for container morphisms asserts that the polymorphic functions between containers are given exactly by the container morphisms. That is, the above has an inverse.
morph :: (Con (s < p) :> Con (s' < p')) -> Morph (s < p) (s' < p')
morph f = Morph $ \ s -> f (s :<: id)
Note that if s :: s i, then s :<: id :: Con (s < p) (p i) is the container storing in every position exactly that position. You can check…
  morph f $<$ (s :<: e)
=   { definition }
  fmap e ((\ s -> f (s :<: id)) s)
=   { beta reduction }
  fmap e (f (s :<: id))
=   { naturality }
  f (fmap e (s :<: id))
=   { definition }
  f (s :<: (e . id))
=   { right identity }
  f (s :<: e)
…and…
  morph (Morph m $<$)
=   { definition }
  Morph $ \ s -> Morph m $<$ (s :<: id)
=   { definition }
  Morph $ \ s -> fmap id (m s)
=   { functor preserves identity }
  Morph $ \ s -> m s
=   { eta contraction }
  Morph m
…or you can deploy the Yoneda lemma.
  Con (s < p) :> Con (s' < p')
=   { type synonym }
  forall x. Con (s < p) x -> Con (s' < p') x
~=  { data definition }
  forall x. (exists i. (s i, p i -> x)) -> Con (s' < p') x
~=  { curry }
  forall x. forall i. s i -> (p i -> x) -> Con (s' < p') x
~=  { reorder arguments }
  forall i. s i -> forall x. (p i -> x) -> Con (s' < p') x
~=  { Yoneda }
  forall i. s i -> Con (s' < p') (p i)
=   { data family }
  Morph (s < p) (s' < p')
It’s a fun exercise to show that reverse
can be expressed as a Morph ListC ListC
without going via the representation theorem.
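Rather than spoil reverse, here is tail in the same style: a self-contained, list-specialised sketch of a container morphism (ListCon, ListMorph and tailPlan are my names), mapping each input shape to a plan built from input positions.

```haskell
{-# LANGUAGE GADTs, DataKinds, KindSignatures,
             ExistentialQuantification, RankNTypes #-}
-- Self-contained sketch: a morphism for the list container, specialised
-- away from the data families. tailPlan maps each input shape to an
-- output container whose elements are *input positions*.
data Nat = Z | S Nat

data Natty :: Nat -> * where
  Zy :: Natty Z
  Sy :: Natty n -> Natty (S n)

data Fin :: Nat -> * where
  Fz :: Fin (S n)
  Fs :: Fin n -> Fin (S n)

data ListCon x = forall n. Natty n :<: (Fin n -> x)

instance Functor ListCon where
  fmap h (n :<: e) = n :<: (h . e)

-- the list-specialised Morph: a plan for each input shape
newtype ListMorph = ListMorph (forall n. Natty n -> ListCon (Fin n))

-- length 0 plans an empty output; length n+1 plans an output of length n
-- whose position i fetches input position i+1
tailPlan :: ListMorph
tailPlan = ListMorph plan where
  plan :: Natty n -> ListCon (Fin n)
  plan Zy     = Zy :<: \ z -> z `seq` error "no positions"
  plan (Sy n) = n :<: Fs

-- deploying a morphism: map planned input positions to input elements
($<$) :: ListMorph -> ListCon x -> ListCon x
ListMorph m $<$ (n :<: e) = fmap e (m n)

toList :: ListCon x -> [x]
toList (Zy :<: _)   = []
toList (Sy n :<: e) = e Fz : toList (n :<: (e . Fs))

fromList :: [x] -> ListCon x
fromList = foldr cons (Zy :<: \ z -> z `seq` error "no positions") where
  cons x (n :<: e) = Sy n :<: \ i -> case i of
    Fz   -> x
    Fs j -> e j
```

Note that tailPlan never touches an element: the elements arrive only when the plan is deployed.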
We can define the kit of polynomial functor constructors as follows.
newtype I x         = I    {unI :: x}
newtype K a x       = K    {unK :: a}
newtype (:+:) f g x = Sum  {muS :: Either (f x) (g x)}
newtype (:*:) f g x = Prod {dorP :: (f x, g x)}
They are Functor-preserving in the only sensible way.
instance Functor I where
  fmap h (I x) = I (h x)
instance Functor (K a) where
  fmap h (K a) = K a
instance (Functor f, Functor g) => Functor (f :+: g) where
  fmap h = Sum . either (Left . fmap h) (Right . fmap h) . muS
instance (Functor f, Functor g) => Functor (f :*: g) where
  fmap h = Prod . (fmap h *** fmap h) . dorP
But we can also show that containers are closed under the same operations.
For the identity, there is one shape and one position, so we need the unit singleton family.
data US :: () -> * where
  VV :: US '()
type IC = US < US
Wrapping up an element in a container can happen in just one way.
iWrap :: x -> Con IC x
iWrap x = VV :<: const x
It is now easy to show that Con IC is isomorphic to I.
iIsoIn :: I :> Con IC
iIsoIn (I x) = iWrap x
iIsoOut :: Con IC :> I
iIsoOut (VV :<: e) = I (e VV)
For constant polynomials, there are no positions for elements, but there is useful information in the shape. Abbott, Altenkirch and Ghani take the shape type to be the constant and the position set to be everywhere empty. To follow suit, we’d need to use the singleton type for the constant, but that’s more Haskell work than necessary (unless you import Richard Eisenberg’s excellent library for that purpose). We can use the unit type () as the type-level shape and store the constant only at the value level.
data KS :: * -> () -> * where
  KS :: a -> KS a '()
Again, the position set must be empty.
data KP :: u -> *

kapow :: KP u -> b
kapow z = z `seq` error "so sue me!"

type KC a = KS a < KP
We can put an element of the constant type into its container.
kon :: a -> Con (KC a) x
kon a = KS a :<: kapow
We thus obtain the isomorphism.
kIsoIn :: K a :> Con (KC a)
kIsoIn (K a) = kon a
kIsoOut :: Con (KC a) :> K a
kIsoOut (KS a :<: _) = K a
For sums, you pick a branch of the sum and give a shape for that branch. The positions must then come from the same branch and fit with the shape. So we need the type-level shape information to be an Either and make value-level things consistent with the type-level choice. That’s a job for this GADT.
data Case :: (i -> *) -> (j -> *) -> Either i j -> * where
  LL :: ls i -> Case ls rs (Left i)
  RR :: rs j -> Case ls rs (Right j)
Now, the sum of containers is given by consistent choices of shape and position.
type family SumC c c' :: * where
  SumC (s < p) (s' < p') = Case s s' < Case p p'
That is, the choice of value-level shape fixes the type-level shape, and then the positions have to follow suit. If you know which choice has been made at the type level, you can project safely.
unLL :: Case s s' (Left i) -> s i
unLL (LL s) = s
unRR :: Case s s' (Right j) -> s' j
unRR (RR s') = s'
In turn, that allows us to define the injections of the sum as container morphisms.
inlC :: Morph (s < p) (SumC (s < p) (s' < p'))
inlC = Morph $ \ s -> LL s :<: unLL
inrC :: Morph (s' < p') (SumC (s < p) (s' < p'))
inrC = Morph $ \ s' -> RR s' :<: unRR
Now we’re ready to show that the container sum is isomorphic to the functorial sum of the two containers.
sumIsoIn :: (Con (s < p) :+: Con (s' < p')) :> Con (SumC (s < p) (s' < p'))
sumIsoIn = either (inlC $<$) (inrC $<$) . muS
sumIsoOut :: Con (SumC (s < p) (s' < p')) :> (Con (s < p) :+: Con (s' < p'))
sumIsoOut (LL s :<: e)  = Sum (Left (s :<: (e . LL)))
sumIsoOut (RR s' :<: e) = Sum (Right (s' :<: (e . RR)))
Now, for products of containers, you need a pair of shapes, one for each component, so the type-level shape also needs to be a pair.
data ProdS :: (i -> *) -> (j -> *) -> (i, j) -> * where
  (:&:) :: ls i -> rs j -> ProdS ls rs '(i, j)
An element position in such a container is either on the left or on the right, and then you need to know the position within that component.
data ProdP :: (i -> *) -> (j -> *) -> (i, j) -> * where
  PP :: Either (lp i) (rp j) -> ProdP lp rp '(i , j)
unPP :: ProdP lp rp '(i , j) -> Either (lp i) (rp j)
unPP (PP e) = e
The product is then given by those pieces, and the projections are container morphisms.
type family ProdC c c' :: * where
  ProdC (s <| p) (s' <| p') = ProdS s s' <| ProdP p p'
outlC :: Morph (ProdC (s <| p) (s' <| p')) (s <| p)
outlC = Morph $ \ (s :&: _) -> s :<: (PP . Left)
outrC :: Morph (ProdC (s <| p) (s' <| p')) (s' <| p')
outrC = Morph $ \ (_ :&: s') -> s' :<: (PP . Right)
Pairing is implemented by either on positions.
pairC :: Con (s <| p) x -> Con (s' <| p') x -> Con (ProdC (s <| p) (s' <| p')) x
pairC (s :<: e) (s' :<: e') = (s :&: s') :<: (either e e' . unPP)
Again, we get an isomorphism with functorial products.
prodIsoIn :: (Con (s <| p) :*: Con (s' <| p')) :-> Con (ProdC (s <| p) (s' <| p'))
prodIsoIn (Prod (c, c')) = pairC c c'
prodIsoOut :: Con (ProdC (s <| p) (s' <| p')) :-> (Con (s <| p) :*: Con (s' <| p'))
prodIsoOut c = Prod (outlC $<$ c, outrC $<$ c)
So, the polynomials are, as expected, containers.
The least fixpoint of a container is what Per Martin-Löf calls a W-type.
newtype W c = In (Con c (W c))
Lots of our favourite datatypes are W-types. E.g., unlabelled binary trees:
type Tree = W (SumC (KC ()) (ProdC IC IC))
Define the constructors like this.
leaf :: Tree
leaf = In (inlC $<$ kon ())
node :: Tree -> Tree -> Tree
node l r = In (inrC $<$ pairC (iWrap l) (iWrap r))
But there are functors which are not containers: the continuation monad is the classic example. The element type always stays right of the arrow. Some people like to classify the polarity of parameter occurrences in a type operator as “positive” or “negative”. A top-level occurrence is positive. Sum and product preserve polarity. Function types preserve polarity in the target but flip polarity in the domain. A type operator whose parameter occurs only positively will be a covariant functor; if the parameter occurs only negatively, it will be a contravariant functor. A “strictly positive” occurrence is not only positive: the (necessarily even) number of times its polarity has been flipped is zero. A type operator whose parameter occurs only strictly positively will be a container. Least fixpoints of functors have recursive “fold operators”, but least fixpoints of containers guarantee the existence of induction principles: the difference between the two matters when you’re dependently typed.
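To see the polarity bookkeeping in action, here is a minimal sketch (the newtype is standard; the example value is mine) of the continuation functor: the parameter sits to the left of two arrows, so its polarity has been flipped twice, making it positive, and Cont r is a lawful covariant Functor, but the occurrence is not strictly positive, so no shapes-and-positions presentation is available.

```haskell
-- a occurs left of two arrows: polarity flipped twice, hence positive
-- (a covariant Functor), but not strictly positive (not a container).
newtype Cont r a = Cont { runCont :: (a -> r) -> r }

instance Functor (Cont r) where
  fmap f (Cont c) = Cont (\ k -> c (k . f))

-- A quick covariance check: map a function over a suspended 41.
example :: Int
example = runCont (fmap (+ 1) (Cont (\ k -> k 41))) id
```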
Here’s an operation you can define on containers, but not on Haskell functors more generally. Peter Hancock defines the tensor of two containers thus
type family TensorC c c' :: * where
  TensorC (s <| p) (s' <| p') = ProdS s s' <| ProdS p p'
It’s a bit like a product, in that shapes pair up, but when we look at the positions, we don’t make a choice, we pick a pair. Think of the two components as coordinates in some sort of grid. Indeed, consider what TensorC ListC ListC might be. It’s the container which gives you the type of rectangular matrices: “lists of lists-all-the-same-length”.
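In plain Haskell, with no length index to enforce rectangularity, we can only check the invariant after the fact, but Data.List.transpose shows the payoff of all inner lists sharing a length: transposition is total and involutive on rectangles. A hedged sketch (rect and matrix are my names, not part of any container library):

```haskell
import Data.List (transpose)

-- A "rectangle" is a list of rows all the same length: the data that
-- TensorC ListC ListC would enforce by construction.
rect :: [[a]] -> Bool
rect []       = True
rect (r : rs) = all ((length r ==) . length) rs

matrix :: [[Int]]
matrix = [[1, 2, 3], [4, 5, 6]]
```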
Roland Backhouse wrote a paper a while back deriving properties of natural transformations on “F-structures of G-structures-all-the-same-shape”, but he couldn’t give a direct mathematical translation of that idea as an operation on functors, only by restricting the composition F.G to the unraggedy case. Hancock’s tensor gives us exactly that notion for containers.
You can degenerate tensor into functor composition…
newtype (f :.: g) x = C {unC :: f (g x)}
layers :: Con (TensorC (s <| p) (s' <| p')) :-> (Con (s <| p) :.: Con (s' <| p'))
layers ((s :&: s') :<: e) = C (s :<: \ p -> s' :<: \ p' -> e (p :&: p'))
…but you don’t have to do it that way around, because you can transpose a tensor, thanks to its regularity:
xpose :: Morph (TensorC (s <| p) (s' <| p')) (TensorC (s' <| p') (s <| p))
xpose = Morph $ \ (s :&: s') -> (s' :&: s) :<: \ (p' :&: p) -> p :&: p'
Fans of free monads may enjoy thinking of them as the least fixpoint of the functorial equation
Free f = I :+: (f :.: Free f)
If f is a container Con (s <| p), you can think of s as describing the commands you can issue and p as the responses appropriate to a given command. The free monad thus represents an interactive-mode session where at each step you decide whether to stop and report your result or to issue another command, then continue with your session once you have the response.
What’s not so well known is that the free applicative is given exactly by replacing composition with tensor. The free applicative gives you a batch-mode session, where your commands are like a deck of punch cards: the sequence is fixed in advance, and you report your result once you have collected your line-printer output, consisting of all the responses to the commands.
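To make the command-and-response reading concrete, here is a hedged sketch with a toy command functor (Free, Cmd, session and run are my names): the shape of the command functor says which command you issue, its positions are the possible responses, and a free-monadic session gets to choose each command in the light of the responses so far.

```haskell
-- The free monad over a functor f: stop with a result, or issue a
-- command and continue once you have the response.
data Free f a = Ret a | Do (f (Free f a))

-- One command shape: ask for an Int. The Int -> k field assigns a
-- continuation to each possible response (the positions).
newtype Cmd k = Ask (Int -> k)

instance Functor Cmd where
  fmap f (Ask k) = Ask (f . k)

-- An interactive session: the second question is issued only after
-- the first answer arrives, and the result combines both responses.
session :: Free Cmd Int
session = Do (Ask (\ x -> Do (Ask (\ y -> Ret (x + y)))))

-- Run a session against an oracle answering every question with n.
run :: Int -> Free Cmd a -> a
run _ (Ret a)      = a
run n (Do (Ask k)) = run n (k n)
```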
We have tensor for containers, but what about composition? Abbott, Altenkirch and Ghani have no difficulty defining it. The shape of a composite container is given exactly by an “outer” container whose elements are “inner” shapes. That way, we know the shape of the outer structure, and also the shape of each inner structure sitting at a given position in the outer structure. A composite position is a dependent pair: we have to find our way to an inner element, so we first pick an outer position, where we will find an inner structure (whose shape we know), and then we pick an inner position in that structure.
So now, we’re Haskelly stuffed. We need to promote Con itself (functions inside!). And we need its singletons. GHC stops playing.
How will the situation look when we have Π-types (eliminating the need for singletons) and the ability to promote GADTs? I don’t know. We’ll still need some higher-order functions at the type level.
Containers are an abstraction of a particularly well behaved class of functors, characterized in a way which is very flexible, but makes essential use of dependent types. They’re a rubbish representation of actual data, but they allow us to specify many generic operations in a parametric way. Rather than working by recursion over the sumofproducts structure of a datatype, we need only abstract over “shapes” and “positions”.
E.g., when positions have decidable equality, a container is (infinitely) differentiable (smooth?): you just use the usual rule for differentiating a power series, so that the shape of the derivative is a shape paired with a position for the “hole”, and the positions in the derivative are the positions apart from that of the hole. When you push that definition through our various formulae for sums and products, etc, the traditional rules of the calculus appear before your eyes.
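For the list container, that recipe yields the familiar zipper: the derivative’s shape remembers where the hole is, and its positions are all the original positions except the hole. A small sketch (holes is my name for the operation):

```haskell
-- All one-hole contexts of a list: each element paired with the
-- elements before it and after it, i.e. the positions apart from
-- the position of the hole.
holes :: [a] -> [(a, ([a], [a]))]
holes = go []
  where
    go _   []       = []
    go pre (x : xs) = (x, (reverse pre, xs)) : go (x : pre) xs
```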
Similarly, a traversable container is one whose position sets are always finite, and hence linearly orderable. One way to achieve that is to factor positions through Fin: effectively, shape determines size, and you can swap out the functional storage of elements for a vector.
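That swap is easy to sketch in plain Haskell with Int standing in, unsafely, for Fin n (toVector and fromVector are my names): since the shape determines the size n, a position-indexed function and a length-n list carry the same elements.

```haskell
-- Positional storage: tabulate a position-indexed function as a
-- vector (here, a plain list of known length n).
toVector :: Int -> (Int -> a) -> [a]
toVector n f = map f [0 .. n - 1]

-- Functional storage: read the vector back as a function. Partial
-- outside [0 .. n - 1]; indexing by Fin n would rule that out.
fromVector :: [a] -> (Int -> a)
fromVector = (!!)
```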
I was quite surprised at how far I got turning the theory of containers into somewhat clunky Haskell, before the limits of our current dependently typed capabilities defeated me. I hope it’s been of some use in helping you see the shapesandpositions structure of the data you’re used to.
Some lines are long. I grew up amongst the paraphernalia of the punch-card era, and for the most part, I used 80-column displays. To this day, when I’m hacking, I get uncomfortable if a line of code is longer than 78 characters, and I enjoy the way keeping my code narrow allows me to put more buffers of it on my screen. But however you play it, it’s far from odd to find that a logical line of code stretches wider than your window, so that it might be visually more helpful if it made more use of the vertical dimension. Indenting ‘continuation’ lines more than the ‘header’ line is a standard way to break the latter into pieces which fit.
Some lines are subordinate. Whether they are sublists of a list, or the equations of a locally defined function, or whatever, a textual construct sometimes requires a subordinate block of lines. It’s kind of usual to indent the lines which make up a subordinate block.
How do you tell whether an indented line is a continuation line or a header line within a subordinate block?
I’m trying to find a simple way to answer that question, and what I’m thinking is that I’d like a symbol which marks the end of ‘horizontal mode’, where indented lines continue the header, and the beginning of ‘vertical mode’, where indented lines (each in their own horizontal mode) belong to a subordinate block. My candidate for this symbol is -: just because it looks like a horizontal thing then some vertical things. I’m going to try to formulate sensible rules to identify the continuation and subordination structure.
An indentation level, or Dent, is an element of the set of natural numbers extended by bottom and top, with bottom < 0 < 1 < 2 < … < top. An iBlock is a possibly empty sequence of jChunks, each for some j > i. Within a given jChunk, each line is considered a continuation of the first (the header) until the first occurrence of -: , at which point the remainder of the jChunk is interpreted as a subordinated jBlock, with any text to the right of -: treated as a topLine. A document is a bottomChunk.
And, er, that’s it. At least for the basic picture.
Higgledy piggledy boggle bump splat
Most of the post clusters close on the mat -:
  the phone bill
  the gas bill
  the lecce
  the junk
  the bags to dispose of old clothes from your trunk
The tide you divide to get into your flat
Will just gather dust if you leave it like that.
means
{Higgledy piggledy boggle bump splat; Most of the post clusters close on the mat {the phone bill; the gas bill; the lecce; the junk; the bags to dispose of old clothes from your trunk}; The tide you divide to get into your flat; Will just gather dust if you leave it like that.}
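The rules above are concrete enough to execute. Here is a hedged Haskell sketch (all names mine), taking the herald to be the two-character token “-:” (a dash then a colon, matching the “horizontal thing then vertical things” description), under two simplifications: the herald may appear only at the end of a header line, and subordination is rendered with {…; …} as in the example.

```haskell
import Data.Char (isSpace)
import Data.List (intercalate, stripPrefix)

-- Indentation of a line, and trimming helpers.
indentOf :: String -> Int
indentOf = length . takeWhile isSpace

trim, trimEnd :: String -> String
trim    = dropWhile isSpace
trimEnd = reverse . dropWhile isSpace . reverse

-- Strip the herald "-:" from the end of a header line, if present.
stripHerald :: String -> Maybe String
stripHerald s = trimEnd . reverse <$> stripPrefix (reverse "-:") (reverse s)

-- A document is a block at indentation "bottom" (here, -1);
-- the chunks of a block are rendered joined by "; ".
render :: [String] -> String
render = intercalate "; " . blockOf (-1)

-- An iBlock: a sequence of chunks, each headed by a line indented
-- more deeply than i and owning the still-deeper lines below it.
blockOf :: Int -> [String] -> [String]
blockOf _ [] = []
blockOf i (l : ls)
  | indentOf l > i =
      let j = indentOf l
          (body, rest) = span ((> j) . indentOf) ls
          c = case stripHerald (trimEnd (trim l)) of
                -- herald present: deeper lines form a subordinated block
                Just h  -> h ++ " {" ++ intercalate "; " (blockOf j body) ++ "}"
                -- no herald: deeper lines continue the header
                Nothing -> unwords (trim l : map trim body)
      in c : blockOf i rest
  | otherwise = []  -- unreachable for well-formed input
```

Running render on the doggerel above reproduces the braced version that follows it.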
(Actually, it might make sense to allow a matching :- to act as an ‘unlayout herald’. The idea is that a Block is a bunch of Chunks and a Chunk is a bunch of Components, and a Component is either a lexical token or a subordinated Block. If a -: has no matching :- , it’s a subordinated Block Component at the end of its enclosing Chunk; the matching :- indicates the end of the subordinated Block Component, after which the Chunk continues.)
By way of an afterthought, why not take Dent to be the integers extended by bottom and top? A line which looks like this (with at least 3 dashes and any amount of whitespace either side)
//---
shifts the indentation origin to the left by number-of-dashes-plus-2, thus increasing the indentation of the leftmost physical column by the corresponding amount. A line like
\\---
shifts the origin the other way, and if you overdo it, the leftmost physical column will have negative indentation, but not as negative as bottom. That’s one way to keep your subordinates from drifting too far to the right.
I share a module with a colleague (at Strathclyde we use the word “class”, but that might become confusing, given what follows). I do a lot more online assessment than he currently does, so it suits me to key all my student data by username. My colleague keys all his assessment data by registration number. Our institution’s Virtual Learning Environment keys students differently again, for exercises involving anonymous marking. All of these keys are just strings. How do we achieve coherence? Laboriously.
My part of the module is chopped up into topics. Each topic has associated classroom-delivered paper tests and some online materials.
The information about how students have performed in these various components is managed rather heterogeneously. There might be one file for each paper test. Meanwhile, students each have their own directory recording their interaction with online materials, with a subdirectory for each topic, containing files which relate to their performance in individual assessment items. Some of these files have formats for which I am to blame; other file formats I have thrust upon me. I need to be able to find out who did how well in what by when. I need logic.
And I’m asking myself what I usually ask myself when I need logic: ‘How much of the logic I need can I get from types?’. I’m fond of decidable typechecking, and of various kinds of type-directed program construction (which I much prefer to program-directed type construction). Can we have types for data which help us to audit, integrate and transform them in semantically sensible ways? That’s the kind of problem that we dependent type theorists ought to be able to get our teeth into. But these everyday spreadsheet-this, database-that, logfile-the-other data are really quite unlike the indexed inductive tree-like datatypes which we are lovingly used to crafting. “Beautiful Abstract Syntax Trees Are Readily Definable” was one of the names we thought about calling Epigram, until we checked the acronym. Dependent type theory is not just sitting on a canned solution to these real world data problems, ready to deploy. Quite a lot of head-scratching will be necessary.
‘What’s a good type for a spreadsheet?’ is a reasonable question. ‘What’s a good dependent type for a spreadsheet?’ is a better question. ‘Upon what might a dependent type for a spreadsheet depend, and how much would that really have to do with spreadsheets per se?’ is a question which might lead to an idea. When you have diverse files and online sources all contributing information to some larger resource, we need to establish a broader conceptual framework if we are to work accurately with the individual components. The spreadsheets, database records, forms, etc, are all views or lenses into a larger system which may never exist as a single bucket of bits, but which we might seek to model.
So what I’m looking for is a dependently typed language of metadata. We should be able write a model of the information which ought to exist, and we should be able to write views which describe a particular presentation of some of the data. A machine should then be able to check that a model makes sense, that a view conforms to the model, and that the data is consistent with the view. Given a bunch of views, we should be able to compute whether they cover the model: which data are missing and which are multiply represented. The computational machinery to check, propagate or demand the actual data can then be constructed.
I had a thought about this last summer. Picking some syntax out of thin air, I began to write things like
class Student
class Module
for Module -:
  class Test
for Student, Module -:
  prop Participant
What’s going on? I’ve made four declarations: three “classes”, and one relation. A “class” is a conceptual variety of individuals. Classes can be (relatively) global, such as students or modules. Classes can be localized to a context, so that each module has its own class of tests.
The “for” construct localizes the declarations which are subordinated by indentation after the layout herald “-:”. It’s tidier to say that each module has a bunch of tests than that tests exist globally but each test maps to a module. Moreover, it means that tests in different modules need not share a keyspace.
A class is a finite enumeration whose elements are not known at declaration time. A prop is a finite enumeration whose elements are not known at declaration time, but it is known that there’s at most one element. There’s at most one way in which a student can be a participant in a module.
So far, I haven’t said anything about what these wretched individuals might look like. So,
for Student -:
  email ! String
  username ! String
  regNo ! String
  surname : String
  forenames : String
I’ve declared a bunch of things which ought to exist in the context of an individual student. The ones with “!” are intended to be keys for students. That’s to say any sensible view of student data should include at least one student key, but it doesn’t really matter which. Of course, with a little more dependently typed goodness, we could enforce formatting properties of email addresses, usernames and registration numbers…some other time. The point is that by introducing abstract notions of individual, outside of the way that those individuals can be keyed, we provide a hook for data integration.
I’m kind of saying what stuff should be stored in a “master record” for each student, but I don’t expect to know all the fields when I introduce the concept of a student.
Another thing that’s fun about bothering to introduce abstract classes of individual is that contextualization can be much more implicit. We do not need to name individuals to talk about stuff that’s pertinent to a typical individual, which means we can write higherorder things in a first order way and handle more of the plumbing by typebased lookup.
class Department
for Module -:
  department : Department
  moduleId ! String
  class Test -:
    item ! String
    max : [0..]
    weight : [0..]
for Student, Module, Participant, Test -:
  prop Present -:
    score : [0..max]
Here, I show how to associate a department with a module, after the fact. I also introduce tests for each module, each with a maximum score: the use of “-:” in the “class Test” declaration just elides an immediately subsequent “for Test -:”.
Correspondingly, for each student participating in a module (and those students might not be from the same department as the module), and for each test, it makes sense to wonder if the student showed up to the test and where in the range of possible marks they scored.
I should be able to write something like
Module/department
to mean, given a contextualizing department, just the modules for that department.
What’s a view of this information? It might be something like
one Module [moduleId] -:
  for Test [item | max] +
  for Student, Participant -:
    val : Percentage
    [surname | foreNames | regNo]
    if Present -:
      [score]
      val = weight * score / max
    else -:
      ["A"]
      val = 0
I’m sure we can negotiate over the two-dimensionality of the syntax (as long as we prioritise reading over writing), but that’s the picture. Scoping goes downward and rightward. The brackets show you what you actually see, which must exist in the given scope. The keyword “one” indicates that we are working within just one module, keyed by the given code. Meanwhile “for” requires us to tabulate all the individuals for whom an environment can be constructed to match the given context.
Meanwhile, the cells in the middle of the table will enable the computation of new local information, “val”. Presence or absence is signified by a score or the constant “A” (which is checkably distinct from all allowable scores), and the definition of “val” is given accordingly.
Note that I have not indicated whether this view is a query or a form. In fact, I have made sure it is valid in both roles. I’d like to be able to instruct the computer to initialize a spreadsheet with as much of this information as is available from other sources. I usually have to do that with cut and paste! After the fashion of pivot tables, I should be able to specify aggregations over rows and columns which are appropriate (e.g., the average score for each test, the total score for each student).
Lots of the ingredients for these tools are in existence already, and it’s clear that my knowledge of them is sadly lacking, having stumbled into the direction of data from the direction of dependent type theory. I seek to educate myself, and help in that regard is always appreciated. Of course, informally, I’m taught about some of the problems by the poor technology with which I face the mundane realities of my existence, and I understand that I can change me more easily than I can change the world. I don’t expect institutional buyin (I’ll have a go, right enough), but I don’t need it. The point of the modelling language is to build myself a bubble with a permeable membrane: the things from outside the bubble can have sense made of them (by giving a view which describes the role of external data); the things constructed inside the bubble make sense intrinsically (because they were constructed in a modeldirected way). Fewer strings! More things!
Edit: I should have included a link to my slides on this topic, for a talk delivered at Microsoft Research and at York.
As some of you may know, I’m from Northern Ireland, a place which naturally promotes homotopic (not that they would call it that, in case someone thought they were gay, given that malapropism is the fourth most popular national pastime after (in reverse order) homophobia, sectarianism and emigration) considerations as a consequence of the inconveniently large lake in the middle of it. “Who lives in the big blue bit?”(*), asked former Secretary of State, Sir Humphrey Atkins, when presented with a map of the place, shaded in accordance with sectarian affiliation. But I digress. Lough Neagh (which is pronounced something roughly like “Loch Nay”, giving us Northern Irish one more “ough” than the rest of yous) is the hole where the Isle of Man used to be until Fionn mac Cumhaill threw it at someone and missed.
But the point is that if you’re going from Antrim to Enniskillen, you’ve got to go round Lough Neagh one way or the other, and no matter how much you stretch or divert your route, if you stay dry, you won’t deform one way into the other. And indeed, if you happen to be in Antrim and you ask for directions to Enniskillen, they’ll most likely tell you “If I was going to Enniskillen, I wouldn’t start from here.”. Much in the same way (up to deformation, I hope), if I was going to Homotopy Type Theory, I wouldn’t start from the Calculus of Inductive Constructions.
Why not? Because we start from the strange idea that equality is some sort of inductive definition
Id (X : *)(x : X)(y : X) : *
refl (X : *)(x : X) : Id X x x
which already places too much faith in the disappointing accident that is the definitional equality of an intensional type theory, and then we add an eliminator with a computation rule which nails our moving parts to said accident…
J (X : *)(x : X)(P (y : X)(q : Id X x y) : *)(m : P x (refl X x))
  (y : X)(q : Id X x y) : P y q
J _ _ _ m _ (refl _ _) = m
…right? What does that do? It says that whenever the proof of the equation is by reflexivity, the value m to be transported is already of the right type, so we don’t even need to think about what we have to do to it. If we are willing to consider only do-no-work transportation, we will never be able to escape from the definitional equality. (Note that the purpose of pattern matching q against refl is just to have a sound but not complete check that x is definitionally equal to y. If you like proof irrelevance (much more fun than K, for example), then you can just ignore q and decide definitional equality of x and y. I mean, if you’ve gone to the trouble of engineering a decidable definitional equality, you might as well get paid for it.)
But we don’t stick with definitional equality, and thank goodness for that. Observational Type Theory gives you structural equality on types and thus do-no-work-after-program-extraction transportation, but for open terms (and to be conservative over intensional type theory), we needed to refocus our efforts around the machinery of transportation, so that nontrivial explanations of equality result in nontrivial computations between types. That’s enough to get extensionality working. But univalence (the type of Antrim stuff is equal to the type of Enniskillen stuff if you have a way to transport it either way, such that a “there and back trip” makes no change to the stuff and orbits Lough Neagh a net total of zero times) is a whole other business, because now we can’t get away with just looking at the types to figure out what’s going on: we have to look at the particular route by which the types are connected.
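For definiteness, the parenthetical data (transport both ways, with trivial round trips) can be packaged, in one standard form, as a quasi-inverse; this is a sketch, not the more carefully coherent notion of equivalence that a univalence axiom officially demands:

```latex
X \simeq Y \;:=\; \sum_{f : X \to Y}\; \sum_{g : Y \to X}\;
  \Bigl(\prod_{x : X} g\,(f\,x) = x\Bigr) \times \Bigl(\prod_{y : Y} f\,(g\,y) = y\Bigr)
```

Univalence then asks that the canonical map from proofs of X = Y to such packages be an equivalence itself.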
(Local simile switch. Thorsten Altenkirch, Wouter Swierstra and I built an extensionality boat. We might imagine that one day there will be a fabulous univalence ship: the extensionality boat is just one of its lifeboats. But nobody’s built the ship yet, so don’t be too dismissive of our wee boat. You might learn something about building ships by thinking about that boat. I come from Belfast: we built the Titanic and then some prick sailed it into an iceberg because they made a valid deduction from a false hypothesis.)
So, what’s the plan? Firstly, decompose equality into two separate aspects: type equivalence and its refinement, value equality. The former is canonical.
(X : *) {=} (Y : *) : *
I’m using braces rather than angle brackets only because I have to fight HTML.
The latter is computed by recursion over the former.
(x : X) =[ (Q : X {=} Y) ]= (y : Y) : *
That is, the somewhat annotated mixfix operator =[…]= interprets a type isomorphism between types as the value equality relation thus induced on those types. I shall HoTT in Rel for this.
Value equality is thus heterogeneous in a way which necessarily depends on the type isomorphism which documents how to go about considering the values comparable. Let’s be quite concrete about that dependency. We get to look at Q to figure out how to relate x and y.
Reflexivity is not a constructor of {=}. Rather, every canonical type former induces a canonical constructor of {=}. In particular
*^ : * {=} *
X =[ *^ ]= Y = X {=} Y
We may add
sym (Q : X {=} Y) : Y {=} X
y =[ sym Q ]= x = x =[ Q ]= y
trans (Y : *)(XY : X {=} Y)(YZ : Y {=} Z) : X {=} Z
x =[ trans Y XY YZ ]= z = Sigma Y \ y -> x =[ XY ]= y * y =[ YZ ]= z
Function extensionality becomes the value equality induced by the structural isomorphism for Pi-types. Types on which we depend turn into triples of two-things-and-a-path-between-them.
Pi^ (S^ : S' {=} S`)
    (T^ : (s : Sigma (S' * S`) \ ss -> (s^ : ss car =[ S^ ]= ss cdr))
          -> T' (s car car) {=} T` (s car cdr))
  : Pi S' T' {=} Pi S` T`
f' =[ Pi^ S^ T^ ]= f`
  = (s : Sigma (S' * S`) \ ss -> (s^ : ss car =[ S^ ]= ss cdr))
    -> f' (s car car) =[ T^ s ]= f` (s car cdr)
Every elimination form must give rise to an elimination form for the corresponding equality proofs: if you eliminate equal things in equal ways, you get equal results, and these things have to compute when you get canonical proofs of equations between canonical things being eliminated. Consequently, reflexivity shows up as the translation from types to type isomorphisms, then from values to the equality induced by those type isomorphisms. In Observational Type Theory as we implemented it, reflexivity was an axiom, because by proof irrelevance (by which I mean by making sure never to look at the proof) it didn’t matter what it was: the half-built Death Star was fully operational. Here, we can’t get away with that dodge. Fortunately, I have at least some clue how to proceed. My less famous LICS rejectum, joint work with Thorsten, gives a vague sketch of the construction. The upshot is that every
X : * has some X^ : X {=} X, and by way of a refinement, every x : X has some x^ : x =[ X^ ]= x.
Now, a type isomorphism is no use unless you can actually get from one side of it to the other. We shall need that type isomorphisms induce paths between values. That is, we shall need an eliminator
path (S : *)(T : *)(Q : S {=} T)(s : S) : Sigma T \ t -> s =[ Q ]= t
and moreover, we shall need that paths are unique, in the sense that, for given inputs, every pair in the return type of path is equal to the thing that path returns. That is, we have a kind of propositional η-rule for paths. I’m not yet sure of the most ergonomic way to formulate that uniqueness. But consider, in particular, q : x =[ X^ ]= y. We will have that (x , x^) =[…]= (y , q) in the type of paths from x via X^. We thus recover more or less the J rule, seen as transportation between two path-dependent types.
J (X : *)(x : X)
  (P : (Sigma X \ y -> x =[ X^ ]= y) -> *)
  (m : P (x , x^))
  (y : X)(q : x =[ X^ ]= y) : P (y , q)
J X x P m y q =
  path (P (x , x^)) (P (y , q))
       (P^ (((x , x^) , (y , q)) , ... path uniqueness ...)) m car
To achieve the definitional equational theory we’re used to from the J rule, we will need to make sure that the reflexivity construction, x^, generates proofs which are recognizably of that provenance, and we shall have to ensure that being recognizably reflexive is preserved by elimination forms, e.g., that we can take
f^ ((s , s) , s^) = (f s)^
so that we can make
path X X X^ x = (x , x^)
If we can obtain that path uniqueness from x along X^ when applied to (x , x^) gives (x , x^)^, then we shall have
J X x P m x x^
  = path (P (x , x^)) (P (x , x^)) (P^ (((x , x^) , (x , x^)) , (x , x^)^)) m car
  = path (P (x , x^)) (P (x , x^)) (P (x , x^))^ m car
  = (m , m^) car
  = m
That is, the computationally obscure J rule has been decomposed into inyourface transportation and path uniqueness. Somehow, I’m not surprised. It would not be the first time that a dependent eliminator has been recast as a nondependent eliminator fixed up by an ηlaw. That’s exactly how I obtained a dependent case analysis principle for coinductive data without losing subject reduction.
Of course, we shall need to define path by recursion over type isomorphisms. We shall thus need to say how to compute path Y X (sym XY) y, which amounts to delivering the path in the other direction (the htap?), and its uniqueness. Transitivity goes (I hope) by composition.
So what of univalence? It’s not an axiom. It’s a constructor for X {=} Y where you just give the implementations of both path directions and show their uniqueness, thus explaining how to implement the elimination behaviour. We then need something like
x =[ Univalence X Y xy yx ... ]= y = xy x =[ Y^ ]= y
but that’s annoyingly lopsided. We also need to know when isomorphisms are equal. Something like
Q =[ X {=} Y ]= Q' = (\ x -> path X Y Q x car) =[ (X -> Y)^ ]= (\ x -> path X Y Q' x car)
might be enough, but again annoyingly lopsided.
It’s late and I’m tired, so I suppose I should try to sum up what I’m getting at. I’m hoping we can get to a computational treatment of univalence by isolating the notion of type isomorphism in quite an intensional way. On the one hand, the structure of a type isomorphism tells us how to formulate the equality for values in the related types. On the other hand, the structure of a particular type isomorphism tells us how to compute the transportations of values across it, giving rise to unique paths. Univalence allows us to propose arbitrary isomorphisms, and somehow, univalence gives an η-long normal form for type isomorphism: every type isomorphism is provably equal to the packaging-by-univalence of its elimination behaviour.
However, hilariously, we have to make sure that the relations =[…]= induces between equivalent type isomorphisms are equivalent (i.e. pointwise isomorphic), in order to show that =[…]=, like all the other elimination forms, respects equality. As County Antrim folk say, “There’s nothing for nothing in Islandmagee.”. Islandmagee, by the way, is the peninsula on the east coast, across the narrow sea from Westeros (which is a rehabilitated landfill site between Whitehead and Larne), apparently containing nothing.
(*) Eels, mostly.