## Ming — a typecheckerchecker (episode 0)

Over the festive period, largely to celebrate even having time to, I’m minded to write a typecheckerchecker, with which I’ll then write a typechecker or two. The plan is to operationalise ideas about how bidirectional type systems work to guide their construction in a way which guarantees their metatheoretic properties.

It’s called *Ming*, because it wants to rule the universe, because I don’t mind the implementation being a bit smelly, and because the holidays have historically been a time for old serials.

The central observation is that one cannot write down a rule without incurring a proof obligation, and that there is some structure as to the appropriate means of discharging that proof obligation. We don’t write down any old rubbish, you know.

## bisimulation for sequential circuits

In case you didn’t know, I teach hardware to the first year. I’m trying to improve my assessment technology for the associated exercises. This post is about the program I wrote today that might help me do that.

I set many exercises which involve constructing sequential circuits. What’s a sequential circuit? It’s a “function” from i bits of input to o bits of output that can access and update s bits of internal state. That’s to say it’s a function from s+i bits to s+o bits of output for some s. I need to tell when they’re equal up to external observation, regardless of s.

We have 2^{s} states, x. Each determines an immediate behaviour, b(x):2^{i}→2^{o} and a continuation k(x):2^{i}→2^{s}. The behaviour is directly observable but the continuation is not. The continuation can be distinguished only by *its* behaviour.

My strategy is to analyse each circuit to identify the observational equivalence classes of *concrete* states, then try to put two such analyses into bisimulation. The first part of that amounts to finding the smallest set U of *abstract* states such that we have functions p:U→Pow(2^{s}) and f:2^{s}→U such that

- p(u) is nonempty for all u∈U
- x∈p(u) iff f(x)=u
- if x,y∈p(u) then b(x)=b(y); i.e., for each u∈U, u!(u):2
^{i}→2^{o}is well defined via u!(u,m) = b(x,m) for some x∈p(u) - if x,y∈p(u) then for all m∈2
^{i}, f(k(x,m))=f(k(y,m)); i.e., for each u∈I, u?(u):2^{i}→U is well defined via u?(u,m)=f(k(x,m)) for some x∈p(u)

In other words, find the coarsest equivalence such that in equivalent states, the circuit must, for all inputs, produce equal outputs and evolve to equivalent states.

For a given (s,b,k), suppose we have such a (U,p,f,u!,u?). And for some corresponding (i.e., same i, same o) (t,c,l) let us have (V,q,g,v!,v?). Our mission is to establish an isomorphism, ~, between U and V such that

- if u~v, then u!(u)=v!(v)
- if u~v, then for all m∈2
^{i}, u?(u,m)~v?(v,m)

The analysis works by starting with the hope that all states are equivalent and then disabusing ourselves of this notion. We dump all states into a singleton partition (U=1,p(*)=2^{s},f(m)=*), then refine by a bunch of observations. To refine a partition by an observation, replace each part by the partition you get by distinguishing its elements on the basis of the observation: you can tell them apart, so you need to separate them! Start refining by b, to separate the states which behave differently immediately. Then refine by f(k(-,m) for each m: i.e, if you ever map source states to distinguished target states, you must distinguish the source states. Iterate to a fixpoint: as there are finitely many states, partitions cannot refine forever. We have grown our partition only when forced to by observable differences, so we have the smallest possible number of parts.

Now we need to grow our bisimulation, from the empty set of pairs. Start by inverting u! and v!, so that each immediate behaviour maps to the set of states which exhibit it: these must be pointwise isomorphic, which gives you a search space of candidates for pairs u~v. For every guess you make, you have to check that u?(u,m)~v?(v,m) for all m, so you learn more about what ~ must be.

The circuits are observationally equivalent, I claim, iff there exists such a bisimulation.

Here is my attempt at a Haskell implementation.

## designing the rod

For reasons largely of masochism, I’m trying to gift my future selves a piece of technology: a proofchecker, in the style of Rod Burstall’s ProveEasy, that I’m calling “rod” largely because I intend to inflict it on young people, lest they be spoiled.

The key basic choice is that the proof style is *declarative*. That is, the proof document shows you the decomposition of goals into subgoals which is but transient in systems like Coq and Agda. It’s intended to be used interactively, but my current plan is to facilitate that interaction by implementing a transducer for proof documents. So

`rod doc1 .. docn`

will read the given files and write new versions of them, updated by the machine’s take on what they say. In particular, a goal for which a rule has been suggested but no subderivations provided will be elaborated to a bunch of open subgoals if that rule applies.

E.g.

`Show(imp(and(A, B), and(B, A))) by impI !`

(where the ! is the user telling rod) becomes

`Show(imp(and(A, B), and(B, A))) by impI {`

Given(and(A, B)), Show(and(B, A)) ? }

(where the ? is rod asking the user)

I want the logic to be highly configurable, and I want to be able to express *construction*, where data unknown when a goal is stated get discovered as the proof progresses. Specifically, I want to be able to attach proof terms to a logic and watch them appear!

How on earth am I planning to do that? I’m writing this blogpost to try and answer that question for myself, apart from anyone else. They do say you should write the manual before the software…

## basics of bidirectionalism

I like my type systems to be bidirectional. Different people mean different things by the word “bidirectional”, and I’m one of them. Some people relate the notion to “polarity”, and I’m not one of them.

I’m also a fan of *small-step* computation, because I work with dependent types, and I don’t like presuming awfully strong properties of computation while I’m still in the process of writing the rules down.

## setup

I separate my term languages into separate syntactic categories.

checkeds,t,S,T ::= c | [e]

synthede,f ::= x | t : T | e d

The *c* stands for *constructor* and the *d* for *destructor*: more on them shortly.

Correspondingly, I have two typing judgments.

Γ |- T ∋ t

Γ |- e ∈ S

Contexts, Γ, assign types to variables.

contextΓ,Δ ::= | Γ, x : S

If you bother me about variable freshness conditions, I shall respond by switching to a nameless notation: I’m perfectly comfortable living there, but I know I should use names when communicating with people who aren’t used to it.

There’s a small-step computation relation for each syntactic category, overloaded. Computation never changes syntactic category. It’s a little bit funny to include type ascriptions. Those *t : T* terms I refer to as *radicals*, because they’re the active things in a computation.

We have *υ-contraction* which notes that a radical no longer capable of computation needs no type. That’s how things stop.

[t : T] ↝

_{υ}t

We have another class of *β-contractions* which explain how things go.

(c : C) d ↝

_{β}e

That’s to say computation is always a reaction between a constructor and a destructor at a given type. The ↝ relation is the closure of the contractions under all contexts.

## online trait-based marking

In order to provide useful feedback rapidly on those assessment items whose motivation is at least partly *formative*, I’m thinking about a variety of new dodges to try next year.

I expect I’ll have about 100 first year students taking the class I pretend is an introduction to hardware but which is actually an introduction to functional programming and semantics. There’s a lab session each week in which I experiment on them, and processing the data resulting from those experiments is a major headache. Feedback needs to be rapid if it’s to inform remedial action.

My workflow is pretty poor. So far, the test questions have been done on paper, even though each student is at a computer. My comrades and I mark those pieces of paper, but then it’s not over, because I have to type in all the scores. It takes too long. It often goes wrong.

## solution traits

One method I adopted last year was to identify *solution traits*. Few mistakes are peculiar to one individual. Many good or bad properties of solutions, or *traits*, as I call them, are shared between many submissions. I learned to save time by giving traits a coding system. Markers just write trait codes on the scripts; the meaning of those codes is given once, on the web page associated with the assessment item. Markers also delivered a score for each part of the exercise and a total score. We tried to map traits to scores consistently, but that’s not always easy to do in one pass. Backtracking was sometimes required. If we had multiple markers, we’d share an office, and the shared trait description list would grow on the whiteboard as new things came up. I discovered that I could be bothered to type in both the score and the trait list for each student, but it was quite a bit of work.

The idea of solution traits as salient aspects of student submissions struck me as something from which we could extract more work. I ought to be mining trait data. Maybe later.

## from traits to scores

Markers should not score solutions directly. If we can be bothered to classify traits, we should merely identify the subset of recognized traits exhibited by each question part. Separately, we should give the algorithm for computing a score from that trait subset. That way, we apply a consistent scoring system, and we can tune it and see what happens. Here’s the plan. Each trait is a propositional variable. A scoring scheme is given by a maximum score and a list of propositional formulae each paired with a tariff (which may be positive or negative). The score is the total of the tariffs associated with formulae satisfied by the trait set, capped below by 0 and above by the indicated maximum. We should have some way to see which solutions gain or lose as we fiddle with the scheme.

## hybrid online marking jobs

Each marking job is presented to the marker as a web page showing the question, the candidate solution, the specimen solution, and the list of checkboxes corresponding to the traits identified so far. It may be that some magic pixie helpers (be they electronic or undergraduate) have precooked the initial trait assignment to something plausible. That is, online marking jobs have the advantage that we can throw compute power at them whenever solutions can be algorithmically assessed, but we don’t have to construct exercises entirely on that basis. If all is well, the marker need merely make any necessary adjustments to the trait assignment and ship it. Problems may include the need to add traits, so that should be an option, or to request a discussion with the question setter and flag the job for revisiting after that discussion. Concurrent trait addition may result in the need to merge traits: i.e., to define one trait as a propositional formula in terms of the others, with conflicting or cyclic definitions demanding discussion. Oh transactions.

## making marking jobs online

How do I get these marking jobs online in the first place? Well, the fact that the students do the problems in labs where each has a computer is some help. Each exercise has a web page, and whenever it makes sense to request a solution which fits easily into a box in an HTML form, we can do that, whether it’s to be marked by machines or by people. But there may be components of solutions which are not so easily mediated: diagrams, mainly. I have previous at forcing students to type ASCII art diagrams in parsable formats, much to their irritation, but I would never dream of making such a requirement under time pressure. I need a way to get 100 students to draw part of an assignment on paper, then make that drawing appear as part of the online marking job with the minimum of fuss.

## banners and scanners

I prepare the paper on which they will draw. It has a printed banner across the top, which consists of three lines of text, each with one long word and sixteen three-letter words, lined up in seventeen columns of fixed-pitch text. The three long words in the left column vary with and identify the task. The 48 three-letter words are fixed for the whole year. Each student has an identity code given by one three-letter word from each line, and the web page for the exercise reminds them what this code is. Each position in each line stands for a distinct number in 0..15, and the sum of the three positions is 0 (mod 16), so a clear indication of the chosen word in just two of the lines is sufficient to identify the student. I can print out a master copy of the page, then photocopy it 100 times and hand the pages out. The student individuates their copy by obliterating their assigned words, e.g., by crossing them out (more than once, please).

I collect in the pages at the end and I take them to the photocopy room, where the big fast photocopier can deploy its hidden talent: scanning. I get one enormous pdf of the lot. Unpacking the embedded images with pdfimage, I get a bitmap for each page. Using imagemagick, I turn each bitmap into both a jpeg (for stuffing into the marking job) and a tiff (cropped to the banner) which I then shove through tesseract, a rather good and totally free piece of optical character recognition software, well trained at detecting printed words in tiffs. The long words present in the scanned text tell me which exercise the jpeg belongs to; the short words missing from the scanned text tell me (with high probability) whose solution it is. Solutions not individuated by this machinery are queued for humans to assess, but experiments so far leave me hopeful. The workflow should be: stack paper in the document feeder, select scan-to-email, select the relevant daemon’s email address, press go. And we’re ready to bash through the marking jobs online.

## mark retrieval by students

Once we markers have done all we can and it’s time to give the students their feedback, we push the release button which bangs the associated dinner gong. Online, the students are faced with a marking job, showing the question, their solution, the specimen solution, and an empty checkbox trait list, all in a form with a submit button. We oblige them to mark their own work in order to be told their given score. When they hit the submit button, they get to see their score, and on which traits their marker’s opinion diverges from their own. If they are at least two-thirds in agreement, a small donation is made to their reputation score (as distinct from their score in the topic of the assessment).

## pixie helpers of the carbon kind

Tutorial homeworks can and should also give rise to online marking jobs in just the same way. Part of the exercise can be a web form and the rest done by uploading an image. There are scanners in the lab, but we can also arrange to allow submission of images by email from a phone or a tablet. Once a student has submitted a solution, they become available for marking jobs on the exercise, starting with their own, but also for other people. After the tutorial, each student should certainly confirm their self-assessment, but preferably also revisit any other marking they did prior to the tutorial. Reflection is reputation-enhancing. Peer-to-peer marking is reputation enhancing. Note that tutors will have the ability to eyeball homework submissions (if only to detect their absence) but are not paid to spend time marking them.

Expert students who have nothing to gain by taking a test (because they already have full marks in the test’s topic) may, if they have free time at the right time, collect reputation by marking their colleagues’ test submissions. The results they deliver must be moderated, but they do at least help to precook marking jobs for the official markers. Of course, reasonable accuracy is necessary for payment.

## traits in relationship to the curriculum graph

An exercise (or parts thereof) should be associated with a topic and constitute evidence towards mastery of said topic. If some parts of an exercise conform to a standard scheme, that’s also useful information. Some traits will relate to the topic (e.g., typical misunderstandings) and some to the scheme, so we gain a good contribution to the trait set for a brand new question just by bother to make those links. We might also seek to identify regions in the teaching materials for the relevant topic which either reinforce positive traits or help to counteract negative ones: crowdsourcing such associations might indeed be useful.

## the workflow is a lot, but it isn’t everything

My main motivation is to try to improve the efficiency of the marking process in order to give feedback more rapidly with less effort but undiminished quality. Thinking about technological approaches to the management of marking is something I enjoy a great deal more than marking itself, so I often play the game despite the little that’s left of the candle. I should watch that. But the shift to marking as an *online* activity also opens up all sorts of other possibilities to generate useful data and involve students constructively in the process. It’s a bit of an adventure.

## curriculum higraphs

I’m a teacher, too, remember? If you’re here for the research, this post might fill a much needed gap in your life.

In a typical university curriculum, we arrange the classes (or modules or courses or whatever you call them) in a dependency graph: to do this class, you should already have done that class, and knowledge from that class will be assumed, etc. I find it useful to push them same sort of structure down a level, the better to organise the broad topics within a class, and even sometimes to structure the process of learning a topic. “Where am I in this picture?” is the question the students should ask themselves: we should make it easy for them to find the answer.

## what’s the picture?

Suppose the hierarchichal structure is given like a file system where each node is a directory. In a real file system, we might indeed represent a node as a directory, but not everything which is a directory will necessarily correspond to a node: each node will need a file which maps its internal curriculum, indicating which subdirectories are subnodes and, for each subnode, which other subnodes are immediate prerequisites. What do I mean by “A is a prerequisite of B?”. I mean “Mastery of A is *necessary* for study of B to be *sensible*.” There should be no cycles in that graph: mutual relevance does not imply any prerequisite status.

Every such hierarchy can be flattened by giving each node an entry point (a prerequisite of all subnodes) and an exit point (whose prerequisites are the entry point and all subnodes), then linking a node’s entry point from the exit points of its prerequisites. It may also be helpful (but it’s certainly not vital) to indicate for each subnode its external prerequisites, being those nodes elsewhere (neither a sibling nor an ancestor) on which it depends. These should be consistent with the internal curricula, in the sense that the flattening must remain acyclic. Note that we can have D/A/X -> D/B/Y and D/B/W -> D/A/Z,

but if so neither A nor B may be considered a prerequisite of the other.

Colour schemes: if you have mastered a node, it’s safety-blue; if you have mastered the prerequisites for a node but not the node itself, it’s activity-green; if you haven’t yet mastered the prerequisites for a node, it’s danger-red. Red/green distinctions are not ideal for colourblind people, so we should write the captions on green nodes in a more emphatic font: they constitute the frontier where progress can be made.

## why bother?

If we plot the curriculum graph, to whatever level of detail we can locally muster, we give some structure to the businesses of *teaching*, *learning* and *assessment*. Learning is what the students do: we can have no direct effect on learning, except perhaps surgery, a fact which sometimes goes astray in discussions of ‘learning enhancement’. Teachers can have an impact on the environment in which learners learn, but ultimately we’re passive in their learning process, and they’ll take whatever they take from the experiences we give them. Learners are going to make a journey through the curriculum. Teaching is how we try to propel them on that journey. Assessment is how we and they determine where they have reached on that journey. If we associate teaching and assessment activities with nodes in the curriculum graph, we make clearer their specific utility in the learning process.

Crucially, by identifying the prerequisition structure, we focus attention in from the whole picture to the student’s frontier of green nodes where they can sensibly study but have not yet achieved mastery. Now, by aligning teaching materials to nodes, the students can identify those which are immediately relevant to their progress, and by aligning assessment to nodes, the students can tell whether progress is happening. My purpose is simply to make it as clear as possible where each student is stuck and what they can do about it.

Teaching takes a variety of forms. We might write notes. We might give lectures. We might offer lab sessions or tutorials. We can arrange ad hoc help sessions. We should expect the delivery of lectures to be an admissible linearisation of the prerequisition structure: that is, lectures should also have a frontier. The students should be able to compare their own frontier with the lecture frontier, and with a fuzzed out version of the cohort’s frontiers. It’s a danger sign if a student’s frontier is strictly behind a lecture: they aren’t best placed to get much from it, at least not at first. If notes are available online in advance of lectures, we acquire the means to cue advance reading and to detect it. The more tightly aligned assessment is to the curriculum, the easier it is to prescribe remedial activities.

Moreover, at the beginning of a lecture, I should very much like to know how the class divides as red/green/blue for that topic. Too much red (or, less likely, overwhelming blue) and I should maybe reconsider giving that lecture. Of course, you only get useful tips like that if you’re doing enough assessment (little and often is better than in overwhelming clumps) to gauge the frontier accurately.

## what’s a student’s online view?

When a student visits a node, they should see its hierarchy of ancestors and its immediate subnode structure. The status of all of these nodes should be clear. The rest of the information before them should answer as readily as possible the question ‘What can I usefully do?’. They should see a chronological programme of activities for that node, first into the future and then into the past: lectures may have reading associated; tutorials may have homework handins associated. A node may have a number of assessment items associated, some of which may be active: the list of all assessment items at or below the current node should be readily accessible. Every piece of teaching material should be accompanied by the means to record engagement and comprehension, and to leave a note for the attention of staff.

## more later

I want to stop writing for now and publish the story so far. I haven’t said much about assessment mechanisms or what mastery might consist of: these issues I have also been thinking about. But I hope I’ve at least given some reason to believe that there is something to be gained by refining the graph structure of the curriculum when it comes to modelling the progress made by learners and promoting useful activity, for us and for them.

## Hannah’s sweets and Archie’s socks

This question, from a GCSE maths exam, has been causing a stir. There’s been quite a bit about it in the media, although you’ll struggle to find anyone who can be arsed to print part (b). *Why is that?*

There are n sweets in a bag.

6 of the sweets are orange.

The rest of the sweets are yellow.Hannah takes at random a sweet from the bag.

She eats the sweet.Hannah then takes at random another sweet from the bag.

She eats the sweet.The probability that Hannah eats two orange sweets is 1/3.

(a) Show that n

^{2}– n – 90 = 0(b) Solve n

^{2}– n – 90 = 0 to find the value of n

What makes this question so hard? Might we ask it differently? What’s going on? At undergraduate level (first years, mostly) I’ve been setting and marking exams: my lot are not much older than the pupils puzzling with Hannah. It’s good to try to think about these things. (It’s just possible my niece sat that exam, which is an extra cue to be less reflexive and more reflective.)

Correspondingly, I have a variety of crackpot theories, but before I waste your time with them, let me trouble you with some more respectable theory from David Perkins, specifically his paper Beyond Understanding.

It’s not just a matter of what you know. What does what you know enable you to do? Perkins characterises an escalating scale of what I might call knowledge weaponisation (which remind me of the old joke about solving, stating or colouring in the Navier-Stokes equations):

**possessive**knowledge can be recited and applied to routine calculations**performative**knowledge can be called upon more flexibly to solve problems**proactive**knowledge can be deployed outside its domain of obvious applicability

The “Hannah’s sweets” puzzle clearly demands more than *possessive* knowledge of basic probability, algebraic manipulations of fractions, and solving quadratics. Given part (a), part (b) should be a routine problem. It’s part (a) that sticks out as a bit peculiar, pulling an equation like a rabbit from a hat then asking the students to find a hatful of rabbit shit. It doesn’t demand anything they don’t know, but it isn’t a routine calculation.

If I take my specs off, metaphorically I mean, the detail becomes indistinct but the general story arc remains perceptible. The question goes like this

- There are some unknown quantities, described in words.
*If they are not represented by named variables, then introduce named variables for them.* - There are some mathematical facts about those quantities trapped in a slab of prose.
*Scan the prose for quantities you can represent as formulae. Extract constraints on those formulae.* - Deduce the values of the unknown quantities.
*Solve the constraints by whatever means is appropriate.*

That’s like lots of questions. Once upon a time, there might have been a question like this.

Archie own six black socks, some number of pink socks and no other socks. He is too lazy to pair them up before he puts them in his sock drawer all in a jumble. He always gets dressed in the dark, picking two socks at random. The day after laundry day, when all the socks are available, he typically wears a pair of black socks one third of the time. How many pink socks does Archie have?

My question, with its stereotyped male protagonist and bias against pink, probably needs a bit of rethinking before we can issue it to the youth of today. But it starts by introducing a mystery quantity, gives us some chat which constrains that quantity, then finishes by asking us to find that quantity. That is, it’s coded as an instance of that standard problem type. Moreover, it doesn’t name a variable, let alone confront us with a quadratic formula, and it doesn’t specifically invoke the concept of probabilty. Socks motivate the interest in “two the same” better than sweets, but they might predispose us to guess that they are found in even numbers. The latter is a good red herring, and besides, Archie has probably lost the odd sock here and there. But I digress. The issue I’d like to open is the relative difficulty, from a human perspective, of “Hannah’s sweets” and “Archie’s socks”.

You see, my crackpot theory is that “Hannah’s sweets” is knocked off an older question uncannily like “Archie’s socks”, but when revising it, the examiners named the number of sweets and added the intermediate goal to establish the quadratic constraint in an attempt to make the problem require less initiative. If so, I think they also made it more intimidating. However, they also made the last phase of the problem, part (b), completely routine, in the hope that people without the initiative for part (a) would still collect some marks. Whether students notice that they can do part (b) without a clue for part (a) is another matter. Many students stop doing a question at the first part which causes trouble and read only on demand, which means that in “problem” questions, they miss the clues in the later question parts for what might constitute useful information in the prose.

The media have, by and large, not printed part (b) of the question. They make it look like the question is “there are some sweets; prove an equation”, rather than “there are some sweets; find out how many”. See, for example, Alex Bellos’s piece in the Grauniad. Why is this? Crackpot theory time again.

- There’s one photo of the question paper which seems to show up a lot. The Guardian credits its source as Twitter. It cuts off after part (a). The story is clearly less time-consuming to deliver just as Twitter churnalism: bothering to find out what the whole question might have been is extra work and who likes doing that? (I got part (b) from the BBC, which shows what you can achieve with a mandatory licence fee.)
- The impact is to make a
*dishonest*protest about the extent to which the problem is undermotivated. That shows good politician skills on the part of the original poster, and is a good way to increase the sensation-value of the story. Many secondhand reports compound the problem by publishing not this image but its transcription in plain text, dropping the (a) label to make it look like the question stops at the quadratic equation. They also write n^2 rather than n^{2}, necessitating a comment on notation, making it look as if the original question introduced weird notation on the fly, when in fact the extra bend is introduced in quotation.

Especially in its curtailed form, “Hannah’s sweets” looks superficially weirder than “Archie’s socks” because some al-gebra terrorism has been added and the purpose has been taken away.

What both questions have in common is that they are *in code*. They must be decoded before the algebra can begin. “Hannah’s sweets” uses weaker encryption than “Archie’s socks”. I’m afraid that’s because I wrote “Archie’s socks”: here are some more excerpts from my archive. I quote them selectively, not to give you whole questions, just a sense of my style of distraction.

Disco Mary is rummaging through a collection of old electronic spare parts. She finds a multicolour lamp which has three Boolean inputs, labelled red, green and blue. … The green input signal is connected to the output of a T flip-flop. The blue input signal is connected to the output of another T flip-flop. … Mary’s favourite song is “What A Blue Sunset” by Ray Dayglo and the Thunderclaps, so she decides to wire the lamp and flip-flops to make a repeating sequence of colours, changing with each clock cycle: cyan, blue, yellow, red, then back to cyan and round again.

Madame Arlene teaches the Viennese Waltz. When she is teaching beginners, she finds that she has to shout “1, 2, 3, 1, 2, 3,. . . ” repeatedly for ever, to keep her pupils in time. When she teaches advanced classes, she doesn’t need to shout, because they listen to the music. … Madame Arlene builds a shouting machine and wires it into her music player: it gets a 2-bit unsigned binary number as its input and a clock signal generated by the player in time with the music. At each tick, the shouting machine checks its input: if it gets 0, it shouts nothing; otherwise it shouts the number it gets. … The challenge is to generate the input signal for the shouting machine, so that both counting and silent behaviours can happen.

A control panel has four switches on it, named S, T, U and V, respectively. Each switch sends a 0 signal when its handle points downward and a 1 signal when its handle points upward. This is the current setting: [S down, T up, U down, V up] The control panel is wired both to the lock of a safe and to a burglar alarm. The setting on the control panel represents a number in 4-bit two’s complement binary notation. … To open the safe, you need to construct a circuit which connects the switches S, T, U, V to the release control R which will set R = 1 if and only if the correct combination is entered. An informant has discovered that the correct combination is -6. … [alarm circuit diagram] … You can flick only one switch at a time. You need to flick switches in sequence to change from the current setting to the setting with the correct combination. You must not set off the alarm.

Professor Garble is a researcher in multicore programming techniques, attempting to explain a recent trend in processor performance. ‘Moore’s Law is finished! That’s why processor clock speed has levelled off. And that’s why processors have exponentially increasing numbers of cores.’ Tick the box or boxes for whichever of the following is true. … [] He is correct in none of the above ways.

That’s just the sort of thing that occurs to me in the bath.

All of these questions require you to extract a model of what is going on from some chat. That’s the skill I am trying to test. I make the chat blatantly spurious partly to be clear that it is a “decode the puzzle” question, but mostly because I am habitually facetious. It occurs to me that maybe exams should be no place for facetiousness from them or from me: why should I have a laugh when they’re not having quite so much fun? This question style is at least routinely visited upon them in the course of the class: if you have paid attention to past papers, it is exactly what you expect. Still, I can see how it could be exclusionary, like an in-joke that you detect but don’t get. I think perhaps that I should do less of that stuff in exams and more in class, where there’s less pressure and we can afford a bit of a laugh while learning to decode problems.

But I really do digress. The point about encoded problem questions is that you need to recognize when you are being told Something Important, and what that something is. In that respect, it’s a lot like doing a cryptic crossword: crossword clues use a *stylised* language that takes time and practice to acquire. I was taught to do crosswords by my father’s colleagues, who always appointed me the writer-inner for the lunchtime crossword and were happy to indulge my queries: what signalled an anagram, an inclusion, a pun, and so on. In the same way, I instinctively decode the declaration “The probability that Hannah eats two orange sweets is 1/3.” as the instruction “Write a formula for the probability that Hannah eats two orange sweets and set it equal to 1/3.”. It’s familiarity with this sort of code which pushes puzzles like “Hannah’s sweets” back down Perkins’s scale of Ps. And that’s teachable. When I see part (a), I’m a bit spooked, and I think “Why are they asking me to deduce this equation? I already have an equation? Why are they not just asking me what n is? Ah, that’s part (b). Oh well, I expect I should be able to deduce the part (a) equation from mine by doing a wee bit of algebra.”. I’m already on course, and part (a) threatens to throw me off it: that’s why I think “Archie’s socks” is easier. I’m reminded of Whitehead’s remark:

It is a profoundly erroneous truism, repeated by all copy-books and by eminent people when they are making speeches, that we should cultivate the habit of thinking of what we are doing. The precise opposite is the case. Civilization advances by extending the number of important operations which we can perform without thinking about them. Operations of thought are like cavalry charges in a battle — they are strictly limited in number, they require fresh horses, and must only be made at decisive moments.

I agree, sending for the thought-cavalry is a desperate measure, but it would be a shame if an examination intended to assess knowledge and reward the performative (or even proactive) were entirely devoid of decisive moments. For “Hannah’s sweets”, the thought-cavalry can be avoided if you recognize the way in which you are being instructed. Moreover, thought-cavalry tactics, when needed, are greatly assisted by the key extra exam-puzzle knowledge that the question contains a sufficiency of clues: we don’t get that luxury in real problem-solving. I often tell students that my role is to be simultaneously Blofeld and Q: the fact that they are in a James Bond movie means that there is necessarily a strategy to escape Blofeld’s menaces with Q’s gadgets. I do not expect them to die. In fact, I’m trying to arrange their survival. In that sense, “Hannah’s sweets” already comes with the expectation that whatever information is packed in the opening prose must be sufficient to ensure the equation demanded of us: the game is to unpack it.

We could present “Hannah’s sweets” in a more decoded form. Here’s a kind of stream-of-consciousness translation.

Hannah has a number of sweets. It doesn’t matter that they are sweets or that Hannah is called Hannah. What matters is that there are things and we are going to find out how many: call that n. 6 of the sweets are orange and the rest are yellow. There are two different sorts of thing: “orange” and “yellow” are arbitrary labels whose only role is to be distinct. You are told that there are 6 orange sweets, but not how many yellow sweets (perhaps call that y, so n = 6 + y). Hannah selects two sweets at random without replacement. It doesn’t matter whether she eats them or throws them at pigeons. It does matter that the two selections are random, and that the second selection is made from one fewer than the first. Their randomness tells you that you can base probability on proportion and that selections are independent, so you can compute the probability of a particular pair selection by multiplying the probabilities of the separate selections. You are told the probability of a particular outcome: she selects two orange sweets with probability 1/3. The probability of getting two oranges clearly depends on n: write down a formula for that probability and set it equal to 1/3. Rearrange that equation (by clearing fractions) to obtain the quadratic equation n

^{2}– n – 90 = 0, then factorize the equation to obtain two candidate solutions, only one of which makes sense.

I think it’s reasonable to teach that decoding skill and to expect school pupils to acquire it. The persistent complaint that they haven’t seen anything like it on a past paper seems wide of the mark, and more worryingly as a perceived entitlement to be tested only on *possessive* knowledge. We should be clear in our rejection of that entitlement. But we should also acknowlege that the boundaries between Perkins’s kinds of knowledge is fluid, and that our responsibility as teachers is to rearrange their positions relative to the student by acting on both, in the direction marked “progress” by Whitehead.