
CS-452 Foundations of Software

Work in progress

Writing a parser with parser combinators

In Scala, you can (ab)use operator overloading to create an embedded DSL (EDSL) for grammars. While a grammar may look as follows in a grammar description language (Bison, Yacc, ANTLR, …):

Expr ::= Term {'+' Term | '-' Term}
Term ::= Factor {'*' Factor | '/' Factor}
Factor ::= Number | '(' Expr ')'

In Scala, we can model it as follows:

def expr: Parser[Any] = term ~ rep("+" ~ term | "-" ~ term)
def term: Parser[Any] = factor ~ rep("*" ~ factor | "/" ~ factor)
def factor: Parser[Any] = "(" ~ expr ~ ")" | numericLit

This is perhaps a little less elegant, but allows us to encode it directly into our language, which is often useful for interop.

The ~, |, rep and opt are parser combinators. These are primitives with which we can construct a full parser for the grammar of our choice.

Boilerplate

First, let’s define a class ParseResult[T] as an ad-hoc monad; parsing can either succeed or fail:

sealed trait ParseResult[+T]
case class Success[T](result: T, in: Input) extends ParseResult[T]
case class Failure(msg: String, in: Input) extends ParseResult[Nothing]

👉 Nothing is the bottom type in Scala; it contains no members, and nothing can extend it

Let’s also define the tokens produced by the lexer (which we won’t define) as case classes extending Token:

sealed trait Token
case class Keyword(chars: String) extends Token
case class NumericLit(chars: String) extends Token
case class StringLit(chars: String) extends Token
case class Identifier(chars: String) extends Token

Input into the parser is then a lazy stream of tokens (with positions for error diagnostics, which we’ll omit here):

type Input = Reader[Token]

We can then define the standard token parser class, which at the type level looks as follows:

class StandardTokenParsers {
    type Parser[T] = Input => ParseResult[T]
}

The basic idea

For each language (defined by a grammar symbol S), define a function f that, given an input stream i (with tail i'):

  • if a prefix of i is in S, return Success(x, i'), where x is a result for S and i' is the remaining input
  • otherwise, return Failure(msg, i), where msg is an error message string

The first is called success, the second is failure. We can compose operations on this somewhat conveniently, like we would on a monad (like Option).

Simple parser primitives

All of the above boilerplate allows us to define a parser, which succeeds if the first token in the input satisfies some given predicate pred. When it succeeds, it reads the token string, and splits the input there.

def token(kind: String)(pred: Token => Boolean) = new Parser[String] {
    def apply(in: Input) =
        if (pred(in.head)) Success(in.head.chars, in.tail)
        else Failure(kind + " expected", in)
}

We can use this to define a keyword parser:

implicit def keyword(chars: String) = token("'" + chars + "'") {
    case Keyword(chars1) => chars == chars1
    case _ => false
}

Marking it as implicit allows us to write keywords as normal strings, where we can omit the keyword call (this helps us simplify the notation in our DSL; we can write "if" instead of keyword("if")).

We can make other parsers for our other case classes quite simply:

def numericLit = token("number")(_.isInstanceOf[NumericLit])
def stringLit = token("string literal")(_.isInstanceOf[StringLit])
def ident = token("identifier")(_.isInstanceOf[Identifier])

Parser combinators

We are going to define the following parser combinators:

  • ~: sequential composition
  • <~, ~>: sequential composition, keeping only the left / right result
  • |: alternative
  • opt(X): option (like a ? quantifier in a regex)
  • rep(X): repetition (like a * quantifier in a regex)
  • repsep(P, Q): interleaved repetition
  • ^^: result conversion (like a map on an Option)
  • ^^^: constant result (like a map on an Option, but returning a constant value regardless of result)

But first, we’ll write some very basic parser combinators: success and failure, that respectively always succeed and always fail:

def success[T](result: T) = new Parser[T] {
    def apply(in: Input) = Success(result, in)
}

def failure(msg: String) = new Parser[Nothing] {
    def apply(in: Input) = Failure(msg, in)
}

All of the above are methods on a Parser[T] class. Thanks to infix space notation in Scala, we can denote x.y(z) as x y z, which allows us to simplify our DSL notation; for instance A ~ B corresponds to A.~(B).

abstract class Parser[T] {
    // An abstract method that defines the parser function
    def apply(in: Input): ParseResult[T]

    // Sequential composition: parse with this parser, then with rhs on the remaining input
    def ~[U](rhs: Parser[U]) = new Parser[T ~ U] {
        def apply(in: Input) = Parser.this(in) match {
            case Success(x, tail) => rhs(tail) match {
                case Success(y, rest) => Success(new ~(x, y), rest)
                case f: Failure => f
            }
            case f: Failure => f
        }
    }

    // Alternative: try this parser; if it fails, try rhs on the same input
    def |(rhs: => Parser[T]) = new Parser[T] {
        def apply(in: Input) = Parser.this(in) match {
            case s1 @ Success(_, _) => s1
            case _ => rhs(in)
        }
    }

    // Result conversion: apply f to the result on success
    def ^^[U](f: T => U) = new Parser[U] {
        def apply(in: Input) = Parser.this(in) match {
            case Success(x, tail) => Success(f(x), tail)
            case f: Failure => f
        }
    }

    // Constant result: replace the result with r on success
    def ^^^[U](r: U): Parser[U] = ^^(x => r)
}

👉 In Scala, T ~ U is syntactic sugar for ~[T, U], which is the type of the case class we’ll define below

For the ~ combinator, when everything works, we’re using ~, a case class that is equivalent to Pair, but prints the way we want to and allows for the concise type-level notation above.

case class ~[T, U](_1 : T, _2 : U) {
    override def toString = "(" + _1 + " ~ " + _2 +")"
}

At this point, we thus have two different meanings for ~: a function ~ that produces a Parser, and the ~(a, b) case class pair that this parser returns (all of this is encoded in the function signature of the ~ function).

Note that the | combinator takes the right-hand side parser as a call-by-name argument. This is because we don’t want to evaluate it unless it is strictly needed—that is, if the left-hand side fails.

^^ is like a map operation on Option; P ^^ f succeeds iff P succeeds, in which case it applies the transformation f on the result of P. Otherwise, it fails.

Shorthands

We can now define shorthands for common combinations of parser combinators:

def opt[T](p : Parser[T]): Parser[Option[T]] = p ^^ Some | success(None)

def rep[T](p : Parser[T]): Parser[List[T]] = 
    p ~ rep(p) ^^ { case x ~ xs => x :: xs } | success(Nil)

def repsep[T, U](p : Parser[T], q : Parser[U]): Parser[List[T]] = 
    p ~ rep(q ~> p) ^^ { case r ~ rs => r :: rs } | success(Nil)

Note that none of the above can fail. They may, however, return None or Nil wrapped in success.

As an exercise, we can implement the rep1(P) parser combinator, which corresponds to the + regex quantifier:

def rep1[T](p: Parser[T]): Parser[List[T]] = p ~ rep(p) ^^ { case x ~ xs => x :: xs }

Example: JSON parser

We did not mention lexical.delimiters and lexical.reserved in the above, and for the sake of brevity, we omit the implementation of stringLit and numericLit.

object JSON extends StandardTokenParsers {
    lexical.delimiters += ("{", "}", "[", "]", ":", ",")
    lexical.reserved += ("null", "true", "false")

    // Returns a Map
    def obj: Parser[Any] = "{" ~> repsep(member, ",") <~ "}" ^^ (ms => Map() ++ ms)

    // Returns a List
    def arr: Parser[Any] = "[" ~> repsep(value, ",") <~ "]"

    // Returns a name/value pair:
    def member: Parser[(String, Any)] = stringLit ~ ":" ~ value ^^ {
        case name ~ ":" ~ value => (name, value)
    }

    // Returns the correct Scala type
    def value: Parser[Any] =
          obj
        | arr
        | stringLit
        | numericLit ^^ (_.toInt)
        | "null" ^^^ null
        | "true" ^^^ true
        | "false" ^^^ false
}
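To run this parser, we can hand it a token stream produced by the lexer. Here is a sketch of what that might look like, assuming JSON extends the real scala.util.parsing.combinator.syntactical.StandardTokenParsers (which provides lexical, phrase and the Scanner) rather than our minimal sketch above; the input string is made up for the example:

object Main extends App {
    import JSON._
    val input = """{ "name": "combinators", "answer": 42 }"""
    // phrase(p) succeeds only if p consumes the entire input
    phrase(value)(new lexical.Scanner(input)) match {
        case Success(result, _) => println(result)   // a Map with the two members
        case noSuccess          => println(noSuccess)
    }
}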

The trouble with left-recursion

Parser combinators work top-down and therefore do not allow for left-recursion. For example, the following would go into an infinite loop, where the parser keeps recursively calling expr without ever consuming a token:

def expr = expr ~ "-" ~ term

Let’s take a look at an arithmetic expression parser:

object Arithmetic extends StandardTokenParsers {
    lexical.delimiters ++= List("(", ")", "+", "-", "*", "/")
    def expr: Parser[Any] = term ~ rep("+" ~ term | "-" ~ term)
    def term: Parser[Any] = factor ~ rep("*" ~ factor | "/" ~ factor)
    def factor: Parser[Any] = "(" ~ expr ~ ")" | numericLit
}

This definition of expr, namely term ~ rep("+" ~ term | "-" ~ term), produces a right-leaning tree. For instance, 1 - 2 - 3 produces 1 ~ List("-" ~ 2, "-" ~ 3).

The solution is to combine calls to rep with a final foldLeft on the list:

object Arithmetic extends StandardTokenParsers {
    lexical.delimiters ++= List("(", ")", "+", "-", "*", "/")
    def expr: Parser[Int] = term ~ rep("+" ~ term | "-" ~ term) ^^ reduceList
    def term: Parser[Int] = factor ~ rep("*" ~ factor | "/" ~ factor) ^^ reduceList
    def factor: Parser[Int] = "(" ~> expr <~ ")" | numericLit ^^ (_.toInt)

    private def reduceList(list: Int ~ List[String ~ Int]): Int = list match {
        case x ~ xs => (xs foldLeft x)(reduce)
    }

    private def reduce(x: Int, r: String ~ Int) = r match {
        case "+" ~ y => x + y
        case "-" ~ y => x - y
        case "*" ~ y => x * y
        case "/" ~ y => x / y
        case _ => throw new MatchError("illegal case: " + r)
    }
}

👉 It used to be that the standard library contained the parser combinators, but they are now a separate module (scala-parser-combinators). This module contains a chainl1 (chain-left) method that does the rep-and-fold for you.
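For illustration, here is a sketch of the same arithmetic parser written with chainl1 from scala-parser-combinators; the object name and the Int-returning parsers are assumptions carried over from the version above:

import scala.util.parsing.combinator.syntactical.StandardTokenParsers

object ArithmeticChainl extends StandardTokenParsers {
    lexical.delimiters ++= List("(", ")", "+", "-", "*", "/")

    // chainl1(p, q) parses one or more p's separated by q's, and folds the
    // results to the left using the binary functions produced by q
    def expr: Parser[Int] = chainl1(term, ("+" ^^^ ((x: Int, y: Int) => x + y))
                                        | ("-" ^^^ ((x: Int, y: Int) => x - y)))
    def term: Parser[Int] = chainl1(factor, ("*" ^^^ ((x: Int, y: Int) => x * y))
                                          | ("/" ^^^ ((x: Int, y: Int) => x / y)))
    def factor: Parser[Int] = "(" ~> expr <~ ")" | numericLit ^^ (_.toInt)
}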

Arithmetic expressions — abstract syntax and proof principles

This section follows Chapter 3 in TAPL.

Basics of induction

Ordinary induction is simply:

Suppose P is a predicate on natural numbers.
Then:
	If P(0)
	and, for all i, P(i) implies P(i + 1)
	then P(n) holds for all n

We can also do complete induction:

Suppose P is a predicate on natural numbers.
Then:
	If for each natural number n,
	given P(i) for all i < n we can show P(n)
	then P(n) holds for all n

It proves exactly the same thing as ordinary induction; it is simply a restated version. The two are interderivable: assuming one, we can prove the other. Which one to use is simply a matter of style or convenience. We'll see some more equivalent styles as we go along.

Mathematical representation of syntax

Let’s assume the following grammar:

t ::= 
    true
    false
    if t then t else t
    0
    succ t
    pred t
    iszero t

What does this really define? A few suggestions:

  • A set of character strings
  • A set of token lists
  • A set of abstract syntax trees

It depends on how you read it; a grammar like the one above contains information about all three.

However, we are mostly interested in the ASTs. The above grammar is therefore called an abstract grammar. Its main purpose is to suggest a mapping from character strings to trees.

We won't be too strict in our use of this grammar. For instance, we'll freely use parentheses to disambiguate which tree we mean to describe, even though they're not strictly supported by the grammar. What matters to us here aren't strict implementation semantics, but rather having a framework to talk about ASTs. For our purposes, we'll consider two terms that produce the same AST to be basically the same; still, we'll distinguish terms that only have the same evaluation result, as they don't necessarily have the same AST.

How can we express our grammar as mathematical expressions? A grammar describes the legal set of terms in a program by offering a recursive definition. While recursive definitions may seem obvious and simple to a programmer, we have to go through a few hoops to make sense of them mathematically.

Mathematical representation 1

We can define the set of terms $\mathcal{T}$ as the smallest set closed under the following rules:

  1. $\{\text{true}, \text{false}, 0\} \subseteq \mathcal{T}$,
  2. if $t_1 \in \mathcal{T}$, then $\{\text{succ } t_1, \text{pred } t_1, \text{iszero } t_1\} \subseteq \mathcal{T}$,
  3. if $t_1 \in \mathcal{T}$, $t_2 \in \mathcal{T}$ and $t_3 \in \mathcal{T}$, then we also have $\text{if } t_1 \text{ then } t_2 \text{ else } t_3 \in \mathcal{T}$.

Mathematical representation 2

We can also write the same rules somewhat more graphically, as axioms and inference rules:

$$\text{true} \in \mathcal{T} \qquad \text{false} \in \mathcal{T} \qquad 0 \in \mathcal{T}$$

$$\frac{t_1 \in \mathcal{T}}{\text{succ } t_1 \in \mathcal{T}} \qquad \frac{t_1 \in \mathcal{T}}{\text{pred } t_1 \in \mathcal{T}} \qquad \frac{t_1 \in \mathcal{T}}{\text{iszero } t_1 \in \mathcal{T}} \qquad \frac{t_1 \in \mathcal{T} \quad t_2 \in \mathcal{T} \quad t_3 \in \mathcal{T}}{\text{if } t_1 \text{ then } t_2 \text{ else } t_3 \in \mathcal{T}}$$

This is exactly equivalent to representation 1; we have just introduced a different notation. Note that "the smallest set closed under these rules" is often not stated explicitly, but implied.

Mathematical representation 3

Alternatively, we can build up our set of terms as an infinite union. We define a sequence of sets:

$$S_0 = \emptyset$$
$$S_{i+1} = \{\text{true}, \text{false}, 0\} \ \cup\ \{\text{succ } t_1, \text{pred } t_1, \text{iszero } t_1 \mid t_1 \in S_i\} \ \cup\ \{\text{if } t_1 \text{ then } t_2 \text{ else } t_3 \mid t_1, t_2, t_3 \in S_i\}$$

We can thus build our final set as follows:

$$S = \bigcup_i S_i$$

Note that we can "pull out" the definition of $S_{i+1}$ into a generating function $F$, so that $S_{i+1} = F(S_i)$.

The generating function is thus defined as:

$$F(U) = \{\text{true}, \text{false}, 0\} \ \cup\ \{\text{succ } t_1, \text{pred } t_1, \text{iszero } t_1 \mid t_1 \in U\} \ \cup\ \{\text{if } t_1 \text{ then } t_2 \text{ else } t_3 \mid t_1, t_2, t_3 \in U\}$$

The function $F$ takes a set of terms $U$ as input and produces "terms justified by $U$" as output; that is, all terms that have the elements of $U$ as subterms.

A set $U$ is said to be closed under F, or F-closed, if $F(U) \subseteq U$.

The set of terms $\mathcal{T}$ as defined above is the smallest F-closed set. If $O$ is another F-closed set, then $\mathcal{T} \subseteq O$.
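To make the generating function concrete, here is a small Scala sketch (not from the notes): the toy grammar as an ADT, and F applied to build the cumulative sets from below:

sealed trait Term
case object True extends Term
case object False extends Term
case object Zero extends Term
case class Succ(t: Term) extends Term
case class Pred(t: Term) extends Term
case class IsZero(t: Term) extends Term
case class If(t1: Term, t2: Term, t3: Term) extends Term

// F(U) = constants ∪ one-argument terms over U ∪ if-terms over U
def F(s: Set[Term]): Set[Term] =
    Set[Term](True, False, Zero) ++
    s.flatMap(t => Set[Term](Succ(t), Pred(t), IsZero(t))) ++
    (for (t1 <- s; t2 <- s; t3 <- s) yield If(t1, t2, t3))

// S_0 = ∅, S_{i+1} = F(S_i); the set of all terms is the union over all i
val s1 = F(Set.empty)   // exactly the three constants
val s2 = F(s1)          // all terms of depth at most 2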

Comparison of the representations

We’ve seen essentially two ways of defining the set (as representation 1 and 2 are equivalent, but with different notation):

  1. The smallest set that is closed under certain rules. This is compact and easy to read.
  2. The limit of a series of sets. This gives us an induction principle on which we can prove things on terms by induction.

The first one defines the set “from above”, by intersecting F-closed sets.

The second one defines it "from below", by starting with $\emptyset$ and getting closer and closer to being F-closed.

These are equivalent (we won’t prove it, but Proposition 3.2.6 in TAPL does so), but can serve different uses in practice.

Induction on terms

First, let's define depth: the depth of a term $t$ is the smallest $i$ such that $t \in S_i$.

The way we defined $S_i$, it gets larger and larger for increasing $i$; the depth of a term gives us the step at which $t$ is introduced into the set.

We see that if a term $t$ is in $S_i$, then all of its immediate subterms must be in $S_{i-1}$, meaning that they must have smaller depth.

This justifies the principle of induction on terms, or structural induction. Let P be a predicate on a term:

If, for each term s,
    given P(r) for all immediate subterms r of s we can show P(s)
    then P(t) holds for all t

All this says is that if we can prove the induction step from subterms to terms (under the induction hypothesis), then we have proven the induction.

We can also express this structural induction using generating functions, which we introduced previously.

Suppose T is the smallest F-closed set.
If, for each set U,
    from the assumption "P(u) holds for every u ∈ U",
    we can show that "P(v) holds for every v ∈ F(U)"
then
    P(t) holds for all t ∈ T

Why can we use this?

  • We assumed that $T$ was the smallest F-closed set, which means that $T \subseteq O$ for any other F-closed set $O$.
  • Showing the pre-condition ("for each set $U$, from the assumption…") amounts to showing that the set of all terms satisfying $P$ (call it $O$) is itself an F-closed set.
  • Since $T \subseteq O$, every element of $T$ satisfies $P$.

Inductive function definitions

An inductive definition is used to define the elements in a set recursively, as we have done above. The recursion theorem states that a well-formed inductive definition defines a function. To understand what being well-formed means, let’s take a look at some examples.

Let's define our grammar function a little more formally. Constants are the basic values that can't be expanded further; in our example, they are true, false and 0. The set of constants appearing in a term $t$, written $\text{Consts}(t)$, is defined recursively as follows:

$$\text{Consts}(\text{true}) = \{\text{true}\} \qquad \text{Consts}(\text{false}) = \{\text{false}\} \qquad \text{Consts}(0) = \{0\}$$
$$\text{Consts}(\text{succ } t_1) = \text{Consts}(\text{pred } t_1) = \text{Consts}(\text{iszero } t_1) = \text{Consts}(t_1)$$
$$\text{Consts}(\text{if } t_1 \text{ then } t_2 \text{ else } t_3) = \text{Consts}(t_1) \cup \text{Consts}(t_2) \cup \text{Consts}(t_3)$$

This seems simple, but these semantics aren’t perfect. First off, a mathematical definition simply assigns a convenient name to some previously known thing. But here, we’re defining the thing in terms of itself, recursively. And the semantics above also allow us to define ill-formed inductive definitions:

The last rule produces infinitely large rules (if we implemented it, we’d expect some kind of stack overflow). We’re missing the rules for if-statements, and we have a useless rule for 0, producing empty sets.

How do we tell the difference between a well-formed inductive definition, and an ill-formed one as above? What is well-formedness anyway?

What is a function?

A relation over two sets $A$ and $B$ is a subset of $A \times B$, where the Cartesian product is defined as:

$$A \times B = \{(a, b) \mid a \in A,\ b \in B\}$$

A function $f$ from $A$ (domain) to $B$ (co-domain) can be viewed as a two-place relation $R \subseteq A \times B$, albeit with two additional properties:

  • It is total: $\forall a \in A,\ \exists b \in B$ such that $(a, b) \in R$
  • It is deterministic: $(a, b_1) \in R \ \wedge\ (a, b_2) \in R \implies b_1 = b_2$

Totality ensures that the whole domain $A$ is covered, while being deterministic just means that the function always produces the same result for a given input.

Induction example 1

As previously stated, $\text{Consts}$ is a relation: it relates terms ($A$) to the sets of constants they contain ($B$). The recursion theorem states that it is also a function. The proof is as follows.

$\text{Consts}$ is total and deterministic: for each term $t$ there is exactly one set of terms $C$ such that $(t, C) \in \text{Consts}$. The proof is done by induction on $t$.

To be able to apply the induction principle for terms, we must show, for an arbitrary term $t$, under the following induction hypothesis:

For each immediate subterm $s$ of $t$, there is exactly one set of terms $C_s$ such that $(s, C_s) \in \text{Consts}$

Then the following needs to be proven as an induction step:

There is exactly one set of terms $C$ such that $(t, C) \in \text{Consts}$

We proceed by cases on $t$:

  • If $t$ is true, false or 0

    We can immediately see from the definition of $\text{Consts}$ that there is exactly one set of terms $C$ (namely $\{t\}$) such that $(t, C) \in \text{Consts}$.

    This constitutes our base case.

  • If $t$ is succ $t_1$, pred $t_1$ or iszero $t_1$

    The immediate subterm of $t$ is $t_1$, and the induction hypothesis tells us that there is exactly one set of terms $C_1$ such that $(t_1, C_1) \in \text{Consts}$. But then it is clear from the definition that there is exactly one set $C = C_1$ such that $(t, C) \in \text{Consts}$.

  • If $t$ is if $t_1$ then $t_2$ else $t_3$

    The induction hypothesis tells us:

    • There is exactly one set of terms $C_1$ such that $(t_1, C_1) \in \text{Consts}$
    • There is exactly one set of terms $C_2$ such that $(t_2, C_2) \in \text{Consts}$
    • There is exactly one set of terms $C_3$ such that $(t_3, C_3) \in \text{Consts}$

    It is clear from the definition of $\text{Consts}$ that there is exactly one set $C = C_1 \cup C_2 \cup C_3$ such that $(t, C) \in \text{Consts}$.

This proves that $\text{Consts}$ is indeed a function.

But what about ? It is also a relation, but it isn’t a function. For instance, we have and , which violates determinism. To reformulate this in terms of the above, there are two sets such that , namely and .

Note that there are many other problems with , but this is sufficient to prove that it isn’t a function.

Induction example 2

Let's introduce another inductive definition, the size of a term:

$$\text{size}(\text{true}) = \text{size}(\text{false}) = \text{size}(0) = 1$$
$$\text{size}(\text{succ } t_1) = \text{size}(\text{pred } t_1) = \text{size}(\text{iszero } t_1) = \text{size}(t_1) + 1$$
$$\text{size}(\text{if } t_1 \text{ then } t_2 \text{ else } t_3) = \text{size}(t_1) + \text{size}(t_2) + \text{size}(t_3) + 1$$

We'd like to prove that the number of distinct constants in a term is at most the size of the term; in other words, that $|\text{Consts}(t)| \leq \text{size}(t)$.

The proof is by induction on $t$:

  • $t$ is a constant: true, false or 0

    The proof is immediate. For constants, the number of constants and the size are both one:

    $$|\text{Consts}(t)| = |\{t\}| = 1 = \text{size}(t)$$

  • $t$ is succ $t_1$, pred $t_1$ or iszero $t_1$

    By the induction hypothesis, $|\text{Consts}(t_1)| \leq \text{size}(t_1)$.

    We can then prove the proposition as follows:

    $$|\text{Consts}(t)| = |\text{Consts}(t_1)| \leq \text{size}(t_1) < \text{size}(t_1) + 1 = \text{size}(t)$$

  • $t$ is an if-statement: if $t_1$ then $t_2$ else $t_3$

    By the induction hypothesis, $|\text{Consts}(t_i)| \leq \text{size}(t_i)$ for $i = 1, 2, 3$.

    We can then prove the proposition as follows:

    $$|\text{Consts}(t)| \leq |\text{Consts}(t_1)| + |\text{Consts}(t_2)| + |\text{Consts}(t_3)| \leq \text{size}(t_1) + \text{size}(t_2) + \text{size}(t_3) < \text{size}(t)$$
Operational semantics and reasoning

Evaluation

Suppose we have the following syntax

t ::=                  // terms
    true                   // constant true
    false                  // constant false 
    if t then t else t     // conditional

The evaluation relation $t \to t'$ is the smallest relation closed under the following rules.

The following are computation rules, defining the "real" computation steps:

$$\text{if true then } t_2 \text{ else } t_3 \to t_2 \qquad (\text{E-IfTrue})$$
$$\text{if false then } t_2 \text{ else } t_3 \to t_3 \qquad (\text{E-IfFalse})$$

The following is a congruence rule, defining where the computation rules may be applied next:

$$\frac{t_1 \to t_1'}{\text{if } t_1 \text{ then } t_2 \text{ else } t_3 \to \text{if } t_1' \text{ then } t_2 \text{ else } t_3} \qquad (\text{E-If})$$

We evaluate the condition before the branches in order to save on evaluation: until we know the condition's value, we don't know which branch will be needed, so we evaluate the condition first.

Derivations

We can describe the evaluation logically from the above rules using derivation trees. Suppose we want to evaluate the following (with parentheses added for clarity): if (if true then true else false) then false else true.

In an attempt to make all this fit onto the screen, true and false have been abbreviated T and F in the derivation below, and the then keyword has been replaced with a parenthesis notation for the condition.
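Reconstructed here as a sketch using those abbreviations, the derivation applies E-IfTrue at the top and E-If below it:

$$\dfrac{\dfrac{}{\text{if (T) T else F} \ \to\ \text{T}}\ \text{E-IfTrue}}{\text{if (if (T) T else F) F else T} \ \to\ \text{if (T) F else T}}\ \text{E-If}$$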

The final statement is a conclusion. We say that the derivation is a witness for its conclusion (or a proof for its conclusion). The derivation records all reasoning steps that lead us to the conclusion.

Inversion lemma

We can introduce the inversion lemma, which tells us how we got to a term.

Suppose we are given a derivation $\mathcal{D}$ witnessing the pair $(t, t')$ in the evaluation relation. Then either:

  1. If the final rule applied in $\mathcal{D}$ was E-IfTrue, then we have $t = \text{if true then } t_2 \text{ else } t_3$ and $t' = t_2$, for some $t_2$ and $t_3$
  2. If the final rule applied in $\mathcal{D}$ was E-IfFalse, then we have $t = \text{if false then } t_2 \text{ else } t_3$ and $t' = t_3$, for some $t_2$ and $t_3$
  3. If the final rule applied in $\mathcal{D}$ was E-If, then we have $t = \text{if } t_1 \text{ then } t_2 \text{ else } t_3$ and $t' = \text{if } t_1' \text{ then } t_2 \text{ else } t_3$, for some $t_1$, $t_1'$, $t_2$ and $t_3$. Moreover, the immediate subderivation of $\mathcal{D}$ witnesses $t_1 \to t_1'$.

This is super boring, but we do need to acknowledge the inversion lemma before we can do induction proofs on derivations. Thanks to the inversion lemma, given an arbitrary derivation $\mathcal{D}$ with conclusion $t \to t'$, we can proceed with a case-by-case analysis on the final rule used in the derivation tree.

Let's recall our definition of the size function. In particular, we'll need the rule for if-statements:

$$\text{size}(\text{if } t_1 \text{ then } t_2 \text{ else } t_3) = \text{size}(t_1) + \text{size}(t_2) + \text{size}(t_3) + 1$$

We want to prove that if $t \to t'$, then $\text{size}(t') < \text{size}(t)$.

  1. If the final rule applied in $\mathcal{D}$ was E-IfTrue, then we have $t = \text{if true then } t_2 \text{ else } t_3$ and $t' = t_2$, and the result is immediate from the definition of size
  2. If the final rule applied in $\mathcal{D}$ was E-IfFalse, then we have $t = \text{if false then } t_2 \text{ else } t_3$ and $t' = t_3$, and the result is immediate from the definition of size
  3. If the final rule applied in $\mathcal{D}$ was E-If, then we have $t = \text{if } t_1 \text{ then } t_2 \text{ else } t_3$ and $t' = \text{if } t_1' \text{ then } t_2 \text{ else } t_3$. In this case, $t_1 \to t_1'$ is witnessed by a subderivation $\mathcal{D}_1$. By the induction hypothesis, $\text{size}(t_1') < \text{size}(t_1)$, and the result is then immediate from the definition of size

Abstract machines

An abstract machine consists of:

  • A set of states $S$
  • A transition relation on states, written $\to$, with $\to\ \subseteq S \times S$

$t \to t'$ means that $t$ evaluates to $t'$ in one step. Note that $\to$ is a relation, and that $t \to t'$ is shorthand for $(t, t') \in\ \to$. Often, this relation is a partial function (not necessarily covering the whole domain $S$, with at most one possible next state). But without loss of generality, there may be many possible next states; determinism isn't a criterion here.

Normal forms

A normal form is a term that cannot be evaluated any further. More formally, a term $t$ is a normal form if there is no $t'$ such that $t \to t'$. A normal form is a state where the abstract machine is halted; we can regard it as the result of a computation.

Values that are normal form

Previously, we intended for our values (true and false) to be exactly that, the result of a computation. Did we get that right?

Let's prove that a term $t$ is a value if and only if it is in normal form.

  • The ($\Rightarrow$) direction (every value is a normal form) is immediate from the definition of the evaluation relation $\to$: no rule evaluates true or false.
  • The ($\Leftarrow$) direction is more conveniently proven as its contrapositive: if $t$ is not a value, then it is not a normal form, which we can prove by induction on the term $t$.

    Since $t$ is not a value, it must be of the form $\text{if } t_1 \text{ then } t_2 \text{ else } t_3$. If $t_1$ is directly true or false, then E-IfTrue or E-IfFalse applies, and we are done.

    Otherwise, if $t_1$ isn't a value, then by the induction hypothesis there is a $t_1'$ such that $t_1 \to t_1'$. Then rule E-If yields $t \to \text{if } t_1' \text{ then } t_2 \text{ else } t_3$, which proves that $t$ is not in normal form.

Values that are not normal form

Let’s introduce new syntactic forms, with new evaluation rules.

t ::=        // terms
    0            // constant 0
    succ t       // successor
    pred t       // predecessor 
    iszero t     // zero test

v ::=  nv     // values

nv ::=        // numeric values
    0             // zero value
    succ nv       // successor value

The evaluation rules are given as follows:

$$\frac{t_1 \to t_1'}{\text{succ } t_1 \to \text{succ } t_1'} \ (\text{E-Succ}) \qquad \text{pred } 0 \to 0 \ (\text{E-PredZero}) \qquad \text{pred } (\text{succ } nv_1) \to nv_1 \ (\text{E-PredSucc}) \qquad \frac{t_1 \to t_1'}{\text{pred } t_1 \to \text{pred } t_1'} \ (\text{E-Pred})$$

$$\text{iszero } 0 \to \text{true} \ (\text{E-IszeroZero}) \qquad \text{iszero } (\text{succ } nv_1) \to \text{false} \ (\text{E-IszeroSucc}) \qquad \frac{t_1 \to t_1'}{\text{iszero } t_1 \to \text{iszero } t_1'} \ (\text{E-Iszero})$$

All values are still normal forms. But are all normal forms values? Not in this case. For instance, succ true, iszero true, etc, are normal forms. These are stuck terms: they are in normal form, but are not values. In general, these correspond to some kind of type error, and one of the main purposes of a type system is to rule these kinds of situations out.

Multi-step evaluation

Let's introduce the multi-step evaluation relation, $\to^*$. It is the reflexive, transitive closure of single-step evaluation, i.e. the smallest relation closed under these rules:

$$\frac{t \to t'}{t \to^* t'} \qquad\quad t \to^* t \qquad\quad \frac{t \to^* t' \quad t' \to^* t''}{t \to^* t''}$$

In other words, it corresponds to any number of single consecutive evaluations.

Termination of evaluation

We'll prove that evaluation terminates, i.e. that for every term $t$ there is some normal form $t'$ such that $t \to^* t'$.

First, let's recall our result that if $t \to t'$, then $\text{size}(t') < \text{size}(t)$. Now, for our proof by contradiction, assume that we have an infinite-length sequence $t_0, t_1, t_2, \dots$ such that:

$$t_0 \to t_1 \to t_2 \to \dots \qquad \text{and therefore} \qquad \text{size}(t_0) > \text{size}(t_1) > \text{size}(t_2) > \dots$$

But this sequence cannot exist: since $\text{size}(t_0)$ is a finite natural number, we cannot construct an infinite descending chain from it. This is a contradiction.

Most termination proofs have the same basic form. We want to prove that a relation $R \subseteq X \times X$ is terminating, that is, that there are no infinite sequences $x_0, x_1, x_2, \dots$ such that $(x_i, x_{i+1}) \in R$ for each $i$. We proceed as follows:

  1. Choose a well-suited set $W$ with a partial order $<$ such that there are no infinite descending chains $w_0 > w_1 > w_2 > \dots$ in $W$. Also choose a function $f : X \to W$.
  2. Show that $(x, x') \in R$ implies $f(x') < f(x)$.
  3. Conclude that there are no infinite sequences $x_0, x_1, x_2, \dots$ such that $(x_i, x_{i+1}) \in R$ for each $i$. If there were, we could construct an infinite descending chain in $W$.

As a side-note, a partial order $\leq$ is defined by the following properties:

  1. Anti-symmetry: $x \leq y \ \wedge\ y \leq x \implies x = y$
  2. Transitivity: $x \leq y \ \wedge\ y \leq z \implies x \leq z$

We can add a third property to achieve a total order, namely that for all $x$ and $y$, either $x \leq y$ or $y \leq x$.

Lambda calculus

Lambda calculus is Turing complete, and is higher-order (functions are data). In lambda calculus, all computation happens by means of function abstraction and application.

Lambda calculus is equivalent in expressive power to Turing machines.

Suppose we wanted to write a function plus3 in our previous language:

plus3 x = succ succ succ x

The way we write this in lambda calculus is:

$$\text{plus3} = \lambda x.\ \text{succ (succ (succ } x\text{))}$$

An abstraction $\lambda x.\ t$ is written x => t in Scala, or fun x -> t in OCaml. Application of our function, say plus3(succ 0), can be written as:

$$(\lambda x.\ \text{succ (succ (succ } x\text{))})\ (\text{succ } 0)$$

Abstraction over functions is possible using higher-order functions, which we call λ-abstractions. An example of such an abstraction is a function that takes an argument $f$ and uses it in the function position, i.e. applies it to something.

If we apply such an abstraction to a concrete argument, we can just use the substitution rule to see how that defines a new function.

Another example: the double function below takes two arguments, as a curried function would. First, it takes the function $f$ to apply twice, then the argument $x$ on which to apply it, and then returns $f\ (f\ x)$:

$$\text{double} = \lambda f.\ \lambda x.\ f\ (f\ x)$$

Pure lambda calculus

Once we have -abstractions, we can actually throw out all other language primitives like booleans and other values; all of these can be expressed as functions, as we’ll see below. In pure lambda-calculus, everything is a function.

Variables will always denote a function, functions always take other functions as parameters, and the result of an evaluation is always a function.

The syntax of lambda-calculus is very simple:

t ::=      // terms, also called λ-terms
    x         // variable
    λx. t     // abstraction, also called λ-abstractions
    t t       // application

A few rules and syntactic conventions:

  • Application associates to the left, so $t_1\ t_2\ t_3$ means $(t_1\ t_2)\ t_3$, not $t_1\ (t_2\ t_3)$.
  • Bodies of lambda abstractions extend as far to the right as possible, so $\lambda x.\ \lambda y.\ x\ y\ x$ means $\lambda x.\ (\lambda y.\ ((x\ y)\ x))$, not $\lambda x.\ (\lambda y.\ (x\ y))\ x$

Scope

The lambda expression $\lambda x.\ t$ binds the variable $x$, with a scope limited to $t$. Occurrences of $x$ inside of $t$ are said to be bound, while occurrences outside are said to be free.

Let $FV(t)$ be the set of free variables in a term $t$. It's defined as follows:

$$FV(x) = \{x\} \qquad FV(\lambda x.\ t_1) = FV(t_1) \setminus \{x\} \qquad FV(t_1\ t_2) = FV(t_1) \cup FV(t_2)$$

Operational semantics

As we saw with our previous language, the rules can be divided into computation and congruence rules. For lambda calculus, the only computation rule is:

$$(\lambda x.\ t_1)\ v_2 \to [x \mapsto v_2]\ t_1 \qquad (\text{E-AppAbs})$$

The notation $[x \mapsto v_2]\ t_1$ means "the term that results from substituting free occurrences of $x$ in $t_1$ with $v_2$".

The congruence rules are:

$$\frac{t_1 \to t_1'}{t_1\ t_2 \to t_1'\ t_2} \ (\text{E-App1}) \qquad \frac{t_2 \to t_2'}{v_1\ t_2 \to v_1\ t_2'} \ (\text{E-App2})$$

A lambda-abstraction applied to a value, $(\lambda x.\ t)\ v$, is called a reducible expression, or redex.
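As a quick worked example (not from the notes), here is how a small term reduces under these call-by-value rules:

$$(\lambda x.\ x\ x)\ (\lambda y.\ y) \ \to\ (\lambda y.\ y)\ (\lambda y.\ y) \ \to\ \lambda y.\ y$$

The first step applies the computation rule E-AppAbs to the outer redex, and the second step applies it again to the resulting redex.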

Evaluation strategies

There are alternative evaluation strategies. In the above, we have chosen call by value (which is the standard in most mainstream languages), but we could also choose:

  • Full beta-reduction: any redex may be reduced at any time. This offers no restrictions, but in practice, we go with a set of restrictions like the ones below (because coding a fixed way is easier than coding probabilistic behavior).
  • Normal order: the leftmost, outermost redex is always reduced first. This strategy allows reduction inside unapplied lambda terms
  • Call-by-name: allows no reductions inside lambda abstractions. Arguments are not reduced before being substituted in the body of lambda terms when applied. Haskell uses an optimized version of this, call-by-need (aka lazy evaluation).

Classical lambda calculus

Classical lambda calculus allows for full beta reduction.

Confluence in full beta reduction

The congruence rules allow us to apply reductions in different ways; every time we reduce an application, we can choose whether to work on the function part or on the argument part first, and this offers many possible reduction paths.

While the path is non-deterministic, is the result also non-deterministic? This question took a very long time to answer, but after 25 years or so, it was proven that the result is always the same. This is known as the Church-Rosser confluence theorem:

Let $t$, $t_1$ and $t_2$ be terms such that $t \to^* t_1$ and $t \to^* t_2$. Then there exists a term $t_3$ such that $t_1 \to^* t_3$ and $t_2 \to^* t_3$.

Alpha conversion

Substitution is actually trickier than it looks! For instance, in the expression , the first occurrence of is bound (it refers to a parameter), while the second is free (it does not refer to a parameter). This is comparable to scope in most programming languages, where we should understand that these are two different variables in different scopes, and .

The above example had a variable that is both bound and free, which is something that we’ll try to avoid. This is called a hygiene condition.

We can transform an unhygienic expression into a hygienic one by renaming bound variables before performing the substitution. This is known as alpha conversion. Alpha conversion is given by the following conversion rule:

$$\lambda x.\ t \ =_\alpha\ \lambda y.\ [x \mapsto y]\ t \qquad \text{if } y \notin FV(t)$$

And these equivalence rules (reflexivity, symmetry and transitivity, as for any equivalence relation):

The congruence rules are as usual.

Programming in lambda-calculus

Multiple arguments

The way to handle multiple arguments is by currying: a two-argument function is written as a function that takes the first argument and returns a function expecting the second, $\lambda x.\ \lambda y.\ t$.

Booleans

The fundamental, universal operator on booleans is if-then-else, which is what we'll replicate to model booleans. We'll denote our booleans as tru and fls to be able to distinguish these pure lambda-calculus abstractions from the true and false values of our previous toy language.

We want tru to be equivalent to if (true), and fls to if (false). The terms tru and fls represent boolean values, in that we can use them to test the truth of a boolean value:

$$\text{tru} = \lambda t.\ \lambda f.\ t \qquad\qquad \text{fls} = \lambda t.\ \lambda f.\ f$$

We can consider these as booleans. Equivalently, tru can be considered as a function performing (t1, t2) => if (true) t1 else t2. To understand this, let's try to apply tru to two arguments:

$$\text{tru}\ v\ w = (\lambda t.\ \lambda f.\ t)\ v\ w \ \to\ (\lambda f.\ v)\ w \ \to\ v$$

This works equivalently for fls.

We can also do inversion, conjunction and disjunction with lambda calculus, which can be read as particular if-else statements (a Scala sketch of these encodings follows the list):

  • not is a function that is equivalent to not(b) = if (b) false else true: $\text{not} = \lambda b.\ b\ \text{fls}\ \text{tru}$
  • and is equivalent to and(b, c) = if (b) c else false: $\text{and} = \lambda b.\ \lambda c.\ b\ c\ \text{fls}$
  • or is equivalent to or(b, c) = if (b) true else c: $\text{or} = \lambda b.\ \lambda c.\ b\ \text{tru}\ c$
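Here is a small Scala sketch of these Church booleans (not from the notes; the names tru, fls, not, and, or mirror the lambda-calculus definitions above):

// A Church boolean is a function that selects one of its two arguments
trait CBool { def apply[A](t: A, f: A): A }

val tru: CBool = new CBool { def apply[A](t: A, f: A) = t }   // selects the first argument
val fls: CBool = new CBool { def apply[A](t: A, f: A) = f }   // selects the second argument

def not(b: CBool): CBool = b(fls, tru)            // not(b) = if (b) false else true
def and(b: CBool, c: CBool): CBool = b(c, fls)    // and(b, c) = if (b) c else false
def or(b: CBool, c: CBool): CBool = b(tru, c)     // or(b, c) = if (b) true else c

// "Testing the truth" of a boolean means applying it to two alternatives:
// tru("then-branch", "else-branch") == "then-branch"
// and(tru, fls)("yes", "no")        == "no"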

Pairs

The fundamental operations are construction pair(a, b), and selection pair._1 and pair._2.

  • pair is equivalent to pair(f, s) = (b => b f s), that is, $\text{pair} = \lambda f.\ \lambda s.\ \lambda b.\ b\ f\ s$
  • When tru is applied to a pair, it selects the first element, by definition of the boolean; that is therefore the definition of fst: $\text{fst} = \lambda p.\ p\ \text{tru}$
  • Equivalently, fls applied to a pair selects the second element: $\text{snd} = \lambda p.\ p\ \text{fls}$

Numbers

We’ve actually been representing numbers as lambda-calculus numbers all along! Our succ function represents what’s more formally called Church numerals.

The first few Church numerals are $c_0 = \lambda s.\ \lambda z.\ z$, $c_1 = \lambda s.\ \lambda z.\ s\ z$, $c_2 = \lambda s.\ \lambda z.\ s\ (s\ z)$, and so on. Note that $c_0$'s implementation is the same as that of fls (just with renamed variables).

Every number $n$ is represented by a term $c_n$ taking two arguments, $s$ and $z$ (for "successor" and "zero"), and applying $s$ to $z$, $n$ times. Fundamentally, the number $n$ is equivalent to the following:

$$c_n = \lambda s.\ \lambda z.\ \underbrace{s\ (s\ (\dots (s}_{n \text{ times}}\ z)\dots))$$

With this in mind, let us implement some functions on numbers.

  • Successor: $\text{scc} = \lambda n.\ \lambda s.\ \lambda z.\ s\ (n\ s\ z)$. We apply the successor function $s$ one more time to $n\ s\ z$ (which is $n$ correctly instantiated with $s$ and $z$)
  • Addition: $\text{plus} = \lambda m.\ \lambda n.\ \lambda s.\ \lambda z.\ m\ s\ (n\ s\ z)$. We pass the instantiated $n$ as the zero of $m$
  • Subtraction: $\text{sub} = \lambda m.\ \lambda n.\ n\ \text{prd}\ m$. We apply the predecessor function prd (defined below) $n$ times to $m$
  • Multiplication: $\text{times} = \lambda m.\ \lambda n.\ m\ (\text{plus}\ n)\ c_0$. Instead of the successor function, we pass the "addition by $n$" function
  • Zero test: $\text{iszero} = \lambda m.\ m\ (\lambda x.\ \text{fls})\ \text{tru}$. Zero has the same implementation as false, so we can lean on that to build an iszero function. An alternative understanding is that we're building a "number" in which we use tru for the zero value $z$. If we have to apply the successor function once or more, we want to get fls, so for the successor function $s$ we use a function that ignores its input and returns fls

What about predecessor? This is a little harder, and it’ll take a few steps to get there. The main idea is that we find the predecessor by rebuilding the whole succession up until our number. At every step, we must generate the number and its predecessor: zero is , and all other numbers are . Once we’ve reconstructed this pair, we can get the predecessor by taking the first element of the pair.
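Concretely, the standard TAPL encoding of this idea, using the pair functions from above, is:

$$\text{zz} = \text{pair}\ c_0\ c_0$$
$$\text{ss} = \lambda p.\ \text{pair}\ (\text{snd}\ p)\ (\text{plus}\ c_1\ (\text{snd}\ p))$$
$$\text{prd} = \lambda m.\ \text{fst}\ (m\ \text{ss}\ \text{zz})$$

Applying ss $n$ times to zz yields the pair $(c_{n-1}, c_n)$, so taking the first component gives the predecessor (with $\text{prd}\ c_0 = c_0$).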

Sidenote

The story goes that Church was stumped by predecessors for a long time. This solution finally came to him while he was at the barber, and he jumped out half shaven to write it down.

Lists

Now what about lists?
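The notes leave this as an open question; one standard answer (a sketch, not developed here) represents a list by its own right fold, much like a numeral is its own iterator:

$$\text{nil} = \lambda c.\ \lambda n.\ n \qquad \text{cons} = \lambda h.\ \lambda t.\ \lambda c.\ \lambda n.\ c\ h\ (t\ c\ n) \qquad \text{isnil} = \lambda l.\ l\ (\lambda h.\ \lambda t.\ \text{fls})\ \text{tru}$$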

Recursion in lambda-calculus

Let’s start by taking a step back. We talked about normal forms and terms for which we terminate; does lambda calculus always terminate? It’s Turing complete, so it must be able to loop infinitely (otherwise, we’d have solved the halting problem!).

The trick to recursion is self-application:

$$\lambda x.\ x\ x$$

From a type-level perspective, we would cringe at this. It should not be possible in the typed world, but in the untyped world we can do it. We can construct a simple infinite loop in lambda calculus as follows:

$$\Omega = (\lambda x.\ x\ x)\ (\lambda x.\ x\ x)$$

The expression $\Omega$ evaluates to itself in one step; it never reaches a normal form: it loops infinitely, it diverges. This is not a stuck term though; evaluation is always possible.

In fact, there are no stuck terms in pure lambda calculus. Every term is either a value or reduces further.

So it turns out that $\Omega$ isn't so terribly useful. Let's try to construct something more practical:

$$Y = \lambda f.\ (\lambda x.\ f\ (x\ x))\ (\lambda x.\ f\ (x\ x))$$

Now, the divergence is a little more interesting:

$$Y\ f \ \to\ (\lambda x.\ f\ (x\ x))\ (\lambda x.\ f\ (x\ x)) \ \to\ f\ ((\lambda x.\ f\ (x\ x))\ (\lambda x.\ f\ (x\ x))) \ \to\ f\ (f\ (\dots))$$

This function is known as the Y combinator. It still loops infinitely under our call-by-value strategy (though it behaves better under full beta reduction or call-by-name), so let's try to build something more useful.

To delay the infinite recursion, we could build something like a poison pill:

$$\text{poisonpill} = \lambda y.\ \Omega$$

It can be passed around (after all, it's just a value), but applying it to an argument will cause our program to loop infinitely. This delaying trick is the core idea we'll use for defining the fixed-point combinator fix, which allows us to do recursion. It's defined as follows:

$$\text{fix} = \lambda f.\ (\lambda x.\ f\ (\lambda y.\ x\ x\ y))\ (\lambda x.\ f\ (\lambda y.\ x\ x\ y))$$
This looks a little intricate, and we won’t need to fully understand the definition. What’s important is mostly how it is used to define a recursive function. For instance, if we wanted to define a modulo function in our toy language, we’d do it as follows:

def mod(x, y) = 
    if (y > x) x
    else mod(x - y, y)

In lambda calculus, we’d define this as:

We’ve assumed that a greater-than function was available here.
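Here is a sketch of the same idea in Scala (not from the notes); this fix leans on Scala's own recursion for its definition, and the greater-than and subtraction come from Int rather than from Church numerals:

// A call-by-value fixed-point combinator
def fix[A, B](f: (A => B) => (A => B)): A => B = x => f(fix(f))(x)

// mod defined without writing an explicitly recursive method:
// the recursive call is abstracted out as rec and tied back by fix
val mod: ((Int, Int)) => Int = fix[(Int, Int), Int] { rec => {
    case (x, y) => if (y > x) x else rec((x - y, y))
}}

// mod((17, 5)) == 2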

More generally, we can define a recursive function by abstracting over the recursive call and taking the fixed point: a definition def h = ⟨body containing h⟩ becomes $h = \text{fix}\ (\lambda f.\ \langle\text{body with } h \text{ replaced by } f\rangle)$.

Equivalence of lambda terms

We've seen how to define Church numerals and successor. How can we prove that, say, $\text{scc}\ c_1$ is equal to $c_2$?

The naive approach unfortunately doesn't work; they do not evaluate to the same term:

$$\text{scc}\ c_1 \ \to\ \lambda s.\ \lambda z.\ s\ (c_1\ s\ z) \ =\ \lambda s.\ \lambda z.\ s\ ((\lambda s.\ \lambda z.\ s\ z)\ s\ z) \qquad \text{whereas} \qquad c_2 = \lambda s.\ \lambda z.\ s\ (s\ z)$$

This still seems very close. If we could simplify under the lambda a little further, we would see how the two are the same.

The intuition behind the Church numeral representation was that a number $n$ is represented as a term that "does something $n$ times to something else". scc takes a term that "does something $n$ times to something else", and returns a term that "does something $n + 1$ times to something else".

What we really care about is that $\text{scc}\ c_1$ behaves the same as $c_2$ when applied to two arguments. We want behavioral equivalence. But what does that mean? Roughly, two terms $s$ and $t$ are behaviorally equivalent if there is no "test" that distinguishes $s$ and $t$.

Let's define this notion of "test" a little more precisely, and specify how we're going to observe the results of a test. We can use the notion of normalizability to define a simple notion of a test:

Two terms $s$ and $t$ are said to be observationally equivalent if they are either both normalizable (i.e. they reach a normal form after a finite number of evaluation steps), or both diverge.

In other words, we observe a term’s behavior by running it and seeing if it halts. Note that this is not decidable (by the halting problem).

For instance, $\Omega$ and tru are not observationally equivalent (one diverges, one halts), while tru and fls are (they both halt).

Observational equivalence isn’t strong enough of a test for what we need; we need behavioral equivalence.

Two terms $s$ and $t$ are said to be behaviorally equivalent if, for every finite sequence of values $v_1, v_2, \dots, v_n$, the applications $s\ v_1\ v_2 \dots v_n$ and $t\ v_1\ v_2 \dots v_n$ are observationally equivalent.

This allows us to assert that true and false are indeed different:

The former returns a normal form, while the latter diverges.

Types

As previously, to define a language, we start with a set of terms and values, as well as an evaluation relation. But now, we'll also define a set of types (denoted with a first capital letter) classifying values according to their "shape". We can define a typing relation $t : T$. We must check that the typing relation is sound, in the sense that:

$$t : T \ \wedge\ t \to^* t' \implies t' : T$$
$$t : T \implies \text{evaluation of } t \text{ does not get stuck}$$

These rules represent some kind of safety and liveness, but are more commonly referred to as progress and preservation, which we'll talk about later. The first one states that types are preserved throughout evaluation, while the second says that if we can type-check $t$, then evaluation of $t$ will not get stuck.

In our previous toy language, we can introduce two types, booleans and numbers:

T ::=     // types
    Bool     // type of booleans
    Nat      // type of numbers

Our typing rules are then given by:

$$\text{true} : \text{Bool} \ (\text{T-True}) \qquad \text{false} : \text{Bool} \ (\text{T-False}) \qquad 0 : \text{Nat} \ (\text{T-Zero}) \qquad \frac{t_1 : \text{Bool} \quad t_2 : T \quad t_3 : T}{\text{if } t_1 \text{ then } t_2 \text{ else } t_3 : T} \ (\text{T-If})$$

$$\frac{t_1 : \text{Nat}}{\text{succ } t_1 : \text{Nat}} \ (\text{T-Succ}) \qquad \frac{t_1 : \text{Nat}}{\text{pred } t_1 : \text{Nat}} \ (\text{T-Pred}) \qquad \frac{t_1 : \text{Nat}}{\text{iszero } t_1 : \text{Bool}} \ (\text{T-IsZero})$$

With these typing rules in place, we can construct typing derivations to justify every statement $t : T$ (which we can also denote as a pair $(t, T)$) in the typing relation, as we have done previously with evaluation. Proofs of properties about the typing relation often proceed by induction on these typing derivations.

Like other static program analyses, type systems are generally imprecise. They do not always predict exactly what kind of value will be returned, but simply a conservative approximation. For instance, if true then 0 else false cannot be typed with the above rules, even though it will certainly evaluate to a number. We could of course add a typing rule for if true statements, but there is still a question of how useful this is, and how much complexity it adds to the type system, and especially for proofs. Indeed, the inversion lemma below becomes much more tedious when we have more rules.

Properties of the Typing Relation

The safety (or soundness) of this type system can be expressed by the following two properties:

  • Progress: A well-typed term is not stuck.

    If $t : T$, then either $t$ is a value, or else $t \to t'$ for some $t'$.

  • Preservation: Types are preserved by one-step evaluation.

    If $t : T$ and $t \to t'$, then $t' : T$.

We will prove these later, but first we must state a few lemmas.

Inversion lemma

Again, for types we need to state the same (boring) inversion lemma:

  1. If $\text{true} : R$, then $R = \text{Bool}$.
  2. If $\text{false} : R$, then $R = \text{Bool}$.
  3. If $\text{if } t_1 \text{ then } t_2 \text{ else } t_3 : R$, then $t_1 : \text{Bool}$, $t_2 : R$ and $t_3 : R$
  4. If $0 : R$, then $R = \text{Nat}$
  5. If $\text{succ } t_1 : R$, then $R = \text{Nat}$ and $t_1 : \text{Nat}$
  6. If $\text{pred } t_1 : R$, then $R = \text{Nat}$ and $t_1 : \text{Nat}$
  7. If $\text{iszero } t_1 : R$, then $R = \text{Bool}$ and $t_1 : \text{Nat}$

From the inversion lemma, we can directly derive a typechecking algorithm:

def typeof(t: Expr): T = t match {
    case True | False => Bool
    case If(t1, t2, t3) =>
        val type1 = typeof(t1)
        val type2 = typeof(t2)
        val type3 = typeof(t3)
        if (type1 == Bool && type2 == type3) type2
        else throw new Error("not typable")
    case Zero => Nat
    case Succ(t1) =>
        if (typeof(t1) == Nat) Nat
        else throw new Error("not typable")
    case Pred(t1) =>
        if (typeof(t1) == Nat) Nat
        else throw new Error("not typable")
    case IsZero(t1) =>
        if (typeof(t1) == Nat) Bool
        else throw new Error("not typable")
}
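A couple of hypothetical calls, assuming Expr case classes named like the patterns above:

typeof(If(IsZero(Zero), Zero, Succ(Zero)))  // Nat
typeof(Succ(True))                          // throws Error("not typable")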

Canonical form

A simple lemma that will be useful for the progress theorem is that of canonical forms. Given a type, it tells us what kind of values we can expect:

  1. If $v$ is a value of type Bool, then $v$ is either true or false
  2. If $v$ is a value of type Nat, then $v$ is a numeric value $nv$

The proof is somewhat immediate from the syntax of values.

Progress Theorem

Theorem: suppose that $t$ is a well-typed term of type $T$. Then either $t$ is a value, or else there exists some $t'$ such that $t \to t'$.

Proof: by induction on a derivation of $t : T$.

  • The T-True, T-False and T-Zero cases are immediate, since $t$ is a value in these cases.
  • For T-If, we have $t = \text{if } t_1 \text{ then } t_2 \text{ else } t_3$, with $t_1 : \text{Bool}$, $t_2 : T$ and $t_3 : T$. By the induction hypothesis, either $t_1$ is a value, or there is some $t_1'$ such that $t_1 \to t_1'$.

    If $t_1$ is a value, then rule 1 of the canonical forms lemma tells us that $t_1$ must be either true or false, in which case E-IfTrue or E-IfFalse applies to $t$.

    Otherwise, if $t_1 \to t_1'$, then by E-If, $t \to \text{if } t_1' \text{ then } t_2 \text{ else } t_3$

  • For T-Succ, we have $t = \text{succ } t_1$ with $t_1 : \text{Nat}$. By the induction hypothesis, $t_1$ is either a value, or it can take a step.

    If $t_1$ is a value, then by rule 5 of the inversion lemma and by rule 2 of the canonical forms lemma, $t_1 = nv$ for some numeric value $nv$. Therefore, $\text{succ } t_1$ is a value. If $t_1 \to t_1'$, then $\text{succ } t_1 \to \text{succ } t_1'$ by E-Succ.

  • The cases for T-Pred and T-IsZero are similar.

Preservation Theorem

Theorem: Types are preserved by one-step evaluation. If $t : T$ and $t \to t'$, then $t' : T$.

Proof: by induction on the given typing derivation

  • For T-True and T-False, the precondition doesn't hold (no reduction is possible), so the claim is trivially true. Indeed, $t$ is already a value, either true or false.
  • For T-If, we have $t = \text{if } t_1 \text{ then } t_2 \text{ else } t_3$ with $t_1 : \text{Bool}$, $t_2 : T$ and $t_3 : T$. There are three evaluation rules by which $t \to t'$ can be derived, depending on $t_1$:
    • If $t_1 = \text{true}$, then by E-IfTrue we have $t' = t_2$, and from rule 3 of the inversion lemma and the assumption that $t : T$, we have $t_2 : T$, that is $t' : T$
    • If $t_1 = \text{false}$, then by E-IfFalse we have $t' = t_3$, and from rule 3 of the inversion lemma and the assumption that $t : T$, we have $t_3 : T$, that is $t' : T$
    • If $t_1 \to t_1'$, then by the induction hypothesis, $t_1' : \text{Bool}$. Combining this with the assumption that $t_2 : T$ and $t_3 : T$, we can apply T-If to conclude $\text{if } t_1' \text{ then } t_2 \text{ else } t_3 : T$, that is $t' : T$

Messing with it

Removing a rule

What if we remove E-PredZero? Then pred 0 type checks, but it is stuck and is not a value; the progress theorem fails.

Changing type-checking rule

What if we change T-If to the following, requiring both branches to be numbers?

$$\frac{t_1 : \text{Bool} \quad t_2 : \text{Nat} \quad t_3 : \text{Nat}}{\text{if } t_1 \text{ then } t_2 \text{ else } t_3 : \text{Nat}}$$

This doesn't break our type system. It's still sound, but it rejects if-else expressions that return things other than numbers (e.g. booleans). That is an expressiveness problem, not a soundness problem; our type system now disallows things that would otherwise be fine by the evaluation rules.

Adding bit

We could add a boolean to natural function bit(t). We’d have to add it to the grammar, add some evaluation and typing rules, and prove progress and preservation.

We'll do something similar to this below, so the full proof is omitted.

Simply typed lambda calculus

Simply Typed Lambda Calculus (STLC) is also denoted $\lambda_\to$. The "pure" form of STLC is not very interesting at the type level (unlike the term level of pure lambda calculus), so we'll allow base values that are not functions, like booleans and integers. To talk about STLC, we always begin with some set of "base types":

T ::=     // types
    Bool    // type of booleans
    T -> T  // type of functions

In the following examples, we’ll work with a mix of our previously defined toy language, and lambda calculus. This will give us a little syntactic sugar.

t ::=                // terms
    x                   // variable
    λx. t               // abstraction
    t t                 // application
    true                // constant true
    false               // constant false
    if t then t else t  // conditional

v ::=   // values
    λx. t  // abstraction value
    true   // true value
    false  // false value

Type annotations

We will annotate lambda-abstractions with the expected type of the argument, as follows:

$$\lambda x{:}\,T_1.\ t_2$$

We could also omit it, and let type inference do the job (as in OCaml), but for now, we’ll do the above. This will make it simpler, as we won’t have to discuss inference just yet.

Typing rules

In STLC, we've introduced abstraction. To add a typing rule for it, we need to encode the concept of an environment $\Gamma$, which is a set of variable type assignments $x : T$. We also introduce the "turnstile" symbol $\vdash$: writing $\Gamma \vdash t : T$ means that the environment $\Gamma$ can verify the typing $t : T$, or that $\Gamma$ implies the right-hand side.
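For reference, the standard STLC typing rules (as given in TAPL) are:

$$\frac{x : T \in \Gamma}{\Gamma \vdash x : T} \ (\text{T-Var}) \qquad \frac{\Gamma,\ x : T_1 \vdash t_2 : T_2}{\Gamma \vdash \lambda x{:}T_1.\ t_2 : T_1 \to T_2} \ (\text{T-Abs}) \qquad \frac{\Gamma \vdash t_1 : T_{11} \to T_{12} \quad \Gamma \vdash t_2 : T_{11}}{\Gamma \vdash t_1\ t_2 : T_{12}} \ (\text{T-App})$$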

This additional concept must be taken into account in our definition of progress and preservation:

  • Progress: If $\vdash t : T$ (i.e. $t$ is closed and well typed), then either $t$ is a value or else $t \to t'$ for some $t'$
  • Preservation: If $\Gamma \vdash t : T$ and $t \to t'$, then $\Gamma \vdash t' : T$

To prove these, we must take the same steps as above. We’ll introduce the inversion lemma for typing relations, and restate the canonical forms lemma in order to prove the progress theorem.

Inversion lemma

Let’s start with the inversion lemma.

  1. If $\Gamma \vdash \text{true} : R$, then $R = \text{Bool}$
  2. If $\Gamma \vdash \text{false} : R$, then $R = \text{Bool}$
  3. If $\Gamma \vdash \text{if } t_1 \text{ then } t_2 \text{ else } t_3 : R$, then $\Gamma \vdash t_1 : \text{Bool}$, $\Gamma \vdash t_2 : R$ and $\Gamma \vdash t_3 : R$.
  4. If $\Gamma \vdash x : R$, then $x : R \in \Gamma$
  5. If $\Gamma \vdash \lambda x{:}T_1.\ t_2 : R$, then $R = T_1 \to R_2$ for some $R_2$ with $\Gamma,\ x : T_1 \vdash t_2 : R_2$
  6. If $\Gamma \vdash t_1\ t_2 : R$, then there is some type $T_{11}$ such that $\Gamma \vdash t_1 : T_{11} \to R$ and $\Gamma \vdash t_2 : T_{11}$.

Canonical form

The canonical forms are given as follows:

  1. If $v$ is a value of type Bool, then it is either true or false
  2. If $v$ is a value of type $T_1 \to T_2$, then $v$ has the form $\lambda x{:}T_1.\ t_2$

Progress

Finally, we get to prove the progress by induction on typing derivations.

Theorem: Suppose that $t$ is a closed, well-typed term (that is, $\vdash t : T$ for some type $T$). Then either $t$ is a value, or there is some $t'$ such that $t \to t'$.

  • For boolean constants, the proof is immediate as $t$ is a value
  • For variables, the proof is immediate as $t$ is closed, and the precondition therefore doesn't hold
  • For abstraction, the proof is immediate as $t$ is a value
  • Application is the only case we must treat.

    Consider $t = t_1\ t_2$, with $\vdash t_1 : T_{11} \to T_{12}$ and $\vdash t_2 : T_{11}$.

    By the induction hypothesis, $t_1$ is either a value, or it can make a step of evaluation. The same goes for $t_2$.

    If $t_1$ can reduce, then rule E-App1 applies to $t$. Otherwise, if it is a value, and $t_2$ can take a step, then E-App2 applies. Otherwise, if they are both values, then the canonical forms lemma above tells us that $t_1$ has the form $\lambda x{:}T_{11}.\ t_{12}$, and so rule E-AppAbs ($\beta$-reduction) applies to $t$.

Preservation

Theorem: If $\Gamma \vdash t : T$ and $t \to t'$, then $\Gamma \vdash t' : T$.

Proof: by induction on typing derivations. We proceed on a case-by-case basis, as we have done so many times before. But one case is hard: application.

For T-App, we have $t = t_1\ t_2$ such that $\Gamma \vdash t_1 : T_{11} \to T_{12}$ and $\Gamma \vdash t_2 : T_{11}$, and where $T = T_{12}$; assuming $t \to t'$, we want to show $\Gamma \vdash t' : T_{12}$.

To do this, we must use the inversion lemma for evaluation (note that we haven’t written it down for STLC, but the idea is the same). There are three subcases for it, starting with the following:

The left-hand side is an abstraction $t_1 = \lambda x{:}T_{11}.\ t_{12}$, and the right-hand side of the application is a value $v_2$. In this case (rule E-AppAbs), we know that the result of the evaluation step is $t' = [x \mapsto v_2]\ t_{12}$.

And here, we already run into trouble, because we do not know about how types act under substitution. We will therefore need to introduce some lemmas.

Weakening lemma

Weakening tells us that we can add assumptions to the context without losing any true typing statements:

If $\Gamma \vdash t : T$, and the environment $\Gamma$ has no information about $x$, that is, $x \notin \text{dom}(\Gamma)$, then the initial assumption still holds if we add information about $x$ to the environment: $\Gamma,\ x : S \vdash t : T$.

Moreover, the latter derivation has the same depth as the former.

Permutation lemma

Permutation tells us that the order of assumptions in does not matter.

If $\Gamma \vdash t : T$ and $\Delta$ is a permutation of $\Gamma$, then $\Delta \vdash t : T$.

Moreover, the latter derivation has the same depth as the former.

Substitution lemma

Substitution tells us that types are preserved under substitution.

That is, if $\Gamma,\ x : S \vdash t : T$ and $\Gamma \vdash s : S$, then $\Gamma \vdash [x \mapsto s]\ t : T$.

The proof goes by induction on the derivation of $\Gamma,\ x : S \vdash t : T$, that is, by cases on the final typing rule used in the derivation.

  • Case : in this case, .

    Thanks to typechecking, we know that the environment validates and . In this case, the resulting type of the application is .

    By the induction hypothesis, , and .

    By , the environment then also verifies the application of these two substitutions as : . We can factorize the substitution to obtain the conclusion, i.e.

  • Case : if ( is a simple variable ) where . There are two subcases to consider here, depending on whether is or another variable:
    • If , then . The result is then , which is among the assumptions of the lemma
    • If , then , and the desired result is immediate
  • Case : if , with , and .

    Based on our hygiene convention, we may assume and .

    Using permutation on the first given subderivation in the lemma (), we obtain