Reasoning About Programs

Haskell can use normal software testing methods to verify correctness, but because haskell is a pure language, we can do better and formally prove properties of our functions and types.

Natural Numbers

Natural numbers can be defined as data Nat = Z | S Nat in haskell. Alternatively, using mathematical notation, this can be written $0 \in N \forall n \in N, n + 1 \in N$ . Addition can then be defined recursively:

add :: Nat -> Nat -> Nat
add Z     m = m
add (S n) m = S (add n m)

Addition has certain properties which must hold true:

Left identity: ∀m :: Nat, add Z m == m
- $0 + m = m$
Right identity: ∀m :: Nat, add m Z == m
- $m + 0 = m$
Associativity: ∀x y z :: Nat, add x (add y z) == add (add x y) z
- $x + (y + z) = (x + y) + z$

These can be proven using equational reasoning, which proves that an equality holds in all cases. Generally, either a property can be proved by applying and un-applying either side of an equation, and/or by induction.

To prove the left identity is easy, as it is an exact match of one of our equations for add:

add Z m
-- applying add
= m

The right identity is a little harder, as we can't just directly apply one of our equations. We can instead induct on m. First, the base case:

add Z Z
-- applying add
= Z

Using the induction hypothesis add m Z = m, we need to show the inductive step holds for S m (m+1):

add (S m) Z
-- applying add
= S (add m Z)
-- applying induction hypothesis
= S m

This proves the right identity. To prove associativity we will again use induction, this time on x. The base case is add Z (add y z):

add Z (add y z)
-- applying add
= add y z
-- un-applying add
= add (add Z y) z

The proof holds for x = Z. Here, the proof was approached from either end to meet in the middle, but written as a single list of operations for clarity. Sometimes it is easier to do this and work from either direction, especially when un-applying functions as it is more natural.

The induction hypothesis is add x (add y z) == add (add x y) z, and can be assumed. We need to prove the inductive step add (S x) (add y z) == add (add (S x) y) z:

add (S x) (add y z)
-- applying add
= S (add x (add y z))
-- applying induction hypothesis
= S (add (add x y ) z)
-- un-applying add
= add (S (add x y)) z
-- un-applying add
= add (add (S x) y) z

This proves associativity.

Induction on Lists

We can induct on any recursive type, including lists: data List a = Empty | Cons a (List a). Using this definition, we can prove map fusion. Map fusion states that we can turn multiple consecutive map operations into a single one with composed functions:

map f (map g xs) = map (f.g) xs
- ∀f :: b -> c
- ∀g :: a -> b
- ∀xs :: [a]

The definitions of map and . may be useful:

map :: (a -> b) -> [a] -> [b]
map f []     = []
map f (x:xs) = f x : map f xs

(.) :: (b -> c) -> (a -> b) -> a -> c
(.) f g x = f (g x)

Map fusion can be proved by induction on xs. The base case is map f (map g []) = map (f.g) []:

map f (map g [])
-- applying map
= map f []
-- applying map
= []
-- un-applying map
= map (f.g) []

Using the induction hypothesis map f (map g xs) = map (f.g) xs, we can prove the inductive case map f (map g (x : xs)) = map (f.g) (x : xs):

map f (map g (x : xs))
-- applying map
= map f (g x : map g xs)
-- applying map
= f (g x) : map f (map g xs)
-- induction hypothesis
= f (g x) : map (f.g) xs
-- un-applying (.)
= (f.g) x : map (f.g) xs
-- un-applying map
= map (f.g) (x : xs)

Proving a Compiler

Given a simple expression language:

data Expr = Val Int | Plus Expr Expr

And a simple instruction set:

data Instr = Push Int | Add
type Program = [Instr]
type Stack = [Int]

We can write an exec function as an interpreter for our instruction set:

exec :: Program -> Stack -> Stack
exec []                    s  = s
exec (Push n : p)          s  = exec p (n : s)
exec (Add    : p) (y : x : s) = exec p (x + y : s)

An eval function to evaluate our expressions:

eval :: Expr -> Int
eval (Val n)    = n
eval (Plus l r) = eval l + eval r

And a comp function as a compiler for our Expr language to our Instr instruction set:

comp :: Expr -> Program
comp (Val n) = [PUSH n]
comp (Plus l r) = comp l ++ comp r ++ [ADD]

Our compiler will be considered correct if for any expression, evaluating it yields the same result as compiling and then executing it:

∀ e :: Expr, s :: Stack . eval e : s == exec (comp e) s

This can be proved by induction on e. The base case for Expr is for Vals, and we want to show that eval (Val n) s == exec (comp (Val n)) s. This time, we start with the RHS:

exec (comp (Val n)) s
-- applying comp
= exec [Push n] s
-- applying exec
= exec [] (n : s)
-- applying exec
= (n : s)
-- unappplying eval
= eval (Val n) s

Our inductive case to be proved is eval (Plus l r) s == exec (comp (Plus l r)) s. Since the Plus constructor has two values of type Expr, there are two induction hypotheses:

for l: eval l : s == exec (comp l) s
for r: eval r : s == exec (comp r) s

exec (comp (Plus l r)) s
-- applying comp
= exec (comp l ++ comp r ++ [Add]) s
-- distributivity of (++)
= exec (comp l ++ (comp r ++ [Add])) s
-- distributivity lemma
= exec (comp r ++ [Add]) (exec (comp l) s)
-- distributivity lemma
= exec [Add] (exec (comp r) (exec (comp l) s))
-- induction hypothesis
= exec [Add] (exec (comp r) (eval l : s))
-- induction hypothesis
= exec [Add] (eval r : (eval l : s))
-- applying exec
= exec [] ((eval l + eval r) : s)
-- applying exec
= (eval l + eval r) : s
-- un-applying exec
= eval (Plus l r) s

The proof holds, but relies on a lemma proving the distributivity of the exec function, which states that executing a program where a list of instructions xs is followed by a list of instructions ys is the same as first executing xs and then executing ys with the stack that results from executing xs: ∀ xs ys::Program, s::Stack . exec (xs++ys) s == exec ys (exec xs s).

This can be proved by induction on xs. The base case is the empty list []: exec ([] ++ ys) s == exec ys (exec [] s):

exec ys (exec [] s)
-- applying exec
= exec ys s
-- un-applying (++)
= exec ([] ++ ys) s

The induction hypothesis is exec (xs++ys) s == exec ys (exec xs s). The inductive step is exec ((x : xs) ++ ys) s == exec ys (exec (x : xs) s). As x could be either Push x or Add, we perform case analysis on x, first with the case where x = Push n:

exec ys (exec (Push n : xs) s)
-- applying exec
= exec ys (exec xs (n : ns))
-- induction hypothesis
= exec (xs ++ ys) (n : s)
-- un-applying exec
= exec (Push n : (xs ++ ys)) s
-- un-applying (++)
= exec ((Push n : xs) ++ ys) s

The inductive step holds for the Push n case. The Add case:

exec ys (exec (Add : xs) s)
-- assuming stack has at least 2 elements
exec ys (exec (Add : xs) (b : a : s'))
-- applying exec
exec ys (exec xs (a + b : s'))
-- induction hypothesis
exec (xs ++ ys) (a + b : s')
-- un-applying exec
exec (Add : (xs ++ ys)) (b : a : s')
-- un-applying (++)
exec ((Add : xs) ++ ys) (b : a : s')
-- assumption
exec ((Add : xs) ++ ys) s

This proves the inductive case for the Add instruction, and therefore the proof for the distributivity of exec lemma, which supported our initial proof of the correctness of our compiler.

Computer Systems Engineering Notes

Reasoning About Programs

Natural Numbers

Induction on Lists

Proving a Compiler