Reasoning About Programs
Haskell can use normal software testing methods to verify correctness, but because haskell is a pure language, we can do better and formally prove properties of our functions and types.
Natural Numbers
Natural numbers can be defined as data Nat = Z | S Nat
in haskell. Alternatively, using mathematical notation, this can be written . Addition can then be defined recursively:
add :: Nat -> Nat -> Nat
add Z m = m
add (S n) m = S (add n m)
Addition has certain properties which must hold true:
- Left identity:
∀m :: Nat, add Z m == m
- Right identity:
∀m :: Nat, add m Z == m
- Associativity:
∀x y z :: Nat, add x (add y z) == add (add x y) z
These can be proven using equational reasoning, which proves that an equality holds in all cases. Generally, either a property can be proved by applying and un-applying either side of an equation, and/or by induction.
To prove the left identity is easy, as it is an exact match of one of our equations for add
:
add Z m
-- applying add
= m
The right identity is a little harder, as we can't just directly apply one of our equations. We can instead induct on m
. First, the base case:
add Z Z
-- applying add
= Z
Using the induction hypothesis add m Z = m
, we need to show the inductive step holds for S m
(m+1
):
add (S m) Z
-- applying add
= S (add m Z)
-- applying induction hypothesis
= S m
This proves the right identity. To prove associativity we will again use induction, this time on x
. The base case is add Z (add y z)
:
add Z (add y z)
-- applying add
= add y z
-- un-applying add
= add (add Z y) z
The proof holds for x = Z
. Here, the proof was approached from either end to meet in the middle, but written as a single list of operations for clarity. Sometimes it is easier to do this and work from either direction, especially when un-applying functions as it is more natural.
The induction hypothesis is add x (add y z) == add (add x y) z
, and can be assumed. We need to prove the inductive step add (S x) (add y z) == add (add (S x) y) z
:
add (S x) (add y z)
-- applying add
= S (add x (add y z))
-- applying induction hypothesis
= S (add (add x y ) z)
-- un-applying add
= add (S (add x y)) z
-- un-applying add
= add (add (S x) y) z
This proves associativity.
Induction on Lists
We can induct on any recursive type, including lists: data List a = Empty | Cons a (List a)
. Using this definition, we can prove map fusion. Map fusion states that we can turn multiple consecutive map operations into a single one with composed functions:
map f (map g xs) = map (f.g) xs
∀f :: b -> c
∀g :: a -> b
∀xs :: [a]
The definitions of map
and .
may be useful:
map :: (a -> b) -> [a] -> [b]
map f [] = []
map f (x:xs) = f x : map f xs
(.) :: (b -> c) -> (a -> b) -> a -> c
(.) f g x = f (g x)
Map fusion can be proved by induction on xs. The base case is map f (map g []) = map (f.g) []
:
map f (map g [])
-- applying map
= map f []
-- applying map
= []
-- un-applying map
= map (f.g) []
Using the induction hypothesis map f (map g xs) = map (f.g) xs
, we can prove the inductive case map f (map g (x : xs)) = map (f.g) (x : xs)
:
map f (map g (x : xs))
-- applying map
= map f (g x : map g xs)
-- applying map
= f (g x) : map f (map g xs)
-- induction hypothesis
= f (g x) : map (f.g) xs
-- un-applying (.)
= (f.g) x : map (f.g) xs
-- un-applying map
= map (f.g) (x : xs)
Proving a Compiler
Given a simple expression language:
data Expr = Val Int | Plus Expr Expr
And a simple instruction set:
data Instr = Push Int | Add
type Program = [Instr]
type Stack = [Int]
We can write an exec
function as an interpreter for our instruction set:
exec :: Program -> Stack -> Stack
exec [] s = s
exec (Push n : p) s = exec p (n : s)
exec (Add : p) (y : x : s) = exec p (x + y : s)
An eval
function to evaluate our expressions:
eval :: Expr -> Int
eval (Val n) = n
eval (Plus l r) = eval l + eval r
And a comp
function as a compiler for our Expr
language to our Instr
instruction set:
comp :: Expr -> Program
comp (Val n) = [PUSH n]
comp (Plus l r) = comp l ++ comp r ++ [ADD]
Our compiler will be considered correct if for any expression, evaluating it yields the same result as compiling and then executing it:
∀ e :: Expr, s :: Stack . eval e : s == exec (comp e) s
This can be proved by induction on e
. The base case for Expr
is for Val
s, and we want to show that eval (Val n) s == exec (comp (Val n)) s
. This time, we start with the RHS:
exec (comp (Val n)) s
-- applying comp
= exec [Push n] s
-- applying exec
= exec [] (n : s)
-- applying exec
= (n : s)
-- unappplying eval
= eval (Val n) s
Our inductive case to be proved is eval (Plus l r) s == exec (comp (Plus l r)) s
. Since the Plus
constructor has two values of type Expr
, there are two induction hypotheses:
- for
l
:eval l : s == exec (comp l) s
- for
r
:eval r : s == exec (comp r) s
exec (comp (Plus l r)) s
-- applying comp
= exec (comp l ++ comp r ++ [Add]) s
-- distributivity of (++)
= exec (comp l ++ (comp r ++ [Add])) s
-- distributivity lemma
= exec (comp r ++ [Add]) (exec (comp l) s)
-- distributivity lemma
= exec [Add] (exec (comp r) (exec (comp l) s))
-- induction hypothesis
= exec [Add] (exec (comp r) (eval l : s))
-- induction hypothesis
= exec [Add] (eval r : (eval l : s))
-- applying exec
= exec [] ((eval l + eval r) : s)
-- applying exec
= (eval l + eval r) : s
-- un-applying exec
= eval (Plus l r) s
The proof holds, but relies on a lemma proving the distributivity of the exec function, which states that executing a program where a list of instructions xs
is followed by a list of instructions ys
is the same as first executing xs
and then executing ys
with the stack that results from executing xs
: ∀ xs ys::Program, s::Stack . exec (xs++ys) s == exec ys (exec xs s)
.
This can be proved by induction on xs
. The base case is the empty list []
: exec ([] ++ ys) s == exec ys (exec [] s)
:
exec ys (exec [] s)
-- applying exec
= exec ys s
-- un-applying (++)
= exec ([] ++ ys) s
The induction hypothesis is exec (xs++ys) s == exec ys (exec xs s)
. The inductive step is exec ((x : xs) ++ ys) s == exec ys (exec (x : xs) s)
. As x
could be either Push x
or Add
, we perform case analysis on x
, first with the case where x = Push n
:
exec ys (exec (Push n : xs) s)
-- applying exec
= exec ys (exec xs (n : ns))
-- induction hypothesis
= exec (xs ++ ys) (n : s)
-- un-applying exec
= exec (Push n : (xs ++ ys)) s
-- un-applying (++)
= exec ((Push n : xs) ++ ys) s
The inductive step holds for the Push n
case. The Add
case:
exec ys (exec (Add : xs) s)
-- assuming stack has at least 2 elements
exec ys (exec (Add : xs) (b : a : s'))
-- applying exec
exec ys (exec xs (a + b : s'))
-- induction hypothesis
exec (xs ++ ys) (a + b : s')
-- un-applying exec
exec (Add : (xs ++ ys)) (b : a : s')
-- un-applying (++)
exec ((Add : xs) ++ ys) (b : a : s')
-- assumption
exec ((Add : xs) ++ ys) s
This proves the inductive case for the Add
instruction, and therefore the proof for the distributivity of exec
lemma, which supported our initial proof of the correctness of our compiler.