Types and Programming Languages: Chapter 4

I'm working through Types and Programming Languages, by Benjamin Pierce.



I'm up to somewhere around chapter 13, References, but it's starting not to make sense. Which means it's time to back up and do more of the work, instead of just nodding as though I really understand it.

One of the things he does is build typecheckers for the languages he describes, in the language ocaml, or Objective Caml, a version of the language ML. For a variety of reasons, I'm trying to implement in Haskell. Mostly to give me something concrete to work on that isn't too large in scope to learn the language a little bit. aSomething more than 'Hello, World!', strip_spaces, or traverse a linked list, but less than a 'real application'.

For those not familiar, a typechecker is somewhere between a compiler and a grqammar driven parser. A type checker inspects what you give it for type correctness and consitency. It makes sure that you don't assign a Foo to a Bar, unless there's a rule in the typesystem that says you can. It may, in the process of typechecking, do some steps that a compiler also does, like reduce expressions, but it does this in the interest of determining what the type, not the value, is. Of course, if I were writing a compiler, it would make sense not to throw away that information, and do a bit of both at once.

That does lead me to a bit of a sub-rant. The first step in the process I'm working on is parsing the textual expression. Which means using a parsing library. (Pierce does, so it isn't cheating) Haskell has two; happy, a yacc analogue, and parsec, a 'monadic parser combinator' library. Since the point of doing this in Haskell is to get a better idea what phrases like 'monadic parser combinator' libraries mean, I was a bit biased towards Parsec. I already know and loathe yacc.

So I start in on the documentation. At least it has some. That's a nice benfit of the fact that Haskell grew up in the acadamic community. They need to publish or perish, and the publication serves as documentation. Somewhat, sort of. Although Parsec doesn't really suffer in that respect. The docs are pretty clear. They just suffer from the same problem that all parser lib docs do. They want to show that you can implement an entire compiler inside the parser.

And that's usually a bad idea.

The docs show you how you can attach interesting semantic actions to events in the parser, like evaluating the expression that's currently being parsed. However, in practice, that's hardly ever what you want to do. You want the parse to return something that abstracts the textual expression into a data structure, usually some kind of abstract syntax tree, where the nodes in the tree reflect elements in the grammar. Then you can write other stuff that accepts the AST and processes it in interesting ways, like compiling or typechecking it.

That's certainly what Pierce's code does in ML. And I'm trying to avoid being too inventive.

In any case, it turned out to be pretty trivial to return an AST from a Parsec parser, and in fact, all the examples of real parsers that come with Parsec take that approach. Which gave me some comfort about being on the right track.

Now, the arith language that we're starting with is pretty primitive. It has booleans and non-negative integers, aka natural numbers . And the latter are all of the form successor 0' or 'successor successor 0', meaning 1 and 2, respectively. Not a real language, but a place to start. Complicated enough that a few theorems could be proved non-trivially, but not so complicated you couldn't easily work everything by hand.

The language's syntax can be described with the following grammar


t ::=
true
false
if t then t else t
0
succ t
pred t
isZero t



This tranlates to the Haskell datatype ArithExpr:

data ArithExpr
= TmTrue
| TmFalse
| TmIfExpr ArithExpr ArithExpr ArithExpr
| TmZero
| TmSucc ArithExpr
| TmPred ArithExpr
| TmIsZero ArithExpr
deriving (Show, Eq)


Note that's almost a mechanical transliteration, and that's exactly what I'm aiming for. I'm a firm believer in the rule that there are two kinds of source code, that which obviously has no defects, and that which has no obvious defects.

So now we need a parser that will return those terms. In Parsec, that looks like this:

arithExpr :: Parser ArithExpr
arithExpr =
trueExpression
falseExpression
ifExpression
zeroExpression
succExpression
predExpression
isZeroExpression
parens arithExpr
"expression"


So the arithExpr is a parser of ArithExpr, and the parser returns either a trueExpression, falseExpression, ifExpression, etc.

Those look like:

trueExpression =
do{ reserved "true"
; return TmTrue
}

falseExpression =
do{ reserved "false"
; return TmFalse
}

ifExpression :: Parser ArithExpr
ifExpression =
do{ reserved "if"
; condition <- arithExpr
; reserved "then"
; thenClause <- arithExpr
; reserved "else"
; elseClause <- arithExpr
; return (TmIfExpr condition thenClause elseClause)
}


Hopefully, even if the Haskell syntax is unusual and unfamiliar, the intent is pretty clear. true, false, if, then, else are parsed as reserved words, and true and false just return a TmTrue and TmFalse respectively. The ifExpression is a bit more interesting. It parses if, then it looks for an arithExpression, a then followed by another arithExpr, and finally an else followed by a third arithExpr, and returns the three of them wrapped up in a TmIfExpr.

So with those out of the way, we can look at how to evaluate the expression returned by the parser, testing them against a set of evaluation rules. For this small language, there are only a few, in two sets.

For Booleans we have
terms

t ::=
true
false
if t then t else t


values

v ::=
true
false

evaluation rules

if true then t2 else t3 -> t2

if false then t2 else t3 -> t3

t1 -> t1'
----------
if t1 then t2 else t3 -> if t1' then t2 else t3


Then arithmetic is added (the terms from above are elided)

Arithmetic Expressions

new terms


t ::= ...
0
succ t
pred t
iszero t

new values


v ::= ...
nv

nv ::=
0
succ nv


new evaluation rules

t1 -> t1'
---------
succ t1 -> succ t1'

pred 0 -> 0

pred (succ nv1) -> nv1

t1 -> t1'
--------
pred t1 -> pred t1'

iszero 0 -> true

iszero (succ nv1) -> false

t1 -> t1'
--------
iszero t1 -> iszero t1'



So now we want to transliterate those evaluation rules into a single step evaluator. Now, Pierce writes his in such a way that it throws an exception when no rule matches. I haven't figured out exceptions in Haskell, and in any case he does note in a footnote that they really aren't considered good style in ML in the way that he uses them, to terminate a recursive functions.

Instead, I'm choosing to represent the eval function as possibly returning a value, and in Haskell, that's returning a Maybe. So the signature of my eval1 is

eval1 :: ArithExpr -> Maybe ArithExpr


and the evaluation rules can be written like so

eval1 (TmIfExpr TmTrue t _) = Just t

eval1 (TmIfExpr TmFalse _ t) = Just t

eval1 (TmIfExpr t1 t2 t3) =
let t1' = eval1 t1
in case t1' of
{ Just t1'' -> Just $ TmIfExpr t1'' t2 t3
; Nothing -> Nothing
}



That is, if we're eval'ing an if expression, and the first clause is true or false, we can reduce it to either the then clause or the else clause immediately. On the other hand, if it doesn't match those, we can instead evaluate the first term, and return an if expression with the first term evaluated. If the first term doesn't evaluate, then we return nothing. I'm taking advantage of pattern matching in Haskell to select the right function based on the details of the argument supplied. They all take a single argument, but I'm asking to distinguish what constructor was used to create that argument. The '_' character means that I don't are what the type or value of that part of the argument is, match anything.

This is the way I wrote it at first, at least. I ran this by the haskell-cafe mailling list, and recieved some suggestions that I wasn't taking good advantage of Maybe being a monad, and that it might make more sense to do the sub-eval in a do block. In particular the advice in Nomaware's All About Monads tutorial is exactly on point. Those Nothings and Justs don't need to be there.

I'll be reworking the code to adopt those suggestions before going much further.

To do the full evaluation, we run the eval1 until we can't anymore. That's

eval t =
let t' = eval1 t
in case t' of
{ Just t'' -> eval t''
; Nothing -> t
}


Here's all the code so far:
ArithParser.hs

module ArithParser ( parseArith
, parseArithFromFile
, arithExpr
, ParseError
) where

import Char
import Monad
import Arith

-- Parsec
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (haskellStyle)


parseArithFromFile :: String -> IO (Either ParseError ArithExpr)
parseArithFromFile fname =
parseFromFile arithExpr fname

parseArith sourceName source =
parse arithExpr sourceName source

arithExpr :: Parser ArithExpr
arithExpr =
trueExpression
falseExpression
<|> ifExpression
<|> zeroExpression
<|> succExpression
<|> predExpression
<|> isZeroExpression
<|> parens arithExpr
> "expression"


trueExpression =
do{ reserved "true"
; return TmTrue
}

falseExpression =
do{ reserved "false"
; return TmFalse
}

zeroExpression :: Parser ArithExpr
zeroExpression =
do{ reserved "0"
; return TmZero
}

ifExpression :: Parser ArithExpr
ifExpression =
do{ reserved "if"
; condition <- arithExpr
; reserved "then"
; thenClause <- arithExpr
; reserved "else"
; elseClause <- arithExpr
; return (TmIfExpr condition thenClause elseClause)
}

succExpression =
do{ reserved "succ"
; expr <- arithExpr
; return (TmSucc expr)
}

predExpression =
do{ reserved "pred"
; expr <- arithExpr
; return (TmPred expr)
}


isZeroExpression =
do{ reserved "iszero"
; expr <- arithExpr
; return (TmIsZero expr)
}



-----------------------------------------------------------
-- Tokens
-- Use qualified import to have token parsers on toplevel
-----------------------------------------------------------
tokenParser = P.makeTokenParser haskellStyle

parens = P.parens tokenParser
braces = P.braces tokenParser
semiSep1 = P.semiSep1 tokenParser
whiteSpace = P.whiteSpace tokenParser
symbol = P.symbol tokenParser
identifier = P.identifier tokenParser
reserved = P.reserved tokenParser
natural = P.natural tokenParser
charLiteral = P.charLiteral tokenParser
stringLiteral = P.stringLiteral tokenParser

Arith.hs

module Arith where



data ArithExpr
= TmTrue
| TmFalse
| TmIfExpr ArithExpr ArithExpr ArithExpr
| TmZero
| TmSucc ArithExpr
| TmPred ArithExpr
| TmIsZero ArithExpr
deriving (Show, Eq)

isNumericalVal :: ArithExpr -> Bool
isNumericalVal TmZero = True
isNumericalVal (TmSucc t) = isNumericalVal t
isNumericalVal (TmPred t) = isNumericalVal t
isNumericalVal _ = False



isVal :: ArithExpr -> Bool
isVal TmTrue = True
isVal TmFalse = True
isVal t
| isNumericalVal t = True
| not (isNumericalVal t) = False
isVal _ = False



eval1 :: ArithExpr -> Maybe ArithExpr

eval1 (TmIfExpr TmTrue t _) = Just t

eval1 (TmIfExpr TmFalse _ t) = Just t

eval1 (TmIfExpr t1 t2 t3) =
let t1' = eval1 t1
in case t1' of
{ Just t1'' -> Just $ TmIfExpr t1'' t2 t3
; Nothing -> Nothing --Just $ TmIfExpr t1 t2 t3
}

eval1 (TmSucc t) =
let t' = eval1 t
in case t' of
{ Just t'' -> Just $ TmSucc t''
; Nothing -> Nothing --Just $ TmSucc t
}

eval1 (TmPred TmZero) = Just TmZero

eval1 (TmPred (TmSucc t))
| isNumericalVal t = Just t

eval1 (TmPred t) =
let t' = eval1 t
in case t' of
{ Just t'' -> Just $ TmPred t''
; Nothing -> Nothing -- Just $ TmPred t
}

eval1 (TmIsZero TmZero) = Just TmTrue

eval1 (TmIsZero (TmSucc t))
| isNumericalVal t = Just TmFalse

eval1 (TmIsZero t) =
let t' = eval1 t
in case t' of
{ Just t'' -> Just $ TmIsZero t''
; Nothing -> Nothing -- Just $ TmIsZero t
}

eval1 _ = Nothing

eval t =
let t' = eval1 t
in case t' of
{ Just t'' -> eval t'' --if (t /= t'') then eval t'' else t
; Nothing -> t
}