Types and Programming Languages: Chapter 4

I'm working through Types and Programming Languages, by Benjamin Pierce.



I'm up to somewhere around chapter 13, References, but it's starting not to make sense. Which means it's time to back up and do more of the work, instead of just nodding as though I really understand it.

One of the things he does is build typecheckers for the languages he describes, in the language ocaml, or Objective Caml, a version of the language ML. For a variety of reasons, I'm trying to implement in Haskell. Mostly to give me something concrete to work on that isn't too large in scope to learn the language a little bit. aSomething more than 'Hello, World!', strip_spaces, or traverse a linked list, but less than a 'real application'.

For those not familiar, a typechecker is somewhere between a compiler and a grqammar driven parser. A type checker inspects what you give it for type correctness and consitency. It makes sure that you don't assign a Foo to a Bar, unless there's a rule in the typesystem that says you can. It may, in the process of typechecking, do some steps that a compiler also does, like reduce expressions, but it does this in the interest of determining what the type, not the value, is. Of course, if I were writing a compiler, it would make sense not to throw away that information, and do a bit of both at once.

That does lead me to a bit of a sub-rant. The first step in the process I'm working on is parsing the textual expression. Which means using a parsing library. (Pierce does, so it isn't cheating) Haskell has two; happy, a yacc analogue, and parsec, a 'monadic parser combinator' library. Since the point of doing this in Haskell is to get a better idea what phrases like 'monadic parser combinator' libraries mean, I was a bit biased towards Parsec. I already know and loathe yacc.

So I start in on the documentation. At least it has some. That's a nice benfit of the fact that Haskell grew up in the acadamic community. They need to publish or perish, and the publication serves as documentation. Somewhat, sort of. Although Parsec doesn't really suffer in that respect. The docs are pretty clear. They just suffer from the same problem that all parser lib docs do. They want to show that you can implement an entire compiler inside the parser.

And that's usually a bad idea.

The docs show you how you can attach interesting semantic actions to events in the parser, like evaluating the expression that's currently being parsed. However, in practice, that's hardly ever what you want to do. You want the parse to return something that abstracts the textual expression into a data structure, usually some kind of abstract syntax tree, where the nodes in the tree reflect elements in the grammar. Then you can write other stuff that accepts the AST and processes it in interesting ways, like compiling or typechecking it.

That's certainly what Pierce's code does in ML. And I'm trying to avoid being too inventive.

In any case, it turned out to be pretty trivial to return an AST from a Parsec parser, and in fact, all the examples of real parsers that come with Parsec take that approach. Which gave me some comfort about being on the right track.

Now, the arith language that we're starting with is pretty primitive. It has booleans and non-negative integers, aka natural numbers . And the latter are all of the form successor 0' or 'successor successor 0', meaning 1 and 2, respectively. Not a real language, but a place to start. Complicated enough that a few theorems could be proved non-trivially, but not so complicated you couldn't easily work everything by hand.

The language's syntax can be described with the following grammar


t ::=
true
false
if t then t else t
0
succ t
pred t
isZero t



This tranlates to the Haskell datatype ArithExpr:

data ArithExpr
= TmTrue
| TmFalse
| TmIfExpr ArithExpr ArithExpr ArithExpr
| TmZero
| TmSucc ArithExpr
| TmPred ArithExpr
| TmIsZero ArithExpr
deriving (Show, Eq)


Note that's almost a mechanical transliteration, and that's exactly what I'm aiming for. I'm a firm believer in the rule that there are two kinds of source code, that which obviously has no defects, and that which has no obvious defects.

So now we need a parser that will return those terms. In Parsec, that looks like this:

arithExpr :: Parser ArithExpr
arithExpr =
trueExpression
falseExpression
ifExpression
zeroExpression
succExpression
predExpression
isZeroExpression
parens arithExpr
"expression"


So the arithExpr is a parser of ArithExpr, and the parser returns either a trueExpression, falseExpression, ifExpression, etc.

Those look like:

trueExpression =
do{ reserved "true"
; return TmTrue
}

falseExpression =
do{ reserved "false"
; return TmFalse
}

ifExpression :: Parser ArithExpr
ifExpression =
do{ reserved "if"
; condition <- arithExpr
; reserved "then"
; thenClause <- arithExpr
; reserved "else"
; elseClause <- arithExpr
; return (TmIfExpr condition thenClause elseClause)
}


Hopefully, even if the Haskell syntax is unusual and unfamiliar, the intent is pretty clear. true, false, if, then, else are parsed as reserved words, and true and false just return a TmTrue and TmFalse respectively. The ifExpression is a bit more interesting. It parses if, then it looks for an arithExpression, a then followed by another arithExpr, and finally an else followed by a third arithExpr, and returns the three of them wrapped up in a TmIfExpr.

So with those out of the way, we can look at how to evaluate the expression returned by the parser, testing them against a set of evaluation rules. For this small language, there are only a few, in two sets.

For Booleans we have
terms

t ::=
true
false
if t then t else t


values

v ::=
true
false

evaluation rules

if true then t2 else t3 -> t2

if false then t2 else t3 -> t3

t1 -> t1'
----------
if t1 then t2 else t3 -> if t1' then t2 else t3


Then arithmetic is added (the terms from above are elided)

Arithmetic Expressions

new terms


t ::= ...
0
succ t
pred t
iszero t

new values


v ::= ...
nv

nv ::=
0
succ nv


new evaluation rules

t1 -> t1'
---------
succ t1 -> succ t1'

pred 0 -> 0

pred (succ nv1) -> nv1

t1 -> t1'
--------
pred t1 -> pred t1'

iszero 0 -> true

iszero (succ nv1) -> false

t1 -> t1'
--------
iszero t1 -> iszero t1'



So now we want to transliterate those evaluation rules into a single step evaluator. Now, Pierce writes his in such a way that it throws an exception when no rule matches. I haven't figured out exceptions in Haskell, and in any case he does note in a footnote that they really aren't considered good style in ML in the way that he uses them, to terminate a recursive functions.

Instead, I'm choosing to represent the eval function as possibly returning a value, and in Haskell, that's returning a Maybe. So the signature of my eval1 is

eval1 :: ArithExpr -> Maybe ArithExpr


and the evaluation rules can be written like so

eval1 (TmIfExpr TmTrue t _) = Just t

eval1 (TmIfExpr TmFalse _ t) = Just t

eval1 (TmIfExpr t1 t2 t3) =
let t1' = eval1 t1
in case t1' of
{ Just t1'' -> Just $ TmIfExpr t1'' t2 t3
; Nothing -> Nothing
}



That is, if we're eval'ing an if expression, and the first clause is true or false, we can reduce it to either the then clause or the else clause immediately. On the other hand, if it doesn't match those, we can instead evaluate the first term, and return an if expression with the first term evaluated. If the first term doesn't evaluate, then we return nothing. I'm taking advantage of pattern matching in Haskell to select the right function based on the details of the argument supplied. They all take a single argument, but I'm asking to distinguish what constructor was used to create that argument. The '_' character means that I don't are what the type or value of that part of the argument is, match anything.

This is the way I wrote it at first, at least. I ran this by the haskell-cafe mailling list, and recieved some suggestions that I wasn't taking good advantage of Maybe being a monad, and that it might make more sense to do the sub-eval in a do block. In particular the advice in Nomaware's All About Monads tutorial is exactly on point. Those Nothings and Justs don't need to be there.

I'll be reworking the code to adopt those suggestions before going much further.

To do the full evaluation, we run the eval1 until we can't anymore. That's

eval t =
let t' = eval1 t
in case t' of
{ Just t'' -> eval t''
; Nothing -> t
}


Here's all the code so far:
ArithParser.hs

module ArithParser ( parseArith
, parseArithFromFile
, arithExpr
, ParseError
) where

import Char
import Monad
import Arith

-- Parsec
import Text.ParserCombinators.Parsec
import Text.ParserCombinators.Parsec.Expr
import qualified Text.ParserCombinators.Parsec.Token as P
import Text.ParserCombinators.Parsec.Language (haskellStyle)


parseArithFromFile :: String -> IO (Either ParseError ArithExpr)
parseArithFromFile fname =
parseFromFile arithExpr fname

parseArith sourceName source =
parse arithExpr sourceName source

arithExpr :: Parser ArithExpr
arithExpr =
trueExpression
falseExpression
<|> ifExpression
<|> zeroExpression
<|> succExpression
<|> predExpression
<|> isZeroExpression
<|> parens arithExpr
> "expression"


trueExpression =
do{ reserved "true"
; return TmTrue
}

falseExpression =
do{ reserved "false"
; return TmFalse
}

zeroExpression :: Parser ArithExpr
zeroExpression =
do{ reserved "0"
; return TmZero
}

ifExpression :: Parser ArithExpr
ifExpression =
do{ reserved "if"
; condition <- arithExpr
; reserved "then"
; thenClause <- arithExpr
; reserved "else"
; elseClause <- arithExpr
; return (TmIfExpr condition thenClause elseClause)
}

succExpression =
do{ reserved "succ"
; expr <- arithExpr
; return (TmSucc expr)
}

predExpression =
do{ reserved "pred"
; expr <- arithExpr
; return (TmPred expr)
}


isZeroExpression =
do{ reserved "iszero"
; expr <- arithExpr
; return (TmIsZero expr)
}



-----------------------------------------------------------
-- Tokens
-- Use qualified import to have token parsers on toplevel
-----------------------------------------------------------
tokenParser = P.makeTokenParser haskellStyle

parens = P.parens tokenParser
braces = P.braces tokenParser
semiSep1 = P.semiSep1 tokenParser
whiteSpace = P.whiteSpace tokenParser
symbol = P.symbol tokenParser
identifier = P.identifier tokenParser
reserved = P.reserved tokenParser
natural = P.natural tokenParser
charLiteral = P.charLiteral tokenParser
stringLiteral = P.stringLiteral tokenParser

Arith.hs

module Arith where



data ArithExpr
= TmTrue
| TmFalse
| TmIfExpr ArithExpr ArithExpr ArithExpr
| TmZero
| TmSucc ArithExpr
| TmPred ArithExpr
| TmIsZero ArithExpr
deriving (Show, Eq)

isNumericalVal :: ArithExpr -> Bool
isNumericalVal TmZero = True
isNumericalVal (TmSucc t) = isNumericalVal t
isNumericalVal (TmPred t) = isNumericalVal t
isNumericalVal _ = False



isVal :: ArithExpr -> Bool
isVal TmTrue = True
isVal TmFalse = True
isVal t
| isNumericalVal t = True
| not (isNumericalVal t) = False
isVal _ = False



eval1 :: ArithExpr -> Maybe ArithExpr

eval1 (TmIfExpr TmTrue t _) = Just t

eval1 (TmIfExpr TmFalse _ t) = Just t

eval1 (TmIfExpr t1 t2 t3) =
let t1' = eval1 t1
in case t1' of
{ Just t1'' -> Just $ TmIfExpr t1'' t2 t3
; Nothing -> Nothing --Just $ TmIfExpr t1 t2 t3
}

eval1 (TmSucc t) =
let t' = eval1 t
in case t' of
{ Just t'' -> Just $ TmSucc t''
; Nothing -> Nothing --Just $ TmSucc t
}

eval1 (TmPred TmZero) = Just TmZero

eval1 (TmPred (TmSucc t))
| isNumericalVal t = Just t

eval1 (TmPred t) =
let t' = eval1 t
in case t' of
{ Just t'' -> Just $ TmPred t''
; Nothing -> Nothing -- Just $ TmPred t
}

eval1 (TmIsZero TmZero) = Just TmTrue

eval1 (TmIsZero (TmSucc t))
| isNumericalVal t = Just TmFalse

eval1 (TmIsZero t) =
let t' = eval1 t
in case t' of
{ Just t'' -> Just $ TmIsZero t''
; Nothing -> Nothing -- Just $ TmIsZero t
}

eval1 _ = Nothing

eval t =
let t' = eval1 t
in case t' of
{ Just t'' -> eval t'' --if (t /= t'') then eval t'' else t
; Nothing -> t
}



Solaris network install using Linux DHCP server

This weekend's tech project was getting an old Sun Ultra 5 up and running with a new version of Solaris, in this case Solaris Nevada b33, so I can play with toys like opensolaris, dtrace, zfs,etc.

This particular machine doesn't have a cdrom, so in order to get things working I had to do a network install. Or I could have installed a cdrom, since it's an IDE based machine, but that wouldn't have been nearly as much fun.

I hosted the install on a Netra T1, so most of the installation instructions were just cut-and-paste from the Sun documentation. Solaris 10 Installation Guide: Network-Based Installations

(Note: The T1 will eventually be providing network services, and live in the basement. It's a little loud sitting on my desk. That's why it's not going to be the Sparc play machine.)

The complicated part was the DHCP server. I already have one on my network, on a Debian Linux box, and didn't want to try having two of them. That would probably be bad.

In order to supply all of the information for the install, I needed to add a new name space to the dhcp.conf, a class and subclass for the particular hardware type, and some information specific to the machine.

Here's the relevant pieces from dhcpd.conf:
First the SUNW option namespace, used by the Sun net installation:


option space SUNW;
option SUNW.SrootOpt code 1 = text;
option SUNW.SrootIP4 code 2 = ip-address;
option SUNW.SrootNM code 3 = text;
option SUNW.SrootPTH code 4 = text;
option SUNW.SswapIP4 code 5 = ip-address;
option SUNW.SswapPTH code 6 = text;
option SUNW.SbootFIL code 7 = text;
option SUNW.Stz code 8 = text;
option SUNW.SbootRS code 9 = integer 16;
option SUNW.SinstIP4 code 10 = ip-address;
option SUNW.SinstNM code 11 = text;
option SUNW.SinstPTH code 12 = text;
option SUNW.SsysidCF code 13 = text;
option SUNW.SjumpsCF code 14 = text;
option SUNW.Sterm code 15 = text;
option SUNW.SbootURI code 16 = text;
option SUNW.SHHTPProxy code 17 = text;


Then the class and subclass based on the vendor-class-identifier, which is sent out by the Ultra 5 when it's trying to DHCP an address.

class "vendor-classes" {
match option vendor-class-identifier;
}

subclass "vendor-classes" "SUNW.Ultra-5_10" {
vendor-option-space SUNW;
option SUNW.SbootURI = "tftp://10.10.1.131/inetboot.SUN4U.Solaris_11-1 ";
option SUNW.SinstIP4 10.10.1.131;
option SUNW.SinstNM = "heimdall";
option SUNW.SinstPTH = "/export/solaris11/install";
option SUNW.SrootIP4 10.10.1.131;
option SUNW.SrootNM = "heimdall";
option SUNW.SrootPTH = "/export/solaris11/install/Solaris_11/Tools/Boot" ;
}

Then, the particular information for the machine I'm trying to boot and install from the net:

host chimera {
hardware ethernet 08:00:20:a2:22:66;
option domain-name "sdowney.org";
option host-name "chimera";
next-server 10.10.1.131;
fixed-address 10.10.1.132;
}

The machine I'm installing onto is chimera, which has the MAC address 08:00:20:a2:22:66. It will get the address 10.10.1.132. The install and boot server are both heimdall, which had IP addresses 10.10.1.131 respectively. The 'next-server' tells chimera to netboot from heimdall. I'm calling that out in particular because I wasted about an hour figuring out that I needed that.

Once all that was done, it was a 'simple' matter of running boot net:dhcp - install from the openboot OK prompt.

The machine isn't exactly a screamer by today's standards, it has a 333Mhz UltraSparc IIi chip in it, but it does have 512Mb of RAM, which covers a multitude of sins. I think I may start over with a larger HD, since the 7G drive that's in there now doesn't leave much room for experimentation. I'll probably go ahead and put a DVDRW drive in there too, even though I don't need it now.

Brew Day!

Brewer: Steve Downey
-
Beer: February Ale Style: American Amber Ale
Type: All grain Size: 5.5 gallons
Color:
13 HCU (~9 SRM)
Bitterness: 38 IBU
OG: 1.052 FG: (Est)
1.012
Alcohol: 5.2% v/v (4.1% w/w) (Estimated)
Grain: 2 lb. Weyermann Dark Wheat
10 lb. Weyermann Vienna
1 lb. Weyermann Cara Amber
Mash: 60% efficiency
Boil: 60 minutes SG 1.044 6.5 gallons
Hops: 1 oz. Cascade (5.3% AA, 60 min.)
1 oz. Cascade (5.3% AA, 30 min.)
1 oz. Cascade (5.3% AA, 5 min.)

Recipe formatting and calculations by The Beer Recipator.


Mashed with a single decoction. Doughed in with three gallons of water to 135 degrees F. Let rest for 15 minutes, then pulled a third of the thick mash and boiled for 15 minutes. Added back, and added hot liquor to reach 158 degrees F, and let rest for about an hour (during a trip to Target with the kids...)

Sparged with another 4 gallons of liquor and collected 6.5 gallons of sweet wort. Estimated efficiency, 60%. Low. Should probably investigate.

Added the first hops in while collecting the runoff.

Boiled for 60 minutes, with two hop additions at 30 minutes and at 5. Also added 1 tsp of irish moss at 15 minutes.

Collected 5.5 gallons of 1.052 wort, and pitched 20 grams of Safeale 33 yeast.

Total time, around 6 1/2 hours. Although I still have some clean up.

Bill de hÓra: I think I figured out the list comprehensions thing...

Bill de hÓra: I think I figured out the list comprehensions thing...

I've been trying to understand this stuff myself, and Bill de hÓra's post has prodded me to write this down so I won't forget it again.

List comprehensions are really just syntatic sugar. And too much syntatic sugar can cause truth decay.

List comprehensions are forms in functional and related languages that allow you to generate an entire list with a description of the list. So, for example, I can get a list of the first 10 squares by:
[ x * x | x [1,4,9,16,25,36,49,64,81,100]

That gives me a list of the squares of x, where x is an element of the list of integers from 1 to 10. I can also add some more qualifiers, like

[ x * x | x 3]

[16,25,36,49,64,81,100]


[i * j | i 5]
[5,8,10,9,12,15,8,12,16,20,5,10,15,20,25]

This is all pretty neat, but what does it really mean?

You 'normally' think of drawing a variable from each of the generating lists, with the right most one varying most quickly, and skipping if the variables fail to meet the condition. This provides a natural analogue to the looping constructs in most programming languages.
for (int i = 1; i 
for (int j = 1; j
if ((i + j) > 5) {
list.push(i*j);
}
}
}

That close, natural, analogue can be a code smell in functional programming. It may be an indication that you're thinking of the problem in terms of loops, updating variables, etc.

The intent of list comprehension is to simulate loops and conditions, but it can be defined in terms of map and list concat. (As far as I can tell, this is due to P. Wadler, in Simon Peyton-Jones's The Implementation of Functional Programming Languages )


[t | x map (\x -> t) u
[t | p, q] ==> concat [ [t | q] | p ]
[t | b ] ==> if (b) then [t] else []

Note that the two qualifications p and q are reversed when translated.

concat will take the results of the simplified comprehension and concatenate all of the resulting lists together, eliminating empty lists.

Lets take the first, simple, example:
[ x * x | x that translates to:
map (\x -> x * x) [1..10]
The next example
[ x * x | x 3]
takes a few more steps

concat [ [x * x | x> 3] | x
concat ( map (\x -> [x * x | x>3]) [1..10] )

concat (map (\x -> (if (x>3) then [x*x] else [])) [1..10])

In this case, the list comprehension is more concise. On the other hand, the expanded version is a bit more suggestive. Particularly if I'm willing for it not to be a lambda expression.



OktAle / Novemberfest

OktAle / Novemberfest

I finished up the keg this week. That was pretty fast, for me. I usually end up with a few stray bottles of a brew hanging on forever. But, since the keg is really all or nothing, I just finished it up.

The only down side is that I don't have another brewing at the moment. I'll have to get to work on that this weekend. Probably a basic Pale Ale, consisting of whatever they have on hand at my local homebrew store, Karp's Homebrew Shop.