Lecture 15: A Simple Interpreter


Semantics

Semantics refers to what a program means. The other important part of a language, its syntax refers to how it is structured. By now, you should be suspecting that the way a language is structured has little to do with its meaning.

In this section of the course, we will be studying the semantics of various programming language features. We will do this by writing interpreters for those language features. Computer Scientists refer to an interpreter written to expose a language feature's meaning as an operational semantics. An operational semantics can be written in a language that is executable (such as Scheme) or in some made up notation.

You can think of what your doing as defining the operational semantics for a language if you like, or, if you'd rather, simply think of it as programming. Keep in mind, however, that the goal is to show what a language feature means not necessarily to give it an efficient implementation.


A Simple Interpreter

We start by defining an interpreter for a simple language. The language has literals, variables, and procedure applications. For now, it has no way of binding values to variables, no conditional statements, no way of defining procedures, etc. We will add these as we go.

Defining a language can be confusing if you are not careful to cleanly separate the language being defined from the language used to do the definition. These are sometimes called the object and meta languages respectively. We will refer to them as the defined and defining languages.

BNF and Abstract Syntax

The BNF and abstract syntax for our interpreter is defined as follows:

The BNF for <program>, <expression>, <primitive>

<program> ::= <expression> a-program (exp) <expression> ::= <number> lit-exp (datum) | <symbol> var-exp (id) | <primitive> ( { <expression> }*(,) } ) primapp-exp (prim rands) <primitive> ::= + | - | * | add1 | sub1
 

(The "*(,)" notation denotes Kleene star with comma separators.)


For now, it may be confusing why the abstract grammar looks the way it does, but a little thought and exploration should help clear up that confusion.

Note that the BNF describes a language with a syntax somewhat different from Scheme. This will help us keep the language we are defining separated from the language used to define it, but there's no need to do this. We could change the concrete syntax to look more like Scheme without changing the abstract syntax at all. We'd merely write a new parser.

The following expressions are all valid in the language as defined by this grammar:

5
x
+(3,x)
add1(+(3,x))
We can define datatypes for this abstract syntax as follows:
(define-datatype program program?
                 (a-program
                  (exp expression?)))
 
(define-datatype expression expression?
                 (lit-exp
                  (datum number?))
                 (var-exp
                  (id symbol?))
                 (primapp-exp 
                  (prim primitive?)
                  (rands (list-of expression?))))
 
(define-datatype primitive primitive?
                 (add-prim)
                 (subtract-prim)
                 (mult-prim)
                 (incr-prim)
                 (decr-prim))

The last expression in the example would have an abstract syntax tree that looks like the following figure:

The eval-program Function

The first nonterminal we have to deal with is a program. That is the starting point for all programs fed to our interpreter. The code for handling programs is very simple, especially if we follow the rules we say earlier for following grammars. a program can be only a single expression (we'll expand that later), so we simply interpret that expression:
(define eval-program 
  (lambda (pgm)
    (cases program pgm
      (a-program (body)
        (eval-expression body (init-env))))))

Remember from our discussion of environments that code doesn't run in a vacuum--you have to specify an environment in which to evaluate/execute that code. When evaluating the main expression in a program, that environments is an initial environment (init-env). It is here that we put any globally-bound variables.

The eval-exp Function

The eval-expression function is is also pretty simple if we follow the grammar:
(define eval-expression 
  (lambda (exp env)
    (cases expression exp
      (lit-exp (datum) datum)
      (var-exp (id) (apply-env env id))
      (primapp-exp (prim rands)
        (let ((args (eval-rands rands env)))
          (apply-primitive prim args)))

The interpreter is structured as a cases on the abstract syntax. There is something to do for literals, variable references, and applications. The following sections describe how we interpret each.

Literals

Literals are easy to interpret since they interpret to themselves. The expression in the interpreter that handles literals does precisely that:

   (lit-exp (datum) datum)

Variables References

To evaluate a variable reference, we merely look up its value in the current environment using apply-env from our abstraction for environments:

   (var-exp (id) (apply-env env id))

Primitive Applications

This case handles calling built-in (primitive) functions. We first evaluate all of the operands, then we apply the appropriate operation:

   (primapp-exp (prim rands)
     (let ((args (eval-rands rands env)))
       (apply-primitive prim args)))

(Later, we'll challenge that idea that you always have to evaluate the operands first.)

The operands are a list of expressions. We evaluate the list using an auxiliary function that maps eval-expression onto the list of operands:
   (define eval-rands
     (lambda (rands env)
       (map (lambda (x) (eval-rand x env)) rands)))
 
   (define eval-rand
     (lambda (rand env)
       (eval-expression rand env)))

The returned value is a list of results. (Notice that we could have just called eval-expression directly instead of building a new eval-rand function. We've use eval-rand as a place-holder for later when we build on or change this part.)

After evaluating the operands, we're ready to apply the primitive operation. This is done with a simple cases expression on the primitive:

(define apply-primitive
  (lambda (prim args)
    (cases primitive prim
      (add-prim  () (+ (car args) (cadr args)))
      (subtract-prim () (- (car args) (cadr args)))
      (mult-prim  () (* (car args) (cadr args)))
      (incr-prim  () (+ (car args) 1))
      (decr-prim  () (- (car args) 1))
      )))

The Initial Environment

There is one thing left to define before we can use our language, the initial environment. Keep in mind that the user has no way of binding values to variables (yet), so the only variables in use are the ones initially bound when the interpreter starts. We don't really have to have anything already bound, but we'll put in some simple variables so that we can test things:

(define init-env 
  (lambda ()
    (extend-env
      '(i v x)
      '(1 5 10)
      (empty-env))))

In our environment, i = 1, v = 5, and x = 10. Notice that we build this environment by creating and extending an empty one.


The Parser

Remember how we talked about how the process of parsing was straightforward given the grammar? So straightforward, in fact, that you could write a program to automatically generate parsers? That's what we'll do here.

In Section 3.2 of your book, the authors describe a parser-generation system called SLLGEN. Make sure to look over this section and the accompanying Appendix A for a description of how SLLGEN works.

SLLGEN, like most parsing systems, works in two parts: a lexical analyzer (tokenizer or scanner) and a parser.

Lexical Analyzer

You define the lexical-analyzer by using regular expressions for the different tokens:

(define the-lexical-spec
  '((whitespace (whitespace) skip)
    (comment ("%" (arbno (not #\newline))) skip)
    (identifier
      (letter (arbno (or letter digit "_" "-" "?")))
      symbol)
    (number (digit (arbno digit)) number)))

Each case of the scanner has three parts:

  1. A class of the token
  2. A regular expression for matching the token
  3. The outcome of the matching action. This is either "skip" to skip that token (e.g., whitespace, comments) or one of "symbol", "number", or "string" to indicate into what Scheme type the data for the token should be converted.

The Parser

The parser produced by SLLGEN is based on a context-free grammar.

(define the-grammar
  '(
    (program (expression) a-program)
  
    (expression (number) lit-exp)
    (expression (identifier) var-exp)   
    (expression
      (primitive "(" (separated-list expression ",") ")")
      primapp-exp)

    (primitive ("+")     add-prim)
    (primitive ("-")     subtract-prim)
    (primitive ("*")     mult-prim)
    (primitive ("add1")  incr-prim)
    (primitive ("sub1")  decr-prim)
    
    ))

The best way to understand this specification is to compare it to the BNF for the language described earlier.

The specification is a set of productions, just like the BNF. Each production has three parts:

  1. Which nonterminal this matches (equivalent to the left side of the BNF production)
  2. The string produced by that nonterminal (equivalent to the right side of the BNF production)
  3. What to name this piece of abstract syntax (the variant for that case of the nonterminal's defined datatype).

Notice that the define-datatype definitions given earlier do not appear anywhere in the code--they are automatically generated by SLLGEN from the specification.

Every time you run the interpreter, SLLGEN reads the grammatical specification and generates the parser. For later lectures and homeworks, we'll change this specification and automatically get a new parser for it.


Running the Interpreter

To run our interpreter then, we need something to tie our front-end parser with our back-end evaluator (eval-program).

(define run
  (lambda (string)
    (eval-program (scan&parse string))))

Notice that this run function takes a string, scans and parses it (scan&parse), and passes the resulting AST to eval-program.

We can define a simple read-eval-print loop as follows:
(define read-eval-print 
  (sllgen:make-rep-loop  
    "--> "                        ; the prompt
    eval-program                  ; the evaluator
    (sllgen:make-stream-parser    ; the parser
      the-lexical-spec
      the-grammar)))

Using the read-eval-print loop, we can evaluate expressions:

>(read-eval-print)
--> 7
7
--> add1(10)
11
--> *(+(2, 4), sub1(4))
18

Conclusions

As you can see from the code in this lecture, evaluation (interpretation) is not the mystery you might have thought it was. Slowly over the coming weeks, you will be able to see how many programming concepts can be described using this simple evaluator.

For much of the rest of the course, we will be extending and modifying the interpreter defined in this section. It is vital that you understand it or you will soon be lost. I encourage you to study it and run it.


Question

  1. Are primitive functions first-class in this language? Why or why not? If you had to prove it using the languages operational semantics (the interpreter), how would you do so?

The complete interpreter for Lecture 15.

Last updated at 9:54 am on Friday, August 27, 2004.