Lecture 11: Abstract Syntax


Abstract Syntax

The method we have been using to implement programs that work with data structures described by a particular BNF is called syntax directed programming since we let the syntax guide us.

Syntax directed programming is, as you might guess, especially important for programs like compilers and interpreters. As you have seen, syntax directed programming involves associating a particular behavior with each of the parts of the grammar.

We're all familiar with concrete syntax; that's the program form designed for human consumption. Concrete syntax worries about the correct placement of semicolons, period, the exact format of the if statement and so on.

While all of these three important, they can get in the way when we are trying to understand deeper issues in programming language design. For example, consider the following if statements from different languages:


   if <test> then
      <consequent>
   else
      <alternate>


   if (<test>)
      <consequent>
   else
      <alternate>


   (<test>) => <consequent>
            | <alternate>

Even though these each have a different form, they are all the same at a deeper level.

An abstract syntax expresses the structure of the concrete syntax without the details. To create an abstract syntax from a concrete syntax, we associate a name with each production in the concrete syntax as well as naming each of the non-terminals in the production. We typically will use a record for this, since records allows us to refer to members of an aggregation by name. So, we might have a name if-statement for the production and use the names test, consequent, and alternate for the three parts of an if statement.

As another example, we might express the abstract syntax for the lambda calculus as follows. The names to the right are the production names and the values in parentheses are the names of the non-terminals in the order of their appearance.

The BNF for <expression> (abstract syntax is on the right)

<exp> ::= <number> lit-exp (datum) | <symbol> var-exp (id) | (lambda (<symbol>) <exp>) lambda-exp (id body) | (<exp> <exp>) app-exp (rator rand) // Note that this grammar has been changed to include numbers.
 

Abstract Syntax Trees

The most commonly accepted method of viewing a particular expression in abstract syntax is the abstract syntax tree also commonly called a parse tree. A parse tree has nodes which are labeled with the production name and leaves which represent the terminals in the grammar. The names of the non-terminals decorate the edges of the tree. Here is an example of the syntax tree for (lambda (x) (f (f x))):


Representing Abstract Syntax with an Abstract Data Type

As we have seen, abstract data types are a good fit for representing BNF. Not surprisingly, they are also a good way to represent abstract syntax.

We'll use the define-datatype abstraction used in the last lecture to define a data type for expressions:

For example we could represent the structure of all of the above if statements with a record that had fields for each of the test expression and the consequent and alternate statements:

    if-statement (test consequent alternate)

The abstract syntax for the lambda expression BNF shown above could be likewise represented as four different records (one for each production), each with their own fields:

    lit (datum)
    varref (var)
    lambda (formal body)
    app (rator rand)

Actually, all four of these records are just four different variants on an overall expression type. We could implement that type as follows:

(define-datatype expression expression?
  (lit-exp    (datum number?))
  (var-exp    (id symbol?))
  (lambda-exp (id symbol?)
              (body expression?))
  (app-exp    (rator expression?)
              (rand expression?)))

Parsing

Parsing is the process of translating a sequence of characters (a string) into an abstract syntax tree. Parsing is a well understood science that has a well developed theory. Parsing is a very methodical process. In fact, it is so methodical that there are programs that generate other programs to serve as parsers. These programs are called parser generators, but go by the more common name of compiler compilers. One of the most famous of these is yacc which stands for "Yet Another Compiler Compiler". This is the manual reference for yacc.

A complete look at parsing is beyond the scope of this class (you'll see more about it in CS 431. We will, however, want to turn the concrete syntax of languages that we express in BNF into their abstract syntax representation. Fortunately, the read function in Scheme (part of the read-eval-print loop) will do most of the hard work for us. All we have to do is convert between the lists that read produces and our internal record structure. Completing the free-vars and lexical-address homework problems prepares you well to do this.

Here is an example of a parser for our lambda expression language from above:

(define parse-expression
  (lambda (datum)
    (cond
      ((number? datum) (lit-exp datum))
      ((symbol? datum) (var-exp datum))
      ((pair? datum)
       (if (eqv? (car datum) 'lambda)
         (lambda-exp (caadr datum)
           (parse-expression (caddr datum)))
         (app-exp
           (parse-expression (car datum))
           (parse-expression (cadr datum)))))
      (else (eopl:error 'parse-expression
              "Invalid concrete syntax ~s" datum)))))

This procedure processes concrete syntax and produces an abstract syntax.

We also want to turn the abstract syntax into concrete expressions. The book calls this unparse, but you'll also hear this referred to as pretty printing. Here is the definition of unparse.

(define unparse-expression
  (lambda (exp)
    (cases expression exp
      (lit-exp (datum) datum)
      (var-exp (id) id)
      (lambda-exp (id body) 
        (list 'lambda (list id)
          (unparse-expression body)))
      (app-exp (rator rand)
        (list (unparse-expression rator)
              (unparse-expression rand))))))


Last updated at 1:38 pm on Monday, January 26, 2004.