Lecture 10: An Abstraction for Inductive Data Types


Abstract Data Types

You should have seen the idea of abstract data types in at least a few classes so far. The key idea behind abstract data types is that the interface and the implementation are separate. The interface defines the set of operations available for that type. The implementation is the code that makes that work. For a single interface, there might be many different ways of implementing it.

So why separate the interface and the implementation? Because as long as the interface is implemented properly, any code that uses that interface to access the data doesn't have to care about how its implemented. That implementation can change at any time as needed without breaking anyone else's code.

I this lecture we'll talk about a specific abstraction that's useful for implementing the inductive types we've talked about. Don't worry about how it's implemented--just use the abstraction.

Note: the methods defined here for defining and using these types are not a standard part of Scheme. If you're using DrScheme, a set of compatibility routines are provided. If you're using something else, the code from the textbook authors' web site has similar compatibility routines for other versions of Scheme.

An Abstraction for Inductive Data Types

In languages like C++ you're used to using data structures like struct (or classes) that let you store multiple "fields" of information. Many languages support unions, which are aggregate structures that can contain different sets of fields (but only one set at at time). Unions with their different variants work somewhat like classes and subclasses.

The operator we'll use to define these unions and their variants is define-datatype:

	(define-datatype type-name type-predicate-name
	  { ( variant-name { ( field-name predicate ) }* ) }* )

As part of the definition, you specify the name of the type you're creating (the type-name), the name of the predicate that will be created to test for that type (the type-predicate-name), and each variant that this type can have (the variant-names). For each variant, you define the data fields (the field-names) and predicates that test for valid types for those fields (the predicates).

So, for example, a binary tree has either interior nodes or leaf nodes. Suppose that for our application we want the leaf nodes to store numbers and the interior nodes to have a symbol tag (as well as the two subtrees).

You can define a binary tree and these two variants with the following:

	(define-datatype bintree bintree?
	  (leaf-node
	    (datum number?))
	  (interior-node
	    (key symbol?)
	    (left bintree?)
	    (right bintree?)))  

This creates a datatype called bintree and a bintree? predicate to test for that type. This type has two variants: leaf-node and interior-node. leaf-nodes have a single field called datum that must satisfy the number? predicate. interior-nodes have three fields: a key that must satisfy the symbol? predicate and left and right subtrees that must each satisfy the bintree? predicate. (Note the recursive definition/use of bintree?.)

This is much the way you'd define a similar structure in C, but notice that instead of using a type name to define the fields you specify a predicate to test for valid entries in those fields. Not only is this function-based approach in keeping with the functional paradigm of Scheme, but it's also more flexible. If you wanted a field to contain numbers or strings (but no other types), you could use (lambda (x) (or (number? x) (string? x))) for the field's predicate. Or, if you planned to use that repeatedly you could give that function and name and use that name when you use define-datatype.

Once you define this type, it automatically defines two constructors (one for each variant) that have the same name as the variant and take parameters in the same order as the fields:

	(leaf-node 5)
	(interior-node 'a (leaf-node 1) (leaf-node 2))

The first expression returns a leaf node containing 5. The second expression returns an interior node with the symbol a for the key, a leaf node containing 1 for the left tree, and a leaf node containing 2 for the right tree.

As another example, we can also use define-datatype to define an s-list:

	(define-datatype s-list s-list? 
	  (empty-s-list)
	  (non-empty-s-list 
	    (first symbol-exp?)
	    (rest s-list?)))
	
	(define-datatype symbol-exp symbol-exp? 
	  (symbol-symbol-exp
	    (data symbol?))
	  (s-list-symbol-exp
	    (data s-list?)))

Notice that because there are two non-terminals in the original definition, we define two types.

Questions:


Predicates for Lists

Quite often, we might want to use lists as fields in a variant. The list? predicate tests only for lists, not lists of a specific type. To facilitate this, the define-datatype package also includes a useful list-of function:

	(define list-of
	  (lambda (pred)
		(lambda (val)
		  (or (null? val)
			  (and (pair? val)
				   (pred (car val))
				   ((list-of pred) (cdr val)))))))

You don't have to type this into your own code--if the define-datatype package is loaded, this function is part of it.

The function list-of is basically a curried function that takes predicate and returns a new function that applies that predicate to each element of a list. Here's how we might use it:

  (define list-of-numbers? (list-of number?))
  (define list-of-symbols? (list-of symbol?))

(See how useful currying turns out to be. You only have to code the list verification and traversal once.)

You can even use the predicate you get back from list-of as the parameter to another call to list-of:

  (define list-of-lists-of-numbers? (list-of (list-of number?)))

Cases

Now that we can define different variants, we need some way of processing the different variants a particular type can have. We do this with a cases constuct:

(cases type-name expression
  { (variant-name ( { field-name }* ) consequent) *
  (else default))

This takes an expression of type type-name and tests to see which of the variant-names it matches. If it matches one of the listed variants, each of the corresponding field-names is bound to a local variable by that name and the consequent is evaluated and returned.

The following example uses the cases constuct to define a leaf-sum function that recursively traverses the tree and adds up the leaves:

(define leaf-sum 
  (lambda (tree)
    (cases bintree tree
      (leaf-node (datum) datum)
      (interior-node (key left right)
        (+ (leaf-sum left) (leaf-sum right))))))

If tree evaluates to a leaf-node, the datum field is extracted and returned. If tree evaluates to an interior-node, the key, left, and right fields are extracted, and the sum of recursing on the left subtree and recursing on the right subtree is returned.

Questions:



Last updated at 9:54 am on Friday, August 27, 2004.