Lecture 3: Data Structures


Remember, we started off talking about the three things that every programming language has: primitives, a means of combination, and a means of abstraction. In this lecture, we discuss data types, a means of combination. Later we'll see how data abstraction allows us to use abstraction with data.

The reading for this lecture is again from An Introduction to Scheme and its Implementation, specifically "Pairs and Lists", "Equality Predicates", and "Quoting and Literals".


Data Types

Every programming languages has data types and ways of combining and abstracting them.

For any data type, we are concerned with:
  1. the values of the type

  2. the operations on that type. Later we will classify the different operations on a data type depending on what thy do.

  3. how the values are represented when they are printed and how we represent them in programs.

Type Checking

Even though every language has data types, not all language enforce the correct usage of those types. Scheme for example, is said to be untyped, but this is something of a misnomer. Scheme has types and if you get them wrong your program won't run. Scheme just doesn't help you get them right. Pascal, on the other hand, enforces correct type usage. So, we run up against two kinds of type checking:

To do static type checking, we must be able to decide at compile time what the type of every expression is. A language for which this is true is said to be strongly typed. If we can't decide the type of every expression at compile time, the language is weakly typed.

We'll talk a lot more about typing in Lectures 24-28.


Pairs

Most modern languages, including Scheme, provide ways of aggregating data. In Scheme, we can construct aggregates of objects called pairs. After we make a pair, we can refer to it by name, use it as an operand to a procedure, and in general treat it just like we'd treat a simple object such as a number.

The operator that Scheme gives us for constructing pairs is the cons operator. Cons takes two parameters and returns a pair consisting of the two arguments. If we have a pair, we can get to the individual parts using the Scheme operators car and cdr. These names are historical, but refer to the first part of the pair and the second part of the pair respectively. Thus we can use cons, car, and cdr as follows:

==> (define a (cons 10 20))
A
==> (define b (cons 3 7))
B
==> (car a)
10
==> (car b)
3
==> (cdr a)
20
==> (cdr b)
7
==> (define c (cons a b))
C
==> (car (car c))
10
==> (car c)
(10 . 20)
==> c
((10 . 20) 3 . 7)
==> (cdr (car c))
20
==> (cdr c)
(3 . 7)
==> (cdr a)
20
==> (cdr ( car a))

Illegal datum in first argument position 10
 within procedure \#[PRIMITIVE-PROCEDURE CDR]
 There is no environment available;
 using the current read-eval-print environment.
 Error-> ^G
Quit!
==> (car (car a))

 Illegal datum in first argument position 10
 within procedure \#[PRIMITIVE-PROCEDURE CAR]
 There is no environment available;
 using the current read-eval-print environment.
Error-> ^G
Quit!

Box and Pointer Diagrams

We can draw pictures to help us understand the operations of cons. Such pictures are called box and pointer diagrams for obvious reasons. For example, the box and pointer diagram for a in the preceding example would look like this:

Similarly, the box and pointer diagram for b would look like:

Of course, cons cells can point to things besides numbers as c shows:


Questions

  1. Try using the box and pointer diagrams to understand the interaction shown in the preceding example. In particular, can you tell why the errors occurred?

  2. Draw box and pointer diagrams for the following expressions:
    1. (define x (cons 1 2)
    2. (define y (cons x x))
    3. (define z (cons y 4))
    4. (define w (cons z (cons y x)))

Lists

Lists have a special form. If we draw a box and pointer diagram for the list (3 4 5) it looks like this:

We can get our pairs to look like the above box and pointer diagram if we use cons like this:
(cons 3 (cons 4 (cons 5 nil)))

We can also construct a list using the special list construct:

(list 3 4 5)

We can use car and cdr on lists since they are made from pairs. Remember that car returns whatever the first point points at and cdr returns whatever the second pointer points at. For instance, on the above diagram, car points to 3 and cdr points to another list, (4 5). Note that this is different than when we used pairs and the car and cdr both pointed to numbers.

When we execute the command (list 3 4 5), Scheme returns a list. In order to access that list again, we have to bind a name to it using define. So to define a list called 3-to-5, we say

(define 3-to-5 (list 3 4 5))
We can now use the name 3-to-5 with car, cdr, and cons to access elements and to construct new lists. For example:
==> (car 3-to-5)
 3
==> (cdr 3-to-5)
(4 5)
==> (car (cdr 3-to-5))
4
==> (cons 5 3-to-5)
(5 3 4 5)

The following do not result in lists:

==> (cons 3-to-5 5)
 ((3 4 5) . 5)
==> (cons (cdr 3-to-5) (car 3-to-5))
 ((4 5) . 3)
==> (cons 3 4)
(3 . 4)

The above examples are clearly not lists when we look at the box and pointer diagrams for them. Note that the last pointer in the last pair points at something called nil. Indeed when we used cons to form a list (rather than the list construct), the innermost cons looked like this (cons 4 nil). What is nil? The word nil is a contraction of the Latin word for nothing and means just that. It is represented by the empty list.

Note: some versions of Scheme call the empty list "nil" (MIT Scheme), some call it "null" (DrScheme's base language settings), and some don't predefine a symbol for it (DrScheme's EOPL setting). However, all versions will recognize '() for the empty list. (This will be explained in a moment.)

Having the last element in our lists be the empty list is important both from a practical and a theoretical standpoint. Its nice in theory because we have seen that the cdr of a list is always another list. The last element in the list, therefore, has to be a list, and it needs to be empty. In practice its nice because we will define many operations that use recursion to process lists. Our base case will usually be checking for the empty list.

Sample problems:

  1. Draw box and pointer diagrams for the following complex data objects:
    1. (list 1 2)
    2. (cons 1 (list 2))
    3. (cons 1 (cons 2 nil))
    4. (cons (list 3 4) (cons 3 (cons 4 (list 4 5))))
  2. What is returned by each of the following lines?
        (define a (list 1 2 3))
        (define b (list 4 5 6))
        (cons a b)
        (list a b)
        (append a b)

Quotation

It would be nice if we didn't have to use the list operator since it doesn't really add information about the structure of the list, it just tells Scheme how to build it. We could write it like this:

(define aTree ((3 4) (3 (1 2))))

but Scheme would complain since it would try to evaluate the tree since it can't tell the difference between that and any other list such as

(define anotherTree (* 3 ( +1 2)))

The problem is that data and programs look alike! Lists and trees look just like expressions in Scheme. This is one of the reasons for Scheme's power, but it means that we need a way to tell Scheme that something is to be taken literally and not evaluated. We do this with the quote character '. So we could define to two trees above like this:

(define aTree '((3 4) (3 (1 2))))
(define anotherTree '(* 3 (+ 1 2)))

The quote character in Scheme, just like in English, tells Scheme to take the next symbol, list, whatever, literally. This gives us the power to create lists of symbols as well as numbers. Take for example, the following exchange with the interpreter:

==> (define x 56)
x
==> (define y 30)
y
==> (list x y)
(56 30)
==> (list 'x 'y)
(x y)
==> (list 'x y)
(x 30)
==> (list x 'y)
(56 y)

We can perform all the normal operations on literal lists. For instance, we can type

==> (car '(a b c))
a

Three Ways to Make Lists

So, we've see three main ways to make lists:

  1. (cons 1 (cons 2 (cons 3 nil)))
  2. (list 1 2 3)
  3. '(1 2 3)

All three of these expressions return the same list.


Some Useful List Operators

(list thing1 thing2 ...) Makes a list
(cons thing lst) Returns the result of adding a new item to the beginning of a list
(car lst) Returns the first element of a list
(cdr lst) Returns the list after the first element
(append lst1 lst2) Concatonates two lists together and returns the result
(reverse lst) Returns the reverse of a list.
(length lst) Returns the length of a list.
(member thing lst) Tests to see if a "thing" is in a list.
(map func lst1 lst2 ...) Maps a function func with arity n onto n lists of operands. (Call func once for each item in lst1/lst2/.... The first time, it passes the first element of lst1, the first element of lst2, etc; the second time, it passes the second element of lst1, the second element of lst2, etc., and so on.)


Vectors

Vectors are a primitive that can be used to build data structures that look like either arrays or records (do you know the difference?).

Vectors can be made by using either vector and specifying the contents (much like list):

    (vector 1 2 3 4)

or by using make-vector and specifying a length.They are accessed by the operands vector-ref and vector-set!. Another way to make a vector is to use list->vector to convert a list of values into a vector.

For example:

    (define v (vector 1 2 3)))  ; makes a vector [1 2 3]
    (vector-ref v 0)            ; returns v[0]
    (vector-set! v 1 99)        ; v[1] = 99

When a v is printed, it looks like this:

    #(1 2 3)
or like this in some implementations that include the length:
    #3(1 2 3)

Notice that this is much like the way a list is printed, except for the "#" prefix.


Comparing Data

All languages have some means of making comparisons. For numeric data, these may be relations based on size of the number. Scheme supports =, <, >, >=, and >= for comparing data.

Scheme also has two other comparison operators for other data types:

(equal? a b) Compares two items to see if they have the same value. For lists, this comparison is done recursively and is called a deep comparison.
(eq? a b) Tests whether two expressions refer to the same object (a pointer or shallow comparison).



Last updated at 9:54 am on Friday, August 27, 2004.