# Simon Willison’s Weblog

## Languages and grammars

### Languages and Grammars

These notes are from a lecture on the 26th September.

Let P be a Σ rewriting system.

A word ω is terminal in P if there is no word Z so that ω =>P Z

A grammar is a Σ rewriting system P together with an initial string I: G = (P, I). The grammar defines all possible words in the language. To generate the language you start with the initial string and apply the rules of the Σ rewriting system in as many ways as possible.

The grammar generates the language:

LG = { ω: I =>*P ω, ω is terminal }

So only terminal words are included in the language. A formal language is one generated in this way.

#### Notation

```α ~> β1
α ~> β2 can be rewritten as α ~> β1 | β2 | β3
α ~> β3
```

#### Example 1

```Σ = { a, b, X }
P = { X ~> ab | aXb }
I = X (initial string)
G = (P, X)

Typical Derivation:

X
aXb
aaXbb
terminal: aaabbb ∈ LG
```

In this language, any words that do not contain an X are terminal:

Terminal in P = {a, b}*

LG = { anbn: n = 1, 2, 3, 4, ... }

#### Example 2

```Σ = { S, X, Y, +, *, 0, 1, ), ( }
P = { S ~> (S + S) | (S * S) | 0 | 1 | X | Y }
G = (P, S) so S is the initial string

Typical Derivation:

S
(S + S)
((S * S) + S) =>* ((X * Y) + X)
```

Note that we do not make any assumptions about the meanings of 0, 1, + and *—as far as we are concerned they are just symbols.

The language described above contains all of the expressions that can be built up using addition and multiplication of X, Y, 0 and 1. For example, (((X * Y) + (Y * X)) * X) can be formed with:

```      S
(S * S)
((S + S) * S) =>* (((S * S) + (S + S)) * S) =>* (((X * Y) + (Y * X)) * X)
```

#### Applications

Grammars give a succinct and unambiguous description of a language. Programming languages are described by grammars. For something like C this would seem like an almost impossible task but can be achieved by building grammars for simple things such as allowed variable names and C expressions and then moving up to conditional statements, function declarations and eventually whole programs. This takes advantage of the fact that grammars can be used as building blocks in the same way as C macros.

We need to be able to do the following:

1. Given a simple grammar G, find the language LG
2. Given a simple language (or example), find a grammar for it

#### Example: Abducted by Aliens

You have been abducted by aliens. You regain consciousness on their ship, and they speak to you in the followign manner:

Alien 1: dee paa da wee pam

Alien 2: dee wap paa da wee pam

Alien 3: dee wee wap paa da wee pam

What do you say? Let’s work out the grammar for their language (from what little we’ve seen of it).

Σ = { da, dee, paa, wee, pam, wap, X, Y, Z }

We’ve added some extra terms X, Y and Z to help us construct the grammar as the symbols we already have are terminal. Looking at the alien’s speech so far everything starts with “dee” and ends with “paa da wee pam”. We will use X as our intial string. Remember that (0) is the empty string.

```X ~> dee Y paa da wee pam
Y ~> (0) | wap | Z
Z ~> wee wap | wee wap Z
```

From the above, we can try “dee wee wap wee wap paa da wee pam” and see what happens.

#### Zermelo Fraenkel Set Theory

LZF (Zermelo Fraenkel Set Theory) is a grammar that can be used to represent almost all of contemporary mathematics in very precise (if lengthy) terms.

```S ~> (S ^ S) | (S v S) | (> S) | (S -> S) | (S ⇔ S)
| (∀ V ) S | (∃ V) S | V = V | V ∈ V
V ~> variable (sequence of letters and digits)
```

Some examples:

```A = B ∪ C in LZF is (∀ X)(X ∈ A ⇔ (X ∈ B v X ∈ C))
A = {B, C} in LZF is (∀ X)(X ∈ A ⇔ (X = B v X = C))
A ⊆ B in LZF is (∀ X)(X ∈ A ⇔ X ∈ B)
```

This is Languages and grammars by Simon Willison, posted on 30th September 2002.

Next: Basic Lisp

Previous: Managing data

Previously hosted at http://simon.incutio.com/archive/2002/09/30/languagesAndGrammars