## Languages and grammars

30th September 2002

### Languages and Grammars

These notes are from a lecture on the 26th September.

Let P be a Σ rewriting system.

A word ω is terminal in P if there is no word Z so that ω =>_{P} Z

A grammar is a Σ rewriting system P together with an initial string I: G = (P, I). The grammar defines all possible words in the language. To generate the language you start with the initial string and apply the rules of the Σ rewriting system in as many ways as possible.

The grammar generates the language:

L_{G} = { ω: I =>*_{P} ω, ω is terminal }

So only terminal words are included in the language. A formal language is one generated in this way.

#### Notation

α ~> β_{1}α ~> β_{2}can be rewritten as α ~> β_{1}| β_{2}| β_{3}α ~> β_{3}

#### Example 1

Σ = { a, b, X } P = { X ~> ab | aXb } I = X (initial string) G = (P, X) Typical Derivation: X aXb aaXbb terminal: aaabbb ∈ L_{G}

In this language, any words that do not contain an X are terminal:

Terminal in P = {a, b}*

L_{G} = { a^{n}b^{n}: n = 1, 2, 3, 4, ... }

#### Example 2

Σ = { S, X, Y, +, *, 0, 1, ), ( } P = { S ~> (S + S) | (S * S) | 0 | 1 | X | Y } G = (P, S) so S is the initial string Typical Derivation: S (S + S) ((S * S) + S) =>* ((X * Y) + X)

Note that we do not make any assumptions about the *meanings* of 0, 1, + and *—as far as we are concerned they are just symbols.

The language described above contains all of the expressions that can be built up using addition and multiplication of X, Y, 0 and 1. For example, (((X * Y) + (Y * X)) * X) can be formed with:

S (S * S) ((S + S) * S) =>* (((S * S) + (S + S)) * S) =>* (((X * Y) + (Y * X)) * X)

#### Applications

Grammars give a succinct and unambiguous description of a language. Programming languages are described by grammars. For something like C this would seem like an almost impossible task but can be achieved by building grammars for simple things such as allowed variable names and C expressions and then moving up to conditional statements, function declarations and eventually whole programs. This takes advantage of the fact that grammars can be used as building blocks in the same way as C macros.

We need to be able to do the following:

- Given a simple grammar G, find the language L
_{G} - Given a simple language (or example), find a grammar for it

#### Example: Abducted by Aliens

You have been abducted by aliens. You regain consciousness on their ship, and they speak to you in the followign manner:

*Alien 1*: dee paa da wee pam

*Alien 2*: dee wap paa da wee pam

*Alien 3*: dee wee wap paa da wee pam

What do you say? Let’s work out the grammar for their language (from what little we’ve seen of it).

Σ = { da, dee, paa, wee, pam, wap, X, Y, Z }

We’ve added some extra terms X, Y and Z to help us construct the grammar as the symbols we already have are terminal. Looking at the alien’s speech so far everything starts with “dee” and ends with “paa da wee pam”. We will use X as our intial string. Remember that (0) is the empty string.

X ~> dee Y paa da wee pam Y ~> (0) | wap | Z Z ~> wee wap | wee wap Z

From the above, we can try “dee wee wap wee wap paa da wee pam” and see what happens.

#### Zermelo Fraenkel Set Theory

L_{ZF} (Zermelo Fraenkel Set Theory) is a grammar that can be used to represent almost all of contemporary mathematics in very precise (if lengthy) terms.

S ~> (S ^ S) | (S v S) | (> S) | (S -> S) | (S ⇔ S) | (∀ V ) S | (∃ V) S | V = V | V ∈ V V ~> variable (sequence of letters and digits)

Some examples:

A = B ∪ C in L_{ZF}is (∀ X)(X ∈ A ⇔ (X ∈ B v X ∈ C)) A = {B, C} in L_{ZF}is (∀ X)(X ∈ A ⇔ (X = B v X = C)) A ⊆ B in L_{ZF}is (∀ X)(X ∈ A ⇔ X ∈ B)

## More recent articles

- Give people something to link to so they can talk about your features and ideas - 13th July 2024
- Weeknotes: a livestream, a surprise keynote and progress on Datasette Cloud billing - 2nd July 2024
- Open challenges for AI engineering - 27th June 2024
- Building search-based RAG using Claude, Datasette and Val Town - 21st June 2024
- Weeknotes: Datasette Studio and a whole lot of blogging - 19th June 2024
- Language models on the command-line - 17th June 2024
- A homepage redesign for my blog's 22nd birthday - 12th June 2024
- Thoughts on the WWDC 2024 keynote on Apple Intelligence - 10th June 2024
- Accidental prompt injection against RAG applications - 6th June 2024
- Training is not the same as chatting: ChatGPT and other LLMs don't remember everything you say - 29th May 2024