Languages and grammars
30th September 2002
Languages and Grammars
These notes are from a lecture on the 26th September.
Let P be a Σ rewriting system.
A word ω is terminal in P if there is no word Z so that ω =>P Z
A grammar is a Σ rewriting system P together with an initial string I: G = (P, I). The grammar defines all possible words in the language. To generate the language you start with the initial string and apply the rules of the Σ rewriting system in as many ways as possible.
The grammar generates the language:
LG = { ω: I =>*P ω, ω is terminal }
So only terminal words are included in the language. A formal language is one generated in this way.
Notation
α ~> β1 α ~> β2 can be rewritten as α ~> β1 | β2 | β3 α ~> β3
Example 1
Σ = { a, b, X } P = { X ~> ab | aXb } I = X (initial string) G = (P, X) Typical Derivation: X aXb aaXbb terminal: aaabbb ∈ LG
In this language, any words that do not contain an X are terminal:
Terminal in P = {a, b}*
LG = { anbn: n = 1, 2, 3, 4, ... }
Example 2
Σ = { S, X, Y, +, *, 0, 1, ), ( } P = { S ~> (S + S) | (S * S) | 0 | 1 | X | Y } G = (P, S) so S is the initial string Typical Derivation: S (S + S) ((S * S) + S) =>* ((X * Y) + X)
Note that we do not make any assumptions about the meanings of 0, 1, + and *—as far as we are concerned they are just symbols.
The language described above contains all of the expressions that can be built up using addition and multiplication of X, Y, 0 and 1. For example, (((X * Y) + (Y * X)) * X) can be formed with:
S (S * S) ((S + S) * S) =>* (((S * S) + (S + S)) * S) =>* (((X * Y) + (Y * X)) * X)
Applications
Grammars give a succinct and unambiguous description of a language. Programming languages are described by grammars. For something like C this would seem like an almost impossible task but can be achieved by building grammars for simple things such as allowed variable names and C expressions and then moving up to conditional statements, function declarations and eventually whole programs. This takes advantage of the fact that grammars can be used as building blocks in the same way as C macros.
We need to be able to do the following:
- Given a simple grammar G, find the language LG
- Given a simple language (or example), find a grammar for it
Example: Abducted by Aliens
You have been abducted by aliens. You regain consciousness on their ship, and they speak to you in the followign manner:
Alien 1: dee paa da wee pam
Alien 2: dee wap paa da wee pam
Alien 3: dee wee wap paa da wee pam
What do you say? Let’s work out the grammar for their language (from what little we’ve seen of it).
Σ = { da, dee, paa, wee, pam, wap, X, Y, Z }
We’ve added some extra terms X, Y and Z to help us construct the grammar as the symbols we already have are terminal. Looking at the alien’s speech so far everything starts with “dee” and ends with “paa da wee pam”. We will use X as our intial string. Remember that (0) is the empty string.
X ~> dee Y paa da wee pam Y ~> (0) | wap | Z Z ~> wee wap | wee wap Z
From the above, we can try “dee wee wap wee wap paa da wee pam” and see what happens.
Zermelo Fraenkel Set Theory
LZF (Zermelo Fraenkel Set Theory) is a grammar that can be used to represent almost all of contemporary mathematics in very precise (if lengthy) terms.
S ~> (S ^ S) | (S v S) | (> S) | (S -> S) | (S ⇔ S) | (∀ V ) S | (∃ V) S | V = V | V ∈ V V ~> variable (sequence of letters and digits)
Some examples:
A = B ∪ C in LZF is (∀ X)(X ∈ A ⇔ (X ∈ B v X ∈ C)) A = {B, C} in LZF is (∀ X)(X ∈ A ⇔ (X = B v X = C)) A ⊆ B in LZF is (∀ X)(X ∈ A ⇔ X ∈ B)
More recent articles
- My AI/LLM predictions for the next 1, 3 and 6 years, for Oxide and Friends - 10th January 2025
- Weeknotes: Starting 2025 a little slow - 4th January 2025
- I still don't think companies serve you ads based on spying through your microphone - 2nd January 2025