BNF or Backus Naur Form is a notation for defining languages.
<program> ::= BEGIN <block> END
<block> ::= { <if-statement> | <loop> | <break>
<if-statement> ::= <if-keyword> <expression> <block> [ ELSE <block> ] END
<if-keyword> ::= IFZ | IFP | IFN
<loop> ::= LOOP <block> END
<break> ::= BREAK
BNF can be used to describe itself
<line> ::= '<' <word> '>' '::=' <definition>
<definition> ::= <word> '|' | '' <definition> | ''
I am a retired independent software engineer currently living in Kentwood, Mi. Most recent professional experience is on the back end of web apps using Ruby on Rails.
-Started out as an astronomer
Probably the most significant programming language nobody has ever heard of!
The man page section 5 for the sudoers file is described in EBNF. Different style.
-<S> ; Start nonterminal
- <NonTerminal> ; In CamelCase in pointy brackets
- ::= ; Derives operator
- Terminal ; exact string: E.g. : abc or surrounded by double quotes : "a c"
* Or CameLCase for lexical token names: E.g. Ident, WhiteSpace, .etc
- Alternation operator: "|" . Or operator.
- Eps : Epsilon : Empty string. Actually: Nondeterministic connector
- Comment start character ";" Comment continues to end of line
; Here Be Dragons
;Start symbol is often <S> also <Sentence>
<S> ::= <NonTerminal>
<NonTerminal> ::= Terminal <NonTerminal2> foo "bar baz" | Eps
<NonTerminal2> ::= a ", " ; Two terminals: 'a' and ', '
<S> ::= [<NonTerminal> Terminal] <Group> <Concat>
<NonTerminal> ::= {", " <S>}
<Group> ::= ("+" | "-")
<ConCat> ::= abc, def, ghi
; U.S. Postal address ABNF (From Wikipedia.org)
postal-address = name-part street zip-part
name-part = *(personal-part SP) last-name [SP suffix] CRLF
name-part =/ personal-part CRLF
personal-part = first-name / (initial ".")
first-name = *ALPHA
initial = ALPHA
last-name = *ALPHA
suffix = ("Jr." / "Sr." / 1*("I" / "V" / "X"))
street = [apt SP] house-num SP street-name CRLF
apt = 1*4DIGIT
house-num = 1*8(DIGIT / ALPHA)
street-name = 1*VCHAR
zip-part = town-name "," SP state 1*2SP zip-code CRLF
town-name = 1*(ALPHA / SP)
state = 2ALPHA
zip-code = 5DIGIT ["-" 4DIGIT]
Pitfalls[edit]
RF
We consider only written forms of communication here. Included also is electronic forms of communication, especially digital.
Of or relating to words in a vocabulary .
The words or identifiers and even keywords in our grammar that get represented by terminal symbols in the BNF notation.
<AddOp> ::= "+" | "-"
<Letter> ::= "a" | "b" | "c" | ... | "z"
<Number> ::= "0" | "1" | "2" | "3" | "4" | ... | "9"
<LetterOrNumber> ::= <Letter> | <Number>
<Identifier> ::= <LetterOrNumber> <Identifier>
| Eps
Of, or pertaining to the structure of a sentence in a language. Especially with regard to order of words and any punctuation therein.
The grammar is where the language really meets the road. We use notational forms like BNF to formally define the syntax of the language.
Not all syntacticly correct sentences convey the same meaning. You have to imply some surrounding context or background knowledge.
E.g. consider these two sentences with similar words in the same syntax placement:
A natural number can be defined inductively as a natural number followed by a digit. Notice the use of recursion in this statement and the following BNF.
<S> ::= <NaturalNumber>
<NaturalNumber> ::= <NaturalNumber> <Digit>
| <Digit>
<Digit> ::= "0" | "1" | "2" |" 3" | "4"
| "5" | "6" |" 7" | "8" | "9"
A language is regular over some alphabet (Sigma) if it is one of:
As defined by the mathematician Stephen Cole Kleene.
Differs from regular languages because they can have nesting.
<S> ::= "(" <S> ")"
| a
Accepts all correctly balanced parenthesis surrounding a single “a”
Top down parsers can accept LL(K) where K >= 0 and represents the count of items of lookahead.
Bottom up parsers can accept LR(K) languages
The first L means Left => Right scanning The second letter represents the type of derivation construction: L: Left, R: Right
Derivation trees are also known as parse trees or concrete syntax trees (CST).
Such trees can be formed either top-down or bottom-up. The root of this tree is the start symbol. Child nodes are the nonterminals in the grammar as chosen for a given input sentence. Every nonterminal node and its immediate children consist of a sentential form for that part of the sentence. Leaf nodes are exactly the terminals encountered in the same order of the input.
Starting from the starting nonterminal, derive each production’s left-most nonterminal. Working from left to right.
Starting from the leaf nodes in the tree, build up higher and higher nonterminals in the tree working from right to left in some production.
Parsing is the act of scanning the input a single lexical token at a time and constructing a derivation tree based on the grammar and its type: LL or LR.
State machines can be used to construct a parser for a given grammar.
A grammar is said to be of a Chomsky type if, and only if, there is a matching type of state machine for that Chomsky type.
Regular grammars are equivalent to DFAs.
A language is regular if, and only if, a deterministic finite automaton can be constructed to recognize all, and only all, legal sentences in that language. Conversely, if a DFA exists that can recognize legal strings in a regular language, then a regular grammar can be constructed for it.
It is easy to translate a regular expression to a NFA.
Any NFA for a regular language can be mechanically converted into a DFA that recognizes the exact same language.
CFGs are equivalent to pushdown automata (PDA).
A language is context free if, and only if, a PDA can be constructed to recognize any legal string in that language. Conversely if a PDA exists for a given language recognizer, a CFG can be constructed for it.
Nondeterministic PDAs have more power than deterministic PDAs
Unlike their regular language NFA cousins which are exactly as powerful as their DFA counterparts.
PDAs are essentially DFAs but with one additional appendage: The stack.
The stack turns the state machine into a O(M) linear space complexity.
In practice, the stack is either the call stack for recursive descent, or a state stack. and data stack in the case of shift/reduce parsers.
Nondeterminism is a super power that is granted to deterministic automata. It conveys the ability to choose between multiple state transitions given the same input.
The effect of the above is that whenever a state transition occurs, the result is multiple computations are spawned simultaneously.
!vn
This is similar to the Unix ‘fork’ system call. Recall that two values will be returned from ‘fork’. 0 in the case of the child process, and the process id in the parent.
NFAs return one of the alternate states from the delta function in each spawned thread.
Computationally complex, especially in the case of a single threaded system. It would require backtracking and thus no longer linear time complexity.
Nondeterminism can be thought of as computationally expensive. Consequently, when constructing the state table, it is prudent to look ahead in the grammar for a possible (eventually derived) terminal symbol that can be used to constrain any useless nondeterminism.
Therefore, most LR(1) can be simplified to be deterministic and not nondeterministic with the ahead of time compile step creating the better state table.
Although it appears as a column in the delta lookup table, epsilon is actually not a terminal symbol from the input. If the state row (from Q) has an entry in the Epsilon column, it will always take that branch. IOW: A spawned FSM will be launched along the new state path.
In the following grammar, we remove the simple terminal symbol: a, and replace it with: Eps, or Epsilon.
<S> ::= "(" <S> ")"
| Eps
The above grammar now recognizes :
Our balanced parens CFG showed how to use a stack to check for syntax correctness. Applying the same strategy, we can push symbols from the input onto the stack until we reach the central point. Then we reverse the action, popping the same symbols off the stack and comparing them to the input. If they all match, and the stack is empty when the input is exhausted, it is a palindrome.