CP502: Rules for languages.

6.4. Rules for languages.

If we are designing a language that is to be efficiently parsed then we must have no ambiguity in the syntax. In syntax analysis we attempt to reconstruct the generating steps and hence the states of the sentence, by reading the sentence symbol by symbol. We do not want to have to backtrack.


We want to be able to know exactly where we are in state or

syntax graph by only looking one symbol ahead.

This is called one symbol lookahead with no backtracking.

Java is LL1 (or nearly so) as opposed to LR1. LL1 means Left to Right scanning of symbols, 1 character look ahead and using the left derivation of the tree when analysing the sentence.

Our requirement for efficiency leads to two rules.


(1)  Given A::=  ₁|₂|₃|......_n

     then initial symbol of each set of sequences must be unique.


     i.e. first symbol of _i  !=  first symbol of _j  for all i  !=  j


      S::=A|B                  How do we reconstruct

      A::=xA|y                         xxz?

      B::=xB|z

If we choose A -> xA -> xxA we are now at a position in which it is not possible to go forward. We have to backtrack which is inefficient.


     Rule one says      A::= xA|y       is not allowed

                        B::= xB|z

(2)  For every symbol which can generate an empty sequence,

     often denoted by  the set of its first symbols must be

     disjoint from set of symbols that may follow any sequence

     generated by A.


     This saves us from parsing a sentence into a dead end

     by choosing wrong path.

These rules, though rather strict, lead to very efficient parsing of the resulting language. It is possible to define an entire [computer] language in this manner and hence we have the syntax of the language as a set of rules that can be translated into program code which parses the language.