"The man who speaks with primordial images speaks with a thousand tongues" (Jung).
"The study of grammar, in my opinion, is capable of throwing much more light on philosophical questions than is commonly supposed by philosophers." (Bertrand Russell)
Syntactic Grammar
A syntactic or formal grammar, also called a "generative grammar", is a metalanguage that specifies or describes the syntax (the form) of the different sentences of a language. In the case of a programming language, the grammar describes the structure of both the passive elements (data) and the active ones (processes), unifying the two types of elements at the syntactic level.
A formal grammar consists of four elements:
A set T of terminal elements.
A set N of non-terminal elements.
A set P of productions (or production rules) of type α ::= β. The left part α is a non-terminal element. The symbol "::=" means "is defined as". The right part β is a sequence of terminal and non-terminal elements. The length of the sequence is the number of elements it consists of. There is the empty sequence, which has zero length.
A non-terminal element S designated as the initial non-terminal element.
The language associated with a grammar is the set of finite sequences generated by that grammar from the initial non-terminal element. Hence the name "generative grammar".
Two grammars are equivalent if they generate or describe the same language. The language generated by a grammar can be finite or infinite.
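The four-element definition above can be checked mechanically. A minimal sketch in Python, using a hypothetical grammar of our own (S ::= ab | aSb, which generates aⁿbⁿ) and a breadth-first derivation from the initial non-terminal:

```python
from collections import deque

# A toy grammar as the 4-tuple (T, N, P, S) described above.
# Hypothetical example: S ::= ab | aSb, generating a^n b^n (n >= 1).
T = {"a", "b"}
N = {"S"}
P = {"S": [("a", "b"), ("a", "S", "b")]}   # right-hand sides as tuples
S = "S"

def generate(max_len):
    """Enumerate the sentences of length <= max_len by breadth-first derivation."""
    sentences, queue = set(), deque([(S,)])
    while queue:
        form = queue.popleft()
        if len(form) > max_len:
            continue
        # find the leftmost non-terminal, if any
        i = next((k for k, x in enumerate(form) if x in N), None)
        if i is None:
            sentences.add("".join(form))   # only terminals left: a sentence
        else:
            for rhs in P[form[i]]:
                queue.append(form[:i] + rhs + form[i + 1:])
    return sentences

print(generate(6))   # {'ab', 'aabb', 'aaabbb'}
```

Because the derivation never terminates on its own, the enumeration is cut off at a maximum length; this is the practical face of an infinite generated language.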
The BNF notation
Invented by John Backus [1959], it was initially called "Backus Normal Form". This notation was adopted to describe the Algol 60 programming language. Peter Naur's [1963] contribution in editing the report on Algol 60 led to the notation BNF becoming known as "Backus Naur Form" [Knuth, 1964].
In the original BNF notation, only two information structuring mechanisms were contemplated:
Composition.
Corresponds to the sequence of elements (terminal or not). No specific symbol is used.
Alternative.
Represents a choice between different sequences. It is expressed by the symbol "|".
This notation was clearly insufficient, and EBNF (Extended BNF) appeared, of which there are many variants. The aim was to reduce the number of production rules and increase readability. The most widespread variant adds the following mechanisms:
Grouping of alternatives.
It is expressed by parentheses.
Repetition.
Of an element zero or more times (*) or one or more times (+).
Optional.
Indicates that it may or may not appear. It is expressed by square brackets: [ ].
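The EBNF mechanisms above (alternative, grouping, repetition, optional) map directly onto regular-expression operators, which makes a quick check possible in code. A minimal sketch; the signed-integer rule below is our own illustration, not taken from the text:

```python
import re

# EBNF:  integer ::= ["+" | "-"] digit+      (optional sign, one or more digits)
# The same mechanisms in regex:  [x] -> (x)? , | -> | , + -> + , grouping -> (...)
integer = re.compile(r"^(\+|-)?[0-9]+$")

assert integer.match("42")
assert integer.match("-7")
assert not integer.match("+")     # a sign alone is not an integer
```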
Generative grammar is really a theory of syntax, since semantics is not considered. This is why, despite having pushed forward the study of grammars and the formalization of syntax, it has been criticized for not providing a semantic model for languages.
The repetition mechanism is very limited, since it is not possible to indicate a concrete number of repetitions.
Only context-free properties are referred to. Therefore, neither structural nor functional constraints, e.g. conditions, attributes, etc., can be specified.
The generic nature of the EBNF notation
The EBNF notation is really a meta-metalanguage, that is, it is a language for defining metalanguages of concrete languages.
EBNF is itself a high-level language, in which several authors have seen a powerful formalism, generic in nature, that can by itself be used as a programming language, as a formal specification language, and as the foundation of a science or theory of programming languages and of programming itself [Harrison, 1965] [Gries, 1981] [Schmidt, 1982].
Semantic Grammar
Whereas traditional syntactic grammars deal with abstract categories (usually related to the structure of the language itself), semantic grammars are structured and organized around lexical categories, that is, around concrete semantic concepts, categories or properties related to a domain, such as time, place, color, objects, etc.
The main features of semantic grammars are:
The sentences of a semantic grammar specify the allowed relations between concepts in an application domain, beyond their merely formal description (covered by a traditional syntactic grammar).
Rules are normally used to relate concepts. These are "semantic rules", rather than "syntactic rules" (in the form of productions), although there is some similarity between them.
While in a syntactic grammar an analysis is performed exclusively oriented to verify the correct syntax of a sentence, in a semantic grammar a semantic analysis is additionally performed, which provides the structure that relates the concepts or categories found. This structure, in its most general case, is a "semantic network", that is, a network of interrelated concepts.
For the interpreter performing the analysis, the semantic analysis is the most important, since it always tries to "understand" the meaning, and therefore allows itself a certain syntactic flexibility.
Semantic grammars are relatively easy to develop for limited domains, when there are few concepts involved. But they are difficult to extend to cover new domains. To overcome this, "modular semantic grammars" are used in which:
There is a semantic grammar in each domain, organized in the form of rules.
There is a set of rules shared by all domains to relate common concepts.
Semantic grammars have their main application in the analysis of natural language, with the aim of discovering, identifying and relating the underlying concepts that are hidden in sentences to aid their subsequent understanding or interpretation. In fact, they were developed with the aim of adding semantics to traditional grammars in the analysis of natural language subdomains [Burton, 1976].
However, the boundary between the two types of grammars (syntactic and semantic) is somewhat blurred. For example, the formal grammars of programming languages can also be considered semantic, since in addition to a syntactic analysis, a structure is generated that relates the entities or categories contemplated by the language.
MENTAL as Syntactic-Semantic Grammar
MENTAL can express the EBNF notation, so it has at least the same expressive power and can therefore be used to define syntactic grammars.
Mechanism                           EBNF         MENTAL
Definition                          ::=          =: (potential substitution)
Composition of x, y, z, ...         xyz...       (x y z ...)
Alternative between x, y, z, ...    x|y|z|...    ∈{x y z ...} (membership in a set)
Repetition of x 0 or more times     x*           x★[0…]
Repetition of x 1 or more times     x+           x★[1…]
Empty sequence                      (no symbol)  θ (null expression)
x optional                          [x]          ∈{x θ}
MENTAL is a universal semantic grammar, since it allows concepts from any domain to be related without restrictions.
From the point of view of MENTAL, a formal grammar G is an extensive or intensive expression that describes all the sentences of a language. Language and grammar coincide. In the case of an infinite language, the expression describing the language is recursive.
Examples
We have a robot that moves along a single axis: forward (a) or backward (b). A trip V of the robot is a sequence of movements that ends at the starting point. In the middle of a trip, another trip can be started.
The language is defined by the following recursive expression:
( V =: {ab ba a∪V∪b b∪V∪a} )
A simpler form is:
( V =: {θ a∪V∪b b∪V∪a} )
where θ is the null trip, i.e. no displacement.
Examples of trips: aabb, baba, abab, aababb, etc.
Every trip has a length (the length of the sequence). The minimum trip has length 2.
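The recursive definition V ::= θ | aVb | bVa can be run directly as a recognizer. A minimal sketch (the function name is ours):

```python
def is_trip(s):
    """True if s is a trip: empty (theta), or a nested trip wrapped in ab or ba."""
    if s == "":
        return True                      # theta, the null trip
    if len(s) >= 2 and s[0] != s[-1]:    # must start and end with opposite moves
        return is_trip(s[1:-1])          # the middle must itself be a trip
    return False

assert is_trip("aabb") and is_trip("abab") and is_trip("baba")
assert not is_trip("a")                  # odd length: never returns to start
assert not is_trip("abba")               # returns to start, but not by nesting
```

Note that the grammar only allows nesting, not sequencing: abba returns to the starting point but is not generated.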
A trip in all four directions.
This is a generalization of the previous example. The 4 directions are: up (a), down (b), left (i) and right (d).
The language is defined by the following recursive expression:
( V =: {θ a∪V∪b b∪V∪a i∪V∪d d∪V∪i} )
Examples of trips: aidb, aaabbb, aiiidddb, iiaabbdd, etc.
We could also have used another notation reflecting the opposite paths:
( (a' =: b) (i' =: d) )
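The generalization only changes the table of opposite moves, exactly as the (a' =: b), (i' =: d) notation suggests. A sketch extending the previous recognizer (names are ours):

```python
# Opposite moves, mirroring the (a' =: b), (i' =: d) notation above.
OPP = {"a": "b", "b": "a", "i": "d", "d": "i"}

def is_trip4(s):
    """True if s is generated by V ::= theta | aVb | bVa | iVd | dVi."""
    if s == "":
        return True
    return len(s) >= 2 and OPP.get(s[0]) == s[-1] and is_trip4(s[1:-1])

assert is_trip4("aidb") and is_trip4("aaabbb") and is_trip4("iiaabbdd")
assert not is_trip4("ai")
```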
Binary numbers.
The language is also expressed recursively:
( B =: {0 1 B∪0 B∪1} )
Examples of binary numbers: 0, 1, 01, 10, 11, 011, 111000, etc.
Every binary number has a length (the length of the sequence of digits). The minimum binary number has length 1.
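The recursion B ::= 0 | 1 | B∪0 | B∪1 builds every non-empty binary string by appending one digit at a time. A sketch enumerating the language up to a given length (the function name is ours):

```python
def binaries(max_len):
    """All sentences of B ::= 0 | 1 | B0 | B1 with up to max_len digits."""
    level = ["0", "1"]                  # the two base rules
    out = list(level)
    for _ in range(max_len - 1):
        # the recursive rules B0 and B1: append a digit to every shorter number
        level = [b + d for b in level for d in ("0", "1")]
        out.extend(level)
    return out

print(binaries(2))    # ['0', '1', '00', '01', '10', '11']
```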
In all three examples the languages are infinite.
Parameterized language
The definition of a language can be parameterized. In the example of binary numbers, we can consider, for example:
Numbers of length n:
〈( B(n) =: [[0 1]★n] )〉
Numbers of length less than or equal to n:
〈( B(n) =: [[0 1]★[1…n]] )〉
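The two parameterized definitions can be mirrored in code: exact repetition n times, and bounded repetition from 1 to n times. A sketch with hypothetical function names of our own:

```python
from itertools import product

def B_exact(n):
    """Binary numbers of length exactly n (the first parameterized rule)."""
    return ["".join(p) for p in product("01", repeat=n)]

def B_up_to(n):
    """Binary numbers of length 1..n (the second, bounded-repetition rule)."""
    return [s for k in range(1, n + 1) for s in B_exact(k)]

assert B_exact(2) == ["00", "01", "10", "11"]
assert len(B_up_to(3)) == 2 + 4 + 8
```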
Advantages of MENTAL as a grammar
The distinction between grammar and language is diluted.
The concepts associated with the definition of grammars ("production rule", "terminal element", "non-terminal element") are not necessary, nor is the "initial non-terminal element".
The formal semantics is integrated with the semantics of use (pragmatics), diluting the distinction between the specification of particular languages and applications.
The distinction between semantic grammar and ontology is blurred. An ontology is a set of classes or concepts and relationships between them to model, ground, describe and represent a domain.
No specific language such as EBNF or regular expressions is needed. You have a universal, complete, formal language with all its expressive power and flexibility of specification. For example, you can specify the number of concrete repetitions, you can define ranges of values, use distributions, parametric expressions, conditions, attributes, etc.
There is a one-to-one relationship between syntax and semantics. It is not necessary to distinguish between syntactic grammar and semantic grammar: they are two aspects of one and the same thing.
The orientation is syntactic-semantic. Mechanisms are provided to define "how the elements are", i.e., to develop the elements of the language at both the semantic and the syntactic level.
In a syntactic grammar, mechanisms are provided to define "how the elements can be", but only at the syntactic level.
The basic construct of a syntactic grammar is the production rule of the metalanguage used. With MENTAL we can use the power of the language primitives.
Higher order grammars (and their corresponding languages), interrelated grammars and languages, etc. can be specified.
The syntax is flexible, as it can be dynamically modified, even the one initially defined by the primitives through the mechanism of derived operations.
Each syntactic construction or compound expression has a perfectly defined semantics associated with it. This is the so-called "Compositional Semantics": the meaning of a compound expression is derived from the meanings of the component expressions.
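Compositional semantics can be illustrated with a toy evaluator, in which the meaning (here, the value) of a compound expression is computed only from the meanings of its parts. A hypothetical sketch of our own, not MENTAL itself:

```python
# Expressions are nested tuples: ("+", e1, e2), ("*", e1, e2), or a number.
def meaning(e):
    """The meaning of a compound is a function of the meanings of its parts."""
    if isinstance(e, (int, float)):
        return e                              # atomic expression: its own value
    op, left, right = e
    l, r = meaning(left), meaning(right)      # meanings of the components
    return l + r if op == "+" else l * r      # combined per the operator

assert meaning(("+", 2, ("*", 3, 4))) == 14
```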
There is constructional freedom. A grammar imposes certain combinatorial restrictions. In MENTAL, the semantic grammar is free, all possible combinations are allowed, without restrictions. MENTAL may impose restrictions on the definition of languages, but it is itself unconstrained. And so it has to be, since in MENTAL there are all the possibilities of universal semantic primitives.
MENTAL can specify any form of syntactic-semantic grammar, beyond traditional grammars.
Language is open (to new syntactic-semantic constructions). This justifies not defining a classical (syntactic) grammar for MENTAL because defining it would imply limiting it.
In a syntactic grammar, the language is syntactically closed.
All languages are united by a common semantics, because the same syntactic-semantic grammar is always used.
In a syntactic grammar all languages are disconnected and separate, as each uses a different syntactic grammar.
The compiler or interpreter is universal. It is valid for all languages.
In a syntactic grammar, a specific interpreter or compiler must be developed for each language. In a syntactic grammar there is a parsing process.
The syntax and semantics of all languages are standardized, since they are based on the same semantic primitives.
With a syntactic grammar, the syntax of all languages is standardized, as long as the same syntactic metalanguage is used.
MENTAL as language and metalanguage
MENTAL is a language and a metalanguage:
It is a language for developing applications using universal semantic primitives.
It is a metalanguage, since it allows specific languages to be described by means of grammars.
It is a meta-metalanguage, since it is a metalanguage that allows metalanguages describing specific languages to be specified.
It is a meta-meta-metalanguage, and so on. There are infinite levels.
It is a universal language, but it is also a universal metalanguage, since it allows defining particular languages (and even other metalanguages). MENTAL is not a metalanguage of itself, but it can express the metalanguage of other languages. MENTAL is the foundation of all languages.
Addenda
Origin of formal grammars
The concept of grammar comes from the field of linguistics. The concept of formal grammar has its origin in modern linguistics, whose main representative is Noam Chomsky [1956], who developed a symbolic language of descriptive type that made it possible to explain the unlimited expressive power of natural languages through recursive mechanisms.
In computer science, the idea of formal grammar acquired great importance for the definition of programming languages and all kinds of languages in general.
Classification of grammars and languages
Grammars are classified into:
Type 0 or unrestricted.
There are no restrictions on α ::= β rules, except that α cannot be the empty string or sequence.
Type 1 or context-sensitive.
The rules are of the form α1Aα2 ::= α1βα2. That is, A can be replaced by β when it appears between α1 and α2 (which are the context).
A is a non-terminal element.
α1, α2 and β are sequences of elements (terminal or not).
α1 and α2 can be empty strings.
Type 2 or context-free (Chomsky grammars).
It is a particular case of type 1 when α1 and α2 are empty strings.
The rules are therefore of the form A ::= β
Type 3 or regular (Kleene's grammars).
It is a particular case of type 2.
The rules are of the form A ::= aB or A ::= a.
A and B are non-terminal elements and a is a terminal element.
The hierarchy of grammars is manifested in the following:
Every regular grammar is context-free. Every context-free grammar is context-sensitive. Every context-sensitive grammar is unrestricted.
Languages of type n are languages generated by a grammar of type n.
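Type-3 rules A ::= aB and A ::= a can be executed directly as a finite automaton: each step reads one terminal and moves to the next non-terminal. A minimal sketch for a hypothetical regular grammar of our own, S ::= 0S | 1S | 0 (binary strings ending in 0):

```python
# Right-linear rules: for each non-terminal, terminal -> list of next
# non-terminals; None marks a rule of the form A ::= a (an accepting step).
RULES = {"S": {"0": ["S", None], "1": ["S"]}}

def accepts(s, start="S"):
    """True if the regular grammar derives s from the start symbol."""
    states = {start}
    for ch in s:
        states = {nxt for st in states if st is not None
                      for nxt in RULES.get(st, {}).get(ch, [])}
    return None in states        # some rule A ::= a consumed the last terminal

assert accepts("10") and accepts("0")
assert not accepts("01") and not accepts("")
```

The state set never grows beyond the non-terminals, which is why regular languages need only finite memory.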
Bibliography
Aho, Alfred V.; Sethi, Ravi and Ullman, Jeffrey D. Compiladores. Principios, técnicas y herramientas. Addison-Wesley Iberoamericana, 1990.
Backus, John. The syntax and semantics of the proposed international algebraic language of the Zurich ACM-GAMM Conference. International Conference on Information Processing, June 1959, UNESCO, Paris, pp. 125-132.
Burton, Richard R. Semantic Grammar: An Engineering Technique for Constructing Natural Language Understanding Systems. Report 3453, Bolt, Beranek and Newman Inc., December 1976.
Chomsky, Noam. Three models for the description of language. IRE Transactions on Information Theory, vol. 2, no. 3, pp. 113-124, 1956.
Gries, David. The Science of Programming. Springer-Verlag, 1981.
Harrison, M.A. Introduction to switching and automata theory. McGraw-Hill, New York, 1965.
Harrison, M.A. Introduction to Formal Language Theory. Addison-Wesley, Reading, Mass., 1978.
Knuth, Donald E. Backus Normal Form vs. Backus Naur Form. Communications of the ACM, vol. 7, no. 12, Dec. 1964, pp. 735-736.
Meyer, Bertrand. Introduction to the Theory of Programming Languages. Prentice Hall International Series in Computer Science, C.A.R. Hoare Series Editor, 1990.
Naur, Peter. Revised Report on the Algorithmic Language Algol 60. Comm. ACM. vol. 6, no. 1, pp. 1-17, 1963.
Schmidt, David A. Denotational Semantics as a Programming Language. University of Edinburgh, Dept. of Computer Science, UK, 1982.