Generalized mark-up

"A document's marking labels describe its structure" (Charles Goldfarb).



The SGML Language

SGML stands for "Standard Generalized Markup Language". "Markup" refers to the assignment of attributes to the different parts of a document. An attribute is a name-value pair: the name identifies the attribute and the value is the content associated with it.

SGML is a generic language that was created to describe the logical structure of document content, but its scope is universal and reaches beyond the world of documentation. Although current applications of the standard have focused on a single kind of information structure (document content), the language is generic and potentially applicable to other information structures as well, including databases. SGML has been freed from document-oriented concepts, so it can describe any information structure conceptually, independently of the application or tool used.

SGML is really a metalanguage, since it is a language for defining specific markup languages. A specific markup language has its specific vocabulary and a syntax that defines the relationships between its elements.


The logical structure of a document

The description of the logical structure of a document is based on identifying the types of its component elements (chapters, sections, paragraphs, lists, etc.), their attributes, and the relationships between these elements. This description is made completely independently of the processes that may later be applied to the document. This is what is called "Open Information Management" (OIM): making information available to all types of applications.

The identification of the different elements of a document is done by marking them with tags. These tags have the following characteristics:


Generalized markup vs. specific markup

Specific markup consists of inserting, within the text of a document, controls related to a specific process. For example, in the case of a document formatting process (text presentation):

Specific markup, also called procedural markup, has the following disadvantages:

With generalized markup, on the other hand, the following is achieved:


Definition of document types

In SGML, a document type is defined by means of a DTD (Document Type Definition), which specifies the structural constraints imposed on documents of a certain type. There are DTDs defined for military applications, for the aeronautical industry, for large corporations, etc.

The language of DTDs uses a variant of regular expression notation [see Applications - Linguistics - Formal Grammars and Regular Expressions], with the following symbols:

Symbol   Meaning
&        Union of elements of a set (elements in any order)
,        Separation of elements of a sequence
( )      Grouping of elements
|        Alternative elements
?        Optional element
*        Repetition of an element zero or more times
+        Repetition of an element one or more times
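
Because this notation is close to regular expressions, a content model can be checked with an ordinary regex engine. The following is a minimal sketch in Python, not a DTD parser: the function names and the sample model `(title, para+, note?)` are hypothetical, and the `&` connector is deliberately left out because it has no direct regex counterpart.

```python
import re

def model_to_regex(model):
    """Translate a DTD-style content model into a regex over child names.

    Assumption of this sketch: each child is encoded as its name followed
    by one space, so a sequence of children becomes a single string.
    """
    if "&" in model:
        raise ValueError("the '&' connector is not handled in this sketch")
    parts = []
    for token in re.findall(r"[A-Za-z_][\w.-]*|[(),|?*+]", model):
        if token == ",":
            continue                      # a sequence is plain concatenation
        if token == "(":
            parts.append("(?:")           # grouping
        elif token in ")|?*+":
            parts.append(token)           # same meaning as in regex notation
        else:
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))

def conforms(model, children):
    """Return True if the list of child element names satisfies the model."""
    text = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(text) is not None

MODEL = "(title, para+, note?)"           # hypothetical content model
ok = conforms(MODEL, ["title", "para", "para"])
bad = conforms(MODEL, ["para", "title"])
```

Here `ok` is true because one title is followed by one or more paragraphs, while `bad` is false because the sequence order is violated.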


Limitations of SGML

It has been said that SGML walks between humanism and science. Indeed, the document encoding system is intuitive and easy to understand because it is based on the concept of attribute, which has a high semantic level. But SGML suffers from many limitations:


The XML Language

XML (eXtensible Markup Language) is a subset of SGML, a simplified SGML published in 1998 by the W3C (World Wide Web Consortium) for use on the Internet and in all types of applications in general.

SGML is more powerful and flexible than XML. In SGML it is even possible to change the markup delimiters, such as the angle brackets. But SGML is more complex and more difficult to implement than XML. SGML is currently being replaced by XML because XML is simpler, integrates better with the current Web, and is one of the technologies chosen for the future Semantic Web [see Applications - Computing - Semantic Web].


Example

The following is an example of a hierarchical information structure specified in XML:

<Employee Id="2126">
   <PersonalData>
      <Name>José Pons</Name>
      <YearNac>1936</YearNac>
      <PlaceNac>Cartagena</PlaceNac>
   </PersonalData>
   <AcademicData>
      <Title>Physical</Title>
      <YearTitle>1960</YearTitle>
   </AcademicData>
</Employee>
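
As an illustration of how an application can traverse such a hierarchy, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. It assumes the identifier is written as a quoted attribute (`Id="2126"`) and that every opening tag has a matching closing tag, as well-formed XML requires. The helper `flatten` is a hypothetical name introduced only for this example.

```python
import xml.etree.ElementTree as ET

EMPLOYEE_XML = """
<Employee Id="2126">
   <PersonalData>
      <Name>José Pons</Name>
      <YearNac>1936</YearNac>
      <PlaceNac>Cartagena</PlaceNac>
   </PersonalData>
   <AcademicData>
      <Title>Physical</Title>
      <YearTitle>1960</YearTitle>
   </AcademicData>
</Employee>
"""

def flatten(element, path=""):
    """Return (path, text) pairs for every leaf element under `element`."""
    here = f"{path}/{element.tag}"
    children = list(element)
    if not children:
        return [(here, (element.text or "").strip())]
    pairs = []
    for child in children:
        pairs.extend(flatten(child, here))
    return pairs

root = ET.fromstring(EMPLOYEE_XML)   # raises ParseError if not well-formed
employee_id = root.get("Id")         # attribute value, always a string
leaves = flatten(root)               # one (path, text) pair per leaf element
```

Each leaf is reported together with the path of element names leading to it, which makes the name-value character of the markup explicit.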


Limitations of XML


XML Schema (XMLS)

XML Schema is a language for defining types or structures of XML documents, that is, for specifying which documents are syntactically valid. It has the advantage that its own syntax is also XML (as opposed to DTDs, which use a separate notation of their own rather than element markup).

Limitations:


The universality of XML

XML is currently being applied to the specification of all kinds of information structures. This is, at the very least, a debatable approach, since claiming to use XML "for everything" leads to an inconsistency similar to claiming in OOP (Object Oriented Programming) that "everything is an object". For example, in the first case a semantic error (of interpretation) is committed, and in the second case an error of representation, by adopting unnecessary complexity.

It should be borne in mind that semantics comes first and syntax second. The syntax should be as simple, readable and adequate as possible, and should evoke the associated semantics.


MENTAL as a Generalized Markup Language

MENTAL provides a complete language for the generalized markup philosophy, making it unnecessary to use a special language for this field. It can be applied to the specification of all types of information structures, overcoming the limitations of SGML (and its simplified version XML).

Markup with MENTAL is mainly done with the primitive "/" (particularization):


Example

In MENTAL, the above example would be specified as follows:

(Employee/2126
   PersonalData/
      (Name/"José Pons"
      YearNac/1936
      PlaceNac/Cartagena)
   AcademicData/
      (Title/Physical
      YearTitle/1960)
)


This expression has been laid out hierarchically for readability, but it can equally be written in a fully linear fashion.

Unlike XML, which repeats each name in a closing tag, attribute names appear only once here.
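
To make the equivalence between the hierarchical and linear layouts concrete, the sketch below models the structure as nested name-value pairs in Python and renders it on a single line. It uses the names of the XML example. The functions `render` and `linear` are hypothetical, and the rule of quoting only values that contain spaces is an assumption inferred from the example, not a statement about MENTAL itself.

```python
def render(name, value):
    """Render one name/value pair; a list value is a nested group of pairs."""
    if isinstance(value, list):
        return f"{name}/{linear(value)}"
    if isinstance(value, str) and " " in value:
        return f'{name}/"{value}"'       # assumed quoting rule
    return f"{name}/{value}"

def linear(pairs):
    """Render a group of pairs on one line, separated by single spaces."""
    return "(" + " ".join(render(n, v) for n, v in pairs) + ")"

record = [
    ("Employee", 2126),
    ("PersonalData", [("Name", "José Pons"),
                      ("YearNac", 1936),
                      ("PlaceNac", "Cartagena")]),
    ("AcademicData", [("Title", "Physical"),
                      ("YearTitle", 1960)]),
]

line = linear(record)
```

The resulting string carries the same information as the indented form; only the whitespace differs.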


Specification of types of markup structures

This is done by means of parameterized generic expressions. For example,

⟨Employee(id name yearnac placenac title yeartitle) = (Employee/id
   PersonalData/
      (Name/name
      YearNac/yearnac
      PlaceNac/placenac)
   AcademicData/
      (Title/title
      YearTitle/yeartitle)
)⟩


Usage, for example:

To restrict the values associated with the parameters, conditions would have to be included, for example on length, type (numeric, alphanumeric, date, etc.) or ranges of values.
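
The idea of a parameterized structure with conditions on its parameters can be sketched in Python as an ordinary function. This is only an analogy to the generic expression above: the function name `employee` and the specific conditions (positive identifier, name of at most 60 characters, years between 1900 and 2100) are hypothetical choices made for illustration.

```python
def employee(employee_id, name, yearnac, placenac, title, yeartitle):
    """Build the Employee structure as nested (name, value) pairs."""
    # Hypothetical conditions on the parameters: type, length and range.
    if not isinstance(employee_id, int) or employee_id <= 0:
        raise ValueError("employee_id must be a positive integer")
    if not (isinstance(name, str) and 0 < len(name) <= 60):
        raise ValueError("name must be 1 to 60 characters long")
    if not (1900 <= yearnac <= yeartitle <= 2100):
        raise ValueError("years must be ordered and within 1900..2100")
    return [
        ("Employee", employee_id),
        ("PersonalData", [("Name", name),
                          ("YearNac", yearnac),
                          ("PlaceNac", placenac)]),
        ("AcademicData", [("Title", title),
                          ("YearTitle", yeartitle)]),
    ]

record = employee(2126, "José Pons", 1936, "Cartagena", "Physical", 1960)
```

A call that violates a condition, such as a negative identifier, raises `ValueError` instead of producing a structure.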


Use of XML syntax

We can, if we wish, use XML syntax. To do so, we can use the following definition:

As the angle brackets and the slash have meaning in MENTAL, they should be differentiated by, for example, another color. Potential substitution has been used, so a representation is being indicated. For example, the HTML statement represents in MENTAL the expression


Advantages of MENTAL as a generalized markup language

Addenda

Origin of SGML

Charles F. Goldfarb, together with Edward Mosher and Raymond Lorie, invented GML (whose letters are also the initials of their last names), the precursor of SGML, at IBM in 1969, introducing the concept of "markup" as a means of structuring and sharing the content of a document between different applications. In 1974 SGML was born as an evolution of GML, although it took more than a decade before it was fully developed and standardized. SGML has been an international standard since 1986 (ISO 8879).

The standard does not define tags, although a basic set appears in an annex to ISO 8879 that contains examples of application of the language. Today SGML, although a widely accepted and widespread international language for information exchange, has been replaced in practice by XML, because XML is simpler.

Some developments in SGML are:

A complete description of SGML can be found in [Goldfarb, 1991], written by the main inspirer of this language. A more practical approach is provided by [Herwijnen, 1994]. [Wright, 1992] explains SGML as a technique for releasing information.


Bibliography