MENTAL
 Main Menu
 Applications
 Computer Science
 Relational Programming


Relational Programming
 RELATIONAL
PROGRAMMING

"The most important motivation for the research work that culminated in the relational model was the goal of drawing a clear and sharp boundary between the logical and physical between the logical and physical aspects of database management. database management" (Edgar Frank Codd).



The Relational Paradigm

The concept of relationship

The relational model, developed by Edgar Frank Codd [1970], is based on elements called "relations". A relation is a two-dimensional table. The horizontal dimension corresponds to different entities. The vertical dimension corresponds to a set of attributes of those entities. For example, the relation Employees, which corresponds to a set of employees of a company expressed as a table, where the rows are the employees, and the columns are attributes of those employees:

EmployeeNameSurname1Surname2City
E001JuanPérezGómezMadrid
E002LuisFernándezPinBarcelona
E003PedroRuizLlamasValencia
E004RamónRuizRodríguezAlicante

Formally, a relation R defined over the attributes A1, ... , Am of a certain class of entities is a subset of the Cartesian product of the domains of those attributes. where Dom(Ai) is the set of possible values of attribute Ai.

This relationship can be represented as the generic two-dimensional table:

EntityAttributes
A1...Am
1V11...V1m
............
nVn1...Vnm

being: A relation is thus a set of tuples.

The order of the tuples in a relation is irrelevant, since each row corresponds to a different entity. The order of the columns is also irrelevant, since each column has the name of the corresponding attribute associated with it.

In a relationship, a key or primary (or, simply, key) attribute is an attribute whose value is unique for each entity. In the case of the relationship Employees, the key is Employee. The key of a relation could also be a concatenation of attributes (we speak, in this case, of a composite key or superkey).

A set of relationships (tables), interrelated through common attributes, constitutes a relational database (RDB). For example, in addition to the previous table (Employees), there could be another table of Cities with different attributes (Population, etc.). Both tables would be implicitly linked through the common attribute City. For example,

CityPopulation
(millions)
Madrid5
Barcelona3
Valencia2
Sevilla1

Note that Alicante does not appear in this table (which did appear in the Employees table), but Seville appears.

An attribute of a relation that is not a key in one relation can be a key in another. For example, the attribute City is not key to the relationship Employees, but it is key to the relationship Cities. It is said that City is a foreign key (foreign) in Employees.

A value Null (null) means that the value is unknown or does not exist. All operations involving Null are false by definition.

A value is atomic if it cannot be subdivided into smaller values. For example, a person's age in years is an atomic value. The concept of atomicity is relative, since a value can be considered atomic or composite, depending on the purpose.


Relational algebra

The relational model is based on a set of basic operations, which constitute the so-called "relational algebra". This is a set of operations that act on relations and produce a relation as a result. The basic operations are 5:
  1. Union of two relations R1 and R2.

    R1R2 = set of tuples belonging to R1, to R, or to both.

    The relationships R1 and R2 must be compatible, i.e., they must be defined on the same set of attributes.

  2. Difference of two relations R1 and R2.

    R1R2 = set of tuples of R1 that do not belong to R2.

    As above, the relationships must be compatible.

  3. Cartesian product of two relations R1 (of degree m1) and R2 (of degree m2).

    R1×R2 = set of all possible tuples in which the m1 first elements constitute a tuple of R1 and the m2 last ones a tuple of R2. That is, each of the rows of R1 are concatenated with each of the rows of R2. The new relation is of degree m1+m2 and of cardinality m1×m2.

    This definition is a variant of the classical Cartesian product.

  4. Projection of a relation R.

    It is a selection of table columns.

    If A is the set of attributes on which the relationship R is defined and BA is a subset of A (a selection of columns), Π B(R) is another relation with the same tuples as R, but considering only the attributes specified in B and eliminating duplicate tuples.

    For example,
    Π{FirstName, LastName1}(Employees)
    produces a new relation with only the columns corresponding to the specified attributes (FirstName and LastName1) and eliminating duplicate tuples.

    NameSurname1
    JuanPérez
    LuisFernández
    PedroRuiz
    RamónRuiz

  5. Selection of a R relationship.

    Is another relation in which the tuples of R that meet a selection criterion F are selected. The notation is σF(R).

    The selection criterion F is defined by a formula constructed with operands (attribute names or constants), comparison operators (>, <, etc.) and classical logical operators (or, and, not: ∨, ∧, ¬).

    For example,
    σLastName1=Ruiz(Employees)
    produces a two-row relationship:

    EmployeeNameApellido1Surname2City
    E003PedroRuizLlamasValencia
    E004RamónRuizRodríguezAlicante

    Another example would be σ(LastName1=Ruiz ∧ City=Barcelona)(Employees) which would produce the empty relation (the empty set).
This set of basic operations is shown to be complete [Codd, 1972], and these operations can be combined to construct derived operations.


Derived operations

In order to facilitate the specification of relational algebra expressions, 7 additional derived operations are defined [Korth & Silberschatz, 1993], (which together with the 5 basic ones, give a total of 12 operations):
  1. Intersection of two relations R1 and R2.

    The relationships R1 and R2 must be compatible. The intersection is equivalent to the operation defined in set theory. It is defined from the basic operation difference:

      R1R2 = R1 − (R1R2)

  2. natural join (natural join) of two relations R1 and R1, symbolized by R1 |×| R2).

    It is a column-level concatenation based on equal values in common columns. It has the form σF(R1×R2) where F is a formula indicating those equal values in common columns.

    For example, the natural union of Employees |×| Cities produces the following table:

    EmployeeNameSurname1Surname2CityPopulation
    E001JuanPérezGómezMadrid5
    E002LuisFernándezPinBarcelona3
    E003PedroRuizLlamasValencia2

    Note that employee E004 does not appear because Alicante is not in Cities.

  3. Outer Join (Outer Join).

    It is an extension of the natural join that avoids loss of information. Non-existing values are assigned the value Null (null). There are three types:

    1. R1 ]×| R2 (left outer bond).
      The rows of the first operand are taken into account.

    2. R1 |×[ R2 (right outer join).
      The rows of the second operand are taken into account.

    3. R1 ]×[ R2 (outer join complete).
      The rows of both operands are taken into account.

    In the case of Employees and Cities, we have:

    1. Employees ]×| Cities:

      EmployeeFirstNameLastName1LastName2CityPopulation
      millions)
      E001JuanPérezGómezMadrid5
      E002LuisFernándezPinBarcelona3
      E003PedroRuizLlamasValencia2
      E004RamónRuizRodríguezAlicanteNull

    2. Employees |×[ Cities:

      EmployeeFirstNameSurname1Surname2CityPopulation
      millions)
      E001JuanPérezGómezMadrid5
      E002LuisFernándezPinBarcelona3
      E003PedroRuizLlamasValencia2
      NullNullNullNullSevilla1

    3. Employees ]×[ Cities:

      EmployeeFirstNameLastName1LastName2CityPopulation
      millions)
      E001JuanPérezGómezMadrid5
      E002LuisFernándezPinBarcelona3
      E003PedroRuizLlamasValencia2
      E004RamónRuizRodríguezAlicanteNull
      NullNullNullNullSevilla1

  4. Division of two relations R1 and R2.

    To perform the operation, two conditions must be met:

    1. grado(R1) > grado(R2).

    2. The set of attributes C of R2, A(R2), must be included in the attributes of R 1, A(R1), es decir, A(R2) ⊂ A(R1).

    R1÷R2 is defined from the operations of Cartesian product, difference and projection. Its formal definition is:

      X1 = ΠC(R1)
      X2 = ΠC(R2×X1R1)
      R1÷R2 = X1X2

    R1÷R2 defines a new relation on the subset A(< i>R1)−A(R2) de atributos de R1. It contains the values of these attributes that in the tuples of R1 are combined with each of the tuples of R2.

    For example, Employees÷Cities produces a relation in which the column City does not appear, nor does the employee E004 appear because Alicante does not appear in the Cities table.

    EmployeeFirst nameSurname1Surname2
    E001JuanPérezGómez
    E002LuisFernándezPin
    E003PedroRuizLlamas

  5. Assignment.

    A relational expression of relational algebra is given a name:

      name ← expression

    The name is used in subsequent relational expressions. Por ejemplo, T ← ΠB(R1R2)

  6. Rename.

    It is expressed by ρX(A1,.... ,An)(E) and indicates to rename the relational expression < i>E as X, and rename also the attributes as A1, . .., An.

  7. Update.

    Allows to update a value of a tuple of a relation R: σF(R) ← E, where E is a relational expression.

    With these additional operations we have a true procedural language. For example, to add and remove a relational expression E to the relation R, it is done respectively by means of.

      RRE     RRE

The SQL language

SQL (Structured Query Language) is the standard language for accessing relational databases. It is a fourth-generation high-level language (4GL), declarative (not procedural), simple to use, based on relational algebra.

SQL also has four basic operations: INSERT, SELECT, UPDATE and DELETE, to insert, select, update and delete rows, respectively. The form of the query statement is. The result is a subtable, in which rows and columns have been selected.

SQL can be interactive or non-interactive. Interactive SQL is suitable for defining the structure of a database, performing queries and prototyping. Non-interactive SQL is SQL embedded in a programming language, and can be static or dynamic. In dynamic SQL, variables can be used.

In our example of Employees: The result is:

EmployeeCity
E003Valencia
E004Alicante

Multidimensional databases

Relationships (tables) are two-dimensional, since they relate two dimensions: rows (entities) with columns (attributes of those entities). But relationships of more than two dimensions can exist. For example, the sales of products manufactured by a company, with the attributes of: product, area and year of sale. In this case we do not have a two-dimensional table, but a three-dimensional table (or cube), being the dimensions: Product, Zone and Year.

Each dimension has an identifying key. In the case of the Employees (2D) table, the identifying keys are Employee and Attribute. In the case of the product sales table (3D), the keys are Product, Zone and Year. If we consider the quarter number of the year (a value between 1 and 4), we would have 4 dimensions (a hypercube): Product, Zone, Year and Quarter.


Triggers

These are procedures that are executed when a previously specified event occurs. The events are the operations of inserting (INSERT), deleting (DELETE) and updating (UPDATE) rows. A trigger can be executed before or after the data is modified.

The triggers are used for:
OLAP (Online Analytical Processing)

It is a technique of real-time analysis of a database to search for or discover hidden trends or patterns. It is called "data mining". It can involve millions of records with several gigabytes of occupancy. In general, the operations are read-only.

The system allows selective scanning of data and responds to queries quickly enough. It allows users to easily extract data selectively and view it from different viewpoints. To facilitate this type of analysis, OLAP data is stored in a high-performance multidimensional database. OLAP is used in the field called "Business Intelligence", which aims to streamline the querying of large enterprise databases to facilitate decision making.

Since OLAP applications use multidimensional databases, they are also called MOLAP (Multidimensional OLAP). This term is used to differentiate it from other types of OLAP such as ROLAP (Relational OLAP) and HOLAP (Hybrid OLAP, which combines MOLAP and ROLAP).

In ROLAP, all the cube information, its data, its sums, etc., are stored in a relational database.


Normal forms

Normal forms are organizational structures for the data in a database. Normalization pursues several purposes: 1) to eliminate redundant data; 2) to restrict impermissible operations; 3) to ensure consistency so that logical dependencies between data make sense.
Limitations of the relational model

The restriction of the universe of possible elements to (two-dimensional) tables is a major conceptual limitation a priori. Moreover, the term "relation" is not very appropriate, since it is too generic to be applied to tables, which are specific data structures.

And as tables there are also limitations: With respect to SQL, the standard relational database access language, there are several major drawbacks:
Specification in MENTAL

Relationship as a set of sets

A relationship R can be specified as a set of rows, where each row is itself a set of attributes. For example, the table Employees can be specified like this:

( R = {
{E001/Employee Juan/First name Perez/Surname1 Gómez/Surname2 Madrid/City}

{E002/Employee Luis/Nickname Fernandez/Surname1 Pin/Surname2 Barcelona/City}

{E003/Employee Pedro/Nickname Ruiz/Surname1 Llamas/Surname2 Valencia/City}

{E004/Employee Ramón/Name Ruiz/Surname1 Rodríguez/Surname2 Alicante/City}
} )


In general,

( R = { {v11/A1 ... v1m/Am} ... {vn1/A1 ... vnm/Am} } )

where vij is the value of order entity i and order attribute j, and Aj the name of order attribute j.

The cardinality of R is R# = n (number of rows), and its degree is A# = m, where A is the set of attribute names: (A = {A1...Am}).


Ratio as an extensive function

A R (two-dimensional table) relationship can be considered a set of basic or elementary relationships consisting of 3 elements: the key c (the identifier of an entity), an attribute a of that entity, and a value v of that attribute. This relationship is expressed as an extensive function of two parameters (c and a) and result v: For example, in the relation Employees, the first tuple can be coded as a set like this: To avoid repeating the attribute names in each tuple, we can specify : In general, { [( R(c A) = [⌊attribute values⌋ )] where c is the key of the corresponding entity, ( A = [A1...Am] ) and A1...Am are the names of the attributes.


The five basic operations of the relational model
  1. Union of two relations R1 and R2: R1∪R2 (union of sets).

    R1 and R2 must be defined as sets of sets.

  2. Difference of two relations R1 and R2: (R1 ∪' R2) (difference between sets).

    R1 and R2 must also be defined as sets of sets.

  3. Cartesian product of the relations R1 and R2.

    ⟨( R1×R2 = {[ [R1↓]∪[R2↓] ]} )⟩

    R1 and R2 must also be defined as sets of sets.

    The resulting degree is (A1# + A2#). And its cardinality is (R1#)*(R2#).

    A1 and A2 are the attribute sets of R1 and R2, respectively.

  4. Projection of the relation R with respect to the subset of attributes B⊂A, where A is the set of attribute names of R.

    If we define the relation R as an extensive function, the expression is:

    ⟨( Proy(R c B) = {⟨([R(c [B↓]])]] )⟩ or.

    ⟨( Proy(R c B) = {⟨(R(c x) ← xB))⟩} )⟩

    Example:

      (R = Employees)

      (A = {Employee First Name Last Name1 Last Name2 City})

      (B = {City})

      Proy(Employees B) // ev. { (Madrid Barcelona Valencia Valencia Sevilla) }

      (B = {FirstName LastName1})

      proj(Employees B) // ev. { (Juan Luis Pedro Ramón) (Perez Fernandez Ruiz Ruiz) }

  5. Selection of tuples of R by a selection criterion s.

    The relation R must be defined as a set of sets.

    ⟨( Selec(R s) = Rs )⟩

    For example,

      Select(Employees E001/Employee) // ev. {E001/Employee John/FirstName Perez/LastName1 Gomez/LastName2}

      Select(Employee Ruiz/Surname1) // ev. { {E003/Employee Pedro/FirstName Ruiz/Surname1 Llamas/Surname2 Valencia/City} {E004/Employee Ramón/First name Ruiz/Surname1 Rodríguez/Surname2 Alicante/City} }

The seven derivative operations
  1. Intersection of two relations R1 and R2: R1∩R2 (intersection of sets).

    R1 and R2 must be defined as sets of sets.

  2. Natural union of two relations R1 and R2.

    ⟨( UnionNat(R1 R2) = {[(([R1↓] ∪ [R2↓]) ← (R1∈R2)]} )⟩

    R1 and R2 must be defined as sets of sets.

    Automatically remove duplicate attribute-values.

  3. Outer Join.

    1. ⟨( (R1 ]×| R2) = {[([R1↓ ] ∪ [R2↓]) ← (R1R2) →' θ)]} )⟩

    2. ⟨( (R1 |×[ R2) = (R2 ]×| R1) )⟩

    3. ⟨( (R1 ]×[ R2) = ((R1 ]×| R2) ∪ (R1 ]×[ R2))) )⟩

  4. Division of two relations R1 and R2.

    (C = A(R2)) // set of attributes of R2
    (X1 = Proj(R C))
    (X2 = Proy((R2×X1 − R1) C)))
    (R1¸R2 = (X1 − X2))


  5. Assignment.

    It is performed by a substitution expression, immediate, delayed or initial:

    ( name = expression ) // immediate substitution
    ( name =: expression ) // deferred substitution
    ( name := expression ) // initial substitution


  6. Rename.

    It is equivalent to the previous one.

  7. Update.

    Allows to update a value of a tuple (R(c a) = v).

Multidimensional databases

In MENTAL, the generalization of (two-dimensional) tables to multidimensional relations is very simple, since it is enough to consider functions of more than two arguments. The general expression is where c1 ... cn are the keys associated to each dimension.

For example, in the case of the 3D product sales relation, the relation defined as a function is composed of a set of terms of the form. Abbreviated,
OLAP

MENTAL's flexibility allows to perform database analysis processes in a simple and intuitive way.

Examples for 3D product sales relationship:
  1. Total products sold in Spain of product P001 in 2014:

    +⊣{⟨( V(p z a) ← (p=P001 ∧ z=Spain ∧ a=2014) )&⟩)}

  2. Products sold in Spain in 2014 more than 5 units:

    (⟨ (p ← (V(p z a) > 5)∧(z=Spain)∧(a=2014) ⟩)

  3. Best-selling product in Spain in the years 2000 to 2014 (there may be several products with the same amount of sales):

    ⟨ (Vmax(a) := 0) ⟩ // initial value of the nr. of products sold in the year.

    ⟨( (V(p z a)) ⟨( Vmax(a)) → (Vmax(a) = V(p z a)) ) ⟩)

    (⟨ Pmax(a) = {⟨p ← V(p z a) = Vmax(a) ⟩) // best-selling products in the year.

    (years = [2000...2014])
    [(años Pmax(años))]

Advantages of MENTAL as a Relational Database Management System (RDBMS)

MENTAL is a valid language for expressing relationships (two-dimensional tables) and performing the operations (basic and derived) of the relational model. Moreover, with MENTAL you can overcome the limitations of the relational model by providing generality and flexibility to relax the restrictions imposed by this paradigm: support hierarchical objects, share expressions, interrelate expressions, perform advanced selections (by types, by hierarchical levels, etc.), group attributes, etc.

Addenda

Brief history of the relational model

The relational database model was born in June 1970, when Edgar Frank Codd (while working at IBM) published an article entitled "A Relational Model of Data for Large Shared Data Banks" in the journal Communications of the ACM, which provided a new view of data as two-dimensional tables, accompanied by a solid theoretical foundation: relational algebra.

The relational model is considered one of the great technical achievements of the 20th century. Codd formalized a field that until then was an unstructured collection of ad hoc products and techniques and turned it into a theoretical scientific discipline with practical application. It can be said that all DBMSs are based on Codd's ideas. Codd is considered the father of relational databases.

Subsequently, Codd presented another language based on first-order predicate logic, showing that it was equivalent in expressive power to relational algebra. This second language he called "relational calculus", a non-procedural language (it specifies the "what", not the "how"). In contrast, relational algebra is a procedural language, since the computation of a new relation is done by specifying the operations to be performed and the order in which they must be performed.

Other highlights were:
The generalization of the relational model

There have been several attempts to solve these problems by introducing some generality and flexibility into the relational model, relaxing the restrictions imposed by 1NF. Currently the trend is towards NF2 (not first normal form), in which grouping attributes and values, sharing data or relationships, etc. are allowed.

Here are some of the so-called post-relational models:
Bibliography