"The revolution in data abstraction has affected
the entire spectrum of language design, programming methodology, and the way we think about programs" (Judy Bishop).
Abstract Data Types
Primitive and derived data types
First, we must distinguish between primitive and derived data types:
Primitive data types are those that are closely tied to the machine architecture: integers, Booleans, strings, floating-point, etc. Each data type has a set of allowed values and a set of associated operations. For example, the "integer" type is integers (..., −2, −1, 0, 1, 2, ...) and the operations are: addition, subtraction, product, etc. The "boolean" type has as values F (true) and F (false) and a set of logical operations (conjunction, disjunction, negation, etc.).
Derived data types can be of two kinds:
Data subtypes. These are primitive data types restricted by some property (possible values, length, etc.). For example, even numbers, integers between 0 and 128, character strings containing only zeros and ones, character strings of length greater than 3, 3-digit numbers, etc.
Data supertypes. They are structures of data types or subtypes. Structures can be homogeneous or heterogeneous (or polymorphic). An example of a homogeneous structure is a sequence of integers. An example of a heterogeneous structure is a sequence of elements consisting of an integer, a Boolean, and a string.
Abstract data types
An abstract data type (ADT) consists of two layers or levels:
The inner level. This is the implementer level, which is hidden from the user. At this level is the data structure. The feature of hiding the implementer details of a data object is called "encapsulation".
The outer level. This is the conceptual and usage level, which is the public interface. The interface is constituted by a set of operations defined on the data structure of the inner level.
The public or external interface corresponds to the "what", to the specification. The private or internal part corresponds to the "how", to the implementation.
The ADT has the implementation hidden from the user. But if it is exposed, we speak of a "transparent" data type.
Typical operations on a data structure are: store, retrieve (or access), select, modify, delete, etc. elements (or objects) in the data structure. The set of operations that can be performed with a ADT is closed, i.e., it can only be accessed through the interface, which is fixed. By their very nature, ADTs are dynamic data structures.
A ADT can be defined as a class of objects (or elements) defined by a structure, a set of values, a set of operations, and a set of constraints defined as preconditions and postconditions.
In OOP (Object Oriented Programming), a ADT is a class of objects because the public interface is parameterized. An instance of a class is an object.
All high-level languages have predefined ADTs, but the user can define his own ADTs.
Examples
Data structures that can be implemented as ADTs are: sets, sequences, stacks, queues, vectors, graphs, containers, dictionaries, maps, algebraic structures (groups, rings, lattices, etc.), etc. For example:
A sequence can be implemented as a physical sequence (in contiguous memory locations), as a one-dimensional array, as a linked list, as a doubly linked list, and so on. In a linked list, data is stored in nodes. Each node has a content (the data) and an address that points to the next element in the list. In a doubly linked list, each node has a content and two addresses: one pointing to the next element and one pointing to the previous one.
A set can be implemented as a two-level tree in which the top node represents the set and the lower nodes its elements.
A stack can be defined by three operations: push (insert), pop (extract, removing) and peek> (access without removing the top element of the stack). The protocol is LIFO (last in, first out). In the queues the protocol is FIFO (first in, first out).
Advantages of ADTs
There are two main advantages:
It simplifies programming, since it allows the programmer to operate with the data from a higher level of abstraction, in a conceptual way, without having to know the details of its implementation.
The implementation may change over time (for example, to improve performance or consume fewer resources), but not the interface, so the source code remains unchanged.
MENTAL and Data Abstraction
The history of programming languages is linked to increasing abstraction. A process of abstraction consists of focusing on the essential characteristics of something, disregarding the particular, accessory or contingent. The fundamental problem in the design of software systems is to reduce complexity, and this objective necessarily involves the process of abstraction. The first abstraction that appeared was the procedural abstraction. Subsequently, other abstractions appeared: objects, functions, rules, events, aspects, agents, etc., including ADTs.
MENTAL, supreme abstraction.
It is the culmination of the abstraction process of programming languages. It reduces complexity to the maximum by dealing with concepts of supreme level of abstraction. Universal semantic primitives are the supreme abstractions.
Efficiency in program development is maximized by working with primitive archetypes. Programming has never been easy. Developments are simple, conceptual, straightforward, without worrying about implementation.
MENTAL, a language based on data abstraction.
Indeed, the language constructs only refer to semantics and not to their possible implementations. For example, the set {a b c d} or the sequence (a b c d) are concrete instances of set and sequence, respectively, but without reference to how they will be implemented, there being many possible ways to do so.
MENTAL, a universal polymorphic language.
The components of primitives can be any expressions. For example, a set could be
{x 7*y "abc" 〈x+y〉 〈( f(x y) = (x+yx*y) )〉}
MENTAL, the real revolution.
Judu Bishop [1989] claims that data abstraction has been a real revolution in programming. But the real revolution is that of MENTAL, because it involves the supreme abstraction.
The variable as the simplest ADT
A variable x can be considered a dynamic ADT that has two operations:
Store a value v in the variable x: (x = v).
Access the value of the variable. This is done by simply mentioning x:
x // ev. v
The next level in ADT complexity is a parameterized variable:
( x(y) = z ) // store
x(y) // access
where x, y, z can be any expressions. For example:
( x(a b c) = 〈( f(uv) = (u+vu*v )〉 )
( x(a b c) // ev. 〈( f(uv) = (u+vu*v )〉 )
Container example
A container consists of a set of elements. Each element has a key and associated data. The key and data can be as complex as desired. For example, a key could be a parameterized variable. We are as in the previous case.
( key = data ) // store
key // retrieve: evaluates as data
( container(key) = data ) // store
container(key) // retrieve: evaluates as data
Examples:
(1715 = "abcde")
1715 // ev. "abcde".
( AñoNacimiento(Pepe) = 1936 )
YearBorn(Pepe) // ev. 1936
( PlaceBirth(Pepe) = "Cartagena" )
BirthPlace(Pepe) // ev. "Cartagena" )
In this example, YearBirth and PlaceBirth can be considered as containers.
Example of stack
If we call s the stack name and x the value to add, we have the following operations:
The stack has been implemented as a one-dimensional array, but it could have been implemented as a sequence or in any other way, but from the user's point of view, the interface is unchanged.
Addenda
History of ADTs
In 1972, David Parnas published his seminal paper on modularization [Parnas, 1972] in which each module stored internal data, with a procedural interface.
In 1974, John Guttag first proposed the concept of data abstraction.
That same year, Barbara Liskov and Stephen Zilles proposed implementing ADTs in the CLU language [Liskov & Zilles, 1974]. This article had a great influence and led to a few years of intense research on ADTs, which were reflected in languages, techniques and methodologies.
In the 1970s, the Turbo Pascal language incorporated the Units, which were ADTs with data encapsulation and modularity. This language was decisive for the general acceptance of ADTs. Later, the Ada language was designed, which incorporated the packages, which were ADTs.
In 1973, Stephen Zilles published the article "Procedural Abstraction: a linguistic protection technique" [Zilles, 1973], in which he showed how procedures could be used to represent data objects. His notion of procedural abstraction was very similar to that of Parnass modules. But Zilles saw them as data that could even be passed as arguments to other procedures.
In 1976, an ACM SIGPLAN conference was held on Data: Abstraction, Definition and Structure.
In 1977 the term "data abstraction" was coined in which the focus shifted from algorithms (or functional abstractions) to data abstractions: procedures as an indirect way to access data. Subsequently the terms "encapsulated data types" and "abstract data types" appeared.
By 1979, ADTs were widely accepted in programming languages.
Bibliography
Bishop, Judy. Abstracción de Datos. Paraninfo, 1989.
Cardelli, L.; Wegner, P. On understanding types, data abstraction, and polymorphism. Computing Surveys, 17 (4): 471–522, 1986.
Cook, William R. Object-Oriented Programming Versus Abstract Data Types. Proceedings of the REX Workshop/School on the Foundations of Object-Oriented Languages, LNCS, Springer –Verlag, pp. 151-178, 1990. Disponible online.
Gries, D.; Gehani, N. Some ideas on data types in high-level languages. Coms. of the ACM 20 (6): 414-320, 1977.
Guttag, John. Abstract data types and the development of data structures. Coms. of the ACM, 20 (6): 396-404, 1977.
Ledgard, H.F.; Taylor, R.W. Two views of data abstraction. Coms. of the ACM 20 (6): 382-384, 1977.
Liskov, B.; Ziller, S. Programming with Abstract Data Types. SIGPlan Notices, 9 (4): 50-59, 1974.
Liskov, B.; Snyder, A.; Atkinson, R.; Schaffert, C. Abstraction Mechanisms in CLU. Communications of the ACM, 20: 564-576, 1977.
Mitchell, J.C.; Plotkin, G.D. Abstract types have existential type. In Proc. of the ACM Symp. on Principles of Programming Languages, 1985.
Parnas, David. On the criteria to be used in decomposing systems into modules. Communications of the ACM, 15: 1053–1058, 1972.
Parnas, David. A technique for software module specification. Communications of the ACM, 15: 330–336, 1972.
Zilles, S.N. Procedural Abstraction: a linguistic protection technique. ACM Sigplan Notices 8 (9): 142-146, 1973.