"Knowledge is language that names and describes" (Wittgenstein).
"Science is organized knowledge" (Herbert Spencer).
"Knowledge is linguistic" (Peirce).
The Scale "Data - Information - Knowledge"
The triad "Data - Information - Knowledge" constitutes a scale of increasing semantics, from the superficial to the deep, from form to substance. Since semantics is inexpressible, the boundaries between these three concepts are somewhat fuzzy. Nevertheless, we can point out the main characteristics of each of them:
Data
In general, a datum is an isolated value, or the value of an attribute or predicate of some unknown entity. The value can be quantitative or qualitative and, if numerical, may or may not have a unit. For example, given the datum "33", we do not know whether it is someone's age, a temperature, a length, a speed, etc.; given the datum "blue", we do not know whether it is the color of something, an example of a 4-letter word, etc. And even if we know both the numerical value and the unit (i.e., we have a magnitude), we do not know which entity it refers to. For example, 1.80 meters can be the height of a person, the width of a closet, etc.
Information
A piece of information is a piece of data (or set of data) with context and reference, associated with a particular entity. Information = data + interpretation. For example: "Pepe is 33 years old", "the temperature in this room is 33 degrees", "my speed is 33 km/h", etc.
Knowledge
Knowledge corresponds to the top of the semantic scale. As semantics cannot be defined, neither can knowledge. The same is true of consciousness, truth and life.
We can say that knowledge is the subjective interpretation of information. Knowledge is information (or set of information) with meaning, which is internalized, interpreted or evaluated subjectively within a context. When knowledge is made explicit, it becomes a set of interrelated information.
Knowledge representation
Knowledge representation is mainly used in so-called "expert systems". An expert system is a system that collects the knowledge of an expert (or set of experts) in a given domain. They are usually rule-based.
Some examples of expert systems are: Dendral (the first expert system, for aiding in the identification of the molecular structures of unknown substances), Macsyma (symbolic manipulation of mathematical expressions), Mycin (diagnosis and treatment of bacterial blood infections), Prospector (search for geological deposits), KRM (nonlinear dynamics), ARIS (airport information resources), etc.
Knowledge vs. Information
There are many differentiating aspects between knowledge and information.
Information is always specific, based on specific facts or events (e.g., "this table is green"). Knowledge is generic, based on patterns or general type mechanisms (e.g., "all men are mortal").
Information is external, superficial. Knowledge is deep, internal; it is associated with consciousness, with the union or connection between the internal (the subjective world) and the external (the objective world).
Information is objective; the interpretation is always the same for all observers. Knowledge is subjective; it is information interpreted subjectively.
Information is public. Knowledge is private. Information is what we know. Knowledge is what one knows.
A piece of information is disconnected from other information. A unit of knowledge, in general, interrelates pieces of information.
Information is an end in itself; it does not produce more information. With knowledge it is possible to reason and make inferences that generate more detailed information or knowledge.
Information does not require a formal language; a limited, specific, more or less informal language is enough. Knowledge, in order to be expressed, requires a general and complete formal language, with its own syntax and semantics, capable of representing all types of information in a given domain.
Information is represented in a single form or in very limited forms. Knowledge can be represented in many forms: attributes (or predicates), rules, frames, scripts, semantic networks, labels, etc.
Information can be processed. Knowledge can be managed. Managing knowledge means retrieving or accessing it, selecting it, modifying it, considering particular aspects of it, etc.
Information is integrated in Information Systems. Knowledge is integrated in Knowledge Management Systems.
The information that is processed is exclusively linked to informatics. Knowledge is transdisciplinary, as it can be used in a multitude of domains: artificial intelligence, linguistics, cognitive science, etc.
Information is concrete, specific. Knowledge is generic or abstract.
Information is associated with the real world. Knowledge is not necessarily associated with the real world; it can be virtual, imaginative, speculative, possible, etc.
Metadata, Metainformation and Metaknowledge
A metadatum is a datum applied to another datum. For example, in "dark blue", "dark" is a datum applied to the datum "blue".
A piece of meta-information is information applied to other information. For example, in "33 fruits, 12 are apples", "12 apples" is meta-information about the information "33 fruits". And in "John has two children", a piece of meta-information would be that his children are named Elena and Ivan.
Metaknowledge is knowledge that refers to other knowledge. For example, it can refer to the evaluation of knowledge, how to use the knowledge, its priority or weight, etc.
The Problem of Knowledge Representation
In the area of knowledge representation we can distinguish the following problems:
Diversity of paradigms.
The issue of knowledge representation is one of the central issues in AI (artificial intelligence), since it constitutes the necessary foundation for the resolution of the problems that arise in this area.
Knowledge representation is a transdisciplinary topic: it cuts across all domains and serves as a link between them. It affects not only AI but also cognitive psychology, philosophy (epistemology), linguistics (psycholinguistics), etc.
The problem is that, to date, there has been no clear definition of what exactly knowledge is. As a result of this lack of definition, numerous and diverse knowledge representation languages and systems have been proposed, each with its own particular conception or paradigm of what knowledge is, including those based on mathematics and logic.
Diversity of languages and knowledge representation systems.
The result of the diversity of paradigms has been a tower of Babel, both in languages and systems. We thus find ourselves in a situation similar to that of programming languages and their different paradigms.
Among these languages or systems are CYC, LOOM, KOS, KRL, SOAR, SHRDLU and LOOP. Some focus on common-sense knowledge representation, others incorporate general reasoning or inference mechanisms, some are multi-paradigm, etc.
The ideal solution to this problem would be a standard knowledge representation language or system that allows all the different paradigms to be expressed.
Given the diversity of languages and knowledge representation systems, the question is: Is there a universal language that allows us to represent all kinds of knowledge and in all kinds of domains? That is, can the concrete systems of knowledge representation be like particular cases of that universal language of knowledge representation?
The choice of a good knowledge representation language is vital to make certain problems easier to address and solve. A paradigmatic example, halfway between information and knowledge, is the Indo-Arabic system of numerical representation, which is much simpler and more intelligible than Roman numerals and makes arithmetic calculations much easier.
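As a rough illustration of how much the representation matters, here is a minimal Python sketch (the function names are hypothetical) that converts between Roman numerals and ordinary positional integers. Adding two Roman numerals is done by detouring through the positional representation. The sketch assumes standard subtractive notation and values from 1 to 3999.

```python
# Standard subtractive Roman numerals, largest value first.
ROMAN: list[tuple[int, str]] = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(s: str) -> int:
    """Decode a Roman numeral: a symbol smaller than its right neighbour is subtracted."""
    total = 0
    for i, ch in enumerate(s):
        v = VALUES[ch]
        if i + 1 < len(s) and v < VALUES[s[i + 1]]:
            total -= v
        else:
            total += v
    return total


def int_to_roman(n: int) -> str:
    """Encode an integer in 1..3999 greedily, largest symbols first."""
    if not 1 <= n <= 3999:
        raise ValueError("this sketch only handles 1..3999")
    parts = []
    for value, symbol in ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def add_roman(a: str, b: str) -> str:
    """Add two Roman numerals by converting to positional integers and back."""
    return int_to_roman(roman_to_int(a) + roman_to_int(b))


print(add_roman("XLIV", "LXVIII"))  # CXII  (44 + 68 = 112)
```

The addition itself is a single `+` on positional integers; all the remaining work is translating into and out of the Roman representation.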
Levels of reality.
A knowledge representation system is a map of a certain reality. The map is an abstraction, a simplification that only considers particular aspects or characteristics of interest. The map is another kind of reality, from which we could make another map (map of order 2), and so on. The problem is the representation of the levels of reality.
The internal representation of knowledge.
The main problem facing cognitive psychology is how knowledge is stored or represented at the internal, mental level. The so-called "internal knowledge representation hypothesis" holds that the human mind or brain represents knowledge internally in some way, and that this knowledge can be made explicit at a formal, external level. Cognitive psychology also investigates how sounds, colors, smells, shapes, emotions, abstract ideas, etc. are stored.
The representation of common sense.
A particularly important aspect is the representation of commonsense knowledge. Common sense is generic knowledge, transdisciplinary and closely related to consciousness.
Knowledge Representation Systems
There are numerous knowledge representation systems. The following stand out:
Frames
Frames, proposed by Marvin Minsky in 1975, are the knowledge representation system that has historically enjoyed the widest acceptance.
Their characteristics are the following:
A frame represents an entity and intends to express what is known about that entity by representing a typical or stereotyped situation. For example, paying a visit to a sick person, going to a birthday party, entering a restaurant, etc. It includes information on how to handle the situation and what to do in the event that the situation does not conform to expectations.
A frame has a name associated with it and is made up of a set of attributes or fields (slots). Each slot contains one or more associated values, possible values (the conditions that the data must satisfy) and default values.
Procedures and triggers can be associated with a frame or with one of its components (slots). Triggers are activated (executed) automatically when certain events occur (defined by rules), such as a change in the value of an attribute.
Frames are structured hierarchically, where the lower ones inherit the slots and values from the upper frames. The upper frames have typical, general, little variable information, while the lower ones have more specific information. There are generic frames (classes) and particular frames (instances).
A frame-based knowledge base is a collection of hierarchically organized frames. The knowledge base is modular in that it is organized into clearly differentiated components, namely the frames.
Inheritance. The frames are conceptually related, which allows a frame to inherit attributes from frames higher in the hierarchy.
An object (in OOP, object-oriented programming) is very similar to a frame. Objects have properties, which are internal attributes and methods. An object can share (inherit) properties (attributes and methods) from objects at a higher level. Objects communicate by means of messages. Each message causes the object to react, internally, externally, or both. Similarly, frames have associated properties, which are attributes (slots), procedures and rules.
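The frame characteristics described above (slots with values, possible values and defaults, triggers, and hierarchical inheritance) can be sketched as a small Python class. This is a hypothetical toy model, not the design of KL-ONE, KRL or any other real frame system; the class name, the slot names and the "fire when a slot is set" trigger convention are assumptions made for illustration.

```python
from typing import Any, Callable, Optional

Trigger = Callable[["Frame", Any], None]


class Frame:
    """Toy frame: named slots with values, defaults, constraints and triggers."""

    def __init__(self, name: str, parent: Optional["Frame"] = None) -> None:
        self.name = name
        self.parent = parent  # upper (more generic) frame in the hierarchy
        self.values: dict[str, Any] = {}  # explicitly filled slots
        self.defaults: dict[str, Any] = {}  # default slot values
        self.constraints: dict[str, Callable[[Any], bool]] = {}  # possible values
        self.triggers: dict[str, list[Trigger]] = {}  # run when a slot is set

    def _find(self, table: str, slot: str) -> Any:
        """Search this frame, then its ancestors, for `slot` in the given table."""
        frame: Optional[Frame] = self
        while frame is not None:
            entries = getattr(frame, table)
            if slot in entries:
                return entries[slot]
            frame = frame.parent
        return None

    def get(self, slot: str) -> Any:
        """Return the nearest explicit value or default, inheriting from upper frames."""
        frame: Optional[Frame] = self
        while frame is not None:
            if slot in frame.values:
                return frame.values[slot]
            if slot in frame.defaults:
                return frame.defaults[slot]
            frame = frame.parent
        raise KeyError(f"{self.name} has no slot {slot!r}")

    def set(self, slot: str, value: Any) -> None:
        """Fill a slot after checking its (possibly inherited) constraint, then fire triggers."""
        check = self._find("constraints", slot)
        if check is not None and not check(value):
            raise ValueError(f"{value!r} is not a possible value for {slot!r}")
        self.values[slot] = value
        for action in self._find("triggers", slot) or []:
            action(self, value)


# Usage: a generic "room" frame and a more specific "kitchen" frame.
events: list[str] = []
room = Frame("room")
room.defaults["walls"] = 4
room.constraints["temperature"] = lambda t: -50 <= t <= 60
room.triggers["temperature"] = [lambda f, v: events.append(f"{f.name}: temperature={v}")]

kitchen = Frame("kitchen", parent=room)
kitchen.defaults["has_stove"] = True
kitchen.set("temperature", 33)  # passes the inherited constraint, fires the trigger
```

Looking up `walls` in `kitchen` falls through to the default stored in `room`, and the constraint and trigger on `temperature` are likewise inherited from the upper frame, much as methods are inherited by objects in OOP.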
Since Minsky's theoretical formulation, there have been several implementations of frames, such as KL-ONE, KRL, OWL and CLASSIC. KL-ONE is the most relevant; CLASSIC is a descendant of KL-ONE.
Minsky admitted that frames do not constitute a complete theory, but argued that they can explain many features of human consciousness: for example, that intelligence consists of selecting, in each new situation, the most appropriate general frame and adapting it by changing the details, and that learning consists of constructing new frames.
Semantic networks
A semantic network is a set of nodes and arcs. Each node represents an entity, which can be an action, an attribute, an event, a structure, a class, a frame, etc. An arc is a relationship between two nodes (entities). There are many types of relationships, among which are:
IS-A (is a/an). It is a membership relation. It indicates that a node (entity) belongs to a class.
HAVE-A (has a/one). It is a property relation. It indicates that a node (entity) has a certain characteristic or property.
A semantic network is different from a network. A network is a data structure. A semantic network is a knowledge representation system. Therefore, a semantic network has a higher semantic level than a network.
A semantic network can be hierarchical (a taxonomic hierarchy) or relational. In a hierarchical network there is a top node with one or more child nodes, which in turn have their own child nodes, and so on down to the bottom level, whose nodes can be either entities or instances of entities.
The concept of inheritance is fundamental in hierarchical semantic networks. The properties of a node are based on the properties of the higher nodes in the hierarchy.
The most common type of semantic network is the IS-A network. In fact, this type is often treated as a synonym for semantic network. An "IS-A" network is a taxonomic hierarchy consisting of hierarchy and inheritance links between nodes. Classical natural taxonomies are a good example: a dog is a canid, a canid is a mammal, a mammal is an animal.
IS-A semantic networks are very flexible, but AI researchers have pointed out some major problems and drawbacks, including the following:
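A minimal Python sketch of a hierarchical IS-A network with HAVE-A properties, using the dog/canid/mammal/animal taxonomy above. It assumes each node has at most one IS-A parent and that the hierarchy has no cycles; the property names are purely illustrative.

```python
# Toy IS-A hierarchy: each node has at most one parent class (an assumption).
IS_A: dict[str, str] = {"dog": "canid", "canid": "mammal", "mammal": "animal"}

# HAVE-A arcs: properties attached directly to a node (illustrative only).
HAVE_A: dict[str, set[str]] = {
    "animal": {"metabolism"},
    "mammal": {"hair"},
    "canid": {"tail"},
}


def ancestors(node: str) -> list[str]:
    """Follow IS-A arcs upward, nearest class first."""
    chain: list[str] = []
    while node in IS_A:
        node = IS_A[node]
        chain.append(node)
    return chain


def is_a(node: str, cls: str) -> bool:
    """True if `node` is `cls` or reaches it through IS-A arcs."""
    return node == cls or cls in ancestors(node)


def properties(node: str) -> set[str]:
    """HAVE-A properties of `node`, including those inherited from upper nodes."""
    result = set(HAVE_A.get(node, set()))
    for upper in ancestors(node):
        result |= HAVE_A.get(upper, set())
    return result


print(ancestors("dog"))           # ['canid', 'mammal', 'animal']
print(sorted(properties("dog")))  # ['hair', 'metabolism', 'tail']
```

Nothing is stored on the `dog` node itself; all three of its properties are obtained by walking up the IS-A links.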
The choice of semantic network nodes and arcs is crucial in the analysis phase. Once a given structure has been decided upon, it is very difficult to modify it.
Difficulty in expressing quantification, for example in expressions such as "some birds fly" or "all birds chirp".
Difficulty in representing the intentional dimension, for example in propositions such as "Pedro believes that Ana knows how to drive".
They have no processing capability of their own.
Frames vs. semantic networks
There are overlaps and differences between semantic networks and frames:
Both systems are networks formed by nodes and relationships between nodes. In the semantic network, relationships are established between two nodes. In the frame network, the relations are associated with the slots of the frames.
Semantic networks can be hierarchical or non-hierarchical. Frame networks are hierarchical.
The structure of frame network nodes is richer than that of semantic network nodes.
The frame network can detect and process events. The semantic network cannot.
Inheritance in semantic networks is monotonic. Inheritance in frames may be non-monotonic, that is, a node may or may not inherit the slots of a parent node, depending on what is specified.
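The difference between monotonic and non-monotonic inheritance can be seen in a small Python sketch. Under the monotonic scheme every assertion inherited from above is kept, so a penguin ends up both flying and not flying; under the non-monotonic scheme a lower frame may override an inherited value or decline to inherit a slot altogether. The frame contents and the `blocked` field are hypothetical.

```python
from typing import Any, Optional

# Hypothetical frames: parent link, local slot values, and slots explicitly not inherited.
FRAMES: dict[str, dict[str, Any]] = {
    "bird": {
        "parent": None,
        "slots": {"flies": True, "has_feathers": True, "nests_in": "trees"},
        "blocked": set(),
    },
    "penguin": {
        "parent": "bird",
        "slots": {"flies": False},
        "blocked": {"nests_in"},
    },
}


def chain(name: str) -> list[str]:
    """Frames from `name` up to the root, most specific first."""
    result: list[str] = []
    node: Optional[str] = name
    while node is not None:
        result.append(node)
        node = FRAMES[node]["parent"]
    return result


def inherit_monotonic(name: str) -> dict[str, set[Any]]:
    """Keep every assertion up the chain: conflicting values accumulate."""
    result: dict[str, set[Any]] = {}
    for node in chain(name):
        for slot, value in FRAMES[node]["slots"].items():
            result.setdefault(slot, set()).add(value)
    return result


def inherit_nonmonotonic(name: str) -> dict[str, Any]:
    """Lower frames override parent values or refuse to inherit a slot at all."""
    result: dict[str, Any] = {}
    for node in reversed(chain(name)):  # root first, most specific last
        frame = FRAMES[node]
        for slot in frame["blocked"]:
            result.pop(slot, None)  # explicitly not inherited
        result.update(frame["slots"])  # local values win over inherited ones
    return result


print(inherit_monotonic("penguin")["flies"])  # {False, True}: a contradiction
print(inherit_nonmonotonic("penguin"))        # {'flies': False, 'has_feathers': True}
```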
In general, the frame-based network has received the most attention, both theoretically (cognitive science and linguistics) and practically, because of its flexibility and possibilities.
Ontologies
An ontology is a set of concepts and relationships between those concepts. These relationships include constraints that keep the system consistent. For example, in the blocks world, the concepts are "block" and "ground", and the relationship is "on". A block is either on another block or on the ground. A block cannot be on itself.
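The consistency constraints of the blocks-world ontology can be checked mechanically. A minimal Python sketch, assuming the "on" relation is stored as a mapping from each block to what it rests on, and checking only the two constraints stated above; other plausible constraints, such as forbidding cycles like A on B and B on A, are deliberately left out.

```python
BLOCKS = {"A", "B", "C"}  # hypothetical set of blocks
GROUND = "ground"


def violations(on: dict[str, str]) -> list[str]:
    """Check the 'on' relation (block -> what it rests on) against the ontology's constraints."""
    errors: list[str] = []
    for block in sorted(BLOCKS):
        support = on.get(block)
        if support is None:
            errors.append(f"{block} is not on anything")
        elif support == block:
            errors.append(f"{block} cannot be on itself")
        elif support != GROUND and support not in BLOCKS:
            errors.append(f"{block} is on {support!r}, which is neither a block nor the ground")
    return errors


print(violations({"A": "B", "B": "ground", "C": "ground"}))  # []
print(violations({"A": "A", "B": "ground", "C": "ground"}))  # ['A cannot be on itself']
```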
Currently, knowledge representation has been reoriented towards the broad domain of ontologies. Most ontology languages are declarative, and are based, to a greater or lesser extent, on frames or on first-order predicate logic.
New developments in ontology-based knowledge representation languages and systems have been reoriented towards the Web. They make use of the XML, XML Schema, RDF and RDF Schema standards, as well as Web ontology languages such as OWL (Web Ontology Language).
XML is a low-level tag-based syntactic representation system. It is a simplified version of SGML for the Web. It encodes binary object-value relationships. For example, in "color blue", "color" is the object and "blue" is the value.
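For illustration, Python's standard `xml.etree.ElementTree` module can read the tag/value pairs of a small XML document. The document and the helper name `tag_value_pairs` are hypothetical; the point is that XML by itself only pairs a tag with its content and says nothing about what the pair means.

```python
import xml.etree.ElementTree as ET


def tag_value_pairs(xml_text: str) -> list[tuple[str, str]]:
    """Return (tag, text) pairs for the direct children of the root element."""
    root = ET.fromstring(xml_text)
    return [(child.tag, (child.text or "").strip()) for child in root]


print(tag_value_pairs("<item><color>blue</color><size>large</size></item>"))
# [('color', 'blue'), ('size', 'large')]
```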
XML Schema defines a document structure (or structures in general) encoded with XML. It defines what kind of elements an XML document can contain and its structure.
RDF (Resource Description Framework) is a language for defining relationships between objects and metadata. It encodes ternary subject-property-object or entity-attribute-value relationships (also called "triples"). The subject indicates the resource. The property (or predicate) is an aspect of the resource that expresses a relationship between subject and object. For example, in "the sky is blue", "the sky" is the subject, "color" is the property, and "blue" is the object.
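A triple store can be sketched in plain Python as a set of (subject, property, object) tuples with a pattern-matching query. This is a toy stand-in, not an RDF implementation: real RDF identifies resources and properties with IRIs, and in practice a library such as rdflib would be used. The triples below are illustrative.

```python
from typing import Optional

Triple = tuple[str, str, str]  # (subject, property, object)

GRAPH: set[Triple] = {
    ("sky", "color", "blue"),
    ("sea", "color", "blue"),
    ("grass", "color", "green"),
}


def match(s: Optional[str] = None, p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return the triples that fit a pattern; None works as a wildcard."""
    return sorted(
        t for t in GRAPH
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )


print(match(p="color", o="blue"))  # [('sea', 'color', 'blue'), ('sky', 'color', 'blue')]
print(match(s="sky"))              # [('sky', 'color', 'blue')]
```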
RDF Schema is a semantic extension of RDF. It is a basic language for defining ontologies.
Ontologies are conceptually placed at a higher layer than RDF and RDF Schema.
Production Systems
A production system is based on the logic programming paradigm. It is composed of a fact base (specific knowledge) and a rule base (generic knowledge) of the type "condition → action".
To infer new knowledge by rules there are two mechanisms:
Forward inference (forward chaining). It applies generalized modus ponens: from facts and rules, new facts are derived; from the derived facts, further facts are derived, and so on until all possibilities of inference are exhausted.
Backward inference (backward chaining). It is the inverse of the previous process: it starts from a hypothesized fact (a goal) and tries to validate it by working back through the existing rules to known facts.
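Both mechanisms can be sketched in Python for propositional rules (no variables), where each rule's "action" is simply to assert its conclusion as a new fact. The rule base is a hypothetical example; `forward_chain` saturates the fact base, while `backward_chain` starts from a goal and works back to known facts.

```python
Rule = tuple[frozenset[str], str]  # (conditions, conclusion)

RULES: list[Rule] = [
    (frozenset({"has_hair"}), "mammal"),
    (frozenset({"mammal", "eats_meat"}), "carnivore"),
    (frozenset({"carnivore", "barks"}), "dog"),
]


def forward_chain(facts: set[str], rules: list[Rule]) -> set[str]:
    """Fire every rule whose conditions hold until no new fact appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and conditions <= known:
                known.add(conclusion)  # the rule's "action": assert a new fact
                changed = True
    return known


def backward_chain(goal: str, facts: set[str], rules: list[Rule],
                   pending: frozenset[str] = frozenset()) -> bool:
    """Try to prove `goal` by working back from the rules that conclude it."""
    if goal in facts:
        return True
    if goal in pending:  # guard against cyclic rules
        return False
    for conditions, conclusion in rules:
        if conclusion == goal and all(
            backward_chain(c, facts, rules, pending | {goal}) for c in conditions
        ):
            return True
    return False


FACTS = {"has_hair", "eats_meat", "barks"}
print(sorted(forward_chain(FACTS, RULES) - FACTS))  # ['carnivore', 'dog', 'mammal']
print(backward_chain("dog", FACTS, RULES))          # True
print(backward_chain("dog", {"has_hair"}, RULES))   # False
```

Forward chaining derives every possible fact up front; backward chaining only explores the rules relevant to the goal it was asked about.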
Advantages: The rules are independent of each other, are simple and are easily updated.
Disadvantages: Many rules are required, it needs an inference engine, and it does not allow inheritance or sharing in general.
Prolog is the best known logic programming language.
MENTAL, a Universal Language for Knowledge Representation
MENTAL, as a language for knowledge representation, solves the problems posed above:
Universality.
Because of its flexibility, MENTAL allows knowledge to be represented using different paradigms: semantic networks, frames, ontologies, events, predicates (or attributes), rules, etc. All specific knowledge representation systems derive from MENTAL, which is a universal representation language.
Theory and practice.
MENTAL is a theory and practice of knowledge representation, as well as a universal model of internal (mental) and external (physical) reality. MENTAL is self-sufficient as a theory (and practice) of representation.
The thesis of MENTAL.
The thesis is that knowledge is all that is representable by means of the primitives of MENTAL, which are the primary archetypes present in internal and external reality.
The Church-Turing thesis refers to computation exclusively, which is a type of information or knowledge. The MENTAL thesis is more general, because it refers to all kinds of knowledge.
The union language - system.
With MENTAL the distinction between knowledge representation language and knowledge representation system is diluted, in the same way that the distinction between programming language and operating system is diluted.
The "data - information - knowledge" scale.
This scale is extended with "wisdom" and "transcendence". The deeper the knowledge, the more important it is. Wisdom is the general knowledge common to several different domains. Transcendence is a privileged vantage point from which to contemplate the essential unity of all things. MENTAL implies wisdom and a transcendent vision of the real and the possible.
Union of opposites.
MENTAL contemplates opposites to allow us to represent all kinds of knowledge: generic and specific, descriptive and operative, qualitative and quantitative, abstract and concrete, precise and diffuse, etc. With MENTAL, we use the same semantic resources of the internal (mental) world to represent the external (physical) world. Through abstraction we connect with the essence of things and with the essence of information and knowledge. At the abstract level there is equivalence between ontology and epistemology, between the representation and the represented, between map and reality.
Meta levels.
MENTAL can represent data, information and knowledge, as well as their meta-levels (meta-data, meta-information and meta-knowledge) and higher-order meta-levels.
Granularity.
MENTAL can represent knowledge with any desired degree of detail (granularity).
Modularity.
MENTAL also makes it possible, if desired, to modularize or fragment knowledge into interrelated units.
Ontologies.
MENTAL makes it easy to work with ontologies because MENTAL is a universal ontology-epistemology. MENTAL is also a meta-ontology from which particular ontologies can be built, in the same way that it is a universal grammar/language from which particular grammars and languages can be built.
The question of the internal representation of knowledge.
MENTAL solves this problem. Knowledge, both internal and external, is expressed by the primary archetypes. There is isomorphism between knowledge representation and reality.
Common sense knowledge representation.
The topic of commonsense knowledge representation has been, paradoxically, one of the most difficult in AI. MENTAL is a common sense language that allows representing this type of knowledge in a simple and intuitive way.
Simplicity.
MENTAL is a language of cognitive economy, since it is based only on 12 universal semantic primitives or primary archetypes.
Knowledge management.
Knowledge management (modifying, adding, deleting, selecting aspects of knowledge, etc.) can also be done with MENTAL, because knowledge management is also knowledge (or meta-knowledge).
Inferences.
MENTAL does not need a forward inference engine because it already performs automatic inference by forward chaining. Nor is a backward chaining engine necessary, because all possible inferences are performed automatically and are available at all times.
Sharing.
MENTAL has a generic sharing mechanism, a mechanism superior to traditional inheritance, which is limited to hierarchical structures.
Knowledge modeling.
There is an area called "Knowledge Modeling" (KM), which is a trans/interdisciplinary approach to represent knowledge in a way that is reusable and shared across domains and to simulate intelligence. It is a key topic in AI and cognitive science.
MENTAL is a system that allows modeling the knowledge of any system, because it is universal, domain independent. It is open, flexible and creative.
Language of consciousness.
In MENTAL, knowledge is linked to consciousness, through the primary archetypes.
New possibilities.
With MENTAL, unsuspected possibilities open up, such as modifying a rule according to certain circumstances, creating new rules dynamically, etc.
Examples
The "IS-A" (x is y) relation of semantic networks is a binary subject-predicate relation and is simply expressed as x/y, where x is the subject and y is the predicate. Examples:
Socrates/man (Socrates is a man).
blue/color (blue is a color).
cat/mammal (a cat is a mammal).
The relation "HAVE-A" (x has property y) of semantic networks is a ternary relation of the form x/(property/y). Examples:
cat/(color/black) (the cat has the property of being black)
table/(NroLegs/3) (the table has 3 legs)
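To relate this notation to the triples used by RDF, here is a hypothetical Python sketch that translates the two forms shown above into (subject, property, object) tuples, writing the IS-A relation as the property "IS-A". It handles only these two forms with single-word terms and is not MENTAL's own implementation or semantics.

```python
import re

# Only the two forms shown above (an assumption): "x/y" and "x/(property/y)".
HAVE_A = re.compile(r"(\w+)/\((\w+)/(\w+)\)")
IS_A = re.compile(r"(\w+)/(\w+)")


def to_triple(expr: str) -> tuple[str, str, str]:
    """Translate an IS-A or HAVE-A expression into a (subject, property, object) tuple."""
    expr = expr.strip()
    m = HAVE_A.fullmatch(expr)
    if m:
        subject, prop, value = m.groups()
        return (subject, prop, value)
    m = IS_A.fullmatch(expr)
    if m:
        subject, predicate = m.groups()
        return (subject, "IS-A", predicate)
    raise ValueError(f"unsupported expression: {expr!r}")


print(to_triple("Socrates/man"))       # ('Socrates', 'IS-A', 'man')
print(to_triple("table/(NroLegs/3)"))  # ('table', 'NroLegs', '3')
```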
Addenda
MENTAL, a language Tertium Comparationis for translation
"Tertium Comparationis" means in Latin "the third (part) of the comparison". It is the quality that two things have in common when compared. MENTAL can be used as an intermediate language to represent the common structure between different languages, for the process of translation (especially automatic) of a text in one language A into another language B.
Normally the translation of a text from one language to another is done from the surface level. With MENTAL the text is passed from the language A to a deep level (the MENTAL code), to then "emerge" in another language B. The deep level of MENTAL, together with its flexibility and power, is the one that can best reflect and represent the knowledge associated with a text.
Bibliography
Fagin, Ronald; Halpern, Joseph Y.; Moses, Yoram; Vardi, Moshe Y. Reasoning About Knowledge. The MIT Press, 1995. Available online.
Lenat, Douglas B.; Guha, R.V. Building Large Knowledge-Based Systems. Addison-Wesley, 1990.
Newell, Allen. The Knowledge Level. Artificial Intelligence 18, pp. 87-127, 1982.
Minsky, Marvin. A Framework for Representing Knowledge. In P. Winston (ed.), The Psychology of Computer Vision. McGraw-Hill, 1975. Available online.
Sowa, John F. Knowledge Representation: Logical, Philosophical, and Computational Foundations. Brooks/Cole, 2000.
Sowa, John F. Conceptual Structures: Information Processing in Mind and Machine. Addison-Wesley, 1984.