Saturday, November 3, 2012

Tower of Babel, Semantics Initiative, and Ontology

What's in a name? That which we call a rose by any other name would smell as sweet. ~ William Shakespeare 
The beginning of wisdom is to call things by their right names. ~ Chinese Proverb

At a symposium held by the Securities Industry and Financial Markets Association (SIFMA) in March 2012, Andrew G. Haldane, Executive Director of Financial Stability for the Bank of England, gave a speech titled, “Towards a common financial language”.[1] Using the imagery of the Tower of Babel, Mr. Haldane described how…
Finance today faces a similar dilemma. It, too, has no common language for communicating financial information. Most financial firms have competing in-house languages, with information systems silo-ed by business line. Across firms, it is even less likely that information systems have a common mother tongue. Today, the number of global financial languages very likely exceeds the number of global spoken languages.

The economic costs of this linguistic diversity were brutally exposed by the financial crisis. Very few firms, possibly none, had the information systems necessary to aggregate quickly information on exposures and risks.[2] This hindered effective consolidated risk management. For some of the world’s biggest banks that proved terminal, as unforeseen risks swamped undermanned risk systems.

These problems were even more acute across firms. Many banks lacked adequate information on the risk of their counterparties, much less their counterparties’ counterparties. The whole credit chain was immersed in fog. These information failures contributed importantly to failures in, and seizures of, many of the world’s core financial markets, including the interbank money and securitization markets.

Why is this? One would think that the financial industry would be in a great position to capitalize on the growth of digital information. After all, data has been the game changer for decades. But Wall Street, while proficient at handling market data and certain financial information, is not well prepared for the explosion in unstructured data.

The so-called “big data” problem of handling massive amounts of unstructured data is not just about implementing new technologies like Apache Hadoop. As discussed at the CFTC Technology Advisory Committee on Data Standardization held September 30, 2011, there is significant confusion in the industry regarding “semantics”.[3]

EDM Council - FIBO Semantics Initiative

The “semantic barrier” is a major issue in the financial industry, necessitating the creation of standards such as ISO 20022 to resolve.[4] For example, what some participants in the payments industry call an Ordering Customer, others refer to a Payer or Payor, while still others refer to a Payment Originator or Initiator. The context also plays a role here: the Payment Originator/ Initiator is a Debtor/ Payor in a credit transfer, while that Payment Originator/Initiator is a Creditor/Payee in a direct debit.[5]

It should therefore be apparent that intended use of systems is reliant on “human common sense” and understanding. Unfortunately, especially within the context of large organizations or across an industry, boundaries of intended use are often not documented and exist as “tribal knowledge”. Even if well documented, informal language maintained in policies and procedures can result in unintentional misapplication, with consequences no less hazardous than intentional misapplication.


Overcoming semantic barriers...

If your avocation[6] involves organizing information and/or modeling data and systems you invariably start asking epistemo-logical[7] questions, even though such questions may not be immediately practical to the task at hand: What is knowledge? How is knowledge acquired? To what extent is it possible for a given concept, either physical or abstract, to be known? Can computers understand meaning from the information they process and synthesize knowledge? “Can machines think?”[8]

Such questions are impetus to an ongoing debate about “the scope and limits of purely symbolic models of the mind and about the proper role of connectionism in cognitive modeling.” Harnad (1990) proffered such quandary as the Symbol Grounding Problem. “How can the semantic interpretation of a formal symbol system be made intrinsic to the system, rather than just parasitic on the meanings in our heads?”[9] The meaning triangle[10] illustrates the underlying problem.

 
Figure 1 – Ogden and Richards (1923) meaning triangle 

Figure 1 is a model of how linguistic symbols stand for objects they represent, which in turn provide an index to concepts in our mind. Note too, that such triangle represents the perspective of only one person, whereas communication often takes place between two or more persons (or devices such as computers). Hence, in order for two people or devices to understand each other, the meaning that relates term, referent and concept must align.

Now consider that different words might refer to the same concept, or worse, the same word could have different meanings, as in our example of the term “orange”. Are we referring to a fruit or to a color? This area of study is known as semantics.[11]


Relating semantics to ontology

As the meaning triangle exemplifies, monikers—whether they be linguistic[12] or symbolic[13]—are imperfect indexes. They rely on people having the ability to derive denotative (ie, explicit) meaning, and/or connotative (ie, implicit) meaning from words/signs. If the encoder (ie, sender) and the decoder (ie, receiver) do not share both the denotative and connotative meaning of a word/sign, miscommunication can occur. In fact, at the connotative level, context determines meaning.

Analytic approaches to this problem falls under the domain of semiotics,[14] which for our purposes encompasses the study of words and signs as elements of communicative behavior. Consequently, we consider linguistics and semiosis[15] to come under the subject of semiotics.[16] Semiotics, in turn, is divided into three branches or subfields: (i) semantics; (ii) syntactics;[17] and (iii) pragmatics.[18]

Various disciplines are used to model concepts within this field of study. These disciplines include, but are not necessarily limited to, lexicons/synsets, taxonomies, formal logic, symbolic logic, schema related to protocols (ie, syntactics), schema related to diagrams (ie, semiosis), actor-network theory, and metadata [eg, structural (data about data containers), descriptive (data about data content)]. In combination, these various methods form the toolkit for ontology work.

Ontology involves the study of the nature of being, existence, reality, as well as the basic categories of being and their relations. It encompasses answering metaphysical[19] questions relating to quiddity, that is, the quality that makes a thing what it is—the essential nature of a thing.

Admittedly, there are divergent views amongst practitioners as to what constitutes ontology, as well as classification of semiotics and related methodologies. To be sure, keeping all these concepts straight in one’s mind is not without difficulty for those without formal training. Further, “ontology has become a prevalent buzzword in computer science. An unfortunate side-effect is that the term has become less meaningful, being used to describe everything from what used to be identified as taxonomies or semantic networks, all the way to formal theories in logic.”[20]

Figure 2 is a schematic diagram illustrating a hierarchical conceptualization of ontology, its relation to epistemology, metaphysics, and semiotics, as well as its relation to cognitive science. Figure 2 also shows how semiotics encompasses linguistics and semiosis.

 
Figure 2 – Conceptualization of epistemology, metaphysics, ontology, and semiotics


Knowledge representation and first order logic

Knowledge representation (KR) is an area of artificial intelligence research aimed at representing knowledge in symbols to facilitate systematic inferences from knowledge elements, thereby synthesizing new elements of knowledge. KR involves analysis of how to reason accurately and effectively, and how best to use a set of symbols to represent a set of facts within a knowledge domain.

A key parameter in choosing or creating a KR is its expressivity. The more expressive a KR, the easier and more compact it is to express a fact or element of knowledge within the semantics and syntax of that KR. However, more expressive languages are likely to require more complex logic and algorithms to construct equivalent inferences. A highly expressive KR is also less likely to be complete and consistent; whereas less expressive KRs may be both complete and consistent.

Recent developments in KR include the concept of the Semantic Web, and development of XML-based knowledge representation languages and standards, including Resource Description Framework (RDF), RDF Schema, Topic Maps, DARPA Agent Markup Language (DAML), Ontology Inference Layer (OIL), and Ontology Web Language (OWL).[21]

 
Figure 3 – Adapted from Pease (2011) [Figure 15] and Orbst (2012)

SUMO, an open-source declarative programming language based on first order logic,[22] resides on the higher end of the scale in terms of both formality and expressiveness. The upper level ontology of SUMO consists of ~1120 terms, ~4500 axioms and ~795 rules, and has been extended with a mid-level ontology (MILO) as well as domain specific ontologies. Written in the SUO-KIF language, it is the only formal ontology that has been mapped to all of WordNet lexicon.

Formal languages such as DAML, OIL, and OWL are geared towards classification. What makes SUMO unique from other types of modeling approaches (eg, UML or frame-based), is its use of predicate logic. SUMO preserves the ability to structure taxonomic relationships and inheritance, but then extends such techniques with an expressive set of terms, axioms and rules that can more accurately model geo-spatial, sequentially temporal concepts, both physical and abstract.

Nevertheless, KR modeling can suffer from the “garbage in, garbage out” syndrome. Developing domain ontologies with SUMO is no exception. That is why in a large ontology such as SUMO/MILO, validation is very important.[23]


Models of concepts are second derivatives

Returning to Ogden’s and Richards’ (1923) meaning triangle, the relations between term, referent and concept may be phrased more precisely in causal terms:
  • The matter (referent) evokes the writer's thought (concept). 
  • The writer refers the matter (referent) to the symbol (term). 
  • The symbol (term) evokes the reader's thought (concept). 
  • The reader refers the symbol (concept) back to the matter (referent).

When the writer refers the matter to the symbol, the writer is effectively modeling the referent. The method that is used is informal language. However, without a formal semantic system in which to model concepts, the use of natural language as a representation of concepts will suffer from the issue that informal languages have meaning only by virtue of human inter-pretation of words. Likewise, it is important to not confuse the term as a substitute for the referent itself.

In calculus the second derivative of a function ƒ is the derivative of the derivative of ƒ. Likewise, a KR archetype or replica of a referent (ie, a physical or abstract thing) can be considered a second derivative, whereby the concept is the first derivative, and the model of the concept is the second derivative. What can add to the confusion is the term labeling the referent, versus the term labeling the model of the concept of the referent. One's inclination is to substitute the label for the referent.

Figure 4 – Adapted from Sowa (2000), “Ontology, Metadata, and Semiotics” 

Thus, it is important to recognize that a representation of a thing is at its most fundamental level still a surrogate, a substitute for the thing itself. It is a medium of expression. In fact, “the only model that is not wrong is reality and reality is not, by definition, a model.”[24] Still, a pragmatic method for addressing this concern derives from development and use of ontology.


Unstructured data and a way forward…

Over the past two decades much progress has been made on shared conceptualizations and the theory of semantics, as well as the application of these disciplines to advanced computer systems. Such progress has provided the means to solve the problem of deriving meaningful information from the unstructured schema size explosion (ie, “big data”) overwhelming the banking industry, as well as the many other industries which suffer from the same issue.

The missing link underlying “the cause of many a failure to design a proper dialect... [is] the general lack of an upper ontology that could provide the basis for mid-level ontologies and other domain specific metadata dictionaries or lexicons.” The key then is use of an upper-level ontology that “gives those who use data modeling techniques a common footing to stand on before they undertake their tasks.”[25] SUMO, as an open source formal ontology (think Linux), is a promising technology for such purpose with an evolving set of tools and an emerging array of applications/uses solving real world problems.

As Duane Nickull, Senior Technology Evangelist at Adobe Systems explained, “[SUMO] provides a level setting for our existence and sets up the framework on which we can do much more meaningful work.”[26]


About SUMO:
The Suggested Upper Merged Ontology (SUMO) and its domain ontologies form the largest formal public ontology in existence today. They are being used for research and applications in search, linguistics and reasoning. SUMO is the only formal ontology that has been mapped to all of the WordNet lexicon. SUMO is written in the SUO-KIF language. Sigma Knowledge Engineering Environment (Sigma KEE) is an environment for creating, testing, modifying, and performing inference with ontologies developed in SUO-KIF (e.g., SUMO, MILO). SUMO is free and owned by the IEEE. The ontologies that extend SUMO are available under GNU General Public License. Adam Pease is the Technical Editor of SUMO.  

For more information: http://www.ontologyportal.org/index.html. Also see: http://www.ontologyportal.org/Pubs.html for list of research publications citing SUMO.

Users of SUO-KIF and Sigma KEE consent, by use of this code, to credit Articulate Software and Teknowledge in any writings, briefings, publications, presentations, or other representations of any software that incorporates, builds on, or uses this code. Please cite the following article in any publication with references:

Pease, A., (2003). The Sigma Ontology Development Environment. In Working Notes of the IJCAI-2003 Workshop on Ontology and Distributed Systems, August 9, 2003, Acapulco, Mexico.



Footnotes:
[1] Speech by Mr Andrew G Haldane, Executive Director, Financial Stability, Bank of England, at the Securities Industry and Financial Markets Association (SIFMA) “Building a Global Legal Entity Identifier Framework” Symposium, New York, 14 March 2012.

[2] Counterparty Risk Management Policy Group (2008).

[3] CFTC Technology Advisory Subcommittee on Data Standardization Meeting to Publicly Present Interim Findings on: (1) Universal Product and Legal Entity Identifiers; (2) Standardization of Machine-Readable Legal Contracts; (3) Semantics; and (4) Data Storage and Retrieval. Meeting notes by Association of Institutional Investors. Source: http://association.institutionalinvestors.org/ Document: http://bit.ly/QJIP4i

[4] See: http://www.iso20022.org/

[5] SWIFT Standards Team, and Society for Worldwide Interbank Financial Telecommunication (2010). ISO 20022 for Dummies. Chichester, West Sussex, England: Wiley. http://site.ebrary.com/id/10418993.

[6] The term “avocation” has three seemingly conflicting definitions: 1. something a person does in addition to a principal occupation; 2. a person's regular occupation, calling, or vocation; 3. Archaic diversion or distraction. Note: we purposefully selected this terms because it relate to pragmatics; specifically, the “semantic barrier”.

[7] e•pis•te•mol•o•gy, n., 1. a branch of philosophy that investigates the origin, nature, methods, and limits of human knowledge; 2. the theory of knowledge, esp the critical study of its validity, methods, and scope.

[8] Turing, A.M. (1950). “Computing machinery and intelligence” Mind, 59, 433-460.

[9] Harnad, Stevan (1990) “The Symbol Grounding Problem” Physica, D 42:1-3 pp. 335-346.

[10] Ogden, C., and Richards, I. (1923). The meaning of meaning. A study of the influence of language upon thought and of the science of symbolism. Supplementary essays by Malinowski and Crookshank. New York: Harcourt.

[11] se•man•tics, n., 1. linguistics the branch of linguistics that deals with the study of meaning, changes in meaning, and the principles that govern the relationship between sentences or words and their meanings; 2. significs the study of the relationships between signs and symbols and what they represent; 3. logic a. the study of interpretations of a formal theory; b. the study of the relationship between the structure of a theory and its subject matter; c. the principles that determine the truth or falsehood of sentences within the theory. 

[12] lin•guis•tics, n., the science of language, including phonetics, phonology, morphology, syntax, semantics, pragmatics, and historical linguistics.

[13] Use of term “symbolic” refers to semiosis and the term “sign,” which is something that can be interpreted as having a meaning for something other than itself, and therefore able to communicate information to the person or device which is decoding the sign. Signs can work through any of the senses: visual, auditory, tactile, olfactory or taste. Examples include natural language, mathematical symbols, signage that directs traffic, and non-verbal interaction such as sign language. Note: we categorized linguistics to be a subclass of semiotics.

[14] se•mi•ot•ics, n., the study of signs and sign processes (semiosis), indication, designation, likeness, analogy, metaphor, symbolism, signification, and communication. Semiotics is closely related to the field of linguistics, which, for its part, studies the structure and meaning of language more specifically.

[15] The term “semiosis” was coined by Charles Sanders Peirce (1839–1914) in his theory of sign relations to describe a process that interprets signs as referring to their objects. Semiosis is any form of activity, conduct, or process that involves signs, including the production of meaning. [Related concepts: umwelt, semiosphere]

[16] One school of thought argues that language is the semiotic prototype and its study illuminates principles that can be applied to other sign systems. The opposing school argues that there is a meta system, and that language is simply one of many codes (ie, signs) for communicating meaning.

[17] syn•tac•tic, n., 1. the branch of semiotics that deals with the formal properties of symbol systems. 2. logic, linguistics the grammatical structure of an expression or the rules of well-formedness of a formal system.

[18] prag•mat•ics, n. 1. logic, philosophy the branch of semiotics dealing with causalality and other relations between words, expressions, or symbols and their users. 2. linguistics the analysis of language in terms of the situational context within which utterances are made, including the knowledge and beliefs of the speaker and the relation between speaker and listener. [Note: pragmatics is closely related to the study of semiosis.]

[19] met•a•phys•ics, n., 1. the branch of philosophy that treats of first principles, includes ontology and cosmology, and is intimately connected with epistemology; 2. philosophy, especially in its more abstruse branches; 3. the underlying theoretical principles of a subject or field of inquiry.

[20] Pease, Adam. (2011). Ontology: A Practical Guide. Angwin: Articulate Software Press.

[21] A review of the listed KR approaches is outside the scope of this discussion. See Pease (2011), Ontology: A Practical Guide, ‘Chapter 2: Knowledge Representation’ for a more in-depth discussion/comparison.

[22] While OWL is based on description logic, its primary construct is taxonomy (ie, frame language).

[23] See Pease (2011), Ontology: A Practical Guide, pp. 89-91 for further discussion on validation.

[24] Haldane, Andrew G. (2009). “Why banks failed the stress test” Speech, Financial Stability, Bank of England, at the Marcus-Evans Conference on Stress-Testing, London, 9-10 February 2009.

[25] Duane Nickull, Senior Technology Evangelist, Adobe Systems; foreword to “Ontology” by Adam Pease.

[26] Multiple contributors (2009). Introducing Semantic Technologies and the Vision of the Semantic Web Frontier Journal, Volume 6, Number 7 July 2009. See: http://www.hwswworld.com/pdfs/frontier66.pdf 

1 comment:

Andrew Finlay said...

As an aside for those interested in mathematics and Borge (author of the Library of Babel fame), read below.

http://www.amazon.com/Unimaginable-Mathematics-Borges-Library-Babel/dp/0195334574