Olog

The theory of ologs is an attempt to provide a rigorous mathematical framework for knowledge representation, the construction of scientific models, and data storage, using category theory together with linguistic and graphical tools. Ologs were introduced in 2010 by David Spivak,^[1] a research scientist in the Department of Mathematics at MIT.

Etymology

The term "olog" is short for "ontology log". "Ontology" derives from onto-, from the Greek ὤν, ὄντος "being; that which is", present participle of the verb εἰμί "be", and -λογία, -logia: science, study, theory.

Mathematical formalism

At the basic level an olog ${\mathcal {C}}$ is a category whose objects are represented as boxes containing sentences and whose morphisms are represented as directed labeled arrows between boxes. The structures of the sentences for both the objects and the morphisms of ${\mathcal {C}}$ need to be compatible with the mathematical definition of ${\mathcal {C}}$ . This compatibility cannot be checked mathematically, because it lies in the correspondence between mathematical ideas and natural language.
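To make the box-and-arrow picture concrete, here is a minimal Python sketch of an olog's underlying graph, in which a composable path of arrows reads as an English phrase. The class `Olog`, its methods, and the sample boxes ("a woman", "a person") and arrow labels ("is", "has as mother") are illustrative assumptions, not part of any published olog software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arrow:
    source: str  # sentence in the source box, e.g. "a woman"
    label: str   # verb phrase on the arrow, e.g. "is"
    target: str  # sentence in the target box, e.g. "a person"


class Olog:
    """Hypothetical minimal olog: boxes are sentences, arrows are labeled edges."""

    def __init__(self):
        self.boxes = set()
        self.arrows = {}

    def add_box(self, text):
        self.boxes.add(text)

    def add_arrow(self, name, source, label, target):
        # An arrow may only connect boxes that have already been declared.
        if source not in self.boxes or target not in self.boxes:
            raise ValueError(f"arrow {name!r} connects undeclared boxes")
        self.arrows[name] = Arrow(source, label, target)

    def read(self, path):
        """Read a composable path of arrow names as a single English phrase."""
        if not path:
            raise ValueError("empty path")
        arrows = [self.arrows[name] for name in path]
        for f, g in zip(arrows, arrows[1:]):
            # The composite is defined only when f's target is g's source.
            if f.target != g.source:
                raise ValueError(f"cannot compose: {f.target!r} is not {g.source!r}")
        first = arrows[0]
        phrase = f"{first.source} {first.label} {first.target}"
        for a in arrows[1:]:
            phrase += f", which {a.label} {a.target}"
        return phrase


olog = Olog()
for box in ("a woman", "a person"):
    olog.add_box(box)
olog.add_arrow("is", "a woman", "is", "a person")
olog.add_arrow("mother", "a person", "has as mother", "a woman")
print(olog.read(["is", "mother"]))
# a woman is a person, which has as mother a woman
```

The sketch only checks that paths are composable; whether the resulting phrase makes sense is exactly the compatibility that, as noted above, cannot be checked mathematically.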

Every olog has a target category, which is taken to be ${\textbf {Set}}$, the category of sets and functions, unless otherwise mentioned. For an olog about amino acids, for example, we are looking at a set of amino acids, a set of amine groups, and a function that assigns to every amino acid its amine group. In this article we usually stick to ${\textbf {Set}}$, though we sometimes use the Kleisli category ${\mathcal {C}}_{\mathbb {P} }$ of the power set monad. Another possibility, not used here, is the Kleisli category of the probability distribution monad (the Giry monad), which can be used, e.g., to obtain a generalization of Markov decision processes.

The boxes in the amino acid example refer to objects of ${\textbf {Set}}$. For instance, the box containing the sentence "an amino acid" refers to the set of all amino acids, and the box containing the sentence "a side chain" refers to the set of all side chains. The arrow labeled "has", whose source is "an amino acid" and whose target is "a side chain", refers to a morphism between two objects of ${\textbf {Set}}$ and thus must be a function between two sets. Indeed, every amino acid has a unique side chain, so the arrow is a valid morphism of ${\textbf {Set}}$. The functional nature of the morphisms in ${\textbf {Set}}$ is expressed in an olog by labeling arrows with appropriate phrases (e.g. "has").
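Whether an arrow is a valid morphism of ${\textbf {Set}}$ can be checked mechanically once the sets are written down. The sketch below, an illustration rather than part of the olog formalism, takes an arrow's extension as a list of (source, target) pairs and tests that it is total and single-valued. The three amino acids and their side chains are a small hand-picked sample, and `is_function` is a hypothetical helper name.

```python
def is_function(pairs, source, target):
    """True iff `pairs` is a total, single-valued map from `source` to `target`."""
    image = {}
    for a, b in pairs:
        if a not in source or b not in target:
            return False  # pair mentions an element outside the two sets
        if image.get(a, b) != b:
            return False  # a would have two different images
        image[a] = b
    return set(image) == set(source)  # every element of source has an image


amino_acids = {"glycine", "alanine", "serine"}
side_chains = {"H", "CH3", "CH2OH"}
has = [("glycine", "H"), ("alanine", "CH3"), ("serine", "CH2OH")]
print(is_function(has, amino_acids, side_chains))  # True
```

Dropping the serine pair (not total) or adding a second side chain for glycine (not single-valued) makes the check fail, which is how an arrow that does not deserve a functional label like "has" would be caught.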

For another example, let $(\mathbb {P} ,\eta ,\mu )$ be the power set monad on ${\textbf {Set}}$: given $A\in Ob({\textbf {Set}})$, $\mathbb {P} (A)$ is the power set of A, the natural transformation $\eta$ sends $a\in A$ to the singleton $\{a\}$, and the natural transformation $\mu$ sends a set of sets to its union. A morphism $f:A\to B$ in the Kleisli category ${\mathcal {C}}_{\mathbb {P} }$ is a function $f:A\to \mathbb {P} (B)$, and can be seen as establishing a binary relation $R\subseteq A\times B$: given $a\in A$ and $b\in B$, we say that $(a,b)\in R$ if $b\in f(a)$.
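The power set monad and its Kleisli composition are short enough to write out directly. In this Python sketch (all function names are our own), subsets are frozensets, `kleisli_compose(g, f)` computes $\mu \circ \mathbb {P} (g)\circ f$, and `relation` extracts the binary relation R described above. The "is less than" relation on {1, 2, 3} is an illustrative choice.

```python
from itertools import chain


def eta(a):
    """Unit of the power set monad: a |-> {a}."""
    return frozenset({a})


def mu(family):
    """Multiplication of the power set monad: a set of sets |-> its union."""
    return frozenset(chain.from_iterable(family))


def power_map(g):
    """Action of P on a function g: X -> Y, sending a subset to its direct image."""
    return lambda subset: frozenset(g(x) for x in subset)


def kleisli_compose(g, f):
    """Kleisli composite of f: A -> P(B) and g: B -> P(C), i.e. mu . P(g) . f."""
    return lambda a: mu(power_map(g)(f(a)))


def relation(f, domain):
    """Binary relation R of a Kleisli morphism f: (a, b) in R iff b in f(a)."""
    return {(a, b) for a in domain for b in f(a)}


A = {1, 2, 3}


def less_than(a):
    """Kleisli morphism A -> P(A) for the relation "is less than"."""
    return frozenset(b for b in A if a < b)


print(sorted(relation(less_than, A)))  # [(1, 2), (1, 3), (2, 3)]
```

Composing `less_than` with itself via `kleisli_compose` yields the relation {(1, 3)}, the pairs linked by a chain of two strict inequalities, so Kleisli composition here is exactly relational composition; `eta` acts as the identity on both sides.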

We can use ${\mathcal {C}}_{\mathbb {P} }$ as the target category for an olog. In this case the arrows in the olog need to reflect the relational nature of morphisms in ${\mathcal {C}}_{\mathbb {P} }$. This can be done by labeling every arrow in the olog with a relational phrase such as "is related to" or "is greater than".

Ologs and databases

An olog ${\mathcal {C}}$ can also be viewed as a database schema. Every box (object of ${\mathcal {C}}$) in the olog is a table $T$, and the arrows (morphisms) emanating from the box are the columns of $T$. A particular instance of the database, i.e. the data filling the tables, is given by a functor $I:{\mathcal {C}}\to {\textbf {Set}}$, which assigns to each box the set of its rows and to each arrow a function between those sets. In the amino acid example, the box "an amino acid" is represented as a table whose number of rows equals the number of types of amino acids and which has three columns, one for each arrow emanating from that box.
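A database instance in this sense can be sketched with plain Python dictionaries. The schema below assumes, for illustration, that the three arrows leaving "an amino acid" point to "an amine group", "a carboxyl group" and "a side chain"; the row identifiers are made up, and an ID column holding them sits alongside the three arrow columns. `check_instance` (a hypothetical helper) verifies that each column is a function into the target table, which is all that is needed for I to be a functor when the schema imposes no path equations, an assumption of this sketch.

```python
# Hypothetical schema: arrow name -> (source box, target box).
SCHEMA = {
    "has amine group": ("an amino acid", "an amine group"),
    "has carboxyl group": ("an amino acid", "a carboxyl group"),  # assumed arrow
    "has side chain": ("an amino acid", "a side chain"),
}

# Hypothetical instance I: each box gets a set of row IDs, each arrow a function.
ROWS = {
    "an amino acid": {"Gly", "Ala"},
    "an amine group": {"N1", "N2"},
    "a carboxyl group": {"C1", "C2"},
    "a side chain": {"H", "CH3"},
}
COLUMNS = {
    "has amine group": {"Gly": "N1", "Ala": "N2"},
    "has carboxyl group": {"Gly": "C1", "Ala": "C2"},
    "has side chain": {"Gly": "H", "Ala": "CH3"},
}


def check_instance(schema, rows, columns):
    """Raise ValueError unless every column is a function between the right tables."""
    for name, (src, tgt) in schema.items():
        col = columns[name]
        if set(col) != rows[src]:
            raise ValueError(f"column {name!r} does not cover table {src!r}")
        if not set(col.values()) <= rows[tgt]:
            raise ValueError(f"column {name!r} points outside table {tgt!r}")


def table(box, schema, rows, columns):
    """Return (header, records) for the table of `box`: one column per outgoing arrow."""
    outgoing = sorted(name for name, (src, _) in schema.items() if src == box)
    records = [
        {"ID": r, **{name: columns[name][r] for name in outgoing}}
        for r in sorted(rows[box])
    ]
    return ["ID"] + outgoing, records


check_instance(SCHEMA, ROWS, COLUMNS)
header, records = table("an amino acid", SCHEMA, ROWS, COLUMNS)
print(header)  # ['ID', 'has amine group', 'has carboxyl group', 'has side chain']
```

Boxes with no outgoing arrows, such as "a side chain" here, become tables consisting of the ID column alone.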

Relations between ologs

Communication between different ologs, which in practice can be communication between different models or world-views, is done using functors. Spivak coins the notions of meaningful and strongly meaningful functors.^[1] Let ${\mathcal {C}}$ and ${\mathcal {D}}$ be two ologs, $I:{\mathcal {C}}\to {\textbf {Set}}$ and $J:{\mathcal {D}}\to {\textbf {Set}}$ functors (see the section on ologs and databases), and $F:{\mathcal {C}}\to {\mathcal {D}}$ a functor. We say that $F$ is meaningful if there exists a natural transformation $m:I\to F^{*}J$, where $F^{*}J=J\circ F$ is the pullback of J along F.
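For finite instances, the naturality condition on m can be checked directly: for every arrow $f:X\to Y$ of ${\mathcal {C}}$ and every $x\in I(X)$, we need $m_{Y}(I(f)(x))=J(F(f))(m_{X}(x))$. The Python sketch below does this. The two one-arrow ologs (one recording birth years, the other birth decades), the instance data, and the function name `is_natural` are illustrative assumptions.

```python
def is_natural(arrows_C, F_on_arrows, I_rows, I_cols, J_cols, m):
    """Check that m: I -> J.F is natural, i.e. every naturality square commutes.

    arrows_C:    arrow of C -> (source X, target Y)
    F_on_arrows: arrow of C -> arrow of D (the functor F on arrows)
    I_rows:      object X of C -> set I(X)
    I_cols:      arrow f of C -> dict for the function I(f)
    J_cols:      arrow g of D -> dict for the function J(g)
    m:           object X of C -> dict for the component m_X: I(X) -> J(F(X))
    """
    for f, (X, Y) in arrows_C.items():
        for x in I_rows[X]:
            # Both paths around the square, m_Y . I(f) and J(F(f)) . m_X, must agree.
            if m[Y][I_cols[f][x]] != J_cols[F_on_arrows[f]][m[X][x]]:
                return False
    return True


# Olog C: "a person" --was born in--> "a year"
ARROWS_C = {"born in year": ("a person", "a year")}
# Olog D: "a person" --was born in--> "a decade"; F sends "a year" to "a decade".
F_ON_ARROWS = {"born in year": "born in decade"}
I_ROWS = {"a person": {"Lovelace", "Turing", "Shannon"}, "a year": {1815, 1912, 1916}}
I_COLS = {"born in year": {"Lovelace": 1815, "Turing": 1912, "Shannon": 1916}}
J_COLS = {"born in decade": {"Lovelace": 1810, "Turing": 1910, "Shannon": 1910}}
# Translation m: people map to themselves, years map to their decades.
M = {
    "a person": {p: p for p in I_ROWS["a person"]},
    "a year": {y: y - y % 10 for y in I_ROWS["a year"]},
}
print(is_natural(ARROWS_C, F_ON_ARROWS, I_ROWS, I_COLS, J_COLS, M))  # True
```

Here the translation loses information (1912 and 1916 both become 1910), yet every square commutes, so F is meaningful for these instances.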

Taking ${\mathcal {C}}$ and ${\mathcal {D}}$ to be two different scientific models, for example, the functor $F$ is meaningful if predictions made by the first model ${\mathcal {C}}$, which are objects of ${\textbf {Set}}$, can be translated into the second model ${\mathcal {D}}$.

We say that $F$ is strongly meaningful if $m$ is a natural isomorphism, i.e. if for every object $X\in {\mathcal {C}}$ the component $m_{X}:I(X)\to J(F(X))$ is a bijection, so that $I(X)$ can be identified with $J(F(X))$.
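Given a natural transformation m, the extra condition for strong meaningfulness is that each component $m_{X}$ is a bijection. The self-contained sketch below checks only that condition and assumes naturality has been verified separately; the birth-year/birth-decade data and the helper names are illustrative assumptions.

```python
def is_bijection(mapping, domain, codomain):
    """True iff the dict `mapping` is a bijection from `domain` onto `codomain`."""
    values = list(mapping.values())
    return (
        set(mapping) == set(domain)          # defined on all of the domain
        and set(values) == set(codomain)     # surjective
        and len(set(values)) == len(values)  # injective
    )


def is_strongly_meaningful(m, I_rows, JF_rows):
    """Assuming m: I -> J.F is natural, check that every component m_X is bijective."""
    return all(is_bijection(m[X], I_rows[X], JF_rows[X]) for X in m)


I_ROWS = {"a person": {"Lovelace", "Turing", "Shannon"}, "a year": {1815, 1912, 1916}}
# JF_ROWS[X] is the set J(F(X)); for X = "a year" it holds decades.
JF_ROWS = {"a person": {"Lovelace", "Turing", "Shannon"}, "a year": {1810, 1910}}
M = {
    "a person": {p: p for p in I_ROWS["a person"]},
    "a year": {y: y - y % 10 for y in I_ROWS["a year"]},
}
print(is_strongly_meaningful(M, I_ROWS, JF_ROWS))  # False: 1912 and 1916 both map to 1910
```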

Sometimes it is hard to find a meaningful functor $F$ from ${\mathcal {C}}$ to ${\mathcal {D}}$. In such a case we may try to define a new olog ${\mathcal {B}}$ that represents the common ground of ${\mathcal {C}}$ and ${\mathcal {D}}$, and find meaningful functors $F_{\mathcal {C}}:{\mathcal {B}}\to {\mathcal {C}}$ and $F_{\mathcal {D}}:{\mathcal {B}}\to {\mathcal {D}}$.

If communication between ologs is limited to two-way communication as described above, we may think of a collection of ologs as the nodes of a graph whose edges are the functors connecting them. If simultaneous communication between more than two ologs is allowed, the graph becomes a symmetric simplicial complex.

Rules of good practice

Spivak provides some rules of good practice for writing an olog whose morphisms have a functional nature (see the first example in the section Mathematical formalism).^[1] The text in a box should adhere to the following rules:

begin with the word "a" or "an". (Example: "an amino acid").
refer to a distinction made and recognizable by the olog's author.
refer to a distinction for which there is a well-defined functor whose range is ${\textbf {Set}}$, i.e. an instance can be documented. (Example: there is a set of all amino acids).
declare all variables in a compound structure. (Example: instead of writing in a box "a man and a woman" write "a man $m$ and a woman $w$ " or "a pair $(m,w)$ where $m$ is a man and $w$ is a woman").

The first three rules ensure that the objects (the boxes) defined by the olog's author are well-defined sets. The fourth rule improves the labeling of arrows in an olog.
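Of these rules, the first is the only one that reduces to a simple syntactic test; the others depend on the author's judgment. A minimal Python sketch of such a test might look like this (the function name is our own, and the check is deliberately case-sensitive, matching the lowercase box texts used in this article):

```python
def follows_article_rule(box_text):
    """Rule 1: the text in a box should begin with the word "a" or "an"."""
    words = box_text.split(maxsplit=1)
    return bool(words) and words[0] in {"a", "an"}


boxes = ["an amino acid", "a pair (m,w) where m is a man and w is a woman", "amino acids"]
print([b for b in boxes if not follows_article_rule(b)])  # ['amino acids']
```

Because it compares whole words, the check rejects texts such as "another box" that merely start with the letters "an".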

Applications

The concept was applied by David Spivak and coauthors Markus J. Buehler, Associate Professor in the Department of Civil and Environmental Engineering (CEE), and CEE graduate student Tristan Giesa in a paper published in the December 2011 issue of BioNanoScience, in which the researchers establish a scientific analogy between spider silk and musical composition.^[2]

References

1. Spivak, David I. (2011). "Ologs: A categorical framework for knowledge representation". arXiv:1102.1889v1 [cs.LO].
2. Giesa, Tristan; Spivak, David I.; Buehler, Markus J. (2011). "Reoccurring patterns in hierarchical protein materials and music: The power of analogies". arXiv:1111.5297v1 [q-bio.BM].

External links

"Categorical Informatics". David Spivak.

This article is issued from Wikipedia, version of 6/18/2016. The text is available under the Creative Commons Attribution/Share Alike license, but additional terms may apply for the media files.