Atomspace — Hyper-graph information retrieval system

Overview

8 min readMay 9, 2021

This piece is in continuation to the previous blog on the OpenCog and AGI. In this piece we will understand a core element of the OpenCog system i.e. the Atomspace which is the information storage and retrieval system used in the framework. It is based on the concept of hypergraphs, something that we will look into and develop a understanding of.

Graph Database

Let’s us first lay down a foundation for the graph database concepts, since those will be used to define the high level abstract concepts for the hyper-graphs.

A graph database is a database consisting of set objects, which can be node or an edge.

Nodes : It represents an entity or quantity of a type as a record in the database. In the above diagram, the spherical entotues are nodes.
Edges : connection between two nodes, defines the relationship between the two nodes. As shown in the above diagram, edge connecting the node Alice and the Chess represents member as the edge relationship.
Properties : It is the information associated to the nodes. The Name and the age associated with the id:1 node.

The graph databases are built on top of the graph datastructures, which in itself is a rabbit hole to go down and explore all the concepts within it. There are directed, undirected, weighted, oriented graphs, complete graphs, finite graphs, connected graphs Bipartite graph, planar graph, cycle graph, tree,, polytree which you can refere to for indepth understanding. In this setting we will look into the weighted, directed and undirected graphs as a source of architectural understanding.

Hypergraph

A hyper-graph is a generalisation of a graph in which an edge can join any number of vertices.Let’s us try to understand this with an example

X = {v1, v2, v3, v4, v5, v6, v7} is the set all the vertices in the hyper-graph.

E = {e1, e2, e3, e4} is the set of all the edges in the hyper-graph.

What we observe is e1(yellow), connects v1, v2 and v3 while e2(orange) connects only v2 and v3. Similar connect can be found among v3, v5 and v6.

Hyper-graphs are incidence structures i.e. An incidence structure is a triple (P, L, I) where P is a set whose elements are called points, L is a distinct set whose elements are called lines and I ⊆ P × L is the incidence relation.

Consider the points and the line in the first diagram, the lines are incident of the three points i.e. they are passing through it or you can say the points are on the lines, both relations are acceptable. This is the incidence relation. Now, in the case of the hypergraph(above diagram), the edge e1 is incident on the vertice v2, since it is passing through it or vice-versa that the vertice v3 is on the hyperedge e1.

Every incidence structure can be viewed as a hypergraph by identifying the lines with the sets of points that are incident with them.

Atomspace

Atom

Atoms are the main component of the Atomspace. With values they are called AtomSpace stores. The two most basic types of atoms are nodes and links, they are used to represent and store graphs. The Atoms are typed i.e. follow type theory. In type theory every term has a type which defines its meaning and the operations that maybe performed on it. In the initial version of the OpenCog, it was used to store logical assertions and the probabilistic TruthValues. Now, it has become more general to store the natural language grammar, dictionaries and parsers. Values can be boolean true/false valuations, but they can also be numbers, weights, vectors or strings, or any combinations. This distinction between the “shape of the graph”, and the data that is “hung on it” is central for allowing high-speed graph traversal and generalized graph query.

Atoms can be used for create query language itself i.e. query for query languages. The Atoms themselves are extensible, wrappers can be created with existing libraries to export those functions in the Atomese.

Atoms as symbols, is essentially the same thing as a symbol table. A symbol table commonly has the property that all symbols in it are unique.Once an Atom is placed in the AtomSpace, it gets a single, unique ID. This unique ID is the string name of a Node, and it is the outgoing set of a Link.

A truth value gives each Atom a valuation or an interpretation, thus all Atoms in a given, fixed AtomSpace always carry a default valuation/interpretation along with them. Additional interpretations can be added using the Contextlink. When contexts are used, truth values resemble conditional probabilities; that is, the truth value of an atom A in context C can be roughly understood to behave as a probability P(A|C).

Evaluation link allows a relational model to be specified. The evaluation link is the most central and important atom type in the OpenCog.

The atoms are immutable, can be assigned fleeting, changing values to indicate the truth or likelihood of that atom, or to hold other kinds of transient data.

Atomese

It is the concept of writing programs with atoms. It a programming language for the algorithms to communicate with. It has ideas from prolog, SQL and relational algebra, without tables and being a graph database. Atomese was originally intended to be a language for knowledge representation (KR): that is, a way of encoding facts and hypothesis, in a machine-readable way, such that the knowledge can be manipulated, data-mined, reasoned with. This language subset was vaguely inspired by Prolog and Datalog. More correctly, it was constructed by layering concepts from mathematical logic onto a graph database: representing logical, symbolic statements as graphs.

It is a language designed for easy self-introspection As such, it goes well beyond ordinary object reflection or meta-programming, and provides an implementation of self-modifying code that is actually usable in ordinary machine-learning scenarios.it resembles, in many ways, the intermediate languages found inside of compilers: the intermediate language is where a compiler can best “understand” the program, and analyse it’s structure to perform the needed rewrites and optimisations.

For now, the atomese is very slow compiled language. It needs to be fixed making certain optimisation changes.

Attributes as a database

Uniqueness : An atomspace will contain only one single node of a given name, and one single link with a given outgoing set.
Indexes : The Atomspace contains several indexes to provide fast access to certain classes of the atom. Indexes are used to speed up the search.
Persistence : It is used as an in-RAM database, with all of the performance advantages of having data cached in RAM. For the content which needs to be stored for a longer period of time, thus, the contents of an AtomSpace can be saved-to/restored-from some storage medium.
Decentralised computing vs distributed computing : The design philosophy of encoding all state as graphs, and disallowing any hidden state means that atoms can ber transported from one resource to another resource, or shared among network servers.
Query Language : It used pattern search language for quering atoms in the atomspace. Each entry is a graph and not row in a table like SQL.
Change notification : It provides a signal whrn an atom is added or removed, thus allowing actions to be triggered as the content change.

Atomspace can be considered analogous to the symbol table i.e its a table and all the atoms are symbols in that table. It can be considered as a blackboard or tuple space. It can be considered as a knowledge representation system too.

Programming Aspect — Current Implementation

The current implementaiton of the Atomspace can be accessed using C++, Python, Haskell, a web interface or ZeroMQ, Low level via C++ is about 4x faster and 40x faster with the Python.
AtomSpace is threadsafe via C++. Access via scheme is thread-safe but behaves poorly when more than 3 or 4 concurrent threads are running.The scheme is a dialect of Lisp.
The SQL storage backend provides user-controlled, on-demand saving and loading of hypergraphs. The existing SQL backend provides multiple parallel asynchronous store-back queues (four threads, by default).
Multiple AtomSpaces on multiple network-connected servers can communicate via the Postgres backend. That is, they can share common atoms.
The AtomSpace benchmark tool measures the performance of a dozen basic atom and atomspace operations. A diary in the benchmark directory maintains a historical log of measurements.
Handles : Handle is a smart pointer to an atom. Knowing the Handle is enough to retrieve the corresponding atom. When all the handles corresponding to the atom are gone, then the atom is also deleted. Smart pointer implement memory management using the reference counting.The garbasge collection impmlmentatio was not a success suince GC has trouble chasing long extremely long chains, which is the case with the std::vector, std::map
Different links in the AtomSpace :

Context link, attach values w.r.t the context. E.g. Sky is blue is correct for Earth but not Mars.
Get Link : It returns a search result link, like a memory address of another link.
BindLink : It connects the two search links together in the graph search query path.
Putlink: It add a new link to the graph network and connects it to a another link or keep a new link in isolation.
Filter Link : Similar to the GetLink but it is used as a local Get Link i.e. a search in a smaller local space.

Conclusion

The Atomspace acts as a search space of knowledge in the opencog system. An intelligent information retrieval system built on the concept of Hypergraphs. An atom is the smallest unit to be found in that hypergraph which has multiple links and reference added to it that attach values and properties to it. the current implementation has memory management and performance issues while being thread safe. There have been discussion to replace the atoms with atomlinks in the graph where multiple Atomspaces are needed or stacked layers of Atomspaces are needed.

In the next blog, we will have a explore the Economics of Attention Mechanism in the OpenCog.

References

https://wiki.opencog.org/w/The_Open_Cognition_Project