Atomspace — Hyper-graph information retrieval system

This piece is in continuation to the previous blog on the OpenCog and AGI. In this piece we will understand a core element of the OpenCog system i.e. the Atomspace which is the information storage and retrieval system used in the framework. It is based on the concept of hypergraphs, something that we will look into and develop a understanding of.

Graph Database

Source Wikipedia

A graph database is a database consisting of set objects, which can be node or an edge.

  • Nodes : It represents an entity or quantity of a type as a record in the database. In the above diagram, the spherical entotues are nodes.
  • Edges : connection between two nodes, defines the relationship between the two nodes. As shown in the above diagram, edge connecting the node Alice and the Chess represents member as the edge relationship.
  • Properties : It is the information associated to the nodes. The Name and the age associated with the id:1 node.

The graph databases are built on top of the graph datastructures, which in itself is a rabbit hole to go down and explore all the concepts within it. There are directed, undirected, weighted, oriented graphs, complete graphs, finite graphs, connected graphs Bipartite graph, planar graph, cycle graph, tree,, polytree which you can refere to for indepth understanding. In this setting we will look into the weighted, directed and undirected graphs as a source of architectural understanding.

Hypergraph

Hypergraph structure
Vertices representation

X = {v1, v2, v3, v4, v5, v6, v7} is the set all the vertices in the hyper-graph.

E = {e1, e2, e3, e4} is the set of all the edges in the hyper-graph.

What we observe is e1(yellow), connects v1, v2 and v3 while e2(orange) connects only v2 and v3. Similar connect can be found among v3, v5 and v6.

Hyper-graphs are incidence structures i.e. An incidence structure is a triple (P, L, I) where P is a set whose elements are called points, L is a distinct set whose elements are called lines and IP × L is the incidence relation.

Consider the points and the line in the first diagram, the lines are incident of the three points i.e. they are passing through it or you can say the points are on the lines, both relations are acceptable. This is the incidence relation. Now, in the case of the hypergraph(above diagram), the edge e1 is incident on the vertice v2, since it is passing through it or vice-versa that the vertice v3 is on the hyperedge e1.

Every incidence structure can be viewed as a hypergraph by identifying the lines with the sets of points that are incident with them.

Atomspace

Atom

Atoms can be used for create query language itself i.e. query for query languages. The Atoms themselves are extensible, wrappers can be created with existing libraries to export those functions in the Atomese.

Atoms as symbols, is essentially the same thing as a symbol table. A symbol table commonly has the property that all symbols in it are unique.Once an Atom is placed in the AtomSpace, it gets a single, unique ID. This unique ID is the string name of a Node, and it is the outgoing set of a Link.

A truth value gives each Atom a valuation or an interpretation, thus all Atoms in a given, fixed AtomSpace always carry a default valuation/interpretation along with them. Additional interpretations can be added using the Contextlink. When contexts are used, truth values resemble conditional probabilities; that is, the truth value of an atom A in context C can be roughly understood to behave as a probability P(A|C).

Evaluation link allows a relational model to be specified. The evaluation link is the most central and important atom type in the OpenCog.

The atoms are immutable, can be assigned fleeting, changing values to indicate the truth or likelihood of that atom, or to hold other kinds of transient data.

Atomese

It is a language designed for easy self-introspection As such, it goes well beyond ordinary object reflection or meta-programming, and provides an implementation of self-modifying code that is actually usable in ordinary machine-learning scenarios.it resembles, in many ways, the intermediate languages found inside of compilers: the intermediate language is where a compiler can best “understand” the program, and analyse it’s structure to perform the needed rewrites and optimisations.

For now, the atomese is very slow compiled language. It needs to be fixed making certain optimisation changes.

Attributes as a database

  • Indexes : The Atomspace contains several indexes to provide fast access to certain classes of the atom. Indexes are used to speed up the search.
  • Persistence : It is used as an in-RAM database, with all of the performance advantages of having data cached in RAM. For the content which needs to be stored for a longer period of time, thus, the contents of an AtomSpace can be saved-to/restored-from some storage medium.
  • Decentralised computing vs distributed computing : The design philosophy of encoding all state as graphs, and disallowing any hidden state means that atoms can ber transported from one resource to another resource, or shared among network servers.
  • Query Language : It used pattern search language for quering atoms in the atomspace. Each entry is a graph and not row in a table like SQL.
  • Change notification : It provides a signal whrn an atom is added or removed, thus allowing actions to be triggered as the content change.

Atomspace can be considered analogous to the symbol table i.e its a table and all the atoms are symbols in that table. It can be considered as a blackboard or tuple space. It can be considered as a knowledge representation system too.

Programming Aspect — Current Implementation

  • AtomSpace is threadsafe via C++. Access via scheme is thread-safe but behaves poorly when more than 3 or 4 concurrent threads are running.The scheme is a dialect of Lisp.
  • The SQL storage backend provides user-controlled, on-demand saving and loading of hypergraphs. The existing SQL backend provides multiple parallel asynchronous store-back queues (four threads, by default).
  • Multiple AtomSpaces on multiple network-connected servers can communicate via the Postgres backend. That is, they can share common atoms.
  • The AtomSpace benchmark tool measures the performance of a dozen basic atom and atomspace operations. A diary in the benchmark directory maintains a historical log of measurements.
  • Handles : Handle is a smart pointer to an atom. Knowing the Handle is enough to retrieve the corresponding atom. When all the handles corresponding to the atom are gone, then the atom is also deleted. Smart pointer implement memory management using the reference counting.The garbasge collection impmlmentatio was not a success suince GC has trouble chasing long extremely long chains, which is the case with the std::vector, std::map
  • Different links in the AtomSpace :
  1. Context link, attach values w.r.t the context. E.g. Sky is blue is correct for Earth but not Mars.
  2. Get Link : It returns a search result link, like a memory address of another link.
  3. BindLink : It connects the two search links together in the graph search query path.
  4. Putlink: It add a new link to the graph network and connects it to a another link or keep a new link in isolation.
  5. Filter Link : Similar to the GetLink but it is used as a local Get Link i.e. a search in a smaller local space.

Conclusion

In the next blog, we will have a explore the Economics of Attention Mechanism in the OpenCog.

References

Homo Bayesian