PSOA RuleML Meets Graph Databases

From RuleML Wiki
Jump to: navigation, search

Author: Harold Boley
RuleML Technical Memo


This document focuses on how PSOA RuleML meets Graph Databases and generalizes them. See also the parent document (PSOA RuleML Bridges Graph and Relational Databases) and the sibling document (PSOA RuleML Meets Relational Databases).

The paradigmatic notion of a graph in Graph Databases was initially defined as a Directed Labeled Graph (DLG) as utilized in mathematics, linguistics and AI (semantic nets, knowledge graphs, etc.), economics (e.g., logistics), sociology (social networks), etc. This has been generalized in various ways, e.g. (see Grailog) to the (combinable) definitions of a graph 1) whose nodes (vertexes) need not be atomic but can be graphs recursively (hierarchically), 2) whose directed arcs (edges) need not be binary but can be hyperarcs connecting any number of nodes, and 3) whose (hyper)arc labels become node-reified to labelnodes used as a 'zeroth' (hyper)arc-connected node.

Referring to PSOA RuleML Meets Relational Databases, generalization 2) is of special interest, since its directed hyperarcs permit to represent even the original tuples-as-rows of Relational Databases in (Generalized-)Graph Databases, while the established records-as-rows of Relational Databases are already closer to (DLG-)Graph Databases anyway.

Both the initial DLGs and the three generalized definitions can be captured by the PSOA RuleML metamodel applied to a correspondingly generalized PSOA RuleML. Lisp's property lists, RDF's triples, many other representations as well as F-logic's and PSOA RuleML's frames capture the initial typed-node-with-independent-labeled-outgoing-arc structures for building a DLG. The following discussion proceeds from frames to PSOA's generalized structures.

A DLG can be regarded as a set of triples (nodeI, labelK, nodeJ), representing nodeI's outgoing labelK-labeled arc to nodeJ. These can be grouped by their first element, nodeI, such that the set's m nodeI-repeating triples (nodeI, labelK1, nodeJ1), ..., (nodeI, labelKm, nodeJm) are represented as one untyped frame (nodeI, (labelK1, nodeJ1), ..., (labelKm, nodeJm)) describing the Object IDentifier (OID) nodeI with m (labelK, nodeJ) pairs. Besides its description by pairs, the OID nodeI can also be specified to be a member of a class, leading to typed frames.

The PSOA RuleML metamodel in Ref. 15, Appendix A, defines a systematics of atoms as part of fact bases, which can be brought to bear on untyped and typed frames in Graph Databases.

In PSOA RuleML, a class -- generally, a predicate -- is mandatory, but it can be specified as the universal/root predicate Top. The (labelK, nodeJ) pairs can be refined to predicate-independent slots labelK->nodeJ, leading to the PSOA metamodel's (pn4-corresponding) oidful, slotted, perspeneutral atoms, called "frames" -- untyped if the predicate is Top, and typed if the predicate is anything but Top. Alternatively, the (labelK, nodeJ) pairs can be refined to predicate-dependent slots labelK+>nodeJ, leading to the metamodel's (pv4-realizing) oidful, slotted, perspectival atoms -- untyped if the predicate is Top, and typed if the predicate is anything but Top. Both kinds of slots have a slot name, labelK, and a slot filler, nodeJ, while differing in their use of "-"-marked vs. "+"-marked arrows for indicating a slot being predicate-independent vs. predicate-dependent, respectively.

The PSOA RuleML metamodel accommodates recursive graphs by allowing -- as needed here for frames -- each node, used as an OID or a slot filler, to be a term representing a nested PSOA RuleML knowledge base. The corresponding generalization of PSOA RuleML can make the Rulebase wrapper around the clauses within Assert explicit (cf. the <Rulebase> element of RuleML/XML) and allow terms to be (reified) Rulebases and variables to be bound to them.

Developing earlier work (cf. Ref. 7, Sect. 2), the PSOA RuleML metamodel also accommodates: directed hypergraphs, by allowing directed hyperarcs connecting any number of nodes in sequence; directed labelnode (hyper)graphs; and -- as shown here -- their combination to directed labelnode hypergraphs. The generalization of DLGs to directed hypergraphs permits to expand Graph Databases towards Relational Databases (PSOA RuleML Meets Relational Databases), providing the latter not only with a visualization generalizing DLG visualization, but also permits to generalize graph-theoretical notions and algorithms such as, respectively, paths and path tracing to hyperpaths and hyperpath tracing etc., as explored, e.g., in the Grailog context.

As shown in the box (node) & arrow (hyperarc) diagram of the Rich TA Data Example, a directed hyperarc can: be unlabeled (for a tuple) vs. labeled (for the name of a slot); and not use an initial labelnode (hence not specifying a predicate that the descriptor -- i.e., the tuple or slot -- is dependent on) vs. use an initial labelnode (namely for the predicate, distinguished by the zeroth connected node having an oval-like shape, that the descriptor is dependent on). While the Rich TA Data Example uses slot labels only for hyperarcs connecting two nodes (i.e., for arcs), when introducing tuple-valued slots a directed labeled hyperarc can also connect any number of nodes. Thus, in the Rich TA Data Example, the unlabeled hyperarc for a predicate-independent 1995 date-of-birth could be labeled with the tuple-valued-slot name "dob" while the unlabeled hyperarc using the initial labelnode Teacher could be labeled "dop" for the predicate-dependent days-of-presence (cf. Ref. 15, Sect. 1, Footn. 7). The Rich TA Data Example also illustrates: for six arrows, a labelnode-initiated hyperarc (each for a predicate-dependent descriptor) uses an (additional) arrow head to distinguish its first connected node as the OID node and to indicate the OID's predicate membership; for three arrows, a not labelnode-initiated hyperarc (each for a predicate-independent descriptor) is sufficient to distinguish its first connected node as the OID node. All of these oidful cases can be made oidless by just having the OID node box degenerate to a "branch line" (Ref. 7, Sect. 2) while preserving its graph connectivity.

DLGs can be queried with triple-based languages such as SPARQL. Likewise, typed-frame-based DLGs in PSOA RuleML are slotributed by PSOATransRun to conjunctions of single-slot typed frames -- corresponding to typed triples -- for knowledge bases and queries. On top of frame facts, rules can be defined with PSOA RuleML for execution by PSOATransRun, e.g. deriving frames when queried ('on demand') from frame and/or relational facts and/or built-in calls and/or via other rules. Similarly, directed labelnode hypergraphs in PSOA RuleML can be transformed and queried by PSOATransRun. Developing the Rich TA Data Example, the Rich TA Rule Example in Ref. 15, Sect. 4.1, Fig. 2, demonstrates the derivation of a TA's slot from a slot retrieved in two frame/relationship facts under the Teacher and Student perspectives.