Graph-Relational Data, Ontologies, and Rules

From RuleML Wiki
Jump to: navigation, search

Author: Harold Boley
RuleML Technical Memo


Combinations of the Semantic Web, Graph Databases, and Logic Programming are explored. Data are complemented by Knowledge (Ontologies and Rules). Graphs and Relations are integrated. The ensuing synergies cross-fertilize tools and techniques and contribute to eXplainable Artificial Intelligence and IT at large. As an example, representations of Cypher's Labeled Property Graphs in RDF and PSOA RuleML are discussed.

1 Introduction

In this document, three facets will be considered:

  • Research and Development (R&D) Field: Semantic Web, Graph Databases, and Logic Programming
  • Information Category: Data, Ontologies, and Rules
  • Structuring Paradigm: Graphs and Relations

Here is an initial refinement of the facets:

Field: The R&D fields of the Semantic Web, Graph Databases, and Logic Programming are each progressing on their own and in combinations as depicted by the following visualization:

SemanticGraphLogicSystems.png

Their three pairwise combinations visualized by the two-fold overlap areas are characterized here with synergies as follows:

  • Graph Databases and Logic Programming: Relational atomic formulas (atoms) from Logic Programming can be enriched by atoms corresponding to ground, i.e. variable-free, terms (instances) or non-ground, i.e. variable-containing, terms (patterns) of vertices (nodes) with outgoing labeled edges (arcs) from Graph Databases, as shown, e.g., by LIFE, F-logic, RIF, and PSOA RuleML; conversely, Graph Databases can be enriched by rules for deduction as in Logic Programming; see, e.g., PSOA RuleML Bridges Graph and Relational Databases
  • Graph Databases and Semantic Web: Semantic Web Triple Stores constitute a paradigm for knowledge graphs and for (RDF) Graph Databases queried with SPARQL, possibly built from natural-language texts using extraction methods (as for DBpedia), and Semantic Web tools such as reasoners could be adapted for Graph Databases; conversely, Graph Databases have developed novel modeling methods such as Labeled Property Graphs, and their techniques for time and space efficiency can benefit Semantic Web implementations
  • Logic Programming and Semantic Web: Semantic Web methods such as for text extraction can be transferred to build fact-and-rule bases for Logic Programming, and its Formal Concept Analysis taxonomy generators and Description Logic ontology classifiers, e.g. for OWL TBoxes, could be adapted to preprocessors building type (sort) lattices for sorted and graph-relational Logic Programs from class descriptions; conversely, Ontology languages can be defined via Rules, and Logic Programming methods such as the polyadic relations of Datalog, optionally augmented by the polyadic functions of Horn logic, can be transferred to Semantic Web rules, as shown, e.g., by RIF and RuleML

Category: While much of the current effort centers on Big Data, sometimes extended to Smart Data etc., the information category of Data will be complemented here by the category of Knowledge (Ontologies and Rules). Ontologies among other things support the specification of schema-generalizing semantics for Data. Rules (generalizing database views) support, e.g., the derivation of new Data from given Data.

Paradigm: The structuring paradigms of Graphs and Relations will be bridged (for both Data and Knowledge), resulting in an integrated Graph-Relational paradigm. The industry-leading Graph Database subparadigm of Labeled Property Graphs will be of special interest.

This document proceeds to sections on: the integrated structuring paradigm for the information categories in the combination of all three fields, comparing Labeled Property Graphs with PSOA RuleML and RDF, broadening the scope to Artificial Intelligence (AI) and IT at large, as well as conclusions. Some of these topics are also introduced in accompanying slides.

2 The GraRelDOR Combination

The fields' three-way combination centered on the above visualization's three-fold overlap area highlights:

  1. the Semantic Web's notion of (domain) "semantics" for Data, extending its "Web" aspect to include intranets, the Internet of Things, and other networked systems,
  2. the "graph" structures of Graph Databases (complementary to Relational Databases), expanding their "data(base)" aspect to include Knowledge, and
  3. the computational "logic(s)" of Logic Programming, expanding its Relational "programming" aspect to include -- declarative -- specification as well as -- high-level -- (meta)modeling with Graph-Relational Rules.

Within the three-way combination of these fields, the focus here will be on the integrated Graph-Relational (GraRel) paradigm for joined Data, Ontology, and Rule (DOR) information, i.e. on Graph-Relational Data, Ontologies, and Rules (GraRelDOR).

Expanding on earlier work such as mappings of Relational Databases to RDF (see, e.g., RDB2RDF) and on synergies of the pairwise combinations in the previous section, GraRelDOR is leading to, e.g., the following synergies:

  • Rule languages can be employed for inference over and transformation of graph data, where
    • any rule term (e.g., a graph-node variable) can be constrained by an ontology class, as e.g. done for RDFS taxonomies in N3 (cf. W3C Notation 3 Community Group), which replicates a graph arc as a node-attached attribute/value slot, in languages using Object IDentifiers (OIDs), e.g. in PSOA RuleML, referred to as an OID-describing name/filler slot, and in RDF-like languages, including N3, as a Subject-Predicate-Object triple, and
    • an entire rule atom applying a relation to one or two terms can be replaced by the application of a Description-Logic expression to the terms, as e.g. done for the SWRL (cf. SWRL2SPIN) combination of OWL and Unary/Binary Datalog RuleML, the latter encoding a graph arc as a binary relation (attribute) applied to a node and a value
  • Unified GraRelDOR languages that replicate a graph arc as a node-attached attribute/value slot can be defined from scratch, e.g. one for which Tarski models for graph knowledge (in the tradition of the Semantic Web) are equivalent to Herbrand models for graph knowledge (in the tradition of Logic Programming):
    • Since RuleML assumes no default semantics (e.g., a model theory) for, say, its fact bases, a version of PSOA RuleML could be developed for interchanging, e.g., Graph Databases even if they do not come with a formal semantics, while
    • RuleML also permits the definition of, and reference to, semantics styles, including predefined styles, which could be developed for specifying, e.g., a semantics for a Graph Database standard
  • Language translation and execution methods can be cross-fertilized, e.g. as in the PSOATransRun reference engine for the graph-relational Horn-level PSOA RuleML language with rule-implemented "subClassOf" taxonomies as lightweight domain-semantics ontologies
  • Transitioning from PSOA Hornlog's expressiveness level to First-Order Logic (FOL), a PSOA FOL could be similarly defined and realized by reasoning engines (analogous to the transition from RIF-BLD to RIF-FLD); likewise, GraRelDOR languages on the advanced levels of higher-order and modal (e.g., deontic) logics
  • Ontology-Based Data Access (OBDA) and Rule-Based Data Access (RBDA), founded on semantic ontologies and Datalog rules, through PSOA Datalog can benefit from not only allowing relational data but also graph data
  • Techniques of (heuristic) search over explicit and implicit graphs can be joined with (logical) deduction and (domain) semantics for GraRelDOR retrieval and inference
  • Examples, test cases, and benchmarks of one field can be (partially) transferred to the other two fields, e.g. from the DB-Engines Ranking of Graph DBMS to Semantic Web and Logic Programming engines
  • Use cases, e.g. for Logic Programming-style query answering (e.g. over the "Air Traffic Control KB", built on top of an FAA dataset), can likewise be (partially) transferred across the fields, and new ones be (jointly) built for the three fields in combination

3 From Encoding to Replicating Labeled Property Graphs

Cypher's Neo4j-developed Labeled Property Graphs constitute a central input to the rising industry standard GQL, proposed for specification at ISO (cf. W3C Workshop on Web Standardization for Graph Data).

The representation of Labeled Property Graphs in Semantic Web / Logic Programming languages can be done by (indirect, not round-trippable) encoding or by (direct, round-trippable) replication.

For RDF encodings of Labeled Property Graphs, see Jesús Barrasa's RDF Triple Stores vs. Labeled Property Graphs, which could be transferred to other languages such as PSOA RuleML. Also see Property-Graphs-in-RDF discussion.

For replications in PSOA RuleML and an extended RDF, the Cypher example Jesús Barrasa gives about a flight between New York City and San Francisco, formalizing "Connection NYC-SFO is $300 and 4100Km", can be used (focusing the CREATE statement):

CREATE({name:"NYC"})
      -[:CONNECTION{distanceKm : 4100, costUSD : 300}]->
      ({name:"SFO"})

In PSOA RuleML, this can be replicated as follows:

Assert(_#Top(name->"NYC")(
         :CONNECTION(distanceKm->4100 costUSD->300)->
           _#Top(name->"SFO")))

Inside PSOA's Assert(...) statement, a typed-frame syntax is uniformly used, where PSOA's slot arrows, "->", uniformly

continue the "-[...]->" arrow part after the "[...]" of Cypher's "relationships" (Cypher's arrow part before the "[...]" becoming the PSOA opening parenthesis at the end of the first line), and
pair the Cypher "key" and "expression" of ":"-infixed node (e.g., name:"NYC" becoming name->"NYC") and relationship (e.g., distanceKm : 4100 becoming distanceKm->4100) properties.

The two PSOA "_" occurrences denote unique local constants for the NYC and SFO nodes; they are typed, via a "#" infix, by PSOA's generalized taxonomy root, Top, a syntax that encourages semantic precision via refinement to a specific class such as USairport [which is a subclass of Airport (used in the rule of the next section), which is a subclass of ... Top].

The top-level PSOA frame has a single slot and uses three embedded frames, namely

  1. the OIDful, single-slot frame Atom _#Top(name->"NYC") as its OID and, similarly, the frame Atom _#Top(name->"SFO") as the slot filler, as well as
  2. the OIDless, 2-slot frame Expression :CONNECTION(distanceKm->4100 costUSD->300) as the slot name [like "terms with named arguments" in RIF-BLD, such slotted expressions are allowed in PSOA RuleML and, while they are not yet implemented in PSOATransRun, they could be implemented by two-sided (pattern-pattern) "slotted unification", as done for POSL "structures" in OO jDREW, which for Graph Databases could be specialized to the computationally less complex one-sided (pattern-instance) "slotted matching"]

A common diagram form for both the Cypher and PSOA versions could diagrammatically expand the 'symbolic leftovers' {name:"NYC"}, {name:"SFO"}, and {distanceKm : 4100, costUSD : 300} in the diagram of the slide "LPG in Neo4j: Connection NYC-SFO is $300 and 4100Km" from Jesús Barrasa's above-linked presentation, employing subdiagrams for three complex nodes, having their own internal ("recursive") graph structure, as possible using the Grailog framework.

Since a multiple-slot frame is equivalent to a conjunction of single-slot frames, and a single-slot frame corresponds to a (typed) triple, this PSOA replication of the Cypher example could be transferred to a Higher-level RDF language extended mainly by a construct corresponding to PSOA's (slot-name-)embedded slotted expressions, perhaps complementing N3's "Formulae allowing N3 graphs to be quoted within N3 graphs using { and }" (https://www.w3.org/TeamSubmission/n3/#Quoting).

However, unlike the encodings, such replications are no longer on the syntactic level of Datalog:

  • While the frame atoms of 1. could be unnested, 'leaving behind' their OIDs, the frame expression of 2. could not, thus arriving at the Hornlog level
  • Since the frame expression of 2. is actually used in the slot-name position, it introduces a higher-order aspect

4 GraRelDOR, AI, and IT

The combined endeavor of GraRelDOR reinforces each of the three component fields and can provide a support layer for AI (as well as Cognitive Science), tying into AI subfields such as

as well as into Cognitive Science subfields such as

Because GraRelDOR systems use explicit ontologies and declarative rules as well as producing (internal) proofs for each query they answer, the requirement for AI results being explainable -- i.e., eXplainable AI (XAI) -- can be achieved by appropriate (external) renderings of proofs, e.g. in a Controlled Natural Language.

Hybrid forms of XAI could result from coupling GraRelDOR with, e.g., Event-Condition-Action Rules (cf. Reaction RuleML), Statistical Relational AI (broadened to Statistical Graph-Relational AI), and/or Artificial Neural Networks.

For example, building on the PSOA frame fact from the previous section, a PSOA "frame :- frame" rule (the ":-" infix stands for "is implied by"), dubbed the one-way return heuristic, could hypothesize that the (distance and) cost of a one-way inbound or outbound flight between two arbitrary Airports is the same as the cost of the corresponding one-way outbound or inbound flight, respectively ('self-chaining' loops of such rules, e.g. when swapping symmetric parameters more than once, are prevented by termination-supporting target engines like XSB):

Assert(
  Forall ?a ?b ?m ?n ?d ?c (
    ?b#Airport(name->?n)(
      :CONNECTION(distanceKm->?d costUSD->?c)->
        ?a#Airport(name->?m))
      :-
        ?a#Airport(name->?m)(
          :CONNECTION(distanceKm->?d costUSD->?c)->
            ?b#Airport(name->?n))
  )
)

However, this being only a rule of thumb, since there may well be different inbound and outbound costs, the rule's ?c parameter could be statistically weakened to, say, 70%, by annotating it with 0.7 (cf., e.g., Uncertainty Reasoning TG).

The diagram form discussed for facts in the previous section has been extended to rules, for which Grailog introduces an "implies" arrow between complex nodes (for special cases simplified to an arrow between arrows) and a universal-closure shorthand (to avoid explicit Foralls). Complementing (2D-visualized) Grailog graphs for taxonomies and general ontologies, such Grailog graphs for rules permit declarative knowledge elicitation, explanation and validation, supporting another aspect of XAI.

Like AI itself, GraRelDOR systems are also relevant for IT at large, including for horizontals such as

as well as for verticals such as

To accommodate legacy relational representations in, e.g., AI knowledge bases and IT databases, GraRelDOR strieves toward a wide, systematic bridge for interoperation with recent graph representations, also allowing various intermediate forms.

5 Conclusions

GraRelDOR is emerging as a combined R&D field whose synergies contribute to each of the participating fields, their (partial and full) combination, (X)AI, and IT at large, with use cases and applications combining earlier individual ones as well as arising as new opportunities. An example in one direction of cross-fertilization is replicating Labeled Property Graphs in PSOA RuleML and in an extended RDF and SPARQL. An example in another direction is reusing taxonomy generators and ontology classifiers to build sort lattices for Graph-Database / Logic-Program schemas. Future work includes broadening Linked Data to Linked GraRelDOR and proceeding from declarative GraRelDOR to hybrid XAI systems with additional Rules that are, e.g., state-changing, probabilistic, and/or neural-level.