Rule-Based Data Access

From RuleML Wiki

Ontology-Based Data Access (OBDA) -- from Ontology-Based Data Integration (OBDI) to Ontology-Based Data Querying (OBDQ) to Ontology-Based Data Management (OBDM) -- has become an active R&D topic in recent years, and is emerging as a major application area of Semantic Technologies for heterogeneous databases. OBDA ontologies encompass rule knowledge to enrich the factual data mapped -- again via rules -- to a global (homogeneous) schema from the local (heterogeneous) schemas of one or more databases. Given these and other roles of rules, we will focus on Rule-Based Data Access (RBDA) -- with a foundation in the basic Rule-Based Data Integration (RBDI), an emphasis on the central Rule-Based Data Querying (RBDQ), and some examination of the advanced Rule-Based Data Management (RBDM). For a tutorial-style introduction see "The Many Uses of Rules in Ontology-Based Data Access" (original abstract, current slides). A preprint of the paper "A Datalog+ RuleML 1.01 Architecture for Rule-Based Data Access in Ecosystem Research", to be presented at RuleML 2014, is available (GeospatialRBDA). For a discussion on related topics, please contact any of the authors, Harold Boley, Rolf Grütter, Gen Zou, Tara Athan, or Sophia Etzold.

1 Overview

OBDA/RBDA-like architectures can be traced back to Stanford University's Scalable Knowledge Composition and Infomaster projects and then to the European INFOMIX IST project. OBDA, with an emphasis on mapping rules, has now become a major technology of the European Optique FP7 Integrated Project. Further pointers/references can be found in GeospatialRBDA.

  • OBDA usually employs the mediator strategy of dynamic mappings -- corresponding to top-down processing and backward reasoning -- but much of the (translator) technology can be reused from the warehouse strategy of static mappings -- corresponding to bottom-up processing and forward reasoning (cf. Data Integration); so we specified an architecture with unified mediator, warehouse, and bidirectional strategies (cf. GeospatialRBDA)
    • Define and validate -- e.g., w.r.t. soundness and completeness -- mappings independently of their warehouse use for static ('compile-time', 'preprocessing') materialization of the entire sources or their mediator use for dynamic ('run-time', 'on-the-fly') rewriting of single queries
    • Specialize knowledge (fact, ontology, and rule) transformation methods (as described, e.g., in InteropGraphRel) to data (ground-fact) transformation techniques reusable across materialization (warehouses) and rewriting (mediators)
    • Explore further warehouse/mediator convergence by developing incremental updating for warehouses and caching retrieval for mediators
  • RuleML can support OBDA for all purposes discussed in "The Many Uses of Rules in Ontology-Based Data Access" (original abstract, current slides) and the paper "A Datalog+ RuleML 1.01 Architecture for Rule-Based Data Access in Ecosystem Research", to be presented at RuleML 2014
  • As a Web-logic interchange language, RuleML/XML (and a future RuleML/JSON) enables customized ontological expressivity levels on the global side of OBDA's Global-As-View (GAV) mappings, usable both for warehouses and mediators
    • Recent developments proceed from the level of OWL 2 QL (cf. The DL-Lite Family and Relations) to the more expressive OWL 2 RL (cf. Zhou/Grau/Horrocks/Wu/Banerjee 2013) as well as to various Datalog+/- variants based on n-ary predicates (cf. Gottlob/Pieris 2013 and Extended Version Gottlob/Orsi/Pieris 2011)
    • Deliberation RuleML 1.01 offers MYNG-customized languages for all levels including the anchor sublanguage Datalog RuleML and its Datalog+ variants
    • For evolving expressivity requirements, RuleML/XML sublanguages, OWL 2/XML profiles, CL/XCL 2 dialects, and further sublanguages can be (statically or dynamically) translated into each other (using XSLT, ANTLR, etc.)
      • RuleML/XML can act as a canonical interchange format for data facts and queries (ground atomic formulas and existential conjunctions of atoms) as well as a mapping rule language, e.g. between object-centered facts and relational queries (example: translators for PSOA RuleML)
    • Appropriate engines such as OO jDREW, Prova, DR-Device, Spindle, Flora 2 (based on XSB Prolog), and RDFox can then be associated with these sublanguages for (performance-, interaction-, ...)optimized (query) processing
    • Thematic-ontological knowledge will be exemplified with qualitative geospatial concepts such as nearness, as studied, e.g., at WSL, and betweenness, as studied, e.g., at UofT, UNB, and WSL
      • Considering a simple example, the global Datalog-rule-formulated symmetry knowledge (":-" is read "if" and variables are capitalized)
        betweenRel(Outer2,Inner,Outer1) :- betweenRel(Outer1,Inner,Outer2)
        about a ternary relation betweenRel makes a query like betweenRel(lausanne,bern,zürich) succeed based on the ground fact betweenRel(zürich,bern,lausanne), which itself might be the result of the relational projection-performing GAV mapping
        betweenRel(Outer1,Inner,Outer2) :- geoDB.betwTab(Id,Outer1,Inner,Outer2)
        from the data row [id17,zürich,bern,lausanne] of a relational table betwTab in the local database geoDB.
      • The above example can be extended by further global knowledge, e.g. about centrality, with a Datalog rule
        centralRel(Inner) :- betweenRel(Outer1,Inner,Outer2), betweenRel(Outer3,Inner,Outer4), allDiff(Outer1,Outer2,Outer3,Outer4)
        defining a unary relation centralRel, so that a query like centralRel(bern) succeeds based on the ground facts betweenRel(zürich,bern,lausanne) and betweenRel(basel,bern,zermatt) and a call to the polyadic allDiff predicate ensuring that zürich, lausanne, basel, and zermatt are pairwise different individuals. In a modular fashion, besides the above projection mapping, more GAV mappings such as the relational join-performing
        betweenRel(Outer1,Inner,Outer2) :- townDB.northSouthTab(Id1,Outer1,Inner), townDB.northSouthTab(Id2,Inner,Outer2)
        could be added for betweenRel, along with local data such as the rows [id54,basel,bern] and [id56,bern,zermatt] of a relational table northSouthTab in the local database townDB.
  • RuleML also enables interoperation on the local side of OBDA's GAV mappings, supporting relational databases, (RDF-like) graph databases, and their combination (PSOA RuleML)
  • Grailog can be used for OBDA/RBDA data and knowledge visualization
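
The betweenRel/centralRel example above can be traced with a minimal Python sketch of the warehouse (bottom-up, materializing) strategy. The rows and rule shapes are those of the example; the function names (gav_facts, materialize, central) and the representation of ground facts as tuples are assumptions made for illustration and are not part of RuleML or of any engine listed above.

```python
from itertools import product

# Local sources: rows taken from the example above.
geo_db_betw_tab = [("id17", "zürich", "bern", "lausanne")]
town_db_north_south_tab = [("id54", "basel", "bern"), ("id56", "bern", "zermatt")]


def gav_facts(betw_tab, north_south_tab):
    """Apply the two GAV mappings to obtain global betweenRel facts."""
    facts = set()
    # Projection mapping: drop the Id column of geoDB.betwTab.
    for _id, outer1, inner, outer2 in betw_tab:
        facts.add((outer1, inner, outer2))
    # Join mapping: join townDB.northSouthTab with itself on the shared Inner.
    for (_id1, outer1, inner), (_id2, inner2, outer2) in product(north_south_tab, repeat=2):
        if inner == inner2:
            facts.add((outer1, inner, outer2))
    return facts


def materialize(facts):
    """Close betweenRel under the symmetry rule until a fixpoint is reached."""
    closed = set(facts)
    while True:
        new = {(o2, i, o1) for (o1, i, o2) in closed} - closed
        if not new:
            return closed
        closed |= new


def central(between):
    """centralRel(Inner): two betweenRel facts share Inner and their four
    outer arguments are pairwise different (the allDiff condition)."""
    result = set()
    for (o1, i1, o2), (o3, i2, o4) in product(between, repeat=2):
        if i1 == i2 and len({o1, o2, o3, o4}) == 4:
            result.add(i1)
    return result


between = materialize(gav_facts(geo_db_betw_tab, town_db_north_south_tab))
print(("lausanne", "bern", "zürich") in between)  # True
print(central(between))                           # {'bern'}
```

A mediator would instead rewrite each query against the local tables at run time; the same mapping definitions could be reused for that purpose, which is not shown here.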

2 Rationale

The approach to Knowledge-Based Data Access (KBDA) pursued here is one of Rule-Based Data Access (RBDA). The rationale for RBDA is the following.

  • Proceeding from warehouses to mediators, the expressiveness of rule systems (e.g. structured in Fig. 1 of Overarching) permits capturing the needs of Data Integration, Access, and Management with OBDI, OBDA, and OBDM conceived as RBDI, RBDA, and RBDM, respectively
    • The language of the global schema can be enriched from unary/binary to n-ary predicates (cf. 3-ary betweenRel above and Datalog+/- below)
    • If decidability of querying is not required, the language can be further extended from Datalog and description logic to Datalog+, Horn logic, FOL, and beyond, as enabled by Deliberation RuleML 1.01
    • Moreover, Reaction RuleML 1.0 can express updates as needed for OBDM (Ontology-based Data Management)
  • The single paradigm of rules can be employed uniformly for all purposes, from queries to knowledge bases to mappings (as indicated in The Many Uses of Rules in Ontology-Based Data Access)
    • The interface between deductive rules and mapping rules can be modified in an agile manner without need for crossing paradigm boundaries
    • Two (extreme) normal forms can be defined (with various other forms anywhere in between)
      • KB-directed normal form: The mapping part is separated from any deduction possible at this stage, performing only atom-level, 'table-to-predicate' renamings, e.g. given the earlier combined betweenRel(Outer1,Inner,Outer2) :- geoDB.betwTab(Id,Outer1,Inner,Outer2), separate the GAV mapping betweenRelRen(Id,Outer1,Inner,Outer2) :- geoDB.betwTab(Id,Outer1,Inner,Outer2) from the projection betweenRel(Outer1,Inner,Outer2) :- betweenRelRen(Id,Outer1,Inner,Outer2)
        • Pro: Can support readability
        • Con: May require several intermediate relations such as betweenRelRen
      • Mapping-directed normal form: The mapping part incorporates all deduction possible at this stage, e.g. merging the earlier deductive rule betweenRel(Outer2,Inner,Outer1) :- betweenRel(Outer1,Inner,Outer2) into the earlier GAV mapping betweenRel(Outer1,Inner,Outer2) :- geoDB.betwTab(Id,Outer1,Inner,Outer2) to obtain betweenRel(Outer2,Inner,Outer1) :- geoDB.betwTab(Id,Outer1,Inner,Outer2)
        • Pro: Can support efficiency
        • Con: May lead to incompleteness, e.g. by making the original betweenRel(Outer1,Inner,Outer2) unprovable
    • The (unique-name and closed-world) assumptions of databases are accommodated by the default assumptions of rule systems
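
The trade-off between the two normal forms can be checked on the single example row. The following is a minimal Python sketch, not an implementation of any RuleML engine: the function names are hypothetical, tuples stand in for ground facts, and the mapping-directed variant assumes that the merged rule replaces both the original GAV mapping and the symmetry rule.

```python
# Row of geoDB.betwTab from the running example.
betw_tab = [("id17", "zürich", "bern", "lausanne")]


def kb_directed(rows):
    """Renaming-only mapping, followed by separate deductive steps."""
    # betweenRelRen(Id,O1,I,O2) :- geoDB.betwTab(Id,O1,I,O2)
    between_rel_ren = set(rows)
    # betweenRel(O1,I,O2) :- betweenRelRen(Id,O1,I,O2)
    between = {(o1, i, o2) for (_id, o1, i, o2) in between_rel_ren}
    # Symmetry rule; one pass suffices because swapping twice gives the original.
    between |= {(o2, i, o1) for (o1, i, o2) in between}
    return between


def mapping_directed(rows):
    """Merged rule only: betweenRel(O2,I,O1) :- geoDB.betwTab(Id,O1,I,O2)."""
    return {(o2, i, o1) for (_id, o1, i, o2) in rows}


original = ("zürich", "bern", "lausanne")
print(original in kb_directed(betw_tab))       # True
print(original in mapping_directed(betw_tab))  # False
```

Under these assumptions the merged rule derives only the swapped fact, which is the incompleteness noted for the mapping-directed form, while the KB-directed form pays for completeness with the intermediate relation betweenRelRen.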

3 Sublanguages

Starting the ontology sublanguages for OBDA with RDFS, [1] shows how to add positive and negative inclusion axioms, arriving at OWL 2 QL (which corresponds to DL-LiteR, cf. [2]). Expressivity beyond OWL 2 QL allows enriched OBDA, moving it to RBDA, as described in the following.

3.1 OWL 2 RL

The OWL 2 RL[3] profile (corresponding to Description Logic Programs[4]) is definable in RIF-Core [5] and interchangeable as DLP RuleML/XML. For a syntactic/semantic subset discussion, see Michael Schneider's Answer to OWL2 RL vs. OWL2 RL/RDF Rules[6]. The semantics of OWL 2 RL can be easily customized to various expressivity and efficiency requirements[7] by adding to or deleting from its first-order implication rule definition[8].

3.2 Datalog+/-

Datalog+/- provides a unifying framework for ontology querying and is interchangeable as Datalog+/- RuleML/XML. The following presentation is based on Gottlob/Pieris 2013.

  • Datalog[∃,=,⊥] defined by three syntactic extensions (+): head existentials, equality, falsity
  • But query answering is already undecidable under Datalog[∃]
  • Hence introduce Datalog[∃,=,⊥] restrictions (-)

3.2.1 Decidable Datalog[∃] Restrictions

  • Guarded Datalog[∃]
    • Can be implemented with a new guard edge in Implies that replaces the bound variable list of the universal. May be implemented in Deliberation RuleML 1.x (see the Guarded_Rules issue)
  • Linear Datalog[∃] (cf. Linear Rules)
    • Linear rules are a special case of guarded rules, where the body contains only the guard. May also be implemented in Deliberation RuleML 1.x (see the Linear_Rules issue)
  • Sticky Datalog[∃]
    • Requires Schematron-style context-sensitive schema (see the Sticky_Rules issue)
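
The guarded and linear restrictions are purely syntactic conditions on rule bodies, which a short Python sketch can make concrete. It assumes the usual definition of guardedness (some body atom, the guard, contains all variables of the body) and a hypothetical representation of atoms as (predicate, arguments) pairs with capitalized variables, as in the rules of Section 1; it is not the schema-level encoding discussed in the issues above, and stickiness is not covered.

```python
def variables(atom):
    """Return the variables of an atom; variables are capitalized strings."""
    _predicate, args = atom
    return {a for a in args if a[:1].isupper()}


def is_guarded(body):
    """True if some body atom (the guard) contains every variable of the body."""
    all_vars = set().union(*(variables(a) for a in body))
    return any(variables(a) >= all_vars for a in body)


def is_linear(body):
    """True if the body is a single atom, which is then trivially the guard."""
    return len(body) == 1


# Body of the symmetry rule for betweenRel.
symmetry_body = [("betweenRel", ("Outer1", "Inner", "Outer2"))]

# Body of the centralRel rule: no single atom contains all five variables.
central_body = [
    ("betweenRel", ("Outer1", "Inner", "Outer2")),
    ("betweenRel", ("Outer3", "Inner", "Outer4")),
    ("allDiff", ("Outer1", "Outer2", "Outer3", "Outer4")),
]

print(is_linear(symmetry_body), is_guarded(symmetry_body))  # True True
print(is_linear(central_body), is_guarded(central_body))    # False False
```

Both example rules are plain Datalog without head existentials, so the check only illustrates the body conditions; the restrictions matter for decidability once existentials appear in rule heads.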

3.2.2 Decidable Datalog[∃,=,⊥] Restrictions

Every decidable Datalog[∃] can be enriched with Datalog[=], provided equality does not interact with head existentials, and with Datalog[⊥]. The "does not interact" requirement means that the quantified predicates in the body of an EGD (a rule with equality in its head) are either

  1. never existentially quantified in the head of an existential rule OR
  2. if they do appear in an existential head, then there is a complex restriction on which positional arguments are quantified where.

Neither of these restrictions can be implemented in Relax NG or XSD.

3.3 PSOA Datalog RuleML

Restrictions of PSOA RuleML for RBDA: (1) From Hornlog to Datalog. (2) PSOA Datalog RuleML represented as Datalog[∃OID], where only OIDs occur as head existentials, which can be further restricted like Datalog[∃], each restriction being interchangeable like PSOA Datalog RuleML/XML.

4 ΔForest Case Study: Susceptibility of Forests to Climate Change

As part of a WSL project on forest vulnerability classes, we have been conducting the ΔForest (DeltaForest) case study on RBDA. Three tabular data sets, available at WSL, have been used as local sources:

  • Productivity Research Areas: Ertragskundeflächen (EKF)
    • 83 areas
    • Pure stands
    • Approximate 10-year intervals (time series of different lengths)
    • Data: .dbf/.csv files
    • 3 tables
  • Natural Forest Reserves: Naturwaldreservate (NWR)
    • 36 areas
    • Pure and mixed stands
    • 10-year intervals
    • Data: .txt files
    • 1 file per area
  • Long-term Forest Ecosystem Research: Langfristige Waldökosystemforschung (LWF)
    • 18 areas
    • Pure and mixed stands
    • 5-year intervals (since 1995)
    • Data: Oracle DB
    • RDB tables

These sources provide information, such as tree stem diameters and forest stand densities, about disjoint regions of the Swiss forest. Since the tables of the local EKF, NWR, and LWF data sets have non-identical but overlapping columns, the need arose within WSL to integrate them under a global schema prior to analyzing the information w.r.t. forest (climate-change-)vulnerability classes. The required global table columns were identified by a WSL domain expert, and mapping rules from the local tables were defined for these. The mapping rules were implemented in R as well as in Datalog RuleML. For the data integrated under the global schema, deductive rules were implemented in Datalog+ RuleML to derive implicit information. RBDA has made it possible to represent both mapping and deductive knowledge uniformly as rules, so that the forestry-domain experts can provide and check the knowledge formalized by the semantic-technology experts. ΔForest is further described in Section 3 of GeospatialRBDA.
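
How mapping rules bring tables with non-identical but overlapping columns under one global schema can be sketched in Python as follows. This is an illustration only: the global columns, the local column names, and the sample rows are invented and are not the actual EKF, NWR, or LWF columns or data, and the ΔForest mappings themselves were written in R and Datalog RuleML.

```python
# Hypothetical global schema (not the actual ΔForest schema).
GLOBAL_COLUMNS = ("source", "area", "year", "stem_diameter_cm")

# Hypothetical per-source renamings: local column name -> global column name.
MAPPINGS = {
    "EKF": {"flaeche": "area", "jahr": "year", "bhd": "stem_diameter_cm"},
    "NWR": {"reservat": "area", "year": "year", "dbh_cm": "stem_diameter_cm"},
}


def to_global(source, row):
    """Map one local row (a dict) to a tuple over the global schema.

    Local columns without a mapping are dropped; global columns that the
    local row does not provide become None.
    """
    mapping = MAPPINGS[source]
    renamed = {mapping[k]: v for k, v in row.items() if k in mapping}
    renamed["source"] = source
    return tuple(renamed.get(col) for col in GLOBAL_COLUMNS)


# Invented sample rows, for illustration only.
ekf_row = {"flaeche": "a1", "jahr": 1990, "bhd": 31.5, "extra": "ignored"}
nwr_row = {"reservat": "r7", "year": 2000}

print(to_global("EKF", ekf_row))  # ('EKF', 'a1', 1990, 31.5)
print(to_global("NWR", nwr_row))  # ('NWR', 'r7', 2000, None)
```

Each entry of MAPPINGS plays the role of one renaming-style mapping rule per source; deductive rules over the integrated tuples would then be applied on top of the global schema.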

5 Resources

5.1 OBDA/RBDA

5.2 OBDA

5.3 RBDA

5.4 OWL 2 QL (DL-Lite)

5.5 OWL 2 RL (DLP)

5.6 Datalog+/-

5.7 PSOA Mapping Rules

5.8 Geospatial KR

5.9 Geospatial Efforts

5.10 APIs

  • Java Database Connectivity (JDBC)
  • Complete OO jDREW API (COjDA)
  • Java Specification Request for a Java Rules Engine API (JSR-94)
  • Rulestore API: Provides RuleML rules as Linked Data objects
  • API for Knowledge Bases (API4KB): Emerging OMG metamodel, not implementation, for KBs with DBs as a special (ground-fact) case

6 References

  1. Querying Data through Ontologies: http://webdam.inria.fr/Jorge/html/wdmch9.html
  2. The DL-Lite Family and Relations: http://www.jair.org/papers/paper2820.html
  3. OWL 2 RL: http://www.w3.org/TR/owl2-profiles/#OWL_2_RL
  4. Description Logic Programs: http://www2003.org/cdrom/papers/refereed/p117/p117-grosof.html
  5. OWL 2 RL in RIF (RIF Core): http://www.w3.org/TR/rif-owl-rl/
  6. OWL2 RL vs. OWL2 RL/RDF Rules: http://answers.semanticweb.com/questions/13690/owl2-rl-vs-owl2-rlrdf-rules
  7. Extending OWL RL (customizing OWL 2 RL as a list of rules): http://dallemang.typepad.com/my_weblog/2010/08/extending-owl-rl-.html
  8. Reasoning in OWL 2 RL and RDF Graphs using Rules: http://www.w3.org/TR/owl2-profiles/#Reasoning_in_OWL_2_RL_and_RDF_Graphs_using_Rules