PSOA RuleML Meets Relational Databases

From RuleML Wiki

Author: Harold Boley
RuleML Technical Memo

This document focuses on how PSOA RuleML meets Relational Databases and generalizes them. See also the parent document (PSOA RuleML Bridges Graph and Relational Databases) and the sibling document (PSOA RuleML Meets Graph Databases).

The paradigmatic notion of a relation in Relational Databases and the Relational Model was originally defined as a table of rows that are "tuples" as used in mathematics, logic, physics, engineering, etc. This has since given way to the now-established definition of a table of rows that are "records" as common in computer science. Both the original and the established definition of row, as well as their key-motivated enrichments, which are widespread (through SQL) in modern Relational Databases, can be captured by the PSOA RuleML metamodel. PSOA RuleML's earlier SQL-PSOA-SPARQL interoperation Use Case (Ref. 7, Sect. 5) introduced the original tuple-like rows by employing role-neutral, position-sensitive columns, whose headings (order-encoding attribute names Col1, ..., ColN) are not needed in PSOA's relationships. The following discussion proceeds to the established record-like rows by employing role-significant, position-insensitive columns, whose headings (role-encoding attribute names) become slot names in PSOA's pairships.

It must be noted that the Relational Model uses the term "relationship" for a key-defined connection between tables, whereas in the PSOA RuleML metamodel and in this document "relationship" is used for the application of a predicate to a sequence of arguments a1 ... aN (which in PSOA can be written as a predicate-dependent tuple +[a1 ... aN]). On the other hand, the term "pairship" is used here for the application of a predicate to pairs of attribute names n and attribute values v (which in PSOA are written as predicate-dependent slots n+>v of slot names n and slot fillers v). Such PSOA RuleML atoms with only predicate-dependent (uniformly "+"-marked) descriptors (tuples and slots) are called "dependent" atoms.
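To make the two kinds of dependent atoms concrete, the following minimal Python sketch models a relationship (one dependent tuple) and a pairship (dependent slots) and renders each in PSOA presentation syntax. The class names `Relationship` and `Pairship`, their fields, and the `family` example data are hypothetical illustrations, not an API of PSOATransRun or any RuleML tool.

```python
from dataclasses import dataclass


# Hypothetical Python model of PSOA "dependent" atoms (illustration only).
@dataclass(frozen=True)
class Relationship:
    """Oidless atom with one dependent tuple: pred(+[a1 ... aN])."""
    pred: str
    args: tuple

    def to_psoa(self) -> str:
        return f"{self.pred}(+[{' '.join(self.args)}])"


@dataclass(frozen=True)
class Pairship:
    """Oidless atom with dependent slots: pred(n1+>v1 ... nK+>vK)."""
    pred: str
    slots: tuple  # sequence of (slot name, slot filler) pairs

    def to_psoa(self) -> str:
        body = " ".join(f"{name}+>{filler}" for name, filler in self.slots)
        return f"{self.pred}({body})"


print(Relationship("family", ("Joe", "Sue", "Pete")).to_psoa())  # family(+[Joe Sue Pete])
print(Pairship("family", (("husb", "Joe"), ("wife", "Sue"))).to_psoa())  # family(husb+>Joe wife+>Sue)
```

The relationship's meaning depends on argument positions, whereas each pairship slot carries its own role name.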

The PSOA RuleML metamodel in Ref. 15, Appendix A, defines a systematics of atoms in fact bases, which can be applied to rows in tables of the Relational Model.

In the Relational Model, rows are also called "tuples" because E. F. Codd originally defined them as the tuples of an N-ary relation, i.e. of a subset of the (ordered) Cartesian product of N domains; we will label these as original rows. Rows are furthermore called "records": "Later, it was one of E. F. Codd's great insights that using attribute names instead of an ordering would be so much more convenient (in general) in a computer language based on relations" (Relational Model); we will label these as established rows.

Both the original tuple-like rows and the established record-like rows can be represented by PSOA RuleML atoms according to the metamodel and its simplified form in PSOA RuleML#Introduction. The PSOA metamodel provides (de1-constraining) oidless, single-tuple, dependent atoms, called "relationships", for the original ordered rows from the Cartesian product: In PSOA RuleML, a relationship has one dependent tuple. The metamodel provides (de3-corresponding) oidless, slotted, dependent atoms, called "pairships", for the established unordered rows with attribute names: In PSOA RuleML, a pairship has a bag of dependent slots, each pairing a name and a filler.
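The difference between position-sensitive original rows and position-insensitive established rows can be sketched in Python as follows. The sketch assumes a hypothetical table `emp` with made-up headings and values. An original row is modeled as a plain tuple with its headings dropped. An established row is modeled as a bag (multiset, via `collections.Counter`) of name/value slots. Permuting the columns then changes the tuple but not the bag of slots.

```python
from collections import Counter

# Hypothetical table "emp" (headings and values are made up for illustration).
HEADINGS = ("name", "dept", "salary")
ROW = ("Ann", "Sales", "50000")


def as_relationship_args(row):
    """Original row: headings are dropped; meaning is carried by position."""
    return tuple(row)


def as_pairship_slots(headings, row):
    """Established row: headings become slot names; slots form a bag."""
    return Counter(zip(headings, row))


# Permute the columns; for record-like rows, column order carries no meaning.
perm = (2, 0, 1)
headings2 = tuple(HEADINGS[i] for i in perm)
row2 = tuple(ROW[i] for i in perm)

print(as_relationship_args(ROW) == as_relationship_args(row2))  # False
print(as_pairship_slots(HEADINGS, ROW) == as_pairship_slots(headings2, row2))  # True
```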

For unique specification of and reference to a row (usually) within a table, keys are fundamental to the Relational Model. We discuss here a key dimension of the Relational Model in connection with the PSOA RuleML metamodel, namely natural-vs.-surrogate keys. In the Relational Model with original rows, a table's natural key may be defined as one or several tuple positions uniquely specifying and referencing each row. Likewise, in PSOA RuleML with original rows, a fixed-predicate fact base's natural key may be defined as one or several tuple positions uniquely specifying and referencing each relationship. In the Relational Model with established rows, a table's natural key may be defined as one or several attribute names uniquely specifying and referencing each row. Likewise, in PSOA RuleML with established rows, a fixed-predicate fact base's natural key may be defined as one or several slot names uniquely specifying and referencing each pairship.
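A candidate natural key can be tested against a given extension with a small Python check. It is the same check for both kinds of rows. For original rows, the key is a tuple of positions (0-based in Python, unlike the usual 1-based column numbering). For established rows, it is a tuple of attribute or slot names. The function name `is_natural_key` and the sample data are hypothetical. Such a check can only refute a key: a natural key is a schema-level constraint that must hold for every admissible extension, not just the current one.

```python
def is_natural_key(rows, key):
    """Check whether `key` uniquely identifies every row in this extension.

    rows: tuples (original rows; key = 0-based positions) or
          dicts (established rows; key = attribute/slot names).
    """
    seen = set()
    for row in rows:
        projection = tuple(row[k] for k in key)
        if projection in seen:
            return False  # two rows agree on the key: not a key
        seen.add(projection)
    return True


ORIGINAL = [("Ann", "Sales", "2019"), ("Ann", "R&D", "2021"), ("Bob", "Sales", "2019")]
ESTABLISHED = [dict(zip(("name", "dept", "since"), r)) for r in ORIGINAL]

print(is_natural_key(ORIGINAL, (0,)))                 # False: "Ann" occurs twice
print(is_natural_key(ORIGINAL, (0, 1)))               # True
print(is_natural_key(ESTABLISHED, ("name", "dept")))  # True
```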

In the Relational Model, instead of a natural key, a surrogate key is often used for unique specification and reference: one artificial position of an original row or one artificial attribute of an established row. Likewise, in PSOA RuleML, a surrogate key can be used as one artificial position of an original row or one artificial slot name of an established row.
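Introducing a surrogate key amounts to adding one artificial position or attribute to every row. The following minimal Python sketch shows both variants. The function names, the default slot name `id`, and the numbering from 1 are all hypothetical choices for illustration.

```python
def add_surrogate_position(rows, start=1):
    """Prepend an artificial integer position to each original (tuple) row."""
    return [(sid,) + tuple(row) for sid, row in enumerate(rows, start)]


def add_surrogate_slot(rows, name="id", start=1):
    """Add an artificial attribute/slot `name` to each established (dict) row."""
    result = []
    for sid, row in enumerate(rows, start):
        if name in row:
            raise ValueError(f"row already has an attribute named {name!r}")
        result.append({name: sid, **row})
    return result


print(add_surrogate_position([("Ann", "Sales"), ("Bob", "R&D")]))
# [(1, 'Ann', 'Sales'), (2, 'Bob', 'R&D')]
print(add_surrogate_slot([{"name": "Ann"}]))
# [{'id': 1, 'name': 'Ann'}]
```

The guard in `add_surrogate_slot` avoids silently overwriting an existing attribute that happens to share the surrogate's name.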

However, for surrogate-key and other purposes, the PSOA RuleML metamodel provides a special construct, variants of which exist in object-oriented databases, graph databases, SPARQL, F-logic, POSL, RIF, etc.: the Object IDentifier (OID). While ordinary keys are unique only within a table, PSOA OID constants come in two varieties, both unique on a larger scale. Local OID constants (usually indicated by an underscore prefix) are unique not only within a fixed-predicate fact base (corresponding to a single-table database) but also across a multiple-predicate fact base (corresponding to a multiple-table database) or knowledge base (complementing facts by rules). Global OID constants (employing a globally unique identifier such as an IRI, usually abbreviated by a colon prefix) are unique not only within a single knowledge base (KB) but across KBs networked world-wide. OIDs and slots constitute a starting point for extending Relational Databases towards Graph Databases (PSOA RuleML Meets Graph Databases), providing the latter with constructs of the Relational Model on the instance level and a possible schema level.

In PSOA RuleML with original rows, OIDs lead to (de2-constraining) oidful, single-tuple, dependent atoms, called "relationpoints". In PSOA RuleML with established rows, OIDs lead to (de4-corresponding) oidful, slotted, dependent atoms, called "pairpoints".
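The oidful counterparts prefix an atom with its OID and `#`. The Python sketch below renders relationpoints and pairpoints in that form and classifies an OID constant by the naming conventions in the preceding text. All function names and sample constants such as `_1` and `:ann` are hypothetical. `oid_scope` is only a rough heuristic for illustration and is not part of any PSOA specification.

```python
def relationpoint(oid, pred, args):
    """Oidful atom with one dependent tuple: oid#pred(+[a1 ... aN])."""
    return f"{oid}#{pred}(+[{' '.join(args)}])"


def pairpoint(oid, pred, slots):
    """Oidful atom with dependent slots: oid#pred(n1+>v1 ... nK+>vK)."""
    body = " ".join(f"{n}+>{v}" for n, v in slots)
    return f"{oid}#{pred}({body})"


def oid_scope(oid):
    """Heuristic classification of an OID constant by its naming convention."""
    if oid.startswith("_"):
        return "local"    # unique across the whole KB
    if ":" in oid:
        return "global"   # IRI, usually abbreviated with a colon prefix
    return "unknown"


print(relationpoint("_1", "emp", ("Ann", "Sales")))                    # _1#emp(+[Ann Sales])
print(pairpoint(":ann", "emp", [("name", "Ann"), ("dept", "Sales")]))  # :ann#emp(name+>Ann dept+>Sales)
print(oid_scope("_1"), oid_scope(":ann"))                              # local global
```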

Introducing a separate OID construct usable as a surrogate key, rather than introducing an artificial position or slot name declared to be a surrogate key, has another consequence: OIDs are syntactically part of instance-level data and knowledge items, whereas the artificial position or slot-name declaration is usually made on the schema level. Thus, PSOA's explicit OIDs contribute to making data and knowledge "self-describing". Using rules, PSOA RuleML can also bring further schema-level declarations to the instance level, e.g. default descriptors (Ref. 15, Sect. 3.3). An optional schema level could still be added to PSOA RuleML, e.g. for signature declarations, including types (as provided by the Relational Model) and possibly modes (to support efficient Logic Programming) of tuple elements and slot fillers.

PSOA RuleML's fundamental objectification transformations from oidless to oidful atoms, realized by PSOATransRun, generate OIDs (e.g. fresh local constants _1, _2, ...) that can be used as surrogate keys. For a "relational predicate", defined in PSOA as one occurring only in a KB's oidless atoms that have one dependent tuple, such static objectification can be complemented by dynamic objectification, which constructs virtual OIDs at query time if and when needed (Ref. 12, Sect. 3).
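Static objectification of ground facts can be sketched in Python as attaching a fresh local OID to every oidless atom while leaving oidful atoms unchanged. This is a simplified illustration under stated assumptions, not PSOATransRun's implementation. It works on atom strings in presentation syntax, handles facts only (objectifying rules is more involved), and assumes the generated names `_1`, `_2`, ... do not already occur in the KB. The function name `objectify_facts` is hypothetical.

```python
def objectify_facts(atoms, start=1):
    """Attach fresh local OIDs _1, _2, ... to oidless ground atoms (sketch)."""
    result, n = [], start
    for atom in atoms:
        head = atom.split("(", 1)[0]  # text before the argument list, e.g. "emp" or ":bob#emp"
        if "#" in head:               # already oidful: keep its OID
            result.append(atom)
        else:                         # oidless: generate a fresh local OID
            result.append(f"_{n}#{atom}")
            n += 1
    return result


print(objectify_facts(["emp(+[Ann Sales])", ":bob#emp(name+>Bob)", "emp(name+>Cid dept+>Sales)"]))
# ['_1#emp(+[Ann Sales])', ':bob#emp(name+>Bob)', '_2#emp(name+>Cid dept+>Sales)']
```

The generated OIDs are distinct across all predicates of the fact base, which is what lets them double as surrogate keys.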

Another key dimension of the Relational Model, primary-vs.-foreign keys, can be applied (and generalized) to PSOA RuleML's OIDs (and its predicates providing perspectivity): While a primary OID (and one of its predicates) specifies a row, a foreign OID (and one of its predicates) refers to a row.
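The primary/foreign distinction for OIDs can be sketched in Python with a KB indexed by (OID, predicate) pairs. Under this indexing, an OID together with one of its predicates specifies a row, and an OID occurring as a slot filler refers to such a row. The KB contents, the slot name `worksIn`, and the helper names `deref` and `dangling_refs` are hypothetical. The second entry for `_e1` shows the same OID viewed from another predicate's perspective.

```python
# Hypothetical KB: a primary OID plus one of its predicates specifies a row
# (here: the slots of a pairpoint).
KB = {
    ("_d1", "dept"): {"name": "Sales", "floor": "3"},
    ("_e1", "emp"): {"name": "Ann", "worksIn": "_d1"},  # "_d1" used as a foreign OID
    ("_e1", "manager"): {"level": "2"},                 # second perspective on _e1
}


def deref(kb, oid, pred):
    """Resolve an OID, viewed under predicate `pred`, to the row it specifies."""
    return kb[(oid, pred)]


def dangling_refs(kb, slot, target_pred):
    """Foreign OIDs in `slot` without a primary occurrence under `target_pred`."""
    return [row[slot] for row in kb.values()
            if slot in row and (row[slot], target_pred) not in kb]


dept_oid = deref(KB, "_e1", "emp")["worksIn"]
print(deref(KB, dept_oid, "dept")["floor"])  # 3
print(dangling_refs(KB, "worksIn", "dept"))  # []
```

`dangling_refs` plays the role of a referential-integrity check for foreign keys.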

Since the Relational Model makes the Closed World Assumption, its negation is Negation-as-failure (Naf). Although several RuleML languages provide Naf, PSOA RuleML as of Version 1.03 does not provide any explicit negation operation. However, a Naf built-in has already been added to the PSOA RuleML reference implementation PSOATransRun in Version 1.4.2 (PSOA RuleML#PSOATransRun).
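Under the Closed World Assumption, whatever is not listed is taken to be false, so Naf succeeds exactly when an atom cannot be found. The Python sketch below illustrates this on hypothetical tables of original rows and relates it to SQL's `NOT EXISTS`. It only looks up stored facts. A full Naf would also try to derive the atom via rules. It does not reproduce PSOATransRun's Naf built-in syntax.

```python
# Hypothetical fact bases of original rows; sets model the CWA
# (a row not in the set is considered false).
EMP = {("Ann", "Sales"), ("Bob", "R&D"), ("Cid", "Sales")}
MANAGER = {("Ann",)}


def naf(row, table):
    """Negation-as-failure over stored facts: succeeds iff `row` is not found."""
    return row not in table


# Roughly: SELECT name FROM emp e WHERE NOT EXISTS
#          (SELECT * FROM manager m WHERE m.name = e.name)
non_managers = sorted(name for name, _dept in EMP if naf((name,), MANAGER))
print(non_managers)  # ['Bob', 'Cid']
```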

The Relational Model allows defining so-called "views", e.g. ones performing a join operation over several tables. For original rows, such views are called "rules" in Datalog. For established rows, Datalog-like rules have also been defined, e.g. in POSL and in Datalog RuleML. Moreover, multiple rules can operate over original and established rows, and a single rule can involve both original and established rows. PSOA RuleML allows defining KBs consisting of all of these rules, both oidless and oidful.
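As an illustration of a single rule that involves both kinds of rows, the following Python sketch evaluates a join view over original rows of a hypothetical `emp` table and established rows of a hypothetical `dept` table. The predicate `worksOnFloor`, the slot names, and the data are assumptions. The rule in the docstring is written in a PSOA-like notation for orientation only.

```python
# Original rows emp(+[name dept]) and established rows dept(name+>... floor+>...).
EMP = [("Ann", "Sales"), ("Bob", "R&D")]
DEPT = [{"name": "Sales", "floor": "3"}, {"name": "R&D", "floor": "5"}]


def works_on_floor(emp, dept):
    """Evaluate one view/rule that mixes original and established rows.

    PSOA-like notation (illustrative):
      Forall ?n ?d ?f (
        worksOnFloor(+[?n ?f]) :- And(emp(+[?n ?d]) dept(name+>?d floor+>?f))
      )
    """
    # Join on the shared variable ?d: position 2 of emp = slot "name" of dept.
    return {(n, r["floor"]) for n, d in emp for r in dept if r["name"] == d}


print(sorted(works_on_floor(EMP, DEPT)))  # [('Ann', '3'), ('Bob', '5')]
```

The join condition links a position of the original row to a slot name of the established row.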

Furthermore, PSOA RuleML generalizes Datalog-like languages to Hornlogic-like languages allowing nested terms (constructor-function applications) corresponding to nested tables (i.e., tables that are not in the Relational Model's first normal form).
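The contrast between flat and nested rows can be sketched in Python with a small class for constructor-function applications. A row containing such a term corresponds to a table that is not in first normal form. The class `Fun`, the `addr` constructor, and the PSOA-like rendering are hypothetical illustrations. `in_first_normal_form` only checks for nested terms and ignores other 1NF conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fun:
    """Constructor-function application (a nested term), e.g. addr(+[Main Toronto])."""
    name: str
    args: tuple


def render(term):
    """Render constants and nested terms in a PSOA-like tuple notation."""
    if isinstance(term, Fun):
        return f"{term.name}(+[{' '.join(render(a) for a in term.args)}])"
    return str(term)


def in_first_normal_form(args):
    """True iff no argument of the row is a nested term."""
    return not any(isinstance(a, Fun) for a in args)


flat = ("Ann", "Main", "Toronto")
nested = ("Ann", Fun("addr", ("Main", "Toronto")))
print(in_first_normal_form(flat), in_first_normal_form(nested))  # True False
print(f"emp(+[{' '.join(render(a) for a in nested)}])")  # emp(+[Ann addr(+[Main Toronto])])
```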

Finally, since the PSOA RuleML metamodel bridges between multiple paradigms including (the original and established) Relational Databases, and PSOA RuleML provides these paradigms in a single language, PSOA-based mapping rules can be defined with Relational Databases as sources and targets of interoperation, as studied, e.g., with Rule-Based Data Access (RBDA).