MYNG

From RuleML Wiki

Authors: Tara Athan, Harold Boley

In 2011, a re-engineering and re-conceptualization of the non-SWSL portion of the Derivation Rule subfamily of RuleML in the Relax NG Compact (RNC) schema syntax was conducted. Because of the large number of sublanguages available through the re-engineered approach, we also developed the Modular sYNtax confiGurator (MYNG, pronounced "ming" or "my N G"), consisting of a GUI front end to a PHP script, for custom configuration of RuleML sublanguages through on-the-fly building of a schema driver file that includes a selection of modules. A companion paper describing the results of this effort, Design and Implementation of Highly Modular Schemas for XML: Customization of RuleML in Relax NG (citation), was presented in November 2011 at RuleML2011 - America.

Since then, the MYNG system has become increasingly central to Deliberation RuleML (including Derivation Rule) specifications (latest version), myng-code URLs have been made available for all Deliberation RuleML sublanguages, several other MYNG-related papers have followed (@@@), the RNC specification of Reaction RuleML has benefited from it, and MYNG has found its way into teaching Semantic Technologies. The role of MYNG will further increase as it will begin to support managing the expanding set of Anchor Sublanguages, finding the lowest schema for instance input, and more.

The unabridged details of the MYNG project, including a demo of validation using the normative Relax NG schemas, are presented here.


1 Abstract

RuleML is a family of languages for Web rule interchange that was originally specified in Document Type Definitions (DTDs), then switched to XML Schema Definition Language (XSD) schemas. Here we present a re-engineering of the non-SWSL portion of the Derivation Rules subfamily of RuleML in the Relax NG Compact (RNC) schema syntax. The benefits arising from RNC schemas include decreased positional sensitivity, greater flexibility in modularization (from fine-grained modular to monolithic), and increased capability for orthogonalization, as well as unification of human-readable ("Content Models") and machine-readable (XSD/XML) versions. We introduce a Relax NG schema design pattern, enforced by an RNC parameterized schema, that guarantees monotonicity (grammatical expansion implies syntactic containment) when any of a large number of small expansion modules are mixed-in with a few base modules. The original fifteen RuleML sublanguages are thus embedded in a syntactic lattice with hundreds of thousands of nodes, each corresponding to a viable language. The original RuleML sublanguages are available through redirected URLs, and customized languages are available to advanced users. An invertible mapping from the syntactic lattice into a subset of a grammatical lattice identifies the schema for each language as the merger of a subset of the auxiliary (expansion and base) modules. To manage this large language family, a GUI web-app serves as the front-end to a PHP-driven parameterized schema that takes a selection of customization options and returns the main schema module. These options are encoded to facilitate determination of syntactic containment between any pair of languages. Like earlier RuleML language hierarchies, logical expressivity forms a backbone for the language lattice. 
The RNC parameterized schema serves as a pivot format from which XSD schemas, statistically-random XML test instances, monolithic simplified RNC content models, and HTML documentation are automatically generated. The RNC-based re-engineering of Derivation RuleML has already led to the discovery and patching of errata in RuleML versions and 1.0, as well as to suggested enhancements of version 1.0 and a newly conceived version 1.1.

2 Goals

  • Maximize Alignment with Semantics: to the extent possible, semantic constraints should be incorporated into the schema. For example, the semantics tells us that an equivalence is syntactic sugar for two implications. Thus the content model for the two sides of an equivalence should be the intersection of the content models for premises and conclusions of implications.
  • Maximize Customizability: A fine-grained, highly cohesive, and loosely-coupled modular schema design will allow a user to custom-build a RuleML sublanguage by assembling a selection of modules (see #GUI). For example, an enhancement is proposed to extend binary (2-argument) languages from Datalog to Horn logic and higher expressivity (see #Binary_Languages_with_Various_Levels_of_Expressivity), and this will be easily implemented if the schema definitions for (the number of) positional arguments are orthogonalized relative to those that determine the content models of the logical connectives.
  • Maximize Automation: The assembly of custom schemas and the production cycle of schema releases should be automated as much as possible (for both reliability -- see below -- and for ease of maintenance -- e.g., XSD co-releases and HTML documentation can be automatically generated from RNC).
  • Maximize Reliability: The new schemas should be exhaustively tested against the existing hand-written XSD schemas and instances, e.g., via automatically generated test instances as well as hand-written exemplary instances for "near-miss" (invalid) and "corner" (valid) cases.
  • Maximize Extensibility: The schemas should enable extension by users, as well as RuleML developers, allowing backward and forward compatibility, alternate element names, internationalization, and user-defined formulas.

3 Relax NG

3.1 Relax NG Overview

The Relax NG language was chosen for this re-engineering effort because of its decreased positional sensitivity and its greater flexibility in modularization (from fine-grained modular to monolithic), as well as its unification of human-readable ("Content Models") and machine-readable (XSD/XML) versions. These benefits are achieved through unique features of the Relax NG schema language, including the notAllowed reserved word to create abstract patterns, definitions with combine attributes (|=, &= in the compact syntax) to merge definitions that are decomposed across modules, and the interleave operator & (a generalization of the xsd:all group) to create order-insensitive content models. Because Relax NG is theoretically grounded in hedge automaton theory, modularization is always possible since regular hedge languages are closed under the operations of intersection, union and complement (see Murata 1998).

3.2 Relax NG Features

  • Relax NG has capabilities for redefinition, specialization and generalization that allow orthogonalization at a fine-grained level of modularity.
    • Modularity in Relax NG is enabled through two different mechanisms: "external" for referencing a single component defined in another schema, and "include" for merging entire grammars. The include feature will be most useful to us for implementing orthogonal modularity.
    • There are two ways that definitions may be combined in Relax NG: through the choice combination (symbolized by |) and the interleave combination (symbolized by &). By placing some simple constraints on the way these combinations are used in modules, we may guarantee that including a module generates a superset of the language (expansion module) or a subset of the language (contraction module). This partial-order relation is the basis of the sublanguage lattice discussed above.
  • Relax NG compact syntax (RNC) is similar to [E]BNF, hence can be used to express an executable, and thus testable and maintainable, content model.
  • Software is freely available (see #Tools) which can, from modular XSD or RNC, automatically generate an equivalent monolithic, simplified schema in RNC, which is quite human-readable compared to XSD.
  • Conversely, software is also freely available which can automatically generate an XSD schema that is equivalent or a close approximation to the RNC schema. When approximations are made, the software always generalizes the schema.
  • On the other hand, Relax NG has the ability to construct certain patterns which cannot be expressed directly in XSD.
    • Non-deterministic, context-dependent constraints can be expressed in Relax NG. Currently Schematron is being used in RuleML 0.91 to handle some of these constraints. Using Relax NG may allow a reduced dependence on Schematron.
    • Relax NG has a powerful capability for introducing positional (element order) independence through the interleave (&) symbol. There is no corresponding construction in XSD although a special case can be expressed with xs:all.
    • Relax NG allows ambiguous patterns, which simplifies pattern statements in some cases.
    • User-defined datatype libraries may be imported into Relax NG schemas (normally, XSD Part 2, Datatypes, but it is not restricted to that).
  • Named patterns, similar to XSD groups, allow pattern re-use, providing greater reliability and ease of maintenance.
  • Annotations may be used in Relax NG to provide pre-processing and conversion instructions. This feature works particularly well for documentation, which is transferred into the automatically generated XSD and can then be automatically converted into HTML documentation.
  • The increasing expressivity from DTD to XML Schema (XSD) to Relax NG has been characterized by, respectively, Local, Single-type, and Regular tree grammars (cf. page 13, Figure 2.2, in the PhD Thesis Logics for XML of Pierre Geneves).
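
The order-insensitivity provided by the interleave operator (&) can be illustrated with a small sketch. It is only a toy model, not a Relax NG validator: it checks whether a sequence of child element names is an interleaving of two operand sequences, each of which keeps its own internal order. The element names (op, arg, slot) are used purely for illustration.

```python
from functools import lru_cache

def is_interleaving(children, left, right):
    """True if `children` is a shuffle of `left` and `right` in which
    each operand keeps its own internal order (toy model of `left & right`)."""
    children, left, right = tuple(children), tuple(left), tuple(right)
    if len(children) != len(left) + len(right):
        return False

    @lru_cache(maxsize=None)
    def match(i, j):
        # i items of `left` and j items of `right` have been consumed so far.
        k = i + j
        if k == len(children):
            return True
        if i < len(left) and children[k] == left[i] and match(i + 1, j):
            return True
        if j < len(right) and children[k] == right[j] and match(i, j + 1):
            return True
        return False

    return match(0, 0)

# Illustrative content model: (op, arg) & slot
print(is_interleaving(["op", "arg", "slot"], ["op", "arg"], ["slot"]))
print(is_interleaving(["slot", "op", "arg"], ["op", "arg"], ["slot"]))
print(is_interleaving(["arg", "op", "slot"], ["op", "arg"], ["slot"]))
```

The first two calls succeed because the slot may appear anywhere relative to the ordered pair, while the third fails because the order inside the left operand is violated.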

4 Proposed Design Objectives

  • Modular Design: Sublanguages will be assembled from many small, orthogonal components. The assembly takes place in two steps:
    • Assembly of a backbone sublanguage consisting of the smallest set of syntactic components necessary to achieve the necessary classical expressivity, which may range from propositional atoms to full first-order logic (and beyond).
    • Addition of a selection of mix-in modules for syntactic sugar as well as non-classical logical operators, and meta-logic.
  • Alignment with Theory: Alignment with logic and meta-logic theory will be assisted with generous use of named patterns. The named patterns also provide numerous extension points.
  • Two sets of Relax NG Compact Syntax (RNC) RuleML schemas will be built:
    • Relaxed-Form RuleML schemas: fully striped (displaying all roles for maximum positional independence), stripe-skipped, or mixed; without canonical element order (e.g., slotted arguments before, between and/or after positional arguments, prefix, infix and/or postfix operators); with attributes explicit, implied, or mixed; and with short or long or internationalized or mixed tag names (tag names should still be 'internationally unique', e.g. avoiding false cognates).
      • The relaxed-form schemas will take advantage of many Relax NG features that cannot be accurately translated into XSD.
      • Context-dependent restrictions will be encoded in the relaxed-form schema to the maximum extent possible.
      • The relaxed-form schema will be normative; that is, a valid RuleML document is defined to be well-formed XML that validates under the relaxed-form schema for a particular sublanguage.
    • Normal-form RuleML schemas: fully striped (all roles are displayed); with canonical element order (e.g., slotted after positional arguments only, prefix operators); with all attributes having default values explicit; and with short tag names only.
      • Normal-form schemas will use the same backbone language, a subset of the expansion modules and a superset of the restriction modules (if any) from the corresponding relaxed-form schema, guaranteeing that the normal-form serialization is contained in the relaxed-form serialization.
      • Normal-form schemas are intended to be converted as accurately as possible into XSD.
  • One set of normal-form RuleML XSD schemas will be automatically generated from the corresponding RNC schemas using Trang and minor post-processing.
    • The XSD schemas can be used in the normalidation pipeline.
    • The XSD schemas must be valid according to Saxon-EE, including:
      • the Unique Particle Assumption (UPA). This is more restrictive than a requirement of a deterministic pattern. Note: Saxon-EE is more likely to identify UPA violations than XSV.
      • the Element Declarations Consistent requirement.
      • These constraints are required in XSD, not in Relax NG, which allows non-deterministic and ambiguous patterns.
    • A normal-form RuleML document is defined to be well-formed XML that validates under the XSD normal-form schema for a particular sublanguage.

5 Tools

5.1 Editing Relax NG schemas

  • Relax NG compact syntax can be easily edited in any plain text editor.
  • The oXygen XML Editor is a GUI editor for Relax NG. It is commercial software but provides a low-cost academic license and a 30-day free trial.
  • XMLBlueprint is another GUI editor for Relax NG. It can be purchased for a price similar to the oXygen academic license, and a free academic license is available for teachers and students in an XML course.
  • Stylus Studio has Relax NG editor support, but not in its Home (Student) edition.

5.2 Validating Relax NG schemas

  • The open-source software Jing, written and maintained by one of the co-authors of Relax NG, James Clark, validates Relax NG schemas. It is distributed as a Java jar, and so can be used from the command line or called from other programs.
  • The oXygen Editor checks validity of the schema periodically as it is edited, as well as on demand, and also supports the configuration of validation scenarios. Scenarios for auxiliary modules may include a reference to the containing schema.
  • Online validation of Relax NG schemas is also available through http://validator.nu/ . See MYNG#Using_Validator.nu for a tutorial.
  • The commercial Relax NG editors listed above provide validation support.

5.3 Testing Relax NG schemas

  • oXygen has an instance generator that works adequately but does not have the capability to generate invalid schema instances as (counter-)examples. It can be run in batch mode using the external tools feature of oXygen, provided configuration profiles (one for each schema) have been prepared. For convenience, a set of such configuration files can be generated by a Java program or a Unix script.
  • xmlgen is an XML instance generator that is supposed to generate either valid or invalid (near-miss and far-miss) examples from an XSD schema. However, it did not work when I tried it.

5.4 Authoring XML Instance Documents using Relax NG schemas

  • oXygen provides a guided XML Editor for authoring instances that can use Relax NG schema alone, or in combination with schemas written in other formats through an NVDL script, described below.

5.5 Validating XML Instances against Relax NG schemas

  • The open-source software Jing provides this validation service from the command line, with oXygen providing a GUI for Jing.
  • JAXP Validation API has Relax NG support that enables a Relax NG validator to be called from a Java program.
  • NVDL (Namespace-based Validation Dispatching Language) is a special-purpose XML scripting language that coordinates the validation of an XML document against multiple schemas that may be in multiple formats, including DTD, XSD, Relax NG and Schematron. NVDL is specified in ISO/IEC 19757-4.
    • JNVDL is an open-source Java implementation of the NVDL language that supports the JAXP Validation API.
    • oXygen provides a GUI to JNVDL to perform NVDL validation. A template NVDL document is provided below that implements validation against both Relax NG and XSD schemas simultaneously, and allows optional validation against external schemas for XML embedded in Data elements.

<?xml version="1.0" encoding="UTF-8"?>
<rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0"
   xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">
   <namespace ns="http://ruleml.org/spec">
       <validate schema="http://ruleml.org/1.0/relaxng/naffologeq_normal.rnc"/>
       <validate schema="http://ruleml.org/1.0/xsd/naffologeq.xsd"/>
   </namespace>
   <namespace ns="http://www.opengis.net/gml">
       <validate schema="http://schemas.opengis.net/gml/2.1.2.1/geometry.xsd"/>
   </namespace>
   <anyNamespace>
       <allow/>
   </anyNamespace>
</rules>

5.6 Conversion to and from Relax NG

  • Relax NG may be converted into XML Schema Definitions (XSDs) using the open-source software Trang. This capability is also available in the commercial software oXygen, which provides a GUI for Trang.
  • rngconv converts XSD schemas into Relax NG traditional format.

5.7 Parsing RuleML Instance Documents

  • JAXB has a Relax NG switch: see JAXB RI 2.0. If this works (it is labeled experimental and unsupported), the Relax NG schema could be used directly in parsing RuleML to Java objects.
  • RelaxNGCC is a compiler compiler that takes a Relax NG schema as input and produces Java classes that can be used to parse an instance of the schema.

5.8 Batch Processing

  • Jing, Trang, and rngconv are all distributed as Java jar files that can be run from the command line, from batch scripts, as an external tool in oXygen, or from other Java programs.
  • Jing has a "simplify" option (-s) which merges includes and eliminates abstract or unreachable named patterns, creating a monolithic Relax NG schema in compact syntax which forms an easily-readable content model.
  • Jing, Trang, and rngconv calls can be combined in a script or Java program to perform complex transformations, such as converting a modular XSD to RNG (rngconv), flattening (Jing -s), and converting back to (monolithic) XSD (Trang). (Note that direct flattening of XSD may be performed with the XML Parser Toolkit from the command line or in oXygen.)
  • Scripts that may be useful for RuleML schema developers are available at http://ruleml.org/1.0/bat
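
The round trip described above (modular XSD to RNG, flattening, back to monolithic XSD) could be scripted along the following lines. This is a hedged sketch, not one of the scripts in the RuleML repository: the jar file locations, the working directory, and the file names are assumptions, and so is the expectation that rngconv and Jing -s write their result to standard output. The commands are only assembled here; the run_steps helper would execute them if the jars are installed.

```python
import subprocess

# Jar locations are assumptions; adjust them to the local installation.
JING = ["java", "-jar", "jing.jar"]
TRANG = ["java", "-jar", "trang.jar"]
RNGCONV = ["java", "-jar", "rngconv.jar"]

def flatten_xsd_steps(modular_xsd, flat_xsd, workdir="build"):
    """Return (argv, stdout_path) pairs for the XSD -> RNG -> flat RNG -> XSD round trip."""
    rng = f"{workdir}/modular.rng"
    flat_rng = f"{workdir}/flat.rng"
    return [
        (RNGCONV + [modular_xsd], rng),        # assumed: converted schema goes to stdout
        (JING + ["-s", rng], flat_rng),        # -s: simplify (flatten) the schema
        (TRANG + [flat_rng, flat_xsd], None),  # Trang takes input and output file names
    ]

def run_steps(steps):
    """Execute each step, redirecting stdout to a file where one is given."""
    for argv, out_path in steps:
        if out_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(out_path, "w", encoding="utf-8") as out:
                subprocess.run(argv, check=True, stdout=out)

steps = flatten_xsd_steps("datalog.xsd", "datalog_flat.xsd")
for argv, out_path in steps:
    print(" ".join(argv), "->", out_path or "(writes its own output file)")
```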

5.9 User-Defined Datatypes

5.10 Tools Link List

5.11 Tips

5.11.1 oXygen

6 Learning Relax NG

7 RuleML Language Lattices

There is a variety of levels at which we may define a partial ordering on a family of XML markup languages and their grammars (schemas). We list here some informal definitions of the containment-based partial orderings that are relevant to the construction of RuleML language lattices. Formal definitions and their mathematical consequences are provided on the RuleML Language Lattice Wiki Page.

  • Semantic Containment: A language L1 is a semantic sublanguage of another language L2 if every valid document in L1 can be mapped to a valid document in L2 with the same "meaning".
  • PSVI Containment: A language L1 is a PSVI sublanguage of another language L2 if every valid document in L1 can be mapped to a valid document in L2 with the same post-schema-validation infoset.
  • Normalized Containment: A language L1 is a normalization sublanguage of another language L2 if every valid document in L1 can be mapped to a valid document in L2 having the same "normalization", where normalization is a mapping defined on every grammatically-valid document of a language into a subset of that language (see Normalizer).
  • Syntactic Containment: A language L1 is a syntactic sublanguage of another language L2 if every grammatically-valid document of L1 is also a grammatically-valid document of L2.
  • Grammar Containment: A language L1 is a grammatical sublanguage of another language L2 if the grammar of L2 is an expansion of the grammar of L1 created by adding new production rules and/or new terminal symbols.

In general, these partial-order relations are not equivalent. In the RuleML Relax NG schemas, we introduce a schema design pattern that guarantees equivalence among all of these containment relations. It is especially significant that we can ensure the equivalence of grammatical and syntactic containment, a property called "monotonicity" (Makoto 2011).

A partially-ordered set (poset) in which every pair of elements has both a greatest lower bound (glb, infimum) and a least upper bound (lub, supremum) in the set is called a lattice. The non-SWSL portion of the RuleML language family satisfies the lattice conditions with respect to the partial-ordering imposed by syntactic containment (see RuleML Language Lattice Wiki Page for proofs). Our modularization approach introduces additional sublanguages, contained within existing RuleML sublanguages, which generates a more comprehensive lattice that includes propositional and frame-like sublanguages.
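
The lattice operations can be made concrete with a toy model in which a language is identified with the set of auxiliary modules merged into its schema. This is a sketch under simplifying assumptions: the module names are illustrative, and it treats every subset of modules as a language, whereas the actual RuleML lattice contains only the viable combinations. Under the monotonicity property, set inclusion of modules corresponds to syntactic containment.

```python
# Toy model: a language is the set of auxiliary modules merged into its schema.
# Module names are illustrative only.
BASE = frozenset({"performative", "atom", "init"})

def contains(l2, l1):
    """Containment as set inclusion: l1 is a sublanguage of l2."""
    return l1 <= l2

def lub(l1, l2):
    """Least upper bound (supremum): merge the modules of both languages."""
    return l1 | l2

def glb(l1, l2):
    """Greatest lower bound (infimum): keep only the shared modules."""
    return l1 & l2

absent = BASE | {"default_absent"}
present = BASE | {"default_present"}
optional = lub(absent, present)

print(contains(optional, absent), contains(optional, present))
print(contains(absent, present))
print(sorted(glb(absent, present)))
```

In this model the two languages absent and present are incomparable, their supremum contains both, and their infimum is the shared base.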

8 Modularization

There are various alternatives for modularizing the RuleML schemas. The earlier modularization of the XSD schemas uses a "tree"-like model. The new lattice-based modularization of the Relax NG schemas for the RuleML 1.0 family of sublanguages is based on a parameterization of the schemas using a large collection of small auxiliary modules which are mostly orthogonal in that they may be included in a main module mostly independently of each other (see Goals).

Because of the large number of languages that are generated by this parameterization approach, we have implemented a web application, the Modular sYNtax confiGurator, or MYNG, that generates a schema on-the-fly, given a selection of language options. The components of MYNG are:

  • an XHTML GUI form that accepts a user's input of language options through radio buttons and check boxes. The form generates a URL for a Relax NG Compact (RNC) schema. The URL points to a PHP script, described in the next item, and also includes a query string with the language options encoded compactly. The URL may be used as a schema location for validation of instance documents; caching is recommended for best performance. The form also displays the main module.
  • a PHP script which accepts the encoded language options query string, and generates the main schema module for the requested language. This main module consists of a sequence of include statements referencing auxiliary RNC modules.
  • a subdirectory of auxiliary modules, accessible by URL for remote validation.
  • a pair of PHP scripts that create zip archives (normal and relaxed serialization) including the main module and the auxiliary modules it includes, providing a convenient download service.
  • redirects are provided from the directory index at http://www.ruleml.org/1.0/relaxng for the named RuleML sublanguages, such as datalog. These main modules are still generated on-the-fly, but the user does not have to interact with the GUI. These short URLs may also be used for instance validation.

The multi-dimensional language options, the concise encoding of these options and the GUI are described in the following section, entitled #GUI. The redirects for named languages are described in the section entitled #Redirects for Named Sublanguages. The PHP script for generating main modules is described in the section entitled #PHP Parameterized Schema. Details about the hard-coded auxiliary modules are provided in the #Auxiliary Modules section.

8.1 GUI

The MYNG GUI for customization of RuleML Relax NG schemas is accessed at the URL http://ruleml.org/0.91/gui/ for version 0.91 and http://ruleml.org/1.0/myng/ for version 1.0. The GUI is initialized with the language options for the supremum language, naffologeq_relaxed, which is the most inclusive RuleML sublanguage.

The language options are organized in groups of related dimensions. Each dimension is Boolean (i.e., either true or false). There are some options that are vacuous (i.e., not applicable) unless one or more other options are selected. These dependent options are disabled unless the options they depend on are selected. The Boolean values are treated as bits of a hexadecimal value for each group of options, and an associative array of these hexadecimal values is the unique syntactic code for each language. Bit-wise dominance between two codes is equivalent to syntactic containment of the corresponding languages. The option groups and their dimensions are listed below, with the parameter name in parentheses preceding each group and the parameter value in parentheses preceding each option. For checkboxes where multiple options may be selected, the parameter value is the sum of the (hexadecimal) values of the options in that group.

  • (backbone) Expressivity "Backbone" (select one): the logical connectives of propositional logic and the variables and quantifiers of predicate logic are implemented in orthogonal modules so that a great variety of expressivities may be constructed by "mixing-in" various schema modules. However, only certain combinations of these modules are accessible from the GUI. These combinations correspond to a hierarchy of expressivity as follows:
    • (x0) Atomic Formulas
    • (x1) Ground Facts (Atomic Formulas plus conjunctions and disjunctions, with restrictions on compounding formulas)
    • (x3) Ground Logic (Ground Facts plus implications, with restrictions on compounding formulas)
    • (x7) Datalog (Ground Logic plus quantifications, with restrictions on compounding formulas)
    • (xf) Horn Logic (Datalog plus expressions)
    • (x1f) Disjunctive Horn Logic (Horn Logic plus disjunctions allowed in premises)
    • (x3f) Full First-Order Logic (Disjunctive Horn Logic plus no restrictions on compounding formulas)
  • (default) Treatment of Attributes With Default Values (select one): in the RuleML XSD schemas, certain attributes are defined with default values. In some situations it may be advantageous to eliminate the default values so that the language is more compact; this is the first option, "Required to be Absent". The PSVI of instance documents validated against the XSD schemas will have attributes having default values present, even if these attributes are absent in the instance document itself. In contrast, Relax NG validation does not allow modification of the info-set. Therefore, to construct a language from Relax NG which is PSVI-contained in the language generated by the corresponding XSD schema, it is necessary to make attributes with default values required to be present; this is the second option, "Required to be Present". The third alternative, "Optional", allows such attributes to be absent or present, and thus is the join of the former two languages. This selection is necessary to generate a language which syntactically contains the language generated by the corresponding manually prepared XSD schema, and is used for the relaxed-form serialization.
    • (x1) Required to be Absent: all default values are forbidden. Used in constructing CL RuleML, the RuleML sublanguage that is a Common Logic dialect.
    • (x2) Required to be Present: provides PSVI containment
    • (x3) Optional: provides syntactic subsumption
  • (termseq) Term Sequences: Number of Terms (select one): positional arguments may be forbidden, restricted to length zero or two, or be of arbitrary finite length.
    • (x0) None (for propositional and frame-like languages)
    • (x2) Binary (Zero or Two)
    • (x7) Polyadic (Zero or More)
  • (lng) Language (select one): options include the traditional abbreviated names or long English names, but not both. Internationalization of element names is planned, but not yet implemented.
    • (x1) English Abbreviated Names
    • (x2) English Long Names
    • (x4) French Long Names (not implemented yet)
  • (serial) Serialization Options (check zero or more): the normal-form serialization corresponds to all of these options unchecked, so the groups must have the canonical ordering of child elements, and complete striping is required. The relaxed serialization allows unordered groups of child elements, as described in more detail in Section #Proposed_Design_Objectives. Disabling explicit datatyping and the schema location attribute is necessary as a work-around to a bug in the translator from Relax NG to XSD schema (see #xsi_attributes).
    • (x1) Unordered Groups
    • (x2) Stripe-Skipping
    • (x4) Explicit Datatyping
    • (x8) Schema Location Attribute
  • (propo) Propositional Options (check zero or more): these options are appropriate even if the language is propositional.
    • (x1) URIs
    • (x2) Rulebases
    • (x4) Entailments (logical/proof-theoretic, dependent on Rulebases)
    • (x8) Degree of Uncertainty
    • (x10) Strong Negation
    • (x20) Weak Negation (Negation as Failure)
  • (implies) Implication Options (check zero or more)
    • (x1) Equivalences (depends on Ground Logic or higher expressivity)
    • (x2) Inference Direction (in 1.0, dependent on Ground Logic or higher expressivity)
    • (x4) Non-Material (in 1.0, dependent on Ground Logic or higher expressivity)
  • (terms) Term Options (check zero or more)
    • (x1) Object Identifiers
    • (x2) Slots
    • (x4) Slot Cardinality (dependent on slots)
    • (x8) Slot Weight (dependent on slots)
    • (x10) Equations
    • (x20) Oriented Equations (dependent on equations)
    • (x100) Term Typing (this and all following term options depend on terms being used somewhere, either in positional arguments, object identifiers, slots and/or equations)
    • (x200) Data Terms
    • (x400) Skolem Constants
    • (x800) Reified Terms
  • (quant) Quantification Options (check zero or more)
    • (x1) Implicit Closure
    • (x2) Slotted Rest Variables (dependent on slots and quantifications (Datalog) expressivity)
    • (x4) Positional Rest (dependent on polyadic term sequences and quantifications (Datalog) expressivity)
  • (expr) Expression Options (check zero or more)
    • (x1) Set-valued Attribute Absent (automatically set from other options)
    • (x2) Generalized Lists (dependent on Horn Logic or higher expressivity)
    • (x4) Set-valued Expressions (dependent on Horn Logic or higher expressivity and equations)
    • (x8) Interpreted Expressions (dependent on Horn Logic or higher expressivity and equations)
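
The encoding of option groups and the bit-wise dominance test can be sketched as follows. The option values for the propo group are copied from the list above; the function names, the dictionary layout, and the three partial codes are assumptions made for illustration and are not taken from the MYNG source.

```python
# Option bits for the (propo) group, copied from the list above.
PROPO_BITS = {
    "URIs": 0x1,
    "Rulebases": 0x2,
    "Entailments": 0x4,
    "Degree of Uncertainty": 0x8,
    "Strong Negation": 0x10,
    "Weak Negation": 0x20,
}

def group_value(bits, selected):
    """Combine the values of the selected options of one group."""
    value = 0
    for name in selected:
        value |= bits[name]
    return value

def to_param(value):
    """Format a group value as in the generated URLs: 'x' plus hex digits."""
    return "x" + format(value, "x")

def from_param(text):
    """Parse a parameter value such as 'x3f' back into an integer."""
    return int(text[1:], 16)

def dominates(bigger, smaller):
    """True if, in every group, each bit set in `smaller` is also set in `bigger`."""
    return all(
        from_param(value) & ~from_param(bigger.get(group, "x0")) == 0
        for group, value in smaller.items()
    )

# Hypothetical partial codes (only three groups shown).
small = {"backbone": "x7", "termseq": "x2", "terms": "x2"}
large = {"backbone": "xf", "termseq": "x7", "terms": "x6"}
other = {"backbone": "x7", "termseq": "x7", "terms": "x0"}

print(to_param(group_value(PROPO_BITS, ["URIs", "Rulebases", "Weak Negation"])))
print(dominates(large, small), dominates(small, large))
print(dominates(other, small), dominates(small, other))
```

The codes small and other are incomparable: neither dominates the other, so under this encoding neither language would contain the other.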

The URLs that are generated by the GUI are similar to the following, which is the URL for propatom_cl with abbreviated English names:

http://ruleml.org/1.0/relaxng/schema_rnc.php?backbone=x0&default=x1&termseq=x0&lng=x0&propo=x0&implies=x0&terms=x0&quant=x0&expr=x0&serial=x0

The supremum language naffologeq_relaxed with long English names (such as argument instead of arg) has the URL:

http://ruleml.org/1.0/relaxng/schema_rnc.php?backbone=x3f&default=x3&termseq=x7&lng=x2&propo=x3f&implies=x7&terms=xf3f&quant=x7&expr=xf&serial=xf
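
A URL of this form can be assembled from, and decomposed into, an associative array of group values. The sketch below uses only the Python standard library; the parameter order is taken from the two example URLs, and whether the PHP script depends on that order is not known, so it is treated here as a convention only. The function names are hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl

BASE = "http://ruleml.org/1.0/relaxng/schema_rnc.php"

# Parameter order as it appears in the example URLs.
PARAM_ORDER = ["backbone", "default", "termseq", "lng", "propo",
               "implies", "terms", "quant", "expr", "serial"]

# Code of the supremum language with long English names, from the URL above.
SUPREMUM = {
    "backbone": "x3f", "default": "x3", "termseq": "x7", "lng": "x2",
    "propo": "x3f", "implies": "x7", "terms": "xf3f", "quant": "x7",
    "expr": "xf", "serial": "xf",
}

def build_url(code):
    """Assemble the schema URL from a mapping of group name to hex value."""
    query = "&".join(f"{name}={code[name]}" for name in PARAM_ORDER)
    return f"{BASE}?{query}"

def parse_url(url):
    """Recover the mapping of group name to hex value from a schema URL."""
    return dict(parse_qsl(urlsplit(url).query))

url = build_url(SUPREMUM)
print(url)
print(parse_url(url) == SUPREMUM)
```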

8.2 Redirects for Named Sublanguages

Short URLs for named RuleML languages have been implemented for the convenience of users. The short URLs available for RuleML 1.0 are:

Common-Logic-Form  Normal-Form                   Relaxed-Form                   Relaxed-Long-Form
propatom_cl.rnc    -                             -                              -
propfact_cl.rnc    -                             -                              -
-                  bindatagroundfact_normal.rnc  bindatagroundfact_relaxed.rnc  -
-                  bindatagroundlog_normal.rnc   bindatagroundlog_relaxed.rnc   -
-                  bindatalog_normal.rnc         bindatalog_relaxed.rnc         -
-                  datalog_normal.rnc            datalog_relaxed.rnc            -
-                  ...@@@                        ...@@@                         -
fologeq_cl.rnc     fologeq_normal.rnc            fologeq_relaxed.rnc            -
-                  naffolog_normal.rnc           naffolog_relaxed.rnc           -
-                  naffologeq_normal.rnc         naffologeq_relaxed.rnc         naffologeq_relaxed_long-en.rnc

8.3 PHP Parameterized Schema

The PHP script schema_rnc.php which constructs the Relax NG main modules is located in the http://www.ruleml.org/1.0/relaxng directory.

Here is an abbreviated version of the main module for the simplest member of this family of languages, propatom_cl, available from

namespace ruleml = "http://www.ruleml.org/1.0/xsd"
start = Node.choice | edge.choice
# ROOT NODE AND PERFORMATIVES INCLUDED
include "modules/performative_expansion_module.rnc" inherit = ruleml {start |= notAllowed}
# ATOMIC FORMULAS INCLUDED
include "modules/atom_expansion_module.rnc" inherit = ruleml {start |= notAllowed}
# INITIALIZATION MODULE INCLUDED
include "modules/init_expansion_module.rnc" inherit = ruleml {start |= notAllowed}
# ATTRIBUTES WITH DEFAULT VALUES MAY BE ABSENT
include "modules/default_absent_expansion_module.rnc" inherit = ruleml {start |= notAllowed}
# ATTRIBUTES WITH DEFAULT VALUES MAY BE PRESENT
include "modules/default_present_expansion_module.rnc" inherit = ruleml {start |= notAllowed}
# ORDER MODE - UNORDERED GROUPS DISABLED
include "modules/ordered_groups_expansion_module.rnc" inherit = ruleml {start |= notAllowed}

For every language, the grammar is assembled in a main module by the inclusion of auxiliary modules, which may be base or expansion modules. The constraints on these categories of modules are described in more detail in Section #Schema_Design_Pattern. Main modules contain only namespace declarations, start and include statements, and comments.

The start pattern determines the elements allowed as a document root. The redefinitions of the start pattern with each include allow us to override any start pattern defined within an auxiliary module. The specification of document root in Relax NG does not translate to XSD schema, where any global element may be document root.
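
The shape of a main module can be sketched in a few lines of Python. This is not the schema_rnc.php script, whose internals are not shown here; build_main_module and its arguments are hypothetical names, and the sketch only reproduces the structure described above: a namespace declaration, a start pattern, and one commented include per auxiliary expansion module, each redefining start.

```python
def build_main_module(sections, namespace="http://www.ruleml.org/1.0/xsd"):
    """Return the text of a main module.

    `sections` is a list of (comment, prefix) pairs; each prefix names an
    expansion module in the modules/ subdirectory.
    """
    lines = [
        f'namespace ruleml = "{namespace}"',
        "start = Node.choice | edge.choice",
    ]
    for comment, prefix in sections:
        lines.append(f"# {comment}")
        # Every include overrides any start pattern defined inside the module.
        lines.append(
            f'include "modules/{prefix}_expansion_module.rnc" '
            "inherit = ruleml {start |= notAllowed}"
        )
    return "\n".join(lines) + "\n"


# The selection shown in the abbreviated propatom_cl listing above.
propatom_cl = build_main_module([
    ("ROOT NODE AND PERFORMATIVES INCLUDED", "performative"),
    ("ATOMIC FORMULAS INCLUDED", "atom"),
    ("INITIALIZATION MODULE INCLUDED", "init"),
    ("ATTRIBUTES WITH DEFAULT VALUES MAY BE ABSENT", "default_absent"),
    ("ATTRIBUTES WITH DEFAULT VALUES MAY BE PRESENT", "default_present"),
    ("ORDER MODE - UNORDERED GROUPS DISABLED", "ordered_groups"),
])
print(propatom_cl)
```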

All languages that may be generated from the parameterized schema are derived from one of the following infimum sublanguages:

  1. 'propatom_cl.rnc', the infimum sublanguage with abbreviated English-language element names and attributes with default values required to be absent;
  2. 'propatom_normal.rnc', the infimum sublanguage with abbreviated English-language element names and attributes with default values required to be present in a restricted pattern used for all languages with less than full first-order expressivity;
  3. 'folog_normal.rnc', the infimum sublanguage with full first-order expressivity, abbreviated English-language element names and attributes with default values required to be present in an unrestricted pattern;
  4. any one of the above three languages with long English element names instead of the abbreviated element names.

Internationalized implementation in other languages, such as French, is planned for future versions.

8.3.1 Aside: Comments in Relax NG and Auto-generated HTML Documentation

Comments in the Relax NG syntax are lines starting with #, as above, or ##, as in:

 ## attributes for atomic formulas
 Atom.attlist &= empty

Comments starting with a single hash are translated into the usual XML style comment

 <!-- ROOT NODE AND PERFORMATIVES INCLUDED -->

while comments starting with double hash are translated into XSD documentation annotations of the component (element, etc.) that follows them

 <xs:attributeGroup name="Atom.attlist">
   <xs:annotation>
     <xs:documentation>attributes for atomic formulas</xs:documentation>
   </xs:annotation>

The oXygen editor contains a tool for generating HTML documentation from the XSD, which has been used to create the Schema Docs (http://ruleml.org/1.0/doc) of RuleML 1.0 naffologeq.

Note: there are some issues with this method of generating documentation.

8.4 Auxiliary Modules

Filenames for auxiliary modules are of the form prefix_(base|expansion|contraction)_module.rnc. For brevity, we list only prefixes here.

  • Base Modules (suffix: _base_module.rnc)
    • No base modules are used in the RuleML 1.0 schemas.
  • Expansion Modules (suffix: _expansion_module.rnc)
    • Core: init, performatives
    • Expressivity (backbone): andor, atom, dis, expr, folog_cl, implication, quantification
    • Attributes with Default Values: default_absent, default_absent_folog, default_inf, default_present, default_present_folog
    • Term Sequences: termseq_bin, termseq_poly
    • Order Sensitivity: ordered_groups, unordered_groups
    • Stripe-skipping: stripe_skipping, asynchronous_stripe_skipping_entailment, asynchronous_stripe_skipping_implication
    • XSD Compatibility: xsi_schemalocation, explicit_datatyping
    • Uncertainty: fuzzy
    • Meta-Logic: metalevel, rulebase
    • Negation: neg, naf
    • Terms: individual, type, oid, iri, reify, reify_any, skolem
    • Data: data_any_content, data_simple_content, dataterm_any, data_term_simple
    • Slots: slot, card, weight
    • Equations: equal, oriented_attrib, oriented_default, oriented_non-default
    • Implication Variants: equivalent, direction_attrib, direction_default, direction_non-default, material_attrib, material_default, material_non-default
    • Quantification Variants: closure, variable, repo, resl
    • Expression Variants : plex, per_attrib, per_default, per_non-default, val_absence, val_attrib, val_default, val_non-default
    • Annotation and Identification: meta, node_attribute
    • XML namespace: xml_base, xml_id, xml_lang, xml_space
    • Alternate Element Names: long_name, long_name_repo, long_name_resl
  • Contraction Modules (suffix: _contraction_module.rnc)
    • Alternate Element names: short_name
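
The filename convention above can be captured with a regular expression. The following Python sketch is an illustration only; split_module_filename and module_filename are hypothetical helper names, not part of the RuleML tooling.

```python
import re

# prefix_(base|expansion|contraction)_module.rnc, as described above.
MODULE_NAME = re.compile(
    r"^(?P<prefix>.+)_(?P<kind>base|expansion|contraction)_module\.rnc$"
)


def split_module_filename(filename):
    """Return (prefix, kind) for an auxiliary module filename."""
    match = MODULE_NAME.match(filename)
    if match is None:
        raise ValueError(f"not an auxiliary module filename: {filename!r}")
    return match.group("prefix"), match.group("kind")


def module_filename(prefix, kind="expansion"):
    """Build the filename for a listed prefix, e.g. 'naf' or 'short_name'."""
    if kind not in ("base", "expansion", "contraction"):
        raise ValueError(f"unknown module kind: {kind!r}")
    return f"{prefix}_{kind}_module.rnc"


print(split_module_filename("short_name_contraction_module.rnc"))
print(module_filename("termseq_poly"))
```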

9 Naming and Design Pattern Conventions

9.1 Directory Structure and Names

9.1.1 Schema Directories, Paths and Target Namespaces

  • The schemas, scripts, and documentation files are housed on the RuleML server, in several directories under the version directory http://ruleml.org/1.0/:
    • designPattern: schemas for validating Relax NG modules according to the schema design pattern;
    • doc: auto-generated HTML documentation;
    • exa: hand-written exemplary instances used for validation and demonstration;
    • glossary: hand-written HTML containing the glossary of element and attribute names;
    • myng: URL for the MYNG web-app GUI;
    • nvdl: NVDL scripts for validation of RuleML embedded in XHTML;
    • relaxng: the PHP parameterized schema and the corresponding modules subdirectory;
    • rsd: monolithic XSD schema (for normal-form naffologeq) automatically generated from the Relax NG pivot schema;
    • simplified: monolithic RNC schemas simplified using Jing;
    • xsd: the hand-written XSD schemas and the corresponding modules subdirectory.

9.1.2 Subdirectories of the Main Schema Directory

The subdirectories of http://ruleml.org/1.0 are described in more detail below.

9.1.2.1 doc

This subdirectory contains the HTML documentation auto-generated from the monolithic XSD schema "rsd/naffologeq_normal.xsd".

9.1.2.2 exa

Manually-prepared exemplary instances used for validation and demonstration are published here. The internal structure has been rearranged so that each directory contains instances of a particular sublanguage.

9.1.2.3 glossary

This subdirectory contains a hand-written HTML page with the glossary of element and attribute names.

9.1.2.4 insta

Not Currently Available: This subdirectory contains automatically-generated statistically-random test instances used for cross-validation.

9.1.2.5 myng

JavaScript in an HTML page provides a GUI for customizing a Relax NG schema.

9.1.2.6 relaxng

The PHP parameterized schema and the corresponding modules subdirectory, as well as the zip archives, are published here.

9.1.2.7 rsd

Named as a contraction of "rng to xsd", this subdirectory contains monolithic XSD schemas automatically generated by Jing and Trang from the modular Relax NG schemas (see #Automatically-Generated_Monolithic_XSD_by_Trang for more details). The naming convention for these schemas is to change the extension, replacing .rnc with .xsd, and to include "simplified" in the name, indicating that the Relax NG schema has been processed by Jing with the simplify switch on prior to translation by Trang.

These XSD schemas are made available for users who prefer an XSD-based validation, or need the XSD format as input to other applications. In particular, these are used to generate the schema docs, and as input to oXygen's instance generator.

Such schemas may be created for any customized sublanguage available from MYNG, provided the following features are avoided:

  • explicit datatyping: this is automatically available in XSD schemas, and invalid XSD will be obtained if the patterns required in RNC to allow the xsi:type attribute are included;
  • explicit schema location attribute: similarly, the xsi:schemaLocation attribute is always allowed by XSD schemas, and should not be included in the RNC that is to be translated;
  • unordered groups (i.e. the infix operator notation): these cannot be accurately translated into XSD. Such an RNC schema will be translated by Trang into an overly permissive schema that allows, for example, multiple operators within a single atomic formula.

9.1.2.8 simplified

This subdirectory contains simplified RNC schemas generated by Jing from the modular Relax NG schemas (see #Automatically-Generated_Monolithic_RNG_by_Jing for more details). The naming convention for simplified (content model) files is to append _simplified.rnc to the whole filename, including extension, of the original hand-written schema filename. These RNC schemas are intended to replace manually-prepared documents such as http://ruleml.org/1.0/xsd/content_models_10.archive.pdf because the compact syntax of Relax NG is just as readable as the EBNF syntax used in the latter, while the automatically-generated results of the former are more reliable.

9.1.2.9 xsd

This subdirectory contains the manually-prepared XSD schemas, originally released in 2006 and recently patched to correct some errors discovered during the re-engineering into Relax NG (see XSD-Errata0.91).

9.2 Schema Design Pattern

9.2.1 Global vs. Local Elements

Like the existing XSD schemas, the Relax NG schemas use the Garden of Eden schema design pattern, which declares all the elements globally using named patterns (XSD types). This allows the schema to be divided among multiple files, and we have taken advantage of this in our modular approach. It also allows reuse of elements, which makes the schema more efficient and easier to maintain. This design pattern maximizes extensibility, especially if additional named patterns are introduced. Most importantly, it allows recursion of elements, which is required in RuleML for the construction of some compound formulas and terms.

According to Kahn and Sum, disadvantages of the Garden of Eden design include:

  • many potential root elements;
  • limited encapsulation;
  • difficult to read and understand.

The first item, many potential root elements, is actually an advantage for testing, as it allows us to create XML test files with a root element that would occur at an intermediate level in a valid RuleML document. Moreover, it allows the RuleML language elements corresponding to those root elements to be used selectively when embedded in other XML applications.

The second item, limited encapsulation, is not a serious problem for RuleML schemas. Encapsulation would be desirable when certain elements have only one type of parent, or have different content models within different parents. Most RuleML element types can have more than one type of parent, so encapsulation would be an inefficiency. In those cases where the content model of an element is different for different parents (as happens with Plexes in rest variables, for example), we define a named pattern for each content model, with the parent's name incorporated into the pattern's name.

The third item is mitigated to some extent by the Relax NG syntax, which is, we hope, easier to read than XSD. There are also several tools that assist in readability. The simplifying switch in Jing (jing.jar -s) condenses the schema to a monolithic schema, generating a fairly human-readable content model. One drawback to the simplification is that it removes unused extension points (named patterns that can be extended), and so does not provide an accurate picture of the extensibility of the original schema. oXygen also provides an automatic documentation tool (although the schema must be converted to XSD first), which produces an HTML page with schema diagrams.

9.2.2 Extensibility: Abstract versus Unreachable Patterns

There are essentially two ways to implement orthogonal optional modules in Relax NG: using abstract patterns and using unreachable patterns. For example, a 'Negation' formula (strong negation) is allowed to occur within a 'Negation As Failure' formula (weak negation), provided both kinds of negation are included in the sublanguage. The corresponding RNC code is

 NafFormula.choice |= Negation-node.choice

We can place this line in either the 'Negation As Failure' module naf_expansion_module.rnc or the 'Negation' module neg_expansion_module.rnc.

  • Abstract Patterns

If the line is placed in the 'Negation As Failure' module, the resulting schema will be invalid unless we add an additional statement to make the 'Negation-node.choice' pattern abstract. In Relax NG Compact (RNC) syntax, an abstract pattern is created with the 'notAllowed' reserved word as follows:

 Negation-node.choice |= notAllowed

The Negation-node.choice pattern is over-ridden in the Negation module, so this abstract pattern is extended only if both negation modules are included in the sublanguage. Having a large number of notAllowed statements causes the code to look cluttered, so these statements may be collected into a single 'initialization' module.

  • Unreachable Patterns

If the line is placed in the Negation module, the notAllowed line is unnecessary, because the Negation-node.choice pattern is defined in this module. If a sublanguage includes strong but not weak negation, the NafFormula.choice pattern is valid but unreachable, and will be removed by the Jing simplification tool. This approach is more efficient in lines of code, but can be confusing to read in modular form, because the definition of a pattern, in this case NafFormula.choice, is decomposed into pieces that appear in different modules.

In RuleML RNC, we follow the Abstract Pattern design when possible.

9.2.3 Monotonicity: Segregated Names

In Relax NG schemas, pattern names are the non-terminal symbols used to write production rules. One of the features of our schema design pattern is segregation of pattern names according to the allowed value of the combine attribute of their definitions. To illustrate the constraints on these categories, we draw examples from several RuleML modules. In init_expansion_module.rnc, we have

Equal-node.choice |= notAllowed
Equal-datt.choice |= notAllowed
reEqual.attlist &= commonNode.attlist?
Equal.header &= empty
Equal.main |= notAllowed

In equal_expansion_module.rnc, we have

Equal-node.choice |= Equal.Node.def
Equal.Node.def = element Equal { (Equal-datt.choice & reEqual.attlist), Equal.header, Equal.main }
Equal.main |= leftSide-edge.choice, rightSide-edge.choice

In ordered_groups_expansion_module.rnc, we have

Equal.header &= (Node.header, degree-edge.choice?)?

In unordered_groups_expansion_module.rnc, we have

Equal.header &= (Node.header? & degree-edge.choice?)?

In default_inf_expansion_module.rnc, we have

Equal-datt.choice |= oriented-att-inf.choice

and in @@@ default_optional_expansion_module.rnc

Equal.attlist &= oriented-att.choice?

This is out of date. @@@

In long_name_expansion_module.rnc we have

Equation-node.choice |= Equation.Node.def
Equation.Node.def = element Equation { (Equal-datt.choice & reEqual.attlist), Equal.header, Equal.main }

In short_name_contraction_module.rnc we have

Equal.Node.def &= notAllowed

These schema snippets illustrate the full range of definitions permitted in the RuleML Relax NG schema design pattern.

We utilize three categories of pattern names.

9.2.3.1 Choice Combine

When names from the choice category are defined, the choice combine attribute must be explicitly used. In the example above, Equal-node.choice and Equal.main are names in the choice category. Choice patterns are initialized as notAllowed, and then overridden in base or expansion modules, as shown above.

Neither of the definitions

Equal-node.choice  = Equal.Node.def
Equal-node.choice &= Equal.Node.def

would be permitted in this schema design pattern.

In the XML syntax of Relax NG, we have the following schema design pattern:

 <define name="N" combine="choice">
    ...
 </define>

where N is any name from the choice combine category. This pattern is allowed in both base and expansion modules.

9.2.3.2 No Combine

When names from the no-combine category are defined, in general, no combine attribute is permitted. There is one exception to the no-combine definition constraint: in base and contraction modules, it is permitted to have definitions having names from this category with the combine attribute interleave whose pattern is the notAllowed reserved word. We use this construction in the alternate names modules, as shown above, to remove abbreviated element names and replace them with long names or internationalized names.

Because neither of the definitions

Equal.Node.def &= empty
Equal.Node.def |= notAllowed

would be permitted, the names in the no-combine category are never initialized. This introduces limitations on how abstract components may be defined. To define abstract elements and attributes, we introduce a more abstract choice pattern, such as Equal-node.choice, as shown above. Such choice patterns are expansion points that hold alternate name elements or alternate constructions that serve the same role in the grammar, and unify elements that have similar semantics.

In the XML syntax of Relax NG, we have the following schema design pattern: in both base and expansion modules

 <define name="N">
    ...
 </define>

or, in base or contraction modules only,

 <define name="N" combine="interleave">
    <notAllowed/>
 </define>

where N is any name from the no-combine category. In contraction modules, these (no-combine with notAllowed) are the only statements that are allowed.

9.2.3.3 Interleave Combine

When names from this category are defined, the interleave combine attribute must be explicitly used. The interleave combine is used to initialize interleave patterns, such as attribute lists, as empty. Other uses are to add attributes to an attribute list, and, in the relaxed-form, to add children to the interleave header patterns, as shown above. Neither of the following definitions

Equal.header = empty
Equal.header |= notAllowed

is permitted.

An additional constraint is required to attain monotonicity. In an expansion module the right-hand side of a definition with a combine attribute of interleave must be empty, optional:

Equal.header &= empty
Equal.header &= (Node.header? & degree-edge.choice?)?

or zero-or-more:

 Node.header &= metaKnowledge-edge.choice*

In the XML syntax, we have the schema design pattern: in (only) base modules

 <define name="N" combine="interleave">
   ...
 </define>

and in either base or expansion modules

 <define name="N" combine="interleave">
   <empty/>
 </define>

or

 <define name="N" combine="interleave">
   <optional>
   ...
   </optional>
 </define>

or

 <define name="N" combine="interleave">
   <zeroOrMore>
   ...
   </zeroOrMore>
 </define>

9.2.3.4 Justification of Monotonicity Claim

We consider the operation of merging two grammars as auxiliary modules that are both included in a main module. If the Relax NG syntax did not include the interleave combine attribute, this operation would be monotonic; that is, any valid instance of one of the auxiliary modules would also be a valid instance of the merged grammar. (Include over-rides are considered to be a pre-processing step before merger, and thus do not affect the monotonicity of the merge operation.) This kind of monotonicity is very powerful, but at a high price - the fine-grained orthogonal modularization we seek would be impossible without the interleave combine.

Our objective can be met with a compromise: we aim for a weaker kind of monotonicity and allow a restricted usage of the interleave combine. Consider the operation of merging two grammars, one being the base module and the other an expansion module. If any valid instance of the base module is also a valid instance of the merged grammar, then we have a kind of one-sided monotonicity that is sufficient to construct our lattice of languages with the structure given by syntactic containment. We would also have modular extensibility, for user-extension or our own upgrades, with backward compatibility.

The segregated names schema design pattern described above provides this one-sided monotonicity. The use of an interleave combine with an optional child in an expansion module can be shown to preserve monotonicity by transforming the base and expansion module pair to an equivalent pair of modules without interleave combine.

In general, we start with

start |= x.interleave
x.interleave &= a
a = ...

as our base module, and

x.interleave &= y?
y = ...

as the expansion grammar. Provided that the definitions shown are the only occurrences of the interleave combine and of the x.interleave pattern, the merged grammar is equivalent to

start |= x.interleave
x.interleave = a & y?
a = ...
y = ...

We may transform the auxiliary modules as

# Base module
start |= x.choice
x.choice |= a
a = ...

and

# Expansion module
x.choice |= a & y
y = ...

The merged grammar is equivalent to

start |= x.choice
x.choice = a | (a & y)
a = ...
y = ...

The base modules are equivalent before and after the transformation, as are the merged grammars (the expansion modules are not, but that is irrelevant). After the transformation, all interleave combines have been removed from the base and expansion modules, so their merger is monotonic, implying that any grammatically valid instance of the base module is also a grammatically valid instance of the merged grammar.

Although the formal proof of monotonicity for all cases is more lengthy, it revolves around the transformation illustrated above. Name segregation is key to maintaining monotonicity, as shown in the following counterexample. Let

start |= Equal.Node.def
Equal.Node.def &= empty

be the base module, and let

Equal.Node.def = element Equal {Equal.attlist, Equal.header, Equal.main}
...

be the expansion grammar. The merged grammar is equivalent to

start |= Equal.Node.def
Equal.Node.def = empty & element Equal {Equal.attlist, Equal.header, Equal.main}
...

The only valid instance of the base module is an empty document, while this would not in general be a valid instance of the merged grammar.

The expansion module satisfies the segregation conditions, but the base module does not, because the design pattern does not allow an interleave combine attribute (other than with the notAllowed pattern) on a named pattern that belongs to the no-combine category.
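
The transformation and the counterexample can be checked on a toy model. In the Python sketch below, which is an illustration and not a Relax NG implementation, a language is a set of "documents" and each document is a frozenset of child names, so order and repetition are ignored. The helper names interleave, optional and choice are hypothetical.

```python
from itertools import product


def interleave(p, q):
    """Toy version of the RNC '&' operator: combine one document from each side."""
    return {left | right for left, right in product(p, q)}


def optional(p):
    """Toy version of the RNC '?' operator: also allow the empty document."""
    return p | {frozenset()}


def choice(p, q):
    """Toy version of the RNC '|' operator."""
    return p | q


a = {frozenset({"a"})}
y = {frozenset({"y"})}
empty = {frozenset()}

# Base module: x.interleave &= a
base = a
# Merged grammar before the transformation: x.interleave = a & y?
merged_interleave = interleave(a, optional(y))
# Merged grammar after the transformation: x.choice = a | (a & y)
merged_choice = choice(a, interleave(a, y))

# Counterexample: Equal.Node.def &= empty merged with an element definition.
equal = {frozenset({"Equal"})}
bad_base = empty
bad_merged = interleave(empty, equal)

print(merged_interleave == merged_choice)  # the two merged grammars agree
print(base <= merged_interleave)           # one-sided monotonicity holds
print(bad_base <= bad_merged)              # and fails in the counterexample
```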

9.2.3.5 Implementation of Segregated Naming

In the RuleML 1.0 Relax NG Schema Design Pattern, extensions are used on all named patterns (non-terminal symbols) to indicate the category of their combine attributes (* indicates an arbitrary valid name):

  • The no combine category has only the extensions *.def and *.notallowed;
  • The interleave combine category has only the extensions *.attlist and *.header;
  • The choice combine category has all the other extensions or no extension, including *.choice, *.main, *.content, *.value, *.datatype, *.sequence, *.defs, and the start symbol start.
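
The extension rules above lend themselves to a mechanical check. The following Python sketch is an illustration only and is not the designPattern validation schema; combine_category and definition_allowed are hypothetical names. It checks single-line RNC definitions against the category of the defined name, including the notAllowed exception for base and contraction modules. It does not check the additional constraint on the right-hand side of interleave definitions in expansion modules, nor that contraction modules contain only such notAllowed statements.

```python
import re

NO_COMBINE = {"def", "notallowed"}
INTERLEAVE = {"attlist", "header"}
OPERATORS = {"=": "none", "|=": "choice", "&=": "interleave"}
DEFINITION = re.compile(r"^\s*([\w.\-]+)\s*(\|=|&=|=)\s*(.+?)\s*$")


def combine_category(name):
    """Classify a pattern name by the extension after its last dot."""
    extension = name.rsplit(".", 1)[1] if "." in name else ""
    if extension in NO_COMBINE:
        return "none"
    if extension in INTERLEAVE:
        return "interleave"
    return "choice"  # all other extensions, or no extension (e.g. start)


def definition_allowed(line, module_kind="expansion"):
    """Check that the combine operator of a definition matches its name category."""
    match = DEFINITION.match(line)
    if match is None:
        raise ValueError(f"not a single-line definition: {line!r}")
    name, operator, pattern = match.groups()
    category = combine_category(name)
    # Exception: no-combine names may be removed with '&= notAllowed',
    # but only in base and contraction modules.
    if category == "none" and operator == "&=" and pattern == "notAllowed":
        return module_kind in ("base", "contraction")
    return OPERATORS[operator] == category


print(definition_allowed("Equal-node.choice |= Equal.Node.def"))
print(definition_allowed("Equal.Node.def &= notAllowed", "contraction"))
print(definition_allowed("Equal.header = empty"))
```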

9.2.4 Striping: Capitalization Naming Conventions

Additional naming conventions for style include:

  • Camel case will be used for all named patterns, either upper or lower, and may include dashes and underscores:
    • Upper-camel-case, as in Equal.Node.def, indicates a Node element;
    • lower-camel-case, as in formula_AssertRetract.def, indicates an edge element.

9.3 Abstract Element Definitions

9.3.1 Nodes*

Following the spirit of the XSD guidelines for modular schema, we adopt a convention for naming the parts of element definitions. Most Node element definitions follow the design pattern:

Implies-node.choice |= Implies.Node.def
Implies.Node.def =
   ## Annotation goes here
   element Implies { (Implies-datt.choice & reImplies.attlist), Implies.header, Implies.main }
Implies-datt.choice |= notAllowed
reImplies.attlist &= commonNode.attlist?
reImplies.attlist &= closure-att.choice?
reImplies.attlist &= mapClosure-att-fo.choice?
Implies.header &= Node.header?
Implies.main |= notAllowed

The definition is abstract because it can never be satisfied due to the notAllowed pattern of Implies.main, unless the Implies.main pattern is overridden with an allowed pattern. Implies is a particular case where order sensitivity may be imposed or relaxed, depending on the overriding pattern, which is defined in an auxiliary expansion module as either

Implies.main |= body_Implies.name.choice, head_Implies.name.choice

or

Implies.main |= body_Implies.name.choice & head_Implies.name.choice

The latter definition produces order insensitivity, as needed for the relaxed-form serialization.
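
The difference between the two definitions can be illustrated with a toy enumeration in Python. This is a sketch over child-name sequences, not a Relax NG validator; group and interleave are hypothetical helper names, and each operand is treated as a single child element.

```python
from itertools import permutations


def group(*children):
    """Toy version of the RNC ',' operator: exactly one child order."""
    return {tuple(children)}


def interleave(*children):
    """Toy version of the RNC '&' operator on single children: any order."""
    return set(permutations(children))


ordered = group("body", "head")       # Implies.main with ','
relaxed = interleave("body", "head")  # Implies.main with '&'

print(sorted(ordered))
print(sorted(relaxed))
```

Every sequence accepted by the ordered definition is also accepted by the relaxed one, which is consistent with the normal form being contained in the relaxed form.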

Similarly, prefix, infix and postfix operator notation are allowed in the relaxed-form serialization by using the interleave symbol in the definition of Atom.main. The only Node element definitions that do not follow the above template are those with simple content; for an example, see the final code block in the next section.

* The word type is overloaded in the discussion of XSD schema in RuleML and can take on the following meanings:

  1. the XSD type of an XML element, which may be complex-type or simple-type;
  2. the semantic type of an individual in the domain of discourse, used in the type attribute as defined in type_expansion_module.rnc;
  3. Types as a category of RuleML elements distinct from roles.

Because of this ambiguity, on this page, "Node" will be used instead of Type for meaning 3, to refer to elements such as Atom. Similarly, the word "role" has a number of different meanings, and "edge" will be used instead, to refer to elements such as op.

9.3.2 Edges

Edge element type definitions have a slightly different template. All but a few edges have neither attributes nor attribute-like children, other than the optional common attributes from the xml namespace, so they generally use a simpler pattern for the content of edge elements than that used for Node elements.

The edge type for positional rest variables, <repo>, is defined in repo_expansion_module.rnc following the typical schema design pattern for edge elements:

restOfPositionalArguments-edge.choice |= repo.edge.def
repo.edge.def =
   ## Annotation here
   element repo { repo.attlist? & repo.content }
repo.attlist &= commonInit.attlist?
repo.content |= repoTerm.choice
repoTerm.choice |= SequenceMarker.choice
repoTerm.choice |= Plex_repo-node.choice
SequenceMarker.choice |= Variable-node.choice
Plex_repo-node.choice |= notAllowed
Variable-node.choice |= notAllowed

The pattern VariableTerm.choice is a choice pattern for all elements that represent variables. The abbreviated English element name for this is Var, but we may add an alternate long-name element Variable, or elements named in different languages.

A number of alternate element names are implemented in long_name_expansion_module.rnc. For example, the following code:

Variable-node.choice |= Variable.Node.def
Variable.Node.def = element Variable { Var.attlist, Var.content }
 Var.Node.type &= notAllowed

causes the element Var to be optionally replaced by Variable. This is an example of an Adapter design pattern applied to Relax NG schema (Metsker and Wake, 2006).

9.3.3 Attributes

In module closure_expansion_module.rnc, the closure-att.choice pattern is defined following the schema design pattern for attributes:

closure-att.choice |= closure.attrib.def
closure.attrib.def =   
   ## Annotation here
   attribute closure { closure.value }
closure.value |= "universal" | "existential"

10 Alignment with Logic and Metalogic Theory

In the implementation of the Relax NG schema for RuleML 1.0, we started the process of aligning the syntax with specific semantics. A particular element name, such as "Implies" is not associated with a single semantics, but may be assigned a variety of semantics depending on the value of semantics-modifying attributes on the element or its ancestors [1]. But there are certain patterns that appear in the usage of terms and formulas that suggest a "taxonomy" of the syntactic structures. In the following two sections, we describe this taxonomy and discuss how it might relate to the semantics. A thorough alignment of syntax and semantics will be implemented in the second phase of this project.


10.1 Term and Operator Choice Patterns

The syntactic taxonomy of RuleML 1.0 terms is as follows:

  • Logical Terms
    • Literals: in all sublanguages, Data.
    • Simple Resources: in first-order sublanguages, Ind. In (syntactically) higher-order languages, Con.
    • Simple Constants: Skolem and Reify, as well as Resources and Literals.
    • Variables: Var
    • Compound Terms (may be constant and/or variable): in first-order sublanguages, Plex and Expr. In higher-order languages, Hterm.
    • More restricted versions of Plex are used in the rest variables, repo and resl.
  • Extra-logical Terms
    • In framehohornlog, the extra-logical terms Get and Set are introduced. Get is an extra-logical compound resource, while Set is an extra-logical compound term that is not a Resource.

Terms are used in the content model of a number of edges, including arg, repo, slot, resl, lhs, rhs, declare and oid. Further, these content models may be "context" dependent.

The usage patterns apply to the following language categories. The phrase "and up" refers to the partial order relation of the language "lattice"; "datalog, and up" means "any language that contains datalog". A "subtraction" (e.g. Data ~ Higher) means "any language that contains datalog but not hohornlog".

  • Ground: bindatagroundfact, and up
  • Bin: bindatalog, and up
  • Data: datalog, and up
  • Horn: hornlog, and up
  • Higher: hohornlog, hohornlogeq, and framehohornlogeq
  • Frame: framehohornlogeq

The following table @@@WILL PROBABLY BE MOVED TO BACKUP summarizes the term usage patterns for all RuleML 1.0 languages.@@@

Pattern Name Relation Literal Simple Resource Simple Constant Variable Compound Term Plex-repo Plex-resl Function Get Set
op in Atom Relation Ground ~ Higher Frame Frame Frame Frame Frame
oid AtomTerm Ground Ground Ground Bin Horn ~ Higher
arg in Atom AtomTerm Ground ~ Higher Ground ~ Higher Ground ~ Higher Bin ~ Higher Horn ~ Higher
slot-key slot-keyTerm Ground Ground Horn ~ Higher Frame
slot-filler AnyTerm Ground Ground Ground Bin Horn ~ Higher Frame Frame
declare VariableTerm Bin
rhs, lhs in Equal AnyTerm Horn Horn Horn Horn Horn ~ Higher Frame Frame
arg in Plex, Expr AnyTerm Horn ~ Higher Horn ~ Higher Horn ~ Higher Horn ~ Higher Horn ~ Higher
repo repoTerm Horn ~ Frame Horn ~ Higher
resl reslTerm Horn Horn ~ Higher
op in Expr Function Horn ~ Higher
op in Hterm, Signature Relation Higher Higher Higher Higher Higher
arg in Hterm AnyTerm Higher Higher Higher Higher Higher Frame Frame
SlotProd in Get AnyTerm Frame Frame Frame Frame Frame Frame Frame
Set AnyTerm Frame Frame Frame Frame Frame Frame Frame

It is possible to further restrict content models. For example, nested functions could be disallowed by removing Expr from the content model of arg in Expr. This would be implemented by creating a new term choice pattern, ExprTerm.choice, that did not contain Expr. However, it is also necessary to apply the restriction to Plexes that may occur within Expr. Therefore, a new Plex pattern would need to be created, Plex_Expr, that used the ExprTerm content model. Because this additional flexibility is not needed for the existing RuleML languages, these extension points are not implemented in our schema at this point; users may implement them by redefinition within includes.

10.2 Formula Choice Patterns

In RuleML 1.0, the semantics of formulas is more specific than the semantics of terms, so the taxonomy of formulas, described below, is better developed than the taxonomy of terms described in the previous section. For example, the equivalence element is syntactic sugar for a pair of <Implies> elements with permuted premise and conclusion. Therefore the content model for the equivalence role, torso, is the greatest common pattern of the premise and conclusion patterns. Typically, the premise pattern is taken to be a generalization of the conclusion pattern, so that

 <Implies>
   <body> T </body>
   <head> T </head>
 </Implies>

is syntactically valid whenever T matches the conclusion pattern. Therefore we may define the torso content model as follows:

 torso.content = Conclusion.choice
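
The syntactic-sugar reading of equivalence can be illustrated with a short Python sketch that expands an Equivalent element into the corresponding pair of Implies elements. This is not part of the RuleML tooling: the function name expand_equivalent is hypothetical, the RuleML namespace is omitted for brevity, and we assume that Equivalent carries exactly two torso children, each wrapping one formula.

```python
import copy
import xml.etree.ElementTree as ET


def expand_equivalent(equivalent):
    """Return the two <Implies> elements that an <Equivalent> abbreviates.

    Assumption: <Equivalent> has exactly two <torso> children, each
    wrapping a single formula (namespace handling is left out).
    """
    torsos = [child for child in equivalent if child.tag == "torso"]
    if len(torsos) != 2 or any(len(torso) != 1 for torso in torsos):
        raise ValueError("expected two <torso> children with one formula each")
    left, right = torsos[0][0], torsos[1][0]
    rules = []
    # Permute premise and conclusion: left -> right, then right -> left.
    for premise, conclusion in ((left, right), (right, left)):
        implies = ET.Element("Implies")
        ET.SubElement(implies, "body").append(copy.deepcopy(premise))
        ET.SubElement(implies, "head").append(copy.deepcopy(conclusion))
        rules.append(implies)
    return rules


equivalent = ET.fromstring(
    "<Equivalent>"
    "<torso><Atom><Rel>p</Rel><Var>X</Var></Atom></torso>"
    "<torso><Atom><Rel>q</Rel><Var>X</Var></Atom></torso>"
    "</Equivalent>"
)
forward, backward = expand_equivalent(equivalent)
print(ET.tostring(forward, encoding="unicode"))
print(ET.tostring(backward, encoding="unicode"))
```

Each torso formula ends up once in a body and once in a head, so it has to match both the premise and the conclusion pattern; this is why the more restrictive conclusion pattern is used for torso.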

Simple formulas in RuleML 1.0 Relax NG languages are atomic formulas, with the option of equations.

@@@ In higher-order languages, Hterm replaces the atomic formula. In the frame languages, Equal, Hterm, InstanceOf, SubclassOf and SignatureOf are all simple formulas, and Atom is reintroduced with a new content model (slotted arguments only).@@@

  • Legend
    • A: atomic and up
    • GF: groundfact and up
    • GL: groundlog and up
    • D: datalog and up
    • H: hornlog and up
    • Dis: dishornlog and up
    • F: folog and up

The following table summarizes the formula usage patterns for the RuleML 1.0 languages in Relax NG. An asterisk (*) indicates that the formula belongs to that choice pattern whenever it is available.

Formula-Pattern Name Backbone Level Simple And And-Query Or Or-Query Implies, Equivalent Forall Exists Neg Naf Rulebase Entails
formula in And AndFormula GF * * * F F F * *
formula in Or OrFormula GF * * * F F F * *
body in Implies Premise GL * * * F F F * *
head in Implies Conclusion GL * F Dis F F F *
torso in Equivalent Conclusion GL * F Dis F F F *
formula in Forall ForallFormula D * F F * * F F
formula in Exists ExistsFormula D * * * F F * F
strong in Neg NegFormula A * F F F F F F
weak in Naf NafFormula A * F F F F F *
formula in Rulebase RulebaseFormula A * F F * * F *
head, body in Entails A *
formula in Asserts, Retracts AssertRetractFormula A * F F * * F * * *
formula in Query QueryFormula A * * * F F * * * * *
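
The "and up" reading of the legend can be sketched in Python as follows. The sketch assumes that the backbone levels form a single ascending chain in the order in which the legend lists them; the names BACKBONE and available_at are hypothetical and do not occur in the schemas.

```python
# Backbone abbreviations from the legend, in assumed ascending order of expressivity.
BACKBONE = ("A", "GF", "GL", "D", "H", "Dis", "F")


def available_at(cell, language_level):
    """True if a table cell marked `cell` applies to a language whose
    backbone level is `language_level` (i.e. `cell` "and up")."""
    return BACKBONE.index(language_level) >= BACKBONE.index(cell)


# A cell marked "Dis" applies to dishornlog and folog, but not to hornlog.
print(available_at("Dis", "F"), available_at("Dis", "H"))
```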

11 Transformation

11.1 Automatically-Generated Monolithic RNG by Jing

Using Jing with the simplify (-s) option, all RNG schemas may be transformed into monolithic, simplified Relax NG schemas, in the XML format. The transformations implemented by Jing's simplify option are based on ISO08, in principle. Flattening is performed through the substitution of include statements. This produces a monolithic schema in RNG format for each language, which may then be further transformed. Unreachable patterns are removed, and compound patterns (choice, interleave, sequence and group) are simplified to some extent. Unfortunately, Jing's simplify option also performs some transformations that are not specified in the standard, as discussed in #Simplification.

11.2 Automatically-Generated XSD by Trang

Using Trang, the simplified normal-form RNG schemas may be transformed into XSD schemas. The resulting XSD schemas are equivalent to the original RNC schemas, except for the following:

  • The xsi Attributes: XSD has magic attributes in the namespace http://www.w3.org/2001/XMLSchema-instance, conventionally associated with the prefix xsi (W3C04). These attributes are implicitly included in any XSD schema. There is no choice whether to allow them or not, because an XSD validator will accept these attributes on any element in an instance document, according to a built-in schema. In contrast, a Relax NG validator has no magic attributes, and handles the xsi attributes like any other; they are not allowed unless they are explicitly declared. We have explicitly declared some of these attributes in the modules explicit_datatyping_expansion_module.rnc (for xsi:type) and xsi_schemalocation_expansion_module.rnc (for xsi:schemaLocation). Unfortunately, the Trang application does not account for the possibility of explicit declaration within the http://www.w3.org/2001/XMLSchema-instance namespace, and attempts to translate an xsi attribute declaration into XSD, leading to an error. (This is a known issue in Trang; see #xsi_attributes.) Therefore, we do not include the aforementioned two modules in the normal-form schemas that will be translated by Trang into XSD.

11.3 Automatically-Generated Simplified Grammar by Jing

Using Trang, the monolithic RNG schemas generated by Jing may be transformed into monolithic, simplified RNC schemas, which serve as a compact, human-readable, and automatically-generated grammar-based specification of each language.


11.4 Batch Script

The following Windows batch script is used to perform all three of the transformations above for all named languages.

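rem Create the output directories if they do not exist yet.
if not exist "..\temp" md "..\temp"
if not exist "..\rsd" md "..\rsd"
rem The XSD output below is written to ..\xsng, so that directory is created as well.
if not exist "..\xsng" md "..\xsng"
if not exist "..\simplified" md "..\simplified"
for %%X in (*.rnc) do (
  rem Step 1, Jing: simplify the compact-syntax schema into a monolithic RNG file.
  java -jar "C:\Program Files\RelaxNGTools\jing-20091111\bin\jing.jar" -cs "%%X" > "..\temp\%%~nX.rng"
  rem Step 2, Trang: translate the monolithic RNG schema into XSD.
  java -jar "C:\Program Files\RelaxNGTools\trang-20091111\trang.jar" -o any-process-contents=strict -o any-attribute-process-contents=strict "..\temp\%%~nX.rng" "..\xsng\%%~nX.xsd"
  rem Step 3, Trang: translate the monolithic RNG schema into simplified RNC.
  java -jar "C:\Program Files\RelaxNGTools\trang-20091111\trang.jar" "..\temp\%%~nX.rng" "..\simplified\%%X_simplified.rnc"
)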

12 Validation and Verification

12.1 Validation

  • Relax NG schemas are validated using Jing, which is the only Relax NG validator available in oXygen.
  • Trang-generated XSD schemas are validated in oXygen, using Xerces, XSV, and Saxon-EE. This is necessary because Trang will sometimes produce invalid results from ambiguous patterns, although it does provide warning messages.
  • Official RuleML 1.0 XSD schemas are validated with XSV in oXygen. These schemas fail validation in Xerces and Saxon-EE due to circular definitions, indirect redefines, and invalid restrictions, but XSV does not enforce these.

12.2 Verification

  • The realized content models of the official RuleML 1.0 XSD schemas were generated for each sublanguage, and compared to the documented content models. In cases where these content models disagreed, the intended content model was reconstructed (from wiki pages and email discussions). In some cases the documentation was found to be incorrect, but the realized schema matched the intended content model. In other cases, the realized content model was found not to match the intended content model. Each case has been documented below.
  • The monolithic Relax NG schemas were corrected to meet the intended content model and re-translated to XSD. The resulting XSD schemas were then used (in oXygen) to generate XML example files for a test suite, to validate against the relaxed-form schemas in RNC format.
  • For cross-validation, the hand-written (modular) Relax NG schemas (normal form) were translated to XSD, and used to generate another test suite of XML example files, to validate against the (corrected) official schemas.

13 Examples

There are a few new features available from the relaxed form of the RuleML languages. These include unordered groups of a relation (Rel) and zero or more positional or slotted arguments, and skipping of only one of the two Implies edge elements (head or body).

13.1 SUMO Geography Translation

As a demonstration, I have translated a small portion of SUMO into RuleML that validates with folog_relaxed.rnc. Breaking up the conjunction in the conclusions would allow this example to be expressed with only datalog_relaxed.rnc, albeit more verbosely. The new syntactic sugar illustrated here includes infix operator notation and asynchronous unordered stripe-skipping in Implies, neither of which is available in the XSDs. To illustrate the diversity available in the relaxed serialization, the <head/> edge is explicit in the first rule and the <body/> edge is explicit in the second rule.

 <?xml version="1.0" encoding="UTF-8"?>
 <?oxygen RNGSchema="http://www.ruleml.org/1.0/relaxng/folog_relaxed.rnc" type="compact"?>
 <RuleML xmlns="http://www.ruleml.org/1.0/xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <Assert>
        <!-- SUMO Demonstration -->
        <!-- Reference: http://sigmakee.cvs.sourceforge.net/viewvc/sigmakee/KBs/Geography.kif -->
        <!-- Definition of Superclass of Latitude -->
        <!-- 
        (subclass Latitude Region)
        (names "latitude" Latitude)
        (names "parallel" Latitude)
        (synonymousExternalConcept "latitude" Latitude EnglishLanguage)
        
        (documentation Latitude EnglishLanguage "&%Latitude is the class of &%Regions, 
        associated with areas on the Earth's surface, which are parallels 
        measured in &%PlaneAngleDegrees from the &%Equator.")
        -->
        <Forall>
            <Var>X</Var>
            <Implies>
                <Atom>
                    <Rel>Latitude</Rel>
                    <Var>X</Var>
                </Atom>
                <head>
                    <Atom>
                        <Rel>Region</Rel>
                        <Var>X</Var>
                    </Atom>
                </head>
            </Implies>
        </Forall>
        <!-- Definition of Domain and Range of the Ternary Predicate objectGeographicCoordinates -->
        <!-- 
        (instance objectGeographicCoordinates TernaryPredicate)
        (domain objectGeographicCoordinates 1 Object)
        (domain objectGeographicCoordinates 2 Latitude)
        (domain objectGeographicCoordinates 3 Longitude)
        
        (documentation objectGeographicCoordinates EnglishLanguage 
        "(&%objectGeographicCoordinates ?OBJECT ?LAT ?LONG) means that 
        the &%Object ?OBJECT is found at the geographic coordinates 
        ?LAT and ?LONG.") 
        -->
        <Forall>
            <Var>X</Var>
            <Var>Y</Var>
            <Var>Z</Var>
            <Implies>
                <And>
                    <Atom>
                        <Rel>Object</Rel>
                        <Var>X</Var>
                    </Atom>
                    <Atom>
                        <Rel>Latitude</Rel>
                        <Var>Y</Var>
                    </Atom>
                    <Atom>
                        <Rel>Longitude</Rel>
                        <Var>Z</Var>
                    </Atom>
                </And>
                <body>
                    <Atom>
                        <Var>X</Var>
                        <Rel>objectGeographicCoordinates</Rel>
                        <Var>Y</Var>
                        <Var>Z</Var>
                    </Atom>
                </body>
            </Implies>
        </Forall>
    </Assert>
 </RuleML>
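
How a consumer might recover premise and conclusion from an Implies in which only one edge is skipped can be sketched in Python. This is an illustration only, not part of the RuleML tooling: the function names are hypothetical, other children of Implies (such as meta or oid) are not handled, and the case where both edges are skipped is deliberately left out.

```python
import xml.etree.ElementTree as ET


def local_name(element):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return element.tag.rsplit("}", 1)[-1]


def premise_and_conclusion(implies):
    """Return (premise, conclusion) of an <Implies> with at most one skipped edge."""
    explicit = {}
    skipped = []
    for child in implies:
        name = local_name(child)
        if name in ("body", "head"):
            explicit[name] = child[0]  # the formula wrapped by the explicit edge
        else:
            skipped.append(child)      # a formula whose edge element was skipped
    if len(explicit) == 2 and not skipped:
        return explicit["body"], explicit["head"]
    if len(explicit) == 1 and len(skipped) == 1:
        if "head" in explicit:
            # Explicit <head>: the remaining formula is the premise.
            return skipped[0], explicit["head"]
        # Explicit <body>: the remaining formula is the conclusion.
        return explicit["body"], skipped[0]
    raise ValueError("case not covered by this sketch")


first_rule = ET.fromstring(
    "<Implies>"
    "<Atom><Rel>Latitude</Rel><Var>X</Var></Atom>"
    "<head><Atom><Rel>Region</Rel><Var>X</Var></Atom></head>"
    "</Implies>"
)
second_rule = ET.fromstring(
    "<Implies>"
    "<And><Atom><Rel>Object</Rel><Var>X</Var></Atom></And>"
    "<body><Atom><Var>X</Var><Rel>objectGeographicCoordinates</Rel>"
    "<Var>Y</Var><Var>Z</Var></Atom></body>"
    "</Implies>"
)
premise_1, conclusion_1 = premise_and_conclusion(first_rule)
premise_2, conclusion_2 = premise_and_conclusion(second_rule)
print(premise_1.find("Rel").text, "=>", conclusion_1.find("Rel").text)
print(premise_2.find("Rel").text, "=>", local_name(conclusion_2))
```

Because the explicit edge identifies its own role, the position of the unwrapped formula does not matter, which is what makes the skipping unordered.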
 

13.2 User Extension: Query and KB Languages for OO jDREW

Relax NG schemas require a specification of the patterns that may unify with the start symbol, in contrast to XSDs where any global element may serve as the document root. With a minor change to the main module, we can specify that the children of the RuleML root must be Query elements. We simply replace

start = Node.choice | edge.choice

with

 start = RuleML-Query.def
 RuleML-Query.def = element RuleML { RuleML.attlist, RuleML.header, RuleML-Query.main }
 RuleML-Query.main |= Query-node.choice*

in the main module of any RuleML language. The Assert and Retract elements are still available for use in Reify. Similarly, a Knowledge-Base language can be defined which accepts only Assert or Retract elements as children of RuleML. Such schemas are useful for GUI applications, such as OO jDREW, to provide guided authoring, and may also be distributed separately from an application for use in XML authoring software such as oXygen.

14 Relax NG Drawbacks and Issues

14.1 Relax NG Restrictions

14.1.1 Simple or Complex Content, but Not Both

Relax NG does not allow the use of types such as 'xs:any', because this is considered poor design practice. The idea is that a pattern should have either simple content (e.g. xs:integer) or complex content, with child elements or mixed elements and text, but not both. In Relax NG, it is possible to create these two patterns separately and then allow a choice between them, as in this example:

 Data = Data.simple | Data.complex
Data_simple-node.choice |= Data_simple.Node.def
Data_simple.Node.def =
 element Data { Data_simple.attlist?,  Data_simple.Node.content }
Data_simple.attlist &= commonNode.attlist?
Data_simple.Node.content |=
 xsd:duration
 | xsd:dateTime
 | xsd:time
 | xsd:date
 | xsd:gYearMonth
 | xsd:gYear
 | xsd:gMonthDay
 | xsd:gDay
 | xsd:gMonth
 | xsd:boolean
 | xsd:base64Binary
 | xsd:hexBinary
 | xsd:float
 | xsd:decimal
 | xsd:double
 | xsd:anyURI
 | xsd:QName
 | xsd:NOTATION
 | xsd:string
 Data.complex = element Data { anyElement* & text }
 anyElement = element * { anyThing }
 anyThing = anyAttribute* & anyElement* & text
 anyAttribute = attribute * { token }

Separately, we can reproduce the XSD content model of <Data> allowing simple or complex content, but without the attribute xsi:type in Relax NG as:

DataTerm.choice |= Data_any-node.choice
Data_any-node.choice |= Data_any.Node.def 
Data_any.Node.def =
 element Data {(Data-datt.choice & reData.attlist) &
 text & anyElement.def* }
Data-datt.choice |= empty 
reData.attlist &= commonNode.attlist?
anyElement.choice |= anyElement.def
anyElement.def =
 element * {
   attribute * { text }*,
   (text & anyElement.def*)
 }

(See http://ruleml.org/1.0/relaxng/modules/data_any_content_expansion_module.rnc .)

In the actual implementation, we also have to list all of the possible XSD datatypes in the declaration of Data.simple. This schema cannot be translated directly to XSD by Trang (because it violates UPA), but a post-processing step to replace the Data element with the xs:any XSD declaration can be carried out manually, or possibly using XSLT.

Then we combine the simple and complex content as follows:

 Data = Data.simple | Data.complex

Note: there has been in the past a discussion of introducing a <Structure> element that would replace <Data> with complex content. Pros: @@@ Cons: @@@

14.2 Relax NG Features that are Not Allowed in XML Schema Documents (XSD)

14.2.1 Interleave (&)

  • The interleave symbol (&) is used to combine the patterns for sequences while maintaining the relative order of the original sequence elements.

For example a & b* matches (a b b), (b a b) and (b b a). The pattern a & (b*), or the multi-line

a & c
c= b*

are equivalent to a & b*. Thus, there is no simple way to protect a sequence in an interleave from having other elements inserted into the middle of it. If the pattern (a,b*) | (b*,a) is desired, then that is how it must be defined.
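
The behaviour of interleave can be reproduced with a small Python sketch that enumerates every merge of two sequences while keeping the relative order within each of them. The function name interleavings is ours, and the sketch models only the ordering aspect of the & operator, not Relax NG validation as a whole.

```python
def interleavings(xs, ys):
    """Yield every merge of xs and ys that keeps each sequence's own order."""
    if not xs or not ys:
        yield tuple(xs) + tuple(ys)
        return
    # Either the next item comes from xs ...
    for rest in interleavings(xs[1:], ys):
        yield (xs[0],) + rest
    # ... or it comes from ys.
    for rest in interleavings(xs, ys[1:]):
        yield (ys[0],) + rest


# a & b* with two occurrences of b
two_bs = set(interleavings(["a"], ["b", "b"]))
# a & (x, y): the sequence (x, y) is not protected from a being inserted
split_sequence = set(interleavings(["a"], ["x", "y"]))
print(sorted(two_bs))
print(sorted(split_sequence))
```

The second call shows the point made above: the merge (x a y) is among the results, so a sequence inside an interleave can have other elements inserted into the middle of it.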

Atom =
 element Atom {
   (attribute closure { "universal" | "existential" }?
    & attribute node { xsd:anyURI }?),
   (meta* & oid? & degree?),
   ((op | Rel)
    & (Skolem | Reify | Ind | Data | Var | Expr | Plex | arg)*
    & repo?
    & slot*
    & resl?)
 }

The Atom pattern above allows any number of slots and one slotted rest variable to appear before, after and in the middle of the positional argument sequence. However, the <meta>, <oid> and <degree> elements, if any, must appear before the (required) <op> or <Rel> element. There is no corresponding construction in XSD, because of the co-occurrence of an unbounded number of one pattern and a bounded number of another pattern within the interleave, as well as the order requirement. This is one reason that the relaxed-form serialization can only be approximately translated into an XSD schema.

  • The RNC definition c = a? & b? could be translated into XSD using xs:all, but Trang does not translate in this manner.

14.3 Jing Issues

14.3.1 Simplification

  • The Jing simplification option (-s) produces a monolithic, flattened RNG schema where abstract and unreachable patterns have been removed. We use this procedure to generate human-readable content models for our hand-written schemas. Unfortunately, the process also modifies the pattern names, making it more difficult to identify the context associated with context-dependent content models. For example, instead of retaining our names, And-query.Node.type and And-inner.Node.type for the different content models of 'And' within 'Query' and 'And' elsewhere, the patterns are named And and And_2.
    • Work-around: (would work for the normal-form only, and depends on commercial software)
      • translate with Trang to modular XSD,
      • flatten with oXygen tools,
      • use rngconv to convert back to RNG,
      • and finally Jing to RNC.
  • Another consequence of the name changes produced by Jing -s is the incorrect transformation of the construct we use to handle the xsi:type attribute. The code for this consists of a number of statements of the following form:

 namespace xsi = "http://www.w3.org/2001/XMLSchema-instance"
 datatypes xsd = "http://www.w3.org/2001/XMLSchema-datatypes"
 namespace xs = "http://www.w3.org/2001/XMLSchema"
 duration.datatype.choice |=
   attribute xsi:type { xsd:QName "xs:duration" },
   xsd:duration

When Jing transforms this code using the (-s) option, it modifies all the namespaces, so that the prefix xs: is no longer defined. Instead, the namespaces are explicitly declared in elements where they are needed. The translated code looks like:

<grammar xmlns="http://relaxng.org/ns/structure/1.0">
 ...
 <define name="Data">
   <element name="Data" ns="http://www.ruleml.org/1.0/xsd">
     <choice>
       <group>
         <attribute name="type" ns="http://www.w3.org/2001/XMLSchema-instance">
           <value type="QName" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" ns="http://www.w3.org/2001/XMLSchema">duration</value>
         </attribute>
         <data type="duration" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"/>
       </group>
       <group>
           ...
       </group>
     </choice>
   </element>
 </define>
</grammar>

However, Jing does not recognise the need to handle the xs: prefix because it appears in contents, rather than a component name. Without the declaration of this prefix, the schema is no longer valid. This bug in Jing has been reported. There are several ways this problem could be fixed:

  • Modify the schema after Jing -s application. Either of the following modifications could be made automatically with XSLT, although this is not yet implemented.
    • remove the xs: prefix from the attribute value in all declarations. This is something of a hack; it works only because the built-in datatypes have the same namespace as the default namespace of the simplified schema.
    • BETTER: declare the namespace in the value element, or a containing element. This has the advantage of being less intrusive, and would also be applicable to explicit datatyping with respect to user-defined datatypes.
  • Fix the software
    • The best solution from our standpoint has already been proposed (see jing/trang Issue 51) as a Trang option to flatten a schema by resolving include statements, but not transform further.
    • The next-best solution would be a patch to the Jing -s option so that it does not produce invalid code.

Temporary Solution: create a separate script for flattening the relaxed-form schemas that performs this modification after the Jing -s transformation to RNG, and before the Trang translation back to RNC. @@@To Do
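
As an illustration of the second schema modification (declaring the namespace in a containing element), the following Python sketch adds an xmlns:xs declaration to the root grammar element of a simplified RNG schema. It is a sketch under stated assumptions, not the script mentioned above: the function name declare_xs_prefix and the sample fragment are hypothetical, plain string processing is used instead of XSLT, and the root start tag is assumed to be written as <grammar ...> without a prefix.

```python
import re
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"


def declare_xs_prefix(rng_text):
    """Declare the xs: prefix on the root <grammar> start tag, if missing."""
    match = re.search(r"<grammar\b[^>]*>", rng_text)
    if match is None:
        raise ValueError("no <grammar> start tag found")
    start_tag = match.group(0)
    if "xmlns:xs=" in start_tag:
        return rng_text  # already declared, nothing to do
    closer = "/>" if start_tag.endswith("/>") else ">"
    patched = start_tag[: -len(closer)] + ' xmlns:xs="' + XS_NS + '"' + closer
    return rng_text[: match.start()] + patched + rng_text[match.end():]


# Hypothetical fragment of a simplified schema that still uses the xs: prefix
# inside a QName value without declaring it.
simplified = (
    '<grammar xmlns="http://relaxng.org/ns/structure/1.0">'
    '<start><element name="Data">'
    '<attribute name="type" ns="http://www.w3.org/2001/XMLSchema-instance">'
    '<value type="QName" '
    'datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">xs:duration</value>'
    '</attribute>'
    '<data type="duration" '
    'datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"/>'
    '</element></start></grammar>'
)
fixed = declare_xs_prefix(simplified)
print(fixed[:110])
```

Declaring the prefix on the root element leaves every value element untouched, which is why this variant is less intrusive than stripping the prefix from the attribute values.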

14.4 Trang Issues

14.4.1 Documentation

Double-hash comments that do not annotate a component disappear during Trang translation, and there is currently no mechanism to indicate that a comment should annotate the root (<xs:schema>) element of the Trang-generated XSD schema. This is not a known Trang issue @@@but we could raise it on the Jing/Trang list@@@.

14.4.2 xs:any

14.4.3 Modules

Trang attempts to reproduce the module structure of the Relax NG schema when translating into XSD. There are several aspects to this issue:

  • Sometimes it would be better to have a monolithic schema. There is no Trang option to request flattening, although this option is available from the command-line version of Jing. It is possible to flatten the XSD schema that Trang produces, but this only works if we avoid creating "import" statements. This required a hack in the Relax NG modules that only define attributes, as these attributes get assigned to the local namespace and this generates "imports" in the translation process. So I inserted some element declarations into these modules that are not in use now, but might be considered in the future: element versions of the attributes.
  • One Relax NG module can affect how another is translated. To illustrate, let's compare the declaration of the <op> element in bindatalog and dishornlog. In dishornlog, two types are used for <op>, one in atomic formulas (<Atom>) and another in expressions (<Expr>).

In bindatalog, the following code (ignoring annotations) appears in atom_module.xsd:

 <xs:element name="op" type="ruleml:op-Atom.content">
 </xs:element>

In dishornlog, the following code appears in atom_module.xsd:

 <xs:group name="op-Atom.role">
   <xs:sequence>
     <xs:element name="op" type="ruleml:op-Atom.content">
     </xs:element>
   </xs:sequence>
 </xs:group>

In the case of bindatalog, Trang has found a more efficient means to declare the <op> element because it notices that only one xsd:type is used, and the extension point op-Atom.role is also not utilized. So it deletes the op-Atom.role extension point altogether and simply declares <op> as a global element of xsd:type op-Atom.content. Thus the introduction in dishornlog_normal of a different xsd:type for the <op> element in the expressions module expr_module causes atom_module to be translated differently. As a result, different versions of certain modules, such as atom_module, must be kept; this is why we have separate directories of XSD modules for each sublanguage. This illustrates another reason for flattening the XSD schema, because there is no efficiency gain in retaining the modularization.

  • The directory structure of the Relax NG modularization is not preserved under translation. In order to reconstruct a modules directory, the auxiliary module files must be moved and the relative paths modified in the include statements of the main module. Fortunately, any errors in this procedure are quickly found by the validator, but it is still time-consuming, especially during verification where the Relax NG schema must be re-translated whenever any modifications are made. As a workaround, during the verification process Trang output is directed to a particular folder, and the test examples point to that local directory for the schema location. Once a final version has been uploaded, the test examples are rewritten to point to the online version.

14.4.4 xsi attributes

15 How to Validate with the RuleML Parameterized Relax NG Schema

This section demonstrates how to validate RuleML with the MYNG Parameterized Relax NG Schema.

15.1 Using Validator.nu

Validator.nu is a validation web service that can validate an XML instance against schemas, including Relax NG schemas and Namespace-Based Validation Dispatching Language (NVDL) scripts.

15.1.1 Instance File

  • You may provide the instance to be validated by URL address, file upload or pasting into a text field.
  • If using the URL address method, you will get best results if the HTTP header received from the address specifies the MIME type (such as application/xml or text/html) and charset parameter (such as utf-8) appropriate to your instance; otherwise, the results will contain some warning messages.
  • The RuleML examples directory contains a number of examples of valid and near-miss RuleML instances which are available for use as test cases.
  • When pasting into a text field, Validator.nu provides by default an html template. To validate directly against the RuleML schemas, delete this template and paste pure RuleML into the text field. An xhtml page with embedded RuleML may be validated with an NVDL schema, as we will show in Example 6.

15.1.2 Encoding

When using an instance specified by URL address or file upload, the option is available to specify an encoding, which may be different than the one specified by the HTTP header. The validator does not honor the encoding specified by the @encoding attribute of the XML prologue.

15.1.3 Schema

The RuleML Relax NG schema may be specified in several ways:

  1. A link specifying the name of a RuleML anchor language, such as http://deliberation.ruleml.org/1.02/relaxng/datalog_relaxed.rnc
  2. A link specifying the myng code of a RuleML language, such as http://deliberation.ruleml.org/1.02/myng-b7-d7-a3-l1-p3df-i5f-tf3f-q3-e0-s4b.rnc
  3. A link that calls the parameterized schema directly, such as http://deliberation.ruleml.org/1.02/relaxng/schema_rnc.php?backbone=x7&default=x7&termseq=x3&lng=x1&propo=x3df&implies=x5f&terms=xf3f&quant=x3&expr=x0&serial=x4b
  4. An NVDL script containing an NVDL rule that specifies validation against a URL, direct or redirected, to the PHP-driven parameterized schema. NVDL rules are available from the RuleML website that validate the following languages:
  5. A static copy of a driver schema created by the PHP-driven parameterized schema. The HTTP header should specify the MIME type as application/relax-ng-compact-syntax and charset parameter as UTF-8.
  6. A driver schema that includes the parameterized schema.
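
The relationship between the second and third options can be sketched in Python. Comparing the two example links above suggests that each letter of the myng code corresponds to one query parameter of the parameterized schema, with the hexadecimal value prefixed by x. The mapping below is inferred from those two examples only and should be treated as an assumption; the function name myng_code_to_url is hypothetical.

```python
from urllib.parse import urlencode

BASE = "http://deliberation.ruleml.org/1.02/relaxng/schema_rnc.php"

# Letter-to-parameter mapping inferred from the example links (an assumption).
PARAMETERS = {
    "b": "backbone", "d": "default", "a": "termseq", "l": "lng",
    "p": "propo", "i": "implies", "t": "terms", "q": "quant",
    "e": "expr", "s": "serial",
}


def myng_code_to_url(code):
    """Translate a myng code such as 'myng-b7-d7-...' into a direct schema URL."""
    prefix, *fields = code.split("-")
    if prefix != "myng":
        raise ValueError("not a myng code: " + code)
    query = []
    for field in fields:
        letter, value = field[0], field[1:]
        query.append((PARAMETERS[letter], "x" + value))
    return BASE + "?" + urlencode(query)


url = myng_code_to_url("myng-b7-d7-a3-l1-p3df-i5f-tf3f-q3-e0-s4b")
print(url)
```

For the myng code of option 2, this reproduces the direct link shown in option 3.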

15.1.4 Other Fields

The other fields may be left at their default values, or modified as needed.

  • The Preset Schema field should be left as None.
  • Parser may be left as Automatically from Content-Type.
  • XMLNS filters are not needed to validate pure RuleML, but could be set to screen out other namespaces if the instance document contains some.
  • If you are unable to set up the HTTP header correctly, then the "Be lax about HTTP content-header" option may come in handy.
  • The Image Report is only relevant if you have images in your instance document.
  • We will turn on Show Source for the demonstrations.

15.1.5 Validator.nu Examples

15.1.5.1 Example 1 - Redirected Link to Parameterized Schema

Validation Success: Click Here to see a Validator.nu example attempting to validate a RuleML instance in the bindatalog language with the datalog relaxed schema, accessed via a redirected link to the parameterized schema. As expected, the validator succeeds: bindatalog is a sublanguage of datalog.

The actual URL that is accessed in this example is http://deliberation.ruleml.org/1.02/relaxng/schema_rnc.php?backbone=x7&default=x7&termseq=x7&lng=x1&propo=x3cf&implies=x7&terms=xf0f&quant=x7&expr=x0&serial=x4f. Redirections to the parameterized schema have been implemented for the original fifteen named RuleML sublanguages in the Deliberation family (except for the SWSL languages), from bindatagroundfact (binary Datalog with only ground facts) to naffologeq (First-Order Logic (FOL) with equality and weak negation), in both serializations (normal and relaxed form). A complete listing of these redirects is available at the website http://deliberation.ruleml.org/1.02/relaxng/.

Validation Failure: Click Here to see a Validator.nu example attempting to validate a RuleML instance in the bindatalog language with the bindatagroundfact relaxed schema, accessed via a redirected link to the parameterized schema. As expected, the validator finds errors: the RuleML instance is neither ground nor fact-only.

15.1.5.2 Example 2 - Direct Link to Parameterized Schema

Click Here to see a Validator.nu example validating a RuleML instance with a foreign namespace element. The foreign namespace is filtered out, and so is ignored by the validator.

15.1.5.3 Example 3 - NVDL

Click here to see a Validator.nu example validating a RuleML instance against an NVDL script that refers to the parameterized schema and also allows arbitrary elements from foreign namespaces. The NVDL script is:

 <?xml version="1.0" encoding="UTF-8"?>
 <rules xmlns="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0"
  xmlns:a="http://relaxng.org/ns/compatibility/annotations/1.0">
  <namespace ns="http://ruleml.org/spec">
    <validate
      schema="http://ruleml.org/1.0/relaxng/schema_rnc.php?backbone=x3f&amp;default=x7&amp;termseq=x7&amp;lng=x1&amp;propo=x3ff&amp;implies=x7&amp;terms=xf3f&amp;quant=x7&amp;expr=xf&amp;serial=xf"
      schemaType="application/relax-ng-compact-syntax"/>

  </namespace>
  <anyNamespace>
    <allow/>
  </anyNamespace>
 </rules>

15.1.5.4 Example 4 - Static Schema

There are several reasons why a user may want to validate against a static copy of the RuleML schema, including efficiency and offline validation. Click here to see a Validator.nu example validating a RuleML instance against a static copy of the default output of the parameterized schema.

The schema may be obtained by any of three methods:

  • use the RuleML #GUI to display a direct link to the PHP script, click on the link, and save the output to a file;
  • scrape the schema driver displayed on the GUI;
  • download one of the zip archives (one each for normal and relaxed form), and extract the schema driver file for any of the named sublanguages.

In all cases, the directory containing the schema driver must also contain the module directory, which is included in both of the zip archives.

15.1.5.5 Example 5 - Parameterized Schema via include

Users may wish to extend the RuleML syntax, and the Relax NG modular schema was designed to make such extensions as convenient as possible. Click Here for a Validator.nu example of an expansion of the RuleML parameterized schema by a Boolean operator for exclusive disjunction.

Four new modules are required:

  • xor_expansion_module.rnc contains the foundational definitions of the <Xor> element and the pattern for its <formula> edge element.
  • xor_stripe_skipping_expansion_module.rnc enables stripe-skipping for exclusive disjunctive formulas
  • xor_dis_expansion_module.rnc contains the expansion of the ConclusionFormula.choice pattern that should be included for languages with at least the expressive power of disjunctive Horn logic.
  • xor_fo_expansion_module.rnc contains the expansion of the patterns for Assert and Retract that should be included for languages with at least the expressive power of first-order logic.

The extended schema driver contains the include statements

 namespace rulemlx = "http://www.ruleml.org/1.0/ext"
 include "http://ruleml.org/1.0/relaxng/schema_rnc.php"
 include "http://ruleml.org/1.0/relaxng/modules-ext/xor_expansion_module.rnc" 
   inherit = rulemlx {start |= notAllowed}
 include "http://ruleml.org/1.0/relaxng/modules-ext/xor_stripe_skipping_expansion_module.rnc" 
   inherit = rulemlx {start |= notAllowed}
 include "http://ruleml.org/1.0/relaxng/modules-ext/xor_dis_expansion_module.rnc" 
   inherit = rulemlx {start |= notAllowed}
 include "http://ruleml.org/1.0/relaxng/modules-ext/xor_fo_expansion_module.rnc" 
   inherit = rulemlx {start |= notAllowed}

The first include statement includes the RuleML language naffologeq with the relaxed serialization. The other four include statements bring in the expansion schema modules.

15.1.5.6 Example 6 - xhtml with embedded RuleML

Click here

In keeping with the original purpose of Validator.nu, which is (X)HTML5 validation, we demonstrate the validation of RuleML that is embedded in the header section of an xhtml document in Example 6. NVDL is used to validate against three schemas, the xhtml Relax NG schema, the xhtml Schematron restrictions and a RuleML schema.

15.1.5.7 Example 7 - Validation of Relax NG schema in the XML-based syntax against the schema design pattern

Click here

In #Schema_Design_Pattern, a schema design pattern was introduced that ensures monotonicity of the language when expansion modules are freely mixed. Validator.nu can be used to validate a schema in the XML-based Relax NG syntax (RNG) against the meta-schema, also in the RNG syntax, that defines the schema design pattern. The meta-schema includes and redefines the standard RuleML schema, restricting the vocabulary of named patterns to three categories based on their suffixes:

  • Choice combine elements: suffixes (choice|main|content|value|datatype|sequence|defs)
  • Interleave combine elements: suffixes (attlist|header|notallowed)
  • No combine elements: suffix (def)
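
The suffix convention can be sketched as a small Python classifier that maps a named pattern to the combine method its suffix implies. This is an illustration of the naming rule only; the actual check is performed by the RNG meta-schema, and the function name combine_method is hypothetical. We assume that the suffix is the part of the name after the last period.

```python
CHOICE_SUFFIXES = {"choice", "main", "content", "value", "datatype", "sequence", "defs"}
INTERLEAVE_SUFFIXES = {"attlist", "header", "notallowed"}
NO_COMBINE_SUFFIXES = {"def"}


def combine_method(pattern_name):
    """Return 'choice', 'interleave' or 'none' according to the name's suffix."""
    suffix = pattern_name.rsplit(".", 1)[-1]  # assumed: text after the last period
    if suffix in CHOICE_SUFFIXES:
        return "choice"
    if suffix in INTERLEAVE_SUFFIXES:
        return "interleave"
    if suffix in NO_COMBINE_SUFFIXES:
        return "none"
    raise ValueError("suffix not covered by the design pattern: " + pattern_name)


for name in ("Conclusion.choice", "RuleML.header", "Data_any.Node.def"):
    print(name, "->", combine_method(name))
```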

16 References

  1. BOLEY, H.; A. PASCHKE and O. SHAFIQ. 2010. RuleML 1.0: The Overarching Specification of Web Rules. In: DEAN, M., J. HALL, A. ROTOLO and S. TABET, eds. Semantic Web Rules. Springer Berlin / Heidelberg, pp.162-178.
  2. GAO, S.; C.M. SPERBERG-MCQUEEN and H.S. THOMPSON. 2009. W3C XML Schema Definition Language (XSD) 1.1 Part 1: Structures (W3C Working Draft 3 December 2009) [online]. [Accessed May 11, 2011]. Available from World Wide Web: <http://www.w3.org/TR/xmlschema11-1/#cTypeAlternative>
  3. HIRTLE, D.; T. DEMA and H. BOLEY. 2006. The Modularization of RuleML [online]. [Accessed Feb 1, 2011]. Available from World Wide Web: <http://ruleml.org/modularization/>
  4. ISO. 2008. ISO/IEC 19757-2: Information Technology - Document Schema Definition Language (DSDL) Part 2: Regular-grammar-based validation - RELAX NG [online]. [Accessed May 15, 2011]. Available from World Wide Web: <http://standards.iso.org/ittf/PubliclyAvailableStandards/c052348_ISO_IEC_19757-2_2008(E).zip>
  5. MAKOTO, M. 2011. Re: Theory Question: sub-grammars and sub-languages [online]. [Accessed Apr 28, 2011]. Available from World Wide Web: <http://tech.groups.yahoo.com/group/rng-users/message/1345>
  6. METSKER, S. J. and W. C. WAKE. 2006. Design Patterns in Java [online]. Addison-Wesley.
  7. W3C. 2004. XML Schema Part 1: Structures [online]. [Accessed Dec 8, 2010]. Available from World Wide Web: <http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/>

17 Links

17.1 Version 0.91

17.2 Version 1.0
