XML in JSON Revisited

From RuleML Wiki

Author: Harold Boley
RuleML Technical Memo


State-of-the-art XML-to-JSON converters usually map XML elements to JSON objects whose key/value pairs represent element name/content pairs. Since such mappings can lose the order of child elements, these converters are not generally round-trippable. To address this issue, (compactifying and normalizing) order-preserving mappings to JSON arrays are proposed, reserving JSON objects for XML attributes. These mappings are exemplified with RuleML/XML instances and explored in a PSOA RuleML/XML use case.

1 Introduction

Since XML and JSON are widely used data formats, their accurate conversion is a foundation for all higher levels of data/knowledge exchange and integration. As introduced in RuleML in JSON, our goal has been to convert XML (specifically, RuleML/XML) instances to JSON (specifically, RuleML/JSON) instances in a round-trippable fashion, i.e., without information loss, so that each source instance can be recovered from its target instance by an inverse converter. The PSOA RuleML use case on Round-Tripping Between RuleML/XML and RuleML/JSON revealed some issues with (online-findable and -executable) state-of-the-art "XML to JSON" converters (and their often-provided "JSON to XML" counterparts). In particular, XML's syntactic order of child elements often carries semantic information, yet is frequently lost in JSON conversions. This motivates us to revisit fundamental data (re-)representation concerns and leads to a revised mapping method.

To exemplify, XML-to-JSON converters usually map an XML instance like

<Playlist>
  <Song>s1</Song>
  <Movie>m1</Movie>
  <Song>s2</Song>
</Playlist>

to a JSON instance like

{
  "Playlist": {
    "Song": [
      "s1",
      "s2"
    ],
    "Movie": "m1"
  }
}

losing the playlist's song-movie-song order. Their corresponding JSON-to-XML converters map this JSON instance to an XML instance like

<Playlist>
  <Song>s1</Song>
  <Song>s2</Song>
  <Movie>m1</Movie>
</Playlist>

with the s1-m1-s2 playing order changed to s1-s2-m1.
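The order loss can be reproduced with a minimal Python sketch of an object-centric converter. The function name naive_to_object and its grouping strategy are assumptions made for illustration; they are not taken from any particular converter, although many online tools behave similarly.

```python
import json
import xml.etree.ElementTree as ET

def naive_to_object(elem):
    """Object-centric sketch: child names become keys, repeated names are grouped."""
    children = list(elem)
    if not children:
        return elem.text  # leaf element: keep its text content
    obj = {}
    for child in children:
        value = naive_to_object(child)
        if child.tag in obj:
            # A repeated element name is merged under its existing key,
            # which discards its position relative to other siblings.
            if not isinstance(obj[child.tag], list):
                obj[child.tag] = [obj[child.tag]]
            obj[child.tag].append(value)
        else:
            obj[child.tag] = value
    return obj

xml_src = "<Playlist><Song>s1</Song><Movie>m1</Movie><Song>s2</Song></Playlist>"
root = ET.fromstring(xml_src)
result = {root.tag: naive_to_object(root)}
print(json.dumps(result))
```

Once both Song elements are grouped under a single key, the result no longer records that m1 was played between s1 and s2, so no inverse converter can restore the original order.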

The fundamental issue is that XML child elements can be type-like "nodes" or role-like "edges". While the first XML instance above uses node children of types Song and Movie, whose order is implicit in their textual positions, the converters apparently assume edge children, whose order would be explicit in their edge names (so they could be textually rearranged). This assumption would be fulfilled, e.g., with edge names intro, main, and finish in a (Node-edge-Node-)striped XML instance (adopting Java's (Class-method-Class-)capitalization convention) like

<Playlist>
  <intro><Song>s1</Song></intro>
  <main><Movie>m1</Movie></main>
  <finish><Song>s2</Song></finish>
</Playlist>

which XML-to-JSON converters usually map to an information-preserving JSON instance like

{
  "Playlist": {
    "intro": { "Song": "s1" },
    "main": { "Movie": "m1" },
    "finish": { "Song": "s2" }
  }
}

While RuleML/XML uses a fully striped normal form, analogous to this <Playlist> instance, it also uses a stripe-skipped compact form, analogous to the first <Playlist> instance above. Many other XML applications do not (only) use edge children, so an "edge-child assumption" cannot be generally made.

This calls for a different method, which is generally applicable. The current paper defines and exemplifies two order-preserving XML-JSON mappings, one leading to "compact" JSON, the other to "normal" JSON (unrelated to RuleML/XML's compact and normal forms). It then outlines observations about the mappings, also touching on XML and JSON themselves as well as on RuleML.

2 Compactifying Mapping

First, we define the following order-preserving compactifying XML-JSON mapping from XML elements to JSON arrays that {optionally} have a JSON object for {optional} XML attributes:

  1. The first JSON-array entry represents the (double-quoted) XML-element name
  2. The second JSON-array entry represents
    1. the XML element's attributes {in case there are any} if it is a JSON object (which must be "flat", i.e. without nested JSON structures)
    2. the XML element's first child if it is not a JSON object
  3. The remaining JSON-array entries represent the (remaining) ordered XML-element children

According to this mapping, the first (stripe-skipped) <Playlist> instance of the Introduction (Section 1) becomes

["Playlist",
  ["Song", "s1"],
  ["Movie", "m1"],
  ["Song", "s2"]]

The stripe-skipped XML instance enriched by attributes

<Playlist>
  <Song release="2014">s1</Song>
  <Movie duration="50 min" release="2018">m1</Movie>
  <Song>s2</Song>
</Playlist>

becomes

["Playlist",
  ["Song", {"release":"2014"}, "s1"],
  ["Movie", {"duration":"50 min", "release":"2018"}, "m1"],
  ["Song", "s2"]]

Likewise, the fully striped XML instance enriched by attributes

<Playlist>
  <intro><Song release="2014">s1</Song></intro>
  <main><Movie duration="50 min" release="2018">m1</Movie></main>
  <finish><Song>s2</Song></finish>
</Playlist>

becomes

["Playlist",
  ["intro", ["Song", {"release":"2014"}, "s1"]],
  ["main", ["Movie", {"duration":"50 min", "release":"2018"}, "m1"]],
  ["finish", ["Song", "s2"]]]
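The compactifying mapping can be sketched in a few lines of Python on top of the standard xml.etree.ElementTree module. The function name xml_to_compact is hypothetical, and the sketch assumes that whitespace-only text between elements is insignificant and that namespaces, comments, and processing instructions need not be handled.

```python
import xml.etree.ElementTree as ET

def xml_to_compact(elem):
    """Map an XML element to a compact JSON array: [name, {attrs}?, child, ...]."""
    array = [elem.tag]                    # entry 1: the element name
    if elem.attrib:
        array.append(dict(elem.attrib))   # entry 2: attributes, only if there are any
    if elem.text and elem.text.strip():
        array.append(elem.text)           # leading text content, if not just whitespace
    for child in elem:
        array.append(xml_to_compact(child))  # children in document order
        if child.tail and child.tail.strip():
            array.append(child.tail)      # text following a child (mixed content)
    return array

src = """<Playlist>
  <Song release="2014">s1</Song>
  <Movie duration="50 min" release="2018">m1</Movie>
  <Song>s2</Song>
</Playlist>"""
compact = xml_to_compact(ET.fromstring(src))
```

Passing compact to json.dumps would then yield the JSON text; the Python lists and dicts used here correspond one-to-one to JSON arrays and objects.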

According to the compactifying XML-JSON mapping, given the three RuleML/XML source instances in the "Advancements and Complete Examples" section of RuleML in JSON, the target instances become:

["Implies",
  ["And",
     ["Atom",
       ["Rel", "buy"],
       ["Var", "person"],
       ["Var", "merchant"],
       ["Var", "object"]],
     ["Atom",
       ["Rel", "keep"],
       ["Var", "person"],
       ["Var", "object"]]],
  ["Atom",
    ["Rel", "own"],
    ["Var", "person"],
    ["Var", "object"]]]

["Implies",
  ["if",
    ["And",
      ["formula",
        ["Atom",
          ["op", ["Rel", "buy"]],
          ["arg", {"index":"1"},
            ["Var", "person"]],			   
          ["arg", {"index":"2"},
            ["Var", "merchant"]],
          ["arg", {"index":"3"},
            ["Var", "object"]]]],
      ["formula",
        ["Atom",
          ["op", ["Rel", "keep"]],
          ["arg", {"index":"1"},
            ["Var", "person"]],			   
          ["arg", {"index":"2"},
            ["Var", "object"]]]]]],
  ["then", 
    ["Atom",
      ["op", ["Rel", "own"]],
      ["arg", {"index":"1"},
        ["Var", "person"]],			   
      ["arg", {"index":"2"},
        ["Var", "object"]]]]]

["Implies",
  ["if",
    ["And",
       ["Atom",
         ["Rel", "buy"],
         ["Var", "person"],
         ["Var", "merchant"],
         ["Var", "object"]],
       ["Atom",
         ["Rel", "keep"],
         ["Var", "person"],
         ["Var", "object"]]]],
  ["then", 
    ["Atom",
      ["op", ["Rel", "own"]],
      ["arg", {"index":"1"},
        ["Var", "person"]],			   
      ["arg", {"index":"2"},
        ["Var", "object"]]]]]

3 Normalizing Mapping

Next, we define the following order-preserving normalizing XML-JSON mapping from XML elements to JSON arrays that {always} have a {possibly empty} JSON object for {optional} XML attributes:

  1. The first JSON-array entry represents the (double-quoted) XML-element name
  2. The second JSON-array entry represents the XML element's attributes as an {empty, in case there are none} JSON object (which must be flat)
  3. The remaining JSON-array entries represent the ordered XML-element children

According to this mapping, the first (stripe-skipped) <Playlist> instance of the Introduction (Section 1) becomes

["Playlist", {},
  ["Song", {}, "s1"],
  ["Movie", {}, "m1"],
  ["Song", {}, "s2"]]

The stripe-skipped XML instance enriched by attributes

<Playlist>
  <Song release="2014">s1</Song>
  <Movie duration="50 min" release="2018">m1</Movie>
  <Song>s2</Song>
</Playlist>

becomes

["Playlist", {},
  ["Song", {"release":"2014"}, "s1"],
  ["Movie", {"duration":"50 min", "release":"2018"}, "m1"],
  ["Song", {}, "s2"]]

Likewise, the fully striped XML instance enriched by attributes

<Playlist>
  <intro><Song release="2014">s1</Song></intro>
  <main><Movie duration="50 min" release="2018">m1</Movie></main>
  <finish><Song>s2</Song></finish>
</Playlist>

becomes

["Playlist", {},
  ["intro", {}, ["Song", {"release":"2014"}, "s1"]],
  ["main", {}, ["Movie", {"duration":"50 min", "release":"2018"}, "m1"]],
  ["finish", {}, ["Song", {}, "s2"]]]
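A corresponding Python sketch of the normalizing mapping differs from a compactifying one only in that the attribute object is always emitted. As before, the function name xml_to_normal is hypothetical, whitespace-only text is assumed to be insignificant, and namespaces, comments, and processing instructions are not handled.

```python
import xml.etree.ElementTree as ET

def xml_to_normal(elem):
    """Map an XML element to a normal JSON array: [name, {attrs}, child, ...]."""
    # Entries 1 and 2: the element name and a (possibly empty) attribute object.
    array = [elem.tag, dict(elem.attrib)]
    if elem.text and elem.text.strip():
        array.append(elem.text)           # leading text content, if not just whitespace
    for child in elem:
        array.append(xml_to_normal(child))   # children in document order
        if child.tail and child.tail.strip():
            array.append(child.tail)      # text following a child (mixed content)
    return array

src = """<Playlist>
  <intro><Song release="2014">s1</Song></intro>
  <main><Movie duration="50 min" release="2018">m1</Movie></main>
  <finish><Song>s2</Song></finish>
</Playlist>"""
normal = xml_to_normal(ET.fromstring(src))
```

Because the attribute object always occupies the second position, a consumer can read the children of any array as the slice from the third entry onward without first inspecting the second entry.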

According to the normalizing XML-JSON mapping, given the three RuleML/XML source instances in the "Advancements and Complete Examples" section of RuleML in JSON, the target instances become:

["Implies", {},
  ["And", {},
     ["Atom", {},
       ["Rel", {}, "buy"],
       ["Var", {}, "person"],
       ["Var", {}, "merchant"],
       ["Var", {}, "object"]],
     ["Atom", {},
       ["Rel", {}, "keep"],
       ["Var", {}, "person"],
       ["Var", {}, "object"]]],
  ["Atom", {},
    ["Rel", {}, "own"],
    ["Var", {}, "person"],
    ["Var", {}, "object"]]]

["Implies", {},
  ["if", {},
    ["And", {},
      ["formula", {},
        ["Atom", {},
          ["op", {}, ["Rel", {}, "buy"]],
          ["arg", {"index":"1"},
            ["Var", {}, "person"]],			   
          ["arg", {"index":"2"},
            ["Var", {}, "merchant"]],
          ["arg", {"index":"3"},
            ["Var", {}, "object"]]]],
      ["formula", {},
        ["Atom", {},
          ["op", {}, ["Rel", {}, "keep"]],
          ["arg", {"index":"1"},
            ["Var", {}, "person"]],			   
          ["arg", {"index":"2"},
            ["Var", {}, "object"]]]]]],
  ["then", {}, 
    ["Atom", {},
      ["op", {}, ["Rel", {}, "own"]],
      ["arg", {"index":"1"},
        ["Var", {}, "person"]],			   
      ["arg", {"index":"2"},
        ["Var", {}, "object"]]]]]

["Implies", {},
  ["if", {},
    ["And", {},
       ["Atom", {},
         ["Rel", {}, "buy"],
         ["Var", {}, "person"],
         ["Var", {}, "merchant"],
         ["Var", {}, "object"]],
       ["Atom", {},
         ["Rel", {}, "keep"],
         ["Var", {}, "person"],
         ["Var", {}, "object"]]]],
  ["then", {}, 
    ["Atom", {},
      ["op", {}, ["Rel", {}, "own"]],
      ["arg", {"index":"1"},
        ["Var", {}, "person"]],			   
      ["arg", {"index":"2"},
        ["Var", {}, "object"]]]]]

4 Observations

Compared to object-centric mappings, the array-centric Compactifying Mapping (Section 2) and Normalizing Mapping (Section 3) have the following advantages:

  1. JSON-array-enabled order preservation for XML child elements within each parent element
  2. Natural alignment of the two main constructs of XML and JSON: Directly reflect the duality of XML elements and attributes by the duality of JSON arrays and objects
    • Rather than encoding both the attributes and children of an element, e.g. of <arg index="1"><Var>person</Var></arg>, as the keys of an object, e.g. in {"arg":{"@index":"1","Var":"person"}}, retain the distinction of attributes, within a nested object, from children, as nested arrays, e.g. in ["arg",{"index":"1"},["Var","person"]], also avoiding the need for an attribute-indicating reserved key prefix such as "@" (used above), "-", or "_"
  3. Instead of representing empty XML elements, e.g. <Playlist/>, as JSON objects with a key/value pair having a special value such as null or "", e.g. in {"Playlist":null}, represent them as singleton JSON arrays, e.g. as ["Playlist"]

The benefit of compact JSON is avoiding empty objects -- in the second JSON-array positions -- for all XML elements without attributes. The benefit of normal JSON is putting XML child elements always -- irrespective of possible attributes -- into the same JSON-array positions. To combine both benefits, compact JSON could be normalized before further processing.

By construction of the XML-JSON mappings, a compact JSON structure can be normalized by inserting an empty JSON object as the second entry of each JSON array whose second entry is absent or is not already a JSON object. Conversely, a normal JSON structure can be compactified by omitting each empty JSON object that occurs as the second entry of a JSON array. Since both of these JSON-JSON transformations apply on any level of a structure (its top level or arbitrarily nested), an "intermediate" JSON structure, which mixes compact and normal arrays across its levels, can be both normalized and compactified. For example, the intermediate JSON structure

["Playlist",
  ["Song", {}, "s1"],
  ["Movie", "m1"],
  ["Song", {}, "s2"]]

can be normalized to

["Playlist", {},
  ["Song", {}, "s1"],
  ["Movie", {}, "m1"],
  ["Song", {}, "s2"]]

and compactified to

["Playlist",
  ["Song", "s1"],
  ["Movie", "m1"],
  ["Song", "s2"]]
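Both JSON-JSON transformations can be sketched as short recursive Python functions over nested lists, dicts, and strings (the Python counterparts of JSON arrays, objects, and strings). The names normalize, compactify, and the helper _split are hypothetical.

```python
def _split(node):
    """Split an array into (name, attributes, children), tolerating either form."""
    name, rest = node[0], node[1:]
    if rest and isinstance(rest[0], dict):
        return name, rest[0], rest[1:]
    return name, {}, rest

def normalize(node):
    """Ensure every array has a (possibly empty) object as its second entry."""
    if not isinstance(node, list):
        return node  # text content is left unchanged
    name, attrs, children = _split(node)
    return [name, dict(attrs)] + [normalize(c) for c in children]

def compactify(node):
    """Omit the second entry of every array whose attribute object is empty."""
    if not isinstance(node, list):
        return node  # text content is left unchanged
    name, attrs, children = _split(node)
    head = [name, dict(attrs)] if attrs else [name]
    return head + [compactify(c) for c in children]

intermediate = ["Playlist",
                ["Song", {}, "s1"],
                ["Movie", "m1"],
                ["Song", {}, "s2"]]
normal = normalize(intermediate)
compact = compactify(intermediate)
```

Both functions are idempotent, and on structures in the target subset each one undoes the other, so compactify(normalize(x)) equals compactify(x).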

The XML-JSON mappings target a subset of JSON where

  1. each JSON array (representing an XML element) must be non-empty and can only use a string (representing the XML-element name) as its first entry
  2. each JSON object (representing attributes of an XML element) must be flat and is allowed only as the second entry of a JSON array
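The two conditions above can be checked mechanically. The following Python sketch uses hypothetical function names and assumes that non-array, non-object entries are strings representing XML text content, which is what the mappings produce.

```python
def _valid_entry(node):
    """Check one array entry: either text content or a well-formed element array."""
    if isinstance(node, str):
        return True  # text content
    # Condition 1: arrays are non-empty and start with a string (the element name).
    if not isinstance(node, list) or not node or not isinstance(node[0], str):
        return False
    rest = node[1:]
    # Condition 2: an object may occur only as the second entry and must be flat.
    if rest and isinstance(rest[0], dict):
        if any(isinstance(v, (dict, list)) for v in rest[0].values()):
            return False
        rest = rest[1:]
    # An object in any later position is rejected by the recursive call.
    return all(_valid_entry(child) for child in rest)

def in_target_subset(doc):
    """A whole document must be an element array, not bare text."""
    return isinstance(doc, list) and _valid_entry(doc)

ok = in_target_subset(["Playlist", {}, ["Song", {"release": "2014"}, "s1"]])
misplaced = in_target_subset(["Playlist", ["Song", "s1"], {"release": "2014"}])
nested = in_target_subset(["Song", {"release": {"year": "2014"}}, "s1"])
```

Here ok is true, while misplaced and nested are false because they violate condition 2 (an object outside the second position, and a non-flat object, respectively).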

While the mappings are invertible for this subset, a flat JSON object occurring where not allowed according to condition 2 (including at the top level) can still be non-invertibly encoded as an empty XML element 'wrapping' the attributes that represent the object's key/value pairs. Moreover, nested JSON objects can be represented with nested key/value (slotted) constructs of XML applications such as PSOA RuleML/XML's untyped (Top-predicate) frameships (rather than its corresponding framepoints, since JSON objects do not have object identifiers). Complementarily, nested JSON arrays can be represented with RuleML's nested single-dependent-tuple Plexes.

5 Conclusions

The two order-preserving XML-JSON mappings and their inverses can be implemented directly or by compositions with JSON-JSON normalization/compactification. For a larger XML test instance, see the complete RuleML/XML document datalogplus_min.ruleml, whose <Assert>-<Query>-<Retract> order should be preserved in JSON. Our main use case for XML-JSON mappings is PSOA RuleML/XML (with "...PSOA" instances available in exa subdirectories), whose specification includes Round-Tripping Between RuleML/XML and RuleML/JSON. This paper should enable invertible "deployment conversion" of RuleML/XML (instances, RNGs, XSDs, XSLTs) to RuleML/JSON while continuing schema development in RNC.
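As an illustration of a direct implementation of the inverse direction, the following Python sketch converts a JSON array in either compact or normal form back to XML using the standard xml.etree.ElementTree module. The function name json_to_xml is hypothetical, and the input is assumed to lie in the target subset described in Section 4.

```python
import xml.etree.ElementTree as ET

def json_to_xml(node):
    """Build an XML element from a compact, normal, or intermediate JSON array."""
    name, rest = node[0], node[1:]
    elem = ET.Element(name)
    if rest and isinstance(rest[0], dict):
        elem.attrib.update(rest[0])   # second entry: the attribute object
        rest = rest[1:]
    last = None  # most recently appended child element, if any
    for child in rest:
        if isinstance(child, str):
            # Text before the first child is the element's text;
            # text after a child is that child's tail.
            if last is None:
                elem.text = (elem.text or "") + child
            else:
                last.tail = (last.tail or "") + child
        else:
            last = json_to_xml(child)
            elem.append(last)         # children are appended in array order
    return elem

compact = ["Playlist",
           ["Song", {"release": "2014"}, "s1"],
           ["Movie", "m1"],
           ["Song", "s2"]]
xml_text = ET.tostring(json_to_xml(compact), encoding="unicode")
print(xml_text)
```

Since the children are appended in array order, the song-movie-song order of the playlist survives the conversion back to XML.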