Transliteration, Interpretation and AtomOWL

Ian Davis asked the question:

When creating RDF schemas based on existing data formats you soon hit the inevitable decision point - should you simply produce an RDF description of the syntactic format or interpret it and produce a semantic model of the format?

This was one of the puzzles with trying to express the Atom Syndication Format (RFC 4287) in RDF. (Ooh, that felt good - let me do that again: RFC 4287). A lot of time went into trying different versions (work from David Powell, Henry Story and Reto Bachmann-Gmür, amongst others). The latest version (this, I think) could maybe be described as a hybrid. It has characteristics of a transliteration, so an AtomOWL instance graph could be viewed as describing a specific Atom format document, or alternatively as an interpretation describing the entities and relationships that are defined in the Atom spec. It's a compromise which should reasonably fulfil several informal constraints:

  1. must be true to the Atom spec
  2. must be usable in RDF/OWL systems
  3. should be usable in a "degenerate" form

(The first point is really where the transliteration comes from).

Now you'd think being true to the Atom spec would be pretty straightforward. RFC 4287 had the influence of people who *know* related document specifications and are strong on web architecture. However, usefully describing all this in a model like RDF/OWL is far from trivial. Web services, dude. Issues such as the relationship between a resource and its representations, considered in the light of change over time in a multilingual environment where multiple encodings are possible…anyway, that spawned a whole fresh bunch of work that Reto was looking at (schema, probably not the latest).

Being usable in RDF/OWL systems isn't anything like so broad a problem. However, as it stands the schema is OWL Full and includes rules (neither of which I'm very comfortable with). But the subset of the schema without these parts probably captures enough for most purposes.

The degenerate form referred to is the one implemented in many (most?) aggregator tools. One of the significant aspects of Atom is that it introduces a fairly clean form of versioning. A specific entry may be updated over time. Most tools forget previous versions of an entry, but the information can be captured in Atom. The degenerate approach says this entry resource has this single piece of content, but in reality there may be multiple versions of a single entry. (This is where a key part of the rules comes in, to express a Combined Inverse Functional Property (CIFP) consisting of atom:id, atom:updated and atom:content). The AtomOWL for this will probably benefit from a good few more wash cycles, but seems to work basically ok. [If anyone can show me how to do CIFPs without rules, I'll be most grateful…]
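To make the CIFP idea a bit more concrete, here is a minimal sketch in plain Python rather than in RDF rules. It is not the AtomOWL rule set: the entry data, the node names and the two function names are made up for illustration, and comparing atom:updated values as strings assumes they are all written in the same UTC format.

```python
from collections import defaultdict

# Hypothetical entry descriptions. Each has a local node name plus the three
# properties that make up the combined key (atom:id, atom:updated, atom:content).
entries = [
    {"node": "_:a", "id": "tag:example.org,2005:1",
     "updated": "2005-12-01T10:00:00Z", "content": "First draft"},
    {"node": "_:b", "id": "tag:example.org,2005:1",
     "updated": "2005-12-02T09:00:00Z", "content": "Revised"},
    {"node": "_:c", "id": "tag:example.org,2005:1",
     "updated": "2005-12-02T09:00:00Z", "content": "Revised"},
]


def merge_by_cifp(entries):
    """Group nodes that agree on all three key properties.

    Nodes in the same group would be treated as the same entry version.
    """
    groups = defaultdict(list)
    for e in entries:
        key = (e["id"], e["updated"], e["content"])
        groups[key].append(e["node"])
    return dict(groups)


def degenerate_view(entries):
    """Keep only the most recent version per atom:id, as many aggregators do.

    Assumption: timestamps share one UTC format, so string order is time order.
    """
    latest = {}
    for e in entries:
        current = latest.get(e["id"])
        if current is None or e["updated"] > current["updated"]:
            latest[e["id"]] = e
    return latest


if __name__ == "__main__":
    for key, nodes in merge_by_cifp(entries).items():
        print(key, nodes)
    print(degenerate_view(entries))
```

Here _:b and _:c collapse into one version because all three values match, while _:a stays a separate, older version of the same entry. The degenerate view throws the older version away and keeps one piece of content per id.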

So, what of transliteration and interpretation? The approach taken here seems very close to that described by Harry Chen:

I first map the syntactic format of the data into RDF, and then build external inference rules to produce a more expressive semantic model of the original data.

Luckily the majority of the Atom semantic model can be mapped fairly directly onto simple entities/relationships. The difficult corners are the external models on which the Atom spec depends (WebArch++) and the twisty semantics around entry identity. (Come to think of it, I argued with a few folks over Atom's approach to id way back when. I thought at the time the use of an id URI to point to the representations of different resources (the content, link URI etc) was probably broken. I now reckon I was wrong about broken, but that ids could probably have been done in a little more straightforward fashion…)
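The two-step approach in the quote above (map the syntax first, then apply rules) can be sketched roughly like this. It uses only the Python standard library and plain tuples for triples. The property names such as atom:link and ex:alternate, the blank node labels and the single rule are assumptions for illustration, not the actual AtomOWL vocabulary or rules.

```python
import xml.etree.ElementTree as ET

# A small Atom entry in the RFC 4287 namespace. The values are invented.
ENTRY_XML = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>tag:example.org,2005:1</id>
  <title>Hello</title>
  <updated>2005-12-21T13:29:31Z</updated>
  <link rel="alternate" href="http://example.org/hello"/>
</entry>"""


def transliterate(xml_text):
    """Step 1: mirror the XML structure as triples, one per element or attribute."""
    root = ET.fromstring(xml_text)
    triples = [("_:entry", "rdf:type", "atom:Entry")]
    for i, child in enumerate(root):
        name = child.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
        if name == "link":
            node = f"_:link{i}"  # hypothetical blank node for the link element
            triples.append(("_:entry", "atom:link", node))
            for attr, value in child.attrib.items():
                triples.append((node, "atom:" + attr, value))
        else:
            triples.append(("_:entry", "atom:" + name, child.text))
    return triples


def interpret(triples):
    """Step 2: one illustrative rule layered on the transliterated triples.

    If an entry has a link whose rel is "alternate", infer a direct
    ex:alternate relationship from the entry to the link's href.
    """
    inferred = []
    for s, p, link in triples:
        if p != "atom:link":
            continue
        props = {p2: o2 for s2, p2, o2 in triples if s2 == link}
        if props.get("atom:rel") == "alternate" and "atom:href" in props:
            inferred.append((s, "ex:alternate", props["atom:href"]))
    return inferred


if __name__ == "__main__":
    syntactic = transliterate(ENTRY_XML)
    for triple in syntactic + interpret(syntactic):
        print(triple)
```

The transliterated triples stay close to the document, and the inferred triple is the sort of thing an interpreted model would state directly.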

One other point that might have relevance to the transliteration/interpretation choice is whether it will be necessary to produce data in the original format from RDF instantiations (maybe round-trip). I would guess a transliterated schema would make that considerably easier, but whether that would be worth the cost of a suboptimal representation is another matter. I'm optimistic that round-tripping with AtomOWL will be possible, though Henry and/or Reto had doubts (I forget on which specific point(s)).
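As a rough illustration of what a round-trip check might look like for the simplest case, here is a sketch that treats an entry as an unordered set of (property, value) pairs, writes it out as Atom XML and reads it back. The function names and data are hypothetical, and it only covers flat text elements. It says nothing about the harder cases (links, content types, versions) where the doubts presumably lay.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def to_atom(props):
    """Serialise a set of (name, value) pairs as a flat atom:entry element."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    # A graph has no inherent order, so sort just to get stable output.
    for name, value in sorted(props):
        child = ET.SubElement(entry, f"{{{ATOM_NS}}}{name}")
        child.text = value
    return ET.tostring(entry, encoding="unicode")


def from_atom(xml_text):
    """Read the flat child elements back into a set of (name, value) pairs."""
    root = ET.fromstring(xml_text)
    return {(child.tag.split("}", 1)[1], child.text) for child in root}


# Invented example data standing in for an RDF description of one entry.
original = {
    ("id", "tag:example.org,2005:1"),
    ("title", "Hello"),
    ("updated", "2005-12-21T13:29:31Z"),
}

if __name__ == "__main__":
    xml_text = to_atom(original)
    print(xml_text)
    print(from_atom(xml_text) == original)
```

Comparing sets rather than lists reflects the fact that an RDF graph does not record element order, so a check like this only works where order carries no meaning.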

[Danny]

Danny Ayers
2005-12-21T13:29:31Z
