Ian Davis asked the question:
When creating RDF schemas based on existing data formats you soon hit the inevitable decision point - should you simply produce an RDF description of the syntactic format or interpret it and produce a semantic model of the format?
This was one of the puzzles in trying to express the Atom Syndication Format (RFC 4287) in RDF. (Ooh, that felt good - let me do that again: RFC 4287.) A lot of time went into trying different versions (work from David Powell, Henry Story and Reto Bachmann-Gmür, amongst others). The latest version (this, I think) could perhaps be described as a hybrid. It has characteristics of a transliteration, so an AtomOWL instance graph could be viewed as describing a specific Atom format document, or alternatively as an interpretation describing the entities and relationships defined in the Atom spec. It's a compromise which should reasonably fulfil several informal constraints:
- must be true to the Atom spec
- must be usable in RDF/OWL systems
- should be usable in a "degenerate" form
(The first point is really where the transliteration comes from.)
Now you'd think being true to the Atom spec would be pretty straightforward. RFC 4287 had the influence of people who *know* related document specifications and are strong on web architecture. However, usefully describing all this in a model like RDF/OWL is far from trivial. Web services, dude. Issues such as the relationship between a resource and its representations, considered in the light of change over time in a multilingual environment where multiple encodings are possible… anyway, that spawned a whole fresh bunch of work that Reto was looking at (schema, probably not the latest).
Being usable in RDF/OWL systems isn't nearly so broad a problem; however, as it stands the schema is OWL Full and includes rules (neither of which I'm very comfortable with). But the subset of the schema without these parts probably captures enough for most purposes.
The degenerate form referred to is the one implemented in many (most?) aggregator tools. One of the significant aspects of Atom is that it introduces a fairly clean form of versioning: a specific entry may be updated over time. Most tools forget previous versions of an entry, but the information can be captured in Atom. The degenerate approach says this entry resource has this single piece of content, but in reality there may be multiple versions of a single entry. (This is where a key part of the rules comes in, to express a Combined Inverse Functional Property (CIFP) consisting of atom:id, atom:updated and atom:content.) The AtomOWL for this will probably benefit from a good few more wash cycles, but seems to work basically ok.
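To make the CIFP idea concrete, here is a minimal sketch of what such a rule does, written in Python over plain (subject, predicate, object) tuples rather than a real RDF store. It assumes the post's shorthand names atom:id, atom:updated and atom:content stand in for the actual AtomOWL property URIs, and that each version node carries at most one value for each of them. Two nodes that agree on all three values are inferred to be the same entry version (owl:sameAs); a node with the same atom:id but a different atom:updated stays distinct, which is precisely the versioning the degenerate view discards.

```python
from collections import defaultdict
from itertools import combinations

# Placeholder predicate names taken from the post's shorthand; the real
# AtomOWL schema uses its own property URIs.
CIFP_KEY = ("atom:id", "atom:updated", "atom:content")

def apply_cifp(triples, key=CIFP_KEY):
    """Rule for a combined inverse functional property: subjects that
    share a value for *every* property in `key` denote the same thing,
    so emit owl:sameAs links between them."""
    values = defaultdict(dict)  # subject -> {predicate: object}
    for s, p, o in triples:
        if p in key:
            values[s][p] = o  # assumes at most one value per key property
    groups = defaultdict(list)  # combined key value -> subjects
    for s, props in values.items():
        if all(p in props for p in key):  # a partial key identifies nothing
            groups[tuple(props[p] for p in key)].append(s)
    inferred = set()
    for subjects in groups.values():
        for a, b in combinations(sorted(subjects), 2):
            inferred.add((a, "owl:sameAs", b))
    return inferred

# Three version nodes for one entry id: _:v1 and _:v2 carry identical
# (id, updated, content); _:v3 is a later update of the same entry.
triples = [
    ("_:v1", "atom:id", "urn:uuid:1"),
    ("_:v1", "atom:updated", "2006-06-01T12:00:00Z"),
    ("_:v1", "atom:content", "Hello"),
    ("_:v2", "atom:id", "urn:uuid:1"),
    ("_:v2", "atom:updated", "2006-06-01T12:00:00Z"),
    ("_:v2", "atom:content", "Hello"),
    ("_:v3", "atom:id", "urn:uuid:1"),
    ("_:v3", "atom:updated", "2006-06-02T09:00:00Z"),
    ("_:v3", "atom:content", "Hello again"),
]
print(apply_cifp(triples))
# {('_:v1', 'owl:sameAs', '_:v2')}
```

OWL's owl:InverseFunctionalProperty applies to a single property, not a combination of them, which is why this ends up expressed as a rule.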
[If anyone can show me how to do CIFPs without rules, I'll be most grateful…]
So, what of transliteration and interpretation? The approach taken here seems very close to that described by Harry Chen:
I first map the syntactic format of the data into RDF, and then build external inference rules to produce a more expressive semantic model of the original data.
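A rough illustration of that two-stage approach, again as a Python sketch and not AtomOWL itself: the first function transliterates Atom XML into triples that simply mirror the element tree, and the second is an external rule that lifts each entry element into an entity with direct properties. The syn: and atom: predicate names and the blank-node labels are hypothetical placeholders chosen for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def transliterate(xml_text):
    """Stage 1: mirror the XML syntax as triples, one blank node per element."""
    triples = []
    counter = 0

    def walk(elem):
        nonlocal counter
        node = f"_:e{counter}"
        counter += 1
        local_name = elem.tag.split("}")[-1]  # drop the "{namespace}" prefix
        triples.append((node, "syn:element", local_name))
        text = (elem.text or "").strip()
        if text:
            triples.append((node, "syn:text", text))
        for child in elem:
            triples.append((node, "syn:child", walk(child)))
        return node

    walk(ET.fromstring(xml_text))
    return triples

def interpret(triples):
    """Stage 2: an external rule that turns each <entry> into an entity
    whose <id>, <updated> and <title> text become direct properties."""
    element = {s: o for s, p, o in triples if p == "syn:element"}
    text = {s: o for s, p, o in triples if p == "syn:text"}
    children = defaultdict(list)
    for s, p, o in triples:
        if p == "syn:child":
            children[s].append(o)
    model = set()
    for node, name in element.items():
        if name != "entry":
            continue
        model.add((node, "rdf:type", "atom:Entry"))
        for child in children[node]:
            child_name = element[child]
            if child_name in ("id", "updated", "title") and child in text:
                model.add((node, "atom:" + child_name, text[child]))
    return model

doc = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <id>urn:uuid:1</id>
    <updated>2006-06-01T12:00:00Z</updated>
    <title>Hello</title>
  </entry>
</feed>"""

for triple in sorted(interpret(transliterate(doc))):
    print(triple)
```

Keeping the syntactic triples around means the interpretation rules can be revised later without re-reading the original documents.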
Luckily, the majority of the Atom semantic model can be mapped fairly directly onto simple entities/relationships. The difficult corners are the external models on which the Atom spec depends (WebArch++) and the twisty semantics around entry identity. (Come to think of it, I argued with a few folks over Atom's approach to id way back when. At the time I thought the use of an id URI to point to the representations of different resources (the content, link URI etc.) was probably broken. I now reckon I was wrong about it being broken, but ids could probably have been done in a somewhat more straightforward fashion…)
One other point that might be relevant to the transliteration/interpretation choice is whether it will be necessary to produce data in the original format from RDF instantiations (perhaps round-tripping). I would guess a transliterated schema would make this considerably easier, but whether that would be worth the cost of a suboptimal representation is another matter. I'm optimistic that round-tripping with AtomOWL will be possible, though Henry and/or Reto had doubts (I forget on which specific point(s)).
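To show why transliteration makes the return trip easier, here is a sketch (Python standard library only; the syn: predicates are hypothetical) of a purely syntactic mapping that records everything needed to regenerate the XML, including child order, which has to be stated explicitly because an RDF graph is unordered. It deliberately ignores tail text, so mixed content such as inline XHTML would not survive; an interpreted model would presumably need considerably more bookkeeping to rebuild the original document.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import count

def transliterate(root):
    """Map an element tree to (s, p, o) triples that keep everything
    needed to rebuild it: tag, attributes, text and child position."""
    triples, ids = [], count()

    def walk(elem, index):
        node = f"_:e{next(ids)}"
        triples.append((node, "syn:tag", elem.tag))
        triples.append((node, "syn:index", index))  # order among siblings
        for name, value in elem.attrib.items():
            triples.append((node, "syn:attr:" + name, value))
        if elem.text and elem.text.strip():
            triples.append((node, "syn:text", elem.text.strip()))
        for i, child in enumerate(elem):
            triples.append((node, "syn:child", walk(child, i)))
        return node

    return walk(root, 0), triples

def rebuild(root_node, triples):
    """Inverse of transliterate(): regenerate the element tree."""
    props = defaultdict(list)
    for s, p, o in triples:
        props[s].append((p, o))

    def build(node):
        tag = next(o for p, o in props[node] if p == "syn:tag")
        elem = ET.Element(tag)
        kids = []
        for p, o in props[node]:
            if p.startswith("syn:attr:"):
                elem.set(p[len("syn:attr:"):], o)
            elif p == "syn:text":
                elem.text = o
            elif p == "syn:child":
                index = next(v for q, v in props[o] if q == "syn:index")
                kids.append((index, o))
        for _, kid in sorted(kids):  # restore sibling order
            elem.append(build(kid))
        return elem

    return build(root_node)

def canonical(elem):
    """Comparable view of a tree; ignores tail text and surrounding whitespace."""
    return (elem.tag, dict(elem.attrib), (elem.text or "").strip(),
            [canonical(c) for c in elem])

doc = ET.fromstring("""<feed xmlns="http://www.w3.org/2005/Atom">
  <id>urn:uuid:feed</id>
  <entry>
    <id>urn:uuid:1</id>
    <link rel="alternate" href="http://example.org/1"/>
    <updated>2006-06-01T12:00:00Z</updated>
  </entry>
</feed>""")

root_node, triples = transliterate(doc)
print(canonical(rebuild(root_node, triples)) == canonical(doc))
```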
[Danny]