Ian Davis asked the
question:
When creating RDF schemas based on existing data formats you
soon hit the inevitable decision point - should you simply
produce an RDF description of the syntactic format or interpret
it and produce a semantic model of the format?
This was one of the puzzles with trying to express the Atom
Syndication Format (
RFC 4287) in RDF.
(Ooh, that felt good - let me do that again:
RFC 4287). A lot
of time went into trying different versions (work from
David Powell,
Henry Story,
Reto
Bachmann-Gmür amongst others). The
latest version (
this,
I think) could maybe be described as being a hybrid. It has
characteristics of a transliteration, so an
AtomOWL instance graph could be
viewed as describing a specific Atom format document, or
alternately as an interpretation describing the entities and
relationships that are defined in the Atom spec. It's a compromise
which should reasonably fulfil several informal constraints:
- must be true to the Atom spec
- must be usable in RDF/OWL systems
- should be usable in a "degenerate" form
(The first point is really where the transliteration comes
from).
Now you'd think being true to the Atom spec would be pretty
straightforward. RFC 4287 had the influence of people who
*know* related document specifications and are strong on web
architecture. However, usefully describing all this in a model like
RDF/OWL is far from trivial. Web services, dude. Issues such as the
relationship between a resource and its representations when
considered in the light of change over time in a multi-lingual
environment where multiple encodings are possible… anyway,
that spawned a whole fresh bunch of work that Reto was looking at (
schema,
probably not the latest).
Being usable in RDF/OWL systems isn't anything like so broad a
problem. However, as it stands the schema is OWL Full and includes
rules (neither of which I'm very comfortable with). But the subset
of the schema without these parts probably captures enough for most
purposes.
The specific degenerate form referred to is the one implemented in
many (most?) aggregator tools. One of the significant aspects of
Atom is it introduces a fairly clean form of versioning. A specific
entry may be updated over time. Most tools forget previous versions
of an entry, but the information can be captured in Atom. The
degenerate approach says this entry resource has this single piece
of content, but in reality there may be multiple versions of a
single entry. (This is where a key part of the rules comes in, to
express a
Combined
Inverse Functional Property (CIFP) consisting of
atom:id,
atom:updated and
atom:content). The AtomOWL for this will probably
benefit from a good few more wash cycles, but seems to work
basically ok.
[If anyone can show me how to do CIFPs without rules, I'll be
most grateful…]
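To make the CIFP idea a bit more concrete, here's a minimal sketch in plain Python rather than RDF/OWL. It isn't how AtomOWL does it (that uses rules); it only illustrates the behaviour: nodes that agree on atom:id, atom:updated and atom:content together are treated as the same entry version, while the degenerate view keeps just the latest version per atom:id. The record layout, the blank-node labels and the function names are all made up for illustration, and comparing `updated` values as strings assumes every timestamp uses the same UTC format.

```python
from collections import defaultdict

# Hypothetical flat records standing in for RDF nodes; "node" is a blank-node label.
entries = [
    {"node": "_:a", "id": "tag:example.org,2006:1",
     "updated": "2006-01-01T00:00:00Z", "content": "First draft"},
    {"node": "_:b", "id": "tag:example.org,2006:1",
     "updated": "2006-01-02T00:00:00Z", "content": "Revised"},
    {"node": "_:c", "id": "tag:example.org,2006:1",
     "updated": "2006-01-01T00:00:00Z", "content": "First draft"},
]


def merge_by_cifp(records, keys=("id", "updated", "content")):
    """Group node labels whose values agree on every key (the combined key)."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[k] for k in keys)].append(rec["node"])
    return list(groups.values())


def degenerate_view(records):
    """Keep only the most recently updated record per atom:id.

    Assumption: timestamps share one UTC format, so string order is time order.
    """
    latest = {}
    for rec in records:
        current = latest.get(rec["id"])
        if current is None or rec["updated"] > current["updated"]:
            latest[rec["id"]] = rec
    return latest


same_nodes = merge_by_cifp(entries)   # _:a and _:c end up in one group
latest = degenerate_view(entries)     # one record per atom:id
```

Note that no single one of the three properties identifies a version on its own: atom:id is shared by every version of the entry, which is exactly why the combined key is needed.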
So, what of transliteration and interpretation? The approach
taken here seems very close to that
described by
Harry Chen :
I first map the syntactic format of the data into RDF, and
then build external inference rules to produce a more expressive
semantic model of the original data.
Luckily the majority of the Atom semantic model can be mapped
fairly directly onto simple entities/relationships. The difficult
corners are the external models on which the Atom spec depends
(WebArch++) and the twisty semantics around entry identity.
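The two-step approach Harry describes can be sketched roughly like this, again in plain Python with tuples standing in for triples. Step one mirrors the XML structure as it stands; step two applies one hand-written rule to get something closer to an entity/relationship view. The `atom:`-prefixed predicate names, the `ex:alternate` predicate and the blank-node labels are invented for the example and are not AtomOWL terms.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

doc = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>tag:example.org,2006:1</id>
  <title>Hello</title>
  <updated>2006-01-01T00:00:00Z</updated>
  <link rel="alternate" href="http://example.org/hello"/>
</entry>"""


def transliterate(xml_text):
    """Step 1: mirror the XML structure as (subject, predicate, object) tuples."""
    entry = ET.fromstring(xml_text)
    subject = "_:entry"
    triples = []
    for i, child in enumerate(entry):
        name = child.tag.replace(ATOM, "atom:")
        if child.attrib:
            # Elements with attributes become their own node (hypothetical label).
            node = f"_:{name.split(':')[1]}{i}"
            triples.append((subject, name, node))
            for attr, value in child.attrib.items():
                triples.append((node, f"atom:{attr}", value))
        else:
            triples.append((subject, name, (child.text or "").strip()))
    return triples


def interpret(triples):
    """Step 2: one hand-written rule over the syntactic triples.

    If an entry has a link node with rel=alternate and an href,
    assert (entry, ex:alternate, href).
    """
    inferred = []
    for s, p, o in triples:
        if p == "atom:link":
            props = {p2: o2 for s2, p2, o2 in triples if s2 == o}
            if props.get("atom:rel") == "alternate" and "atom:href" in props:
                inferred.append((s, "ex:alternate", props["atom:href"]))
    return inferred


syntactic = transliterate(doc)
semantic = interpret(syntactic)
```

The syntactic triples stay close enough to the document that writing them back out as XML looks feasible, while the inferred triple is the kind of thing an RDF consumer would actually want to query.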
(Come to think of it, I argued with a few folks over Atom's
approach to id way back when. I thought at the time the use of an id
URI to point to the representations of different resources (the
content, link URI etc) was probably broken. I now reckon I was
wrong about broken, but that ids could probably have been done in a
little more straightforward fashion…)
One other point that might have relevance to the
transliteration/interpretation choice is whether it will be
necessary to produce data in the original format from RDF
instantiations (maybe round-trip). I would guess a transliterated
schema would be considerably easier, but whether that would be
worth the cost of a suboptimal representation is another matter.
I'm optimistic that roundtripping with AtomOWL will be possible,
though Henry and/or Reto had doubts (I forget on which specific
point(s)).
[Danny]