A first taste of the schema.org carbonated soft drink

I recently realised that in my Seki project it makes sense for any exposed HTML to include its own description, amongst other reasons to support IKS-flavoured decoupled content management. I'll use RDFa because its mapping to RDF is more straightforward than that of HTML5 microdata, and its vocab coverage is more comprehensive than that of microformats. But given that I'm exposing this stuff, it also makes sense to have it understandable by as many consumers as possible, which pretty much means using schema.org vocabularies. (Straight RDF representations will also be available via conneg; there I might stick to existing well-known vocabs, see note below.)
My initial raft of use cases is around content that's (loosely) blog post-shaped, but even though schema.org has a section for blogging, it isn't immediately obvious how to express this. (Now would probably be a good time to revisit AtomOwl, which got left in a very complicated state; Atom-in-Schema.org would tick quite a lot of boxes.)
My typical item looks something like:
<http://hyperdata.org/Hello> a sioc:Post ;
	dc:date "2012-04-02T07:24:53.676Z" ;
	dc:title "Hello World!" ;
	sioc:content "My first post." ;
	foaf:maker [ foaf:nick "danja" ] .
Checking at the excellent schema.rdfs.org I found the following mappings pretty quickly:
schema:articleBody owl:equivalentProperty sioc:content .
schema:author owl:equivalentProperty foaf:maker .
sioc:content isn't quite right in my original, as that's meant to be plain text. Dave Beckett's planet:content is probably better: it's like the old RSS 1.0 content:encoded, except as a more sensible XMLLiteral. articleBody isn't perfect either, for my app or for that matter for a lot of RSS/Atom/blogging-like apps. A more generic content term would be better (it might be an articleBody, or it might be a description of the link or whatever; more on description in a mo).
Though I found near-enough mappings, the following suffer similar problems:
schema:name rdfs:subPropertyOf dc:title .
schema:datePublished owl:equivalentProperty dc:issued .
schema:Article rdfs:subClassOf sioc:Item .
name is one of those ultra-generic terms, alongside title and label, that are a mixed blessing: very easy to work with, but they don't offer very much information. For my purposes there isn't much to choose between them. datePublished seemed slightly more suitable than dateCreated or dateModified, though here I would have preferred to be able to use a more generic date, further qualifying only when necessary. Again, Article is a bit on the specific side: I want to be able to use this for things like a del.icio.us-style bookmark, and for that kind of coverage rss:item, sioc:Item and atom:Entry are all a bit closer. Which leaves:
foaf:nick rdfs:subPropertyOf schema:additionalName .
Near enough.
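A rough sketch of what applying these mappings to the example item might look like, in plain Python with triples held as tuples of strings. The names PREDICATE_MAP, TYPE_MAP and to_schema_org are made up for illustration, as is the blank node label. Note the assumptions: only the sioc:content and foaf:maker rewrites are backed by owl:equivalentProperty; the others (including dc:date to datePublished and sioc:Post to Article) are the "near enough" choices discussed above, not something the mappings strictly entail.

```python
# Hand-picked rewrites based on the mappings discussed above.
# Only the first two are owl:equivalentProperty; the rest are "near enough"
# choices and are assumptions of this sketch, not strict entailments.
PREDICATE_MAP = {
    "sioc:content": "schema:articleBody",
    "foaf:maker": "schema:author",
    "dc:title": "schema:name",
    "dc:date": "schema:datePublished",
    "foaf:nick": "schema:additionalName",
}
TYPE_MAP = {"sioc:Post": "schema:Article"}


def to_schema_org(triples):
    """Rewrite known predicates and types; pass everything else through."""
    out = []
    for s, p, o in triples:
        if p == "rdf:type":
            out.append((s, p, TYPE_MAP.get(o, o)))
        else:
            out.append((s, PREDICATE_MAP.get(p, p), o))
    return out


HELLO = "<http://hyperdata.org/Hello>"
post = [
    (HELLO, "rdf:type", "sioc:Post"),
    (HELLO, "dc:date", '"2012-04-02T07:24:53.676Z"'),
    (HELLO, "dc:title", '"Hello World!"'),
    (HELLO, "sioc:content", '"My first post."'),
    (HELLO, "foaf:maker", "_:maker"),  # hypothetical blank node label
    ("_:maker", "foaf:nick", '"danja"'),
]

translated = to_schema_org(post)
for triple in translated:
    print(" ".join(triple), ".")
```

A straight lookup like this is only reasonable because every term here has a one-to-one counterpart; anything that needs an extra node in between would need a rule rather than a table entry.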

Top-level terms

I think it would be very helpful if schema.org was a bit clearer about "top-level" terms. Right now Thing has description, name, image, url. Ok, not bad as a first pass against what's needed on the Web. But url is/should be redundant (though that's just my semweb prejudices), and there's a slight conflict between description and content-oriented terms like articleBody, which has the intermediate node of Article. (This isn't a new phenomenon: RSS history is littered with the wreckage of content vs. description, and higher up the architectural tree it's one of the features of httpRange-14.) Ok, maybe description is useful enough to leave alone; similarly, name is probably reasonable to cover the top level of label, title, name. image I suppose is fair enough, a pragmatic approach to something that could easily get messy if more WebArch was brought into the picture. I guess my recommendations, then, would be to add a term Item (for a generic Information Resource, a superclass of Article etc.) and date (for a superproperty of all dates).

Automatic mapping

I haven't yet decided whether to use the Web vocab or schema.org versions of the terms in my internal RDF; I suppose I could even use both. But my little experience above demonstrates that it's not yet obvious how to map across, even with these really common terms. If the starting point was something richer, the amount of work involved could easily explode. Some kind of automation is desirable, for the benefit of someone like me in the current situation, a publisher of semantically marked-up HTML who would like their material to connect with the Linked Data Cloud, or someone writing an app that consumes data across different vocabularies. A service (or two) springs to mind: give it a term and it responds with correspondences from other vocabs, or give it a lump of data and let it offer a translation to the preferred vocab(s)/format. There are at least two approaches to implementation: SPARQL CONSTRUCT and/or RDFS/OWL inference (in both cases the use of generic superclasses/properties could be useful). The front end could offer something like the Rich Snippets Testing Tool for authors, together with an open API for translation by app developers, to give a leg-up for integration/mashups. It would be nice if the good folks behind schema.org would consider throwing some resources in this direction.
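To make the "give it a term" idea a little more concrete, here is a minimal sketch of the lookup half of such a service, in plain Python. It is not a real reasoner: the function name correspondences and the labels "equivalent", "broader" and "narrower" are invented for this example, and the table only holds the six statements quoted earlier in this post. The one bit of inference it does is to treat owl:equivalentProperty as symmetric while keeping rdfs:subPropertyOf and rdfs:subClassOf directional.

```python
# The mapping statements quoted earlier, as (subject, relation, object).
MAPPINGS = [
    ("schema:articleBody", "owl:equivalentProperty", "sioc:content"),
    ("schema:author", "owl:equivalentProperty", "foaf:maker"),
    ("schema:name", "rdfs:subPropertyOf", "dc:title"),
    ("schema:datePublished", "owl:equivalentProperty", "dc:issued"),
    ("schema:Article", "rdfs:subClassOf", "sioc:Item"),
    ("foaf:nick", "rdfs:subPropertyOf", "schema:additionalName"),
]

# Relations that hold in both directions.
SYMMETRIC = {"owl:equivalentProperty", "owl:equivalentClass"}


def correspondences(term):
    """Return (label, other_term) pairs for a term.

    Labels (invented for this sketch):
      "equivalent" - the two terms are declared equivalent
      "broader"    - other_term is a super-property/class of term
      "narrower"   - other_term is a sub-property/class of term
    """
    found = []
    for s, rel, o in MAPPINGS:
        if rel in SYMMETRIC:
            if term == s:
                found.append(("equivalent", o))
            elif term == o:
                found.append(("equivalent", s))
        elif term == s:
            found.append(("broader", o))
        elif term == o:
            found.append(("narrower", s))
    return found


for term in ("sioc:content", "dc:title", "foaf:nick", "rss:item"):
    print(term, "->", correspondences(term))
```

Keeping the direction visible matters for translation: rewriting data from a term to a "broader" one is safe, while going to a "narrower" one (dc:title to schema:name, say) is a judgment call of the near-enough kind.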

Comments to G+ please


danja
2012-04-05T15:13:53+01:00
iks seki rdfa html schema.org semantic semweb rdf

Translating between Schema.org and existing RDF

I just heard about a mapping from selected schema.org terms to SIOC (including a little FOAF and DC). It's a handful of statements using RDFS and OWL.

As one of the people responsible for the RDF Review Vocabulary I thought I should take a look at what's needed there. It raises a couple of questions. As it happens, schema.org's model for reviews is a little more complex than our RDF vocab (after we went to all that trouble to keep it simple :), and a lot of the terms can't be mapped directly to well-known ones with RDFS/OWL.

e.g. in the RDF vocab:

<#something> :rating "5" .

using schema.org:

<#something> :reviewRating <#theRating> .
<#theRating> :ratingValue "5" .

So the first question is how best to express this? I think it's straightforward in SPARQL. For example, going from the RDF vocab to schema.org for the terms above, something like:

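# r: is the RDF Review Vocabulary, s: is schema.org
PREFIX r: <http://purl.org/stuff/rev#>
PREFIX s: <http://schema.org/>

CONSTRUCT {
  ?something s:reviewRating [ s:ratingValue ?value ]
} WHERE {
  ?something r:rating ?value
}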

Or would some particular rule language be more appropriate?
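For comparison, the same rule can be written procedurally. This is only a sketch in plain Python over tuples of strings, not a suggestion for how mappings should be published: the function name rating_to_schema_org and the blank node labels are invented, and it assumes r: stands for the RDF Review Vocabulary and s: for schema.org on both of the generated terms. Like CONSTRUCT, it emits only the template triples and mints a fresh intermediate node for each match.

```python
from itertools import count


def rating_to_schema_org(triples):
    """Mimic the CONSTRUCT rule: for every r:rating triple, emit an
    s:reviewRating link to a fresh node carrying s:ratingValue."""
    ids = count()
    out = []
    for s, p, o in triples:
        if p == "r:rating":
            node = f"_:rating{next(ids)}"  # hypothetical blank node label
            out.append((s, "s:reviewRating", node))
            out.append((node, "s:ratingValue", o))
    return out


result = rating_to_schema_org([
    ("<#something>", "r:rating", '"5"'),
    ("<#something>", "r:text", '"Not bad."'),  # ignored, as in the CONSTRUCT
])
for triple in result:
    print(" ".join(triple), ".")
```

The extra node is exactly what a plain RDFS/OWL statement between the two properties can't express, which is why a rule of some kind is needed here.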

The next question is how best to publish the mappings?

Where direct RDFS/OWL translation is possible, I think I'd be inclined towards including the mappings in the RDF vocab, or at least linking to them via an rdfs:seeAlso.

Where rules are necessary, as above, I really haven't a clue. A simple online reasoning service could be useful (via some pre-cooked SPARQL maybe), but again how would you express that in the vocab?

I doubt I'll have a chance to make the translations for Review in the near future (on one contract I'm 7 weeks past deadline, another is 3 weeks overdue, and I'm a couple of days overdue with those paper reviews...). But hopefully the lazyweb will be able to answer these questions in the interim.

That's a point - anyone fancy reimplementing lazyweb.org? It was very sweet, I think Ben only gave up on it due to lack of time to admin.

Oh yeah, and I still haven't implemented comments here yet, so for now please use email, Twitter, Facebook or Google+. (Ooh, the Google+ link works properly for comments, but I guess non-G+ users can't comment... let me know if you want an invite.)


danja
2011-07-21T19:23:30+01:00
sioc foaf schema reviews schema.org rdf mapping