The only distinct thing I had to do today was "This Week's Semantic Web"; after that it was the fairly open-ended task of getting tutorial material together for the Talis Platform. I lost most of the morning updating Eclipse on the laptops. This afternoon, while going through my sources, I ran across a post from Sean B. Palmer, which led me down a right old rabbit hole.
It turns out that the history of visited pages in Firefox is kept in a text file, using the Mork format. This format is rather painful. Sean pointed to a Perl parser (which contains amusing comments, btw), and Phil Wilson had done a bookmarklet too, though that doesn't work with recent FF. To me, both of these paths sounded like hard work.
But then I found a Python script, demork.py, that could convert the stuff into XML (dunno how sbp missed that). So I tweaked that to produce RDF/XML: mork2rdf.py. Here's a sample of the result: history.rdf. [PS. now I realise there should be some separation between the item and the person & their browsing history - next time...] You may notice I cleared my history earlier - it was a bit too big for experiments, and probably too embarrassing for publication. (Note that the script is now only for history.dat, not for any other Mork files - apparently Thunderbird uses the format for contacts, but I couldn't find an example locally, so I'll save that for another day.)
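For anyone wondering what the RDF/XML end of this looks like, here is a minimal sketch of turning already-parsed history entries into rss:item resources. It is not the code from mork2rdf.py: the function name, the dictionary keys (`url`, `title`, `last_visit`) and the use of dc:date for the visit time are all assumptions made for illustration, and it is written for Python 3.

```python
from xml.sax.saxutils import escape, quoteattr

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def history_to_rdfxml(entries):
    """Serialise history entries as RDF/XML, one rss:item per visited URL.

    `entries` is a list of dicts; the keys used here ("url", "title",
    "last_visit") are hypothetical, not taken from demork.py.
    """
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        '<rdf:RDF xmlns:rdf="%s" xmlns:rss="%s" xmlns:dc="%s">'
        % (RDF_NS, RSS_NS, DC_NS),
    ]
    for entry in entries:
        # quoteattr adds the surrounding quotes and escapes &, < and >
        lines.append("  <rss:item rdf:about=%s>" % quoteattr(entry["url"]))
        lines.append("    <rss:link>%s</rss:link>" % escape(entry["url"]))
        if entry.get("title"):
            lines.append("    <rss:title>%s</rss:title>" % escape(entry["title"]))
        if entry.get("last_visit"):
            # dc:date for the visit time is an assumption, not the post's model
            lines.append("    <dc:date>%s</dc:date>" % escape(entry["last_visit"]))
        lines.append("  </rss:item>")
    lines.append("</rdf:RDF>")
    return "\n".join(lines)
```

Escaping matters more than it looks: history URLs are full of ampersands, and one unescaped query string is enough to make the whole file unparseable.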
Yesterday, when I intended to do a bit more Erlang, I wound up instead playing with Python on the Platform, and got as far as posting data up to a store. So I couldn't resist trying the history RDF with that (the posting Python is just a first pass - after sleeping on it I decided the structural approach wasn't very good, but still the relevant method was easy enough to get at). I'd already done:
python mork2rdf.py history.dat > history.rdf
So after a bit of fiddling in Python I then set the auth credentials for my store (the Platform uses HTTP Digest) and did:
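from talis_platform import SimpleTalisGraph
# TALIS_STORE_URI, TALIS_USERNAME, TALIS_PASSWORD and TALIS_REALM are
# set beforehand to the store's details (not shown here)
# read the RDF/XML produced by mork2rdf.py
rdf_file = open("history.rdf", "rt")
data = rdf_file.read()
rdf_file.close()
graph = SimpleTalisGraph()
graph.set_credentials(TALIS_STORE_URI, TALIS_USERNAME, TALIS_PASSWORD, TALIS_REALM)
# POST the data to the store using HTTP Digest authentication
print graph.postWithDigestAuth(data)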
which responded:
URI : http://api.talis.com/stores/danja-dev1/meta
HTTP Error 204: No Content
That was what I was after: 204 is a success status, but urllib2 thinks anything other than a 200 is an error - go figure. Now I can run SPARQL queries over the stuff online from its endpoint.
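The posting step can be sketched without my talis_platform module, using only the standard library. This is a rough Python 3 equivalent (urllib.request standing in for urllib2), not the code I actually ran. The function names are made up for illustration, and sending the data as application/rdf+xml is an assumption about what the store expects. It also treats any 2xx status as success, so a 204 comes back as a plain status code rather than an error.

```python
import urllib.error
import urllib.request


def build_digest_opener(store_uri, username, password):
    """Return an opener that answers HTTP Digest challenges for store_uri."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means "use these credentials for whatever realm is offered"
    password_mgr.add_password(None, store_uri, username, password)
    handler = urllib.request.HTTPDigestAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


def build_rdf_post(store_uri, rdf_bytes):
    """Build a POST request carrying RDF/XML (content type is an assumption)."""
    return urllib.request.Request(
        store_uri,
        data=rdf_bytes,
        headers={"Content-Type": "application/rdf+xml"},
        method="POST",
    )


def post_rdf(opener, request):
    """Send the request and return the status code, accepting any 2xx."""
    try:
        with opener.open(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Older urllib2 raised for 204 and friends; treat those as success too
        if 200 <= err.code < 300:
            return err.code
        raise
```

Usage would be along the lines of `post_rdf(build_digest_opener(uri, user, password), build_rdf_post(uri, data))`, with `data` being the bytes of history.rdf.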
The data is now automagically merged with the beginnings of my Personal Graph, though I haven't got enough in there yet to make it particularly interesting/useful. But there's another handy RESTful interface which gives you RSS 1.0 vocab data, and as it happens I'd modelled the history resources as using rss:item. So I went to the search interface, entered "Python" and got the related material as feed data. Voila: ADD by subscription.
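Since the search results come back in the RSS 1.0 vocabulary, pulling the items out of them takes very little code. The following is a minimal Python 3 sketch under the assumption that the feed is ordinary RSS 1.0 in RDF/XML with rss:item elements directly under rdf:RDF. The function name is hypothetical and nothing here is specific to the Platform.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS_NS = "http://purl.org/rss/1.0/"


def items_from_rss10(feed_xml):
    """Return a list of dicts (about, title, link) for each rss:item.

    `feed_xml` is the feed document as bytes or str. Missing titles or
    links come back as None.
    """
    root = ET.fromstring(feed_xml)
    results = []
    # In RSS 1.0 the items are children of rdf:RDF, alongside rss:channel
    for item in root.findall("{%s}item" % RSS_NS):
        results.append({
            "about": item.get("{%s}about" % RDF_NS),
            "title": item.findtext("{%s}title" % RSS_NS),
            "link": item.findtext("{%s}link" % RSS_NS),
        })
    return results
```

This treats the feed as plain XML rather than as RDF, which is fine for a quick look but would miss the same data serialised differently; a proper RDF parser is the safer bet for anything serious.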
Although it means "This Week's SemWeb" will be late again, I reckon I'll file all this as structured procrastination.