Small Data

I'd just like to plant a little flag in the sand. Big Data seems to be the flavour of the month (and is undeniably extremely useful and interesting), but I've a gut feeling that might be symptomatic of not seeing the wood for the trees (or maybe vice versa).

I've not thought this through much, but surely any trends/correlations/relationships that are important enough to be of interest should be detectable without having to build a terabyte+ store? Rather than trying to capture as much raw data as possible up front, I suspect a more productive approach long-term will be to work with (maybe federated) crawler farms, with lots and lots of algorithms running in parallel over what they see. If there are appropriate training feedback loops in place, the shape of algorithms themselves could be treated as the results of the analysis.
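To make the "algorithms running over what they see" idea a bit more concrete, here's a minimal sketch of one such algorithm: a running Pearson correlation that keeps only a handful of numbers as state, so nothing the crawler sees has to be stored. The class name and the idea of feeding it (x, y) observations from a crawler are my assumptions for illustration, not something from an existing system.

```python
import math
from dataclasses import dataclass


@dataclass
class StreamingCorrelation:
    """Running Pearson correlation over (x, y) pairs, keeping O(1) state."""
    n: int = 0
    mean_x: float = 0.0
    mean_y: float = 0.0
    m2_x: float = 0.0  # running sum of squared deviations of x
    m2_y: float = 0.0  # running sum of squared deviations of y
    co_m: float = 0.0  # running sum of co-deviations of x and y

    def update(self, x: float, y: float) -> None:
        """Fold one observation into the running statistics, then forget it."""
        self.n += 1
        dx = x - self.mean_x  # deviation from the old mean of x
        dy = y - self.mean_y  # deviation from the old mean of y
        self.mean_x += dx / self.n
        self.mean_y += dy / self.n
        # Welford-style updates: old deviation times deviation from the new mean.
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.co_m += dx * (y - self.mean_y)

    def correlation(self) -> float:
        """Return the correlation so far, or NaN if it is not yet defined."""
        if self.n < 2 or self.m2_x == 0 or self.m2_y == 0:
            return float("nan")
        return self.co_m / math.sqrt(self.m2_x * self.m2_y)
```

Each crawler could carry many of these (one per hypothesis) and call update() on every page it visits; only the six numbers survive, not the pages.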

It could be argued that once you have accumulated a corpus of raw data you can subsequently throw whatever you like at it without having to get the raw data again. But that corpus will never be complete or truly fresh - as new data appears on the Web all the time. More critically, under normal circumstances you can never be sure you've got a dataset that contains a good sample representation covering whatever unknowns you're exploring. But crawlers can be directed to favour slices of the Web that contain information relevant to your hypotheses.

So, in the context of the Web, the Web itself should be the only big data needed. Which gives a neat parallel in the other sciences: reality itself is the only database you'll ever need :)

Ok, in the same way that Big Sites (like Wikipedia/dbPedia) add big value to the Web alongside lots of small pieces, loosely joined, the same no doubt goes for Big Data. But let's not forget the vice versa, a complementary Small Data approach.

Somewhat orthogonal to this, one way in which the Web is a game changer for data is that here the relationship between pieces of data (/documents) is at least as significant as those pieces of data stacked on top of each other. Link Rank is a special case, an aggregated, flattened view of link value. If topics and entities (i.e. things in general, people, places, concepts etc) and their interrelationships are inferred and/or explicitly named, it should expose some interesting facets of how human knowledge works.

Comment to G+ please.


danja
2012-01-30T10:04:06+01:00
algorithms federated ai science rdf data

Search plus Your World - fool's gold

For quite a while I've held the view that most current approaches to Web search are fundamentally flawed, because the best way to find something is not to lose it in the first place. But as the companies invested in search gradually get smarter in their use of person- and (to a lesser extent) thing-oriented data, rather than just word association (football) search results seem increasingly more focused. Google's approach in particular has grown increasingly like the model put forward in the Semantic Web initiative. Recently with G+ we see a big push to capture and exploit data associated with personal profiles (the FOAF domain) and brands (the GoodRelations domain, although maybe there's a role for an additional brand- rather than product-oriented vocab). With Rich Snippets and Schema.org there's a direct use of semweb technology (in a slightly mangled form - One True Ontology is a well-known antipattern to anyone that bothers to look at the literature).

In fact the "Your World" part of Search plus Your World (SPYW) can be seen as a reinvention of the most important part of Semantic Web technology, that of giving everything of significance a URL: people, places, things, concepts. Given that, you can start describing and leveraging relationships between those resources. To use a phrase I think originated around microformats, it's lower-case semantic web. Ok, behind the quality glitz of G+ profiles and pages this seems to have been done in a rather sloppy, ad hoc fashion, but that in itself is fine - whatever it takes. But where Google get it very wrong is by putting themselves at the heart of their system. Not only is semantic in lower-case, so is web. If you do a search with SPYW enabled, you're pointed straight back into the Google Empire. They are making themselves gatekeepers of the Web. Although there aren't any concrete entry barriers to this walled garden, by only signposting Google's footpaths in search results it's creating a system with the same characteristics as say AOL around 2000. From Google search being a vital accessory on the open Web, it's increasingly becoming a portal.

There is already a visible cost in practice to Google's echo chamber - if you want to re-find something one of your colleagues said the other day, sure SPYW is helpful. But if you're trying to do some original research, you don't want to be searching with Your World blinkers on - an engine without those preconceptions such as DuckDuckGo will be more useful.

This strategy I'd assert is doomed to failure for the same reason AOL's walled garden collapsed, to use another phrase I like to repeat, because no matter how big any single entity becomes, the rest of the Web will always be bigger. The focus on the user/Don't Be Evil thing is absolutely right to highlight the value of non-Google resources, although it does fall short by suggesting that the rest of the Web is just a handful of other companies [G+ link] i.e. Twitter, Facebook etc. Google's own long-term survival as a market leader is absolutely dependent on their respect of the Web at large.

So what should Google do? Re-read Steve Yegge's awesome rant [G+ link] for starters. Especially the bits about Platforms. G+ and Your World should be considered in this context - as a semantic (any case) Web (upper case) Platform. For example, Google's pages appear to be aimed at providing the canonical URLs for concepts (...lower-case). But there's already an excellent source of such URLs: Wikipedia. In itself Wikipedia only provides URLs of documents whose primary topic is the thing in question, but dbPedia is a well-established mapping based on best practices from thing identifiers to Wikipedia pages (e.g. <http://dbpedia.org/resource/Berlin> foaf:isPrimaryTopicOf <http://en.wikipedia.org/wiki/Berlin> . ). If a handful of students from obscure north-European universities (heh, sorry, just for the sake of contrast), with a little community support, can create and maintain - give the world - a service supporting all the concepts/things covered by Wikipedia, imagine what the mighty Google could achieve...
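As a rough illustration of how cheap that mapping is, here's a sketch that turns an English Wikipedia page URL into the corresponding thing identifier and emits the triple from the example. It assumes the simple naming convention visible in the Berlin example (same local name on both sides); the real dbPedia handles encoding, redirects and other languages, none of which is attempted here, and the function names are mine.

```python
from urllib.parse import urlparse

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
WIKI_PREFIX = "/wiki/"


def wikipedia_to_dbpedia(page_url: str) -> str:
    """Map an English Wikipedia page URL to a dbPedia thing URI (naive convention)."""
    parsed = urlparse(page_url)
    if parsed.netloc != "en.wikipedia.org" or not parsed.path.startswith(WIKI_PREFIX):
        raise ValueError("not an English Wikipedia page URL: " + page_url)
    return DBPEDIA_RESOURCE + parsed.path[len(WIKI_PREFIX):]


def primary_topic_triple(page_url: str) -> str:
    """Return a Turtle-style statement linking the thing to its document."""
    thing = wikipedia_to_dbpedia(page_url)
    return f"<{thing}> foaf:isPrimaryTopicOf <{page_url}> ."
```

The point of keeping thing and document apart is that the city of Berlin and the page about Berlin can then carry different properties (population versus last-modified date, say).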

To give a little example in the context of Personal Profiles, if I publish my definitive personal profile on my own domain (note Google already understands all the elements of this) then for queries for which "me" is the appropriate response, that page should be the first hit, not my G+ profile.

Another factor in the walled nature of G+ is the limited API. I'm sure features will be added to this in the near future, but I hope (probably unrealistically) they will use proper standards and follow known best practices. Going further into over-optimistic territory, I'll quote Tom Gruber (in an interview talking about how Siri works) :

A site that exposes RDF usually has an API that is easy to deal with, which makes our life easier. For instance, we use geonames.org as one of our geospatial information sources. It is a full-on Semantic Web endpoint, and that makes it easy to deal with. The more the API declares its data model, the more automated we can make our coupling to it.

What should we (as users and components of the Web) do? Well, basically what we're already doing...but trying not to be distracted by shiny things and keeping an eye on the long term - standards are good. When we publish data on the Web we need to consider the quality of the data first (i.e. make it 5 Star), seeing it as purely Google-fodder is missing the point.

Comments please [Google+ link, the irony is not lost on me :)]


danja
2012-01-28T12:59:52+01:00
google semweb rdf spyw

Establishing Logical Truth on the Web

Here's true and false

[http://purl.org/stuff/true and http://purl.org/stuff/false]

I'm sure they already had URIs somewhere before (http://dbpedia.org/resource/True is nearly there...) but it seemed a nice idea to give them some solid (?) semantics too, fortunately there's at least one media type available - so it's "true as in Javascript". Took a few minutes to set up to give that media type. Tried PHP first but it doesn't seem properly configured on this server (which is weird, I'm sure I've got PHP stuff live). Anyhow, Apache2 config for hyperdata included mod_python from who-knows-when, so I used that.

true.py is:

from mod_python import apache

# Handler for the "true" resource: reply with the JavaScript literal true.
def index(req):
    req.content_type = "application/javascript"
    return "true"

with .htaccess (same dir) as:

RewriteRule ^true$ true.py

- plus corresponding stuff for false.py.
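For anyone without mod_python to hand, the same behaviour can be sketched as a tiny WSGI application covering both resources. This is a stand-in under my own assumptions (the function name and the /true and /false paths are hypothetical), not what's running behind the PURLs.

```python
def truth_app(environ, start_response):
    """Minimal WSGI app: /true and /false each answer with a JavaScript literal."""
    literals = {"/true": b"true", "/false": b"false"}
    body = literals.get(environ.get("PATH_INFO", ""))
    if body is None:
        # Anything else is neither true nor false as far as this app is concerned.
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    headers = [
        ("Content-Type", "application/javascript"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the standard library's wsgiref.simple_server.make_server("", 8000, truth_app) followed by serve_forever() should be enough.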

I don't like the way the PURL redirects, can that be done transparently I wonder, keeping the same URI in the address bar?


danja
2012-01-20T19:27:25+01:00
truth false logic uris true rdf

Different Modes of Browsing

Browsers have certainly evolved since the WorldWideWeb browser in 1990 into pretty sophisticated pieces of kit, supporting rich views of HTML and many other media types, along with a powerful version of code-on-demand through Javascript. But in certain respects they're still very primitive. It was probably unavoidable, but there's a significant conceptual gap between what the browser can do as a general-purpose tool and what it can do as a container for site-localized Web Applications. Take Gmail as an example of the latter - very much the same ballpark as desktop mail applications. But move away from that domain and all Gmail's functionality becomes inaccessible. We're still a way off a genuine Web of Applications.

One obstacle to maximizing the Webiness of Web Applications is found around the way buttons are used, directly mimicking the behaviour of desktop applications. But on the Web, the best affordances are associated with links (i.e. URIs + HTTP). In this context we should expect more of Web Applications - that the application should be built primarily as a Web API, i.e. a regular Web Site, so that the affordances are available to other applications. It should be trivial for me to check the contents of my Gmail inbox from the comfort of my own Home Page. However I'd hazard that the business models of the big Web brands are likely to hinder development in these directions - Google, Facebook, Amazon etc. run somewhat counter to the open Web, in that they are motivated to keep you in their domain (or in extreme cases like Apple and MS, in their own devices). Web Intents seem to me to be a good start towards enabling more flexible yet uniformly accessible interactions.

Over the years the read/write nature, or rather lack of it, has been discussed an awful lot. Even though the first browser included an in-place page editor, the current model still doesn't really support this. One big reason for this is that HTML - even HTML5 - falls short of supporting the full range of HTTP methods. The predominant approach to writing to the Web is through major intermediation by Content Management Systems. While CMSs are generally a very good idea, the fact that they're built on an effectively hobbled client means they aren't as Webby as they should be. There are genuine technical obstacles to generic writeability, notably those related to authentication and authorization, though hopefully WebID will help there.

The metaphor of the browser is itself quite limiting. Generally we only have one Web document open - visible on the screen - at a time because we can only read one thing at a time. Even with the development of tabs, the browser still essentially reflects this modal model of Web resources. I think I read about work on accessing data across tabs, but as far as I can see it doesn't exist yet. Ok, desktop applications are also under the restriction that we can only look at one thing at a time. But it's a lot more common to interact with multiple independent data sources/sinks and processing components there.

A browser can pretty much support a general-purpose HTTP client (through script), but because we're so used to thinking in terms of the Web of Documents and requirements there, the one page at a time modality is deep in the mindset. Service mashups, be they client or server-side, all really aim towards focusing down on a (primarily read-only) single-document view. A critical aspect is that traditionally the link has basically just one meaning - navigate to another page (do a GET and display the results). But while a link on a Web of Data could correspond to the same thing, it could also mean 'GET the data and merge it into the local store' or 'use this URI to filter the current view' or any number of pivot-like operations.
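Here's a rough sketch of what those different link meanings might look like as code, with the same URI doing three different jobs depending on the mode. Everything here is hypothetical: the LocalStore class, the mode names and the fetch callable (which stands in for an HTTP GET that returns triples) are just for illustration.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


class LocalStore:
    """A toy local store: just a set of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def merge(self, fetched: Iterable[Triple]) -> None:
        self.triples |= set(fetched)

    def filter_by(self, uri: str) -> Set[Triple]:
        # Keep triples that mention the URI as subject or object.
        return {t for t in self.triples if uri in (t[0], t[2])}


def follow(link_uri: str, mode: str, store: LocalStore,
           fetch: Callable[[str], Iterable[Triple]]) -> Set[Triple]:
    """Interpret one link in one of three ways and return the resulting view."""
    if mode == "navigate":   # classic behaviour: GET and display, store untouched
        return set(fetch(link_uri))
    if mode == "merge":      # GET the data and merge it into the local store
        store.merge(fetch(link_uri))
        return store.triples
    if mode == "filter":     # no GET at all: use the URI to filter the current view
        return store.filter_by(link_uri)
    raise ValueError("unknown link mode: " + mode)
```

The design point is that the link itself doesn't change, only the verb the user agent applies to it.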

Ok, this is in danger of turning into another rant...let me sidestep by highlighting one specific browser affordance.

The Turn-Around Button?

One link-oriented metaphor for the browser Back Button could be walking down a footpath, junctions in the footpath corresponding to the available links presented to us on a Web page. Clicking the back button has the effect of walking backwards to the previous junction. We are still facing in the same direction. But what if we metaphorically turn around? Ok, the outlinks on the page will look just the same, but other data is available, our whole history. Why not present the current page alongside a recent history page (like chrome://history/) so we can hop back further - a turn around button. Yes, the Back Button may drop down a list showing the history, but richer information could be provided in the main window such as the links followed as a tree.
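A minimal sketch of the data behind such a turn-around view: the history kept as a tree rather than a flat list, so that going back and then following a different link keeps both branches. The class and method names are made up for the purpose.

```python
class HistoryNode:
    """One visited page, remembering where we came from and where we went next."""

    def __init__(self, url, parent=None):
        self.url = url
        self.parent = parent
        self.children = []


class History:
    def __init__(self, start_url):
        self.root = HistoryNode(start_url)
        self.current = self.root

    def follow(self, url):
        """Follow a link from the current page, growing a new branch."""
        node = HistoryNode(url, self.current)
        self.current.children.append(node)
        self.current = node

    def back(self):
        """The usual Back Button: walk backwards to the previous junction."""
        if self.current.parent is not None:
            self.current = self.current.parent

    def turn_around(self):
        """Render the whole history as an indented tree, marking the current page."""
        lines = []

        def walk(node, depth):
            marker = " <- you are here" if node is self.current else ""
            lines.append("  " * depth + node.url + marker)
            for child in node.children:
                walk(child, depth + 1)

        walk(self.root, 0)
        return "\n".join(lines)
```

A flat history list would lose the abandoned branch after a back-and-follow; the tree keeps it, which is what makes hopping back further worthwhile.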

From a data perspective, variations on a Back Button might mean 'remove this data from the local store' or simply 'Undo'.

Dunno. Work needed on RDFAffordances.

See also: Identifying Applications State


danja
2012-01-20T12:29:32+01:00
intents applications metaphors actions web browsers rdf

Introducing dork

That's Descriptions of Runtime Klasses. Some simple Java for getting RDF out of code trees.

The RDF can be used to generate class diagrams, like this:

[image: class tree]

An interesting aspect of the Web Beep project is processor pipelines. To optimize things I needed to play with parameters easily so wound up building a system interface covering the processors and pipelines. As it stands in the source now, the configuration is set up from Java structures. But to see what the configuration is, a recursive toString() on the Java structures yields a fairly structured text description of the configuration (there's an example on the How It Works page).

This led me to think that if such descriptions could be used to describe existing configurations, they could also be used to set up those configurations. The format's ad hoc, so first it made sense to look at using something standard. The processor pipelines are essentially graphs (with annotations) so RDF was naturally the hammer I chose. The general processors/pipelines model is encoded (better word?) in the Java class structure, so if I could get that in RDF it'd be a good start. It's general-purpose stuff so I've split it off as a separate project at github and given it a silly name.

This kind of thing's been done before, in fact I'm hoping to incorporate David Huynh's doclet (for use with Javadoc to generate RDF) as well in the near future. But that approach gets its data 'statically' from the source, whereas the parameters at runtime are important for Web Beep's processors etc. I've made a start on the write-up with the code (ermm, Javadoc's todo :), but one key thing is just using a describe() method in the kind of places you might use a toString(). It should return a snippet of Turtle-syntax RDF describing the object in which it appears. I've also made a start on some easy-to-use utility methods that use reflection to extract a description of objects which doesn't rely on them having a describe() method, bit of a lighter touch.
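To show the shape of the idea without the Java, here's a Python analogue of the reflection-based utility: a describe() function that inspects an object at runtime and returns a Turtle snippet. It's only a sketch; the d: prefix, the example.org namespace and the flat one-level treatment of attributes are assumptions of mine, not how dork actually does it.

```python
def describe(obj, base="http://example.org/dork#"):
    """Return a Turtle snippet describing obj's class and its simple attributes."""
    class_name = type(obj).__name__
    lines = [f"@prefix d: <{base}> .", f"[] a d:{class_name}"]
    # vars() gives the instance attributes as they are right now, i.e. runtime state.
    for name, value in sorted(vars(obj).items()):
        if isinstance(value, bool):
            literal = "true" if value else "false"
        elif isinstance(value, (int, float)):
            literal = repr(value)
        else:
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            literal = f'"{escaped}"'
        lines.append(f"    ; d:{name} {literal}")
    return "\n".join(lines) + " ."
```

Because it reads the live object rather than the source, the parameter values that matter for a running pipeline end up in the description, which is the difference from the doclet approach.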

As a sanity check on the generated RDF I made a (pretty trivial) SPARQL query with XSLT transform to GraphViz dot format, the result of which can be used (with straightforward command-line tools) to generate images like the one above. [I remembered half way through that Redland's rapper utility can output dot format, but that's RDFy (see screenshot) and I'm after something much more app-specific.] There's a little script which shows how the image was arrived at.
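The last step of that chain (query results to dot) is simple enough to sketch directly. This isn't the SPARQL plus XSLT route described above, just a Python stand-in that takes (subject, predicate, object) rows, as a query might return them, and writes GraphViz dot text. The function name and the sample labels in the usage note are hypothetical.

```python
def triples_to_dot(triples, graph_name="classes"):
    """Render (subject, predicate, object) rows as a GraphViz digraph."""
    lines = [f"digraph {graph_name} {{"]
    for subject, predicate, obj in triples:
        # One edge per triple, labelled with the predicate.
        lines.append(f'  "{subject}" -> "{obj}" [label="{predicate}"];')
    lines.append("}")
    return "\n".join(lines)
```

Feeding it rows such as ("Pipeline", "subClassOf", "Processor") and piping the output through the dot command-line tool gives an app-specific diagram rather than a generic RDF graph.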


danja
2012-01-19T22:12:46+01:00
java dork dot diagrams class sparql rdf

Introducing JEdwards

JEdwards is a little sub-project I've just been putting together in Java. Screenshot.

It's so named for two reasons:

  1. it's roughly a contraction of "towards a Javascript editor"
  2. it's something you probably want to ignore (like twincest :)

Having said that, it does have a couple of features that may be of interest to sane developers:

  1. a Java terminal emulator (bash shell)
  2. syntax highlighting for SPARQL/Turtle

Neither is entirely finished, but both are useable/reusable (Apache 2 license, or somesuch).

[image: evil jedward]

I've been using Eclipse for most of my dev stuff for years now. When I was doing things in Node.js I wound up configuring it to have a file explorer pane, a text editor pane (for Javascript, HTML, Turtle or SPARQL) and three terminal panes all connected to the local shell. Eclipse was basically a (slow) sledgehammer to crack a nut. I did spend a while looking for a way of setting these things up using separate apps, but was beaten by the problem of pinning the windows to the workspace. I believe it should be possible using Devil's Pie or similar, but I had no joy. But as it happened I wanted a terminal emulator in Java anyhow and had played with syntax highlighting before.

In Scute I'd put together some basic highlighting for Turtle, except when I came to look at it again it was a bit too hardcoded to reuse, and Javascript is quite complicated... Looking around I came across jsyntaxpane, which is a pluggable highlighter which takes its config from a JFlex lexer. It'd got the necessary for Javascript, so I decided to use that instead of my hacky code. I found a SPARQL/Flex file on the Web that someone had prepared for IntelliJ IDEA which, although geared to do other things, saved me a bit of time writing out the SPARQL patterns. Here's sparql.flex.

For the terminal emulator I started with the JConsole UI from BeanShell, to which I've added the bits which talk to the bash shell. It works ok on this Ubuntu machine, I've no idea what would be needed to set it up for a different OS. The source for that is here.

I started Scute, a desktop RDF toolkit, just over a year ago. I did get some bits working fairly well - I was using the SPARQL bits for real - but then I got distracted and left it largely unusable... This JEdwards bit of coding has got me back into it, and tightened up how I was thinking about the dev process. I must write this up properly. The main idea is, while it should be built from reusable components, the way it's set up as a whole will be optimized for how I want to work. Somewhat inspired by woodcarving, where a lot of the time what's best isn't a general purpose tool (wood router or software IDE) but a highly focused tool (1/4" No.4 fishtail gouge or JEdwards). If the resulting code is useful for other people, great, but the motivation isn't to create a product, just to help my own personal workflow. Horse before cart dogfood.

The reusable components part comes from testing. I'm lazy about tests at the best of times, and Scute is all about GUI so is a bit tricky to test. But I reckon component-level functional tests make a fair substitute for unit tests. Anyhow, more about this another day.


danja
2012-01-18T19:12:14+01:00
scute terminal emulator jedwards sparql turtle syntax highlighter rdf

Listy Thing - aspirations

Speaking on the phone to my brother, I told him about the Listy Thing I've been working on, and he pointed me to workflowy. It's an outline/list todo thing that already does a big chunk of what I had in mind for Listy Thing (quite funny they've also got a 'y' on the end). The UI is awesome, which on the one hand is inspiring in demonstrating feasibility, on the other scary, showing how far I have to go.

It is basically what I'm after, only I want something backed by RDF so that more data can be associated with nodes (especially nodes which correspond to Web resources), the data can be reused, and many alternate views are possible.

I'm still a little stuck on the fundamental question of how best to represent lists, I guess I just have to try things out. Had some good suggestions on the G+ page - there's even an Ordered List Ontology.

The issue's a bit conflicted, because on the one hand useful ordering is generally tied to some particular property (e.g. dc:date) so the list structure can be generated on demand (via SPARQL or whatever), no additional ordering is needed in the data. But then as far as user experience is concerned, as a list is being put together the order can be totally arbitrary - i.e. there is an order, only we're not quite sure what it is yet. This might suggest using rdf:List as a general purpose mechanism.

I think I'll try some kind of low-cost property (with a numeric value). So a property, which after all is just another kind of resource, gets minted when the list is created in the UI. Ideally I suppose it'd be a bnode but a quasi-disposable URI will do. Dunno, give it an rdfs:label on the fly and associate it with user/date of creation?
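A quick sketch of how that low-cost property could work in practice: each list gets its own position property, items carry a numeric value for it, and the order is recovered by sorting. The example.org property URI and the helper names are hypothetical, and plain tuples stand in for a real triplestore.

```python
def order_items(triples, position_property):
    """Return the subjects that have position_property, sorted by its numeric value."""
    positions = {s: float(o) for s, p, o in triples if p == position_property}
    return sorted(positions, key=positions.get)


def position_between(before, after):
    """Pick a value that sorts between two neighbours, so moving an item
    in the UI only means rewriting one triple."""
    return (before + after) / 2.0
```

Using fractional values is just one way to avoid renumbering the whole list on every drag and drop; it will need an occasional tidy-up if the gaps get too small.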

I use the namespace http://purl.org/stuff# for "disposable" classes and properties (feel free to follow suit). They're Cool URIs in the sense that they'll always resolve (although I must add RDF docs to that URI), disposable in the sense that they appear in instance data but won't have any more definition.


danja
2012-01-16T09:25:24+01:00
lists rdf

1. HTML 2. RDF

I posted the other day a note-to-self re. Listy Thing, though I haven't done any coding on that since I have been puzzling over one particular bit, which boils down to : how best to represent lists in RDF. Ok, there's rdf:List and Wikipedia has quite a good definition of list, but that's not really the end of the story, there are a few practical considerations.

The conceptual model I want is just the usual thing of a finite ordered sequence of things, for sorting out my own lists: shopping, project resources and todo, bookmarks, list of people (friends & circles) etc. etc. The items in these lists will generally be either text or links (which fits nicely with RDF literals and resources)...or other lists. Ok, that just jumped into graph-land. Also the lists/items may be lots of other things as well - like a todo list might contain tasks (yeah, got an unfinished vocab for that as well). An item may be in multiple lists and so on. But the practical model I'm after is pretty much a direct reflection of HTML lists:

<ul>, <ol> and <li> to allow easy rendering/manipulation in the browser.

One of my favourite typed-item lists of all time is (appropriately enough) Enrico Franconi's Description Logics Course. I want to be able to put things like that together really easily and - the why RDF? part - reuse the data easily.

I've played with very similar stuff before with something I called XOW, XHTML Outlines for W6, where W6 was a simple vocab for adding just a bit of semantics to resources (addressing the questions who, why, what, when, where, how - I think it was Libby set me going down this path). Beh, loads of link rot around there, must fix - basically you could make lists of typed items in the browser, the result could be sent through XSLT to produce RDF.

I've come at it afresh from the motivation of wanting to sort out my own lists, sod any wider problems. The tools are much better this time around, but the modelling thing is still a quandary.

With a bit of googling I've found some good scripts to get started and have been noodling on the HTML side - drag and drop reordering of lists, with in-place editing. Current home: live here, in github here (note - dev branch of seki). It's not far off what I reckon I need.

Now there's the fun bit of expressing this material as pure data to stick it in a store and access via SPARQL (1.1). First pass at least I'll probably use XSLT in lots of places, the transformations should be pretty straightforward, but the SPARQL side is a bit tricky. Andy Seaborne has done a great post about lists with SPARQL Update, and as I'm using his Fuseki for storage it stands a good chance of working (heh).

But I want to be able to muck around with these things a lot, so I'm wondering whether it might be advantageous to also 1. overlay some old-fashioned RDF container stuff on the lists as well (i.e. rdf:Seq) and even 2. simple ordinal property values, something like :

:contains [ a :listitem; :value ; :inlist :position "43" ] .

Dunno, maybe this might help with matching quirks like those Andy mentions, "the empty list isn't any RDF triples, so looking for lists isn't just looking for rdf:rest properties", (rdf:nil keeps running away!) and with SPARQL 1.1 property paths, list elements do not necessarily come out in order.
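To see where those quirks come from, here's a small sketch of the rdf:List encoding itself: a chain of nodes linked by rdf:first and rdf:rest, terminated by rdf:nil. The blank node labels and function names are made up, and plain tuples stand in for a real store. Note what happens with an empty list: the head is just rdf:nil and there are no triples at all.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def build_list(items):
    """Return (head, triples) encoding items as an rdf:List of blank nodes."""
    triples = []
    head = NIL
    # Build from the tail so each node can point at the one after it.
    for index in range(len(items) - 1, -1, -1):
        node = f"_:n{index}"
        triples.append((node, FIRST, items[index]))
        triples.append((node, REST, head))
        head = node
    return head, triples


def walk_list(head, triples):
    """Follow rdf:rest from head to rdf:nil, collecting rdf:first values in order."""
    first = {s: o for s, p, o in triples if p == FIRST}
    rest = {s: o for s, p, o in triples if p == REST}
    items = []
    node = head
    while node != NIL:
        items.append(first[node])
        node = rest[node]
    return items
```

Order here only exists by walking the chain, which is why a query that grabs all the rdf:first values in one go has no reason to return them in sequence, and why an explicit position value on each item can be a handy extra.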

Funnily enough, lists in RDF seem to attract a lot of caveats - there's the old stuff about how 'weak' containers are, the lovely line in the RDFS spec: "Just as a hen house may have the property that it is made of wood, that does not mean that all the hens it contains are made of wood, a property of a container is not necessarily a property of all of its members.". In the same spec, the delightful: "RDFS does not require that there be only one first element of a list-like structure, or even that a list-like structure have a first element.". But if it walks like a duck and quacks like a duck then it's made of wood, hence a witch.

Suggestions very welcome, here's a G+ Page, see if that works for comments.

PS. also related is the Linked Data API stuff re. lists, see e.g. listvalued_props


danja
2012-01-13T20:10:36+01:00
lists html rdf

Hixie's Furniture

Too long; read later - here's a demo : SPARQL Sliders Test

+Ian Hickson posted a lovely semweb use case:

"I'd like a search tool for furniture that works like Google's Flight Search does for flights. That is, with sliders so I can say what type of furniture (table), what range of widths (1-2m), lengths (2-5m), and heights (1-2m), what material (wood), what thickness, what price range, etc, I'd like, with the list of available products updating in real time."

As it happens I wanted a slider thingy ages ago, so this was a good prompt to make a demo of the front end part which takes the values from slider components and uses them in a SPARQL query.

For convenience/lack of available data the demo runs against dbPedia via the SNORQL SPARQL Explorer. As furniture and its dimensions weren't available it uses cities and their populations and elevations.
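The core of the front end is just turning slider values into a query. Here's a sketch of that step in Python rather than the demo's browser-side script; the template, the function name and the choice of dbo:City, dbo:populationTotal and dbo:elevation as the terms to filter on are my assumptions about a reasonable query, not a copy of what the demo sends.

```python
QUERY_TEMPLATE = """PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?city ?population ?elevation WHERE {{
  ?city a dbo:City ;
        dbo:populationTotal ?population ;
        dbo:elevation ?elevation .
  FILTER (?population >= {pop_min} && ?population <= {pop_max})
  FILTER (?elevation >= {elev_min} && ?elevation <= {elev_max})
}}
LIMIT {limit}"""


def build_query(pop_range, elev_range, limit=20):
    """Fill the template from two (min, max) slider ranges."""
    # Coerce to numbers so nothing but a number can end up inside the query text.
    pop_min, pop_max = (int(v) for v in pop_range)
    elev_min, elev_max = (float(v) for v in elev_range)
    if pop_min > pop_max or elev_min > elev_max:
        raise ValueError("each range must be given as (min, max)")
    return QUERY_TEMPLATE.format(
        pop_min=pop_min, pop_max=pop_max,
        elev_min=elev_min, elev_max=elev_max,
        limit=int(limit),
    )
```

For furniture the template would swap in width, height, material and price terms, but the slider-to-FILTER step stays the same.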

So how would you get real data?

First of all, furniture vendors could either provide dumps of their data or, more Webby, mark up their sites with RDFa and/or HTML5 microdata using e.g. the GoodRelations e-commerce vocabulary.

Ultimately, for a front end like these sliders to work, the data would need to go in a store with a SPARQL endpoint. But, triplestores shouldn't be thought of as just a wacky alternative to a SQL database. A triplestore is just a cache of a little chunk of the Linked Data Web. The question of where the store resides and how the data is collected is entirely open. Following the more traditional DB model, a service might aggregate the data published by known furniture suppliers and provide the endpoint online.

But alternately, a local user agent (I think Chris Bizer had a little Java example, can't find the link...there are others) could crawl the Web to answer the query just-in-time. The advantage of this approach is that it's more thorough and the only real option for totally arbitrary queries, the downside being that its answer will probably take longer than milliseconds. But remember triplestores are caches, not every little bit of information would have to be discovered and read from every page. There are vocabs for dataset and vocab discovery (remind me of the acronyms please :). Note too that you're not limiting your client agent to a single datastore. Traditional backends (SQL or NoSQL) are effectively isolated silos, triplestores are integrated with the links of the Web.

Incidentally, this is something that might be nice to express as a Web Intent, along the lines of "make me a query from this template with these parameters and apply it to this endpoint, putting the results into this widget" (that's a bit verbose for a general-purpose intent, but you get the gist). c.f. RDFAffordances.




danja
2012-01-11T15:01:56+01:00
sparql demo goodrelations rdf hixie furniture

The Emperor's New Client

A wee rant.

Ok, I'm totally with the consensus that the future is Cloud-based, and to be a little more specific Platform-based and to be even more specific primarily HTTP-based. To back that up, cf.

But to expand something I mentioned in passing here recently :

in one respect the emperor is stark-bollock naked. Browsers are currently a really sucky environment for client development. Sure, the HTML/CSS-based (standard!) rendering is wonderful. As shown with Node.js (and despite what Google are saying around Dart), Javascript is a reasonably pleasant, perfectly capable programming language. The growth of Ajax and JSON has shown inter-system comms is workable. There are some good dev tools and libraries. So why does working with this stuff feel like pulling your own teeth?

Here I could point to the traditional DOM API, blame the W3C for all the world's ills and an awful lot of people would nod and smile knowingly. But although that's arguably valid (heh), I reckon the problem is more systematic and can mostly be blamed on browser developers.

Ok, blame is too strong. The decisions made over the years and the directions taken have generally been perfectly rational in the context of the prevailing conditions. But there have been feedback loops at work. The flashy [sic] chrome [sic] surrounding HTML dev, from the img tag onwards, has pulled Web developers in like moths around a flame. So the browser developers act to improve that experience. Meanwhile server-side tech has developed out of the corporate legacy of silo-based systems. Let me quote Steve Yegge there: "It's a big stretch even to get most teams to offer a stubby service to get programmatic access to their data and computations.". The way services are offered over the Web, even Web 2.0 services still have a big hangover from this mentality. I'd argue that most Web APIs are only marginally better than SOAPy stubs. Largely because XML and JSON aren't particularly Web-friendly. Ok, don't bite my head off, let me qualify that.

First XML. There have been plenty of arguments over the years around XHTML, and back in the day (I wonder how old that phrase is) there were arguments about the XML nature of RSS. Postel's Law, the "Robustness Principle" got cited a lot. Let me give you some deja vu:

Be liberal in what you accept, and conservative in what you send.

What a lot of people misinterpreted was the keyword robust. A robust system is one designed to be able to fail gracefully or continue working acceptably with noisy data. That's exactly what we want for the Web, right? Well not necessarily, if I was ordering a book from Amazon, and there was a partial failure, I'd rather they didn't make a best-guess when it came to taking money off my credit card (I think paraphrasing Tim Bray there). Anyhow, XML is not robust, by design. XML is designed to bail out completely at the first sniff of anything dodgy. As it happens, the way XML is often served on the Web is without proper regard for the media type, i.e. dodgy and hence broken.
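The difference is easy to see side by side using Python's standard library: the XML parser refuses a document with one unclosed tag, while the HTML parser shrugs and carries on. The helper names and the sample markup are mine; the two parsers are just convenient examples of the strict and the liberal stance.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Liberal consumer: collect whatever text turns up, whatever the tags do."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_strict(markup):
    """XML stance: return the text, or None if the document isn't well-formed."""
    try:
        return "".join(ET.fromstring(markup).itertext())
    except ET.ParseError:
        return None


def parse_lenient(markup):
    """HTML stance: make the best of whatever arrives."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


broken = "<p>price: <b>10</p>"           # the <b> element is never closed
well_formed = "<p>price: <b>10</b></p>"
```

With the broken input the strict parser gives nothing at all and the lenient one still hands back the price, which is exactly the behaviour you'd want for a blog post and exactly what you wouldn't want for a credit card charge.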

Sorry, that was a gratuitous deviation. The real reason I'd say XML isn't Web friendly, like JSON, is in the way people use it. Whether data is conveyed as name-value pairs or through more complex structures, the key parts are generally just simple strings. But by itself, a string on the Web is next to useless. You or I can (maybe) read it, or even paste it into Google and get a definition. But what is a poor machine client to do? What makes the Web is links. It's 101 but somehow still manages to be overlooked: the link has two facets: a universally unambiguous name (URI/IRI) and a protocol for following it (HTTP). If a client on the Web encounters a link, it can follow its nose to find out more information about it. That's what we as humans do in browsers all the time, yet when it comes to Web services for some reason a simple string is seen as adequate to identify something.

Ok, with XML, the HTML DOM and to some extent JSON there's been some justifiable resistance to the use of URIs for names, because namespaces have traditionally been unintuitive at best and agony at worst. Using URIs instead of simple strings certainly adds a burden (it doesn't have to be that great, check Turtle syntax), but its benefits far outweigh the costs.

The thing is, you'll hear talk of snowflake APIs - only one implementation of each exists - but what gets overlooked is that by their very nature, most APIs just aren't Webby. The client must have prior knowledge that the service at endpoint X uses API Y. What you end up with is effectively a series of 1:1 client-server connections. That, while the uniform interface of REST may mean it's less brittle than an RPC connection, still means tight coupling.

Ok, you might argue that for any communication to take place, some prior knowledge is required. Sure, but that can be minimised - just like the way we follow links for more information in a browser, a service client can follow links to get more information. This is only a small conceptual step, but what it enables is hugely powerful. Above everything else, it's what Linked Data and the Semantic Web get right.
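To make the follow-your-nose idea a bit more concrete, here's a minimal sketch in Python. It is only an illustration: a dict stands in for HTTP GET so it runs offline, and every URI, the `WEB` table and the `follow_nose` function are made up for the example. The point is that the client starts with a single URI and finds everything else through links.

```python
# A toy "Web": URI -> description with typed links. The dict stands in for
# HTTP so the sketch runs offline; all URIs and link names are hypothetical.
WEB = {
    "http://example.org/service": {
        "links": {"http://example.org/vocab#catalogue": "http://example.org/books"},
    },
    "http://example.org/books": {
        "links": {
            "http://example.org/vocab#item": "http://example.org/books/1",
            "http://example.org/vocab#missing": "http://example.org/gone",
        },
    },
    "http://example.org/books/1": {"links": {}},
}


def get(uri):
    """Stand-in for an HTTP GET: return a description, or None if not found."""
    return WEB.get(uri)


def follow_nose(start, max_resources=10):
    """Breadth-first walk over links, starting from one known URI."""
    seen, queue = [], [start]
    while queue and len(seen) < max_resources:
        uri = queue.pop(0)
        if uri in seen:
            continue
        description = get(uri)
        if description is None:
            continue  # dangling link: skip it rather than fail
        seen.append(uri)
        queue.extend(description["links"].values())
    return seen


print(follow_nose("http://example.org/service"))
```

The only prior knowledge the client needs is the starting URI and how to read a description; which resources exist and how they relate is discovered at run time.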

I reckon that browser developers, with their emphasis on doc-oriented HTML have a natural tendency to carry their experience in that domain across and apply it to data. Naturally namespace-less XML and JSON will seem preferable through that lens. But in practice, documents and data are apples and oranges. Browsers have been optimized over the years for the former, incidentally making the latter harder than necessary.

It's funny how you don't hear so much about service mashups these days, despite their undeniable coolness. I'll assert that it's because developing for Web data in the browser is bloody hard work, especially when there are NxN arbitrary API mappings to know.

Overall it's actually something of a miracle that the notion of cloud-based platforms has emerged.

I had planned to say more about Cloud Computing Outside of the Browser - or to put it another way, evolving old-fashioned non-browser Rich Internet Clients (as well as server-server and every other non-browser configuration). But ranting's worn me out. Anyhow, in short, I reckon that for the foreseeable future, non-browser clients in many circumstances are probably preferable to browser-based equivalents, primarily because they're easier to develop (as I keep saying, I reckon the agent model of combined client/server units is a good way to go). While I personally welcome HTML5 and the APIs as a clean-up of document markup and processing, when it comes to data it isn't even a Band-Aid.


danja
2012-01-09T20:00:25+01:00
apis cloud browser services rdf

Dart H. Vader

I just heard about Dart (via Seth Ladd and Edd), a new Web programming language from Google. It aims to fulfil the role Javascript currently has, only doing it better. On the pro side, new languages are inherently cool, and Javascript can be a real pain. On the con side it seems unlikely that any browsers other than Chrome will support it in the foreseeable future, except potentially via translation to Javascript, i.e. This Page Best Viewed with Chrome

It's hard not to see echoes of the old Microsoft arrogantly pushing its own product here (remember VBScript?), although Google have in recent years made NIH an artform. But who cares about politics, how's this going to affect the Web?

Well, Code-on-Demand does appear in Fielding's thesis (slightly bizarrely as an 'optional constraint') and has been around since the early days. Pluggable clients are certainly a good idea, and Google have been leaders in moving Rich Internet Applications as opaque desktop apps into the browser using Javascript. The apps are still pretty opaque (View Source on gmail if you doubt that) but they do at least more-or-less run cross-browser.

I've not read much of the Dart docs yet, not tried it at all, but first impressions are that it's a nice clean syntax not unlike JS (or for that matter Java, C# or Python...) and they've already got a good bunch of libs together (even if they do include RPC, yuck!).

As an aside, it should be noted that there's a cost to the standardization of today's browser as Web client (in the process of being defined via HTML5 and associated APIs). It does mean an effective monoculture of HTTP clients. Arguably you can write whatever kind of client you like (probably in Javascript) and host it inside a browser, but they have been optimized for a fairly specific app scope. If you stray from the general model of a Web of HTML Documents you're in for an uphill journey. The arbitrary desktop client has more freedom to use HTTP more creatively, but then there won't be one on everyone's desktop. (Personally I like the notion of Web agents (where an agent = client + server + persistence + code) as an abstraction for Web components, as in "Two Webs!" [pdf - heh]. I wonder, is there a HTTP server in Dart yet?)

Looking at the "Leaked internal dart email" (as with UK politics, it's probably sensible to take the "Leaked" aspect with a pinch of salt), there does seem to be some motivation for Dart coming in response to the success of iOS. I'm pretty sure a new language isn't the best response to this, but it certainly makes a change from the usual big proprietary Flash/Silverlight kind of issues. Google are still talking of evolving Javascript, but it does raise the question of what Dart will offer that couldn't be achieved using JS. Optional typing is the feature they seem to be plugging most. So I wondered if anyone had worked on adding static types to JS. Funnily enough, the first few hits refer to iOS. Oh dear, we're really not talking iOS envy, are we?

It's a little surprising that Google haven't thrown their expertise at the JS-is-a-mess issue previously, I don't see a groundbreaking dev tool and pattern library out there (funnily enough the Dart Editor is based on Eclipse, which does seem a bit un-groundbreaking (although I'm not criticising the choice, Eclipse is my main IDE)).

Whatever, it should be interesting to watch how this pans out. Dart will almost certainly be a very cool language, albeit engendering ambivalence everywhere outside Google. Give me a shout when it includes libs for non-HTML Web languages (i.e. gimmee RDF :)

Comments (G+)


danja
2012-01-06T20:48:18+01:00
google language programming dart rdf

Listy Thing - note to self

Spent this morning having another go at sorting out my lists and links. The aim is to keep them in a triplestore (probably Seki/Fuseki/TDB/Jena) and to be able to add, organise & edit them in a browser. I'd better leave this, have a nap then get on with something else now. So to help me remember where I'm at:

  • Rearrange & in-place edit (with jQuery) worked on test page, doesn't yet work on real data (crashes browser!)
  • Editory thing - four-pane CSS seems ok, CKEditor looks good for rich content, need to play (not tried with above yet, not sure about cross-list D&D)
  • Did a dump of links from Chrome, ran through Tidy, XSLT (xsltproc) to ul/li and split into separate lists - basically working ok

tidy-default bookmarks_1_6_12.html > bookmarks.xml

xsltproc lib/bookmarks-split2lists.xsl bookmarks.xml

for file in *; do mv "$file" "${file}.html"; done

  • Vocab - no idea for textual list items, Annotea has http://www.w3.org/2002/01/bookmark#Bookmark, also Tag Ontology
Everything in github under hyperdata/lists

PS. got some dump-from-del.icio.us code tagliatelle. Need to try AndyS's rdf:List with SPARQL 1.1 Update stuff.


danja
2012-01-06T15:15:08+01:00
lists bookmarks links rdf

Scutter's Mate

As I was admiring the Linked Open Vocabularies Endpoint (LOV-E) it occurred to me that the vocabs I maintain (well, create and forget...) aren't particularly discoverable. Even before saying they're vocabs, there's not necessarily anything linking in to them (yes, really forget). Ideally I suppose I should put together a proper Semantic Sitemap, but for now I've thrown together a quick and dirty directory walking script in Python: scutters-mate.py. It produces a Turtle listing of the RDF files it finds (by filename extension) containing entries like this:

<http://hyperdata.org/xmlns/meta.ttl>  rdfs:seeAlso <dogmood/index.ttl> .
<dogmood/index.ttl> rdfs:seeAlso <http://hyperdata.org/xmlns/meta.ttl> .
<dogmood/index.ttl> format:format <http://purl.org/stuff/formats/text/turtle> ;
rdfs:label "text/turtle" .

Here I ran it in the /xmlns directory and saved the output to xmlns/meta.ttl.
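For anyone wanting to try something similar, here's a minimal sketch of that kind of directory walker. It isn't the actual scutters-mate.py: the extension table, the function names and the `meta_uri` parameter are all assumptions made for illustration, and it leaves out the format:format statements because the prefix for that vocabulary isn't shown above.

```python
import os

# Assumed mapping from filename extension to media type (illustrative only).
RDF_EXTENSIONS = {
    ".ttl": "text/turtle",
    ".rdf": "application/rdf+xml",
    ".n3": "text/n3",
}


def list_rdf_files(root):
    """Yield (relative_path, media_type) for files with RDF-ish extensions."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so the walk order is predictable
        for name in sorted(filenames):
            ext = os.path.splitext(name)[1].lower()
            if ext in RDF_EXTENSIONS:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                yield rel.replace(os.sep, "/"), RDF_EXTENSIONS[ext]


def to_turtle(root, meta_uri):
    """Build a Turtle listing that links the meta file and each RDF file."""
    lines = ["@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .", ""]
    for rel, media_type in list_rdf_files(root):
        lines.append(f"<{meta_uri}> rdfs:seeAlso <{rel}> .")
        lines.append(f"<{rel}> rdfs:seeAlso <{meta_uri}> .")
        lines.append(f'<{rel}> rdfs:label "{media_type}" .')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_turtle(".", "http://example.org/meta.ttl"))
```

Relative paths are written as relative URI references, so the listing only makes sense when served from the directory it was generated in.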

I'm thinking I'll also run it from the root of all the domains I use, then try and remember to link to /meta.ttl wherever appropriate to give the scutters a helping hand.

Comments (G+)


danja
2012-01-04T20:16:46+01:00
sitemap scutter vocabs rdf data linked

Web Beep - where next...

Minor tweaks aside I've got Web Beep to a good milestone, basically proof-of-concept.

Boxes ticked:

A good point at which to put it on one side and get on with some rather more pressing bill-paying stuff for a while.

But it'd be nice to have a clue on next steps. There are a few potential directions:

Ports

The obvious one is in-browser Javascript. While the HTML5 APIs look the best route long-term, it's not so obvious right now. There are things already around like making .wav data: URIs, and also dynamicaudio.js - which looks very promising, it supplies a Flash player for browsers that don't support the API. Until very recently I expected there to be a need for DSP libraries (there is a dsp.js) but as it happens it only requires trivial stuff and there's the Java to refer to, all easily hacked. (The only "serious" DSP bit is the Goertzel algorithm, but that itself is easy-peasy, already done: goertzel.js, literally only took a couple of minutes).
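For reference, the Goertzel algorithm really is tiny. The sketch below is a generic textbook version in Python, not the goertzel.js or Java code from Web Beep; the function name and parameters are my own choices. It measures the power of a single frequency in a block of samples without computing a full FFT.

```python
import math


def goertzel_power(samples, sample_rate, target_freq):
    """Return the squared magnitude of the DFT bin nearest target_freq."""
    n = len(samples)
    k = round(n * target_freq / sample_rate)  # nearest DFT bin index
    omega = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(omega)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        # Second-order recurrence: one multiply and two adds per sample.
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


if __name__ == "__main__":
    rate, n = 8000, 400
    tone = [math.sin(2 * math.pi * 1000 * i / rate) for i in range(n)]
    print(goertzel_power(tone, rate, 1000))  # large: the tone is present
    print(goertzel_power(tone, rate, 2000))  # close to zero: it is absent
```

To decode a beep you'd run this once per frequency of interest on each block and pick whichever frequencies exceed a threshold.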

There might be uses for desktop UI-based codecs, but I don't know what...I might well hook something up to the current implementation, see if it inspires.

Some kind of mobile device app should have potential.

But this is all very tied to another dev direction -

Applications

What to do with the darn thing? danbri's put some good ideas down with ChirpChirp (that I've still not fully digested).

Nicholas J Humphrey had a brilliant suggestion, use them on radio - nearly every programme these days (BBC R4 at least) seems to read out one or more URIs.

I've not got a smartphone so am pretty clueless about that kind of Apps, but presumably there are a few around there.

Doing stuff with DSP and/or GA and/or RDF

Building the thing led to a couple of collateral proto-products: a little genetic algorithm-based optimizer and the makings of a DSP vocab/ontology.

There has been work done already around DSP and semweb tech by the dbtune and omras folks. The Henry service is a sweet example of the kind of thing that's possible, it's "...able to perform audio processing tasks to answer a particular query". The shape/scope of their ont does seem a bit different to what I've been finding, though obviously there's overlap. My inclination is to derive what's needed from the running code then later align it with their material.

With a reusable system-description mechanism in place (i.e. a DSP vocab) it should be straightforward to apply the genetic algorithm optimization setup to any system which depends on a bunch of parameters and has a notion of fitness.
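As a rough illustration of "a bunch of parameters and a notion of fitness", here's a minimal genetic algorithm sketch in Python. It is not the optimizer from Web Beep: the selection scheme (keep the fitter half), uniform crossover, Gaussian mutation and all the default values are assumptions picked to keep it short.

```python
import random


def evolve(fitness, n_params, pop_size=30, generations=60, sigma=0.1, seed=0):
    """Search for a parameter list that maximises fitness(params)."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    pop = [[rng.uniform(-1, 1) for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # the fitter half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover (pick each gene from either parent) plus mutation.
            child = [rng.choice(pair) + rng.gauss(0, sigma) for pair in zip(a, b)]
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


if __name__ == "__main__":
    # Toy fitness: parameters are best when they are all 0.5.
    def closeness(params):
        return -sum((p - 0.5) ** 2 for p in params)

    best = evolve(closeness, n_params=3)
    print(best, closeness(best))
```

Anything that can be described as a list of numbers plus a scoring function can be plugged in as `fitness`, which is where a shared system-description vocabulary would come in.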

I've also got a few other personal tie-ins with this - the opportunity to tie the DSP (and analog SP) bits to the SPICE in RDF stuff I was playing around with last year, and going back somewhat further, updating the RPP vocab from over a decade ago (I'll get these things finished eventually...). From a suitable level of abstraction there looks to be interesting potential overlap with data processing too - check David Booth's RDF Data Pipelines for Semantic Data Federation.


danja
2012-01-03T14:21:37+01:00
ga pipelines genetic webbeep algorithm dsp web rdf beep

Web Beep

I've just gone live with a little fun service : Web Beep - enjoy!

Comments to G+ please


danja
2011-12-31T19:22:33+01:00
audio dsp web rdf beep

A very compact database query language based on binary relations

Kragen thought it through a bit: http://canonical.org/~kragen/binary-relations.html

(God I love people that are cleverer than me)


danja
2011-10-19T17:08:50+01:00
rdf

Queries

Words from Kragen, and I really can't answer this, I hope he doesn't mind me sharing:

[[

I've been thinking about how most queries don't really need to have free variables in the predicate position of a triple — that is, in most queries, you know what all the predicates are going to be, and your variables are only in the subject and object positions. Is this true in general, or is it just me?

I've been thinking about a different query language syntax that takes advantage of this, but it has to fall back on reification when it actually does need a variable predicate.

]]
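One way to picture the fixed-predicate case is to treat each predicate as a binary relation, i.e. a set of (subject, object) pairs. The Python sketch below is my own illustration of that reading, not Kragen's syntax or implementation; the `RelationStore` class and its method names are invented for the example, and it doesn't attempt the variable-predicate fallback.

```python
from collections import defaultdict


class RelationStore:
    """Triples stored as one binary relation (set of pairs) per predicate."""

    def __init__(self, triples=()):
        self.relations = defaultdict(set)
        for s, p, o in triples:
            self.add(s, p, o)

    def add(self, s, p, o):
        self.relations[p].add((s, o))

    def objects(self, predicate, subject):
        """All o such that (subject, predicate, o) holds."""
        return {o for s, o in self.relations[predicate] if s == subject}

    def subjects(self, predicate, obj):
        """All s such that (s, predicate, obj) holds."""
        return {s for s, o in self.relations[predicate] if o == obj}

    def compose(self, first, second):
        """Join two relations: pairs (a, c) where a -first-> b -second-> c."""
        by_subject = defaultdict(set)
        for b, c in self.relations[second]:
            by_subject[b].add(c)
        return {(a, c)
                for a, b in self.relations[first]
                for c in by_subject.get(b, ())}


if __name__ == "__main__":
    store = RelationStore([
        ("alice", "knows", "bob"),
        ("bob", "knows", "carol"),
        ("bob", "name", "Bob"),
    ])
    # "Names of people somebody knows": predicates fixed, variables elsewhere.
    print(store.compose("knows", "name"))
```

With the predicates known up front, a query is just a chain of lookups and joins over named relations, which is what makes a very compact syntax plausible.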


danja
2011-10-19T16:20:38+01:00
rdf

A Role Model of Consciousness

Past few weeks I've been on pause, my head not working properly. Finally got around to seeing doctor yesterday, now waiting for antidepressants to take effect. I haven't totally wasted my disconnected time, watched a lot of stuff. Including a Midsomer, a couple of Bargain Hunts and a geeky-great vid on poker bots (have I said I really like Berlin? This is a Chaos Communication Camp production, wonderful material). Simulating an actual poker player is really hard, but it got me thinking about the similarly hard problem of what consciousness is, appropriately mental for my state of mind.

Caveat, I'm not up to date on theories in psychology or even AI. Last big thing I read anywhere near this was a lay-reader book I think with "Intelligence" in the title, about what humans are really good at is predicting the future - pretty good hypothesis IMHO. Maybe someone can enlighten me about current thought (I'll cc Planet RDF). But the thing that has been on my mind is more old-school, the internal model bit I think was popular around the 17th century, gone downhill since. Although it may well be rubbish as human stuff, something makes me imagine it might be worth thinking about for machine stuff. I really like the agent metaphor.

Ok, generation 0, we have an agent (A) in a universe (U), and it just sits there. It's a rock. It's surrounded by other agents (which might also be rocks).

a blob in a universe

Generation 1, we have an agent capable of interacting with the environment, but its interactions are pretty minimal, starting somewhere around a pebble on a beach that has a wander with each tide up to a living creature that has built-in stimulus-response maps along with learnt ones. Kinda Behaviourist. I'm starting with the pebble because interaction with the environment can take a lot of forms, and there's quite a history from at least the Neolithic of generally anthropomorphic agency views of facets of the environment (weather etc) through the Bronze Age deities up to the modern-day religious mythologies.

a blob interacting with environment

Generation 2 we approach the Enlightenment and/or Smalltalk. The agent in question has an internal model of the universe containing the agents outside.

a blob with an internal model

On generation 3 we come to the bit that I'll call novel until someone points to an 18th century philosopher who already suggested this. The agent in question has had all its sensors and actuators geared up to the outside world for a while, as well as sensors (and actuators) connected internally. By the mechanisms of Intelligent Design, Natural Selection and copy, paste and tweak a bit, it notices parallels between interactions with the external agents and interactions with itself. It develops a sense of self as another model very similar to the models it has for external agents. Here's the novelty - first the agent becomes aware of external agencies, only then by analogy it becomes aware of itself.

a blob including a model of itself

Like all the great (as in most entertaining) theories this is of course unverifiable. But I like the notion that the local stuff only appears after some level of comprehension of the remote stuff, feels like it might be useful somehow.

Comments to the big G+


danja
2011-10-15T20:59:10+01:00
mind intelligence psychology federated ai mad model rdf

Sell Out

A couple of days ago I got another mail from someone wanting to put links here to their client's. Unusually this seemed written by a human, so I didn't immediately bin it. Insert our links in your old posts and we'll give you some dollars (the figure I think was $50 a link), and the targets will be either relevant and/or to educational resources. Given that I'm in the red right now, and given my recent amount of enthusiasm for paid work, I said ok, bring it on.

It would have been better if I'd been able to do Sebastian Trüg's approach, having a real project to which to donate, but bugger it, I've added a donate button to this blog. Now go visit my sponsor.


danja
2011-10-06T20:48:17+01:00
federated money rdf

Check Sums

I was offline last week when the news broke that CERN folks announced they'd found a discrepancy between the assumed speed limit of the universe and the way their neutrinos appeared to behave, 20 parts per million. That's a pretty big anomaly when you consider dogs can detect salami in 9 parts per billion of the kitchen (that paper will be published once I've got 99 other co-signatories who don't mind their crotches being sniffed). I was offline because I was feeling pretty crap after a boozy weekend, lightweight compared to previous exploits but after the hangover had passed I was left in an ultra-violet funk.

Incidentally, for a few days, going to sleep I wound up picking a random, unloaded word that flashed by on my Cartesian plasma screen, "mink", repeating it as a voiceover in said theatre as a mantra to keep demons at bay. I have since rationalised the word - it's a potential HTML5 rel value to correspond to URIQA's MGET. But that's by-the-by.

The too-fast neutrinos went from CERN to Gran Sasso. After dopplering my funk, I was curious about the constant thing. I knew where CERN was (because I watched The Champions as a child) but though I'd heard of Gran Sasso, couldn't place it. As any good mental illnessity goes, my funk featured a good proportion of guilt (getting sweary on social networks leaves you a bit shamefaced).

Now looking on the map my funk shifted back up the spectrum, if you draw a line on the globe from CERN to Gran Sasso it goes straight through this house. Those faster-than-light neutrinos came through here (ok, a little underground, but I do leave my empties in the cantina). So how's that for something to feel guilty about - screwing up the model of the universe..?

Which is why I can be sure they got their sums wrong. My empties would have slowed them down. You're probably 40 parts per million out guys.


danja
2011-10-04T21:50:35+01:00
federated cern physics rdf c neutrinos light

stupid computers

Train of thought. The world imagined by machines won't ever be a direct reflection of the world experienced by animals. But that's not a bad starting point, go all Plato and have the computers as the shadowplay. Maybe the current generation of computers aren't capable of doing the 3D of a child's first discovery of a 4-leaf clover. They will though, probably in my lifetime. But there's the map/shadow, and there's stuff we can do well in this world, stuff that the machines are good at. A virtual reality with rules that are consistent with this side, but take advantage of that side.

Perhaps I'm getting a little too excited about being back online again.


danja
2011-09-21T23:38:47+01:00
federated rdf

Speed

A passing observation. It's bloody slow this Web thing. I have terrible wire bandwidth here, but it isn't that that is the bottleneck. Me, ask anyone, might be smart but he's slow witted. Not as slow as this thing.

Picture a couple of people that know each other fairly well, have a spoken language in common. Say they've been out and are trying to figure out the best way of getting home. Bang bang bang bang, the ideas will flow. The Interwebs know the best way to get a taxi, walk or bus. Augmented by the smartphone. But it don't quite work. Computers + data = knowledge. Not.

Even if this machine in front had better than human standard AI, it would still be slow and useless right now compared to a (stupid) talking human. We are missing bits we need to take advantage of the technology. The back end seems to function well, the front end seems like it's as good as it gets. So why do these things behave as if they are slow and stupid?

Passing observation, I honestly don't know. But I feel we should be able to find out. How? Dunno.


danja
2011-09-21T23:24:15+01:00
federated rdf

RDF, where art thou

In comments on a post on G+ I said something I might regret:

"There are plenty of RDF-based applications around, but none really have much broad public appeal."

Ade Oshineye responded with "why do you think that is?"

Ok, overnight I remembered there's at least one app (or set of apps if you prefer) that uses RDF and has a lot of adoption: Drupal. According to Wikipedia it's used on at least 1.5% of Web sites worldwide, and has RDF in its core. Then there's data.gov.uk, a public-facing national government site that's RDF through-and-through. I'm a little out of touch, there are no doubt quite a few other good examples of where I'm wrong.

But given that RDF has been around for 5 years*, it's the way of doing data on the Web and virtually every Web-oriented app uses data somewhere, why isn't it ubiquitous?

(* solid specs came out in 2004 although SPARQL wasn't until 2008 so I'm splitting the difference for a rough date for when it became usable)

RDF isn't something that's going to be in your face anyway, so "broad public appeal" is slightly off-target. Developer adoption may be a better key. Whadever.

In terms of it as a database tech, compared to relational DBs (MySQL etc), custom data handling (Twitter uses Ruby message queues), novel DBs (Facebook uses a key-value store Cassandra apparently) RDF stores don't get much of a look-in. Ok, arguably the big scale things need to be custom to hone performance, but why, alongside the Big Data handling, don't we see RDF augmentation?

For consuming apps and desktop apps, I can't actually think of any well-known ones off the top of my head (I think quite a few of the music apps on Linux use librdf under the covers). I don't have a mobile device - any iPhone apps?

What I find a little bizarre (and please give me counter-examples), is that in the areas where RDF really shines - Web-oriented data integration and reuse - there are hardly any well-known apps out there at all, using any technology. There are a handful of feed aggregators and things like techmeme, but the level of integration there is pretty trivial. (Before Kingsley jumps down my throat - OpenLink Virtuoso is seriously good at this kind of stuff out of the box - but what I'm after is where these things are being used by twitter-sized demographics).

There's certainly something to what Lee Feigenbaum said the other day, the wrong question is usually asked, it should be: What can I do with Semantic Web technologies that I wouldn't do otherwise?

In terms of app-building, right now most parts of most things can be built relatively easily using other technologies, so unless the RDF stack is part of the developer's on-hand toolkit (like e.g. LAMP) it won't be first choice. I do suspect that while the false perception that RDF is complex per se isn't so prevalent these days, there's still a notion around that RDF is complex for the benefits it offers. i.e. linked data isn't perceived as a significant value-add, so why bother? The primary objectives can be achieved by pushing around little JSON objects ("jobbies"?) in a fairly arbitrary fashion, so why look further? But data on the Web surely isn't a niche thing...

Feel free to shoot me down in flames from all angles over this one (I'm not interested in advocacy here so don't care if I expose the wrong message) - I also suspect there's still something in the idea that people simply don't get it. While developers seem to have no problem representing pretty much anything in local databases, the idea that anything can be represented on the Web in a similar way hasn't been grasped. I reckon there's good evidence in virtually every high-profile project. Things tend to be focused on HTML (with a little Javascript) and the browser experience. For service-oriented systems the unwritten assumption is that the services will tie into the same view. I'm certainly not saying that this focus is wrong (those user-facing components are vital), just that it can lead to a blinkered view of what is possible. Only relatively recently have developers at large started looking at things like the identity of people on the Web. You still don't see the same attention given to everything else in the world - products, ideas, activities. Ok, you might point to activity streams and the like, but the subject of those activities still largely tends to be doc-oriented: messages or posts. You might point to schema.org and microdata as ways in which people in the Web development community can put data on the Web. But scratch the surface and the main goals underneath are things like SEO, most of the data being expressed is document metadata, not data about the real world. (Next time you go shopping, notice your interactions with the world from finding your car keys onwards, compare and contrast with the Amazon experience.)

The other day I posted a question on G+ that probably should have gone here: All the necessary components were in place for online social networks, in a distributed form, before Facebook & co. came along: blogs, aggregators, the various protocols. So why were Facebook & co. so successful? (got some good comments there, and was very pleased to find out Andreas Kuckartz is researching the question)

The question of data on the Web seems to lie in a similar socio-politico-technical morass. On federation, I'm afraid I'm inclined to agree with Eric Siegel : "I predict decentralization is inevitable, but its very very far away." I feel pretty much the same about the Web of data, though perhaps not so far away (unless I'm confusing small and far away :)

[ooh - a good point on that from Seb Paquet I'd missed before: The folks who grokked decentralization didn't master social experience design and UI design as well as Zuck, and decentralized infrastructure is harder to monetize so getting funding was difficult.]

One final question dedicated to folks on Planet RDF, from danbri in response to (the Facebook re-presentation of) my post yesterday:

If RDF is so great, we should all be rich by now? :)

Another quote, it must have some relevance - via the BBC, from Sir William Preece chief engineer of the British Post Office in 1876: "The Americans have need of the telephone, but we do not. We have plenty of messenger boys."

Still no system here yet, comments to G+ again.


danja
2011-09-17T13:52:14+01:00
federated semweb rdf

Plan B - RDF for fun and profit

Last night, after finding out that part of the G+ API had gone public I skimmed their docs and the docs of some of the specs they draw on: Portable Contacts, Activity Streams and OAuth 2.0. Of course it's great that G+ is exposing an API, and great that they're drawing on existing standards. But after looking at those standards I came away shaking my head, feeling rather discouraged. Again and again they contain data expressed using JSON mappings like "kind": "plus#person" (G+ API) and "objectType" : "person" (Activity Streams) and "" (Portable Contacts assumes that if you've got data you're looking at contacts). Aside from the variation in the naming across these, there's a common theme, the assumption that a simple token (like "person") is adequate for definition of something on the Web. How do you know that their definition of "person" is compatible with your system's definition of "person"? Sure, there are the spec docs to back them up, but how do you get from the data to the spec docs? Ok, there's openness in the publication and dev of these specs and standardization to the extent that they're high-profile enough that vendors like Google will see them and adopt them. But in their technical detail they have more in common with pre-Web, offline proprietary formats - "person" means person because we say so, and everybody knows what we mean.

Digging a bit deeper there's reference to the Discovery Protocol Stack which draws on XRD (the OASIS spec for describing resources) and Web Linking (RFC 5988 for defining typed links). Here there's more of an attempt to make the stuff Web-friendly, entities (resources) and relations (links) are identified with URLs so Web-based discovery of further information is in principle possible. But the "One True Ontology" registry-based approach of Web Linking is questionable in a distributed environment (and comparable to schema.org).

The description of things using schema like "kind": "plus#person" looks like what RDF does, except rather than using a Web-based approach to naming (so you could derive a URL from "plus#person", look it up and find out what it means) instead we see ad hoc token-based naming schemes. With Web Linking we have something that corresponds exactly with RDF properties (they are typed links), and if you can look things up in a registry then that's a step in the right direction. We already use registries to decode the meaning of terms in other major vocabularies - e.g. the HTTP media types through which HTML is delivered lead you to the definitions of terms like "strong" in the relevant specs. But is a registry appropriate for every term we're ever going to use? Does a word like "strong" only have one meaning?
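Here's a small sketch of the difference, in Python. Everything in it is hypothetical: the base URIs are example.org placeholders (neither G+ nor Activity Streams publishes its tokens at these addresses), and the `expand` and `same_meaning` helpers are invented for illustration. The idea is just that once a token is resolved to a full URI, agreement between two vocabularies becomes an explicit, checkable statement rather than an assumption.

```python
from urllib.parse import urljoin

# Hypothetical base URIs, standing in for wherever each vocabulary might live.
GPLUS_BASE = "http://example.org/gplus/"
ACTIVITY_BASE = "http://example.org/activitystreams/"


def expand(token, base):
    """Turn a bare token into a full URI by resolving it against a base."""
    return urljoin(base, token)


# Explicit statements that two terms mean the same thing (the role that
# owl:equivalentClass plays in RDF vocabularies). Assumed for this example.
EQUIVALENT = {
    frozenset({expand("plus#person", GPLUS_BASE), expand("person", ACTIVITY_BASE)}),
}


def same_meaning(uri_a, uri_b):
    """True only if the URIs are identical or declared equivalent."""
    return uri_a == uri_b or frozenset({uri_a, uri_b}) in EQUIVALENT


if __name__ == "__main__":
    gplus_person = expand("plus#person", GPLUS_BASE)
    as_person = expand("person", ACTIVITY_BASE)
    print(gplus_person, as_person, same_meaning(gplus_person, as_person))
```

With bare strings, "person" == "person" is true whether or not the two systems mean the same thing; with URIs, a match has to be either exact or stated somewhere you can look up.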

Ok, so far there's a phrase which sums up all this: Cargo Cult RDF

But the theory is that grassroots, use case-driven development will tend to create cowpaths in the environment, and all standards orgs have to do is pave these. Except it doesn't seem to quite work that way. On the one hand we have the XKCD Standards effect (check the first paragraph on the Portable Contacts page), on the other hand the simple fact that, even with the best will in the world and with good information, people often get things wrong. Take for example:

OAuth [1.0] aims to unify the experience and implementation of delegated web service authentication into a single, community-driven protocol.

[time passes]

OAuth 2.0 is a completely new protocol and is not backwards compatible with previous versions....As more sites started using OAuth, especially Twitter, developers realized that the single flow offered by OAuth was very limited and often produced poor user experiences...OAuth 1.0 was largely based on two existing proprietary protocols: Flickr’s API Auth and Google’s AuthSub. The result represented the best solution based on actual implementation experience. (Introducing OAuth 2.0)

So...even when good, informed standardization is aimed for, flawed technologies built with flawed processes are unavoidable.

But these things are so popular! Vendors and developers can't get enough of this kind of stuff. It's a continuous stream: XML APIs become JSON APIs, microformats become microdata, but the same patterns are repeated again and again.

Years of these developments passing RDF by. Plan A : The Semantic Web still seems as far in the future as it did 5, 10 years ago. The RDF technologies demonstrably work, and adoption is growing, but it's hardly viral. However you look at it, the world of trendy new specs repeatedly steers around that fact. What's a jaded RDF enthusiast to do? Here's what I recommend:

Exploit the situation!

With a continuous flow of different specs that each covers some little part of data on the Web, focusing on any specific development can only work in the short term. A strategy based on technologies that support flexibility and agility, using known best practices of the truly distributed Web is the best option in the long term, so that systems can be rapidly adapted to meet any new requirements. It doesn't matter that e.g. schema.org misses the point, the data is still useful. "Think globally, act locally" is a great expression - in this context it could mean accept whatever the world of Web 2.0+ has to offer, but handle it on your own terms.

In practice, let's say you're developing a system for a particular vertical market: dog leads (I'm getting serious hints as I type). Don't build the system from scratch based on what people in the dog lead market are doing, don't tie yourself to domain-specific schema or protocols. Wherever possible use commodity, off-the-shelf tools. Then if dog leads take a nose dive on the international market you can regroup with a different target - cowbells for cats - using the same tools, and same skill set. The only parts that need change are at the edges. Basically RDF technologies offer a long-term commercial advantage.

Comments to G+ please.


danja
2011-09-16T14:31:52+01:00
google streams contacts rant federated web semantic semweb activity rdf portable

RESTful Turing Machines

I went to bed a couple of hours ago but every time I started to drift off a mosquito buzzed by. Led to this train of thought - how would you build a Universal Turing Machine with hypertext as the engine of state?

Seemed natural to use the Web in the role of the tape in Turing's setup with URLs corresponding to the position on the tape. The path part can be tape-like. Imagine an infinite path: http://example.org/location/location/location ... To move left it's href="../location", to the right href="location/location" (I think...bit tired here :). Whatever, problem is the train crashes to the left. I'm pretty sure a single-ended tape would still be universal because it'd be like folding the normal tape and interleaving the cells. But I reckon it'd be better to stick with the standard tape config but with little sub-rules for the mapping, something like:

start at http://example.org/H

to move right:

if the final char of the URL is a L, remove it

else append a "R"

- and vice versa.
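The sub-rules above can be sketched in a few lines. This is just the URL arithmetic, assuming the tape lives directly under http://example.org/ and that the start cell is H as described.

```python
BASE = "http://example.org/"  # the tape lives under this URL, as in the post


def move_right(url):
    """If the URL ends in L, drop it; otherwise append R."""
    return url[:-1] if url.endswith("L") else url + "R"


def move_left(url):
    """The vice versa: if the URL ends in R, drop it; otherwise append L."""
    return url[:-1] if url.endswith("R") else url + "L"


start = BASE + "H"
print(move_right(start))            # http://example.org/HR
print(move_left(move_left(start)))  # http://example.org/HLL
```

Because H never counts as an L or an R, the start cell can't be stripped away, so the tape extends in both directions.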

The content of the page http://example.org/HLL might then look like:

<html>

...

Symbol = 0

<a href="http://example.org/HLLL">Left</a>, <a href="http://example.org/HL">Right</a>

...

</html>

or it might 404, corresponding to Turing's blank symbol.

Reading from tape is a GET, writing is a PUT.

I think this is in keeping with the spirit of REST, there is no context kept on the server, the messages are self-contained. The client would have to know the rules for generating the new URLs, plus the instructions. Maybe a neat way of doing the instructions might be to have a series of linked scripts each corresponding to an instruction, effectively a second agent stepping through them.
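A rough sketch of the client side, with a dict standing in for the server so it runs without a network. The FakeWeb class, the rule-table layout and the toy machine are all assumptions made for illustration, and the instructions are held in a plain table rather than the linked scripts suggested above.

```python
BLANK = None  # a 404 stands in for Turing's blank symbol


class FakeWeb:
    """In-memory stand-in for the server: URL -> symbol, no other state."""

    def __init__(self):
        self.pages = {}

    def get(self, url):
        # 200 with the symbol, or 404 for a cell nobody has written yet
        if url in self.pages:
            return 200, self.pages[url]
        return 404, BLANK

    def put(self, url, symbol):
        self.pages[url] = symbol
        return 200


def move(url, direction):
    """Same URL rule as in the post: strip the opposite letter or append."""
    opposite = "L" if direction == "R" else "R"
    return url[:-1] if url.endswith(opposite) else url + direction


def run(web, rules, url, state, halt="halt", max_steps=1000):
    """The client holds the state and the rules; the server only holds cells."""
    for _ in range(max_steps):
        if state == halt:
            break
        status, symbol = web.get(url)            # read = GET
        write, direction, state = rules[(state, symbol)]
        web.put(url, write)                      # write = PUT
        url = move(url, direction)
    return url, state


# Toy machine: write three 1s moving right, then halt.
rules = {
    ("a", BLANK): ("1", "R", "b"),
    ("b", BLANK): ("1", "R", "c"),
    ("c", BLANK): ("1", "R", "halt"),
}
web = FakeWeb()
final_url, final_state = run(web, rules, "http://example.org/H", "a")
print(final_url, final_state, sorted(web.pages))
```

Swapping FakeWeb for real HTTP GET and PUT calls would be the next step; the run loop wouldn't need to change.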

Now I've had a glass of milk and a chunk of chocolate I'm off back to bed, hope that mosquito's gone. I'll leave criticism, improvement and implementation to other insomniacs.

Comments to G+


danja
2011-09-14T03:04:13+01:00
rest turing machine rdf insomnia

node.js early impressions

It is possible to learn enough Javascript and node.js to do useful stuff in a week.

I've just done it. I'm not exactly familiar with the idioms and I'm sure there are constructs I've not yet encountered, but it's to be expected that broad knowledge will take time.

Of course I had encountered JS before, around HTML/browser, but had never tried doing any proper coding with it. It certainly doesn't lack power, but one drawback I'd say is that its flexibility means that it isn't always obvious what's going on. That goes double for node.js, where having callbacks everywhere can make things confusing (though I'm beginning to get used to that).

The little app I've put together is much more concise than it would be in the languages with which I'm familiar (mostly Java and Python), but then needing a lot of comments to explain what's going on isn't a good smell.

However, if my vague understanding of how node.js works is remotely correct, I get performance/scalability for free (something that'd need a lot of thought in Java/Python). node.js really does lend itself to Web wiring.


danja
2011-09-07T12:58:29+01:00
seki node.js rdf javascript node

Affordances, described with less clutter

Posts on this blog get picked up by Facebook. Alison who's an experienced Web developer spotted my last post over there and couldn't make much sense of it. Hardly surprising, I referred to rather a lot of obscure stuff and used a lot of jargon without much explanation. But given that this affordances thing relates directly to the way everyone uses the Web, a developer should be able to make sense of it. So here I go again, this time trying to stick to the main points, glossing over the detail. [Blimey, but I've ended up rambling on a long while]

So on the Web you've got lots of documents in HTML on servers and lots of people with clients (browsers) that understand HTML. Those documents and various other messages are passed between server and client using the HTTP protocol. Most of HTML is about document structure, which with the aid of CSS can make text look good on the screen. But it has several things built in that allow a client to communicate over HTTP and hence allow the end user to interact with the Web. Most used is almost certainly the <a href="http://example.org/here">something</a> link. When interpreted by a browser, that bit of markup highlights the word something and enables the link http://example.org/here to be followed by clicking on the something.

One fairly archaic definition of the word afford is to provide or supply (an opportunity or facility). Presumably this is where a 1970's psychologist got the word affordance (Wikipedia) which he defined as an "action possibility" (and some other stuff). This got picked up by human-computer interaction folks and mutated a bit, but "action possibility" is good enough here. So what the browser does with the bit of markup above - enables the link http://example.org/here to be followed by clicking on the something - can be described as an affordance.

The Web can be looked at as an information store with which we interact, and borrowing from database speak we have four basic operations: Create, Read, Update and Delete (CRUD). Through the highlighted, clickable link the browser provides the Read operation. When we want to Create e.g. a new blog entry, Update or Delete it we typically interact through a HTML <form>. So the kind of things a form enables can also be described as affordances. It's not unreasonable to expand the definition to include certain things the browser does that go beyond displaying a document with structure, things like displaying an image file that's linked to by an <img> element. Nowadays we're surrounded by loads of other different potential interactions thanks to Javascript and Ajax, these are also affordances. With the rise of blogging, online photo/video sharing and social platforms like Facebook, Twitter and now Google Plus, there's a new emergent breed of affordances that's been identified that include things like share, like, +1 etc. These are typically powered by Ajax and very often operate across sites and involve some data transfer, e.g. if you post a link on Facebook to a photo on Flickr it'll add it to your wall and display a thumbnail of the image and the title. This new breed of affordances has been called Web Intents or Web Actions depending on where you look. (The Web Intents thread is I believe partly derived from a similar thing called Intents on Android phones, but having never used one I can't comment).
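As a minimal sketch of the link and form affordances, here's a parser that lists the "action possibilities" a page offers as (HTTP method, target) pairs. It uses only Python's standard html.parser; the page content and the form's action URL are made up, and Javascript-driven affordances like share or +1 are out of its reach.

```python
from html.parser import HTMLParser


class AffordanceFinder(HTMLParser):
    """Collects the action possibilities a page offers: links and forms."""

    def __init__(self):
        super().__init__()
        self.affordances = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            # following a link is the Read operation
            self.affordances.append(("GET", attrs["href"]))
        elif tag == "form":
            # HTML forms default to GET when no method is given
            method = (attrs.get("method") or "get").upper()
            self.affordances.append((method, attrs.get("action", "")))


page = """
<a href="http://example.org/here">something</a>
<form method="post" action="http://example.org/blog"><input name="title"></form>
"""
finder = AffordanceFinder()
finder.feed(page)
print(finder.affordances)
```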

Ok, now there's an increasing amount of data on the Web expressed as Linked Data. This is published using the Resource Description Framework, RDF (depending on who you ask, linky non-RDF formats can also be considered linked data, but that's not really relevant here). The question is, how best to interact with this material, in other words what affordances do we need? There's a natural expression of documents on the Web - just show them as documents - but even for a passive display it's not altogether clear how to represent data. Ok, with traditional databases we usually have a table of some kind. But in that context we have a good idea in advance what can go in the rows and columns. On the Web, where the data can potentially be any shape it's a much trickier creature to pin down. With documents there is the familiar constraint of the individual document or page, whereas data doesn't chunk so neatly - the data we're interested in might be spread wide across the Web, between files containing only a handful of statements and stores containing millions. Links are part of the expression of the data, and links are the fabric of the Web, twisty eh? And this is just considering the Read aspect, there's also (at bare minimum) Create, Update and Delete to throw into the mix. We also need to not only interface with simple file-like linked data representations, there are also triplestores with SPARQL interfaces to consider (although the linked data API should help there, it can make a triplestore+SPARQL setup look more like normal Web representations).

However, to put these kinds of problems into context - we don't need every possible operation for all data in all environments, far from it. One thing the work around Web Intents shows is that a handful of little facilities (share, like etc) are making a big difference in the benefit people get out of the Web. One thing that should really be avoided is building things as special cases - if you can share from A to B then you should be able to use the same mechanism to share from C to D and so on (this isn't that different from the centralized system setup, things on the Web should be distributed and ideally federated).

Ok, seems that affordances are going to be pretty important for working with the Web of Data. Some fairly good analysis has been done of HTML-in-browser affordances, and taking a leaf from the HTML book the simple hypermedia click-following of links seems a reasonable place to start in assembling suitable tools (in fact there are quite a few tools out there that support this in one form or another). It's fairly certain that some of the affordances will be vastly different from those we're familiar with - data supports things like merging (trivial in RDF), query and inference, completely different kinds of transformation and analysis than text and so on. At the moment it's not even really clear that a general-purpose tool, as the HTML browser is for documents, makes sense for Web data (my guess is most likely a variety of different tools will be built inside the Javascript-capable browser, with different tasks being spread between clients and services).
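On "merging (trivial in RDF)": a minimal sketch, treating a graph as a plain set of (subject, predicate, object) tuples. The example.org URLs and the title are made up, and this ignores blank nodes, which need renaming before a real merge.

```python
# Triples as plain (subject, predicate, object) tuples; a graph is a set of them.
# The example.org URLs and the title are made up for illustration.
flickr = {
    ("http://example.org/photo/1", "http://purl.org/dc/terms/title", "Sasha"),
}
facebook = {
    ("http://example.org/photo/1", "http://purl.org/dc/terms/title", "Sasha"),
    ("http://example.org/photo/1", "http://purl.org/dc/terms/creator",
     "http://example.org/people/danny"),
}

# Merging is set union: shared URLs line up and duplicate statements collapse.
merged = flickr | facebook


def describe(graph, subject):
    """Everything the graph says about one subject, as (predicate, object) pairs."""
    return sorted((p, o) for s, p, o in graph if s == subject)


print(len(merged))  # 2
print(describe(merged, "http://example.org/photo/1"))
```

No schema alignment step is needed because both sources name the photo with the same URL.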

But again to put these problems into context, there's no reason why any individual applications should be much different than they are today. Passing an image and its title between Flickr and Facebook requires the same basic machinery whatever kind of markup is used to describe the material. One of the aims for the Web as a whole, augmented by the Web of Data, has to be a reduction in complexity for common tasks. The fact that a whole new world of potential applications becomes feasible is just, well, interesting.


danja
2011-08-29T02:56:10+01:00
federated actions intent affordances rdf

RDF Affordances

Short version : An RDF Affordance is a resource description which gives a client all the information it needs to perform an action.

see RdfAffordances and AffordanceVocabulary.

My last post about what a Data Web Browser might look like led to some fertile discussion on G+. Essentially Mike Amundsen neatly reframed the question to being one about affordances, pointing to a bit of related prior work by him on Hypermedia Types.

We hold this truth to be self-evident, that presented with a simple application scenario a Web Architect will abstract it into a form that will take decades to implement.

Only joking...

Web Intents and Actions

I was initially thinking only in terms of an RDF-oriented browser (plugin/service) but it does make sense to stand back and look at the bigger picture. For starters, while RDF is ideal for describing stuff like service characteristics, there's no compelling reason to limit the data that's being manipulated to RDF. With that door open, there's an immediate tie-in with Web Intents, a JSON/Javascript way of describing/implementing generic interactions like share, edit, view, pick etc. (As it happens I added a Web Intents repository to my todo list a few weeks ago, the idea being to store the descriptions as RDF, providing a minimal API for using them in browsers as others have described - nice bit of serendipitous tie-in).

Tantek has spotted the potential around intents and in Web Actions: Identifying A New Building Block For The Web looks at common features across existing systems like Blog this, Digg, Read later, Follow, Like, Share, Tweet, +1 (he uses "Actions" instead of "Intents" for essentially the same idea).

We hold this truth to be self-evident, that presented with the potential for open-ended innovation a Microformats Geek will start paving cowpaths.

Again, joking...

On the Wiki - RdfAffordances - Mike has brought the abstraction back down to ground with some more detail of RDF-oriented actions, and with a view to hacking an implementation (on my virgin node.js installation) I've started a vocabulary - AffordanceVocabulary - this may change fairly soon, apparently Michael Hausenblas has done a vocab in this area, that'll get precedence if there's overlap/conflict.

We hold this truth to be self-evident, that offered a simple application scenario a Semantic Web Geek will always create a vocabulary that obscures the purpose of the application and that no-one will ever use.

Not entirely joking...

There is one high-level abstraction I've noted on that vocab page that is probably useful. There's a natural boundary between affordances that are essentially just HTTP (e.g. click through link, replace a page) and those which require more complex interactions. For now at least I'm calling the former Actions (let me know if there's a better word that doesn't clash with Tantek's usage) - they are around the scope of Mike's Hypermedia Types - and the latter Intents - around the scope of Web Intents.

Comments on G+


danja
2011-08-28T13:52:53+01:00
intents json browser web affordances semweb rdf data

Data-Oriented Web Browser

Not a new idea, but I thought I'd try and find out how far we've got and braindump a little. I'm making the fairly big assumption that a general-purpose data browser would be feasibly useful/usefully feasible in addition to application- or task-specific tools (i.e. use X for your contact/social data, Y for your project management data, Z for your shopping list).

Historically Web browsers provide simple display of (linked) HTML documents obtained via a subset of HTTP, and that's still their primary use. Not very promising for use on the Web of Data without a lot of server-side magic.

But, as well as supporting increasingly sophisticated UI elements, they have built-in support for a Turing-complete language, Javascript. The HTTP limitations can be worked around. So while there may still be potential for a totally new breed of data-oriented Web browsers built from scratch as Rich Internet Applications, current browsers have the potential to do whatever's needed. Although they're pretty much limited to playing a client role, in effect they can be whatever kind of Intelligent Agent you like. The bonus is that everyone's already got a browser on their desktop/tablet/mobile - it's an easy path to deployment either for a plugin or better still as code-on-demand.

What's needed for a Data-Oriented Web Browser?

I'm not sure if the Tabulator is still actively maintained (if not, why not!?), but that gave a good indication of the kind of thing that is possible. Taking a step back, the Web of Data is really the same thing as the Semantic Web, and what's new about the Semantic Web isn't the "Semantic" but the "Web" (once again I've lost the source of that quote). How did/do people work with data without the Web? Typically SQL databases and spreadsheets. From those we can lift SQL queries and command-line tools, stored procedures and database forms (this is rather a confession, but back in the day when I first encountered MS Access it blew me away). Then of course there's the spreadsheet UI paradigm, a grid of cells which can be filled with pretty much anything, including most significantly on-the-fly calculated values.

So here's an initial shopping list:

  • an in-memory* graph data structure support (rdfstore-js looks the most advanced right now)
  • a spreadsheet-like view (I bet David Huynh has got stuff like this, if not, how hard could it be with a and jQuery? :)
  • a little language for concisely expressing Web operations, e.g. running SPARQL queries, that could be used inside the spreadsheet (the RDF path-following DSL in Apache Clerezza could be useful here too - link please Henry)
  • tools for building app-specific forms (quite a few tools support custom views of particular classes, e.g. foaf:Person, Fresnel might help here)
  • the ability to write as well as read data (this shouldn't need saying)
  • * persistence would be provided by the Web
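For the spreadsheet-like view, a minimal sketch of one way to do it: pivot the triples into a grid with one row per subject and one column per predicate. The data is made up, and the FOAF-style property names are just plain strings here rather than full URIs.

```python
def to_grid(triples):
    """Pivot (subject, predicate, object) triples into a header plus rows.

    One row per subject, one column per predicate; repeated values for the
    same cell are joined with ', '.
    """
    subjects = sorted({s for s, _, _ in triples})
    predicates = sorted({p for _, p, _ in triples})
    cells = {}
    for s, p, o in triples:
        cells.setdefault((s, p), []).append(o)
    header = [""] + predicates
    rows = [
        [s] + [", ".join(sorted(cells.get((s, p), []))) for p in predicates]
        for s in subjects
    ]
    return header, rows


# Made-up data using FOAF-style property names as plain strings.
triples = [
    ("#danny", "foaf:name", "Danny"),
    ("#danny", "foaf:knows", "#sasha"),
    ("#sasha", "foaf:name", "Sasha"),
]
header, rows = to_grid(triples)
print(header)
for row in rows:
    print(row)
```

Empty cells fall out naturally where a subject has no value for a predicate, which is the "data can be any shape" problem showing up in the UI.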

    I doubt it's possible to say up front what would be a good user-friendly way of setting this stuff up. But given a bunch of scripts that supported these elements, I reckon with a bit of trial and error dogfood use, within a few iterations something really useful could be possible.

    Thoughts? Volunteers? Startups? :)

    I've still not got commenting set up here so please post any feedback to this Google Plus entry.


    danja
    2011-08-26T10:49:28+01:00
    gui browser ui spreadsheet semweb rdf data linked

    Magnificent Seven for APIs

    Some interesting survey results have just been published about APIs: the good, the bad and the pains. I commented about this on G+ and the discussion there got on to Atom. Some interesting points made, including the likelihood that we're stuck with snowflake APIs (every one is different) for the foreseeable future. I think it was Bill de hÓra who had a post years ago (can't find it now) about the N x N problem of diverse APIs (/models/formats). Essentially if you've got N different APIs then to connect them all you need N x N different translators. But it's also worth noting that this can be reduced to 2 x N if you have mappings to a common format/model. I reckon recent history has shown that formats are secondary, assuming certain boxes are ticked (see below). Regarding the model - there is a well-known, Web-friendly one. So here I'll simply point to ConverterToRDF and ConverterFromRDF.
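The arithmetic is easy to check. A minimal sketch, counting one translator per direction; strictly that makes the pairwise figure N x (N - 1) rather than the rounder N x N, since an API needs no translator to itself.

```python
def pairwise_translators(n):
    """One translator per ordered pair of distinct APIs: n * (n - 1)."""
    return n * (n - 1)


def hub_translators(n):
    """One converter to and one from the common model per API: 2 * n."""
    return 2 * n


for n in (3, 10, 50):
    print(n, pairwise_translators(n), hub_translators(n))
```

The two approaches break even at three APIs; beyond that the common model wins, and the gap grows quadratically.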

    In the G+ discussion Bill referred to an old blog post of his, Magnificent Seven - the value of Atom. In it he highlights the 7 'primitives' that Atom (format and protocol) uses and that he suggests should be used in any carrier format. I'm inclined to agree, if you are creating an API, tick these boxes, repeated here without Atom-specificity:

    1. ID - a globally unique identifier for the chunk of data, ideally this should be a HTTP URL
    2. Link - as above, it's rare that a separate ID and URL are needed
    3. Updated - the most recent change, invaluable for keeping things in sync
    4. Extension rules (mustIgnore, foreign markup) - anything the parser doesn't understand, it simply ignores. This allows other people to reuse and extend the format in a compatible fashion.
    5. Date construct rules - using a standard date format is basic politeness
    6. Content encoding rules - generally follow the rules for the media type you're using, and if there's textual content use an existing standard format (XHTML is good). Rule of thumb: UTF-8.
    7. Unordered elements - insisting on order in the structure is (or at least should be) unnecessary, accessing things by name is more reliable
    The most significant bit is the ID/Link, this is essential for any API on the Web. It allows the use of the "follow your nose" protocol: if you want any more information about a thing, follow the link. It works for regular Web documents and increasingly for linked data.
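A minimal sketch of what ticking several of these boxes might look like on the consuming side, using JSON as the carrier. The field names are illustrative only, not taken from Atom or any published API.

```python
import json
from datetime import datetime

# Field names are illustrative, not taken from Atom or any published API.
KNOWN = {"id", "link", "updated", "content"}


def parse_entry(raw):
    """Read one entry, accessing fields by name and ignoring unknown ones."""
    data = json.loads(raw)
    # mustIgnore: anything not understood is simply dropped, not an error
    entry = {k: v for k, v in data.items() if k in KNOWN}
    if "id" not in entry:
        raise ValueError("an entry needs a globally unique id")
    # It's rare that a separate ID and URL are needed, so default link to id
    entry.setdefault("link", entry["id"])
    # A standard date format: ISO 8601 with an explicit UTC offset
    entry["updated"] = datetime.fromisoformat(entry["updated"])
    return entry


# Fields are deliberately out of order and include a foreign extension.
raw = """{
  "updated": "2011-08-13T09:42:21+01:00",
  "id": "http://example.org/entries/42",
  "x-someone-elses-extension": {"anything": "goes"},
  "content": "<p>Hello</p>"
}"""
entry = parse_entry(raw)
print(entry["link"], entry["updated"].isoformat())
```

The extension field passes through the parser without breaking anything, which is what lets other people extend the format compatibly.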
    Incidentally (1), if you are an API developer/user you may like to have a look at the Linked Data API, looking at what's needed to make access to data in a SPARQL-capable store more developer-friendly. Comments welcome there.
    Incidentally (2), Google Plus is emerging as a pretty good discussion space, if you're in need of an invite mail me.


    danja
    2011-08-13T09:42:21+01:00
    apis atom federated json rdf

    GTD meets the Dice Man

    Not for the first time, recently I've been having trouble Getting Stuff Done. In the past I've tried various strategies along with lots of little bits of software that are meant to help. It's never been particularly successful. But pretty much all those techniques have involved starting with the strategy and applying it to your own needs. This time around I thought I'd try going the other way. Start with what I want to do, try and identify the problems I have, develop a strategy from there.

    The stuff I want to do generally falls into three categories: what I call work-work, i.e. the stuff that pays the bills; personal projects, which includes things like coding, woodcarving and doing music stuff; chores - a wide range of things from washing up to gardening. I am still in the process of renovating a house, and jobs that need doing there are mostly good fun once I get into them. But I've put them in the chores category because they are things I feel I must do (unlike personal projects), but without any great urgency (unlike work-work).

    I can sum up the problem I have with each category pretty easily:

    • work-work : procrastination
    • chores : laziness
    • personal projects : distraction

    47 years of assorted neuroses mean whatever psychology is behind these is anyone's guess, but I'm pretty sure each of them can be in either a vicious or virtuous cycle. If I can bump a little from the former to the latter then winning (as Charlie Sheen would say). So how can I deal with this strategically? Here's what I came up with.

    work-work : the one thing I can't let slip (although still manage to), so that has to be my default activity before I think about anything else. But if I've got other stuff to look forward to (and off my mind) and am making reasonable progress with things, it should be easier to get down to it. Just have to make sure I get started in the morning...

    chores : I reckon a lot of the problem I have with these is that there's always so much to do, so I feel swamped and stressed about them, only finding relief by putting my feet up in front of the tv, ideally with a bottle of wine. So for this category I've decided to take a leaf out of the GTD book - write stuff down and forget it until scheduled. I've got two whiteboards, the one on the left with a list of things I need to get done in the next week or so, the one on the right things to do today. I would guess the tasks probably average out at a couple of hours each, some much shorter (e.g. bins, ~10mins) some much longer (e.g. making windows, ~20hrs). With these kinds of things it tends to be the case that once started, there's a tendency to keep going until next meal time. Whatever, if I can get at least half an hour done of each, I'll call that a success for the day. To give me a sense of progress I'll just cross things off as I get them done, only wiping them from the board when space is needed.

    whiteboards

    personal projects : this is a tricky category, for an unlikely reason. I reckon I probably spend about the right amount of time on these things. Problem is that it's very unfocused and I'm always ready to go off on a tangent on a whim. As well as hopping from project to project I'm always coming up with new ones, long before existing ones get finished. Big trail behind me. With things like woodcarving I don't think I'm too far from where I want to be (I may try the following strategy there as well). But with programming and the like, it's pathological. Fortunately there is usually a lot of common ground to software projects I play with, typically the Web and RDF-related stuff. Often work on one thing will help with another. So I've picked the 6 main projects I've got on the go and numbered them on a sheet of paper. Actually 'project areas' would be more accurate, with most of them there's loads of wiggle room within the same umbrella (not sure if that's mixing metaphors, certainly sounds kinky :)

    diceman

    Now when I feel it's time for a session on a coding project I'll roll the dice, consult the list and concentrate on that project until the next big natural break. Chances are there'll be a change at the next roll of the dice, I'm hoping that'll be enough to stop me project-hopping.

    Anyhow, I'll give this a few weeks, see how it goes.

    Incidentally I've had a Semantic Web-oriented project management tool on my todo list for years now. I've pushed that back on the big hidden stack for now, with a Personal Knowledgebase being the first goal. Given a decent one of those, personal project management stuff should be a smooth extension. If by then I still need it...

    This post was brought you by the number 2.


    danja
    2011-08-10T23:49:08+01:00
    gtd projects dice rdf

    Sitemap notes

    Today I added a sitemap to this blog. Some notes-to-self.

    Not sure what inspired me to do this, but I have been wanting a complete list of blog post URIs for a while to play around with augmenting the data (e.g. pulling out the links contained within posts and grabbing more info about them).

    Blog engine general setup

    The HTTP request routing first goes through Apache, if there's a file on the filesystem that matches the request, that is returned. If not, the request gets transparently forwarded to an instance of Gradino running on port 8080. The request gets dispatched through jax-rs to the appropriate handler in the code (most of the code is in Scala but using various Java libs). All the blog data is stored in a Jena TDB triplestore. When a request is made a SPARQL query is run programmatically against the store. The results are formatted as appropriate using a little crude templating (example). Results for the front page and feed are both cached as in-memory strings.

    Adding sitemap generator

    So for the sitemap, first pass I set things up in the same fashion as the front page and feed are generated, just without a LIMIT on the SPARQL query. This wound up making Apache give a proxy error, not sure exactly why (for some reason error messages didn't show) but it seemed reasonable to assume that it was somehow related to the quantity of results, maybe a silent timeout. I've got archives in the store going back years, my current query (excluding everything with "comment" in the URI) produces just over 5,500 results.

    So then I decided to modify things to generate a static file when a POST was received at a particular URL. I should have seen this coming, but my initial attempt at this also gave a proxy error. D'oh! Performance-wise it was effectively the same routine running in the same thread.

    But I was able to get it working by making the sitemap generator class a Scala Actor. When the appropriate POST is received, the handler creates a new instance of the Actor and sends it a message, but then continues along the original thread, returning an "ok" message to the browser.
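The shape of that fix, sketched in Python with a plain thread standing in for the Scala Actor (a swapped-in technique, not the Gradino code): the handler hands the slow job to a worker and replies straight away. The function names and the sleep are placeholders.

```python
import threading
import time

results = []  # stands in for the static sitemap file


def generate_sitemap(done):
    """The slow job: in the real thing, a SPARQL query over thousands of posts."""
    time.sleep(0.1)  # pretend to be busy
    results.append("sitemap written")
    done.set()


def handle_post():
    """Kick off generation in the background and answer straight away."""
    done = threading.Event()
    worker = threading.Thread(target=generate_sitemap, args=(done,))
    worker.start()
    return "ok", done  # the response doesn't wait for the worker


response, done = handle_post()
print(response)        # the reply is back while the worker is still busy
done.wait(timeout=5)   # only here so the demo can show the finished result
print(results)
```

The proxy only ever sees the quick "ok", so whatever timeout was biting before no longer applies to the long-running part.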

    Along the way I evolved what I reckoned was most suitable to put in the sitemap file. The blog front page just uses the core sitemap terms, and this is hard-coded:

    <url>

    <loc>http://dannyayers.com/</loc>

    <changefreq>daily</changefreq>

    <priority>0.9</priority>

    </url>

    Initially I had individual posts using the News sitemap terms, until just now I noticed that they are only for things that change a lot... So instead they just look like this:

    <url>

    <loc>http://dannyayers.com/2003/05/12/bufo-bufo/</loc>

    <lastmod>1970-01-01T01:00:00Z</lastmod>

    <changefreq>monthly</changefreq>

    </url>

    I've left it as monthly in case I want to change any of the template of the individually rendered pages, but reindexing isn't really a priority once the content text has been looked at.
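A minimal sketch of generating entries in that shape, using Python's standard ElementTree rather than the blog's Scala code. The build_sitemap function is made up for illustration; the URLs and the lastmod value are the ones shown above.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"  # core sitemap namespace


def build_sitemap(front_page, posts):
    """posts is a list of (url, lastmod) pairs, lastmod as a W3C datetime string."""
    urlset = ET.Element("urlset", xmlns=NS)

    # Front page: hard-coded, core sitemap terms only
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = front_page
    ET.SubElement(url, "changefreq").text = "daily"
    ET.SubElement(url, "priority").text = "0.9"

    # Individual posts: lastmod plus a relaxed changefreq
    for loc, lastmod in posts:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
        ET.SubElement(url, "changefreq").text = "monthly"

    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap(
    "http://dannyayers.com/",
    [("http://dannyayers.com/2003/05/12/bufo-bufo/", "1970-01-01T01:00:00Z")],
)
print(sitemap_xml)
```

Building the tree and serializing it at the end means escaping of awkward characters in URLs is handled by the library rather than by the template.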

    Next I guess I should look at Semantic Sitemaps.

    I'm typing this as yet another version of the generator code is running, I've kept making little errors that only show up when I point Google at the sitemap file... But if you're reading this then Google is happy with the current version :)


    danja
    2011-07-24T20:59:40+01:00
    seo sitemap catalog gradino rdf

    Schema/Vocab Mapping toolkit

    Olaf Hartig has pointed me to the R2R Framework :

    [[

    The R2R Framework enables Linked Data applications which discover data on the Web, that is represented using unknown terms, to search the Web for mappings and apply the discovered mappings to translate Web data to the application's target vocabulary. The R2R Framework is aimed to be used by Linked Data publishers, vocabulary maintainers and Linked Data application developers. It support them by:

    1. providing the R2R Mapping Language for publishing fine-grained term mappings on the Web

    2. defining best-practices on how mappings can be discovered by Linked Data applications

    3. providing an open-source implementation of the R2R Mapping Engine.

    ]]


    danja
    2011-07-22T09:47:48+01:00
    schema linkeddata r2r lod rdf vocab mapping

    Protocol

    Sasha and Primo demonstrate a combination of "follow-your-nose" and authentication:

    [image: Protocol]


    danja
    2011-07-21T19:52:53+01:00
    primo federated dog sasha rdf protocol cat

    Translating between Schema.org and existing RDF

    I just heard about a mapping from selected schema.org terms to SIOC (including a little FOAF and DC), it's a handful of statements using RDFS and OWL.

    As one of the people responsible for the RDF Review Vocabulary I thought I should take a look at what's needed there. It raises a couple of questions. As it happens, schema.org's model for reviews is a little more complex than our RDF vocab (after we went to all that trouble to keep it simple :), and a lot of the terms can't be mapped directly to well-known ones with RDFS/OWL.

    e.g. in the RDF vocab:

    <#something> :rating "5" .

    using schema.org:

    <#something> :reviewRating <#theRating> . <#theRating> :ratingValue "5" .

    So the first question is how best to express this? I think it's straightforward in SPARQL, e.g. for RDF vocab to schema.org for the terms above something like:

    CONSTRUCT {

    ?something s:reviewRating [ s:ratingValue ?value ]

    } WHERE {

    ?something r:rating ?value

    }
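
    To make the effect of that CONSTRUCT concrete, here's a rough sketch in Python that does the same rewrite over plain tuples instead of a real RDF store. The `r:` and `s:` prefixed names follow the query; treating `ratingValue` as a schema.org term under `s:` is my assumption, and `review_to_schema_org` is a made-up name.

```python
import itertools

R_RATING = "r:rating"                # RDF Review Vocabulary term, as in the query
S_REVIEW_RATING = "s:reviewRating"   # schema.org terms, prefixed as in the query
S_RATING_VALUE = "s:ratingValue"


def review_to_schema_org(triples):
    """Apply the CONSTRUCT pattern: one fresh blank node per matched rating."""
    counter = itertools.count()
    out = []
    for subject, predicate, obj in triples:
        if predicate == R_RATING:
            # The [ ... ] in the template becomes a new blank node each time.
            bnode = f"_:rating{next(counter)}"
            out.append((subject, S_REVIEW_RATING, bnode))
            out.append((bnode, S_RATING_VALUE, obj))
    return out
```

    One statement in goes to two statements out, linked by a blank node, which is exactly the bit RDFS/OWL can't express directly.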

    Or would some particular rule language be more appropriate?

    The next question is how best to publish the mappings?

    Where direct RDFS/OWL translation is possible, I think I'd be inclined towards including them in the RDF vocab, or at least linked via an rdfs:seeAlso.

    Where rules are necessary, as above, I really haven't a clue. A simple online reasoning service could be useful (via some pre-cooked SPARQL maybe), but again how would you express that in the vocab?

    I doubt I'll have chance to make the translations for Review in the near future (one contract I'm 7 weeks past deadline, another 3 weeks overdue, a couple of days overdue with those paper reviews...). But hopefully the lazyweb will be able to answer these questions in the interim.

    That's a point - anyone fancy reimplementing lazyweb.org? It was very sweet, I think Ben only gave up on it due to lack of time to admin.

    Oh yeah, and I still haven't implemented comments here yet, so for now please use email, Twitter, Facebook or Google+. (ooh, the Google+ link works properly for comments, but I guess non-G+ users can't comment... let me know if you want an invite)


    danja
    2011-07-21T19:23:30+01:00
    sioc foaf schema reviews schema.org rdf mapping

    FSW SFW?

    See http://dannyayers.com/2011/07/20/FSW-SFW

    [I've not got any handling in for ? on the end of titles in my blog engine...sorry if this post appears twice]


    danja
    2011-07-20T11:57:21+01:00
    federated social web rdf fsw2011

    FSW SFW

    [oops, I've not got any handling in for ? on the end of titles in my blog engine...sorry if this post appears twice]

    As usual, after the Federated Social Web meet in Berlin I'd planned to write comprehensive blog posts about it. As usual I didn't get far before getting distracted. So far I've done a bit of overview of the conf, a brief note on privacy issues and a fairly random think-piece on decentralized vs. distributed networks. But I haven't actually covered what were probably the two main take-aways from the conf - Federated Social Web stuff itself and the role of WebID. In lieu of something better I'll drop a few key links in now. In both cases things have moved along very quickly in the past few weeks with Google+ and BrowserID, more on those in a mo.

    FSW

    One big meme was that of the Facebook-killer - basically we need something that has all the user-friendliness of Facebook but not as a walled garden (and with a better story on privacy etc). Step forward Diaspora - you can use it as a service a la Facebook (with which it shares many features), but also set up your own install. There were also a handful of other apps with a similar style. It took me about half an hour to set up my own install of Status.Net, essentially an open version of Twitter, though I have yet to start using it and, probably more significantly, yet to connect it up to the other services I use.

    Another pointer I must include is to the W3C Federated Social Web Incubator Group. As the charter describes, its scope is pretty wide, including the various emergent protocols and technologies in this space. One of the initial targets is to move forward the Social Web Acid Test - Level 0 (SWAT0) - an integration use case for the federated social web. On the Wiki there are potential use-cases or user-stories that could become part of SWAT1. They're both fairly short so I'll paste SWAT0 and the list of non-W3C technologies from the charter below. The incubator group is encouraging people to join, so if you're interested in this material please sign up.

    WebID

    To quote from the WebID site, "With WebID, logging into a website is as simple as selecting a WebID and clicking 'log in'". It's a very nifty bit of tech, secure, relatively straightforward to implement, much simpler than most of the alternatives. In essence it's about passing a URI in with a PKI certificate. When Henry presented this at the conf, the audience response was interesting. Although it isn't rocket science, the certificate stuff used isn't very intuitive (personally I have a blind spot on all things auth), so not everybody got it. Of those that did get it, very few could believe what it provided. A question from the audience was telling : "What can be easier than using username + password to log in?". Henry : "One click.".

    Although not critical to the functioning of WebID, one of the coolest aspects is that it cleanly supports FOAF (and other) profile discovery, the service can learn more about the user to improve their experience. In other words it's entirely compatible with the Semantic/Linked/Data Web.

    WebID was initially known as FOAF+SSL. On the Wiki (oh, also here) there are lists of implementations etc. Watch the video and read the notes from Berlin for more.

    There's also a W3C WebID Incubator Group.

    ...

    Videos of presentations of the FSW meet in Berlin are online, along with most of the papers.

    Google+

    Before going any further, I should remind you that we already have a Federated Social Web, the blogosphere. However this is weak on many aspects - the social graph is fairly inaccessible, the UIs are often poor (feed aggregators in particular are clunky things), immediacy is seriously lacking, identity management and such personal profiles as there are are messy, and privacy, auth and access control systems are virtually non-existent. Of course all that has left a convenient niche for Twitter, Facebook, and now Google+.

    I largely agree with Edd in his (must-read blog post) Google+ is the social backbone. As a competitor to Facebook it does open up the social aspects as a commodity, and it's considerably more open and linkable, i.e. Webby (here's my stuff). I do worry about Google becoming all-powerful in this space, but as they say this too shall pass. I personally believe the nature of the Web is such that any attempts to monopolise or centralize systems will inevitably fail - because decentralized/distributed systems have inherent evolutionary advantages, though they may take time to take effect. So I reckon Google+ should be viewed by Web technologists not as an end in itself, rather as a bootstrap to a more social Web.

    Although Google+ doesn't have any Semantic Web features per se, it does a reasonable job of giving people URIs and linking them together. But rather than a niche, there's a gaping void for describing things in general in a machine-friendly form. Whether RDF-oriented linked data activity will expand to fill this void or some Googlesque reinvention (cf. microdata overlords) of RDF remains to be seen, but either way this also seems inevitable (see also Smarter (Hash)Tags and Google+). I'm not sure we're seeing it yet, but with a bit of luck, once the commercial world sees the SEO etc advantages, GoodRelations should cause a large expansion of semwebbiness.

    BrowserID

    BrowserID is a recent development from Mozilla. It's close to WebID in that it's in the identity space and about secure signing in, but arguably the primary goal is somewhat different. Broadly speaking, it boils down to the payload of WebID being a URL and the payload of BrowserID being an email address. Discussion is ongoing about the (/any) relationship between the two protocols. All other considerations aside, I'd suggest that WebID is more versatile, in that there's a lot more you can do with a URL than an email address, but because BrowserID is easier to integrate with existing email-based auth, it has better impedance matching with existing systems. I've tried to argue that BrowserID should allow the user to associate a (non-secret) URL with their email address to allow profile discovery etc. But consensus seems to be that keep-it-simple now trumps easier stuff later (WebFinger has been suggested as the route to discovery; I'm not altogether convinced, as it's quasi-centralized, requiring a service to assert the email/URL mapping). Whatever happens on this particular point, BrowserID is certainly an interesting and useful development.

    - - - -

    SWAT0 Use Case

    1. With his phone, Dave takes a photo of Tantek and uploads it using a service
    2. Dave tags the photo with Tantek
    3. Tantek gets a notification on another service that he's been tagged in a photo
    4. Evan, who is subscribed to Dave, sees the photo on yet another service
    5. Evan comments on the photo
    6. David and Tantek receive notifications that Evan has commented on the photo

    FSW-related Technologies

    ActivityStreams
    ActivityStreams is an evolving format for syndicating social activities around the web.
    OpenID Foundation
    The OpenID Foundation is the group responsible for OpenID-related standardization. Although work like OpenID Connect is a moving target, the test-cases and specification should be compatible with OpenID.
    OStatus
    OStatus is an architecture combining Pubsubhubbub, WebFinger, ActivityStreams, and PortableContacts.
    Portable Contacts
    The goal of Portable Contacts is to make it easier for developers to give their users a secure way to access the address books and friends lists they have built up all over the web.
    Pubsubhubbub
    Pubsubhubbub (PUSH) is a server-to-server publish/subscribe protocol as an extension to Atom and RSS. Servers compliant with PubSubHubbub can get near-instant notifications when a feed they're interested in is updated.
    Salmon Protocol
    As updates and content flow in real time around the Web, conversations around the content are becoming increasingly fragmented into individual silos. Salmon aims to define a standard protocol for comments and annotations to swim upstream to original update sources -- and spawn more commentary in a virtuous cycle.
    SMOB
    SMOB (Semantic MicroBlogging) is a framework that enables an open, distributed and semantic microblogging experience based on Semantic Web and Linked Data technologies.
    Webfinger
    WebFinger is about making email addresses more valuable, by letting people attach public metadata to them.

    danja
    2011-07-20T11:55:44+01:00
    federated social web rdf fsw2011

    The Symbiotic Web

    During the Federated Social Web meetup in Berlin a few weeks ago, most folks used the phrases "distributed network" and "decentralized network" interchangeably, which doesn't seem unreasonable at this point in time when both appear in major contrast to the prevailing "centralized network" architecture of Web sites. On my last night in Berlin, on the steps of a crashed space station at around 4am (early flights to catch) I was chatting with Harry Halpin and he had the following diagram on his netbook:

    [image: networks]

    It's from Paul Baran's landmark memo from 1964, "On Distributed Communications: 1. Introduction to Distributed Communications Networks" (see also some related network diagrams), some of the work which eventually led to the development of the Internet.

    Harry was quite insistent on the significance of the "decentralized" net, saying that it was the one you found in nature (e.g. plant structure). I suggested that "distributed" looked a lot like (a 2D representation of) biological cell structure. That wasn't a very satisfactory analog, and since then I've had my eyes open for a good natural world example of "distributed". Now I think I have one, and while it's in a different dimension than e.g. plant structure I reckon it maps quite nicely onto Web systems.

    Lichen!

    [image: lichen]

    (in the woods up the hill)

    To quote Wikipedia:

    Lichens are composite organisms consisting of a symbiotic association of a fungus (the mycobiont) with a photosynthetic partner (the photobiont or phycobiont), usually either a green alga (commonly Trebouxia) or cyanobacterium (commonly Nostoc). The morphology, physiology and biochemistry of lichens are very different from those of the isolated fungus and alga in culture.

    Now imagine how these things might have evolved. Initially there must have been an inheritance tree for the fungi and an independent tree for the algae (following the "decentralized" form), but then at some point the organisms started to get benefit from each other (I am not a microbiologist, but I'd guess that it probably started as a parasitic relationship, then the host side evolved some advantage). So there's a structure something like this:

    [image: lichen net]

    The tree has become a graph. [PS. ok, strictly speaking a tree is already a graph, but you know what I mean]

    Analogies get useful when you can use known aspects of one perspective to predict unknown aspects of the other (like the weird old alchemists' "As Above, So Below"). I don't know, vague hand-waving, maybe the nutrient molecules the fungi handle in lichen could be said to correspond to data, the photosynthesis of the algae corresponding to processing.

    While clear-cut symbiosis like this isn't exactly the most common relationship in nature, there's obvious interdependence between every kind of organism on this planet. I don't think it's much of a stretch to suggest there are good parallels with Web systems, especially if you view the interfaces between organisms and their environment as corresponding to APIs between online systems. Certainly client tools and services (agents, in other words) correspond nicely to organisms.

    The Web of Data (alongside the Web of documents) is already pretty distributed, the Linked Open Data cloud diagram being a nifty representation. This aspect of the Web isn't in itself particularly dynamic in its operation (data usually just sits there, periodically updating). But given the number of processors connected to the Web as servers and clients, the digital environment certainly has the potential for extremely interesting interactions.


    danja
    2011-07-15T15:06:41+01:00
    federated networks decentralized lichen rdf distributed

    HTML5 Kitten-Herding

    Mailing lists again. Facebook doesn't have a monopoly on the notion of "poke": you provide a tiny bit of impetus and get valuable results. This time it's Sam Ruby, one of the smartest guys I've encountered who isn't a rabid RDF fan (heh). How the HTML5 process works:

    [[

    *) If you have a problem with the process, participate in bug reports on the process.

    *) If you have a technical problem with a given change, start by identifying the technical problem.

    *) If you want to be impolite on twitter or IRC, I won't be responsible for policing such.

    *) Outbursts on public-html will first be dealt with privately when possible, and will be dealt with publicly when necessary.

    ]]


    danja
    2011-06-20T14:33:58+01:00
    html html5 rdf

    WebID for Pets

    I challenged Henry to explain WebID in a form my dogs would understand, I think his response is worth sharing:

    [[

    Ok. So you need to give each of your dogs and cats a webid enabled RDFID chip that can publish webids to other animals with similarly equipped chips when they sniff them. From the frequence and length of sniffs you can work out the quality of the relationships. On coming home for food, this data could be uploaded automatically to your web server to their foaf file. These relationships could then be used to allow their pals access to parts of your house. For example good friends of your dog, could get a free meal once a week. You could also use that to tie up friendship with their owners, by the master-of-pet relationships, and give them special ability to tag their pet photos. Masters of my dogs friends could be potential friends. If you get these pieces working right you could set up a business with a strong viral potential, perhaps the strongest on the net.

    ]]

    - and bonus points for attaching a photo of a cute kitteh.


    danja
    2011-06-20T13:44:34+01:00
    rdf

    Personal Geo

    It's early this year, but it's Wakes Week in Tideswell. This is when we make kings into fools, fools into kings. Always many fools, no kings. The local church (Cathedral of the Peak) is of John the Baptist, whose day funnily enough coincides with the summer solstice. We dress wells (flower pictures in clay) because water has been a bit important for people, sussed in early neolithic. A gathering of clans with lots of beer. We have our own tune and we have our own dance. We bless the ground and we bless the sun, we bless the moon and we bless the sheep shit. We take to the streets to reclaim the world we're from. There's a brass band, in the brass band there's a big bass drum. At the appointed time he goes dum-dum. Then the ritual dance begins, torchlight through the streets. Ending at the fair, now in the recreational ground, reminding some of us that it is still the 1970s. Last few pints at the Club, back to bed knowing the sun will keep going for another year.

    Can't find an accurate youtube of Tidza Band, this might even be Cressbrook band:

    http://www.youtube.com/watch?v=iMIAffb6SCY


    danja
    2011-06-19T05:01:43+01:00
    tidza rdf culture

    Blank (node) Verse

    Catching up on the HTTP-range-14 thread on the lod mailing list, in a robust response to cygri's robust brand of pragmatism, I couldn't help but notice timbl inadvertently (?) slipped into poetry here:

    Formalisms aren't smart.

    Sure, I can make a program to make sense of that.

    But I'm not going to just to save you the effort of getting it right.


    Disappointed by the intensity of your posting.

    Systems have managed for a long time to distinguish between library car and book,

    between message header and message,

    between a book and its subject.


    Now we have masses of information about many books

    and about many other things we have great value in it

    Let's not mess it up.


    If you want an ambiguous source of information, use natural language.

    The power of data is that is a whole lot less ambiguous.

    ----

    As for the discussion I was having with Pat Hayes, I'm more than happy to call it a day now Pat's acknowledged (despite my communication failures) :

    The document is a valid *representation* of the car, yes of course.


    danja
    2011-06-18T21:21:24+01:00
    poetry rdf

    httpRange-14 Reflux

    Back in 2002, the following issue was put before the TAG:

    httpRange-14: What is the range of the HTTP dereference function?

    TBL's argument was that HTTP URIs (without "#") should be understood as referring to documents, not cars.

    By 2005 a resolution was accepted. If a GET is done and the thing being referred to isn't a document, then a 303 redirect should be used to provide something which is a document. As these things go, this is quite an elegant solution. Additionally it's accepted practice to use #-URIs for things which aren't documents. However, both approaches have their problems, many of which are listed in Providing and discovering definitions of URIs.
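
    In server terms the 303 convention comes down to a small decision. Here's a minimal sketch in Python, not tied to any real framework: `respond`, `documents` and `things` are hypothetical names, and the HTML content type is just a placeholder.

```python
def respond(path, documents, things):
    """Return (status, headers) for a GET, following the 303 convention.

    `documents` is a set of paths that are information resources;
    `things` maps non-document paths (cars, people, dogs) to the path
    of a document that describes them.
    """
    if path in documents:
        # A document can be served directly.
        return 200, {"Content-Type": "text/html"}
    if path in things:
        # Not a document: point the client at something which is.
        return 303, {"Location": things[path]}
    return 404, {}
```

    So a GET on a URI for a car would get `303` plus a `Location` header for a page about the car, and a second GET on that page would get the `200`.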

    But I liked the 303 approach, and did my share of grumbling when things like Microformats and OpenID appeared to conflate the notion of a person with their home page, using the same URI for both. Now schema.org conflates lots of different kinds of things with documents. So while I still believe the TAG resolution works well I personally, finally feel we have to take into account that people won't make this kind of distinction. Others have already argued for pragmatism around the issue - e.g. see linking things and common sense and Back to Basics with Linked Data and HTTP. However it's hard not to see a conflict when (e.g.) RDF says "http://example.org/fred is a person" and in effect HTTP says "http://example.org/fred is a HTML page". Not pretty. On the lod list I just had a go at a conceptual model that avoided such a conflict, but I suspect so far I've only managed to persuade Pat Hayes that I'm barmy. So I'll have another go here with these arguments:

    1. (solid) : HTTP doesn't have a notion of a "complete" representation of a resource. A photo of a car could reasonably be served as a GIF or lossy-compressed JPG image. The difference here as far as HTTP is concerned is just in the media type expressed by the Content-Type header.

    2. (adequate) : a resource may have representations reasonably served with very different media types. Here I'm thinking of, say, a text version of a photo of a car. It may sound clunky, but there are good precedents: an RDF version of My Home Page can vary widely in its information content from a HTML version of My Home Page. In the HTML format, we have img alt="text". In both cases the assertion is made by the publisher that in some sense the different version can act as a useful alternative.

    3. (strong enough IMHO) : a description of a resource can be considered a representation of that resource. On list I suggested there was some isomorphism between a description and a thing. Pat didn't accept that at all, but did say "there can indeed be correspondences between the syntactic structure of a description and the aspects of reality it describes". I'd suggest that's near enough. Those correspondences could be said to make the description a representation in the same way a lossy-compressed version of a photo can still share the same URI as an uncompressed version. As long as an appropriate Content-Type is used.

    Given these three, a HTTP URI can simultaneously be understood as referring to a document and a car.
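
    The mechanics behind points 1 and 2 are ordinary content negotiation: one URI, several representations, chosen by media type. A simplified sketch in Python follows; `parse_accept` and `negotiate` are names I've invented, the sample representations are made up, and wildcards like `text/*` are deliberately ignored to keep it short.

```python
def parse_accept(header):
    """Split an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal-q types keep the client's order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(accept, representations):
    """Pick the representation of one resource that best matches Accept."""
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue
        if media_type in representations:
            return media_type, representations[media_type]
        if media_type == "*/*" and representations:
            first = next(iter(representations))
            return first, representations[first]
    return None


# Hypothetical representations of a single resource at a single URI.
SASHA = {
    "text/html": "<p>Sasha is a dog</p>",
    "text/turtle": "<#sasha> a <#Dog> .",
}
```

    As far as this code is concerned, the HTML description and the Turtle description are just two entries under the same key, which is about all HTTP itself has to say on the matter.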

    I'd better mention the barmy part. If HTTP did support transfer of matter, then as far as the URI referencing is concerned, all you'd be looking at is another media type. The example I used on list was of my dog Sasha, and given the above I'd suggest you could have various different representations of diminishing fidolity: Sasha herself; Sasha's description in DNA; a photographic description of Sasha; an RDF description of Sasha; a HTML description of Sasha... As I put it on list: you can't squeeze a dog over the wire with HTTP, but that's just a limitation of the protocol.


    danja
    2011-06-15T16:18:22+01:00
    uri tag http httpRange-14 rdf

    On Constraints

    Yesterday, re. Microdata, I loosely quoted Dan Connolly - here's the original:

    Are there parts of traditional logic and databases that, if we set them aside, will result in viral growth of the Semantic Web?

    From a slide, Logic, Databases and Scale dated 2006.

    (thanks Dan)


    danja
    2011-06-10T22:17:55+01:00
    rdf

    Privacy bullet points

    Federated Social Web stuff.

    It seems privacy can't really be pinned down, the definition is evolving. But you can effectively use a working definition (pick one).

    Things are different depending where in the world you live.

    Your average internet user hasn't a clue.

    What's being leeched from your online activity - virtually nobody takes on the implications. But people are learning, they're as far as 1999.

    Even when the browser vendors get together and make a button to limit things - still no-one gets the implications (see Aleecia at the link above).

    Ok, so far this is mostly "duh!".

    But there was a lovely little revelation (from Soren I believe) that statistically the people more aware of privacy tend to be those with more disposable income [bum, that's twitterable]. If you want this demographic's dollars in your consumer base, you better get your privacy sussed.


    danja
    2011-06-09T22:31:14+01:00
    federated social web rdf fsw2011

    Federated Social Web

    I've been out of the tech loop somewhat the past couple of years, and had decided not to go to conferences for a while. Ennui mostly. But when a Federated Social Web meet in Berlin showed up on the radar, it struck me I might get the shot in the arm I needed. Wasn't far off the mark. Berlin itself I found awesome, but right now I want to get down some notes on the conf. Falk (my new pen-pal) has a couple of overview posts. Good start Thursday night meeting up with Henry and a good crew. Friday morning I was tempted to sit in on the WebID WG but decided to leave them to it, relax in the hostel instead. That was until I got a ping from danbri, flying visit, unexpected f2f. Then the conf. proper started.

    It opened with a pep talk from timbl via video link (captured by Dan Romescu, who has also written up the event). Nothing remarkable (aside from how hyper the man can be at 5am local or whatever :), just reinforcement that the notion of "Federated Social Web" is pretty much the same as Tim's notion of how the Web should be.

    After that, all the stage stuff was captured on video by the organisers (bravo!).

    For most of the presentations and discussion, Facebook was the mammoth in the room. The stones they've turned over regarding identity, privacy and Web-wiring are astonishing. But there are people generally very well aware of these issues, which was nice.

    beh, I'm really struggling writing this up, I get to 135 chars and start counting. Have to do it PowerPoint. The first bullet:

    Lessons learned from Social Networking in Egypt (Amr Gharbeia) is really a must-see. A lot of the media bollocks about Facebook and Twitter playing a role in recent Middle Eastern events was true.

    A related must-see presentation happened after the FSW event, over at starship c-base. How some European hackers were able to get communications going again after a govt. had pulled the plug - go to about 1700 on the vid here at telecomix (so I'm told, not got bandwidth here to check :)


    danja
    2011-06-09T21:00:19+01:00
    federated social web rdf fsw2011

    I, for one, welcome our new Microdata overlords

    I don't think the approach taken by schema.org is the best one, far from it. But the Semantic Web has got quagmired so many times, while the rest of the world gets on and does cool stuff. I recently got irritated enough by the goings-on in the HTML5 WG to join [c.f. extensibility] but then realised I couldn't make any difference, certain patterns/arguments are set in stone. Facebook has taken the world by storm with ideas that were in FOAF years ago. Ball dropped.

    I used to hold the opinion that the shortest path from A to B was usually the best one. But any path is better than spending more time wading the river.

    Also Dan Connolly came out with a shrewd phrase like "so which constraints do we need to relax for this to go viral?"

    And Michael's already sorted http://schema.rdfs.org, mappings to RDF schema :)


    danja
    2011-06-09T17:35:17+01:00
    rdf microdata

    Dark Secret

    These Web people are remarkably cool. The Semantic Web people especially so, all with dark pasts around AI or worse still, data. But humungously human. This is significant. Even Hixie (who I basically want to take into a dark room and stick needles in) is a good guy. Emergent community, something that only has happened in the past around popes. Tagging this post with RDF so that Northern Irish guy sees it [yeah you dajobe]. People being nice to people, trying to discover new things, there's nothing wrong here.


    danja
    2011-05-28T00:41:42+01:00
    rdf

    Smell the coffee

    I have a problem with the Semantic Web right now. We have plenty of solid specs (and SPARQL 1.1 as it's evolving). We have a lot of data online. The browser is getting smarter. Maybe it's because I don't trust myself with mobile devices, but I'm somehow missing the benefits all this should provide. The social side, which did seem to be doing very well with blogs and people setting up their own sites, seems to have disappeared up a cul-de-sac with Twitter and Facebook. This morning I'd have liked to have found a hay fever remedy online, and been able to contact some local drug suppliers to stop me sneezing. It seems like we are all playing with toy projects and can't grow up (yes, I'm still working on my Turtle editor). I'm aware there's a huge amount of pharmaceutical data online, but to stop my sneezing the best bet still seems to be Wikipedia followed up with personal visits to medics. I expect, no *demand* more from the technology, but right now it all seems a bit Dark Ages. We have moved closer to the Bazaar model over Cathedrals, but that just seems to mean people have an excuse to be sloppy (it's open source so it's not my problem any more). About say 5 years ago there was inspirational stuff coming from academia, now it seems to have dried up into a poor mimic of the commercial world. I've no real specific idea what is needed right now, but I suspect it looks a lot like a big kick up the backside.


    danja
    2011-05-27T10:22:55+01:00
    rant rdf

    Another SPARQL solution

    Bravo! A solution to the latest SPARQL puzzle.

    @glenn_mcdonald found a way of getting the non-Roman-god solar system bodies:

    PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX wn:    <http://www.w3.org/2006/03/wn/wn20/schema/>
    PREFIX id:    <http://wordnet.rkbexplorer.com/id/>
    
    SELECT DISTINCT ?planet WHERE {
      ?s1 wn:memberMeronymOf id:synset-solar_system-noun-1 .
      ?s1 rdfs:label ?planet .
      OPTIONAL {
        ?s1 wn:containsWordSense ?ws1 .
        ?ws1 wn:word ?w .
        ?ws2 wn:word ?w .
        ?s2 wn:containsWordSense ?ws2 .
        ?s2 wn:hyponymOf id:synset-Roman_deity-noun-1 .
      }
      FILTER (!bound(?s2))
    }
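
    The trick in that query is the OPTIONAL followed by FILTER (!bound(...)): try to find a match, then keep only the rows where none was found. Here's the same idea sketched in Python over plain lists. `bodies_not_named_after` is a made-up name, and the sample labels are mine for illustration, not output from the endpoint.

```python
def bodies_not_named_after(bodies, deities):
    """Mimic OPTIONAL + FILTER(!bound(...)) over two label lists.

    For each body, try to find a deity sharing its name (the OPTIONAL
    part); keep the body only when no match was bound (the FILTER part).
    """
    kept = []
    for body in bodies:
        match = next((d for d in deities if d == body), None)  # OPTIONAL
        if match is None:                                      # !bound(?s2)
            kept.append(body)
    return kept


# Illustrative sample data only.
SAMPLE_BODIES = ["Mars", "Earth", "Venus", "Uranus"]
SAMPLE_DEITIES = ["Mars", "Venus", "Jupiter"]
```

    With that sample data, Mars and Venus find a match and are dropped, while Earth and Uranus are kept.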
    

    Isolating just the planets looks to be out of reach using the WordNet endpoint alone, but I guess that can be left as a challenge for federated query e.g. CONSTRUCTs from different datasets into a local store before SELECTing.

    Update

    From RobVesse -

    Here's an even simpler query for yesterdays puzzle - still doesn't isolate real planets though

    PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX wn:  <http://www.w3.org/2006/03/wn/wn20/schema/>
    
    SELECT DISTINCT ?label WHERE 
    {
     ?s1 wn:memberMeronymOf <http://wordnet.rkbexplorer.com/id/synset-solar_system-noun-1> .
     ?s1 rdfs:label ?label.
     OPTIONAL
     {
      ?s2 wn:hyponymOf <http://wordnet.rkbexplorer.com/id/synset-Roman_deity-noun-1> .
      ?s2 rdfs:label ?label.
     }
     FILTER (!BOUND(?s2))
    }
    

    ...plus...

    Here's a soln using wordnet and dbpedia to show only planets not named after roman gods, requires a SPARQL 1.1 engine to run

    PREFIX rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX wn:  <http://www.w3.org/2006/03/wn/wn20/schema/>
    
    SELECT DISTINCT ?label WHERE 
    {
     SERVICE <http://wordnet.rkbexplorer.com/sparql/>
     {
       ?s1 wn:memberMeronymOf <http://wordnet.rkbexplorer.com/id/synset-solar_system-noun-1> .
       ?s1 rdfs:label ?label.
     }
     MINUS
     {
      SERVICE <http://wordnet.rkbexplorer.com/sparql/>
      {
        ?s2 wn:hyponymOf <http://wordnet.rkbexplorer.com/id/synset-Roman_deity-noun-1> .
        ?s2 rdfs:label ?label.
      }
     }
     BIND(URI(CONCAT("http://dbpedia.org/resource/", ?label)) AS ?dbpResource)
    

    Here's a suitable engine: Leviathan (a demo of the SPARQL Engine used in dotNetRDF).


    danja
    2011-04-12T20:19:30+01:00
    sparql puzzle rdf

    Another SPARQL puzzle

    Using the WordNet endpoint at http://wordnet.rkbexplorer.com/sparql/ I can get the names of the solar system bodies that are named after Roman gods with:

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX wn:   <http://www.w3.org/2006/03/wn/wn20/schema/>
    
    SELECT DISTINCT ?label WHERE {
      ?s1 wn:memberMeronymOf <http://wordnet.rkbexplorer.com/id/synset-solar_system-noun-1> .
      ?s1 rdfs:label ?label .
      ?s2 wn:hyponymOf <http://wordnet.rkbexplorer.com/id/synset-Roman_deity-noun-1> .
      ?s2 rdfs:label ?label .
    }
    

    The challenge is to get the names of the solar system bodies that aren't named after Roman gods. (Ideally I'd like planets in the solar system... rather than ...bodies, but I can't see a suitable class).


    danja
    2011-04-11T20:42:00+01:00
    sparql puzzle rdf

    Linked Data One-Liner

    A lot of information is merely On the Web when it would be more useful In the Web...


    danja
    2011-04-04T03:44:39+01:00
    linkeddata rdf

    Puppies on the Web of Data?

    Received via email:

    Several years ago I bought a cocker spaniel puppy in Pleasant view Colorado. Are you the Ayers that sell Cocker Spaniel puppies? If so could you contact me anytime you have a litter with a tri-colored male in it both my son and myself are interested. If you are not the correct party I apologize for bothering you.

    Nothing to suggest this isn't a legit enquiry. The immediate solution is "no", but I wonder how the machines might help solve it otherwise...


    danja
    2011-03-30T19:09:28+01:00
    puppies rdf

    Pattern exclusion in SPARQL

    Seconds after I twittered the last post, @LeeFeigenbaum responded.

    Ok, so I have two patterns, and I want to find the statements that match either pattern but don't match both. The solution is rather a flexible little idiom for this kind of negation. The specific patterns are:

    ?set dbpp:wikiPageUsesTemplate  <http://dbpedia.org/resource/Template:Infobox_programming_language> .

    and

    ?set a yagoc:ProgrammingLanguage106898352 .

    (I'm running this against dbPedia)

    Lee's solution is:

    PREFIX dbpp:  <http://dbpedia.org/property/>
    PREFIX yagoc: <http://dbpedia.org/class/yago/>

    SELECT count(?set) where {
      {
        ?set dbpp:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:Infobox_programming_language> .
        OPTIONAL {
          ?set a ?marker .
          FILTER(?marker = yagoc:ProgrammingLanguage106898352)
        }
        FILTER(!bound(?marker))
      } UNION {
        ?set a yagoc:ProgrammingLanguage106898352 .
        OPTIONAL {
          ?set dbpp:wikiPageUsesTemplate ?marker .
          FILTER(?marker = <http://dbpedia.org/resource/Template:Infobox_programming_language>)
        }
        FILTER(!bound(?marker))
      }
    }

    (Note that COUNT isn't (yet) standard SPARQL, but seeing the size of the result sets was handy here).

    It looks convoluted, but each half of the UNION is kind-of the converse of the other (and will give interesting results independently). I was a little surprised it did work, as variables are scoped to the whole query and ?marker looked troublesome. But FILTERs are scoped to the local group, and that's where it matters here (it will produce the same results if you use a different variable for each half of the UNION).

    There is something slightly odd happening in this particular case (or I'm missing something obvious). The figures I got before were 762 matches for the UNION of the two patterns, 178 for the intersection, so I'd have expected 762 - 178 = 584 results, but this gives 406. So there's a bit of sloppy QED around here. I was missing something obvious.

    Lee again via twitter: the numbers look perfect to me - the 762 double-counts the 178 in the intersection. 406+178=584
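
    Lee's arithmetic can be checked in a few lines of Python. This is only a sketch using the counts reported in these posts (336 and 426 for the individual patterns, 178 for both); the variable names are mine.

```python
# Counts reported by the DBpedia queries in these posts.
yago_count = 336        # ?s a yagoc:ProgrammingLanguage106898352
template_count = 426    # ?s dbpp:wikiPageUsesTemplate ...Infobox_programming_language
both_count = 178        # resources matching both patterns

# A UNION without DISTINCT returns one row per match, so every resource
# that matches both patterns shows up twice.
union_rows = yago_count + template_count            # 762
distinct_union = union_rows - both_count            # 584
either_but_not_both = distinct_union - both_count   # 406

print(union_rows, distinct_union, either_but_not_both)
```

    So 584 is the number of distinct resources matching either pattern, and 406 is what is left once the 178 that match both are excluded.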

    As @glenn_mcdonald and Lee have pointed out, a DISTINCT would fix my original UNION query to exclude the dupes. Glenn also offers a more concise version taking advantage of a Virtuoso feature:

    PREFIX dbpp:  <http://dbpedia.org/property/>
    PREFIX yagoc: <http://dbpedia.org/class/yago/>

    SELECT count(?set1) where {
      {
        ?set1 dbpp:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:Infobox_programming_language> .
        FILTER NOT EXISTS {?set1 a yagoc:ProgrammingLanguage106898352}
      } UNION {
        ?set1 a yagoc:ProgrammingLanguage106898352 .
        FILTER NOT EXISTS {?set1 dbpp:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:Infobox_programming_language>}
      }
    }


    danja
    2011-03-28T19:35:15+01:00
    negation sparql rdf

    Long multiplication

    Querying http://dbpedia.org/sparql


    PREFIX yagoc: <http://dbpedia.org/class/yago/>
    SELECT COUNT(?set1) where {
      ?set1 a yagoc:ProgrammingLanguage106898352 .
    }

    result = 336

    PREFIX dbpp: <http://dbpedia.org/property/>

    SELECT COUNT(?set2) where {
      ?set2 dbpp:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:Infobox_programming_language>
    }

    result = 426

    disjunction, language is in set1 OR set2

    PREFIX dbpp:  <http://dbpedia.org/property/>
    PREFIX yagoc: <http://dbpedia.org/class/yago/>
    SELECT count(?s) where {
      {
        ?s dbpp:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:Infobox_programming_language> .
      } UNION {
        ?s a yagoc:ProgrammingLanguage106898352 .
      }
    }

    result = 762

    conjunction, language is in set1 AND set2

    PREFIX dbpp:  <http://dbpedia.org/property/>
    PREFIX yagoc: <http://dbpedia.org/class/yago/>
    SELECT COUNT(?set) where {
      ?set a yagoc:ProgrammingLanguage106898352 .
      ?set dbpp:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:Infobox_programming_language> .
    }

    result = 178

    PREFIX dbpp:  <http://dbpedia.org/property/>
    PREFIX yagoc: <http://dbpedia.org/class/yago/>

    SELECT count(?and) where {
      {
        ?or dbpp:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:Infobox_programming_language> .
      } UNION {
        ?or a yagoc:ProgrammingLanguage106898352 .
      }
      ?and dbpp:wikiPageUsesTemplate <http://dbpedia.org/resource/Template:Infobox_programming_language> .
      ?and a yagoc:ProgrammingLanguage106898352 .

      FILTER(?and = ?or)
    }

    result = 356!?

    Took me a long while to realise what that number represents, definitely time for a break...
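
    One possible reading of that number (my guess, not something the queries confirm by themselves) is that the UNION branch yields each doubly-matching resource twice, so joining it against the intersection gives 2 x 178 rows. A toy Python model of the join, with made-up set members, behaves the same way:

```python
# Toy stand-ins for the two DBpedia result sets (members are made up).
template = {"a", "b", "c", "d"}
yago = {"c", "d", "e"}

# UNION without DISTINCT: one row per match from each branch.
or_rows = list(template) + list(yago)
# The two plain triple patterns on ?and: resources matching both.
and_rows = template & yago

# FILTER(?and = ?or) keeps every (and, or) pair whose values are equal.
joined = [(a, o) for a in and_rows for o in or_rows if a == o]

print(len(and_rows), len(joined))  # the join has twice as many rows
print(2 * 178)
```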

    What I'm trying to find (if it's possible) are queries to look at the difference between the sets above, the 762 - 178 = 584 part. I'm hoping something along the lines of Finding Resources that don't have a certain property might work. If anyone knows an idiom that'll work (or knows that it isn't possible) please ping me.


    danja
    2011-03-28T13:59:45+01:00
    sparql puzzle rdf

    4store on Ubuntu

    Bad notes for future ref. It works! Last night I finally got around to sticking 4store (a properly scalable free RDF store) on my slicehost server, which is running Jaunty x86 64bit (uname -a, I always have to look that up), and today I got it on my laptop x86 32bit Maverick. I did things a little out of sequence from the 4store instructions - I hadn't seen the pointer to ready-wrapped Debian/Ubuntu packages, and didn't keep notes. But in each case I did install the latest Raptor and Rasqal from source, and a stack of common dependencies with apt-get/Synaptic, as listed on the 4store site. (I think) on the server the deb package worked right away, locally I had a little snag but got there finally with the 4store source. That snag was an error when trying to load some data, in amongst gobbledegook was "getaddrinfo failed". It seems IPv6 stuff can be one cause for this, googling I found this way of disabling it:

    added to /etc/sysctl.conf:

    #disable ipv6

    net.ipv6.conf.all.disable_ipv6 = 1

    net.ipv6.conf.default.disable_ipv6 = 1

    net.ipv6.conf.lo.disable_ipv6 = 1

    (reboot)

    Seems ok now. Server install has currently just got this blog's data in it, exposed through SPARQL here. Local copy I think I'll just dump any bits of RDF I come across into, kind of a random DB. Filter it at read time...


    danja
    2011-03-26T21:46:55+01:00
    4store ubuntu triplestore rdf

    Scripty Agents

    A bit of background for a question I've put to SemanticOverflow (soon to be moving to semanticweb.com) -

    Has anyone got a triplestore to interop with node.js?

    (It's evented I/O for V8 JavaScript, looks nifty kit for HTTP stuff).

    Given that V8 runs native, I'm guessing it should be possible to hook into Redland somehow (but I wouldn't know where to start). Alternately, I suppose one of the Javascript RDF engines could work, but I'm out of touch with what's available.

    SPARQL would be nice to have...

    node.js floated into my consciousness again after seeing mention of @sh1mmer's new video on the subject. This time around I got around to installing it - it was pretty easy to get the example server script running (error first time, but rather than disabling SSL as suggested in the docs I just apt-get installed openssl-dev etc. and that did the trick).

    On and off I've been playing with the idea of using an agent metaphor for (Semantic Web) services. I did a bunch of slides for the Scripting for the Semantic Web 2007 meetup, which I've just managed to dig out again: Two Webs!. Here's what a generalised agent looks like (slide 47):

    scripty agent block diagram

    Pretty much any semweb service can be viewed this way, and if you degenerate it a bit by dropping the RDF store and HTTP client it covers pretty much any Web app/service/site. Drop the RDF and server (and add a view) and you've got a browser. But I reckon the fun should start when you go in the other direction, starting with this as a general architecture for the app, plugging in whatever bit of functionality you like. I like the idea of such things being quite small, and to add to the agility, use a scripting language for the behaviour, so it looks something like this (slide 48):

    scripty agent with some pseudocode

    The pseudocode here is for a simple doc server (e.g. if the query was done with SPARQL, format done with XSLT), but a key piece to note is the call to another datasource if the query can't be fulfilled locally. It occurred to me that if you had a little framework for agents like these, they could communicate with each other over HTTP (as they would in the wild) although if they knew they were in the same VM then more direct programmatic comms could take place. In between direct and HTTP calls other protocols could also be available, e.g. XMPP, with negotiation used to choose the most suitable wire. But I think it would be important to always (MUST) have HTTP support. (I've been playing with this stuff a bit using Scala Actors, no results to show yet).
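
    To make the shape of such an agent a bit more concrete, here's a minimal Python sketch. It isn't the pseudocode from the slide: the class and method names are invented, a plain dict stands in for the RDF store, a callable stands in for the HTTP client, and the caching of remote answers is my own addition.

```python
from typing import Callable, Dict, Optional

# Hypothetical stand-in for a local RDF store: maps a query string to a result.
LocalStore = Dict[str, str]
# Hypothetical stand-in for a call out to another agent or datasource.
RemoteLookup = Callable[[str], Optional[str]]


class Agent:
    def __init__(self, store: LocalStore, remote: RemoteLookup) -> None:
        self.store = store
        self.remote = remote

    def query(self, q: str) -> Optional[str]:
        """Answer locally if possible, otherwise ask the other datasource."""
        result = self.store.get(q)
        if result is None:
            result = self.remote(q)
            if result is not None:
                self.store[q] = result  # keep a local copy for next time
        return result

    def handle(self, q: str) -> str:
        """Format a result for the client (XSLT on the slide, a string here)."""
        result = self.query(q)
        if result is None:
            return "404 Not Found"
        return f"<doc>{result}</doc>"


# Two agents in the same VM: the second one calls the first directly.
upstream = Agent({"x": "hello"}, lambda q: None)
local = Agent({}, upstream.query)
print(local.handle("x"))
print(local.handle("y"))
```

    Swapping `upstream.query` for a function that does an HTTP GET would give the in-the-wild version without touching the agent itself.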

    So...given how nifty node.js is for doing the HTTP client/server bits, it would be nice to get the RDF bits in there too. Given that node.js/V8 runs natively, one option would be a native RDF store like Redland. I'm not sure, but I think you'd have to wrap the Redland functions up quite a bit in V8's C++ to make them accessible through Javascript in V8 (and I'm not volunteering - it's years since I did any C, and I never got the hang of C++).

    If there was a decent Javascript RDF engine, that would also be a good alternative - V8 compiles Javascript, so the result should be pretty performant. (Hmm, whatever happened to the RDF store in Tabulator?).

    In the first instance I'm thinking here of running such agents server-side (slide 30 is "Tyranny of the Browser" in Gothic Black Letter - we've got used to such a narrow view of what the Web can do), though as a browser can supply the VM for V8, similar stuff could run there too.

    Anyhow, there are already a couple of interesting answers to my question, comments over there please, or alternately, I installed phpBB the other day to use in the near future for some support stuff - feel free to use that for comments or whatever.


    danja
    2011-03-25T22:34:57+01:00
    agents node.js rdf scripting

    Adding SPICE to the Semantic Web

    Main Course

    Here's a circuit:

    distortion circuit
    - and here's its SPICE model:

    ***
    .INCLUDE la-components.mod

    Rsrc 1 0 100E3
    Rin 1 2 1E3
    Rfeed1 2 3 10E3
    Q1 3 0 4 BC109
    Q2 3 0 4 BC179
    Rfeed2 3 4 10E3
    Xopamp 0 2 5 6 4 TL071
    Rload 0 4 10E3
    Vcc 5 0 15
    Vee 6 0 -15

    Vsrc 1 0 SIN(0V .1VPEAK 1KHZ)

    .TRAN 10US 1000US
    ***

    The .INCLUDE is as it sounds, the contents of that file are included in this model. After that it's describing a graph with two kinds of nodes: those associated with a component and connection nodes (i.e. common terminals/points/buses/PCB tracks...). Although the components kind-of contain arcs, they're hidden behind the component's connectors. The component's connectors are identified by their position in the space-separated data. On the schematic the nodes are marked in red.

    Taking the first line:

    Rsrc 1 0 100E3

    This is interpreted via :

    (a Resistor) <name> <node1connection> <node2connection> <value>

    Rsrc is a 100k resistor connected between nodes 1 and 0

    (Node/bus 0 is always ground)

    Taking the first of the transistors:

    Q1 3 0 4 BC109

    a transistor of type BC109 called Q1 has its collector connected to node 3, base to node 0, emitter to node 4

    The .TRAN line is used to run a simulation (a transient analysis), sampling every 10uS for 1000uS. I've not really figured out this side of things properly, couldn't get a straight .DC based transfer chart. But the sine wave will do for now.

    Anyhow I can't go looking at a graph model for long without wondering how it could go on the Web. While there are no doubt loads of ways of doing it, the circuit definition can be transcribed into Turtle fairly directly. Bnodes could be used for the connection buses, but it's just as easy to name them. So making things up as I go along -

    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix dc: <http://purl.org/dc/elements/1.1/> .
    @prefix spice: <http://purl.org/stuff/spice/> .
    @prefix u: <http://purl.org/stuff/units/> .
    @prefix d: <http://purl.org/stuff/devices/> .
    @base <http://hyperdata.org/circuits/logamp/> .

    <http://hyperdata.org/circuits/logamp> a spice:Circuit ;
    dc:title "Log Amp" ;
    dc:description "a modified log function amplifier" ;
    spice:components ( <Rsrc> <Rin> <Rfeed1> ... <N0> <N1> ...) .

    # Rsrc 1 0 100E3
    <Rsrc> a spice:Resistor ;
    rdfs:label "Rsrc" ;
    spice:terminal1 <N1> ;
    spice:terminal2 <N0> ;
    u:ohms "100000" .

    ...

    that seems ok, now for a transistor:

    # Q1 3 0 4 BC109
    <Q1> a spice:BJT ;
    rdfs:label "Q1" ;
    spice:terminal1 <N3> ;
    spice:terminal2 <N0> ;
    spice:terminal3 <N4> ;
    spice:device d:BC109 .

    that'll do.

    Doing a .INCLUDE in general could really do with something from RDF core (ping RDF WG), but here it's providing other SPICE definitions of the components so it seems reasonable to be more explicit:

    d:BC109 rdfs:isDefinedBy <http://hyperdata.org/circuits/logamp/components#BC109> .

    which given that SPICE supports subcircuits (which is how TL071 is defined) provides a nice composition mechanism.

    I reckon it should be straightforward to write a transformer from SPICE syntax to Turtle. Going the other way, the usual SPARQLing shouldn't be rocket science.
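
    As a rough idea of what such a transformer might look like, here's a Python sketch that handles only the resistor and transistor lines shown above. The function names are made up, it assumes plain numeric values like 100E3 (not suffix forms such as 10K), and it uses spice:terminal3 for the third transistor connection.

```python
def spice_value(text: str) -> str:
    """Convert a plain SPICE number such as 100E3 to a decimal string."""
    number = float(text)
    return str(int(number)) if number == int(number) else str(number)


def line_to_turtle(line: str) -> str:
    """Transcribe one resistor (R...) or transistor (Q...) line to Turtle."""
    fields = line.split()
    name = fields[0]
    kind = name[0].upper()
    if kind == "R":
        n1, n2, value = fields[1:4]
        body = [
            "a spice:Resistor",
            f'rdfs:label "{name}"',
            f"spice:terminal1 <N{n1}>",
            f"spice:terminal2 <N{n2}>",
            f'u:ohms "{spice_value(value)}"',
        ]
    elif kind == "Q":
        n1, n2, n3, device = fields[1:5]
        body = [
            "a spice:BJT",
            f'rdfs:label "{name}"',
            f"spice:terminal1 <N{n1}>",
            f"spice:terminal2 <N{n2}>",
            f"spice:terminal3 <N{n3}>",
            f"spice:device d:{device}",
        ]
    else:
        raise ValueError(f"unsupported component: {name}")
    return f"<{name}> " + " ;\n    ".join(body) + " ."


print(line_to_turtle("Rsrc 1 0 100E3"))
print(line_to_turtle("Q1 3 0 4 BC109"))
```

    The prefixes, the circuit resource and the other component types (X, V, .INCLUDE and so on) would still need handling.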

    All seems doable. Homework. Rainy day.

    Starter

    I want to play with analog electronics again, stuff I used to do before the Web came along and ate up my cycles. My motivation now is mostly driven by the price of recording studio equipment. If, for example, I just want to invert the phase of a signal, I'd need to pay say $50+ for a passive DI box or $100+ for a pre-amp. This is a bit demoralising when the components are available for pennies (though hardware like connectors and cases can cost a lot more). Then of course there's the circuit hacking angle, it really is good fun. A project that's a permanent fixture in this space is the distortion pedal (like ghard's Big Muff) - the circuits aren't complicated, but getting a good sound is the Holy Grail, so this is what I'm going to play with first.

    I did buy a bunch of components a while back, but haven't got much in the way of prototyping/test gear. A cheapo USB ADC will hopefully do for a makeshift oscilloscope for now, and I've just ordered the parts to put together a simple PSU (along with *lots* of oddments). But feeling a bit impatient, I thought I'd have a quick look what software was available these days for circuit simulation.

    I don't know if I'm missing something, but things hardly seem to have progressed at all in the last couple of decades (but then again analog electronics hasn't really changed). The de facto standard is SPICE, and there are quite a few open source tools available for using it (ah, things weren't open source back in the day, that's progress). I won't bother linking to the individual bits, if you look for 'spice' in Synaptic a bunch show up, and they all seem to come under the umbrella of gEDA. Anyhow, after an hour or so's fiddling I was able to draw a little circuit using gschem, but I haven't yet managed to get it to generate a working netlist file (which specifies the inter-component connections for SPICE). I think I just need to sit down and check/add all the component attributes. But that's a bit tedious so I've just been playing with a SPICE file manually. Praise be to text formats.

    The first problem here was finding simulation definitions of the components I want to use. The little circuit I want to test includes a common op-amp (TL071) and a pair of transistors, one NPN (BC109), one PNP (BC179). Took a lot of searching, and although (allegedly) many of the manufacturers do provide SPICE models for their components, I eventually found what I needed on hobbyist sites. (Making the component model files doesn't look too difficult, it'd mostly mean copying values from a spec sheet into a SPICE definition - again, sounds tedious).

    There is a GUI adapter for running simulations, gspiceui (which I must have another look at now I've got a working model), but with the amount of trial and error I was having to do I settled back into the command-line tools. For future ref. it goes like this:

    ngspice <filename>

    This loads in the file and starts up an interactive shell. Took me a long time to figure out what to do next, but here are a couple of bits that worked for me. Once in the shell:

    ngspice 1 -> run

    Runs the simulation (the .TRAN bit). Then:

    ngspice 2 -> plot V(1) V(4)

    Produces a plot like this:

    distortion plot

    Red is the input (voltage on node 1), blue the output (voltage on node 4).

    Certainly looks like distortion...wonder what it sounds like...

    Pudding

    Finding stuff and looking up references in this space is still fairly Paleolithic, so there's one application of exposing this kind of material as linked (wired!) data. But there are probably stacks of other more inspiring apps. Going totally blue sky, a globally distributed circuit could be rather cool. In the digital realm you could, for example, have a global computer that's built from just a few simulated gates on each of a million interconnected PCs. Bit like an extremely dumbed-down Web service/agent kind of thing.

    In the analog realm it could get very wacky. Host your own local circuit subsystem, connect it to anyone else's. I guess you'd want to connect your inputs to other folks' outputs and offer outputs of your own. As long as you are limited to connecting your inputs to the rest of the world (or more versatile, I can only connect my output to your input if I have the appropriate rights) then subsystems should play nicely with each other. I see no reason why for control and audio signals you couldn't do this in real-time using existing streaming audio protocols (codec'ing locally to PCM for the instantaneous values).

    This is pretty much what I assume some of the net-based recording systems that are around are doing. I must confess I've never looked into these, trying to mimic traditional recording/mixing stuff that way seems a bit of a non-starter because of the latency issues. But flipping it to a messier, slightly bonkers [insert pun about bipolar transistors] global analog synth kind of idea, then it starts to sound more fun.


    danja
    2011-02-23T22:00:24+01:00
    spice turtle electronics semweb rdf

    Really Simple Reading Lists

    I occasionally visit Dave Winer's blog, as he has been known to have good ideas. One of these from a few years ago, that he's talking about again, was 'Reading Lists', whereby (in principle) you can subscribe to a list of feeds. When the list changes, your aggregator (in principle) can subscribe/unsubscribe to the individual feeds in the list, showing the contents of the listed feeds probably grouped together in some fashion. Neat idea, but it doesn't really seem to have caught on.

    There are two de facto standards for expressing lists of feeds: OPML and RDF ("foafrolls"). The former is probably better supported in desktop aggregators, the latter maybe more visible in the online Planet aggregators (including Planet RDF, though that uses chumpologica/Redland rather than PlanetPlanet). OPML is Dave Winer's 'outline processor' markup language, for lists of feeds it has typed links. The RDF version uses the FOAF, DC and RSS 1.0 vocabs (very typed links). Away from the feed list application, the OPML format is usable in Dave Winer's outliner, and any RDF tool can make sense of the RDF (naturally :) but I reckon it does rather lend itself to FOAFishness - feeds are associated with a foaf:Person (and/or foaf:Agent) with a foaf:weblog etc. (I dunno, the domain is right on top of SIOC too, maybe some info using those terms could be added to the feedlists..?).

    For any kind of Information/Knowledge Manager kind of tool (Personal or otherwise) built with RDF, it seems quite natural to periodically refresh data (not only feeds but pretty much anything in the domain of interest - FOAF Profiles probably being ubiquitous), so Reading Lists would sit comfortably alongside other features.

    But in the 'simple' world of RSS, subscribing to feedlists is something of a complication. For instance, in the good Mr. Winer's latest incarnation, he's got aggregated pages (e.g. daveriver.scripting.com) not unlike those of the Planets, with an autodiscovery link in the HTML pointing to the feedlist:

    <link rel="alternate" type="application/rss+xml" title="OPML" href="index.opml" />

    OPML is RSS? I don't think even the Universal Feed Parser is that liberal. The kludge does get Firefox to show the target as a subscribable link, but then that's still not much good if the tools don't know what to do with it. But it seems to me there's a much simpler approach - use RSS. To get myself some markup to show I just bookmarked this blog's feed with del.icio.us and had a look at the feed that produced, and it contains (trimmed) this:

    <item>
    <title>Danny Ayers : Raw Blog (feed)</title>
    <link>http://dannyayers.com/index.rdf</link>
    </item>

    Now a current aggregator would see that and probably just display it as an HTML-style titled link. But if the aggregator bothered to do an HTTP HEAD, it'd see:

    Content-Type: application/rdf+xml

    To a (non-RDF savvy) aggregator that means an RSS 1.0 feed. So, aggregator dude, subscribe to it. Atom <link> elements have a (mime) type attribute, so there the HEAD wouldn't even be necessary.
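
    For what it's worth, the check is only a few lines with Python's standard library. This is a sketch rather than code from any real aggregator: the function names and the set of media types counted as feeds are my own assumptions.

```python
import urllib.request

# Media types treated as feeds here; this list is an assumption, not a standard.
FEED_TYPES = {"application/rdf+xml", "application/rss+xml", "application/atom+xml"}


def is_feed_type(content_type: str) -> bool:
    """True if a Content-Type header value names one of FEED_TYPES."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in FEED_TYPES


def link_is_feed(url: str) -> bool:
    """Issue an HTTP HEAD for url and inspect the Content-Type it reports."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return is_feed_type(response.headers.get("Content-Type", ""))


print(is_feed_type("application/rdf+xml"))
print(is_feed_type("text/html; charset=utf-8"))
```

    An aggregator would call `link_is_feed` on each item's link and subscribe where it returns True.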

    While most feeds are a changing, fixed-length FIFO queue of entries, there's nothing to stop them being a variable-length list.

    In other words, the simplest RSS feed list is an RSS feed. Even if the aggregator needs a little help in recognising a feed list, it's got to be easier than understanding an entirely different format (published with an inappropriate media type).
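
    Reading such a list takes very little code. The sketch below (function name invented) pulls the title and link out of each item of an un-namespaced RSS 2.0 document like the del.icio.us snippet above; RSS 1.0 items are namespaced and would need slightly different lookups.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def reading_list(rss_text: str) -> List[Tuple[str, str]]:
    """Return (title, link) for every <item> that has a <link>."""
    root = ET.fromstring(rss_text)
    entries = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link")
        if link:
            entries.append((title.strip(), link.strip()))
    return entries


sample = """<rss version="2.0"><channel>
<item>
<title>Danny Ayers : Raw Blog (feed)</title>
<link>http://dannyayers.com/index.rdf</link>
</item>
</channel></rss>"""

for title, link in reading_list(sample):
    print(title, link)
```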

    Ok, so personally I'd go straight down the RDF route, it's a heck of a lot more flexible. But an RSS-format Reading List does seem like low-hanging fruit for non-RDF tools.

    Anyhow, if anyone's building an aggregator (they're a great little starter app when learning a new language), consider Reading Lists as a feature.


    danja
    2011-02-17T17:32:19+01:00
    aggregators lists reading rss rdf opml

    Why OWL ain't bad

    John Sowa just posted some criticism of OWL as a KR language to the Conceptual Graphs list. I responded but the list's playing up, I ended up with an over quota message. So I'll post the text here and send John the link...

    On 14 February 2011 22:59, John F. Sowa <sowa@bestweb.net> wrote:
    > I have often commented on the limitations of OWL as a knowledge
    > representation language.

    ...and I believe I have leapt to its defence more than once :)

    I don't believe I've done so since OWL 2 [1] came out, so it behoves
    me to add my few cents once again.

    OWL is limited as a knowledge representation language, by design. As
    with most other modelling languages it's a trade off between
    expressivity and computational demands. But it has certain features
    that set it apart from most other such languages, the key ones being
    related to the fact that it's a Web language. Three such features
    spring to mind:

    * Most of the language's constructs (individuals, classes, relations
    etc) are identified using URIs, so there's Web-compatibility built in
    at a low level
    * While the binary relation that's at the heart of OWL can seem like a
    handicap (especially coming from a DB perspective), when the
    information is seen as a graph structure, echoing the Web, its utility
    is hard to dismiss
    * By making the open world assumption (statements are either true or
    unknown) the language reflects the real world as expressed on the Web
    - in a global environment, we can't know everything

    Given these features as a starting point, OWL does a good job of
    providing an ontology language that ticks many of the logician's
    boxes.

    As it happens, as development of OWL 2 was being proposed, I'd argued
    with some of its advocates that there were more useful things the time
    could be spent on than the logic side of the language. Turns out it
    didn't matter anyway - the enthusiasts there produced what they wanted
    (and what apparently their customers were demanding) and there's been
    no discernible impact on other development tracks.

    The thing is RDF (plus RDFS, perhaps with a tiny bit of OWL) is enough
    to cover the vast majority of descriptions (the statements of
    interest) to a useful degree. Most of the time you don't actually need
    expressive constructs to get useful data on the Web, a very simple
    statement of relationship between resources is enough. Logic-wise, the
    SPARQL query language (syntactically like SQL, but operating over
    graphs) covers the requirements of the vast majority of applications
    (IMHO), its simple pattern-matching being substantially more useful
    than most other inferences.

    The aspect of RDF/RDFS/OWL that really seems to work well is what's
    been called the 'follow your nose' protocol. As when browsing the Web,
    if you want to find out more about something (and it has a link), you
    click the link to get more information. With RDF & OWL entities and
    relations being identified by URIs, typically HTTP URIs, your machine
    can do the same.

    If it encounters a statement, say something like:

    Fornitura(John Sowa, Something)

    - it (and you) may have no idea of what's being stated. However all
    three parts of the statement are Web resources, the statement could be
    written longhand as (e.g.) :

    <http://www.jfsowa.com/people#me> <http://some-vocab.org/fornitura>
    <http://example.org/something> .

    To find out more information, you can do a HTTP GET on the unknowns.
    Because of its position in the statement you know :fornitura is a
    predicate (a rdf:Property) and by following the link you can get more
    information in a machine readable form. In this case, by asking for an
    RDF mime type, you will typically get back the ontology defining the
    predicate.
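
    In code, following your nose is an ordinary HTTP GET with an Accept
    header asking for RDF. A minimal Python sketch (the function names and
    the particular Accept value are my own choices, and the vocabulary URI
    is the made-up one from the example above):

```python
import urllib.request

# Prefer Turtle, fall back to RDF/XML; the exact preference is an assumption.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"


def describe_request(uri: str) -> urllib.request.Request:
    """Build a GET request for uri that asks for an RDF serialisation."""
    return urllib.request.Request(uri, headers={"Accept": RDF_ACCEPT})


def fetch_description(uri: str) -> bytes:
    """Dereference uri and return whatever the server sends back."""
    with urllib.request.urlopen(describe_request(uri), timeout=10) as response:
        return response.read()


request = describe_request("http://some-vocab.org/fornitura")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```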

    Where 'better' knowledge representation (and reasoning) is required, a
    lot of the time that can be carried out locally. For example, you may
    have a traditional SQL database covering your specific domain.
    Internally there will be a closed-world assumption, native n-ary
    relations and so on. On your own data you can use whatever languages
    you like. But that data may be exposed to the Web through RDF (etc),
    making it reusable elsewhere.

    Ok, there's the argument that to be really useful, you need powerful
    knowledge representation globally. But there is a major hurdle - to be
    really useful you need a lot of people using *and publishing*
    information in that form. Unfortunately along with the
    representational power comes complexity, and the extra work required
    has to be justifiable - in economic terms at least.

    I forget the source, but there's a nice line: "what's new about the
    Semantic Web isn't the semantics, it's the Web". When it comes to
    global information sharing, the Web part is really where the
    difficulties lie. Any logic/data has to actually be widely adopted.

    Though the development of the Semantic Web is happening slower than
    most folks hoped, it is happening. Take the recent statistics from
    Yahoo! :
    [[
    The data shows that the usage of RDFa has increased 510% between
    March, 2009 and October, 2010, from 0.6% of webpages to 3.6% of
    webpages (or 430 million webpages in our sample of 12 billion).
    ]]
    https://tripletalk.wordpress.com/2011/01/25/rdfa-deployment-across-the-web/

    (RDFa is an RDF format allowing it to be embedded in HTML)

    Also the Linked Data Cloud is a nice visualisation of some of the big
    datasets out on the Web:
    http://linkeddata.org/

    So while RDF (alone) is a much more limited knowledge representation
    language than OWL (essentially simple binary relations), at least the
    data's getting out.

    [snip]
    > the OMG group is proposing as a way of representing type hierarchies
    > in a simpler and more readable form than OWL.

    I can't personally see how it could be much simpler than in OWL:

    SubClassOf( :Woman :Person )

    (in functional syntax)

    Note also that structures other than trees can easily be represented,
    an artificial example:

    <#node1> :connectsTo <#node2> .
    <#node2> :connectsTo <#node3> .
    <#node3> :connectsTo <#node1> .

    (Turtle syntax)


    danja
    2011-02-15T10:47:40+01:00
    sowa owl kr rdf

    Some Problems

    Georgi Kobilarov has a refreshing post, suggesting Making Linked Data work isn't the problem. I'm inclined to agree with most of what he says. The technology in itself isn't a solution to any problem, rather an enabler to solve problems. While the idea of serendipity is appealing, it isn't a very good justification for a huge global commitment of resources. So what kind of problems do we, as living, social and technological organisms, wish to solve?

    To start exploring this space I reckon there are (at least) two general modes of knowledge use. The first is relatively domain-specific, directed by a set of requirements associated with a corresponding set of real-world tasks and operations. These I'd put under the umbrella of Applications, akin to the computer applications we already use but augmented with knowledge engineering facilities and access to the Web of Data. As a shortcut the starting point here is Connolly's Bane: "The bane of my existence is doing things I know the computer could do for me.". But in general it goes far further, in that there are plenty of beneficial things we don't already do. A second mode would be ad hoc, fairly immediate, unplanned, call it Just-in-time problem solving, the kind of thing that we currently turn to search engines for.

    As an example of the Applications mode, one of the early drivers for the Web was e-commerce. I think I'm fairly safe in saying that only the surface of the potential there has been scratched. There's a hint of what can be possible with things like the individual targeting of Google Ads and Amazon recommendations. In this space the GoodRelations ontology is a marvellous baseline. But what we're not really seeing yet is the whole supply chain from the manufacturer to consumer being integrated. Fairly loosely coupled (as it is today), in one direction there are the financial aspects ("follow the money"), in the other direction is all the transport, manufacturing and processing that go from raw materials to delivered finished product. Within those different parts of the pipeline there are a whole host of problems relating to technology tied together by human and natural resources.

    Alongside this commercial world there are macroeconomic and macrosocial systems, those areas traditionally covered by government. We're already seeing some movement around transparency with the various government data projects, but I think we're still a very long way from seeing genuinely informed policy and decision making. Reflecting the darker side of advertising right down to commercial spam and taking advantage of general ignorance, good governance is seriously compromised by self-interest (of individuals and corporations) and misinformation. I recently heard a radio programme talking about the UK Conservative Party's successful "Broken Britain" election campaign. An aspect of this was that violent crime was perceived as being on the increase. However, the statistics suggest that violent crime had actually been declining (see Murder rate lowest for 12 years "Home Office figures show overall crime fell by 5% in England and Wales"). Politicians will always lie, but damage is only done when they get away with it and aren't held to account with the facts. But I don't really want to suggest that prevention of political badness is the goal here, rather the encouragement and facilitation of goodness (man...).

    Another huge area where there are countless problems to solve is science. While the Web has vastly improved information sharing and been a boon to research, I'm not sure the underlying methodologies have changed that much. I'm convinced the open sharing of knowledge at the data level can offer A New Kind of Science (no hyperbole there!).

    There are plenty of other application domains that could benefit from a bit of Web-scale knowledge engineering. Ok, I'll name one more bundle: the Arts.

    Ok, moving on to the Just-in-time mode of problem solving, take a look at the following list (random stuff that came off the top of my head when I woke up this morning). Imagine how you would solve these problems now, and then think how you might solve them with a thousand programmers at your beck and call. Most of them need something considerably deeper than a keyword/linkrank document search. I've dumped this list over on the ESW Wiki, additions and discussion welcome over there (I still haven't implemented comments on this blog, so if you have a comment for anything either mail me or blog it (and mail me) or tweet or use Facebook...).

    • I'd like to upgrade the computer I use for video editing. My budget is about 300 euro. What should I buy?
    • Who should I get to make the soundtrack to my new film?
    • I've bought an Ubuntu laptop to replace my old Apple, I'd like it to run applications that fulfil all the tasks I have on the old machine. What do I need?
    • Should HTML use namespace prefixes?
    • Is there a political motivation behind Royal Weddings?
    • Who should I vote for?
    • Who might make a good (romantic) partner?
    • I wish to sell my double glazing products in sub-Saharan Africa, who should I contact?
    • Who might make a good (business) partner there?
    • I got a mail from someone claiming to be my cousin, asking for a loan. Should I give them the loan?
    • I've got an interesting rash. Should I see a doctor?
    • I wish to enlarge my penis. What method is safe and reliable?

    (Sorry, couldn't resist the last one - but it's a valid example of where you'd need good healthcare data alongside reputation and provenance information)

    PS. danbri points me to a short 1989/90 document which contains a fairly similar list (minus references to genitalia) : Information Management: A Proposal, by a certain Tim Berners-Lee. Go read it. Now!


    danja
    2011-01-22T17:46:57+01:00
    semweb problems rdf

    Drupal 7

    I'm back home after a spell away so am having a lazy week, or rather picking odds and ends that have been on the to-do list seemingly forever. One thing was sorting out my sites, and coincidentally the following notice appeared on the SWIG list last week:

    After over 3 years of development by almost 1,000 contributors, Drupal 7 has finally been released today! Drupal is an open source content management platform powering hundreds of thousands of websites and applications. Notable websites are WhiteHouse.gov and the many top music artist's sites of Warner Media Group. Drupal 7 features the latest web technologies and remarkable improvements to user experience (UX). Drupal is the first major CMS to include RDF as part of its DNA and embed RDFa markup out of the box: all Drupal 7 sites annotate by default their pages, comments, images, tags, authors, posted date with the popular SIOC, FOAF, Dublin Core, and SKOS vocabularies. We hope that with Drupal adopting RDFa, we can pave the way for a greater adoption of the Semantic Web technologies. Drupal is estimated to power 1% of the Web, and even though Drupal 7 was just released, more than 30,000 websites are already powered by Drupal 7. With today's announcement, this number is likely to sky rocket in the coming months.
    ...

    This blog is still a little (Scala) homework project, but not long ago I registered a domain as a place to put my music noodling: spikeandwave.com. It was just a couple of handwritten HTML pages until yesterday, when I slapped Drupal there. Ok, so first my MySQL install seemed to be broken, so I took the opportunity to upgrade from Ubuntu Jaunty to Karmic. Turned out not only had I forgotten the admin password but the instructions I'd found online for resetting it didn't work. But these instructions did work. The install would have been a breeze, had I known this, in php.ini :

    memory_limit = 32M ; Maximum amount of memory a script may consume

    Default is 16MB and Drupal 7 core requires at least 32MB (I've set mine at 64MB to be on the safe side). You also need to restart Apache2 after changing this. If you make the mistake I did, then DROPing the DB and replacing the settings.php gets you back to square one.

    So yesterday I wound up spending a couple of hours or so getting the thing installed. Today it took me maybe 3 hours to get a handle on how to use it enough to get the site more or less how I want it to look. While Drupal core seems to work fine out of the box, I did hit a few bugs with plugin modules, e.g. no joy with XML Sitemap. Must admit I didn't spend long on trying to get such things working, opting for the disable and leave for later workaround. Most of the time was spent getting to know the navigation and where to find things (e.g. took me ages to discover that links are under menus, d'oh!). The only bit of handcoding I did was to tweak the CSS so the centre column was wide enough for a big image.

    So basically I reckon it's pretty much comparable to WordPress in terms of ease of setup/use. One nice feature which WordPress didn't have last time I looked is in-place updating of code, something that will hopefully help avoid the usual mess.

    So finally to check those semweb credentials. There are hints of typed nodes here and there, but what is its RDFa publishing like? The front page contains these 6 triples: drupal.txt (extracted with the RDFa distiller). content:encoded seems to have grown up!


    danja
    2011-01-13T00:17:44+01:00
    sites drupal rdf

    del.icio.us bookmarks to RDF

    The blogosphere seems to think Yahoo! is going to axe del.icio.us so I've knocked together a quick Python script to get my data out - 2317 occasionally annotated bookmarks. To use: make sure you've got Python first (!), download and install BeautifulSoup (navigate to the dir with setup.py, run python setup.py install), download the script and rename it to souper.py, get your del.icio.us bookmarks and rename to delicious.html. Then run python souper.py delicious.html > delicious.ttl and there you have the Turtle.

    I've not checked the output particularly thoroughly, but I think it's ok (one shortcut I made was that any bookmarks that couldn't be converted to ASCII would get ignored). Here's my original bookmarks file and the same data in Turtle (7599 triples).
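
    For anyone who just wants the gist of the conversion, here's a minimal sketch of the same idea. It is not the souper.py script itself: it uses only the Python standard library rather than BeautifulSoup, the Dublin Core terms (dc:title, dc:subject) are my assumption for the example, and the sample input is made up to mimic the Netscape-style bookmark export.

```python
from html.parser import HTMLParser

# Characters that may not appear unescaped inside a Turtle <IRI>.
BAD_IRI_CHARS = set(' <>"{}|\\^`')

class BookmarkParser(HTMLParser):
    """Collect href, tags and link text from <A HREF=... TAGS=...> elements."""

    def __init__(self):
        super().__init__()
        self.bookmarks = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":  # HTMLParser lower-cases tag and attribute names
            attrs = dict(attrs)
            tags = [t for t in (attrs.get("tags") or "").split(",") if t]
            self._current = {"href": attrs.get("href") or "", "tags": tags, "title": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.bookmarks.append(self._current)
            self._current = None

def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping the awkward characters."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'

def to_turtle(bookmarks):
    lines = ["@prefix dc: <http://purl.org/dc/elements/1.1/> .", ""]
    for mark in bookmarks:
        href = mark["href"]
        if not href or BAD_IRI_CHARS & set(href):
            continue  # skip anything that cannot be written as a Turtle IRI
        props = ["dc:title " + turtle_literal(mark["title"].strip())]
        props += ["dc:subject " + turtle_literal(tag) for tag in mark["tags"]]
        lines.append("<" + href + ">\n    " + " ;\n    ".join(props) + " .")
    return "\n".join(lines) + "\n"

# Made-up sample in the shape of a bookmark export file.
SAMPLE = ('<DL><p><DT><A HREF="http://example.org/" ADD_DATE="1" '
          'TAGS="rdf,semweb">Example "site"</A>\n<DD>a note\n</DL>')

parser = BookmarkParser()
parser.feed(SAMPLE)
turtle = to_turtle(parser.bookmarks)
print(turtle)
```

    Skipping awkward URIs rather than escaping them is the same kind of shortcut as ignoring non-ASCII bookmarks, so expect to lose a few entries.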

    The Twitterati seem to be moving en masse to Pinboard, which has a sign-up fee of $7.42 but seems to have got good reviews.


    danja
    2010-12-17T00:24:50+01:00
    script python turtle semweb rdf delicious tags

    Slow Data, Decentralization and Semantic Web Architecture

    [I've still got a bug in my blog software which mangles links, so apologies for the ironically unlinky URIs]

    Slow Food (http://en.wikipedia.org/wiki/Slow_Food) is an international movement founded to offer an alternative to fast food, "it strives to preserve traditional and regional cuisine and encourages farming of plants, seeds and livestock characteristic of the local ecosystem". By a little analogical legerdemain, fast data is the kind of stuff you get from regular search engines - quick but not very nutritious, probably bad for you. Slow Data on the other hand has been harvested with care and with attention paid to its preparation. It's far more satisfying in the long run. While complex Semantic Web systems are currently at a slight disadvantage performance-wise (largely due to their youth), there's no reason that high quality data can't be readily accessible at high speed using existing, well-documented Web techniques. But I'll call it Slow Data anyhow.

    So...I recently got a letter (!) which included a description of a proposed social net application based around RDF data. The author knew what they were talking about and the system sounded good, but they were really struggling with one aspect, how to avoid making a centralised system.

    One of the great rallying cries of the Linked Data movement has been to open data out to the Web. I doubt very much that I've seen a presentation on the subject that hasn't referred to data silos, usually with a predictable image. This antipattern reaches its zenith in applications where the only interface to the data is a dedicated 'snowflake' API (so named because every one is unique), severely limiting the potential for Web-style interconnection (links). Behind the scenes the application implementation may be highly distributed, but all the user or developer can see is a walled garden with a gatekeeper. That's a lot of buzzwords in one paragraph, so I'd better move towards the point.

    How is an RDF triplestore any more open than a SQL-style database hooked up to the Web?
    It might sound heretical, but it isn't, or at least isn't necessarily. The only advantage it has is that by default it uses URIs as identifiers for things (corresponding to the keys in a SQL store) which if designed properly will be dereferenceable over HTTP, i.e. they will be links which can be followed to find out more about the named resources. But SQL-backed Web applications can expose links that can be followed, and many do. (The same goes for NoSQL stores). SPARQL is a query language that can be applied to a particular variety of graphs, but again in itself it isn't really any more webby than the triplestores it addresses. However there is the SPARQL Protocol for RDF (SPROT) http://www.w3.org/TR/rdf-sparql-protocol/ which allows things like an HTTP GET /sparql/?query=EncodedQuery and changes the whole ball game (you don't hear much mention of SPROT, I suppose because of the ugly name and a spec that's mostly WSDL stuff that everyone ignores).
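
    To make the GET /sparql/?query=EncodedQuery part concrete, here's a small Python sketch that builds such a URL. The endpoint address is hypothetical and nothing is actually sent over the network; the only thing taken from the protocol is the "query" parameter name.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def sparql_get_url(endpoint: str, query: str) -> str:
    """Build the URL for a SPARQL Protocol query sent via HTTP GET."""
    # Append to an existing query string if the endpoint already has one.
    separator = "&" if urlsplit(endpoint).query else "?"
    return endpoint + separator + urlencode({"query": query})

# Hypothetical endpoint; substitute a real one before fetching.
ENDPOINT = "http://example.org/sparql/"
QUERY = "SELECT ?s WHERE { ?s a <http://xmlns.com/foaf/0.1/Person> } LIMIT 10"

url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

    The point is that the result is an ordinary URI: it can be linked to, bookmarked and cached like any other.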

    Hopefully everyone's familiar with Chapter 5 of Fielding's dissertation - http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm - so to cut the waffle I'll cherry-pick one heading: 5.1.4 Cache. If we imagine the Web (of Data) as one huge interlinked information space, then individual stores such as those associated with specific applications can be considered as caches of small chunks of the Web of data. This is probably easiest to conceive by contrasting two different pieces of software. For one let's have a social net app that lets people discover other people with similar interests. It will store data around resources of the type foaf:Person with properties such as foaf:interest and to leverage the social angle foaf:knows. A traditional app for this kind of thing would involve people signing up and entering information about themselves. But quite justifiably a person might say "I don't want to enter loads of stuff into a form in application Y when I already entered it in application X yesterday" (yes, this is the old Data Portability thing). But pause there and for a second piece of software let's have a generic link-follower and data aggregator, i.e. a crawler or bot, or as they're known in FOAF circles, a scutter. It's not difficult to make such things directed, so they only follow specific link types of interest (check Slug http://ldodds.com/projects/slug/ - see also https://github.com/ldodds/slug). Let's make the storage system for this scutter a triplestore. Ok, set the scutter going on the Web at large with a plan to follow foaf:Person related links and slurp the data. Come back a few hours later, and you have an already-populated store to which you can plug in the social app, no need for people to sign up (in an ideal world, and ignoring privacy matters).

    Now the scutter plan for this (i.e. get people data) is pretty much isomorphic to a SPARQL query along the lines of:

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    CONSTRUCT { ?s ?p ?o } WHERE {
    ?s rdf:type foaf:Person .
    ?s ?p ?o .
    FILTER (?p = foaf:interest || ?p = foaf:knows)
    }

    This is exactly the kind of query you'd also want to be asking in the social net app. Going through the scutter, you're asking the Web at large, but because the data has already been aggregated in your store, it doesn't take a thousand GET requests to find relevant statements. But the statements are exactly the same. In other words, an RDF store is just a cache of a small chunk of the Web of Data.
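
    The store-as-cache idea can be sketched in a few lines of Python. Everything here is a stand-in: FAKE_WEB plays the part of the Web, fetch() pretends to do the HTTP GET and RDF parsing, and a plain set of tuples plays the triplestore. It is not how Slug works, just the shape of a directed scutter.

```python
from collections import deque

FOAF = "http://xmlns.com/foaf/0.1/"
FOLLOW = {FOAF + "knows"}                   # link types the scutter is directed to follow
KEEP = {FOAF + "knows", FOAF + "interest"}  # predicates worth caching

# Stand-in for the Web: URI -> triples served at that URI (hypothetical data).
FAKE_WEB = {
    "http://example.org/alice": [
        ("http://example.org/alice", FOAF + "interest", "http://example.org/topic/rdf"),
        ("http://example.org/alice", FOAF + "knows", "http://example.org/bob"),
        ("http://example.org/alice", FOAF + "nick", "ali"),
    ],
    "http://example.org/bob": [
        ("http://example.org/bob", FOAF + "interest", "http://example.org/topic/scala"),
        ("http://example.org/bob", FOAF + "knows", "http://example.org/alice"),
    ],
}

def fetch(uri):
    """Pretend HTTP GET; a real scutter would dereference the URI and parse RDF."""
    return FAKE_WEB.get(uri, [])

def scutter(seed, max_fetches=100):
    """Breadth-first crawl from seed, caching only the statements of interest."""
    store, seen, queue = set(), set(), deque([seed])
    fetches = 0
    while queue and fetches < max_fetches:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        fetches += 1
        for s, p, o in fetch(uri):
            if p in KEEP:
                store.add((s, p, o))
            if p in FOLLOW and o not in seen:
                queue.append(o)
    return store

store = scutter("http://example.org/alice")
# The app now asks the local cache rather than issuing GETs of its own.
interests = sorted(o for s, p, o in store if p == FOAF + "interest")
print(interests)
```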

    For performance reasons this kind of cache would be selective in the data collected, so maybe strictly speaking the architecture is more like Uniform Pipe and Filter http://www.ics.uci.edu/~fielding/pubs/dissertation/net_arch_styles.htm#sec_3_2_2 with the uniformity essentially maintained by following the SPARQL and SPROT specs (and 5.1.6 Layered System is probably relevant too).

    This kind of thing is entirely implementable today, in fact the Semantic Web Client Library http://www4.wiwiss.fu-berlin.de/bizer/ng4j/semwebclient/ can do SPARQL queries on the Web at large (SELECT at least, not sure if it supports CONSTRUCT).

    There are other pieces of the Semantic Web toolkit that can be cleanly inserted into Web architecture (as one would hope, given that the Semantic Web is meant to be an extension of the existing Web). For example, a general-purpose WebID setup (FOAF+SSL http://esw.w3.org/WebID) could be inserted between client and server to handle authentication, acting as a proxy and/or gateway.

    Somewhere recently (I think in a paper by danbri and others) I saw discussion about what was needed to get from a Web of Linked Data to a more fully Semantic Web. In other words, even if you score 5 stars at http://lab.linkeddata.deri.ie/2010/star-scheme-by-example/ there might be more you can offer. I might have dreamt it, but I believe the discussion mentioned inference and reasoners. The thing is on the one hand we have lots of linked data already out there, and we already have pretty performant reasoners (e.g. http://clarkparsia.com/pellet/ ) but reasoning over Web-scale data is likely to remain a fantasy. That is, unless you imagine multiple reasoners acting as dedicated, fairly task-specific agents/services over their own manageable little batch of data. These again could be deployed as proxies. For example, another bit of FOAF jargon is smushing, which originally (when people were bnodes) meant the unification of data about a person based on the assumption that the person could be identified by means of their email address or homepage. Since it's more common now to use URIs to identify people (see http://dig.csail.mit.edu/breadcrumbs/node/71) I don't think it's unreasonable to extend the term to cover unification of multiple URIs for a person (typically with owl:sameAs links somewhere). Now going back to the triplestore of the app described above, that's only really interested in statements including the identified foaf:Person, foaf:interest and foaf:knows. There's nothing to stop this treating a person as two individuals if data has been pulled from sites which use their own person ID schemes. But if somewhere else on the Web at large there was a triplestore with reasoning capability that could eat person IDs, foaf:mbox and foaf:homepage data and spit out owl:sameAs statements, this could be used to unify the descriptions for the application. 
    This triplestore could have a scutter as its input and a SPARQL endpoint to provide output, in other words being a uniform pipe kind of proxy.
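
    A smusher of that sort needn't involve a full reasoner. As a minimal sketch (my own, using union-find over plain tuples rather than any particular triplestore or OWL engine), the following treats foaf:mbox and foaf:homepage as identifying properties and emits owl:sameAs statements pointing at one canonical URI per person. Picking the lexicographically smallest URI as canonical is an arbitrary assumption, and the data is invented.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
IDENTIFYING = {FOAF + "mbox", FOAF + "homepage"}

def smush(triples):
    """Return owl:sameAs triples for subjects that share an identifying value."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            if rb < ra:  # keep the lexicographically smaller URI as canonical
                ra, rb = rb, ra
            parent[rb] = ra

    first_owner = {}
    for s, p, o in triples:
        if p in IDENTIFYING:
            key = (p, o)
            if key in first_owner:
                union(first_owner[key], s)
            else:
                first_owner[key] = s
                find(s)  # register the subject even if it never merges

    return sorted((s, OWL_SAME_AS, find(s)) for s in list(parent) if find(s) != s)

# Invented data: three sites, each with its own ID scheme for the same Fred.
data = [
    ("http://a.example/people/fred", FOAF + "mbox", "mailto:fred@example.org"),
    ("http://b.example/users/42", FOAF + "mbox", "mailto:fred@example.org"),
    ("http://b.example/users/42", FOAF + "homepage", "http://fred.example.org/"),
    ("http://c.example/id/f", FOAF + "homepage", "http://fred.example.org/"),
    ("http://c.example/id/jane", FOAF + "mbox", "mailto:jane@example.org"),
]

same_as = smush(data)
for triple in same_as:
    print(triple)
```

    Note that the third URI gets unified even though it shares no email address with the first, because the chain runs through the second one.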

    Ok, so effectively I'm arguing here that we already have all the bits from which we can glue together a Semantic Web that sits nicely with the Architecture of the World Wide Web http://www.w3.org/TR/webarch/ . But I do think there are at least two specific areas that need attention in the near future. One is in the increased use and optimization for named graphs, especially those of the order of only tens or hundreds of statements. I thought I had a good justification for this, but now my mind's gone blank, so just call it a gut feeling. The other thing is in description of datasets - there's already some stuff around annotations and provenance etc, but I'm thinking more in terms of discovery and agents/services being able to advertise themselves, so that a client looking for some particular kind of data can find them. The Vocabulary of Interlinked Datasets (voiD, http://vocab.deri.ie/void) is pretty good in this space, but I reckon we need to go a lot further, and have been mulling over a little quasi-protocol for matchmaking between datasets and agents. I'll post more on that once I've got something to talk about...

    There is a teeny bit of low-cost, potentially invaluable data that it'd be nice to see more of. Let's say a directed scutter has crawled the Web and has aggregated all statements of the form <http://example.org/fred> foaf:interest ?x. While ideally it will be placing the triples it's found into named graphs corresponding to the provenance, a more likely coding scenario (because the queries will get silly with thousands of FROMs - hmm, does SPARQL NG do anything about that?) would be to dump everything into a default graph. But, while the full provenance may not be retained in this setup, it can still be made available to consumers of the data if statements of the form <http://example.org/fred> rdfs:seeAlso <http://wherever.com/source/somedataaboutfred> are added to the store. Call it future-proofing.
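
    In code this costs next to nothing. A minimal Python sketch, with a set of tuples standing in for the default graph and a made-up interest URI:

```python
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

def merge_with_pointer(store, source_uri, triples):
    """Add triples to the default graph, plus a seeAlso pointer per subject."""
    for s, p, o in triples:
        store.add((s, p, o))
        store.add((s, RDFS_SEE_ALSO, source_uri))  # where the statement came from
    return store

store = set()
merge_with_pointer(
    store,
    "http://wherever.com/source/somedataaboutfred",
    [("http://example.org/fred",
      "http://xmlns.com/foaf/0.1/interest",
      "http://example.org/topic/cheese")],  # hypothetical interest
)
for triple in sorted(store):
    print(triple)
```

    The pointer says which documents mention Fred, not which statement came from which document, so it's a breadcrumb rather than real provenance.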


    danja
    2010-12-14T11:08:17+01:00
    architecture arch semweb rdf

    SPARQL Results and HTML

    A thought in passing. When I've needed to display SPARQL results in a browser I've generally either used some kind of programmatic templating (as in this blog) or XSLT on XML results - which can get clunky, but when the transformation is done, it's done. Results XML is straightforward (and I'm still rather fond of XML) but the choice of syntax is pretty arbitrary. The RDF that comes back from a CONSTRUCT is grand, that's a really nice kind of query, the data is immediately ready for reuse (it might be an obsessive-compulsive thing, but DESCRIBE still feels a bit messy). I've not got around to playing with JSON results, presumably that lends itself to speedy application in most languages.

    But I can't help thinking it'd be neat if SPARQL results came back directly as RDFa so by default you had something that made sense both in a browser and to an RDF agent. Is there anything you can do with a SELECT that a CONSTRUCT-to-HTML couldn't do? Is there any way the stuff could be structured to simplify templating? There's at least one results XML to HTML XSLT around somewhere, I guess that could be tweaked for experimentation.


    danja
    2010-12-14T07:54:28+01:00
    rdfa html sparql rdf

    path slice grr

    the little javascript UI I'm using for typing blog posts seems to break URIs in its attempts to make them relative

    but they do work on the homepage, http://dannyayers.com, and I'm too tired to fix them now...


    danja
    2010-11-21T19:01:25+01:00
    blog rdf

    Once more unto the breach (again)

    For the first time in ages I've had a couple of days to sit down and look at code. A lot of it was stuff I hadn't finished, dating back a few years. The typical pattern was either getting distracted from the original aims and playing with the fun stuff or aiming to do so much that I never really got past square one. So this time around I've changed my mind, decided to keep the fun stuff (playing with Agents in Scala) separate from the main app work.

    The main app in mind here is the Semantic Web in a Box idea which I'm back to thinking about in a more minimal form, informed a lot by what Rob wrote on his blog - What people find hard about Linked Data - and the stuff in the Talis tutorial. Basically what I'm after is a very easy-to-use Linked Data editor/visualization tool, with support for some kind of pluggability (TBD). There are existing tools which can do this sort of stuff, but the key here is to keep things as simple as possible (and free and open source). Target users are total beginners and experienced folks that want to be able to knock simple stuff together quickly. There's really not a lot to this, and 'wait long by the river and implementation of your plans will float by' usually works, but no-one really seems to have got around to this thing.

    It'll be a Java/Swing desktop app with the following features:

    • Internal triplestore(s)
    • RDF editor with various views and syntax validation
    • SPARQL editor and results viewer
    • HTTP client (for examining remote resources, crawling and publishing to remote stores/services)
    • HTTP server (for simulating live data)
    • HTTP proxy (for examining headers etc)
    • Basic HTML editor/viewer


    What should also be possible is to run it headless, as a live service.

    Probably more than half the people that read this are likely to have such parts living in their codebases - Java Swing components, Jena, ARQ, and Apache HTTP libs cover an awful lot, the tricky part is wiring them all up in a useful way, with a UI that doesn't confuse.

    I've made a start on gathering together the bits, but I'm unlikely to get down to a good coding session for a while again, so what follows is really notes to self so I don't forget...

    So, RDF editor.

    Currently the main class is org.hyperdata.swing.rdftree.editor.RdfEditor

    One view is a resource-centered thing, based on a JTree backed by a Jena Model. Like everything else here, it's unfinished and very buggy (notably there's something like an out-by-one error on which row expands). But this should give the general idea, the paths should expand indefinitely :

    rdf tree table

    Right now it's only addressing the local model, but it should be reasonably straightforward to hook the HTTP client up to terminal node URIs to go and GET remote data (must check how Tabulator goes about that) and extend the drop-down paths.

    Text views for Turtle and RDF/XML (with crude highlighting from JEditorPanes):

    turtle editor

    xml editor

    I've only just started looking at a graph view (again!), separate from the stuff above - I just hacked at one of the JGraph demos, long way to go:

    The launcher for that is org.hyperdata.swing.graph.danja.GraphEditor


    graph view

    I've stuck the code over here:

    source, wiki etc.


    danja
    2010-11-21T18:47:57+01:00
    swib linkeddata semweb rdf

    Piano Piano

    Where I'm staying at the moment I don't have much time to get on the computer, and net access is really lousy. But I've had a lot of chance to think about stuff that I want to do, and have realised that I can feed a few birds with one bean. The blog engine (this) I've been writing in Scala is approaching the basic level of functionality I wanted, so I'm looking again at a couple of old ideas.

    The first is Semantic Web in a Box (new name needed!), the second an agent-based engine that will support scripting (I did a lightning talk about that at one of the SFSW meetups, must see if I can find the slides). Given that Scala actors are perfect for constructing the kind of agents I have in mind, as well as offering a nice way of doing the SemWeb in a Box stuff, I reckon I'll wrap it all together into one project. And the first application built with this setup can be a refactoring of my blog engine...

    Many of the agents probably won't have all these features, but the stereotypical agent I want, a SemWebAgent, will have the following traits:

    • named with a URI
    • access from a HTTP server
    • access to a HTTP client
    • triplestore


    + some code that'll actually do something useful

    Looking from outside, the things will look like regular Web-accessible resources, and can call/be called by external (RESTful) clients/services etc. Internally, if a particular named resource lies within the same VM then more direct messaging is possible. For scripting (when I get around to it), I've got Jython and Rhino (or equivalents) in mind. To support the pluggability of SemWeb in a Box, I'll go for OSGI, probably using Felix as the container.

    I've started coding up the core actor stuff, which I will fill with unit tests as well - being new to Scala I'll no doubt make a lot of mistakes. I'm also putting together some functional tests for the blog engine, which I'll refactor to use this system. I'm already using a tiny bit of Apache Clerezza (for JAX-RS handling of HTTP calls), I believe there'll be quite a lot more I can cherry-pick.


    danja
    2010-10-10T10:46:05+01:00
    box clerezza gradino semweb rdf

    Per-Tag Feeds

    I've just added a quick feature here so that if you go to a URI of the form /feed/tag/{TAG} it will produce an RSS 1.0 (RDF) feed for that tag. So hopefully /feed/tag/rdf will now be everything tagged "rdf".

    PS. Silly me mistyped the above, so went back and started coding up item editing (as yet unimplemented)...then realised I'd already set things up so that if I post something with the same title on the same day it will already overwrite the previous entry (all the triples hanging off that URI). Heh.


    danja
    2010-09-27T15:23:45+01:00
    code gradino rdf tags

    Slides from KRDB 2010

    A week or so ago I was up north in Brixen-Bressanone (definitely "a charming town") at the 3rd KRDB school on Trends in the Web of Data. The programme was exceptionally well contrived, IMHO, seriously apposite for what's going on in the Web of Data. In between beers (don't worry, I am sorting that one out) I did the opening session. My initial brief was (I think) "Semantic Web Platforms". Now I could happily have done the obligatory semweb intro and led into material about the Talis Platform (which is still as far as I know the only one I'd consider a true semweb platform, being provided in a Software as a Service manner via HTTP). But Tom was down to talk about Linked Data (slides) and Martin about the GoodRelations ontology (slides), so I assumed that between them most of those bases would be covered.

    In many real senses the Semantic Web is already a done deal, so all this conspired to give me a chance to look at the notion of a platform in general. Naturally I consider the Web of Data to be the key enabler right now, but when it comes to choices on how to use it and application strategies, there I reckon it's worth looking at analogous systems. So I refactored my title to "Platforms and the Semantic Web" and basically spent 2 hours rambling about my hobbies...

    Slides on slideshare and pdf.

    Many thanks to Enrico, Anja et al for the opportunity. I did stay to poke my nose into the SWAP 2010 goings-on, so caught up with quite a few old faces and met a bunch more new ones. Even made it home in one piece.


    danja
    2010-09-27T08:44:16+01:00
    bressanone krdb semweb rdf slides

    HTML in Turtle

    Because of the graph structure behind the scenes, pretty much any data can be expressed in the RDF model and hence in an RDF syntax, although it might get a bit nonsensical when it comes to interpreting the triples. Here is a case in point. There was definitely some sensible discussion of Atom syntax being RDF/XML (but the handful of extra attributes needed were considered too much overhead). But also I vaguely remember (or maybe imagine) HTML/RDF mappings being done pre-Turtle. It just crossed my mind, couldn't resist having a go.

    So here's an example:

    @prefix : <http://example.org/html9/> .

    <http://example.org/hello> a :html ;
    :head [ :title "A Page" ] ;
    :body [
    :h1 "My Page" ;
    :p "Hello World!"
    ] .

    The placing of the bnodes is a bit arbitrary, but I rather like the idea of a resource being a HTML. I believe this corresponds to the RDF/XML:

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://example.org/html9/">
    <html rdf:about="http://example.org/hello">
    <head rdf:parseType="Resource">
    <title>A Page</title>
    </head>
    <body rdf:parseType="Resource">
    <h1>My Page</h1>
    <p>Hello World!</p>
    </body>
    </html>
    </rdf:RDF>

    Hmm, actually it seems quite sensible at this level of nesting, not all that far from Reto's DiscoBits idea. In fact those bnodes could usefully be swapped for # URIs. But I'd prefer not to think how it gets with e.g. a load of nested <div>s.
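
    To see how the nesting plays out, here's a throwaway Python sketch that writes a nested dict out in the Turtle shape above. The dict layout and the function names are just assumptions for the experiment; the namespace is the example one from the Turtle.

```python
def node_to_turtle(node, indent=1):
    """Serialise a dict of element name -> text or nested dict as Turtle."""
    pad = "    " * indent
    parts = []
    for name, value in node.items():
        if isinstance(value, dict):
            # Nested elements become blank nodes in [ ... ] brackets.
            inner = node_to_turtle(value, indent + 1)
            parts.append(f"{pad}:{name} [\n{inner}\n{pad}]")
        else:
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            parts.append(f'{pad}:{name} "{escaped}"')
    return " ;\n".join(parts)

def page_to_turtle(uri, page):
    return (f"@prefix : <http://example.org/html9/> .\n\n"
            f"<{uri}> a :html ;\n{node_to_turtle(page)} .\n")

page = {
    "head": {"title": "A Page"},
    "body": {"h1": "My Page", "p": "Hello World!"},
}
turtle = page_to_turtle("http://example.org/hello", page)
print(turtle)
```

    A dict can't hold two :p entries, and RDF itself wouldn't keep them in order anyway, which is pretty much where the nested <div> worry comes from.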

    Dunno, I could imagine an advanced (RDF-friendly!) Wiki syntax looking something like that Turtle.


    danja
    2010-08-14T16:59:04+01:00
    ideas wikis rdf