Search plus Your World - fool's gold

For quite a while I've held the view that most current approaches to Web search are fundamentally flawed, because the best way to find something is not to lose it in the first place. But as the companies invested in search gradually get smarter in their use of person- and (to a lesser extent) thing-oriented data, rather than just word association (football), search results seem increasingly focused. Google's approach in particular has grown increasingly like the model put forward in the Semantic Web initiative. Recently with G+ we see a big push to capture and exploit data associated with personal profiles (the FOAF domain) and brands (the GoodRelations domain, although maybe there's a role for an additional brand- rather than product-oriented vocab). With Rich Snippets and Schema.org there's a direct use of semweb technology (in a slightly mangled form - One True Ontology is a well-known antipattern to anyone who bothers to look at the literature).

In fact the "Your World" part of Search plus Your World (SPYW) can be seen as a reinvention of the most important part of Semantic Web technology, that of giving everything of significance a URL: people, places, things, concepts. Given that, you can start describing and leveraging relationships between those resources. To use a phrase I think originated around microformats, it's lower-case semantic web. Ok, behind the quality glitz of G+ profiles and pages this seems to have been done in a rather sloppy, ad hoc fashion, but that in itself is fine - whatever it takes. But where Google get it very wrong is by putting themselves at the heart of their system. Not only is semantic in lower-case, so is web. If you do a search with SPYW enabled, you're pointed straight back into the Google Empire. They are making themselves gatekeepers of the Web. Although there aren't any concrete entry barriers to this walled garden, by only signposting Google's footpaths in search results it's creating a system with the same characteristics as say AOL around 2000. From Google search being a vital accessory on the open Web, it's increasingly becoming a portal.

There is already a visible cost in practice to Google's echo chamber - if you want to re-find something one of your colleagues said the other day, sure, SPYW is helpful. But if you're trying to do some original research, you don't want to be searching with Your World blinkers on - an engine without those preconceptions, such as DuckDuckGo, will be more useful.

This strategy, I'd assert, is doomed to failure for the same reason AOL's walled garden collapsed: to use another phrase I like to repeat, no matter how big any single entity becomes, the rest of the Web will always be bigger. The focus on the user/Don't Be Evil thing is absolutely right to highlight the value of non-Google resources, although it does fall short by suggesting that the rest of the Web is just a handful of other companies [G+ link], i.e. Twitter, Facebook etc. Google's own long-term survival as a market leader is absolutely dependent on their respect for the Web at large.

So what should Google do? Re-read Steve Yegge's awesome rant [G+ link] for starters. Especially the bits about Platforms. G+ and Your World should be considered in this context - as a semantic (any case) Web (upper case) Platform. For example, Google's pages appear to be aimed at providing the canonical URLs for concepts (...lower-case). But there's already an excellent source of such URLs: Wikipedia. In itself Wikipedia only provides URLs of documents whose primary topic is the thing in question, but DBpedia is a well-established mapping, based on best practices, from thing identifiers to Wikipedia pages (e.g. <http://dbpedia.org/resource/Berlin> foaf:isPrimaryTopicOf <http://en.wikipedia.org/wiki/Berlin> . ). If a handful of students from obscure north-European universities (heh, sorry, just for the sake of contrast), with a little community support, can create and maintain - give the world - a service supporting all the concepts/things covered by Wikipedia, imagine what the mighty Google could achieve...
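To make the thing/document distinction concrete, here's a minimal sketch in Python. It assumes (purely for illustration) that the Wikipedia page URL can be had by swapping the DBpedia resource prefix for the English Wikipedia one, as in the Berlin example; in practice DBpedia publishes the link explicitly and you would read it from the data rather than guess. The function names are mine, not part of any library.

```python
# Sketch only: the prefix swap below is an assumption based on the Berlin
# example, not a rule DBpedia guarantees for every resource.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"
WIKIPEDIA_PAGE = "http://en.wikipedia.org/wiki/"
FOAF_IS_PRIMARY_TOPIC_OF = "http://xmlns.com/foaf/0.1/isPrimaryTopicOf"


def wikipedia_page_for(thing_uri):
    """Return the URL of the document whose primary topic is the thing."""
    if not thing_uri.startswith(DBPEDIA_RESOURCE):
        raise ValueError("not a DBpedia resource URI: " + thing_uri)
    local_name = thing_uri[len(DBPEDIA_RESOURCE):]
    return WIKIPEDIA_PAGE + local_name


def as_ntriple(thing_uri):
    """Serialise the thing-to-document link as a single N-Triples line."""
    return "<%s> <%s> <%s> ." % (
        thing_uri,
        FOAF_IS_PRIMARY_TOPIC_OF,
        wikipedia_page_for(thing_uri),
    )


if __name__ == "__main__":
    print(as_ntriple("http://dbpedia.org/resource/Berlin"))
```

The point is that the city and the page about the city get different URLs, with a typed link between them.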

To give a little example in the context of Personal Profiles, if I publish my definitive personal profile on my own domain (note Google already understands all the elements of this) then for queries for which "me" is the appropriate response, that page should be the first hit, not my G+ profile.

Another factor in the walled nature of G+ is the limited API. I'm sure features will be added to this in the near future, but I hope (probably unrealistically) they will use proper standards and follow known best practices. Going further into over-optimistic territory, I'll quote Tom Gruber (in an interview talking about how Siri works) :

A site that exposes RDF usually has an API that is easy to deal with, which makes our life easier. For instance, we use geonames.org as one of our geospatial information sources. It is a full-on Semantic Web endpoint, and that makes it easy to deal with. The more the API declares its data model, the more automated we can make our coupling to it.

What should we (as users and components of the Web) do? Well, basically what we're already doing...but trying not to be distracted by shiny things and keeping an eye on the long term - standards are good. When we publish data on the Web we need to consider the quality of the data first (i.e. make it 5 Star); seeing it as purely Google-fodder is missing the point.

Comments please [Google+ link, the irony is not lost on me :)]


danja
2012-01-28T12:59:52+01:00
google semweb rdf spyw

Dart H. Vader

I just heard about Dart (via Seth Ladd and Edd), a new Web programming language from Google. It aims to fulfil the role Javascript currently has, only doing it better. On the pro side, new languages are inherently cool, and Javascript can be a real pain. On the con side it seems unlikely that any browsers other than Chrome will support it in the foreseeable future, except potentially via translation to Javascript, i.e. This Page Best Viewed with Chrome.

It's hard not to see echoes of the old Microsoft arrogantly pushing its own product here (remember VBScript?), although Google have in recent years made NIH an artform. But who cares about politics, how's this going to affect the Web?

Well, Code-on-Demand does appear in Fielding's thesis (slightly bizarrely as an 'optional constraint') and has been around since the early days. Pluggable clients are certainly a good idea, and Google have been leaders in moving Rich Internet Applications as opaque desktop apps into the browser using Javascript. The apps are still pretty opaque (View Source on gmail if you doubt that) but they do at least more-or-less run cross-browser.

I've not read much of the Dart docs yet, not tried it at all, but first impressions are that it's a nice clean syntax not unlike JS (or for that matter Java, C# or Python...) and they've already got a good bunch of libs together (even if they do include RPC, yuck!).

As an aside, it should be noted that there's a cost to the standardization of today's browser as Web client (in the process of being defined via HTML5 and associated APIs). It does mean an effective monoculture of HTTP clients. Arguably you can write whatever kind of client you like (probably in Javascript) and host it inside a browser, but browsers have been optimized for a fairly specific app scope. If you stray from the general model of a Web of HTML Documents you're in for an uphill journey. The arbitrary desktop client has more freedom to use HTTP more creatively, but then there won't be one on everyone's desktop. (Personally I like the notion of Web agents (where an agent = client + server + persistence + code) as an abstraction for Web components, as in "Two Webs!" [pdf - heh]. I wonder, is there an HTTP server in Dart yet?)

Looking at the "Leaked internal dart email" (as with UK politics, it's probably sensible to take the "Leaked" aspect with a pinch of salt), there does seem to be some motivation for Dart coming in response to the success of iOS. I'm pretty sure a new language isn't the best response to this, but it certainly makes a change to the usual big proprietary Flash/Silverlight kind of issues. Google are still talking of evolving Javascript, but it does raise the question of what Dart will offer that couldn't be achieved using JS. Optional typing is the feature they seem to be plugging most. So I wondered if anyone had worked on adding static types to JS. Funnily enough, the first few hits refer to iOS. Oh dear, we're really not talking iOS envy, are we?

It's a little surprising that Google haven't thrown their expertise at the JS-is-a-mess issue previously, I don't see a groundbreaking dev tool and pattern library out there (funnily enough the Dart Editor is based on Eclipse, which does seem a bit un-groundbreaking (although I'm not criticising the choice, Eclipse is my main IDE)).

Whatever, it should be interesting to watch how this pans out. Dart will almost certainly be a very cool language, albeit engendering ambivalence everywhere outside Google. Give me a shout when it includes libs for non-HTML Web languages (i.e. gimmee RDF :)

Comments (G+)


danja
2012-01-06T20:48:18+01:00
google language programming dart rdf

Plan B - RDF for fun and profit

Last night, after finding out that part of the G+ API had gone public I skimmed their docs and the docs of some of the specs they draw on: Portable Contacts, Activity Streams and OAuth 2.0. Of course it's great that G+ is exposing an API, and great that they're drawing on existing standards. But after looking at those standards I came away shaking my head, feeling rather discouraged. Again and again they contain data expressed using JSON mappings like "kind": "plus#person" (G+ API) and "objectType" : "person" (Activity Streams) and "" (Portable Contacts assumes that if you've got data you're looking at contacts). Aside from the variation in the naming across these, there's a common theme: the assumption that a simple token (like "person") is adequate for definition of something on the Web. How do you know that their definition of "person" is compatible with your system's definition of "person"? Sure, there are the spec docs to back them up, but how do you get from the data to the spec docs? Ok, there's openness in the publication and dev of these specs and standardization to the extent that they're high-profile enough that vendors like Google will see them and adopt them. But in their technical detail they have more in common with pre-Web, offline proprietary formats - "person" means person because we say so, and everybody knows what we mean.

Digging a bit deeper there's reference to the Discovery Protocol Stack which draws on XRD (the OASIS spec for describing resources) and Web Linking (RFC 5988 for defining typed links). Here there's more of an attempt to make the stuff Web-friendly, entities (resources) and relations (links) are identified with URLs so Web-based discovery of further information is in principle possible. But the "One True Ontology" registry-based approach of Web Linking is questionable in a distributed environment (and comparable to schema.org).

The description of things using schema like "kind": "plus#person" looks like what RDF does, except rather than using a Web-based approach to naming (so you could derive a URL from "plus#person", look it up and find out what it means) instead we see ad hoc token-based naming schemes. With Web Linking we have something that corresponds exactly with RDF properties (they are typed links), and if you can look things up in a registry then that's a step in the right direction. We already use registries to decode the meaning of terms in other major vocabularies - e.g. the HTTP media types through which HTML is delivered lead you to the definitions of terms like "strong" in the relevant specs. But is a registry appropriate for every term we're ever going to use? Does a word like "strong" only have one meaning?
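Here's a rough sketch of what "derive a URL from the token and look it up" could mean. The base namespace is entirely hypothetical (the G+ API docs don't define one, which is rather the point), and the function names are mine; only the standard library's urllib.parse is used.

```python
from urllib.parse import urljoin, urldefrag

# Hypothetical namespace: nothing in the G+ API says where "plus#person"
# lives on the Web. example.org stands in for a real, published location.
ASSUMED_BASE = "http://example.org/kinds/"


def term_url(token, base=ASSUMED_BASE):
    """Resolve a bare token against a base to get a full URL for the term."""
    return urljoin(base, token)


def document_url(term):
    """Strip the fragment: this is the document you'd fetch to read the definition."""
    return urldefrag(term).url


if __name__ == "__main__":
    term = term_url("plus#person")
    print(term)                # full name of the term
    print(document_url(term))  # where its definition would be published
```

An RDF vocabulary term such as http://xmlns.com/foaf/0.1/Person needs no such guesswork, since the name already is the address. With the token, the base has to come from somewhere outside the data.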

Ok, so far there's a phrase which sums up all this: Cargo Cult RDF

But the theory is that grassroots, use case-driven development will tend to create cowpaths in the environment, and all standards orgs have to do is pave these. Except it doesn't seem to quite work that way. On the one hand we have the XKCD Standards effect (check the first paragraph on the Portable Contacts page), on the other hand the simple fact that, even with the best will in the world and with good information, people often get things wrong. Take for example:

OAuth [1.0] aims to unify the experience and implementation of delegated web service authentication into a single, community-driven protocol.

[time passes]

OAuth 2.0 is a completely new protocol and is not backwards compatible with previous versions....As more sites started using OAuth, especially Twitter, developers realized that the single flow offered by OAuth was very limited and often produced poor user experiences...OAuth 1.0 was largely based on two existing proprietary protocols: Flickr’s API Auth and Google’s AuthSub. The result represented the best solution based on actual implementation experience. (Introducing OAuth 2.0)

So...even when good, informed standardization is aimed for, flawed technologies built with flawed processes are unavoidable.

But these things are so popular! Vendors and developers can't get enough of this kind of stuff. It's a continuous stream: XML APIs become JSON APIs, microformats become microdata, but the same patterns are repeated again and again.

Years of these developments passing RDF by. Plan A : The Semantic Web still seems as far in the future as it did 5, 10 years ago. The RDF technologies demonstrably work, and adoption is growing, but it's hardly viral. However you look at it, the world of trendy new specs repeatedly steers around that fact. What's a jaded RDF enthusiast to do? Here's what I recommend:

Exploit the situation!

With a continuous flow of different specs that each covers some little part of data on the Web, focusing on any specific development can only work in the short term. A strategy based on technologies that support flexibility and agility, using known best practices of the truly distributed Web is the best option in the long term, so that systems can be rapidly adapted to meet any new requirements. It doesn't matter that e.g. schema.org misses the point, the data is still useful. "Think globally, act locally" is a great expression - in this context it could mean accept whatever the world of Web 2.0+ has to offer, but handle it on your own terms.

In practice, let's say you're developing a system for a particular vertical market: dog leads (I'm getting serious hints as I type). Don't build the system from scratch based on what people in the dog lead market are doing, don't tie yourself to domain-specific schema or protocols. Wherever possible use commodity, off-the-shelf tools. Then if dog leads take a nose dive on the international market you can regroup with a different target - cowbells for cats - using the same tools, and same skill set. The only parts that need change are at the edges. Basically RDF technologies offer a long-term commercial advantage.
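As a sketch of "only the edges change", here's one way the person records from two of the formats mentioned earlier might be pulled into a common shape. The field names "kind" and "objectType" come from the G+ API and Activity Streams examples above; everything else (the subject URLs, the function names, and above all the decision to treat both tokens as foaf:Person) is an assumption made locally, on my own terms.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

# Local decision, not something either spec states: both tokens are
# treated as meaning foaf:Person. Change this table, not the core code.
TOKEN_TO_CLASS = {
    "plus#person": FOAF_PERSON,  # G+ API "kind"
    "person": FOAF_PERSON,       # Activity Streams "objectType"
}


# Edge adapters: one tiny function per incoming format.
def kind_from_gplus(record):
    return record.get("kind")


def kind_from_activity_streams(record):
    return record.get("objectType")


def to_triples(subject_url, record, extract_kind):
    """Turn a format-specific record into (subject, predicate, object) triples."""
    class_url = TOKEN_TO_CLASS.get(extract_kind(record))
    if class_url is None:
        return []  # unknown token: say nothing rather than guess
    return [(subject_url, RDF_TYPE, class_url)]


if __name__ == "__main__":
    # Subject URLs are made up for the example.
    print(to_triples("http://example.org/people/1",
                     {"kind": "plus#person"}, kind_from_gplus))
    print(to_triples("http://example.org/people/2",
                     {"objectType": "person"}, kind_from_activity_streams))
```

Swap dog leads for cowbells and the adapters and the mapping table change; the triples, and whatever off-the-shelf tools consume them, stay the same.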

Comments to G+ please.


danja
2011-09-16T14:31:52+01:00
google streams contacts rant federated web semantic semweb activity rdf portable