Small Data

I'd just like to plant a little flag in the sand. Big Data seems to be the flavour of the month (and is undeniably extremely useful and interesting), but I've a gut feeling that it might be symptomatic of not seeing the wood for the trees (or maybe vice versa).

I've not thought this through much, but surely any trends/correlations/relationships that are important enough to be of interest should be detectable without having to build a terabyte+ store? Rather than trying to capture as much raw data as possible up front, I suspect a more productive approach long-term will be to work with (maybe federated) crawler farms, with lots and lots of algorithms running in parallel over what they see. If there are appropriate training feedback loops in place, the shape of the algorithms themselves could be treated as the results of the analysis.
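A minimal sketch of what "detecting a correlation without a store" could look like, assuming the crawler emits pairs of numbers (x, y) one at a time. The class name `RunningCorrelation` is made up for illustration. It keeps a handful of running totals (a Welford-style update) and never holds on to the raw observations.

```python
import math


class RunningCorrelation:
    """Online Pearson correlation: keeps six numbers, not the raw data."""

    def __init__(self):
        self.n = 0
        self.mean_x = 0.0
        self.mean_y = 0.0
        self.m2_x = 0.0  # running sum of squared deviations of x
        self.m2_y = 0.0  # running sum of squared deviations of y
        self.c_xy = 0.0  # running co-moment of x and y

    def update(self, x, y):
        """Fold one observation into the totals, then forget it."""
        self.n += 1
        dx = x - self.mean_x          # deviation from the old mean of x
        self.mean_x += dx / self.n
        dy = y - self.mean_y          # deviation from the old mean of y
        self.mean_y += dy / self.n
        self.m2_x += dx * (x - self.mean_x)
        self.m2_y += dy * (y - self.mean_y)
        self.c_xy += dx * (y - self.mean_y)

    def correlation(self):
        """Return the correlation so far, or None if it is undefined."""
        if self.n < 2 or self.m2_x == 0 or self.m2_y == 0:
            return None
        return self.c_xy / math.sqrt(self.m2_x * self.m2_y)


rc = RunningCorrelation()
for x, y in [(1, 2), (2, 4), (3, 6)]:
    rc.update(x, y)
print(rc.correlation())
```

Many such accumulators could run side by side over the same crawl, one per hypothesis, which is roughly the "lots of algorithms in parallel" picture.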

It could be argued that once you have accumulated a corpus of raw data you can subsequently throw whatever you like at it without having to get the raw data again. But that corpus will never be complete or truly fresh - as new data appears on the Web all the time. More critically, under normal circumstances you can never be sure you've got a dataset that contains a good sample representation covering whatever unknowns you're exploring. But crawlers can be directed to favour slices of the Web that contain information relevant to your hypotheses.
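A toy sketch of a crawler that favours relevant slices, assuming a best-first frontier ordered by keyword count. Everything here is hypothetical: the "Web" is two in-memory dicts (`pages` for text, `links` for outgoing links), and the function name `focused_crawl` is invented. One loud simplification: the toy scores a page by its own text before "fetching" it, whereas a real crawler would have to guess from anchor text or from the page that linked to it.

```python
import heapq


def focused_crawl(pages, links, seeds, keywords, limit):
    """Visit up to `limit` pages, always expanding the most relevant-looking one next."""

    def score(url):
        # Relevance = how often the hypothesis keywords occur in the page text.
        words = pages.get(url, "").lower().split()
        return sum(words.count(k) for k in keywords)

    # heapq is a min-heap, so scores are negated to pop the best page first.
    frontier = [(-score(u), u) for u in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < limit:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-score(nxt), nxt))
    return visited


pages = {"a": "rdf data", "b": "cats cats", "c": "rdf rdf graph", "d": "rdf"}
links = {"a": ["b", "c"], "c": ["d"]}
print(focused_crawl(pages, links, ["a"], ["rdf"], 3))  # ['a', 'c', 'd']
```

With a budget of three pages the crawl never spends a visit on "b", which is the point: the hypothesis steers what gets looked at.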

So, in the context of the Web, the Web itself should be the only big data needed. Which gives a neat parallel in the other sciences: reality itself is the only database you'll ever need :)

Ok, in the same way that Big Sites (like Wikipedia/dbPedia) add big value to the Web alongside lots of small pieces, loosely joined, the same no doubt goes for Big Data. But let's not forget the vice versa, a complementary Small Data approach.

Somewhat orthogonal to this, one way in which the Web is a game changer for data is that here the relationship between pieces of data (/documents) is at least as significant as those pieces of data stacked on top of each other. Link Rank is a special case, an aggregated, flattened view of link value. If topics and entities (i.e. things in general, people, places, concepts etc) and their interrelationships are inferred and/or explicitly named, it should expose some interesting facets of how human knowledge works.
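To make "aggregated, flattened view of link value" concrete, here is a sketch that assumes Link Rank means something like a PageRank-style score computed by power iteration. That reading is an assumption, as are the function name `link_rank`, the damping value of 0.85 and the fixed iteration count. The whole graph of relationships gets squashed down to one number per page.

```python
def link_rank(links, damping=0.85, iterations=50):
    """PageRank-style power iteration over a dict of page -> list of linked pages."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page gets a small baseline share regardless of its links.
        new = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # A page passes its rank on, split equally among its links.
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[node] / n
                for t in nodes:
                    new[t] += share
        rank = new
    return rank


ranks = link_rank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(ranks, key=ranks.get, reverse=True))  # ['c', 'a', 'b']
```

The flattening is what gets lost: the output says "c" matters most but nothing about why, or what kind of relationship each link expressed.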

Comment to G+ please.


danja
2012-01-30T10:04:06+01:00
algorithms federated ai science rdf data

A Role Model of Consciousness

Past few weeks I've been on pause, my head not working properly. Finally got around to seeing doctor yesterday, now waiting for antidepressants to take effect. I haven't totally wasted my disconnected time, watched a lot of stuff. Including a Midsomer, a couple of Bargain Hunts and a geeky-great vid on poker bots (have I said I really like Berlin? This is a Chaos Communication Camp production, wonderful material). Simulating an actual poker player is really hard, but it got me thinking about the similarly hard problem of what consciousness is, appropriately mental for my state of mind.

Caveat, I'm not up to date on theories in psychology or even AI. Last big thing I read anywhere near this was a lay-reader book, I think with "Intelligence" in the title, arguing that what humans are really good at is predicting the future - pretty good hypothesis IMHO. Maybe someone can enlighten me about current thought (I'll cc Planet RDF). But the thing that has been on my mind is more old-school, the internal model bit I think was popular around the 17th century, gone downhill since. Although it may well be rubbish as human stuff, something makes me imagine it might be worth thinking about for machine stuff. I really like the agent metaphor.

Ok, generation 0, we have an agent (A) in a universe (U), and it just sits there. It's a rock. It's surrounded by other agents (which might also be rocks).

a blob in a universe

Generation 1, we have an agent capable of interacting with the environment, but its interactions are pretty minimal, starting somewhere around a pebble on a beach that has a wander with each tide up to a living creature that has built-in stimulus-response maps along with learnt ones. Kinda Behaviourist. I'm starting with the pebble because interaction with the environment can take a lot of forms, and there's quite a history from at least the Neolithic of generally anthropomorphic agency views of facets of the environment (weather etc) through the Bronze Age deities up to the modern-day religious mythologies.

a blob interacting with environment

Generation 2 we approach the Enlightenment and/or Smalltalk. The agent in question has an internal model of the universe containing the agents outside.

a blob with an internal model

On generation 3 we come to the bit that I'll call novel until someone points to an 18th century philosopher who already suggested this. The agent in question has had all its sensors and actuators geared up to the outside world for a while, as well as sensors (and actuators) connected internally. By the mechanisms of Intelligent Design, Natural Selection and copy, paste and tweak a bit, it notices parallels between interactions with the external agents and interactions with itself. It develops a sense of self as another model very similar to the models it has for external agents. Here's the novelty - first the agent becomes aware of external agencies, only then by analogy it becomes aware of itself.

a blob including a model of itself
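For the machine-stuff angle, a rough sketch of generation 3, assuming an agent's model of another agent is nothing more than a stimulus-to-response map. The class and method names (`AgentModel`, `Agent`, `act`, `has_self_model`) are invented for illustration. The thing to notice is that the self-model is not a special structure: it is one more entry in the same table of models, filled in by the same mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class AgentModel:
    """What one agent believes about some agent: stimulus -> expected response."""
    expectations: dict = field(default_factory=dict)

    def observe(self, stimulus, response):
        self.expectations[stimulus] = response

    def predict(self, stimulus):
        return self.expectations.get(stimulus)


@dataclass
class Agent:
    name: str
    models: dict = field(default_factory=dict)  # agent name -> AgentModel

    def observe(self, other_name, stimulus, response):
        """Generation 2: build up a model of an external agent."""
        self.models.setdefault(other_name, AgentModel()).observe(stimulus, response)

    def act(self, stimulus, response):
        """Generation 3: internal sensors feed the same kind of model, keyed by own name."""
        self.observe(self.name, stimulus, response)
        return response

    def has_self_model(self):
        return self.name in self.models


a = Agent("A")
a.observe("B", "poke", "flinch")   # learns about the outside first
print(a.has_self_model())          # False
a.act("poke", "flinch")            # then notices itself doing the same
print(a.has_self_model())          # True
```

In this toy the ordering (outside first, self second) comes only from the order of the calls, so it illustrates the idea without being evidence for it.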

Like all the great (as in most entertaining) theories this is of course unverifiable. But I like the notion that the local stuff only appears after some level of comprehension of the remote stuff, feels like it might be useful somehow.

Comments to the big G+


danja
2011-10-15T20:59:10+01:00
mind intelligence psychology federated ai mad model rdf