Mathew's comments: Google and the end of everything

  • Simon Cast · 1 year ago
    Correlation means very little on very large data sets, causation or otherwise. On really big data sets, the probability that data will correlate while having no link whatsoever increases. With a large enough data set, random bits of data correlating is almost a certainty.
  • ianbetteridge · 1 year ago
    Exactly. And that raises the question "How do you decide which algorithm to use?" Anderson seems to have no answer to that.
  • Simon Cast · 1 year ago
    Flippant answer is "know what you are doing" :) I've not looked into it for a while, but there are a lot of techniques in more esoteric math that will probably find use here. I think branches such as topology and set theory could yield some interesting results.

    I expect we'll see lots of bloggers trotting out their high school math without really understanding the limitations of the techniques.
  • mathewi · 1 year ago
    Excellent point, Simon. A nice example of why huge amounts of data might actually make it *harder* to apply Anderson's theory in some ways, rather than easier.
  • ianbetteridge · 1 year ago
    Part of the problem with Anderson's theory is that many, many different algorithms work for any given set of data - and without causality, you end up with no reasonable method of determining which of the competing algorithms gives the best possible (and thus most likely) answer.
  • David · 1 year ago
    Haha, wow, I had not seen that Chris Anderson piece. Totally agree with your take on it. It's like those scientists from the late 1800s who were pretty sure that basically all scientific knowledge had been discovered, and everything left was just cleanup, filling in a few blanks.

    These "the end of X" proclamations are wrong so much of the time it's basically useless to make them. Does Chris Anderson really want to make the same argument that so many Doomsday cults for the last 2000 years have been making? It's so over the top that I suspect he sensationalized it on purpose to garner attention.
  • ianbetteridge · 1 year ago
    Well, they sell books, don't they? :)
  • Alistair Croll · 1 year ago
    Mathew -- I think you have a point. Even if the technology finds every correlation, we still need science to prove causality.

    I started to write an increasingly long comment here after reading this, then went and stuck it at http://www.bitcurrent.com/does-big-search-chang... instead.

    Besides, this way I got to post a Google LOLcat ad.

    Thanks for getting me thinking!
  • mathewi · 1 year ago
    Thanks, Alistair -- good post.
  • JoeDuck · 1 year ago
    A great big think item, Matt, and unless he qualifies his idea more, I'm with you on this one.

    However, it seems to me he's making a more reasonable and subtle point than the mistaken claim that correlation=causation.

    Generally science bases descriptions of behavior or biology or other phenomena on data *samples*. As the sample size approaches 100% our models become closer to the full reality rather than just a model of that reality. I don't agree that we are anywhere near the point of having enough data to do much more than target ads a little better, but in the areas where we have huge data sets I think we will start to find that Google analysis may be able to predict and describe things better than any previous models.

    Far more significant will be conscious computing, which is likely to change the game for everything and everybody almost as soon as that Genie's out of the bottle - probably in about 15 years.
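
Simon Cast's point at the top of the thread, that unrelated data will correlate by chance once the data set is big enough, can be checked with a small simulation. The sketch below is only an illustration, and it assumes "big" means many variables being compared rather than many rows. The function names, the column counts, and the choice of 30 rows per column are all hypothetical. Every column is independent random noise, so any correlation it finds is spurious by construction.

```python
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_columns, n_rows=30, seed=0):
    """Largest |r| found among all pairs of independent random columns."""
    rng = random.Random(seed)
    # Each column is pure Gaussian noise, unrelated to every other column.
    columns = [[rng.gauss(0, 1) for _ in range(n_rows)] for _ in range(n_columns)]
    return max(abs(pearson(a, b)) for a, b in combinations(columns, 2))


if __name__ == "__main__":
    # More columns means more pairs to compare, hence more chances
    # for two unrelated columns to line up by accident.
    for n_columns in (5, 50, 200):
        print(n_columns, round(strongest_chance_correlation(n_columns), 2))
```

Because the seed is fixed, the larger runs contain the smaller runs' columns plus extra ones, so the strongest chance correlation can only stay the same or grow as columns are added. Adding more rows per column has the opposite effect and shrinks these accidental correlations, which is why it matters which kind of "big" is meant.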