Raw Data Now

First, a video:

I watch just about every TED video that’s released, but this one really struck me. One of the most important things about the web is the way it increases wealth by sharing information. The fact that most content is free also means that the wealth generated as the information spreads affects everyone, rich and poor alike. But the web is made to be human-readable. Each page is tailored by human hands for human eyes, so while a great deal of information is being shared with people around the world, teaching a machine to learn from the web is quite difficult. Indeed, breakthroughs in this area have the potential to revolutionize the way we interact with our computers.

On the human side, this means that I can get meaningful results when I search for answers to factual questions, and I can clarify who or what I’m searching for without adding extra query terms. On the machine side, it means that more projects like Gapminder can aggregate meaningful data. By collecting all of human knowledge in a machine-readable format, we also empower proto-artificial intelligence to perceive the world by translating it into easily digestible information.

When I think about the possibilities, I am overwhelmed by the wealth of knowledge that, as Mr. Berners-Lee puts it, is “unlocked” by this course of action. In doing this, we will make ourselves even richer, and by “we” I mean the entire species.

This entry was posted in Thoughts.

6 Comments

  1. Mike MacLeod
    Posted March 22, 2009 at 1:15 am

    You might like this approach:

    http://artificial-intuition.com/index.html

  2. Rose of Montague
    Posted April 6, 2009 at 3:22 pm

    Dude. You’re teaching the machines to read. They’ll destroy us all!

  3. Mike MacLeod
    Posted August 2, 2009 at 1:54 am

    I would like to think that open source and copyright can coexist, but maybe the former will drive out the latter. I like Clay Shirky’s “ring” model where proprietary apps grow up around a doughnut hole of open source standards, like WWW IP protocols. This is all in its infancy, so I guess we can expect accelerating change into the foreseeable future.

    I draw a distinction between data and information, and I would propose that the net is full of data that does not become information until I discover a purpose for it, like collapsing the wave/particle state vector by observation. Human interaction with data adds value to it, admittedly a Lockean POV. But that opens a huge can of worms. Is the number 7 data or information? Yes. How about Pi? Yes. Planck time? Yes.

    About Syntience: the AN model (as currently constituted) does not build a generalized intelligence, only canned expertise. Generalization is at least an order of magnitude harder, and consciousness much more difficult again – though I have some ideas about how to do it.

  4. Posted August 2, 2009 at 9:49 am

    I don’t doubt that copyright will thrive in the near future. After all, cloud computing (which I expect to grow) is typically built in a closed-source way.

    Forgive me, but I don’t see your point regarding the distinction between data and information. If something is both data and information, why distinguish them?

    You’ve also lost me on Syntience. What is AN? I don’t think we’ve been talking about Artificial General Intelligence (AGI) at all at this point. My post on the intelligence explosion is about that, but here I think we’ve been talking about the value of raw data for the purposes of harvesting by weak AI, like Googlebots.

  5. Mike MacLeod
    Posted August 2, 2009 at 4:02 pm

    Sorry, I was rambling. I’m not a methodical thinker, I just “think things”, like the Joker in Batman Begins just “does things”. Take what you can use and leave the rest.

    Syntience is a company using a new (very proprietary) approach to AI they call “artificial intuition” (AN). Monica Anderson, its founder, says:

    “Most humans have not been taught logical thinking, but most humans are still intelligent. Most of our daily actions such as walking, talking, and understanding the world are based on Intuition, not Logic.

    I will attempt to show that it is implausible that the brain should be based on Logic. I believe Intelligence emerges from millions of nested micro-intuitions, and that true Artificial Intelligence requires Artificial Intuition.”

    The connection to your comment above about machine analysis of human-readable data is that Syntience is focusing on natural-language processing because its peculiarities lend themselves to solution through the AN paradigm, and because a solution is worth hundreds of billions of dollars to search engine companies alone.

  6. Posted August 2, 2009 at 7:05 pm

    Oh I see! I didn’t connect that post with your first one.

    Natural language processing and the intuiting of information from raw webpages is all well and good, but it’s still a ways off, and even a perfect AI (with intuition and logic), or a human, for that matter, would likely have an easier time if the data were presented in a raw state.

    The freeing of proprietary data can happen in parallel with natural language parsing, and I think it would be good if it did. (Wolfram Alpha, for instance, could easily benefit from both.)

    Interestingly, government data is one of the best examples of where this concept can be applied, and two months after I posted this, Data.gov launched. Yay!

    Thanks for your comments, Mike.
