Sunday, March 25, 2007

Artificial Artificial Intelligence: supplementing machine intelligence with human intelligence

There is an article in The New York Times by Jason Pontin entitled "Artificial Intelligence, With Help From the Humans" about efforts to use web-based human workers to supplement artificial intelligence (AI) efforts. Noting that "Things that humans do with little conscious thought, such as recognizing patterns or meanings in images, language or concepts, only baffle the machines", the article goes on to tell us that:

The problem has prompted a spooky, but elegant, business idea: why not use the Web to create marketplaces of willing human beings who will perform the tasks that computers cannot? Jeff Bezos, the chief executive of Amazon.com, has created Amazon Mechanical Turk, an online service involving human workers, and he has also personally invested in a human-assisted search company called ChaCha. Mr. Bezos describes the phenomenon very prettily, calling it "artificial artificial intelligence."

The article examines some of the practical and human issues with such "work", but this is still a wide-open area with at least some potential. The technique is not without its problems and limitations:

THERE have been two common objections to artificial artificial intelligence. The first, confirmed by my own experiences searching on ChaCha, is that the networks are no more intelligent than their smartest members. Katharine Mieszkowski, writing last year on Salon.com, raised the second, more serious criticism. She saw Mechanical Turk as a kind of virtual sweatshop. "There is something a little disturbing about a billionaire like Bezos dreaming up new ways to get ordinary folk to do work for him for pennies," she wrote.

Personally, I am not a supporter of the "pennies" economic model. At a minimum, workers should be paid the prevailing federal minimum wage. Encouraging servitude should not be considered socially or economically acceptable. AAI should not be built upon a financial foundation that relegates real people to being simply "slaves to the machines."

-- Jack Krupansky

Wednesday, March 14, 2007

Danny Hillis has an addendum to "Aristotle: The Knowledge Web"

Danny Hillis has published an addendum to his May 2004 essay on Edge entitled "Aristotle: The Knowledge Web", which summarizes his efforts at his new company Metaweb to produce a freely accessible universal knowledge database:

In retrospect the key idea in the "Aristotle" essay was this: if humans could contribute their knowledge to a database that could be read by computers, then the computers could present that knowledge to humans in the time, place and format that would be most useful to them.  The missing link to make the idea work was a universal database containing all human knowledge, represented in a form that could be accessed, filtered and interpreted by computers.

One might reasonably ask: Why isn't that database the Wikipedia or even the World Wide Web? The answer is that these depositories of knowledge are designed to be read directly by humans, not interpreted by computers. They confound the presentation of information with the information itself. The crucial difference of the knowledge web is that the information is represented in the database, while the presentation is generated dynamically. Like Neal Stephenson's storybook, the information is filtered, selected and presented according to the specific needs of the viewer.

John, Robert and I started a project,  then a company, to build that computer-readable database. How successful we will be is yet to be determined, but we are really trying to build it:  a universal database for representing any knowledge that anyone is willing to share. We call the company Metaweb, and the free database, Freebase.com. Of course it has none of the artificial intelligence described in the essay, but it is a database in which each topic is connected to other topics by links that describe their relationship. It is built so that computers can navigate and present it to humans. Still very primitive, a far cry from Neal Stephenson's magical storybook, it is a step, I hope, in the right direction.
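To make the idea a bit more concrete, here is a minimal, purely hypothetical sketch in Python (with invented topic names and relationships, not Freebase's actual schema or API) of a database in which each topic is linked to other topics by named relationships, and a program navigates those links to generate a presentation dynamically:

# Hypothetical sketch only: a topic database in which each topic is
# connected to other topics by links that describe their relationship,
# built so that a program can navigate it and present it to humans.

class Topic:
    def __init__(self, name):
        self.name = name
        self.links = []          # list of (relationship, Topic) pairs

    def link(self, relationship, other):
        self.links.append((relationship, other))

def render(topic, depth=1, indent=""):
    """Generate a simple textual presentation from the data itself."""
    print(f"{indent}{topic.name}")
    if depth > 0:
        for relationship, other in topic.links:
            print(f"{indent}  --{relationship}--> {other.name}")
            render(other, depth - 1, indent + "    ")

aristotle = Topic("Aristotle")
logic = Topic("Logic")
plato = Topic("Plato")
aristotle.link("founded the field of", logic)
aristotle.link("studied under", plato)

render(aristotle)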

The original Aristotle essay is well worth reading.

There is a related article in The New York Times by John Markoff entitled "Start-Up Aims for Database to Automate Web Searching" about Hillis, Metaweb, and Freebase.

My own ruminations on the concept of a Knowledge Web can be found in my white paper entitled The Consumer-Centric Knowledge Web - A Vision of Consumer Applications of Software Agent Technology - Enabling Consumer-Centric Knowledge-Based Computing.

-- Jack Krupansky

Monday, March 05, 2007

Spatial agents

I'm not sure precisely what the term spatial agent refers to, but I ran across it and it sounds worth investigating.

I suspect it relates to cellular automata, but it may have broader applications and analytic uses.

-- Jack Krupansky

Sunday, March 04, 2007

Renaming DVPC to DVPDS - proposal for Distributed Virtual Personal Data Storage

A couple of years ago I came up with a proposal for something I called a Distributed Virtual Personal Computer (DVPC), which was an attempt to abstract the user's data from their personal computer and have that virtual data live on the Internet, with the local storage of the PC simply being a cache of the distributed, virtual data. I have decided to rename the concept to Distributed Virtual Personal Data Storage (DVPDS) to place the emphasis on the user's data as distinct from the computing or processing capabilities of their PC.

I don't intend to pursue implementation of the DVPDS concept at this time, but I do want this proposal to be available so that others may contemplate the incorporation of its features into computing infrastructures that they may implement in the coming years.

Here is the preamble for the new DVPDS proposal:

This proposal for Distributed Virtual Personal Data Storage (DVPDS) supersedes my previous proposal for a Distributed Virtual Personal Computer (DVPC). DVPDS includes all of the concepts of my previous DVPC proposal, but simply changes the name to emphasize the focus on the data storage aspects of a personal computer (PC) as distinct from the computing or processing capabilities of a PC. In particular, it abstracts the user's personal data to give it a virtual form distinct from the actual storage used to store that virtual data.

The intention remains that all of a user's data would live in a distributed, virtual form on the Internet, and that the user's device (PC or phone or other computing device) merely caches the distributed, virtual data. The intention is that the user gets all of the performance and other benefits of local mass storage, with none of the downsides, such as the need for backup, anxiety caused by lost or mangled data, inconvenience of access from other machines, difficulty of managing archives, and so on.

The intention is not that the user would "work on the Web", but rather to continue to emphasize higher productivity through rich client devices with instantaneous data access and full control of that data. In practice, users will frequently or even usually work directly on the Web, but at times, sometimes for extended stretches, they may work disconnected from the Internet, all seamlessly and with no loss of the positive aspects of the user experience.

With regard to the requirements for being distributed, the emphasis is on maximum diversity so that users can be guaranteed that their data will be both readily accessible and protected from loss due to even the most extreme of contingencies. Dimensions of diversity include vendor, geography, communications backbone, and offline storage, so that no human error, fire, flood, earthquake, explosion, vendor financial difficulty, sabotage, theft, or legal disagreement can cause any of a user's data to become inaccessible for more than the shortest period of time. A particular emphasis is placed on avoiding vendor-specific solutions. Vendor "lock-in" is unacceptable.

One area that needs attention since my original proposal is the more-demanding storage requirements for media such as music, video, podcasts, and movies, as well as intellectual property issues such as DRM.

This proposal is in the public domain. It may be copied and modified -- provided that Jack Krupansky and Base Technology are credited and a link back to this original proposal is provided AND these same use and distribution terms are carried along.

Please note that DVPDS is only a concept right now, with no implementation or business plan to turn the concept into a product and service.

The rest of the document is unchanged from when it was created to describe the DVPC concept, but it should be read as referring to the DVPDS concept.
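To make the caching idea concrete, here is a minimal, purely hypothetical sketch in Python of the core DVPDS behavior: the device holds only a cache, writes are replicated to several independent remote stores, and reads fall back to any reachable replica on a cache miss. The class names (LocalCache, RemoteReplica) and the simple sync logic are invented for illustration and are not part of the actual proposal:

# Hypothetical sketch only: local storage is merely a cache of
# distributed, virtual data replicated across diverse providers.

class RemoteReplica:
    """Stands in for one of several diverse storage providers."""
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.online = True

    def put(self, key, value):
        if self.online:
            self.data[key] = value

    def get(self, key):
        return self.data.get(key) if self.online else None

class LocalCache:
    """The device's local storage: simply a cache of the virtual data."""
    def __init__(self, replicas):
        self.cache = {}
        self.replicas = replicas
        self.pending = {}        # writes made while disconnected

    def write(self, key, value):
        self.cache[key] = value
        self.pending[key] = value
        self.sync()              # replicate whenever possible

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        for replica in self.replicas:     # cache miss: try any replica
            value = replica.get(key)
            if value is not None:
                self.cache[key] = value
                return value
        return None

    def sync(self):
        for key, value in list(self.pending.items()):
            for replica in self.replicas:
                replica.put(key, value)
            if any(r.online for r in self.replicas):
                del self.pending[key]

replicas = [RemoteReplica("vendor-a"), RemoteReplica("vendor-b")]
device = LocalCache(replicas)
device.write("notes.txt", "draft")
print(device.read("notes.txt"))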

-- Jack Krupansky

Dumb PC/smart Web versus smart PC/dumb Web

One of the issues that we need to confront as we design the computing architectures of the future is the question of which will be "smarter", the user's "device" (PC or "phone" or other object) or the applications on the Web.

One route is that we continue to put a fair amount of intelligence on the user device while the Web remains primarily a source of relatively static data and "services". For example, the browser and browser "add-ons" would continue to get smarter, and Web Services would be primarily "utilities" to be used by browser-based applications.

The complementary route is that the user device be relegated to being a relatively "dumb", "thin" client, strictly focused on UI implementation, while the real "smarts" of applications live on Web servers. For example, the browser would support sophisticated AJAX-like UI features and a 3-D graphical environment, but little in the way of support for "intelligent" operations on the device itself.

Obviously you can have a full spectrum of dumb/smart hybrids, but then we will constantly have to make dumb/smart tradeoffs as we design each application. That might be optimal for specific applications, but it raises the cost of designing, implementing, and supporting applications, and it may be pure hell for poor "dumb" users who simply want some consistency between applications so that they don't have to figure out every new application and try to keep them all straight.

This raises the next question: what criteria should we use to decide where along the dumb/smart spectrum an application should lie?

It also raises the question of where software agents will live and operate. Do we want agents to live strictly on servers, with only UI elements on the user device? Or do we want software agents to live and work on the user device as well, or even primarily there?
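As a rough illustration of the placement question, here is a small hypothetical sketch in Python of the same filtering task implemented two ways: once as "smarts" running on the device, and once as a thin-client call in which the device merely forwards the request to a stand-in for a server-side agent. Both functions are invented examples, not real APIs:

# Hypothetical sketch only: where should the "smarts" live?

def local_agent_filter(items, keyword):
    """Rich-client route: the reasoning runs on the device itself."""
    return [item for item in items if keyword in item.lower()]

def remote_agent_filter(items, keyword):
    """Thin-client route: the device only forwards the request; the
    server-side agent is simulated here by a local function."""
    def server_side(payload):
        return [i for i in payload["items"] if payload["keyword"] in i.lower()]
    return server_side({"items": items, "keyword": keyword})

news = ["Metaweb launches Freebase", "Weather report", "Freebase API notes"]
print(local_agent_filter(news, "freebase"))
print(remote_agent_filter(news, "freebase"))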

More food for thought.

-- Jack Krupansky

Saturday, March 03, 2007

The fractal nature of the Web - updated

Tim Berners-Lee (TBL) has updated his commentary entitled "The Fractal nature of the Web" with some notes on how the Semantic Web can and must work with a combination of overlapping global and local ontologies. He discusses the importance of thinking about ontologies and domains from the perspective of agents that communicate using messages drawing on the ontologies and domains that they share. He concludes:

So the idea is that in any one message, some of the terms will be from a global ontology, some from subdomains. The amount of data which can be reused by another agent will depend on how many communities they have in common, how many ontologies they share.

In other words, one global ontology is not a solution to the problem, and a local subdomain is not a solution either. But if each agent uses a mix of a few ontologies of different scale, that mix forms a global solution to the problem.
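A tiny hypothetical sketch in Python (with invented ontology names) of the point being made: the fraction of a message that another agent can reuse depends on how many of the sender's ontologies, global and local, the receiver also understands:

# Hypothetical sketch only: reuse depends on shared ontologies.

def reusable_fraction(message_terms, receiver_ontologies):
    """message_terms maps each term to the ontology it is drawn from."""
    usable = [t for t, ont in message_terms.items() if ont in receiver_ontologies]
    return len(usable) / len(message_terms)

# A message mixing one global ontology with two local subdomain ontologies.
message = {
    "person":   "global-core",
    "employee": "acme-hr",
    "badge_id": "acme-hr",
    "gene":     "bio-lab",
}

print(reusable_fraction(message, {"global-core"}))             # shares only the global ontology
print(reusable_fraction(message, {"global-core", "acme-hr"}))  # shares a subdomain as well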

His overall "web fractal" commentary starts from this thought:

I have discussed elsewhere how we must avoid the two opposite social deaths of a global monoculture and a set of isolated cults, and how the fractal patterns found in nature seem to present themselves as a good compromise. It seems that the compromise between stability and diversity is served by there being the same amount of structure at all scales. I have no mathematical theory to demonstrate that this is an optimization of some metric for the resilience of society and its effectiveness as an organism, nor have I even that metric. (Mail me if you do!)

However, it seems from experience that groups are stable when they have a set of peers, when they have a substructure. Neither the set of peers nor the substructure must involve huge numbers, as groups cannot "scale", that is, work effectively with a very large number of liaisons with peers, or when composed as a set of a very large number of parts. If this is the case then by induction there must be a continuum of group sizes from the very largest to the very smallest.

File this under "food for thought." This issue of how domains can interoperate, and the respective role of global domains, is key to developing a global knowledge web. How to achieve stability in the presence of diversity is quite a difficult problem. It will need some original thinking on the nature of equilibrium.

-- Jack Krupansky