The search for search's next generation
Andy Oram
Dec. 17, 2003 11:29 AM
URL: http://www.businessweek.com/technology/content/dec2003/tc20031216_9018_tc04...
Current search engines--even the constantly surprising Google--seem unable to leap the next big barrier in search: the trillions of bytes of dynamically generated data created by individual Web sites around the world, or what some researchers call the "deep web." You can't look up the status of a Federal Express package without going to the Federal Express site, or the details on an eBay item without checking the eBay site. Dynamically generated data can't be spidered.
But the article cited above shows how this barrier is slowly cracking. Now I can enter "fedex 791725670102" into Google (not Federal Express) and discover that the jigsaw puzzle I mailed to an author in Australia was signed for by him.
Of course, Google has to send me to the Federal Express site (which takes an extra click) to complete the search, but the principle is established: a search at Google can kick off a deep search on another site.
The burn-out of the dot-com era left a smoldering envy of those few dot-commers that managed to stay alive. Google is foremost among these. If they can continue pulling in dynamic data from more and more sites, their dominance may well continue--for access to dynamic data is indeed the key to the next big improvement in search.
A generalization of the Google/FedEx collaboration would lead to what are commonly called metasearch engines, a peer-to-peer solution to the search problem that involves a radically different architecture from any of the current popular engines. I said different, not new. The idea of peer-to-peer search was aired at least as far back as early 2000. I described it in my first article on peer-to-peer systems in May of that year:
Gnutella is a fairly simple protocol. It defines only how a string is passed from one site to another, not how each site interprets the string. One site might handle the string by simply running fgrep on a bunch of files, while another might insert it into an SQL query, and yet another might assume that it's a set of Japanese words and return rough English equivalents, which the original requester may then use for further searching. This flexibility allows each site to contribute to a distributed search in the most sophisticated way it can. Would it be pompous to suggest that Gnutella could become the medium through which search engines operate in the 21st century?
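To make the quoted idea concrete, here is a minimal Python sketch of one query string handed to several peers, each interpreting it in its own way. Everything in it is hypothetical and for illustration only: the peer names, the sample data, the tiny romanized glossary, and the `broadcast` function are assumptions, and none of it implements the actual Gnutella wire protocol.

```python
import sqlite3
from typing import Callable, Dict, List


def grep_peer(query: str) -> List[str]:
    """Treat the query as a fixed string, fgrep-style, over some in-memory 'files'."""
    files = {
        "notes.txt": "jigsaw puzzle shipped to Australia",
        "todo.txt": "renew domain name",
    }
    return [name for name, text in files.items() if query in text]


def sql_peer(query: str) -> List[str]:
    """Insert the query into a parameterized SQL LIKE search."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (title TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?)",
        [("wooden jigsaw puzzle",), ("chess set",)],
    )
    rows = conn.execute(
        "SELECT title FROM items WHERE title LIKE ? ORDER BY title",
        (f"%{query}%",),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


def translate_peer(query: str) -> List[str]:
    """Assume the query is romanized Japanese and return rough English equivalents."""
    glossary = {"pazuru": "puzzle", "hon": "book"}  # made-up two-word glossary
    return [glossary[word] for word in query.split() if word in glossary]


# Hypothetical peer registry: the "protocol" here is only "pass the string along".
PEERS: Dict[str, Callable[[str], List[str]]] = {
    "grep": grep_peer,
    "sql": sql_peer,
    "translate": translate_peer,
}


def broadcast(query: str) -> Dict[str, List[str]]:
    """Send the same raw string to every peer and collect whatever each returns."""
    return {name: handler(query) for name, handler in PEERS.items()}


if __name__ == "__main__":
    print(broadcast("jigsaw"))
    print(broadcast("pazuru"))
```

The point of the sketch is that `broadcast` knows nothing about how any peer searches; it only forwards the string, which is the flexibility the quoted passage describes.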
What's holding back metasearch is the lack of standards for categorizing data and knowing what to search for. It's easy to guess that "fedex 791725670102" should be interpreted as a search for a Federal Express package, but anything less strictly defined is a big metadata problem.
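The easy case can be sketched as a small pattern-based router. This is only a guess at the general shape of such a feature, not a description of how Google does it: the `ROUTES` table, the `fedex-tracking` label, and the assumption that a tracking number is exactly 12 digits (taken from the single example above) are all hypothetical.

```python
import re
from typing import List, Optional, Pattern, Tuple

# Hypothetical routing table: (target label, pattern that captures the lookup key).
# The 12-digit rule is an assumption based only on the example query in the text.
ROUTES: List[Tuple[str, Pattern[str]]] = [
    ("fedex-tracking", re.compile(r"^fedex\s+(\d{12})$", re.IGNORECASE)),
]


def route(query: str) -> Optional[Tuple[str, str]]:
    """Return (target, key) when the query matches a known pattern, else None."""
    cleaned = query.strip()
    for target, pattern in ROUTES:
        match = pattern.match(cleaned)
        if match:
            return target, match.group(1)
    return None  # anything less strictly defined falls through to ordinary search


if __name__ == "__main__":
    print(route("fedex 791725670102"))
    print(route("jigsaw puzzle australia"))
```

A strictly formatted query is trivially routable this way; a free-form query returns `None`, which is exactly where the metadata problem begins.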
A lot of people have dumped on the ideal of metadata, notably Cory Doctorow in the article Metacrap. So the waters of the deep web will be slow to stir, but as the benefits become clear, more and more sites may emerge.
What business model would drive metasearch? That question is classic in peer-to-peer systems, because distributed systems typically have problems generating and distributing income. Sites could be motivated to solve the metadata problem because they'd draw more traffic by joining the system, and expose more of their data to people's searches.
As for the aggregating site--Google or a competitor--it would potentially have an easier road to profitability than Google has now. The aggregating site could continue to derive revenue from ads and from the sale of search software. Since the computing resources it needed would be vastly less than the current Google, it would need less revenue from ads and sales. And since the use of its software would be a prerequisite to joining (although one hopes it would tolerate the use of compatible, competing software) it should be able to land more sales.
Andy Oram is an editor for O'Reilly Media, specializing in Linux and free software books, and a member of Computer Professionals for Social Responsibility. His web site is www.praxagora.com/andyo.
Showing messages 1 through 2 of 2.
-
the role of metadata in searching
2003-12-18 06:40:46 Bob DuCharme
-
The business of search
2003-12-17 13:18:59 anonymous2
If the recent woes of the music industry have taught us anything, it's that the easier it becomes to perform a service, the less people are willing to pay for it. If the computational resources for "deep" searching are drastically reduced, more and more people will jump in the game. Not saying it's a bad thing, the users will benefit, but such deep search would be an extremely disruptive technology. If Google is looking to go that route they're going to have to leverage it to make something even better if they want to make money off of it.
Weblog authors are solely responsible for the content and accuracy of their weblogs, including opinions they express, and O'Reilly Media, Inc., disclaims any and all liability for that content, its accuracy, and opinions it may contain.
This work is licensed under a Creative Commons License.



