
Distributed Search Engines

Clip2 Clip2 Distributed Search Solutions (Clip2 DSS) develops and provides technical data and research for the Gnutella developer and end user communities. The company also recently released the Clip2 Reflector, a proxy server with indexing capabilities designed to operate as a "super peer" that works in conjunction with one or more Gnutella servents to enable a "brokered peer-to-peer" networking model. Although the actual file transfer is still handled directly between peers, the rationale is that using a network management application to broker the search requests improves network performance overall. The Java-based application requires a Java 2 Runtime Environment (JRE), Standard Edition, Version 1.3.0 (or equivalent) and is compatible with all software implementing version 0.4 of the Gnutella protocol.
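The brokered model can be illustrated with a toy super peer that keeps a filename index for the servents connected to it and answers queries on their behalf, so only matching results (not the query flood) reach the leaves. This is a minimal sketch under stated assumptions: the class and method names are hypothetical and are not taken from the Clip2 Reflector, and, as described above, file transfer is left to the peers themselves.

```python
from collections import defaultdict


def keywords(name):
    """Split a filename into lowercase keywords (hypothetical tokenizer)."""
    return name.lower().replace(".", " ").split()


class SuperPeer:
    """Hypothetical broker that indexes its leaf servents' shared files."""

    def __init__(self):
        self.index = defaultdict(set)  # keyword -> {(servent, filename)}
        self.shared = {}               # servent address -> list of filenames

    def register(self, servent, filenames):
        """Record the file list a leaf servent reports when it connects."""
        self.shared[servent] = list(filenames)
        for name in filenames:
            for word in keywords(name):
                self.index[word].add((servent, name))

    def unregister(self, servent):
        """Drop a disconnected servent's entries from the index."""
        for name in self.shared.pop(servent, []):
            for word in keywords(name):
                self.index[word].discard((servent, name))

    def query(self, text):
        """Return (servent, filename) hits containing every query keyword.

        The requester then downloads directly from the servent; the broker
        never relays file data.
        """
        words = text.lower().split()
        if not words:
            return []
        hits = set.intersection(*(self.index.get(w, set()) for w in words))
        return sorted(hits)


broker = SuperPeer()
broker.register("10.0.0.5:6346", ["live concert.mp3", "notes.txt"])
broker.register("10.0.0.9:6346", ["concert poster.jpg"])
print(broker.query("concert"))
```

Because the broker answers from its own index, a leaf's query costs one round trip to the super peer instead of a flood across every neighbor.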
Eikon Eikon is a Java-powered prototype of distributed image search engine software. The engine receives the URL of an image, then locates a user-defined number of similar images on the network. The query image may take a wide range of forms, for example a thumbnail, scan, video capture or user-generated drawing. Image metadata can be retrieved via Extensible Markup Language Remote Procedure Calls (XML-RPC).
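Eikon's XML-RPC interface is not documented in this listing, so the method name `eikon.getMetadata` and the returned fields below are hypothetical. The sketch uses Python's standard `xmlrpc.client.dumps` and `loads` helpers to show what such a request and response look like on the wire, without opening a network connection.

```python
import xmlrpc.client

# Hypothetical request: ask for metadata about one indexed image.
request_xml = xmlrpc.client.dumps(
    ("http://example.com/images/sunset.jpg",),
    methodname="eikon.getMetadata",
)
print(request_xml)


def handle(request_body):
    """Toy server-side dispatch: decode the call and encode a response."""
    params, method = xmlrpc.client.loads(request_body)
    if method != "eikon.getMetadata":
        fault = xmlrpc.client.Fault(1, "unknown method " + method)
        return xmlrpc.client.dumps(fault, methodresponse=True)
    (url,) = params
    # Hypothetical metadata fields, for illustration only.
    metadata = {"url": url, "width": 640, "height": 480, "format": "JPEG"}
    return xmlrpc.client.dumps((metadata,), methodresponse=True)


response_xml = handle(request_xml)
(result,), _ = xmlrpc.client.loads(response_xml)
print(result["width"], result["height"])
```

In a real deployment the same XML bodies would travel over HTTP POST, which is what `xmlrpc.client.ServerProxy` does on the client side.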

The Eikon project employs the Fast Multiresolution Image Querying method, which is described in a white paper (mrquery.pdf, 446k) by Charles E. Jacobs, Adam Finkelstein and David H. Salesin of the Department of Computer Science and Engineering, University of Washington. This page provides links to documentation for the Eikon Application Programming Interface (API), additional research references and a FAQ.
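The core idea of the Jacobs, Finkelstein and Salesin method is to Haar-wavelet-decompose each image, keep only the signs of its largest-magnitude coefficients as a compact signature, and score a candidate by how many of those truncated, quantized coefficients it shares with the query. The sketch below is a simplified grayscale version under stated assumptions: images are square with a power-of-two side, and it uses uniform weights where the paper uses tuned per-band weights. Function names are illustrative and are not Eikon's API.

```python
import math


def haar_1d(values):
    """Full 1D Haar decomposition (pairwise sums/differences over sqrt(2))."""
    a = [float(v) for v in values]
    h = len(a)  # assumed to be a power of two
    while h > 1:
        h //= 2
        nxt = a[:]
        for i in range(h):
            nxt[i] = (a[2 * i] + a[2 * i + 1]) / math.sqrt(2)
            nxt[h + i] = (a[2 * i] - a[2 * i + 1]) / math.sqrt(2)
        a = nxt
    return a


def haar_2d(img):
    """Standard 2D decomposition: transform every row, then every column."""
    rows = [haar_1d(r) for r in img]
    cols = [haar_1d(c) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def signature(img, k):
    """Scaled overall average plus the signs of the k largest coefficients."""
    c = haar_2d(img)
    others = [((i, j), v) for i, row in enumerate(c) for j, v in enumerate(row)
              if (i, j) != (0, 0) and v != 0]
    others.sort(key=lambda t: abs(t[1]), reverse=True)
    return c[0][0], {pos: (1 if v > 0 else -1) for pos, v in others[:k]}


def score(query_sig, target_sig, w_avg=1.0, w_coeff=1.0):
    """Lower is better: penalize average difference, reward shared signs."""
    q_avg, q_signs = query_sig
    t_avg, t_signs = target_sig
    s = w_avg * abs(q_avg - t_avg)
    for pos, sign in q_signs.items():
        if t_signs.get(pos) == sign:
            s -= w_coeff
    return s


def best_matches(query_img, database, n, k=8):
    """Names of the n images in `database` (name -> signature) nearest the query."""
    q = signature(query_img, k)
    return sorted(database, key=lambda name: score(q, database[name]))[:n]


left = [[0, 0, 8, 8]] * 4   # dark left half, bright right half
right = [[8, 8, 0, 0]] * 4  # mirror image with the same average brightness
db = {"left": signature(left, 8), "right": signature(right, 8)}
print(best_matches(left, db, 1))
```

Storing only signs of a few coefficients keeps signatures tiny, which is what makes the approach attractive when signatures must be exchanged between nodes.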
ExactOne, Inc. ExactOne, Inc. enables real time searches of dynamic data residing throughout the Internet via its ExactOne Query Engine. The specialized Java application consists of a front-end Hypertext Transfer Protocol (HTTP) request server that deals directly with end users via browsers or online applications, and a back end query and parsing engine. Results of searches can be returned as a formatted HTML page or an XML data stream. Dynamic data, or information in Web pages and databases that changes frequently, has traditionally been difficult to mine. A typical use of ExactOne's technology would be a shopping bot that accepts the type of item and price range desired from a shopper's Web browser or cell phone, and returns a formatted page of links to matching items available in real time.
Filetopia Bitmap Multimedia is the developer of Filetopia, free communications software that includes instant messaging, chat, e-mail, a powerful distributed file sharing system with a search engine, an online friends list and message boards. Filetopia's file sharing tool includes public key encryption and a choice of strong ciphers to protect the IP addresses of its users.
FirstPeer FirstPeer provides a development framework for facilitating "Dynamic Distributed Marketplaces" that have more functionality, increased scalability and lower costs than most server-based centralized marketplaces. Features include enhanced real-time trading, visibility of the entire inventory, no increased overhead, cooperative interactions, analytic tools and tracking, and the ability to receive monitored reports for the entire marketplace. FirstPeer's platform is based on the Domain Name System (DNS), Hypertext Transfer Protocol (HTTP), XML, Extensible Markup Language Remote Procedure Calls (XML-RPC) and Jabber protocols.

FirstPeer's Java-based Professional Servant is a thin-client file sharing application that is able to integrate directly with existing data sources (JDBC, XML, or CSV). Professional Servant supports Windows, Macintosh and Unix. The company has posted a form for people interested in obtaining its Personal Servant application or the plug-in required by its GnuMarkets active marketplace.
Gnotella Gnotella is a distributed real-time search and file-sharing program run from a user's desktop as both a client and a server (a "servent") on the Gnutella peer-to-peer network. Gnotella allows users to interface directly with each other with no intermediary or central authority, and to search for and share any type of digital file (audio, video, word processing documents, recipes, games, and text files). The Gnotella client is designed for the Windows environment and offers features such as multiple simultaneous searches, resumption of partial or failed downloads, improved filtering/spam protection, bandwidth monitoring, enhanced statistics, upload throttling and skinning.
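For readers curious about what a servent actually sends, here is a sketch of how a search request is framed in version 0.4 of the Gnutella protocol as it is commonly documented: a 23-byte descriptor header (16-byte descriptor ID, payload type, TTL, hops and a little-endian payload length) followed by a Query payload made of a 2-byte minimum speed and a NUL-terminated search string. It is an illustration written for this listing, not code from Gnotella, and it assumes ASCII search text.

```python
import os
import struct

QUERY = 0x80  # payload descriptor value for a Query in Gnutella 0.4
HEADER = struct.Struct("<16sBBBI")  # id, payload type, TTL, hops, length


def encode_query(search, ttl=7, min_speed=0, descriptor_id=None):
    """Frame a Query descriptor: header, then min speed + search string."""
    descriptor_id = descriptor_id or os.urandom(16)
    payload = struct.pack("<H", min_speed) + search.encode("ascii") + b"\x00"
    return HEADER.pack(descriptor_id, QUERY, ttl, 0, len(payload)) + payload


def decode_query(message):
    """Parse a Query back into (id, ttl, hops, min_speed, search)."""
    desc_id, kind, ttl, hops, length = HEADER.unpack_from(message)
    if kind != QUERY:
        raise ValueError("not a Query descriptor")
    payload = message[HEADER.size:HEADER.size + length]
    (min_speed,) = struct.unpack_from("<H", payload)
    search = payload[2:].split(b"\x00", 1)[0].decode("ascii")
    return desc_id, ttl, hops, min_speed, search


def forward(message):
    """Before relaying, decrement TTL and increment hops; stop at TTL 0."""
    desc_id, kind, ttl, hops, length = HEADER.unpack_from(message)
    if ttl <= 1:
        return None  # TTL would reach 0: the descriptor is not forwarded
    return HEADER.pack(desc_id, kind, ttl - 1, hops + 1, length) + message[HEADER.size:]


msg = encode_query("jazz", ttl=3)
print(len(msg), decode_query(msg)[1:])
```

The TTL/hops pair is what bounds a Gnutella flood: each relay spends one unit of TTL, so a query with TTL 7 reaches at most seven hops from its origin.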

Gnotella's open architecture allows for the program to be customized to target consumer and business communities. Gnotella's parent company, Petapeer Holdings, Inc., will continue to distribute the program freely while it develops a committed user base. Once that user base is in place, the company plans to begin charging for Gnotella upgrades and deluxe features. Petapeer Holdings Inc. does not monitor the activity of, collect or sell information about the users on its network.
grub.org Grub, Inc. develops a distributed computing client that uses P2P technology. The initial client application is a distributed web crawler. The distributed crawler network will have the capacity to index every web page on the Internet on a daily basis. The index will be kept in a centralized database on Grub's servers.
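The listing does not describe how Grub schedules work, so the following is only a hypothetical illustration of one common way to split a crawl across many volunteer clients: hash each URL's host name to a client, so every site is fetched by exactly one crawler and per-site politeness is easy to enforce. The client names are made up; results would flow back to the central index, as described above.

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlsplit


def assign(url, clients):
    """Map a URL to one client by hashing its host name (stable across runs)."""
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return clients[int.from_bytes(digest[:8], "big") % len(clients)]


def plan(urls, clients):
    """Group a batch of URLs into per-client work lists."""
    work = defaultdict(list)
    for url in urls:
        work[assign(url, clients)].append(url)
    return dict(work)


clients = ["client-a", "client-b", "client-c"]  # hypothetical volunteers
urls = ["http://example.com/a", "http://example.com/b", "http://example.org/"]
print(plan(urls, clients))
```

Hashing the host rather than the full URL is the design choice that keeps all pages of one site on the same client.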
Hotline Connect Hotline is a hybrid P2P/client-server system for community building and file sharing that has been in operation since 1997. Hotline Connect is a suite that includes both a Hotline Client application and a Hotline Server application (available as separate downloads). Hotline works over any Internet or intranet network and has capabilities similar to File Transfer Protocol (FTP) and Internet Relay Chat (IRC), but uses its own "Hotline Protocol". Features include resumable uploads and downloads, real-time chat, newsgroup-style forums, and streaming in numerous media and file formats. The program's search function is used to locate servers rather than the files of individual users. The Hotline Client, which has approximately a million active users, provides an easy-to-use interface and has small memory requirements (2 MB).
Human-Links Human-Links is a distributed searching environment that categorizes public information in a personalized and relevant manner in order to generate accurate responses tailored to each user's specific needs. The program functions as a shared neural network, based on a group of technologies that fall roughly within the disciplines of cognitive science and artificial intelligence, and on algorithms that stem from research in data compression and natural language translation. The "brain" of Human-Links' complex classification system identifies conceptual patterns in media documents to determine their actual meaning, so that each document can be situated in relation to other works of greater or lesser relevance. Using this method, the system is able to develop a sense of how each user organizes the world, as opposed to imposing a standardized cultural or linguistic lexicon. The system is designed on an open-source platform with public sockets targeted at third-party development communities.

Human-Links is owned by Amoweba, a cognitive science research and development company specializing in enterprise applications of neural networks, artificial intelligence and human-centered ergonomics. The company is actively seeking individuals interested in beta-testing and is even offering incentives (a vacation package to anywhere, Palm PDAs, and DVD players and discs) to entice test participants to turn in comprehensive results and refer friends to the program.
InfraSearch InfraSearch (a.k.a. gonesilent.com) was acquired by Sun Microsystems in February to become part of Sun's JXTA (Juxtapose) project. InfraSearch, a technology prototype of a fully distributed search engine, was originally based on Gnutella technology but has since moved to a proprietary framework. InfraSearch was built by Gene Kan and other Gnutella developers, and the founding engineering team has roots in UC Berkeley's Experimental Computing Facility.
Jibe Jibe has recently released its Enterprise File Sharing v 1.0 beta. Jibe customers can participate over a public network or set up a private secure P2P hub between suppliers or partners. Any storage system that can be accessed through JDBC or ODBC can participate, including Microsoft Access and Excel spreadsheets.

Jibe provides a Java servlet that can interpret the XML for each taxonomy and present a Web form to an end user. A Jibe application can run standalone, or a company licensing Jibe can store a single servlet on an internal Web server and let its employees do searches through their browsers. Users can simply type a string into the application's "GO FIND" box, and let Jibe do the retrieval, ranking, sorting and display. Jibe's storage format and taxonomies are both defined in XML. Support for JXTA is planned in the future.
JXTA Juxtapose (JXTA) was originally one of Bill Joy's research projects aimed at developing a network programming and computing platform able to solve a number of the problems in modern distributed computing. JXTA has since become a community-based open source development platform. The four main concepts of the JXTA project are: the ability to "pipe" from one peer to another, a grouping notion, the ability to monitor and meter, and a security layer. The JXTA Shell is a prototype application that illustrates the use of JXTA technology. The JXTA Shell permits interactive access to the JXTA platform's building blocks through a simple, text-based interface (available on Solaris Operating Environment, Linux, or Microsoft Windows). A Technical Specification describes the architecture and key elements of the Project JXTA technology, including peers, advertisements, messages, pipes and protocols. Demonstrations are available for download. See the FAQ or the documentation page for more details.

Check out OpenP2P.com's Project JXTA Developer Contest.
KaZaA KaZaA's P2P file sharing network allows users to search for and download audio, video, image and text files using one of three interfaces: the KaZaA Media Desktop Peer-To-Peer (P2P) client, KaZaA's Winamp Plug-in or the KaZaA.com web site. KaZaA's distributed network is "self-organizing" and features the ability to automatically transform more powerful clients into "SuperNodes" able to broker the search query requests of the weaker nodes on-demand. (See the FAQ for more technical details). The service is currently free, but KaZaA anticipates a small fee in the future in light of the lawsuits against Napster and Scour. KaZaA's core technology is based on KaZaAlib, a platform independent C++ library comprised of 59 functions. (The company has not yet formalized an open source development program.)
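KaZaA's actual promotion rules are proprietary and not described in this listing, so the following is only a hypothetical sketch of the general SuperNode idea: nodes whose reported bandwidth and uptime clear a threshold are promoted, each remaining node attaches to the least-loaded SuperNode, and that SuperNode answers searches from its leaves' reported file lists. The thresholds, fields and capacity limit are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str  # assumed unique
    bandwidth_kbps: int
    uptime_hours: float
    shared: list = field(default_factory=list)


def organize(nodes, min_bandwidth=256, min_uptime=2.0, capacity=100):
    """Promote strong nodes to SuperNodes and attach weaker nodes as leaves."""
    supers = [n for n in nodes
              if n.bandwidth_kbps >= min_bandwidth and n.uptime_hours >= min_uptime]
    if not supers:  # nobody qualifies: promote the single strongest node
        supers = [max(nodes, key=lambda n: n.bandwidth_kbps)]
    super_names = {s.name for s in supers}
    leaves = {s.name: [] for s in supers}
    for n in nodes:
        if n.name in super_names:
            continue
        target = min(supers, key=lambda s: len(leaves[s.name]))  # least loaded
        if len(leaves[target.name]) >= capacity:
            raise RuntimeError("all SuperNodes full; promote more nodes")
        leaves[target.name].append(n)
    return leaves


def search(leaves, keyword):
    """SuperNodes answer on behalf of their leaves from reported file lists."""
    return sorted((leaf.name, f) for group in leaves.values() for leaf in group
                  for f in leaf.shared if keyword.lower() in f.lower())


nodes = [Node("dsl1", 768, 10.0),
         Node("modem1", 56, 1.0, ["song.mp3"]),
         Node("modem2", 56, 5.0, ["Song Live.MP3", "clip.avi"])]
groups = organize(nodes)
print(search(groups, "song"))
```

The point of the design is that weak (dial-up) nodes never see the query traffic at all; only the SuperNode tier does.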
Morpheus Morpheus, available from MusicCity Networks, Inc. is a distributed file-sharing network based on the KaZaA code base. (Morpheus and the KaZaA Media Desktop client provide uniquely-branded interfaces to different breeds of what amounts to the same underlying network.) Like Napster, Morpheus is a closed system that requires the use of a centralized user registration and logon system. However, Morpheus differs from Napster in that it does not maintain a central content index or subject its network to content filtering. The Windows-based Morpheus client is available for download and contains an embedded version of Microsoft's Windows Media Player.

See the OpenP2P.com article: "Morpheus Out of the Underworld" by Kelly Truelove and Andrew Chasin.
MusicBrainz.org The MusicBrainz Metadata Initiative was formed to organize a means of exchanging metadata describing media files on the Internet. The vocabulary was employed in the development of MusicBrainz Metadata (formerly the CD Index Project), a Dublin Core metadata application and free web service that uses XML and the Resource Description Framework (RDF) to make artists' names, track numbers, song titles and other metadata describing musical content downloadable to the user's CD player, MP3 player, Vorbis player, or other client. The first player to fully support MusicBrainz is FreeAmp.
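To make the RDF idea concrete, the sketch below builds a small RDF/XML description of one track using Dublin Core elements with Python's standard `xml.etree.ElementTree`, then reads it back as a player might. The RDF and Dublin Core namespace URIs are the standard ones; the track URI, the data and the `trk:trackNum` property (in a made-up namespace, since Dublin Core has no track-number element) are hypothetical and do not reproduce MusicBrainz's actual schema.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
TRK = "http://example.org/hypothetical-track#"  # hypothetical vocabulary
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)
ET.register_namespace("trk", TRK)


def describe_track(uri, creator, title, track_number):
    """Server side: RDF/XML describing one track with Dublin Core terms."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": uri})
    ET.SubElement(desc, f"{{{DC}}}creator").text = creator
    ET.SubElement(desc, f"{{{DC}}}title").text = title
    ET.SubElement(desc, f"{{{TRK}}}trackNum").text = str(track_number)
    return ET.tostring(root, encoding="unicode")


def read_track(xml_text):
    """Client side: pull the metadata back out, e.g. for a player's display."""
    desc = ET.fromstring(xml_text).find(f"{{{RDF}}}Description")
    return {
        "uri": desc.get(f"{{{RDF}}}about"),
        "artist": desc.findtext(f"{{{DC}}}creator"),
        "title": desc.findtext(f"{{{DC}}}title"),
        "track": int(desc.findtext(f"{{{TRK}}}trackNum")),
    }


doc = describe_track("http://example.org/track/1", "Some Artist", "Some Song", 3)
print(doc)
print(read_track(doc))
```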
Myster Myster attempts to duplicate Napster in every way except that it uses a completely distributed system. Although very similar to Gnutella, Myster has several optimizations that make it easier to scale than Gnutella, including a self-organizing network of peers that channels queries only to the nodes most likely to contain the desired files. For instance, if a server has a large MP3 collection and no movie files, it will receive requests only for MP3 files. The system is also expandable: as the network acquires more nodes, more sub-networks can be created to keep searches efficient, so that the network is eventually subdivided into more well-defined categories as it grows.
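The difference between flooding and Myster-style routing can be sketched as follows. This is a hypothetical illustration, not Myster's code: each peer advertises which content categories it holds, and a search for a category is sent only to peers that advertise it, instead of to every known peer.

```python
from collections import defaultdict


class Directory:
    """Hypothetical category directory: category -> peers advertising it."""

    def __init__(self):
        self.by_category = defaultdict(set)

    def advertise(self, peer, categories):
        for category in categories:
            self.by_category[category].add(peer)

    def route(self, category):
        """Peers worth querying for this category; all others are skipped."""
        return sorted(self.by_category.get(category, ()))


def flood_targets(peers):
    """Gnutella-style baseline: every known peer receives every query."""
    return sorted(peers)


d = Directory()
d.advertise("peerA", ["mp3"])
d.advertise("peerB", ["mp3", "movie"])
d.advertise("peerC", ["movie"])
print(d.route("mp3"))                              # only peerA and peerB
print(flood_targets(["peerA", "peerB", "peerC"]))  # all three peers
```

Splitting the directory by category is also what lets the network be subdivided into sub-networks as it grows.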

Myster is written in Java and available under the GNU General Public License (GPL). A full working version for either Macintosh or Windows is available, along with documentation and source code.
NeuroGrid The NeuroGrid system currently under development consists of two main elements: a learning engine that observes user activity and updates document metadata accordingly, and an approach to routing search messages in distributed, decentralized networks. The NeuroGrid approach stems from two basic desires. One is to be able to organise data in an associative, web-like fashion (as opposed to the hierarchical structure we see in file systems). The second is to be able to extract that data from a large distributed network environment with a minimum of effort. NeuroGrid tries to provide a more general semantic framework than that of Freenet by creating and maintaining lists of which queries other nodes have been good at answering in the past. A Position Paper, White Paper and online demonstration are available for review.
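The routing half of NeuroGrid, as summarized above, amounts to remembering which neighbors answered which queries well and sending new queries to the best candidates. A minimal sketch of that idea, assuming keyword queries and a simple success count per (keyword, node) pair; NeuroGrid's actual scoring and data structures may differ, and the class name is hypothetical.

```python
from collections import Counter, defaultdict


class QueryRouter:
    """Route keyword queries to the neighbors with the best past answers."""

    def __init__(self, neighbors):
        self.neighbors = list(neighbors)
        self.successes = defaultdict(Counter)  # keyword -> Counter(node -> hits)

    def record_success(self, query, node):
        """Remember that `node` gave a useful answer to `query`."""
        for word in set(query.lower().split()):
            self.successes[word][node] += 1

    def route(self, query, fanout=2):
        """Pick up to `fanout` neighbors, best past record first."""
        totals = Counter()
        for word in set(query.lower().split()):
            totals.update(self.successes.get(word, Counter()))
        # sorted() is stable, so untried neighbors keep their original order.
        return sorted(self.neighbors, key=lambda n: -totals[n])[:fanout]


r = QueryRouter(["n1", "n2", "n3"])
r.record_success("jazz piano", "n3")
r.record_success("jazz", "n3")
r.record_success("piano", "n2")
print(r.route("jazz piano"))
```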
NextPage, Inc. NextPage's NXT 3 e-Content Platform provides some of the world's largest information-intensive corporations with their own customized peer-to-peer file-sharing network. The company's Tools and Applications give enterprise-sized businesses the ability to manage and share resources securely over a distributed network in real time, using a robust application architecture that was specifically designed to scale.
PLATFORMedia LLC PLATFORMedia has developed a peer-to-peer distributed computation environment and search application that uses a corporate LAN's idle computing cycles to automatically index document content, making it more easily accessible to users across the network. The company's Search by Concept application uses statistical inferencing and optimization techniques to systematically build 'mental maps' of each user's concepts of interest. Its scalable application architecture enables the members of a network not only to exchange files but also to manage the sharing of knowledge between nodes. The PLATFORMedia environment is tightly integrated with Microsoft applications such as Word and Excel and requires Windows 95/98/NT and the Java 2 Runtime Environment 1.3 (JRE 1.3) or later. A free trial of the software is available for download.
Plebio Plebio is a search engine that searches the online file databases of all computers running Plebio software. Plebio aims to create a peer-to-peer search network that lets anyone share any piece of information easily and quickly. Plebio was created by Ashhar Farhan, a software writer from Hyderabad, India.
Project Pandango Inc. Project Pandango Inc. is a company formed by the i5 Digital LLC intellectual property development company to launch a search engine based upon their Pandango (taken from the Latin word "Pando," meaning to extend) Peer-To-Peer Web search technology. In addition, third party companies will be able to license a private label version of the search engine that employs personalization features to target a specific area, such as sports or finance. The XML-based Pandango mixes Peer-To-Peer technology with a distributed version of collaborative filtering (a technology used by the Google engine). Patterns of searches by other like-minded searchers are leveraged to arrive at what the company claims are newer and more relevant results. (Search companies using traditional methods disagree, and predict a downside of slower searches.)
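Pandango's algorithm is not published in this listing; the sketch below only illustrates generic user-based collaborative filtering applied to search, under the assumption that the engine logs which results each searcher found useful. Results favored by searchers whose past choices resemble the current user's are ranked higher. All names and data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sets of liked results (binary vectors)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def rerank(user, candidates, history):
    """Order candidate results by similarity-weighted votes of other users.

    history maps each user to the set of results they previously chose.
    """
    mine = history.get(user, set())
    scores = {}
    for url in candidates:
        scores[url] = sum(cosine(mine, liked)
                          for other, liked in history.items()
                          if other != user and url in liked)
    # sorted() is stable, so ties keep the engine's original order.
    return sorted(candidates, key=lambda u: -scores[u])


history = {
    "me": {"a.com", "b.com"},
    "twin": {"a.com", "b.com", "x.com"},   # similar tastes
    "stranger": {"y.com", "z.com"},        # no overlap with "me"
}
print(rerank("me", ["y.com", "x.com", "q.com"], history))
```

The extra similarity computation per query is one concrete source of the slower searches that critics predict.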
Rapigator Rapigator is a Napster work-alike designed for OpenNap servers that lets users search the Napster network for any file type. Rapigator also supports resuming failed downloads, chat, bandwidth throttling and control, searching multiple servers, and automatic updating of the server list. The very small (483 KB) Rapigator client is available as a free download. A FAQ is also available for review.
Redfoot Redfoot is a Python-based framework for distributed Resource Description Framework (RDF) applications written by James Tauber and Daniel Krech. Redfoot offers an RDF parser and serializer, a Hypertext Transfer Protocol (HTTP) server that provides a Web interface for editing/viewing/importing RDF, a query Application Programming Interface (API) for RDF with many high level query functions, a customizable User Interface (UI) and the ability to perform Peer-to-Peer (P2P) RDF data exchanges. Future development plans include major expansion of the P2P architecture so Redfoot applications have a robust environment for discovery of RDF statements on peers, an inference engine, example applications built upon Redfoot, and connectors that map non-RDF data to RDF triples. Redfoot is distributed under a Berkeley Software/Standard Distribution (BSD) license.
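Redfoot's own query API is not reproduced here. As a rough, hypothetical illustration of what query functions over RDF triples and a P2P exchange of statements involve, the following minimal in-memory store matches (subject, predicate, object) patterns, with `None` acting as a wildcard, and merges triples received from a peer. The class and method names are made up for this sketch.

```python
class TripleStore:
    """Tiny in-memory store of (subject, predicate, object) string triples."""

    def __init__(self, triples=()):
        self.triples = set(triples)

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None matches anything."""
        for t in sorted(self.triples):
            if all(want is None or want == have for want, have in zip((s, p, o), t)):
                yield t

    def objects(self, s, p):
        """Higher-level helper: all values of property p for subject s."""
        return [o for _, _, o in self.match(s, p, None)]

    def merge_from_peer(self, peer_triples):
        """P2P exchange in miniature: union in a peer's triples, count new ones."""
        new = set(peer_triples) - self.triples
        self.triples |= new
        return len(new)


DC = "http://purl.org/dc/elements/1.1/"
store = TripleStore([("urn:doc1", DC + "creator", "Alice")])
added = store.merge_from_peer([("urn:doc1", DC + "title", "Notes"),
                               ("urn:doc1", DC + "creator", "Alice")])
print(added, store.objects("urn:doc1", DC + "title"))
```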
Songbird Songbird is a client-side application specifically designed to work within the existing Napster network. Songbird is the latest offering of Media Enforcer LLC. Songbird is freely-available for download and also offers several artist-related features, such as enabling an artist to receive a "snapshot" of a song's popularity on the network. A recent Wired News article felt that the most useful of the program's features was its ability to create an Excel file of user names and files (although IP addresses are not provided). The Songbird FAQ claims the program is able to work effectively with many common song title variations (such as Aimster's Pig Encoder, etc.).
Thinkstream Thinkstream's distributed information and e-commerce platform and Tadaaa! client software provide small to medium-sized merchants with the tools to configure and organize a "public information network" (online marketplace) quickly, easily and without requiring any substantial modifications to their existing network infrastructure. Thinkstream's own online marketplace provides direct, real-time access to product information and pricing, as well as detailed rating and comparison information about its participating vendors. The Tadaaa! file-sharing application allows users to exchange just about any file type (documents, video, music, databases, images or even spreadsheets) within a secure environment. The program's search technology is capable of searching within file content for metadata or to satisfy a variety of other, more complex kinds of queries. A white paper describing Thinkstream's Distributed Internet Architecture and the theoretical justification behind it (A Technology Review of the Next-Generation Internet Architecture: Thinkstream’s Distributed Internet Architecture, 35 pages, requires Adobe Acrobat) is also available for download.
UDDI (Universal Description, Discovery and Integration) The Universal Description, Discovery and Integration (UDDI) specification is an industry initiative led by Ariba, IBM, and Microsoft that defines a platform-independent, open framework for describing services, discovering businesses and integrating business services over the Internet.

UDDI was designed to provide existing directories and search engines with a centralized source for programmatic descriptions of business Web services.

The UDDI Business Registry will allow businesses to publish their preferred terms for conducting e-commerce or other transactions, for other UDDI-enabled agents to "discover".
Veriscape Veriscape's IntelleCat (Intelligent eCatalog Procurement Assistant) is a knowledge-based dynamic catalog management system that enables purchasing professionals to search, find, retain and implement their own tailor-made best-practices on a global scale. IntelleCat's adaptive searching techniques save the state of "successful" search queries as tree-based hierarchical data structures. These stored query trees collectively provide the semantic framework used to regulate the program's inventory tracking and purchasing management features.

IntelleMatch is the subcomponent that enables IntelleCat's adaptive matching engine. Nomenclature issues are resolved via a dynamically-constructed reference database of eCatalog product descriptions structured in a thesaurus-like fashion using a customized semantic infrastructure generated by "matching" to the cognitive makeup of an individual user. Veriscape's products utilize its patent-pending Netcentric Virtual Supercomputing Infrastructure (NVSI) architecture. NVSI is designed to "plug and play" within existing networks and runs on top of standard operating systems, such as Unix, Linux, and NT.
WebV2 WebV2 provides an application platform and network infrastructure to enable commercial peer-to-peer applications. Its approach goes beyond simple file searching and sharing and extends to B2B collaboration between peers in the supply chain, or to enable direct knowledge exchange within enterprises. WebV2 architecture is based on networked intelligent agents, capable of scaling up for commercial use.

The company was founded in early 2000, and has been incubated and financed by Siemens Technology-To-Business Center in Berkeley, California.
xS xS is an open source, Java-based personal digital asset management system that enables people to organize anything digital they have (audio files, pictures, videos, etc.) and share it with others. It also lets users define how they share different assets with others by storing complex metadata about their digital assets. Out of the box, xS supports three network protocols: XML over HTTP, RMI, and DXP, xS' own custom protocol. The network protocols are written in Dax, an XML dialect created by the xS developer, so alternate clients can be written in other languages as long as they talk Dax. xS has a pluggable network protocol architecture and a pluggable "ingest" architecture that provides a simple interface for building new ingest engines. (An "ingest engine" is code that reads a digital asset and determines initial metadata from that stream, for instance by analyzing video for scenes and content.) xS comes with an MP3 ingest engine that reads MP3 ID3 tag info. xS stores its metadata in a pure-Java relational database, InstantDB. Other features include multilingual capability (English, French, Dutch, and Spanish), multi-byte character set support, and the ability to perform complex queries, such as a search for "all songs with the word rain in them by the cure and the cult".
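xS itself is written in Java and its ingest interface is not shown in this listing, so the following Python sketch is hypothetical. It shows the shape of a pluggable ingest engine and an MP3 engine that reads the fixed-layout ID3v1 tag (the last 128 bytes of the file: "TAG", then 30-byte title, artist and album fields, a 4-byte year, a 30-byte comment and a 1-byte genre), followed by a query in the spirit of the example above. The tag data is made up.

```python
import io


class IngestEngine:
    """Hypothetical plug-in interface: read an asset, return initial metadata."""

    def ingest(self, stream):
        raise NotImplementedError


class Id3v1Engine(IngestEngine):
    """Reads the 128-byte ID3v1 tag stored at the end of an MP3 file."""

    def ingest(self, stream):
        stream.seek(0, io.SEEK_END)
        if stream.tell() < 128:
            return {}
        stream.seek(-128, io.SEEK_END)
        tag = stream.read(128)
        if tag[:3] != b"TAG":
            return {}

        def text(start, length):
            raw = tag[start:start + length].split(b"\x00", 1)[0]
            return raw.decode("latin-1").strip()

        return {"title": text(3, 30), "artist": text(33, 30),
                "album": text(63, 30), "year": text(93, 4)}


def fake_mp3(title, artist):
    """Build an in-memory 'MP3' whose last 128 bytes are an ID3v1 tag."""
    def pad(s, n):
        return s.encode("latin-1")[:n].ljust(n, b"\x00")
    tag = (b"TAG" + pad(title, 30) + pad(artist, 30) + pad("", 30)
           + pad("1989", 4) + pad("", 30) + bytes([255]))
    return io.BytesIO(b"\xff\xfb" * 100 + tag)


def find(assets, word, artists):
    """Titles containing `word` by any of `artists` (case-insensitive)."""
    wanted = {a.lower() for a in artists}
    return [m["title"] for m in assets
            if word.lower() in m.get("title", "").lower().split()
            and m.get("artist", "").lower() in wanted]


engine = Id3v1Engine()
library = [engine.ingest(fake_mp3(t, a)) for t, a in [  # hypothetical tags
    ("Rain Demo", "The Cure"), ("Rain", "The Cult"),
    ("Rain", "Other Band"), ("Lullaby", "The Cure")]]
print(find(library, "rain", ["the cure", "the cult"]))
```

Keeping each engine behind the same `ingest(stream)` call is what makes the architecture pluggable: a video scene analyzer would return a different dictionary through the same interface.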
