Gnutella: Alive, Well, and Changing Fast
01/25/2001
Gnutella, an open peer-to-peer search system used primarily for file sharing, was released in March 2000. Within four months, developer activity had diminished substantially, although usage continued to surge as Napster drew media attention to peer-to-peer file-sharing systems. After five months, the strain of a growing user base on a weak technical infrastructure brought about a quasi-collapse of the Gnutella network. Late in the year, however, a second wave of more sophisticated development, informed by that experience, began to emerge. Defying reports of its demise, Gnutella is evolving, and usage is growing in response, although significant technical challenges remain.
What problems have been overcome and how? What problems remain to be solved, and how can they be addressed? Clip2's Distributed Search Solutions initiative has continuously gathered data on the Gnutella network and closely followed related application development. Here, we cover some representative issues to provide insight into Gnutella's evolution.
The origins and technical significance of Gnutella have been described elsewhere. Some notable points:
- Gnutella's creators released an executable application but published neither its source code nor its communications protocol. The protocol descriptions that third parties have since published trace their primary sources to reverse engineering of that original application.
- It is generally acknowledged that Gnutella was not designed to support an unlimited user population, but instead a few hundred to perhaps a few thousand users.
- The Gnutella protocol defines five message types, the data carried by each type, the transmission rules for each type and the mechanics of connection between hosts.
- Pings and queries, used to discover hosts and files respectively, are broadcast; other message types, including responses, are routed. Messages are supposed to be dropped after a predefined number of relays.
- Gnutella is not a file-transfer protocol. The protocol is designed for finding hosts and their files. File transfer is handled directly between serving and requesting hosts via HTTP. Gnutella applications that serve files contain mini Web servers.
- The protocol does not specify how many connections a given host may initiate, accept or simultaneously maintain. It does not dictate conditions under which a host should maintain or drop a given connection.
- Many independent developers have produced Gnutella-speaking applications.
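The broadcast-versus-routed distinction and the relay limit described in the list above are easiest to see in the message header. The following Python sketch assumes the header layout given in the widely circulated third-party v0.4 protocol description: a 16-byte message ID, a one-byte type code, TTL and hop counts, and a little-endian 32-bit payload length. The helper names (`parse_header`, `encode_header`, `next_hop_header`) are hypothetical, and the TTL of 7 in the usage lines is just an example value.

```python
import struct
from dataclasses import dataclass
from typing import Optional

# Type codes as given in the third-party v0.4 protocol description.
MESSAGE_TYPES = {0x00: "Ping", 0x01: "Pong", 0x40: "Push", 0x80: "Query", 0x81: "QueryHit"}
TYPE_CODES = {name: code for code, name in MESSAGE_TYPES.items()}

# Pings and Queries are flooded to all neighbours; Pong, QueryHit and Push are routed.
BROADCAST_TYPES = {"Ping", "Query"}

# 16-byte message ID, type code, TTL, hops, little-endian 32-bit payload length.
HEADER_FORMAT = "<16sBBBI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 23 bytes


@dataclass
class Header:
    message_id: bytes
    kind: str
    ttl: int
    hops: int
    payload_length: int


def parse_header(data: bytes) -> Header:
    """Decode the fixed-size header at the start of a message."""
    if len(data) < HEADER_SIZE:
        raise ValueError("incomplete header")
    message_id, code, ttl, hops, length = struct.unpack_from(HEADER_FORMAT, data)
    if code not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type 0x{code:02x}")
    return Header(message_id, MESSAGE_TYPES[code], ttl, hops, length)


def encode_header(h: Header) -> bytes:
    """Encode a header back into its 23-byte wire form."""
    return struct.pack(HEADER_FORMAT, h.message_id, TYPE_CODES[h.kind],
                       h.ttl, h.hops, h.payload_length)


def next_hop_header(h: Header) -> Optional[Header]:
    """Return the header to relay (TTL decremented, hops incremented), or None to drop."""
    if h.ttl <= 1:
        return None  # decrementing would reach zero: the relay limit has been hit
    return Header(h.message_id, h.kind, h.ttl - 1, h.hops + 1, h.payload_length)


# Usage: a Query with TTL 7 is relayed with TTL 6 and hop count 1.
query = encode_header(Header(b"\x01" * 16, "Query", 7, 0, 5))
relayed = next_hop_header(parse_header(query))
```

Because the TTL shrinks by one at every relay, a broadcast cannot circulate forever; the same counter is what bounds how far a ping or query can reach.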
It is not hard to imagine from the foregoing that Gnutella is susceptible to a number of problems.
Non-compliant implementations are a problem not just for their users, who may be unable to communicate effectively with others, but also for the network at large. Because Gnutella messages are relayed from host to host, the impact of a non-compliant application can easily extend beyond its installed base and be magnified out of proportion.
But what does "non-compliant" mean for a protocol without a blessed standard? In the open world of Gnutella, free from central authority, compliance means being able to communicate effectively with the bulk of the installed base. The situation is not unlike that of languages such as English, which have no formal codification. Protocol specification documents in this environment become analogous to dictionaries, which reflect popular usage rather than dictate it.
Of course, non-compliance can arise from the deliberate invention of new words or simply from poor grammar and pronunciation. On the latter front, an application can go wrong in many ways: it can malform the messages it originates, corrupt the messages it forwards, and route messages improperly. Handling the routed message types properly requires creating and maintaining a routing table; when a developer gives this feature short shrift, users pay a substantial cost, including increased traffic and lost responses. The low barrier to entry for Gnutella programming has encouraged less experienced developers to try their hands, often making matters worse.
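A routing table of the kind mentioned above can be sketched in Python as follows. The sketch assumes, as in the v0.4 protocol description, that a Pong or QueryHit carries the same message ID as the Ping or Query it answers. The class, the capacity limit and the string connection labels are illustrative assumptions, not any particular application's design.

```python
from collections import OrderedDict
from typing import List, Optional


class RoutingTable:
    """Maps the ID of each broadcast message to the connection it arrived on."""

    def __init__(self, capacity: int = 10_000) -> None:
        self.capacity = capacity
        self._origin = OrderedDict()  # message ID -> connection label, oldest first

    def see_broadcast(self, message_id: bytes, from_conn: str) -> bool:
        """Record a Ping or Query; return False for a duplicate that should be dropped."""
        if message_id in self._origin:
            return False
        self._origin[message_id] = from_conn
        if len(self._origin) > self.capacity:
            self._origin.popitem(last=False)  # forget the oldest entry
        return True

    def route_response(self, message_id: bytes) -> Optional[str]:
        """Return the connection a Pong or QueryHit should be sent back on, or None to drop it."""
        return self._origin.get(message_id)


def broadcast_targets(table: RoutingTable, message_id: bytes,
                      from_conn: str, connections: List[str]) -> List[str]:
    """Neighbours a newly seen broadcast should be relayed to (none if it is a duplicate)."""
    if not table.see_broadcast(message_id, from_conn):
        return []
    return [c for c in connections if c != from_conn]
```

Without this bookkeeping, a host has only two options for a response it did not originate: drop it, losing results, or flood it like a query, multiplying traffic. Those are the costs to users described above.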
Non-compliant implementations have been kept in check by, among other things, the availability of quality protocol specification documents and the strict filtering that popular applications implement to avoid propagating deviant messages. Nonetheless, they remain a problem.
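Strict filtering of this sort might look like the following Python sketch, which checks an incoming message before it is forwarded. The header layout follows the v0.4 protocol description; the specific checks and the limits `MAX_TTL_PLUS_HOPS` and `MAX_PAYLOAD` are assumptions chosen for illustration, since the rules actual applications used are not described here.

```python
import struct

HEADER_FORMAT = "<16sBBBI"           # message ID, type, TTL, hops, payload length
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)
KNOWN_TYPES = {0x00, 0x01, 0x40, 0x80, 0x81}
MAX_TTL_PLUS_HOPS = 7                # assumed policy limit, not mandated by the protocol
MAX_PAYLOAD = 64 * 1024              # assumed sanity cap on payload size


def is_acceptable(message: bytes) -> bool:
    """Return True if a complete message passes basic sanity checks and may be forwarded."""
    if len(message) < HEADER_SIZE:
        return False
    _id, code, ttl, hops, length = struct.unpack_from(HEADER_FORMAT, message)
    if code not in KNOWN_TYPES:
        return False                 # unknown message type
    if ttl + hops > MAX_TTL_PLUS_HOPS:
        return False                 # inflated TTL would spread the message too far
    if length > MAX_PAYLOAD or len(message) != HEADER_SIZE + length:
        return False                 # declared length disagrees with the bytes received
    payload = message[HEADER_SIZE:]
    if code == 0x00 and length != 0:
        return False                 # Pings carry no payload
    if code == 0x80 and (length < 3 or payload[-1] != 0):
        return False                 # Query: 2-byte minimum speed plus NUL-terminated string
    return True
```

Dropping a malformed message at the first compliant host keeps one faulty application from spreading its errors across the network.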
Connectivity was a big headache for users. Just as a Web browser needs a start page, a Gnutella application needs a start host. Unfortunately, early programs did not come preset with one, because host addresses generally have short shelf lives. This sent users searching Web sites, message boards and chat rooms for active host addresses. Developer Bob Schmidt came to the rescue with gnuCache, an open-source application that automatically doled out addresses from several enthusiast-run servers. Not long after, Clip2 began reliably serving lists of well-connected, verified active hosts through a service at Gnutellahosts.com, accessible via both Gnutella and the Web. By fall, developers had begun providing an "auto-connect" feature in Gnutella applications that relies on host-list services for start hosts, relieving users of this chore. Technically, these services operate uniformly enough to be interchangeable from the developer's point of view, although the quality of the addresses they return varies. The connectivity problem has thus been addressed in a way that is not susceptible to a single point of failure.
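An auto-connect routine built on several interchangeable host-list services might be sketched in Python as follows. Each service is modelled as a callable returning `(address, port)` pairs, standing in for whatever Gnutella or HTTP fetch a real client would perform; the function names, the `try_connect` callback and the `wanted` parameter are hypothetical. Port 6346 in the usage lines is Gnutella's customary default.

```python
import random
from typing import Callable, Iterable, List, Optional, Tuple

Host = Tuple[str, int]


def gather_start_hosts(services: Iterable[Callable[[], List[Host]]]) -> List[Host]:
    """Collect candidate hosts from every reachable host-list service, ignoring failures."""
    seen = set()
    candidates: List[Host] = []
    for fetch in services:
        try:
            hosts = fetch()
        except OSError:
            continue  # one unreachable service must not block connecting
        for host in hosts:
            if host not in seen:
                seen.add(host)
                candidates.append(host)
    return candidates


def auto_connect(services: Iterable[Callable[[], List[Host]]],
                 try_connect: Callable[[Host], bool],
                 wanted: int = 4,
                 rng: Optional[random.Random] = None) -> List[Host]:
    """Try candidate start hosts in random order until `wanted` connections succeed."""
    rng = rng or random.Random()
    candidates = gather_start_hosts(services)
    rng.shuffle(candidates)  # spread new users across the returned addresses
    connected: List[Host] = []
    for host in candidates:
        if len(connected) >= wanted:
            break
        if try_connect(host):
            connected.append(host)
    return connected


# Usage with stand-in services: one is down, two return overlapping lists.
def service_down() -> List[Host]:
    raise OSError("unreachable")

def service_a() -> List[Host]:
    return [("10.0.0.1", 6346), ("10.0.0.2", 6346)]

def service_b() -> List[Host]:
    return [("10.0.0.2", 6346), ("10.0.0.3", 6346)]
```

Because any one service can fail without stopping the client, adding services only increases robustness, which is the sense in which there is no single point of failure.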
Lack of Search Results
A lack of search results became a substantial issue after the network's quasi-collapse in August. As the traffic carried by an average host grew, it eventually exceeded the capacity of hosts on the slowest physical links: dial-up modems. These hosts became bottlenecks, effectively severing the communication lines running through them. The network fragmented into smaller sub-networks, and as a result users saw fewer search results.
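The mechanism can be illustrated with a toy model in Python: treat a host as saturated once the per-host traffic load exceeds its link capacity, remove the saturated hosts, and count the connected pieces that remain. The topology, the capacities (in kbit/s) and the load values are invented for illustration; real hosts degrade gradually rather than failing at a sharp threshold.

```python
from collections import deque
from typing import Dict, List, Set


def surviving_components(links: Dict[str, Set[str]], load: float,
                         capacity: Dict[str, float]) -> List[Set[str]]:
    """Remove hosts whose capacity is below the load, then return the connected components."""
    alive = {h for h in links if capacity[h] >= load}
    unvisited = set(alive)
    components: List[Set[str]] = []
    while unvisited:
        start = unvisited.pop()
        component = {start}
        queue = deque([start])
        while queue:  # breadth-first search over surviving hosts only
            node = queue.popleft()
            for neighbour in links[node]:
                if neighbour in unvisited:
                    unvisited.remove(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


# Hypothetical network: two broadband clusters joined only through a modem host "M".
LINKS = {
    "A1": {"A2", "M"}, "A2": {"A1"},
    "M": {"A1", "B1"},
    "B1": {"M", "B2"}, "B2": {"B1"},
}
CAPACITY_KBPS = {"A1": 1000, "A2": 1000, "M": 40, "B1": 1000, "B2": 1000}

print(len(surviving_components(LINKS, 20, CAPACITY_KBPS)))  # 1: load fits every link
print(len(surviving_components(LINKS, 60, CAPACITY_KBPS)))  # 2: the modem host is saturated
```

Once the modem host drops out, a query from the A side can no longer reach files on the B side, so users see fewer results even though every broadband host is still online.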
Responses to the issue followed a common theme: move users on slower connections to the edge of the network.
In October, Clip2 introduced the Reflector, a special Gnutella server designed to run on a high-speed connection and act as a proxy for users on slower links. In doing so, it conserves the user's bandwidth and places slower hosts at the edge of the network. Through a Reflector, a group of users can use Gnutella with far less aggregate bandwidth than they would otherwise need. Most Reflectors are run on behalf of a particular user population and are not publicly advertised, although a handful of public-access Reflectors are available at any given time.
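The following Python sketch shows the general proxy idea behind this arrangement. It is not Clip2's implementation, which this article does not describe. A slow "leaf" host sends its queries through the proxy and receives back only the responses to its own queries, while the rest of the network's traffic stays on the proxy's fast link. The class and method names are hypothetical. The sketch also omits how queries from elsewhere in the network reach the leaves' shared files.

```python
from typing import Dict, List


class ReflectorSketch:
    """Hypothetical proxy: leaves talk only to it, and it talks to the wider network."""

    def __init__(self) -> None:
        self.pending: Dict[bytes, str] = {}       # query ID -> leaf that issued it
        self.sent_to_leaf: Dict[str, int] = {}    # messages delivered per leaf

    def leaf_query(self, leaf: str, query_id: bytes) -> None:
        """Remember which leaf asked; the query itself would go out on the fast links."""
        self.pending[query_id] = leaf

    def network_message(self, kind: str, message_id: bytes) -> List[str]:
        """Handle a message from the wider network; return the leaves it is delivered to."""
        if kind == "QueryHit" and message_id in self.pending:
            leaf = self.pending[message_id]
            self.sent_to_leaf[leaf] = self.sent_to_leaf.get(leaf, 0) + 1
            return [leaf]
        return []  # pings, others' queries and unrelated responses never reach a leaf
```

Under this model the leaf's modem carries only its own queries and their answers, instead of its share of all the broadcast traffic passing through the network.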
November and December saw the introduction of two significant new Gnutella applications: first LimeWire, from Lime Wire LLC, and then BearShare, from Free Peers, Inc. Both programs apply connection-preferencing rules that decide whether a given connection will be maintained. One common example: connections to unresponsive hosts are dropped. Applied consistently and repeatedly across a series of connections, this simple rule tends to leave slower hosts with fewer connections, at the edge of the network, much as a poor conversationalist may find himself marginalized at a party.
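The "drop unresponsive hosts" rule can be sketched in Python as a periodic prune over open connections. Here "unresponsive" is approximated as "no message received within a timeout"; the timeout value and the class are assumptions for illustration, since the article does not say how LimeWire or BearShare measure responsiveness.

```python
from typing import Dict, List


class ConnectionManager:
    """Tracks when each neighbour was last heard from and drops silent ones."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_heard: Dict[str, float] = {}

    def heard_from(self, host: str, now: float) -> None:
        """Record that a message (for example a Pong or QueryHit) arrived from this host."""
        self.last_heard[host] = now

    def prune(self, now: float) -> List[str]:
        """Drop every connection silent for longer than the timeout; return the dropped hosts."""
        dropped = [h for h, t in self.last_heard.items() if now - t > self.timeout]
        for h in dropped:
            del self.last_heard[h]
        return dropped


# Usage: with a 30-second timeout, a host last heard 40 seconds ago is dropped.
manager = ConnectionManager(timeout=30.0)
manager.heard_from("fast", 100.0)
manager.heard_from("slow", 60.0)
```

No single host decides to push a slow peer outward; each neighbour applies the same local rule, and the marginalization emerges from their combined effect.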
Coinciding with these developments and the growing adoption of these applications, Clip2 has seen a steady increase in the number of responsive hosts active on the network at any given time, from a typical figure of 500 in October to more than 1,500 in early January 2001. The quantity of search results has increased as well. By Clip2's estimates, the number of Gnutella users per day has risen from between 10,000 and 30,000 in November to between 20,000 and 50,000 in January.