Let's return to the happier subject of research topics: what peer-to-peer can do for you, and what you can do for peer-to-peer.
Research into distributed applications and infrastructure has very wide application. Centralized systems evolve toward decentralization as they grow larger and need to scale. A well-known example is how the Internet's hosts file became the Domain Name System. A more recent example concerns Web caching and the use of Akamai by large sites with high bandwidth demands. You might have heard that Akamai's founder and CTO, Daniel C. Lewin, was tragically lost on one of the hijacked planes last month. One observer pointed out that the rush to news sites after the tragedies proved the importance of his company's technology.
So centralized systems evolve toward decentralization. In an intriguing, complementary movement, decentralized or peer-to-peer systems are evolving toward centralization, also in response to growth and the need to scale. Gnutella now has superpeers, Freenet provides gateways, JXTA Search creates a hierarchy of servers, and so on.
Some of the activities I've seen on the Internet2 Web site under the Middleware directory touch on problems that centralized and peer-to-peer projects alike face. It would be great for Internet2 developers to keep in mind the potential peer-to-peer applications of whatever they are researching. For a start, consider the possibility of symmetric exchanges over all your protocols and infrastructure. Here are some more specific topics.
First, I'll talk about naming and resource discovery. The only systems where you don't care about names are systems designed for anonymity. Gnutella and Freenet are famous for this characteristic, of course, and they have achieved something incredibly ground-breaking and mind-expanding: they provide content independent of its location. Later peer-to-peer systems have built on this innovation, which enables replication, caching, and resistance to censorship. But most systems still want to find particular individuals or repositories of information--they want identification and resource discovery.
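The location-independence idea can be shown in a tiny sketch. If a document's name is derived from a hash of its content--roughly the approach Freenet and later content-addressed systems take--then any node holding the bytes can answer a request for that name. The `ContentStore` class below is a hypothetical in-memory stand-in for a distributed network, not any real system's API:

```python
import hashlib

def content_key(data: bytes) -> str:
    """Derive a location-independent name from the content itself."""
    return hashlib.sha1(data).hexdigest()

class ContentStore:
    """Hypothetical stand-in for a distributed store: any node holding
    the bytes can serve them, because the key depends only on content."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = content_key(data)
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Self-verifying: the retriever can confirm the data matches its name.
        assert content_key(data) == key
        return data

store = ContentStore()
key = store.put(b"a research paper on naming")
assert store.get(key) == b"a research paper on naming"
```

A pleasant side effect of this scheme is that names are self-certifying: whoever hands you the bytes, you can check them against the name, so you need not trust the intermediate node.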
They achieve this through a shameless lapse away from decentralization: identities are stored in a strictly centralized repository, as in instant messaging services. Some products, like XDegrees, Jibe, and Redmind, partition the namespace and distribute it in clever ways. The good old Domain Name System does this, in fact.
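The DNS style of partitioning is easy to caricature in a few lines: each label in a name, read right to left, delegates lookup to a smaller authority, so no single server has to hold every name. The zone data here is invented for illustration:

```python
# A toy hierarchical namespace in the DNS style. Each nested dict plays
# the role of a delegated zone; the address is from the documentation
# range and purely illustrative.
root = {"org": {"example": {"www": "192.0.2.1"}}}

def resolve(name: str, zone: dict) -> str:
    """Walk the labels right to left, delegating at each step."""
    node = zone
    for label in reversed(name.split(".")):
        node = node[label]  # each step hands off to a smaller authority
    return node

assert resolve("www.example.org", root) == "192.0.2.1"
```

The point of the caricature is that "centralized" and "decentralized" are not a binary: DNS has a single logical root, yet the actual data and load are spread across millions of servers.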
The Gartner Group speaks of a virtual namespace for peer-to-peer. I don't know what makes these names less real and more virtual than any other names. I think what the Gartner Group means is that these namespaces--instant messaging, Napster, and so on--tend to spring up ad hoc and opportunistically. This seems to me a weakness of current peer-to-peer systems, not a strength.
IPv6 will definitely help. It will, we hope, bring users' systems out into the open, eliminating the Network Address Translation boxes that currently hide them. But IPv6 is not enough to solve peer-to-peer's addressing problem. First, we can't wait until IPv6 is deployed in the larger world. Second, it is naive to think that every device will have a fixed, permanent address when IPv6 is deployed; that would overwhelm the world's routers, and one of the major benefits advertised for IPv6 is, in fact, that it makes renumbering easier. Finally, what we really want is names rather than numbers anyway. When I ask you to visit my Web site, I don't ask you to type 220.127.116.11 into your browser. Furthermore, I may log in from many places--work, home, a mobile phone, a train station--and I'm still me even though my address is different.
Identification and resource discovery are therefore among the great problems you can work on in Internet2. I would like answers to the question: "What combination of centralization and decentralization works best for a particular application and information architecture?"
Partly because so many services are already offered through Web HTML forms and CGI, and partly because firewalls block any data not sent through port 80, the chief method of service delivery for the next couple of years will be Web services using HTTP and probably either XML-RPC or SOAP. These protocols and the programs that handle them are probably neither the most efficient nor the most flexible way to handle peer-to-peer communications. Other protocols worth exploring include JXTA, of course, along with the SCTP transport-level protocol and the BEEP application-level protocol.
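To show how small the barrier to entry is, here is a complete XML-RPC exchange using nothing but the Python standard library. One peer exposes a procedure over HTTP and another calls it; the procedure name `greet` and the loopback setup are just for the demonstration:

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# One peer exposes a procedure over HTTP; port 0 lets the OS pick a free port.
server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
server.register_function(lambda name: "Hello, " + name, "greet")
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Another peer calls the procedure, exactly as a Web-services client would.
client = ServerProxy("http://127.0.0.1:%d" % port)
result = client.greet("Internet2")
print(result)  # -> Hello, Internet2
server.shutdown()
```

Notice that either side could play either role; the symmetry I asked you to consider earlier is already latent in these protocols, even if most deployments freeze one side into "server" and the other into "client."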
Security is the bogeyman invoked by many people who want to debunk peer-to-peer. I'm not sure why there's so much hysteria around the supposed security problems of peer-to-peer. Most systems, and certainly commercial systems, are perfectly up to date on encryption, digital signatures, digests, and other standard elements of network security. I suppose the confusion sets in because the most famous peer-to-peer systems, like Napster and Freenet, are marvelously open and uncontrolled. To people who are unused to disruptive technologies, open and uncontrolled must mean insecure.
If peer-to-peer were inherently insecure, it would not be used by McAfee to distribute updates to its virus-detection software. McAfee ASaP is a service provided to large companies to let them distribute updates quickly throughout their organizations. Instead of making 10,000 individuals contact the McAfee Web site (a sure recipe for network overloads), a few initial systems contact the McAfee site, and they pass on the software to other systems in a chain. This is called rumor technology and is a form of peer-to-peer, the same architecture used by content-delivery networks and streaming-media distributors such as AllCast.
When you're fighting viruses, you're clearly concerned with security, and McAfee's use of a partially peer-to-peer system is a stunning endorsement of peer-to-peer's security. McAfee's rumor technology is not only more efficient than routine Web downloads, but more secure. Employees of each company have to go outside their corporate network only a few times to get the software. Most of the networking takes place inside the corporate network, presumably protected by a firewall and the general LAN architecture.
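The efficiency argument is easy to see with a toy simulation. This is an assumed model of rumor-style distribution, not McAfee's actual protocol: a few seed machines fetch the update from the vendor, and every machine that has it passes it to a couple of peers on the LAN.

```python
import random

def rumor_distribute(nodes, seeds=3, fanout=2):
    """Toy model of rumor-style distribution (an assumption for
    illustration, not any vendor's real protocol)."""
    have = set(random.sample(nodes, seeds))  # external downloads
    external_fetches = len(have)
    internal_transfers = 0
    pending = list(have)
    while pending:
        sender = pending.pop()
        # The sender hands the update to a few peers that lack it.
        targets = [n for n in nodes if n not in have][:fanout]
        for target in targets:
            have.add(target)
            pending.append(target)
            internal_transfers += 1
    return external_fetches, internal_transfers, len(have)

nodes = list(range(1000))
ext, internal, covered = rumor_distribute(nodes)
print(ext, internal, covered)  # 3 external fetches cover all 1,000 nodes
```

However large the company, the number of trips outside the firewall stays at the seed count; everything else is LAN traffic, which is exactly the security and efficiency property described above.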
But peer-to-peer systems have to deal with the same security problems as traditional systems. There's denial of service, where computers can become overloaded with requests or with data. There's authentication, so you know who's sending you data. And there are larger trust issues. A centralized public-key infrastructure (PKI) is not necessarily any more robust than the peer-to-peer solution known as a "web of trust." I would not be surprised if authentication and trust become the greatest success of peer-to-peer. Eventually we may all move to adopt the web of trust as our preferred form of PKI.
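A web of trust can be sketched just as simply. In this hypothetical model (the names and hop limit are invented for the example), each key holder signs the keys of people they have verified personally, and we accept a stranger's key if a short chain of signatures connects us to it:

```python
from collections import deque

def trusts(web, source, target, max_hops=3):
    """Hypothetical web-of-trust check: `web` maps each key holder to
    the set of keys they have personally signed. Accept `target` if a
    signature chain of at most `max_hops` links reaches it."""
    frontier = deque([(source, 0)])
    seen = {source}
    while frontier:
        node, hops = frontier.popleft()
        if node == target:
            return True
        if hops == max_hops:
            continue  # chain too long; stop extending this path
        for signed in web.get(node, ()):
            if signed not in seen:
                seen.add(signed)
                frontier.append((signed, hops + 1))
    return False

web = {"alice": {"bob"}, "bob": {"carol"}, "carol": {"dave"}}
assert trusts(web, "alice", "carol")        # alice -> bob -> carol
assert not trusts(web, "alice", "mallory")  # no signature chain at all
```

The contrast with a centralized PKI is that trust here emerges from many pairwise judgments rather than from one root authority, so no single compromise invalidates the whole system.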
There's lots and lots of room for research projects in architecture. What's the best structure to impose on the mass of internetworked computers for each combination of application and environment?
I have already mentioned metadata as an area for research. Metadata includes the kinds of categories people search for, the scales they use to measure one resource against another, or, in a social sense, what brings people together.
Jabber and RDF are particularly promising ways to deploy metadata, but communities must somehow agree on tags, and applications must then be developed to exploit their potential.
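To make the agreement problem concrete, here is a toy RDF-style triple store. The predicate names (`topic`, `ratedBy`) are invented for the example; the real work is getting a community to settle on such a vocabulary so that queries work across everyone's resources:

```python
triples = set()

def add(subject, predicate, obj):
    """Record one RDF-style statement: (subject, predicate, object)."""
    triples.add((subject, predicate, obj))

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None is a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]

# The vocabulary below is hypothetical; agreeing on it is the hard part.
add("paper:42", "topic", "peer-to-peer")
add("paper:42", "ratedBy", "reviewer:7")
add("paper:99", "topic", "peer-to-peer")

print(query(predicate="topic", obj="peer-to-peer"))  # two matching papers
```

Once the tags are shared, the same pattern-matching query serves searching, rating, and community-building alike, which is why the tag-agreement step deserves research attention.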