Editor's Note: The following speech was given at the February 2002 Free and Open Source Software Developer's Meeting in Brussels, Belgium.
Although this was billed as a speech about peer-to-peer, I'm actually going to talk about how to make Internet-connected computers function better. The suggestions I offer today would make the Internet a more effective place for many types of applications, peer-to-peer as well as others.
I've never been that happy with the term "peer-to-peer," which has the sound of a marketing term and has never been tied to a clear specification of technical criteria. But I think the concept drives developers to build new systems that stretch our current use of the Internet in valuable ways. Peer-to-peer systems definitely have certain vulnerabilities, so they and their infrastructure are forced to be better than traditional applications. I have repeatedly said that the problems in peer-to-peer systems are neither new nor unique; they make us look for solutions to old problems that we all worked around or tried to ignore before.
Furthermore, the challenges in this area are just right for free software. First, peer-to-peer makes life especially hard for a proprietary software company. Few companies can survive even in the current market for conventional products, but peer-to-peer makes the challenge even worse. Most of the activity in peer-to-peer systems, by definition, goes on at the end points. The situation is like all the users bringing parts for a model airplane, and the proprietary company providing the glue. The companies want you to sniff the glue and come back for more, but it's a very thin basis on which to charge money.
I'm not denigrating what proprietary companies do; I'm just suggesting that they have chosen a steep uphill path. There are many people with bold thinking and good products in the proprietary companies I talked to. Researchers trying to create mega-projects with ambitious goals should examine the commercial products to see what can be realistically expected from the next generation of software.
In contrast, free software fits the spirit of peer-to-peer beautifully. Peer-to-peer is about empowering end users. Customization is a common feature; extending the system with new capabilities is to be expected. Because central administrators have no control over what people do on their systems (at least in wide area networks), peer-to-peer systems can't tolerate the proprietary hack of security through obscurity. Security has to be built into the protocol. For these reasons, great advances in peer-to-peer have already emerged from academia and the free software movement, and I expect more to come from these sources.
Some researchers, in Scientific American and elsewhere, call for mega-projects that solve all resource, routing, and indexing problems at once. While I am in awe of some of the research, I suspect that a more quick-and-dirty approach, based on an easier set of assumptions, may be more fruitful. Ted Nelson said dismissively of the World Wide Web, "Berners-Lee picked off the easy stuff." Yes, he did, and that's why the World Wide Web was successful enough to change how all of us work. If you want to develop something really significant, make it easy for yourselves.
Is peer-to-peer worth the attention of leaders in the free software movement? I believe it offers a critical opportunity: think strategically.
It would be tempting, but not productive, to take this opportunity to bash Microsoft. I believe that Microsoft got to its pre-eminent position because it manages to keep up with trends and meet people's needs in some basic ways. It has the most effective programmers in the industry, the most effective marketing people in the industry, and the most effective lawyers in what's left of the industry.
Microsoft, in its position of control over the resources of the individual computer, can take over any initiative centered on that computer, to the extent it's allowed by law. After seeing what happened to disk compression, office software, and (if recent trends continue) audio playback technology, new software companies could easily come to believe that they exist simply to provide ideas for new Microsoft Windows features. Even the TCP/IP stack and the Web browser have not been exempt; everything that wants to insert itself onto the desktop has to pass through the Mines of Moria. (After the second volume of Lord of the Rings has been released in movie form, I will be able to speak of Shelob's lair, an even better metaphor.)
The only hope of breaking the monopoly--and not a sure thing, either--is to develop a radically new form of software that is not easy to capture, that distributes control among many different sites, and that offers so many advantages that people rush out to embrace it rather than stay safely in the sheepfold.
Think also of Microsoft's strategy. Free software is the most popular server software: for instance, BIND, Apache, and sendmail are superior to Microsoft's products by almost any measure. Microsoft hopes to spread upward from the desktop to the server, as a disruptive technology (described in Clayton M. Christensen's book, The Innovator's Dilemma) traditionally does--from the cheaper and simpler system to the more expensive and sophisticated system. The idea behind this strategy is that system administrators will want the management of a corporate server to be as easy as running office applications.
I don't want to let this form of Gresham's law triumph. Why don't we reverse it? Let's provide servers that are so powerful, so easy to customize, and so robust in the face of failure or attack, that every user wants to run a server. This movement will provide more competition, not only to Microsoft, but to other institutions that want to control users' options, such as cable TV companies.
Enough introduction. The meat of my talk is a list of issues that I'd like free software developers to work on.
Let's start at the first stage of a typical peer-to-peer application. The various actors have to find each other, which requires identification, routing, and resource discovery. We know that instant messaging systems and Napster had weak solutions to this problem, because they depended on one mega-database and a single or replicated server. Better solutions can be seen in standard email and Jabber, because they rely on servers that are normally close to the user. Each user could even run his or her personal server, theoretically. This is not a full solution, though, because each user still depends on a single server.
When I ask researchers about providing distributed identification, they all say, "Chord!" Chord is a research project at M.I.T., and it does look very elegant. The research papers stress the speed of a look-up (it takes place in O(log n) time, where n is the number of nodes) and how robustly the system responds to host failures, but I haven't seen anything about its memory usage or the overhead of maintaining the system as peers enter and leave.
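To make the logarithmic look-up concrete, here is a minimal sketch of a Chord-style lookup in Python. It is a single-process simulation written for illustration, not the M.I.T. code: the 8-bit identifier space, the node IDs, and every function name are assumptions, and it deliberately ignores nodes joining, leaving, or failing, which is where the maintenance overhead would show up.

```python
M = 8                  # identifier bits; assumed tiny so the ring is easy to inspect
RING = 2 ** M

def between(x, a, b, inclusive_right):
    """True if x lies clockwise after a and before b on the identifier ring."""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b      # interval wraps past zero (a == b means the whole ring)

def successor(nodes, ident):
    """First node ID at or clockwise after ident; nodes must be sorted."""
    for n in nodes:
        if n >= ident:
            return n
    return nodes[0]

def build_fingers(nodes):
    """Finger i of node n points at the successor of n + 2**i."""
    return {n: [successor(nodes, (n + 2 ** i) % RING) for i in range(M)]
            for n in nodes}

def lookup(fingers, start, key):
    """Return (node responsible for key, number of times the query was forwarded)."""
    current, hops = start, 0
    while True:
        succ = fingers[current][0]
        if between(key, current, succ, inclusive_right=True):
            return succ, hops
        # Forward to the farthest finger that does not overshoot the key.
        nxt = succ
        for f in reversed(fingers[current]):
            if between(f, current, key, inclusive_right=False):
                nxt = f
                break
        current, hops = nxt, hops + 1

nodes = [1, 18, 40, 77, 120, 200, 230]   # hypothetical node IDs
fingers = build_fingers(nodes)
print(lookup(fingers, 1, 100))
```

Each node keeps only M finger entries, and each forwarding step jumps as far as the fingers allow without passing the key. In a real deployment the finger tables would have to be repaired whenever a peer arrives or disappears, and that cost is not modeled here.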
The IETF (Internet Engineering Task Force) is also working on mobile computing, but its solution is oriented toward mobile phones--a very particular solution designed for a particular industry. It embodies all kinds of assumptions tied to cellular phone systems: assumptions about the distribution of servers, the authentication between server and client, and so on.
What I'd like to see is a flexible identification system for mobile and intermittently connected users that doesn't tie the user to one server, but allows a user to specify multiple alternative base systems. That is, you can register with several systems that have fixed IP addresses or DNS names, and then tell people to check with those systems to find you. If one system is offline or out of date, one of the other systems should know.
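The idea of registering with several base systems can be sketched in a few lines of Python. Everything here is hypothetical: the BaseSystem class, the locate function, and the timestamps stand in for whatever protocol such a service would really use, and the "network" is just objects in memory. The sketch shows only the lookup rule: ask every base system, skip the ones that are unreachable, and trust the freshest record.

```python
from dataclasses import dataclass, field

@dataclass
class BaseSystem:
    """A fixed, well-known host that remembers where a mobile user was last seen."""
    name: str
    online: bool = True
    records: dict = field(default_factory=dict)   # user -> (address, timestamp)

    def register(self, user, address, timestamp):
        self.records[user] = (address, timestamp)

    def query(self, user):
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        return self.records.get(user)

def locate(user, bases):
    """Ask each base system in turn and return the most recently registered address."""
    best = None
    for base in bases:
        try:
            record = base.query(user)
        except ConnectionError:
            continue                      # offline base: fall through to the next one
        if record and (best is None or record[1] > best[1]):
            best = record
    return best[0] if best else None

home = BaseSystem("home.example.org")
work = BaseSystem("work.example.org")
club = BaseSystem("club.example.org", online=False)

home.register("andy", "10.0.0.5", timestamp=100)     # stale entry
work.register("andy", "192.0.2.7", timestamp=200)    # freshest reachable entry
club.register("andy", "198.51.100.9", timestamp=300) # newest, but the host is down

print(locate("andy", [home, work, club]))
```

The design choice worth noticing is that no single base system is authoritative; a correspondent needs only one reachable base with a reasonably current record.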
Such a system would be layered on top of DNS. DNS is great at what it does, and my suggestion would add an extra hop, or several hops, to support the idea of presence for the mobile user. Perhaps a system involving lots of hops would help to hide the communication from surveillance, as onion routing does. But that's not its main goal.
Routing is another area stretched by a lot of peer-to-peer systems. The designers have found that routing at Layer 3 is not sophisticated enough for their kinds of applications, even once they've found someone's IP address.
For instance, take the common application of file sharing, and assume that one person who has the file you want is connected to the same router through the same ISP, while another person is on a different continent. Which would provide the file more efficiently?
Peer-to-peer applications demand more network-aware routing. It would be nice to know what the response time and throughput on various routes were before choosing how to send a large chunk of data. Large businesses and ISPs do this now through Route Optimization, but decisions are made based on the granularity of the interface. If the business wants different Quality-of-Service for different traffic, it has to route traffic through different interfaces. Peer-to-peer routing should take place at a very fine granularity.
This project could benefit from standards, and even a new networking layer right below the application layer. Some intelligence may be specific to each application, so there may be a limit to how much we can standardize application routing. It certainly deserves research.
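As a rough illustration of what fine-grained, network-aware selection might look like, the following Python sketch picks a source for a file from measured round-trip time and throughput. The peer names and the measurements are invented for the example, and the cost model (one round trip plus size divided by throughput) is a deliberately crude assumption.

```python
def estimated_seconds(size_bytes, rtt_seconds, throughput_bps):
    """Crude transfer-time estimate: one round trip plus the time to move the bits."""
    return rtt_seconds + size_bytes * 8 / throughput_bps

def best_source(size_bytes, measurements):
    """measurements maps peer name -> (rtt_seconds, throughput_bits_per_second)."""
    return min(
        measurements,
        key=lambda peer: estimated_seconds(size_bytes, *measurements[peer]),
    )

# Hypothetical measurements for two peers that hold the same file.
measurements = {
    "same-isp": (0.010, 2_000_000),          # close by, but on a slow uplink
    "other-continent": (0.250, 10_000_000),  # far away, but on a fast link
}

print(best_source(1_000, measurements))       # small request
print(best_source(5_000_000, measurements))   # large file
```

With these made-up numbers the nearby peer wins for a small request and the distant one wins for a large file, which is the kind of per-transfer decision that interface-level routing cannot make.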
Routing requires an identification system, so you can specify where you want traffic to go. Correspondingly, an identification system isn't any good unless you have a way to route traffic from one identified party to another. So identification and routing are actually the same problem.
In the next step of peer-to-peer computing, the peers need to exchange data, and that involves protocols on many levels. The main protocol trying to establish itself as a standard right now is JXTA. I don't know whether it will succeed. What Sun did is release a reference platform, and you know what those are like. JXTA seems to be even cruder, less efficient, and buggier than most reference platforms because Sun rushed it out. So whenever I talk to a company working on a peer-to-peer product, I ask them about JXTA, and they always say, "We're looking at it."
A few days before coming here, I saw an announcement for "the first comprehensive framework based on Project JXTA technology" (VistaRepository from VistaPortal Software). That's a good sign.
Web services are hot right now, but their infrastructure has turned out big and scary, which is not what the Web was meant to be. XML-RPC was elegant, but SOAP tries to cover every eventuality. You hardly get a chance to say what you were going to say by the time you've said everything you have to say about what you're going to say. And in trying to solve the presence and discovery problems, Microsoft and IBM and others have created an enormous superstructure resembling CORBA or COM, all implemented between angle brackets. Well, it may serve a need right now; it's nice to be able to add some automation to your Web site. But I can't believe that Web services are the path peer-to-peer will take in general. Besides, I don't believe in port 80 pollution.
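For a sense of how little XML-RPC asks of you, here is a short Python sketch that builds and parses a request using the standard library's xmlrpc.client module. The method name peer.findFile and its arguments are made up for the example; no server is contacted.

```python
import xmlrpc.client

# Encode a call to a hypothetical method with two arguments.
request = xmlrpc.client.dumps(("report.pdf", 3), methodname="peer.findFile")
print(request)

# Decoding recovers the arguments and the method name.
params, method = xmlrpc.client.loads(request)
print(method, params)
```

The whole message is a method name and a list of typed values. There is no envelope, no header block, and no separate description language to write before the first call.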
Some of the necessary protocols are meant for structuring content. Because XML provides a universal way to structure content, it's about the most pervasive technology in computers right now. Structure lets computers be intelligent about content: they can sort it by various criteria, extract what a person cares about while ignoring crap they don't care about, find commonalities among people by checking public profiles, and do a million other things.
What's special about metadata in peer-to-peer is that users should be able to invent their own tags; these tags should emerge through a grassroots, bottom-up process. Let's say you have a political discussion and people would like to rate politicians. They may come up with tags to express the politician's commitment to the environment, his or her attitude toward immigration, and so on. I would love to see systems that help people negotiate what tags they should use for any occasion.
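One very simple way a system could help a group settle on tags is to adopt whatever a majority of participants propose. The Python sketch below assumes exactly that rule; the function name, the quorum parameter, and the participants are all hypothetical, and a real negotiation would certainly need more nuance (synonyms, revisions, dissent).

```python
from collections import Counter

def agreed_tags(proposals, quorum=0.5):
    """Return the tags proposed by more than `quorum` of the participants.

    proposals maps participant name -> set of tag names that person suggests.
    """
    counts = Counter(tag for tags in proposals.values() for tag in set(tags))
    needed = quorum * len(proposals)
    return sorted(tag for tag, n in counts.items() if n > needed)

proposals = {
    "ana": {"environment", "immigration"},
    "ben": {"environment", "taxes"},
    "chloe": {"environment", "immigration", "health"},
}

print(agreed_tags(proposals))
```

Tags that miss the quorum are not lost; individuals can keep using them, and they can be proposed again as the discussion evolves.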
When we think about supporting peer-to-peer applications, we can't stop with standard computer systems; this is going to become a world of small devices. I find the idea of having an Internet-connected sensor attached to my furnace a little frightening. If I screw up on my computer, I can reload my back-up disk, but if I screw up on my furnace, my neighbors may not live to forgive me. As the American humorist Dave Barry wrote, "I don't want my appliances to be smarter than me. They should be stupider than me, like my politicians and my children."
But people will insist on using devices and sensors, so we should be there to support them. There are some fairly well known issues you have to address to write applications appropriate for devices, such as adopting a Model-View-Controller design pattern so elements can be reused in very different visual or non-visual environments.
What I think will make or break the device market is ease of customization. People should be able to program their devices; they should be able to write a script and download it to the device. So I think devices provide an exciting new platform for scripting languages. And the best scripting languages are free software; somebody should port them to new devices as they come along. Or develop new languages.
The reason I talk about new languages is that applications for devices have different needs from standard computers. On the furnace device, for instance, I imagine that date and time are critical concepts and should be easy to manage. I use the Date/Time module in Perl all the time for my own applications, but it's rather bloated for a device. Maybe it's time for new languages or at least new libraries for devices.
Hanging over everything is the problem of security. Two interesting issues are highlighted by peer-to-peer.
The first is that a peer-to-peer application is both a client and a server. So now everybody is running a server; that's what empowerment means. And now every person has to be paranoid, like a system administrator on a network. Welcome to empowerment.
Let's not exaggerate the risks of peer-to-peer. If somebody exploits a buffer overflow on your Web site or database server, you really have to worry. If somebody exploits a buffer overflow on a random PC in your organization, you probably have less to lose (unless the PC belongs to a vice president and the vice president has left the corporate plans for the upcoming year on the PC). So the first thing companies should do with peer-to-peer is teach their vice presidents to encrypt their files. (Even Windows now has an Encrypting File System.) Apart from that, developers have some responsibilities toward peer-to-peer applications.
It's time to get security right, as much as we can. The long list of recent vulnerabilities in free software that Linux Weekly News published just last week is cautionary. Perhaps it's time to move away from the C language as the default application platform. Can free software developers pledge to use C only when it's called for--that is, when speed is absolutely critical or the software needs access to low-level hardware registers? Otherwise, you should make it a habit to code in something more secure like Java, and accept the performance cost. I'm not religious about Java or any other platform; if you want, you could code in Scheme, probably at a higher performance cost. But let's eliminate the most common classes of security flaws.
The second interesting security issue in peer-to-peer is that it tends to break out from the local area network; the old, rigid idea of the "intranet" is cast aside. You are sending data back and forth with people across the Internet, and you have to protect that data. So we'll finally see end-to-end security everywhere. Firewalls will be less important, although they're still good for preventing common spoofing and denial-of-service attacks.
Data integrity and privacy are known technologies; they may not be simple, but the computer field has plenty of experience implementing them. We have to teach people how to use encryption, how to do integrity checks on files, how to distribute signatures, and how to form Webs of Trust. A Web of Trust is a classic peer-to-peer architecture, and it may be the best type of architecture for certificates. Big, bureaucratic certificate authorities embody risks; that came home to everybody last year when somebody managed to wrangle a false certificate from Verisign certifying that he was Microsoft. Let's take the wonderful PGP (Pretty Good Privacy) infrastructure we've got in the free software movement, and build reputation and trust systems on it. Advogato is a classic free-software example. Some say that the Web of Trust is not scalable, but to a certain extent our relationships with other people are not scalable either. That criterion may not be so important.
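The core question a Web of Trust answers is whether a chain of signatures connects you to a key you have never seen. A minimal sketch in Python follows, assuming the signatures have already been verified and collected into a simple mapping; the names, the mapping, and the three-hop limit are illustrative assumptions, and this is not how PGP itself computes key validity.

```python
from collections import deque

def trust_path(signatures, me, target, max_hops=3):
    """Breadth-first search for the shortest signature chain from me to target.

    signatures maps a signer to the set of keys that signer has signed.
    Returns the chain as a list of names, or None if no chain fits in max_hops.
    """
    queue = deque([[me]])
    seen = {me}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        if len(path) - 1 >= max_hops:
            continue                      # chain already as long as we tolerate
        for signed in sorted(signatures.get(path[-1], ())):
            if signed not in seen:
                seen.add(signed)
                queue.append(path + [signed])
    return None

signatures = {
    "alice": {"bob", "carol"},
    "bob": {"dave"},
    "dave": {"erin"},
    "erin": {"frank"},
}

print(trust_path(signatures, "alice", "dave"))
print(trust_path(signatures, "alice", "frank"))
```

The hop limit reflects the scalability point: trust that passes through too many intermediaries is worth little, much as with relationships among people.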
Beyond those basics come authentication, and reputation, and non-repudiation, and update protection, and all sorts of other trust issues. Resource allocation also gets mixed in. Someone may put a camera on his bike and believe that the sixty hours of footage he took on his cross-European ride is the greatest thing to share on a peer-to-peer system, but we have to restrain him from uploading it unless he's willing to donate some other resource we can use. These are the sorts of issues that are built on top of the standard security infrastructure of encryption, digests, and signatures.
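The bicycle-footage example amounts to a simple accounting rule: you may consume only in proportion to what you donate. Here is a minimal Python sketch of such a ledger. The class, its methods, and the one-to-one ratio are assumptions made for illustration, and the sketch presumes that peers have stable, authenticated identities.

```python
class ResourceLedger:
    """Tracks what each peer has donated and how much shared storage it has used."""

    def __init__(self, ratio=1.0):
        self.ratio = ratio      # bytes of upload allowed per byte donated (assumed policy)
        self.donated = {}
        self.used = {}

    def donate(self, peer, nbytes):
        self.donated[peer] = self.donated.get(peer, 0) + nbytes

    def request_upload(self, peer, nbytes):
        """Grant the upload only if it stays within the peer's earned allowance."""
        allowance = self.donated.get(peer, 0) * self.ratio
        if self.used.get(peer, 0) + nbytes > allowance:
            return False
        self.used[peer] = self.used.get(peer, 0) + nbytes
        return True

ledger = ResourceLedger()
ledger.donate("cyclist", 500)
print(ledger.request_upload("cyclist", 400))   # within the allowance
print(ledger.request_upload("cyclist", 200))   # would exceed what was donated
```

A ledger like this is only as strong as the identities behind it, which is why it sits on top of the encryption, digest, and signature infrastructure rather than replacing it.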
I find it interesting that we've come so far with a system of privileges that basically comes down to assigning a user to every major function of the system (like mail, DNS, databases, and so on) and allowing that user sole access to the files used by that function. That's a very simple privilege system! The traditional Unix permissions don't go much further. I have encountered hardly any projects that even use the Unix group permission, although the group bits of the inode have moonlighted as a useful place to store hidden information about the file.
Access control lists were supposed to be the big advance in computer security. But I never heard anybody say, "Boy, access control lists really saved this project! If I couldn't assign that user GRANT READ privileges. . . ."
Still, we're going to need a new security superstructure for peer-to-peer. I don't know whether this means we should all be running a system that contains mandatory access control, like SELinux or TrustedBSD, or--the following strikes me as more likely--whether free software developers should work on systems for checking and sharing reputation, such as the one used in Free Haven.
I want to digress for a minute and talk about a subject that's very important for some members of the peer-to-peer community: anonymity. There are problems combining anonymity or pseudonymity with other goals in peer-to-peer systems. Resource allocation, for instance: If you can't exclude someone, you can't control resources.
Peer-to-peer systems like Freenet allow anonymous users, but I think they're incompatible with resource management for the same reason. Pseudonymity is no better, so long as a person can always come back bearing a new pseudonym. Maybe there are clever ways to make people donate resources and gradually raise their participation over time, but those would raise barriers to new users. So I'm not sure anonymity is conducive to robust peer-to-peer systems, at least systems that value some kind of persistence or ongoing collaboration among users. However, anonymity is a very important goal, and there should be systems in place to protect it where appropriate.
I mentioned earlier that proprietary software companies have trouble finding their niches in peer-to-peer. But I think there's a great business opportunity in reputation and resource management systems. Since they are services, they could be developed as free software and become commercially viable. Such services do not have to become massive and try to encompass all kinds of users; they could be limited to serving one population or one application. I think the potential for doing business in the area of reputation and resource management is enormous.
The big problem such a business has to solve, as with any system involving online identities, is using the proper out-of-band procedures to verify and collect information on participants. Perhaps a legally notarized document would be enough.
I feel I should say something about desktops, because there's a stunning collection of desktop developers at this conference. I believe that developers in both the KDE and GNOME projects would agree with me that desktops are more than pretty pictures. The component architectures reveal that philosophy. I'm particularly interested in using CORBA to tie together components, which I know was a subject of big debate in the KDE and GNOME communities. Without trying to guess whether Bonobo is the right approach, I'll just say that I'm intrigued by the idea of building CORBA's distributed model right into the desktop and making it automatically available to each application. Microsoft's hope, when it built DCOM, was to allow applications to pull in services from other computers, but that never happened. Now, of course, it's trying again. I don't think we should give up on the ideal; it's a good subject for people interested in peer-to-peer because it opens up all the same concerns peer-to-peer does about resource utilization, security, the reliability of servers, response time, and so forth.
But most of the topics I've mentioned in my talk should be carried out below the desktop level. The reason is simple: Users should be able to interact directly with programs, and programs to interact with other programs. Where is the sweet spot, where desktop developers can support the ground-breaking innovations going on in peer-to-peer while providing something relevant to the user interface?
I have a couple ideas for general functions you can offer users. One is presence, which is a well known concept from instant messaging services: It indicates whether you want to be interrupted and who can interrupt you. Jabber may provide the solution. Desktop developers have a lot to offer here because there's no reason the notion of presence should be internal to a single application. It could apply to everything from email to telephone calls to the coworker who walks in to complain that someone in the quality assurance department has asked her to review the code of the lazy bum who just went out of town because the work he'd done on another project blew up at the site of a major customer and her boss says she should put aside what she's doing right now and play along.
Along with presence, a desktop can help users manage the other side of the connection: who's allowed to be part of a shared P2P space and what their privileges are.
Another piece of infrastructure that might be nice on the desktop is management for a user profile. A lot of interesting peer-to-peer applications are based on collaborative filtering. Do you like to collect white papers about bandwidth? Have you answered a lot of people's questions about Perl? Information like this, stored in a user profile, helps a peer-to-peer system know whether to send a question to you, offer a document you might be interested in, or use your input in rating something. These kinds of peer-to-peer collaborative applications are also intriguing because they gently explore the possibilities of adaptive interfaces, systems that keep track of your preferences and alter the system's behavior as you go along, without presenting you with an annoying dialog box you have to click whenever the system notices something.
The information you store in your profile is quite private, of course, so unlike proprietary systems, an open peer-to-peer system stores it on the individual's computer. That doesn't completely prevent information leaks through security breaches, but it tries to establish the principle that you own your data. A desktop could help somebody keep this information up to date and indicate when it's OK to share it.
I've gone over the general work that needs to be done for peer-to-peer applications. Now I'll just highlight a couple types of applications that I find exciting: distributed file systems and collaboration tools.
Some would complain that I left out the important application known as distributed computation or grid computing. But I feel I have already covered it adequately, because all P2P involves distributed computing. When you access data, or do something on an end user's site, you're using his or her computing power, aren't you? Distributed computing is just a particularly focused example of peer-to-peer, and the general infrastructure issues I've already listed apply.
A lot of people learned about Napster, Freenet, Gnutella, KaZaA, etc., and said, "Oh, they're just ways of transferring illicit files." Not at all. These systems were laying the basis for a new type of distributed file system, a new generation that goes way beyond NFS or AFS. The new file systems exploit existing wasted space on end user systems instead of using central storage on the servers. They break files up into pieces so you can retrieve the pieces quickly from multiple systems, encrypt the files for security, and provide an indexing service.
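The mechanics described here (split a file into pieces, scatter them, index them, verify them on the way back) can be sketched briefly in Python. This is an illustration under stated assumptions, not any particular system's design: peers are plain dictionaries, pieces are assigned round-robin, the function names are invented, and encryption is left out because the standard library has no cipher to demonstrate it with.

```python
import hashlib

def split(data, piece_size):
    """Cut data into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]

def publish(data, peers, piece_size=4):
    """Store pieces on peers round-robin; return an index of (peer number, digest)."""
    index = []
    for n, piece in enumerate(split(data, piece_size)):
        digest = hashlib.sha256(piece).hexdigest()
        peer_no = n % len(peers)
        peers[peer_no][digest] = piece
        index.append((peer_no, digest))
    return index

def retrieve(index, peers):
    """Fetch every piece named in the index, verify its digest, and reassemble."""
    pieces = []
    for peer_no, digest in index:
        piece = peers[peer_no][digest]
        if hashlib.sha256(piece).hexdigest() != digest:
            raise ValueError("piece failed integrity check")
        pieces.append(piece)
    return b"".join(pieces)

peers = [{}, {}, {}]                      # three hypothetical end-user systems
index = publish(b"the quick brown fox", peers)
print(retrieve(index, peers))
```

Because the index names each piece by its digest, a peer that returns altered data is caught at retrieval time, and in a networked version the pieces could be requested from several peers in parallel.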
You've probably heard of OceanStore, which is an impressive example of the new file systems. But OceanStore is one of those heavyweight systems I mentioned near the beginning of this speech. It's a comprehensive solution to a very ambitious set of requirements, and I don't really think most people would run it. There are commercial systems that use these same principles but have much lower ambitions and are consequently more mature and widespread. I'd like free software systems to have access to this new generation of file systems.
The other interesting application is collaboration. Jabber looks very promising. It's so well thought out, and offers so much as a platform, and is so easy to program, that I'm convinced it will fly. The problem is the dearth of applications, so far, that run on top of Jabber. Jabber is like a new computer for which no programs have yet been written; few people can see its potential. Now we need applications that use Jabber to do all the great things you can do with structured data, such as in the example I gave earlier of political discussions.
It gets really fun when Jabber is used for program-to-program communication. With XML, you can structure communications so much you don't even have to be there. And this is a valuable goal, because none of us really want to spend all our time on computers. We'd rather be strolling through the forest or playing the violin or doing something else interesting. And so now I'll stop talking about software.
Andy Oram is an editor for O'Reilly Media, specializing in Linux and free software books, and a member of Computer Professionals for Social Responsibility. His web site is www.praxagora.com/andyo.
Copyright © 2007 O'Reilly Media, Inc.