
Tapping the Matrix, Part 2

Bandwidth and Hosting

An important consideration for any DC project is network bandwidth utilization, which must be examined both from the SuperNode server's end and from a PeerNode's perspective.

Many popular DC projects package data into chunks that take PeerNodes days or sometimes weeks to process. In these cases, client-side bandwidth utilization is negligible compared with typical end-user network usage. For example, visiting a single graphics-heavy web site can consume more bandwidth than most DC clients use in weeks.

The server side is a very different story: a single SuperNode server might service thousands (or, if the project is truly popular, millions) of requests per day.

It is important to examine the frequency and size of data transmissions and to adequately plan for bandwidth requirements. One of the best ways to study the behavior of a network application is to use a network analyzer. Ethereal is a good tool for examining what is actually going through the wire; however, network-bandwidth measuring tools and load-test simulations should also be used to gain better insights into the application's true requirements.
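Before reaching for analyzers and load tests, a rough estimate can show whether a hosting plan is even in the right range. The Python sketch below simply multiplies request frequency by payload size. The request rate and the 2 KB upload / 50 KB download sizes are hypothetical placeholders, not measurements from any real project; replace them with figures observed in Ethereal or a load test.

```python
# Back-of-envelope SuperNode bandwidth estimate.
# All input figures are hypothetical; substitute measured values.


def monthly_transfer_gb(requests_per_day: int,
                        bytes_per_request: int,
                        bytes_per_response: int,
                        days: int = 30) -> float:
    """Return total inbound + outbound traffic in gigabytes (10**9 bytes)."""
    per_day = requests_per_day * (bytes_per_request + bytes_per_response)
    return per_day * days / 1e9


def average_mbps(requests_per_day: int,
                 bytes_per_request: int,
                 bytes_per_response: int) -> float:
    """Return the average sustained rate in megabits per second."""
    per_day = requests_per_day * (bytes_per_request + bytes_per_response)
    bytes_per_second = per_day / 86_400  # seconds in a day
    return bytes_per_second * 8 / 1e6


if __name__ == "__main__":
    # Hypothetical: 100,000 PeerNode requests per day, each uploading a
    # 2 KB result and downloading a 50 KB work unit.
    reqs, up, down = 100_000, 2_000, 50_000
    print(f"{monthly_transfer_gb(reqs, up, down):.1f} GB per month")
    print(f"{average_mbps(reqs, up, down):.2f} Mbit/s average")
```

With these placeholder numbers the estimate comes to roughly 156 GB per month, or about 0.48 Mbit/s on average. Keep in mind that this is an average; traffic tends to arrive in bursts, so plan for peak rates well above it.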

Understanding the project's network requirements is necessary in order to choose the right server-hosting plan. Proper planning is essential, because if your project becomes successful, you may discover that you cannot afford the bandwidth costs. Even if you do not pay for bandwidth yourself, someone, somewhere, does, and without proper analysis, your project may be terminated prematurely.

The best course of action is to plan for bandwidth, carefully consider your data protocol, and, where appropriate, use data-compression techniques.
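As one illustration of the compression point, the sketch below serializes a hypothetical work unit as JSON and compresses it with Python's standard `zlib` module before transmission. JSON and zlib are assumptions chosen for the example, not a protocol prescribed here; the savings depend heavily on how repetitive your data is, and data that is already compressed will not shrink further.

```python
import json
import zlib


def pack_work_unit(unit: dict, level: int = 9) -> bytes:
    """Serialize a work unit to compact JSON and compress it with zlib (DEFLATE)."""
    raw = json.dumps(unit, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, level)


def unpack_work_unit(payload: bytes) -> dict:
    """Reverse pack_work_unit on the receiving side."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical work unit: highly repetitive data compresses well.
    unit = {"id": 42, "samples": [0, 1, 2, 3] * 500}
    raw = json.dumps(unit, separators=(",", ":")).encode("utf-8")
    packed = pack_work_unit(unit)
    print(f"{len(raw)} bytes before, {len(packed)} bytes after compression")
    assert unpack_work_unit(packed) == unit  # lossless round trip
```

Because every byte saved is multiplied by the number of requests a SuperNode serves, even modest compression ratios can matter on the server side.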

Backing Up Your Data

Your favorite computer's days are numbered: it's just a matter of time before a key component, such as a hard drive or power supply, breaks down. Because distributed computation projects typically deal with vast amounts of data, it is absolutely vital that you develop a backup strategy.

It is virtually impossible (or exceedingly expensive) to guard completely against data loss, so the key is to minimize your risk exposure. For instance, if you perform a backup once per day, a hardware failure may cost you nearly a day's worth of data; backing up once per hour limits the worst-case loss to about an hour's work. Data mirroring using redundant hardware is certainly the way to go if you can afford it; however, you'll still need a backup policy, because a mirror faithfully copies accidental deletions and corrupted files along with everything else.

An overwhelming majority of individuals don't perform regular data backups. The reason is quite simple: it's a chore. The best way to ensure that data is backed up regularly is to automate the process. On UNIX-type systems, this is greatly simplified by combining the cron scheduler (configured through crontab entries) with archive scripts. On other systems, you may need to explore backup software solutions.
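The sketch below shows the kind of archive script the paragraph describes, written in Python using only the standard library. The directory names, the `data-` archive prefix, the retention count, and the script path in the crontab comment are all hypothetical. `make_backup` writes a timestamped `.tar.gz` archive, and `prune_backups` deletes all but the newest archives so the backup disk does not fill up.

```python
import tarfile
from datetime import datetime
from pathlib import Path

# Example crontab entry (hypothetical paths) that runs a wrapper script
# at minute 0 of every hour:
#   0 * * * * /usr/bin/python3 /opt/dc-project/backup.py


def make_backup(source_dir: str, backup_dir: str) -> Path:
    """Write a timestamped .tar.gz archive of source_dir into backup_dir."""
    src = Path(source_dir)
    dest = Path(backup_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # A sortable timestamp in the name makes archives order chronologically.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    archive = dest / f"{src.name}-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        tar.add(str(src), arcname=src.name)  # adds the directory recursively
    return archive


def prune_backups(backup_dir: str, prefix: str, keep: int) -> None:
    """Delete all but the newest `keep` archives whose names start with prefix."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    archives = sorted(Path(backup_dir).glob(f"{prefix}-*.tar.gz"))
    for old in archives[:-keep]:  # everything except the newest `keep`
        old.unlink()


# A hypothetical backup.py would then call, for example:
#   make_backup("/var/dc-project/data", "/backups/dc-project")
#   prune_backups("/backups/dc-project", prefix="data", keep=48)
```

Keeping 48 hourly archives, as in the hypothetical call above, would retain two days of history; copying those archives to a second machine guards against losing the backups along with the original disk.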

Another key backup strategy is offsite storage. In addition to redundant storage (storage on multiple machines) and CD archives, I use a service called Xdrive as an offsite storage facility.

Protect your data. Hardware may be expendable, but loss of data may cripple your project. Yes, this is potentially one place where paranoia may really pay off.

A Value Proposition

We've skimmed the surface of many, but by no means all, of the technical considerations you might encounter. However, not all of the issues you'll face will be technical in nature. There is a very human aspect to distributed computing, and failing to understand the human elements will seriously jeopardize your project's long-term viability.

In the past, the notion that individuals would pay for, and allow their computers to participate in, research projects would have seemed foolhardy, at best. Times have changed. Today, millions of people participate in distributed computation projects, and as a result, we are now able to tap a wealth of computing resources. However, there is one small catch: we must convince people to join our projects.

If you are interested in getting people to join your project, you need to create a value proposition offering an enjoyable and rewarding experience in exchange for participation. In addition, you need to consider how to retain members once they have joined. The best way to begin to address these issues is by understanding the underlying motivators that attract people to distributed computation projects.

Why People Join DC Projects

You may be wondering what drives a person to contribute their time, energy, and the use of their computer to a distributed computation project. Although specific reasons vary, there are a few common themes that consistently appear in DC projects.

A Sense of Purpose
Some members are motivated by a deep sense of purpose. Projects such as FightAIDS@Home and the University of Oxford's cancer research project offer individuals the opportunity to support noble research that might ultimately benefit millions of people.

A Sense of Community
Many active members enjoy being part of a community and collaborating with other people. Generally, people like to be involved in things that transcend them as individuals.

Competitive Opportunities and Peer Recognition
Members want to know that their contributions matter. All major distributed computing projects track member contributions and post the results on the project's web site, where members gain the respect of their peers and earn rankings within their communities.

Entertainment
Participating in a distributed computing project can be entertaining in a number of ways, from meeting new people to competing against them.

Successful projects understand the needs we've just examined and seek ways of promoting them within the context of the project.

The Distributed Computing Scene

Distributed computing projects have given birth to communities of enthusiasts who closely support them. In turn, project web sites publicly display project statistics and member rankings (sometimes referred to as leaderboards), offering individuals a convenient way to compare their standing against that of their peers. This has led community members to form teams, which compete against one another to see which group can make the most significant contributions to a project. Project organizers are eager to support competition because it typically leads teams to recruit more members and, in turn, more computers.
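At their core, team leaderboards of this kind are a simple aggregation. The Python sketch below sums hypothetical per-member contribution records into team totals and ranks the teams. The record layout `(member, team, work_units)` and the sample data are invented for illustration; real projects may weight contributions differently, for example by CPU time rather than by completed work units.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def team_leaderboard(contributions: List[Tuple[str, str, int]]) -> List[Tuple[str, int]]:
    """Sum work units per team and rank teams from most to least productive.

    Each contribution is a (member, team, work_units) tuple. Ties are broken
    alphabetically by team name so the ordering is deterministic.
    """
    totals: Dict[str, int] = defaultdict(int)
    for _member, team, units in contributions:
        totals[team] += units
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical contribution records.
    records = [
        ("alice", "Team A", 120),
        ("bob", "Team B", 300),
        ("carol", "Team A", 250),
        ("dave", "Team C", 90),
    ]
    for rank, (team, units) in enumerate(team_leaderboard(records), start=1):
        print(f"{rank}. {team}: {units} work units")
```

Here Team A ranks first with 370 work units even though Team B has the single most productive member, which is exactly the dynamic that pushes teams to recruit more members and machines.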

Distributed computing team members have adopted the moniker "DC Team," and members refer to themselves as "DC'ers." Many DC'ers take their hobby seriously, often running two or more machines; some run 40 or more while participating in various projects.

When asked why he contributes to projects, DC'er Chris Harrell replied, "I like to think I solely pursue DC projects for the common good of mankind, but I cannot deny the fact that the project statistics are the main attraction for 99% of DC contributors." Chris is far from alone; for many DC'ers, interest in a project comes second to competing for public ranking.

When I started ChessBrain, a global project to build the world's largest distributed chess computer, I was surprised to discover contributors who had very little interest in the game of chess. This was my first introduction to a network economy where DC teams support research projects in exchange for an opportunity to compete against one another.

International teams, like the Dutch Power Cows and AnandTech, claim to have thousands of members. DC Teams have become a powerful force in helping to shape the future of distributed computation projects on the net, by providing a highly technical member base with access to thousands of machines. They are the unsung heroes of a new age.

Establishing Credibility and Trust

Participants have many projects to choose from, and most don't mind exploring a project briefly to get a sense of it. However, potential members won't waste their time on a project that doesn't appear worthwhile.

Before a significant number of people take an interest in a new project, it must first establish a certain degree of credibility. Establishing credibility begins by clearly articulating goals and demonstrating the project's commitment to achieving a measurable result. The project must clearly communicate the message: "This project is worth your time!"

Most distributed computing projects maintain web sites, which articulate the project goals, present project status reports, and offer software download areas. In some cases, project web sites feature online forums where members can post feedback directly to the development team and other members. A project's community forum offers project leaders and members opportunities to publicly engage in conversations. The presence of a public forum can go a long way toward communicating the commitment and seriousness of a project.

One way of gaining credibility is through association. Some DC projects enjoy near-instant credibility when well-established institutions or well-known companies sponsor them.

In the process of building credibility, you must also establish a relationship based on trust. Generally, participants must believe in the credibility and trustworthiness of a project before downloading and running potentially malicious software on their machines and networks.

One of the surest ways of establishing trust is to engage in direct conversations with potential members via email, on a project web site, and on other public forums. Nothing says, "your voice matters" faster than a prompt reply to a member's inquiry. Although this isn't always possible, the goodwill generated is worth its weight in gold.

Open and honest communication tears down relationship barriers and helps foster healthy and productive relationships. This point is often hard to remember when coping with difficult people. Let's face it: public relations can be a difficult job. Freedom of networked speech often results in members saying pretty much whatever they want while publicly venting frustrations. This sort of behavior can quickly erode a project's credibility as mob-like conditions lead others to join in. This is where, as a leader, you must exercise the most restraint. Months and possibly years of relationship building can quickly crumble as a result of an ill-prepared response.

It is important to remember that project contributors give freely of themselves and that it is difficult to run a distributed computation project without them. Exercising tempered restraint and maintaining an eternal state of gratitude is vital to maintaining a successful project.

The most important element in a distributed computation project remains the people and communities who join together to unlock the vast potential of distributed machines. To paraphrase the Matrix: They are the gatekeepers. They are guarding all the doors. They are holding all the keys.

Years ago, Sun Microsystems promoted its marketing slogan: "The network is the computer." Although this still remains relevant in the context of distributed computing, I'd like to offer another mantra: "The people are the network." The machines are simply tools that allow us to touch, if for just a moment, the very limits of our imaginations.

Carlos Justiniano is a software architect with Y3K Secure Enterprise Software Inc., where he focuses on data security, communications, and distributed computing.
