Tapping the Matrix, Part 1
The goal of a SuperNode server is to dispatch work units to PeerNode clients and to later collect results. There is no shortage of ways to accomplish those goals; however, because many distributed computing projects rely on HTTP as an underlying protocol, the use of a web server is by far the most common approach.
Web servers form the foundation on top of which a distributed application can be hosted. Using a scripting language such as ASP or PHP, a web server can be transformed into a custom application server.
Let's say that we've written a PeerNode client that processes a chunk of data and returns an XML-encoded result to our project's SuperNode server. Our SuperNode server is an Apache web server running the PHP scripting language with access to a local database server. We've written a script to process the incoming XML data and store the results in a database table.
The XML data might look like this:
<client ID="firstname.lastname@example.org" ver="3.12.02" />
The result package consists of a user's information (in this case, the user's email address), followed by the version number of the client software that was used to process the original data. Additionally, the result package contains a work unit identifier, followed by a resulting value.
On the server side, we have a PHP script that looks something like this:
header("Content-type: text/xml");

// Return the quoted value that follows the first occurrence of $param in $string.
function GetFieldData($string, $param)
{
    $s = strstr($string, $param);
    $start = strpos($s, '"', 0) + 1;
    $end = $start;
    while ($s[$end] != '"')
        $end++;
    $sval = substr($s, $start, $end - $start);
    return $sval;
}

$data = $HTTP_RAW_POST_DATA;
: More code here…
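In the prior code fragment, the $data variable will contain the XML data sent from the PeerNode client. $HTTP_RAW_POST_DATA is a built-in PHP variable that holds the data that was received via an HTTP POST operation from a remote client. (This variable was removed in PHP 7.0; on current versions, the same data is read with file_get_contents('php://input').)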
Later in the script code, we'll use the GetFieldData() function to extract some of the XML values. For example:
$clientID = GetFieldData($data, "ID");
$workunitID = GetFieldData($data, "WUID");
$resultValue = GetFieldData($data, "value");
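Scanning for quote characters works while the layout of the package stays fixed, but it can pick up the wrong value if a name such as "ID" also occurs earlier in the data as part of another name. As a minimal sketch of the same extraction step done with an XML parser, here in Python rather than PHP, the following assumes a hypothetical result package in which a result element wraps the client element shown earlier and a workunit element carrying the WUID and value attributes. The element names result and workunit and all sample values are assumptions made for illustration; only the attribute names come from the article.

```python
import xml.etree.ElementTree as ET

# Hypothetical result package: element names and values are assumptions.
SAMPLE_PACKAGE = (
    '<result>'
    '<client ID="user@example.org" ver="3.12.02" />'
    '<workunit WUID="WU-0001" value="42" />'
    '</result>'
)


def get_field_data(xml_text, attribute):
    """Return the first value of `attribute` found on any element, or None."""
    root = ET.fromstring(xml_text)
    for element in root.iter():  # walks the root and all nested elements
        if attribute in element.attrib:
            return element.attrib[attribute]
    return None


client_id = get_field_data(SAMPLE_PACKAGE, "ID")
workunit_id = get_field_data(SAMPLE_PACKAGE, "WUID")
result_value = get_field_data(SAMPLE_PACKAGE, "value")
```

Because the parser matches whole attribute names, a lookup for "ID" cannot be satisfied by "WUID". For data arriving from untrusted clients, a hardened XML parser is worth considering.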
After extracting the values, we'll store them in a database using a code fragment like this:
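// The values are quoted as SQL strings. Because they come from a remote client,
// escape them first (for example, with mysql_real_escape_string()).
$dbinsertstring = "INSERT INTO tblResults VALUES('$clientID', '$workunitID', '$resultValue')";
mysql_query($dbinsertstring);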
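Building the SQL text by string interpolation leaves a script open to SQL injection, because the values come straight from a remote client. A minimal sketch of the same insert with bound parameters follows, written in Python with the standard-library sqlite3 module standing in for the MySQL server. The column names and the in-memory database are assumptions made for illustration.

```python
import sqlite3


def store_result(conn, client_id, workunit_id, result_value):
    """Insert one result row, letting the driver bind the values."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO tblResults (clientID, workunitID, resultValue) "
            "VALUES (?, ?, ?)",
            (client_id, workunit_id, result_value),
        )


# In-memory database used only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tblResults (clientID TEXT, workunitID TEXT, resultValue TEXT)"
)

# A value containing a quote character is stored verbatim rather than
# being interpreted as part of the SQL statement.
store_result(conn, "o'brien@example.org", "WU-0001", "42")
rows = conn.execute(
    "SELECT clientID, workunitID, resultValue FROM tblResults"
).fetchall()
```

The placeholders keep the statement text constant, so the database never parses client-supplied data as SQL.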
In our PHP example, a single script was used to receive the XML data via HTTP, extract the data values, and finally, store the result in a database. While the specific function calls will be different when using other scripting languages, the general approach and ease of use remain similar.
Leveraging the proven flexibility, reliability, and robustness of a popular web server can be a wise decision. However, using a web server is entirely optional. After careful consideration, you may conclude that a custom server is the only viable approach for your project. The general operations would be roughly the same as our prior example, but taking a custom server approach would be considerably more complex. This is because developing a concurrent multi-threaded server application is a non-trivial task, and one worth avoiding if at all possible. However, if you're a glutton for punishment and enjoy TCP/IP programming, data structures, threads, and sleepless nights, a custom server might just be what the doctor ordered. Having written several custom servers, I can say that there are better ways of cultivating gray hairs. Fortunately for the intrepid developer, there are many great books available that explore server and protocol development.
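To give a sense of the moving parts, here is a minimal sketch of a threaded TCP server in Python, where the standard-library socketserver module does the thread management. The one-line "ACK" protocol, the class names, and the helper functions are assumptions made for illustration and are not part of any real project's protocol.

```python
import socket
import socketserver
import threading


class ResultHandler(socketserver.StreamRequestHandler):
    """Handles one client connection; each connection gets its own thread."""

    def handle(self):
        line = self.rfile.readline().strip()
        # Placeholder protocol (an assumption): acknowledge whatever was sent.
        self.wfile.write(b"ACK " + line + b"\n")


class SuperNodeServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True  # open connections do not block interpreter exit


def start_server(host="127.0.0.1", port=0):
    """Serve on a background thread; port 0 asks the OS for any free port."""
    server = SuperNodeServer((host, port), ResultHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_line(address, line):
    """Minimal client: send one line and return the server's one-line reply."""
    with socket.create_connection(address, timeout=5) as sock:
        sock.sendall(line + b"\n")
        return sock.makefile("rb").readline()
```

Even this sketch leaves out most of what makes a custom server hard: timeouts, malformed input, authentication, and safe access to state shared between threads.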
Client Software Considerations
The development of the PeerNode client requires that you carefully consider several important issues; additionally, the complexity of the client software development will strongly depend on the approach you choose.
Let's consider a few vital questions:
- Which platforms will your project software support? Will you develop exclusively for Microsoft Windows, Linux, or Mac OS X? Will you attempt to support multiple platforms?
- Will your client software have external requirements, such as a dependency on a runtime library or framework such as Java or .NET?
- Will you build your client software using a high-level network programming library, or will you code using lower-level TCP/IP sockets?
Naturally, the more platforms your project supports, the broader your user base may become. This is a strong argument for supporting multiple platforms despite the resultant complexities. One way to sidestep many of the platform issues is to use a cross-platform development tool. For example, if you develop your client software using Java, then your users may only need the Java runtime software on their machines in order to run your client software. Another approach is to use a highly portable programming language such as C, C++, or Perl. Just keep in mind that while a core language may itself be portable, depending on runtime libraries and non-standardized language features can quickly compromise portability. A good example is developing a Windows client using the Microsoft Foundation Classes (MFC) and then trying to port the application for use on Linux. The key is to develop the application while using and testing on the target platforms. Don't introduce a feature unless you know for certain that the feature is present on all of your target platforms.
If you're not concerned with supporting multiple platforms, you might consider Microsoft .NET. At this time, .NET requires a 20MB framework download when used with older versions of Windows. Future releases of Windows will include a version of the .NET framework, so this may be less of an issue moving forward.
All of the programming languages we've considered in this article support the use of TCP/IP programming; however, the complexities increase depending upon the approach you take. For example, you can take a high-level approach, which shields you from specific network programming calls, or you can take the lower-level approach, where you'll handle the reading, writing, and buffering of network data.
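The difference between the two approaches can be sketched by building the same result submission both ways. The following Python example is an illustration only: the endpoint URL and the function names are hypothetical, and neither function sends anything over the network.

```python
import urllib.request

SUPERNODE_URL = "http://supernode.example.org/submit.php"  # hypothetical endpoint


def build_high_level_request(xml_payload):
    """High-level: the library takes care of framing and buffering when sent."""
    return urllib.request.Request(
        SUPERNODE_URL,
        data=xml_payload.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


def build_raw_request(host, path, xml_payload):
    """Lower-level: the exact bytes a client would write to a TCP socket."""
    body = xml_payload.encode("utf-8")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: text/xml\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body
```

With the high-level version, sending is a single call to urllib.request.urlopen(). With the raw version, the client must also open the socket, write the bytes, then read, buffer, and parse the response itself.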
The strongest argument for using a readymade framework, such as BOINC, is that many of the complexities of network programming are virtually eliminated.
Regardless of the software development approach you decide on, you'll still need to carefully consider usability issues. All software needs to be developed with an understanding of the target audience. Who will run the software? What skills are end users expected to possess? Understanding the answers to these and other important usability questions can directly translate to whether your project will be well received and, ultimately, well supported. To gain wide acceptance, your client software needs to shield end users from unnecessary complexities.
The SETI@home project was the first successful DC project to recognize the ubiquity of the PC screensaver as a vehicle for harnessing idle processing cycles. There are now several well-understood reasons why screensavers have proven ideal. Most people view screensavers as both innocent and non-intrusive. People know that screensavers typically start only when a machine is idle. Also, people in offices enjoy displaying their screensavers rather than simply turning off their monitors. This makes screensavers an ideal tool for advertising a project, often leading to a "word of mouth" marketing effect (see "The ChessBrain Project: A Global Effort To Build The World's Largest Chess SuperComputer," Justiniano, C and Frayn, (2003), Journal of the International Computer Games Association; ICGA Journal Vol. 26, No. 2, pp. 132-138). Not all projects support the use of a screensaver; however, the merits are certainly worth considering.
Building your client software to require as little technical experience as humanly possible should be one of your most important goals.
The database server has become an indispensable fixture in distributed computing projects. While conventional wisdom dictates that databases are used for storing data, databases have matured into application-development tools that far exceed their originally intended uses.
Database servers are commonly distributed throughout a network and remotely accessed by other applications and servers. This allows distributed applications to communicate with one another by reading and writing database records. Is this the best way to perform inter-process communication? Perhaps not, but bear with me. Database servers also help to distribute an application's memory requirements among one or more servers. Without a database server, an application might need to maintain data structures in memory and on disk, taking away from the total available memory and negatively impacting overall performance. Database servers also help to distribute the data-processing load that results from the need to search, sort, and tabulate results. Finally, when properly utilized, database servers help glue together distributed applications to help achieve maximum scalability.
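As a minimal sketch of applications communicating through database records, the following Python example uses a table as a shared work-unit queue: one process inserts pending work units, and another claims them on behalf of clients. The table name tblWorkUnits, its columns, and the function names are assumptions made for illustration, and the standard-library sqlite3 module stands in for a networked database server.

```python
import sqlite3


def create_schema(conn):
    conn.execute(
        "CREATE TABLE tblWorkUnits ("
        " workunitID TEXT PRIMARY KEY,"
        " status TEXT NOT NULL DEFAULT 'pending',"
        " clientID TEXT)"
    )


def claim_work_unit(conn, client_id):
    """Assign the next pending work unit to client_id; return its ID or None."""
    with conn:  # commits the update on success, rolls back on error
        row = conn.execute(
            "SELECT workunitID FROM tblWorkUnits "
            "WHERE status = 'pending' ORDER BY workunitID LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        cursor = conn.execute(
            "UPDATE tblWorkUnits SET status = 'assigned', clientID = ? "
            "WHERE workunitID = ? AND status = 'pending'",
            (client_id, row[0]),
        )
        # No row updated means another process claimed it in the meantime.
        return row[0] if cursor.rowcount == 1 else None


conn = sqlite3.connect(":memory:")
create_schema(conn)
with conn:
    conn.executemany(
        "INSERT INTO tblWorkUnits (workunitID) VALUES (?)",
        [("WU-0001",), ("WU-0002",)],
    )

first = claim_work_unit(conn, "alice@example.org")
second = claim_work_unit(conn, "bob@example.org")
third = claim_work_unit(conn, "carol@example.org")
```

The producer and the consumer never talk to each other directly; the status column is the only thing they share.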
Application development can be a complex endeavor, and database servers have the potential of simplifying difficult tasks. There are powerful database systems available for every budget, leaving little reason not to use them in your own project.
Whether you choose to leverage existing servers and platforms (such as Apache, MySQL and PHP) or build your own custom server, it is vital that you consider how you'll test your project's software. An all-too-common mistake is to develop software without first considering how it will be tested. In the end, some software products cost more to test than they cost to actually develop in the first place.
Scripting languages provide a means of performing serious testing. Scripts can be used to test a server's functionality and its ability to withstand excessive stress. Scripts are easily modified and repurposed, and once written and tested, they allow for subsequent quality-assurance regression testing, ensuring that the server performs correctly after any recent modifications.
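A test script of this kind can be quite small. The following Python sketch sends a batch of payloads concurrently and reports which ones failed. The send callable, the "OK" reply convention, and the fake_server stand-in are all assumptions made for illustration; in a real test, send would post each payload to the server under test.

```python
from concurrent.futures import ThreadPoolExecutor


def run_stress_test(send, payloads, workers=8):
    """Send every payload concurrently and return a list of (payload, ok) pairs.

    `send` is any callable that submits one payload to the server under test
    and returns its reply; a reply starting with "OK" counts as a pass.
    """
    def check(payload):
        try:
            return payload, str(send(payload)).startswith("OK")
        except Exception:
            return payload, False  # an error or timeout is a failed case

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(check, payloads))  # results keep payload order


def fake_server(payload):
    """Stand-in for the real server (an assumption): rejects empty payloads."""
    if not payload:
        raise ValueError("empty payload")
    return "OK " + payload


payloads = [f'<workunit WUID="WU-{n:04d}" value="{n}" />' for n in range(100)]
payloads.append("")  # malformed case the server should reject
results = run_stress_test(fake_server, payloads)
failures = [payload for payload, ok in results if not ok]
```

Keeping the payload list in a file under version control turns the same script into a regression suite that can be rerun after each server change.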
Another important consideration is to simulate end-user environments as closely as possible in order to perform accurate tests. It is essential to install proxy servers and firewalls during the software research and development stage to ensure proper behavior. In particular, the use of proxy caching servers should be carefully examined. A caching server seeks to reduce network traffic by storing and reusing data. This behavior can introduce problems for applications that use HTTP.
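Two common ways to keep a caching proxy from answering with stale data are to send cache-control headers and to make each request URL unique. The following Python sketch shows both. The nocache parameter name and the function name are assumptions made for illustration.

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Request headers asking caches not to answer from, or keep, a stored copy.
NO_CACHE_HEADERS = {
    "Cache-Control": "no-cache, no-store",
    "Pragma": "no-cache",  # understood by HTTP/1.0 caches
}


def cache_busting_url(url, token=None):
    """Return url with a unique `nocache` query parameter appended."""
    token = token or uuid.uuid4().hex
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("nocache", token))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

A server can send the same Cache-Control value on its responses. Whichever technique is used, it is worth confirming the behavior behind a real caching proxy, since not every intermediary honors these headers.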
The use of open standards such as HTTP and XML ensures that both professional testing houses and informal testers can assist with the product testing. You see, interoperability does have its benefits.
Coming in Part Two
Next week, I'll continue this exploration into distributed computing by discussing network failures, security, software updates, and backup. See you then.