Both Akamai and BitTorrent address the challenge of distributing large volumes of information across huge networks, striving to minimize bandwidth consumption and delays that users might notice. Their approaches to solving these problems, however, are very different.
Software as a Service
The Synchronized Web
You can find more information on these patterns in Chapter 7, Specific Patterns of Web 2.0.
Akamai and BitTorrent both avoid the problem of a single host trying to supply bandwidth-intensive content to a potentially global audience. A single server starts to slow down as it approaches its maximum capacity, and the network in the immediate vicinity of the host server also suffers because it has to carry all of that extra traffic.
Again, the incumbent in this case (Akamai) has significantly changed its mechanics and infrastructure since the date of the original brainstorming session when the comparison of Web 1.0 and Web 2.0 was made (as depicted in Figure 3.1, “Tim’s list of Web 1.0 versus Web 2.0 examples”). Accordingly, understanding the patterns and advantages of each system is a good idea for budding Web 2.0 entrepreneurs. You shouldn’t view Akamai as antiquated. It is performing tremendously well financially, far outstripping many of the Web 2.0 companies mentioned in this book. It’s been one of NASDAQ’s top-performing stocks, reporting 47% growth and revenues of $636 million in 2007. With 26,000 servers, Akamai is also a huge Internet infrastructure asset.
Akamai’s original approach was to sell customers a distributed content-caching service. Its aim was simply to resolve bandwidth issues, and it solved that problem very well. If a customer like CNN News decided to host a video of a newscast, the content on the CNN server would be pulled through the Akamai network. The centrally located CNN server bank would rewrite the URIs of the video and other bandwidth-intensive content into URLs pointing at copies that were easier for the requesting client to reach, often because they were hosted in physically closer locations. The client’s browser would load the HTML template, which would tell it to hit the Akamai network for the additional resources it needed to finish rendering the content. At the time of this writing, end users see no indication that Akamai.com is being used (although streaming videos do require modification of URLs).
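The URL rewriting described above can be sketched in a few lines of Python. This is a minimal illustration, not Akamai’s actual mechanism: the edge hostname `media.cdn.example.net`, the origin hostname `news.example.com`, the function name `rewrite_heavy_urls`, and the list of file extensions treated as bandwidth-intensive are all hypothetical. The sketch rewrites only `src`/`href` attributes that point at heavy files on the customer’s own host, leaving the HTML template and third-party links alone.

```python
import re

# Hypothetical hostname assigned by the edge network.
EDGE_HOST = "media.cdn.example.net"

# File types treated as bandwidth-intensive in this sketch (an assumption).
HEAVY_EXTENSIONS = ("mp4", "flv", "jpg", "png", "gif")

# Matches src="..." or href="..." whose URL ends in one of the heavy extensions.
ASSET_URL = re.compile(
    r'(src|href)="(https?)://([^/"]+)(/[^"]*\.(?:'
    + "|".join(HEAVY_EXTENSIONS)
    + r'))"'
)


def rewrite_heavy_urls(html, origin_host, edge_host=EDGE_HOST):
    """Point heavy assets on origin_host at the edge network; leave the rest alone."""

    def swap(match):
        attr, scheme, host, path = match.groups()
        if host != origin_host:
            return match.group(0)  # third-party asset: keep the original URL
        return f'{attr}="{scheme}://{edge_host}{path}"'

    return ASSET_URL.sub(swap, html)


page = (
    '<img src="http://news.example.com/img/logo.png">'
    '<a href="http://news.example.com/story.html">Story</a>'
    '<video src="https://news.example.com/v/newscast.mp4"></video>'
)
print(rewrite_heavy_urls(page, "news.example.com"))
```

Because the HTML template itself still comes from the origin, the browser ends up fetching the page from the customer and the heavy media from the edge network, which is the split the paragraph describes.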
Figure 3.6, “Overview of Akamai core pattern (courtesy of Akamai)” shows Akamai’s core architecture as it stood when Akamai was listed in Figure 3.1, “Tim’s list of Web 1.0 versus Web 2.0 examples”.
Pulling richer media (the larger files) from a system closer to the end user improves the user experience: content loads faster, and streams are more reliable and less susceptible to changes in routing or bandwidth between the source and the target. Note that the Akamai EdgeComputing infrastructure is federated worldwide, and users can pull files from it as required. Although Akamai is best known for handling HTML, graphics, and video content, it also offers accelerators for business applications such as WebSphere and SAP and has a new suite to accelerate AJAX applications.
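How a request ends up at a server “closer to the end user” can be illustrated with a toy selection function. Production CDNs generally make this choice inside their DNS and request-mapping systems using live network measurements; the sketch below simply assumes the client already knows the round-trip time to a few hypothetical edge servers, picks the fastest healthy one, and signals the caller to fall back to the origin when none is available. `EdgeServer` and `pick_edge` are illustrative names, not part of any Akamai API.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EdgeServer:
    name: str        # hypothetical location label
    rtt_ms: float    # round-trip time measured from this client
    healthy: bool = True


def pick_edge(servers: Sequence[EdgeServer]) -> EdgeServer:
    """Return the healthy edge server with the lowest round-trip time."""
    candidates = [s for s in servers if s.healthy]
    if not candidates:
        # No usable edge copy: the caller should fetch from the origin instead.
        raise LookupError("no healthy edge server available")
    return min(candidates, key=lambda s: s.rtt_ms)


edges = [
    EdgeServer("frankfurt", 95.0),
    EdgeServer("new-york", 18.0),
    EdgeServer("chicago", 9.0, healthy=False),
]
print(pick_edge(edges).name)
```

The unhealthy Chicago server is skipped even though it is nearest, which is why a real edge network also needs health checks, not just proximity data.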
BitTorrent is also a technology for distributing large amounts of data widely, without the original distributor incurring all the costs associated with hardware, hosting, and bandwidth resources. However, as illustrated in Figure 3.7, “BitTorrent’s pattern of P2P distribution”, it uses a peer-to-peer (P2P) architecture quite different from Akamai’s. Instead of the distributor alone servicing each recipient, in BitTorrent the recipients also supply data to newer recipients, significantly reducing the cost and burden on any one source, providing redundancy against system problems, and reducing dependence on the original distributor. This embodies the concept of a “web of participation,” often touted as one of the key changes in Web 2.0.
BitTorrent enables this pattern by getting its users to download and install a client application that acts as a peer node to regulate upstream and downstream caching of content. The viral-like propagation of files provides newer clients with several places from which they can retrieve files, making their download experiences smoother and faster than if they all downloaded from a single web server. Each person participates in such a way that the costs of keeping the network up and running are shared, mitigating bottlenecks in network traffic. It’s a classic architecture of participation and so qualifies for Web 2.0 status, even if BitTorrent is not strictly a “web app.”
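The text doesn’t say how a client decides which pieces to fetch from which peers. One strategy widely associated with BitTorrent clients is “rarest first”: fetch the pieces held by the fewest peers before the common ones, so scarce data spreads through the swarm quickly. The sketch below assumes each peer advertises the set of piece indices it holds, plans requests rarest first, and spreads them across peers; `plan_downloads` and the peer names are hypothetical, and real clients also weigh upload reciprocity and connection speed.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def plan_downloads(have: Set[int], peers: Dict[str, Set[int]]) -> List[Tuple[int, str]]:
    """Plan (piece, peer) requests: rarest pieces first, spread across peers."""
    # How many peers advertise each piece.
    availability = Counter(piece for pieces in peers.values() for piece in pieces)
    # Pieces we still need, rarest first (ties broken by index).
    wanted = sorted(
        (piece for piece in availability if piece not in have),
        key=lambda piece: (availability[piece], piece),
    )
    load = {peer: 0 for peer in peers}  # requests assigned to each peer so far
    plan = []
    for piece in wanted:
        holders = [peer for peer, pieces in peers.items() if piece in pieces]
        # Ask the least-loaded holder; ties go to the alphabetically first peer.
        chosen = min(holders, key=lambda peer: (load[peer], peer))
        load[chosen] += 1
        plan.append((piece, chosen))
    return plan


swarm = {"A": {0, 1, 2}, "B": {1, 2}, "C": {2, 3}}
print(plan_downloads(have={0}, peers=swarm))
```

Piece 3 is requested first because only peer C has it; if C left the swarm before anyone copied that piece, the file could not be completed.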
The BitTorrent protocol is open to anyone who wants to implement it. Using the protocol, each connected peer should be able to prepare, request, and transmit files over the network. To share a file using BitTorrent, the owner of the file must first create a “torrent” file. The usual convention is to append .torrent to the end of the filename. Every .torrent file must specify the URL of the tracker via an “announce” key. The file also contains an “info” section that holds a (suggested) name for the file, its length, and the metadata needed to verify it: the file is divided into fixed-size pieces, and the info section lists a Secure Hash Algorithm 1 (SHA-1) hash of each piece. These hashes let any client detect whether each piece it receives is intact, and therefore whether the assembled file is complete.
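The torrent file itself is a small dictionary serialized with bencoding, BitTorrent’s simple encoding for integers, byte strings, lists, and dictionaries. The following sketch builds a single-file torrent with an `announce` key and an `info` section holding the name, length, piece length, and the concatenated 20-byte SHA-1 piece hashes, then shows how a client might check a downloaded piece. The tracker URL and helper names are placeholders, and the sketch omits optional keys and multi-file torrents.

```python
import hashlib


def bencode(value) -> bytes:
    """Serialize ints, strings, lists, and dicts with BitTorrent's bencoding."""
    if isinstance(value, bool):
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        return b"i" + str(value).encode() + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode() + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def piece_hashes(data: bytes, piece_length: int) -> bytes:
    """Concatenate the 20-byte SHA-1 digest of each fixed-size piece."""
    return b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )


def make_torrent(data: bytes, name: str, announce: str, piece_length: int = 262144) -> bytes:
    """Build a minimal single-file torrent (hypothetical helper, required keys only)."""
    return bencode({
        "announce": announce,  # tracker URL
        "info": {
            "name": name,  # suggested file name
            "length": len(data),
            "piece length": piece_length,
            "pieces": piece_hashes(data, piece_length),
        },
    })


def piece_is_intact(piece: bytes, index: int, pieces: bytes) -> bool:
    """Compare a downloaded piece with the hash recorded in the torrent."""
    return hashlib.sha1(piece).digest() == pieces[20 * index:20 * (index + 1)]


data = b"x" * 10
torrent = make_torrent(data, "newscast.mp4", "http://tracker.example.com/announce", piece_length=4)
```

A real client would also compute the SHA-1 hash of the bencoded `info` dictionary (the “info hash”) to identify the torrent to trackers and peers.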
Decentralization has always been a hallmark of the Internet, appearing in many different guises that come (and sometimes go) in waves. Architecturally, this pattern is a great way to guard against points of failure or slowdowns, as it is both self-scaling and self-healing. A particularly elegant trait of peer-to-peer architectures is that the more people are interested in a file, the more it propagates, so more copies become available for download to help meet the demand.
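The self-scaling claim can be made concrete with a standard idealized model from networking textbooks; the model is not part of this chapter’s source material. Suppose N peers each want a file of F bits, the original server uploads at u_s, the slowest peer downloads at d_min, and every peer can upload at u. A lone server needs at least max(N·F/u_s, F/d_min) time, which grows linearly with N. In a peer-to-peer swarm the bound becomes max(F/u_s, F/d_min, N·F/(u_s + N·u)), because every new downloader also adds upload capacity. The functions below just evaluate those two bounds, under the simplifying assumption that all peers upload at the same rate.

```python
def client_server_time(n_peers, file_size, server_up, min_down):
    """Lower bound when a single server must upload a full copy to every peer."""
    return max(n_peers * file_size / server_up, file_size / min_down)


def p2p_time(n_peers, file_size, server_up, min_down, peer_up):
    """Lower bound when every peer also uploads at peer_up (idealized swarm)."""
    return max(
        file_size / server_up,  # the server must send at least one full copy
        file_size / min_down,   # the slowest peer must receive the whole file
        n_peers * file_size / (server_up + n_peers * peer_up),  # total upload capacity
    )


for n in (10, 100, 1000):
    print(n, client_server_time(n, 100, 10, 10), round(p2p_time(n, 100, 10, 10, 1), 1))
```

With F = 100, u_s = 10, d_min = 10, and u = 1 (arbitrary units), the client/server bound grows tenfold each time N grows tenfold, while the P2P bound never exceeds F/u = 100 no matter how large the swarm gets.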
By the time the first Web 2.0 conversations started, the first incarnations of MP3.com and Napster were both effectively history. Neither of them was particularly well liked by the music industry, for reasons that feed into Web 2.0 but aren’t critical to the comparison between them. Their business stories share a common thread, a major shift in the way music is distributed, but the ways they went about actually transferring music files were very different, mirroring the Akamai/BitTorrent story in many ways.
Software as a Service
The Synchronized Web
Declarative Living and Tag Gardening
Persistent Rights Management
You can find more information on these patterns in Chapter 7, Specific Patterns of Web 2.0.
The music industry has historically been composed of three main groups: those who create music (writing, recording, or producing it); those who consume it; and those who are part of the conventional recording and music distribution industry, who sit in the middle (see Figure 3.8, “Conventional music industry model”).
Historically, music publishing and distribution have been done via physical media, from 78s to CDs. If you abstract the pattern of this entire process, you can easily see that storing music on physical media is grossly inefficient (see Figure 3.9, “Distribution pattern for audio content”).
Figure 3.10, “The electronic music distribution pattern” contains two “Digital Signal” points in the sequence. For people who can work directly with the digital signal, persisting it to some form of physical storage medium is unnecessary. If the source provides a digital signal, that signal can travel to its ultimate target in digital form, and the listener consumes it as a digital signal, why would it make sense to use a physical storage medium (such as CD, vinyl, or tape) as an intermediate step? The shift to digital MP3 files has made the middle steps unnecessary. Figure 3.10, “The electronic music distribution pattern” depicts a simpler model that has many advantages, except to those whose business models depend on physical distribution.
For instance, this new pattern is better for the environment, because it does not involve turning petroleum products into records and CDs, or transporting physical goods thousands of miles. It satisfies people’s cravings for instant gratification, and it lets consumers store the music on physical media if they want, by burning CDs or recording digital audio tapes.
The old model also had one massive stumbling block: it arguably suppressed a large percentage of artists. For a conventional record company to sign a new artist, it must make a substantial investment in that artist. This covers costs associated with such things as recording the music, building the die for pressing it into physical media, and printing CD case covers, as well as the costs associated with manufacturing and distributing the media. The initial costs are substantial: even an artist whose run is perhaps only 250,000 CDs may cost a record company $500,000 to sign initially. This doesn’t include the costs of promoting the artist or making music videos. Estimates vary significantly, but in our opinion the conventional industry signs only about one out of every 10,000 artists as a result. If a higher percentage were signed, it might dilute each artist’s visibility and ability to perform. After all, there are only so many venues and only so many people willing to go to live shows.
An industry size issue compounds this problem. The global market is finite, and even if it were flooded with product, each artist could expect to capture only a portion of it. For argument’s sake, let’s assume that each artist garners 1,000 CD sales on average. Increasing the total number of artists would shrink each artist’s share of the market. For the companies managing the physical inventory, it’s counterproductive to have too much product available in the marketplace: as more products come to market, the dilution cuts into sales of existing music, to the point where it might jeopardize the record company’s ability to recoup its initial investment in each artist.
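The dilution argument is ultimately arithmetic. The sketch below makes it explicit using the $500,000 signing figure quoted above plus two hypothetical numbers that are not from the source: a label margin of $2 per CD and a fixed market of 25 million CDs. Under those assumptions, every additional signing shrinks each artist’s slice of a fixed market until the label can no longer recoup its outlay.

```python
SIGNING_COST = 500_000        # dollars, the figure quoted in the text
MARGIN_PER_CD = 2.0           # hypothetical label margin per CD sold
MARKET_CDS = 25_000_000       # hypothetical fixed number of CDs the market absorbs


def sales_per_artist(n_artists: int, market: int = MARKET_CDS) -> float:
    """Split a fixed market evenly across the signed artists."""
    return market / n_artists


def recoups(n_artists: int) -> bool:
    """Does an average artist earn back the signing cost?"""
    return sales_per_artist(n_artists) * MARGIN_PER_CD >= SIGNING_COST


def break_even_artists() -> float:
    """Largest roster at which an average artist still recoups."""
    return MARKET_CDS * MARGIN_PER_CD / SIGNING_COST


for n in (50, 100, 1000):
    print(n, sales_per_artist(n), recoups(n))
```

With these made-up inputs, the break-even point is 100 artists: at 100 signings each artist sells 250,000 CDs and just recoups, while at 1,000 signings each sells 25,000 and falls far short.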
Love’s Manifesto, a speech given by Courtney Love during a music conference, illuminates several of the problems inherent in the music industry today and is a brilliant exposé of what is wrong with the industry as a whole (pun intended) and the realities faced by artists. You can read the speech online at http://www.indie-music.com/modules.php?name=News&file=article&sid=820.
Producers and online distributors of digital music benefit from two major cost reductions. In addition to not having to deal with physical inventory and all its costs, they also offload the cost of recording music onto the artists, minimizing some of the risk associated with distributing the work of previously unsigned bands. These companies often adopt a more “hands off” approach. Unlike conventional record companies, online MP3 retailers can easily acquire huge libraries of thousands of new, previously unsigned artists. They don’t need to restrict whose music they publish based on their perceptions of the marketplace, because adding tracks to their catalogs poses minimal risk. (They do still face some of the same legal issues as their conventional predecessors, though, notably those associated with copyright.)
This approach also has significant benefits for many artists. Instead of having to convince a record company that they’ll sell enough music to make the initial outlay worthwhile, new independent artists can go directly to the market and build their own followings, demonstrating to record companies why they’re worth signing. AFI, for example, was the first MySpace band to receive more than 500,000 listens in one day. Self-promotion and building up their own followings allow clever artists to avoid record companies while still achieving some success.
In this model, artists become responsible for creating their own music. Once they have content, they may publish their music via companies such as Napster and MP3.com. Imagine a fictional company called OurCo. OurCo can assimilate the best of both the old and the new distribution models and act as a private label distribution engine, as depicted in Figure 3.11, “The best of the old and the new distribution models”.
When analyzing P2P infrastructures, we must recognize the sophistication of the current file-sharing infrastructures. The concepts of a web of participation and collaboration form the backbone of how resources flow and stream in Web 2.0. Napster is a prime example of how P2P networks can become popular in a short time and—in stark contrast to MP3.com—can embrace the concepts of participation and collaboration among users.
MP3.com, started because its founder realized that many people were searching for “mp3,” was originally launched as a website where members could share their MP3 files with each other.
The original MP3.com ceased to operate at the end of 2003. CNET now operates the domain name, supplying artist information and other metadata regarding audio files.
The first iteration of MP3.com featured charts defined by genre and geographical area, as well as statistical data for artists indicating which of their songs were more popular. Artists could subscribe to a free account, a Gold account, or a Platinum account, each providing additional features and stats. Though there was no charge for downloading music from MP3.com, people did have to sign up with an email address, and online advertisements were commonplace across the site. Although MP3.com hosted songs from known artists, the vast majority of its catalog consisted of songs by unsigned or independent musicians and producers. Eventually MP3.com launched “Pay for Play,” which was a major upset to the established music industry. The idea was that each artist would receive payments based on the number of listens or downloads from the MP3.com site.
The original technical model that MP3.com employed was a typical client/server pattern using a set of centralized servers, as shown in Figure 3.13, “Typical client/server architecture model”.
MP3.com engineers eventually changed to a new model (perhaps due to scalability issues) that used a set of federated servers acting as proxies for the main server. This variation of the original architectural pattern—depicted in Figure 3.14, “Load-balanced client/server pattern with proxies” using load balancing and clusters of servers—was a great way to distribute resources and balance loads, but it still burdened MP3.com with the expense of hosting files. (Note that in a P2P system, clients are referred to as “nodes,” as they are no longer mere receivers of content: each node in a P2P network is capable of acting as both client and server.)
In Figure 3.14, “Load-balanced client/server pattern with proxies”, all nodes first communicate with the load balancing server to find out where to resolve or retrieve the resources they require. The load balancing server replies based on its knowledge of which proxies are in a position to serve the requested resources. Using that information, each node then makes a direct request to the appropriate proxy. This pattern is common in many web architectures today.
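The lookup-then-fetch sequence in Figure 3.14 can be sketched as a small directory object. The class and method names, and the least-busy selection rule, are assumptions for illustration; the figure doesn’t specify how the load balancing server chooses among proxies that hold the same resource.

```python
from typing import Dict, Iterable, Set


class LoadBalancer:
    """Hypothetical directory that maps resources to the proxies serving them."""

    def __init__(self) -> None:
        self.holdings: Dict[str, Set[str]] = {}  # proxy -> resources it can serve
        self.active: Dict[str, int] = {}         # proxy -> requests in progress

    def register_proxy(self, proxy: str, resources: Iterable[str]) -> None:
        self.holdings[proxy] = set(resources)
        self.active.setdefault(proxy, 0)

    def resolve(self, resource: str) -> str:
        """Tell a node which proxy to request the resource from."""
        candidates = [p for p, held in self.holdings.items() if resource in held]
        if not candidates:
            raise KeyError(resource)
        proxy = min(candidates, key=lambda p: (self.active[p], p))
        self.active[proxy] += 1
        return proxy

    def release(self, proxy: str) -> None:
        """Called when the node's direct download from the proxy finishes."""
        self.active[proxy] -= 1


lb = LoadBalancer()
lb.register_proxy("proxy-east", ["song1.mp3", "song2.mp3"])
lb.register_proxy("proxy-west", ["song1.mp3"])
first = lb.resolve("song1.mp3")   # both idle: ties go to the alphabetically first name
second = lb.resolve("song1.mp3")  # proxy-east is now busier, so proxy-west is chosen
```

Every file still lives on servers the operator pays for; the balancer only spreads the load, which is the cost burden the text attributes to this pattern.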
Napster took a different path. Rather than maintaining the overhead of a direct client/server infrastructure, Napster revolutionized the industry by introducing the concept of a shared, decentralized P2P architecture. It worked quite differently from the typical client/server model but was very similar conceptually to the BitTorrent model. One key central component remained, however: a list of all the peers and the files they shared, kept for easy searching. This component not only created scalability issues, but also exposed the company to the legal liability that ultimately did it in.
Napster also introduced a pattern of “Opting Out, Not Opting In.” As soon as you downloaded and installed the Napster client software, you became, by default, part of a massive P2P network of music file sharers. Unless you specifically opted out, you remained part of the network. This allowed Napster to grow at an exponential rate. It also landed several Napster users in legal trouble, as they did not fully understand the consequences of installing the software.
P2P architectures can generally be classified into two main types. The first is a pure P2P architecture, where each node acts as both a client and a server. There is no central server or DNS-type node to coordinate traffic, and all traffic is routed based on each node’s knowledge of other nodes and the protocols used. BitTorrent, for example, can operate in this mode. This type of network architecture (also referred to as an ad hoc architecture) works when nodes are configured to act as both servers and clients. It is conceptually similar to how mobile radios work, except that it uses point-to-point rather than broadcast communication. Figure 3.15, “Ad hoc P2P network” depicts this type of network.
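The text doesn’t specify how a node in an ad hoc network finds a file it doesn’t hold. One classic technique, used by early Gnutella-style networks, is to forward the query from neighbor to neighbor with a hop limit (a time to live, or TTL). BitTorrent’s trackerless mode instead relies on a distributed hash table, so treat this sketch as an illustration of the pure P2P idea rather than of BitTorrent itself. The node names, the `neighbors` and `files` maps, and `flood_search` are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def flood_search(neighbors: Dict[str, List[str]], files: Dict[str, Set[str]],
                 start: str, wanted: str, ttl: int) -> List[str]:
    """Forward a query hop by hop (at most ttl hops); return nodes holding wanted."""
    seen = {start}
    queue = deque([(start, 0)])
    hits = []
    while queue:
        node, hops = queue.popleft()
        if node != start and wanted in files.get(node, set()):
            hits.append(node)
        if hops == ttl:
            continue  # query expires here; do not forward further
        for peer in neighbors.get(node, []):
            if peer not in seen:  # each node handles a given query only once
                seen.add(peer)
                queue.append((peer, hops + 1))
    return hits


links = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A"], "D": ["B", "E"], "E": ["D"]}
shared = {"D": {"song.mp3"}, "E": {"song.mp3"}}
print(flood_search(links, shared, "A", "song.mp3", ttl=2))
```

No node has a global view: each one knows only its own neighbors, and the TTL trades search coverage (node E is missed at ttl=2) against the amount of traffic a single query generates.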
In this pure-play P2P network, no central authority determines or orchestrates the actions of the other nodes. By comparison, a centrally orchestrated P2P network includes a central authority that takes care of orchestration and essentially acts as a traffic cop, as shown in Figure 3.16, “Centrally orchestrated P2P network”.
The control node in Figure 3.16, “Centrally orchestrated P2P network” keeps track of the status and libraries of each peer node to help orchestrate where other nodes can find the information they seek. Peers themselves store the information and can act as both clients and servers. Each node is responsible for updating the central authority regarding its status and resources.
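The control node’s bookkeeping can be sketched as a small in-memory index. `ControlNode`, `update`, and `locate` are hypothetical names; the sketch assumes each peer reports its full library and online status, and that the control node answers lookups only with peers it believes are online. The file transfers themselves would then happen directly between peers.

```python
from typing import Dict, Iterable, List, Set


class ControlNode:
    """Hypothetical central authority tracking each peer's status and library."""

    def __init__(self) -> None:
        self.libraries: Dict[str, Set[str]] = {}  # peer -> titles it stores
        self.online: Set[str] = set()

    def update(self, peer: str, library: Iterable[str], online: bool = True) -> None:
        """Peers call this to report their current status and resources."""
        self.libraries[peer] = set(library)
        if online:
            self.online.add(peer)
        else:
            self.online.discard(peer)

    def locate(self, title: str) -> List[str]:
        """Return online peers that report holding title (sorted for stable output)."""
        return sorted(p for p in self.online if title in self.libraries.get(p, set()))


control = ControlNode()
control.update("peer-1", ["a.mp3", "b.mp3"])
control.update("peer-2", ["a.mp3"])
print(control.locate("a.mp3"))
```

The index is only as accurate as the peers’ reports, which is why the text makes each node responsible for keeping the central authority up to date.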
Napster itself was a sort of hybrid P2P system, allowing direct P2P traffic and maintaining some control over resources to facilitate resource location. Figure 3.17, “Conceptual view of Napster’s mostly P2P architecture” shows the classic Napster architecture.
Napster’s central directories tracked the titles of content in each P2P node. When users signed up for and downloaded Napster, they ended up with the P2P node software running on their own machines. This software pushed information to the Napster domain. Each node searching for content first communicated with the IP Sprayer/Redirector via the Napster domain. The IP Sprayer/Redirector maintained knowledge of the state of the entire network via the directory servers and redirected each requesting node to nodes that were able to fulfill its request. Napster and other companies such as LimeWire are based on hybrid P2P patterns because they also allow direct node-to-node ad hoc connections for some types of communication.
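The hybrid flow described above (a central lookup followed by a direct transfer between nodes) can be simulated in a few dozen lines. Everything here is a hypothetical stand-in: `Directory` plays the role of the directory servers plus the IP Sprayer/Redirector, `Node` plays the client software, and the redirector’s rule of favoring the source it has used least is an assumption, not documented Napster behavior. Each completed download is published back to the directory, so the downloading node becomes another source.

```python
from typing import Dict, Iterable, Optional, Set


class Directory:
    """Hypothetical stand-in for the directory servers plus the redirector."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[str]] = {}  # title -> nodes that report it
        self.served: Dict[str, int] = {}      # node -> times chosen as a source

    def publish(self, node: str, titles: Iterable[str]) -> None:
        for title in titles:
            self.index.setdefault(title, set()).add(node)

    def redirect(self, title: str, requester: str) -> str:
        """Pick a source for title, favoring the least-used node (an assumption)."""
        sources = self.index.get(title, set()) - {requester}
        if not sources:
            raise LookupError(title)
        chosen = min(sources, key=lambda n: (self.served.get(n, 0), n))
        self.served[chosen] = self.served.get(chosen, 0) + 1
        return chosen


class Node:
    """Client software: stores files and pushes its title list to the directory."""

    def __init__(self, name: str, directory: Directory,
                 files: Optional[Dict[str, bytes]] = None) -> None:
        self.name = name
        self.directory = directory
        self.files: Dict[str, bytes] = dict(files or {})
        directory.publish(name, self.files)

    def fetch(self, title: str, network: Dict[str, "Node"]) -> str:
        source = self.directory.redirect(title, self.name)  # central lookup
        self.files[title] = network[source].files[title]    # direct node-to-node copy
        self.directory.publish(self.name, [title])          # now another source
        return source


directory = Directory()
alice = Node("alice", directory, {"song.mp3": b"ID3..."})
bob = Node("bob", directory)
carol = Node("carol", directory)
network = {n.name: n for n in (alice, bob, carol)}
first_source = bob.fetch("song.mp3", network)
second_source = carol.fetch("song.mp3", network)
print(first_source, second_source)
```

The second requester is sent to bob, the first downloader, rather than back to alice, so the original holder’s load stays flat as demand grows. The directory, however, remains a single point that sees every title and every request.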
Both Napster and MP3.com, despite now being defunct, revolutionized the music industry. MySpace.com has since added a new dimension to the mix: social networking. Social networking layered on top of the music distribution model continues to evolve, creating new opportunities for musicians and fans.