Distributed Systems Topologies: Part 2
by Nelson Minar
01/08/2002
In the first part of this two-part series, I presented a 10,000-foot view of a framework for comparing distributed systems based on their topology. In this second part, I introduce seven criteria for evaluating a system design and assess each topology against them. Systems with hybrid topologies often combine the advantages of their constituent designs.
Evaluating System Topologies
In the first part of this series I described distributed systems in terms of their core topologies: centralized, decentralized, rings, hierarchies, and hybrids. Now I take advantage of this description by using it to evaluate system designs.
In this second part, I describe seven characteristics of distributed systems that are commonly used when talking about system design and then analyze each characteristic for each of the topologies. The caution I gave about the topology descriptions applies here as well: this is a high-level analysis. The observations made here are generalizations and may not apply to any specific system. The intent is to develop a broad framework for considering system design that can then be applied to specific domains.
Seven Evaluation Properties
For this article, I boil down all possible ways to evaluate distributed systems into seven properties. While not exhaustive, this set is chosen because these properties are often used when talking about the advantages or disadvantages of decentralized systems. The resulting framework is a useful shorthand for thinking about system design.
- Manageability
- How hard is it to keep the system working? Complex systems require management: updating, repairing, and logging.
- Information coherence
- How authoritative is information in the system? If a bit of data is found in the system, is that data correct? Non-repudiation, auditability, and consistency are particular aspects of information coherence.
- Extensibility
- How easy is it to grow the system, to add new resources to it? The Web is the ultimate extensible system; anyone can create a new Web server or Web page and immediately have that contribution be part of the Web.
- Fault Tolerance
- How well can the system handle failures? Fault tolerance is a necessity in large distributed systems.
- Security
- How hard is it to subvert the system? Security covers a variety of topics, such as preventing people from taking over the system, injecting bad information, or using the system for a purpose other than the one its owners intend.
- Resistance to lawsuits and politics
- How hard is it for an authority to shut down the system? The designers of Gnutella or Freenet consider their resistance to lawsuits to be one of their best features. Other parties consider this property to be a danger.
- Scalability
- How large can the system grow? Scalability is often promoted as a key advantage of decentralized systems over centralized, although the reality is more complex.
Evaluating Simple Topologies
With these seven concepts in mind, we can look at each of the basic system topologies and evaluate their effectiveness.
Centralized
| Manageable | Yes |
| Coherent | Yes |
| Extensible | No |
| Fault-Tolerant | No |
| Secure | Yes |
| Lawsuit-Proof | No |
| Scalable | ? |
The primary advantage of centralized systems is their simplicity. Because all data is concentrated in one place, centralized systems are easily managed and have no questions of data consistency or coherence.
Centralized systems are also relatively easy to secure: there is only one host that needs to be protected. The drawback of centralization is that everything is in only one place. If the central server goes down, everything does. There is no fault tolerance, and the system is easy to shut down with a lawsuit. Centralized systems are also often hard to extend -- resources can only be added to the central system.
The scalability of centralized systems is subtle. Scale is clearly limited by the capacity of the server, and so centralized systems are often thought of as unscalable. But computers are very fast and a single computer can often support all the demands of its users.
For example, a modest computer running a Web server can easily handle hundreds of thousands of visitors a day. And unlike more complex topologies, the scalability of a centralized system is very easy to measure. So while, theoretically, centralized systems are not scalable, in practice they often suffice.
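To put a rough number on that claim, the back-of-the-envelope sketch below converts a daily visitor count into an average request rate. The figures of 300,000 visitors, 10 requests per visit, and a peak factor of 5 are illustrative assumptions, not measurements from any real site.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def average_requests_per_second(visitors_per_day, requests_per_visit):
    """Average request rate implied by a daily visitor count."""
    return visitors_per_day * requests_per_visit / SECONDS_PER_DAY

def peak_requests_per_second(visitors_per_day, requests_per_visit, peak_factor):
    """Average rate scaled by an assumed busiest-hour multiplier."""
    return average_requests_per_second(visitors_per_day, requests_per_visit) * peak_factor

# Hypothetical workload: all three inputs are assumptions for illustration.
avg = average_requests_per_second(300_000, 10)
peak = peak_requests_per_second(300_000, 10, 5)
print(f"average: {avg:.1f} req/s, peak: {peak:.1f} req/s")
```

Under these assumptions the average works out to roughly 35 requests per second, which illustrates why the capacity of a single server is comparatively easy to reason about.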
Ring
| Manageable | Yes |
| Coherent | Yes |
| Extensible | No |
| Fault-Tolerant | Yes |
| Secure | Yes |
| Lawsuit-Proof | No |
| Scalable | Yes |
Ring systems typically have a single owner. This concentration gives them many of the same advantages as centralized systems: they are manageable, coherent, and relatively secure from tampering.
The added complexity of the ring is mitigated by fairly simple rules for propagating state between the nodes in a ring. But the single-owner restriction means rings are also not extensible: a user still needs the owner's permission to add a resource like a music file or a Web page into the ring. Similarly, a lawsuit only needs to shut down the owner to shut down the whole ring.
The advantages of rings over centralized systems are fault tolerance and simple scalability. If a host goes down in a ring, failover logic makes it a simple matter to have another host cover the problem. And well-designed rings are scalable -- one can simply add more hosts to the ring and expand the capacity nearly linearly.
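The failover behavior described above can be sketched in a few lines. This is a toy model, not any particular product's logic: the `Ring` class, the host names, and the CRC-based placement rule are all assumptions made for illustration.

```python
import zlib

class Ring:
    """A toy ring of hosts run by a single owner (hypothetical model)."""

    def __init__(self, hosts):
        self.hosts = list(hosts)
        self.down = set()

    def add_host(self, host):
        # Adding capacity is just appending another host to the ring.
        self.hosts.append(host)

    def mark_down(self, host):
        self.down.add(host)

    def route(self, key):
        """Pick a host for `key`, walking around the ring past failed hosts."""
        start = zlib.crc32(key.encode()) % len(self.hosts)
        for step in range(len(self.hosts)):
            host = self.hosts[(start + step) % len(self.hosts)]
            if host not in self.down:
                return host
        raise RuntimeError("every host in the ring is down")

ring = Ring(["host-a", "host-b", "host-c"])
primary = ring.route("user-42")
ring.mark_down(primary)          # simulate a failure
backup = ring.route("user-42")   # the next live host covers for it
print(primary, "->", backup)
```

Note that the simple modulo placement used here reshuffles keys whenever a host is added; real systems often use more careful placement schemes to avoid that.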
Hierarchical
| Manageable | Partially |
| Coherent | Partially |
| Extensible | Partially |
| Fault-Tolerant | Partially |
| Secure | No |
| Lawsuit-Proof | No |
| Scalable | Yes |
Hierarchical systems have a completely different set of advantages from rings. They are somewhat manageable in that they have a clear chain of action, but because these systems have such a broad scope, it can be hard to correct a host with a problem. Coherence is usually achieved with a cache-consistency strategy, which is effective but not complete.
Hierarchical systems are extensible in that any host in the system can add data, but the rules of data management may limit what information can be added. (For example, the oreilly.com DNS server can add hosts for oreilly.com, but for no one else.)
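The delegation rule in that example can be sketched as a simple suffix check. This is an illustration of the idea only, not how DNS servers are actually implemented; the `Zone` class and the addresses (taken from a range reserved for documentation) are hypothetical.

```python
class Zone:
    """A node in a naming hierarchy, authoritative for one suffix only."""

    def __init__(self, suffix):
        self.suffix = suffix
        self.records = {}

    def add_host(self, name, address):
        # A zone may only add names at or below its own suffix.
        if name != self.suffix and not name.endswith("." + self.suffix):
            raise PermissionError(f"{self.suffix} is not authoritative for {name}")
        self.records[name] = address

oreilly = Zone("oreilly.com")
oreilly.add_host("www.oreilly.com", "192.0.2.10")      # allowed: inside the zone

try:
    oreilly.add_host("www.example.org", "192.0.2.20")  # rejected: someone else's zone
except PermissionError as err:
    print("rejected:", err)
```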
Hierarchical systems are more fault-tolerant and lawsuit-proof than centralized systems, but the root is still a single point of failure. They tend to be harder to secure than centralized systems. If a node high in the hierarchy is subverted or spoofed, the whole system suffers. And it's not just the root that is a risk: if data travels up the branches to the root, then leaf nodes may be able to inject bad information into the system.
The primary advantage of hierarchical systems is their incredible scalability -- new nodes can be added at any level to cover for too much load. This scalability is best demonstrated in DNS, which has scaled over the last 15 years from a few thousand hosts to hundreds of millions. The relative simplicity and openness of hierarchical systems, in addition to their scalability, make them a desirable option for large Internet systems.
Decentralized
| Manageable | No |
| Coherent | No |
| Extensible | Yes |
| Fault-Tolerant | Yes |
| Secure | No |
| Lawsuit-Proof | Yes |
| Scalable | Maybe |
Decentralized systems such as Gnutella have almost exactly the opposite characteristics of centralized systems. The far-flung nature of these networks means the systems tend to be difficult to manage and that data in the system is never fully authoritative. They also tend to be insecure, in the sense that it is easy for a node to join the network and start putting bad data into the system.
A primary virtue of decentralized systems is their extensibility. For example, in Gnutella any node can join the network and instantly make new files available to the whole network. Decentralized systems also tend to be fault-tolerant and harder to sue. The failure or shutdown of any particular node does not impact the rest of the system.
The scalability of decentralized systems is hard to evaluate. In theory, the more hosts you add, the more capable a decentralized network becomes. In practice, the algorithms required to keep a decentralized system coherent often carry a lot of overhead. If that overhead grows with the size of the system, then the system may not scale well. The Gnutella network suffered this problem in the early stages, and it remains to be seen if Gnutella can ever scale to the millions of active users that more centralized architectures enjoy. Scalability of decentralized systems remains an active research topic.
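One way to see how overhead can grow with the size of the system is to count the messages produced by a flooded search, in which every node forwards a query to all of its neighbors until a hop limit runs out. The sketch below is an idealized upper bound, assuming every node has the same number of neighbors and the overlay contains no cycles; the degree of 4 is an illustrative assumption.

```python
def flood_messages(degree, ttl):
    """Upper bound on messages generated by one query flooded for `ttl` hops.

    Assumes each node has `degree` neighbors and the overlay has no cycles,
    so every node after the first forwards to degree - 1 new neighbors.
    """
    total = 0
    frontier = degree              # messages sent on the first hop
    for _ in range(ttl):
        total += frontier
        frontier *= degree - 1     # each recipient forwards to its other neighbors
    return total

for ttl in (1, 3, 5, 7):
    print(ttl, flood_messages(4, ttl))
```

In this model, the cost of a single query grows geometrically with the hop limit, so reaching more of a larger network quickly becomes expensive for every participant.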
Evaluating Hybrid Topologies
System topologies become even more interesting when you combine them into hybrid architectures. Often, different topologies are chosen for different parts of a system to get the best of the strengths without the weaknesses.
Centralized + Ring
| Manageable | Yes |
| Coherent | Yes |
| Extensible | No |
| Fault-Tolerant | Yes |
| Secure | Yes |
| Lawsuit-Proof | No |
| Scalable | Yes |
Systems that have a ring as their central server often combine the simplicity of centralization with the redundancy of a ring. The hybrid system is still easily managed, coherent, and secure; the ring does not add much complexity over a purely centralized system.
This combination still has a single owner and therefore is not particularly extensible or lawsuit-proof. The key advantage is that using a ring as the server adds fault-tolerance and scalability. The power and simplicity of combining rings with centralized systems explain why this architecture is so popular with serious server-based applications such as Web commerce and high-availability databases.
Centralized + Decentralized
| Manageable | No |
| Coherent | Partially |
| Extensible | Yes |
| Fault-Tolerant | Yes |
| Secure | No |
| Lawsuit-Proof | Yes |
| Scalable | Apparently |
A system combining centralized and decentralized systems enjoys some of the advantages of both. Decentralization contributes to the extensibility, fault-tolerance, and lawsuit-proofing of the system. The partial centralization makes the system more coherent than a purely decentralized system, as relatively few hosts hold authoritative data. The hybrid is about as difficult to manage as a purely decentralized system, and it is no more secure.
The amazing story is the scalability of this hybrid. Internet email runs very well for hundreds of millions of users and has grown enormously since its initial design. FastTrack-based systems have grown very quickly with none of the slowdowns that plagued Napster or Gnutella in their growth. There is growing interest in this kind of hybrid topology as an excellent architecture for peer-to-peer systems.
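A minimal sketch of this hybrid follows, assuming a design in which leaf clients register their files with one supernode and supernodes query each other on behalf of their leaves. It is not the actual FastTrack protocol; the `Supernode` class and all names in it are hypothetical.

```python
class Supernode:
    """Acts as a central index for its leaves and as a peer to other supernodes."""

    def __init__(self, name):
        self.name = name
        self.index = {}    # filename -> set of leaf names holding it
        self.peers = []    # other supernodes (the decentralized layer)

    def register(self, leaf, filenames):
        # Centralized part: a leaf reports its files to a single supernode.
        for filename in filenames:
            self.index.setdefault(filename, set()).add(leaf)

    def search(self, filename, ask_peers=True):
        # Decentralized part: consult peer supernodes, one hop only.
        hits = set(self.index.get(filename, ()))
        if ask_peers:
            for peer in self.peers:
                hits |= peer.search(filename, ask_peers=False)
        return hits

a, b = Supernode("a"), Supernode("b")
a.peers, b.peers = [b], [a]
a.register("leaf1", ["song.mp3"])
b.register("leaf2", ["song.mp3", "talk.ogg"])
print(sorted(a.search("song.mp3")))
print(sorted(a.search("talk.ogg")))
```

Because only the supernodes exchange queries, search traffic scales with the number of supernodes rather than with the total number of leaf clients.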
Conclusions
The first conclusion is that a decentralized system is not always better or worse than a centralized system. The choice depends entirely on the needs of the application. The simplicity of centralized systems makes them easier to manage and control, while decentralized systems grow better and are more resistant to failures or shutdowns.
As for scalability, the story is not clear. Centralized systems have limited scale, but that limit is easy to understand. In contrast, decentralized systems offer the possibility of massive scalability, but in practice that can be very hard to achieve.
The second conclusion is the power of creating hybrid topologies. In centralized+ring systems, the ring covers many of the drawbacks of a purely centralized approach, providing easy scalability and fault tolerance. And centralized+decentralized systems are showing powerful scalability and extensibility while retaining some of the coherence of centralized systems.
System designers have to evaluate the requirements for their particular area and pick a topology that matches their needs. We are not limited to a few simple topologies; topologies can be combined to make hybrids. And while centralized systems are doing a lot of the work on the Internet, there is a lot of exciting potential in decentralized systems. In particular, combining decentralized topologies with other simpler topologies is a powerful approach.
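As a shorthand for that selection process, the ratings from the tables in this article can be encoded as data and filtered by requirement. The sketch below transcribes the ratings as given above; the `candidates` helper and its strict "yes only" rule are assumptions made for illustration.

```python
PROPERTIES = ("manageable", "coherent", "extensible", "fault_tolerant",
              "secure", "lawsuit_proof", "scalable")

# Ratings transcribed from the tables in this article, in PROPERTIES order.
RATINGS = {
    "centralized":               ("yes", "yes", "no", "no", "yes", "no", "?"),
    "ring":                      ("yes", "yes", "no", "yes", "yes", "no", "yes"),
    "hierarchical":              ("partially", "partially", "partially",
                                  "partially", "no", "no", "yes"),
    "decentralized":             ("no", "no", "yes", "yes", "no", "yes", "maybe"),
    "centralized+ring":          ("yes", "yes", "no", "yes", "yes", "no", "yes"),
    "centralized+decentralized": ("no", "partially", "yes", "yes", "no", "yes",
                                  "apparently"),
}

def candidates(required):
    """Topologies rated a plain "yes" for every property in `required`."""
    wanted = [PROPERTIES.index(name) for name in required]
    return [topology for topology, ratings in RATINGS.items()
            if all(ratings[i] == "yes" for i in wanted)]

print(candidates(["coherent", "fault_tolerant"]))
print(candidates(["extensible", "lawsuit_proof"]))
```

A real evaluation would weigh the "partially" and "maybe" ratings rather than discard them, but even this crude filter shows how the requirements narrow the choice of topology.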
Nelson Minar was co-founder of Popular Power.
Showing messages 1 through 7 of 7.
-
p2p
2004-07-13 01:23:27 silis
Dear Nelson, I have read both of your articles on distributed topologies and I have trouble understanding the differences and similarities between p2p and decentralized topologies. Are they the same, or is p2p just based on decentralized distributed systems? Moreover, is the decentralized topology the same for p2p as the diagram in your article shows?
-
Centralized and Decentralized systems
2003-01-01 10:22:43 anonymous2
Hello Nelson Minar,
I have read your great article dated 14/12/2001, titled “Distributed Systems Topologies: Part 1”.
I am having trouble distinguishing between the terms “centralized system” and “distributed system”. Could you please answer the following questions to make the distinction between these two systems clear to me?
1. Is a “decentralized system” the same as a “distributed system” in concept, with the difference being in the terms only?
2. What is the major difference between a “centralized system” and a “distributed system”? Is the difference that in a centralized system all the application interfaces, computational processes (functionality), and data retrieval are on the central machine (such as a mainframe), whereas the clients are dummies that do not contribute any functionality except as I/O devices (i.e., clients are just keyboards and monitors and do not have processors or storage devices of their own)? And that in a distributed system not all of the functionality is on the central machine (server); instead, the clients share some of the computation and functionality with the server (so they are not dummies, in the sense that they have processors, storage devices, etc.)?
3. You mentioned that: “The debate between centralized and decentralized systems is fundamentally about topology -- in other words, how the nodes in the system are connected. Topology can be considered at many different levels: physical, logical, connection, or organizational.” Can you clarify the difference between “physical and connection” and between “logical and organizational” in this context?
4. You mentioned “Centralized+Centralized” and “Centralized+Decentralized” as two of the basic distributed systems topologies. Did you mean that in “Centralized+Centralized” there is no connection between the clients themselves (here, in this case, the stacked servers that are central nodes for other nodes), but each of them connects directly to the central server, whereas in “Centralized+Decentralized” there is a connection between them, and this connection is also connected to the central machine (server)?
Thank you for your time and efforts.
With lot of respect
Hikmat Jaber
hikmat101@hotmail.com
hikmat99@maktoob.com
-
Centralized and Decentralized systems
2003-02-17 10:05:37 Nelson Minar
The term "distributed system" means different things to different people. I use it to mean any system that involves a network. In that usage centralized, decentralized, rings, hybrids, etc are all "distributed systems".
[logical vs physical topology]
My article is a bit sloppy on this point. One way you can look at a topology is the physical fact of how the bits move from machine to machine, the details of the routing. That view can be nonsensical; for instance all Internet systems are in some sense decentralized because that's the way Internet routing works. So it's often useful to look more at the logical structure of the system. oreilly.com is logically a single centralized web site, even if physically the site itself is organized as a ring and your web browser's packets get there via a decentralized system.
[centralized+decentralized connections]
In the centralized+decentralized system I describe in the article, what I mean is that you have a group of decentralized hosts acting as centralized servers to leaf clients. I recommend taking a look at the KaZaA/FastTrack design and its use of supernodes as a good example of this.
-
Here's a new one
2002-01-14 17:00:03 pfh
I've recently been working on a P2P program that doesn't fit into any of the categories described in this article:
http://yoyo.cc.monash.edu.au/~pfh/circle/
I hope to have more details up on this site in a few weeks.
The basic idea is to divide a hashtable over all nodes in the network, and connect them up in such a way that the node with the information you want can be found easily. It is (theoretically) both very scalable and totally decentralized.
-
The problem of decidability
2002-01-10 06:23:46 hhh
I think the ability to decide whether an event has happened or not is very important, because many applications rely on this ability (e.g., databases, DNS, DSM, etc.). Decentralized systems often lack this ability, because the only way to decide anything is to search through all nodes. Imagine searching through all of Gnutella to decide whether a file exists - this is next to impossible (I didn't say impossible :o).
This property is part of the 'Information coherence' property, but I think it should be a specific part of the analysis.
Currently I am doing my master's thesis, and I have had good results using a hierarchical topology with a decentralized topology as a safety net to increase fault-tolerance. This is all in its early stages, but the hierarchical topology gives decidability and preliminary tests show good scalability.
Hans Henrik Happe
-
The problem of decidability
2002-01-16 03:30:36 kiwipeso
I'm making a protocol (Samizdat) that is Decentralized Hierarchy based; it uses a list of files & metadata for peers that is sent to servers on the network.
Only servers need to check the user lists to find files; the protocol can also send you the closest or fastest connection to the file you want.
This is a key part of the operating system I am developing (KAOS)
-
The problem of decidability
2002-01-10 10:17:02 Nelson Minar
Thanks for the thoughts! I had a hard time defining "information coherence" for this article. I agree, "decidability" is a well-defined and well-studied concept that gets at some of the issues of coherence.
Academic research on distributed systems has done a huge amount of work on the issues my article is about. My article doesn't really do that work justice; for folks who want to know more, there's a lot to learn!










