Gnutella Blown Away? Not Exactly
by Serguei Osokine
07/10/2001
This is a comment on two recent articles about MusicCity (Kazaa) and Gnutella on OpenP2P.com -- "Morpheus Out of the Underworld" by Kelly Truelove and Andrew Chasin, and "Is All Music File-Sharing Piracy?" by Richard Koman.
And when I say "a comment," I mean a comment -- it is not a criticism. In fact, I liked both articles a lot. If anything, this "comment" is a case study in how information propagates over the Net, and in how myths are created.
While Kelly and Andrew's article provides a very thoughtful analysis of the MusicCity Network, it contains one very interesting statement:
"...Gnutella, which Clip2 and Lime Wire have independently estimated with simultaneous users of about 40,000 (mean). The Morpheus-KaZaA count stands at over 300,000 at present..."
This statement is entirely correct by itself -- the estimates of both Clip2 and LimeWire are usually pretty accurate.
What is missing here is the context. What these numbers -- 40,000 and 300,000 -- tell us is the number of machines connected to each network at any given moment, which is not the same as the number of people who use that network.
In fact, later on, Kelly and Andrew say:
"...Also by default, Morpheus is launched and run in the background when the PC is booted. Further, the Morpheus application does not close when its window is closed; it simply minimizes to the system tray and keeps running in the background."
Taken together, these two quotes paint a very clear picture of MusicCity as a network that uses "daemon clients" to maximize the clients' (and thus the P2P file servers') uptime and the content availability. A very sensible move, by the way. Over the past year I've probably spent a couple of thousand hours on Gnutella analysis and development, and I remember hearing similar suggestions on the Gnutella development forums and in private discussions quite a few times.
However, I think that this suggestion was never implemented in the mainstream Gnutella clients. Normally the Gnutella client is just another application -- not a daemon. You start it, you look for some files, you download them, and then you close the application. Your client is part of the network, and your files are shared, only while you are looking for stuff yourself. If my memory serves me right, the average length of a Gnutella client session is just over one hour -- probably several times shorter than the session length of the MusicCity client. Naturally, for the same number of daily users, the Gnutella network at any given moment will be several times smaller than the MusicCity network.
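The relationship between session length and network size can be made concrete with a minimal sketch in Python. It rests on simplifying assumptions that are not from any measurement: each user runs one session per day, and sessions are spread evenly over 24 hours. The function name `simultaneous_users`, the figure of 240,000 daily users, and the 8-hour "daemon" session are all hypothetical, chosen only to illustrate the arithmetic.

```python
def simultaneous_users(daily_users: int, avg_session_hours: float) -> float:
    """Average number of clients online at the same moment.

    Assumption: one session per user per day, with start times spread
    evenly over 24 hours. Under that assumption the average concurrency
    is simply (arrivals per hour) * (average session length in hours).
    """
    return daily_users * avg_session_hours / 24.0


# Hypothetical figures, for illustration only (not measured data).
daily = 240_000
app_style = simultaneous_users(daily, 1.0)     # client closed after ~1 hour
daemon_style = simultaneous_users(daily, 8.0)  # client left running ~8 hours

print(app_style, daemon_style, daemon_style / app_style)
```

With these made-up inputs the same population of daily users produces a network eight times larger at any given moment, purely because each client stays connected eight times longer.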
It is difficult to say why Gnutella clients work this way. Remember, Gnutella clients are developed by many independent vendors, and it is difficult to "reverse-engineer" their design decisions. If I had to guess, the reason might be that the current Gnutella network with its 40,000 simultaneous users is bigger than the file search query propagation radius anyway -- the search query can probably reach only about 5,000 to 10,000 hosts. (Which is a lot -- you can find pretty much everything you'd want on such a network.) So the network already has more content than can be searched by a single client. Thus the thinking probably is: "why should we take measures to increase the network size if we cannot fully search even the current network?"
Well, I think that this reasoning is a mistake, because the issue here is not only how many other clients you can search, but also how sure you can be that the other client won't be shut down in the middle of the file transfer. "Daemon clients" obviously increase the probability of a successful download, and failed downloads are right now the most serious problem for the Gnutella network. Still, every client developer has his own reasons for doing or not doing something. This diversity can be viewed as an advantage of the Gnutella network, in fact -- no vendor can crash the network even if his latest client version has some very serious bugs. (And, by the way, I am sorry if there are daemon clients for Gnutella; I'm just not aware of them.)
So Kelly and Andrew made a very clear and correct technical statement. OK, I did wince when I read it, since I had a premonition that someone was going to misinterpret it, but still the statement was pretty clear if you kept the whole article in mind -- it never mentioned the number of people using Gnutella or MusicCity over, say, a one-day or one-week period. The total number of users was also never mentioned.
And then -- just four days later -- came the second article. In this article, Richard Koman says:
"...MusicCity regularly boasts more than 300,000 simultaneous users, blowing away not only Napster but also Gnutella, which averages a mean of about 40,000 users."
Note the use of the expression "blow away" here. This is not a technical statement anymore, is it? What we have here is a subtle message that the wonderful MusicCity network (no irony here, it really is pretty cool) attracts eight times more people than the Gnutella network.
Now it's about time for the disclaimer. Here goes: I have no idea how many daily, weekly or total users there are in the Gnutella or MusicCity networks.
OK, I might have some projections for Gnutella Network -- after all, I've seen quite a lot of statistical data about it. Still, I do not have enough data to do a comparison between the total numbers of users in these networks. For all I know, Gnutella can have more, less or the same number of users as MusicCity. What I'm saying is that nothing in Kelly and Andrew's article provides any information to make such a comparison even possible -- much less any basis to say that MusicCity "blows away" Gnutella in terms of the user base.
Yes, the Gnutella network is smaller -- but does it tell us anything about how many people prefer Gnutella to MusicCity or the other way around? No. It sure doesn't.
On the other hand, I cannot blame Richard for falling into this trap -- if I had not spent lots of time analyzing the Gnutella network, I might have ended up with the same impression myself after reading the article by Kelly and Andrew (and again, they are not to blame -- everything they said was entirely correct, although maybe just a little bit too vague).
This is a very interesting situation: a piece of information (call it a meme or whatever) starts a life of its own. Sort of like a ... not a virus, right? There's no executable code in these "40,000/300,000" statements, after all... Call it a "prion", maybe? A small piece of protein that cannot even replicate on its own, but is capable of transmitting Mad Cow Disease all right. I mean, I am afraid to even think how Richard's article will be quoted in a few days or weeks. How about this:
"MusicCity Blows Gnutella Away!!!"
And what is interesting, it is always Gnutella that is on the receiving end of such "prion diseases". I still remember the February 2001 article that stated that Gnutella is doomed because it does not scale. I was rolling on the floor laughing after I read it, because by then it was already pretty clear to all the insiders that today Gnutella is the only truly and infinitely scalable P2P network (if you wonder why, and don't mind working through a bit of math, see the flow control algorithm design document).
The joke was on me, though. In the months that followed, every time I had a conversation with literally anyone who knew about P2P but was not intimately familiar with the Gnutella design, the first question I had to answer invariably was: "But I've heard that Gnutella does not scale?" Which translates as: "How come you are doing something that pathetic?"
Oh, well. Maybe it is just that the distributed, "P2P" marketing is less effective than the centralized one?
And in any case, let's be careful with these numbers, OK? :-)
-
Filesharing goes well!
2003-01-29 06:58:42 anonymous2
Despite Morpheus' popularity, I can still use Gnutella as a very useful tool. It is actually...
And the article is right that Gnutella is not run in the background all the time, while Morpheus/KaZaa usually are, so it really gives a clue why Gnutella *seems* to be unpopular.
What can I say? I can still actively and successfully use this great tool to find even some very rare things on the Net... So, it's worth its little cost. While e.g. KaZaa's cost is too high even for its high availability and popularity...
-
Interrupted File Transfers
2001-07-12 16:21:46 skoubidou
<<...how sure you can be that the other client won't be shut down in the middle of the file transfer. "Daemon clients" obviously increase the probability of the successful download, which is right now the most serious problem for the Gnutella network.>>
The solution to this problem is really quite simple and has been done by both Limewire and Gnucleus, and I cannot understand why it hasn't been implemented in more popular Gnutella clients such as Bearshare and Gnotella. It has also been done in Kazaa/MusicCity, and I believe Napster had done the same to their software before their demise.
The solution is simple. Compare filesizes and filenames of returned results, and create a download queue of the hosts that offer the same file. If the download is interrupted, the client goes through the queue and resumes the file from another location. This can be performed in a loop so that if all hosts in the queue are either offline or busy, the download can continue when a host gets a free download slot.
**For those of you who might claim that comparing filesizes and filenames is not feasible due to the number of files on Gnutella with similar names/sizes, the benefit of such a feature greatly outweighs the extremely rare inconvenience of an occasional bad match.
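The queue-and-resume idea described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code of Limewire, Gnucleus or any other client. All names here (`SearchResult`, `same_file`, `build_queue`, `download_with_failover`) and the `fetch(host, offset)` callback are hypothetical; a real client would perform the actual network transfer inside `fetch` and would wait between retry rounds.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class SearchResult:
    host: str
    filename: str
    size: int


def same_file(a: SearchResult, b: SearchResult) -> bool:
    # Heuristic from the comment above: equal size and equal name
    # (compared case-insensitively here, which is an assumption).
    return a.size == b.size and a.filename.lower() == b.filename.lower()


def build_queue(wanted: SearchResult, results: Iterable[SearchResult]) -> deque:
    # Every host that appears to offer the same file is a candidate source.
    return deque(r for r in results if same_file(wanted, r))


def download_with_failover(
    wanted: SearchResult,
    results: Iterable[SearchResult],
    fetch: Callable[[str, int], int],
    max_rounds: int = 3,
) -> bool:
    """Try each candidate host in turn, resuming at the current offset.

    `fetch(host, offset)` is a hypothetical callback that returns the
    number of bytes received before the transfer stopped (0 if the host
    is offline or busy).
    """
    queue = build_queue(wanted, results)
    offset = 0
    for _ in range(max_rounds):          # loop over the whole queue again
        for _ in range(len(queue)):
            source = queue[0]
            queue.rotate(-1)             # move this host to the back
            offset += fetch(source.host, offset)
            if offset >= wanted.size:
                return True              # file complete
        # A real client would sleep here until a download slot may be free.
    return False
```

For example, if one host delivers the first part of the file and then disappears, the loop moves on to the next matching host and asks it to continue from the byte offset already received, instead of starting over.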
This already has been implemented, and I urge all Gnutella developers to create such a feature for the overall reliability of gnutellaNet and to distribute bandwidth across the network evenly.
Omar





This is the reverse of queueing the files _you're_ fetching and retargeting if the current host you're accessing "goes away". This enables you to stay on and complete in-process requests, while not queueing any additional send/receives.