Web Development in Heavy Traffic
by Pier Fumagalli
It happens from time to time: you spend a few years working on one peculiar aspect of a problem, you come to believe you are "experienced" in it, and then, once your environment changes, you notice that you had been looking at it with the eyes of a blind man.
It happened to me recently. Since 1997, I have been working on the problems related to the integration of Web servers (especially the Apache Web server) with servlet containers (notably, Apache JServ and Tomcat), and I thought I had become an expert in that field. Then I went through the common round of layoffs, and had to look for a new job.
I ended up, luckily, at VNU Business Publications, in London, on its Web team. VNU has been a news company for 20-odd years; not a small Internet startup with a revolutionary idea, nor a software giant with thousands of engineers. Our business is simple: we sell information, through newspapers and magazines and on our Web site.
Given our relatively simple business plan (you wish), our problem is even simpler: our Web site has a lot of traffic (around 130 servlet requests per second at peak times) and it needs to be up and running 24 hours a day, 365 days a year. It's quite different from what I was used to: the soft and nice world of research far from practical use.
To handle this kind of load, of course, your infrastructure needs to be almost perfect. You need to have the best Web server, the best servlet container, the best of everything. I concentrate my efforts on the first two: Web server (Apache) and servlet container (Tomcat).
But instead of speaking about these issues now (you can come to the O'Reilly Open Source Conference and drop in on my talk on Wednesday for that), I would like to address a topic related to high-load Web sites and servlets.
The topic I want to address is how to design (or adapt) your Web application for deployment under high loads, and I want to share some of the obvious (and not obvious) performance enhancements we currently use on our news site.
One of the Tomcat developers recently took me to task over the fact that the Apache module I am developing (mod_webapp) "still forwards requests for static webapp resources to Tomcat." Needless to say, it does but, if you see it from my perspective, fixing this is a pretty low priority for us.
VNU handles several million requests each day across its Web servers, and static content accounts for approximately 70 percent of our bandwidth. In our case, if every single image, PDF, or CSS file had to go through the servlet engine to be processed and delivered, we would overload the input/output of our Java Virtual Machine.
We use JSPs to style our content, so the solution we adopted was simple: we invented a new tag. In our JSPs, instead of the HTML <img> tag, we use a very small custom tag that we developed: <vnu:img>. Our tag takes the SRC argument and prints out a different (real HTML this time) <img> tag. For example, if in our JSPs you see something like:
<vnu:img src="/v6_logo_main.gif" ...>
our custom tag handler would convert it to something like:
<img src="http://images.vnunet.com/v6_image/v6_logo_main.gif" ...>
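The conversion above can be sketched in a few lines of plain Java. This is only an illustration of the idea, not the actual VNU tag handler: the class name `ImageUrlRewriter`, its methods, and the decision to leave absolute URLs untouched are all assumptions, and the host and path prefix are simply copied from the example URL. In a real tag library, logic like this would be called from the tag handler class.

```java
// Hypothetical helper: the kind of logic a <vnu:img> tag handler might delegate to.
final class ImageUrlRewriter {
    // Assumed values, copied from the example URL in the article.
    private static final String STATIC_HOST = "http://images.vnunet.com";
    private static final String PATH_PREFIX = "/v6_image";

    private ImageUrlRewriter() {}

    /** Maps an application-relative SRC value to an absolute URL on the static server. */
    static String rewrite(String src) {
        if (src == null || src.isEmpty()) {
            throw new IllegalArgumentException("src must not be empty");
        }
        // Assumption: a SRC that is already absolute is passed through unchanged.
        if (src.startsWith("http://") || src.startsWith("https://")) {
            return src;
        }
        String path = src.startsWith("/") ? src : "/" + src;
        return STATIC_HOST + PATH_PREFIX + path;
    }

    /** Renders the plain HTML <img> tag that would be written into the page. */
    static String renderImgTag(String src, String alt) {
        return "<img src=\"" + escape(rewrite(src)) + "\" alt=\"" + escape(alt) + "\">";
    }

    // Minimal escaping for text placed inside a double-quoted HTML attribute.
    private static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }
}
```

Because the host name lives in one constant, pointing every image at a different server is a one-line change.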
Why did we do that? It is pretty easy to explain. First of all, images.vnunet.com is another Web server. Simply enough, all of our static traffic is handled by another instance of Apache, finely tuned to deliver only files, without the additional load of servlet requests, CGIs, Perl, or PHP (you name it).
Another advantage of this custom tag is that we can easily scale. If the amount of static data served becomes too much for one server to handle, we can promptly put up a second one, modify our tag library, and "rewrite" the SRC attribute, one time with images.vnunet.com and another with images2.vnunet.com, without having to load-balance our servers or change our router configuration.
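One way to spread images over two static servers from inside the tag library is sketched below. Note the swap: instead of strictly alternating between hosts, this version picks the host from a hash of the image path, so that a given image always gets the same URL and browser caches stay effective. That choice, the class name `StaticHostPicker`, and the method `urlFor` are my assumptions for illustration, not a description of what runs on the VNU site.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that assigns each static resource to one of several hosts.
final class StaticHostPicker {
    private final List<String> hosts;

    StaticHostPicker(List<String> hosts) {
        if (hosts == null || hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        // Defensive copy so later changes to the caller's list have no effect.
        this.hosts = new ArrayList<>(hosts);
    }

    /** Returns an absolute URL for the path; the same path always maps to the same host. */
    String urlFor(String path) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        int index = Math.floorMod(path.hashCode(), hosts.size());
        return hosts.get(index) + path;
    }
}
```

Constructed with `http://images.vnunet.com` and `http://images2.vnunet.com`, the picker splits paths roughly evenly between the two; adding a third server means adding one entry to the list, although some images will then move to a different host.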
But probably the biggest advantage we can get out of a simple idea like this is that we can distribute our static content geographically. If, for example, one day we decided to use a service like Akamai, again, the only place throughout our entire application where I have to change something is our tag library, without even thinking about the thousands of JSP pages sitting on my server.
We use JSP, but such a simple approach can be used with any templating language. If you have static HTML files, the only thing you have to do is parse them once and rewrite what needs to be rewritten; a simple sed script does the trick.
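For static HTML, the same one-off rewrite can be sketched in Java with a regular expression, to stay in the language used elsewhere in this article. This is a rough illustration under several assumptions: the class name `StaticHtmlRewriter` is invented, only double-quoted SRC values beginning with a single `/` are rewritten, and a regular expression is no substitute for a real HTML parser on messy markup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical one-off rewriter for static HTML files.
final class StaticHtmlRewriter {
    // Group 1: the <img ...> tag text up to and including the opening quote of src.
    // Group 2: a site-relative path ("/..." but not "//...", which is protocol-relative).
    private static final Pattern IMG_SRC = Pattern.compile(
            "(<img\\b[^>]*?\\bsrc\\s*=\\s*\")(/(?!/)[^\"]*)\"",
            Pattern.CASE_INSENSITIVE);

    private StaticHtmlRewriter() {}

    /** Prefixes every site-relative <img> SRC in the HTML with the given base URL. */
    static String rewrite(String html, String baseUrl) {
        Matcher m = IMG_SRC.matcher(html);
        // quoteReplacement protects any '$' or '\' that might appear in baseUrl.
        return m.replaceAll("$1" + Matcher.quoteReplacement(baseUrl) + "$2\"");
    }
}
```

Calling `StaticHtmlRewriter.rewrite(html, "http://images.vnunet.com/v6_image")` leaves links and already-absolute image URLs alone and rewrites only the site-relative images.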