Ads in Cache-Friendly Pages

by Jennifer Vesperman

You want to make your Web pages cache-friendly, but you're worried you'll lose advertising revenue. Ads are a reality in our current model of the Web, and it's important to ensure that they work and to keep track of how often they are served. Fortunately, it's easy to make ads work without losing cache friendliness in our pages.

To explain how to make ads uncacheable while keeping the page itself cacheable, I need to explain entities and how to apply Cache-control headers selectively. For a detailed description of Cache-control headers, see Cache-Friendly Web Pages.

The principles described here work for all the major elements in a Web page. Once you understand them, you can selectively apply cache headers to every entity on your pages.

Entities and cache control

HTTP is a client-server protocol. The client initiates the contact, asking the server to provide the entity described in the URI (Uniform Resource Identifier). A URI can be a location (a URL) or a name (a URN). URLs are the most common, but HTTP can handle either.

The server returns a response. The response consists of some HTTP headers and an "entity." The entity includes entity headers and a body that is (we hope) the requested data. Expires is an entity header; Cache-control is formally classed as a general header in HTTP/1.1, but like Expires it applies only to the response it is sent with.
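
To make the shape of a response concrete, here is a minimal Python sketch that splits a raw response into its status line, headers, and body. The sample response and the function name `split_response` are made up for illustration; a real client would rely on an HTTP library rather than string splitting.

```python
def split_response(raw):
    """Split a raw HTTP response into (status_line, headers, body).

    Illustrative only: assumes CRLF line endings and no folded headers.
    """
    # A blank line separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, _, header_block = head.partition("\r\n")

    headers = {}
    for line in header_block.split("\r\n"):
        # Split on the first colon only; dates contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


# Hypothetical response for an ad image that must not be cached.
SAMPLE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: image/gif\r\n"
    "Cache-Control: no-cache\r\n"
    "\r\n"
    "GIF89a"
)

status_line, headers, body = split_response(SAMPLE)
print(status_line)
print(headers["cache-control"])
```

The cache headers found here describe this one response only; they say nothing about any other entity the page refers to.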

Inside the entity, especially if the entity is HTML, there may be URIs referring to additional entities. HTML image tags are the most common of these, and Web browsers read image tags and then send additional GET requests for the new entities to Web servers.
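
As a rough illustration of what a browser does at this step, the following Python sketch collects the URIs of the images a page refers to, using only the standard library. The class and function names and the example.com and example.net addresses are hypothetical; real browsers also handle many other tags and attributes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects the absolute URI of every <img> tag fed to the parser."""

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.image_uris = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative URIs against the page's own URI.
                self.image_uris.append(urljoin(self.base_uri, src))


def image_uris(html, base_uri):
    """Return the image URIs a client would request for this page."""
    collector = ImageCollector(base_uri)
    collector.feed(html)
    return collector.image_uris


PAGE = (
    '<html><body>'
    '<img src="/img/logo.png">'
    '<img src="http://ads.example.net/banner.gif">'
    '<a href="other.html">Another page</a>'
    '</body></html>'
)

for uri in image_uris(PAGE, "http://www.example.com/index.html"):
    print(uri)
```

Each URI in the result leads to a separate GET request and a separate response, so each image arrives with its own headers.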

Any time you include a URI (relative or full) in your HTML, you may be referring to another entity. The exception is a URI with a fragment token, such as #foo, that points back into the entity containing it. If stripping the fragment off would leave you pointing to the same page, it's the same entity. Other than that, every distinct URI refers to a distinct entity.
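
The fragment rule can be sketched in a few lines of Python. The helper name `same_entity` and the example.com URIs are assumptions for illustration, and the comparison is a plain string match on the resolved URIs, not a full URI normalization.

```python
from urllib.parse import urldefrag, urljoin


def same_entity(page_uri, reference):
    """True if `reference`, as written in the page at `page_uri`,
    points to the same entity once any fragment is stripped."""
    resolved = urljoin(page_uri, reference)
    return urldefrag(resolved).url == urldefrag(page_uri).url


page = "http://www.example.com/page.html"
print(same_entity(page, "#foo"))            # fragment within this page
print(same_entity(page, "other.html#foo"))  # a different page
```

Only references for which this check fails cause a new request, and therefore bring their own cache headers.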

The browser (or other client) will make a separate request to pull down each entity. In the case of a link, it waits for the user to initiate the request. In the case of images, it usually initiates the request itself. (Some browsers do not request images, or do so only on user-initiated request.) Other included objects may or may not be automatically downloaded; please see the HTML specification to determine what the browser is expected to do with them.

All images, including images that are ads, are individual entities. And individual entities have their own, individual Cache-control or Expires headers. So we can have ads in cache-friendly Web pages without actually caching the ads.

This fact also allows us to have images with very long expiry times in frequently changing (and rapidly expiring) Web pages, and to have other elements uncacheable.

  • Set the expiry of the main Web page to whatever expiry is appropriate for that page.
  • Set the expiry of unchanging images or other included entities to the maximum (a year).
  • Set the expiry of anything you do not want cached to "do not cache," using either a Cache-control header or a zero expiry time. You may want to do this for ads.
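
The three rules above can be sketched as a small Python function that picks cache headers per entity. The category names, the one-hour lifetime for the main page, and the function name are assumptions made for this example; how the headers are actually attached depends on your Web server or application.

```python
from email.utils import formatdate

ONE_YEAR = 365 * 24 * 60 * 60  # seconds

# Hypothetical categories and lifetimes (in seconds), for illustration only.
LIFETIMES = {
    "page": 60 * 60,           # whatever suits the page; one hour assumed
    "static_image": ONE_YEAR,  # unchanging images: the maximum
    "ad": 0,                   # do not cache
}


def cache_headers(kind, now):
    """Return Cache-Control and Expires headers for one entity.

    `now` is the current time as a Unix timestamp.
    """
    max_age = LIFETIMES[kind]
    if max_age == 0:
        cache_control = "no-cache"
    else:
        cache_control = f"max-age={max_age}"
    return {
        "Cache-Control": cache_control,
        # An Expires time equal to "now" is a zero expiry time.
        "Expires": formatdate(now + max_age, usegmt=True),
    }


if __name__ == "__main__":
    import time

    for kind in LIFETIMES:
        print(kind, cache_headers(kind, time.time()))
```

Because the function is called once per entity, the page, its long-lived images, and its ads can each receive a different policy.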

Setting ads to be uncacheable is one way of attempting to count the hits accurately. It's a rather unfriendly way of doing it, though: it forces a (usually) static object to be reloaded on every request.
