Ads in Cache-Friendly Pages
by Jennifer Vesperman
You want to make your Web pages cache-friendly, but you're worried you'll lose advertising revenue. Ads are a reality in our current model of the Web, and it's important to ensure that they work and to keep track of how often they are served. Fortunately, it's easy to make ads work without losing cache friendliness in our pages.
To explain how to make ads uncacheable while having a cacheable page, I need to explain entities and selectively applying Cache-control headers. For a detailed description of Cache-control headers, see Cache-Friendly Web Pages.
The principles described here work for all the major elements in a Web page. Once you understand them, you can selectively apply cache headers to every entity on your pages.
Entities and cache control
HTTP is a client-server protocol. The client initiates the contact, asking the server to provide the entity described in the URI (Uniform Resource Identifier). A URI can be a location (a URL) or a name (a URN). URLs are the most common, but HTTP can handle either.
The server returns a response. The response consists of some HTTP headers and an "entity." The entity includes entity headers and a body that is (we hope) the requested data. Expires is an entity header; Cache-control is formally a general header in HTTP/1.1, but in a response it works the same way. Both apply only to the entity they are sent with.
Inside the entity, especially if the entity is HTML, there may be URIs referring to additional entities. HTML image tags are the most common of these, and Web browsers read image tags and then send additional GET requests for the new entities to Web servers.
Any time you include a URI (relative or full) in your HTML, you may be referring to another entity. The exception is a URI that differs from the current page only by a fragment token, such as #foo: it points to a location inside the same entity. If stripping the fragment off would leave you pointing to the same page, it's the same entity. Other than that, every distinct URI refers to a distinct entity.
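The fragment rule can be checked mechanically. The following is a minimal Python sketch, not part of the original article: the function name same_entity and the example.com URLs are made up for illustration, and it compares URIs as plain strings after resolving them, which ignores finer points such as case normalization.

```python
from urllib.parse import urldefrag, urljoin


def same_entity(page_url: str, link: str) -> bool:
    """Return True if `link`, resolved against `page_url`, names the same entity.

    Hypothetical helper: two URIs are treated as the same entity when they
    are identical once any #fragment has been stripped off.
    """
    resolved, _fragment = urldefrag(urljoin(page_url, link))
    base, _ = urldefrag(page_url)
    return resolved == base


# Illustrative URLs only.
page = "http://www.example.com/articles/cache.html"
print(same_entity(page, "#foo"))            # fragment only: same entity
print(same_entity(page, "cache.html#foo"))  # same page plus fragment: same entity
print(same_entity(page, "images/ad.gif"))   # different URI: separate entity
```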
The browser (or other client) will make a separate request to pull down each entity. In the case of a link, it waits for the user to initiate the request. In the case of images, it usually initiates the request itself. (Some browsers do not request images, or do so only on user-initiated request.) Other included objects may or may not be automatically downloaded; please see the HTML specification to determine what the browser is expected to do with them.
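To make the distinction concrete, here is a rough Python sketch that sorts the URIs in a page into those a typical browser would fetch on its own (images) and those it fetches only when the user follows a link. The class name EntityCollector and the sample markup are assumptions for illustration, and it looks only at img and a tags. Real browsers handle many more element types.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class EntityCollector(HTMLParser):
    """Collect the URIs of entities referenced by a page (illustrative only)."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.auto_fetched = []    # image URIs: usually requested automatically
        self.user_initiated = []  # link URIs: requested when the user clicks

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and attributes.get("src"):
            self.auto_fetched.append(urljoin(self.base_url, attributes["src"]))
        elif tag == "a" and attributes.get("href"):
            self.user_initiated.append(urljoin(self.base_url, attributes["href"]))


# Hypothetical page content and URL.
html = '<p><img src="/ads/banner.gif"> <img src="logo.png"> <a href="page2.html">Next</a></p>'
collector = EntityCollector("http://www.example.com/articles/cache.html")
collector.feed(html)
print(collector.auto_fetched)
print(collector.user_initiated)
```

Each URI in either list is a separate entity and arrives with its own response headers.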
All images, including images that are ads, are individual entities. And individual entities have their own, individual Cache-control or Expires headers. So we can have ads in cache-friendly Web pages without actually caching the ads.
This fact also allows us to have images with very long expiry times in frequently changing (and rapidly expiring) Web pages, and to have other elements uncacheable.
- Set the expiry of the main Web page to whatever expiry is appropriate for that page.
- Set the expiry of unchanging images or other included entities to the maximum (a year).
- Set expiries of anything you do not want cached to "do not cache" using either Cache-control or a zero expiry time. You may want to do this for ads.
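The three rules above could be sketched as a single function that picks headers per entity. This is only an illustration under stated assumptions: the /ads/ path prefix, the list of image extensions, the one-hour page lifetime, and the choice of the no-cache directive are all hypothetical, and a real site would normally set these headers in its Web server configuration rather than in code like this.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

ONE_YEAR = 365 * 24 * 60 * 60  # seconds; the "maximum" expiry used for static files
STATIC_SUFFIXES = (".gif", ".jpg", ".png")  # assumed set of unchanging images


def cache_headers(path: str, page_max_age: int = 3600, now: datetime = None) -> dict:
    """Return cache headers for one entity, chosen by its (hypothetical) path."""
    now = now or datetime.now(timezone.utc)

    if path.startswith("/ads/"):
        # Ads: ask caches not to reuse the response, and expire it immediately.
        return {
            "Cache-Control": "no-cache",
            "Expires": format_datetime(now, usegmt=True),
        }

    # Unchanging images get a year; everything else gets the page's own lifetime.
    max_age = ONE_YEAR if path.endswith(STATIC_SUFFIXES) else page_max_age
    return {
        "Cache-Control": f"max-age={max_age}",
        "Expires": format_datetime(now + timedelta(seconds=max_age), usegmt=True),
    }


for path in ("/articles/cache.html", "/images/logo.png", "/ads/banner.gif"):
    print(path, cache_headers(path))
```

Because the ad rule is checked first, an image under /ads/ stays uncacheable even though its extension would otherwise give it the one-year lifetime.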
Setting ads to be uncacheable is one way of attempting to count the hits accurately. It's a rather unfriendly way of doing it, though; it forces the reloading of a (usually) static object, every time.