Introducing Automatic Data Expiration
by William Grosso, author of Java RMI
Developers building frameworks or distributed programs (or frameworks for distributed programming) face challenges that are somewhat different from those facing developers building stand-alone applications. One of the more interesting problems is finding simple and convenient algorithms for determining when data and object references are no longer valid or need to be refreshed. The difference between data your application owns and data that originates in, and is changed by, someone else's code (possibly running in a separate, concurrent process) is enormous.
The choices you make when handling such information often determine whether an application can scale to enterprise-level performance. For example, suppose you modify a client-server application by adding a client-side cache for frequently accessed data. Because the data is available locally, fewer client-side operations will result in calls to the server. The application will be more responsive to end users, and this performance improvement will be evident regardless of how many clients are using the server, or how busy the server is. Moreover, because the client fetches data less often, the client will put less stress on servers and use less network bandwidth. By making a simple change to the client, and without changing the server at all, you've improved the performance and scalability of the application.
On the other hand, you need to be careful when caching information. If the client frequently displays out-of-date or incorrect information, you will have improved the application's performance at the cost of functionality. This is not a good tradeoff to make.
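To make the tradeoff concrete, here is a minimal sketch of a client-side cache that bounds how stale its data can get. It is only an illustration, not one of the solutions developed later in this series. The names `ExpiringCache`, `fetchFromServer`, `maxAgeMillis`, and `clock` are all hypothetical, and the sketch assumes a single maximum age is acceptable for every entry.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

/**
 * Hypothetical client-side cache. Each entry remembers when it was fetched
 * and is fetched again once it is at least maxAgeMillis old.
 */
final class ExpiringCache<K, V> {

    /** A cached value plus the time at which it was fetched. */
    private static final class Entry<V> {
        final V value;
        final long fetchedAt;

        Entry(V value, long fetchedAt) {
            this.value = value;
            this.fetchedAt = fetchedAt;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final Function<K, V> fetchFromServer; // stands in for the remote call (assumption)
    private final long maxAgeMillis;
    private final LongSupplier clock;             // injected so the cache can be tested

    ExpiringCache(Function<K, V> fetchFromServer, long maxAgeMillis, LongSupplier clock) {
        this.fetchFromServer = fetchFromServer;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
    }

    /**
     * Returns the cached value if it is fresh enough; otherwise calls the
     * server and caches the result. For simplicity the lock is held during
     * the server call, which a production cache would want to avoid.
     */
    synchronized V get(K key) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(key);
        if (entry == null || now - entry.fetchedAt >= maxAgeMillis) {
            entry = new Entry<>(fetchFromServer.apply(key), now);
            entries.put(key, entry);
        }
        return entry.value;
    }
}
```

A client might construct it as `new ExpiringCache<String, String>(key -> server.lookup(key), 30000L, System::currentTimeMillis)`, where `server.lookup` is a placeholder for whatever remote call the application really makes. Note that expired entries are replaced only when someone asks for them again; entries that are never requested again stay in the map indefinitely.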
This article, the first in a three-part series about how to expire or update data automatically, explores the basics of data expiration. The goal is to present the main issues and work through the standard solutions. I'll start by reviewing six real-world cases where data expiration is necessary and then discuss what they have in common. After laying this groundwork, I'll present a series of solutions, starting with a very simple one that doesn't work very well and gradually improving it until we've reached two fairly nice solutions (one based on using a TreeSet to build an index, and one similar to the way Tomcat solves the problem for session keys). I'll wind things down by summarizing what's really required to do data expiration right.
The second article will pick up where the first one left off and introduce a generic data structure, which I've named HashBelt, that quite nicely solves many of the problems involved in data expiration. As an example of how to use a HashBelt, I'll incorporate it into the RemoteStubCache object that was first introduced as part of my earlier series on Command Objects in distributed programming.
The source code for this article can be downloaded here.
Before You Start
This article is not for people who are just starting to program, or just starting to program in Java. While I am going to cover almost everything you need to know about data expiration, I'm not going to explain the basics of Java coding. If you don't know how to use the collection libraries, haven't written a distributed program before, or don't have a good grasp of what the synchronized keyword does, you'll probably have trouble reading these articles. To learn more about the basics of distributed programming and writing multi-threaded code, I recommend my book, Java RMI. In addition to covering RMI, it covers most of the basic design and coding decisions involved in building distributed applications.
In addition, the code examples in these articles use generics quite heavily. This is especially true in the second article. More information about the new generics facilities can be obtained from the generics specification. A friendlier introduction can be found in the third article in my series on command objects.