J2EE Transaction Frameworks: Building the Framework
by Dibyendu Baksi
Cheap computing power and increased network bandwidth have given rise to distributed component-based applications. A distributed component-based application is a configuration of services provided by different application components running on physically independent computers that appear to the users of the system as a single application running on a single physical machine. Several things motivate the adoption of distributed component-based systems over traditional centralized systems:
- Distributed application: Some tasks are inherently distributed and by their very nature require cooperative work from multiple agents. In such cases, it is preferable to locate and harness computing power and data where they are naturally available and most needed.
- Reliability: Because the system is shared, cooperative, and distributed, it can be designed to avoid a single point of failure. Failover, recovery, and distributed synchronization techniques provide greater reliability.
- Scalability: A properly designed system can handle growing load over time by adding new services and hardware.
- Performance: As computing covers wider application areas, the problems that need to be solved get more complicated. Solving them requires more computing power at a reasonable price, which a distributed system can supply by combining many machines.
- Economics: It is possible to pay less for equivalent levels of computing power when the system is split across multiple machines.
To give the illusion to users of a single unified application running on a single physical machine, instead of a collection of disparate applications running on heterogeneous computers connected via a network, a distributed system needs to be transparent in the following ways.
- Data location: It is not necessary for the user of the system to know where data is located in the network.
- Failure: It is not necessary for the user of the system to worry about consistency of data even if there is a failure within the network or data sources.
- Replication: It is not necessary for the user of the system to know how data replication is done.
- Distribution: It is not necessary for the user of the system to know how computing power and data are distributed across the system.
The distributed system allows a user to store, access, and manipulate data transparently from many computers while maintaining the integrity of data during system failures. The management of distributed data and transactions is accomplished at the local and global levels. A local data manager, or resource manager, enables the access and manipulation of data or resources. These resource managers provide the transparency of data location, data models, and database security and authority control. A local transaction management system is responsible for initiating, monitoring, and terminating transactions in a computing system. A distributed transaction management system extends the scope of a local transaction management system by coordinating with the local resource managers to view related transactions over a network as a single transaction.
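The coordination described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the text does not say which protocol the distributed transaction manager uses, so the sketch assumes a simple two-phase commit, and every name in it (`ResourceManager`, `InMemoryResource`, `TransactionCoordinator`) is hypothetical rather than part of any J2EE API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical contract for a local resource manager taking part in a
// global transaction (assumed for illustration; not a J2EE interface).
interface ResourceManager {
    boolean prepare();  // phase 1: vote to commit (true) or abort (false)
    void commit();      // phase 2: make the local work permanent
    void rollback();    // phase 2: discard the local work
}

// In-memory stand-in; a real resource manager would wrap a database or queue.
class InMemoryResource implements ResourceManager {
    private final boolean canCommit;
    private String state = "ACTIVE";

    InMemoryResource(boolean canCommit) {
        this.canCommit = canCommit;
    }

    public boolean prepare() {
        state = canCommit ? "PREPARED" : "ABORT_VOTED";
        return canCommit;
    }

    public void commit() { state = "COMMITTED"; }

    public void rollback() { state = "ROLLED_BACK"; }

    String state() { return state; }
}

// Treats the work of all enlisted resource managers as a single transaction.
class TransactionCoordinator {
    private final List<ResourceManager> resources = new ArrayList<>();

    void enlist(ResourceManager rm) {
        resources.add(rm);
    }

    // Returns true if every resource committed, false if all were rolled back.
    boolean complete() {
        boolean allPrepared = true;
        for (ResourceManager rm : resources) {
            if (!rm.prepare()) {   // a single "no" vote aborts the whole transaction
                allPrepared = false;
                break;
            }
        }
        for (ResourceManager rm : resources) {
            if (allPrepared) {
                rm.commit();
            } else {
                rm.rollback();
            }
        }
        return allPrepared;
    }
}

class TwoPhaseCommitDemo {
    public static void main(String[] args) {
        InMemoryResource orders = new InMemoryResource(true);
        InMemoryResource inventory = new InMemoryResource(false); // votes to abort
        TransactionCoordinator coordinator = new TransactionCoordinator();
        coordinator.enlist(orders);
        coordinator.enlist(inventory);
        boolean committed = coordinator.complete();
        System.out.println("committed=" + committed
                + ", orders=" + orders.state()
                + ", inventory=" + inventory.state());
    }
}
```

Because one resource votes to abort, the demo rolls back both resources, so the caller sees a single outcome for the whole unit of work. The sketch leaves out everything that makes the real problem hard, such as network failures, timeouts, and logging for recovery.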
A transaction is a group of statements that represents a unit of work and must be executed as a unit. It is a sequence of operations on resources (such as read, write, or update) that transforms one consistent state of the system into a new consistent state. To reflect the correct state of reality in the system, a transaction should have the following properties.
- Atomicity: This is the all-or-nothing property. Either the entire sequence of operations succeeds, or none of it takes effect. A transaction should be treated as a single unit of operation. Only completed transactions are committed; incomplete transactions are rolled back, restoring the system to the state in which the transaction started. There is no possibility of partial work being committed.
- Consistency: A transaction maps one consistent state of the resources (e.g., a database) to another. Consistency is concerned with correctly reflecting the reality of the state of the resources. Concrete examples of consistency include referential integrity of the database and unique primary keys in tables.
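- Isolation: A transaction should not reveal its results to other concurrent transactions before it commits. Isolation ensures that a transaction does not see data that another transaction is updating but has not yet committed. Isolation is closely related to serializability: the outcome of concurrent transactions should be the same as if they had executed one after another.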
- Durability: Results of committed transactions must be made permanent and must not be lost from the database because of a system failure. Resource managers ensure that the results of a committed transaction survive system failures.
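To make the all-or-nothing and consistency properties concrete, here is a minimal in-memory sketch in Java. It is not how a resource manager is actually implemented: the `AccountStore` class, its methods, and the "no negative balance" rule are all assumptions invented for illustration. The sketch is single-threaded and keeps nothing on disk, so it does not demonstrate isolation between concurrent transactions or durability.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory store; the consistency rule assumed here is that
// no account balance may become negative.
class AccountStore {
    private Map<String, Integer> balances = new HashMap<>();

    void open(String account, int amount) {
        balances.put(account, amount);
    }

    int balance(String account) {
        return balances.getOrDefault(account, 0);
    }

    // Moves `amount` between two accounts as one unit of work.
    // Returns true on commit, false if the transfer was rolled back.
    boolean transfer(String from, String to, int amount) {
        // All changes go to a private working copy first.
        Map<String, Integer> working = new HashMap<>(balances);
        try {
            adjust(working, to, amount);     // credit first, so a later failure
            adjust(working, from, -amount);  // would leave partial work behind
            balances = working;              // commit: publish every change at once
            return true;
        } catch (IllegalStateException e) {
            return false;                    // rollback: drop the working copy
        }
    }

    // Applies one change and enforces the consistency rule.
    private static void adjust(Map<String, Integer> state, String account, int delta) {
        Integer current = state.get(account);
        if (current == null) {
            throw new IllegalStateException("unknown account: " + account);
        }
        int updated = current + delta;
        if (updated < 0) {
            throw new IllegalStateException("insufficient funds in " + account);
        }
        state.put(account, updated);
    }
}

class AtomicTransferDemo {
    public static void main(String[] args) {
        AccountStore store = new AccountStore();
        store.open("A", 100);
        store.open("B", 50);
        System.out.println(store.transfer("A", "B", 30));   // succeeds and commits
        System.out.println(store.transfer("A", "B", 500));  // fails and rolls back
        System.out.println("A=" + store.balance("A") + ", B=" + store.balance("B"));
    }
}
```

The second transfer credits account B in the working copy before the debit from A fails, yet neither balance changes, because the partial work is discarded instead of committed. The total across both accounts is the same before and after every call, which is the kind of invariant the consistency property protects.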