Atomic File Transactions, Part 2by Jonathan Amsterdam
This article is the second of a two-part series. If you haven't read the first part yet, it would make sense to do so before reading this article. In this article, I describe my algorithm for atomic filesystem transactions, and tour the design of my package. A detailed justification of the algorithm can be found in the appendix. The code for the system is implemented in the Java package
com.astrel.io.atomic, available here.
An Atomicity Algorithm
The first step in a correct atomicity algorithm is keeping a journal. (The term "log" is used in the transaction processing literature, but to avoid stepping on its common -- and related -- meaning of "outputting messages about the system for later analysis," we'll use journal. Writers of journalling filesystems have done the same.)
The journal for a transaction is a record of all the actions that have been taken under the transaction. (In my scheme, each transaction has its own journal. In traditional systems, there is a single journal for all transactions.) Armed with a journal, the recovery process knows what happened, and in what order, before the crash, and can reconstruct the original state from the information in the journal plus the backup files.
The algorithm has four parts:
- Action execution
The execution phase performs the actions as they are requested, writing enough information in the journal to undo each action, should that become necessary. The commit phase cleans up and deletes the journal. Rollback plays back the journal in reverse order, undoing the actions, then cleans up and deletes the journal. Recovery looks for existing journal files and cleans up after them, undoing uncommitted transactions.
Let's look at each part in detail, beginning with execution. The algorithm for executing a single action under a transaction first writes undo information to the journal, then performs the action:
E1. Write the action's name, its arguments (the file to delete) and any other necessary undo information (the name of the backup file) into the journal.
E2. Flush the journal's contents to disk.
E3. Write any backup files needed to undo the action.
E4. Perform the action.
E5. If there is an error performing the action, erase its entry from the journal.
Step E5 guarantees that every journal entry represents a successful action, except possibly the last.
To analyze the correctness of this algorithm with respect to crashes, you would have to consider every possible crash point, and show that all transactions are still atomic. I undertake this task in the appendix.
Now let's turn to the commit portion of the algorithm. When commit occurs, all the actions have been executed and the program is happy with the results, so there is little to do.
C1. Write in the journal that the transaction has committed.
C2. Delete the backup files for the transaction.
C3. Delete the journal.
The actual commit occurs when C1 completes successfully. Deleting the backup files and the journal is just cleanup. We delete the journal last because it is the only place that contains the locations of the backups.
The rollback algorithm undoes the actions in reverse order, then cleans up and deletes the journal. The basic algorithm is complicated by the possibility of crashes before or during rollback:
R1. For each journal entry that has not already been undone, in reverse order:
R1.1 If the action needs to be undone, undo it.
R1.2 Record in the journal that the action has been undone.
R2. Delete the backup files for the transaction.
R3. Delete the journal.
In the absence of crashes, R1.2 would be unnecessary, and in R1.1 all of the information to determine whether an action needed to be undone would be in memory, since the system could keep track of the success or failure of each action. But if a crash occurs, then rollback will be run as part of recovery, and only the filesystem (including the saved journal information) will be available to determine whether the action needs to be undone. That is why step R1.2 keeps track in the journal of which actions have been undone. See the appendix for a detailed analysis.
Recovery happens when the system is restarted after a crash. If it finds no journals in the
TransactionManager's directory, then it has nothing to do. Otherwise, for each journal:
RE1. If the journal ends in a commit marker, perform steps C2 and C3.
RE2. Otherwise, perform rollback.
At the end of a successful recovery, all uncommitted transactions will have been rolled back, and all backup and journal files will have been deleted.
A number of optimizations to the rollback/recovery part of this algorithm are possible. For example, if a transaction creates a file and then performs several other actions on it, there is no need to undo the actions -- it is enough to delete the file. Similarly, if a backup copy of a file is made, then subsequent actions on a file are irrelevant; recovery can simply restore the backup. After giving it some serious thought, I decided not to implement these optimizations, for four reasons. First, rollbacks are rare -- most transactions commit successfully -- so improving the speed of rollback would not be of much benefit. Second, typical transactions probably don't involve multiple operations on the same file. For instance, it seems unlikely that a transaction would write to nonexistent file once (thereby creating it), then close it and write to it again. A more common pattern would involve a single access to each of many files. Third, renaming complicates these optimizations; renamings still have to be undone carefully, and in reverse order. Finally, adding these optimizations would complicate the architecture of my package, making it more difficult to understand its correctness and harder for users to add their own actions.