November 2021 – Manos Athanassoulis

A few days ago, I was discussing in class the concept of deadlocks as part of a transaction management lecture. I presented a scenario in which two very simple transactions caused a deadlock (T1 locks A, T2 locks B, and then T2 attempts to lock B and T1 attempts to lock A), and I asked the class whether they had any ideas on how to avoid such a conundrum.

Immediately one of my students came up with the idea of acquiring all locks at the beginning of the transaction. The idea, of course, works and ensures that we have no deadlocks. In fact, it is called Conservative Two-Phase Locking (or C2PL), and it was earlier termed Predeclaration Locking in the 1985 ACM TODS paper on “Locking performance in centralized databases” (thanks, Dennis, for pointing me in the right direction!).

However, the research papers that mention either term are authored in 2000 or earlier. So, why recent research has not been considering C2PL?

After constructing a few examples, it quickly became clear.

First, let’s clarify that indeed C2PL always avoids deadlocks. Since all locks are acquired at the beginning of each transaction, it is not possible to be in a situation where a transaction has acquired a lock and waits for another one. It either has all locks or none. Nevertheless, we do have to pay attention to the implementation. If each lock acquisition is a separate step, we can still have a deadlock unless we release all locks in case we cannot acquire them all in a very small predetermined timeout.

With the deadlock avoidance out of our way, we now discuss the drawbacks of C2PL. The most important one is that C2PL requires the knowledge of all the locks a transaction might need. While this might seem possible for simple transactions that change the values of a few objects, it quickly becomes a hard task if a transaction has the simplest if-statement. Consider the following three examples:

Transaction T1	Transaction T2	Transaction T3
Read (A)	Read (A)	Read (A)
If (A>10) Read (B) B:=B+5 Write(B) Endif	If (A>10) Read (B) B:=B+5 Write(B) Endif	If (A>10) Read (B) B:=B+5 Write(B) Endif
If (A<=5) Read (B) B:=B+12 Write(B) EndIf	If (A<5) Read (B) B:=B+12 Write(B) EndIf	If (A<5) Read (C) C:=C+12 Write(C) EndIf
Commit	Commit	Commit

In the first case, simply by observing the code, we know that we will only need to get a shared lock on A and an exclusive lock on B. So, we can acquire them at the beginning of the transaction and avoid deadlocks.

In the second case, we see that we will need to get a shared lock on A and (if A>10 or A<5) an exclusive lock on B. In other words, we would not need an exclusive lock for B if 5≤=A<10. Getting a lock on both A and B might prove overly conservative and it would reduce opportunities for concurrency if 5≤A<10.

For the third case, we need a shared lock on A, and depending on the value of A, we might need an exclusive lock on B (if A>10), an exclusive lock on C (if A<5), or no other lock if 5≤A<10. Getting a lock on all three A, B, and C leads to even more restriction of concurrency than before.

Overall, the main problem of C2PL is that it has to make a decision at the very beginning before it knows the decision of each if-statement. As a result, the only decision it can make is to always lock every possible object. For T1, acquiring both S(A) and X(B) is ok, however, in the case of T2, acquiring X(B) might prove to be unnecessary depending on the value of A. Grabbing X(B) does not impact correctness, but it stops from running other transactions that want to access B. Similarly, T3 has to grab S(A), X(B), and X(C). We know that one of the two locks is not needed but we do not know which one before we read A (and hence before we acquire S(A)). To sum up, C2PL sacrifices too much concurrency to avoid deadlocks.

So, one can think that we can increase concurrency by releasing the locks as early as possible and, as a result, gradually. While this will somewhat increase concurrency, it is opening new cans of worms: cascading aborts or even non-serializable transaction execution.

Consider a transaction T4 that acquired all locks at the beginning, and after finishing updating object A, it immediately releases the lock of A. If T4 later aborts and another transaction T5 reads the updated value of A, it would have read a value that was never committed, and hence, it should be as if it never existed. In order to avoid this inconsistency, T5 has to be aborted as well, leading to a cascading abort. If T5 has committed, however, then things are even worse: T5 committed using data that should never have been used. The result of the execution now is not equivalent to any serial schedule. Note that this schedule would be a non-recoverable schedule. A recoverable schedule is one that transactions commit only after all transactions whose changes they read commit.

If we focus only on the cascading abort problem, things can also get more complicated. In order to abort, T4 has to reacquire the lock of the previously unlocked object A to change it back to its older value. By attempting to reacquire the lock, we now face the possibility of having a deadlock, which was the very problem we wanted to avoid in the first place.

To avoid all these problems, we can release all the locks at the end, having a Strict Conservative 2PL. This has no deadlocks, no cascading aborts, and the allowed schedules are always serializable and recoverable, but it significantly reduces concurrency.

Overall, there is no perfect solution.

Simple Two-Phase Locking (gradual lock acquisition and gradual lock release) ensures conflict serializability and guarantees Isolation if no transactions fail (i.e., if there are no aborts). However, it can lead to deadlocks and uncommitted reads (that may lead to cascading aborts and unrecoverable schedules non-serializable execution).

Strict 2PL (gradual lock acquisition and late lock release at the very end) avoids uncommitted reads at the expense of reducing concurrency but does not avoid deadlocks.

Conservative 2PL (early lock acquisition of all locks at the beginning and gradual lock release) avoids deadlocks at the expense of reducing concurrency but does not avoid uncommitted reads (that may lead to cascading aborts and unrecoverable schedules).

Strict Conservative 2PL (early lock acquisition of all locks at the beginning and late lock release at the very end) avoids both deadlocks and uncommitted reads but limits concurrency even further.

Putting everything together, Conservative 2PL (Strict or not) would often acquire more locks than needed (when the transaction has even the simplest if-statements), which is probably why we haven’t seen much of it in the last twenty years!

PS: Thank you, Dennis and George, for discussing in detail the guarantees of the variants of 2PL!

Month: November 2021

Predeclaration Locking (aka Conservative 2PL)