As a sequel of the last blog, this blog picks the routine of Tendermint
to derive a two-phase BFT consensus. Particularly, we will analyze:
- What is the function of the
lock
notion? - Why are two phases necessary, from the perspective of
Tendermint
? - Why is
Tendermint
a right consensus, which can achieves bothsafety
andliveness
? - Why is
Tendermint
not responsive?
Common contents with the last blog (including "basic concepts" and "one-phase protocol") are omitted here.
In the next blog, we will try to compare and unify PBFT
and Tendermint
.
I. One-phase + view-switch protocol
To extricate the system from halting, a view-switch (i.e., round-change
in Tendermint
) mechanism is necessary. The view-switch mechanism in Tendermint
is also passive.
As discussed in the last blog, when the view is switched, each replica must make a judgement if an index \(i\) has been committed in the old view.
Review that:
there are mainly two ways to design the judgement rules: (1) based on the collection of other replicas' opinions; (2) based on its local status.
PBFT
-like routine picks the first one, whileTendermint
-like routine picks the second.
The first judgement rule has been discussed in the last blog. Let us analyze talk about the second judgement rule here.
A. Judgement rule 2: based on its local status
DLS
protocol also takes this rule, which is quite simple: each replica judges whether to vote for a new request only based on its voting status. If a replica has voted for a request \(r\) at \(i\), it will never vote for any contradictory request \(r'\) at \(i\) later.
This simple rule can guarantee safety, but it cannot ensure liveness. Take an example to illustrate the liveness problem as follows:
- In view \(v\), the leader has broadcast \(r\) at \(i\) and \(f\) correct replicas have voted for \(r\).
- View is switched to \(v+1\).
- The new correct leader in view \(v+1\) broadcasts \(r'\) (\(r' \neq r\)) at \(i\) and other \(f+1\) correct replicas (not the \(f\) correct replicas above) vote for \(r'\).
- From now on, if all the byzantine replicas keep silent, the protocol will halt, thus compromising liveness.
II. Two-phase + view-switch protocol
From Section I., we know one-phase + view-switch
is not enough for a correct BFT consensus. Tendermint
augments it with an additional phase, whose specification can be succinctly described as follows. Note that to faciliate the promising comparison between PBFT
and Tendermint
, the blockchain-related terminologies of Tendermint
are removed here.
Two phases in
Tendermint
are named asprevote
andprecommit
respectively.A replica will lock on an index \(i\) with a request \(r\) if it has colleted 2/3+
prevote
s on \(r\).If a replica has locked on \(i\) with \(r\), it will never
prevote
for \(r'\) (\(r'\neq r\)) at \(i\) later.Besides, if a replica has voted for a newer request \(r^n\) at \(i\), it will discard all the messages (e.g.,
prevote
orprecommit
messages) received later for the old request \(r^o\) at \(i\).Other parts are quite similar to Section I.
As an analogy of Section II.A, it is easy to draw the conclusion that Tendermint
satifies safety. Next, we analyze how Tendermint
achieves livenss.
A. How does Tendermint
achieve liveness?
Note that we do some modifications to Tendermint
to make it more similar to PBFT
. Without relying on the blockchain-related structures, we claim that there will be independent locks for each index \(i\).
If a replica insists on locking on \(i\) with \(r\), the protocol may get halted and compromise liveness. Take an example to illustrate the liveness problem as follows:
- In view \(v\), \(f\) correct nodes locked on \(i\) with \(r\).
- View is switched to \(v+1\).
- The new leader in \(v+1\) broadcasts \(r'\) at \(i\). All the \(f+1\) correct nodes except the above \(f\) ones and \(f\) byzantine nodes
prevote
for \(r'\). Therefore, the \(f+1\) correct nodes can lock on \(r'\) at \(i\). - From now on, if all the byzantine replicas keep silent, the protocol will halt, thus compromising liveness.
To deal with the above problem, Tendermint
adds a new rule (i.e., unlock-on-polka
), where polka
means 2/3+ prevote
s for the same request \(r\) at \(i\). The unlock-on-polka
rule states that if a replica sees a newer polka
for \(r'\) at \(i\), it will release the lock on the old request (\(r\)) and relock on the new one (\(r'\)).
With this rule, \(f\) correct nodes locking on \(r\) in the above example are expected to switch to lock on \(r'\). Thus, all the correct nodes may broadcast precommit
messages and \(r'\) can get committed.
B. Will unlock-on-polka
compromise safety?
No! The reason is as follows:
Assume a replica locks on \(i\) with \(r\) and it receives a polka
on a newer \(r'\). We further assume the oldest polka
with a newer request than \(r\) is \(r^o\) (\(r^o\) may equal \(r'\)): \(r^o = min\{r_n|r_n > r\}\). \(r^o\) must have got prevotes
from 2/3+ replicas. Besides, since \(r^o\) is the oldest one newer than \(r\), and a replica only prevotes
for its locked request or when it lock on nothing, all the replicas prevoting
for \(r^o\) must lock on nothing. Therefore, there are at least 2/3+ replicas locked on nothing at that time. Besides, since these replicas have voted for a new request \(r^o\), they will discard all the later-received messages for \(r\) and will never lock on \(r\). Therefore, \(r\) can neither be precommitted by 2/3+ replicas nor be committed by any replica. In this regard, it is safe to release the lock on \(r\).
From another perspective, if a replica has committed \(r\) with \(i\), no \(r'\) with \(i\) can get locked or even committed later. By contrast, if there is a polka
for a newer \(r'\) with \(i\), we can assure that no older \(r\) with \(i\) has been committed.
C. When will a lock on an uncommitted request be released or be committed finally?
Nobody knows and nothing can guarantee it in Tendermint
. Let us take an example to explain it.
Assume a replica locks on \(i\) with \(r\) but does not commit it, there can be two cases. - Case 1 (\(f+1\) or more correct replicas lock on \(r\)): \(r\) will be finally committed. - Case 2 (\(f\) or less correct replicas lock on \(r\)): there are two subcases - Case 2.1: a new leader proposes \(r'\), and each correct replica without locking on \(r\) and Byzantine replicas can prevote for \(r'\). \(r'\) can be locked and further committed later. - Case 2.2: a new leader proposes \(r\), and each correct replica can prevote for \(r\). \(r\) can be locked by more replicas and committed later.
After long enough, one of Case 1, Case 2.1, and Case 2.2 will definitely take place, thus achieving liveness. However, nobody knows the exact time. That's the reason why Tendermint
is not responsive.
Reference
- Buchman, E. "Tendermint: Byzantine fault tolerance in the age of blockchains." Doctoral dissertation, 2016.
- Dwork, C., Lynch, N., & Stockmeyer, L. "Consensus in the presence of partial synchrony." Journal of the ACM (JACM), 1988, 35(2), pp. 288-323.