0%

Consensus From One-phase To Two-phase, A Routine Of Tendermint

As a sequel of the last blog, this blog picks the routine of Tendermint to derive a two-phase BFT consensus. Particularly, we will analyze:

  1. What is the function of the lock notion?
  2. Why are two phases necessary, from the perspective of Tendermint?
  3. Why is Tendermint a right consensus, which can achieves both safety and liveness?
  4. Why is Tendermint not responsive?

Common contents with the last blog (including "basic concepts" and "one-phase protocol") are omitted here.

In the next blog, we will try to compare and unify PBFT and Tendermint.

I. One-phase + view-switch protocol

To extricate the system from halting, a view-switch (i.e., round-change in Tendermint) mechanism is necessary. The view-switch mechanism in Tendermint is also passive.

As discussed in the last blog, when the view is switched, each replica must make a judgement if an index \(i\) has been committed in the old view.

Review that:

there are mainly two ways to design the judgement rules: (1) based on the collection of other replicas' opinions; (2) based on its local status. PBFT-like routine picks the first one, while Tendermint-like routine picks the second.

The first judgement rule has been discussed in the last blog. Let us analyze talk about the second judgement rule here.

A. Judgement rule 2: based on its local status

DLS protocol also takes this rule, which is quite simple: each replica judges whether to vote for a new request only based on its voting status. If a replica has voted for a request \(r\) at \(i\), it will never vote for any contradictory request \(r'\) at \(i\) later.

This simple rule can guarantee safety, but it cannot ensure liveness. Take an example to illustrate the liveness problem as follows:

  • In view \(v\), the leader has broadcast \(r\) at \(i\) and \(f\) correct replicas have voted for \(r\).
  • View is switched to \(v+1\).
  • The new correct leader in view \(v+1\) broadcasts \(r'\) (\(r' \neq r\)) at \(i\) and other \(f+1\) correct replicas (not the \(f\) correct replicas above) vote for \(r'\).
  • From now on, if all the byzantine replicas keep silent, the protocol will halt, thus compromising liveness.

II. Two-phase + view-switch protocol

From Section I., we know one-phase + view-switch is not enough for a correct BFT consensus. Tendermint augments it with an additional phase, whose specification can be succinctly described as follows. Note that to faciliate the promising comparison between PBFT and Tendermint, the blockchain-related terminologies of Tendermint are removed here.

Two phases in Tendermint are named as prevote and precommit respectively.

A replica will lock on an index \(i\) with a request \(r\) if it has colleted 2/3+ prevotes on \(r\).

If a replica has locked on \(i\) with \(r\), it will never prevote for \(r'\) (\(r'\neq r\)) at \(i\) later.

Besides, if a replica has voted for a newer request \(r^n\) at \(i\), it will discard all the messages (e.g., prevote or precommit messages) received later for the old request \(r^o\) at \(i\).

Other parts are quite similar to Section I.

As an analogy of Section II.A, it is easy to draw the conclusion that Tendermint satifies safety. Next, we analyze how Tendermint achieves livenss.

A. How does Tendermint achieve liveness?

Note that we do some modifications to Tendermint to make it more similar to PBFT. Without relying on the blockchain-related structures, we claim that there will be independent locks for each index \(i\).

If a replica insists on locking on \(i\) with \(r\), the protocol may get halted and compromise liveness. Take an example to illustrate the liveness problem as follows:

  • In view \(v\), \(f\) correct nodes locked on \(i\) with \(r\).
  • View is switched to \(v+1\).
  • The new leader in \(v+1\) broadcasts \(r'\) at \(i\). All the \(f+1\) correct nodes except the above \(f\) ones and \(f\) byzantine nodes prevote for \(r'\). Therefore, the \(f+1\) correct nodes can lock on \(r'\) at \(i\).
  • From now on, if all the byzantine replicas keep silent, the protocol will halt, thus compromising liveness.

To deal with the above problem, Tendermint adds a new rule (i.e., unlock-on-polka), where polka means 2/3+ prevotes for the same request \(r\) at \(i\). The unlock-on-polka rule states that if a replica sees a newer polka for \(r'\) at \(i\), it will release the lock on the old request (\(r\)) and relock on the new one (\(r'\)).

With this rule, \(f\) correct nodes locking on \(r\) in the above example are expected to switch to lock on \(r'\). Thus, all the correct nodes may broadcast precommit messages and \(r'\) can get committed.

B. Will unlock-on-polka compromise safety?

No! The reason is as follows:

Assume a replica locks on \(i\) with \(r\) and it receives a polka on a newer \(r'\). We further assume the oldest polka with a newer request than \(r\) is \(r^o\) (\(r^o\) may equal \(r'\)): \(r^o = min\{r_n|r_n > r\}\). \(r^o\) must have got prevotes from 2/3+ replicas. Besides, since \(r^o\) is the oldest one newer than \(r\), and a replica only prevotes for its locked request or when it lock on nothing, all the replicas prevoting for \(r^o\) must lock on nothing. Therefore, there are at least 2/3+ replicas locked on nothing at that time. Besides, since these replicas have voted for a new request \(r^o\), they will discard all the later-received messages for \(r\) and will never lock on \(r\). Therefore, \(r\) can neither be precommitted by 2/3+ replicas nor be committed by any replica. In this regard, it is safe to release the lock on \(r\).

From another perspective, if a replica has committed \(r\) with \(i\), no \(r'\) with \(i\) can get locked or even committed later. By contrast, if there is a polka for a newer \(r'\) with \(i\), we can assure that no older \(r\) with \(i\) has been committed.

C. When will a lock on an uncommitted request be released or be committed finally?

Nobody knows and nothing can guarantee it in Tendermint. Let us take an example to explain it.

Assume a replica locks on \(i\) with \(r\) but does not commit it, there can be two cases. - Case 1 (\(f+1\) or more correct replicas lock on \(r\)): \(r\) will be finally committed. - Case 2 (\(f\) or less correct replicas lock on \(r\)): there are two subcases - Case 2.1: a new leader proposes \(r'\), and each correct replica without locking on \(r\) and Byzantine replicas can prevote for \(r'\). \(r'\) can be locked and further committed later. - Case 2.2: a new leader proposes \(r\), and each correct replica can prevote for \(r\). \(r\) can be locked by more replicas and committed later.

After long enough, one of Case 1, Case 2.1, and Case 2.2 will definitely take place, thus achieving liveness. However, nobody knows the exact time. That's the reason why Tendermint is not responsive.

Reference

  1. Buchman, E. "Tendermint: Byzantine fault tolerance in the age of blockchains." Doctoral dissertation, 2016.
  2. Dwork, C., Lynch, N., & Stockmeyer, L. "Consensus in the presence of partial synchrony." Journal of the ACM (JACM), 1988, 35(2), pp. 288-323.