As for a Byzantine Fault Tolerance (BFT) consensus, there are mainly two design routines: PBFT
-like and Tendermint
-like. The former contains a complex view-change
phase, while the latter includes a notion of lock
.
In this blog, we try to derive a two-phase BFT consensus, along the routine of PBFT
. Particularly, we will analyze:
- Why is a view-switch process necessary?
- What is the function of the view-switch process?
- Why are two phases necessary?
- Why is
PBFT
a correct consensus algorithm?
In the next blog, we will pick the other routine (i.e., Tendermint
) and derive the design of Tendermint
.
After that, a blog trying to compare and unify PBFT
and Tendermint
will be posted, since both of them are two-phase consensus and similar to each other.
I. Basic Concepts of BFT Consensus
Before going ahead, we must first clarify some basic terminologies:
- Request: A command to be agreed and processed by the BFT cluster.
- Commit: Any request having been agreed can be considered committed, which will not be rolled back.
- Index: Each committed request will correspond to an index, indicating its execution sequence.
Also, a correct BFT consensus must satisfy two properties:
- Safety: It requires that if an index \(i\) is committed with a request \(r\) by an honest replica, any other correct replicas cannot commit \(i\) with a different request \(r'\) (\(r' \neq r\)).
- Liveness: It requires that if the indexes of committed requests should keep increasing. In other words, the system will never halt.
II. One-phase protocol
Let us start with an one-phase protocol as follows:
- As a leader: broadcast the index \(i\) with a request \(r\) to all the replicas
- As a replica: broadcast the vote (\(v_r\)) for \(r\)
- As a replica: commit \(i\) with \(r\) once collecting 2/3+ votes
This protocol can satisfy safety but missing liveness. It is easy to understand its satisfication of safety. Violation of liveness will take place if:
- Either the leader is evil and sends contradictory requests to different honest replicas intentionally;
- Or the leader is not evil but crashes unintentionally.
III. One-phase + view-switch protocol
To extricate the system from halting, a view-switch (i.e., view-change
in PBFT
) mechanism is necessary. The view-switch mechanism can be passive or active:
- Passive view-switch: view will be switched after a failure detection, e.g.,
view-change
inPBFT
andround-change
inTendermint
- Active view-switch: view will be switched for each new request, e.g.,
new-view
inBasic HotStuff
.
The view-switch process can occur either when some replicas have committed \(i\) with a request \(r\) or when no one has committed \(i\). According to the safety requirement, a correct replica should reject a different request \(r'\) (\(r'\neq r\)) at \(i\) in the new view. Therefore, it is necessary to judge if \(r\) has been committed and reject/accept \(r'\) based on the judgement.
There are mainly two ways to design the judgement rules: (1) based on the collection of other replicas' opinions; (2) based on its local status. In the PBFT
-like routine, the first judgement rule is picked. The second judgement rule will be presented in the next blog (e.g., Tendermint
-like routine).
However, with an one-phase + view-switch
protocol, it is challenging or even impossible for a replica without committing \(i\) to judge if someone else has committed \(i\). We elaborate on the reason for the first judgement rule here, with the second rule to be discussed in the next blog.
A. Judgement rule 1: based on collection of other replicas' opinions
This rule combined with the view-switch specification can be considered as a simplified version of PBFT
(one-phase PBFT
). The rule can be described as follows:
- When the view is changed, each replica broadcasts its opinion on an index \(i\). If it has voted for a request \(r\) at \(i\) before, it will broadcast an opinion of {\(r\), \(i\)}; otherwise, it broadcasts an opinion of {
empty
, \(i\)}. - A replica will try to collect 2/3+ replicas' opinions on the index \(i\). If 2/3 opinions are of the format {\(r\), \(i\)}, the replica can judge that \(i\) has been committed with \(r\) before; or if 2/3 opinions are of the format {
empty
, \(i\)}, the replica can judge that \(i\) has been committed before.
However, if there are no matching 2/3 opinions collected by a replica, it will be stuck in a dilemma. Accepting a new request with \(i\) may vialate safety, while rejecting a new request with \(i\) may compromise liveness.
In fact, the judgement rule above is a little different from PBFT
. To be more specific, a replica broascasts its opinion in the above description. By contrast in PBFT
, it only sends its opinion to the new leader and the leader further packages and broadcasts the opinions. However, the underlying essentials are the same.
IV. Two-phase + view-switch protocol
From Section III., we know one-phase + view-switch
is not enough for a correct BFT consensus. PBFT
augments it with an additional phase, whose specification can be succinctly described as follows:
Two phases in
PBFT
are named asprepare
andcommit
respectively. It requires that a replica has to receive 2/3+commit
messages before it finally commits (executes) a request.When the view is changed, each replica broadcasts its
prepared
opinions on \(i\), in the format {prepared
, \(r\), \(i\)}.Other parts are quite similar to Section III.A.
A. How does PBFT
achieve safety?
If \(i\) is committed with a request \(r\), it must have been prepared
by 2/3+ replicas. Collection of 2/3+ prepared
messages in the view-switch process can ensure that if \(i\) has been committed with \(r\) before, at least one prepared
opinion of {prepared
, \(r\), \(i\)} will be received by each replica. Since a prepared
opinion is consisted of 2/3+ votes, it is enough for a replica to make judgements that \(i\) is committed with at most one request \(r\) and it will only vote for \(r\). Therefore, safety is guaranteed.
B. How does PBFT
achieve liveness?
On one hand, each replica only needs to wait for 2/3+ messages, which can be assured by the model assumption. On the other hand, all the honest replicas can make a consistent and correct judgement. A new uncontradicted request will finally receive 2/3+ votes (prepare
and commit
messages) and got committed.
Reference
- Castro, M., & Liskov, B. "Practical byzantine fault tolerance." In Proceedings of the 3rd USENIX Symposium on Operating Systems Design and Implementation (OSDI'99), pp. 173-186.
- Yin, M., Malkhi, D., Reiter, M. K., Gueta, G. G., & Abraham, I. "HotStuff: BFT consensus with linearity and responsiveness." In Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing (PODC'19), pp. 347-356.