While I was reading the HoneybadgerBFT
paper, a question came to me is that "Reliable Broadcast (RBC
) seems to implement the functions of a consensus algorithm already, why an extra phase of Asynchronous Binary Agreement (ABA
) is needed"?
In this blog, we will try to answer this question.
I. An Overlooked Fact of Consensus: Round-by-Round Flow
A fact for almost all the consensus algorithms is that a practical consensus should be run round by round.
The reason is that requests will be continuously sent from the external clients to the consensus cluster. At a time, the cluster must select a subset from the currectly received requests and then just process the selected subset. This one-time process is exactly what Reliable Broadcast (RBC
) does.
Only if the previous round of RBC
(\(RBC_r\)) is ended, can the requests in the next RBC
(\(RBC_{r+1}\)) be checked. In particular, if \(RBC_r\) is considered to be discarded, legitimacy of requests in \(RBC_{r+1}\) can be checked based on the status of round \(r-1\); Otherwise, legitimacy of requests in \(RBC_{r+1}\) must be checked based on the status of round \(r\).
Therefore, RBC
must be processed round by round.
II. Drawbacks of RBC: Inefficiency or Even Halting
RBC
can guarantee that for a correct replica, if it delivers a piece of data, it knows all the other correct replicas will definitely deliver the same data finally. However, if the replica has not delivered a piece of data, it does not know if others will deliver the data and if it should wait for it. The waiting time could be very long or even it will never deliver the data.
In other words, a round of RBC
can be extremely inefficient or even never end. In the latter case, the system is just halted.
As stated in Section I, RBC
of round \(r+1\) can only be processed after RBC
of round \(r\) is ended. Therefore, the inefficiency or halting of a RBC
round will lead to the inefficiency or halting of the whole system.
III. Functions of ABA
In short, ABA
is used to deal with drawbacks of RBC
. Particularly, it can stop an inefficient RBC timely and rescue the system from a possible halting status.
To be more specific, the input for ABA
by a replica is based on the status if it has delivered the corresponding data from the RBC instance. If more than 2f+1 replicas inputting 1, it means the delivery speed is high. In this case, ABA
can exactly return 1, indicating that each replica should wait for the data.
On the other hand, if a replica does not deliver the data from a RBC
instance for a long time, it will input 0 for ABA
. If more than 2f+1 replicas inputting 0, it means the delivery speed is quite slow. In this case, ABA
can exactly return 0, indicating that each replica should stop waiting or discard the already received data.
A. How to define a long time
?
As stated before, the choice to input 0 or 1 for ABA
is made based on the definition of long time
.
In HoneybadgerBFT, the protocol stipulates that if a replica has input 1 for at least 2f+1 ABA
instances, it considers that it has waited for a long time and thus inputs 0 for the remaining ABA
instances. This is quite reasonable.
Reference
- Miller, A., Xia, Y., Croman, K., Shi, E., & Song, D. "The honey badger of BFT protocols." In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS'16), pp. 31-42.