Why ABA Is Needed in HoneybadgerBFT?

While I was reading the HoneybadgerBFT paper, a question came to me is that "Reliable Broadcast (RBC) seems to implement the functions of a consensus algorithm already, why an extra phase of Asynchronous Binary Agreement (ABA) is needed"?

In this blog, we will try to answer this question.

I. An Overlooked Fact of Consensus: Round-by-Round Flow

A fact for almost all the consensus algorithms is that a practical consensus should be run round by round.

The reason is that requests will be continuously sent from the external clients to the consensus cluster. At a time, the cluster must select a subset from the currectly received requests and then just process the selected subset. This one-time process is exactly what Reliable Broadcast (RBC) does.

Only if the previous round of RBC (\(RBC_r\)) is ended, can the requests in the next RBC (\(RBC_{r+1}\)) be checked. In particular, if \(RBC_r\) is considered to be discarded, legitimacy of requests in \(RBC_{r+1}\) can be checked based on the status of round \(r-1\); Otherwise, legitimacy of requests in \(RBC_{r+1}\) must be checked based on the status of round \(r\).

Therefore, RBC must be processed round by round.

II. Drawbacks of RBC: Inefficiency or Even Halting

RBC can guarantee that for a correct replica, if it delivers a piece of data, it knows all the other correct replicas will definitely deliver the same data finally. However, if the replica has not delivered a piece of data, it does not know if others will deliver the data and if it should wait for it. The waiting time could be very long or even it will never deliver the data.

In other words, a round of RBC can be extremely inefficient or even never end. In the latter case, the system is just halted.

As stated in Section I, RBC of round \(r+1\) can only be processed after RBC of round \(r\) is ended. Therefore, the inefficiency or halting of a RBC round will lead to the inefficiency or halting of the whole system.

III. Functions of ABA

In short, ABA is used to deal with drawbacks of RBC. Particularly, it can stop an inefficient RBC timely and rescue the system from a possible halting status.

To be more specific, the input for ABA by a replica is based on the status if it has delivered the corresponding data from the RBC instance. If more than 2f+1 replicas inputting 1, it means the delivery speed is high. In this case, ABA can exactly return 1, indicating that each replica should wait for the data.

On the other hand, if a replica does not deliver the data from a RBC instance for a long time, it will input 0 for ABA. If more than 2f+1 replicas inputting 0, it means the delivery speed is quite slow. In this case, ABA can exactly return 0, indicating that each replica should stop waiting or discard the already received data.

A. How to define `a long time`?

As stated before, the choice to input 0 or 1 for ABA is made based on the definition of long time.

In HoneybadgerBFT, the protocol stipulates that if a replica has input 1 for at least 2f+1 ABA instances, it considers that it has waited for a long time and thus inputs 0 for the remaining ABA instances. This is quite reasonable.

Reference

Miller, A., Xia, Y., Croman, K., Shi, E., & Song, D. "The honey badger of BFT protocols." In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS'16), pp. 31-42.

I. An Overlooked Fact of Consensus: Round-by-Round Flow

II. Drawbacks of RBC: Inefficiency or Even Halting

III. Functions of ABA

A. How to define a long time?

Reference

A. How to define `a long time`?