Byzantine generals problem explained

A reliable computer system must be able to function even if one or more of its components fails. A failing component may display a frequently overlooked behavior: delivering contradicting data to different sections of the system. So, what is the Byzantine generals problem? The Byzantine generals problem is an abstract expression of the problem of dealing with this type of failure.

The Byzantine generals problem is a game theory problem that describes how difficult it is for dispersed parties to reach a consensus without the help of a trusted central party. How can members of a network agree on a specific reality when no one can verify the identities of other members?

Game theory is a framework for thinking about social events with competing actors. In a strategic environment, game theory conceives social circumstances among competing participants and produces optimal decision-making of autonomous and competing agents.

The Byzantine generals are based on a game theory analogy. The problem is that multiple generals besiege Byzantium. They've encircled the city, but they must decide when to assault as a group. They will win if all generals assault simultaneously; however, they will lose if they attack. 

Because any letters they transmit or receive could have been intercepted or deceptively sent by Byzantium's defenders, the generals have no secure communication channels with one another. How can the generals coordinate simultaneous attacks?

This article aims to explain what a Byzantine fault in blockchain is and how to solve the Byzantine generals problem. 

The research paper

"The Byzantine Generals Problem," a research article by Leslie Lamport, Robert Shostak and Marshall Pease, was published in 1982. The importance of this problem is evident from the opening page, which notes that the National Aeronautics and Space Administration (NASA), the Ballistic Missile Defense Systems Command and the Army Research Office all financed their research. 

Although the Byzantine generals problem had been studied in computer science before 1982, this was one of the first attempts to translate it into parallel and proposed solutions. The following analogy illustrates the Byzantine generals problem. Several divisions of the Byzantine army are stationed just outside of an enemy’s city, prepared for war. The only way for various generals to connect is through a messenger. They must agree on a course of action. 

However, we must presume that certain generals, intent on keeping loyal generals from deciding on a single course of action, are traitors. To ensure that a tiny group of traitors cannot interrupt communications, an algorithm is required. 

To address the problem of the Byzantine generals, loyal generals need a safe means to agree on a plan (known as consensus) and carry it out (known as coordination). While solving the Byzantine generals problem is a difficult task, we now better understand the fundamental problem. It's critical to note that, as the example suggests, the concept can be applied to military communications. 

However, this issue affects all types of computer systems, not only those used in military applications. The Byzantine generals problem must be solved if a dispersed group of nodes (e.g., computers or other physical devices) needs to achieve reliable communications.

Understanding Byzantine fault tolerance (BFT)

There are several reasons why a distributed computer system could fail. In the military scenario above, Byzantine failures are essentially traitors attempting to interrupt communications between loyal generals.

This could be a software defect, a hardware malfunction, or a malicious attack when applied to real-world computer systems. To put it another way, Byzantine failures don't always have to be the result of a well-coordinated effort by a bad actor. There can be difficulties that hinder nodes from reaching consensus on distributed networks.

Any system failure that exhibits diverse symptoms to different observers is referred to as a Byzantine fault. It contains no constraints and assumptions about the type of behavior that a node can exhibit (e.g., a node can generate arbitrary data while posing as an honest actor).

In every distributed computer system, Byzantine failures are virtually unavoidable.

Let's imagine there's a power outage and all of the nodes go offline simultaneously. Now, the question arises if the network is still operational and capable of sustaining reliable communication? Or does the system as a whole stop working or become open to attacks all of a sudden?

In a reasonably secure network, anything as minor as a few offline nodes has no discernible effect on the network. Byzantine fault tolerance is the ability to defend against these conditions. Networks that can endure more Byzantine failures are said to have a higher tolerance, implying that they are more secure than those that can't.

The actual incidence and taxonomy of Byzantine faults in various systems is a vast and challenging subject. It can, however, be specified in such a way that a formal definition of Byzantine fault tolerance emerges.

It's worth noting that Byzantine flaws are the most serious and difficult to correct. Byzantine fault tolerance is required in nuclear power plants, aviation engine systems and pretty much any system whose actions are dependent on the results of a large number of sensors.

Byzantine generals problem in a distributed system

Only decentralized systems are susceptible to the Byzantine generals problem, as they lack a dependable source of information and have no way of confirming the information they get from other network users. In centralized systems, an authority is trusted to disseminate accurate information while preventing the spread of erroneous or fraudulent information across the network.

For example, in the traditional financial system, banks are trusted to provide clients with accurate balances and transaction histories. If a bank tries to deceive or mislead its consumers, the central bank or government is authorized to restore faith.

The Byzantine generals dilemma, which inconsistently necessitates the establishment of truth, is not solved by centralized systems. Instead, they choose not to confront the problem at all, opting for efficiency over trustworthiness. Centralized systems, on the other hand, are prone to central authority corruption.

Byzantine generals problem example

The Byzantine generals problem is exemplified by money. How could a society build a monetary system that all members can trust and agree on? Societies have used precious metals or other rare items, such as shells or glass beads, as currency for most of history. Gold addressed the Byzantine generals problem because it was trusted and recognized across decentralized systems like international trade.

Gold's weight and purity, on the other hand, remained untrustworthy until now. The failure of gold to completely address the Byzantine generals problem led to the establishment and issuance of money being taken over by trusted central bodies, mainly governments. Governments monopolized mints to instill confidence in the currency's weight and purity. Therefore, Byzantine failure was not solved by centralized systems.

Moreover, the trusted central authorities for money, governments, have betrayed that confidence by seizing, debasing or modifying it. To solve the Byzantine generals problem, a currency must be verifiable, counterfeit-resistant and untrustworthy. This accomplishment was not accomplished until the advent of Bitcoin (BTC).

How do you solve Byzantine generals problem?

The problem can be solved by implementing a protocol that employs fault-tolerant mechanisms. When faced with uncertainty, adopting a procedure among the generals is the best method to make choices. 

As a result, it becomes probabilistic rather than deterministic because nothing can be guaranteed. That is precisely the case when there is less direct communication among peers, and each is self-contained. Because each general is in a different place, there is a physical separation between them.

Solutions for achieving Byzantine fault tolerance

Blockchain: The solution for the Byzantine general problem

The Byzantine general problem can be solved with the help of a blockchain. It's all about giving people a way to communicate safely and securely in an unpredictable world. In the actual world, most transactions occur between strangers who do not know or trust one another. 

Each individual is like a general, waiting for orders to attack or defend their position. There are no middlemen to arbitrate the attack on your behalf; you are entirely on your own in making your decision.

A blockchain creates a layer that can be trusted without needing to trust every individual. This is accomplished by a network of nodes coming together to agree on the truth before it is recorded. If the general is unsure about the substance of the communication, the other generals can verify it using what they know to be true.

Once one node has recorded it, a copy is sent to all other nodes in the network, making the information redundant. The PoW consensus algorithm is designed to achieve this goal. Bad actors will still try to game the system because the information isn't always accurate. 

Since the system was designed to be utilized by the general public, fault-tolerant mechanisms and security are in place in a blockchain. In this scenario, cryptography was required to ensure that the messages could not be altered. 

The system provides key pairs for digitally signing a communication to verify identity as proof that it comes from the persons purported to have sent it. Once messages have been authenticated, they are recorded for transparency and historical proof of accountability.

How does Bitcoin solve the Byzantine generals problem?

With regard to money, Bitcoin was the first realized solution to the Byzantine generals problem. Many plans and projects before Bitcoin attempted to create money that was independent of the government, but they all failed in some way.

Bitcoin needs the means to handle ownership and avoid double-spending as a monetary system. Bitcoin employs a blockchain, or a public, distributed ledger, that stores a history of all transactions to accomplish this in a trustless manner. The blockchain, in the Byzantine generals analogy, is the truth that all parties must agree on.

If all the nodes in the Bitcoin network could agree on which transactions happened when and in what order, they could verify Bitcoin ownership and create a working, trustless monetary system without the need for a centralized authority.

Proof-of-Work (PoW) and the Byzantine generals problem

Satoshi Nakamoto released the first Bitcoin whitepaper in October 2008. Although the name "Byzantine generals problem" isn't used in this document, Nakamoto effectively provided a solution that would be implemented in January 2009 with the introduction of the Bitcoin network.

Satoshi devised a means to use cryptographic security and public-key encryption to answer the Byzantine general problem in a digital electronic network. To prevent data tampering, cryptographic security uses hashing, a process of encoding. The identity of a network user is verified via public key encryption.

A transaction is secured in a block that is connected to other blocks by its hash value in cryptographic security. All hashes may be tracked down to the root of all hashes, which is an initial block. The blockchain is a system that uses a Merkle Tree to verify hashes that come from a genesis block. 

Every block in the network that comes from the first block, also known as the genesis block, is valid. Miners validate blocks, who compete with other miners to solve cryptographic puzzles to produce blocks as part of a PoW consensus method.

By employing a proof-of-work consensus mechanism, Bitcoin overcame the Byzantine generals problem and established a clear, objective rulebook for the blockchain. To add information to the blockchain, called blocks, a network member must publish proof that they put a lot of effort into making the block. This work comes at a high cost to the creator, incentivizing them to share accurate information.

There can be no disagreement or tampering with the information on the Bitcoin network because the rules are objective. The system for selecting who can mint new Bitcoin and the laws regulating which transactions are valid or invalid are both objectives. Furthermore, it is impossible to remove a block from the blockchain after it has been added, making Bitcoin's history unchangeable.

Therefore, the Byzantine generals problem is solved by miners who are similar to generals in Satoshi's version of the blockchain. Each node is responsible for validating transactions, which are similar to messages delivered to the generals. Bad actors (for example, hackers) who aim to steal messages or harm the network can be considered the enemy.

Hackers (i.e., the man-in-the-middle) cannot readily attack the blockchain because messages use cryptographic security. To prevent manipulation, the messages or transactions are bundled into blocks and hashed for added protection. Satoshi makes things more probabilistic by putting miners in a competition to validate the blocks. This makes it more decentralized because no single miner can receive all of the rewards by monopolizing validation. 

Instead, miners must compete to solve a riddle by using their computational power, known as the hash rate. The higher a miner's hash rate, the more likely they are to solve the puzzle. When the miner who has solved the puzzle broadcasts the solution to the network, all other miners must validate or deny the value if it is erroneous. A difficulty target is a value that must be equal to or less than the correct value.

Members of the Bitcoin network can thus agree on the status of the blockchain and all transactions in blockchain at any time. Each node verifies whether blocks are valid according to the proof-of-work criterion and whether transactions are valid according to additional criteria.

If a network member tries to broadcast misleading information, all nodes on the network will detect it as objectively invalid and ignore it. There is no need to trust other members of the Bitcoin network because each node can verify every information on the network, itself, making Bitcoin a trustless system.

The blockchain is also decentralized, which means the system should not have a single point of failure. The blocks are saved in a distributed database, which is replicated across the network. This redundancy also aids in fault tolerance, ensuring that no single malfunctioning computer may bring the entire system down. This is the equivalent of having many messengers in case one is ambushed by the enemy. The message will not be lost because it will be copied by other messengers.

The novel solutions: Proof-of-stake (PoS) and Delegated proof-of-stake (DPoS)

PoS is another blockchain consensus mechanism that seeks to address the Byzantine generals problem. It was first deployed in 2012. PoS-based networks, unlike PoW-based networks, are not reliant on cryptocurrency mining. Instead, a technique called staking is performed. 

Users (called validators) stake funds in this system. Validators who own more coins on a blockchain can validate more blocks and earn greater rewards. Users that attempt to validate incorrect transactions risk losing their staked cash.

Users can stake coins using normal home computers instead of needing specialized machines for mining in a PoW-based network. Several PoS-based networks have created ways to prevent double-spending attacks and other potential security vulnerabilities caused by Byzantine failures. For example, Ethereum 2.0 (Serenity) will use the Casper PoS algorithm, which requires a two-thirds majority of nodes to agree on a block before it can be created.

Delegated proof-of-stake is a blockchain consensus technique that operates similarly to proof-of-stake and was first developed in 2014. Both require users to put money on the line. Only a few users (known as delegates) can validate transactions and generate blocks in DPoS-based networks.

In general, any user can stake a blockchain's coins to cast a vote in support of a delegate candidate. Block rewards are usually distributed proportionally to the amount of monies staked in delegate elections by elected nodes to their voters.

Nodes can reach consensus considerably faster using DPoS than they can with PoW or PoS. At scale, this means transactions can be handled significantly quicker. Maintaining a high level of Byzantine fault tolerance with DPoS may become problematic in some cases due to the tradeoff. 

Because fewer nodes are accountable for keeping the network safe, it is potentially easier for nodes to conspire against the majority's best interests. DPoS-based networks, on the other hand, try to avoid this scenario by holding delegate elections regularly to ensure that delegates are held accountable for their decisions.