Blockchain has become a buzzword grown out of Bitcoin and subsequent cryptocurrencies. The original Bitcoin paper did not use the term blockchain and it took some time for the term to emerge, in order to describe the underlying technology that permits implementing digital currencies and other applications.
Consequently, and also due to the decentralised nature of blockchain communities, there is no official definition of the term. Many definitions out there are aimed at the investment community or the general public, which is fine, but means they lack a technical depth. Other definitions merely reflect the lack of understanding of the author. Those with a background in Computer Science or Software Engineering that would like to understand the underlying technologies need more precise definition.
Therefore, here is a definition aimed at software engineers and computer scientists:
A blockchain is a linked list data structure, implementing an infinite state engine with immutable state transitions, on top of a peer-to-peer network with a byzantine failure model.
Let’s take a look at this definition one concept at a time.
Linked list data structure
The data held in a blockchain is represented in a list of block, with each block linking back to the previous one. The following diagram is from the original Bitcoin paper and shows how data (transctions) is organised in blocks.
Blocks are linked by including the cryptographic hash of the previous block. All blockchains have a similar way of organising data. However, just defining a blockchain as a linked list of blocks is not enough. Some additional properties are required.
Infinite state engine
Blockchain systems represent state engines. State engines are systems modelled as a series of state through which the system transitions. State engines are termed finite, if there is a finite number of possible system states. The possible states are known in advance. In the case of infinite state engines, there is an endless number of possible states.
Blockchains thus, are infinite state engines. Transactions take the system from one state to another. System state may be account balances, as in Bitcoin’s case, represented by the set of UTXOs (unspent transaction outputs). In more general purpose blockchains, such as Ethereum, state can represent any piece of data.
A virtual machine (VM) typically executes transactions to take the system form one state to another. The Bitcoin VM executes a domain specific script language encoded in transaction inputs and outputs. The Ethereum VM executes more complex operations, as it is touring complete (it can be used execute anything that can be modelled computationally), but the concept is the same.
Transactions, which we have just defined as state transitions, are immutable in a blockchain system. This means, that once a transaction has been confirmed, i.e. included in a valid block, it cannot be undone. State can only be reversed by issuing another transaction, but history of transactions cannot be modified.
Peer-to-peer network (P2P)
The data structure and the infinite state engine a system implements is no enough to make it a blockchain. A real blockchain executes on a P2P network. Each full node has a copy of the data-structure, and importantly, all nodes reach consensus on the correct version of the data. That is; all nodes agree on the same system state. A blockchain system thus implements various P2P protocols, such as node and neighbour detection, group communication protocols and consensus protocols.
Byzantine failure model
The biggest achievement of blockchains is probably, that they achieve all the above in a byzantine failure model. In distributed systems research a failure model is the assumption on what may go wrong in a system. For instance, the crash failure model assumes that nodes may crash and the the partition failure model assumes that nodes (or groups of nodes) may be temporarily isolated, due to network faults. Both failure models however assume that nodes try to behave correctly, i.e. there are no malicious participants.
In the byzantine failure model we assume that a node may behave in any way, i.e. system state integrity may not be the goal of every node and there may be malicious participants. Thus, no node is assumed to be trusted.
I was working in distributed systems research in the early 2000s and we managed to implement pretty reliable systems with crash and partition failure models, but byzantine failure models were considered extremely difficult. Blockchains present a working solution to this problem, which is why they can actually be used to model financial transactions in trustless systems.
Blockchains achieve operation with a byzantine failure model by implementing strict consensus mechanisms which include financial incentives for nodes to maintain state integrity, such as proof-of-work or proof-of-stake based models implemented by cryptocurrencies. This also explains why blockchains usually come with their own currency and why transactions tend to have an ascociated transaction fee.
Supporting a byzantine failure model is very costly. This is the main reason why blockchain solutions scale badly. It is important to think carefully wether a problem requires a blockchain based solution, or may work on a traditional centralised system or even a distributed system that does not require a byzantine failure model.
Relaxed definitions and technology advances
The above definition is muddied in some systems which are usually also grouped in the blockchain category. These include systems that relax the failure model by including some trusted nodes, for example systems that use proof-of-authority consensus schemes, in which blocks are created by a set of trusted nodes.
At the other end of the scale there are systems with similar properties that replace the underlying data structure. For example there are systems that build on directed acyclic graphs instead of linked lists of blocks, such as RaiBlocks and IOTA.