The Bitcoin network
The Bitcoin network is a peer-to-peer (P2P) network where nodes perform transactions. They verify and propagate transactions and blocks. Nodes called miners also produce blocks. There are different types of nodes on the network. The two main types of nodes are full nodes and simplified payment verification (SPV) nodes. Full nodes, as the name implies, are implementations of Bitcoin Core clients performing the wallet, miner, full blockchain storage, and network routing functions. However, it is not necessary for all nodes in a Bitcoin network to perform all these functions. SPV nodes or lightweight clients perform only wallet and network routing functionality.
Versioning information is coded in the Bitcoin client in the version.h file, which is available here: https://github.com/bitcoin/bitcoin/blob/0cda5573405d75d695aba417e8f22f1301ded001/src/version.h#L9.
Some nodes are full blockchain nodes with a complete blockchain as they are more secure and play a vital role in block propagation, while some nodes perform network routing functions only but do not perform mining or store private keys (the wallet function). Another type of node is solo miner nodes, which can perform mining, store full blockchains, and act as Bitcoin network routing nodes.
There are a few nonstandard but heavily used nodes. These are called pool protocol servers. These nodes make use of alternative protocols, such as the stratum protocol. These nodes are used in mining pools. Nodes that only compute hashes use the stratum protocol to submit their solutions to the mining pool. Some nodes perform only mining functions and are called mining nodes. It is possible to run SPV software that runs a wallet and network routing function without a blockchain. SPV clients only download the headers of the blocks while syncing with the network. When required, they can request transactions from full nodes. Verifying transactions is possible by using a Merkle root in the block header with a Merkle branch to prove that the transaction is present in a block in the blockchain.
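The Merkle branch check described above can be sketched in a few lines of Python. This is a simplified illustration, not the exact encoding used on the wire: the helper names (dsha256, merkle_root, merkle_branch, verify_branch) are hypothetical, and the leaves are assumed to be 32-byte transaction hashes combined with double SHA-256, duplicating the last hash when a level has an odd number of entries.

```python
import hashlib


def dsha256(data):
    """Double SHA-256, the hash used to combine nodes in the tree."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def _next_level(level):
    """Pair up hashes (the caller guarantees an even count) and hash each pair."""
    return [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves):
    """Compute the Merkle root over a list of transaction hashes."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # odd count: duplicate the last hash
        level = _next_level(level)
    return level[0]


def merkle_branch(leaves, index):
    """Collect the sibling hashes needed to prove the leaf at `index`."""
    branch = []
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        branch.append(level[index ^ 1])  # sibling sits next to us in the pair
        level = _next_level(level)
        index //= 2
    return branch


def verify_branch(tx_hash, index, branch, root):
    """Recompute the root from one transaction hash and its branch."""
    node = tx_hash
    for sibling in branch:
        # The low bit of the index says whether we are the right or left child.
        node = dsha256(sibling + node) if index & 1 else dsha256(node + sibling)
        index >>= 1
    return node == root
```

An SPV client that holds only block headers would compare the recomputed value against the Merkle root stored in the header; a full node is the party that can produce the branch, because it has all the transactions of the block.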
There are also different protocols that have been developed to facilitate communication between Bitcoin nodes. One such protocol is called Stratum. It is a line-based protocol that makes use of plain TCP sockets and human-readable JSON-RPC to operate and communicate between nodes. Stratum is commonly used to connect to mining pools.
Many text-based protocols on the internet are line-based, which means that each line is terminated by a carriage return and newline (\r\n) sequence. More details on this protocol are available at this link: https://en.bitcoin.it/wiki/Stratum_mining_protocol.
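To make the idea of a line-based JSON-RPC protocol concrete, here is a minimal framing sketch. It only shows how messages could be written as one JSON object per line and split back out of a receive buffer; it does not open a socket or talk to a pool. The function names and the agent string are hypothetical, and the sketch assumes that a line ends at a newline, tolerating an optional carriage return before it.

```python
import json


def encode_request(request_id, method, params):
    """Serialize one JSON-RPC request as a single newline-terminated line."""
    message = {"id": request_id, "method": method, "params": params}
    return (json.dumps(message) + "\n").encode("utf-8")


def decode_lines(buffer):
    """Split a receive buffer into complete JSON messages and leftover bytes.

    Anything after the last newline is an incomplete line and is returned
    unchanged so that it can be prepended to the next chunk read from TCP.
    """
    *complete, rest = buffer.split(b"\n")
    # json.loads ignores surrounding whitespace, so a trailing "\r" is harmless.
    messages = [json.loads(line) for line in complete if line.strip()]
    return messages, rest


# Example: a subscription request followed by a partially received line.
wire = encode_request(1, "mining.subscribe", ["sketch-miner/0.1"])
messages, leftover = decode_lines(wire + b'{"id": 2, "me')
```

Because TCP delivers a byte stream rather than discrete messages, keeping the leftover bytes between reads is what makes the line delimiter useful as a message boundary.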
A Bitcoin network is identified by its magic value. Magic values are used to indicate the message's origin network.
A list of these values is shown in the following table:
A full Bitcoin node performs four functions. These are wallet, miner, blockchain, and network routing. We discussed mining and blockchains in the previous chapter, Chapter 6, Introducing Bitcoin. We will focus on the Bitcoin network protocol and wallets in this chapter.
Before we examine how the Bitcoin discovery protocol and block synchronization work, we need to understand the different types of messages that the Bitcoin protocol uses. Also, note that the protocol version at the time of writing is 70015, which is the version used by Bitcoin Core client 0.19.0.1.
There are 27 types of protocol messages in total, but they're likely to increase over time as the protocol grows. The most commonly used protocol messages and an explanation of them are listed as follows:
- Version: This is the first message that a node sends out to the network, advertising its version and block count. The remote node then replies with the same information and the connection is then established.
- Verack: This is the response to the version message, accepting the connection request.
- Inv: This is used by nodes to advertise their knowledge of blocks and transactions.
- Getdata: This is a response to inv, requesting a single block or transaction identified by its hash.
- Getblocks: This requests an inv packet containing the list of blocks starting after the last known hash, up to a maximum of 500 blocks.
- Getheaders: This is used to request block headers in a specified range.
- Tx: This is used to send a transaction as a response to the getdata protocol message.
- Block: This sends a block in response to the getdata protocol message.
- Headers: This packet returns up to 2,000 block headers as a reply to the getheaders request.
- Getaddr: This is sent as a request to get information about known peers.
- Addr: This provides information about nodes on the network. It contains the number of addresses and address list in the form of an IP address and port number.
- Ping: This message is used to confirm if the TCP/IP network connection is active.
- Pong: This message is the response to a ping message confirming that the network connection is live.
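All of the messages listed above travel inside a common envelope. As a rough sketch, the following Python builds and parses that envelope, assuming the widely documented 24-byte layout: a 4-byte magic value, a 12-byte null-padded command name, a 4-byte little-endian payload length, and a 4-byte checksum taken from the double SHA-256 of the payload. The function names are hypothetical, and only the main network magic value is included.

```python
import hashlib
import struct

# Main network magic value in the byte order it appears on the wire.
MAINNET_MAGIC = bytes.fromhex("f9beb4d9")


def checksum(payload):
    """First four bytes of the double SHA-256 of the payload."""
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def build_message(command, payload=b"", magic=MAINNET_MAGIC):
    """Wrap a payload in a message header for the given command name."""
    name = command.encode("ascii")
    if len(name) > 12:
        raise ValueError("command names are limited to 12 bytes")
    header = (
        magic
        + name.ljust(12, b"\x00")          # command, padded with null bytes
        + struct.pack("<I", len(payload))  # payload length, little-endian
        + checksum(payload)
    )
    return header + payload


def parse_header(data):
    """Decode the first 24 bytes of a message into its four header fields."""
    magic, name, length, check = struct.unpack("<4s12sI4s", data[:24])
    return magic, name.rstrip(b"\x00").decode("ascii"), length, check


# A verack message has no payload, so the whole message is just the header.
verack = build_message("verack")
```

A receiver would typically compare the magic value against the network it expects and recompute the checksum over the payload before acting on the command.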
When a Bitcoin Core node starts up, first, it initiates the discovery of all peers. This is achieved by querying DNS seeds that are hardcoded into the Bitcoin Core client and are maintained by Bitcoin community members. This lookup returns a number of DNS A records. The Bitcoin protocol works on TCP port 8333 by default for the main network and TCP port 18333 for testnet.
DNS seeds are declared in the chainparams.cpp file in the Bitcoin source code, which can be viewed on GitHub at the following link: https://github.com/bitcoin/bitcoin/blob/0cda5573405d75d695aba417e8f22f1301ded001/src/chainparams.cpp#L116.
First, the client sends a protocol message, version, which contains various fields, such as version, services, timestamp, network address, nonce, and some other fields. The remote node responds with its own version message, followed by a verack message exchange between both nodes, indicating that the connection has been established.
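The fields of the version message can be illustrated by serializing a payload by hand. The sketch below assumes the commonly documented field order (protocol version, services, timestamp, receiving address, sending address, nonce, user agent, start height, relay flag) and does not send anything over the network. The function names and the user agent string /sketch:0.1/ are hypothetical.

```python
import ipaddress
import struct
import time


def net_addr(ip, port, services=0):
    """Address as embedded in version: services, 16-byte IP, big-endian port."""
    address = ipaddress.ip_address(ip)
    if address.version == 4:
        raw = bytes(10) + b"\xff\xff" + address.packed  # IPv4-mapped IPv6 form
    else:
        raw = address.packed
    return struct.pack("<Q", services) + raw + struct.pack(">H", port)


def var_str(text):
    """Length-prefixed string; this sketch only handles short strings."""
    data = text.encode("ascii")
    if len(data) >= 0xFD:
        raise ValueError("string too long for a one-byte length prefix")
    return bytes([len(data)]) + data


def version_payload(recv_ip, recv_port, start_height, nonce,
                    user_agent="/sketch:0.1/", protocol_version=70015,
                    services=0, timestamp=None):
    """Serialize the payload of a version message."""
    if timestamp is None:
        timestamp = int(time.time())
    return (
        struct.pack("<iQq", protocol_version, services, timestamp)
        + net_addr(recv_ip, recv_port)        # address of the remote peer
        + net_addr("0.0.0.0", 0, services)    # our own address, left blank here
        + struct.pack("<Q", nonce)            # random value to detect self-connections
        + var_str(user_agent)
        + struct.pack("<i?", start_height, True)  # block count and relay flag
    )
```

The start height is the block count that the version entry in the message list refers to, and the nonce lets a node notice when it has accidentally connected to itself.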
After this, getaddr and addr messages are exchanged to find the peers that the client does not know. Meanwhile, either of the nodes can send a ping message to see whether the connection is still active. getaddr and addr are message types defined in the Bitcoin protocol.
This process is shown in the following diagram of the protocol:
Figure 7.1: Visualization of node discovery protocol
This network protocol sequence diagram shows communication between two Bitcoin nodes during initial connectivity. Node A is shown on the left-hand side and Node B on the right. First, Node A starts the connection by sending a version message that contains the version number and current time to the remote peer, Node B. Node B then responds with its own version message containing the version number and current time. Node A and Node B then exchange verack messages, indicating that the connection has been successfully established. After this connection is successful, the peers can exchange getaddr and addr messages to discover other peers on the network.
Now, the block download can begin. If the node already has all the blocks fully synchronized, then it listens for new blocks using the inv protocol message; otherwise, it first checks whether it has already received inventories in response to earlier inv messages. If it has, then it requests the blocks using the getdata protocol message; if not, then it requests inventories using the getblocks message. This method was used until version 0.9.3. This slower process, known as the blocks-first approach, was replaced with the headers-first approach in 0.10.0.
The initial block download can use the blocks-first or headers-first method to synchronize blocks, depending on the version of the Bitcoin Core client. The blocks-first method is very slow and was discontinued on 16th February 2015 with the release of version 0.10.0.
Version 0.10.0 introduced the initial block download method named headers-first. This resulted in a major performance improvement, and blockchain synchronization that used to take days to complete started taking only a few hours. The core idea is that the new node first asks peers for block headers and then validates them. Once this is completed, blocks are requested in parallel from all available peers. This is possible because the blueprint of the complete chain is already downloaded in the form of the block header chain.
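One part of validating a header chain is checking that each header commits to the hash of the header before it. The following sketch shows only that linkage check on raw 80-byte headers; proof-of-work, timestamp, and difficulty checks that a real client also performs are deliberately left out. The function names are hypothetical, and the header layout assumed is version, previous block hash, Merkle root, time, bits, and nonce, all little-endian.

```python
import hashlib
import struct


def header_hash(header):
    """Double SHA-256 of an 80-byte header, in internal (wire) byte order."""
    return hashlib.sha256(hashlib.sha256(header).digest()).digest()


def pack_header(version, prev_hash, merkle_root, timestamp, bits, nonce):
    """Serialize the six header fields into the 80-byte wire format."""
    return struct.pack("<i32s32sIII", version, prev_hash, merkle_root,
                       timestamp, bits, nonce)


def headers_link(headers):
    """Return True if every header points at the hash of its predecessor."""
    for previous, current in zip(headers, headers[1:]):
        # Bytes 4..36 of a header hold the previous block hash.
        if len(current) != 80 or current[4:36] != header_hash(previous):
            return False
    return True
```

Block explorers display hashes with the bytes reversed, so header_hash(h)[::-1].hex() gives the familiar form that starts with a run of zeros. Once the linkage holds for the whole chain, the node knows which block hashes to request and can spread those requests across peers.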
In this method, when the client starts up, it first checks whether the header chain is already synchronized; if not, which is the case the first time the client starts up, it requests headers from other peers using the getheaders message. If the blockchain is fully synchronized, it listens for new blocks via inv messages. If the header chain is synchronized but contains more headers than the node has blocks, the node requests the missing blocks by issuing getdata protocol messages:
Figure 7.2: Bitcoin Core client >= 0.10.0 header and block synchronization
The preceding diagram shows the Bitcoin block synchronization process between two nodes on the Bitcoin network. Node A, shown on the left-hand side, is called an Initial Block Download (IBD) node, and Node B, shown on the right, is called a sync node.
IBD node means that this is the node that is requesting the blocks, while sync node means the node where the blocks are being requested from. The process starts by Node A first sending the getheaders message, which is met with a headers message response from the sync node. The payload of the getheaders message is one or more header hashes. If it's a new node, then there is only the first genesis block's header hash. Sync Node B replies by sending up to 2,000 block headers to IBD Node A. After this, the IBD node, Node A, starts to download more headers from Node B and blocks from multiple nodes in parallel; that is, it acts as the IBD node and receives multiple blocks from multiple nodes, including Node B. If the sync node does not have more than 2,000 headers when the IBD node makes a getheaders request, the IBD node sends a getheaders message to other nodes. This process continues in parallel until the blockchain synchronization is complete.
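The batching behavior of getheaders can be illustrated with a small in-memory simulation. Nothing here touches the network: SyncNode is a hypothetical stand-in for a peer that already holds the full header chain, headers are represented by plain strings, and the locator is simplified to a single hash (the last header the IBD node knows).

```python
MAX_HEADERS = 2000  # upper bound on headers returned per reply


class SyncNode:
    """Hypothetical peer that answers getheaders from a complete chain."""

    def __init__(self, chain):
        self.chain = chain

    def getheaders(self, locator_hash):
        """Return up to MAX_HEADERS headers that follow the given hash."""
        start = self.chain.index(locator_hash) + 1
        return self.chain[start:start + MAX_HEADERS]


def download_headers(sync_node, genesis_hash):
    """Request headers in batches until the peer has nothing more to offer."""
    known = [genesis_hash]  # a new node knows only the genesis block
    requests = 0
    while True:
        batch = sync_node.getheaders(known[-1])
        requests += 1
        known.extend(batch)
        if len(batch) < MAX_HEADERS:
            # A short reply means this peer has no further headers for us.
            return known, requests


# Example: a chain of 4,500 headers on top of genesis needs three requests.
chain = ["h%d" % i for i in range(4501)]
known, requests = download_headers(SyncNode(chain), "h0")
```

In the real protocol, the block downloads run alongside this loop and are spread across several peers; the simulation covers only the header side of the exchange.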
The getblockchaininfo and getpeerinfo Remote Procedure Calls (RPCs) were updated with new functionality to cater for this change. An RPC known as getchaintips is used to list all known branches of the blockchain. This also includes headers-only blocks. getblockchaininfo is used to provide information about the current state of the blockchain. getpeerinfo is used to list both the number of blocks and the headers that are common between peers.
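These RPCs are exposed by a running Bitcoin Core node over JSON-RPC on HTTP. As a hedged sketch, the following Python prepares such a request using only the standard library. It assumes a node listening on the default main network RPC port 8332 on localhost with username and password authentication; the credentials shown are placeholders, and the function names are hypothetical.

```python
import base64
import json
import urllib.request


def rpc_request(method, params=(), url="http://127.0.0.1:8332/",
                user="rpcuser", password="rpcpassword"):
    """Build (but do not send) an HTTP request for one JSON-RPC call."""
    body = json.dumps({
        "jsonrpc": "1.0",
        "id": "sketch",
        "method": method,
        "params": list(params),
    }).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, data=body, headers={
        "Content-Type": "application/json",
        "Authorization": "Basic " + token,  # placeholder credentials
    })


def call(method, params=()):
    """Send the request to a local node and return the result field."""
    with urllib.request.urlopen(rpc_request(method, params)) as response:
        return json.loads(response.read())["result"]
```

With a synchronized node running locally, call("getchaintips") would return the known branches and call("getblockchaininfo") the current chain state. Comparing the block and header counts in that output is a simple way to watch headers-first synchronization progress.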
Wireshark can also be used to visualize message exchange between peers and can serve as an invaluable tool to learn about the Bitcoin protocol. A sample of this is shown here. This is a basic example showing the version, verack, getaddr, ping, addr, and inv messages.
In the details, valuable information such as the packet type, command name, and results of the protocol messages can be seen:
Figure 7.3: A sample block message in Wireshark
A protocol graph showing the flow of data between the two peers can be seen in the preceding screenshot. This can help you understand when a node starts up and what type of messages are used.
In the following example, the Bitcoin dissector is used to analyze the traffic and identify the Bitcoin protocol commands.
The exchange of messages such as version, getaddr, and getdata can be seen in the following example, along with the appropriate comment describing the message name.
This exercise can be very useful in order to learn about the Bitcoin protocol and it is recommended that the experiments be carried out on the Bitcoin testnet (https://en.bitcoin.it/wiki/Testnet), where various messages and transactions can be sent over the network and then be analyzed by Wireshark.
Wireshark is a network analysis tool and is available at https://www.wireshark.org.
The analysis being performed here by Wireshark shows messages being exchanged between two nodes. If you look closely, you'll notice that the top three messages show the node discovery protocol that we introduced earlier:
Figure 7.4: Bitcoin node discovery protocol in Wireshark
Nodes run different Bitcoin client software. The most common are full and SPV clients. We introduced these briefly at the start of this chapter, but we'll have a deeper look at these clients and related concepts, such as bloom filters, in the next sections.
Full client and SPV client
Bitcoin network nodes can fundamentally operate in two modes: full client or lightweight SPV client. Full clients are thick clients or full nodes that download the entire blockchain; this is the most secure method of validating the blockchain as a client. SPV clients are used to verify payments without requiring the download of a full blockchain. SPV nodes only keep a copy of block headers of the current longest valid blockchain. Verification is performed by looking at the Merkle branch, which links the transactions to the original block the transaction was accepted in. On its own, this is not very practical, so a more pragmatic approach was implemented with BIP37 (you can see this at https://github.com/bitcoin/bips/blob/master/bip-0037.mediawiki), where bloom filters are used to filter for relevant transactions only.
Bloom filters
A bloom filter is a data structure (a bit vector with indexes) that is used to test the membership of an element in a probabilistic manner. It provides probabilistic lookup with false positives but no false negatives. This means that this filter can produce an output where an element that is not a member of the set being tested is wrongly considered to be in the set. Still, it can never produce an output where an element does exist in the set, but it asserts that it does not. In other words, false positives are possible, but false negatives are not.
Elements are added to the bloom filter after hashing them several times and then setting the corresponding bits in the bit vector to 1 via the corresponding index. To check the presence of an element in the bloom filter, the same hash functions are applied and then compared with the bits in the bit vector to see whether the same bits are set to 1.
Note that not every hash function is suitable for bloom filters (SHA-1, for example, is not), as they need to be fast, independent, and uniformly distributed. The most commonly used hash functions for bloom filters are FNV, Murmur, and Jenkins.
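The add and lookup operations described above can be sketched as a small Python class. This is a generic bloom filter for illustration, not the BIP37 wire format. Because the standard library does not ship Murmur, FNV, or Jenkins, the sketch derives its hash functions from salted BLAKE2b digests as a stand-in; that choice, along with the class and method names, is an assumption made here for self-containment.

```python
import hashlib


class BloomFilter:
    """Bit vector plus several hash functions for probabilistic membership."""

    def __init__(self, size_bits, num_hashes):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _indexes(self, item):
        """Yield one bit index per hash function for the given bytes."""
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item, digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.size_bits

    def add(self, item):
        """Set the bit at every index the item hashes to."""
        for index in self._indexes(item):
            self.bits[index // 8] |= 1 << (index % 8)

    def __contains__(self, item):
        """True if all bits for the item are set (may be a false positive)."""
        return all(
            self.bits[index // 8] & (1 << (index % 8))
            for index in self._indexes(item)
        )


# Example: anything that was added is always reported as present.
watched = BloomFilter(size_bits=256, num_hashes=3)
watched.add(b"example-element")
```

A lookup for an element that was never added usually returns False, but it can return True if other elements happen to have set the same bits, which is the false positive case described earlier.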
These filters are mainly used by SPV clients to request transactions and the Merkle blocks that they are interested in. A Merkle block is a lightweight version of the block, which includes a block header, some hashes, a list of 1-bit flags, and a transaction count. This information can then be used to build a Merkle tree. This is achieved by creating a filter that matches only those transactions and blocks that have been requested by the SPV client. Once version messages have been exchanged and the connection is established between the peers, the nodes can set filters according to their requirements.
These probabilistic filters offer a varying degree of privacy or precision, depending on how accurately or loosely they have been set. A strict bloom filter will only match transactions that have been requested by the node, but at the expense of the possibility of revealing the user addresses to adversaries who can correlate transactions with their IP addresses, thus compromising privacy.
On the other hand, a loosely set filter can result in retrieving more unrelated transactions but will offer more privacy. Also, for SPV clients, bloom filters allow them to use low bandwidth as opposed to downloading all transactions for verification.
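How strict or loose a filter is comes down to its size relative to the number of elements and the false positive rate the client is willing to accept. The sketch below follows the sizing formulas given in the BIP37 specification, including its limits of 36,000 bytes and 50 hash functions. The function names are hypothetical, and rounding down with a floor of 1 is a simplification chosen here.

```python
import math

MAX_FILTER_BYTES = 36000  # maximum filter size allowed by BIP37
MAX_HASH_FUNCS = 50       # maximum number of hash functions allowed by BIP37


def filter_size_bytes(n_elements, fp_rate):
    """Filter size in bytes for n elements at the target false positive rate."""
    size = -1 / (math.log(2) ** 2) * n_elements * math.log(fp_rate) / 8
    return max(1, min(int(size), MAX_FILTER_BYTES))


def hash_function_count(size_bytes, n_elements):
    """Number of hash functions suited to a filter of the given size."""
    count = size_bytes * 8 / n_elements * math.log(2)
    return max(1, min(int(count), MAX_HASH_FUNCS))


# A strict filter (low false positive rate) needs more bits than a loose one.
strict = filter_size_bytes(100, 0.0001)
loose = filter_size_bytes(100, 0.1)
```

The loose filter is smaller and matches more unrelated transactions, which costs bandwidth but makes it harder for a peer to tell which matches the client actually cares about.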
BIP37 proposed the Bitcoin implementation of bloom filters and introduced three new messages to the Bitcoin protocol:
- filterload: This is used to set the bloom filter on the connection.
- filteradd: This adds a new data element to the current filter.
- filterclear: This deletes the currently loaded filter.
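As an illustration of what the filterload message carries, the following sketch serializes its payload following the field list in the BIP37 specification: the filter bytes with a variable-length size prefix, the number of hash functions, a tweak value, and a flags byte. The function names are hypothetical, and the payload would still need to be wrapped in a message header before being sent.

```python
import struct


def var_int(value):
    """Variable-length integer prefix used for counts and lengths."""
    if value < 0xFD:
        return bytes([value])
    if value <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", value)
    if value <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", value)
    return b"\xff" + struct.pack("<Q", value)


def filterload_payload(filter_bytes, num_hashes, tweak=0, flags=0):
    """Serialize the payload of a filterload message."""
    if len(filter_bytes) > 36000 or num_hashes > 50:
        raise ValueError("filter exceeds the limits set by BIP37")
    return (
        var_int(len(filter_bytes))
        + bytes(filter_bytes)                           # the bit vector itself
        + struct.pack("<IIB", num_hashes, tweak, flags)  # hash count, tweak, update flags
    )
```

The tweak is a random value mixed into the hash seeds so that different clients produce different bit patterns for the same elements. filterclear carries no payload at all, and filteradd carries a single data element.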
More details can be found in the BIP37 specification. This is available at https://github.com/bitcoin/bips/blob/master/bip-0037.mediawiki.
Now, we'll move to a different but relevant topic. So far, we've discussed that, on a Bitcoin network, there are full clients (nodes), which perform the function of storing a complete blockchain. If you cannot run a full node, then SPV clients can be used to verify that particular transactions are present in a block by only downloading the block headers instead of the entire blockchain. At times, even running an SPV node is not feasible (especially on low-resource devices such as mobile phones) and the requirement is only to be able to send and receive Bitcoin somehow. For this purpose, wallets (wallet software) are used that do not require downloading even the block headers. We'll introduce wallets and some different types next.