Vitalik Talks About The Purge: The Long-Term Sustainability Path of Ethereum

Vitalik: The Possible Future of Ethereum, The Purge

One challenge facing Ethereum is that, by default, the expansion and complexity of any blockchain protocol tend to increase over time. This occurs in two ways:

  1. Historical Data: Any transaction conducted and any account created at any point in history must be permanently stored by all clients and downloaded by any new client to fully synchronize with the network. This will lead to increasing client load and synchronization time over time, even if the chain's capacity remains unchanged.

  2. Protocol Features: Adding new features is much easier than removing old ones, leading to increased code complexity over time.

For Ethereum to sustain itself in the long term, we need to apply strong counter-pressure to both of these trends, reducing complexity and bloat over time. At the same time, we need to preserve one of the key properties that makes blockchains great: permanence. You can put an NFT, a love letter in transaction calldata, or a smart contract holding a million dollars on chain, go into a cave for ten years, and come out to find it still there waiting for you to read and interact with. For dapps to confidently go fully decentralized and remove their upgrade keys, they need to be assured that their dependencies will not upgrade in ways that break them - especially L1 itself.

If we are determined to strike a balance between these two needs, minimizing or reversing bloat, complexity, and decay while preserving continuity, it is absolutely possible. Living organisms can do it: while most age over time, a lucky few do not. Even social systems can have very long lifespans. In some cases, Ethereum has already succeeded: proof of work is gone, the SELFDESTRUCT opcode has largely vanished, and beacon chain nodes already store old data for only about six months. Finding this path for Ethereum in a more generalized way, and moving toward a long-term stable end state, is the ultimate challenge for Ethereum's long-term scalability, technical sustainability, and even security.

The Purge: Main Objectives

  • Reduce client storage requirements by minimizing or eliminating the need for each node to permanently store all historical records or even the final state.

  • Reduce protocol complexity by eliminating unnecessary features.

History expiry

What problem does it solve?

As of the time of writing, a fully synchronized Ethereum node requires about 1.1 TB of disk space to run the client, in addition to several hundred GB of disk space for the consensus client. The vast majority of this is historical data: data about historical blocks, transactions, and receipts, most of which are several years old. This means that even if the Gas limit does not increase at all, the size of the node will continue to grow by several hundred GB each year.

What is it, and how does it work?

A key simplifying feature of the historical storage problem is that, since each block points to the previous block through hash links (and other structures), achieving consensus on the current block is sufficient for achieving consensus on the history. As long as the network reaches consensus on the latest block, any historical block, transaction, or state (account balance, nonce, code, storage) can be provided by any single participant along with a Merkle proof, and that proof allows anyone else to verify its correctness. Consensus is an N/2-of-N trust model, while history is an N-of-N trust model.
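The N-of-N trust model for history can be illustrated with a toy hash-chain check (a hypothetical `Block` structure, not Ethereum's actual block encoding): given a trusted head hash, anyone can verify a historical block supplied by a single peer by walking the parent-hash links.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    parent_hash: bytes
    payload: bytes  # stand-in for transactions, receipts, etc.

    def hash(self) -> bytes:
        return hashlib.sha256(self.parent_hash + self.payload).digest()

def verify_history(trusted_head_hash: bytes, chain: list) -> bool:
    """Check that `chain` (oldest first) links up to the trusted head.

    One honest peer supplying these blocks is enough: tampering with
    any old block changes every hash after it.
    """
    expected = trusted_head_hash
    for block in reversed(chain):  # walk from the head back to the oldest
        if block.hash() != expected:
            return False
        expected = block.parent_hash
    return True

# Build a tiny three-block chain and verify it against the head hash.
genesis = Block(parent_hash=b"\x00" * 32, payload=b"genesis")
b1 = Block(parent_hash=genesis.hash(), payload=b"tx batch 1")
b2 = Block(parent_hash=b1.hash(), payload=b"tx batch 2")
assert verify_history(b2.hash(), [genesis, b1, b2])

# Tampering with any historical payload breaks verification.
forged = Block(parent_hash=genesis.hash(), payload=b"forged")
assert not verify_history(b2.hash(), [genesis, forged, b2])
```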

This gives us many options for how to store history. One natural choice is a network where each node stores only a small portion of the data. This is how torrent networks have worked for decades: while the network as a whole stores and distributes millions of files, each participant stores and distributes only a few of them. Perhaps counterintuitively, this approach does not even necessarily reduce the robustness of the data. If, by making nodes cheaper to run, we can get to a network of 100,000 nodes where each node stores a random 10% of the history, then each piece of data would be replicated 10,000 times - exactly the same replication factor as a 10,000-node network where every node stores everything.
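The replication arithmetic in the paragraph above can be checked directly (the numbers are the article's illustrative ones, not measurements of any real network):

```python
def replication_factor(num_nodes: int, fraction_stored: float) -> float:
    """Expected number of copies of any given piece of history."""
    return num_nodes * fraction_stored

# 100,000 nodes each storing a random 10% of history...
assert replication_factor(100_000, 0.10) == 10_000
# ...gives the same replication as 10,000 nodes storing everything.
assert replication_factor(10_000, 1.0) == 10_000
```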

Ethereum has already begun to move away from the model where all nodes store all history forever. Consensus blocks (i.e., the parts related to proof-of-stake consensus) are stored for only about 6 months. Blobs are stored for about 18 days. EIP-4444 aims to introduce a one-year storage period for historical blocks and receipts. The long-term goal is a unified period (perhaps around 18 days) during which every node is responsible for storing everything, plus a peer-to-peer network of Ethereum nodes storing older data in a distributed fashion.

Erasure codes can be used to improve robustness while keeping the replication factor the same. In fact, blobs already use erasure coding to support data availability sampling. The simplest solution may well be to reuse this erasure coding, and to put execution- and consensus-block data into blobs as well.
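The robustness gain from erasure coding can be sketched with a toy Reed-Solomon construction over a small prime field (an illustration of the idea only, not the KZG/blob encoding Ethereum actually uses): n data chunks are treated as evaluations of a degree-(n-1) polynomial, extended to 2n evaluations, so that any n of the 2n coded chunks suffice to recover the data.

```python
P = 2**31 - 1  # a Mersenne prime; real protocols use much larger fields

def lagrange_eval(points, x):
    """Evaluate, at x, the unique polynomial through `points` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * ((x - xj) % P) % P
                den = den * ((xi - xj) % P) % P
        # pow(den, P - 2, P) is the modular inverse (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def extend(data):
    """Extend n data chunks to 2n coded chunks (positions 0..2n-1)."""
    points = list(enumerate(data))
    return [lagrange_eval(points, x) for x in range(2 * len(data))]

def recover(available, n):
    """Recover the n original chunks from any n surviving (pos, value) pairs."""
    return [lagrange_eval(available[:n], x) for x in range(n)]

data = [12, 99, 7, 2024]
coded = extend(data)
assert coded[:4] == data  # systematic code: originals are the first n chunks
# Lose any half of the coded chunks; the survivors still recover the data.
survivors = [(i, coded[i]) for i in (1, 3, 4, 6)]
assert recover(survivors, 4) == data
```

With plain replication at factor 2, losing both copies of one chunk destroys data; with this 2x extension, any 4 of the 8 coded chunks are enough, which is why the replication factor can stay the same while robustness improves.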

What are the links to existing research?

  • EIP-4444
  • Torrents and EIP-4444
  • Portal Network
  • Portal Network and EIP-4444
  • Distributed storage and retrieval of SSZ objects in Portal
  • How to increase gas limit (Paradigm)

What else needs to be done, and what needs to be weighed?

The main remaining work is building and integrating a concrete distributed solution for storing history - at least execution history, but eventually also consensus history and blobs. The simplest solutions are to introduce existing torrent libraries, or the Ethereum-native solution known as the Portal Network. Once either of these is in place, we can turn on EIP-4444. EIP-4444 itself does not require a hard fork, but it does require a new network protocol version. For this reason, enabling it for all clients at the same time is valuable; otherwise there is a risk of clients failing because they connect to other nodes expecting to download the full history, but do not actually get it.

The main trade-off involves how hard we try to make "ancient" historical data available. The easiest solution would be to simply stop storing ancient history tomorrow and rely on existing archive nodes and various centralized providers for replication. This is easy, but it weakens Ethereum's position as a place to make permanent records. The harder but safer path is to first build and integrate a torrent network to store history in a distributed way. Here, "how hard we try" has two dimensions:

How hard do we try to ensure that a maximally large set of nodes actually stores all of the data?

How deeply do we integrate historical storage into the protocol?

A maximally paranoid approach to (1) would involve proof of custody: effectively requiring each proof-of-stake validator to store a certain proportion of the history, and cryptographically checking on a regular basis that they are doing so. A more moderate approach is to set a voluntary standard for the percentage of history each client stores.
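A voluntary standard of this kind could be as simple as a deterministic rule mapping a node's ID to the subset of history it keeps (a hypothetical scheme for illustration, not a real client policy): hash the node ID together with each chunk index, and keep the chunk if the hash falls below a threshold set by the target fraction.

```python
import hashlib

def should_store(node_id: bytes, chunk_index: int, fraction: float) -> bool:
    """Deterministically decide whether this node stores this history chunk.

    Each (node, chunk) pair is kept with probability ~`fraction`,
    independently, so coverage is uniform with no coordination needed.
    """
    h = hashlib.sha256(node_id + chunk_index.to_bytes(8, "big")).digest()
    threshold = int(fraction * 2**256)
    return int.from_bytes(h, "big") < threshold

# With many nodes at fraction 0.1, each chunk ends up replicated
# on roughly 10% of them.
nodes = [f"node-{i}".encode() for i in range(2000)]
holders = sum(should_store(n, chunk_index=42, fraction=0.1) for n in nodes)
assert 120 < holders < 280  # ~200 expected; wide band for hash variance
```

Because the rule is deterministic, any peer can also compute which nodes *should* hold a chunk, which is what makes both voluntary auditing and the stricter proof-of-custody variant feasible.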

For (2), the basic implementation involves only work that is already done today: the Portal network already stores ERA files containing the entire Ethereum history. A more thorough implementation would involve actually hooking this up to the syncing process, so that someone who wants to sync a full-history-storing node or an archive node could do so via direct sync from the portal network, even if no other archive nodes were online.

How does it interact with other parts of the roadmap?

If we want to make running or spinning up a node extremely easy, then reducing history storage requirements is arguably even more important than statelessness: of the 1.1 TB a node needs, about 300 GB is state and the remaining roughly 800 GB is history. The vision of an Ethereum node running on a smartwatch and set up in just a few minutes is only achievable if both statelessness and EIP-4444 are implemented.

Limiting history storage also makes it more feasible for newer Ethereum node implementations to support only the latest version of the protocol, which makes them much simpler. For example, many lines of code can now be safely deleted because the empty storage slots created during the 2016 DoS attacks have all been removed. And now that the switch to proof of stake is history, clients can safely remove all proof-of-work-related code.

State expiry

What problem does it solve?

Even if we eliminate the need for clients to store history, clients' storage requirements will keep growing, by about 50 GB per year, because of ongoing state growth: account balances and nonces, contract code, and contract storage. Users can pay a one-time fee and impose a storage burden on present and future Ethereum clients forever.

State is harder to "expire" than history, because the EVM is fundamentally designed around the assumption that once a state object is created, it exists forever and can be read by any transaction at any time. If we introduce statelessness, some argue this problem may not be so bad: only a specialized class of block builders needs to actually store state, while all other nodes (even inclusion list producers!) can run statelessly. However, there is a view that we should not lean too heavily on statelessness, and that ultimately we may want state to expire in order to keep Ethereum decentralized.

What is it, and how does it work?

Today, when you create a new state object (which can happen in one of three ways: (i) sending ETH to a new account, (ii) creating a new account with code, (iii) setting a previously untouched storage slot), that state object stays in the state forever. What we want instead is for objects to automatically expire over time. The key challenge is doing this in a way that achieves three goals:

  1. Efficiency: No need for a large amount of extra computation to run the expiration process.

User-friendliness: If someone goes into a cave for five years and comes back, they should not lose access to their ETH, ERC20s, NFTs, and CDP positions...

  3. Developer friendliness: Developers do not have to switch to a completely unfamiliar thinking model. In addition, applications that are currently rigid and not updated should be able to continue functioning normally.

Solving the problem is easy if you give up on these goals. For example, you could have each state object also store an expiry-date counter (which could be extended by burning ETH, and this could happen automatically on any read or write), and have a process that loops through the state removing objects whose expiry date has passed. However, this introduces extra computation (and even storage requirements), and it certainly fails the user-friendliness requirement. Developers would also struggle to reason about edge cases where stored values sometimes reset to zero. Setting the expiry timer at the scope of an entire contract technically makes the developer's life easier, but it makes the economics harder: developers would have to think about how to "pass on" the ongoing storage costs to their users.
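The strawman in the paragraph above, an expiry counter per state object that is refreshed on access and extendable by burning ETH, can be sketched as follows (an illustration of why the text rejects this design, not a proposal):

```python
class ExpiringState:
    """Toy state store where each object carries an expiry timestamp.

    Reads and writes refresh the timestamp; a sweep loop deletes
    expired objects. This is the 'easy' design the text describes:
    the sweep costs extra computation every cycle, and objects the
    owner forgot to touch silently vanish - the user-unfriendliness
    problem.
    """
    def __init__(self, ttl: int):
        self.ttl = ttl
        self.objects = {}  # key -> (value, expires_at)

    def write(self, key, value, now: int):
        self.objects[key] = (value, now + self.ttl)

    def read(self, key, now: int):
        value, expires_at = self.objects[key]
        if now >= expires_at:
            raise KeyError("object expired")
        self.objects[key] = (value, now + self.ttl)  # refresh on read
        return value

    def extend(self, key, extra: int):
        """Stand-in for 'burn ETH to extend the expiry date'."""
        value, expires_at = self.objects[key]
        self.objects[key] = (value, expires_at + extra)

    def sweep(self, now: int):
        """The extra loop every node must run - the efficiency cost."""
        self.objects = {k: v for k, v in self.objects.items() if v[1] > now}

state = ExpiringState(ttl=100)
state.write("alice_balance", 5, now=0)
assert state.read("alice_balance", now=50) == 5  # refreshed to expire at 150
state.sweep(now=200)
assert "alice_balance" not in state.objects      # lost after the cave trip
```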

These are problems the Ethereum core development community has worked on for many years, with proposals such as "blockchain rent" and "regenesis." Eventually, we combined the best parts of the proposals and converged on two categories of "known least-bad solutions":

  • Partial state-expiry proposals
  • Address-period-based state-expiry proposals

Partial state expiry

Partial state-expiry proposals all follow the same principle. We split the state into chunks. Everyone permanently stores the "top-level map", which records whether each chunk is empty or nonempty. The data within a chunk is only stored if it has been accessed recently. There is a "resurrection" mechanism by which a chunk that is no longer stored can be brought back.
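A minimal sketch of the chunked scheme described above (a hypothetical structure; real proposals differ in detail, e.g. in using Merkle or Verkle commitments rather than a flat hash): everyone keeps the top-level map of chunk commitments forever, chunk contents are dropped when not recently accessed, and dropped data is revived by supplying it along with a match against the stored commitment.

```python
import hashlib
import json

def commit(data: dict) -> bytes:
    """Toy chunk commitment: hash of the serialized contents."""
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode()).digest()

class PartialExpiryState:
    def __init__(self, recent_window: int):
        self.recent_window = recent_window
        self.top_level = {}  # chunk_id -> commitment (stored permanently)
        self.chunks = {}     # chunk_id -> (data, last_access)

    def write(self, chunk_id, data: dict, now: int):
        self.top_level[chunk_id] = commit(data)
        self.chunks[chunk_id] = (data, now)

    def expire(self, now: int):
        """Drop contents of chunks not accessed recently; keep commitments."""
        self.chunks = {cid: (d, t) for cid, (d, t) in self.chunks.items()
                       if now - t < self.recent_window}

    def revive(self, chunk_id, claimed_data: dict, now: int):
        """Resurrection: accept data only if it matches the commitment."""
        if commit(claimed_data) != self.top_level[chunk_id]:
            raise ValueError("invalid revival proof")
        self.chunks[chunk_id] = (claimed_data, now)

state = PartialExpiryState(recent_window=100)
state.write("chunk-7", {"alice": 5}, now=0)
state.expire(now=500)
assert "chunk-7" not in state.chunks  # contents dropped...
assert "chunk-7" in state.top_level   # ...but the top-level map remains
state.revive("chunk-7", {"alice": 5}, now=500)
assert state.chunks["chunk-7"][0] == {"alice": 5}
```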

The main differences between these proposals are: (i) how we define "recently", and (ii) how we define "chunk".
