Introduction
Sharding is one of the most significant innovations in Ethereum 2.0 (eth2), alongside Proof of Stake (PoS). This proposal outlines a focused implementation called "data sharding", designed to store and verify the availability of approximately 250 kB of data per shard. By ensuring data availability, sharding provides a secure and high-throughput foundation for Layer 2 solutions like rollups.
To alleviate the burden of downloading all data, we combine two techniques:
- Randomly sampled committees for attestations.
- Data Availability Sampling (DAS) for lightweight verification.
Randomly Sampled Committees Explained
Imagine handling 16 MB of data per slot (eth2's initial capacity). We split this into 64 blobs, each 256 kB in size. With 6,400 validators in the PoS system, how do we verify the data without:
- Requiring everyone to download everything.
- Allowing attackers with few validators to compromise the system?
The Committee Approach
- Division of Labor: Validators 1-100 verify Blob 1, 101-200 verify Blob 2, etc.
- Attestations: Each committee signs off on its own blob. The network accepts a blob if a majority of that committee's members attest to its validity.
The Problem
An attacker controlling consecutive validators (e.g., 1971-2070) could dominate a single committee with just ~1.5% of total validators, enabling invalid blobs.
Solution: Random Sampling
- Shuffling: Use a hash-based Random Number Generator (RNG) to shuffle validator indices.
- Attack Resistance: The shuffle is computed after validator indices are fixed, so attackers can't choose which committee their validators land in. Meaningful influence over a committee then requires controlling roughly 1/3 or more of all validators.
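The difference between sequential and shuffled committees can be illustrated with a minimal sketch. It assumes a simple sort-by-hash shuffle and a made-up seed value; the function names are hypothetical, and this is not the shuffling algorithm used in production.

```python
import hashlib

def shuffled_committees(num_validators, committee_size, seed):
    """Split validators 1..num_validators into committees after a seeded shuffle.

    Assumption: validators are ordered by SHA-256(seed || index). This is an
    illustrative shuffle, not the beacon chain's actual algorithm.
    """
    def sort_key(index):
        return hashlib.sha256(seed + index.to_bytes(8, "little")).digest()

    order = sorted(range(1, num_validators + 1), key=sort_key)
    return [order[i:i + committee_size]
            for i in range(0, num_validators, committee_size)]

def max_attacker_seats(committees, attacker):
    """Largest number of attacker-controlled validators in any one committee."""
    return max(sum(1 for v in committee if v in attacker)
               for committee in committees)

# The attacker from the example above: 100 consecutive validators.
attacker = set(range(1971, 2071))

# Sequential assignment: validators 1-100, 101-200, and so on.
sequential = [list(range(start, start + 100)) for start in range(1, 6401, 100)]

# Shuffled assignment with a hypothetical seed.
shuffled = shuffled_committees(6400, 100, seed=b"hypothetical-epoch-seed")

print(max_attacker_seats(sequential, attacker))  # 70 seats in committee 2001-2100
print(max_attacker_seats(shuffled, attacker))    # a handful of seats at most
```

With sequential assignment the attacker holds 70 of the 100 seats in one committee. After shuffling, the same 100 validators are scattered across the 64 committees.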
Data Availability Sampling (DAS) Demystified
DAS flips the committee model: clients sample data within blobs instead of across blobs.
How It Works
- Each client privately selects N random indices from a blob and requests data at those positions.
- Goal: Verify ≥50% data availability. If <50% is available, clients reject the blob.
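A sampling client along these lines might look like the sketch below. The chunk count (512), the sample count (20), and the `fetch_chunk` callback are assumptions made for illustration; a real client would request chunks from peers over the network.

```python
import secrets

def sample_blob(fetch_chunk, num_chunks, num_samples):
    """Accept a blob only if every privately chosen index is served.

    fetch_chunk is a hypothetical callback that returns the chunk bytes,
    or None when the data at that index is withheld.
    """
    rng = secrets.SystemRandom()  # unpredictable, so indices stay private
    indices = rng.sample(range(num_chunks), num_samples)
    return all(fetch_chunk(i) is not None for i in indices)

def make_provider(chunks, withheld):
    """Simulate a network that refuses to serve the withheld indices."""
    def fetch_chunk(index):
        return None if index in withheld else chunks[index]
    return fetch_chunk

chunks = [bytes([i % 256]) * 32 for i in range(512)]

fully_available = make_provider(chunks, withheld=set())
half_withheld = make_provider(chunks, withheld=set(range(256, 512)))
fully_withheld = make_provider(chunks, withheld=set(range(512)))

print(sample_blob(fully_available, len(chunks), 20))  # True
print(sample_blob(half_withheld, len(chunks), 20))    # almost certainly False
print(sample_blob(fully_withheld, len(chunks), 20))   # False
```

The client never learns the exact fraction of data that is available. It only observes that all of its samples were answered, which becomes very unlikely once half or more of the blob is missing.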
Why It’s Secure
- Efficiency: Clients download minimal data (~512 bytes per blob).
- Attack Resistance: Even a 51% attacker can't trick clients into accepting unavailable blobs.
Erasure Coding: A Safety Net
To prevent partial data releases (50–99%), we use erasure coding:
- Concept: Encode blobs so that ≥50% availability allows reconstruction of the full data.
- Example: A line defined by two points, (1, 4) and (2, 7), can recover the remaining points. Extend this to polynomials for larger datasets.
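The line example can be worked through with Lagrange interpolation. The sketch below uses exact rational arithmetic for readability, which is a simplification; the function name is hypothetical, and practical erasure codes work over finite fields.

```python
from fractions import Fraction

def interpolate(points, x):
    """Evaluate at x the unique lowest-degree polynomial through the points."""
    total = Fraction(0)
    for j, (xj, yj) in enumerate(points):
        term = Fraction(yj)
        for m, (xm, _) in enumerate(points):
            if m != j:
                term *= Fraction(x - xm, xj - xm)
        total += term
    return total

# Original data: two points on a line.
original = [(1, 4), (2, 7)]

# Extend to four points by evaluating the same line at x = 3 and x = 4.
extended = original + [(x, interpolate(original, x)) for x in (3, 4)]
print([(x, int(y)) for x, y in extended])  # [(1, 4), (2, 7), (3, 10), (4, 13)]

# Lose the original half; any two surviving points recover it.
survivors = extended[2:]
recovered = [(x, int(interpolate(survivors, x))) for x in (1, 2)]
print(recovered)  # [(1, 4), (2, 7)]
```

Here 2 of the 4 extended points (50%) are enough to rebuild everything. A higher-degree polynomial gives the same property for larger blobs.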
Kate Commitments
Replace Merkle roots with polynomial commitments (e.g., Kate commitments) to prove correct evaluations without complex fraud proofs.
Committee vs. DAS: A Hybrid Approach
Why Committees Aren’t Enough
- 51% Attacks: Committees alone cannot withstand a 51% attacker, so invalid blocks can slip through.
- Threshold Challenges: Balancing validator thresholds is tricky.
- Quantum Resistance: DAS is slightly more future-proof.
Why DAS Needs Committees
- Novelty: DAS is untested; committees add reliability.
- Latency: DAS has higher latency.
- Edge Cases: Committees mitigate risks (e.g., proposer-targeting attacks).
Data Availability’s Role in Ethereum
Key Reads
👉 Why BitTorrent/IPFS Fall Short
Critical Insight: BitTorrent and IPFS can distribute data, but they can't give the network consensus on whether that data is available, which leaves room for attacks.
P2P Layer Mechanics
Subnet Architecture
- 2048 Horizontal Subnets: 1 per shard-slot pair.
- 2048 Vertical Subnets: 1 per blob index.
Blob Broadcast Process
- Head: Sent to the global subnet.
- Body: Sent to the relevant horizontal subnet.
- Sample Distribution: Peers propagate samples to vertical subnets.
Self-Healing Unpublished Blobs
- Reverse Distribution: Vertical → horizontal subnets.
- Reconstruction: With ≥50% samples, anyone rebuilds the blob.
- Redistribution: Push the reconstructed blob.
Beacon Chain Integration
- ShardHeaders: Proposed blobs are attested by committees (2/3 votes for confirmation).
- Fork Choice: Chains with invalid blobs are entirely rejected (tight coupling).
Low Validator Counts
If the validator count falls below 262,144, process fewer shards per slot and rotate shard assignments so that committee sizes are maintained (e.g., 50 shards per slot).
Economic Design
- EIP-1559-Like Fees: Adjust per-byte costs based on demand (target: 50% block capacity).
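A rough sketch of such a fee update follows. The adjustment quotient of 8 is borrowed from EIP-1559's base fee rule and is an assumption here, as are the function name, the 16 MB capacity figure taken from the earlier example, and the starting price.

```python
def next_price_per_byte(price, bytes_used, target_bytes, adjustment_quotient=8):
    """Move the per-byte price toward equilibrium, EIP-1559 style.

    Usage above the target raises the price, usage below lowers it.
    adjustment_quotient=8 caps each step at 12.5% (an assumed value).
    """
    delta = price * (bytes_used - target_bytes) // (target_bytes * adjustment_quotient)
    return max(1, price + delta)  # never let the price fall to zero

max_bytes = 16 * 2**20         # assumed capacity per slot (16 MB)
target_bytes = max_bytes // 2  # target: 50% of capacity

print(next_price_per_byte(1000, target_bytes, target_bytes))  # 1000, at target
print(next_price_per_byte(1000, max_bytes, target_bytes))     # 1125, slot full
print(next_price_per_byte(1000, 0, target_bytes))             # 875, slot empty
```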
Security Assumptions
- Honest Minority DAS: To pass sampling, attackers must publish >50% of a blob's data, and at that point the full blob can be reconstructed. With 20 samples per client and ~70 clients per shard, the system stays secure.
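The figures above can be turned into a back-of-the-envelope bound. The sketch assumes that the set of withheld data is fixed before clients sample and that samples are independent; both are simplifications, and the function name is hypothetical.

```python
def fooled_probability(available_fraction, samples):
    """Chance that every one of a client's samples happens to be available.

    Assumes independent samples against a fixed set of withheld data.
    """
    return available_fraction ** samples

samples_per_client = 20
clients_per_shard = 70

# Worst case just under the reconstruction threshold: 50% available.
per_client = fooled_probability(0.5, samples_per_client)
print(per_client)  # 2**-20, about 9.5e-07

# Union bound: chance that at least one of the 70 clients is fooled.
any_client = clients_per_shard * per_client
print(any_client)  # about 6.7e-05

# Total samples requested across the shard's clients.
print(samples_per_client * clients_per_shard)  # 1400
```

Each extra sample halves a client's chance of being fooled, which is why a small, fixed number of samples per client is enough.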
FAQ
1. Can sharding add execution later?
Yes. This design is forward-compatible (e.g., via fraud proofs or SNARKs).
2. Why combine committees and DAS?
Committee redundancy mitigates risks while DAS scales efficiently.
3. How does erasure coding improve security?
It ensures clients can reconstruct full data if ≥50% is available, preventing partial-data attacks.
Disclaimer: ECN translations aim to bridge the language gap—always refer to original sources for authoritative content.