In this talk I will discuss my work on RAMCloud and its novel fast crash
recovery system. RAMCloud is a datacenter storage system that stores all data
in DRAM. Rather than replicating in DRAM for redundancy, it provides
inexpensive durability and availability by recovering quickly after crashes.
RAMCloud scatters backup data across thousands of disks, and it harnesses
hundreds of servers in parallel to reconstruct lost data. The system uses a
log-structured approach for all its data, in DRAM as well as on disk; this
provides high performance both during normal operation and during recovery.
RAMCloud employs randomized techniques to manage the system in a scalable and
decentralized fashion. In a 60-node cluster, RAMCloud recovers 35 GB of data
from a failed server in 1.6 seconds. Measurements suggest that the approach
will scale to recover larger memory sizes in less time with larger clusters.
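
As a rough back-of-the-envelope check on the figures above, the sketch below
computes the aggregate recovery throughput implied by 35 GB in 1.6 seconds.
The per-node figure assumes, purely for illustration, that the work is spread
evenly over all 60 nodes; the function name and that even split are
assumptions made here, not details of the system.

```python
def recovery_throughput(data_gb, seconds, nodes):
    """Return (aggregate GB/s, GB/s per node) for a recovery.

    Assumes the work is divided evenly across `nodes`, which is a
    simplification made for this illustration only.
    """
    aggregate = data_gb / seconds
    per_node = aggregate / nodes
    return aggregate, per_node


# Figures quoted in the abstract: 35 GB recovered in 1.6 s on 60 nodes.
aggregate, per_node = recovery_throughput(35, 1.6, 60)
print(f"aggregate: {aggregate:.1f} GB/s, per node: {per_node:.2f} GB/s")
```

Under that assumption, the cluster as a whole moves roughly 22 GB/s during
recovery while each node handles well under 1 GB/s, which illustrates why
spreading the work across many servers and disks is central to the approach.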