Big data storage: Apache Cassandra 0.7 released
The Apache Cassandra project released version 0.7 of its distributed database server earlier today. Cassandra aims to provide a fully distributed, highly available, scalable database that can store huge amounts of data and process it quickly. The project was open sourced by Facebook in 2008, and Cassandra is currently used by some of the largest sites on the internet, such as Facebook and Twitter, with at least one installation storing over 100TB of data.
The new version brings some long-awaited features, such as:
- Secondary indexes. Up until now only indexes on a unique row key were supported; this feature will result in dramatic speed improvements for some workloads;
- Live schema updates, or rather automatic schema updates. Previous versions required manual updates to configuration files on all cluster nodes, leaving plenty of room for human error;
- Improved read performance;
- Several improvements to the CLI.
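To give a rough idea of why secondary indexes can speed up some workloads, here is a minimal Python sketch. It is a toy in-memory model, not Cassandra's API or its actual implementation; the `users` data and the function names are hypothetical and exist only for illustration. It contrasts finding rows by a column value with a full scan against finding them through a prebuilt index on that column.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a column family: row key -> columns.
users = {
    "alice": {"state": "UT", "birth_year": 1975},
    "bob": {"state": "TX", "birth_year": 1980},
    "carol": {"state": "UT", "birth_year": 1968},
}


def find_by_scan(rows, column, value):
    """Without a secondary index: every row has to be inspected."""
    return sorted(key for key, cols in rows.items() if cols.get(column) == value)


def build_index(rows, column):
    """Map each value of `column` to the set of row keys that hold it."""
    index = defaultdict(set)
    for key, cols in rows.items():
        if column in cols:
            index[cols[column]].add(key)
    return index


def find_by_index(index, value):
    """With a secondary index: a single lookup replaces the full scan."""
    return sorted(index.get(value, ()))


state_index = build_index(users, "state")
print(find_by_scan(users, "state", "UT"))
print(find_by_index(state_index, "UT"))
```

Both calls return the same row keys; the difference is that the scan touches every row, while the index lookup goes straight to the matching keys. The price is that the index has to be kept up to date on every write.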
“Running any large website is a constant race between scaling your user base and scaling your infrastructure to support it,” said David King, Lead Developer at Reddit. “Our traffic more than tripled this year, and the transparent scalability afforded to us by Apache Cassandra is in large part what allowed us to do it on our limited resources. Cassandra v0.7 represents the real-life operations lessons learned from installations like ours and provides further features like column expiration that allow us to scale even more of our infrastructure.”
For a full list of changes, check the release notes.