Signal Chooses Apache Cassandra for its Distributed Data Store

January 20, 2015

At Signal, we’ve been running Apache Cassandra in production since late 2011. Before settling on it, though, we evaluated it alongside other NoSQL contenders. The main benefit that led us to choose Cassandra is its automatic replication, both within a region and across regions. This allows us to focus on building out our dataset and platform instead of worrying about tricky replication issues.

Read about how we use Cassandra, the challenges we’ve overcome, and what we recommend for others to focus on over at Planet Cassandra:

Before settling on Cassandra, a couple of us evaluated it alongside CouchDB, MongoDB and Riak. At that time, Cassandra 0.8 was the latest release. For each of the databases, we worked up a trial to see what it would actually look like if we used it as the backend. It might have been a little ridiculous to have four different backend implementations at one point, but it was extremely useful in our evaluation, as we got firsthand experience with each of these technologies.

While we were flexible on the underlying data model, we had some hard constraints around performance, capacity and multi-data-center replication. We wanted to make sure that if a client updates their data in one region, it propagates to the other in a reasonable amount of time. Out-of-the-box cross-data-center replication was the ultimate requirement that led us to Cassandra over the other databases.

Finally, if you’re in Chicago, stop by and see us at the Cassandra Tech Day on February 9, where I’ll be presenting on lessons learned from scaling Cassandra.

Matt Kemp

Matt Kemp is a Team Lead and engineer at Signal, where he works on much of the core back-end infrastructure. Straddling the line between Dev and Ops, he specializes in distributed systems, performance and monitoring. Prior to Signal, Matthew held roles at IMC Financial Markets and Orbitz Worldwide.
