4.1.2 Understanding Cassandra data modeling
As I mentioned, Cassandra is quite different from a relational database. Before you can start mapping your domain types to Cassandra tables, it’s important to understand a few of the ways that Cassandra data modeling is different from how you might model your data for persistence in a relational database.
A few of the most important things to understand about Cassandra data modeling follow:
- Cassandra tables may have any number of columns, but not all rows will necessarily use all of those columns.
- Cassandra databases are split across multiple partitions. Any row in a given table may be managed by one or more partitions, but it’s unlikely that all partitions will have all rows.
- A Cassandra table has two kinds of keys: partition keys and clustering keys. Hash operations are performed on each row’s partition key to determine which partition(s) that row will be managed by. Clustering keys determine the order in which the rows are maintained within a partition (not necessarily the order in which they may appear in the results of a query). Refer to Cassandra documentation http://mng.bz/yJ6E for a more detailed explanation of data modeling in Cassandra, including partitions, clusters, and their respective keys.
- Cassandra is highly optimized for read operations. As such, it’s common and desirable for tables to be highly denormalized and for data to be duplicated across multiple tables. (For example, customer information may be kept in a customer table as well as duplicated in a table containing orders placed by customers.)
Suffice it to say that adapting the Taco Cloud domain types to work with Cassandra won’t be a matter of simply swapping out a few JPA annotations for Cassandra annotations. You’ll have to rethink how you model the data.
