New Features of Apache Cassandra 2.2 and 3.0

Towards the end of 2015, Apache Cassandra’s 2.2 and 3.0 releases were rolled out. Another significant development was the announcement of a “tick-tock” release model from 3.x onwards, in which odd-numbered releases contain only bug fixes and even-numbered releases contain bug fixes as well as new features.

In this blog we look at some of the features added in the 2.2 and 3.0 releases, with emphasis on how they make Cassandra more powerful and easier to develop against.

JSON support (2.2 feature)

To support the varied data structures that modern applications work with, JSON support has been added to CQL. Tables are defined in the same way as before, but one can now INSERT and SELECT rows in the form of JSON documents. This eases development by reducing the dependency on data model objects in client code.

Cassandra data types that have no JSON equivalent and user-defined data types can be inserted as string literals and JSON maps respectively. Similarly, collection types like lists, sets and maps are also easily represented.
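As a rough illustration of this type mapping, the sketch below uses a hypothetical song_info type and playlist_details table (neither appears elsewhere in this post). It assumes a set is written as a JSON array, a map and a user-defined type as JSON objects, and a timestamp as a string literal:

```cql
-- Hypothetical type and table, for illustration only
CREATE TYPE song_info (composer text, year int);

CREATE TABLE playlist_details (
    id int PRIMARY KEY,
    tags set<text>,
    ratings map<text, int>,
    info frozen<song_info>,
    created timestamp
);

-- set -> JSON array, map and user-defined type -> JSON object, timestamp -> string
INSERT INTO playlist_details JSON '{"id": 1, "tags": ["live", "rock"], "ratings": {"alice": 5, "bob": 4}, "info": {"composer": "Composer1", "year": 2015}, "created": "2015-11-20 10:00:00+0000"}';
```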

An INSERT statement will have the JSON document with the columns and their values as below:

INSERT INTO playlists JSON '{"id":1,"song_order":1,"song_id":8086,"title":"Song1","album":"Album1","artist":"Artist1"}';

Similarly, a SELECT statement will result in the following:

SELECT JSON * FROM playlists;

 [json]
-------------------------------------------------------------------------------------------------------
 {"id": 1, "song_order": 1, "album": "Album1", "artist": "Artist1", "song_id": 8086, "title": "Song1"}

Cassandra provides the function fromJson() to pass the value of a single column in JSON format in an INSERT or UPDATE query, while toJson() returns the value of a single column in JSON format from a SELECT query. Both functions apply to a single column only and are useful when working with JSON input/output. Overall, this feature goes a long way in increasing development productivity while working with JSON documents.
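A minimal sketch of the two functions, assuming the playlists table used in this post with (id, song_order, song_id) as its primary key. The inserted values are made up for illustration:

```cql
-- fromJson(): the argument is a JSON document, so a text value keeps its double quotes
INSERT INTO playlists (id, song_order, song_id, title)
VALUES (1, 2, 8087, fromJson('"Song2"'));

-- toJson(): returns one column as JSON; the other columns come back in their normal CQL form
SELECT id, song_order, toJson(title) FROM playlists WHERE id = 1;
```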

Materialized Views (3.0 feature)

This is definitely the most useful feature in recent releases for developers looking to reduce the burden of handling denormalization.

Denormalization is the norm in Cassandra: the same set of data is duplicated in multiple tables, each with a different primary key combination to serve a different use case. However, it comes with the additional burden of keeping the data in each table consistent, which consumes a lot of development effort and leaves the tables prone to inconsistencies.

Using Materialized Views, one can create multiple views on the same table, each having a different primary key combination. Each view can then be queried with its own primary key combination in the same way a table is queried, which makes lookups in the view very fast. However, only eventual consistency is promised between a view and its base table.

For example:

A playlist table may need to be queried based on the playlist id (to get the entire playlist), song_order (to play songs sequentially) or song_id (to play a particular song in the playlist). Each of these use cases requires a different combination of partition key and clustering key (the latter for sorting). The superset of all the primary key columns is chosen as the primary key of the base table. The views generated on the table will have their own partition and clustering key combinations from this superset.

Our table would look like the following:

CREATE TABLE playlists (
id int,
song_order int,
song_id int,
title text,
album text,
artist text,
PRIMARY KEY (id, song_order, song_id));

The first use case can be satisfied using the following view:

CREATE MATERIALIZED VIEW entire_playlist AS
SELECT * FROM playlists
WHERE id IS NOT NULL AND song_id IS NOT NULL AND song_order IS NOT NULL
PRIMARY KEY (id, song_order, song_id)
WITH CLUSTERING ORDER BY (song_order DESC);

Here id, the playlist id, is the partition key for this view, and the results within a playlist are sorted on song_order in descending order.
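The third use case, finding a particular song in a playlist, could be served by a second view along the following lines. This is only a sketch: the view name playlist_by_song is hypothetical, and it assumes the base table's id column is the partition key, with song_id moved ahead of song_order in the clustering key:

```cql
-- Hypothetical view: same columns as the base table, re-keyed for lookups by song
CREATE MATERIALIZED VIEW playlist_by_song AS
SELECT * FROM playlists
WHERE id IS NOT NULL AND song_id IS NOT NULL AND song_order IS NOT NULL
PRIMARY KEY (id, song_id, song_order);

-- Look up one song within a playlist without knowing its position
SELECT title, album, artist FROM playlist_by_song WHERE id = 1 AND song_id = 8086;
```

Because song_id is now the first clustering column, the query can restrict on it directly, which the base table's key order does not allow without also supplying song_order.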

The important thing to note here is that any change to the base table through INSERT, UPDATE or DELETE is reflected in all its views. One can’t DROP a base table while views depend on it. Writes to a table with Materialized Views are slower than writes to a normal table because of the extra work needed to keep the views consistent. Choosing low-cardinality data as a view’s partition key can put the burden of serving the view on fewer nodes.
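These points can be sketched as follows, assuming the entire_playlist view is keyed on the base table's id column. The inserted row is made up for illustration:

```cql
-- A write to the base table is propagated to every view defined on it
INSERT INTO playlists (id, song_order, song_id, title, album, artist)
VALUES (1, 2, 8087, 'Song2', 'Album1', 'Artist1');

-- The view is queried like a table (it may briefly lag behind the base table)
SELECT song_order, title FROM entire_playlist WHERE id = 1;

-- Views must be dropped before their base table can be dropped
DROP MATERIALIZED VIEW entire_playlist;
DROP TABLE playlists;
```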

Overall, this feature is still a work in progress, with support for complex SELECT statements to be introduced in later releases. Nevertheless, it removes the concern of handling denormalization from client-side code and largely eliminates the possibility of data inconsistencies.

Role Based Access Control (2.2 feature)

This feature allows users and permissions to be managed at the database level. It eliminates the effort that was earlier required to provide external authorization, and it allows database administrators to assign permissions to database users efficiently.

What sets this feature apart is that permissions are assigned to a role rather than to a user. A role can in turn be granted to another role, which makes it possible to build a tree-like permission structure. Administrators can bundle permissions under roles and assign those roles to database users.

For example:

CREATE ROLE my_user WITH PASSWORD = 'xoriant123#' AND LOGIN = true;
GRANT SELECT ON music_service.playlists TO my_user;
CREATE ROLE new_user;
GRANT MODIFY ON music_service.playlists TO new_user;
GRANT my_user to new_user;

In the above example, the PASSWORD and LOGIN options provide authentication for the my_user role. The new_user role inherits SELECT permission on the playlists table from my_user, in addition to the MODIFY permission granted to it directly. The logged-in user who creates my_user is automatically granted all permissions on that role. In this way permissions and roles can be inherited and hierarchies can be created for the database. Superuser status can also be assigned at the role level.
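The resulting hierarchy can be inspected and adjusted with statements along these lines. This is a sketch that assumes authentication and authorization are enabled on the cluster; admin_role and its password are hypothetical:

```cql
-- Roles granted to new_user, directly or through other roles
LIST ROLES OF new_user;

-- Permissions held by new_user, including those inherited from my_user
LIST ALL PERMISSIONS OF new_user;

-- Undo the inheritance; new_user keeps only its direct MODIFY grant
REVOKE my_user FROM new_user;

-- A hypothetical role with superuser status
CREATE ROLE admin_role WITH SUPERUSER = true AND LOGIN = true AND PASSWORD = 'change_me';
```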

All of the above makes an administrator’s task of creating the required permissions structured and easy to manage. Future releases may bring more additions to authorization and authentication in Cassandra.

All the features discussed here meet the demand for faster and easier development. Some of them also bring Cassandra’s functionality closer to that of relational databases. Both the 2.2 and 3.0 releases introduced bug fixes as well as new features. More can be expected in future releases, which are planned to arrive monthly.