Strict vs Eventual Consistency

October 6, 2017


Apurv Gupta | Chief Architect

Don’t get hoodwinked by Eventually Consistent storage systems when you really need Strict Consistency.

As monolithic systems reached their limits, they started getting replaced with scale-out (distributed) systems. The trend started two decades ago in compute (mainframes were replaced with server farms) and then made its way to storage (databases, file systems). In the database world, the Relational vs NoSQL debate has been raging for some time.

Today, I want to talk to you about consistency in distributed data storage platforms. It is a critical attribute to consider when planning your storage infrastructure.

Let’s start with some basics. In a distributed system, the assumption is that individual nodes will fail, and the system must be resilient to node failures. Therefore, the data must be spread out across multiple nodes with redundancy.

In this context, let’s ask the following question:
“If I perform a write (or update) on one node, will I always see the updated data across all the nodes?”

It seems like an innocuous question – everyone would answer in the affirmative: “duh, of course!”

But not so fast. This is in fact a hard problem to solve in distributed systems, especially while maintaining performance. Systems that make this guarantee are called “Strictly Consistent”. However, a lot of systems take the easy way out and provide only Eventual Consistency.

Strict vs Eventual Consistency

Let’s define Strict and Eventual Consistency:

Strict consistency: For any incoming write operation, once a write is acknowledged to the client, the following holds true:

  • The updated value is visible on read from any node.
  • The update is protected from node failure with redundancy.

Eventual consistency: Weakens both conditions above by adding the word “eventually” – the updated value is eventually visible on read from any node, and the update is eventually protected with redundancy – provided there are no permanent failures in the meantime.
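To make the two contracts concrete, here is a minimal toy sketch of a replicated key-value store that supports both acknowledgment policies. This is purely illustrative – the class names, the in-memory node model, and the background replication pass are my own assumptions, not the design of Cohesity or any other product:

    # Toy replicated key-value store contrasting the two acknowledgment
    # policies. Names and failure model are illustrative assumptions only.
    class Node:
        def __init__(self, name):
            self.name = name
            self.data = {}      # this node's copy of the data
            self.alive = True   # flips to False when the node fails

    class ReplicatedStore:
        def __init__(self, nodes, strict=True):
            self.nodes = nodes
            self.strict = strict
            self.pending = []   # writes awaiting background replication

        def write(self, key, value):
            if self.strict:
                # Strict: copy to every live node BEFORE acknowledging.
                for node in self.nodes:
                    if node.alive:
                        node.data[key] = value
            else:
                # Eventual: land the write on a single node, acknowledge
                # immediately, and replicate later in the background.
                self.nodes[0].data[key] = value
                self.pending.append((key, value))
            return "ACK"  # the client treats the write as durable from here

        def replicate_pending(self):
            # Background pass that spreads deferred writes (eventual mode).
            for key, value in self.pending:
                for node in self.nodes:
                    if node.alive:
                        node.data[key] = value
            self.pending.clear()

        def read(self, key):
            # Read from any live node, like a client with no node affinity.
            for node in self.nodes:
                if node.alive and key in node.data:
                    return node.data[key]
            raise FileNotFoundError(key)

The only difference between the two modes is when the acknowledgment is sent relative to replication – and that single difference drives everything that follows.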

Clearly, Strict Consistency is better because the user is guaranteed to always see the latest data, and data is protected as soon as it is written.

So, why don’t we always make systems Strictly Consistent? First – because under some scenarios, the implementation of Strict Consistency can significantly impact performance (Latency and Throughput).
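The latency cost is easy to see in the toy model above: a strict acknowledgment cannot be sent until the slowest replica has persisted the write, so write latency is roughly the maximum of the per-node latencies rather than the latency of a single node. The numbers below are invented purely for illustration:

    # Hypothetical per-node write latencies in milliseconds.
    replica_latencies_ms = [2, 3, 9]

    eventual_ack_ms = replica_latencies_ms[0]  # ack after the first node: 2 ms
    strict_ack_ms = max(replica_latencies_ms)  # ack after the slowest node: 9 ms

    print(f"eventual ack: {eventual_ack_ms} ms, strict ack: {strict_ack_ms} ms")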

Second – Strict Consistency isn’t always required, and Eventual Consistency suffices in some use cases. For example, consider a shopping cart: if a customer adds an item and a datacenter fails before the addition propagates, it is acceptable for the customer to simply add that item again. Eventual Consistency is sufficient here.

However, you wouldn’t want the same thing to happen to a deposit you just made to your bank account. It simply cannot vanish because a node failed somewhere in the distributed system. Here, Strict Consistency is required.

Why Enterprise Storage Needs Strict Consistency

In enterprise storage, there are cases where Eventual Consistency is the right model. For example, Cohesity offers this model for cross-site replication. But in the vast majority of cases, Strict Consistency is required.

Let’s look at a few examples where Cohesity beats the competition hands down by offering Strict Consistency where it’s needed:

1 – Scale-out file storage: One of the leading scale-out file storage systems provides only Eventual Consistency: the data is written to only one node (on NVRAM) and acknowledged. A customer recently told us that under heavy load, a node may get marked offline (effectively, it’s down), and clients then get “File-Not-Found” errors for files they had successfully written just a few seconds earlier. This wreaks havoc on their applications.

Cohesity solution: With Cohesity, this problem does not occur. The file is made visible to all nodes before the write is acknowledged. Even if an individual node goes down, the file is still available across the other nodes in the system.
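Using the toy model from earlier (again, an illustration of the failure mode, not the actual internals of either product), the sequence the customer described plays out like this:

    # Eventual mode: the write lands on node a only; node a then goes
    # offline before background replication runs, and the read fails.
    a, b, c = Node("a"), Node("b"), Node("c")
    eventual = ReplicatedStore([a, b, c], strict=False)
    eventual.write("report.doc", "contents")  # ACKed after one node
    a.alive = False                           # node marked offline under load
    try:
        eventual.read("report.doc")
    except FileNotFoundError:
        print("File-Not-Found for a write that was acknowledged!")

    # Strict mode: the same failure is harmless, because the write reached
    # every live node before the client ever saw the ACK.
    a, b, c = Node("a"), Node("b"), Node("c")
    strict = ReplicatedStore([a, b, c], strict=True)
    strict.write("report.doc", "contents")
    a.alive = False
    print(strict.read("report.doc"))          # still returns "contents"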

2 – Instant recovery from backup. Next-generation scale-out backup solutions (Cohesity included) provide instant VM recovery from backup. Such solutions boot VMs from a copy of the backup image on the backup system. The backup system serves as primary storage for the duration of the recovery, until the data can be moved back to the original datastore using Storage vMotion. The advantage is clear: you are back in business ASAP.

However, some of the competing scale-out backup solutions only provide Eventual Consistency for the writes. Consequently, if a failure happens on the recovery node, the application fails and the system loses live production VM data.

Cohesity solution: Cohesity presents the recovery data on a distributed NFS volume with Strict Consistency for writes. Therefore, the data continues to be available with no data loss, even in the event of an individual node failure.

These considerations should be top of mind as you design your secondary storage system – and of course, be sure to ask the different vendors about their underlying design principles! With Strict Consistency, data written to Cohesity is acknowledged only after it is safely written to multiple nodes in the cluster – not just to one node. This avoids a single point of failure, which is critical to avoiding downtime and maintaining business continuity.

To learn more about Cohesity’s SpanFS file system, and how it provides you with the highest levels of performance, resilience, and scale, be sure to check out the SpanFS web page.
