Consistency can be defined as the absence of contradiction. For distributed, event-driven systems, inconsistencies in data state are a fact of life.
First, let's consider a direct-call system. In this case each component calls the other directly. At the end of the client call to the sales system, all the other components are up to date with the request.
Consistency needs to be considered in terms of 'perspective', 'context' and 'scope'.
In the example above, a key perspective is the client application working on behalf of the user. From the user’s 'perspective', consistency should be achieved when the request has completed. From an 'operational' perspective (the one we are most interested in), the system handling the request is consistent at the completion of the request.
From an operational perspective, while the system handling the request might be consistent, it is very unlikely that all of our systems are consistent. Logs are probably still in flight to our log aggregation system. The data warehouse job has not yet run, so the warehouse knows nothing about the transaction. So even in an apparently consistent system using direct calls, inconsistency within some context or perspective is an issue, typically an IT systems issue. Each system is likely to have different consistency scopes, but here are some typical definitions.
Transactional consistency: a successful client transaction results in a persisted state of that transaction. This considers the service and its backing store to be a black box. For system consistency we can ignore aspects like database consistency, treating them as lower consistency levels specific to a service implementation.
Service/cluster consistency is the next level or scope of consistency: the service or service cluster is consistent. For event-sourced CQRS systems, this level means that the read stores of the service servicing the request are consistent with the command/write side that accepted the request.
Until service/cluster consistency is reached, querying for an updated state will return the previous state. This is a major challenge for CQRS event systems. A client has issued a request that has been accepted by the service (transactional consistency). It is reasonable for the client to assume that it can query for the results, but until the service/cluster is consistent that query will either find no data or return previous values.
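To make the stale-read problem concrete, here is a minimal sketch, assuming the write side hands back a version number for each accepted request and the read side tracks the last version it has projected. The `WriteSide` and `ReadSide` classes and the `min_version` parameter are hypothetical names for illustration, not part of any particular framework.

```python
from dataclasses import dataclass, field


@dataclass
class WriteSide:
    """Command side: accepts requests and appends events (hypothetical)."""
    events: list = field(default_factory=list)

    def accept(self, key, value):
        self.events.append((key, value))
        return len(self.events)  # version = position of the event in the log


@dataclass
class ReadSide:
    """Query side: a projection that trails the event log (hypothetical)."""
    version: int = 0
    view: dict = field(default_factory=dict)

    def apply_next(self, write_side):
        # Project one pending event, if any, into the read store.
        if self.version < len(write_side.events):
            key, value = write_side.events[self.version]
            self.view[key] = value
            self.version += 1

    def query(self, key, min_version=0):
        # Refuse to answer from a view older than the caller's own write.
        if self.version < min_version:
            return None
        return self.view.get(key)


write_side, read_side = WriteSide(), ReadSide()
v1 = write_side.accept("order-1", "created")
read_side.apply_next(write_side)
v2 = write_side.accept("order-1", "paid")  # accepted, but not yet projected

stale = read_side.query("order-1")                    # returns the previous state
guarded = read_side.query("order-1", min_version=v2)  # None: not yet consistent
read_side.apply_next(write_side)
fresh = read_side.query("order-1", min_version=v2)    # the updated state
```

Passing the version back to the client is only one option; it lets the client tell "not consistent yet" apart from "nothing there", at the cost of exposing a little of the write side's bookkeeping.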
Clients that accept asynchronous results are great in these situations. The original request can almost be considered fire-and-forget (the client still needs to check for errors). When the service is consistent, the updated view can be published to the client asynchronously.
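A rough sketch of that asynchronous flow, using Python's `asyncio` purely as a stand-in for whatever messaging the real system uses. The `projector` and `submit` functions, the request id and the 10 ms lag are all assumptions made for the example.

```python
import asyncio


async def projector(queue, view, notifications):
    """Hypothetical read-side projector: applies events, then notifies."""
    while True:
        request_id, key, value = await queue.get()
        await asyncio.sleep(0.01)  # simulated projection lag
        view[key] = value
        # The read side is now consistent for this request: publish the view.
        notifications[request_id].set_result(view[key])


async def submit(queue, notifications, request_id, key, value):
    """Accept the command and return at once with a future for the result."""
    future = asyncio.get_running_loop().create_future()
    notifications[request_id] = future
    await queue.put((request_id, key, value))
    return future


async def main():
    queue, view, notifications = asyncio.Queue(), {}, {}
    worker = asyncio.create_task(projector(queue, view, notifications))
    pending = await submit(queue, notifications, "req-1", "order-1", "paid")
    seen_before = view.get("order-1")  # the read side has not caught up yet
    seen_after = await asyncio.wait_for(pending, timeout=1.0)
    worker.cancel()
    return seen_before, seen_after


result = asyncio.run(main())
```

The timeout on `wait_for` stands in for the error check: a client that never hears back needs some way to notice.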
Capability consistency is tricky to define and depends on the system architecture. For customer servicing, it would include all the services and applications that make up the capability, including systems that we integrate with.
Until capability consistency is achieved, views from other services will not be consistent with the update. If the user is traversing a multi-step process that involves multiple back-end services, then capability consistency becomes critical. A user should not move on to a new step until the service behind it is ready to handle that step.
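One way to picture the gating of a multi-step process is sketched below. It assumes each back-end service can report the last update version it has applied; the service names, the version numbers and both function names are hypothetical.

```python
def capability_consistent(service_versions, required_version):
    """True when every listed service has applied the update."""
    return all(v >= required_version for v in service_versions.values())


def next_step_allowed(service_versions, required_version, step_services):
    """Gate a step only on the services that the step depends on."""
    relevant = {name: service_versions[name] for name in step_services}
    return capability_consistent(relevant, required_version)


# Hypothetical last-applied versions reported by each back-end service.
versions = {"sales": 7, "billing": 7, "shipping": 6}

can_pay = next_step_allowed(versions, 7, ["sales", "billing"])    # both caught up
can_ship = next_step_allowed(versions, 7, ["sales", "shipping"])  # shipping lags
whole_capability = capability_consistent(versions, 7)             # not yet
```

Gating on only the services a step needs means the user is not held up by a lagging service they have not reached yet.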
Enterprise consistency: all systems within the enterprise are consistent with the client request. All stores, warehouses, reports, etc. reflect the client request.
Each consistency level or scope should be given a service level (SLA) for consistency. Transactional consistency is typically reached within a small number of milliseconds, while enterprise consistency is likely to take hours or days.
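As a sketch of what per-scope SLAs might look like in practice, the snippet below compares observed time-to-consistency against a target for each scope. Every figure in it is an illustrative assumption; only the rough orders of magnitude for the transaction and enterprise scopes come from the discussion above.

```python
from datetime import timedelta

# Illustrative SLA targets per consistency scope; all figures are assumptions.
CONSISTENCY_SLA = {
    "transaction": timedelta(milliseconds=10),
    "service_cluster": timedelta(seconds=2),
    "capability": timedelta(minutes=5),
    "enterprise": timedelta(hours=24),
}


def sla_breaches(observed):
    """Return the scopes whose observed time-to-consistency exceeded the SLA."""
    return [
        scope
        for scope, target in CONSISTENCY_SLA.items()
        if scope in observed and observed[scope] > target
    ]


# Hypothetical measurements for a single client request.
observed = {
    "transaction": timedelta(milliseconds=4),
    "service_cluster": timedelta(seconds=3),
    "capability": timedelta(minutes=1),
    "enterprise": timedelta(hours=30),
}
breaches = sla_breaches(observed)
```

With these made-up numbers the service/cluster and enterprise scopes miss their targets while the other two are met, which is the kind of per-level signal the SLA is meant to surface.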
The definitions above only apply to a single request. The chart above shows how different levels of consistency are met. Each level builds upon the other as the consistency scope increases.
In reality most systems are continually servicing requests from client applications. A large number of those requests change the state of the systems. This means that in any system where cause and effect are separated in time we can never achieve full consistency at all levels.
This means that we need to get better at handling inconsistency and conflicting data states/views.
Our systems and applications are in a continuously inconsistent or conflicted state, so how do we manage them? By decomposing consistency problems into levels, we now have a framework for measurement, analysis and assessment.
In my next post I plan on exploring consistency measurement at scale before exploring how to compensate for failures.