Graham Brooks - Notification Event Pattern

[ {
  "new-shopping-basket" : "http://some.domain/baskets/urk4ls16hysh"
}, {
  "item-added" : "http://some.domain/baskets/urk4ls16hysh/item/1"
}, {
  "item-added" : "http://some.domain/baskets/urk4ls16hysh/item/2"
}, {
  "item-removed" : "http://some.domain/baskets/urk4ls16hysh/item/1"
}, {
  "basket-emptied" : "http://some.domain/baskets/urk4ls16hysh"
} ]

Back-References

Notification Events tell subscribers that something has changed. Typically the event contains a reference to the thing that has changed. This back-reference could point to different things. The simplest is probably a back-reference to latest.

Back-reference to latest

The examples above all contain references to a single version of a resource or entity - the latest version. This is perhaps the simplest implementation of this pattern.

This pattern should be used for producers that maintain current state. Consumers are limited to the current version. This pattern is partially immune to event loss. If all entities or data items are fast moving then the loss of an individual event only means a delay in the consumer becoming aware of the updates.

Figure 3. Back-reference to latest

In this example there is no indication of the change made to the order. The back-reference points to the resource. The subscriber could find, after responding to the last event, that the order has been cancelled. The brevity of the notification is a key characteristic of this variation. The listening code does very little work. The component reading the order resource does the work.
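
A minimal sketch of such a listener follows. The event name `order-updated`, the order URL, and the in-memory `RESOURCES` store are hypothetical stand-ins for illustration. In a real system `fetch_latest` would be an HTTP GET against the back-reference.

```python
from typing import Callable, Dict

# Hypothetical in-memory stand-in for the producer's current state.
RESOURCES: Dict[str, dict] = {
    "http://some.domain/orders/42": {"id": 42, "status": "cancelled"},
}


def fetch_latest(url: str) -> dict:
    """Return the latest version of the resource behind `url`."""
    return RESOURCES[url]


def handle_event(event: Dict[str, str], fetch: Callable[[str], dict]) -> dict:
    """React to a notification event by following its back-reference.

    The event carries only a name and a URL, so the listener does very
    little itself: all state comes from reading the resource.
    """
    event_name, url = next(iter(event.items()))
    state = fetch(url)
    return {"event": event_name, "state": state}


result = handle_event(
    {"order-updated": "http://some.domain/orders/42"}, fetch_latest
)
print(result["state"]["status"])
```

Note that the listener only learns the order was cancelled when it reads the resource. The event itself says nothing about what changed.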

Back-reference to version

In this variant the producer provides a reference to the version of the data that caused the event. http://some.domain/customer/{customer-id}/2 might refer to the second version of the customer record. Numeric version labels might tempt consumers to infer versions and traverse the version numbering scheme. More opaque version references such as GUIDs would avoid this problem.

Let's assume that we have some sort of fast moving data items and many distributed consumers. Notification Events are distributed to the consumers over an event bus like Kafka.

Figure 4. Multiple subscribers to versioned resource

REST style hypermedia links would still allow a consumer to traverse the list while reducing coupling between the producer and consumers.

This means that the reference provided in the event is immutable which is great for caching if there are a large number of consumers.

Immutable versions of an entity have a lot of advantages. In particular immutability means that the data can be cached. Placing the cache close to the subscriber reduces the number of expensive 'long distance' queries.

In the example below Subscribers A and B are 'close' to each other and can share a cache. The cache can be long lived (because the data is immutable) and can significantly reduce the number of queries back to the publisher.

Figure 5. Multiple subscribers to versioned resource with cache
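
The shared cache can be sketched as below. This is only an illustration under assumptions: `VersionCache` and `fetch_from_publisher` are hypothetical names, and the fetch function stands in for an expensive 'long distance' query to the publisher.

```python
from typing import Callable, Dict


class VersionCache:
    """Cache keyed by versioned URL.

    Entries never need invalidating because a given version of the
    resource is immutable.
    """

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        self._fetch = fetch
        self._entries: Dict[str, dict] = {}
        self.publisher_queries = 0  # counts trips back to the publisher

    def get(self, versioned_url: str) -> dict:
        if versioned_url not in self._entries:
            self.publisher_queries += 1
            self._entries[versioned_url] = self._fetch(versioned_url)
        return self._entries[versioned_url]


def fetch_from_publisher(url: str) -> dict:
    # Hypothetical stand-in for a query to the publisher.
    return {"url": url, "name": "Example Customer"}


shared = VersionCache(fetch_from_publisher)
versioned_url = "http://some.domain/customer/123/2"

# Subscribers A and B both react to the same event through the same cache.
records = {name: shared.get(versioned_url) for name in ("A", "B")}
print(shared.publisher_queries)  # 1
```

Both subscribers see the same record, but only the first lookup reaches the publisher.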

Back-reference to difference

In this variant the event contains a reference to the change that has been made.

http://sales/order/{order-id}/2/3 refers to the changes made between versions 2 and 3 of the order record. This pattern variant requires the producer to maintain a running set of differences, generated either when the event is issued or when queried. An event sourced store would make this relatively straightforward but is more complex than just maintaining the latest state.

This variant is useful if subscribers are interested in the value changes so they can infer or apply the change to their own data sets.
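
As a rough sketch of a subscriber applying such a change to its own copy: the article does not define a format for the difference, so the `changed` / `removed` shape below is purely an assumption, as are the order fields.

```python
from typing import Any, Dict


def apply_difference(local: Dict[str, Any], diff: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new record with the difference applied.

    Assumed diff shape (hypothetical):
      {"changed": {field: new_value, ...}, "removed": [field, ...]}
    The subscriber's existing copy is not mutated.
    """
    updated = dict(local)
    updated.update(diff.get("changed", {}))
    for field in diff.get("removed", []):
        updated.pop(field, None)
    return updated


# The subscriber's copy of version 2 of an order.
order_v2 = {"id": 7, "status": "open", "coupon": "SPRING"}

# What a query to .../order/7/2/3 might return under the assumed shape.
diff_2_to_3 = {"changed": {"status": "paid"}, "removed": ["coupon"]}

order_v3 = apply_difference(order_v2, diff_2_to_3)
print(order_v3)  # {'id': 7, 'status': 'paid'}
```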

Topics and Queues

In both subscriber and publish-subscribe models it is often useful to segment the event feed for subscribers. The event notification pattern is particularly suited to situations where the event is ephemeral. Its significance is bounded by updates to a particular entity. If the event stream is partitioned by topic then the event value is relative to the topic stream.

Single topic - firehose

This design implements a single topic subscription where all events are published onto the same topic. All changes in the publishing system are fed into a single topic. Subscribers consume the event stream filtering out the events they are not interested in.

Event order is inherent in the producer. If there are competing consumers the consumer is responsible for ensuring events are processed in a reasonable order.
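
The filtering a firehose subscriber performs can be sketched as follows. The `customer-updated` event is a hypothetical addition to the basket events, included only to show an event being filtered out.

```python
from typing import Dict, Iterable, Iterator, Set

Event = Dict[str, str]

# A single topic carrying every change in the publishing system.
FIREHOSE = [
    {"new-shopping-basket": "http://some.domain/baskets/urk4ls16hysh"},
    {"item-added": "http://some.domain/baskets/urk4ls16hysh/item/1"},
    {"customer-updated": "http://some.domain/customer/123/2"},  # hypothetical
    {"item-removed": "http://some.domain/baskets/urk4ls16hysh/item/1"},
]


def filter_events(events: Iterable[Event], wanted: Set[str]) -> Iterator[Event]:
    """Yield only the events this subscriber cares about, in feed order."""
    for event in events:
        if any(name in wanted for name in event):
            yield event


item_events = list(filter_events(FIREHOSE, {"item-added", "item-removed"}))
print(len(item_events))  # 2
```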

Topic for each entity type

This design partitions the event stream by entity. Customer changes would be published on one topic and Orders on another. For our shopping cart example there would be a single topic shopping-cart-events.

Consumers subscribe to the stream and filter out the events that they are not interested in. If new events are discovered they would be added to the topic.

Like the firehose design, message sequencing is less of a problem. For an event feed the producer is responsible for supplying events in order. For a broker system the broker can be responsible for delivery order. If a competing consumers design is used, additional configuration is required to make sure the events are processed in sequence.

Topic for each entity type change

This design partitions the event stream by the entity and the type of change.

For our ecommerce shopping basket example this would mean topics for:

new shopping basket
item added
item removed
basket emptied

If a client application is tracking shopping baskets to keep a count of the number of open baskets it would need to listen to the new shopping basket and basket emptied events.
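
A minimal sketch of that counting client, assuming events arrive as `(topic, url)` pairs from the two subscribed topics and that an emptied basket no longer counts as open:

```python
from typing import Iterable, Tuple


def count_open_baskets(events: Iterable[Tuple[str, str]]) -> int:
    """Count baskets that have been created but not yet emptied.

    Only the two topics named above are needed; item-level topics
    are never subscribed to.
    """
    open_baskets = set()
    for topic, basket_url in events:
        if topic == "new-shopping-basket":
            open_baskets.add(basket_url)
        elif topic == "basket-emptied":
            open_baskets.discard(basket_url)
    return len(open_baskets)


# The second basket id is hypothetical.
events = [
    ("new-shopping-basket", "http://some.domain/baskets/urk4ls16hysh"),
    ("new-shopping-basket", "http://some.domain/baskets/example-basket-2"),
    ("basket-emptied", "http://some.domain/baskets/urk4ls16hysh"),
]
print(count_open_baskets(events))  # 1
```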

If an application needs to maintain state of the baskets then it would need to listen to more topics. The producer and consumer have more shared knowledge. If some time in the future a new event is generated by the producer all consumers could be impacted.

Legacy Systems

The simplicity of this event pattern makes it valuable in updating legacy systems where either the source code is not available or difficult to change.

A few years ago I was working on a system refactoring project. A search function had been implemented some years before based on queries to a relational store. This system was pretty complex to accommodate the users' evolving search requirements. The queries were becoming more and more complex and taking a long time to execute, sucking up database and application resources. The implementation was becoming a serious scaling problem.

The team realised that creating and using a search index would relieve a lot of load on the database server and allow us to remove a lot of complex search query code. We decided to add an Elasticsearch cluster to the overall architecture and use this to drive user searches over the data. The index needed to be fairly up to date (near real time) so a batch approach to building the index by querying the data store infrequently would not work. We had access to the source but over time it had become particularly complex and difficult to change safely. A lot of the logic had been implemented in SQL for speed. For flexibility the stored procedures generated SQL on the fly. Finding all the places where data changed was just too much for the budget. We decided to implement database triggers attached to the tables we were interested in. These triggers would generate events and publish them onto an event bus. An Elasticsearch agent picked up these events and then queried the database for the new data to update the index.

Once we had the index build working well (triggers have lots of interesting edge cases) we could re-implement the search code to call the Elasticsearch index instead of the expensive SQL stored procedures. Over a few releases we were able to untangle and strangle a large area of the system while adding additional search capabilities and greater scale overall.

The notification event pattern worked very well for this first-stage change to the existing system. Because the index build ran alongside the current version, we could run the two versions in parallel and compare results. Index updating could be tuned to provide the right level of freshness. The triggers were kept as simple as possible. Because they ran in the client transaction we had to make sure that they did not affect existing functions and that they ran very quickly.

We found that changes tended to cluster, generating a lot of churn on individual data entities. To reduce this churn a later release moved away from the event bus. The triggers wrote to an event log table. The indexing system could then query for rows that had changed since the last partial index build. While similar to the batch approach, the refresh frequency was still high and confined to the rows that had changed, even if those rows had changed many times between refreshes. This change reduced the number of data queries significantly. Index rebuilds were more controlled and overall we got better throughput of changes into the index. The search index was fresher than with a rebuild on each individual change. Queries could pull more data in each query, which again improved throughput.
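
The event log approach can be sketched with SQLite standing in for the original database, which is not named here. The `product` and `event_log` tables, the trigger names and the `changed_ids_since` helper are all hypothetical. The point is only that many changes to one row collapse into a single entry for the indexer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE event_log (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        entity_id INTEGER NOT NULL
    );
    -- Triggers stay deliberately small: they record only which row changed.
    CREATE TRIGGER product_inserted AFTER INSERT ON product
    BEGIN
        INSERT INTO event_log (entity_id) VALUES (NEW.id);
    END;
    CREATE TRIGGER product_updated AFTER UPDATE ON product
    BEGIN
        INSERT INTO event_log (entity_id) VALUES (NEW.id);
    END;
""")


def changed_ids_since(db: sqlite3.Connection, last_seq: int):
    """Return the distinct ids changed after `last_seq` and the new high-water mark."""
    rows = db.execute(
        "SELECT entity_id, MAX(seq) FROM event_log "
        "WHERE seq > ? GROUP BY entity_id ORDER BY entity_id",
        (last_seq,),
    ).fetchall()
    ids = [entity_id for entity_id, _ in rows]
    new_last = max((seq for _, seq in rows), default=last_seq)
    return ids, new_last


conn.execute("INSERT INTO product (id, name) VALUES (1, 'kettle')")
conn.execute("INSERT INTO product (id, name) VALUES (2, 'toaster')")
for name in ("kettle v2", "kettle v3", "kettle v4"):
    conn.execute("UPDATE product SET name = ? WHERE id = 1", (name,))

# Five log entries, but only two rows for the indexer to re-read.
ids, last_seq = changed_ids_since(conn, 0)
print(ids, last_seq)  # [1, 2] 5
```

The indexer would then read the current state of those rows and remember `last_seq` for its next partial build.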

Because the notification event contains only a few fields it is easier to apply to existing systems. This is particularly true if you don’t have access to change the source code but still need to know when something changes.

Wrapping up

The notification pattern is a useful way of keeping a subscriber up to date with changes in the upstream system. Care should be taken as the number of subscribers increases. Each subscriber needs to be informed of changes so some form of caching or broker to spread the load is likely to be required.

At this point the back-reference to version becomes more useful if caching is required, because the data returned from the publisher is immutable.

Overall system availability decreases with the number of subscribers (this assumes availability in series: if one component is down then the system is down). Let's assume that each component has 99.9% availability, which allows for about 10 minutes of downtime a week.

If we have one component in our system then the overall availability is the same: about 10 minutes of downtime per week.

If we have one producer and one consumer, and the consumer needs the producer to be available to read from, then the availability of the system becomes 99.8% (just over 20 minutes of downtime per week). At 10 subscribers it is about 99.0%, which equates to roughly 1 hour 40 minutes a week.
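
The arithmetic above can be reproduced with a short calculation. This is a sketch of the series model only: each component is assumed to fail independently, and `n` counts the components that must all be up at once.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes


def series_availability(component_availability: float, components: int) -> float:
    """Availability of `components` independent components in series."""
    return component_availability ** components


def weekly_downtime_minutes(availability: float) -> float:
    """Expected minutes of downtime per week at the given availability."""
    return (1 - availability) * MINUTES_PER_WEEK


for n in (1, 2, 10):
    availability = series_availability(0.999, n)
    downtime = weekly_downtime_minutes(availability)
    print(f"{n:>2} in series: {availability:.3%} available, {downtime:.1f} min/week")
```

The three cases come out at roughly 10, 20 and 100 minutes of downtime a week.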

This would be a worst case, and these days much higher component availabilities are achieved, but if this is one of your key architectural constraints then other event models that carry state might be more appropriate.

Further Reading

Martin Fowler: What do you mean by “Event-Driven”?