A participant is interacting with multiple, independent services, which change their state based on the interaction. Some of the interactions may fail.
How can a requestor ensure a consistent outcome across multiple, independent providers?
- The communication channels used by loosely coupled, distributed systems usually do not provide transactional semantics at the transport level. Instead, participants can send and receive messages, often reliably.
- Sending messages to multiple services does not provide transactional guarantees: one interaction may succeed while the next interaction may fail, leaving the overall system in an inconsistent state.
- A participant can interact with multiple services sequentially or in parallel. Sequential execution has the advantage that after an error all subsequent interactions can be stopped. However, sequential execution often unnecessarily increases latency when services are independent and can execute in parallel.
- Some operations are inherently reversible, e.g. adding money to an account can be "undone" by taking money off the account, at least if the account owner has not withdrawn the money in between.
The requestor performs operations optimistically. Should an operation fail, it instructs the provider(s) to undo already completed operations.
The Compensating Action conversation involves the following participants:
- The Requestor executes multiple interactions on one or more service providers. These interactions are part of an overall conversation but consist of multiple, independent messages.
- The Providers receive interactions from the Requestor and execute the associated actions. They can also receive undo messages to reverse prior actions.
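The basic Requestor behavior can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes each interaction is modeled as a hypothetical pair of callables (the action and its undo), that a failed action raises an exception, and that the undo calls themselves succeed. The Requestor executes actions optimistically and remembers the compensation for each completed one. On failure, it undoes only what has already completed.

```python
from typing import Callable, List, Tuple

# Hypothetical model: an action performs work on a provider; its
# compensation (the "undo" message) reverses that work.
Action = Callable[[], None]
Compensation = Callable[[], None]


def run_with_compensation(steps: List[Tuple[Action, Compensation]]) -> bool:
    """Execute steps optimistically; on failure, undo the completed steps.

    Returns True if every step succeeded, False if compensation was triggered.
    """
    completed: List[Compensation] = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            # The failed step did not complete, so only earlier steps are undone,
            # most recent first.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True


# Usage: the third provider fails, so the first two actions are undone.
log: List[str] = []


def fail() -> None:
    raise RuntimeError("provider unavailable")


ok = run_with_compensation([
    (lambda: log.append("reserve hotel"), lambda: log.append("cancel hotel")),
    (lambda: log.append("reserve car"), lambda: log.append("cancel car")),
    (fail, lambda: None),
])
print(ok, log)  # False ['reserve hotel', 'reserve car', 'cancel car', 'cancel hotel']
```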
Conversations using Compensating Action are optimistic because they optimize the "happy path", i.e. the flow without errors. This approach is useful if errors are rare or can be easily undone.
Not all actions can be easily compensated, e.g. a letter that is in the mailbox will be delivered. However, because real life does not provide ACID-style transactions either, businesses have had to come up with creative ways to compensate for a variety of actions. For example, a mailed letter to a customer can be compensated by a phone call asking the customer to ignore the letter. Similarly, many actions can be "undone" in creative ways: a customer can be asked to return a package sent in error, and an overcharge can be offset with a store credit. A Compensating Action does not always leave the system in exactly the same state as before. For example, reversing a banking transaction leaves a trail of two money transfers, one in error and one to compensate for it. Together, however, they leave the critical resource, the account balance, in the same state as before. In a business context, "compensation" is therefore often defined by the business and not by a mathematical equation.
Many applications of Compensating Action span across multiple service providers (see diagram). In this case, Compensating Actions have to be sent to the provider corresponding to the original action.
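When the Requestor contacts several providers in parallel, it has to remember which provider handled which request so that each undo message reaches the right provider. The sketch below assumes a hypothetical in-memory `Provider` class with `execute` and `undo` methods standing in for remote services, and uses Python's standard `ThreadPoolExecutor` to issue the requests concurrently. If any request fails, every request that succeeded is compensated at its own provider.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


class Provider:
    """Hypothetical provider that records executed and undone request IDs."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail
        self.active: List[str] = []

    def execute(self, request_id: str) -> str:
        if self.fail:
            raise RuntimeError(f"{self.name} rejected {request_id}")
        self.active.append(request_id)
        return request_id

    def undo(self, request_id: str) -> None:
        self.active.remove(request_id)


def book_in_parallel(requests: Dict[str, Provider]) -> bool:
    """Send all requests concurrently; if any fails, compensate the successes."""
    with ThreadPoolExecutor() as pool:
        futures = {rid: (provider, pool.submit(provider.execute, rid))
                   for rid, provider in requests.items()}
    # Leaving the with-block waits until all submitted calls have finished.
    succeeded: List[Tuple[Provider, str]] = []
    failed = False
    for rid, (provider, future) in futures.items():
        if future.exception() is None:
            succeeded.append((provider, rid))
        else:
            failed = True
    if failed:
        # Each compensation goes to the provider that ran the original action.
        for provider, rid in succeeded:
            provider.undo(rid)
    return not failed


hotel, flight, car = Provider("hotel"), Provider("flight", fail=True), Provider("car")
ok = book_in_parallel({"H1": hotel, "F1": flight, "C1": car})
print(ok, hotel.active, flight.active, car.active)  # False [] [] []
```

Parallel execution keeps latency low on the happy path; the price is that, unlike sequential execution, all requests are already in flight when the first failure is observed, so more actions may need compensating.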
Compensating Actions can fail. For example, reversing a credit to an account can fail if the account owner withdrew the funds in between. A bank will try to deal with this via collections or by letting the account go into overdraft. In the end, though, banks do have to write off money due to failed Compensating Actions: using a Compensating Action does not guarantee 100% consistency.
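Because compensations can fail, a Requestor should not assume that issuing them restores consistency. One possible approach, sketched below under simplifying assumptions, is to attempt every compensation, keep going when one fails, and report the failures for escalation (for instance to manual follow-up or collections). The `Account` class, `InsufficientFunds` exception, and `compensate_all` function are hypothetical names chosen for illustration.

```python
from typing import Callable, List, Tuple


class InsufficientFunds(Exception):
    pass


class Account:
    """Hypothetical account whose balance cannot go below zero."""

    def __init__(self, balance: int) -> None:
        self.balance = balance

    def credit(self, amount: int) -> None:
        self.balance += amount

    def debit(self, amount: int) -> None:
        if amount > self.balance:
            raise InsufficientFunds(f"cannot debit {amount}, balance is {self.balance}")
        self.balance -= amount


def compensate_all(compensations: List[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run every compensation, newest first; return the names of those that failed.

    A failed compensation does not stop the others. The caller must escalate
    the returned names, since consistency can no longer be restored automatically.
    """
    failed: List[str] = []
    for name, undo in reversed(compensations):
        try:
            undo()
        except Exception:
            failed.append(name)
    return failed


acct = Account(balance=0)
acct.credit(100)  # original action
acct.debit(80)    # the owner withdraws most of the funds in between
other = Account(balance=50)
other.credit(20)
failed = compensate_all([
    ("reverse credit of 100", lambda: acct.debit(100)),
    ("reverse credit of 20", lambda: other.debit(20)),
])
print(failed, acct.balance, other.balance)  # ['reverse credit of 100'] 20 50
```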
If no Compensating Action is available for a service, the service consumer should apply PERFORM HARDEST TO REVERT ACTION LAST to minimize the chances that this action has to be reverted.
Compensating Action is not only applied in case of failure, but also if the Consumer receives an undesirable result from one of the service providers. For example, the price for a subsequent action may be too high, causing the consumer to abort the interaction and to undo all actions performed so far.
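The trigger for compensation therefore need not be an exception. In the sketch below, each hypothetical step places a reservation and returns its price; once the running total exceeds the Requestor's budget, the Requestor undoes every reservation made so far, including the one that pushed it over. The `Step` shape, `make_step` helper, and the prices are invented for illustration.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical step: reserve() places a reservation and returns its price;
# cancel() is the Compensating Action for that reservation.
Step = Tuple[str, Callable[[], int], Callable[[], None]]


def reserve_within_budget(steps: List[Step], budget: int) -> bool:
    """Reserve in order; compensate everything once the total exceeds the budget."""
    done: List[Callable[[], None]] = []
    total = 0
    for _name, reserve, cancel in steps:
        total += reserve()
        done.append(cancel)  # the reservation succeeded, so it may need undoing
        if total > budget:
            # Not a technical failure: the result is simply undesirable.
            for undo in reversed(done):
                undo()
            return False
    return True


held: Set[str] = set()


def make_step(name: str, price: int) -> Step:
    def reserve() -> int:
        held.add(name)
        return price
    return (name, reserve, lambda: held.discard(name))


ok = reserve_within_budget([make_step("hotel", 300), make_step("flight", 900)], budget=1000)
print(ok, held)  # False set()
```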
If an action allocates resources on the part of the service provider, Compensating Action may lead a provider to allocate resources that are then unavailable for other interactions, only to see those resources released by a compensation later on. Therefore, having a provider use pessimistic resource allocation in an "optimistic" conversation puts the provider at a disadvantage. In real life, providers typically compensate for this disadvantage by "over-committing" resources, i.e. allowing more resources to be committed than actually exist. The provider bets on receiving enough Compensating Actions to end up as close to 100% utilization as possible. Providers may also charge for the execution of a Compensating Action, e.g. airlines typically charge a fee for cancellations.
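A provider-side sketch of over-commitment combined with a cancellation fee might look as follows. The class name, the `overcommit_percent` and `cancellation_fee` parameters, and the numbers in the usage are hypothetical and illustrative, not industry figures. Integer arithmetic is used for the limit to avoid floating-point rounding surprises.

```python
from typing import Set


class OvercommittingProvider:
    """Hypothetical provider that accepts more reservations than it has capacity."""

    def __init__(self, capacity: int, overcommit_percent: int, cancellation_fee: int) -> None:
        self.capacity = capacity
        # Accept up to capacity * (1 + overcommit_percent / 100) reservations.
        self.limit = capacity * (100 + overcommit_percent) // 100
        self.cancellation_fee = cancellation_fee
        self.reservations: Set[str] = set()
        self.fees_collected = 0

    def reserve(self, reservation_id: str) -> bool:
        # Accept up to the over-committed limit, betting on later cancellations.
        if len(self.reservations) >= self.limit:
            return False
        self.reservations.add(reservation_id)
        return True

    def cancel(self, reservation_id: str) -> None:
        # The Compensating Action releases the resource and charges a fee.
        self.reservations.remove(reservation_id)
        self.fees_collected += self.cancellation_fee

    def shortfall(self) -> int:
        # Reservations that cannot be honored if nobody else cancels.
        return max(0, len(self.reservations) - self.capacity)


p = OvercommittingProvider(capacity=10, overcommit_percent=20, cancellation_fee=25)
accepted = [p.reserve(f"R{i}") for i in range(13)]
print(accepted.count(True), p.shortfall())  # 12 2
p.cancel("R0")
p.cancel("R1")
print(p.shortfall(), p.fees_collected)  # 0 50
```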
If multiple operations are performed in sequence on the same resource, the Compensating Actions have to be performed in reverse order. This approach is referred to as a "Saga", a term originating from the database world [Sagas] that is commonly applied to distributed systems (see also http://vasters.com/clemensv/2012/09/01/Sagas.aspx).
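Order matters whenever operations on the same resource do not commute. The deliberately abstract Python sketch below applies two hypothetical operations to a single integer value (double it, then add 10) and shows that unwinding the compensations in reverse order restores the original value, while undoing them in the original order does not. The function names and operations are illustrative only.

```python
from typing import Callable, List, Tuple

# Each step is (forward operation, compensating operation) on one shared value.
Op = Callable[[int], int]


def apply_all(value: int, steps: List[Tuple[Op, Op]]) -> int:
    for forward, _ in steps:
        value = forward(value)
    return value


def compensate_in_reverse(value: int, steps: List[Tuple[Op, Op]]) -> int:
    # Saga-style unwinding: the last completed step is compensated first.
    for _, undo in reversed(steps):
        value = undo(value)
    return value


def compensate_in_order(value: int, steps: List[Tuple[Op, Op]]) -> int:
    # Incorrect order, shown only for contrast.
    for _, undo in steps:
        value = undo(value)
    return value


steps: List[Tuple[Op, Op]] = [
    (lambda v: v * 2, lambda v: v // 2),   # double the value
    (lambda v: v + 10, lambda v: v - 10),  # then add 10
]
after = apply_all(100, steps)
print(after)                                # 210
print(compensate_in_reverse(after, steps))  # 100
print(compensate_in_order(after, steps))    # 95
```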
Activities are not always performed sequentially. To make matters worse, a failure may not be detected immediately after the relevant action. For example, an action may prepare a document and complete successfully, but the document may contain a mistake, such as an incorrect price in an offer letter. Leymann et al. therefore extend the concept of Sagas into more general Compensation Spheres [Leymann]. Such spheres can contain any combination of activities, not just sequential ones, and can be compensated even after all activities in the sphere have completed.
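The key difference from a simple Saga, compensation after completion, can be illustrated with a small container for compensations. The `CompensationSphere` class below is a hypothetical sketch, not Leymann's formal model: activities register compensations as they complete (in any order, possibly from parallel branches), the sphere is closed once all activities are done, and it can still be compensated afterwards. For simplicity it undoes activities in reverse completion order; a real sphere may define its own ordering.

```python
from typing import Callable, List


class CompensationSphere:
    """Hypothetical sketch of a compensation sphere."""

    def __init__(self) -> None:
        self._compensations: List[Callable[[], None]] = []
        self.closed = False
        self.compensated = False

    def register(self, compensation: Callable[[], None]) -> None:
        if self.closed:
            raise RuntimeError("sphere is closed to new activities")
        self._compensations.append(compensation)

    def close(self) -> None:
        # All activities have completed; compensation remains possible afterwards.
        self.closed = True

    def compensate(self) -> None:
        if self.compensated:
            return  # repeated compensation requests have no further effect
        for undo in reversed(self._compensations):
            undo()
        self.compensated = True
        self.closed = True


letters: List[str] = []
sphere = CompensationSphere()
letters.append("offer letter, price 120")
sphere.register(lambda: letters.append("correction letter"))
sphere.close()
# Later, the price in the offer letter turns out to be wrong.
sphere.compensate()
sphere.compensate()
print(letters)  # ['offer letter, price 120', 'correction letter']
```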
Example: Travel Booking
A traveler who needs to book a trip typically interacts with independent flight, hotel, and rental car portals. If the traveler has already booked a hotel but is unable to make a matching flight reservation, the traveler will ask the hotel to "compensate" the booking by canceling it. Hotels and airlines account for the probability of canceled reservations by over-provisioning, i.e. allowing the number of room or seat reservations to exceed the number of available rooms or seats. If fewer cancellations than expected come in, hotels offer replacement accommodation and airlines offer compensation to passengers willing to give up their seats. Hotels and airlines also routinely provide incentives such as lower prices for non-cancellable reservations, as these relieve them of the cost of over- or under-provisioning.
Example: Coffee Shop
A modern coffee shop typically asks the customer to pay before the beverage is actually prepared. If the preparation of the beverage fails, the shop performs a Compensating Action by refunding the amount the customer paid.