A requestor is requesting a service from a provider. The requested operation may fail.
How can a requestor cope with a failing request to a provider?
- The communication channels used by loosely coupled, distributed systems usually do not provide transactional semantics at the transport level. Instead, participants can send and receive messages, often reliably.
- Due to many different error scenarios in distributed systems, including communication errors and business-level errors, error handling can be complicated and difficult to implement correctly.
- Detecting errors with certainty may be difficult. For example, a timeout is only an approximation for detecting errors and may cause false positives: the requestor assumes an error that, from the provider's viewpoint, was not one.
- While some operations are inherently reversible (e.g. a credit to a bank account can be reversed by a debit), other operations, such as shipping a package or scrapping a car, cannot easily be undone. As a result, the requestor might not be able to do much to rectify the situation.
- Even with error handling, 100% reliability in distributed interactions can rarely be achieved, so some degree of failure has to be accepted as a fact of life.
- Not dealing with failure can increase throughput as no message exchanges are needed for error detection (e.g. confirmations) or error handling. Ironically, not dealing with failure can also make the interaction more robust because it remains simpler. Complex error handling can introduce errors into the conversation, such as infinite loops or unreachable states.
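The false-positive problem with timeouts can be reproduced in a few lines. The following Python sketch is illustrative only: `slow_provider` and `request` are hypothetical names, and a local worker thread stands in for a remote provider. The requestor gives up after 50 ms and assumes a failure, yet the provider still completes the operation.

```python
import concurrent.futures
import time

# Operations the provider actually completed (the provider's point of view).
provider_log: list[str] = []

def slow_provider(order_id: str) -> str:
    """Hypothetical provider: the operation succeeds, just slower than expected."""
    time.sleep(0.2)
    provider_log.append(order_id)
    return "ok"

def request(executor: concurrent.futures.Executor, order_id: str, timeout_s: float) -> str:
    """Requestor side: treats a missing reply within timeout_s as an error."""
    future = executor.submit(slow_provider, order_id)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return "timeout"  # the requestor *assumes* the operation failed

with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
    outcome = request(executor, "42", timeout_s=0.05)
# Leaving the with-block waits for the provider thread to finish.

print(outcome)       # timeout
print(provider_log)  # ['42']: the provider did complete the operation
```

The requestor's view ("timeout") and the provider's view (operation done) disagree, which is exactly the false positive described above.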
The requestor ignores the failure and continues as if nothing happened.
The Ignore Error conversation involves the following participants:
- The Requestor executes an operation on a service provider and receives a response regarding its status.
- The Provider receives operation requests from the Requestor and executes the associated actions. It reports back on the status of the operation.
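A minimal sketch of an Ignore Error requestor, assuming a synchronous call that returns a status. The `provider` function, its failure rule, and the `Response` type are hypothetical stand-ins: the requestor reads the status but neither retries nor compensates.

```python
from dataclasses import dataclass

@dataclass
class Response:
    request_id: int
    ok: bool

def provider(request_id: int) -> Response:
    """Hypothetical provider that fails for every third request."""
    return Response(request_id, ok=request_id % 3 != 0)

def requestor(request_ids: list[int]) -> list[int]:
    """Sends every request and continues regardless of the reported status."""
    succeeded: list[int] = []
    for request_id in request_ids:
        response = provider(request_id)
        if response.ok:
            succeeded.append(request_id)
        # On failure: no retry, no compensating request, no escalation.
        # The requestor simply moves on to the next request.
    return succeeded

print(requestor([1, 2, 3, 4, 5, 6, 7]))  # [1, 2, 4, 5, 7]
```

Note that the conversation itself carries no extra messages for error detection or recovery; the only trace of the failures is that requests 3 and 6 are missing from the result.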
Ignore Error is also known as Write-off.
Conversations using Ignore Error are very optimistic because they consider only the "happy path", i.e. the flow without errors. This approach is useful if errors are rare or the value at stake is small. Ignore Error is not appropriate for large financial transactions.
While Ignore Error may not appear to be an acceptable solution to many engineers who are trained in precision and Boolean algebra, this approach can be the most economical in a real business context. Building a system to reconcile error situations automatically may be more expensive than the losses incurred from ignoring errors, and resolving errors manually is costly.
Ignoring small errors can be lucrative if they are part of a larger transaction. Holding up a large transaction because of a single small error may cost more than ignoring that error.
Calculating the cost of not handling errors must consider the total cost, including indirect costs such as a loss in customer satisfaction due to poor or inconsistent service.
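The economic argument can be made concrete with a back-of-the-envelope comparison. The cost model below is a deliberately simple assumption and all figures are invented for illustration: ignoring costs the direct loss plus an indirect (e.g. goodwill) loss per error, while handling costs an upfront build cost plus a per-error handling cost.

```python
def cost_of_ignoring(expected_errors: float, direct_loss: float, indirect_loss: float) -> float:
    """Write-off: every error costs its direct loss plus indirect costs such as goodwill."""
    return expected_errors * (direct_loss + indirect_loss)

def cost_of_handling(expected_errors: float, build_cost: float, handling_cost: float) -> float:
    """Error handling: upfront build cost plus per-error (often manual) effort."""
    return build_cost + expected_errors * handling_cost

def ignore_is_cheaper(expected_errors: float, direct_loss: float, indirect_loss: float,
                      build_cost: float, handling_cost: float) -> bool:
    return (cost_of_ignoring(expected_errors, direct_loss, indirect_loss)
            < cost_of_handling(expected_errors, build_cost, handling_cost))

# Hypothetical figures: 500 errors, 2.00 direct loss each.
print(ignore_is_cheaper(500, 2.0, 0.5, build_cost=20_000, handling_cost=5.0))   # True
# Same errors, but each one now costs 50.00 in lost customer goodwill.
print(ignore_is_cheaper(500, 2.0, 50.0, build_cost=20_000, handling_cost=5.0))  # False
```

The second call shows why the total cost matters: with identical direct losses, a large enough indirect cost tips the balance toward handling errors.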
Even though Ignore Error does not affect the conversation between Requestor and Provider, this does not mean that the requestor has nothing to do. The requestor may still have to detect the error condition and choose a different path of execution internally, e.g. not making subsequent requests. The requestor may also have to de-allocate resources related to the operation once it detects the failure.
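The following sketch illustrates the internal work that remains for the requestor. `Reservation`, `order_and_ship`, and the order/ship messages are hypothetical: on failure, the requestor sends nothing further to the provider, but it releases its local resource and skips the dependent follow-up request.

```python
from typing import Callable

class Reservation:
    """Hypothetical local resource held while an operation is in progress."""
    def __init__(self, item: str) -> None:
        self.item = item
        self.released = False

    def release(self) -> None:
        self.released = True

def order_and_ship(item: str, provider: Callable[[str], bool], outbox: list[str]) -> Reservation:
    reservation = Reservation(item)    # allocated before the request is sent
    outbox.append(f"order:{item}")
    if provider(item):
        outbox.append(f"ship:{item}")  # follow-up request, sent only after success
    else:
        # Nothing further goes to the provider, but internally the requestor
        # frees the resource and skips the dependent request.
        reservation.release()
    return reservation

outbox: list[str] = []
book = order_and_ship("book", lambda item: True, outbox)
lamp = order_and_ship("lamp", lambda item: False, outbox)
print(outbox)                        # ['order:book', 'ship:book', 'order:lamp']
print(book.released, lamp.released)  # False True
```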
Example: Coffee Shop
A modern coffee shop may have already started preparing a coffee for a customer when it turns out that the customer has insufficient funds. Simply discarding the coffee and processing other customers’ orders is generally more efficient than trying to obtain the funds because the loss of one beverage is small and the opportunity cost is high [Coffee Shop].
Example: Data Processing
Data processing often involves scanning large data sets, e.g. from log files. Invariably, some individual records will be incorrectly formatted or contain other errors. For statistical operations it is often better to simply ignore such errors than to abort the whole calculation, which might already have consumed significant resources.
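A sketch of this approach for a simple statistic over log lines, assuming a hypothetical record format of `<endpoint> <latency_ms>`. Malformed records are counted and skipped instead of aborting the calculation.

```python
import statistics
from typing import Iterable, Optional

def mean_latency_ms(lines: Iterable[str]) -> tuple[Optional[float], int]:
    """Returns (mean latency, number of skipped records)."""
    values: list[float] = []
    skipped = 0
    for line in lines:
        try:
            # Wrong field count or a non-numeric latency raises ValueError.
            _endpoint, latency = line.split()
            values.append(float(latency))
        except ValueError:
            skipped += 1  # ignore the bad record and keep going
    mean = statistics.mean(values) if values else None
    return mean, skipped

log = ["/home 120", "/cart 80", "/cart ERR", "garbage", "/home 100"]
print(mean_latency_ms(log))  # (100.0, 2)
```

Returning the skip count is a design choice rather than a requirement of the pattern: it lets the caller notice when so many records are malformed that ignoring them would distort the result.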
Counter Example: German Supermarket
I recently stood in line at a German supermarket when an item rang up as invalid three customers ahead of me. The checkout lady placed the item in front of her and stopped working; she did not even ring up the customer's remaining items. She called for a price check while all the other customers waited. Two minutes later, someone appeared to pick up the beverage, and another minute later he reappeared with a price of around 1 Euro. The checkout lady then rang up the rest of the items, only to find out that the customer was not able to pay. So she delayed the entire process by three minutes while the error handling produced zero business value: not a single item was sold. She should have simply ignored the error, i.e. given the item away for free, or handled the erroneous item in a separate transaction.
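The two alternatives the checkout lady had can be sketched as a checkout routine. `LineItem`, `checkout`, and the `write_off_limit` threshold are hypothetical: unpriced items of little value are written off (Ignore Error), more valuable ones are set aside for a separate transaction, and in neither case is the main transaction held up.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LineItem:
    name: str
    price: Optional[float]  # None models a failed price lookup
    estimate: float = 0.0   # hypothetical rough value, used only when price is None

def checkout(items: list[LineItem], write_off_limit: float = 2.0):
    """Rings up all priced items without stopping for unpriced ones."""
    total = 0.0
    written_off: list[str] = []
    deferred: list[str] = []
    for item in items:
        if item.price is not None:
            total += item.price
        elif item.estimate <= write_off_limit:
            written_off.append(item.name)  # Ignore Error: give the item away
        else:
            deferred.append(item.name)     # settle later in a separate transaction
    return total, written_off, deferred

basket = [
    LineItem("bread", 2.5),
    LineItem("juice", None, estimate=1.0),
    LineItem("wine", None, estimate=15.0),
    LineItem("cheese", 4.0),
]
print(checkout(basket))  # (6.5, ['juice'], ['wine'])
```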