The cdk-stepfunctions-patterns library is a set of AWS CDK constructs that implement resiliency patterns for AWS Step Functions.
All these patterns are composable, meaning you can combine them to create quite complex state machines that are much easier to maintain and support than low-level JSON definitions.
- Try / Catch
- Try / Finally
- Try / Catch / Finally
- Retry with backoff and jitter
- Resilience lambda errors handling
- Validation of proper resilience lambda errors handling
Try / Catch pattern
The TryTask construct adds a high-level abstraction that lets you use the Try / Catch pattern with any state or sequence of states.
// ...
new sfn.StateMachine(this, 'TryCatchStepMachine', ...
Try / Finally pattern
It is often useful to design a state machine using the Try / Finally pattern. The idea is to have a Finally state that is executed regardless of whether the Try state succeeds or fails, for example to delete a temporary resource or to send a notification.
Step Functions does not provide a native way to implement this pattern, but it can be done using a Parallel state with a catch-all catch specification.
The TryTask construct abstracts these implementation details and lets you express the pattern directly.
// ...
new sfn.StateMachine(this, 'TryFinallyStepMachine', ...
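To make the Parallel-plus-catch-all idea concrete, here is a minimal sketch in plain TypeScript that builds an Amazon States Language definition by hand. The helper name `tryFinally` and the state names are hypothetical, and this is not the code TryTask generates; it only illustrates the underlying technique.

```typescript
// Minimal ASL shapes, limited to the fields this sketch uses.
type State = Record<string, unknown>;

interface StateMachineDefinition {
  StartAt: string;
  States: Record<string, State>;
}

// Hypothetical helper (not part of the library): wraps `tryBranch` in a
// Parallel state whose success path and catch-all both lead to the
// finally state.
function tryFinally(
  tryBranch: StateMachineDefinition,
  finallyName: string,
  finallyState: State,
): StateMachineDefinition {
  return {
    StartAt: "Try",
    States: {
      Try: {
        Type: "Parallel",
        Branches: [tryBranch],
        // States.ALL is the ASL wildcard error name, so an error raised
        // inside the branch transitions to the finally state.
        Catch: [{ ErrorEquals: ["States.ALL"], Next: finallyName }],
        // The success path goes to the same state.
        Next: finallyName,
      },
      [finallyName]: { ...finallyState, End: true },
    },
  };
}

const definition = tryFinally(
  { StartAt: "Work", States: { Work: { Type: "Pass", End: true } } },
  "Cleanup",
  { Type: "Pass" },
);
console.log(JSON.stringify(definition, null, 2));
```

Note that in this simplified form the caught error is swallowed once the finally state completes; a fuller implementation would need to decide whether to re-raise it.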
Try / Catch / Finally pattern
This is a combination of the two previous patterns.
The TryTask construct lets you express rather complex error-handling logic in a very compact form.
// ...
new sfn.StateMachine(this, 'TryCatchFinallyStepMachine', ...
Retry with backoff and jitter
Out of the box, the Step Functions retry implementation lets you configure a backoff factor, but there is no built-in way to introduce jitter. As covered in "Exponential Backoff And Jitter" and "Wait and Retry with Jittered Back-off", this retry technique can be very helpful in high-load scenarios.
The RetryWithJitterTask construct provides a custom implementation of retry with backoff and jitter that you can use directly in your state machines.
// ...
new sfn.StateMachine(this, 'RetryWithJitterStepMachine', ...
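As an illustration of what jitter adds to exponential backoff, the sketch below computes retry delays using the "full jitter" variant, where each delay is drawn uniformly between zero and the capped exponential value. The function name, parameters, and the choice of the full-jitter variant are assumptions made for this example; the source does not state which formula RetryWithJitterTask uses.

```typescript
// Returns the delay in seconds before retry number `attempt` (0-based).
// "Full jitter" (assumed variant): uniform in [0, min(cap, base * rate^attempt)).
// `random` is injectable so the function can be tested deterministically.
function jitteredDelay(
  attempt: number,
  baseSeconds: number,
  backoffRate: number,
  capSeconds: number,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(
    capSeconds,
    baseSeconds * Math.pow(backoffRate, attempt),
  );
  return random() * ceiling;
}

// Print one sampled delay per attempt: base 1s, rate 2, capped at 20s.
for (let attempt = 0; attempt < 5; attempt++) {
  console.log(attempt, jitteredDelay(attempt, 1, 2, 20).toFixed(2));
}
```

Spreading the delays randomly keeps many clients that failed at the same moment from retrying in lockstep, which is the main benefit in high-load scenarios.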
Resilience lambda errors handling
ResilientLambdaTask is a drop-in replacement construct for LambdaInvoke that adds retry for the most common transient errors.
That would result in the following state definition:
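To illustrate the kind of Retry block a Lambda task carries when transient errors are handled, here is a minimal sketch in plain TypeScript. The error list and the interval, attempt, and backoff values follow the general AWS recommendation for Lambda service exceptions and are assumptions; they are not taken from the output of ResilientLambdaTask. The function ARN is a placeholder.

```typescript
interface RetryRule {
  ErrorEquals: string[];
  IntervalSeconds: number;
  MaxAttempts: number;
  BackoffRate: number;
}

// Assumed list of transient AWS Lambda service errors.
const TRANSIENT_LAMBDA_ERRORS: string[] = [
  "Lambda.ServiceException",
  "Lambda.AWSLambdaException",
  "Lambda.SdkClientException",
  "Lambda.TooManyRequestsException",
];

// Hypothetical helper: builds an ASL Task state that invokes a Lambda
// function and retries the transient errors listed above.
function lambdaTaskState(functionArn: string, next: string) {
  const retry: RetryRule = {
    ErrorEquals: TRANSIENT_LAMBDA_ERRORS,
    IntervalSeconds: 2, // assumed value
    MaxAttempts: 6,     // assumed value
    BackoffRate: 2,     // assumed value
  };
  return {
    Type: "Task",
    Resource: "arn:aws:states:::lambda:invoke",
    Parameters: { FunctionName: functionArn, "Payload.$": "$" },
    Retry: [retry],
    Next: next,
  };
}

const exampleState = lambdaTaskState(
  "arn:aws:lambda:us-east-1:123456789012:function:example", // placeholder ARN
  "Done",
);
console.log(JSON.stringify(exampleState, null, 2));
```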
Validation of proper resilience lambda errors handling
It is often a challenge to enforce consistent transient error handling across all state machines of a large application. To help with that, cdk-stepfunctions-patterns provides a CDK aspect to verify that all Lambda invocations correctly handle transient errors from the AWS Lambda service.
Apply the ResilienceLambdaChecker aspect as shown below.
// ...
// validate compliance rules
app.node.applyAspect(new ResilienceLambdaChecker());
If some states in your application do not retry transient errors or miss some of the recommended error codes, a warning is emitted during the CDK synthesis stage:
PS C:\Dev\GitHub\cdk-stepfunctions-patterns> cdk synth --strict
[Warning at /StepFunctionsPatterns/A] No retry for AWS Lambda transient errors defined - consider using ResilientLambdaTask construct.
[Warning at /StepFunctionsPatterns/B] Missing retry for transient errors: Lambda.AWSLambdaException,Lambda.SdkClientException.
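The check behind such warnings can be pictured with a small, self-contained sketch. It is not the ResilienceLambdaChecker implementation: the function name, the list of required error codes, and the treatment of `States.ALL` as covering everything are assumptions made for illustration, and it operates on plain objects rather than CDK constructs.

```typescript
interface RetrySpec {
  ErrorEquals: string[];
}

interface TaskLike {
  Type: string;
  Retry?: RetrySpec[];
}

// Assumed list of error codes a Lambda task is expected to retry.
const REQUIRED_ERRORS: string[] = [
  "Lambda.ServiceException",
  "Lambda.AWSLambdaException",
  "Lambda.SdkClientException",
  "Lambda.TooManyRequestsException",
];

// Returns a warning message for the state at `path`, or undefined
// when the state is not a Task or retries every required error.
function checkLambdaRetries(path: string, state: TaskLike): string | undefined {
  if (state.Type !== "Task") return undefined;

  // Collect every error name mentioned in any retry rule.
  const handled: string[] = [];
  for (const rule of state.Retry || []) {
    for (const name of rule.ErrorEquals) handled.push(name);
  }
  // Assumption of this sketch: a States.ALL rule covers all required errors.
  if (handled.indexOf("States.ALL") >= 0) return undefined;

  const missing = REQUIRED_ERRORS.filter((e) => handled.indexOf(e) < 0);
  if (missing.length === 0) return undefined;
  if (missing.length === REQUIRED_ERRORS.length) {
    return `[Warning at ${path}] No retry for AWS Lambda transient errors defined - consider using ResilientLambdaTask construct.`;
  }
  return `[Warning at ${path}] Missing retry for transient errors: ${missing.join(",")}.`;
}

const warningA = checkLambdaRetries("/StepFunctionsPatterns/A", { Type: "Task" });
const warningB = checkLambdaRetries("/StepFunctionsPatterns/B", {
  Type: "Task",
  Retry: [
    { ErrorEquals: ["Lambda.ServiceException", "Lambda.TooManyRequestsException"] },
  ],
});
console.log(warningA);
console.log(warningB);
```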