Retry Failed Activities
Activities sometimes fail for ephemeral reasons, such as a temporary loss of connectivity. At another time, the activity might succeed, so the appropriate way to handle activity failure is often to retry the activity, perhaps multiple times.
There are a variety of strategies for retrying activities; the best one depends on the details of your workflow. The strategies fall into three basic categories:
-
The retry-until-success strategy simply keeps retrying the activity until it completes.
-
The exponential retry strategy increases the time interval between retry attempts exponentially until the activity completes or the process reaches a specified stopping point, such as a maximum number of attempts.
-
The custom retry strategy decides whether or how to retry the activity after each failed attempt.
The following sections describe how to implement these strategies. The example workflow workers all use a
single activity, unreliableActivity, which randomly does one of the following:
-
Completes immediately
-
Fails intentionally by exceeding the timeout value
-
Fails intentionally by throwing IllegalStateException
Retry-Until-Success Strategy
The simplest retry strategy is to keep retrying the activity each time it fails until it eventually succeeds. The basic pattern is:
1. Implement a nested TryCatch or TryCatchFinally class in your workflow's entry point method.
2. Execute the activity in doTry.
3. If the activity fails, the framework calls doCatch, which triggers another run of the entry point method.
4. Repeat Steps 2 - 3 until the activity completes successfully.
The following workflow implements the retry-until-success strategy. The workflow interface is implemented in
RetryActivityRecipeWorkflow and has one method, runUnreliableActivityTillSuccess,
which is the workflow's entry point. The workflow worker is implemented in
RetryActivityRecipeWorkflowImpl, as follows:
    public class RetryActivityRecipeWorkflowImpl implements RetryActivityRecipeWorkflow {

        @Override
        public void runUnreliableActivityTillSuccess() {
            final Settable<Boolean> retryActivity = new Settable<Boolean>();

            new TryCatch() {
                @Override
                protected void doTry() throws Throwable {
                    Promise<Void> activityRanSuccessfully = client.unreliableActivity();
                    setRetryActivityToFalse(activityRanSuccessfully, retryActivity);
                }

                @Override
                protected void doCatch(Throwable e) throws Throwable {
                    retryActivity.set(true);
                }
            };
            restartRunUnreliableActivityTillSuccess(retryActivity);
        }

        @Asynchronous
        private void setRetryActivityToFalse(
                Promise<Void> activityRanSuccessfully,
                @NoWait Settable<Boolean> retryActivity) {
            retryActivity.set(false);
        }

        @Asynchronous
        private void restartRunUnreliableActivityTillSuccess(
                Settable<Boolean> retryActivity) {
            if (retryActivity.get()) {
                runUnreliableActivityTillSuccess();
            }
        }
    }
The workflow works as follows:
1. runUnreliableActivityTillSuccess creates a Settable<Boolean> object named retryActivity, which is used to indicate whether the activity failed and should be retried. Settable<T> is derived from Promise<T> and works much the same way, but you set a Settable<T> object's value manually.
2. runUnreliableActivityTillSuccess implements an anonymous nested TryCatch class to handle any exceptions that are thrown by the unreliableActivity activity. For more discussion of how to handle exceptions thrown by asynchronous code, see Error Handling.
3. doTry executes the unreliableActivity activity, which returns a Promise<Void> object named activityRanSuccessfully.
4. doTry calls the asynchronous setRetryActivityToFalse method, which has two parameters:
   - activityRanSuccessfully takes the Promise<Void> object returned by the unreliableActivity activity.
   - retryActivity takes the retryActivity object.
   If unreliableActivity completes, activityRanSuccessfully becomes ready and setRetryActivityToFalse sets retryActivity to false. Otherwise, activityRanSuccessfully never becomes ready and setRetryActivityToFalse doesn't execute.
5. If unreliableActivity throws an exception, the framework calls doCatch and passes it the exception object. doCatch sets retryActivity to true.
6. runUnreliableActivityTillSuccess calls the asynchronous restartRunUnreliableActivityTillSuccess method and passes it the retryActivity object. Because retryActivity is a Promise<T> type, restartRunUnreliableActivityTillSuccess defers execution until retryActivity is ready, which occurs after TryCatch completes.
7. When retryActivity is ready, restartRunUnreliableActivityTillSuccess extracts the value.
   - If the value is false, the attempt succeeded. restartRunUnreliableActivityTillSuccess does nothing and the retry sequence terminates.
   - If the value is true, the attempt failed. restartRunUnreliableActivityTillSuccess calls runUnreliableActivityTillSuccess to execute the activity again.
8. Steps 1 - 7 repeat until unreliableActivity completes.
Note
doCatch doesn't handle the exception; it simply sets the retryActivity object
to true to indicate that the activity failed. The retry is handled by the asynchronous
restartRunUnreliableActivityTillSuccess method, which defers execution until
TryCatch completes. The reason for this approach is that, if you retry an activity in
doCatch, you can't cancel it. Retrying the activity in
restartRunUnreliableActivityTillSuccess allows you to execute cancellable activities.
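The flag-then-restart control flow described above can be modeled in plain Java. The following is only an analogy, not Flow Framework code: it assumes java.util.concurrent.CompletableFuture as a stand-in for Settable<Boolean>, runs the activity synchronously, and uses hypothetical names (RetryUntilSuccessSketch, runTillSuccess). The catch block only records the failure; the restart happens in a separate step that waits for the flag, which mirrors the split between doCatch and restartRunUnreliableActivityTillSuccess.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java analogy of the retry-until-success pattern (hypothetical names).
final class RetryUntilSuccessSketch {

    /** Runs the activity and restarts it, once the flag is set, for as long as it fails. */
    static CompletableFuture<Void> runTillSuccess(Runnable activity, AtomicInteger attempts) {
        // Stand-in for Settable<Boolean>: true means "failed, retry".
        CompletableFuture<Boolean> retryActivity = new CompletableFuture<>();
        attempts.incrementAndGet();
        try {
            activity.run();
            retryActivity.complete(false);   // analogous to setRetryActivityToFalse
        } catch (RuntimeException e) {
            retryActivity.complete(true);    // analogous to doCatch: record the failure only
        }
        // Analogous to restartRunUnreliableActivityTillSuccess: runs after the flag is ready.
        return retryActivity.thenCompose(retry -> {
            if (retry) {
                return runTillSuccess(activity, attempts);
            }
            return CompletableFuture.<Void>completedFuture(null);
        });
    }

    public static void main(String[] args) {
        AtomicInteger attempts = new AtomicInteger();
        // Hypothetical activity that fails on its first two attempts.
        Runnable flaky = () -> {
            if (attempts.get() < 3) {
                throw new IllegalStateException("simulated failure");
            }
        };
        runTillSuccess(flaky, attempts).join();
        System.out.println("attempts: " + attempts.get());
    }
}
```

Because this sketch recurses on every failure, it suits only a small number of retries; it is meant to show the ordering of the steps, not to replace the framework's asynchronous scheduling.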
Exponential Retry Strategy
With the exponential retry strategy, the framework executes a failed activity again after a specified period of time, N seconds. If that attempt fails, the framework executes the activity again after 2N seconds, then 4N seconds, and so on. Because the wait time can get quite large, you typically stop the retry attempts at some point rather than continuing indefinitely.
The framework provides three ways to implement an exponential retry strategy:
-
The @ExponentialRetry annotation is the simplest approach, but you must set the retry configuration options at compile time.
-
The RetryDecorator class allows you to set retry configuration at run time and change it as needed.
-
The AsyncRetryingExecutor class allows you to set retry configuration at run time and change it as needed. In addition, the framework calls a user-implemented AsyncRunnable.run method to run each retry attempt.
All approaches support the following configuration options, where time values are in seconds:
-
The initial retry wait time.
-
The back-off coefficient, which is used to compute the retry intervals, as follows:
retryInterval = initialRetryIntervalSeconds * Math.pow(backoffCoefficient, numberOfTries - 2)
The default value is 2.0.
-
The maximum number of retry attempts. The default value is unlimited.
-
The maximum retry interval. The default value is unlimited.
-
The expiration time. Retry attempts stop when the total duration of the process exceeds this value. The default value is unlimited.
-
The exceptions that will trigger the retry process. By default, every exception triggers the retry process.
-
The exceptions that will not trigger a retry attempt. By default, no exceptions are excluded.
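As a worked example of the interval formula above, the following plain-Java sketch computes the wait before each attempt. It assumes that numberOfTries counts the initial attempt, so the first retry has numberOfTries = 2 and waits exactly the initial interval. The class and method names (BackoffSketch, retryIntervalSeconds) are hypothetical, and the convention that a non-positive cap means "unlimited" is an assumption of this sketch, not documented framework behavior.

```java
// Worked example of the back-off formula (hypothetical helper, not framework code).
final class BackoffSketch {

    /**
     * Wait time in seconds before the attempt numbered numberOfTries (2 = first retry).
     * A maximumRetryIntervalSeconds of zero or less is treated as "no cap" (sketch convention).
     */
    static long retryIntervalSeconds(long initialRetryIntervalSeconds,
                                     double backoffCoefficient,
                                     int numberOfTries,
                                     long maximumRetryIntervalSeconds) {
        if (numberOfTries < 2) {
            throw new IllegalArgumentException("numberOfTries must be at least 2");
        }
        long interval = (long) (initialRetryIntervalSeconds
                * Math.pow(backoffCoefficient, numberOfTries - 2));
        if (maximumRetryIntervalSeconds > 0) {
            return Math.min(interval, maximumRetryIntervalSeconds);
        }
        return interval;
    }

    public static void main(String[] args) {
        // Initial interval 5 s, coefficient 2.0, no cap: prints 5, 10, 20, 40.
        for (int tries = 2; tries <= 5; tries++) {
            System.out.println(retryIntervalSeconds(5, 2.0, tries, 0));
        }
    }
}
```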
The following sections describe the various ways that you can implement an exponential retry strategy.
Exponential Retry with @ExponentialRetry
The simplest way to implement an exponential retry strategy for an activity is to apply an
@ExponentialRetry annotation to the activity in the interface definition. If the activity fails,
the framework handles the retry process automatically, based on the specified option values. The basic pattern
is:
1. Apply @ExponentialRetry to the appropriate activities and specify the retry configuration.
2. If an annotated activity fails, the framework automatically retries the activity according to the configuration specified by the annotation's arguments.
The ExponentialRetryAnnotationWorkflow workflow worker implements the exponential retry
strategy by using an @ExponentialRetry annotation. It uses an unreliableActivity
activity whose interface definition is implemented in ExponentialRetryAnnotationActivities, as
follows:
    @Activities(version = "1.0")
    @ActivityRegistrationOptions(
        defaultTaskScheduleToStartTimeoutSeconds = 30,
        defaultTaskStartToCloseTimeoutSeconds = 30)
    public interface ExponentialRetryAnnotationActivities {
        @ExponentialRetry(
            initialRetryIntervalSeconds = 5,
            maximumAttempts = 5,
            exceptionsToRetry = IllegalStateException.class)
        public void unreliableActivity();
    }
The @ExponentialRetry options specify the following strategy:
-
Retry only if the activity throws IllegalStateException.
-
Use an initial wait time of 5 seconds.
-
Make no more than 5 retry attempts.
The workflow interface is implemented in RetryWorkflow and has one method,
process, which is the workflow's entry point. The workflow worker is implemented in
ExponentialRetryAnnotationWorkflowImpl, as follows:
    public class ExponentialRetryAnnotationWorkflowImpl implements RetryWorkflow {
        public void process() {
            handleUnreliableActivity();
        }

        public void handleUnreliableActivity() {
            client.unreliableActivity();
        }
    }
The workflow works as follows:
1. process runs the synchronous handleUnreliableActivity method.
2. handleUnreliableActivity executes the unreliableActivity activity.
If the activity fails by throwing IllegalStateException, the framework automatically runs the
retry strategy specified in ExponentialRetryAnnotationActivities.
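To make the effect of the exception options concrete, here is a plain-Java sketch of how a retry filter of this kind could decide whether a failure qualifies. It is not the framework's implementation: the names RetryFilterSketch, shouldRetry, and exceptionsToExclude are hypothetical, and both the subclass matching and the rule that exclusions take precedence are assumptions made for illustration.

```java
import java.util.List;

// Hypothetical retry filter; the matching rules are assumptions of this sketch.
final class RetryFilterSketch {

    /** Returns true if the failure matches an included type and no excluded type. */
    static boolean shouldRetry(Throwable failure,
                               List<Class<? extends Throwable>> exceptionsToRetry,
                               List<Class<? extends Throwable>> exceptionsToExclude) {
        for (Class<? extends Throwable> excluded : exceptionsToExclude) {
            if (excluded.isInstance(failure)) {
                return false;   // assumed: exclusions win over inclusions
            }
        }
        for (Class<? extends Throwable> included : exceptionsToRetry) {
            if (included.isInstance(failure)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Class<? extends Throwable>> retryOn =
                List.<Class<? extends Throwable>>of(IllegalStateException.class);
        List<Class<? extends Throwable>> exclude = List.<Class<? extends Throwable>>of();
        System.out.println(shouldRetry(new IllegalStateException("x"), retryOn, exclude));
        System.out.println(shouldRetry(new IllegalArgumentException("x"), retryOn, exclude));
    }
}
```

With the configuration shown earlier (exceptionsToRetry = IllegalStateException.class), a filter like this would retry the intentional IllegalStateException failure but not other exception types.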
Exponential Retry with the RetryDecorator Class
@ExponentialRetry is simple to use. However, the configuration is static and set at compile
time, so the framework uses the same retry strategy every time the activity fails. You can implement a more
flexible exponential retry strategy by using the RetryDecorator class, which allows you to
specify the configuration at run time and change it as needed. The basic pattern is:
1. Create and configure an ExponentialRetryPolicy object that specifies the retry configuration.
2. Create a RetryDecorator object and pass the ExponentialRetryPolicy object from Step 1 to the constructor.
3. Apply the decorator object to the activity by passing the activity client's class and the client object to the RetryDecorator object's decorate method.
4. Execute the activity.
If the activity fails, the framework retries the activity according to the
ExponentialRetryPolicy object's configuration. You can change the retry configuration as needed
by modifying this object.
Note
The @ExponentialRetry annotation and the RetryDecorator class are mutually
exclusive. You can't use RetryDecorator to dynamically override a retry policy specified by an
@ExponentialRetry annotation.
The following workflow implementation shows how to use the RetryDecorator class to implement
an exponential retry strategy. It uses an unreliableActivity activity that doesn't have an
@ExponentialRetry annotation. The workflow interface is implemented in RetryWorkflow
and has one method, process, which is the workflow's entry point. The workflow worker is
implemented in DecoratorRetryWorkflowImpl, as follows:
    public class DecoratorRetryWorkflowImpl implements RetryWorkflow {
        ...
        public void process() {
            long initialRetryIntervalSeconds = 5;
            int maximumAttempts = 5;
            ExponentialRetryPolicy retryPolicy = new ExponentialRetryPolicy(
                initialRetryIntervalSeconds).withMaximumAttempts(maximumAttempts);

            Decorator retryDecorator = new RetryDecorator(retryPolicy);
            client = retryDecorator.decorate(RetryActivitiesClient.class, client);
            handleUnreliableActivity();
        }

        public void handleUnreliableActivity() {
            client.unreliableActivity();
        }
    }
The workflow works as follows:
1. process creates and configures an ExponentialRetryPolicy object by:
   - Passing the initial retry interval to the constructor.
   - Calling the object's withMaximumAttempts method to set the maximum number of attempts to 5. ExponentialRetryPolicy exposes other with methods that you can use to specify other configuration options.
2. process creates a RetryDecorator object named retryDecorator and passes the ExponentialRetryPolicy object from Step 1 to the constructor.
3. process applies the decorator to the activity by calling the retryDecorator.decorate method and passing it the activity client's class and the client object.
4. handleUnreliableActivity executes the activity.
If the activity fails, the framework retries it according to the configuration specified in Step 1.
Note
Several of the ExponentialRetryPolicy class's with methods have a
corresponding set method that you can call to modify the corresponding configuration option
at any time: setBackoffCoefficient, setMaximumAttempts,
setMaximumRetryIntervalSeconds, and setMaximumRetryExpirationIntervalSeconds.
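The decorator idea, wrapping a client so that every call through it is retried, can be illustrated in plain Java with java.lang.reflect.Proxy. This is a simplified analogy and not how RetryDecorator is implemented: the names RetryProxySketch and Activities are hypothetical, the calls are synchronous, and failed calls are retried immediately with no back-off wait.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

// Plain-Java analogy of decorating a client with retry behavior (hypothetical names).
final class RetryProxySketch {

    /** Hypothetical stand-in for an activities client interface. */
    interface Activities {
        void unreliableActivity();
    }

    /** Returns a proxy that retries each failed call, up to maximumAttempts calls in total. */
    @SuppressWarnings("unchecked")
    static <T> T decorate(Class<T> iface, T target, int maximumAttempts) {
        if (maximumAttempts < 1) {
            throw new IllegalArgumentException("maximumAttempts must be at least 1");
        }
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] {iface},
                (proxy, method, args) -> {
                    Throwable last = null;
                    for (int attempt = 1; attempt <= maximumAttempts; attempt++) {
                        try {
                            return method.invoke(target, args);
                        } catch (InvocationTargetException e) {
                            last = e.getCause();   // the exception thrown by the target
                        }
                    }
                    throw last;                    // all attempts failed
                });
    }

    public static void main(String[] args) {
        int[] calls = {0};
        Activities flaky = () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated failure");
            }
        };
        Activities client = decorate(Activities.class, flaky, 5);
        client.unreliableActivity();
        System.out.println("calls: " + calls[0]);
    }
}
```

As in the workflow above, the calling code keeps using the same client interface; only the object behind it changes.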
Exponential Retry with the AsyncRetryingExecutor Class
The RetryDecorator class provides more flexibility in configuring the retry process than
@ExponentialRetry, but the framework still runs the retry attempts automatically, based on the
ExponentialRetryPolicy object's current configuration. A more flexible approach is to use the
AsyncRetryingExecutor class. In addition to allowing you to configure the retry process at run
time, the framework calls a user-implemented AsyncRunnable.run method to run each retry attempt
instead of simply executing the activity.
The basic pattern is:
1. Create and configure an ExponentialRetryPolicy object to specify the retry configuration.
2. Create an AsyncRetryingExecutor object, and pass it the ExponentialRetryPolicy object and an instance of the workflow clock.
3. Implement an anonymous nested TryCatch or TryCatchFinally class.
4. Implement an anonymous AsyncRunnable class and override the run method to implement custom code for running the activity.
5. Override doTry to call the AsyncRetryingExecutor object's execute method and pass it the AsyncRunnable class from Step 4. The AsyncRetryingExecutor object calls AsyncRunnable.run to run the activity.
6. If the activity fails, the AsyncRetryingExecutor object calls the AsyncRunnable.run method again, according to the retry policy specified in Step 1.
The following workflow shows how to use the AsyncRetryingExecutor class to implement an
exponential retry strategy. It uses the same unreliableActivity activity as the
DecoratorRetryWorkflow workflow discussed earlier. The workflow interface is implemented in
RetryWorkflow and has one method, process, which is the workflow's entry point. The
workflow worker is implemented in AsyncExecutorRetryWorkflowImpl, as follows:
    public class AsyncExecutorRetryWorkflowImpl implements RetryWorkflow {
        private final RetryActivitiesClient client = new RetryActivitiesClientImpl();
        private final DecisionContextProvider contextProvider = new DecisionContextProviderImpl();
        private final WorkflowClock clock = contextProvider.getDecisionContext().getWorkflowClock();

        public void process() {
            long initialRetryIntervalSeconds = 5;
            int maximumAttempts = 5;
            handleUnreliableActivity(initialRetryIntervalSeconds, maximumAttempts);
        }

        public void handleUnreliableActivity(long initialRetryIntervalSeconds, int maximumAttempts) {
            ExponentialRetryPolicy retryPolicy = new ExponentialRetryPolicy(
                initialRetryIntervalSeconds).withMaximumAttempts(maximumAttempts);
            final AsyncExecutor executor = new AsyncRetryingExecutor(retryPolicy, clock);

            new TryCatch() {
                @Override
                protected void doTry() throws Throwable {
                    executor.execute(new AsyncRunnable() {
                        @Override
                        public void run() throws Throwable {
                            client.unreliableActivity();
                        }
                    });
                }

                @Override
                protected void doCatch(Throwable e) throws Throwable {
                }
            };
        }
    }
The workflow works as follows:
1. process calls the handleUnreliableActivity method and passes it the configuration settings.
2. handleUnreliableActivity uses the configuration settings from Step 1 to create an ExponentialRetryPolicy object, retryPolicy.
3. handleUnreliableActivity creates an AsyncRetryingExecutor object, executor, and passes the ExponentialRetryPolicy object from Step 2 and an instance of the workflow clock to the constructor.
4. handleUnreliableActivity implements an anonymous nested TryCatch class and overrides the doTry and doCatch methods to run the retry attempts and handle any exceptions.
5. doTry creates an anonymous AsyncRunnable class and overrides the run method to implement custom code to execute unreliableActivity. For simplicity, run just executes the activity, but you can implement more sophisticated approaches as appropriate.
6. doTry calls executor.execute and passes it the AsyncRunnable object. execute calls the AsyncRunnable object's run method to run the activity.
7. If the activity fails, executor calls run again, according to the retryPolicy object configuration.
For more discussion of how to use the TryCatch class to handle errors, see AWS Flow Framework for Java Exceptions.
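The division of labor in this approach, where an executor owns the retry schedule and user code owns what each attempt does, can be sketched in plain Java. The sketch below is an analogy under stated assumptions, not the AsyncRetryingExecutor implementation: RetryingExecutorSketch is a hypothetical name, a java.util.function.LongConsumer stands in for the workflow clock's timer, and the task runs synchronously.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;
import java.util.function.Supplier;

// Plain-Java analogy of a retrying executor with exponential back-off (hypothetical names).
final class RetryingExecutorSketch {
    private final long initialRetryIntervalSeconds;
    private final double backoffCoefficient;
    private final int maximumAttempts;
    private final LongConsumer timer;   // receives each wait time, in seconds

    RetryingExecutorSketch(long initialRetryIntervalSeconds, double backoffCoefficient,
                           int maximumAttempts, LongConsumer timer) {
        if (maximumAttempts < 1) {
            throw new IllegalArgumentException("maximumAttempts must be at least 1");
        }
        this.initialRetryIntervalSeconds = initialRetryIntervalSeconds;
        this.backoffCoefficient = backoffCoefficient;
        this.maximumAttempts = maximumAttempts;
        this.timer = timer;
    }

    /** Runs the task, waiting with exponential back-off between failed attempts. */
    <T> T execute(Supplier<T> task) {
        for (int attempt = 1; ; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                if (attempt >= maximumAttempts) {
                    throw e;   // out of attempts: surface the last failure
                }
                // The next try is number attempt + 1, so the exponent is (attempt + 1) - 2.
                long waitSeconds = (long) (initialRetryIntervalSeconds
                        * Math.pow(backoffCoefficient, attempt - 1));
                timer.accept(waitSeconds);
            }
        }
    }

    public static void main(String[] args) {
        List<Long> waits = new ArrayList<>();
        // Record the waits instead of sleeping, so the example finishes immediately.
        RetryingExecutorSketch executor =
                new RetryingExecutorSketch(5, 2.0, 5, seconds -> waits.add(seconds));
        int[] calls = {0};
        String result = executor.execute(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated failure");
            }
            return "done";
        });
        System.out.println(result + " after waits " + waits);
    }
}
```

Passing the timer in, rather than sleeping inside execute, loosely mirrors why the framework's executor is given the workflow clock: the schedule stays testable and under the caller's control.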
Custom Retry Strategy
The most flexible approach to retrying failed activities is a custom strategy, which recursively calls an asynchronous method that runs the retry attempt, much like the retry-until-success strategy. However, instead of simply running the activity again, you implement custom logic that decides whether and how to run each successive retry attempt. The basic pattern is:
1. Create a Settable<T> status object, which is used to indicate whether the activity failed.
2. Implement a nested TryCatch or TryCatchFinally class.
3. doTry executes the activity.
4. If the activity fails, doCatch sets the status object to indicate that the activity failed.
5. Call an asynchronous failure handling method and pass it the status object. The method defers execution until TryCatch or TryCatchFinally completes.
6. The failure handling method decides whether to retry the activity, and if so, when.
The following workflow shows how to implement a custom retry strategy. It uses the same
unreliableActivity activity as the DecoratorRetryWorkflow and
AsyncExecutorRetryWorkflow workflows. The workflow interface is implemented in
RetryWorkflow and has one method, process, which is the workflow's entry point. The
workflow worker is implemented in CustomLogicRetryWorkflowImpl, as follows:
    public class CustomLogicRetryWorkflowImpl implements RetryWorkflow {
        ...
        public void process() {
            callActivityWithRetry();
        }

        @Asynchronous
        public void callActivityWithRetry() {
            final Settable<Throwable> failure = new Settable<Throwable>();
            new TryCatchFinally() {
                protected void doTry() throws Throwable {
                    client.unreliableActivity();
                }

                protected void doCatch(Throwable e) {
                    failure.set(e);
                }

                protected void doFinally() throws Throwable {
                    if (!failure.isReady()) {
                        failure.set(null);
                    }
                }
            };
            retryOnFailure(failure);
        }

        @Asynchronous
        private void retryOnFailure(Promise<Throwable> failureP) {
            Throwable failure = failureP.get();
            if (failure != null && shouldRetry(failure)) {
                callActivityWithRetry();
            }
        }

        protected Boolean shouldRetry(Throwable e) {
            // Custom logic to decide whether to retry the activity
            return true;
        }
    }
The workflow works as follows:
1. process calls the asynchronous callActivityWithRetry method.
2. callActivityWithRetry creates a Settable<Throwable> object named failure, which is used to indicate whether the activity has failed. Settable<T> is derived from Promise<T> and works much the same way, but you set a Settable<T> object's value manually.
3. callActivityWithRetry implements an anonymous nested TryCatchFinally class to handle any exceptions that are thrown by unreliableActivity. For more discussion of how to handle exceptions thrown by asynchronous code, see AWS Flow Framework for Java Exceptions.
4. doTry executes unreliableActivity.
5. If unreliableActivity throws an exception, the framework calls doCatch and passes it the exception object. doCatch sets failure to the exception object, which indicates that the activity failed and puts the object in a ready state.
6. doFinally checks whether failure is ready, which will be true only if failure was set by doCatch.
   - If failure is ready, doFinally does nothing.
   - If failure isn't ready, the activity completed and doFinally sets failure to null.
7. callActivityWithRetry calls the asynchronous retryOnFailure method and passes it failure. Because failure is a Settable<T> type, retryOnFailure defers execution until failure is ready, which occurs after TryCatchFinally completes.
8. retryOnFailure gets the value from failure.
   - If failure is set to null, the attempt was successful. retryOnFailure does nothing, which terminates the retry process.
   - If failure is set to an exception object and shouldRetry returns true, retryOnFailure calls callActivityWithRetry to retry the activity. shouldRetry implements custom logic to decide whether to retry a failed activity. For simplicity, shouldRetry always returns true and retryOnFailure executes the activity immediately, but you can implement more sophisticated logic as needed.
9. Steps 2–8 repeat until unreliableActivity completes or shouldRetry decides to stop the process.
Note
doCatch doesn't handle the retry process; it simply sets failure to indicate that the
activity failed. The retry process is handled by the asynchronous retryOnFailure method, which
defers execution until TryCatchFinally completes. The reason for this approach is that, if you retry an
activity in doCatch, you can't cancel it. Retrying the activity in retryOnFailure
allows you to execute cancellable activities.
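As an illustration of what "more sophisticated logic" in shouldRetry might look like, the following plain-Java sketch retries only while a caller-supplied predicate approves, based on the failure and the number of attempts so far. It is a synchronous analogy rather than Flow Framework code, and the names CustomRetrySketch and callWithRetry, along with the particular policy used in main, are hypothetical.

```java
import java.util.function.BiPredicate;

// Plain-Java analogy of a custom retry decision (hypothetical names and policy).
final class CustomRetrySketch {

    /**
     * Runs the activity until it succeeds or shouldRetry rejects a failure.
     * Returns the number of attempts made; rethrows the failure if retrying stops.
     */
    static int callWithRetry(Runnable activity,
                             BiPredicate<Throwable, Integer> shouldRetry) {
        int attempt = 0;
        while (true) {
            attempt++;
            RuntimeException failure = null;   // null means the attempt succeeded
            try {
                activity.run();
            } catch (RuntimeException e) {
                failure = e;                   // like doCatch: record, don't retry here
            }
            if (failure == null) {
                return attempt;
            }
            // Like retryOnFailure: decide after the attempt has fully finished.
            if (!shouldRetry.test(failure, attempt)) {
                throw failure;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        Runnable flaky = () -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated failure");
            }
        };
        // Example policy: retry IllegalStateException only, and give up after 5 attempts.
        int attempts = callWithRetry(flaky,
                (failure, attempt) -> failure instanceof IllegalStateException && attempt < 5);
        System.out.println("succeeded after " + attempts + " attempts");
    }
}
```

A policy like this could also inspect the failure's message or add a delay before returning true; those choices depend on the workflow.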