Troubleshooting
Known issue resolution provides instructions to mitigate known errors. If these instructions don’t address your issue, see the Contact AWS Support section for instructions on opening an AWS Support case for this solution.
Known issue resolution
During the deployment of Workload Discovery on AWS and in the post-deployment phase, several common configuration errors can occur:
Note
To make troubleshooting easier, we recommend disabling the rollback on failure feature in the AWS CloudFormation template. You can also find additional troubleshooting help in the Workload Discovery on AWS post-deployment configuration documentation.
Config Delivery Channel Error
Issue: The following error occurs when deploying the main AWS CloudFormation template:
Failed to put delivery channel '<stack-name>-DiscoveryImport-<ID-string>-DeliveryChannel-<ID-string>' because the maximum number of delivery channels: 1 is reached. (Service: AmazonConfig; Status Code: 400; Error Code: MaxNumberOfDeliveryChannelsExceededException; Request ID: 4edc54bc-8c85-4925-b99d-7ef9c73215b3; Proxy: null)
Reason: The solution is being deployed to a region that already has AWS Config enabled.
Resolution: Follow the instructions in the pre-requisites section and deploy the solution with the CloudFormation parameter AlreadyHaveConfigSetup set to Yes.
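Before deploying, it can help to check whether the target region already has a delivery channel. The following is a minimal sketch, not part of the solution: it assumes a boto3 AWS Config client is passed in (boto3 is not mentioned in this guide), and the function names are hypothetical.

```python
from typing import Any, List


def existing_delivery_channels(config_client: Any) -> List[str]:
    """Return the names of AWS Config delivery channels in the client's region."""
    # describe_delivery_channels is the AWS Config API call; the client object
    # itself is an assumption (for example, boto3.client("config")).
    response = config_client.describe_delivery_channels()
    return [channel["name"] for channel in response.get("DeliveryChannels", [])]


def suggested_already_have_config_setup(config_client: Any) -> str:
    """Suggest a value for the AlreadyHaveConfigSetup parameter."""
    # A region allows one delivery channel, so an existing one means
    # AWS Config is already set up there.
    return "Yes" if existing_delivery_channels(config_client) else "No"
```

With boto3 installed, this could be called as suggested_already_have_config_setup(boto3.client("config", region_name="<region>")). The pre-requisites section still applies when the result is Yes.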
Search Resolver Stack Deployment Times Out When Deploying To Existing VPC
Issue: Nested stack that provisions a custom resource to create an index in the OpenSearch cluster times out with the following error:
Embedded stack arn:aws:cloudformation:<region>::stack/<stack-name>-SearchResolversStack-<ID-string>/<guid> was not successfully created: Stack creation time exceeded the specified timeout
Reason: The private subnets provided as CloudFormation parameters do not have the ability to route to S3 (custom resources must write the result of their execution to an S3 bucket using a presigned URL). There are generally two reasons for this:
- The private subnets do not have NAT gateways associated with them, so there is no access to the internet.
- The private subnet is using VPC endpoints instead of a NAT gateway and the S3 gateway endpoint is not configured correctly.
Resolution:
- Provision NAT gateways in the VPC to allow tasks running in private subnets to access the internet, either using CloudFormation or the AWS CLI as per the documentation.
- Ensure that the route tables for the subnets have been updated for the S3 VPC endpoint as per the documentation.
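To see which of the two situations applies, the routes of each private subnet's route table can be inspected. The sketch below is an illustration only: it assumes the input is one entry from the RouteTables list returned by the EC2 DescribeRouteTables API, and the function name is hypothetical.

```python
from typing import Any, Dict, List


def candidate_s3_routes(route_table: Dict[str, Any]) -> List[str]:
    """List active routes in one route table that could carry traffic to S3."""
    found = []
    for route in route_table.get("Routes", []):
        if route.get("State") != "active":
            continue  # blackhole routes cannot carry traffic
        # A default route through a NAT gateway gives internet (and S3) access.
        if route.get("NatGatewayId") and route.get("DestinationCidrBlock") == "0.0.0.0/0":
            found.append("nat:" + route["NatGatewayId"])
        # Gateway endpoint routes target a prefix list through a vpce- ID.
        # This does not confirm that the prefix list belongs to S3.
        gateway_id = route.get("GatewayId", "")
        if route.get("DestinationPrefixListId") and gateway_id.startswith("vpce-"):
            found.append("gateway-endpoint:" + gateway_id)
    return found
```

An empty result for a subnet's route table suggests that the subnet has neither a NAT route nor a gateway endpoint route, which matches the timeout described above.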
Resources Not Discovered After Account Has Been Imported
Issue: Accounts have been imported through the Web UI but no resources appear to be discovered after the discovery process has run.
Global resources template not deployed
Reason: When the CrossAccountDiscovery CloudFormation parameter is set to SELF_MANAGED, the global resources CloudFormation template has not been deployed.
Resolution: Deploy the global resources template in the required accounts, as per the documentation.
StackSet deployment error
Reason: When the CrossAccountDiscovery CloudFormation parameter is set to AWS_ORGANIZATIONS, one or more accounts are not discovered and the Role Status column has Not Deployed entries. This means there has been a problem with the automated deployment of the global resources template using StackSets.
Resolution: Go to the WdGlobalResources StackSet in the region that Workload Discovery has been deployed to and check the errors in the stack instances that have failed to deploy:
- Sign in to the AWS CloudFormation console.
- From the navigation menu, select StackSets.
- Select the Service-managed tab.
- In the search bar, search for WdGlobalResources.
- Choose WdGlobalResources from the search results.
- Select the Stack Instances tab.
- Inspect the Detailed status column for any errors.
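The same inspection can be scripted. The following sketch assumes its input is the Summaries list returned by the CloudFormation ListStackInstances API for the WdGlobalResources StackSet; the function name is hypothetical, and treating every status other than SUCCEEDED as worth reviewing is a simplification.

```python
from typing import Any, Dict, List


def stack_instances_to_review(summaries: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    """Return the stack instances whose detailed status is not SUCCEEDED."""
    to_review = []
    for summary in summaries:
        detailed = summary.get("StackInstanceStatus", {}).get("DetailedStatus", "UNKNOWN")
        if detailed == "SUCCEEDED":
            continue
        to_review.append(
            {
                "account": summary.get("Account", ""),
                "region": summary.get("Region", ""),
                "detailed_status": detailed,
                # StatusReason carries the explanation shown in the console.
                "reason": summary.get("StatusReason", ""),
            }
        )
    return to_review
```

With boto3, the summaries might come from client.list_stack_instances(StackSetName="WdGlobalResources", CallAs="SELF")["Summaries"]; the CallAs value depends on whether the call is made from the management account or a delegated administrator account, and results may be paginated.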
Discovery ECS task out of memory
Reason: The discovery process ECS task is running out of memory. This can happen when importing a large number of accounts or resources. The Last Discovered column in the UI will display Not Discovered or have a value larger than the one specified in the DiscoveryTaskFrequency CloudFormation parameter (the default value is 15 minutes). There will be an out of memory error in the ECS console. To verify, follow these steps:
- Sign in to the Amazon Elastic Container Service console.
- Select the cluster named workload-discovery-cluster.
- Choose the Tasks tab.
- Select the Stopped button in the Desired task status panel.
- In the Last Status column, check for the error message OutOfMemoryError: Container killed due to memory usage.
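The console check above can also be approximated in code. This sketch assumes its input is the tasks list returned by the ECS DescribeTasks API for stopped tasks in the workload-discovery-cluster cluster; the function and constant names are hypothetical.

```python
from typing import Any, Dict, List

# Prefix of the error message quoted in the steps above.
OOM_MARKER = "OutOfMemoryError"


def out_of_memory_task_arns(tasks: List[Dict[str, Any]]) -> List[str]:
    """Return the ARNs of tasks whose stop reason mentions an out of memory error."""
    arns = []
    for task in tasks:
        # The message can appear on the task itself or on one of its containers.
        reasons = [task.get("stoppedReason", "")]
        reasons += [container.get("reason", "") for container in task.get("containers", [])]
        if any(OOM_MARKER in reason for reason in reasons):
            arns.append(task["taskArn"])
    return arns
```

With boto3, the stopped task ARNs could be obtained from list_tasks(cluster="workload-discovery-cluster", desiredStatus="STOPPED") and then passed to describe_tasks. ECS only retains stopped tasks for a limited time, so an empty result does not rule out an earlier failure.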
Resolution: Update the Memory CloudFormation parameter to a larger value. Start by doubling the current value and keep increasing it until the error no longer occurs.
Note
Only certain combinations of CPU units and memory values are valid, so you may have to update the CpuUnits CloudFormation parameter as well. The full list of combinations is in the ECS documentation.
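As an illustration of how doubling the memory can force a change to CpuUnits, the sketch below picks the next valid pair. The table is an assumption transcribed from the Fargate task size combinations (values in CPU units and MiB); verify it against the ECS documentation, and note that the solution's template may accept a narrower set of values. The names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

GIB = 1024  # MiB per GiB

# Assumed Fargate task sizes: CPU units -> valid memory values in MiB.
FARGATE_MEMORY_BY_CPU: Dict[int, List[int]] = {
    256: [512, 1024, 2048],
    512: list(range(1 * GIB, 4 * GIB + 1, GIB)),
    1024: list(range(2 * GIB, 8 * GIB + 1, GIB)),
    2048: list(range(4 * GIB, 16 * GIB + 1, GIB)),
    4096: list(range(8 * GIB, 30 * GIB + 1, GIB)),
    8192: list(range(16 * GIB, 60 * GIB + 1, 4 * GIB)),
    16384: list(range(32 * GIB, 120 * GIB + 1, 8 * GIB)),
}


def next_size(cpu_units: int, memory_mib: int) -> Optional[Tuple[int, int]]:
    """Double the memory and return the smallest valid (cpu, memory) pair that fits."""
    target = memory_mib * 2
    for cpu in sorted(FARGATE_MEMORY_BY_CPU):
        if cpu < cpu_units:
            continue  # never shrink the CPU allocation
        for memory in FARGATE_MEMORY_BY_CPU[cpu]:
            if memory >= target:
                return cpu, memory
    return None  # no valid combination is large enough
```

For example, next_size(1024, 8192) returns (2048, 16384): 16384 MiB is not valid with 1024 CPU units, so both parameters would need to change.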
Gremlin Lambda times out when connecting to Amazon Neptune
Issue: GraphQL queries backed by the <stack-name>-GremlinResol-GremlinAppSyncFunction-<ID-string> Lambda function time out when attempting to connect to the Amazon Neptune database.
Reason: The VPC that the database is running in has a custom DNS configuration.
Resolution: Update the security group associated with the <stack-name>-GremlinResol-GremlinAppSyncFunction-<ID-string> Lambda function to open port 53 for the UDP protocol.
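To confirm whether the security group already allows DNS traffic, its rules can be checked. The sketch below assumes its input is a rule list (IpPermissions or IpPermissionsEgress) from the EC2 DescribeSecurityGroups API. This guide does not say which direction to open, so which list to check is an assumption, and the function name is hypothetical.

```python
from typing import Any, Dict, List


def allows_udp_53(permissions: List[Dict[str, Any]]) -> bool:
    """Return True if any rule in the list covers UDP port 53."""
    for permission in permissions:
        protocol = permission.get("IpProtocol")
        if protocol == "-1":
            return True  # "-1" means all protocols and all ports
        if protocol == "udp":
            from_port = permission.get("FromPort", 0)
            to_port = permission.get("ToPort", 65535)
            if from_port <= 53 <= to_port:
                return True
    return False
```

This only checks protocol and port. It ignores the CIDR ranges and referenced security groups on each rule, so a True result still needs a look at the rule's source or destination.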
Unable to access Elastic Container Registry
Issue: When the scheduled Amazon ECS task on Fargate is launched, the task fails with the following error:
ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth
Reason: The ECS task is running in a VPC that does not have a route to the ECR API endpoint.
Resolution: Add a VPC endpoint for the com.amazonaws.<region>.ecr.api service as per the ECR documentation.
Unable to pull container from Elastic Container Registry
Issue: When the scheduled Amazon ECS task on Fargate is launched, the task fails with the following error:
CannotPullContainerFromRegistry: There is a connection issue between the task and Amazon ECR. Check your task network configuration
Reason: The ECS task is running in a VPC that does not have a route to the ECR Docker endpoint.
Resolution: Add a VPC endpoint for the com.amazonaws.<region>.ecr.dkr service as per the ECR documentation.
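Both ECR errors come down to a missing VPC endpoint, so the two can be checked together. The sketch below assumes its input is the VpcEndpoints list from the EC2 DescribeVpcEndpoints API, filtered to the VPC that the task runs in; the function names are hypothetical.

```python
from typing import Any, Dict, List


def required_ecr_endpoint_services(region: str) -> List[str]:
    """Build the two ECR endpoint service names named in this section."""
    return [f"com.amazonaws.{region}.ecr.api", f"com.amazonaws.{region}.ecr.dkr"]


def missing_ecr_endpoints(region: str, vpc_endpoints: List[Dict[str, Any]]) -> List[str]:
    """Return the ECR service names that have no available endpoint in the VPC."""
    available = {
        endpoint["ServiceName"]
        for endpoint in vpc_endpoints
        # Compare case-insensitively in case the state casing varies.
        if str(endpoint.get("State", "")).lower() == "available"
    }
    return [name for name in required_ecr_endpoint_services(region) if name not in available]
```

An empty list means both endpoints exist. It does not verify the endpoint's security groups or subnets, which can also block the connection.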
Only Non-AWS Config Resources Are Being Discovered In Specific Accounts
Issue: The only resource types that the solution discovers are the ones listed in the table in the Supported resources section.
Regional resources template not deployed
Reason: When the CrossAccountDiscovery CloudFormation parameter is set to SELF_MANAGED, the regional resources CloudFormation template has not been deployed in the regions of each account to be discovered.
Resolution: Deploy the regional resources templates in the required accounts, as per the documentation.
Regional resources template deployed incorrectly
Reason: When the CrossAccountDiscovery CloudFormation parameter is set to SELF_MANAGED, the regional resources CloudFormation template has been deployed in regions of accounts that did not have Config enabled, but the CloudFormation parameter AlreadyHaveConfigSetup was erroneously set to Yes.
Resolution: Delete the previously deployed regional resources stack (AWS Config will be in an inconsistent state otherwise) and re-deploy with the CloudFormation parameter AlreadyHaveConfigSetup set to No.
Config not enabled in required regions
Reason: When the CrossAccountDiscovery CloudFormation parameter is set to AWS_ORGANIZATIONS, AWS Config is not enabled in the regions of each account to be discovered. In AWS_ORGANIZATIONS mode, you are responsible for enabling Config as per your organization’s policies.
Resolution: Enable AWS Config in the regions of each account to be discovered.
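To find the regions that still need AWS Config enabled, the recorder status can be collected per region and compared. This sketch assumes each value in the input mapping is the ConfigurationRecordersStatus list returned by the AWS Config DescribeConfigurationRecorderStatus API for that region; the function names are hypothetical.

```python
from typing import Any, Dict, List


def config_is_recording(recorder_statuses: List[Dict[str, Any]]) -> bool:
    """Return True if at least one configuration recorder is recording."""
    return any(status.get("recording") for status in recorder_statuses)


def regions_without_config(statuses_by_region: Dict[str, List[Dict[str, Any]]]) -> List[str]:
    """Return the regions, sorted by name, where no recorder is recording."""
    return sorted(
        region
        for region, statuses in statuses_by_region.items()
        if not config_is_recording(statuses)
    )
```

The mapping would need to be built once per account to be discovered, using credentials for that account.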