Resource scanning - AWS Prescriptive Guidance

Resource scanning

The resource scanning component is the foundation of the Infrastructure Documentation Generator. Its job is to discover and inventory AWS resources across AWS Regions in a consistent and structured way. This process is critical because all subsequent phases (documentation generation and dependency mapping) rely on the data collected here.

Key responsibilities of the resource scanning component

This section describes the key actions and responsibilities of the resource scanning component.

Enumerate AWS resources

The resource scanner begins by identifying all AWS resources across AWS services. For each service, it performs targeted API calls through AWS SDK for Python (Boto3) to retrieve detailed configurations. Many AWS services consist of multiple sub-resources, such as Amazon Elastic Compute Cloud (Amazon EC2) instances, volumes, and security groups. The scanner systematically captures them all to ensure complete visibility of the cloud environment.

The scanner uses a declarative configuration approach through the SERVICE_SCAN_FUNCTIONS dictionary, which maps each AWS service to its specific API calls and response structures.
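The structure of SERVICE_SCAN_FUNCTIONS is not shown on this page, so the following is only a sketch of what such a declarative mapping might look like. The entry layout ("function" and "key") and the scan_service helper are assumptions made for illustration. The Boto3 operation names and response keys are real, but the set of services is just a sample.

```python
# Hypothetical shape for SERVICE_SCAN_FUNCTIONS; the tool's real structure may differ.
# Each entry names a Boto3 client operation and the response key that holds the resources.
SERVICE_SCAN_FUNCTIONS = {
    "ec2": [
        {"function": "describe_volumes", "key": "Volumes"},
        {"function": "describe_security_groups", "key": "SecurityGroups"},
    ],
    "s3": [{"function": "list_buckets", "key": "Buckets"}],
    "lambda": [{"function": "list_functions", "key": "Functions"}],
}


def scan_service(client, service):
    """Run every configured call for one service on an already-created client."""
    results = []
    for spec in SERVICE_SCAN_FUNCTIONS.get(service, []):
        # Look up the operation by name, for example client.list_buckets()
        response = getattr(client, spec["function"])()
        items = response.get(spec["key"], [])
        results.append(
            {
                "service": service,
                "function": spec["function"],
                "resources": items,
                "resource_count": len(items),
            }
        )
    return results
```

With this layout, supporting another service means adding a dictionary entry rather than writing new scanning code. The sketch ignores pagination, which a scanner for large accounts would need to handle.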

Optimize parallel execution

To scan large AWS environments efficiently, the scanner processes multiple AWS Regions and services in parallel. A thread pool distributes the API calls across concurrent workers. Because the calls are I/O-bound, running them concurrently can substantially reduce total execution time compared with scanning each Region and service in sequence.

The core of this parallel execution strategy is implemented in the scan_resources function, which orchestrates the entire scanning process by using the concurrent.futures module from the Python standard library.
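The body of scan_resources is not reproduced here. As a rough sketch of the pattern it describes, the following fans out one task per Region and service pair on a ThreadPoolExecutor. The signature, the scan_one callback, and the error-record layout are assumptions, not the tool's actual interface.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def scan_resources(regions, services, scan_one, max_workers=8):
    """Run scan_one(region, service) for every pair on a thread pool.

    scan_one is a hypothetical callback that performs the API calls for
    one Region and service pair and returns a result dictionary.
    """
    results, errors = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the pair it was submitted for
        futures = {
            pool.submit(scan_one, region, service): (region, service)
            for region in regions
            for service in services
        }
        for future in as_completed(futures):
            region, service = futures[future]
            try:
                results.append(future.result())
            except Exception as exc:
                # A failed call (for example, access denied) is recorded
                # instead of aborting the whole scan
                errors.append(
                    {"region": region, "service": service, "error": str(exc)}
                )
    return results, errors
```

Results arrive in completion order, not submission order, so downstream code should not depend on their ordering. Threads suit this workload because the tasks spend most of their time waiting on network responses.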

Extract resource-based policies

Certain AWS services expose resource-based policies that define how external entities can interact with them. Examples of such services are Amazon Simple Storage Service (Amazon S3), AWS Lambda, Amazon Simple Notification Service (Amazon SNS), and Amazon Simple Queue Service (Amazon SQS). The scanner retrieves these policies directly from the resources and attaches them to the discovered metadata. This provides valuable context for understanding not only what resources exist but also how they are secured and who can access them.

The add_lambda_resource_policies function retrieves and attaches resource-based policies to Lambda functions, enriching the resource data.
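The implementation of add_lambda_resource_policies is not shown on this page. The following is a minimal sketch of how such a function could work, assuming it receives a Boto3 Lambda client and the list of function records returned by list_functions. It relies on the Lambda get_policy operation, which returns the policy as a JSON string and raises ResourceNotFoundException when a function has no resource-based policy. The resource_based_policy key matches the example output on this page.

```python
import json


def add_lambda_resource_policies(lambda_client, functions):
    """Attach each function's resource-based policy, or None if it has none.

    Assumed signature: lambda_client is a Boto3 Lambda client and functions
    is a list of dictionaries that each contain "FunctionName".
    """
    for fn in functions:
        try:
            response = lambda_client.get_policy(FunctionName=fn["FunctionName"])
            # get_policy returns the policy document as a JSON string
            fn["resource_based_policy"] = json.loads(response["Policy"])
        except lambda_client.exceptions.ResourceNotFoundException:
            # The function exists but has no resource-based policy attached
            fn["resource_based_policy"] = None
    return functions
```

Recording None for a missing policy, rather than omitting the key, keeps every resource record the same shape for downstream consumers.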

Normalize and filter fields

AWS API responses can be verbose, containing extensive metadata that isn't always useful for documentation. To address this, the scanner filters responses down to a consistent set of essential fields, such as Name, Arn, and Id. This normalization ensures the results are lightweight, uniform, and easier to consume for both human readers and downstream processes like dependency mapping or documentation generation.

The filter_resource_fields function implements the tool's field filtering strategy. The essential fields for each resource type are defined in RESOURCE_ESSENTIAL_FIELDS, and the function applies these definitions to create consistent, streamlined resource representations.
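The contents of RESOURCE_ESSENTIAL_FIELDS and the signature of filter_resource_fields are not given here, so the following sketch assumes both. The resource type keys, the field lists, and the DEFAULT_FIELDS fallback are illustrative only.

```python
# Hypothetical field lists; the tool's actual RESOURCE_ESSENTIAL_FIELDS may differ.
RESOURCE_ESSENTIAL_FIELDS = {
    "s3_bucket": ["Name", "CreationDate"],
    "lambda_function": ["FunctionName", "FunctionArn", "Runtime"],
}

# Assumed fallback for resource types without an explicit entry
DEFAULT_FIELDS = ["Name", "Arn", "Id"]


def filter_resource_fields(resource_type, resources):
    """Keep only the essential fields of each raw resource dictionary."""
    fields = RESOURCE_ESSENTIAL_FIELDS.get(resource_type, DEFAULT_FIELDS)
    # Fields missing from a given resource are skipped rather than set to None
    return [{k: r[k] for k in fields if k in r} for r in resources]
```

For example, a raw Lambda record that also carries CodeSha256 and other metadata would be reduced to its FunctionName, FunctionArn, and Runtime entries.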

Return results in JSON format

All scanned data is returned in a structured JSON format. The output includes details of AWS services and Regions scanned, lists of discovered resources with metadata, aggregated resource counts, and any attached policies. This JSON representation serves as a standardized contract between the scanning component and other parts of the system, supporting interoperability and easy integration with visualization, analysis, or reporting layers.
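As an illustration of how such a document might be assembled, the following sketch combines per-call results into one dictionary and serializes it. The build_scan_document and to_json names are hypothetical. The top-level field names follow the example output later on this page.

```python
import json
from datetime import datetime, timezone


def build_scan_document(account_id, scan_results):
    """Combine per-call scan results into one JSON-serializable document.

    scan_results is assumed to be a list of dictionaries that each carry
    "service", "region", "function", "resources", and "resource_count".
    """
    return {
        "account_id": account_id,
        "scan_time": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S"),
        "regions_scanned": sorted({r["region"] for r in scan_results}),
        "services_scanned": sorted({r["service"] for r in scan_results}),
        "resources": scan_results,
    }


def to_json(document):
    # Boto3 responses can contain datetime values, which json cannot encode
    # directly; default=str converts them to strings
    return json.dumps(document, indent=2, default=str)
```

This sketch derives the Region and service lists from the results themselves, so a Region that returned nothing would not appear. A scanner that needs to report every Region it attempted would track that list separately.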

UI integration and delivery

The scanned results are displayed in the UI with expandable sections, grouped by service and Region. Users can drill down into specific AWS services and explore discovered resources interactively before moving to further phases.

Workflow of the resource scanning component

The resource scanning component uses the following workflow:

  1. Start scan – The process begins when the client (through an API or UI) initiates a scan request. The client provides AWS credentials or specifies a target account and IAM role to assume. This input allows the system to securely access the required AWS environment.

  2. Create session – The system establishes an SDK for Python (Boto3) session with the provided credentials. When a scan targets another account, the system uses AWS Security Token Service (AWS STS) to assume the specified IAM role in the target AWS account. All subsequent API calls are executed within this authenticated and authorized context.

  3. Discover AWS Regions and AWS services – The scanner enumerates all available Regions and builds a service catalog of API operations to be performed. Both Regional services (such as Amazon EC2 and AWS Lambda) and global services (such as IAM and Amazon Route 53) are included to ensure full coverage.

  4. Parallel execution – Using a thread pool, the scanner issues API calls across Regions and services in parallel. This design significantly reduces scan time and enables scalability across large, multi-account AWS environments.

  5. Process responses and attach policies – The raw API responses are normalized and filtered to include only essential fields (such as Name, Arn, and Id). During this stage, the scanner also retrieves resource-based policies from services that expose them (for example, Amazon S3 buckets, Lambda functions, Amazon SNS topics, and Amazon SQS queues). These policies are directly attached to the corresponding resources in the results, providing deeper insight into access configurations.

  6. Aggregate results – Results from all Regions and services are combined into a single structured JSON document. This aggregated output includes metadata about Regions and services scanned, discovered resources, resource counts, and attached policies. The JSON format makes the results easy to consume for downstream processes such as dependency mapping, documentation generation, or visualization in the UI.
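Steps 2 and 3 of this workflow can be sketched as follows. The function names, the role name in the usage note, and the default session name are hypothetical. The sketch assumes the standard aws partition when it builds the role ARN. It uses the STS assume_role and Amazon EC2 describe_regions operations, and it takes the clients as parameters so that the logic stays independent of how the clients are created.

```python
def assume_role_session_kwargs(sts_client, account_id, role_name,
                               session_name="resource-scan"):
    """Assume a role in the target account and return keyword arguments
    that can be passed to boto3.Session."""
    # Assumes the standard "aws" partition
    role_arn = f"arn:aws:iam::{account_id}:role/{role_name}"
    response = sts_client.assume_role(
        RoleArn=role_arn, RoleSessionName=session_name
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def list_region_names(ec2_client):
    """Return the names of the Regions that are enabled for the account."""
    response = ec2_client.describe_regions()
    return sorted(region["RegionName"] for region in response["Regions"])
```

In a real scan, the returned dictionary would be unpacked into a session, for example boto3.Session(**assume_role_session_kwargs(boto3.client("sts"), "123456789012", "ScannerRole")), and that session would then create the per-Region service clients. The temporary credentials expire, so a long-running scan might need to refresh them.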

Example JSON output

The following example shows typical JSON output when scanning Amazon S3 buckets and Lambda functions. This output includes the AWS account ID, scan timestamp, AWS Regions and AWS services scanned, and detailed resource information including names and ARNs. For each resource, it also indicates whether a resource-based policy is present (as shown for the S3 bucket) or absent (as shown for the Lambda function).

{
  "account_id": "123456789012",
  "scan_time": "2025-09-03 10:20:15",
  "regions_scanned": ["us-east-1", "us-west-2"],
  "services_scanned": ["s3", "lambda"],
  "resources": [
    {
      "service": "s3",
      "region": "us-east-1",
      "function": "list_buckets",
      "resources": [
        {
          "Name": "my-app-bucket",
          "Arn": "arn:aws:s3:::my-app-bucket",
          "resource_based_policy": {
            "Version": "2012-10-17",
            "Statement": [...]
          }
        }
      ],
      "resource_count": 1
    },
    {
      "service": "lambda",
      "region": "us-west-2",
      "function": "list_functions",
      "resources": [
        {
          "FunctionName": "process-data-fn",
          "Arn": "arn:aws:lambda:us-west-2:123456789012:function:process-data-fn",
          "resource_based_policy": null
        }
      ],
      "resource_count": 1
    }
  ]
}
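To show how a downstream process might consume this output, the following sketch totals resources per service and lists the resources that carry a resource-based policy. Both helper names are hypothetical. The code relies only on the field names that appear in the example above.

```python
from collections import Counter


def count_by_service(scan_document):
    """Total the resource_count values for each service across all Regions."""
    totals = Counter()
    for entry in scan_document["resources"]:
        totals[entry["service"]] += entry["resource_count"]
    return dict(totals)


def resources_with_policy(scan_document):
    """Return the ARNs of resources that have a resource-based policy attached."""
    arns = []
    for entry in scan_document["resources"]:
        for resource in entry["resources"]:
            # A null policy in the JSON is loaded as None in Python
            if resource.get("resource_based_policy") is not None:
                arns.append(resource["Arn"])
    return arns
```

Applied to the example document, the first helper reports one S3 resource and one Lambda resource, and the second returns only the ARN of my-app-bucket.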

User view of resource scanning

The scanned infrastructure is displayed in the UI as expandable sections organized by AWS service and Region. Users can drill down into each service to view discovered resources and attached resource-based policies.