Troubleshooting VPC networking
Network monitoring and troubleshooting
CloudTrail logging
All Configuration API operations and workflow runs using VPC networking are logged in CloudTrail. Use CloudTrail to audit configuration changes and track which runs use VPC networking.
Troubleshooting with ENI flow logs
When your workflow runs access external resources over the internet, you can use VPC Flow Logs to verify connectivity and diagnose issues. HealthOmics provisions elastic network interfaces (ENIs) in your VPC subnets to route traffic from your workflow tasks. By examining flow logs on these ENIs, you can trace network traffic to and from external destinations.
Cost management for VPC Flow Logs
VPC Flow Logs can incur significant costs, especially at the VPC level. To minimize costs:
Delete flow logs after troubleshooting. Once you've resolved connectivity issues, delete the flow log to stop incurring charges.
Use Amazon S3 instead of CloudWatch Logs for long-term storage. Amazon S3 storage is significantly cheaper than CloudWatch Logs. Configure flow logs to publish to Amazon S3 if you need to retain logs for compliance or security analysis.
Set CloudWatch Logs retention policies. If using CloudWatch Logs, configure automatic log expiration (for example, 7 days) to prevent indefinite storage costs.
Use ENI-level flow logs for troubleshooting. For one-time debugging, create flow logs on the specific customer ENI rather than the entire VPC.
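The retention and cleanup steps above can also be scripted. The following is a minimal sketch using boto3, not a prescribed procedure: the helper names are illustrative, and the log group name and flow log IDs are placeholders you would replace with your own values. It assumes credentials with permission to call `put_retention_policy` and `delete_flow_logs`.

```python
def retention_request(log_group_name: str, days: int = 7) -> dict:
    """Build keyword arguments for CloudWatch Logs put_retention_policy."""
    return {"logGroupName": log_group_name, "retentionInDays": days}


def delete_request(flow_log_ids: list) -> dict:
    """Build keyword arguments for EC2 delete_flow_logs."""
    return {"FlowLogIds": list(flow_log_ids)}


def apply_cleanup(log_group_name: str, flow_log_ids: list, days: int = 7) -> None:
    """Set log expiration, then delete the flow logs to stop new charges."""
    import boto3  # imported lazily so the helpers above work without AWS access

    boto3.client("logs").put_retention_policy(**retention_request(log_group_name, days))
    boto3.client("ec2").delete_flow_logs(**delete_request(flow_log_ids))
```

Deleting a flow log stops new records from being published but does not remove records already delivered, which is why the sketch also sets a retention period on the log group.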
Setting up flow logs for troubleshooting
Option 1: VPC-level flow logs (for ongoing monitoring)
Enable flow logs on your VPC to automatically capture traffic from all HealthOmics workflow runs. This is best when you have many workflow runs and want comprehensive visibility without tracking individual ENIs.
-
Enable VPC Flow Logs. In the Amazon VPC console:
Choose Your VPCs and select the VPC used in your HealthOmics configuration
Choose the Flow logs tab
Choose Create flow log
Configure the flow log to capture All traffic (both accepted and rejected)
Select CloudWatch Logs as the destination for easier querying
-
Start a workflow run. Start a workflow run with VPC networking enabled. Note the run ID and start time for filtering flow logs later.
-
Query flow logs. Use CloudWatch Logs Insights to query by time window, destination IP, or traffic patterns. You don't need to identify specific ENI IDs.
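If you prefer to create the VPC-level flow log programmatically rather than in the console, the request might look like the following sketch. It uses the boto3 EC2 `create_flow_logs` call. The VPC ID, log group name, and IAM role ARN are placeholders, and the helper names are illustrative only.

```python
def vpc_flow_log_request(vpc_id: str, log_group_name: str, delivery_role_arn: str) -> dict:
    """Build keyword arguments for EC2 create_flow_logs on a whole VPC."""
    return {
        "ResourceIds": [vpc_id],
        "ResourceType": "VPC",
        "TrafficType": "ALL",  # capture both accepted and rejected traffic
        "LogDestinationType": "cloud-watch-logs",
        "LogGroupName": log_group_name,
        # Role that allows the flow logs service to publish to CloudWatch Logs
        "DeliverLogsPermissionArn": delivery_role_arn,
    }


def create_vpc_flow_log(vpc_id: str, log_group_name: str, delivery_role_arn: str) -> list:
    """Create the flow log and return the new flow log IDs."""
    import boto3  # imported lazily so the request builder works without AWS access

    request = vpc_flow_log_request(vpc_id, log_group_name, delivery_role_arn)
    response = boto3.client("ec2").create_flow_logs(**request)
    return response["FlowLogIds"]
```

Keep the returned flow log IDs so that you can delete the flow log once troubleshooting is finished.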
Option 2: ENI-level flow logs (for targeted troubleshooting)
Enable flow logs on specific ENIs when you have only a few HealthOmics ENIs in your account. This is the most cost-effective approach and makes it easy to isolate traffic for specific workflow runs.
-
Find the customer ENI. In the Amazon EC2 console:
Choose Network Interfaces
Filter by tag Service: HealthOmics to show only ENIs created by HealthOmics
Optionally, further filter by the subnet ID from your HealthOmics configuration
Note the ENI ID and private IP address
-
Enable flow logs on the ENI.
Select the ENI and choose the Flow logs tab
Choose Create flow log
Configure the flow log to capture All traffic
Select CloudWatch Logs as the destination
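The ENI lookup in the first step can be sketched with boto3 as follows. The tag key `Service` and value `HealthOmics` come from the console steps above. The helper names are illustrative, and the sketch ignores pagination, so it assumes the account has only a small number of matching ENIs.

```python
def healthomics_eni_filters(subnet_id: str = None) -> list:
    """Build describe_network_interfaces filters for HealthOmics-created ENIs."""
    filters = [{"Name": "tag:Service", "Values": ["HealthOmics"]}]
    if subnet_id:
        filters.append({"Name": "subnet-id", "Values": [subnet_id]})
    return filters


def summarize_enis(response: dict) -> list:
    """Return (ENI ID, private IP) pairs from a describe_network_interfaces response."""
    return [
        (eni["NetworkInterfaceId"], eni["PrivateIpAddress"])
        for eni in response.get("NetworkInterfaces", [])
    ]


def find_healthomics_enis(subnet_id: str = None) -> list:
    """Query EC2 for ENIs tagged by HealthOmics, optionally within one subnet."""
    import boto3  # imported lazily so the helpers above work without AWS access

    ec2 = boto3.client("ec2")
    response = ec2.describe_network_interfaces(Filters=healthomics_eni_filters(subnet_id))
    return summarize_enis(response)
```

To enable a flow log on one of the returned ENIs, the same `create_flow_logs` call used for a VPC applies, with `ResourceType` set to `NetworkInterface` and the ENI ID in `ResourceIds`.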
Note
Flow logs only capture traffic from the time they are enabled. For VPC-level flow logs, enable them before running workflows. For ENI-level flow logs, once enabled on an ENI, the same flow log will capture traffic for all future workflow runs that use that ENI.
Understanding VPC Flow Log format
VPC Flow Logs use a space-separated format with the following fields:
version account_id interface_id srcaddr dstaddr srcport dstport protocol packets bytes start end action log_status
Field descriptions:
version — Flow log format version (typically 2)
account_id — Your AWS account ID
interface_id — The ENI ID (for example, eni-0e57c5476efeac402)
srcaddr — Source IP address
dstaddr — Destination IP address
srcport — Source port number
dstport — Destination port number
protocol — IANA protocol number (6=TCP, 17=UDP, 1=ICMP)
packets — Number of packets in the flow
bytes — Number of bytes in the flow
start — Flow start time (Unix timestamp)
end — Flow end time (Unix timestamp)
action — ACCEPT or REJECT
log_status — OK, NODATA, or SKIPDATA
Example flow log entries:
2 074296239033 eni-0e57c5476efeac402 10.0.130.58 13.226.238.96 40565 443 6 13 1502 1774338927 1774338929 ACCEPT OK
2 074296239033 eni-0e57c5476efeac402 13.226.238.96 10.0.130.58 443 40565 6 8 1024 1774338928 1774338930 ACCEPT OK
These entries show successful bidirectional HTTPS communication. Key IPs: 10.0.130.58 is the private IP address of the customer ENI that HealthOmics created in your account, and 13.226.238.96 is the public IP address of the external endpoint your workflow is accessing. The first entry is outbound traffic, and the second is the return traffic. Both show ACCEPT, indicating the traffic was allowed by your security groups and network ACLs.
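When you export flow log records for offline analysis, a small parser for the default format can help. The sketch below splits a record into the fields listed above. The function name and the choice to return a dictionary are illustrative, and it assumes the default 14-field format rather than a custom format.

```python
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]
NUMERIC = {"version", "srcport", "dstport", "protocol", "packets", "bytes", "start", "end"}


def parse_flow_record(line: str) -> dict:
    """Parse one default-format flow log record into a dictionary."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = {}
    for name, value in zip(FIELDS, values):
        if value == "-":
            record[name] = None  # NODATA and SKIPDATA records leave fields unset
        elif name in NUMERIC:
            record[name] = int(value)
        else:
            record[name] = value  # account_id stays a string to keep leading zeros
    return record
```

Applied to the first example entry above, the parser yields `dstport` 443, `protocol` 6, and `action` ACCEPT.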
Querying flow logs in CloudWatch Logs Insights
When flow logs are published to CloudWatch Logs, use CloudWatch Logs Insights to query and analyze the data.
Find rejected traffic (start here)
fields @timestamp, interfaceId, srcAddr, dstAddr, srcPort, dstPort, protocol, action
| filter action = "REJECT"
| sort @timestamp desc
If this returns results, you may have a connectivity issue. The rejected entries show which traffic is being blocked by security groups or network ACLs.
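The same query can be run outside the console. The following sketch uses the boto3 CloudWatch Logs `start_query` and `get_query_results` calls, restricted to the time window of a workflow run. The log group name and timestamps are placeholders, and the helper names are illustrative.

```python
import time

REJECT_QUERY = (
    "fields @timestamp, interfaceId, srcAddr, dstAddr, srcPort, dstPort, protocol, action"
    ' | filter action = "REJECT"'
    " | sort @timestamp desc"
)


def start_query_request(log_group_name, start_epoch, end_epoch, query=REJECT_QUERY, limit=100):
    """Build keyword arguments for start_query; times are Unix seconds."""
    return {
        "logGroupName": log_group_name,
        "startTime": int(start_epoch),
        "endTime": int(end_epoch),
        "queryString": query,
        "limit": limit,
    }


def run_query(log_group_name, start_epoch, end_epoch, query=REJECT_QUERY, poll_seconds=1.0):
    """Start the query, poll until it leaves the running states, return rows as dicts."""
    import boto3  # imported lazily so the request builder works without AWS access

    logs = boto3.client("logs")
    request = start_query_request(log_group_name, start_epoch, end_epoch, query)
    query_id = logs.start_query(**request)["queryId"]
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            break
        time.sleep(poll_seconds)
    return [{cell["field"]: cell["value"] for cell in row} for row in response["results"]]
```

Passing the run's start and end times keeps the query scoped to the period when the workflow was active.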
Find traffic to a specific external IP
First, resolve the domain to an IP address using nslookup or dig:
$ nslookup ftp.ncbi.nlm.nih.gov
Server: 127.53.53.53
Address: 127.53.53.53#53

Non-authoritative answer:
ftp.ncbi.nlm.nih.gov canonical name = ftp.wip.ncbi.nlm.nih.gov.
Name: ftp.wip.ncbi.nlm.nih.gov
Address: 130.14.250.10
Name: ftp.wip.ncbi.nlm.nih.gov
Address: 130.14.250.11
The "Server" and "Address" at the top are your DNS resolver. The addresses under "Non-authoritative answer" (130.14.250.10 and 130.14.250.11) are the actual IPs for the domain.
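Query flow logs using an anchored prefix pattern to match any IP in that range:
fields @timestamp, interfaceId, srcAddr, dstAddr, srcPort, dstPort, protocol, action
| filter dstAddr like /^130\.14\.250\./
| sort @timestamp desc
This matches any IP starting with 130.14.250., capturing traffic to all IPs in that /24 range. The anchor matters because a plain substring match on "130.14.250" would also match an unrelated address such as 10.130.14.250.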
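When you want to match only the exact addresses that DNS returned, you can generate the filter clause from the resolved IPs. The sketch below is one way to do that with the Python standard library. The function names are illustrative, and it assumes the `in` set-membership operator of CloudWatch Logs Insights.

```python
import socket


def resolve_ipv4(hostname: str) -> list:
    """Return the sorted, de-duplicated IPv4 addresses for a hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def destination_filter(addresses: list) -> str:
    """Build a Logs Insights filter clause that matches any of the given IPs."""
    quoted = ", ".join(f'"{address}"' for address in addresses)
    return f"filter dstAddr in [{quoted}]"
```

For example, `destination_filter(resolve_ipv4("ftp.ncbi.nlm.nih.gov"))` produces a clause you can place between the `fields` and `sort` lines of the query. Resolve the name from the same network as your workflow where possible, because DNS answers can differ by location.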
Find HTTPS traffic to external destinations
fields @timestamp, interfaceId, srcAddr, dstAddr, srcPort, dstPort, protocol, action
| filter dstPort = 443 and protocol = 6
| filter not (dstAddr like /^10\./ or dstAddr like /^172\.(1[6-9]|2[0-9]|3[01])\./ or dstAddr like /^192\.168\./)
| sort @timestamp desc
The second filter excludes private IP ranges, showing only traffic to external (public) destinations.
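The private ranges meant here are the RFC 1918 blocks 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16. Within 172.0.0.0/8, only 172.16 through 172.31 is private, so other 172.x addresses are public. When you post-process exported records, the same classification can be sketched with the standard `ipaddress` module (the function name is illustrative):

```python
import ipaddress

RFC1918_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(address: str) -> bool:
    """Return True if the address falls inside an RFC 1918 private range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)
```

With the addresses from the earlier example, `is_rfc1918("10.0.130.58")` is true for the ENI and `is_rfc1918("13.226.238.96")` is false for the external endpoint.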
Note
Protocol numbers: 6=TCP, 17=UDP, 1=ICMP. For load-balanced services (for example, CloudFront), DNS may return different IPs, so filter by destination port instead of IP address.
Common flow log patterns and issues
- Outbound traffic rejected
-
Outbound: 2 074296239033 eni-0e57c5476efeac402 10.0.130.58 13.226.238.96 40565 443 6 1 60 1774338927 1774338929 REJECT OK
Cause: Security group doesn't allow outbound traffic to the destination port or IP range.
Solution: Add an outbound rule to your security group:
For HTTPS: Allow TCP port 443 to 0.0.0.0/0
For HTTP: Allow TCP port 80 to 0.0.0.0/0
For broader access: Allow all TCP/UDP to 0.0.0.0/0
- Return traffic rejected
-
Outbound: 2 074296239033 eni-0e57c5476efeac402 10.0.130.58 8.8.8.8 54321 53 17 1 64 1774338927 1774338929 ACCEPT OK
Return: 2 074296239033 eni-0e57c5476efeac402 8.8.8.8 10.0.130.58 53 54321 17 1 64 1774338928 1774338930 REJECT OK
Cause: Network ACL is blocking return traffic. Unlike security groups (stateful), network ACLs are stateless and require explicit rules for both directions.
Solution: In the VPC console, check your subnet's network ACL and verify that its inbound rules allow traffic on ephemeral ports (1024-65535) from external sources. Add a rule if needed: Allow TCP/UDP ports 1024-65535 from 0.0.0.0/0
- Missing return traffic
-
Outbound: 2 074296239033 eni-0e57c5476efeac402 10.0.130.58 8.8.8.8 54321 53 17 1 64 1774338927 1774338929 ACCEPT OK
Cause: The NAT gateway or internet gateway is not configured properly, or the ENI doesn't have connectivity to the internet.
Solution:
Verify route table has route to NAT Gateway (0.0.0.0/0 → nat-xxxxx)
Verify NAT Gateway is in AVAILABLE state with an Elastic IP
Check NAT Gateway is in a public subnet with route to Internet Gateway
- No flow log entries for expected traffic
-
Cause: Traffic not reaching the ENI, or flow logs not configured correctly.
Solution:
Verify flow logs are enabled and configured to capture all traffic
Check workflow logs in CloudWatch Logs to confirm the workflow is attempting to access the external resource
Verify route table has route to NAT Gateway (0.0.0.0/0 → nat-xxxxx)
Verify NAT Gateway is in AVAILABLE state with an Elastic IP
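The four patterns above can be told apart mechanically once you have the records for one connection. The following sketch classifies a set of default-format records by direction and action, relative to the private IP of the customer ENI. The function name and the returned labels are illustrative, and the sketch assumes that all records passed in belong to the same connection.

```python
def classify_flow(lines: list, eni_ip: str) -> str:
    """Label a connection's flow log records with one of the common patterns."""
    outbound_actions = []
    return_actions = []
    for line in lines:
        fields = line.split()
        srcaddr, dstaddr, action = fields[3], fields[4], fields[12]
        if srcaddr == eni_ip:
            outbound_actions.append(action)
        elif dstaddr == eni_ip:
            return_actions.append(action)

    if not outbound_actions and not return_actions:
        return "no flow log entries"
    if "REJECT" in outbound_actions:
        return "outbound traffic rejected"  # check security group outbound rules
    if "REJECT" in return_actions:
        return "return traffic rejected"    # check network ACL inbound rules
    if not return_actions:
        return "missing return traffic"     # check routing and the NAT gateway
    if not outbound_actions:
        return "return traffic only"
    return "accepted in both directions"
```

Each label maps to the cause and solution listed under the matching pattern.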
Best practices for flow log troubleshooting
-
Enable flow logs before starting troubleshooting. Flow logs only capture traffic from the time they are enabled. Enable them on all subnets in your HealthOmics configuration before running workflows.
-
Use CloudWatch Logs Insights for analysis. CloudWatch Logs Insights provides powerful querying capabilities for flow logs. Save commonly used queries for quick access.
-
Filter by time window. Narrow your flow log queries to the specific time window when your workflow run was active to reduce noise and improve query performance.
-
Look for both directions of traffic. Always verify that both outbound and return traffic show ACCEPT. A connection requires bidirectional communication.
-
Document your findings. When troubleshooting connectivity issues, document the customer ENI ID, IP addresses, ports, and flow log entries. This information is valuable for support cases and future troubleshooting.
-
Test with a simple workflow first. Before running complex workflows, test connectivity with a simple workflow that attempts to access the external resource and logs the result. This helps isolate network issues from workflow logic issues.
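A simple connectivity test can be as small as a TCP connection attempt that logs its result. The sketch below uses only the Python standard library and could run as the single task of a test workflow. The function name is illustrative, and the host and port are whatever external resource your workflow needs.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Try to open a TCP connection and log whether it succeeded."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            print(f"connect to {host}:{port} succeeded")
            return True
    except OSError as error:
        # Covers timeouts, refused connections, and DNS resolution failures
        print(f"connect to {host}:{port} failed: {error}")
        return False
```

A timeout usually points to routing, NAT gateway, or network ACL problems, and a DNS failure points to name resolution rather than the network path.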
Configuration troubleshooting
Configuration stuck in CREATING status
Cause: Network resource provisioning can take several minutes.
Solution: Wait up to 10 minutes. If the status doesn't change to ACTIVE, check the following:
Your subnets and security groups exist and are in the same VPC.
You have the required IAM permissions.
The service-linked role was created successfully.
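The wait described above can be automated with a simple polling loop. In the sketch below, `get_status` is a hypothetical callable that you supply: it should return the current configuration status as a string, for example by calling the GetConfiguration operation. The sketch does not assume any particular SDK method name for that operation.

```python
import time


def wait_for_active(get_status, timeout_seconds=600, poll_seconds=15,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the status is no longer CREATING or the timeout expires.

    get_status is a caller-supplied (hypothetical) function returning the
    current configuration status string.
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status != "CREATING":
            return status  # ACTIVE on success; any other value needs investigation
        if clock() >= deadline:
            raise TimeoutError(f"still CREATING after {timeout_seconds} seconds")
        sleep(poll_seconds)
```

The default timeout of 600 seconds matches the 10-minute guidance. If the loop times out, continue with the checks listed above.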
Run fails to start with VPC networking
Cause: The configuration might not be ACTIVE, or there might be network connectivity issues.
Solution:
Verify that the configuration status is ACTIVE by using GetConfiguration.
Check that security group rules allow the required outbound traffic.
Ensure that subnets are in Availability Zones where HealthOmics operates.
Cannot delete configuration
Cause: The configuration is in use by active workflow runs.
Solution: Wait for all runs using the configuration to complete, then retry the deletion.
Cannot delete service-linked role
Cause: Active VPC configurations exist in your account.
Solution: Delete all VPC configurations first, then delete the service-linked role.
Workflow cannot connect to external resource
Cause: Security group or route table misconfiguration.
Solution:
Enable VPC Flow Logs to identify rejected packets
Check security group outbound rules allow traffic to the destination
Verify route table has a route to NAT Gateway (0.0.0.0/0 → nat-xxxxxx)
For cross-Region AWS service access, ensure the destination Region is reachable
Test connectivity from an Amazon EC2 instance in the same subnet
Network performance issues
Symptom: Slow data transfer or workflow timeouts.
Cause: Network throughput limitations or NAT Gateway saturation.
Solution:
Network throughput begins at 10 Gbps per ENI and scales up to 100 Gbps over a 60-minute period with sustained traffic
For workflows with immediate high-throughput requirements, please contact AWS Support
Monitor NAT Gateway metrics in CloudWatch to identify saturation
Consider deploying additional NAT Gateways in multiple Availability Zones for higher throughput
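To check NAT gateway metrics from a script, you can request them from CloudWatch. The sketch below builds a `get_metric_statistics` request for the `AWS/NATGateway` namespace. It defaults to the `ErrorPortAllocation` metric as one possible indicator, which is an assumption on our part rather than a HealthOmics recommendation, and `PacketsDropCount` is another metric worth checking. The helper names are illustrative.

```python
from datetime import datetime, timedelta, timezone


def nat_metric_request(nat_gateway_id, metric_name="ErrorPortAllocation", hours=1, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/NATGateway",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "NatGatewayId", "Value": nat_gateway_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,  # 5-minute buckets
        "Statistics": ["Sum"],
    }


def metric_total(response: dict) -> float:
    """Sum the datapoints in a get_metric_statistics response."""
    return sum(point["Sum"] for point in response.get("Datapoints", []))


def fetch_metric_total(nat_gateway_id, metric_name="ErrorPortAllocation"):
    """Fetch the metric for the last hour and return its total."""
    import boto3  # imported lazily so the helpers above work without AWS access

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**nat_metric_request(nat_gateway_id, metric_name))
    return metric_total(response)
```

A nonzero total for either metric during a slow run suggests looking more closely at the NAT gateway before changing the workflow itself.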
Workflow cannot reach the internet
Cause: The private subnets might not have a route to a NAT gateway, or security group rules might be blocking outbound traffic.
Solution:
Verify that the route table for your private subnets includes a route to a NAT gateway (0.0.0.0/0 → nat-xxxxxxxxx).
Check that security group rules allow outbound traffic on the required ports.
Verify that the NAT gateway is in a public subnet with a route to an internet gateway.
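The route table check can be sketched with boto3 as follows. The code looks up the route table explicitly associated with a subnet and reports where its 0.0.0.0/0 route points. The helper names are illustrative, and the subnet ID is a placeholder.

```python
def default_route_targets(response: dict) -> list:
    """Return (route table ID, target, state) for each 0.0.0.0/0 route."""
    targets = []
    for table in response.get("RouteTables", []):
        for route in table.get("Routes", []):
            if route.get("DestinationCidrBlock") != "0.0.0.0/0":
                continue
            # A private subnet is expected to point at a NAT gateway (nat-...)
            target = route.get("NatGatewayId") or route.get("GatewayId") or "other"
            targets.append((table["RouteTableId"], target, route.get("State")))
    return targets


def check_subnet_default_route(subnet_id: str) -> list:
    """Look up the subnet's associated route table and list its default routes."""
    import boto3  # imported lazily so the helper above works without AWS access

    response = boto3.client("ec2").describe_route_tables(
        Filters=[{"Name": "association.subnet-id", "Values": [subnet_id]}]
    )
    return default_route_targets(response)
```

An empty result means either that the route table has no default route or that the subnet has no explicit association and uses the VPC's main route table, in which case check that table instead. A route whose state is `blackhole` points at a target that no longer exists.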
Workflow run fails with connectivity errors
Cause: Network traffic might be blocked or misconfigured.
Solution:
Verify that the configuration is still in ACTIVE status by using GetConfiguration.
Create a VPC flow log on the ENIs in your VPC to inspect traffic. For more information, see VPC Flow Logs in the Amazon VPC User Guide.
Check the flow log for REJECT entries. If you see rejected packets, update your security group rules to allow the required outbound traffic.
If the flow log does not reveal a root cause, contact AWS Support.