Learned Skills
What Are Learned Skills?
Learned skills are structured knowledge files that the AWS DevOps Agent generates from your Agent Space data. Each learned skill encodes a specific type of knowledge that the DevOps Agent draws on as it performs tasks. At launch, two learned skills are available: Agent Space Understanding and Tool Use Best Practices.
Agent Space Understanding
The Agent Space Understanding skill (understanding-agent-space) analyzes your connected cloud accounts, code repositories, and telemetry integrations to build a map of the resources and relationships in an Agent Space.
The skill produces a main SKILL.md file and a set of reference files. The main file contains: a plain-language system overview with key domain concepts; the deployment environments (AWS account and region pairs, Azure subscriptions and regions, and so on); a container-level architecture diagram showing how logical services connect; the request paths that are central to your application, with the components they traverse; and a mapping of code repositories to containers.
Each logical container receives a dedicated reference file describing its internal components (compute, data, messaging, network, and others) with resource types and physical identifiers such as ARNs, table names, and queue URLs. The reference file also captures observability coverage, including the alarms, dashboards, and monitors linked to each component. It also maps each component to its associated code repositories, packages, and infrastructure-as-code definitions, providing a complete traceability chain from source code to deployed resources.
Each critical request path receives a dedicated reference file describing the full end-to-end request flow at component granularity, from the entry point through each intermediate service, data store, and external dependency. The file includes a sequenced flow diagram showing the order of operations and interaction mechanisms between components, along with the responsibility of each participant. It also catalogs the observability signals relevant to the path: log group patterns for each hop, key metrics (latency, error rates, throttling, token quotas) with their alarm names and dimensions, and distributed trace spans that can be correlated across services and accounts.
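To make the layout described above easier to picture, the following Python sketch models the skill output as plain data classes. Every class name, field name, and sample value here is hypothetical and chosen only for illustration; this section does not specify the actual format of the SKILL.md file or its reference files.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One component inside a logical container (hypothetical shape)."""
    name: str
    category: str        # e.g. "compute", "data", "messaging", "network"
    resource_type: str
    physical_id: str     # e.g. an ARN, table name, or queue URL
    alarms: list[str] = field(default_factory=list)
    repositories: list[str] = field(default_factory=list)


@dataclass
class ContainerReference:
    """Per-container reference file: the components of one logical container."""
    container: str
    components: list[Component] = field(default_factory=list)


@dataclass
class RequestPathReference:
    """Per-request-path reference file: component names in call order."""
    name: str
    hops: list[str] = field(default_factory=list)


@dataclass
class AgentSpaceUnderstanding:
    """Main file contents plus the two kinds of reference files."""
    overview: str
    environments: list[str]
    containers: list[ContainerReference] = field(default_factory=list)
    request_paths: list[RequestPathReference] = field(default_factory=list)

    def unmapped_hops(self) -> set[str]:
        """Return request-path hops that no container reference describes."""
        known = {c.name for ref in self.containers for c in ref.components}
        used = {hop for path in self.request_paths for hop in path.hops}
        return used - known


# Made-up example values, for illustration only.
orders = ContainerReference(
    container="orders-service",
    components=[
        Component("orders-api", "compute", "AWS::Lambda::Function", "orders-api-fn"),
        Component("orders-table", "data", "AWS::DynamoDB::Table", "Orders"),
    ],
)
checkout = RequestPathReference(
    "checkout", hops=["orders-api", "orders-table", "payments-queue"]
)
skill = AgentSpaceUnderstanding(
    "Example shop", ["123456789012/us-east-1"], [orders], [checkout]
)
print(skill.unmapped_hops())  # {'payments-queue'}
```

A cross-check like `unmapped_hops` is one way to picture how the request-path files and the container files relate: each hop in a path should correspond to a component that some container reference file describes.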
Tool Use Best Practices
The Tool Use Best Practices skill analyzes the tool calls made in past investigations to extract effective usage patterns, common failure modes, and parameter guidance. This helps the DevOps Agent avoid known pitfalls and run investigations with fewer wasted steps. The skill produces a main file and a set of per-tool reference files. The main file serves as a routing index that lists each tool with the investigation scenarios it supports and links to the corresponding reference file.
Each per-tool reference file can include up to three sections:
Best Practices — Investigation-driven techniques extracted from successful tool usage, such as CloudWatch Logs Insights query templates, environment-specific metric namespaces and dimensions, and CloudTrail event source filters. Each entry is organized around an investigation scenario and includes concrete parameter values and examples observed in past investigations.
Common Errors — Recurring failure modes and their fixes. Each entry describes a specific error condition, such as querying an inaccessible account or constructing a malformed aggregation query, and provides a corrective action so the agent can avoid or recover from the error without wasting investigation steps.
Output Management — Guidance for tool calls that tend to return large responses. Each entry describes a parameter change or processing strategy that reduces output size while preserving diagnostic value.
When live infrastructure access is available, the skill validates patterns against your environment before including them. Confirmed patterns are stated with confidence, unconfirmed patterns use cautious language, and disproved patterns are excluded. This keeps the skill aligned with the current state of your infrastructure.
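The three validation outcomes described above can be sketched as a small filter. This is an illustrative model only: the `Validation` enum, the function names, the hedging phrase, and the sample patterns are all assumptions, and the sketch says nothing about how the skill actually performs validation.

```python
from enum import Enum
from typing import Optional


class Validation(Enum):
    CONFIRMED = "confirmed"
    UNCONFIRMED = "unconfirmed"
    DISPROVED = "disproved"


def render_pattern(text: str, status: Validation) -> Optional[str]:
    """Return the wording for a pattern, or None if it should be excluded."""
    if status is Validation.DISPROVED:
        return None  # disproved patterns are left out entirely
    if status is Validation.UNCONFIRMED:
        # Hypothetical cautious wording for patterns that were not verified.
        return f"May apply (not verified against this environment): {text}"
    return text  # confirmed patterns are stated as-is


def render_all(patterns: list[tuple[str, Validation]]) -> list[str]:
    """Render every pattern and drop the excluded ones."""
    rendered = (render_pattern(text, status) for text, status in patterns)
    return [line for line in rendered if line is not None]


# Made-up patterns, for illustration only.
patterns = [
    ("Scope log queries to the time window of the alarm.", Validation.CONFIRMED),
    ("Metrics are published under a custom namespace.", Validation.UNCONFIRMED),
    ("The audit trail covers every region.", Validation.DISPROVED),
]
for line in render_all(patterns):
    print(line)
```

With these sample inputs, the confirmed pattern is printed unchanged, the unconfirmed one carries the cautious prefix, and the disproved one does not appear.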
Managing Learned Skills
Updates — The DevOps Agent automatically generates and updates learned skills based on activity in your Agent Space. The following describes when each skill is updated.
The DevOps Agent generates an updated Tool Use Best Practices skill every 30 investigations.
The Agent Space Understanding skill is generated by the learning agent, which runs whenever the underlying topology has been updated, but at most once every 24 hours. The underlying topology is refreshed when you add or update an Agent Space association, such as connecting a new AWS account or updating a CI/CD integration.
The topology does not refresh automatically on a periodic schedule. To regenerate the topology manually, choose the Regenerate button on the Topology page in the operator app, or chat with the agent on the Topology page and ask it to update the topology understanding.
You can also manually trigger an update to the Agent Space Understanding skill from the Skills page of your Agent Space's web app.
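The automatic update rules above can be read as two simple conditions. The sketch below is one possible interpretation, not the service's implementation: it assumes the 30-investigation rule behaves like a counter since the last update and that the 24-hour limit is measured from the last learning-agent run. All function and parameter names are hypothetical, and manual triggers are not modeled.

```python
from datetime import datetime, timedelta
from typing import Optional

INVESTIGATIONS_PER_UPDATE = 30                    # from the text above
MIN_UNDERSTANDING_INTERVAL = timedelta(hours=24)  # "at most once every 24 hours"


def tool_use_update_due(investigations_since_last_update: int) -> bool:
    """Tool Use Best Practices: due once 30 investigations have accumulated."""
    return investigations_since_last_update >= INVESTIGATIONS_PER_UPDATE


def understanding_update_due(
    topology_updated_at: Optional[datetime],
    last_run_at: Optional[datetime],
    now: datetime,
) -> bool:
    """Agent Space Understanding: due when the topology changed since the last
    run and at least 24 hours have passed since that run (assumed reading)."""
    if topology_updated_at is None:
        return False  # no topology yet, nothing to learn from
    if last_run_at is None:
        return True   # first run after the topology exists
    if topology_updated_at <= last_run_at:
        return False  # topology unchanged since the last run
    return now - last_run_at >= MIN_UNDERSTANDING_INTERVAL


now = datetime(2025, 1, 2, 12, 0)
print(tool_use_update_due(29))  # False
print(tool_use_update_due(30))  # True
print(understanding_update_due(now - timedelta(hours=1), now - timedelta(hours=2), now))   # False
print(understanding_update_due(now - timedelta(hours=1), now - timedelta(hours=25), now))  # True
```

Under this reading, a topology change made shortly after a learning-agent run is picked up only once the 24-hour window has elapsed.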
Deactivation — Learned skills are active by default. The DevOps Agent loads every active learned skill at the start of each task. To stop a learned skill from being applied, deactivate it from the skill viewer in the operator app. Deactivating a skill does not delete it: the skill is retained and can be reactivated at any time. While a skill is deactivated, the DevOps Agent operates without that skill's knowledge.
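As a minimal sketch of the behavior described above, the following Python model keeps deactivated skills in storage but leaves them out of what is loaded for a task. The `SkillStore` class and its methods are hypothetical, as is the identifier `tool-use-best-practices`; only `understanding-agent-space` appears in this document.

```python
from dataclasses import dataclass


@dataclass
class LearnedSkill:
    name: str
    active: bool = True  # learned skills are active by default


class SkillStore:
    """Hypothetical in-memory model of an Agent Space's learned skills."""

    def __init__(self, skills: list[LearnedSkill]) -> None:
        self._skills = {skill.name: skill for skill in skills}

    def deactivate(self, name: str) -> None:
        # Deactivation only flips a flag; the skill itself is retained.
        self._skills[name].active = False

    def reactivate(self, name: str) -> None:
        self._skills[name].active = True

    def stored_skill_names(self) -> list[str]:
        return sorted(self._skills)

    def skills_for_task(self) -> list[str]:
        """Names of the skills that would be loaded at the start of a task."""
        return sorted(s.name for s in self._skills.values() if s.active)


store = SkillStore([
    LearnedSkill("understanding-agent-space"),
    LearnedSkill("tool-use-best-practices"),  # hypothetical identifier
])
store.deactivate("tool-use-best-practices")
print(store.skills_for_task())     # ['understanding-agent-space']
print(store.stored_skill_names())  # ['tool-use-best-practices', 'understanding-agent-space']
```

Calling `store.reactivate("tool-use-best-practices")` would make the skill part of the next task's load again, mirroring the "retained and can be reactivated" behavior.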
Topology view — The Topology page in your Agent Space's web app uses the Agent Space Understanding skill to visually display your Agent Space environment as logical containers and components. Choose any container to see its components, resource identifiers, and telemetry.