Key Capabilities
Autonomous Incident Response
AWS DevOps Agent begins investigating as soon as an alert arrives, whether from an AWS service such as CloudWatch or from an external tool such as ServiceNow or PagerDuty. It correlates metrics, logs, traces, and recent deployments drawn from platforms including GitHub, GitLab, Datadog, Dynatrace, New Relic, and Splunk.
The agent identifies probable root causes and updates incident channels in Slack with findings, timelines, and recommendations. It can also act as a virtual incident coordinator by managing communication and stakeholder updates.
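To make the idea of correlating an alert with recent deployments concrete, here is a minimal Python sketch. It is not the agent's actual logic, which this overview does not describe. The `Deployment` record, the `recent_deployments` helper, and the 30-minute lookback window are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Deployment:
    # Hypothetical record; a real system would carry more metadata.
    service: str
    commit: str
    deployed_at: datetime


def recent_deployments(alert_time, service, deployments, window=timedelta(minutes=30)):
    """Return deployments to `service` made within `window` before the alert, newest first."""
    candidates = [
        d for d in deployments
        if d.service == service
        and timedelta(0) <= alert_time - d.deployed_at <= window
    ]
    return sorted(candidates, key=lambda d: d.deployed_at, reverse=True)


alert_time = datetime(2025, 1, 1, 12, 0)
deployments = [
    Deployment("checkout", "a1b2c3", datetime(2025, 1, 1, 11, 50)),
    Deployment("checkout", "d4e5f6", datetime(2025, 1, 1, 9, 0)),
    Deployment("search", "0a9b8c", datetime(2025, 1, 1, 11, 55)),
]

suspects = recent_deployments(alert_time, "checkout", deployments)
print([d.commit for d in suspects])  # ['a1b2c3']
```

A time-window filter like this only narrows the candidate list. Deciding which change is the probable root cause would still require the metric, log, and trace evidence described above.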
Interactive Investigations
On-call teams can use the AWS DevOps Agent web app to trigger investigations, view analysis details, examine the application topology, and ask follow-up questions in natural language. Operators can refine the agent’s analysis by providing additional context, adjusting scopes, or steering the investigation toward specific resources or logs.
Proactive Operational Improvements
Beyond resolving incidents, AWS DevOps Agent detects patterns across historical events to uncover systemic gaps. It provides targeted recommendations in areas such as observability, infrastructure configuration, capacity tuning, testing, and deployment pipeline quality. This helps teams move from reactive processes to proactive reliability engineering.
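One simple way to picture pattern detection across historical events is counting how often the same kind of root cause recurs. The sketch below assumes each past incident has already been labeled with a `root_cause_category`. That field, the category names, and the `recurring_gaps` helper are hypothetical and are not taken from the product.

```python
from collections import Counter


def recurring_gaps(incidents, min_count=2):
    """Return (category, count) pairs seen at least `min_count` times, most frequent first."""
    counts = Counter(incident["root_cause_category"] for incident in incidents)
    return [(category, n) for category, n in counts.most_common() if n >= min_count]


# Hypothetical incident history with pre-assigned categories.
history = [
    {"id": "INC-1", "root_cause_category": "missing-alarm"},
    {"id": "INC-2", "root_cause_category": "capacity"},
    {"id": "INC-3", "root_cause_category": "missing-alarm"},
    {"id": "INC-4", "root_cause_category": "untested-rollback"},
    {"id": "INC-5", "root_cause_category": "capacity"},
    {"id": "INC-6", "root_cause_category": "missing-alarm"},
]

gaps = recurring_gaps(history)
print(gaps)  # [('missing-alarm', 3), ('capacity', 2)]
```

In this toy history, the repeated "missing-alarm" category would point toward an observability recommendation, while the one-off "untested-rollback" incident falls below the threshold.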
Intelligent Application Topology
The agent continuously builds and updates a topology graph that maps AWS resources, such as compute, storage, and networking, along with their relationships and deployment histories. During an investigation, this topology lets the agent reason about how a change in one part of the system may affect another.
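A topology graph of this kind can be sketched as a directed dependency graph with a breadth-first traversal that finds everything downstream of a changed resource. The `Topology` class, its method names, and the resource names below are illustrative assumptions, not the agent's internal data model.

```python
from collections import defaultdict, deque


class Topology:
    """Toy dependency graph: an edge records that one resource depends on another."""

    def __init__(self):
        # Maps a resource to the set of resources that depend on it directly.
        self._dependents = defaultdict(set)

    def add_dependency(self, resource, depends_on):
        self._dependents[depends_on].add(resource)

    def impacted_by(self, changed):
        """Return every resource that depends on `changed`, directly or transitively."""
        seen = {changed}
        queue = deque([changed])
        while queue:
            node = queue.popleft()
            for dependent in self._dependents.get(node, ()):
                if dependent not in seen:
                    seen.add(dependent)
                    queue.append(dependent)
        return seen - {changed}


topo = Topology()
topo.add_dependency("api-service", "orders-db")
topo.add_dependency("web-frontend", "api-service")
topo.add_dependency("report-job", "orders-db")
topo.add_dependency("search-service", "search-index")

print(sorted(topo.impacted_by("orders-db")))
# ['api-service', 'report-job', 'web-frontend']
```

Here a change to the hypothetical `orders-db` reaches `web-frontend` only through `api-service`, which is the kind of indirect relationship a topology graph makes visible.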
Extensible Tool Integrations
Teams can connect additional tools using the Model Context Protocol (MCP), so the agent can pull data from open-source platforms such as Prometheus and Grafana, or from internal tooling. This creates a unified investigation surface across complex multicloud and hybrid environments.
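For readers unfamiliar with MCP, tool invocations travel as JSON-RPC 2.0 messages using the `tools/call` method. The sketch below builds such a request by hand to show its shape. The tool name `query_prometheus` and its `query` argument are hypothetical, and how AWS DevOps Agent itself issues these calls is not described here.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool exposed by an MCP server that fronts Prometheus.
request = build_tool_call(
    1,
    "query_prometheus",
    {"query": 'rate(http_requests_total{status="500"}[5m])'},
)
print(json.dumps(request, indent=2))
```

In practice an MCP client library would handle session setup, tool discovery, and transport, so a message like this would rarely be written by hand.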