E00_Introduction_The_Agent_Trust_Problem | aws/agent-toolkit-for-aws

The Agent Trust Problem

A year of agents on AWS taught the cloud one lesson: the question was never how to give them more power. It was who they would answer to.

In February 2025, a backend engineer at a mid-sized fintech typed npx @awslabs/mcp into a terminal and got an AI coding agent wired straight into her AWS account. The agent could describe instances, list S3 buckets, call PutItem on DynamoDB, and even update Lambda functions, all from a single command. What it could not do was tell anyone it had done so. No CloudTrail record distinguished the agent's actions from the ones she took herself under the same IAM role. No CloudWatch metric separated her DescribeInstances calls from the agent's. The developer was inside the blast radius, the security team was outside the audit trail, and the AWS Labs MCP server in the middle was, by design, a thin pipe.

A year later, that pipe is still there. It is just no longer the product. The product is the door around it.

AWS's agent-toolkit-for-aws repository, GA-tagged and Apache-2.0, is the formal answer to a question AWS Labs' MCP servers could not answer: how do you let an AI coding agent act on AWS without losing observability and without throttling the developer? The answer is not a better MCP server. There is still one managed AWS MCP Server, accessed through a pinned mcp-proxy-for-aws@1.6.3 package, fronted by a regional HTTPS endpoint at https://aws-mcp.<region>.api.aws/mcp. The answer, instead, is everything *around* that pipe: a curated library of 107 skills, three cross-host manifest shims (Claude Code, Codex, Cursor), four plugin bundles, one regional MCP endpoint, and — the part I want you to remember when the series ends — one PreToolUse hook that refuses to let a secret leak.
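To make the pinning concrete, here is a minimal Python sketch that builds the regional endpoint from the documented pattern and emits an MCP client entry that launches the pinned proxy. The uvx launcher, the positional endpoint argument, and the aws-mcp server key are my assumptions, not the toolkit's shipped configuration. The repository's own .mcp.json files are the authority on the real shape.

```python
import json

PROXY_PACKAGE = "mcp-proxy-for-aws"
PROXY_VERSION = "1.6.3"  # pinned; a floating "latest" could change behavior without warning


def regional_endpoint(region: str) -> str:
    """Build the regional AWS MCP Server URL from the pattern the toolkit documents."""
    if not region or any(ch.isspace() for ch in region):
        raise ValueError(f"invalid region: {region!r}")
    return f"https://aws-mcp.{region}.api.aws/mcp"


def server_entry(region: str) -> dict:
    """Hypothetical MCP client entry that launches the pinned proxy through uvx.

    The uvx launcher and the positional endpoint argument are assumptions;
    check the proxy's own documentation for its real command-line interface.
    """
    return {
        "command": "uvx",
        "args": [f"{PROXY_PACKAGE}@{PROXY_VERSION}", regional_endpoint(region)],
    }


if __name__ == "__main__":
    # "aws-mcp" is an arbitrary server key chosen for this example.
    print(json.dumps({"mcpServers": {"aws-mcp": server_entry("us-east-1")}}, indent=2))
```

The point of the sketch is the version string: the proxy is the only moving part between the agent and the endpoint, so it is the one dependency you never let float.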

That sentence sounds modest. It is not. If you have shipped an AI agent into a regulated AWS environment, or if you are a platform engineer who has had to write the IAM policy that decides what your agent can and cannot touch, you already know that the hard part was never "does the agent reach the API." The hard part is the boundary. And the boundary, in this toolkit, is the hook.

I started this analysis treating the Agent Toolkit for AWS as just another MCP project — a rebranding exercise, a marketplace entry, a re-skin of what AWS Labs had already shipped. Reading the README's third paragraph changed that. The paragraph that names the three differentiators — IAM condition keys that distinguish agent actions from human actions, CloudWatch metrics and CloudTrail audit logging for every request, and skills that have been end-to-end evaluated — is not a feature list. It is a thesis. The toolkit exists *because* the AWS Labs version could not provide those three things, and the team that built it decided that the gap was large enough to justify a new distribution.

The thesis is auditable. Every claim in the README's third paragraph corresponds to a concrete artifact in the repository, and that correspondence is what makes the toolkit interesting to a senior engineer. The IAM condition key claim maps to the aws:CalledVia context key, which appears in the policy guidance for the aws-core and aws-agents plugins. The CloudWatch and CloudTrail claim maps to the metadata bus of mcp-proxy-for-aws@1.6.3, which tags every request with INSTALL_SOURCE=agent-toolkit so AWS's backend can meter it. The end-to-end evaluation claim maps to the size of the skill library, which stops at 107. I read that number as a budget, not an accident: the team can only evaluate so many skills end to end per quarter. The claim, the artifact, the budget, the boundary. You can see them line up.
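Here is what a policy built on that key can look like, as a sketch rather than the toolkit's published guidance. It denies a few destructive actions only when a service made the request on the caller's behalf, so the same role keeps those permissions when the human calls the API directly. The service principal is a placeholder I have not verified. Because aws:CalledVia is multivalued, the condition uses the ForAnyValue set operator.

```python
import json

# Placeholder, not a verified value: substitute the service principal that AWS
# documents for the managed AWS MCP Server before using this policy.
MCP_SERVICE_PRINCIPAL = "<aws-mcp-service-principal>"


def deny_when_called_via_agent(actions: list[str]) -> dict:
    """Deny `actions` only when a service made the call on the caller's behalf."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyDestructiveActionsViaAgent",
                "Effect": "Deny",
                "Action": actions,
                "Resource": "*",
                # aws:CalledVia is multivalued, so it needs a ForAnyValue set operator.
                # When the key is absent (a direct human call), the condition is false
                # and this Deny does not apply.
                "Condition": {
                    "ForAnyValue:StringEquals": {"aws:CalledVia": [MCP_SERVICE_PRINCIPAL]}
                },
            }
        ],
    }


policy = deny_when_called_via_agent(["dynamodb:DeleteTable", "s3:DeleteBucket"])
print(json.dumps(policy, indent=2))
```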

This series is about the five pieces of evidence in the repository that prove that thesis. Each chapter opens with a concrete artifact (a JSON file, a Python script, a SKILL.md frontmatter block), explains the design choice it embodies, and then traces its real-world consequence for the reader. By the end, you should be able to answer three questions with conviction: whether your team should adopt this toolkit, which of the four plugins to install first, and, most importantly, whether the boundary AWS has drawn is the boundary you want.

A quick roadmap. E01 maps the surface: four plugins, 107 skills, and a manifest format zoo that will matter the moment you try to write your own. The ratio of skills to plugins is roughly 27 to 1, and that ratio is the most important fact in the codebase, because it tells you what the project is actually for. E02 opens the .mcp.json files and shows that the AWS MCP Server is one service with four viewpoints, not four services. The four configurations are not redundant — they are four *postures* the agent takes, and the differences between them are the most under-read evidence in the repository. E03 is the chapter where the red thread pays off: the PreToolUse hook, the IAM condition keys, the secret resolution pattern, and why the headline security primitive is a 30-line Python script that runs before every Bash call. If you remember nothing else from the series, remember the hook. E04 closes the loop: how to read a SKILL.md, how the routing graph works, how the AWS Labs migration actually proceeds, and the decision tree for adopting the toolkit as a solo developer, a platform team, or a regulated enterprise.
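I will save the real hook for E03, but the shape of the primitive is easy to sketch. The following is a hypothetical Python PreToolUse hook. It assumes Claude Code's documented hook contract: the pending tool call arrives as JSON on stdin, and exit code 2 blocks it, with stderr shown to the agent. Codex and Cursor may wire hooks differently, and the secret patterns are my own illustrations, not the toolkit's.

```python
import json
import re
import sys

# Illustrative patterns only; the toolkit's real hook may check different things.
SECRET_PATTERNS = [
    re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),                # AWS access key IDs
    re.compile(r"aws_secret_access_key\s*[=:]", re.IGNORECASE),  # inline secret assignment
]


def check_tool_call(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message); exit code 2 asks the host to block the call."""
    if payload.get("tool_name") != "Bash":
        return 0, ""  # only shell commands are inspected in this sketch
    command = payload.get("tool_input", {}).get("command", "")
    for pattern in SECRET_PATTERNS:
        if pattern.search(command):
            return 2, "Blocked: the command appears to contain an AWS credential."
    return 0, ""


def main() -> int:
    """Hook entry point: read the pending tool call from stdin, report via stderr."""
    payload = json.load(sys.stdin)
    code, message = check_tool_call(payload)
    if message:
        print(message, file=sys.stderr)  # on exit code 2, Claude Code shows stderr to the agent
    return code
```

Registered as the PreToolUse command, the script would end with sys.exit(main()) under a __main__ guard. Note the blind spot even in this toy: a pattern check only sees credentials that appear literally in the command string, not a secret the command reads from disk or the environment at run time.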

One warning, before we start. The repository is small: 924 files in total, with 49c4592 ("chore: bump proxy version (#119)") as the latest commit. But it is dense. The interesting moves are not in the README; they are in a hooks/ directory that only one of the four plugins ships, in the validator that enforces kebab-case on every skill name, and in the SKILL.md frontmatter descriptions that read like routing tables more than documentation. If you skim, you will see a packaging job. If you read closely, you will see a governance answer.
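A kebab-case check is small enough to show whole. This is a sketch of the rule, not the repository's validator; the regex and the sample skill names are my assumptions.

```python
import re

# kebab-case: lowercase alphanumeric words joined by single hyphens
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def invalid_skill_names(names: list[str]) -> list[str]:
    """Return the names that are not kebab-case, preserving input order."""
    return [name for name in names if not KEBAB_CASE.fullmatch(name)]


print(invalid_skill_names(["s3-bucket-policy", "S3BucketPolicy", "lambda_deploy"]))
```

In a CI job, any non-empty result would fail the build, which is what turns a naming convention into an enforced one.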

That distinction is the whole point of the series.

A second warning, on style. I write in the first person throughout. The reason is that the toolkit makes architectural choices that I have personally got wrong in production (not in public repositories, and not in a way I will name), and reading the source forced me to revise my priors. The most useful thing this series can do is trace that revision: where I started, what I read, what changed. Where you and I disagree, I will name the disagreement. Where the toolkit is wrong, I will say so. Where it is right for reasons that are not obvious, I will try to surface the reasons. Senior engineers do not want marketing copy. They want another engineer's working notes.

A third warning, on the AWS Labs question. AWS Labs MCP servers still exist. The README is explicit: "AWS Labs MCP servers, skills, and plugins will continue to work and accept contributions, and over time the best of AWS Labs will be transitioned to the Agent Toolkit for AWS." That sentence is doing two things at once. It is reassuring existing users. And it is signaling that the Agent Toolkit is the direction AWS is investing in, not the AWS Labs packages. If you are starting a new project, the question is not "should I use AWS Labs or the Agent Toolkit?" — it is "how do I migrate from AWS Labs to the Agent Toolkit when my security team asks for an audit story?" E04 answers that question directly.

Imagine you are a platform engineer at a Series B fintech. Your CTO just approved Cursor for the engineering team. The first thing you do, before you wire a single authentication token, is read the boundary. What can the agent touch? What does it log? What does it refuse? The Agent Toolkit for AWS is the document that answers those three questions in that order. By the time you finish E04, you will know which of its answers to keep, which to harden further, and which to challenge.

The series has one red thread, and I want to name it now so you can watch it as you read. The Agent Toolkit for AWS is the answer to a question AWS Labs MCP servers could not answer: how to let AI coding agents act on AWS without the security team losing observability or the developer losing control. The answer is not a better MCP server. The answer is a curated library of 107 skills, three cross-host manifest shims, one regional MCP endpoint, and one PreToolUse hook that refuses to let a secret leak. Every chapter in this series will pull on one of those four threads. By the end, you will have a working answer to the trust problem that AWS Labs could not solve — and you will know whether to adopt the answer. You will also know which of the four plugins to install first, what the manifest format zoo implies for the plugins you write yourself, why pinning the proxy version is non-negotiable, where the PreToolUse hook's blind spots are, and how the SKILL.md frontmatter descriptions form a routing graph that the team treats as a feature, not a bug.
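To see why descriptions can behave like routing tables, here is a rough Python sketch that parses flat name and description frontmatter and records an edge whenever one skill's description mentions another skill by name. The parser handles only simple key: value lines (a real tool would use a YAML library), and the edge rule and sample skill names are hypothetical, not how the toolkit itself routes.

```python
import re

# Flat frontmatter block at the very top of a SKILL.md file.
FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse simple `key: value` lines; nested or multi-line YAML is not handled."""
    match = FRONTMATTER.match(text)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def routing_graph(skill_files: dict[str, str]) -> dict[str, set[str]]:
    """Map each skill name to the other skills its description mentions by name."""
    descriptions = {}
    for text in skill_files.values():
        meta = parse_frontmatter(text)
        if "name" in meta:
            descriptions[meta["name"]] = meta.get("description", "")
    graph = {}
    for name, description in descriptions.items():
        graph[name] = {
            other
            for other in descriptions
            # Treat hyphens as part of a name so "s3" does not match inside "s3-bucket".
            if other != name
            and re.search(rf"(?<![\w-]){re.escape(other)}(?![\w-])", description)
        }
    return graph
```

Run over a skills directory, the result is an adjacency map you can inspect for dead ends and cycles, which is the kind of reading E04 does by hand.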

How would you design an MCP server for an agent that already has root on your laptop?