Skills, Migration, and What Comes Next
The 107 SKILL.md files are not documentation. They are routing tables written in natural language, and the routing graph they form is the reason the toolkit is bounded at 107 — and not 1,070.
Key Takeaways
- Every SKILL.md has the same shape: a frontmatter description with triggers and disambiguation clauses, a body of workflow steps, and constraints in MUST / SHOULD / MAY vocabulary.
- The 107-skill cap is a budget consequence of end-to-end evaluation: the team can evaluate 107 skills, not 1,070. The routing graph is what keeps the cap low while covering the surface.
- AWS Labs packages will continue to work — the migration is a curated transition, not a hard cutover. The Agent Toolkit is the direction AWS is investing in.
- The adoption path is by persona: solo developer (install aws-core + pin proxy), platform team (audit all four plugins, write IAM condition key policy), regulated enterprise (30-day pilot, capture CloudTrail evidence, pentest the hook).
- The repository is in active release cadence: the latest commit bumps mcp-proxy-for-aws, and the skills directory tracks new AWS services (Lambda durable functions, S3 Vectors, system-table SQL).
In early 2025, the awslabs GitHub organization was a sprawl. A developer who wanted an MCP server for AWS would search npm, find a dozen packages — @awslabs/mcp-server-aws, @awslabs/mcp-server-athena, @awslabs/mcp-server-cost-explorer — and assemble their own stack. Each package had its own README, its own version cadence, its own bug tracker, its own maintainer rotation. The skill libraries were scattered across separate repos. The install commands were inconsistent. The IAM posture of each package was the developer's responsibility. The audit story was: "we shipped a bunch of MCP servers; good luck."
The engineer who typed npx @awslabs/mcp in February and got a fully privileged session with no audit trail was standing in that sprawl. The Agent Toolkit for AWS is the answer to the sprawl, and the answer is not a single better MCP server. It is a curated, bounded, governance-aware library. The boundary is the skill library, not the plugin count. The plugin count is four. The skill count is 107, and 107 is deliberate.
I read the SKILL.md files as documentation the first time. They are not documentation. They are routing tables.
The anatomy of a SKILL.md
Open plugins/aws-data-analytics/skills/storing-and-querying-vectors/SKILL.md. The first 11 lines are YAML frontmatter; the rest is the workflow. The frontmatter is the routing. The body is the answer. They are not the same kind of artifact.
---
name: storing-and-querying-vectors
description: >-
  Store and query vector embeddings using Amazon S3 Vectors, a cost-effective long-term
  vector storage service with its own API namespace (s3vectors). Triggers on: create
  S3 vector bucket, vector index, store embeddings, semantic search, RAG vector storage,
  similarity search, vector database, migrate from other vector databases. Do NOT
  use for: querying tabular data (use querying-data-lake), S3 object storage, or hundreds/thousands
  of sustained QPS (use OpenSearch).
version: 1
---
The description field is the routing rule. It is one paragraph of natural language, and it is doing two things at once. It tells the agent *when* to use this skill ("Triggers on: create S3 vector bucket, vector index, store embeddings, semantic search, RAG vector storage, similarity search, vector database, migrate from other vector databases"). And it tells the agent *when not* to use this skill ("Do NOT use for: querying tabular data (use querying-data-lake), S3 object storage, or hundreds/thousands of sustained QPS (use OpenSearch)"). The disambiguation clause — "use querying-data-lake," "use OpenSearch" — is a pointer to another node in the routing graph.
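The two halves of the description can be pulled apart mechanically. What follows is a minimal sketch, assuming the paragraph always contains a "Triggers on:" sentence followed by a "Do NOT use for:" sentence, as the frontmatter above does. The helper parse_description is hypothetical and is not part of the toolkit; the agent itself reads the paragraph as natural language rather than parsing it.

```python
import re

# The description text from the frontmatter above, joined into one line.
DESCRIPTION = (
    "Store and query vector embeddings using Amazon S3 Vectors, a cost-effective long-term "
    "vector storage service with its own API namespace (s3vectors). Triggers on: create "
    "S3 vector bucket, vector index, store embeddings, semantic search, RAG vector storage, "
    "similarity search, vector database, migrate from other vector databases. Do NOT "
    "use for: querying tabular data (use querying-data-lake), S3 object storage, or "
    "hundreds/thousands of sustained QPS (use OpenSearch)."
)

CLAUSES = re.compile(
    r"Triggers on:\s*(?P<triggers>.*?)\.\s*Do NOT use for:\s*(?P<exclusions>.*?)\.?$"
)


def parse_description(description):
    """Hypothetical helper: split a description into triggers and exclusions.

    Returns (triggers, exclusions), where exclusions is a list of
    (case, redirect_target_or_None) pairs.
    """
    text = " ".join(description.split())  # undo YAML line folding
    match = CLAUSES.search(text)
    if match is None:
        raise ValueError("description lacks a 'Triggers on' / 'Do NOT use for' pair")

    triggers = [item.strip() for item in match.group("triggers").split(",")]

    exclusions = []
    # Assumes no commas inside the "(use ...)" parentheses.
    for item in match.group("exclusions").split(","):
        item = re.sub(r"^or\s+", "", item.strip())
        redirect = re.search(r"\(use ([^)]+)\)", item)
        case = re.sub(r"\s*\(use [^)]+\)", "", item).strip()
        exclusions.append((case, redirect.group(1) if redirect else None))
    return triggers, exclusions


triggers, exclusions = parse_description(DESCRIPTION)
for case, target in exclusions:
    print(f"{case!r} -> {target}")
```

Two of the three exclusions carry a redirect target. The third, "S3 object storage," names a case without pointing anywhere, which is a routing dead end the agent has to resolve on its own.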
Read the next 150 lines of that file and you find the workflow: a six-step procedure for creating a vector bucket, generating embeddings, putting vectors, querying. Each step has a "Constraints" subsection. Constraints begin with the words "You MUST," "You SHOULD," or "You MAY" — the RFC 2119 vocabulary. The agent's behavior is constrained in natural language, by a vocabulary the LLM is unusually good at following, before any code is generated.
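The constraint vocabulary is regular enough to index. Here is a small sketch that groups constraint lines by their RFC 2119 keyword. The sample step text is invented for illustration and is not quoted from the repository, and extract_constraints is a hypothetical helper, not something the toolkit ships.

```python
import re
from collections import defaultdict

# INVENTED sample step, written in the style the SKILL.md files use.
SAMPLE_STEP = """\
Step 1: Create a vector bucket

Constraints:
- You MUST confirm the target Region before creating the vector bucket
- You SHOULD reuse an existing vector bucket when one fits the workload
- You MAY skip this step if the user supplies a bucket name
- This bullet is not a constraint and is ignored
"""

# RFC 2119 also defines MUST NOT and SHOULD NOT, so they are matched first.
CONSTRAINT = re.compile(
    r"^\s*[-*]\s+You (MUST NOT|SHOULD NOT|MUST|SHOULD|MAY)\b\s*(.*)$"
)


def extract_constraints(markdown):
    """Hypothetical helper: map each RFC 2119 keyword to its constraint texts."""
    by_level = defaultdict(list)
    for line in markdown.splitlines():
        match = CONSTRAINT.match(line)
        if match:
            by_level[match.group(1)].append(match.group(2).strip())
    return dict(by_level)


constraints = extract_constraints(SAMPLE_STEP)
for level, items in constraints.items():
    print(level, items)
```

An index like this is the kind of thing a reviewer could use to check that every step has at least one MUST. Nothing in the source says the team does so.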
That is the pattern. Every one of the 107 SKILL.md files has the same structure: frontmatter description with triggers and disambiguation, body with workflow steps, constraints in MUST/SHOULD/MAY vocabulary. The 107 files differ in domain, in depth, in the AWS services they cover. They do not differ in shape.
graph LR
S[User intent: store vectors] --> R{Which skill?}
R -- "RAG / similarity search" --> A[storing-and-querying-vectors]
R -- "hundreds/thousands QPS" --> B[amazon-opensearch-service]
R -- "tabular data" --> C[querying-data-lake]
R -- "S3 object storage" --> D[signing-in-to-aws + aws-sdk-python-usage]
A --> A1[create vector bucket]
A --> A2[create index with immutables]
A --> A3[generate embeddings]
A --> A4[put vectors batch <500]
A --> A5[query vectors]
B --> B1[provision domain]
B --> B2[hybrid / k-NN search]
C --> C1[Athena federated query]
C --> C2[Glue Data Catalog lookup]
The diagram is not in the repository. I drew it from the routing clauses in the SKILL.md frontmatter descriptions. Every arrow in the diagram is a sentence in some skill's description field. The routing graph is encoded in natural language, distributed across 107 files, and only exists in the moment the agent reads a description and decides to load or not load a skill. That is a design choice with consequences.
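The way I derived the arrows can be sketched in a few lines. This is an illustration under assumptions, not a tool from the repository: the first exclusion clause is quoted from the frontmatter shown earlier, the second is a hypothetical clause I made up so the graph has more than one source node, and redirect_edges and unresolved_targets are hypothetical helpers.

```python
import re

# Exclusion clauses keyed by skill name.
DESCRIPTIONS = {
    # Quoted from the storing-and-querying-vectors frontmatter.
    "storing-and-querying-vectors": (
        "Do NOT use for: querying tabular data (use querying-data-lake), "
        "S3 object storage, or hundreds/thousands of sustained QPS (use OpenSearch)."
    ),
    # HYPOTHETICAL clause, invented for this example.
    "querying-data-lake": (
        "Do NOT use for: vector similarity search (use storing-and-querying-vectors)."
    ),
}

# A case followed by "(use <target>)"; cases without a target yield no edge.
REDIRECT = re.compile(r"([^,(]+?)\s*\(use ([^)]+)\)")


def redirect_edges(descriptions):
    """Return (source_skill, reason, target) for every '(use ...)' pointer."""
    edges = []
    for skill, description in descriptions.items():
        _, _, clause = description.partition("Do NOT use for:")
        for reason, target in REDIRECT.findall(clause):
            reason = re.sub(r"^\s*(or\s+)?", "", reason).strip()
            edges.append((skill, reason, target.strip()))
    return edges


def unresolved_targets(edges, known_skills):
    """Targets that do not match any known skill name exactly."""
    return sorted({target for _, _, target in edges if target not in known_skills})


edges = redirect_edges(DESCRIPTIONS)
for source, reason, target in edges:
    print(f"{source} --[{reason}]--> {target}")
print("unresolved:", unresolved_targets(edges, DESCRIPTIONS))
```

In this toy input the pointer "OpenSearch" does not resolve to a skill name. The description names the service, and in my diagram I mapped it to amazon-opensearch-service by hand. That is the cost of a graph encoded in prose: some edges only resolve when a reader, human or model, interprets them.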
Why 107, not 1,070
The cost of a SKILL.md is paid in two currencies: agent context window and developer maintenance. Each skill the agent loads consumes tokens from the model's context. Each skill in the library consumes maintainer attention — a skill that drifts from the actual API is worse than no skill, because the agent will confidently generate code from it. The toolkit's 107 skills represent a balance point: broad enough to cover the surface an agent will encounter in a typical AWS project (the categories are analytics, database, serverless, networking, EC2, operations, migration, security, storage, system-table, web), narrow enough that each skill is end-to-end evaluated.
The README's third paragraph names the differentiator: "Agent skills that have undergone thorough end-to-end evaluations, so you can be confident that workflows will complete successfully." The 107-skill cap is not an accident. It is the result of a budget: the team can evaluate 107 skills end-to-end, and it cannot evaluate 1,070. The routing graph is the tool that lets the team keep the cap low while still covering the surface.
The graph works because the disambiguation clauses are explicit. The agent does not have to guess whether to use storing-and-querying-vectors or amazon-opensearch-service for a vector search. The storing-and-querying-vectors description says "Do NOT use for: ... hundreds/thousands of sustained QPS (use OpenSearch)." The agent reads both descriptions, sees the disambiguation, and picks correctly. If the disambiguation were missing, the agent would have to make a judgment call, and judgment calls in routing are where an agent tends to waste time and tokens.
This is the pattern that scales. Every new skill added to the library must include a "Triggers on" clause and a "Do NOT use for" clause. The validator at tools/validate.py does not enforce the disambiguation language (the validator checks kebab-case, name length, description length, and JSON manifest schema) — that is enforced by the team's review process, not by a tool. But the library is shaped as if it were enforced, because the disambiguation is what keeps 107 skills from collapsing into noise.
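A check for the two clauses would be easy to write. The sketch below is hypothetical and is not the code in tools/validate.py: the length limits are assumed values chosen for illustration, and lint_skill is a name I made up. It combines the kind of checks the validator is described as doing (kebab-case, name length, description length) with the clause check that is currently left to review.

```python
import re

KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
MAX_NAME_LENGTH = 64            # ASSUMED limit, not taken from tools/validate.py
MAX_DESCRIPTION_LENGTH = 1024   # ASSUMED limit, not taken from tools/validate.py
REQUIRED_CLAUSES = ("Triggers on:", "Do NOT use for:")


def lint_skill(name, description):
    """Hypothetical lint: return a list of problems (empty means clean)."""
    problems = []
    if not KEBAB_CASE.match(name):
        problems.append(f"name {name!r} is not kebab-case")
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name is longer than {MAX_NAME_LENGTH} characters")

    # Collapse whitespace first: YAML folding can split a clause across lines.
    normalized = " ".join(description.split())
    if len(normalized) > MAX_DESCRIPTION_LENGTH:
        problems.append(f"description is longer than {MAX_DESCRIPTION_LENGTH} characters")
    for clause in REQUIRED_CLAUSES:
        if clause not in normalized:
            problems.append(f"description is missing a '{clause}' clause")
    return problems


GOOD_DESCRIPTION = """\
Store and query vector embeddings using Amazon S3 Vectors. Triggers on: create
S3 vector bucket, vector index, store embeddings. Do NOT
use for: querying tabular data (use querying-data-lake).
"""

print(lint_skill("storing-and-querying-vectors", GOOD_DESCRIPTION))
print(lint_skill("Storing_Vectors", "Store vectors in S3."))
```

The whitespace normalization matters. In the real frontmatter, "Do NOT" and "use for:" sit on different lines, so a naive substring check against the raw file would report a missing clause.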
The AWS Labs migration
The README addresses the migration directly:
"In 2025, AWS began releasing MCP servers, skills, and plugins as part of AWS Labs. The Agent Toolkit for AWS is the successor to those tools. We recommend using the Agent Toolkit for AWS, because it offers key features including:
- IAM condition keys that distinguish between agent actions and human actions...
- CloudWatch metrics and CloudTrail audit logging for every request...
- Agent skills that have undergone thorough end-to-end evaluations...
AWS Labs MCP servers, skills, and plugins will continue to work and accept contributions, and over time the best of AWS Labs will be transitioned to the Agent Toolkit for AWS to ensure that customers can access the broadest array of tooling and guidance for their agents."
Two things to notice in that paragraph, and a third outside it. First, "will continue to work": the migration is not a hard cutover. Developers with existing AWS Labs integrations do not need to drop everything and rewrite. Second, "the best of AWS Labs will be transitioned": the team is curating, not blanket-converting. Some AWS Labs packages will not make the cut, and the team is choosing which to bring forward. Third, the CONTRIBUTING.md says "This project is not accepting external code contributions at this time." The migration is a one-way door: AWS picks which skills move, AWS writes the agent-toolkit versions, and external contributors cannot submit their own. The toolkit is treated as a first-party AWS distribution, not a community open-source project.
For a developer who already uses @awslabs/mcp-server-X, the practical question is whether the Agent Toolkit's aws-core plugin is a drop-in replacement. The answer is: for the three differentiators (IAM condition keys, CloudWatch metrics, end-to-end-evaluated skills), yes, it is strictly better. For everything else (the surface area of which AWS services have skills, the depth of each skill's workflow, the speed of new service coverage), the AWS Labs packages may be ahead in some areas and behind in others. The migration is a feature-for-feature comparison you have to do yourself.
What comes next
The repository's commit log, as of 49c4592, is short, but the file diffs tell a story. The most recent commit is chore: bump proxy version (#119), which moves mcp-proxy-for-aws to a newer minor version. That is routine maintenance, but it signals an active release cadence. The toolkit is not a one-time drop. The skills directory includes skills for three Lambda execution models that did not exist a year ago (aws-lambda-durable-functions, aws-lambda-managed-instances, aws-lambda-microvms). A new S3 Vectors primitive has a dedicated skill. A new system-table SQL layer for CloudWatch, SageMaker Catalog, and S3 has three skills. The library is growing in the directions AWS's own service catalog is growing.
If you are a solo developer, the action is short. Install aws-core first. Install the rest of the skills with npx skills add aws/agent-toolkit-for-aws/skills. Pin your proxy. Read the routing graph in the SKILL.md frontmatters: it is the difference between a 30-second answer and a 5-minute one.
If you are a platform team, the action is medium. Audit the four plugins against the workloads your team ships. Adopt aws-core for general work, aws-data-analytics if you have data engineers, aws-agents if you are building on Bedrock AgentCore, and aws-agents-for-devsecops if you have an incident response workflow. Write the IAM condition key policy in the README's example. Enable CloudTrail data events on the agent's role.
If you are a regulated enterprise, the action is long. Adopt the toolkit behind a feature flag, run it for 30 days with a single team, capture CloudTrail evidence that the agent's actions are scoped by aws:CalledVia, run a pentest against the PreToolUse hook, and review the secret resolution pattern end-to-end. The toolkit is built for your audit story. The audit story is the point.
The engineer who typed npx @awslabs/mcp in February 2025 had no audit story. The engineer who types /plugins install aws-core@claude-plugins-official in mid-2026 has CloudTrail, IAM condition keys, a PreToolUse hook, and 107 evaluated skills. The toolkit that gives the agent power is the same toolkit that takes the power back, audits it, and routes it. The trust problem that AWS Labs could not answer is answered here — not by removing the agent's reach, but by drawing a boundary the agent cannot cross without leaving a record. The record is the boundary. The record is the governance. The record is the toolkit.