Lesson 18: agent skills - reusable knowledge for agents

In Lesson 3, we learned about tools - functions that let agents take actions like calling APIs, querying databases, and running code. Tools are about doing things.

Skills are about knowing things. A skill packages domain expertise - instructions, best practices, decision frameworks, and reference materials - into a modular unit that an agent can discover and use when needed.

Think about the difference between giving someone a wrench (a tool) and giving them a repair manual (a skill). The wrench lets them turn bolts. The manual tells them which bolts to turn, in what order, and what to watch out for.

ELI5: Think of skills like recipe cards in a kitchen

A professional kitchen has tools (knives, pans, ovens) and recipe cards. A new chef can pick up a knife without instructions. But to make a specific dish, they need the recipe card - it tells them which tools to use, in what order, at what temperature, and what the result should look like.

Agent skills work the same way. They are the recipe cards that tell an agent how to approach a specific type of task, which tools to use, and what good output looks like.

Key takeaway: Skills encode domain expertise as portable, reusable packages. Tools let agents act. Skills tell agents how and when to act.


Consider this scenario: your team has an agent that helps with code reviews. You want it to follow your team’s specific review checklist, flag common patterns you care about, and format its feedback in a particular way.

You could put all of this in the agent’s system prompt, but system prompts get crowded fast. Packing review instructions, deployment procedures, documentation standards, and testing conventions into one prompt produces a bloated context window and an agent that is mediocre at everything.

Skills solve this by letting you:

  1. Package expertise separately - Each skill is its own file, focused on one domain
  2. Load on demand - Skills are only loaded when relevant, saving context window space
  3. Share across teams - A well-written skill can be reused across projects and agents
  4. Iterate independently - Update a skill without changing the agent’s core configuration

Context cost is the key technical motivation. Every token in the context window has a cost, both in money and in attention. If you load 50,000 tokens of instructions at startup, the agent pays that cost on every single turn, even when most of those instructions are irrelevant.

Skills use progressive disclosure to keep the cost low:

  • At startup, load only skill names and descriptions (~100 tokens each)
  • When a skill is triggered, load its full instructions
  • Only load reference materials when the instructions explicitly need them

One analysis showed this approach reducing a 150,000-token workflow to approximately 2,000 tokens at startup.
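
The arithmetic behind this kind of saving can be sketched in a few lines. The 100-tokens-per-description and 50,000-token figures come from this lesson; the ten-turn conversation, the 20 installed skills, the 3,000-token skill, and the function names are illustrative assumptions. Real counts depend on the model's tokenizer and on any prompt caching the platform applies.

```python
# Rough context-cost comparison: eager loading vs. progressive disclosure.
# All numbers are illustrative; real counts depend on the tokenizer.

TOKENS_PER_DESCRIPTION = 100  # approximate L1 cost per skill (from this lesson)


def eager_cost(instruction_tokens: int, turns: int) -> int:
    """Tokens spent on instructions when everything is in context every turn."""
    return instruction_tokens * turns


def progressive_cost(num_skills: int, activated_skill_tokens: int,
                     turns: int, activation_turn: int) -> int:
    """Tokens spent when only descriptions are always in context.

    The activated skill's full instructions are counted from
    `activation_turn` (1-based) through the last turn.
    """
    metadata = num_skills * TOKENS_PER_DESCRIPTION * turns
    active_turns = max(0, turns - activation_turn + 1)
    return metadata + activated_skill_tokens * active_turns


eager = eager_cost(instruction_tokens=50_000, turns=10)
progressive = progressive_cost(num_skills=20, activated_skill_tokens=3_000,
                               turns=10, activation_turn=4)
print(eager)        # 500000
print(progressive)  # 41000
```

In this hypothetical conversation the progressive approach spends well under a tenth of the eager budget, because the full instructions are only counted for the seven turns after activation.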


Skills follow an open specification maintained at agentskills.io. The format is simple:

my-skill/
    SKILL.md       # Required: metadata + instructions
    references/    # Optional: additional documentation
    assets/        # Optional: templates, schemas, data files
    scripts/       # Optional: executable code

The only required file is SKILL.md. Everything else is optional.

A SKILL.md file has two parts: YAML frontmatter for metadata, and Markdown content for instructions.

---
name: code-review
description: >
  Reviews pull requests following team standards. Checks for
  security issues, test coverage, naming conventions, and
  documentation. Use when asked to review code or a PR.
---
## Code Review Process
When reviewing code, follow these steps in order:
### 1. Security check
- Look for hardcoded secrets, SQL injection, XSS vulnerabilities
- Check that user input is validated and sanitized
- Verify authentication and authorization on new endpoints
### 2. Test coverage
- New public functions should have tests
- Edge cases should be covered (empty input, null values, errors)
- Check that tests actually assert meaningful behavior
### 3. Naming and structure
- Functions and variables should have descriptive names
- Files should be in the correct directory per project conventions
- No single function should exceed 50 lines
### 4. Documentation
- Public APIs should have docstrings
- Non-obvious logic should have inline comments
- README should be updated if behavior changes
### Output format
Present findings as a list grouped by category (Security, Tests,
Style, Docs). For each finding, include the file path, line number,
severity (high/medium/low), and a suggested fix.

The frontmatter supports these fields:

| Field | Required | Purpose |
|---|---|---|
| name | Yes | Unique identifier, lowercase with hyphens (e.g., code-review) |
| description | Yes | What the skill does and when to trigger it (up to 1024 chars) |
| license | No | License for the skill |
| compatibility | No | Environment requirements (e.g., “Requires Python 3.10+”) |
| metadata | No | Arbitrary key-value pairs (author, version, tags) |

The description field is critical. It is the primary way agents decide whether to activate a skill. Write it to clearly describe both what the skill does and when it should be used.
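
A small validator makes the two required fields concrete. This is a minimal sketch, not part of the specification: the function name is hypothetical, and the regular expression is one reading of "lowercase with hyphens" that may be stricter or looser than what the spec actually allows.

```python
import re

# Rules taken from the field table in this lesson. The pattern is an
# assumption about what "lowercase with hyphens" means.
NAME_PATTERN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")
MAX_DESCRIPTION_CHARS = 1024


def validate_frontmatter(meta: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []

    name = meta.get("name")
    if not isinstance(name, str) or not name:
        problems.append("missing required field: name")
    elif not NAME_PATTERN.fullmatch(name):
        problems.append(f"name {name!r} must be lowercase words joined by hyphens")

    description = meta.get("description")
    if not isinstance(description, str) or not description.strip():
        problems.append("missing required field: description")
    elif len(description) > MAX_DESCRIPTION_CHARS:
        problems.append(
            f"description is {len(description)} chars (limit {MAX_DESCRIPTION_CHARS})"
        )

    return problems


print(validate_frontmatter({"name": "code-review", "description": "Reviews PRs."}))
print(validate_frontmatter({"name": "Code Review"}))
```

The first call reports no problems. The second reports two: the name contains uppercase letters and a space, and the description is missing.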


Skills are designed to load incrementally. This is the key architectural idea that makes them practical:

Level 1: Metadata (loaded at startup)

At startup, the agent loads only the name and description from the frontmatter of every installed skill. This costs roughly 100 tokens per skill. Even with 50 skills installed, the startup cost is only about 5,000 tokens.

The agent uses this metadata to decide: “Given the current task, is this skill relevant?”
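
One way to picture the Level 1 metadata is as a short catalog that stays in the agent's context on every turn. The sketch below is hypothetical: the `SkillMetadata` class, the `render_skill_index` function, and the catalog wording are assumptions for illustration, and each platform formats this information in its own way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillMetadata:
    name: str
    description: str


def render_skill_index(skills: list[SkillMetadata]) -> str:
    """Build the short catalog of skills that stays in context on every turn."""
    lines = ["Available skills (load one by name when the task matches):"]
    for skill in sorted(skills, key=lambda s: s.name):
        # Collapse whitespace so multi-line descriptions stay on one line.
        summary = " ".join(skill.description.split())
        lines.append(f"- {skill.name}: {summary}")
    return "\n".join(lines)


skills = [
    SkillMetadata("deploy", "Deploys to staging. Use when asked to ship a build."),
    SkillMetadata("code-review",
                  "Reviews pull requests.\nUse when asked to review code or a PR."),
]
print(render_skill_index(skills))
```

Only names and descriptions appear here. The model reads this catalog, compares it with the current task, and asks for the full instructions of whichever skill matches.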

Level 2: Instructions (loaded on activation)

When the agent decides a skill is relevant, it loads the full SKILL.md body. This is where the step-by-step instructions, decision frameworks, and examples live. The recommendation is to keep this under 5,000 tokens.

Level 3: Resources (loaded as needed)

Files in the references/, assets/, and scripts/ directories are loaded only when the Level 2 instructions reference them. These might include:

  • references/security-checklist.md - Extended security review criteria
  • assets/api-schema.json - API specification for validation
  • assets/response-template.md - Template for formatted output
  • scripts/run-linter.sh - Script the agent can execute

This three-level approach means you can write very detailed skills without paying the context cost upfront.

Startup:        [L1: name + description]    ~100 tokens per skill
                          |
Task matches:   [L2: full instructions]     ~2,000-5,000 tokens
                          |
As needed:      [L3: reference files]       Variable
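
The three levels can be sketched as a small loader. This is an illustration under stated assumptions, not how any particular platform implements it: `SkillLoader` and `split_skill_md` are hypothetical names, each skill's directory is assumed to be named after the skill, and the frontmatter parser handles only plain `key: value` pairs and `>` folded blocks (a real loader would use a YAML library).

```python
import tempfile
from pathlib import Path


def split_skill_md(text: str) -> tuple[dict[str, str], str]:
    """Split SKILL.md into (frontmatter fields, Markdown body).

    Simplified parser: `key: value` pairs and `>` folded blocks only.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")
    end = lines.index("---", 1)  # closing delimiter; ValueError if absent
    meta: dict[str, str] = {}
    key = None
    for line in lines[1:end]:
        if line[:1] in (" ", "\t") and key is not None:
            # Indented continuation line of a folded (`>`) block.
            meta[key] = (meta[key] + " " + line.strip()).strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            meta[key] = "" if value == ">" else value
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


class SkillLoader:
    """Loads skills level by level from a directory of skill folders."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def load_metadata(self) -> dict[str, str]:
        """Level 1: name -> description for every installed skill.

        The file is read from disk, but only these two fields would be
        placed in the model's context.
        """
        index = {}
        for skill_md in sorted(self.root.glob("*/SKILL.md")):
            meta, _ = split_skill_md(skill_md.read_text(encoding="utf-8"))
            index[meta["name"]] = meta["description"]
        return index

    def load_instructions(self, name: str) -> str:
        """Level 2: the full Markdown body, read only on activation."""
        path = self.root / name / "SKILL.md"
        _, body = split_skill_md(path.read_text(encoding="utf-8"))
        return body

    def load_resource(self, name: str, relative_path: str) -> str:
        """Level 3: one reference file, read only when instructions cite it."""
        return (self.root / name / relative_path).read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp) / "code-review"
    (skill_dir / "references").mkdir(parents=True)
    (skill_dir / "SKILL.md").write_text(
        "---\nname: code-review\ndescription: >\n  Reviews pull requests.\n"
        "  Use when asked to review code.\n---\n## Code Review Process\n"
        "See `references/security-checklist.md`.\n",
        encoding="utf-8",
    )
    (skill_dir / "references" / "security-checklist.md").write_text(
        "- No hardcoded secrets\n", encoding="utf-8"
    )

    loader = SkillLoader(Path(tmp))
    print(loader.load_metadata())                      # Level 1
    print(loader.load_instructions("code-review"))     # Level 2
    print(loader.load_resource("code-review",
                               "references/security-checklist.md"))  # Level 3
```

Each method corresponds to one level, so the caller decides when the more expensive text enters the context.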

Skills, tools (function calling), and MCP work at different layers. Understanding the distinction helps you decide which to use:

| Dimension | Skills | Tools / Function Calling | MCP |
|---|---|---|---|
| What it provides | Knowledge and instructions | Executable functions | Standardized protocol for tool integration |
| Analogy | A recipe card | A kitchen appliance | A power outlet standard |
| Nature | Natural language guidance | Code that runs | JSON-RPC communication layer |
| Execution | LLM interprets instructions | Deterministic function call | Protocol for calling remote tools |
| Latency | Local (just text) | Depends on function | Network round-trip |
| Best for | Encoding expertise, workflows, review criteria | Taking actions (API calls, file ops, queries) | Connecting to external services with auth and discovery |
| Context cost | Low (progressive loading) | Medium (schema per tool) | Higher (full schemas upfront) |

In a typical agent, all three are used:

  1. Skills tell the agent how to approach the task and which tools to use
  2. Tools (function calling) let the agent execute actions
  3. MCP provides a standard way to connect to remote tool servers

Example: A “deploy-to-staging” skill might include instructions like:

  • Step 1: Run the test suite using the run_tests tool
  • Step 2: Check the staging environment status using the Kubernetes MCP server
  • Step 3: If tests pass and staging is healthy, deploy using the deploy tool
  • Step 4: Verify the deployment by checking health endpoints

The skill provides the workflow logic. The tools and MCP servers provide the execution capability.
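
To make that split explicit, the four steps above can be written out as plain control flow with the tools passed in as callables. This is only a sketch of the decision logic the skill describes: in a real agent the language model reads the skill text and issues the tool calls itself, and every function name and return string below is a hypothetical stand-in.

```python
from typing import Callable


def deploy_to_staging(
    run_tests: Callable[[], bool],
    staging_is_healthy: Callable[[], bool],
    deploy: Callable[[], None],
    check_health_endpoints: Callable[[], bool],
) -> str:
    """Follow the four workflow steps, stopping at the first failed check."""
    if not run_tests():                  # Step 1: test suite
        return "aborted: tests failed"
    if not staging_is_healthy():         # Step 2: staging status
        return "aborted: staging unhealthy"
    deploy()                             # Step 3: deploy
    if not check_health_endpoints():     # Step 4: verify
        return "deployed, but health check failed"
    return "deployed and verified"


# Stub tools standing in for real tool and MCP calls.
calls = []
result = deploy_to_staging(
    run_tests=lambda: True,
    staging_is_healthy=lambda: True,
    deploy=lambda: calls.append("deploy"),
    check_health_endpoints=lambda: True,
)
print(result)  # deployed and verified
```

The function body plays the role of the skill (what to do, in what order, when to stop), while the four arguments play the role of tools and MCP servers (how each action is actually carried out).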


A skill should do one thing well. Instead of a “development” skill that covers everything, create separate skills for code review, deployment, documentation, and testing.

Skills are interpreted by a language model. Be explicit about:

  • When to use the skill (triggering conditions)
  • What steps to follow (ordered process)
  • How to handle edge cases (decision points)
  • What good output looks like (examples or templates)

Real expertise includes knowing when to deviate from the standard process:

### Handling large PRs (>500 lines changed)
If the PR changes more than 500 lines:
- Focus review on the most critical files first (API endpoints, auth, data models)
- Skip cosmetic issues (formatting, naming) unless they affect readability
- Suggest splitting the PR if the changes cover multiple unrelated concerns

Include examples of what the skill’s output should look like:

### Example output
**Security - High**
`src/api/auth.py:45` - Password is compared using `==` instead of
`hmac.compare_digest()`. This is vulnerable to timing attacks.
Suggested fix: Replace with `hmac.compare_digest(stored_hash, provided_hash)`

If your instructions are getting long, move detailed reference material to L3 files in the references/ directory and reference them from the main instructions:

For the full security checklist, refer to `references/security-checklist.md`.
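
A quick size check can tell you when it is time to move material out of the main instructions. The sketch below uses a four-characters-per-token rule of thumb, which is a rough assumption for English text and not a real tokenizer; the function names are hypothetical, and the 5,000-token budget is the recommendation from this lesson.

```python
import math

L2_TOKEN_BUDGET = 5_000  # recommended ceiling for the SKILL.md body
CHARS_PER_TOKEN = 4      # rough heuristic for English text, not a tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def check_l2_budget(skill_body: str,
                    budget: int = L2_TOKEN_BUDGET) -> tuple[bool, int]:
    """Return (within_budget, estimated_tokens) for a SKILL.md body."""
    tokens = estimate_tokens(skill_body)
    return tokens <= budget, tokens


long_body = "## Code Review Process\n" + "- check something\n" * 2000
ok, tokens = check_l2_budget(long_body)
if not ok:
    print(f"About {tokens} tokens: consider moving detail into references/")
```

For an accurate count, swap `estimate_tokens` for the tokenizer that matches your model.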

Google’s Agent Development Kit (ADK) supports skills through the SkillToolset class. Here is a conceptual overview of how it works:

Place skill directories in a skills/ folder within your agent project:

my-agent/
    agent.py
    skills/
        code-review/
            SKILL.md
            references/
                security-checklist.md
        deploy/
            SKILL.md
            scripts/
                pre-deploy-check.sh

The agent discovers and loads skills from this directory. Only the L1 metadata is loaded at startup. Full instructions load when the agent activates the skill.

For dynamic skill creation or modification, ADK also supports defining skills in code using the Skill model class. This is useful when skill content needs to change based on runtime conditions.

For detailed implementation guidance, see the ADK Skills documentation.


The Agent Skills specification has been adopted by multiple platforms:

| Platform | Support | Details |
|---|---|---|
| Claude Code (Anthropic) | Yes | Skills as /slash-commands, anthropics/skills repo |
| Google ADK | Yes | SkillToolset class, file-based and code-based |
| GitHub Copilot | Yes | Works in VS Code, CLI, and Copilot coding agent |
| OpenAI | Yes | Agents SDK with skills support |
| Spring AI | Yes | Java ecosystem via spring-ai-agent-utils |

The specification is maintained by a community working group and published at agentskills.io. Because the format is just Markdown files in a directory, skills are portable across platforms that support the spec.


---
name: database-migration
description: >
  Creates and reviews database migrations. Use when the user asks to
  add, modify, or remove database tables or columns, or when reviewing
  migration files.
---
## Creating Migrations
1. Verify the current migration state: run `alembic heads` to check for conflicts
2. Create the migration: `alembic revision --autogenerate -m "description"`
3. Review the generated migration file for:
   - Correct up/down operations (both directions should work)
   - No data loss in down migration
   - Appropriate indexes for new columns
   - Nullable columns for existing tables (to avoid breaking existing rows)
4. Test the migration: `alembic upgrade head` then `alembic downgrade -1`
## Common Pitfalls
- Adding a NOT NULL column to an existing table without a default value
  will fail if the table has existing rows. Always add a default or make
  it nullable first, then backfill.
- Renaming a column safely takes several steps: add the new column,
  migrate the data, then drop the old column. Alembic's autogenerate does
  not detect renames; it emits a drop and an add instead.
- Large table alterations should be done in batches on production. Add a
  note in the migration file if the table has >1M rows.

---
name: incident-response
description: >
  Guides incident response and post-mortem creation. Use when there is
  a production incident, outage, or when creating post-mortem documents.
---
## During an Incident
1. Assess severity using the service dashboard at `monitoring.internal/overview`
2. Check recent deployments: `gcloud run revisions list --service=api --limit=5`
3. Check error rates: `gcloud logging read "severity>=ERROR" --limit=50 --freshness=1h`
4. If a recent deployment is suspect, roll back:
   `gcloud run services update-traffic api --to-revisions=PREVIOUS_REVISION=100`
## After Resolution
Create a post-mortem document using the template in `assets/postmortem-template.md`
with these sections filled in:
- Timeline of events (with timestamps)
- Root cause analysis
- Impact (users affected, duration, data loss if any)
- What went well in the response
- Action items with owners and due dates

| Situation | Use Skills | Use Something Else |
|---|---|---|
| Team has specific review criteria | Yes | - |
| Agent needs to follow a multi-step workflow | Yes | - |
| Agent needs to call an API | No | Use a tool or MCP |
| Agent needs project context (build commands, structure) | No | Use AGENTS.md |
| Workflow is simple and one-off | No | Just put it in the prompt |
| Knowledge changes rarely and is domain-specific | Yes | - |
| Knowledge changes frequently or needs live data | No | Use RAG or tools |

  • Skills package domain expertise as portable, reusable Markdown files
  • They use progressive disclosure (L1/L2/L3) to minimize context window cost
  • Skills tell agents how and when to act; tools let agents actually act
  • The SKILL.md file is the only required component - frontmatter for metadata, body for instructions
  • Write skills focused on one domain, with clear steps, decision points, and output examples
  • Keep L2 instructions under 5,000 tokens; move detailed material to L3 references
  • Skills are supported across multiple platforms: Claude Code, ADK, GitHub Copilot, OpenAI, Spring AI
  • Skills, tools, and MCP work at different layers and complement each other

