Claude Code Skills System - Building Custom Workflows That Actually Run

Matthew Diakonov · 11 min read


The skills system in Claude Code turns natural language instructions into repeatable, composable workflows. You write a Markdown file, drop it in .claude/skills/, and suddenly your AI agent can execute complex multi-step operations with a single /skill-name command. This post covers how the system works, how to structure skills for reliability, and how to chain them into production pipelines.

What the Skills System Actually Is

A skill is a SKILL.md file that lives in .claude/skills/{skill-name}/. When you invoke it (via /skill-name or by asking the agent to use it), Claude reads the file and follows its instructions. That is the entire mechanism. There is no runtime, no SDK, no compilation step.

The simplicity is the point. Skills are just instructions, which means they can describe anything Claude can do: edit files, run shell commands, call APIs, interact with browsers via MCP, or orchestrate other tools.

| Component | Location | Purpose |
|---|---|---|
| Skill definition | .claude/skills/{name}/SKILL.md | Instructions the agent follows |
| Skill registry | ~/.claude/skill-registry.json | Index of all available skills across projects |
| Settings hooks | .claude/settings.json | Deterministic triggers that fire on events |
| CLAUDE.md | Project root or ~/.claude/ | Global context and constraints |

The distinction between skills and hooks matters. Skills are invoked on demand; you call them when you want something done. Hooks are automatic triggers that fire on specific events (before a tool runs, after a file edit, when the agent stops). Skills handle the "what," hooks handle the "when."
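To make the "when" side concrete, here is a sketch of a hooks entry in .claude/settings.json that runs a formatter after every file edit. The matcher pattern and command are hypothetical examples; check the Claude Code hooks documentation for the exact schema before relying on it:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npx prettier --write ." }
        ]
      }
    ]
  }
}
```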

Anatomy of a SKILL.md File

Every skill file starts with YAML frontmatter:

```yaml
---
name: deploy-staging
description: "Build, test, and deploy the current branch to staging. Runs type checking, unit tests, and pushes to the staging environment."
user_invocable: true
---
```

Three fields, all required in practice:

  • name: the slash command. /deploy-staging triggers this skill.
  • description: what the skill does. Claude uses this to decide whether a skill matches a user request, so be specific. "Deploy stuff" is worse than "Build, test, and deploy the current branch to staging."
  • user_invocable: set to true if you want it available as a slash command. Set to false for skills that only get called by other skills.
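For example, a helper skill meant only to be called by other skills might use frontmatter like this (the name and description here are hypothetical):

```yaml
---
name: verify-deploy
description: "Internal helper: poll the deployed URL until it returns HTTP 200 or a timeout is hit."
user_invocable: false
---
```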

After the frontmatter, the body is freeform Markdown. This is where you write the actual workflow. Claude reads the entire file and follows it step by step.

What Makes a Good Skill Body

The body should read like instructions for a competent colleague who has never seen this project before. That means:

  1. State the goal in one sentence at the top
  2. List preconditions (what must be true before starting)
  3. Describe each step with enough detail that there is exactly one interpretation
  4. Include the exact commands to run, not descriptions of commands
  5. Define success criteria so the agent knows when it is done

Here is a minimal but complete example:

````markdown
---
name: test-and-lint
description: "Run the full test suite and linter. Fail fast on first error."
user_invocable: true
---

# Test and Lint

Run tests and linting for the project. Stop at the first failure.

## Steps

1. Run type checking:
   ```bash
   npx tsc --noEmit 2>&1 | head -30
   ```
   If this fails, stop and report the type errors.

2. Run the test suite:
   ```bash
   npm test 2>&1
   ```
   If any test fails, stop and report which tests failed.

3. Run the linter:
   ```bash
   npm run lint 2>&1 | head -50
   ```
   Report any warnings or errors.

## Success

All three commands exit 0 with no errors.
````

Compare that to a vague skill that says "check the code quality." The explicit version leaves no room for the agent to guess what "quality" means or which tools to use.

Structuring Skills for Reliability

Having built 30+ skills and run them daily, we have found a few patterns that separate the ones that work every time from the ones that fail intermittently.

[Diagram: Reliable skill architecture — frontmatter (name + description) → preconditions (validate before work) → steps 1..N (explicit commands) → done. Pillars: narrow scope (one skill = one job), exact commands (no ambiguous prose), exit criteria (define "done" explicitly). Anti-pattern: vague instructions, multiple responsibilities, no verification — "check everything and make it good" yields unreliable, inconsistent results.]

Keep Each Skill Narrow

A skill that does one thing well is better than a skill that does five things inconsistently. We run 30+ skills reliably because each one has a single, well-defined job. A deploy skill deploys. A test skill tests. A review skill reviews. When you need a pipeline that does all three, you chain them (more on that below).

Use Exact Commands, Not Descriptions

"Run the tests" is ambiguous. npm test 2>&1 | head -50 is not. The agent will figure out what you mean either way, but exact commands eliminate a class of failure where the agent picks the wrong test runner, forgets to pipe stderr, or runs tests in the wrong directory.

Validate Before Doing Work

A skill that charges ahead without checking preconditions will fail in surprising ways. Good skills start with checks:

```markdown
## Preconditions

1. Confirm you are on the `main` branch: `git branch --show-current`
2. Confirm no uncommitted changes: `git status --porcelain`
3. Confirm the build passes: `npx next build 2>&1 | tail -5`
```

If any check fails, the skill should stop and explain why, not try to fix the precondition itself.
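The fail-fast shape of those checks can be sketched in a few lines of Python. This helper is purely illustrative — it is not part of the skills system, and in a real skill the body simply lists the shell commands for the agent to run:

```python
import subprocess

def run_preconditions(checks):
    """Run shell precondition checks in order; stop at the first failure.

    `checks` maps a human-readable name to a command that must exit 0.
    Returns "ok", or a message naming the failed check.
    """
    for name, cmd in checks.items():
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        if result.returncode != 0:
            # Stop and explain why; do not try to fix the precondition.
            return f"Precondition failed: {name} ({cmd!r} exited {result.returncode})"
    return "ok"
```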

Chaining Skills Into Pipelines

The real power of the skills system shows up when you compose skills. A skill can reference other skills by name, and Claude will invoke them in sequence. This is how you build pipelines that would otherwise require a CI system.

Here is a real example from our SEO pipeline. The gsc-seo-page skill:

  1. Reads a state file to find the next target query
  2. Claims the query (sets status to in_progress)
  3. Checks for duplicate content
  4. Writes an MDX blog post with frontmatter, body, tables, and SVG diagrams
  5. Runs npx next build to verify the page compiles
  6. Commits and pushes to main
  7. Waits for Vercel to deploy, then verifies HTTP 200
  8. Updates the state file with status: done

That is eight distinct operations orchestrated by a single skill file. The skill runs on a cron schedule via launchd, no human in the loop. When it fires, it picks up the highest-impression pending query from Google Search Console and ships a page for it.

State Files as Coordination Mechanism

When multiple skills or multiple agent instances need to share state, a JSON file in the repo works better than environment variables or databases. The pattern:

```json
{
  "queries": [
    {
      "query": "some search term",
      "status": "pending",
      "page_slug": null,
      "completed_at": null
    }
  ]
}
```

Skills read from and write to this file. The in_progress status prevents two concurrent runs from picking the same query. The done status with page_slug and completed_at creates an audit trail.

This approach is intentionally low-tech. JSON files are easy to inspect, easy to fix manually, and version-controlled alongside your code. You do not need Redis for a pipeline that runs once an hour.
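The claim step reduces to a few lines. Here is a minimal Python sketch assuming the schema above — in practice the skill just instructs the agent to edit the JSON file, so the function name and file path are illustrative:

```python
import json

def claim_next_query(path):
    """Mark the first pending query as in_progress and return its text.

    Assumes the state-file schema shown above. Returns None when
    nothing is pending.
    """
    with open(path) as f:
        state = json.load(f)
    for entry in state["queries"]:
        if entry["status"] == "pending":
            entry["status"] = "in_progress"
            # Persist the claim so a concurrent run skips this query.
            with open(path, "w") as f:
                json.dump(state, f, indent=2)
            return entry["query"]
    return None
```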

Integrating Skills with MCP Tools

Skills become significantly more powerful when combined with Model Context Protocol (MCP) servers. Each skill is essentially a mini wrapper around whatever tools the agent has access to: file system, shell, browser automation, APIs.

For example, a skill that posts to social media can:

  1. Use file tools to read the content
  2. Use Playwright MCP to open a browser, navigate to the platform, fill in forms, and click "Post"
  3. Use shell tools to log the result

The skill does not need to know how Playwright works internally. It just says "navigate to this URL" and "click this button." MCP handles the protocol layer.

| MCP Server | What It Enables | Example Skill Use |
|---|---|---|
| Playwright | Browser automation (forms, clicks, screenshots) | Social media posting, OAuth flows |
| File system | Read/write/edit files | Code generation, config updates |
| Gmail | Email send/receive/search | Notification skills, inbox triage |
| macOS accessibility | Native app control | Testing GUI applications |
| Custom servers | Anything with an API | Database queries, deployment triggers |

Common Pitfalls

  • Overloading a single skill. If your SKILL.md is over 300 lines, it is doing too much. Split it into focused sub-skills and chain them. A 500-line skill is hard to debug when step 47 fails.

  • Relying on implicit context. A skill should work when invoked cold, with no prior conversation. If your skill assumes the agent already knows which branch to use or which file to edit, it will break when run from a cron job or a fresh session.

  • Skipping verification. Every skill that writes output should verify that output. Wrote a file? Check it exists. Ran a build? Check exit code. Pushed to git? Check the remote. The 10 seconds spent verifying saves 10 minutes of debugging a silent failure.

  • Using vague descriptions in frontmatter. The description field is how Claude decides whether to use your skill. "Handles deployment" matches too many requests. "Build the Next.js app, run tests, deploy to Vercel staging via CLI, and post the preview URL to Slack" matches exactly the right ones.

  • Forgetting about concurrent runs. If your skill modifies shared state (a JSON file, a database row, a git branch), you need a locking mechanism. Even something as simple as a status field (in_progress) prevents two agents from stepping on each other.
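The status-field approach handles coordination through the state file itself; for coarser mutual exclusion, a lock file also works. A minimal Python sketch of the idea (os.O_EXCL makes the create-if-absent atomic on a local filesystem; the lock filename is arbitrary):

```python
import os

def try_acquire_lock(lockfile):
    """Atomically create a lock file; return False if another run holds it."""
    try:
        fd = os.open(lockfile, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.write(fd, str(os.getpid()).encode())  # record the holder for debugging
    os.close(fd)
    return True

def release_lock(lockfile):
    """Release by deleting the lock file."""
    os.remove(lockfile)
```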

Minimal Working Example

Here is a complete skill you can drop into any project:

````markdown
---
name: check-deps
description: "Check for outdated dependencies and report which ones have major version bumps available."
user_invocable: true
---

# Check Dependencies

Report outdated npm dependencies, highlighting major version bumps.

## Steps

1. Run the outdated check:
   ```bash
   npm outdated --json 2>/dev/null || true
   ```

2. Parse the JSON output. For each package where `latest` differs from
   `current` by a major version, flag it.

3. Present a table:

   | Package | Current | Latest | Type |
   |---------|---------|--------|------|
   | react   | 18.2.0  | 19.1.0 | major |
   | next    | 14.1.0  | 15.2.0 | major |

4. If no major bumps exist, report "All dependencies are on current
   major versions."

## Success

A clear summary of outdated packages is presented. No changes are made
to package.json.
````

Save that as .claude/skills/check-deps/SKILL.md, and you can run /check-deps from any conversation.
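The major-version comparison in step 2 boils down to something like this Python sketch (assuming plain dotted version strings with no prerelease tags — the function name is illustrative):

```python
def is_major_bump(current, latest):
    """True when the latest version's major number exceeds the current one."""
    return int(latest.split(".")[0]) > int(current.split(".")[0])
```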

Wrapping Up

The Claude Code skills system is a file-based workflow engine. You write Markdown instructions, the agent follows them. The key to making it work reliably is narrow scope per skill, explicit commands over vague descriptions, precondition checks, and verification steps. Chain skills together for complex pipelines, use state files for coordination, and combine with MCP tools for anything that needs browser or API access.

Fazm is an open-source macOS AI agent, available on GitHub.
