Building AI Agent Skills for Domain-Specific Business Workflows

The general-purpose AI coding agent handles a surprisingly wide range of tasks competently. It reads documentation, writes code, debugs failures, and, given enough context, reasons through unfamiliar problems. But most teams deploying these agents in real operational environments hit the same friction point: providing that context is itself a task, one that grows with the complexity of the domain.

A data analyst who needs the agent to extract supplier lead times from a folder of mixed-format purchase orders does not want to explain what a purchase order is, how their company’s naming conventions work, and what output format downstream systems expect — every single time. A developer maintaining infrastructure monitoring does not want to paste the same alert thresholds and severity definitions into every conversation. The repetition is the problem.

Skills — packaged, reusable configurations that bundle domain knowledge and behavioral instructions into a single invocable unit — are a direct answer to that problem. This post describes how they are built, how they are deployed in recurring workflows, and where they reliably deliver value.

What a Skill Actually Is

A skill is not a model feature or an API capability. It is a file — specifically, a Markdown file placed in a directory the agent scans at session start. When the agent encounters a request that matches the skill’s description, it loads the skill’s instructions and follows them, rather than reasoning from scratch about how to approach the task.

The file contains four things: a description block that tells the agent when to activate the skill; an instruction set that specifies how to handle the task class; optional tool access declarations that expand or restrict what the agent can reach during execution; and optional examples demonstrating correct behavior on realistic inputs.

What the file does not contain is hardcoded business data. A skill encodes how to approach a class of tasks — the domain vocabulary, the reasoning pattern, the output format — not the tasks themselves. That distinction, between encoding approach and encoding data, is the central design question in building skills that remain useful over time.
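The mechanics are simple enough to sketch. Below is a minimal loader that scans a hypothetical `skills/` directory for `SKILL.md` files and separates each one's frontmatter from its instruction body. The directory layout and the line-by-line frontmatter parsing are illustrative assumptions, not a reproduction of any particular agent's implementation:

```python
import re
from pathlib import Path

def load_skills(root="skills"):
    """Scan for SKILL.md files and parse their frontmatter blocks."""
    skills = []
    for path in Path(root).glob("*/SKILL.md"):
        text = path.read_text(encoding="utf-8")
        match = re.match(r"^---\n(.*?)\n---\n(.*)$", text, re.DOTALL)
        if not match:
            continue  # no frontmatter: skip rather than guess
        frontmatter, body = match.groups()
        meta = {}
        for line in frontmatter.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                meta.setdefault(key.strip(), value.strip())
        skills.append({
            "name": meta.get("name"),
            "description": meta.get("description", ""),
            "instructions": body,
        })
    return skills
```

The point of the sketch is the separation of concerns: the description is metadata the agent consults when deciding whether to activate, while the body is only loaded into the working context once activation happens.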

When to Encode Domain Knowledge, and When Not To

Most skill implementations that fail do so because they encode too much. Teams, enthusiastic about the concept, write a skill that includes specific vendor names, current pricing tiers, a client’s preferred report format, and the exact API endpoints in use today. Six months later, half of that information is wrong, the skill has drifted from reality, and no one is sure which parts to trust.

The practical rule: encode what changes slowly, leave to the agent what changes frequently.

Domain vocabulary changes slowly. A skill for processing import documentation should know what a certificate of origin is, what the deductive method means in customs valuation, and what the regulatory framework looks like for the target market. That knowledge, once correct, stays correct for years. It belongs in the skill.

Specific filing formats, counterparty names, current regulatory thresholds, and exact deadlines change constantly. These belong in the data the agent receives at runtime — not in the skill definition. The skill teaches the agent how to read a class of documents. The actual document arrives as context.

A useful test: if you would need to update the skill every time something changes in the business environment, the wrong things are in the skill. Well-designed skills survive organizational change. Poorly designed ones become stale almost immediately.

The Structure of SKILL.md

The canonical skill file structure is straightforward. A frontmatter block identifies the skill and defines when to activate it; the body defines how to use it.

---
name: import-document-processor
description: >-
  Use when processing import declarations, ACP applications,
  or customs documentation for Singapore, Japan, or EU markets
---

## Context
Background on the domain: what this class of documents is, why it
exists, what the agent needs to know to interpret one intelligently.

## Approach
Step-by-step instructions for handling the task: what to look for,
how to handle ambiguous cases, what to escalate to human review.

## Output Format
Exactly what the output should contain, in what structure.

## Edge Cases
Known failure modes and how to handle them.

The description block is operationally the most important part. It determines whether the agent activates the skill at all. A vague description — “use for documents” — produces erratic activation. A precise one, naming the document types, business contexts, and relevant markets, produces reliable behavior.

At the same time, the description should not be over-specified. If it is too narrow, the agent misses genuine matches. The goal is a description that a thoughtful practitioner would recognize as matching their situation — no more, no less. This requires calibration in practice, and it is worth testing edge cases during initial deployment to verify the activation boundaries are where you expect them.
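One way to make that calibration concrete is to write the boundary cases down as data. The activation decision belongs to the model, so the heuristic below is only a stand-in for it — a crude vocabulary-overlap check, useful for spotting a description that is obviously too vague or too narrow before deployment. The sample requests and expected outcomes are hypothetical:

```python
# Boundary cases worth recording during initial deployment.
# Expected values come from domain judgment, not from the code.
ACTIVATION_CASES = [
    ("Process this Singapore import declaration", True),
    ("Summarize last week's sales calls", False),
    ("Check this EU customs documentation packet", True),
    ("Draft a purchase order for a new vendor", False),
]

def crude_match(request, description):
    """Crude overlap between description vocabulary and the request;
    a stand-in for the model's judgment, not a reproduction of it."""
    stop = {"use", "when", "for", "or", "and", "the", "a", "to"}
    desc_words = {w.strip(",.").lower() for w in description.split()} - stop
    req_words = {w.strip(",.").lower() for w in request.split()}
    return len(desc_words & req_words) >= 2

def check_boundaries(description):
    """Return the sample requests whose activation outcome is wrong."""
    return [request for request, expected in ACTIVATION_CASES
            if crude_match(request, description) != expected]
```

An empty result from `check_boundaries` does not prove the description is right; a non-empty one is a cheap early warning that it is wrong.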

The /loop Pattern for Recurring Tasks

Some business workflows run on a schedule. Daily extraction from an API endpoint. Weekly aggregation of supplier reports. Monthly generation of client-facing summaries. These are not one-off interactions — they are operational processes that run continuously, regardless of whether someone initiates them manually.

The /loop pattern turns a skill-driven interaction into a recurring operation:

/loop 24h process-supplier-reports

This executes the named skill at the specified interval, passing current context at each invocation. The skill handles the work; the loop handles the scheduling.
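What sits underneath a command like that can be sketched in a few lines. The interval grammar (`24h`, `15m`) and the `invoke` callback are assumptions for illustration; a real runner would add logging, error isolation, and drift correction:

```python
import time

def parse_interval(spec):
    """Convert an interval like '24h' or '15m' into seconds."""
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    return int(spec[:-1]) * units[spec[-1]]

def run_loop(interval_spec, skill_name, invoke, max_runs=None):
    """Invoke a skill at a fixed interval. invoke() is expected to
    gather fresh context each time, so the skill always sees current
    data; the loop itself knows nothing about the skill's contents."""
    interval = parse_interval(interval_spec)
    runs = 0
    while max_runs is None or runs < max_runs:
        invoke(skill_name)  # each call passes current context
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        time.sleep(interval)
```

The division of labor in the sketch mirrors the one in the text: the skill handles the work, the loop handles the scheduling, and neither needs to know the other's internals.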

Recurring skills require specific design considerations that interactive skills do not. They need to be idempotent — executing the same skill twice on the same input should not create duplicate records, send duplicate notifications, or corrupt downstream state. They need explicit error behavior — clear definition of what happens when input is malformed, when an API is unavailable, or when expected data is absent. And if downstream systems consume their output, that output needs to be machine-readable, not just legible to humans.
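The idempotency requirement, in particular, is mechanical enough to enforce in code rather than in instructions. A common approach is a content-hash dedupe key checked before any downstream write; the `seen` store and `write` callback here are hypothetical placeholders for whatever persistence the real pipeline uses:

```python
import hashlib
import json

def fingerprint(record):
    """Stable content hash used as a dedupe key."""
    canonical = json.dumps(record, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def process_once(record, seen, write):
    """Write a record downstream only if its fingerprint is new,
    so re-running the loop on the same input is a no-op."""
    key = fingerprint(record)
    if key in seen:
        return False  # already processed: skip silently
    write(record)
    seen.add(key)
    return True
```

With this in place, a loop iteration that reprocesses yesterday's input produces no duplicate records or notifications, which is exactly the property scheduled execution depends on.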

The skills that function reliably on a loop are almost always narrower than those built for interactive use. An interactive skill might gracefully handle five different document types with reasonable degradation for unfamiliar ones. The loop variant handles one document type precisely and errors explicitly on anything unexpected. Operational reliability favors specificity over flexibility.

A practical heuristic: if you cannot describe the complete expected behavior of the skill — including error cases — in under ten minutes, it is probably too broad for reliable scheduled execution.

Practical Examples from Production

Three patterns we have used in real operational environments:

Data extraction from supplier communications. Suppliers send order confirmations, shipping notifications, and invoice updates in inconsistent formats — sometimes structured, often not. A skill configured for each supplier’s communication style extracts structured data from unstructured text. The skill knows that a particular supplier includes a purchase order reference in the second paragraph of their confirmation email, that their date format is DD.MM.YYYY, and that their product codes follow a specific pattern. What changes weekly is the actual email content; what stays fixed is how to parse it. The extracted output feeds directly into inventory systems without manual intervention between message receipt and data entry.
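The supplier-specific conventions such a skill encodes — the PO reference in the second paragraph, DD.MM.YYYY dates, a fixed product-code shape — can be sketched as a small parser. Every pattern below is a hypothetical convention for an imagined supplier; the real ones come from studying that supplier's actual messages:

```python
import re
from datetime import date

# Hypothetical conventions for one supplier.
PO_REF = re.compile(r"\bPO[- ]?(\d{6})\b")
DATE_DDMMYYYY = re.compile(r"\b(\d{2})\.(\d{2})\.(\d{4})\b")
PRODUCT_CODE = re.compile(r"\b[A-Z]{2}-\d{4}\b")

def parse_confirmation(email_body):
    """Extract structured fields from one supplier's confirmation email."""
    paragraphs = [p for p in email_body.split("\n\n") if p.strip()]
    # this supplier puts the PO reference in the second paragraph
    ref_source = paragraphs[1] if len(paragraphs) > 1 else email_body
    po = PO_REF.search(ref_source)
    d = DATE_DDMMYYYY.search(email_body)
    return {
        "po_reference": po.group(1) if po else None,
        "ship_date": (date(int(d.group(3)), int(d.group(2)), int(d.group(1)))
                      if d else None),
        "product_codes": PRODUCT_CODE.findall(email_body),
    }
```

Note that the fields returning `None` on a failed match are the escalation hook: a recurring skill should surface missing fields explicitly rather than guess.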

Report generation from operational data. Generating a weekly operational summary requires knowing which metrics matter, what thresholds are significant, what comparisons are meaningful across periods, and how to frame findings for a specific audience. None of that changes week to week. A skill encodes it once. At runtime, the agent receives the current data and applies the configured analytical framework. The output format, the language register, the sections that must always appear — all defined in the skill, not re-specified each time. The result is consistency that manual prompting rarely achieves at scale.

System monitoring and alerting. Infrastructure generates logs continuously. Monitoring skills define what patterns constitute a noteworthy event, how to classify severity, and what information a human needs to act on an alert effectively. Paired with the loop pattern and running on a short interval, these skills check for defined conditions and generate structured alerts when those conditions are met.

The key design consideration here is false positive management. A skill that alerts on every anomaly trains practitioners to ignore alerts. The real work is calibration — defining precise conditions under which an alert is worth human attention. This is not a technical problem; it is a domain knowledge problem. Getting it right requires detailed input from the people who currently respond to issues manually, because they have already internalized the threshold between signal and noise. The skill formalizes that knowledge.
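A minimal sketch of what that calibrated threshold knowledge looks like once formalized. The metric names and cutoffs here are invented for illustration; in practice they would be transcribed from the people who currently triage these issues by hand:

```python
# Hypothetical alert conditions, ordered from most to least severe.
SEVERITY_RULES = [
    ("critical", lambda m: m["error_rate"] > 0.05 or m["p99_latency_ms"] > 5000),
    ("warning",  lambda m: m["error_rate"] > 0.01 or m["p99_latency_ms"] > 2000),
]

def classify(metrics):
    """Return the first matching severity, or None for no alert.
    Returning None for everything below the warning line is the
    false-positive control: most anomalies are deliberately ignored."""
    for severity, condition in SEVERITY_RULES:
        if condition(metrics):
            return severity
    return None

def build_alert(metrics):
    """Produce a structured, machine-readable alert, or nothing."""
    severity = classify(metrics)
    if severity is None:
        return None
    return {
        "severity": severity,
        "metrics": metrics,  # the evidence a human needs to act
        "action_required": severity == "critical",
    }
```

The design choice worth noticing is that silence is the default path: the skill only produces output when a named condition is met, which is what keeps the alert channel worth reading.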

What Skills Do Not Solve

Skills encode competence in specific domains. They do not create general capability.

A well-configured skill handles known patterns reliably. It does not handle genuinely novel situations well, because the instructions that make it reliable in familiar contexts actively constrain it in unfamiliar ones. The practical implication is that skills work best when the task space is reasonably bounded — import documentation from specific markets, report generation for a defined set of metrics, log analysis against known alert patterns. “Help with business operations” is not a bounded task space. Skills cannot make an agent competent in domains where no one has invested in defining what competence looks like.

The other limitation is maintenance. Skills that encode domain knowledge require updates when that knowledge changes — not every change, because well-designed skills avoid encoding volatile information, but some changes are inevitable. Keeping skills current requires the same discipline as keeping documentation current. Organizations that do not invest in it discover, gradually, that their skills have drifted from reality. The failure mode is quiet: the skill continues to run, continues to produce output, and the output is subtly wrong in ways that take time to detect.

The organizations that extract the most value from skills invest in both initial configuration and ongoing calibration. The initial investment is smaller than most expect. The ongoing maintenance commitment is larger than most plan for — and planning for it honestly is what separates skills that remain useful from skills that become a source of silent errors.
