
Multiple AI Coding Tools, Hundreds of Sessions: An Honest Comparison

The AI coding assistant market has fragmented in ways that make honest comparison genuinely difficult. Marketing materials are uniformly optimistic, benchmark scores are designed to favor each tool's strengths, and the reviewers writing most comparisons have spent a few hours with each product rather than the months of real operational use that surfaces the important patterns. After hundreds of sessions across Claude Code, Cursor, Gemini CLI, OpenCode, and Qwen Code — in production codebases, not toy projects — a more honest picture has emerged.

The salient finding is not that one tool is universally better. Each has a different optimization target, and those targets become visible only under sustained use. Understanding what each tool was actually built for helps explain both the strengths and the failure modes.


Claude Code: The Terminal-First Workhorse

Claude Code occupies a distinct position: it is a CLI tool with deep agentic capabilities rather than an editor extension that adds chat to an existing workflow. This design choice has consequences that become more apparent over time.

The strength is autonomy over multi-step tasks. Given a well-scoped spec, Claude Code will read the relevant files, understand the existing patterns, write the implementation, run the tests, and fix failures — without requiring the developer to direct each step. The context window management is the best of any tool tested: it handles large codebases gracefully and maintains coherent state across long sessions in a way that other tools struggle with.

In practice, this autonomy cuts both ways. The same capability that makes Claude Code excellent for structured implementation work makes it occasionally overzealous on loosely specified tasks. Give it a vague instruction on a large codebase and it will make decisions — some of which you may not want. The lesson from sustained use is that Claude Code rewards tight spec discipline: the quality of the output scales directly with the quality of the prompt. Developers who invest in structured planning — specs, acceptance criteria, explicit constraints — get substantially better results than those who work conversationally.
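As a hypothetical illustration of what that spec discipline looks like in practice — the file names, limits, and criteria below are invented for the example, not taken from any real project:

```markdown
## Task: add retry logic to the HTTP client wrapper

Scope:
- Only touch `lib/http_client.ts` and its test file.
- Do not change any public function signatures.

Acceptance criteria:
- Retry up to 3 times on 5xx responses and network errors.
- Exponential backoff starting at 250 ms; never retry on 4xx.
- All existing tests pass; add tests covering the new behavior.

Out of scope:
- Logging changes, dependency upgrades, unrelated refactors.
```

The point is less the format than the explicitness: scope boundaries, testable criteria, and a list of things the agent should not do.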

The terminal-first design also means IDE integration is secondary. Developers who want inline suggestions, real-time tab completion, and a chat panel docked to their editor will find the workflow less natural than Cursor's. Claude Code is optimized for the developer who thinks in tasks, not lines.


Cursor: The IDE-Native Benchmark

Cursor remains the benchmark for inline AI assistance. The tab completion is genuinely good — not in the sense that it occasionally suggests something useful, but in the sense that it has learned to predict multi-line completions that match the local file's style and conventions with enough accuracy to change the typing rhythm. After long enough use, the absence of this feels like a regression.

The composer and agent modes have matured considerably. For single-file or few-file changes where the context is clear, Cursor produces clean, well-scoped edits. The inline diff review workflow — seeing exactly what will change before accepting — is the most ergonomic of any tool in this comparison.

The failure mode surfaces on larger, cross-cutting changes. Cursor's context retrieval is good but not exceptional: it will sometimes miss that a pattern it is implementing already exists elsewhere in the codebase, or propose an approach that conflicts with established conventions in a file it did not pull into context. On small-to-medium tasks, this is manageable. On architectural changes spanning many files, it requires more steering than Claude Code's more thorough upfront analysis.

The commercial model is also worth noting. Cursor's pricing tiers create friction for teams, and the request limits on certain models can interrupt flow at inconvenient moments. For a solo developer this is a minor irritation. For a team, it is a coordination overhead that adds up.


Gemini CLI: Google's Infrastructure Strategy

Gemini CLI arrived with the backing of Google's infrastructure and a context window large enough that it rarely struggles with codebase scale. In terms of raw capacity — how much code it can reason about in a single prompt — it is ahead of most alternatives. For a developer working with a genuinely large monorepo, this matters.

In practice, though, context window size is rarely the limiting factor in a coding session. What matters more is what the tool does with the context, and here Gemini CLI's behavior has been inconsistent across sessions. The responses are often verbose — explaining at length what it is about to do rather than doing it — and the code quality, while generally correct, sometimes reflects a preference for textbook patterns over the pragmatic conventions that characterize real-world codebases.

Integration with the Google ecosystem is, predictably, well-handled. For teams already invested in Google Cloud — Cloud Run, BigQuery, Firebase — Gemini CLI has native awareness that other tools lack. It understands IAM configuration, suggests the right client libraries, and generates deployment configurations that actually work rather than requiring significant correction. Outside the Google ecosystem, this advantage disappears.

The MCP server support is notable. Gemini CLI has invested in the Model Context Protocol ecosystem, which means it can connect to external data sources and tools in ways that extend its utility beyond pure code generation. Teams building on MCP-native infrastructure will find it a more natural fit than tools that treat tool use as an afterthought.
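For readers unfamiliar with what MCP wiring looks like, the sketch below shows the general shape of an `mcpServers` entry as used across MCP-aware tools. The exact settings-file location and schema vary by tool and version, and the server package and token value here are placeholders for illustration:

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_TOKEN": "..." }
    }
  }
}
```

Once registered, the server's tools become available to the model alongside its built-in file and shell capabilities.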


OpenCode: The Configurability Proposition

OpenCode positions itself as a flexible, model-agnostic terminal tool — bring your own API keys, configure your preferred models, run against local or remote inference. For developers who want control over the entire stack, the proposition is real.

The practical experience has been mixed. The configurability that makes OpenCode appealing to power users also means there is more setup, more edge-case handling, and less polish on the workflows that other tools have refined through extensive user testing. The tool works — the model-agnostic architecture is technically sound — but it works in a way that requires more active management than tools with opinionated defaults.

Where OpenCode earns its place is in offline or air-gapped environments, and in organizations with data residency requirements that prevent sending code to third-party APIs. The ability to run against local models via Ollama or similar inference servers is genuinely useful for these use cases. For everyone else, the friction cost generally exceeds the configurability benefit when compared to tools that have spent more engineering effort on the default experience.
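A minimal sketch of what that local setup commonly involves, using Ollama's standard commands — the model tag is just an example, and how OpenCode itself is pointed at the endpoint depends on its version and configuration:

```shell
# Pull a local coding model (example tag; choose one that fits your hardware)
ollama pull qwen2.5-coder:7b

# Ollama serves an OpenAI-compatible API on localhost:11434 by default,
# so tools that speak that protocol can target the local endpoint:
export OPENAI_BASE_URL="http://localhost:11434/v1"
export OPENAI_API_KEY="ollama"   # dummy value; the local server does not check it
```

Because no code leaves the machine, this pattern satisfies data-residency constraints that rule out hosted APIs entirely.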


Qwen Code: The Open-Source Contender

Qwen Code (built on Qwen's open-source model family) represents a different point on the capability-cost curve. As a tool running against models that can be self-hosted, it offers economics that commercial APIs cannot match at volume. For teams generating large quantities of code — documentation, test generation, boilerplate — the cost profile is materially different.

The capability ceiling is real. On complex architectural reasoning, subtle bug detection, and multi-file refactoring, Qwen Code trails the commercial tools. The gap has narrowed with each model generation, but for production work where the cost of an incorrect change is high, the quality differential still matters enough to influence the tooling choice.

The practical use case where Qwen Code has demonstrated genuine value is in high-volume, lower-stakes generation — particularly in multilingual contexts. The model's Chinese-language capability is strong, which matters for teams working across both English and Chinese codebases or documentation.


The Patterns That Actually Matter

After hundreds of sessions, the most salient observation is not about which tool is best — it is about the conditions under which each tool succeeds. All of them fail predictably when given ambiguous instructions on unfamiliar codebases. All of them perform well when given clear tasks, relevant context, and an operator who understands the tool's working model well enough to steer it effectively.

The developers getting the most out of AI coding tools are not the ones who found the best tool. They are the ones who developed the discipline to work with these tools on the tools' terms — investing in spec quality, providing explicit constraints, reviewing outputs carefully, and treating the AI as a capable but context-limited collaborator rather than an autonomous replacement for engineering judgment.

The tool matters. The operator discipline matters more.

