Tuning a Hermes-Style Agent That Grows With Your Project
A project-specific AI assistant built on a messaging channel — Telegram, Discord, Slack, or email through a thin gateway — behaves differently in a multi-user group than in a single-user DM. Onboarding a team to the bot exposes a class of bugs that the DM test path never reaches: empty-reply crashes on keyword triggers, miscalibrated progress notifications, race conditions between parallel triggers, and memory drift in the agent's own conversation log.
This guide documents eight tuning patterns we applied to a minimal hermes-style assistant — built on the Claude Code CLI as a subprocess rather than the full NousResearch framework. Each pattern is a concrete problem with the code to fix it. The patterns apply to either codebase.
What This Fixes
- Generic `Unexpected error` replies when the LLM returns empty stdout on a keyword trigger
- Reassurance text firing on every short reply because the threshold is below typical response latency
- Subprocesses that stall mid-stream and never return, with no timeout to clean them up
- Race conditions when two group members trigger the agent simultaneously against a shared session
- Silent stderr-pipe deadlocks once the agent runs long enough to fill the 64 KB buffer
- Bug-fallback messages in the conversation log that the agent reads back as legitimate behavior
Prerequisites
This guide assumes the architecture from Building a Project-Specific AI Assistant via Telegram: one Docker container per project, non-root user, persistent OAuth-authenticated Claude Code CLI, aiogram bot wrapper, bind-mounted workspace and session-state volumes. Several patterns depend on specific versions:
- Claude Code CLI 2.1.139 or later (for `--output-format stream-json --verbose --include-partial-messages`)
- aiogram 3.28.2 or later (for `ChatActionSender`, `message.react()`, `ReactionTypeEmoji`)
- Python 3.13 base image
- A Telegram group where the bot is a member with `can_react_to_messages: true`
For an email channel on the same project, see setting up a project mailbox with DKIM, SPF, and DMARC.
What a Hermes-Style Agent Is
The hermes-style pattern is named after NousResearch's open framework. Three properties distinguish it from a stateless chatbot:
- Persistent memory. A workspace on disk that the agent reads and writes between turns, so context survives container restarts.
- Multi-channel presence. The same agent instance talks on Telegram, Discord, Slack, or email through a thin gateway.
- A closed learning loop. Operator corrections become workspace edits that the agent reads on the next turn.
NousResearch ships a full reference implementation with a TUI, multi-channel gateway, skills system, and RL training hooks. A minimal variant on top of the Claude Code CLI subprocess keeps the moving parts small enough to template per consulting mandate. The patterns below apply equally to either approach.
Pattern 1: Typed Empty-Reply Handling
A keyword-based trigger (matching `\bhermes\b` in group messages) can fire on a sentence that contains the bot's name but is not addressed to it. The LLM correctly returns empty output. Three layers downstream each fail to handle the empty case:
- The engine returns `""` with returncode 0.
- The split function returns `[""]` because `len("") <= max_chars` matches.
- The send loop calls `bot.send_message(chat_id, "")`; Telegram returns `Bad Request: message text is empty`; a generic `except Exception` at the top of the handler swallows the traceback and sends the user-facing error.
Filtering empty strings in one layer prevents the crash but produces a silent skip — the trigger fired, the bot consumed compute, the user sees nothing. The two-step fix uses a typed exception for empty output and a Telegram reaction (👀) on the triggering message as the acknowledgment:
```python
class HermesEmptyResponse(HermesError):
    """Subprocess returned successfully but with empty result."""

class HermesHangError(HermesError):
    """Watchdog killed subprocess after no stream-event for N seconds."""
```
The engine raises `HermesEmptyResponse` when `result.strip() == ""`. The handler catches it and calls `message.react([ReactionTypeEmoji(emoji="👀")])`. The conversation log gets a marker block — a separate entry that records the silent acknowledgment without polluting the chat with text — so the agent's own future memory reads see that a trigger fired and was intentionally not answered.
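As a minimal sketch of the engine-side check (`finalize_result` is an illustrative name; the real engine wraps the subprocess call):

```python
class HermesError(RuntimeError):
    """Base class for anything wrong with the subprocess."""

class HermesEmptyResponse(HermesError):
    """Subprocess returned successfully but with empty result."""

def finalize_result(result: str, returncode: int) -> str:
    """Turn raw subprocess output into a reply, or raise a typed error."""
    if returncode != 0:
        raise HermesError(f"subprocess exited with {returncode}")
    if result.strip() == "":
        # Keyword matched but the model chose not to answer: let the
        # handler acknowledge with a reaction instead of sending text.
        raise HermesEmptyResponse("empty result from subprocess")
    return result
```

The handler then catches `HermesEmptyResponse` ahead of the broader `HermesError` clause and reacts with 👀 instead of sending a message.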
Pattern 2: Reaction-Permission Pre-Flight with Lazy Cache
Telegram's setMessageReaction is not universally available. Some groups restrict the allowed reaction set; some custom emojis need administrator allowlisting. The ChatFullInfo type documents the rule: if available_reactions is omitted, all standard emoji are allowed; if it is an array, only those emoji work. The bot needs to be a member of the group — administrator status is not required for reactions in groups.
Per-trigger verification wastes API calls. One getChat per chat with a one-hour cache is enough:
```python
import time

from aiogram import Bot
from aiogram.types import ReactionTypeEmoji

_reaction_cache: dict[int, tuple[bool, float]] = {}
_REACTION_CACHE_TTL_SEC = 3600
MINI_ACK_EMOJI = "👀"

async def _reactions_allowed(bot: Bot, chat_id: int) -> bool:
    now = time.monotonic()
    cached = _reaction_cache.get(chat_id)
    if cached and cached[1] > now:
        return cached[0]
    try:
        chat = await bot.get_chat(chat_id)
        allowed = (
            chat.available_reactions is None
            or any(
                isinstance(r, ReactionTypeEmoji) and r.emoji == MINI_ACK_EMOJI
                for r in (chat.available_reactions or [])
            )
        )
    except Exception:
        allowed = False
    _reaction_cache[chat_id] = (allowed, now + _REACTION_CACHE_TTL_SEC)
    return allowed
```
Wrap the actual reaction call in `try/except (TelegramBadRequest, TelegramForbiddenError)` regardless — the cache lags permission changes.
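To sanity-check the caching behavior without a live bot, a duck-typed stub works. The stub, the simplified `available_reactions is None` check, and `demo` below are all illustrative:

```python
import asyncio
import time

_reaction_cache: dict[int, tuple[bool, float]] = {}
_TTL_SEC = 3600.0

async def reactions_allowed(bot, chat_id: int) -> bool:
    """Cached availability check; bot only needs a get_chat coroutine."""
    now = time.monotonic()
    cached = _reaction_cache.get(chat_id)
    if cached and cached[1] > now:
        return cached[0]
    try:
        chat = await bot.get_chat(chat_id)
        # Simplified rule: omitted available_reactions => all standard emoji.
        allowed = chat.available_reactions is None
    except Exception:
        allowed = False
    _reaction_cache[chat_id] = (allowed, now + _TTL_SEC)
    return allowed

class FakeBot:
    """Hypothetical stub standing in for an aiogram Bot."""
    def __init__(self) -> None:
        self.calls = 0
    async def get_chat(self, chat_id: int):
        self.calls += 1
        return type("Chat", (), {"available_reactions": None})()

async def demo() -> tuple[bool, bool, int]:
    bot = FakeBot()
    first = await reactions_allowed(bot, 42)
    second = await reactions_allowed(bot, 42)  # served from the cache
    return first, second, bot.calls
```

Two lookups for the same chat inside the TTL produce exactly one `get_chat` call, which is the point of the pre-flight cache.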
Pattern 3: Stream-Mode and the Idle-Time Watchdog
A hard timeout on the whole subprocess (asyncio.wait_for(proc.communicate(), timeout=300)) caps total duration regardless of progress. Removing it without a replacement is documented as unsafe: the Claude Code stream-idle-hang issue describes API calls that stall mid-stream and never return, leaking a subprocess.
Switching to --output-format stream-json --verbose --include-partial-messages emits events at every milestone — per-token text_delta, tool-use start and stop, API retries, rate-limit notices, the final result event. A real stall produces silence on the stream; a long task produces a sequence of small events. The watchdog kills on idle time, not total duration:
```python
WATCHDOG_NO_EVENT_SEC = 60

async def watchdog() -> None:
    while True:
        await asyncio.sleep(5)
        if proc.returncode is not None:
            return
        idle_sec = time.monotonic() - state["last_event_ts"]
        if idle_sec > WATCHDOG_NO_EVENT_SEC:
            state["killed_by_watchdog"] = True
            try:
                proc.kill()
            except ProcessLookupError:
                pass
            return
```
The final response text comes from the result event's result field — deterministic, single-source, and unaffected by partial-stream parsing. The same event carries is_error, api_error_status, duration_ms, and total_cost_usd, all of which go into the structured log line.
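A sketch of the stream-loop side that feeds the watchdog, assuming the stream-json shape described above where the final event has type `result` with a `result` field (the `state` dict mirrors the watchdog snippet; field handling beyond that is illustrative):

```python
import json
import time

def process_stream_line(line: bytes, state: dict) -> None:
    """Parse one stream-json line; any event at all counts as liveness."""
    state["last_event_ts"] = time.monotonic()  # reset the idle clock
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return  # tolerate non-JSON noise on stdout
    if event.get("type") == "result":
        state["result_text"] = event.get("result", "")
        state["is_error"] = event.get("is_error", False)
```

Inside the engine, `async for line in proc.stdout` calls this per line; the watchdog only ever looks at `last_event_ts`, so per-token deltas and tool-use events keep a long-but-alive run from being killed.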
Pattern 4: Calibrating the Reassure Schedule
The threshold question — when does the bot send a text update during a long-running call — is empirical. The right answer depends on the latency distribution of real triggers. Three thresholds, with text aligned to what the user actually needs to know:
```python
_REASSURE_SCHEDULE = (
    (15, "On it."),
    (90, "Taking longer than usual, still on it."),
    (300, "Genuinely large task — almost there."),
)
```
The thresholds derive from two constraints. The lower bound is set by the typical short-reply latency: if most replies arrive within X seconds, the first reassurance must fire later than X, or it lands at roughly the same moment as the answer. Nielsen's response-time research identifies 10 seconds as the canonical limit for keeping a user's attention without a progress indicator; the typing indicator that aiogram's ChatActionSender renders below that threshold already satisfies the requirement up to about 15 seconds.
The upper threshold (90 seconds) is the gap at which the framing changes from working to working but longer than usual — a separate signal that the call is in the long-tail of the distribution. The wording avoids implying that the user asked for something heavy. The bot is the one doing the work; the message acknowledges that work, not the request.
Pattern 5: Per-Chat Concurrency Lock
Two group members can trigger the agent within the same second — one with an @-mention, one with the keyword. Both handler invocations spawn claude --continue subprocesses against the same persistent session file. The session lock-file is not strict; concurrent writes produce truncated session-jsonl files and lost turns.
Serialize per-chat at the handler layer with a lazily-created lock:
```python
_chat_locks: dict[int, asyncio.Lock] = {}

def _get_chat_lock(chat_id: int) -> asyncio.Lock:
    lock = _chat_locks.get(chat_id)
    if lock is None:
        lock = asyncio.Lock()
        _chat_locks[chat_id] = lock
    return lock

# In the handler:
async with _get_chat_lock(message.chat.id):
    response = await _run_hermes_with_ux(bot, message, prompt, ctx)
    ...
```
Lazy creation matters on older runtimes: before Python 3.10, an asyncio.Lock instantiated at module-import time bound itself to whichever event loop was current at import, which may not be the loop the handler runs on after a restart. Modern asyncio primitives acquire their loop on first use, but deferring instantiation until the first call inside a running loop costs nothing and keeps the code portable. For small groups, the lock dictionary stays small; for larger fleets, add LRU eviction.
Pattern 6: Exception Hierarchy and Except-Order
The engine exception classes form a tree:
- `HermesError(RuntimeError)` — anything wrong with the subprocess
- `HermesEmptyResponse(HermesError)` — successful run with empty result
- `HermesHangError(HermesError)` — watchdog killed it
Python's `except` matches the first compatible clause. If `except HermesError` precedes the subclass handlers, it captures `HermesEmptyResponse` and routes it to the error path, bypassing the mini-ack. Subclass-first ordering is required:
```python
try:
    response = await _run_hermes_with_ux(bot, message, prompt, ctx)
    ...
except HermesEmptyResponse:
    # mini-ack path
    ...
except HermesHangError as exc:
    # retry-once-then-bail path
    ...
except HermesError as exc:
    # exit-not-zero, api-error, etc.
    ...
except Exception:
    # last resort
    ...
```
Add this to a code-review checklist: reordering the blocks during a refactor silently inverts the intent, routing the mini-ack case to the generic error path.
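A tiny check makes the ordering requirement testable; the class names match the pattern and `route` is an illustrative harness:

```python
class HermesError(RuntimeError):
    pass

class HermesEmptyResponse(HermesError):
    pass

class HermesHangError(HermesError):
    pass

def route(exc: Exception) -> str:
    """Return which handler path an exception takes, subclass-first."""
    try:
        raise exc
    except HermesEmptyResponse:
        return "mini-ack"
    except HermesHangError:
        return "retry-once"
    except HermesError:
        return "error-path"
    except Exception:
        return "last-resort"
```

Swapping the `HermesError` clause above its subclasses makes `route(HermesEmptyResponse())` return `"error-path"`, which is exactly the regression the checklist item guards against.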
Pattern 7: Draining stderr in Parallel
Streaming over stdout requires reading lines as they arrive: `async for line in proc.stdout`. If stderr is also piped, the subprocess can fill its stderr buffer while stdout is still being read. Default pipe buffers on Linux are around 64 KB. Once stderr fills, the subprocess blocks waiting for it to drain, and the line loop never advances. The watchdog eventually kills it after the idle period, but the result is lost.
Drain stderr in parallel from the start of the subprocess, then await the drain task after proc.wait():
```python
proc = await asyncio.create_subprocess_exec(
    *cmd,
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE,
    cwd=str(WORKSPACE),
)
stderr_task = asyncio.create_task(proc.stderr.read())
# ... stream-loop on stdout ...
rc = await proc.wait()
try:
    stderr_b = await stderr_task
except Exception:
    stderr_b = b""
stderr = stderr_b.decode("utf-8", errors="replace").strip()
```
The Claude Code CLI emits little stderr in stream-json mode, so the failure mode is rare in practice. The fix is one extra line.
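The deadlock is easy to reproduce, and the fix easy to verify, with a child process that writes more than a pipe buffer to stderr before printing to stdout. The inline child script and the 256 KB size are illustrative:

```python
import asyncio
import sys

# Child writes 256 KB to stderr (4x a typical 64 KB pipe buffer),
# then a single line to stdout.
CHILD = "import sys; sys.stderr.write('x' * 262144); print('done')"

async def run_with_drain() -> tuple[int, bytes, int]:
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    # Without this task, the child blocks once stderr fills and the
    # stdout loop below never sees "done".
    stderr_task = asyncio.create_task(proc.stderr.read())
    out = b""
    async for line in proc.stdout:
        out += line
    rc = await proc.wait()
    stderr_b = await stderr_task
    return rc, out.strip(), len(stderr_b)
```

Comment out the `stderr_task` line and the run hangs until the watchdog fires; with it, both streams drain and the result survives.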
Pattern 8: Memory-Edit Discipline
A hermes-style agent reads its own conversation logs as memory. Bug-fallback messages written to that log become indistinguishable from intentional past behavior on the next read. The first instinct is to insert correction markers ([CORRECTION: the previous entry was a bug]) so the next memory read sees the fix.
Verify that the bug-fallback was logged before editing. In the case above, the generic except Exception block called message.answer(...) to send the error to the user but did not call conversation_log.log_outgoing(...). The error message reached Telegram but never reached the agent's memory file. No retroactive edit was needed.
Treat the agent's workspace as the agent's. Before any plan that involves editing files inside it, take a fresh state snapshot — the agent may have rewritten its own CLAUDE.md or notes since the last read. Anthropic's context engineering guide describes persistent memory as an artifact between sessions, not a notepad that the operator scribbles in. Domain-specific skills stay more durable when they live alongside agent-curated notes rather than in operator-edited files that the agent learns to distrust.
Operational Notes
Bind-mount persistence. Bind-mounted volumes for the workspace and Claude OAuth credentials survive docker compose up -d --force-recreate as long as the mount paths are unchanged. Verify before any compose-file edit.
Pre-deploy safety check. Grep the last five minutes of logs for a claude_subprocess_start without a matching claude_result_event. A pending subprocess means a restart will kill an in-flight run. Wait until the logs are clean. For broader failure scenarios, see our disaster-recovery writeup.
Pattern reusability across mandates. The full stack — engine, handlers, conversation log, file intake — clones to a new mandate by changing two environment variables (a project-name and an instance-id). The bot token, OAuth credentials, workspace, and allow-list parameterize per project. For the operational angle on running many per-project assistants in parallel, see solo operations at scale.
Reaction emoji selection. The 👀 emoji is in the default Telegram standard set and works in groups where available_reactions is unset. If a group restricts to a custom subset, the cache reflects that and the mini-ack silently skips. Make the emoji a per-deployment configuration constant rather than a hardcoded literal.
Hermes-Agent versus a minimal custom build. NousResearch's framework includes a TUI, slash-command system, multi-channel gateway, skills hub, and RL training integration. A minimal Claude Code CLI wrapper produces the same conversational shape with roughly a tenth of the moving parts. Both converge on the same set of group-chat UX problems; the patterns in this post apply to either.
When to Apply Each Pattern
The patterns are not equally urgent. Apply them in the order encountered:
- Pattern 1 (empty-reply handling) is required as soon as the bot is added to a group with keyword-trigger detection.
- Pattern 4 (reassure schedule) is required after the first short reply lands at the same time as the reassurance message.
- Patterns 3 and 7 (stream-mode, stderr drain) are required as soon as long-running tasks start hanging.
- Pattern 5 (concurrency lock) is required when the first session-file truncation appears in the logs.
- Patterns 2, 6, and 8 are background hardening — apply during code review before they break in production.
Build the project-specific assistant first: the base architecture guide covers the container, OAuth, workspace, and handler layout. Onboard a small team with the scaling guide for allow-lists, group setup, and trigger detection. Add the email channel for the project with the DKIM/DMARC guide when out-of-band notifications start arriving. Return to this post when the patterns above are needed.
tva runs several per-project assistants in parallel for different consulting mandates. To get help building or tuning yours, get in touch.
Related Insights
- Building a Project-Specific AI Assistant via Telegram — the base architecture this guide tunes
- Scaling a Telegram AI Assistant from Solo to Team — allow-lists, group setup, trigger detection
- Setting Up a Mailbox for Your Project-Specific AI Agent — the email channel for the same per-project pattern
- Building AI Agent Skills for Domain-Specific Business Workflows — making the assistant useful for one domain
- Solo Operations at Scale: Managing Dozens of Projects with a Small Team — operating many assistants in parallel