
Building a Project-Specific AI Assistant via Telegram

An AI assistant scoped to a single client mandate is structurally different from an org-wide AI tool. Different boundaries, different memory model, different secrets, different deployment lifecycle. This guide describes the project-specific AI assistant pattern we use at tva: one Telegram bot per consulting mandate, each backed by its own Claude Code CLI instance with persistent OAuth, its own workspace, and its own allow-list of participants. The pattern is template-cloneable across mandates, and the entire stack runs in a single Docker container.

This guide is written so a developer (or an LLM) can read it linearly and reproduce the setup from scratch. Every version is pinned. Every configuration choice is justified. Every architecture decision lists the alternatives we rejected and why.

What You'll Need

  • A Linux server with Docker Engine 29.x and Docker Compose v2 (we run on a Hetzner Cloud VPS; any container-capable host works)
  • An Anthropic Pro or Max subscription (this guide uses OAuth-subscription auth, not API-key billing)
  • A registered Telegram bot via @BotFather with its token in hand
  • The Telegram user IDs of every person you want to allow into the assistant (via @userinfobot)
  • A browser on your laptop for the one-time OAuth flow
  • A clear idea of the consulting mandate this instance will serve

What This Builds

  • A single Docker container that combines a Python Telegram bot and the Claude Code CLI in one runtime
  • OAuth against a Claude Max subscription, with credentials persisted across container restarts
  • An allow-list middleware that silently drops messages from anyone not on the list, both in DMs and groups
  • A persistent workspace where the assistant maintains its own CLAUDE.md and notes/ files over time
  • File-intake for Telegram attachments (PDFs, photos, voice notes, etc.) into /workspace/incoming/
  • Conversation logs in Markdown, human-readable, available to the assistant via the Read tool
  • An identity layer (HERMES_PROJECT_NAME, HERMES_INSTANCE_ID) that makes the stack cloneable to other mandates with two env-var changes

The Project-Instance Model: Why Per-Mandate Beats Org-Wide

The default instinct when building an internal AI tool is to make it organisation-wide: one bot, one workspace, every team member has access. This works for low-stakes use cases (a Slack bot that summarises documents, an internal Q&A tool). It breaks down quickly for consulting work, where each mandate has its own confidentiality boundary, its own stakeholders, its own knowledge base, and its own deadlines.

The project-instance model inverts this. One bot per mandate. One workspace per mandate. One allow-list per mandate. The assistant's memory is scoped to the project, not to the operator. When the mandate ends, the instance can be archived or destroyed without touching anything else.

Concretely:

  • The Telegram bot has a project-specific username (e.g. @some_project_assistant_bot) registered separately with BotFather
  • The Docker container has a project-specific name (e.g. some-project-assistant), running from a project-specific directory (/opt/some-project-assistant/)
  • The OAuth session is scoped to this instance — ideally a separate Anthropic account if you run multiple instances, to avoid refresh-token rotation conflicts
  • The workspace at /workspace/CLAUDE.md contains only the briefing for this specific mandate
  • The allow-list contains only the participants of this specific mandate

Two environment variables make the stack template-cloneable: HERMES_PROJECT_NAME (the display name, used in the system prompt and the /help output) and HERMES_INSTANCE_ID (the slug used in directory paths and the Claude session identifier). To clone the stack for a new client, you change two env vars, register a new BotFather bot, run a fresh OAuth login, fill out the workspace template, and the entire codebase remains bit-identical.

Seven Architecture Decisions to Make Before Writing Code

The reason this stack is small and predictable is that we made seven deliberate decisions before writing the first line of code. Each decision has alternatives, and the alternatives matter. If your context is different from ours, picking the other branch on any of these gives you a different (and possibly better) stack. We list the decisions, the alternatives we considered, and the trade-offs that drove our choice.

Decision 1: Memory Granularity

The choice is between a global assistant memory (one Claude session for all chats, the assistant remembers everything across DMs and groups) versus a per-chat memory (each chat has its own isolated session, with strict privacy boundaries between DMs and group conversations).

We picked global. The reasoning: a consulting assistant benefits from being able to connect information across conversations. What was discussed in a DM about a vendor evaluation feeds into the group conversation about that vendor's contract. Per-chat memory would force the operator to repeat context, and the assistant would feel disconnected.

The cost is real: there is no privacy boundary between DMs and groups. Anything mentioned in a DM is potentially recallable in a group response. This is an explicit, documented choice — not a side effect. For a different use case (e.g. an HR bot where personal disclosures must stay private), per-chat memory would be the correct answer.

Implementation: Claude Code CLI's --continue flag with a fixed working directory. The session file lives at ~/.claude/projects/-workspace/sessions/<auto-id>.jsonl, persists across restarts via the bind-mount, and the session resumes on every subsequent claude invocation from the same working directory.

Decision 2: Subscription OAuth vs API Key

You can drive Claude Code CLI two ways: with the operator's Pro/Max subscription (OAuth-based, no per-call billing) or with an Anthropic API key (pay-per-token). The default is subscription. The trap is that several environment variables silently switch to API-key billing if they're set in the parent environment.

According to Anthropic's authentication documentation, the resolution order is: Bedrock/Vertex/Foundry cloud-provider flags first, then ANTHROPIC_AUTH_TOKEN, then ANTHROPIC_API_KEY, then apiKeyHelper, then CLAUDE_CODE_OAUTH_TOKEN, and finally the subscription OAuth from /login. If any of the higher-precedence options is set — even to an empty string in some shells — the CLI will not fall through to subscription auth.

To guarantee OAuth-only operation, set all six environment variables explicitly to empty strings in the compose file's environment: block. This is defensive but cheap.

environment:
  ANTHROPIC_API_KEY: ""
  ANTHROPIC_AUTH_TOKEN: ""
  ANTHROPIC_BASE_URL: ""
  CLAUDE_CODE_USE_BEDROCK: ""
  CLAUDE_CODE_USE_VERTEX: ""
  CLAUDE_CODE_USE_FOUNDRY: ""

The trade-off: subscription auth uses single-use refresh-token rotation, which conflicts if the same Anthropic account is used by both the container and the operator's laptop simultaneously. If you use both in parallel, you'll get random logouts. For a dedicated assistant, a dedicated Anthropic account is the cleaner answer. For comparison context on different AI tools and their auth models, see our honest comparison of Claude Code, Cursor, and other CLI agents.

Decision 3: One Container or Two

The bot needs the Telegram framework (Python, aiogram). The Claude engine needs Node and the @anthropic-ai/claude-code CLI. You can run them as two containers (e.g. bot in Python container, claude in Node container, IPC between them) or merge them into one.

The two-container approach is structurally cleaner but introduces an IPC problem. The bot needs to invoke claude as a subprocess, which means it needs either Docker socket access to the other container (a privilege escalation risk) or a custom file-based IPC layer (extra latency and code). Neither is appealing.

The single-container approach trades container-purity for operational simplicity. One image, one OAuth session, one set of environment variables, one bind-mount layout. The image is ~700 MB instead of ~120 MB, but disk is rarely the bottleneck.

We picked single container. The Dockerfile installs both the Python stack (slim base + pip) and the Node stack (NodeSource keyring + claude-code) in sequence, exposes a single entrypoint, and the bot calls claude via asyncio.create_subprocess_exec. No IPC, no socket proxy, no inter-container networking.

Decision 4: Workspace Bootstrap

The assistant needs a knowledge base. The choice is whether to seed it with project context (so the operator doesn't have to feed every fact via chat) or start empty (the assistant learns purely from interactions).

We seed. A workspace template at templates/workspace-CLAUDE.md.template contains placeholder sections for: the operator's profile, the participants and their roles, the mandate's background, language conventions, and instructions for how the assistant should maintain notes over time. When a new instance is bootstrapped, the template is copied into data/workspace/CLAUDE.md and the placeholders are filled in.

The assistant then maintains the file itself via the Write and Edit tools. When you correct it ("that's not how this client uses that term"), it can update the workspace file so the correction sticks for future sessions. Combined with the global session memory, this gives the assistant two layers of state: short-term in the Claude session, long-term in workspace files. Both persist across container restarts via bind-mounts.

Decision 5: Group Behavior and Privacy Mode

Telegram bots in groups have a privacy-mode setting: by default, a bot only sees messages directly addressed to it (commands, @mentions, replies). Other group messages are not delivered to the bot at all. You can disable this in BotFather (/setprivacy → Disable), at which point the bot sees every message in every group it's a member of.

For a consulting assistant that should learn from group discussions, disabled is the right setting. But this raises a follow-on question: how does the bot decide which messages to respond to and which to just log?

Our trigger model: in a DM, every message gets a response. In a group, the bot only responds to (a) explicit @mentions of the bot, (b) replies to its own previous messages, or (c) messages that contain the word "Hermes" as a standalone word (case-insensitive, word-boundary matched). Every other message is logged to /workspace/conversations/chat-<id>.md but does not trigger a Claude call.

This means the assistant has read-through to group context (it can look up the log file via the Read tool when it needs background) but doesn't generate noise. The conversation log is also a useful human artifact — the operator can cat it to see the project history.
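The keyword branch of that trigger model is small enough to show. A sketch (the function name is ours; the real check lives in trigger.py alongside the mention and reply branches):

```python
import re

# Word-boundary, case-insensitive match for the standalone keyword.
# "Hermes" or "HERMES," triggers; "hermeses" or "Hermes2" do not.
KEYWORD_RE = re.compile(r"\bhermes\b", re.IGNORECASE)

def keyword_triggers(text: str) -> bool:
    """True if the message contains 'Hermes' as a standalone word."""
    return bool(KEYWORD_RE.search(text))
```

The word-boundary anchors are what keep substrings from firing; a plain `"hermes" in text.lower()` check would also match unrelated words.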

Decision 6: Response UX

Claude responses can take 5–30 seconds when tool use is involved (web fetches, file reads, multi-step reasoning). The choices are: buffered (wait for the full response, send one message) or streamed (edit a single message progressively as tokens arrive).

Streamed is cooler but more complex: Telegram's per-message edit rate is throttled, but the exact ceiling isn't documented as a single number. The general send rate (Telegram's published limits: about 30 messages per second globally, no more than one per second per chat, 20 per minute per group) gives a rough upper bound. A naive token-by-token edit stream gets throttled quickly. The implementation requires chunk aggregation, partial-message handling from Claude's stream-json output, and graceful degradation when edits get throttled.

We picked buffered. While the response is being generated, the bot sends a typing chat action every 5 seconds so the user sees "is typing" in their Telegram client. When the response is ready, it goes out as one or more messages. Responses longer than 4,000 characters are auto-split at the last paragraph boundary before the limit, with a 0.3-second pause between messages to stay within Telegram's send rate.
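The auto-split step can be sketched as follows (a simplified version that assumes paragraphs are separated by blank lines; the real response_pipeline.py also adds the 0.3-second inter-message pause):

```python
LIMIT = 4000  # stay safely under Telegram's 4096-character message cap

def split_response(text: str, limit: int = LIMIT) -> list[str]:
    """Split at the last paragraph boundary before the limit; hard-cut as a fallback."""
    chunks: list[str] = []
    while len(text) > limit:
        cut = text.rfind("\n\n", 0, limit)
        if cut < 1:        # no paragraph boundary in range: hard cut
            cut = limit
        chunks.append(text[:cut].rstrip())
        text = text[cut:].lstrip()
    if text:
        chunks.append(text)
    return chunks
```

Splitting on the last blank line rather than mid-paragraph keeps each Telegram message readable on its own.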

Decision 7: Strict Allow-List as the Outer Boundary

A Telegram bot is reachable by anyone who finds its username. If you don't filter, random users will discover the bot and try commands. For a consulting assistant, this is unacceptable: the assistant has the operator's OAuth credentials, has access to the workspace, can make tool calls. You don't want strangers in this loop.

The allow-list is implemented as an aiogram outer-middleware on the update level — the highest interception point, before filter resolution and handler lookup. The check is on event_from_user.id (the numeric Telegram user ID, which is stable per user even when they change their username). Allow-list members are configured via a CSV environment variable (HERMES_ALLOWED_USERS). If a sender is not in the set, the middleware returns None without invoking the handler: no log entry beyond a debug-level drop event, no conversation log write, no Claude call, no response.

This is also the right place for the operator-must-be-allowed validation: the settings loader (using pydantic-settings) verifies that HERMES_OPERATOR_ID is contained in HERMES_ALLOWED_USERS at startup. Mis-configurations crash the container immediately rather than silently locking the operator out.

The Container Stack

With the seven decisions made, the stack falls out naturally. Here is the full docker-compose.yml:

services:
  hermes:
    build: ./hermes
    container_name: project-assistant
    restart: unless-stopped
    init: true
    stop_grace_period: 15s
    security_opt:
      - no-new-privileges:true
    cap_drop:
      - ALL
    pids_limit: 512
    env_file:
      - .env
    environment:
      ANTHROPIC_API_KEY: ""
      ANTHROPIC_AUTH_TOKEN: ""
      ANTHROPIC_BASE_URL: ""
      CLAUDE_CODE_USE_BEDROCK: ""
      CLAUDE_CODE_USE_VERTEX: ""
      CLAUDE_CODE_USE_FOUNDRY: ""
      DISABLE_AUTOUPDATER: "1"
      PYTHONDONTWRITEBYTECODE: "1"
      PYTHONUNBUFFERED: "1"
      TERM: xterm-256color
    volumes:
      - ./data/claude:/home/hermes/.claude
      - ./data/claude.json:/home/hermes/.claude.json
      - ./data/bot:/data
      - ./data/workspace:/workspace
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"

Key choices that aren't obvious from reading:

  • init: true — runs tini as PID 1, so the Python process receives SIGTERM cleanly. Without this, docker compose stop waits 10 seconds and then SIGKILLs, leaving the bot session unclosed.
  • stop_grace_period: 15s — slightly longer than aiogram's default polling_timeout of 10 seconds. This gives the shutdown hook time to close the Telegram session cleanly, which prevents TelegramConflictError: terminated by other getUpdates request when the container restarts faster than Telegram releases the previous long-poll connection.
  • cap_drop: ALL and no-new-privileges:true — the container needs no Linux capabilities and no privilege escalation, so both are switched off outright. Hardening at zero operational cost.
  • No read_only: true — Claude Code writes to ~/.claude/, ~/.npm/, and occasionally /tmp/ for self-updates. Read-only root would require extensive tmpfs mounts to compensate. Not worth it for the security gain.
  • The two file mounts — ./data/claude.json:/home/hermes/.claude.json mounts a specific file (not a directory). This file must exist on the host before compose up, initialised with {}, or Claude Code throws a JSON parse error on startup.

The Dockerfile combines the two runtimes:

FROM python:3.13-slim-trixie

ENV PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    DEBIAN_FRONTEND=noninteractive

WORKDIR /app

RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates curl file git gnupg locales poppler-utils tmux \
    && locale-gen en_US.UTF-8 \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

RUN mkdir -p /etc/apt/keyrings \
    && curl -fsSL https://deb.nodesource.com/gpgkey/nodesource-repo.gpg.key \
        | gpg --dearmor -o /etc/apt/keyrings/nodesource.gpg \
    && echo "deb [signed-by=/etc/apt/keyrings/nodesource.gpg] https://deb.nodesource.com/node_20.x nodistro main" \
        > /etc/apt/sources.list.d/nodesource.list \
    && apt-get update && apt-get install -y --no-install-recommends nodejs \
    && apt-get clean && rm -rf /var/lib/apt/lists/*

RUN npm install -g @anthropic-ai/claude-code

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

RUN userdel -r ubuntu 2>/dev/null || true \
    && useradd -m -u 1000 -s /bin/bash hermes \
    && mkdir -p /data /workspace \
    && chown -R hermes:hermes /data /workspace /app

COPY --chown=hermes:hermes src/ ./src/

USER hermes
WORKDIR /app

CMD ["python", "-u", "-m", "src.main"]

The pinned versions you should be aware of:

  • python:3.13-slim-trixie — Debian 13 base, current stable
  • aiogram>=3.28,<4.0 — the bot framework (3.28.2 at the time of writing; note the range constraint admits newer 3.x releases, so pin an exact version if you need fully reproducible builds)
  • pydantic-settings>=2.14,<3.0 — settings loader
  • structlog>=25.4,<26.0 — JSON logging
  • @anthropic-ai/[email protected] — installed via npm global from NodeSource Node 20
  • poppler-utils — for pdftotext as a Read-tool fallback when Claude's PDF parsing doesn't fit the file

Two things to watch out for in this Dockerfile:

  1. The userdel -r ubuntu line. The Python slim base on Debian 13 doesn't ship a default UID-1000 user, but if you switch to a base image that does (some Ubuntu derivatives), the useradd -u 1000 will fail. Always remove the existing UID-1000 user first.
  2. The NodeSource keyring pattern. We avoid the curl | bash install script; it's deprecated and non-reproducible. The keyring + nodistro codename approach is reproducible and audit-friendly.

For a deeper view of how container hardening fits into a multi-service production stack, see our walkthrough of running over a hundred Docker containers in production.

The Python Stack: Eight Modules That Do the Work

The bot code is split into focused modules. None of them is large; the largest is around 200 lines.

  • settings.py — pydantic-settings BaseSettings with a custom BeforeValidator for the comma-separated allow-list. The validator handles the edge case where pydantic-settings JSON-decodes a bare integer (a single-element allow-list like HERMES_ALLOWED_USERS=12345 becomes int, not list[int]) and converts it to a single-element list.
  • middleware.py — AllowListMiddleware as dp.update.outer_middleware. The check on data.get("event_from_user") uses aiogram's built-in user-context extraction.
  • trigger.py — is_trigger(message, bot_id, bot_username). Returns (True, "DM" | "@mention" | "text_mention" | "reply" | "keyword") or (False, None). The keyword match uses a word-boundary regex (\bhermes\b) so substrings don't trigger.
  • conversation_log.py — append-only Markdown logs to /workspace/conversations/chat-<id>.md. Both inbound and outbound messages are logged.
  • file_intake.py — handles eight Telegram attachment types (document, photo, video, audio, voice, animation, sticker, video_note). Downloads to /workspace/incoming/chat-<id>/<timestamp>-<name> with a 20 MB hard limit (Telegram's bot-API download cap).
  • hermes_engine.py — wraps the Claude Code CLI subprocess. Uses asyncio.create_subprocess_exec with cwd=/workspace and --continue (after the first call) to maintain the global session.
  • response_pipeline.py — combines typing-indicator refresh, auto-split, and inter-message delay.
  • handlers.py — three command handlers (/ping, /status, /help) and a default-message handler that runs the trigger check and dispatches to the engine.
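The allow-list coercion edge case described for settings.py above can be sketched as a plain function (a stand-in for the actual BeforeValidator; the pydantic-settings wiring is omitted):

```python
def coerce_allow_list(value) -> list[int]:
    """Normalise HERMES_ALLOWED_USERS before pydantic validates it.

    pydantic-settings JSON-decodes env values first, so a single-element
    allow-list like HERMES_ALLOWED_USERS=12345 arrives as a bare int,
    while 1,2,3 arrives as a CSV string.
    """
    if isinstance(value, int):
        return [value]                                    # bare-int edge case
    if isinstance(value, str):
        return [int(p) for p in value.split(",") if p.strip()]
    return [int(v) for v in value]                        # already a list
```

Without the bare-int branch, a one-person allow-list fails validation at startup, which is exactly the kind of misconfiguration the container is supposed to surface immediately.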

The Claude CLI invocation is the central piece. Here is the full subprocess call:

cmd = [
    "claude",
    "--print",
    "--add-dir", "/workspace",
    "--dangerously-skip-permissions",
    "--append-system-prompt", _build_system_prompt(),
]
if SESSION_MARKER.exists():
    cmd.append("--continue")  # resume the global session after the first call
cmd.append(_build_user_prompt(text, ctx))

proc = await asyncio.create_subprocess_exec(
    *cmd,
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE,
    cwd="/workspace",  # fixed working directory keys the session file
)
try:
    stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=300)
except TimeoutError:
    proc.kill()  # don't leave an orphaned claude process behind
    raise
SESSION_MARKER.touch(exist_ok=True)
return stdout.decode("utf-8").strip()

The system prompt sets the assistant's persona, the validation discipline (don't claim, validate via tools first), the learning loop (write new facts to /workspace/CLAUDE.md or notes/), the Telegram output format (plain text, no Markdown — Telegram's default mode doesn't render Markdown reliably), and a note about the conversation logs being available for read-on-demand.

The user prompt wraps the actual message with context metadata: chat source (DM with X, or group Y with members A, B, C), sender identity, timestamp, trigger reason, and the file path of the conversation log for this chat. This lets the assistant decide whether to look up group context before responding.
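As a sketch, such a wrapper might look like this (field names and layout are illustrative, not the exact output of _build_user_prompt):

```python
from datetime import datetime, timezone

def build_user_prompt(text: str, *, chat_desc: str, sender: str,
                      trigger: str, log_path: str) -> str:
    """Wrap a raw Telegram message with the context metadata described above."""
    ts = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return (
        f"[source: {chat_desc}]\n"
        f"[sender: {sender}] [time: {ts}] [trigger: {trigger}]\n"
        f"[conversation log: {log_path}]\n\n"
        f"{text}"
    )
```

Keeping the metadata in a fixed header block makes it cheap for the assistant to decide, per message, whether reading the linked conversation log is worth a tool call.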

Step-by-Step: Bootstrapping a New Instance

Assuming the codebase is in a Git repo and you have a server with Docker installed, here is the full bootstrap. Substitute your own values for <instance-id> and <project-name>.

Step 1: Register the bot with BotFather.

  • In Telegram, message @BotFather
  • /newbot → set name and username (e.g. some_project_assistant_bot)
  • Save the token BotFather returns
  • /setprivacy → select your bot → Disable (so the bot sees all group messages, not just commands and mentions)
  • Optional: /setcommands with ping, status, help

Step 2: Clone the repo and prepare directories on the server.

sudo mkdir -p /opt/<instance-id>
sudo chown $USER /opt/<instance-id>
git clone <repo-url> /opt/<instance-id>
cd /opt/<instance-id>

mkdir -p data/claude data/bot data/workspace/{notes,conversations,incoming}
echo "{}" > data/claude.json
sudo chown -R 1000:1000 data/

The chown step is critical and easy to miss. Docker creates missing bind-mount source directories as root-owned; if you skip the chown, the container user (UID 1000) cannot write to them and the OAuth login fails silently. Verify with stat -c "%a %u:%g" data/claude — you want 1000:1000, not 0:0.

Step 3: Initialise the workspace.

cp templates/workspace-CLAUDE.md.template data/workspace/CLAUDE.md
# Open the file and replace placeholders:
# {{HERMES_PROJECT_NAME}}, {{OPERATOR_NAME}}, {{CLIENT_LEGAL_ENTITY}}, etc.
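If you bootstrap instances regularly, the placeholder fill can be scripted; a minimal sketch (the helper and its strictness are ours, not part of the shipped codebase):

```python
def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace {{KEY}} placeholders; fail loudly if any slipped through."""
    for key, val in values.items():
        template = template.replace("{{" + key + "}}", val)
    if "{{" in template:
        raise ValueError("unfilled placeholder remains in workspace template")
    return template

# Usage sketch (paths as in the bootstrap steps above):
# text = Path("templates/workspace-CLAUDE.md.template").read_text()
# Path("data/workspace/CLAUDE.md").write_text(
#     fill_template(text, {"HERMES_PROJECT_NAME": "Example Mandate"}))
```

Failing on a leftover placeholder is deliberate: a half-filled CLAUDE.md silently degrades every later conversation.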

Step 4: Configure secrets.

cp .env.example .env
chmod 600 .env
sudo chown 1000:1000 .env
# Edit .env: TELEGRAM_BOT_TOKEN, HERMES_ALLOWED_USERS,
# HERMES_OPERATOR_ID, HERMES_PROJECT_NAME, HERMES_INSTANCE_ID

Step 5: Pre-condition — send /start to the bot.

Telegram does not let a bot send a DM to a user until the user has initiated the chat at least once. Before the first container start, the operator must open the bot in Telegram and send /start. The bot won't respond (no handler is registered for /start), but Telegram opens the DM channel internally. Without this step, the first onboarding DM throws TelegramForbiddenError.

Step 6: Build and start the container.

docker compose build hermes
docker compose up -d hermes
sleep 5
docker compose logs --tail=20 hermes

You should see a JSON log sequence: startup (with project name and allowed_users count), polling_initialized (with drop_pending=true), onboarding_sent (the first DM to the operator), then aiogram's polling-started events.

Step 7: OAuth login.

This is a one-time, interactive step. In a terminal wide enough to display the OAuth URL on one line (250+ characters — otherwise the URL wraps and Cloudflare rejects the auth with Unknown scope: us):

docker exec -it project-assistant tmux new-session -s claude
# Inside the container:
claude
# Inside the claude REPL:
/login
# Choose: "Claude Pro or Max subscription"
# Copy the displayed URL, open it in a browser, log in, paste the
# authorization code back into the terminal.
/status   # should show "Subscription", not "API key"
/quit
# Detach tmux with Ctrl-b d, then exit the docker exec.

Step 8: Smoke test in Telegram.

  • DM the bot: /ping → pong
  • DM the bot: /status → diagnostic output (allow-list count, session messages, log counts)
  • DM the bot: a natural question — the assistant should respond using the Claude engine
  • Send a small PDF or image with a caption — the assistant should reference its contents
  • Send two messages in a row, where the second references the first — multi-turn memory should hold

Step 9: Add additional participants (when ready).

  • Each participant retrieves their Telegram user ID from @userinfobot
  • Update .env: HERMES_ALLOWED_USERS=<operator_id>,<member1_id>,<member2_id>
  • docker compose restart hermes
  • Create a Telegram group, invite all members and the bot, test with /ping@some_project_assistant_bot

Operational Notes and Known Risks

The stack has been running stably, but several issues are worth being aware of.

Cloudflare WAF on OAuth refresh. Anthropic's auth endpoint sits behind Cloudflare. There is a known issue (open in the anthropics/claude-code repo) where Cloudflare classifies certain server IPs as headless Linux and blocks OAuth refresh permanently. Reporters have seen lockouts lasting weeks. The recovery path is to re-authenticate from a different IP (residential, VPN). The mitigation is to avoid custom user-agents and aggressive retry loops, and to consider claude setup-token (a one-year token generated from an authenticated machine, plugged into the container via CLAUDE_CODE_OAUTH_TOKEN) as a fallback if you operate in a high-risk IP range.

Refresh-token single-use rotation. Anthropic's OAuth refresh tokens are single-use. If two clients (e.g. your laptop and your container) share the same account and both refresh in parallel, the first refresh invalidates the other side. The practical advice: a dedicated Anthropic account per assistant instance. If you can't, accept the occasional re-login and don't run Claude Code on your laptop and in the assistant simultaneously.

The empty claude.json trap. If data/claude.json is a zero-byte file at first container start, Claude Code throws Configuration Error: invalid JSON, Unexpected EOF. Initialise it with echo "{}" > data/claude.json, not touch. The error is recoverable in the REPL ("Reset with default configuration"), but better to avoid the friction.

The OAuth URL line-wrap bug (our observation). The OAuth URL is roughly 530 characters. In a terminal of normal width, it wraps over multiple lines. When you copy the wrapped output, the line breaks come along, and after URL-encoding, the scope value user:profile is split at the wrap point into us plus a stray er:profile. Cloudflare then sees us as a scope and rejects with Invalid OAuth Request — Unknown scope: us. This isn't tracked as an upstream issue at the time of writing — we hit it ourselves. Widening the terminal to 250+ characters before launching claude avoids it, or you can manually de-wrap the URL in your browser's address bar after pasting.

The Ubuntu UID 1000 conflict. Some modern Ubuntu base images — notably ubuntu:24.04 (Noble) after the Canonical OCI rebase — ship with a default ubuntu user at UID 1000. If you switch the Dockerfile base from python:3.13-slim-trixie (which doesn't ship a default UID-1000 user) to one that does, useradd -u 1000 hermes fails with UID 1000 is not unique. The Dockerfile includes a defensive userdel -r ubuntu 2>/dev/null || true before useradd for this reason.

The conversation log can grow. The append-only logs in /workspace/conversations/ grow with every message. Over months of active use, individual files can reach megabytes. There's no built-in rotation. If you care, add a cron-style job to archive logs older than N days, or split per-month.
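Such an archival job could be sketched like this (assumptions: the bind-mount layout above, gzip as the archive format, and an arbitrary 90-day cutoff):

```python
import gzip
import shutil
import time
from pathlib import Path

def archive_old_logs(log_dir: Path, max_age_days: int = 90) -> list[Path]:
    """Gzip conversation logs untouched for max_age_days; return archive paths."""
    cutoff = time.time() - max_age_days * 86400
    archived: list[Path] = []
    for path in sorted(log_dir.glob("chat-*.md")):
        if path.stat().st_mtime < cutoff:
            gz = path.with_suffix(".md.gz")
            with path.open("rb") as src, gzip.open(gz, "wb") as dst:
                shutil.copyfileobj(src, dst)
            path.unlink()
            archived.append(gz)
    return archived
```

Run from cron on the host against data/workspace/conversations/; the *.md glob ignores already-archived .md.gz files on subsequent runs.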

For broader operational concerns around self-hosted services and what happens when they fail, see our writeup on disaster recovery for self-hosted services.

What Is Deliberately Out of Scope

The temptation when building an internal AI tool is to add features. We keep this stack small. The following are explicitly out of scope for the version described here, and we've made an active choice not to add them yet:

  • RAG / vector database. The assistant's knowledge is in Markdown files (CLAUDE.md and notes/) and conversation logs. Read-tool calls handle retrieval. This is enough for a single-mandate scope. Once the workspace passes a certain size, a real RAG layer (PostgreSQL + pgvector, for example) becomes necessary, but until then it's overkill.
  • Audio transcription. Voice notes are downloaded and metadata is logged, but the assistant can't yet transcribe them. Adding Whisper or a similar pipeline is a half-day of work, deferred until needed.
  • Health check endpoint. The container has no HTTP server; there's nothing to scrape. Docker's restart-policy plus log monitoring covers most failure modes.
  • Streaming responses. See Decision 6 above.
  • Multi-tenant on a single container. Each mandate gets its own container. This is intentional — see the project-instance model above.

The deferred features are not bugs. They're choices, and they reduce the surface area for the things that do exist. For context on the discipline of letting AI tools stay narrow, see building AI agent skills for domain-specific workflows and the Claude Skills overview.

Cloning for the Next Mandate

The two-env-var design pays off here. To stand up a new instance:

  1. Clone the repo to /opt/<new-instance-id>
  2. Change HERMES_PROJECT_NAME and HERMES_INSTANCE_ID in .env
  3. Register a new bot with BotFather (one-time)
  4. Fill in the workspace template with the new mandate's context (one-time)
  5. OAuth login from inside the new container (one-time, ideally on a separate Anthropic account)
  6. docker compose up -d

The code is bit-identical across instances. The only things that vary are the two env vars, the bot credentials, the workspace content, and the OAuth session.

You can run multiple instances on the same server. Each has its own directory, its own container name, its own bind-mount tree. Disk usage is roughly 700 MB image (shared across instances thanks to Docker's layer cache) plus per-instance workspace growth (typically tens of MB after months).

Where This Fits in the Toolchain

A project-specific AI assistant is not a replacement for general-purpose AI coding tools. We still use Claude Code, Cursor, and Gemini CLI directly for development work. The assistant is for the consulting context: project memory, document analysis, status updates, ad-hoc research within a defined mandate. It runs in parallel to the rest of the AI tool stack, not in place of it.

The pattern is also not a replacement for chat-based AI products like Claude.ai or ChatGPT. Those are the right answer for one-off tasks, personal questions, and general knowledge work. A project-specific assistant is the right answer when the project has its own boundary, its own participants, and its own evolving knowledge base that you don't want to dump into a generic chatbot every time you need to reference it.

If you're considering building something similar for your consulting practice, agency, or internal team, the stack above is a reasonable starting point. The Dockerfile, compose file, and module structure are reproducible from this guide. The decisions are documented. If your context differs from ours on any of the seven decisions, branch and adapt — the code is short enough that rewriting one module is straightforward. Get in touch if you want to compare notes or have us help with a specific implementation.

