Tuning a Hermes-Style Agent to Grow with the Project

A project-specific AI assistant that works through messaging channels (Telegram, Discord, Slack, or email behind a thin gateway) behaves differently in a multi-user group chat than in a one-on-one DM. Onboarding a team onto the bot exposes bugs that DM testing never touches: empty-reply crashes from keyword detection, badly timed progress messages, race conditions between parallel triggers, and memory drift in the agent's conversation log.

This guide collects 8 tuning patterns for a minimal hermes-style assistant, one built on the Claude Code CLI as a subprocess rather than on the full NousResearch framework. Each pattern is a concrete problem paired with the code that fixes it. All of them apply to both codebases.

What Gets Fixed

A generic Unexpected error reply when the LLM returns empty stdout after a keyword trigger
Reassurance messages that fire on every short reply because the threshold sits below typical response latency
Subprocesses that hang mid-stream with no timeout to kill them
Race conditions when two group members trigger the agent at the same time on the same session
A silent stderr-pipe deadlock when the agent runs long enough to fill the 64 KB buffer
Bug-fallback messages in the conversation log that the agent reads back as if they were normal behavior

Prerequisites

This guide assumes the architecture from Building a Project AI Assistant over Telegram: one Docker container per project, a non-root user, a Claude Code CLI authenticated with persistent OAuth, an aiogram bot wrapper, a bind-mounted workspace, and session-state volumes. Some patterns require specific versions:

Claude Code CLI 2.1.139 or later (for --output-format stream-json --verbose --include-partial-messages)
aiogram 3.28.2 or later (for ChatActionSender, message.react(), ReactionTypeEmoji)
Python 3.13 base image
A Telegram group in which the bot is a member, with can_react_to_messages: true

For an email channel in the same project, see Setting Up a Project Mailbox with DKIM, SPF, and DMARC.

What Is a Hermes-Style Agent

The hermes-style pattern takes its name from NousResearch's open-source framework. Three properties set it apart from a stateless chatbot:

Persistent memory. An on-disk workspace that the agent reads and writes between turns, so context survives container restarts.
Multi-channel. The same agent instance can talk on Telegram, Discord, Slack, or email behind a thin gateway.
Closed learning loop. Operator corrections become workspace edits that the agent reads on the next turn.

NousResearch ships a full reference implementation with a TUI, a multi-channel gateway, a skills system, and RL training hooks. Building a minimal variant on a Claude Code CLI subprocess keeps the moving parts few enough to template per consulting mandate. The patterns below apply equally to both approaches.

Pattern 1: Typed Empty-Reply Handling

A keyword trigger (matching \bhermes\b in a group message) can fire on a sentence that contains the bot's name but is not addressed to it. The LLM correctly returns empty output, but the three layers downstream each fail to handle the empty case:

The engine returns "" with returncode 0
The split function returns [""] because len("") <= max_chars satisfies its condition
The send loop calls bot.send_message(chat_id, ""); Telegram responds with Bad Request: message text is empty; the generic except Exception at the top of the handler swallows the traceback and sends a user-visible error instead

Filtering the empty string in a single layer prevents the crash but produces a silent skip: the trigger fired and the bot spent compute, yet the user sees nothing. The two-step fix uses a typed exception for empty output and a Telegram reaction (👀) on the triggering message as an acknowledgment:

class HermesError(RuntimeError):
    """Base class for everything that goes wrong with the subprocess."""

class HermesEmptyResponse(HermesError):
    """Subprocess returned successfully but with empty result."""

class HermesHangError(HermesError):
    """Watchdog killed subprocess after no stream-event for N seconds."""

The engine raises HermesEmptyResponse when result.strip() == "". The handler catches it and calls message.react([ReactionTypeEmoji(emoji="👀")]). The conversation log gets a marker block, a separate entry that records the silent acknowledgment without cluttering the chat with a message, so that future memory reads by the agent show the trigger fired and was deliberately left unanswered.
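A minimal sketch of the engine-side check, under stated assumptions: the helper names split_message and finalize_result are hypothetical, and the 4096-character default is only an illustrative limit. It shows how a naive splitter lets "" through as [""] and how a typed exception stops the empty reply before it reaches the send loop.

```python
class HermesError(RuntimeError):
    """Base class for engine failures."""


class HermesEmptyResponse(HermesError):
    """Subprocess returned successfully but with empty result."""


def split_message(text: str, max_chars: int = 4096) -> list[str]:
    """Naive splitter (hypothetical): an empty string slips through as [""]."""
    if len(text) <= max_chars:
        return [text]
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]


def finalize_result(result: str) -> str:
    """Raise a typed exception instead of passing an empty reply downstream."""
    if result.strip() == "":
        raise HermesEmptyResponse("subprocess succeeded with empty result")
    return result
```

Calling finalize_result in the engine, before any splitting, gives the handler a single place to branch into the mini-ack path.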

Pattern 2: Pre-Checking Reaction Permissions with a Lazy Cache

Telegram's setMessageReaction is not available everywhere. Some groups restrict the allowed reaction set, and some emoji require administrator approval first. The ChatFullInfo type defines the rule: if available_reactions is absent, all standard emoji are allowed; if it is an array, only those emoji are allowed. The bot must be a member of the group; it does not need to be an administrator to react in a group.

Checking on every trigger wastes API calls. One getChat per chat with a one-hour cache is enough:

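import time

from aiogram import Bot
from aiogram.types import ReactionTypeEmoji

# chat_id -> (reactions allowed?, monotonic expiry timestamp)
_reaction_cache: dict[int, tuple[bool, float]] = {}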
_REACTION_CACHE_TTL_SEC = 3600
MINI_ACK_EMOJI = "👀"

async def _reactions_allowed(bot: Bot, chat_id: int) -> bool:
    now = time.monotonic()
    cached = _reaction_cache.get(chat_id)
    if cached and cached[1] > now:
        return cached[0]
    try:
        chat = await bot.get_chat(chat_id)
        allowed = (
            chat.available_reactions is None
            or any(
                isinstance(r, ReactionTypeEmoji) and r.emoji == MINI_ACK_EMOJI
                for r in (chat.available_reactions or [])
            )
        )
    except Exception:
        allowed = False
    _reaction_cache[chat_id] = (allowed, now + _REACTION_CACHE_TTL_SEC)
    return allowed

Always wrap the actual reaction call in try/except (TelegramBadRequest, TelegramForbiddenError): the cache lags behind permission changes.
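The guard can be sketched without importing aiogram. The helper try_react and its parameters are hypothetical names; in a real handler, react would wrap message.react(...) and tolerated would be (TelegramBadRequest, TelegramForbiddenError) from aiogram.exceptions.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def try_react(
    react: Callable[[], Awaitable[object]],
    tolerated: tuple[type[BaseException], ...],
) -> bool:
    """Attempt the reaction; return False instead of raising on tolerated errors."""
    try:
        await react()
    except tolerated:
        # The permission cache can be stale: the group may have changed its
        # allowed reactions since the last getChat call.
        return False
    return True
```

Returning a boolean lets the caller decide whether to log the skipped acknowledgment, while any error outside the tolerated tuple still propagates.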

Pattern 3: Stream Mode and an Idle-Time Watchdog

A hard timeout on the whole subprocess (asyncio.wait_for(proc.communicate(), timeout=300)) caps total duration regardless of progress. Removing it without a replacement is unsafe: the Claude Code stream-idle-hang issue describes API calls that stall mid-stream and never return, leaving the subprocess hanging.

Switching to --output-format stream-json --verbose --include-partial-messages emits an event at every milestone: a text_delta per token, tool-use start and end, API retries, rate-limit notices, and a final result event. A genuine hang produces silence on the stream; a long task produces a sequence of small events. The watchdog kills on idle time, not on total time:

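WATCHDOG_NO_EVENT_SEC = 60

# Defined inside the engine coroutine: `proc` is the running subprocess and
# `state["last_event_ts"]` is refreshed by the stdout loop on every stream event.
async def watchdog() -> None: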
    while True:
        await asyncio.sleep(5)
        if proc.returncode is not None:
            return
        idle_sec = time.monotonic() - state["last_event_ts"]
        if idle_sec > WATCHDOG_NO_EVENT_SEC:
            state["killed_by_watchdog"] = True
            try:
                proc.kill()
            except ProcessLookupError:
                pass
            return

The final response text comes from the result field of the result event: unambiguous, single-source, and unaffected by partial-stream parsing. The same event also carries is_error, api_error_status, duration_ms, and total_cost_usd, all of which go into a structured log line.
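A minimal sketch of the per-line handling on stdout, assuming each stream-json line is one JSON object with a top-level type key. The function names and the result_event state key are hypothetical; last_event_ts is the value the watchdog reads.

```python
import json
import time


def handle_stream_line(line: bytes, state: dict) -> None:
    """Parse one stream-json line; refresh the idle clock and keep the result event."""
    text = line.decode("utf-8", errors="replace").strip()
    if not text:
        return
    try:
        event = json.loads(text)
    except json.JSONDecodeError:
        return  # ignore non-JSON noise rather than abort the stream
    if not isinstance(event, dict):
        return
    # Any well-formed event counts as progress for the watchdog.
    state["last_event_ts"] = time.monotonic()
    if event.get("type") == "result":
        state["result_event"] = event


def summarize(event: dict) -> dict:
    """Pick the fields named above for the structured log line."""
    keys = ("is_error", "api_error_status", "duration_ms", "total_cost_usd")
    return {key: event.get(key) for key in keys}
```

Only parsed events refresh the clock here, so a subprocess that emits garbage without valid events still counts as idle. Whether that is the right policy depends on the deployment.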

Pattern 4: Calibrating the Reassure Schedule

The threshold question (when does the bot send a text update during a long-running call) is empirical. The right answer depends on the latency distribution of real triggers. Three thresholds, each with a message that matches what the user actually wants to know:

_REASSURE_SCHEDULE = (
    (15, "On it."),
    (90, "Taking longer than usual, still on it."),
    (300, "Genuinely large task — almost there."),
)

These thresholds come from two constraints. The lower bound is set by the latency of typical short replies: if most replies arrive within X seconds, the first reassurance must start after X, otherwise it lands at the same time as the answer. Nielsen's response-time research puts 10 seconds as the limit for holding a user's attention without a progress indicator; the typing indicator that aiogram's ChatActionSender shows below that threshold covers the need up to roughly 15 seconds.

The 90-second threshold is where the framing shifts from "working on it" to "working on it, but it is taking longer than usual": a separate signal that the call sits in the long tail of the distribution. The wording avoids implying that the user asked for something too heavy. The bot is the one doing the work; the message affirms the work, not the request.
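One way to drive such a schedule is a background task that is cancelled the moment the long call returns. This is a sketch under stated assumptions: reassure_loop, run_with_reassurance, and the send callable are hypothetical names, and in the bot send would wrap something like message.answer.

```python
import asyncio
from collections.abc import Awaitable, Callable

Schedule = tuple[tuple[float, str], ...]


async def reassure_loop(
    send: Callable[[str], Awaitable[None]],
    schedule: Schedule,
) -> None:
    """Send each message once its threshold (seconds since start) has elapsed."""
    loop = asyncio.get_running_loop()
    started = loop.time()
    for threshold_sec, text in schedule:
        delay = threshold_sec - (loop.time() - started)
        if delay > 0:
            await asyncio.sleep(delay)
        await send(text)


async def run_with_reassurance(
    work: Awaitable[str],
    send: Callable[[str], Awaitable[None]],
    schedule: Schedule,
) -> str:
    """Run the long call; cancel pending reassurances as soon as it finishes."""
    task = asyncio.create_task(reassure_loop(send, schedule))
    try:
        return await work
    finally:
        task.cancel()
        await asyncio.gather(task, return_exceptions=True)
```

Because thresholds are measured from the start of the call rather than from the previous message, a slow send does not push later messages back.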

Pattern 5: Per-Chat Concurrency Lock

Two group members can trigger the agent within the same second, one with an @-mention and the other with a keyword. Both handler invocations then spawn a claude --continue subprocess on the shared session file. The session lock file is not strict; concurrent writes produce a truncated session-jsonl file and lost turns.

Serialize per chat at the handler layer with a lock created on demand:

import asyncio

_chat_locks: dict[int, asyncio.Lock] = {}

def _get_chat_lock(chat_id: int) -> asyncio.Lock:
    # Created on first use, inside the running event loop.
    lock = _chat_locks.get(chat_id)
    if lock is None:
        lock = asyncio.Lock()
        _chat_locks[chat_id] = lock
    return lock

# Inside the message handler:
async with _get_chat_lock(message.chat.id):
    response = await _run_hermes_with_ux(bot, message, prompt, ctx)
    ...

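Lazy creation is the conservative choice. On Python versions before 3.10, an asyncio.Lock created at module import binds to the event loop that is current at import time, which may not be the loop the handlers run on after a restart. From 3.10 onward the lock binds to the running loop on first contended use, so the risk is smaller, but deferring creation to the first call inside the active loop avoids the conflict on any version. For small groups the lock dictionary stays small; for larger fleets, add LRU eviction.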

Pattern 6: Exception Hierarchy and Except Ordering

The engine's exception classes form a tree:

HermesError(RuntimeError): anything that goes wrong with the subprocess
HermesEmptyResponse(HermesError): the run succeeded but the result was empty
HermesHangError(HermesError): killed by the watchdog

Python's except takes the first matching clause. If except HermesError comes before the subclass handlers, it catches HermesEmptyResponse and routes it to the error path instead of the mini-ack. Subclass-first ordering is therefore required:

try:
    response = await _run_hermes_with_ux(bot, message, prompt, ctx)
    ...
except HermesEmptyResponse:
    # mini-ack path
    ...
except HermesHangError as exc:
    # retry-once-then-bail path
    ...
except HermesError as exc:
    # exit-not-zero, api-error, etc.
    ...
except Exception:
    # last resort
    ...

Add this to the code review checklist. A well-meaning cosmetic reordering of the blocks silently inverts the intent.
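The effect of the ordering can be checked in isolation. In this sketch the functions route and route_wrong_order are hypothetical stand-ins for the handler's except chain, returning a label instead of performing the real action.

```python
class HermesError(RuntimeError):
    """Anything that goes wrong with the subprocess."""


class HermesEmptyResponse(HermesError):
    """Run succeeded but the result was empty."""


class HermesHangError(HermesError):
    """Killed by the watchdog."""


def route(exc: Exception) -> str:
    """Subclass-first ordering, mirroring the handler's except chain."""
    try:
        raise exc
    except HermesEmptyResponse:
        return "mini-ack"
    except HermesHangError:
        return "retry"
    except HermesError:
        return "error"
    except Exception:
        return "last-resort"


def route_wrong_order(exc: Exception) -> str:
    """Base class first: the subclass clause below it can never run."""
    try:
        raise exc
    except HermesError:
        return "error"
    except HermesEmptyResponse:
        return "mini-ack"
    except Exception:
        return "last-resort"
```

With the wrong order, route_wrong_order(HermesEmptyResponse()) yields "error", and Python raises no warning about the unreachable clause. A small unit test along these lines catches an accidental reorder.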

Pattern 7: Draining stderr in Parallel

Streaming over stdout means reading lines as they arrive: async for line in proc.stdout. If stderr is piped as well, the subprocess can fill the stderr buffer while stdout is still being read. The default pipe buffer on Linux is about 64 KB. Once stderr is full, the subprocess blocks waiting for it to drain, and the async-for-line loop makes no further progress. The watchdog kills the process after the idle period, but the result is lost.

Drain stderr in parallel from the moment the subprocess starts, then await the drain task after proc.wait():

proc = await asyncio.create_subprocess_exec(
    *cmd,
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE,
    cwd=str(WORKSPACE),
)
stderr_task = asyncio.create_task(proc.stderr.read())

# ... stream-loop on stdout ...

rc = await proc.wait()
try:
    stderr_b = await stderr_task
except Exception:
    stderr_b = b""
stderr = stderr_b.decode("utf-8", errors="replace").strip()

The Claude Code CLI emits very little stderr in stream-json mode, so this failure mode is rare in practice. The fix is one extra line.
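The deadlock and its fix can be reproduced with a stand-in child process. This sketch assumes a Python interpreter is available as sys.executable. The CHILD script is a hypothetical substitute for the CLI that writes 200,000 bytes to stderr (well past a 64 KB pipe buffer) before printing a single stdout line.

```python
import asyncio
import sys

# Stand-in for the CLI: floods stderr, then prints one stdout line.
CHILD = (
    "import sys\n"
    "sys.stderr.write('x' * 200_000)\n"
    "sys.stderr.flush()\n"
    "print('done')\n"
)


async def run_with_stderr_drain() -> tuple[int, list[bytes], int]:
    """Return (exit code, stdout lines, stderr byte count) for the child."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", CHILD,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    # Start draining stderr immediately so the child never blocks on a full pipe.
    stderr_task = asyncio.create_task(proc.stderr.read())
    lines = [line async for line in proc.stdout]
    rc = await proc.wait()
    stderr_b = await stderr_task
    return rc, lines, len(stderr_b)
```

Without the stderr_task line, the child would block on its stderr write once the pipe fills and the stdout loop would wait indefinitely, which is the situation the idle watchdog would otherwise have to clean up.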

Pattern 8: Memory Correction Discipline

A hermes-style agent reads its own conversation log as memory. A bug-fallback message written into that log is indistinguishable from intentional past behavior on the next read. The first instinct is to insert a correction marker ([CORRECTION: the previous entry was a bug]) so the next memory read sees the correction.

Before correcting anything, verify that the bug-fallback was actually logged. In the case above, the generic except Exception called message.answer(...) to send the error to the user but did not call conversation_log.log_outgoing(...). The error message reached Telegram but never reached the agent's memory file. No retroactive correction is needed.

Treat the agent's workspace as the agent's own. Before any plan that involves editing files inside it, snapshot the latest state: the agent may have rewritten CLAUDE.md or its own notes since the last read. Anthropic's context engineering guide describes persistent memory as an artifact carried between sessions, not a notepad for the operator to write in. Domain-specific skills hold up better alongside agent-maintained notes than in files the operator edits until the agent starts to distrust them.
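Taking the snapshot can be as simple as a timestamped copy. In this sketch the function name snapshot_workspace and the directory layout are hypothetical; adapt the paths to the actual bind mounts.

```python
import shutil
import time
from pathlib import Path


def snapshot_workspace(workspace: Path, snapshots_dir: Path) -> Path:
    """Copy the workspace to a timestamped directory and return the copy's path."""
    stamp = time.strftime("%Y%m%dT%H%M%S")
    target = snapshots_dir / f"{workspace.name}-{stamp}"
    # copytree creates missing parent directories and raises FileExistsError
    # if the target already exists, so an earlier snapshot is never overwritten.
    shutil.copytree(workspace, target)
    return target
```

Diffing the snapshot against the previous one shows what the agent changed on its own before any operator edit goes in.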

Operational Notes

Bind-mount persistence. Bind-mounted volumes for the workspace and the Claude OAuth credentials survive docker compose up -d --force-recreate as long as the mount paths do not change. Verify this before editing any compose file.

Pre-deploy safety check. Grep the last five minutes of logs for a claude_subprocess_start without a matching claude_result_event. A pending subprocess means a restart would kill a run in progress. Wait until the log is clean. For broader failure scenarios, see our disaster recovery article.
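The pairing check can be scripted. This sketch assumes a log format that the article does not specify: one JSON object per line with an event field and a run_id field. Both field names and the function pending_runs are hypothetical and need adapting to the real log lines.

```python
import json
from collections.abc import Iterable


def pending_runs(log_lines: Iterable[str]) -> set[str]:
    """Return run ids with a claude_subprocess_start but no claude_result_event."""
    pending: set[str] = set()
    for raw in log_lines:
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        run_id = entry.get("run_id")
        if run_id is None:
            continue
        if entry.get("event") == "claude_subprocess_start":
            pending.add(run_id)
        elif entry.get("event") == "claude_result_event":
            pending.discard(run_id)
    return pending
```

An empty set means no run is in flight within the inspected window, so a restart is safe by this check.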

Pattern reuse across mandates. The whole stack (engine, handlers, conversation log, file intake) clones to a new mandate by changing two environment variables (project-name and instance-id). Bot token, OAuth credentials, workspace, and allow-list are parameterized per project. For an operational view of running several per-project assistants at once, see solo operations at scale.

Choosing the reaction emoji. The 👀 emoji is in Telegram's default standard set and works in groups where available_reactions is not set. If a group restricts reactions to a custom subset, the cache reflects that and the mini-ack is skipped silently. Define the emoji as a per-deployment configuration constant rather than hardcoding it.

Hermes-Agent versus a minimal custom build. The NousResearch framework bundles a TUI, a slash-command system, a multi-channel gateway, a skills hub, and RL training integration. The minimal Claude Code CLI wrapper produces the same conversational shape with roughly a tenth of the moving parts. Both converge on the same set of group-chat UX problems; the patterns in this article apply to both.

When to Apply Each Pattern

These patterns are not equally urgent. Apply them in the order you run into them:

Pattern 1 (empty-reply handling) is needed as soon as the bot joins a group with keyword-trigger detection
Pattern 4 (reassure schedule) is needed once the first short reply arrives together with a reassurance message
Patterns 3 and 7 (stream mode, stderr drain) are needed as soon as long-running tasks start to hang
Pattern 5 (concurrency lock) is needed when the first session-file truncation shows up in the logs
Patterns 2, 6, and 8 are background hardening: apply them during code review, before they break in production

Build the project-specific assistant first: the base architecture guide covers the container, OAuth, workspace, and handler layout. Onboard a small team with the scaling guide for the allow-list, group setup, and trigger detection. Add a project email channel with the DKIM/DMARC guide once out-of-band notifications start coming in. Come back to this article when you need the patterns above.

tva runs several per-project assistants in parallel across consulting mandates. If you need help building or tuning one, get in touch.