Krahnie rebuilt as its own Python service on the Claude Agent SDK, with Claude
orchestrating directly instead of running as a plugin inside the Hermes framework. Same Salesforce
delivery-ops workflows, same Slack UX — but Krahnie now owns its agent loop, which made it dramatically
faster and cheaper. This is the code-level reference and the rationale for treating it as the successor
to the krahnie-slack Hermes plugin.
The original Krahnie ran as the krahnie-slack plugin inside the self-hosted Hermes framework,
with gpt-5.5 as the brain and Hermes' hand-rolled gateway/agent loop. The rebuild keeps every workflow but
swaps that for a small Python service built directly on the Claude Agent SDK (claude-agent-sdk),
with Claude running the agent loop.
The architecture is three layers, and the key insight is that two of them are model-agnostic:
agent.py + prompts.py, a few dozen lines).Slack (Socket Mode, shared app with Hermes)
│ message / assistant_thread / action / view / command
▼
slack_gateway.py ── set RequestContext (channel, thread, user) per turn
│
▼
agent.run_turn(text, ctx, session_key) ← Claude Agent SDK, Haiku
│ warm persistent client per conversation; thinking disabled; tools=[]
▼
in-process MCP tools (tools/*) ← only Krahnie's tools exist
│ query Salesforce (REST), render Block Kit
▼
post to Slack (slackio) · the card IS the response
Writes and buttons bypass the agent: a tool posts a confirm card, and the gateway's button/modal handlers perform the actual Salesforce write on click (confirm-before-commit). Sonnet is called directly (not through the agent loop) for the genuinely reasoning-heavy bits — card-work summaries and story-point enrichment.
~/krahnie/
run.py entrypoint: load .env, start gateway, run forever
.env SLACK_*, ANTHROPIC_API_KEY, KRAHNIE_SF_* (gitignored)
requirements.txt claude-agent-sdk, slack-bolt, slack-sdk, anthropic, PyYAML
krahnie/
agent.py SDK wiring: Haiku, tools=[], warm pool, run_turn()
prompts.py system prompt + intent routing
context.py RequestContext (per-conversation holder)
slack_gateway.py Socket Mode transport + all interactivity/commands
tools/
__init__.py build_server(get_ctx) → in-process MCP server
week.py post_week_at_a_glance
cards.py post_client_cards, post_my_time, find_cards, scope/parse helpers
research.py summarize_card_work + shared build_summary (Sonnet)
logtime.py post_log_confirm (+ picker), post_correction_confirm
createcard.py post_card_create_confirm
writes.py SF write layer + modal builders + result renderers
salesforce.py REST query + create_record + patch_record (token cached)
blockkit.py pure Block Kit renderers (verbatim from Hermes)
refresh.py week payload + render
enrich.py Sonnet: summarize_work, estimate_card (rubric cached)
pending.py confirm-before-commit store (nonce → proposed change)
scheduler.py durable reusable scheduler (one-time + weekly)
state.py snooze/digest state · slackio.py · envload.py · story_points.md
scripts/ smoke_sdk · test_context_flow · bench · test_prewarm · test_wave1 · test_wave3
var/ state/ pending/ scheduled/ (gitignored runtime data)
agent.py)A single configured agent, reached via run_turn(text, ctx, session_key). Three deliberate choices
define its speed and cost profile:
| Option | Value | Why |
|---|---|---|
model | claude-haiku-4-5-20251001 | Cheapest capable model; fast tool-routing (the Hermes "minimal reasoning" analog). |
thinking | {"type":"disabled"} | No thinking-token latency or spend on routing turns. |
tools | [] | Strips every built-in tool (Bash/Read/Write/…). Cuts cached context ~54K → ~3K tokens (≈94%). Krahnie literally has no tools but its own. |
mcp_servers | {"krahnie": <server>} | In-process MCP server holding Krahnie's tools (mcp__krahnie__*). |
can_use_tool | deny-by-default | Defense in depth: allow only mcp__krahnie__*, deny everything else. |
setting_sources | [] | Hermetic — loads no ~/.claude settings/skills/MCP from disk. |
The SDK spins up its runtime subprocess + MCP handshake on the first query() of a client (~13 s cold).
To hide that, prewarm() at startup keeps a pool of pre-connected clients (each warmed with a throwaway
query so the model path is hot) and primes the Salesforce token. A new conversation adopts a warm client;
the pool replenishes in the background. Result: the first real message of a conversation is ~3 s, not ~13 s.
Pool size via KRAHNIE_WARM_POOL (default 1).
contextvars at connect() time. So an ambient contextvar set per-turn
in the caller's task is stale inside the callback — turn 2's tool saw turn 1's channel.The fix: each conversation owns a mutable holder; run(text, ctx) stamps the current turn's
RequestContext into it, and that conversation's tools read it via a bound getter
(build_server(lambda: holder["current"])). A mutable dict read at call-time is shared by reference and
always reflects the current turn — immune to which task runs the callback. Validated across multi-turn.
Tools are built per-conversation by factories bound to that conversation's context getter, then bundled into
one in-process MCP server. Tool names the agent sees are mcp__krahnie__<name>.
| Tool | Behavior | |
|---|---|---|
post_week_at_a_glance | read | The "what's my week" digest — cards due/overdue, utilization vs capacity, Refresh/Snooze buttons. |
post_client_cards | read | Cards for a client/project/board. Subject matched across all three (robust to which field the model picks). Filters: overdue, unlogged, mine. |
post_my_time | read | Cards the asking user logged time to this week + utilization. |
summarize_card_work | read | Fetch a card's work logs, write a Sonnet recap, post it. Shares build_summary() with the ⋯ menu. |
post_log_confirm | write | Log time. Confident match → confirm card; otherwise a picker of the client's (or your own) open cards. |
post_correction_confirm | write | Correct/move an existing entry — before→after diff card. |
post_card_create_confirm | write | Create a card — Create/Edit/Cancel card; fires async Sonnet AI insights after creation. |
Routing lives in prompts.py. Key rules: a named client/project/board always wins over "do I have"
phrasing; durations convert to decimal hours (90 min → 1.5); after a posting tool succeeds the agent stays
silent (the card is the response).
No tool writes directly. A write tool stashes the proposed change in pending.py (one JSON-on-disk
store, nonce-keyed, with a kind discriminator: log / correction /
card_create / log_intent) and posts a Block Kit confirm card whose buttons carry the nonce.
The gateway's button handler loads (and consumes) the pending change and performs the Salesforce write.
agent → post_log_confirm → pending.save("log", {...}) + confirm card
user taps Confirm → gateway _run_log_confirm → pending.load(nonce) → writes.log_work → chat_update to "✅ Logged …"
The fuzzy picker (a favorite Hermes feature): when the card hint isn't a confident match,
find_cards() (difflib + token search over open cards, scoped to a client, relaxing to all of the
client's cards) drives a picker. Each button carries {intent_nonce, card_id}; tapping one resolves the
card and chat_updates the picker into the normal confirm card.
slack_gateway.py reuses the Hermes transport patterns on slack_bolt AsyncApp + Socket Mode:
the 3-second-ack rule is honored by await ack() then asyncio.create_task(...) for all
work; assistant-pane threads are handled so replies land in the conversation; messages are de-duped.
| Surface | Handlers | |
|---|---|---|
| Digest buttons | ui | Refresh (re-query + chat_update), Snooze (durable schedule, §11) |
| Confirm cards | ui | Confirm / Edit / Cancel for log, correction, card-create |
| Picker | ui | krahnie_logpick → resolve chosen card → confirm card |
| Per-card ⋯ menu | ui | Log time → modal · Summarize → Sonnet recap in-thread |
| Modals | ui | views.open + view_submission for the log + new-card forms |
| Slash commands | cmd | /my-week · /logtime · /newcard (registered in the shared Slack app) |
hermes-gateway.service before running Krahnie.salesforce.py)The biggest single speed win. The Hermes plugin shelled out to the sf CLI per query (~3 s of Node
cold-start each). Krahnie fetches an access token + instance URL once via sf org display
(cached in-memory) and then hits the REST API directly over HTTPS for every query and write.
| Function | REST |
|---|---|
query(soql) | GET /services/data/vXX/query · follows pagination · 401 → refresh + retry |
create_record(sobject, fields) | POST /sobjects/{sobject} |
patch_record(sobject, id, patch) | PATCH /sobjects/{sobject}/{id} (204 on success) |
Per-query latency went from ~6.6 s (two CLI calls) to ~0.3 s. The one-time token fetch (~3.5 s) is also prewarmed at startup, so even the first real query is fast.
enrich.py)Haiku orchestrates; the genuinely reasoning-heavy work calls Sonnet 4.6 directly via the Anthropic SDK:
summarize_work(...) — the card-work recap behind summarize_card_work and the ⋯ menu.estimate_card(title, desc) — story points + next steps + follow-up questions, via a forced tool call
(submit_estimate) for guaranteed structure. The editable rubric (story_points.md) is sent as a
cache_control system block so repeated estimates reuse it.On card creation the gateway renders the created card with a "✨ Generating AI insights…" note, runs
estimate_card in the background, writes the AI_* fields, and updates the message in place with
story points + next steps + follow-ups.
scheduler.py)A general, reusable scheduling primitive — not snooze-specific. Each job is a JSON file under
var/scheduled/: {id, run_at (UTC), kind, payload, recurring}. The gateway runs a 30-second
poll loop that fires due jobs by dispatching on kind, then re-arms recurring jobs or removes one-time
ones. Because jobs live on disk, they survive gateway restarts (the gap the Wave-2 in-process snooze had).
schedule(run_at, kind, payload, recurring=…) · due(now) · reschedule_recurring(job) · next_weekly(spec)week_repost (snooze re-post) and week_digest_all (the proactive Monday 8:00 CT digest to all mapped users — opt-in via KRAHNIE_WEEKLY_DIGEST=1).kind and register a handler. Mirrors Hermes' cron modularity.Pricing (per 1M tokens): Haiku 4.5 $1 / $5 · Sonnet 4.6 $3 / $15 · cache reads ≈ 0.1× input. Prompt caching is automatic via the SDK; stripping the built-in tool defs is what made it cheap by shrinking what gets cached. Sonnet is the only notable spend (summaries/enrichment, ~$0.01–0.02 each) and fires infrequently and by design.
# Krahnie shares the Slack app with Hermes — stop Hermes first
systemctl --user stop hermes-gateway.service
cd ~/krahnie && ./.venv/bin/python run.py
# Ctrl-C to stop, then: systemctl --user start hermes-gateway.service
| Env flag | Effect |
|---|---|
KRAHNIE_WARM_POOL | Pre-connected client pool size (default 1). |
KRAHNIE_WEEKLY_DIGEST=1 | Enable the recurring Monday 8:00 CT all-hands digest. |
KRAHNIE_STORY_POINT_GUIDE | Override the story-point rubric path. |
KRAHNIE_SF_ORG / KRAHNIE_SF_INSTANCE | Salesforce org alias + Lightning instance for record links. |
Cutover: once Krahnie is proven, run it as its own systemd user service (mirroring
hermes-gateway.service) and leave Hermes stopped/disabled for this workload. Both can't hold the
shared Slack app at once.
tools=[] is the cost lever. The SDK loads its full built-in tool catalog by default (~51K cached tokens). Strip it when the agent only needs your MCP tools — context drops ~94% and cold cost collapses.sf; the MCP server also uses REST, so it wouldn't have helped — direct is leanest.connect() is lazy; the runtime spins up on the first query(). A throwaway warmup query + token prefetch at startup makes the first real turn fast.agent.py + prompts.py) are Claude-specific. Cheap to maintain, cheap to re-point.