Krahnborn · Krahnie · Claude Agent SDK

Krahnie — standalone SDK orchestrator
Technical build reference

Krahnie rebuilt as its own Python service on the Claude Agent SDK, with Claude orchestrating directly instead of running as a plugin inside the Hermes framework. Same Salesforce delivery-ops workflows, same Slack UX — but Krahnie now owns its agent loop, which made it dramatically faster and cheaper. This is the code-level reference and the rationale for treating it as the successor to the krahnie-slack Hermes plugin.

Repo ~/krahnie Brain Claude Haiku 4.5 (orchestrator) · Sonnet 4.6 (enrichment) Org krahn-tj-dev (sandbox) Updated 2026-06-15

1 · Why a standalone SDK orchestrator
2 · Architecture & request flow
3 · Repo layout
4 · The orchestrator (agent.py)
5 · Context delivery (the task-isolation fix)
6 · Tools (the workflows)
7 · Write flows & confirm-before-commit
8 · Gateway: interactivity & slash commands
9 · Salesforce REST layer
10 · Sonnet enrichment
11 · Durable scheduler
12 · Cost & performance
13 · Running it · env · cutover
14 · Key learnings

1Why a standalone SDK orchestrator

The original Krahnie ran as the krahnie-slack plugin inside the self-hosted Hermes framework, with gpt-5.5 as the brain and Hermes' hand-rolled gateway/agent loop. The rebuild keeps every workflow but swaps that for a small Python service built directly on the Claude Agent SDK (claude-agent-sdk), with Claude running the agent loop.

The architecture is three layers, and the key insight is that two of them are model-agnostic:

Transport — Slack Socket Mode gateway (ack, threading, buttons, modals). Reused patterns from Hermes; doesn't care what the brain is.
Orchestrator — the swappable brain. This is the only Claude-specific layer (agent.py + prompts.py, a few dozen lines).
Tools — the workflows (Salesforce queries, Block Kit rendering). Pure business logic, ported verbatim from the Hermes plugin.

The payoff (measured): warm conversational turn dropped from ~10–13 s to ~2–3 s, cold-start cost from $0.16 → ~$0.01, warm-turn cost to ~$0.004, and the per-query Salesforce latency from ~6.6 s to ~0.3 s. See §12. Lock-in is thin: the tools and transport are reusable; only the orchestrator layer is Claude-native.

2Architecture & request flow

Slack (Socket Mode, shared app with Hermes)
   │   message / assistant_thread / action / view / command
   ▼
slack_gateway.py ── set RequestContext (channel, thread, user) per turn
   │
   ▼
agent.run_turn(text, ctx, session_key)        ← Claude Agent SDK, Haiku
   │   warm persistent client per conversation; thinking disabled; tools=[]
   ▼
in-process MCP tools (tools/*)                ← only Krahnie's tools exist
   │   query Salesforce (REST), render Block Kit
   ▼
post to Slack (slackio)  ·  the card IS the response

Writes and buttons bypass the agent: a tool posts a confirm card, and the gateway's button/modal handlers perform the actual Salesforce write on click (confirm-before-commit). Sonnet is called directly (not through the agent loop) for the genuinely reasoning-heavy bits — card-work summaries and story-point enrichment.

3Repo layout

~/krahnie/
  run.py                      entrypoint: load .env, start gateway, run forever
  .env                        SLACK_*, ANTHROPIC_API_KEY, KRAHNIE_SF_* (gitignored)
  requirements.txt            claude-agent-sdk, slack-bolt, slack-sdk, anthropic, PyYAML
  krahnie/
    agent.py                  SDK wiring: Haiku, tools=[], warm pool, run_turn()
    prompts.py                system prompt + intent routing
    context.py                RequestContext (per-conversation holder)
    slack_gateway.py          Socket Mode transport + all interactivity/commands
    tools/
      __init__.py             build_server(get_ctx) → in-process MCP server
      week.py                 post_week_at_a_glance
      cards.py                post_client_cards, post_my_time, find_cards, scope/parse helpers
      research.py             summarize_card_work + shared build_summary (Sonnet)
      logtime.py              post_log_confirm (+ picker), post_correction_confirm
      createcard.py           post_card_create_confirm
    writes.py                 SF write layer + modal builders + result renderers
    salesforce.py             REST query + create_record + patch_record (token cached)
    blockkit.py               pure Block Kit renderers (verbatim from Hermes)
    refresh.py                week payload + render
    enrich.py                 Sonnet: summarize_work, estimate_card (rubric cached)
    pending.py                confirm-before-commit store (nonce → proposed change)
    scheduler.py              durable reusable scheduler (one-time + weekly)
    state.py                  snooze/digest state · slackio.py · envload.py · story_points.md
  scripts/                    smoke_sdk · test_context_flow · bench · test_prewarm · test_wave1 · test_wave3
  var/                        state/ pending/ scheduled/ (gitignored runtime data)

4The orchestrator (`agent.py`)

A single configured agent, reached via run_turn(text, ctx, session_key). Three deliberate choices define its speed and cost profile:

Option	Value	Why
`model`	`claude-haiku-4-5-20251001`	Cheapest capable model; fast tool-routing (the Hermes "minimal reasoning" analog).
`thinking`	`{"type":"disabled"}`	No thinking-token latency or spend on routing turns.
`tools`	`[]`	Strips every built-in tool (Bash/Read/Write/…). Cuts cached context ~54K → ~3K tokens (≈94%). Krahnie literally has no tools but its own.
`mcp_servers`	`{"krahnie": <server>}`	In-process MCP server holding Krahnie's tools (`mcp__krahnie__*`).
`can_use_tool`	deny-by-default	Defense in depth: allow only `mcp__krahnie__*`, deny everything else.
`setting_sources`	`[]`	Hermetic — loads no `~/.claude` settings/skills/MCP from disk.

Warm-pool prewarming

The SDK spins up its runtime subprocess + MCP handshake on the first query() of a client (~13 s cold). To hide that, prewarm() at startup keeps a pool of pre-connected clients (each warmed with a throwaway query so the model path is hot) and primes the Salesforce token. A new conversation adopts a warm client; the pool replenishes in the background. Result: the first real message of a conversation is ~3 s, not ~13 s. Pool size via KRAHNIE_WARM_POOL (default 1).

5Context delivery — the task-isolation fix

The trap we hit: a persistent SDK client runs tool callbacks inside its own connection task, which snapshots contextvars at connect() time. So an ambient contextvar set per-turn in the caller's task is stale inside the callback — turn 2's tool saw turn 1's channel.

The fix: each conversation owns a mutable holder; run(text, ctx) stamps the current turn's RequestContext into it, and that conversation's tools read it via a bound getter (build_server(lambda: holder["current"])). A mutable dict read at call-time is shared by reference and always reflects the current turn — immune to which task runs the callback. Validated across multi-turn.

6Tools (the workflows)

Tools are built per-conversation by factories bound to that conversation's context getter, then bundled into one in-process MCP server. Tool names the agent sees are mcp__krahnie__<name>.

Tool		Behavior
`post_week_at_a_glance`	read	The "what's my week" digest — cards due/overdue, utilization vs capacity, Refresh/Snooze buttons.
`post_client_cards`	read	Cards for a client/project/board. Subject matched across all three (robust to which field the model picks). Filters: `overdue`, `unlogged`, `mine`.
`post_my_time`	read	Cards the asking user logged time to this week + utilization.
`summarize_card_work`	read	Fetch a card's work logs, write a Sonnet recap, post it. Shares `build_summary()` with the ⋯ menu.
`post_log_confirm`	write	Log time. Confident match → confirm card; otherwise a picker of the client's (or your own) open cards.
`post_correction_confirm`	write	Correct/move an existing entry — before→after diff card.
`post_card_create_confirm`	write	Create a card — Create/Edit/Cancel card; fires async Sonnet AI insights after creation.

Routing lives in prompts.py. Key rules: a named client/project/board always wins over "do I have" phrasing; durations convert to decimal hours (90 min → 1.5); after a posting tool succeeds the agent stays silent (the card is the response).

7Write flows & confirm-before-commit

No tool writes directly. A write tool stashes the proposed change in pending.py (one JSON-on-disk store, nonce-keyed, with a kind discriminator: log / correction / card_create / log_intent) and posts a Block Kit confirm card whose buttons carry the nonce. The gateway's button handler loads (and consumes) the pending change and performs the Salesforce write.

agent → post_log_confirm → pending.save("log", {...}) + confirm card
user taps Confirm → gateway _run_log_confirm → pending.load(nonce) → writes.log_work → chat_update to "✅ Logged …"

The fuzzy picker (a favorite Hermes feature): when the card hint isn't a confident match, find_cards() (difflib + token search over open cards, scoped to a client, relaxing to all of the client's cards) drives a picker. Each button carries {intent_nonce, card_id}; tapping one resolves the card and chat_updates the picker into the normal confirm card.

8Gateway: interactivity & slash commands

slack_gateway.py reuses the Hermes transport patterns on slack_bolt AsyncApp + Socket Mode: the 3-second-ack rule is honored by await ack() then asyncio.create_task(...) for all work; assistant-pane threads are handled so replies land in the conversation; messages are de-duped.

Surface		Handlers
Digest buttons	ui	Refresh (re-query + chat_update), Snooze (durable schedule, §11)
Confirm cards	ui	Confirm / Edit / Cancel for log, correction, card-create
Picker	ui	`krahnie_logpick` → resolve chosen card → confirm card
Per-card ⋯ menu	ui	Log time → modal · Summarize → Sonnet recap in-thread
Modals	ui	`views.open` + `view_submission` for the log + new-card forms
Slash commands	cmd	`/my-week` · `/logtime` · `/newcard` (registered in the shared Slack app)

Shared Slack app: Krahnie connects to the same Slack app as the Hermes gateway. Two Socket Mode clients on one app token both receive events — only one may run at a time. Stop hermes-gateway.service before running Krahnie.

9Salesforce REST layer (`salesforce.py`)

The biggest single speed win. The Hermes plugin shelled out to the sf CLI per query (~3 s of Node cold-start each). Krahnie fetches an access token + instance URL once via sf org display (cached in-memory) and then hits the REST API directly over HTTPS for every query and write.

Function	REST
`query(soql)`	GET `/services/data/vXX/query` · follows pagination · 401 → refresh + retry
`create_record(sobject, fields)`	POST `/sobjects/{sobject}`
`patch_record(sobject, id, patch)`	PATCH `/sobjects/{sobject}/{id}` (204 on success)

Per-query latency went from ~6.6 s (two CLI calls) to ~0.3 s. The one-time token fetch (~3.5 s) is also prewarmed at startup, so even the first real query is fast.

10Sonnet enrichment (`enrich.py`)

Haiku orchestrates; the genuinely reasoning-heavy work calls Sonnet 4.6 directly via the Anthropic SDK:

summarize_work(...) — the card-work recap behind summarize_card_work and the ⋯ menu.
estimate_card(title, desc) — story points + next steps + follow-up questions, via a forced tool call (submit_estimate) for guaranteed structure. The editable rubric (story_points.md) is sent as a cache_control system block so repeated estimates reuse it.

On card creation the gateway renders the created card with a "✨ Generating AI insights…" note, runs estimate_card in the background, writes the AI_* fields, and updates the message in place with story points + next steps + follow-ups.

11Durable scheduler (`scheduler.py`)

A general, reusable scheduling primitive — not snooze-specific. Each job is a JSON file under var/scheduled/: {id, run_at (UTC), kind, payload, recurring}. The gateway runs a 30-second poll loop that fires due jobs by dispatching on kind, then re-arms recurring jobs or removes one-time ones. Because jobs live on disk, they survive gateway restarts (the gap the Wave-2 in-process snooze had).

schedule(run_at, kind, payload, recurring=…) · due(now) · reschedule_recurring(job) · next_weekly(spec)
Kinds today: week_repost (snooze re-post) and week_digest_all (the proactive Monday 8:00 CT digest to all mapped users — opt-in via KRAHNIE_WEEKLY_DIGEST=1).
New scheduled features just pick a kind and register a handler. Mirrors Hermes' cron modularity.

12Cost & performance

Warm turn latency

~2–3 s

was ~10–13 s on Hermes

Warm turn cost

~$0.004

Haiku, ~3K-token context

Cold-start cost

~$0.01

was $0.16 before stripping

Context size

~3K tok

from ~54K (tools=[])

SF query

~0.3 s

from ~6.6 s (REST vs CLI)

Cold-start latency

hidden

warm-pool prewarming

Pricing (per 1M tokens): Haiku 4.5 $1 / $5 · Sonnet 4.6 $3 / $15 · cache reads ≈ 0.1× input. Prompt caching is automatic via the SDK; stripping the built-in tool defs is what made it cheap by shrinking what gets cached. Sonnet is the only notable spend (summaries/enrichment, ~$0.01–0.02 each) and fires infrequently and by design.

13Running it · env · cutover

# Krahnie shares the Slack app with Hermes — stop Hermes first
systemctl --user stop hermes-gateway.service
cd ~/krahnie && ./.venv/bin/python run.py
# Ctrl-C to stop, then: systemctl --user start hermes-gateway.service

Env flag	Effect
`KRAHNIE_WARM_POOL`	Pre-connected client pool size (default 1).
`KRAHNIE_WEEKLY_DIGEST=1`	Enable the recurring Monday 8:00 CT all-hands digest.
`KRAHNIE_STORY_POINT_GUIDE`	Override the story-point rubric path.
`KRAHNIE_SF_ORG` / `KRAHNIE_SF_INSTANCE`	Salesforce org alias + Lightning instance for record links.

Cutover: once Krahnie is proven, run it as its own systemd user service (mirroring hermes-gateway.service) and leave Hermes stopped/disabled for this workload. Both can't hold the shared Slack app at once.

14Key learnings

tools=[] is the cost lever. The SDK loads its full built-in tool catalog by default (~51K cached tokens). Strip it when the agent only needs your MCP tools — context drops ~94% and cold cost collapses.
Persistent clients break ambient contextvars. Tool callbacks run in the client's connection task. Deliver per-turn context through a mutable per-conversation holder, not a contextvar.
Go around the CLI for Salesforce. Direct REST (token cached) is ~16× faster per query than sf; the MCP server also uses REST, so it wouldn't have helped — direct is leanest.
Prewarm to hide cold start. connect() is lazy; the runtime spins up on the first query(). A throwaway warmup query + token prefetch at startup makes the first real turn fast.
The architecture is the moat. Transport and tools are model-agnostic and ported verbatim; only ~a few dozen lines (agent.py + prompts.py) are Claude-specific. Cheap to maintain, cheap to re-point.

Krahnie SDK orchestrator · ~/krahnie · Claude Agent SDK (Haiku orchestrator / Sonnet enrichment) · successor to the Hermes krahnie-slack plugin · generated 2026-06-15.

Contents