Krahnborn · Krahnie · Claude Agent SDK

Krahnie — standalone SDK orchestrator
Technical build reference

Krahnie has been rebuilt as its own Python service on the Claude Agent SDK, with Claude orchestrating directly instead of running as a plugin inside the Hermes framework. The Salesforce delivery-ops workflows and the Slack UX are unchanged, but Krahnie now owns its agent loop, which cut warm-turn latency from ~10–13 s to ~2–3 s and cold-start cost from $0.16 to ~$0.01 (§12). This document is the code-level reference and the rationale for treating the rebuild as the successor to the krahnie-slack Hermes plugin.

Repo: ~/krahnie
Brain: Claude Haiku 4.5 (orchestrator) · Sonnet 4.6 (enrichment)
Org: krahn-tj-dev (sandbox)
Updated: 2026-06-15

1. Why a standalone SDK orchestrator

The original Krahnie ran as the krahnie-slack plugin inside the self-hosted Hermes framework, with gpt-5.5 as the brain and Hermes' hand-rolled gateway/agent loop. The rebuild keeps every workflow but swaps that for a small Python service built directly on the Claude Agent SDK (claude-agent-sdk), with Claude running the agent loop.

The architecture is three layers (transport, tools, and orchestrator), and the key insight is that two of them, transport and tools, are model-agnostic.

The payoff (measured): warm conversational turns dropped from ~10–13 s to ~2–3 s, cold-start cost from $0.16 to ~$0.01, warm-turn cost to ~$0.004, and per-query Salesforce latency from ~6.6 s to ~0.3 s (see §12). Lock-in is thin: the tools and transport are reusable; only the orchestrator layer is Claude-native.

2. Architecture & request flow

Slack (Socket Mode, shared app with Hermes)
   │   message / assistant_thread / action / view / command
   ▼
slack_gateway.py ── set RequestContext (channel, thread, user) per turn
   │
   ▼
agent.run_turn(text, ctx, session_key)        ← Claude Agent SDK, Haiku
   │   warm persistent client per conversation; thinking disabled; tools=[]
   ▼
in-process MCP tools (tools/*)                ← only Krahnie's tools exist
   │   query Salesforce (REST), render Block Kit
   ▼
post to Slack (slackio)  ·  the card IS the response

Writes and buttons bypass the agent: a tool posts a confirm card, and the gateway's button/modal handlers perform the actual Salesforce write on click (confirm-before-commit). Sonnet is called directly (not through the agent loop) for the genuinely reasoning-heavy bits — card-work summaries and story-point enrichment.

3. Repo layout

~/krahnie/
  run.py                      entrypoint: load .env, start gateway, run forever
  .env                        SLACK_*, ANTHROPIC_API_KEY, KRAHNIE_SF_* (gitignored)
  requirements.txt            claude-agent-sdk, slack-bolt, slack-sdk, anthropic, PyYAML
  krahnie/
    agent.py                  SDK wiring: Haiku, tools=[], warm pool, run_turn()
    prompts.py                system prompt + intent routing
    context.py                RequestContext (per-conversation holder)
    slack_gateway.py          Socket Mode transport + all interactivity/commands
    tools/
      __init__.py             build_server(get_ctx) → in-process MCP server
      week.py                 post_week_at_a_glance
      cards.py                post_client_cards, post_my_time, find_cards, scope/parse helpers
      research.py             summarize_card_work + shared build_summary (Sonnet)
      logtime.py              post_log_confirm (+ picker), post_correction_confirm
      createcard.py           post_card_create_confirm
    writes.py                 SF write layer + modal builders + result renderers
    salesforce.py             REST query + create_record + patch_record (token cached)
    blockkit.py               pure Block Kit renderers (verbatim from Hermes)
    refresh.py                week payload + render
    enrich.py                 Sonnet: summarize_work, estimate_card (rubric cached)
    pending.py                confirm-before-commit store (nonce → proposed change)
    scheduler.py              durable reusable scheduler (one-time + weekly)
    state.py                  snooze/digest state · slackio.py · envload.py · story_points.md
  scripts/                    smoke_sdk · test_context_flow · bench · test_prewarm · test_wave1 · test_wave3
  var/                        state/ pending/ scheduled/ (gitignored runtime data)

4. The orchestrator (agent.py)

A single configured agent, reached via run_turn(text, ctx, session_key). Three deliberate choices (model, thinking, tools) define its speed and cost profile; the remaining options keep the agent isolated:

| Option | Value | Why |
|---|---|---|
| model | claude-haiku-4-5-20251001 | Cheapest capable model; fast tool-routing (the Hermes "minimal reasoning" analog). |
| thinking | {"type":"disabled"} | No thinking-token latency or spend on routing turns. |
| tools | [] | Strips every built-in tool (Bash/Read/Write/…). Cuts cached context ~54K → ~3K tokens (≈94%). Krahnie literally has no tools but its own. |
| mcp_servers | {"krahnie": <server>} | In-process MCP server holding Krahnie's tools (mcp__krahnie__*). |
| can_use_tool | deny-by-default | Defense in depth: allow only mcp__krahnie__*, deny everything else. |
| setting_sources | [] | Hermetic: loads no ~/.claude settings/skills/MCP from disk. |
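The deny-by-default rule in the can_use_tool row can be sketched as a pure decision function. This is a minimal illustration of the policy only: `ALLOWED_PREFIX` and `decide_tool_use` are hypothetical names, the returned dict shape is invented for the example, and the SDK's real permission callback signature and result types are not reproduced here.

```python
ALLOWED_PREFIX = "mcp__krahnie__"  # prefix taken from the table; constant name is hypothetical


def decide_tool_use(tool_name: str) -> dict:
    """Deny by default: allow only tools exposed by the in-process krahnie server."""
    has_suffix = len(tool_name) > len(ALLOWED_PREFIX)
    if tool_name.startswith(ALLOWED_PREFIX) and has_suffix:
        return {"behavior": "allow"}
    # Everything else (built-ins such as Bash, other MCP servers) is refused.
    return {"behavior": "deny", "message": f"{tool_name!r} is not a Krahnie tool"}
```

With tools=[] already removing the built-ins, a check like this would only fire if something unexpected reached the agent, which is the point of defense in depth.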

Warm-pool prewarming

The SDK spins up its runtime subprocess + MCP handshake on the first query() of a client (~13 s cold). To hide that, prewarm() at startup keeps a pool of pre-connected clients (each warmed with a throwaway query so the model path is hot) and primes the Salesforce token. A new conversation adopts a warm client; the pool replenishes in the background. Result: the first real message of a conversation is ~3 s, not ~13 s. Pool size via KRAHNIE_WARM_POOL (default 1).
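The adopt-and-replenish behavior can be sketched with plain asyncio. This is an illustration under assumptions, not the code in agent.py: `FakeClient`, `WarmPool`, and their methods are hypothetical, and the SDK connection plus throwaway query are simulated with a short sleep.

```python
import asyncio


class FakeClient:
    """Stand-in for an SDK client; the real connect + warm-up is simulated."""

    def __init__(self) -> None:
        self.warm = False

    async def connect_and_warm(self) -> None:
        await asyncio.sleep(0.01)  # simulates runtime start-up + throwaway query
        self.warm = True


class WarmPool:
    """Keeps `size` pre-connected clients ready; refills in the background."""

    def __init__(self, size: int = 1) -> None:
        self.size = size
        self._ready = asyncio.Queue()
        self._refills = set()  # strong references so refill tasks are not GC'd

    async def _add_one(self) -> None:
        client = FakeClient()
        await client.connect_and_warm()
        await self._ready.put(client)

    async def prewarm(self) -> None:
        # Called once at startup: pay the cold-start cost before any user does.
        await asyncio.gather(*(self._add_one() for _ in range(self.size)))

    async def adopt(self) -> FakeClient:
        # Hand out a warm client and start warming its replacement.
        client = await self._ready.get()
        task = asyncio.create_task(self._add_one())
        self._refills.add(task)
        task.add_done_callback(self._refills.discard)
        return client
```

In this sketch the pool should be constructed inside the running event loop. A second conversation that arrives before the refill finishes simply waits on the queue rather than failing.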

5. Context delivery — the task-isolation fix

The trap we hit: a persistent SDK client runs tool callbacks inside its own connection task, which snapshots contextvars at connect() time. So an ambient contextvar set per-turn in the caller's task is stale inside the callback — turn 2's tool saw turn 1's channel.

The fix: each conversation owns a mutable holder; run(text, ctx) stamps the current turn's RequestContext into it, and that conversation's tools read it via a bound getter (build_server(lambda: holder["current"])). A mutable dict read at call-time is shared by reference and always reflects the current turn — immune to which task runs the callback. Validated across multi-turn.
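The trap and the fix can be reproduced with the standard library alone. In this sketch a long-lived worker task stands in for the client's connection task; `channel_var`, `worker`, and `demo` are hypothetical names, and no SDK code is involved.

```python
import asyncio
import contextvars

channel_var = contextvars.ContextVar("channel", default="unset")


async def worker(requests: asyncio.Queue, results: list, holder: dict) -> None:
    """Long-lived task, like a connection task that runs tool callbacks."""
    while True:
        item = await requests.get()
        try:
            if item is None:
                return
            # contextvar: this task's snapshot. holder: read live at call time.
            results.append((channel_var.get(), holder["current"]))
        finally:
            requests.task_done()


async def demo() -> list:
    requests = asyncio.Queue()
    results = []
    holder = {"current": None}

    channel_var.set("C-turn1")
    holder["current"] = "C-turn1"
    # The task copies the caller's context here, once.
    task = asyncio.create_task(worker(requests, results, holder))
    await requests.put("turn 1")
    await requests.join()

    # Turn 2: the caller updates both, but only the holder is visible to the worker.
    channel_var.set("C-turn2")
    holder["current"] = "C-turn2"
    await requests.put("turn 2")
    await requests.join()

    await requests.put(None)
    await task
    return results
```

Running `asyncio.run(demo())` yields `("C-turn1", "C-turn1")` for the first turn and `("C-turn1", "C-turn2")` for the second: the contextvar is stale, the holder is current.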

6. Tools (the workflows)

Tools are built per-conversation by factories bound to that conversation's context getter, then bundled into one in-process MCP server. Tool names the agent sees are mcp__krahnie__<name>.

| Tool | Type | Behavior |
|---|---|---|
| post_week_at_a_glance | read | The "what's my week" digest: cards due/overdue, utilization vs capacity, Refresh/Snooze buttons. |
| post_client_cards | read | Cards for a client/project/board. Subject matched across all three (robust to which field the model picks). Filters: overdue, unlogged, mine. |
| post_my_time | read | Cards the asking user logged time to this week + utilization. |
| summarize_card_work | read | Fetch a card's work logs, write a Sonnet recap, post it. Shares build_summary() with the ⋯ menu. |
| post_log_confirm | write | Log time. Confident match → confirm card; otherwise a picker of the client's (or your own) open cards. |
| post_correction_confirm | write | Correct/move an existing entry, shown as a before→after diff card. |
| post_card_create_confirm | write | Create a card via a Create/Edit/Cancel card; fires async Sonnet AI insights after creation. |

Routing lives in prompts.py. Key rules: a named client/project/board always wins over "do I have" phrasing; durations convert to decimal hours (90 min → 1.5); after a posting tool succeeds the agent stays silent (the card is the response).
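The duration rule is stated in the prompt, so the model performs the conversion itself. A deterministic helper like the one below is one way a tool could double-check the value it receives; `to_decimal_hours` and the accepted unit spellings are assumptions for illustration, not part of the repo.

```python
import re

# Number followed by an hour or minute unit, e.g. "90 min", "1h", "2.5 hours".
_DURATION = re.compile(
    r"(\d+(?:\.\d+)?)\s*(h|hr|hrs|hour|hours|m|min|mins|minute|minutes)\b",
    re.IGNORECASE,
)


def to_decimal_hours(text: str) -> float:
    """Sum every duration found in `text` and return decimal hours."""
    total = 0.0
    found = False
    for amount, unit in _DURATION.findall(text):
        found = True
        value = float(amount)
        # Units starting with "h" are hours; everything else is minutes.
        total += value if unit.lower().startswith("h") else value / 60
    if not found:
        raise ValueError(f"no duration found in {text!r}")
    return round(total, 2)
```

For example, `to_decimal_hours("90 min")` and `to_decimal_hours("1h 30m")` both return 1.5.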

7. Write flows & confirm-before-commit

No tool writes directly. A write tool stashes the proposed change in pending.py (one JSON-on-disk store, nonce-keyed, with a kind discriminator: log / correction / card_create / log_intent) and posts a Block Kit confirm card whose buttons carry the nonce. The gateway's button handler loads (and consumes) the pending change and performs the Salesforce write.

agent → post_log_confirm → pending.save("log", {...}) + confirm card
user taps Confirm → gateway _run_log_confirm → pending.load(nonce) → writes.log_work → chat_update to "✅ Logged …"
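A nonce-keyed, load-and-consume store of this kind might look like the sketch below. It assumes one JSON file per pending change; `PendingStore` and its file layout are hypothetical, and only the kind values come from the text above.

```python
import json
import secrets
from pathlib import Path
from typing import Optional

KINDS = {"log", "correction", "card_create", "log_intent"}


class PendingStore:
    """Confirm-before-commit store: one JSON file per proposed change."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def save(self, kind: str, change: dict) -> str:
        if kind not in KINDS:
            raise ValueError(f"unknown kind: {kind}")
        nonce = secrets.token_hex(8)  # carried by the confirm card's buttons
        record = {"kind": kind, "change": change}
        (self.root / f"{nonce}.json").write_text(json.dumps(record))
        return nonce

    def load(self, nonce: str) -> Optional[dict]:
        """Return the pending change and delete it, so a nonce works only once."""
        if not nonce.isalnum():  # reject anything that is not a plain token
            return None
        path = self.root / f"{nonce}.json"
        try:
            record = json.loads(path.read_text())
        except FileNotFoundError:
            return None  # already consumed, cancelled, or never existed
        path.unlink()
        return record
```

Consuming on load means a double-tap on Confirm cannot write twice: the second click finds nothing to commit.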

The fuzzy picker (a favorite Hermes feature): when the card hint isn't a confident match, find_cards() (difflib + token search over open cards, scoped to a client, relaxing to all of the client's cards) drives a picker. Each button carries {intent_nonce, card_id}; tapping one resolves the card and chat_updates the picker into the normal confirm card.
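The matching step can be sketched with difflib plus a token-overlap score. This is an assumption-laden illustration rather than the real find_cards(): the thresholds, the margin rule for "confident", and the card dict shape are invented for the example, and client scoping is left out.

```python
import difflib

CONFIDENT = 0.9  # assumed: minimum score to skip the picker
MARGIN = 0.1     # assumed: required lead over the runner-up


def _score(hint: str, name: str) -> float:
    hint, name = hint.lower().strip(), name.lower().strip()
    ratio = difflib.SequenceMatcher(None, hint, name).ratio()
    hint_tokens = set(hint.split())
    # Fraction of hint tokens that appear as whole words in the card name.
    overlap = len(hint_tokens & set(name.split())) / max(len(hint_tokens), 1)
    return max(ratio, overlap)


def find_cards(hint: str, cards: list, cutoff: float = 0.6):
    """Return (confident_match_or_None, candidates_for_the_picker)."""
    scored = sorted(
        ((_score(hint, card["name"]), card) for card in cards),
        key=lambda pair: pair[0],
        reverse=True,
    )
    candidates = [card for score, card in scored if score >= cutoff]
    confident = None
    if scored and scored[0][0] >= CONFIDENT:
        clear_lead = len(scored) == 1 or scored[0][0] - scored[1][0] >= MARGIN
        if clear_lead:
            confident = scored[0][1]
    return confident, candidates
```

A hint that matches two cards equally well (say, one word shared by both names) yields no confident match, so the caller would post the picker with both candidates instead of guessing.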

8. Gateway: interactivity & slash commands

slack_gateway.py reuses the Hermes transport patterns on slack_bolt AsyncApp + Socket Mode: the 3-second-ack rule is honored by await ack() then asyncio.create_task(...) for all work; assistant-pane threads are handled so replies land in the conversation; messages are de-duped.
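The ack-then-background pattern and the de-duplication can be shown without slack_bolt. In this sketch `handle_event`, `_seen_events`, and the callable parameters are hypothetical stand-ins; a real handler would receive `ack` from the framework and would expire old event IDs rather than keep them forever.

```python
import asyncio

_seen_events = set()  # event IDs already accepted (unbounded in this sketch)
_background = set()   # strong references so running tasks are not GC'd


async def handle_event(event_id: str, ack, work) -> bool:
    """Acknowledge immediately, then run the slow work off the ack path."""
    await ack()  # must happen within Slack's 3-second window
    if event_id in _seen_events:
        return False  # duplicate delivery: acknowledged but not reprocessed
    _seen_events.add(event_id)
    task = asyncio.create_task(work())
    _background.add(task)
    task.add_done_callback(_background.discard)
    return True
```

The handler returns as soon as the task is scheduled, so a multi-second agent turn or Salesforce call never delays the acknowledgement.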

| Surface | Kind | Handlers |
|---|---|---|
| Digest buttons | ui | Refresh (re-query + chat_update), Snooze (durable schedule, §11) |
| Confirm cards | ui | Confirm / Edit / Cancel for log, correction, card-create |
| Picker | ui | krahnie_logpick → resolve chosen card → confirm card |
| Per-card ⋯ menu | ui | Log time → modal · Summarize → Sonnet recap in-thread |
| Modals | ui | views.open + view_submission for the log + new-card forms |
| Slash commands | cmd | /my-week · /logtime · /newcard (registered in the shared Slack app) |

Shared Slack app: Krahnie connects to the same Slack app as the Hermes gateway. With two Socket Mode clients on one app token, Slack delivers each event to only one of the open connections, so either service could end up handling any given message. Only one may run at a time: stop hermes-gateway.service before running Krahnie.

9. Salesforce REST layer (salesforce.py)

The biggest single speed win. The Hermes plugin shelled out to the sf CLI per query (~3 s of Node cold-start each). Krahnie fetches an access token + instance URL once via sf org display (cached in-memory) and then hits the REST API directly over HTTPS for every query and write.

| Function | REST |
|---|---|
| query(soql) | GET /services/data/vXX/query · follows pagination · 401 → refresh + retry |
| create_record(sobject, fields) | POST /sobjects/{sobject} |
| patch_record(sobject, id, patch) | PATCH /sobjects/{sobject}/{id} (204 on success) |

Per-query latency went from ~6.6 s (two CLI calls) to ~0.3 s. The one-time token fetch (~3.5 s) is also prewarmed at startup, so even the first real query is fast.
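The query path (pagination plus a single refresh-and-retry on 401) can be sketched with the HTTP call and token source injected, which keeps the example runnable without a Salesforce org. The function signature, `http_get`, and `get_token` are hypothetical; the `records`, `done`, and `nextRecordsUrl` response fields are the standard ones returned by the Salesforce REST query resource.

```python
from typing import Callable, List, Tuple
from urllib.parse import quote

# (url, access_token) -> (HTTP status, decoded JSON body)
HttpGet = Callable[[str, str], Tuple[int, dict]]


def query(
    soql: str,
    instance_url: str,
    api_version: str,
    http_get: HttpGet,
    get_token: Callable[[bool], str],  # get_token(force_refresh) -> token
) -> List[dict]:
    """Run a SOQL query, following pagination and retrying once on 401."""
    token = get_token(False)  # normally served from the in-memory cache
    url = f"{instance_url}/services/data/{api_version}/query?q={quote(soql)}"
    records: List[dict] = []
    refreshed = False
    while url:
        status, body = http_get(url, token)
        if status == 401 and not refreshed:
            token = get_token(True)  # cached token expired: fetch a new one
            refreshed = True
            continue  # retry the same URL with the fresh token
        if status != 200:
            raise RuntimeError(f"Salesforce query failed with HTTP {status}")
        records.extend(body["records"])
        next_path = body.get("nextRecordsUrl")  # present only when more pages exist
        url = f"{instance_url}{next_path}" if next_path else None
    return records
```

Injecting `http_get` also makes the retry and pagination logic easy to test with a fake transport.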

10. Sonnet enrichment (enrich.py)

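Haiku orchestrates; the genuinely reasoning-heavy work (card-work summaries via summarize_work and story-point estimates via estimate_card, with the rubric cached) calls Sonnet 4.6 directly via the Anthropic SDK.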

On card creation the gateway renders the created card with a "✨ Generating AI insights…" note, runs estimate_card in the background, writes the AI_* fields, and updates the message in place with story points + next steps + follow-ups.

11. Durable scheduler (scheduler.py)

A general, reusable scheduling primitive — not snooze-specific. Each job is a JSON file under var/scheduled/: {id, run_at (UTC), kind, payload, recurring}. The gateway runs a 30-second poll loop that fires due jobs by dispatching on kind, then re-arms recurring jobs or removes one-time ones. Because jobs live on disk, they survive gateway restarts (the gap the Wave-2 in-process snooze had).
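One pass of such a poll loop might look like the sketch below. It follows the job shape described above but makes several assumptions: `run_at` is stored as epoch seconds, recurring jobs re-arm weekly, and `schedule`, `run_due`, and the `handlers` mapping are hypothetical names.

```python
import json
import time
import uuid
from pathlib import Path

WEEK_SECONDS = 7 * 24 * 3600


def schedule(root: Path, run_at: float, kind: str, payload: dict,
             recurring: bool = False) -> str:
    """Persist a job as one JSON file so it survives restarts."""
    root.mkdir(parents=True, exist_ok=True)
    job = {"id": uuid.uuid4().hex, "run_at": run_at, "kind": kind,
           "payload": payload, "recurring": recurring}
    (root / f"{job['id']}.json").write_text(json.dumps(job))
    return job["id"]


def run_due(root: Path, handlers: dict, now: float = None) -> int:
    """Fire every due job once; re-arm recurring jobs, delete one-time ones."""
    now = time.time() if now is None else now
    fired = 0
    for path in sorted(root.glob("*.json")):
        job = json.loads(path.read_text())
        if job["run_at"] > now:
            continue
        handlers[job["kind"]](job["payload"])  # dispatch on kind
        fired += 1
        if job["recurring"]:
            # Skip any periods missed while the gateway was down.
            while job["run_at"] <= now:
                job["run_at"] += WEEK_SECONDS
            path.write_text(json.dumps(job))
        else:
            path.unlink()
    return fired
```

The gateway would call something like `run_due(...)` every 30 seconds; because state lives in the files, a restart between polls loses nothing.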

12. Cost & performance

| Metric | Value | Note |
|---|---|---|
| Warm turn latency | ~2–3 s | was ~10–13 s on Hermes |
| Warm turn cost | ~$0.004 | Haiku, ~3K-token context |
| Cold-start cost | ~$0.01 | was $0.16 before stripping |
| Context size | ~3K tok | from ~54K (tools=[]) |
| SF query | ~0.3 s | from ~6.6 s (REST vs CLI) |
| Cold-start latency | hidden | warm-pool prewarming |

Pricing (per 1M tokens): Haiku 4.5 $1 / $5 · Sonnet 4.6 $3 / $15 · cache reads ≈ 0.1× input. Prompt caching is automatic via the SDK; stripping the built-in tool defs is what made it cheap by shrinking what gets cached. Sonnet is the only notable spend (summaries/enrichment, ~$0.01–0.02 each) and fires infrequently and by design.
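The effect of shrinking the cached prefix can be checked with the prices quoted above. This sketch isolates only the cache-read component of a turn at Haiku rates; it is not a full per-turn cost model, and the constant and function names are hypothetical.

```python
HAIKU_INPUT_PER_MTOK = 1.00   # USD per 1M input tokens (from the pricing line)
CACHE_READ_MULTIPLIER = 0.1   # cache reads are roughly 0.1x the input price


def cache_read_cost(tokens: int) -> float:
    """USD cost of re-reading a cached prefix of `tokens` tokens on one turn."""
    return tokens / 1_000_000 * HAIKU_INPUT_PER_MTOK * CACHE_READ_MULTIPLIER


before = cache_read_cost(54_000)  # full built-in tool catalog in context
after = cache_read_cost(3_000)    # tools=[] leaves only Krahnie's own tools
```

Under these assumptions the cached-prefix read falls from about $0.0054 to about $0.0003 per turn, the same ~94% reduction as the context size itself.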

13. Running it · env · cutover

# Krahnie shares the Slack app with Hermes — stop Hermes first
systemctl --user stop hermes-gateway.service
cd ~/krahnie && ./.venv/bin/python run.py
# Ctrl-C to stop, then: systemctl --user start hermes-gateway.service

| Env flag | Effect |
|---|---|
| KRAHNIE_WARM_POOL | Pre-connected client pool size (default 1). |
| KRAHNIE_WEEKLY_DIGEST=1 | Enable the recurring Monday 8:00 CT all-hands digest. |
| KRAHNIE_STORY_POINT_GUIDE | Override the story-point rubric path. |
| KRAHNIE_SF_ORG / KRAHNIE_SF_INSTANCE | Salesforce org alias + Lightning instance for record links. |

Cutover: once Krahnie is proven, run it as its own systemd user service (mirroring hermes-gateway.service) and leave Hermes stopped/disabled for this workload. Both can't hold the shared Slack app at once.

14. Key learnings

  1. tools=[] is the cost lever. The SDK loads its full built-in tool catalog by default (~51K cached tokens). Strip it when the agent only needs your MCP tools — context drops ~94% and cold cost collapses.
  2. Persistent clients break ambient contextvars. Tool callbacks run in the client's connection task. Deliver per-turn context through a mutable per-conversation holder, not a contextvar.
  3. Go around the CLI for Salesforce. Direct REST (token cached) is ~22× faster per query than sf (~6.6 s vs ~0.3 s, §9). The MCP server also uses REST, so it wouldn't have helped; direct is leanest.
  4. Prewarm to hide cold start. connect() is lazy; the runtime spins up on the first query(). A throwaway warmup query + token prefetch at startup makes the first real turn fast.
  5. The architecture is the moat. Transport and tools are model-agnostic and ported verbatim; only a few dozen lines (agent.py + prompts.py) are Claude-specific. Cheap to maintain, cheap to re-point.