Context Management
LLM context windows are finite. As the conversation grows, the context fills up. Digitorn's context-management layer keeps the agent's input under the model's limit by automatic compaction and provides emergency recovery when an overflow slips through.
Every behaviour and field on this page maps to real code.
What lives in the context
A typical agent context per turn:
| Section | Tokens (typical) |
|---|---|
| System prompt + identity | 500-2 000 |
| Tool schemas | 0-15 000 |
| Conversation history | grows |
| Memory snapshot | 100-500 |
When the token pressure (used / total) crosses the configured threshold, the runtime fires a compaction pass that rewrites the conversation history in place.
runtime.context - the configuration block
ContextConfig (extra: forbid). Eight fields. The
full reference is in
App Configuration → runtime.context;
this page focuses on the behaviour each field controls.
runtime:
context:
max_tokens: 200000 # 0 = auto-detect from provider
output_reserved: 4096 # tokens reserved for the LLM reply
strategy: summarize # truncate | summarize
keep_recent: 10 # most-recent messages preserved verbatim
compression_trigger: 0.75 # pressure ratio (0.0-1.0) that fires compaction
summary_max_tokens: 1024 # cap on the synthesised summary
auto_compact: true # auto-inject a context_pressure hook
summary_brain: # optional cheap brain dedicated to summaries
provider: deepseek
model: deepseek-chat
backend: openai_compat
config:
api_key: "{{secret.DEEPSEEK_API_KEY}}"
Each agent can override the app-level config via
agents[].brain.context.
Provider auto-detection
When max_tokens: 0, the bootstrap pass calls
_refine_context_config_for_provider
() which reads the provider's known context
window. Override only when the provider doesn't report one (custom
endpoints, niche local models).
Token pressure and the auto-compact hook
When runtime.context.auto_compact: true
(default), the runtime auto-injects a hook equivalent to:
runtime:
hooks:
- id: _auto_compact
"on": turn_start
condition:
type: context_pressure
threshold: 0.75 # = compression_trigger
action:
type: compact_context
strategy: summarize # = strategy
keep_recent: 10 # = keep_recent (alias: keep_last)
summary_max_tokens: 1024
cooldown: 30.0
The injection is skipped when an explicit compact_context hook
already exists in runtime.hooks (so apps can override the
behaviour without surprise duplication).
The context_pressure condition is
(_eval_context_pressure). Default threshold 0.75.
token_pressure is computed per turn from the actual usage reported
by the provider (or estimated via
compaction.estimate_tokens - ~ 4 chars / token,
).
The compact_context action is
(@register_action("compact_context", ...)).
Compaction strategies
truncate
The cheap path:
- Find a "safe split point" near the boundary
(
_find_safe_split_point) - never splits a tool-call/tool-result pair, never strands an orphantool_use_id. - Drop everything before the split.
- Inject a single system message
(
_build_context_reminder,) recapping the goal + recent activity so the agent doesn't lose continuity.
No LLM call. Best when the deeper history is repetitive (a long file-grep loop, polling a watcher) and the agent doesn't need it.
summarize
_do_summarize. The smart path:
- Find the safe split point (same as truncate).
- Send the dropped slice to the summary brain (the agent's
main brain, or
runtime.context.summary_brainif configured) with a structured prompt asking for a recap of decisions, files touched, key facts. - Replace the dropped slice with a single system message containing the summary.
- Keep the recent messages verbatim
(count =
keep_recent).
Costs one LLM call per compaction. Recommended when the deeper history matters (the agent has been making decisions over many turns).
The summary is hard-capped at summary_max_tokens to prevent the
summary itself from growing unbounded.
summary_brain - using a cheap model for compaction
ContextConfig.summary_brain. Setting it routes the
summarisation call to a cheaper / faster model than the agent's
main brain. The block accepts the full AgentBrain shape
recursively (provider, model, backend, config, temperature, ...).
runtime:
context:
strategy: summarize
summary_brain:
provider: deepseek # cheap model for summaries
model: deepseek-chat
backend: openai_compat
config:
api_key: "{{secret.DEEPSEEK_API_KEY}}"
temperature: 0.0
max_tokens: 1024
If summary_brain is not set, the agent's main brain handles
summarisation (_summary_provider). For a Claude
Sonnet agent doing daily compactions, switching to DeepSeek for
summaries can cut the compaction cost by an order of magnitude
without affecting the agent's runtime behaviour.
Emergency compaction (overflow recovery)
emergency_compact. When the LLM responds with a
context-overflow error, the runtime:
- Detects the error via
is_context_overflow(exc)- matches
maximum context length,context_length_exceeded,context window,reduce the length of the messages,too many tokens,token limit.
- matches
- Halves
keep_recent(minimum 4) and runs an aggressive truncate compaction (no LLM call - the LLM is refusing). - Persists a
compactionevent tohistory_logwithreason: "context_overflow"for the audit trail. - Retries the turn.
This is a safety net - auto_compact + compression_trigger should
prevent overflow in practice. The emergency path exists for the
edge case where the threshold misfires (a single huge tool result
blowing past 25% of the budget at once).
For oversized single messages (a 200 KB tool result that
wouldn't fit anywhere), truncate_oversized_messages
() and snip_oversized_messages
() clip the offending content with a marker
indicating the truncation.
Persistence and resume
Every compaction emits a durable event with:
_snapshot_tools(line 43) - current tool inventory + injection mode_snapshot_memory(line 146) - memory state snapshot_extract_tool_examples(line 105) - recent tool call shapes, so a resumed session re-ingests realistic examplesbuild_system_note_from_payload(line 331) - rebuilds the system note injected into the post-compaction conversation
When a session resumes after the daemon restarts, the snapshot is replayed so the agent picks up with the same memory + tool examples it had pre-compaction.
Manual triggers
Two ways to compact outside the auto-hook:
- Custom hook in the YAML - declare a
compact_contexthook with your own condition (e.g.turn_count: every: 10,tool_calls: threshold: 50). The auto-injection skips when an explicitcompact_contexthook is present, so this completely replaces the default. - REST API - the daemon's session-compact endpoint
triggers
emergency_compactfrom outside the runtime. Useful for ops scripts, or when a user clicks a "Free up context" button.
Per-agent override
In multi-agent apps, each agent can declare its own context budget:
agents:
- id: explorer
brain:
provider: deepseek
model: deepseek-chat
context:
max_tokens: 65536 # explorer has a smaller window
keep_recent: 15
strategy: truncate # cheap for read-only browsing
- id: writer
brain:
provider: anthropic
model: claude-sonnet-4-5
context:
max_tokens: 200000 # writer keeps full Sonnet window
keep_recent: 8
strategy: summarize # smart summaries for long edits
The agent-level brain.context is resolved by
_resolve_context_config - it fully
overrides runtime.context for that agent. Unspecified fields fall
back to the app-level defaults.
Tuning checklist
| Symptom | Fix |
|---|---|
| Compaction fires too often (every 2-3 turns) | Lower compression_trigger to 0.85, or raise max_tokens if the model supports more. |
| Agent loses thread after compaction | Switch strategy: truncate → strategy: summarize. Bump summary_max_tokens to ~2048. |
| Summarisation is slow / expensive | Set summary_brain to a cheap model (DeepSeek, Groq, Ollama). |
| Overflow errors despite auto_compact | Lower compression_trigger (e.g. 0.6); inspect tool results - one of them is probably huge. Use truncate_oversized_messages settings. |
| Tool calls span many turns and clutter context | Add a tool_calls condition hook that compacts every N tool calls regardless of pressure. |
| Want to disable auto-compaction entirely | Set auto_compact: false and declare your own hook (or accept overflow + emergency recovery). |
Cross-references
- Block reference (every
ContextConfigfield): App Configuration → runtime.context - Per-agent override field on
AgentBrain: Agents → Per-brain context configuration - The hook engine (
compact_contextaction,context_pressurecondition, custom hooks): Tool Hooks - Memory injected into the prompt: Cognitive Memory