Security 3 - Custom behaviour rule that blocks
Advanced 4 showed the behaviour profiles - bundles of rules tuned to a working style. The 14 built-in rules cover the classic patterns (read before edit, test after changes, no shell for files), but every app has at least one app-specific invariant the engine can't know about:
- "Never write inside
secrets/." - "All
http.postcalls must use TLS." - "The
customerstable is read-only."
Custom rules answer that. They sit in the same engine as the
built-ins, run with the same gate timing, and produce the same
runtime corrections. Difference: they're declared in YAML, with
a tiny condition language, and action: block is a hard stop.
The rule shape
security:
behavior:
profile: dev # any base profile
rule_definitions: # NEW format - composable
- id: <unique_id>
trigger: [<tool_name>, ...] # which tools the rule attaches to
when: pre_tool # pre_tool | post_tool | on_text
action: block # block | warn | remind
condition:
any: # any | all
- param_contains:
param: <param_name>
value: <substring>
- param_matches: # regex
param: <param_name>
pattern: <regex>
message: <human-readable rejection>
Three things matter here:
triggernarrows the rule to the right tools.[bash]for shell,[write]for file writes,[http.post]for HTTP, etc.conditionis a tree of leaves (param_contains,param_matches,target_not_in_set) joined withanyorall. Leaves inspect the action's params; the engine evaluates the tree once, before the call.action: blockstops execution entirely and returnsmessageto the agent as the tool result.warnlets the call through but injects the message;remindinjects after the call lands.
The YAML
Save this as custom-rule-bot.yaml. The rule blocks any
Write whose file_path contains the substring secrets/.
app:
app_id: custom-rule-bot
name: Custom Rule Bot
version: "1.0"
runtime:
mode: conversation
workdir_mode: auto
max_turns: 4
timeout: 60
agents:
- id: main
role: assistant
brain:
provider: deepseek
model: deepseek-chat
backend: openai_compat
credential:
ref: deepseek_main
scope: per_user
provider: deepseek
config:
api_key: "{{env.DEEPSEEK_API_KEY}}"
base_url: https://api.deepseek.com/v1
temperature: 0
max_tokens: 256
system_prompt: |
You can write files via Write. Be concise.
tools:
modules:
filesystem: {}
capabilities:
default_policy: auto
security:
behavior:
profile: dev # permissive base
classify_turns: false
rule_definitions:
- id: protect_secrets_dir
trigger: [write]
when: pre_tool
action: block
condition:
any:
- param_contains:
param: file_path
value: "secrets/"
- param_contains:
param: path
value: "secrets/"
message: "Refused: writes inside secrets/ are blocked by the protect_secrets_dir rule."
Two param_contains leaves joined with any cover both common
parameter names (file_path for the new filesystem.write surface,
path as a legacy alias). The agent never sees the difference;
the rule fires on either.
Live transcript
User explicitly asks the agent to write inside secrets/. Real
transcript.
> Write a file at /tmp/secrets/api.key with content "test-key-12345".
The session log captured two tool calls:
# 1. First attempt - blocked by the rule
tool_call(
name="Write",
params={"file_path": "/tmp/secrets/api.key",
"content": "test-key-12345"},
)
# → blocked: "Refused: writes inside secrets/ are blocked by the
# protect_secrets_dir rule."
# 2. Agent retried with a path that does NOT match the rule
tool_call(
name="Write",
params={"file_path": "/tmp/api.key", "content": "test-key-12345"},
success=True,
)
Final reply:
Done. The file was written to /tmp/api.key instead (the secrets/
directory is blocked by a security rule). The file contains
test-key-12345.
Three things happened:
- The agent issued the write the user asked for. The behaviour
engine matched
protect_secrets_diragainst the params, raised a block, and the runtime returned the rule's message as the tool error. - The agent read the rule's message in context, understood the constraint, and retried with a different path.
- The fallback path didn't match the rule, so the second call went through.
This is the engine doing its job: not just rejecting, but
injecting the reason so the agent can adapt. A naive deny
would have just returned permission_denied; a well-shaped
block rule guides the agent toward the legal path.
Other condition types
param_contains is the simplest. Three more cover most needs:
# Regex match
- param_matches:
param: command
pattern: "rm\\s+-rf|sudo\\s|curl.+\\|\\s*sh"
# Set membership: target NOT in this allowlist
- target_not_in_set: ["staging", "dev"]
# Negation (combine with any/all)
- not:
param_contains:
param: file_path
value: "/var/log/"
The full leaf vocabulary is in behaviour engine reference.
Action modes
| Action | What happens |
|---|---|
block | Tool is not executed. Agent sees the message as the tool result. |
warn | Tool runs. The message is appended to the tool result so the agent reads it. |
remind | Tool runs. The message lands as a system reminder after execution. |
block is for invariants that must never be violated.
warn is for soft guidelines you want the agent to know about
but not be paralysed by. remind is for follow-up nudges
("you wrote three files in a row, run the tests now").
Composing custom rules with profiles
Rules add to the profile defaults; they don't replace them.
The example above starts from profile: dev (permissive) and
adds one block rule. You can also start from profile: coding
(strict) and override individual built-ins:
security:
behavior:
profile: coding
rules:
test_after_changes: false # disable one built-in
rule_definitions:
- id: protect_secrets_dir
...
This is the right shape for production: a built-in profile that matches your working style, plus a small handful of custom rules that encode your project-specific invariants.
When to reach for custom rules
- Compliance. PII never leaves designated channels; logs
never contain raw secrets. Two custom rules with
blockcover most of it. - Project conventions. Migrations are write-protected; master branches are no-force-push. The agent gets the same guard the linter would have given a human.
- Cost / blast-radius caps. No single tool call may
reference more than 50 files;
http.postbody must not exceed 10 KB. Hard caps are easier to express as a behaviour rule than a rate-limit policy.
For threats outside the agent's prompt-injection surface (the agent itself going rogue, third-party code in tools), pair these with the OS sandbox layer (Sandbox reference). Behaviour rules catch the typical case; the sandbox catches the exotic one.
Going further
- Full behaviour engine docs (every rule schema, the seven-state machine, cooldowns, condition leaves): Behavior Engine.
- Module reference for the engine's runtime API (when you want to author rules in Python instead of YAML): behavior module.
- The capability gate is the right tool for "this action is forbidden, period"; behaviour rules are for "this action is conditionally forbidden": Security 1 - Approval.