Security 3 - Custom behaviour rule that blocks

Advanced 4 showed the behaviour profiles - bundles of rules tuned to a working style. The 14 built-in rules cover the classic patterns (read before edit, test after changes, no shell for files), but every app has at least one app-specific invariant the engine can't know about:

  • "Never write inside secrets/."
  • "All http.post calls must use TLS."
  • "The customers table is read-only."

Custom rules answer that. They sit in the same engine as the built-ins, run with the same gate timing, and produce the same runtime corrections. The difference is that they are declared in YAML with a small condition language, and that action: block is a hard stop.

The rule shape

security:
  behavior:
    profile: dev                      # any base profile
    rule_definitions:                 # NEW format - composable
      - id: <unique_id>
        trigger: [<tool_name>, ...]   # which tools the rule attaches to
        when: pre_tool                # pre_tool | post_tool | on_text
        action: block                 # block | warn | remind
        condition:
          any:                        # any | all
            - param_contains:
                param: <param_name>
                value: <substring>
            - param_matches:          # regex
                param: <param_name>
                pattern: <regex>
        message: <human-readable rejection>

Three things matter here:

  • trigger narrows the rule to the right tools. [bash] for shell, [write] for file writes, [http.post] for HTTP, etc.
  • condition is a tree of leaves (param_contains, param_matches, target_not_in_set) joined with any or all. Leaves inspect the action's params; for a pre_tool rule, the engine evaluates the tree once, before the call.
  • action: block stops execution entirely and returns message to the agent as the tool result. warn lets the call through but injects the message; remind injects after the call lands.
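To make the any/all tree concrete, here is a minimal Python sketch of how such a condition could be evaluated against a tool call's params. It is an illustrative model only, not the engine's implementation: the evaluate function is a hypothetical name, the tree is assumed to arrive as plain dicts (as the YAML would load), and regex leaves are assumed to use unanchored search semantics.

```python
import re
from typing import Any, Mapping


def evaluate(node: Mapping[str, Any], params: Mapping[str, Any]) -> bool:
    """Return True when the condition tree matches the tool call's params."""
    if "any" in node:
        # At least one child must match.
        return any(evaluate(child, params) for child in node["any"])
    if "all" in node:
        # Every child must match.
        return all(evaluate(child, params) for child in node["all"])
    if "param_contains" in node:
        leaf = node["param_contains"]
        # A missing param is treated as an empty string (assumption).
        return leaf["value"] in str(params.get(leaf["param"], ""))
    if "param_matches" in node:
        leaf = node["param_matches"]
        value = str(params.get(leaf["param"], ""))
        return re.search(leaf["pattern"], value) is not None
    raise ValueError(f"unsupported condition node: {sorted(node)}")


condition = {
    "any": [
        {"param_contains": {"param": "file_path", "value": "secrets/"}},
        {"param_matches": {"param": "command", "pattern": r"rm\s+-rf"}},
    ]
}

print(evaluate(condition, {"file_path": "/tmp/secrets/api.key"}))  # True
print(evaluate(condition, {"file_path": "/tmp/api.key"}))          # False
```

The sketch covers only the two leaves shown in the rule shape; target_not_in_set and any other leaves would need their own branches.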

The YAML

Save this as custom-rule-bot.yaml. The rule blocks any Write whose file_path (or legacy path) parameter contains the substring secrets/.

app:
  app_id: custom-rule-bot
  name: Custom Rule Bot
  version: "1.0"

runtime:
  mode: conversation
  workdir_mode: auto
  max_turns: 4
  timeout: 60

agents:
  - id: main
    role: assistant
    brain:
      provider: deepseek
      model: deepseek-chat
      backend: openai_compat
      credential:
        ref: deepseek_main
        scope: per_user
        provider: deepseek
      config:
        api_key: "{{env.DEEPSEEK_API_KEY}}"
        base_url: https://api.deepseek.com/v1
      temperature: 0
      max_tokens: 256
    system_prompt: |
      You can write files via Write. Be concise.

tools:
  modules:
    filesystem: {}
  capabilities:
    default_policy: auto
security:
  behavior:
    profile: dev                      # permissive base
    classify_turns: false
    rule_definitions:
      - id: protect_secrets_dir
        trigger: [write]
        when: pre_tool
        action: block
        condition:
          any:
            - param_contains:
                param: file_path
                value: "secrets/"
            - param_contains:
                param: path
                value: "secrets/"
        message: "Refused: writes inside secrets/ are blocked by the protect_secrets_dir rule."

Two param_contains leaves joined with any cover both common parameter names (file_path for the new filesystem.write surface, path as a legacy alias). The agent never sees the difference; the rule fires on either.

Live transcript

The user explicitly asks the agent to write inside secrets/. This is a real transcript.

> Write a file at /tmp/secrets/api.key with content "test-key-12345".

The session log captured two tool calls:

# 1. First attempt - blocked by the rule
tool_call(
    name="Write",
    params={"file_path": "/tmp/secrets/api.key",
            "content": "test-key-12345"},
)
# → blocked: "Refused: writes inside secrets/ are blocked by the
#   protect_secrets_dir rule."

# 2. Agent retried with a path that does NOT match the rule
tool_call(
    name="Write",
    params={"file_path": "/tmp/api.key", "content": "test-key-12345"},
    success=True,
)

Final reply:

Done. The file was written to /tmp/api.key instead (the secrets/
directory is blocked by a security rule). The file contains
test-key-12345.

Three things happened:

  1. The agent issued the write the user asked for. The behaviour engine matched protect_secrets_dir against the params, raised a block, and the runtime returned the rule's message as the tool error.
  2. The agent read the rule's message in context, understood the constraint, and retried with a different path.
  3. The fallback path didn't match the rule, so the second call went through.

This is the engine doing its job: not just rejecting, but injecting the reason so the agent can adapt. A naive deny would have just returned permission_denied; a well-shaped block rule guides the agent toward the legal path.
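The block-then-retry flow in the transcript can be modelled in a few lines of Python. This is a sketch under stated assumptions, not the runtime's code: gated_write and protect_secrets_dir are hypothetical helpers, the returned dict shape is invented for illustration, and no file is actually written.

```python
from typing import Optional

BLOCK_MESSAGE = (
    "Refused: writes inside secrets/ are blocked by the "
    "protect_secrets_dir rule."
)


def protect_secrets_dir(params: dict) -> Optional[str]:
    """Return the rule's message if either path param contains 'secrets/'."""
    for name in ("file_path", "path"):
        if "secrets/" in str(params.get(name, "")):
            return BLOCK_MESSAGE
    return None


def gated_write(params: dict) -> dict:
    """Hypothetical pre_tool gate: check the rule before the write would run."""
    message = protect_secrets_dir(params)
    if message is not None:
        # Blocked: the tool never runs; the agent receives the reason instead.
        return {"success": False, "result": message}
    # A real runtime would perform the write here; this sketch only reports it.
    target = params.get("file_path") or params.get("path")
    return {"success": True, "result": f"wrote {target}"}


first = gated_write({"file_path": "/tmp/secrets/api.key", "content": "test-key-12345"})
retry = gated_write({"file_path": "/tmp/api.key", "content": "test-key-12345"})
print(first["success"], retry["success"])  # False True
```

Because the first result carries the rule's message rather than a bare denial, the caller has enough information to pick a path the rule allows.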

Other condition types

param_contains is the simplest. Three more cover most needs:

# Regex match
- param_matches:
    param: command
    pattern: "rm\\s+-rf|sudo\\s|curl.+\\|\\s*sh"

# Set membership: target NOT in this allowlist
- target_not_in_set: ["staging", "dev"]

# Negation (combine with any/all)
- not:
    param_contains:
      param: file_path
      value: "/var/log/"

The full leaf vocabulary is in behaviour engine reference.
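Before shipping a param_matches pattern, it helps to try it against sample commands. The Python sketch below checks the regex from the example above; in a double-quoted YAML string, "\\s" decodes to \s, so the raw string here is the decoded pattern. It assumes the engine applies the pattern with unanchored search semantics, as Python's re.search does, which this page does not confirm; is_risky is a hypothetical helper name.

```python
import re

# Decoded form of the YAML pattern "rm\\s+-rf|sudo\\s|curl.+\\|\\s*sh".
PATTERN = re.compile(r"rm\s+-rf|sudo\s|curl.+\|\s*sh")


def is_risky(command: str) -> bool:
    """True when the pattern matches anywhere in the command string."""
    return PATTERN.search(command) is not None


samples = [
    "rm -rf build/",                                      # intended hit
    "sudo apt install jq",                                # intended hit
    "curl https://example.com/install.sh | sh",           # intended hit
    "ls -la",                                             # no match
    "rm -fr build/",                                      # missed: flags reordered
    "curl -s https://example.com/file | shasum -a 256",   # hit: 'sh' prefix of shasum
]

for command in samples:
    print(f"{is_risky(command)!s:5}  {command}")
```

Under these assumptions the last two samples show the limits of a pure pattern match: reordered flags slip past it, and a harmless pipe into shasum trips it. Tightening the pattern or pairing it with other guards is a judgment call for each project.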

Action modes

Action   What happens
block    Tool is not executed. Agent sees the message as the tool result.
warn     Tool runs. The message is appended to the tool result so the agent reads it.
remind   Tool runs. The message lands as a system reminder after execution.

block is for invariants that must never be violated. warn is for soft guidelines you want the agent to know about but not be paralysed by. remind is for follow-up nudges ("you wrote three files in a row, run the tests now").
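The three modes differ in whether the tool runs and where the message ends up. The following Python sketch models that for a rule that has already matched; run_with_rule and the returned dict keys are hypothetical and only mirror the table above, not the engine's real data structures.

```python
from typing import Callable

ACTIONS = ("block", "warn", "remind")


def run_with_rule(action: str, message: str, tool: Callable[[], str]) -> dict:
    """Apply a matched rule's action around a tool call (illustrative only)."""
    if action not in ACTIONS:
        # Validate first so an unknown action never runs the tool.
        raise ValueError(f"unknown action: {action}")
    if action == "block":
        # Tool is not executed; the message stands in for the tool result.
        return {"executed": False, "tool_result": message, "reminders": []}
    output = tool()
    if action == "warn":
        # Tool runs; the message is appended to its result.
        return {"executed": True, "tool_result": f"{output}\n{message}", "reminders": []}
    # remind: tool runs; the message is delivered separately afterwards.
    return {"executed": True, "tool_result": output, "reminders": [message]}


def fake_write() -> str:
    return "wrote /tmp/api.key"


for action in ACTIONS:
    print(action, run_with_rule(action, "check the path", fake_write))
```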

Composing custom rules with profiles

Rules add to the profile defaults; they don't replace them. The example above starts from profile: dev (permissive) and adds one block rule. You can also start from profile: coding (strict) and override individual built-ins:

security:
  behavior:
    profile: coding
    rules:
      test_after_changes: false   # disable one built-in
    rule_definitions:
      - id: protect_secrets_dir
        ...

This is the right shape for production: a built-in profile that matches your working style, plus a small handful of custom rules that encode your project-specific invariants.
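One way to picture the composition is as a merge: profile defaults first, per-rule overrides on top, custom rules appended. The Python sketch below is a mental model only. Apart from test_after_changes and protect_secrets_dir, the rule names are hypothetical stand-ins for built-ins, and the real engine may order or resolve rules differently.

```python
from typing import Dict, Iterable, List


def effective_rules(
    profile_defaults: Dict[str, bool],
    overrides: Dict[str, bool],
    custom_ids: Iterable[str],
) -> List[str]:
    """List the rule ids that end up active after composition."""
    # Overrides win over profile defaults for the same built-in rule.
    enabled = {**profile_defaults, **overrides}
    builtins = [name for name, on in enabled.items() if on]
    # Custom rules are added on top; they never replace the built-ins.
    return builtins + list(custom_ids)


# Hypothetical defaults for a strict profile (names are illustrative).
coding_defaults = {
    "read_before_edit": True,
    "test_after_changes": True,
    "no_shell_for_files": True,
}

active = effective_rules(
    coding_defaults,
    overrides={"test_after_changes": False},
    custom_ids=["protect_secrets_dir"],
)
print(active)  # ['read_before_edit', 'no_shell_for_files', 'protect_secrets_dir']
```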

When to reach for custom rules

  • Compliance. PII never leaves designated channels; logs never contain raw secrets. Two custom rules with block cover most of it.
  • Project conventions. Migrations are write-protected; master branches reject force-pushes. The agent gets the same guard the linter would have given a human.
  • Cost / blast-radius caps. No single tool call may reference more than 50 files; http.post body must not exceed 10 KB. Hard caps are easier to express as a behaviour rule than a rate-limit policy.

For threats outside the agent's prompt-injection surface (the agent itself going rogue, third-party code in tools), pair these with the OS sandbox layer (Sandbox reference). Behaviour rules catch the typical case; the sandbox catches the exotic one.

Going further

  • Full behaviour engine docs (every rule schema, the seven-state machine, cooldowns, condition leaves): Behavior Engine.
  • Module reference for the engine's runtime API (when you want to author rules in Python instead of YAML): behavior module.
  • The capability gate is the right tool for "this action is forbidden, period"; behaviour rules are for "this action is conditionally forbidden": Security 1 - Approval.