
Behavioral Baselines

If you can't define what normal agent behavior looks like, you can't detect when it's compromised. Behavioral baselines are the agent equivalent of network traffic baselines — and most teams don't have them. Here's how to build them, what tools exist, and patterns from my own agent systems.

Category: Detection & Monitoring · security-teams · builders
Context: behavioral baselines detect Kill Chain Stage 3 (HIJACK), the point where an agent's behavior deviates from its assigned task.

Why Baselines

A hijacked agent looks normal. It uses the same tools, calls the same APIs, generates the same format of output. The difference is subtle: the tool call sequence changed, the parameters shifted, the output contains data the task didn't require. Without a baseline of what "normal" looks like, you can't detect these shifts.

Microsoft elevated AI observability to a security requirement in March 2026 — positioning it with the same seriousness as authentication, encryption, and access management. Their recommendation: complete audit trails of all AI interactions including prompts, responses, intermediate reasoning steps, and external actions.

The industry signal: surveys consistently report that a majority of organizations deploying AI agents have experienced security incidents or observed unintended agent behavior, and that most had no behavioral monitoring in place when the incident occurred.

What to Monitor

Four layers of behavioral signals, from easiest to hardest to implement.

Tool call patterns

Which tools the agent calls, in what order, how often, and with what parameters. A code review agent that suddenly calls web_fetch or reads .ssh/ has deviated from its baseline. This is the easiest signal to capture and the most reliable indicator of compromise.

Baseline definition: record the tool call sequence for at least 50 normal agent runs (the minimum Driftbase recommends). The resulting "fingerprint" is your baseline; deviations from it trigger alerts.
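A minimal sketch of this fingerprinting idea: treat the baseline as the set of tool-call transitions (bigrams) observed across known-good runs, and flag any transition a new run makes that was never seen during baselining. The tool names and runs below are illustrative, not from a real system.

```python
from collections import Counter

def bigrams(tool_calls):
    """Adjacent tool-call pairs, e.g. ("read_file", "grep")."""
    return list(zip(tool_calls, tool_calls[1:]))

def build_baseline(runs):
    """Fingerprint: every tool-call bigram seen across baseline runs.

    `runs` is a list of tool-call sequences from known-good runs
    (the article suggests recording at least 50).
    """
    fingerprint = Counter()
    for run in runs:
        fingerprint.update(bigrams(run))
    return fingerprint

def deviations(run, fingerprint):
    """Tool-call transitions in `run` never seen during baselining."""
    return [pair for pair in bigrams(run) if pair not in fingerprint]

# Baseline: a code-review agent that reads files, greps, posts comments.
baseline_runs = [
    ["read_file", "grep", "read_file", "post_comment"],
    ["read_file", "read_file", "grep", "post_comment"],
] * 25  # stand-in for 50 recorded runs

fp = build_baseline(baseline_runs)

# A hijacked run suddenly fetches a URL mid-review.
suspect = ["read_file", "web_fetch", "post_comment"]
print(deviations(suspect, fp))
# flags ("read_file", "web_fetch") and ("web_fetch", "post_comment")
```

Counting frequencies (rather than just membership) also lets you alert on transitions that are valid but suddenly far more common than the baseline distribution suggests.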
Token usage and latency

Sudden spikes in token consumption, unusual response times, or cost anomalies. A hijacked agent executing a multi-step exfiltration will generate more tool calls and tokens than a normal task. This is cheap to monitor and catches resource exhaustion attacks (Kill Chain Stage 4).
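A simple way to operationalize this is a z-score check against per-task history. The threshold and sample token counts below are illustrative assumptions.

```python
import statistics

def is_anomalous(value, history, threshold=3.0):
    """Flag a run whose token count (or latency) sits more than
    `threshold` standard deviations above the baseline mean."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return value != mean
    return (value - mean) / stdev > threshold

# Baseline: token counts from normal runs of the same task.
history = [1200, 1350, 1100, 1280, 1220, 1190, 1310, 1250]

print(is_anomalous(1400, history))   # within normal variation -> False
print(is_anomalous(9500, history))   # multi-step exfiltration spike -> True
```

The same check applies unchanged to latency or per-run cost; the only requirement is a history of normal values for the same task type.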

Output content analysis

What the agent outputs — does it contain data the task didn't request? PII, credentials, file contents that weren't part of the assignment? Output classifiers can detect when agent responses contain unexpected sensitive data — catching Stage 5 EXFILTRATE at the output boundary.
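As a sketch of an output classifier, the snippet below scans agent output against a few regex patterns for credential-shaped strings. The patterns are illustrative only; a production classifier would use a dedicated DLP library or a trained model.

```python
import re

# Illustrative patterns; not a complete sensitive-data taxonomy.
SENSITIVE_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key":    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    "email":          re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scan_output(text):
    """Return the names of sensitive-data patterns found in agent output."""
    return [name for name, pat in SENSITIVE_PATTERNS.items() if pat.search(text)]

benign = "Refactored the parser and all tests pass."
leaky  = "Done. Also, config contains AKIAIOSFODNN7EXAMPLE for deploys."

print(scan_output(benign))  # []
print(scan_output(leaky))   # ['aws_access_key']
```

The key design point is where this runs: at the output boundary, after the agent produces a response but before it reaches the user or a downstream tool.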

Confidence scoring

Per-decision confidence scores that determine whether the agent auto-executes, executes with caveats, or escalates to a human. In my own systems, I use three tiers: high confidence (90%+) auto-executes, medium (60-90%) executes with logging and caveats, low (<60%) escalates to human review.
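The three-tier routing described above reduces to a small dispatch function. The thresholds (90% and 60%) are the article's; the return labels are illustrative names.

```python
def route(confidence):
    """Route an agent decision by confidence score (0.0-1.0)."""
    if confidence >= 0.90:
        return "auto_execute"
    if confidence >= 0.60:
        return "execute_with_logging_and_caveats"
    return "escalate_to_human"

print(route(0.97))  # auto_execute
print(route(0.75))  # execute_with_logging_and_caveats
print(route(0.40))  # escalate_to_human
```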

From my ComplianceAI system

Every finding has a confidenceScore field and a requiresHumanReview boolean. Escalation triggers include: AMBIGUOUS_POLICY, LOW_CONFIDENCE, EXPLICIT_REQUEST, and POLICY_GAP. Pattern-based detections have confidence 1.0 (deterministic). AI-based detections have variable confidence (0.0-1.0). This differentiation prevents false confidence in uncertain findings.
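A sketch of that finding schema: the field names (confidenceScore, requiresHumanReview) and trigger names come from the description above, but the surrounding types, the helper functions, and the 0.6 escalation threshold are my assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class EscalationTrigger(Enum):
    # Trigger names from the article; the enum shape is an assumption.
    AMBIGUOUS_POLICY = "AMBIGUOUS_POLICY"
    LOW_CONFIDENCE = "LOW_CONFIDENCE"
    EXPLICIT_REQUEST = "EXPLICIT_REQUEST"
    POLICY_GAP = "POLICY_GAP"

@dataclass
class Finding:
    rule_id: str
    description: str
    confidenceScore: float          # 1.0 pattern-based, 0.0-1.0 AI-based
    requiresHumanReview: bool = False
    escalation: Optional[EscalationTrigger] = None

def pattern_finding(rule_id, description):
    """Pattern-based detections are deterministic: confidence 1.0."""
    return Finding(rule_id, description, 1.0)

def ai_finding(rule_id, description, confidence):
    """AI-based findings escalate below a confidence floor
    (0.6 here, an assumption matching the tiers above)."""
    low = confidence < 0.6
    return Finding(rule_id, description, confidence,
                   requiresHumanReview=low,
                   escalation=EscalationTrigger.LOW_CONFIDENCE if low else None)

f = ai_finding("SEC-014", "Possible hard-coded secret", 0.45)
print(f.requiresHumanReview, f.escalation.name)  # True LOW_CONFIDENCE
```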

Five Tools

The observability stack for agent behavioral monitoring. All vendor-agnostic.

OpenTelemetry GenAI Semantic Conventions

The emerging standard for agent observability. Experimental but rapidly standardizing (major update March 2026). Defines standardized schemas for prompts, model responses, token usage, tool/agent calls, and provider metadata. Agent-specific spans include create_agent and invoke_agent. Vendor-agnostic — replaces fragmented custom tracing.

Source: opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/
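To give a flavor of what the convention standardizes, here is the kind of attribute set an invoke_agent span carries. The conventions are experimental, so the exact keys may change, and the agent/model names below are made up.

```python
# Attribute keys follow the (experimental) OpenTelemetry GenAI semantic
# conventions; they may change as the spec stabilizes.
def agent_span_attributes(agent_name, model, input_tokens, output_tokens):
    return {
        "gen_ai.operation.name": "invoke_agent",
        "gen_ai.agent.name": agent_name,
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }

attrs = agent_span_attributes("code-review-agent", "claude-sonnet", 1820, 410)
print(attrs["gen_ai.operation.name"])  # invoke_agent
```

In a real deployment these attributes would be set on an OpenTelemetry span rather than returned as a dict; the value of the convention is that any OTel-compliant backend can then query them uniformly.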
LangSmith

Step-by-step execution traces: every LLM call, tool use, and API interaction with full parameters. Over 1 billion trace logs processed. Captures token usage, latency (P50/P99), error rates, cost breakdowns, and feedback scores. Evaluators can score intermediate decisions for quality.

Source: langchain.com/langsmith/observability
Arize Phoenix

Open-source LLM observability built on OpenTelemetry. Traces agent runs, tool calls, and model request/response with full context. Supports evaluation via LLM-based evaluators, code-based checks, or human labels. Integrates with Claude Agent SDK, OpenAI Agents SDK, LangGraph, and CrewAI.

Source: arize.com/docs/phoenix · github.com/Arize-ai/phoenix
Driftbase

Creates behavioral baselines after 50 agent runs. Provides fingerprint diffs after deployment — shows exactly what changed in decision outcomes, latency percentiles, tool distribution, and error patterns. Statistical rigor for detecting drift.

Source: driftbase.io
OpenLLMetry (Traceloop)

Complete observability for the AI stack — LLMs, vector databases, GPUs. One line of code to instrument. Built on OpenTelemetry, so traces are compatible with any OTel-compliant backend.

Source: github.com/traceloop/openllmetry

The 3-Pass Pattern

From my ComplianceAI system — a multi-pass review pattern where each pass has expected output characteristics. Deviation in Pass 3 from Pass 1+2 signals an anomaly.

01
Per-file analysis

Each file scanned independently. Expected output: findings with file paths, line numbers, severity, and rule IDs. Baseline: consistent finding density per file type. A Swift file with zero security findings when the baseline shows 2-3 per file is suspicious.

02
Cross-file integration

Findings from Pass 1 are correlated across files. Expected output: architectural-level findings that span multiple files. Baseline: cross-file findings are a subset of Pass 1 findings, not new findings. New findings here suggest Pass 1 missed something — or the agent's behavior changed between passes.

03
Independent review

A separate review of Pass 1+2 output to catch misses. Expected output: validation or correction, not wholesale new findings. If Pass 3 generates significantly different results from Pass 1+2, either the earlier passes failed or the agent's context was compromised between passes.
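The Pass 3 consistency check above can be sketched as a set-overlap test on finding IDs: Pass 3 should mostly validate what Pass 1+2 found, so low overlap is the anomaly signal. The 0.7 threshold and the finding IDs are illustrative assumptions, not values from the ComplianceAI system.

```python
def jaccard(a, b):
    """Overlap between two sets of finding IDs (1.0 = identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def pass3_consistent(pass1_2_ids, pass3_ids, min_overlap=0.7):
    """Pass 3 should validate Pass 1+2, not replace it.  Low overlap
    signals either earlier misses or a context compromise between passes."""
    return jaccard(pass1_2_ids, pass3_ids) >= min_overlap

earlier = {"SEC-001", "SEC-007", "ARCH-002", "SEC-014"}

# Pass 3 confirms the earlier findings: consistent.
print(pass3_consistent(earlier, {"SEC-001", "SEC-007", "ARCH-002", "SEC-014"}))

# Pass 3 produces wholesale new findings: anomaly.
print(pass3_consistent(earlier, {"SEC-099", "SEC-123", "ARCH-777"}))
```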

Honest Limitations

Concept drift

Agent behavior changes legitimately over time — new tools added, workflows updated, models upgraded. Your baseline becomes stale. You need to re-calibrate regularly, which means you need to distinguish "the agent evolved" from "the agent was compromised." This is the hardest problem in behavioral monitoring.

No public false positive rate data

No one has published rigorous false positive rates for agent behavioral monitoring. Industry claims of "80%+ detection" lack methodology citations. Until someone publishes peer-reviewed detection accuracy data for agent monitoring specifically, treat all detection claims with skepticism — including the tools listed above.

Slow attacks bypass baselines

A sophisticated attacker who gradually shifts agent behavior over many sessions — small changes that stay within the baseline's noise threshold — can avoid detection entirely. Baselines catch sudden deviations. They're weak against gradual behavioral drift that looks like legitimate evolution.

Behavioral baselines are the detection layer in the Kill Chain. Combine with hook-based guardrails (prevention), MCP security (tool defense), and red teaming (validation).

References
[1] OpenTelemetry GenAI Semantic Conventions, agent spans specification. opentelemetry.io/docs/specs/semconv/gen-ai/gen-ai-agent-spans/ (March 2026)
[2] LangChain, "LangSmith Observability." langchain.com/langsmith/observability
[3] Arize AI, "Phoenix: Open-source LLM Observability." github.com/Arize-ai/phoenix
[4] Driftbase, behavioral baseline and drift detection for AI agents. driftbase.io
[5] Traceloop, "OpenLLMetry." github.com/traceloop/openllmetry
[6] Microsoft Security Blog, "Observability for AI Systems: Strengthening Visibility and Proactive Risk Detection." (March 18, 2026)
[7] Stellar Cyber, "Agentic AI Security Threats." stellarcyber.ai (2026)
[8] Debenedetti et al., "AgentDojo." ETH Zurich (2024). agentdojo.spylab.ai (baseline utility and attack success rate data)
[9] He, X. et al., "SentinelAgent: Graph-based Anomaly Detection in Multi-Agent Systems." arXiv:2505.24201 (May 2025)
[10] Dhanasekaran, M., "The Agentic AI Kill Chain." magesh.ai/kill-chain (2026)
[11] Dhanasekaran, M., "Hook-Based Guardrails." magesh.ai/hook-guardrails (2026)

This work represents the author's independent research and personal views. It is not related to or endorsed by the author's employer.