Injection Detection

Detect prompt injection attacks with 64+ patterns across 7 categories. Zero dependencies, sub-millisecond.

Detect prompt injection attacks with 64+ regex patterns across 7 categories. Synchronous, zero dependencies, sub-millisecond. Layer this into your agent pipeline as a first line of defense.

Basic Detection

ts

Note: This is a heuristic pattern matcher, not an LLM classifier. It catches known syntactic patterns but cannot detect novel semantic attacks. For high-security deployments, layer this with an LLM-based classifier.

7 Attack Categories

CategoryPatternsDescription
instruction_override6Attempts to override, disregard, or replace the agent's original instructions
role_manipulation4Attempts to redefine the agent's identity or make it act as a different persona
context_escape3Attempts to leak system prompts or escape the conversation context using delimiters
data_exfiltration2Attempts to send conversation data or system internals to external endpoints
encoding_attack2Uses encoding tricks like base64 payloads or Unicode homoglyphs to bypass detection
social_engineering3Uses urgency, false authority claims, or testing excuses to manipulate the agent
obfuscation8Advanced evasion using zero-width characters, RTL overrides, zalgo text, and Unicode normalization attacks

Score Weighting

The detection score (0 to 1) uses max-weight scoring rather than averaging. This prevents low-weight patterns from diluting high-confidence detections.

  • Base score = weight of the highest-matching pattern (0 to 0.95)
  • Multi-pattern boost = +0.02 per additional pattern match (max +0.10)
  • Multi-category boost = +0.03 per additional category (max +0.10)
  • Final score = min(1.0, base + multi-pattern + multi-category)

Tip: An input matching one high-weight pattern (e.g., override_system at 0.95) scores higher than an input matching many low-weight patterns. Cross-category attacks get the biggest boost.

Configuration

ts

Policy Integration

Use createInjectionGuard() to add injection detection as a policy rule. It scans all string values in the input field recursively, including cross-field concatenation.

ts

API Route Pattern

For HTTP APIs, scan the request body before passing it to your agent.

ts

Need ML-powered detection? The ML Detection module adds an ensemble DeBERTa classifier that catches adversarial inputs the regex patterns miss (requires login, Pro plan).