Injection Detection
Detect prompt injection attacks with 64+ patterns across 7 categories. Zero dependencies, sub-millisecond.
Detect prompt injection attacks with 64+ regex patterns across 7 categories. Synchronous, zero dependencies, sub-millisecond. Layer this into your agent pipeline as a first line of defense.
Basic Detection
Note: This is a heuristic pattern matcher, not an LLM classifier. It catches known syntactic patterns but cannot detect novel semantic attacks. For high-security deployments, layer this with an LLM-based classifier.
7 Attack Categories
| Category | Patterns | Description |
|---|---|---|
instruction_override | 6 | Attempts to override, disregard, or replace the agent's original instructions |
role_manipulation | 4 | Attempts to redefine the agent's identity or make it act as a different persona |
context_escape | 3 | Attempts to leak system prompts or escape the conversation context using delimiters |
data_exfiltration | 2 | Attempts to send conversation data or system internals to external endpoints |
encoding_attack | 2 | Uses encoding tricks like base64 payloads or Unicode homoglyphs to bypass detection |
social_engineering | 3 | Uses urgency, false authority claims, or testing excuses to manipulate the agent |
obfuscation | 8 | Advanced evasion using zero-width characters, RTL overrides, zalgo text, and Unicode normalization attacks |
Score Weighting
The detection score (0 to 1) uses max-weight scoring rather than averaging. This prevents low-weight patterns from diluting high-confidence detections.
- Base score = weight of the highest-matching pattern (0 to 0.95)
- Multi-pattern boost = +0.02 per additional pattern match (max +0.10)
- Multi-category boost = +0.03 per additional category (max +0.10)
- Final score = min(1.0, base + multi-pattern + multi-category)
Tip: An input matching one high-weight pattern (e.g., override_system at 0.95) scores higher than an input matching many low-weight patterns. Cross-category attacks get the biggest boost.
Configuration
Policy Integration
Use createInjectionGuard() to add injection detection as a policy rule. It scans all string values in the input field recursively, including cross-field concatenation.
API Route Pattern
For HTTP APIs, scan the request body before passing it to your agent.
Need ML-powered detection? The ML Detection module adds an ensemble DeBERTa classifier that catches adversarial inputs the regex patterns miss (requires login, Pro plan).