Designed to defend, not just observe
governance-sdk enforces before execution — not after. Every design decision prioritizes security: zero dependencies, no network calls, append-only audits, and 64+ injection patterns blocking the attacks that are happening right now in production AI deployments.
Security principles
Enforcement is entirely in-process. No calls to external services, no telemetry, no phone-home. Your agent's decisions never leave your runtime.
The entire SDK is statically analyzable. No eval(), no new Function(), no dynamic imports that could be hijacked. Works safely in edge runtimes.
Audit events are written and never modified. The HMAC chain means deletion is detected — you can't silently erase a decision from the audit trail.
The HMAC signing key is provided by you at startup. It never leaves your environment. Rotate it without breaking historical chain verification.
No supply chain attack surface. The entire governance enforcement path is first-party code. Nothing from npm can compromise your governance layer.
Shipped as TypeScript source with full type safety. You can audit exactly what runs. No minified bundles with hidden behavior.
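The append-only audit and key-rotation principles above (an HMAC chain in which deletion or modification is detected, and a signing key that can be rotated without breaking verification of older entries), illustrated with an added TypeScript example. This is not governance-sdk's own code: the entry layout, the names AuditChain, verifyChain and KeyRing, and the keyId-per-entry rotation scheme are assumptions made for this illustration, since the page does not document the SDK's chain format.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";
import { Buffer } from "node:buffer";

// Added example, not governance-sdk code: an append-only audit log whose entries
// are linked by HMAC-SHA256. Entry layout and all names are assumptions.
export interface AuditEntry {
  seq: number;      // position in the chain, starting at 0
  keyId: string;    // which signing key produced `mac`, so keys can be rotated
  decision: string; // the governance decision being recorded
  prevMac: string;  // `mac` of the previous entry ("" for the first entry)
  mac: string;      // HMAC-SHA256 over seq, keyId, decision and prevMac
}

export type KeyRing = Record<string, Buffer>;
export type VerifyResult = { ok: boolean; brokenAt: number | null };

function macFor(key: Buffer, e: Omit<AuditEntry, "mac">): string {
  // Length-prefix every field so two different field splits cannot collide.
  const h = createHmac("sha256", key);
  for (const f of [String(e.seq), e.keyId, e.decision, e.prevMac]) {
    h.update(`${Buffer.byteLength(f, "utf8")}:${f};`);
  }
  return h.digest("hex");
}

export function verifyChain(entries: AuditEntry[], keys: KeyRing, expectedHeadMac?: string): VerifyResult {
  let prevMac = "";
  for (let i = 0; i < entries.length; i++) {
    const e = entries[i];
    const key = keys[e.keyId];
    // A missing key, a gap in seq, or a broken back-link all mean the chain was altered here.
    if (!key || e.seq !== i || e.prevMac !== prevMac) return { ok: false, brokenAt: i };
    const expected = Buffer.from(macFor(key, e), "hex");
    const actual = Buffer.from(e.mac, "hex");
    if (expected.length !== actual.length || !timingSafeEqual(expected, actual)) {
      return { ok: false, brokenAt: i };
    }
    prevMac = e.mac;
  }
  // Removing entries from the tail leaves a valid, shorter chain. That is only
  // detectable when the latest MAC was anchored outside the log and is passed in here.
  const head = entries.length ? entries[entries.length - 1].mac : "";
  if (expectedHeadMac !== undefined && head !== expectedHeadMac) {
    return { ok: false, brokenAt: entries.length };
  }
  return { ok: true, brokenAt: null };
}

export class AuditChain {
  private entries: AuditEntry[] = [];
  constructor(private keys: KeyRing, private activeKeyId: string) {
    if (!keys[activeKeyId]) throw new Error(`unknown key id ${activeKeyId}`);
  }

  // Append-only by construction: there is no update or delete method.
  append(decision: string): AuditEntry {
    const prev = this.entries[this.entries.length - 1];
    const body = { seq: this.entries.length, keyId: this.activeKeyId, decision, prevMac: prev ? prev.mac : "" };
    const entry: AuditEntry = { ...body, mac: macFor(this.keys[this.activeKeyId], body) };
    this.entries.push(entry);
    return entry;
  }

  // Rotation: new entries sign under the new key; older entries keep their keyId
  // and still verify as long as the old key stays in the ring.
  rotate(keyId: string, key: Buffer): void {
    this.keys[keyId] = key;
    this.activeKeyId = keyId;
  }

  headMac(): string {
    return this.entries.length ? this.entries[this.entries.length - 1].mac : "";
  }
  snapshot(): AuditEntry[] {
    return this.entries.map((e) => ({ ...e }));
  }
  verify(): VerifyResult {
    return verifyChain(this.snapshot(), this.keys, this.headMac());
  }
}
```

Editing an entry, deleting one from the middle, or inserting one all break either the back-link or the MAC of the next entry, and brokenAt names the first index where that happens. The one case a chain cannot see on its own is truncation of the tail, which is why verifyChain accepts the latest MAC as an optional external anchor; in practice that head value should be stored somewhere the log writer cannot overwrite.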
64+-pattern injection detection
Run on every user-provided string before it reaches the agent. Categories allow targeted response — block all vs log only vs require approval.
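The screening step described above, with a per-category response mode (block, require approval, or log only), as an added TypeScript example. This is not governance-sdk's detectInjection(): the small pattern set, the category names, the policy table and the function names are all constructed for this illustration, and the regular expressions are deliberately narrow so a reader can see how a category maps to a response rather than how the SDK's 64+ patterns are written.

```typescript
// Added example, not governance-sdk code: category-aware injection screening with a
// per-category response mode. Pattern set, category names and policy are constructed.
export type Category = "override" | "role-switch" | "exfil" | "command" | "goal-hijack" | "prompt-extraction";
export type Response = "block" | "approve" | "log" | "allow";
export interface Finding { id: string; category: Category; index: number }
interface Pattern { id: string; category: Category; re: RegExp }

// Constructed patterns. \W+ between tokens tolerates extra spaces or punctuation.
export const PATTERNS: Pattern[] = [
  { id: "ignore_previous_instructions", category: "override", re: /\bignore\W+(all\W+)?(previous|prior|above)\W+instructions\b/i },
  { id: "disregard_system_prompt", category: "override", re: /\bdisregard\W+(the\W+|your\W+)?system\W+prompt\b/i },
  { id: "you_are_now", category: "role-switch", re: /\byou\W+are\W+now\b/i },
  { id: "pretend_you_are", category: "role-switch", re: /\bpretend\W+(that\W+)?you\W+are\b/i },
  { id: "send_to_external", category: "exfil", re: /\b(send|forward|export)\W+(\w+\W+){0,3}to\W+https?:\/\//i },
  { id: "run_shell", category: "command", re: /\b(run|execute)\W+(the\W+|this\W+)?(shell|command)\b/i },
  { id: "your_real_goal_is", category: "goal-hijack", re: /\byour\W+real\W+(goal|objective)\W+is\b/i },
  { id: "repeat_your_instructions", category: "prompt-extraction", re: /\b(repeat|show|print|reveal)\W+(me\W+)?your\W+(system\W+prompt|instructions)\b/i },
];

// Response mode per category; "block" outranks "approve", which outranks "log".
export const DEFAULT_POLICY: Record<Category, Response> = {
  override: "block",
  "role-switch": "block",
  exfil: "block",
  command: "block",
  "goal-hijack": "block",
  "prompt-extraction": "approve",
};
const SEVERITY: Response[] = ["block", "approve", "log", "allow"];

// Normalise before matching: NFKC folds look-alike Unicode forms and zero-width
// characters are stripped, so "ig\u200Bnore" is seen as "ignore".
export function normalize(text: string): string {
  return text.normalize("NFKC").replace(/[\u200B-\u200D\uFEFF]/g, "");
}

export function detectInjection(text: string, patterns: Pattern[] = PATTERNS): Finding[] {
  const t = normalize(text);
  const findings: Finding[] = [];
  for (const p of patterns) {
    const m = p.re.exec(t);
    if (m) findings.push({ id: p.id, category: p.category, index: m.index });
  }
  return findings;
}

// The most severe response among all matched categories wins.
export function decide(findings: Finding[], policy: Record<Category, Response> = DEFAULT_POLICY): Response {
  let worst = SEVERITY.length - 1;
  for (const f of findings) worst = Math.min(worst, SEVERITY.indexOf(policy[f.category]));
  return SEVERITY[worst];
}

export function screen(text: string, policy: Record<Category, Response> = DEFAULT_POLICY) {
  const findings = detectInjection(text);
  return { response: decide(findings, policy), findings };
}
```

Keeping the category in each finding is what makes the policy table useful: a noisy category can be downgraded to log-only while tuning, and the prompt-extraction category can route to a human approver instead of a hard block, without touching the patterns themselves. Pattern matching of this kind is a first filter, not a complete defence, so the findings should also land in the audit trail.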
Attempts to replace or nullify the agent's original system prompt or instructions.
Patterns: ignore_previous_instructions, disregard_system_prompt, override_with, forget_everything, new_instructions

Forces the agent to adopt a different persona, often one without safety constraints.
Patterns: you_are_now, pretend_you_are, act_as_if, switch_to_mode

Instructs the agent to send internal data, credentials, or prompts to attacker-controlled endpoints.
Patterns: send_to_external, export_data_to, forward_contents, leak_prompt

Embeds OS or interpreter commands inside user input, hoping they execute in the agent's context.
Patterns: execute_command, run_shell, system_call, exec_

Overrides the agent's stated objectives with attacker-defined goals.
Patterns: your_real_goal_is, primary_objective, secret_mission

Attempts to extract the agent's system prompt, revealing business logic or credentials.
Patterns: repeat_your_instructions, show_your_system_prompt

Threat model
Six threat categories with mitigations in governance-sdk v0.5.0.
1. detectInjection() on all user-sourced strings; 64+ patterns across 7 categories; category-aware blocking (override vs exfil vs role-switch)
2. blockTools() — exact or glob match; requireLevel() — governance score gate; requireSequence() — must complete prerequisite tools first
3. kill() / killAll() at priority 999; rateLimit() per hour/day; tokenBudget() hard cap
4. HMAC-SHA256 hash chain; chain.verify() detects any modification; brokenAt reports exact tamper location
5. requireApproval() — human-in-the-loop gate; timeWindow() — restrict to business hours; requireLevel(4+) for sensitive namespaces
6. Zero runtime dependencies; first-party enforcement code only; statically analyzable — no eval()

Responsible Disclosure
Found a security issue in governance-sdk? Please report it privately via GitHub Security Advisories before public disclosure. We target a 72-hour initial response for all reports.