FAQ
Why does Stave use "unsafe state" instead of "vulnerability" or "misconfiguration"?
Stave borrows from systems safety engineering (IEC 61508, DO-178C), not from the security vulnerability lexicon.
| Concept | Security terminology | Safety engineering terminology | Stave uses |
|---|---|---|---|
| A bad condition | Vulnerability, misconfiguration | Unsafe state | Unsafe state |
| A rule to check | Policy, rule, check | Safety invariant, control | Control |
| A detected problem | Alert, violation, issue | Finding, deviation | Finding |
| How long the problem persists | — (rarely tracked) | Unsafe duration, exposure window | Unsafe duration |
| Proof of the problem | Evidence (forensics) | Evidence (safety case) | Evidence |
Why this matters: security engineers are familiar with terms like "insecure configuration," "vulnerability," and "misconfiguration." Stave deliberately avoids these terms and expresses the same concepts as "unsafe state," "unsafe duration," and "finding." The reason is that Stave borrows its principles from mature engineering disciplines (aviation, systems safety) that have decades of rigorous methodology for proving system safety but no equivalent products in the cybersecurity domain.
No existing security tool applies safety engineering rigor to infrastructure configuration. CSPM tools detect misconfigurations but don't track duration, don't produce deterministic proofs, and don't work offline. IaC scanners check templates but not observed state. Policy engines make runtime decisions but don't evaluate historical evidence. Stave brings the safety engineering approach — state-based reasoning, duration tracking, deterministic proof, offline evaluation — to a domain that has never had it:
- State-based reasoning — Stave evaluates whether observed state satisfies a control, not whether a known CVE applies.
- Duration tracking — safety engineering cares about how long a system remains in an unsafe state, not just that it entered one. A bucket that was public for 5 minutes during a deploy is different from one that has been public for 6 months.
- Deterministic proof — same inputs always produce the same findings. This is a safety case requirement, not a typical security scanner feature.
- Offline evaluation — safety cases are evaluated against recorded evidence, not live systems. Stave works the same way.
The terminology reflects the origin. "Unsafe state" is not a synonym for "insecure configuration" — it carries the safety engineering semantics of state tracking, duration measurement, and provable assertion that the security term does not.
What is "System Invariant as Code"?
A system invariant is a property that must always hold true for your infrastructure. "As Code" means you define these invariants as version-controlled YAML files and evaluate them programmatically.
Example invariant: "PHI buckets are never publicly readable."
This is different from:
- Policy-as-Code (OPA, Sentinel) — evaluates policy decisions at request time. Stave evaluates invariants over historical snapshots.
- Infrastructure-as-Code scanning (tfsec, Checkov) — checks templates before deployment. Stave checks actual observed configurations after deployment.
- CSPM (Wiz, Prisma, AWS Config) — continuously monitors live cloud APIs. Stave evaluates offline, with no credentials.
See System Invariant as Code for the formal model.
How does System Invariant as Code differ from OPA Rego and other policy engines?
The paradigm is different. OPA, Sentinel, and similar tools are policy decision engines — they answer "is this request allowed?" at a point in time, typically at an admission gate or CI step. Stave is a safety evaluation engine — it answers "does observed infrastructure state satisfy declared invariants, and for how long has it been unsafe?"
| | OPA / Rego | Stave |
|---|---|---|
| Input | Structured request or document | Timestamped observation snapshots |
| Evaluation model | Policy decision (allow/deny) | Invariant proof (safe/unsafe + duration) |
| Language | Rego (general-purpose logic) | YAML predicates (ctrl.v1 schema) |
| Time awareness | Single point in time | Multi-snapshot duration tracking |
| Primary use | Admission control, CI gates | Offline audit, safety evidence, preflight |
| Output | Decision (boolean + reason) | Findings with evidence and remediation |
Stave's YAML controls are intentionally narrower than a general-purpose language like Rego. This is a deliberate trade-off: controls are constrained to a closed set of predicate operators (eq, ne, in, missing, any_match, etc.) so they can be statically analyzed, validated by JSON Schema, and evaluated deterministically without an interpreter. You cannot write arbitrary logic — only declare invariants the engine knows how to prove.
The two approaches are complementary. Use OPA for runtime policy decisions and admission control. Use Stave for offline, deterministic safety proofs over historical snapshots.
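To make the "closed set of operators" idea concrete, here is a minimal Go sketch. It is not Stave's engine: the `Predicate` type and `Eval` function are hypothetical, only four of the operators named above are covered, and properties are assumed to be a flat map of scalar values. The point it illustrates is that an operator outside the set is rejected instead of being interpreted.

```go
package main

import (
	"errors"
	"fmt"
)

// Predicate is a hypothetical, simplified stand-in for one field check.
type Predicate struct {
	Field string
	Op    string
	Value any
}

// Eval applies one predicate to a flat map of scalar properties.
// In this sketch an absent field satisfies only "missing".
func Eval(p Predicate, props map[string]any) (bool, error) {
	got, present := props[p.Field]
	switch p.Op {
	case "eq":
		return present && got == p.Value, nil
	case "ne":
		return present && got != p.Value, nil
	case "missing":
		return !present, nil
	case "in":
		list, ok := p.Value.([]any)
		if !ok {
			return false, errors.New("in: value must be a list")
		}
		for _, v := range list {
			if present && got == v {
				return true, nil
			}
		}
		return false, nil
	default:
		// Operators outside the closed set are rejected, never interpreted.
		return false, fmt.Errorf("unknown operator %q", p.Op)
	}
}

func main() {
	props := map[string]any{"acl": "public-read", "versioning": false}
	checks := []Predicate{
		{Field: "acl", Op: "in", Value: []any{"public-read", "public-read-write"}},
		{Field: "encryption", Op: "missing"},
		{Field: "acl", Op: "regex", Value: ".*"}, // not in the closed set
	}
	for _, c := range checks {
		ok, err := Eval(c, props)
		fmt.Println(c.Field, c.Op, ok, err)
	}
}
```

Because every branch is known ahead of time, a loader could validate the `Op` values before any observation is read, which is the property the trade-off above is buying.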
Why "control" and not "rule" or "policy"?
Externally, Stave is described as System Invariant as Code — invariants are the formal concept. Internally, the codebase uses the term control (as in ctrl.v1, CTL.S3.PUBLIC.001) to align with NIST SP 800-53 and ISO 27001, where a control is a safeguard that reduces risk.
This is a deliberate choice to make the codebase accessible to security researchers and auditors who review it. Someone auditing Stave's control definitions should find familiar terminology — controls, findings, evidence — mapped to established security frameworks, not abstract formal language.
"Rule" is ambiguous (firewall rule? linting rule?). "Policy" implies runtime enforcement. "Control" is precise: a declarative assertion evaluated against evidence, which is exactly what each ctrl.v1 YAML file defines.
Why is there a semantic gap between the domain and the code?
In domain-driven design, you aim for zero semantic gap — the code should use the same language as the domain. Stave's domain is System Invariant as Code, so ideally the codebase would use "invariant" everywhere: invariant.v1 schema, INV.S3.PUBLIC.001 identifiers, --invariants flag.
We deliberately deviate from this ideal. The codebase uses "control" (ctrl.v1, CTL., --controls) instead of "invariant." This is a conscious trade-off between two audiences:
| Audience | Preferred term | Why |
|---|---|---|
| Domain theory / formal methods | Invariant | Precise formal meaning: a property that must always hold |
| Security researchers / auditors | Control | Industry-standard term (NIST, ISO 27001) they already know |
We chose the security audience. Stave is a security tool, and the people who review its control definitions, audit its findings, and evaluate its codebase are security practitioners. If they open controls/s3/CTL.S3.PUBLIC.001.yaml and see a control with a finding and evidence, they know exactly what they are looking at. If they saw invariants/s3/INV.S3.PUBLIC.001.yaml with an "invariant violation," they would need to learn a new vocabulary to do the same review.
The paradigm name — System Invariant as Code — stays as-is in external documentation, talks, and comparisons. It accurately describes what Stave does and positions it in a category distinct from Policy-as-Code or IaC scanning. The codebase implements that paradigm using terminology that security professionals already understand.
This is the one place where we knowingly accept a semantic gap. It is documented here so future contributors understand the choice was intentional, not an oversight.
Why does Stave need two snapshots?
One snapshot tells you the current state. Two snapshots (or more) let Stave calculate how long an asset has been unsafe.
A control with type: unsafe_duration and --max-unsafe 168h means: "this asset must not remain in an unsafe state for more than 7 days." To evaluate that, Stave needs at least two points in time to measure the duration window.
Controls with type: unsafe_state only need one snapshot — they check current state regardless of duration.
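The duration check can be sketched in a few lines of Go. This is an illustration under stated assumptions, not Stave's implementation: the `Snapshot` type is hypothetical and reduces each observation to a timestamp and an unsafe flag, snapshots are assumed to be in chronological order, and a safe snapshot is assumed to reset the unsafe run. The `now` and `maxUnsafe` parameters mirror the `--now` and `--max-unsafe` flags as plain strings.

```go
package main

import (
	"fmt"
	"time"
)

// Snapshot is a hypothetical reduction of one observation.
type Snapshot struct {
	CapturedAt string // RFC 3339 timestamp
	Unsafe     bool
}

// unsafeFor returns how long the asset has been continuously unsafe as of now.
// It returns 0 when the most recent snapshot is safe.
func unsafeFor(snaps []Snapshot, now time.Time) (time.Duration, error) {
	var start time.Time
	running := false
	for _, s := range snaps {
		at, err := time.Parse(time.RFC3339, s.CapturedAt)
		if err != nil {
			return 0, err
		}
		if !s.Unsafe {
			running = false // a safe snapshot ends the current run
		} else if !running {
			start, running = at, true
		}
	}
	if !running {
		return 0, nil
	}
	return now.Sub(start), nil
}

// violates reports whether the unsafe run exceeds the allowed window.
func violates(snaps []Snapshot, now, maxUnsafe string) (bool, error) {
	n, err := time.Parse(time.RFC3339, now)
	if err != nil {
		return false, err
	}
	limit, err := time.ParseDuration(maxUnsafe)
	if err != nil {
		return false, err
	}
	d, err := unsafeFor(snaps, n)
	return d > limit, err
}

func main() {
	snaps := []Snapshot{
		{CapturedAt: "2026-01-01T00:00:00Z", Unsafe: true},
		{CapturedAt: "2026-01-10T00:00:00Z", Unsafe: true},
	}
	// Unsafe since Jan 1, evaluated on Jan 15: 14 days against a 7-day limit.
	fmt.Println(violates(snaps, "2026-01-15T00:00:00Z", "168h"))
}
```

With a single snapshot the start of the run is known but nothing confirms the asset stayed unsafe afterwards, which is why a duration control needs at least two points in time.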
Why does Stave work offline with no credentials?
Three reasons:
- Air-gapped environments — security review and audit often happen in isolated networks where cloud API access is unavailable or prohibited.
- Deterministic replay — the same snapshot files produce the same findings on any machine, any time. Live API queries introduce non-determinism (state changes, API throttling, clock differences).
- Separation of concerns — extracting data from cloud APIs is a different problem from evaluating safety invariants. Stave handles evaluation; external extractors handle extraction. See Building an Extractor.
How is "evidence" different from "observation"?
Observations are raw input — point-in-time snapshots of infrastructure state (obs.v0.1 JSON files). They contain everything captured, whether relevant or not.
Evidence is output — the specific subset of observation data that proves a particular finding. When Stave detects a violation, it attaches the relevant property values, timestamps, and duration calculations as evidence.
Observations are what you feed in. Evidence is what Stave produces to support each finding.
What S3 blind spots does Stave detect that AWS Trusted Advisor misses?
AWS Trusted Advisor checks whether S3 buckets are publicly accessible. Stave evaluates 43 controls that go deeper — detecting risks that Trusted Advisor cannot see because of how it collects data and what it checks.
1. Policy-denied scanning (the Fog Security bypass)
In August 2025, Fog Security disclosed that an attacker with AWS access can add a bucket policy denying s3:GetBucketAcl, s3:GetBucketPolicyStatus, and s3:GetPublicAccessBlock to the Trusted Advisor scanning role. The bucket can be fully public, but Trusted Advisor reports green — "no problems detected" — because it cannot read the policy. AWS patched this to show a "Warn" status, but the underlying issue remains: if the scanner is denied access, it cannot prove safety.
Stave handles this via CTL.S3.INCOMPLETE.001 — if required fields are missing from the observation (because the scanning role was denied access), the bucket is flagged as unsafe. Missing data is not safe data.
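The "missing data is not safe data" rule can be sketched as follows. This is a hedged illustration, not the actual control: the field names in `requiredFields` are invented for the example and are not taken from the obs.v0.1 schema, and `verdict` is a hypothetical helper.

```go
package main

import "fmt"

// requiredFields are properties the scanner must have been able to read.
// The names are illustrative only.
var requiredFields = []string{"acl", "policy_status", "public_access_block"}

// missingFields returns the required properties absent from one observation.
func missingFields(props map[string]any) []string {
	var missing []string
	for _, f := range requiredFields {
		if _, ok := props[f]; !ok {
			missing = append(missing, f)
		}
	}
	return missing
}

// verdict returns "unsafe" when data is incomplete, even if nothing that was
// captured looks public. Only a complete, non-public observation is "safe".
func verdict(props map[string]any) string {
	if len(missingFields(props)) > 0 {
		return "unsafe"
	}
	if status, _ := props["policy_status"].(string); status == "public" {
		return "unsafe"
	}
	return "safe"
}

func main() {
	// The scanning role was denied access to everything except the ACL.
	denied := map[string]any{"acl": "private"}
	fmt.Println(verdict(denied), missingFields(denied))
}
```

The design choice is that the default branch is never reached with partial data: a scanner that cannot read a field cannot vouch for it.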
2. Latent public exposure behind Public Access Block
A bucket with Public Access Block (PAB) enabled may have an underlying policy granting Principal: "*". Trusted Advisor reports it as safe because PAB prevents public access at the API level. But removing PAB — one toggle — immediately makes the bucket public.
Stave detects this via CTL.S3.PUBLIC.005 — latent exposure is a finding even when masked by a compensating control.
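A rough Go sketch of the latent-exposure idea is shown below. It is an assumption-laden simplification and not how CTL.S3.PUBLIC.005 is written: `allowsAnyone` and `classify` are hypothetical helpers, the policy's `Statement` is assumed to be an array, and conditions and list-valued principals are ignored.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type statement struct {
	Effect    string          `json:"Effect"`
	Principal json.RawMessage `json:"Principal"`
}

type policy struct {
	Statement []statement `json:"Statement"`
}

// allowsAnyone reports whether any Allow statement names Principal "*" or
// {"AWS": "*"}.
func allowsAnyone(policyJSON string) (bool, error) {
	var p policy
	if err := json.Unmarshal([]byte(policyJSON), &p); err != nil {
		return false, err
	}
	for _, s := range p.Statement {
		if s.Effect != "Allow" {
			continue
		}
		var str string
		if json.Unmarshal(s.Principal, &str) == nil && str == "*" {
			return true, nil
		}
		var m map[string]any
		if json.Unmarshal(s.Principal, &m) == nil && m["AWS"] == "*" {
			return true, nil
		}
	}
	return false, nil
}

// classify separates live exposure from exposure masked by Public Access Block.
func classify(pabEnabled bool, policyJSON string) (string, error) {
	open, err := allowsAnyone(policyJSON)
	switch {
	case err != nil:
		return "", err
	case open && pabEnabled:
		return "latent", nil
	case open:
		return "exposed", nil
	default:
		return "safe", nil
	}
}

func main() {
	doc := `{"Statement":[{"Effect":"Allow","Principal":"*","Action":"s3:GetObject"}]}`
	fmt.Println(classify(true, doc))
}
```

Keeping "latent" as its own outcome, instead of folding it into "safe", is what lets a finding survive the presence of a compensating control.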
3. ACL escalation paths
A bucket ACL may grant WRITE_ACP to public or authenticated users. This allows anyone to call PutBucketAcl and grant themselves FULL_CONTROL, then read or modify every object. Trusted Advisor checks whether a bucket is publicly readable — it does not check whether the public can modify the ACL itself.
Stave detects this via CTL.S3.ACL.ESCALATION.001.
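The escalation check amounts to looking for ACL-write permissions granted to one of the two broad S3 groups. The Go sketch below is illustrative only: the `Grant` type is a hypothetical flattened form of an ACL grant, and it does not reproduce the actual control definition. The two group URIs are the predefined S3 groups for all users and for authenticated AWS users.

```go
package main

import "fmt"

// Predefined S3 group URIs for everyone and for any authenticated AWS user.
const (
	allUsers  = "http://acs.amazonaws.com/groups/global/AllUsers"
	authUsers = "http://acs.amazonaws.com/groups/global/AuthenticatedUsers"
)

// Grant is a hypothetical flattened ACL grant.
type Grant struct {
	GranteeURI string
	Permission string
}

// escalationGrants returns grants that let a broad group rewrite the ACL.
// FULL_CONTROL is included because it implies WRITE_ACP.
func escalationGrants(grants []Grant) []Grant {
	var risky []Grant
	for _, g := range grants {
		broad := g.GranteeURI == allUsers || g.GranteeURI == authUsers
		canWriteACL := g.Permission == "WRITE_ACP" || g.Permission == "FULL_CONTROL"
		if broad && canWriteACL {
			risky = append(risky, g)
		}
	}
	return risky
}

func main() {
	acl := []Grant{
		{GranteeURI: allUsers, Permission: "READ"},
		{GranteeURI: authUsers, Permission: "WRITE_ACP"},
	}
	fmt.Println(len(escalationGrants(acl)), "escalation path(s) found")
}
```

Note that the public `READ` grant in the example is a separate problem; this check isolates only the grants that allow the ACL itself to be changed.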
Detection comparison
| Blind spot | Trusted Advisor | Stave |
|---|---|---|
| Policy denies scanning role access | Reports green (or "Warn" post-patch) | CTL.S3.INCOMPLETE.001 — flags missing data as unsafe |
| Latent exposure behind PAB | Reports safe (PAB is on) | CTL.S3.PUBLIC.005 — flags underlying public policy |
| ACL escalation (WRITE_ACP) | Not checked | CTL.S3.ACL.ESCALATION.001 — flags privilege escalation path |
| Unsafe duration tracking | Not tracked | All controls track how long a bucket has been unsafe |
| Cross-account policy grants | Limited checks | CTL.S3.ACCESS.001 — flags unauthorized cross-account access |
| Authenticated-users group grants | Not distinguished from public | CTL.S3.AUTH.READ.001, CTL.S3.AUTH.WRITE.001 — separate controls |
References:
- Fog Security: Mistrusted Advisor — Evading Detection with Public S3 Buckets
- SecurityWeek: AWS Trusted Advisor Tricked Into Showing Unprotected S3 Buckets as Secure
- CheckRed: AWS Bypass — Misconfigurations Still Threaten Cloud Security
How does Stave protect against accidental destruction in production?
Stave uses a two-key safety model for any commands an operator marks as sensitive via the blocked_commands config. Both conditions must be true for the production guard to activate:
- Key 1 (Edition): the binary must be built with the dev edition label (stave-dev)
- Key 2 (Environment): the runtime must be detected as production (STAVE_ENV=production or a context with production: true)
This is defense-in-depth. A single misconfiguration cannot cause a disaster:
| Scenario | Binary | Environment | Guard activates? | Result |
|---|---|---|---|---|
| CI pipeline | stave | production | No | All commands run freely (standard deployment) |
| Developer laptop | stave or stave-dev | not production | No | All commands run freely (local sandbox) |
| Shared environment | stave-dev | production | Yes | Destructive commands blocked, read-only commands warn |
| Accidental env var | stave | production | No | Safe — production binary never activates guard |
Why two keys instead of one? If the guard only checked STAVE_ENV, then unsetting the variable (or a typo) would silently disable protection. Requiring the dev edition binary as the first key means the standard stave binary is always safe regardless of environment configuration. You have to intentionally deploy the dev binary to a production-marked environment for the guard to matter.
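The scenario table above reduces to a small decision function. The Go sketch below is a hypothetical model of that logic, not Stave's source: `guardActive` and `decide` are invented names, the edition strings follow the `production` and `dev` labels used in this FAQ, and any command not listed as blocked is assumed to be read-only.

```go
package main

import "fmt"

// guardActive is true only when both keys are turned:
// a dev edition binary running in a production environment.
func guardActive(edition, env string) bool {
	return edition == "dev" && env == "production"
}

// decide returns "run", "warn", or "block" for one command.
func decide(edition, env, cmd string, blocked []string) string {
	if !guardActive(edition, env) {
		return "run"
	}
	for _, b := range blocked {
		if b == cmd {
			return "block"
		}
	}
	return "warn" // remaining commands still run, with a warning
}

func main() {
	blocked := []string{"enforce"}
	fmt.Println(decide("production", "production", "enforce", blocked)) // CI pipeline
	fmt.Println(decide("dev", "", "enforce", blocked))                  // developer laptop
	fmt.Println(decide("dev", "production", "enforce", blocked))        // shared environment
	fmt.Println(decide("dev", "production", "doctor", blocked))         // break-glass debugging
}
```

Because the edition is the first operand of the conjunction, no value of the environment variable can activate the guard for the standard binary.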
Layer 1: Environment detection
Set STAVE_ENV=production in production CI/CD runners and deployment environments, or mark contexts as production:
contexts:
  prod-us-east:
    project_root: /ops/stave
    production: true
When detected, the dev binary:
- Hard-blocks any command listed in blocked_commands with a clear error
- Warns on read-only commands (allows break-glass debugging)
export STAVE_ENV=production
# With `blocked_commands: [enforce]` configured:
stave-dev enforce # BLOCKED: "command 'enforce' is blocked in production"
stave-dev doctor # WARNING printed, then runs (read-only)
Layer 2: IAM boundaries (the gold standard)
The most robust defense ensures developer credentials cannot modify production data at the cloud layer:
| Environment | Binary | Credentials | Can read | Can write/delete |
|---|---|---|---|---|
| CI/CD pipeline | stave | Service account | Yes | Yes (archive only) |
| Developer laptop | stave-dev | Developer IAM role | Yes (break-glass) | No |
| Local sandbox | stave or stave-dev | Sandbox credentials | Yes | Yes |
Why are there two binaries (stave and stave-dev)?
Both binaries contain identical commands. Every command — apply, diagnose, trace, controls, lint, inspect, doctor, snapshot diff — ships in both.
The only difference is the edition label:
| | stave | stave-dev |
|---|---|---|
| Edition | production | dev |
| --version output | 0.0.3 (production) | 0.0.3 (dev) |
| Production guard | Never activates | Activates when STAVE_ENV=production |
| Panic recovery message | Suggests doctor | Suggests bug-report |
Why not a single binary with a flag? The two-key model requires the safety decision to be made at build time (which binary to deploy), not at runtime (which flag to pass). A --dev flag could be accidentally included in a CI script. A deployment that installs stave instead of stave-dev is safe by construction — there is no flag to discover, no config to override.
When to use which:
- stave: standard deployment for CI pipelines, production evaluation, and automated workflows. The production guard never activates, so all commands run without warnings or blocks.
- stave-dev: for shared environments where you want the production guard active. Deploy alongside STAVE_ENV=production or production-marked contexts to block destructive commands while allowing break-glass debugging.
What is the output contract schema (out.v0.1)?
Every stave apply command produces JSON conforming to the out.v0.1 schema. This is a stable machine-readable contract that downstream tools — CI pipelines, dashboards, SIEM integrations, custom scripts — can rely on.
The schema defines two output kinds:
- evaluation (stave apply): findings from running controls against observations
- verification (stave check): before/after comparison showing resolved, remaining, and introduced findings
Evaluation output structure
| Field | Description |
|---|---|
| run | Reproducibility metadata: tool version, --now, --max-unsafe, snapshot count, input file hashes |
| summary | Aggregate counts: assets_evaluated, attack_surface (currently unsafe), violations (exceeded threshold) |
| findings[] | Each violation with control ID, asset ID, evidence (timestamps, duration, misconfigurations), and remediation guidance |
| exempted_assets[] | Assets skipped by exemption rules (with matched pattern and reason) |
| excepted_findings[] | Findings suppressed by exception rules; still evaluated, but partitioned out of the violation count |
| remediation_groups[] | Findings clustered by shared fix plan per asset |
| skipped[] | Controls that could not be evaluated (e.g., missing asset types) |
| extensions | Control source metadata, enabled packs, resolved control IDs |
Design decisions
- Exemptions vs exceptions: Exemptions skip entire assets before evaluation. Exceptions suppress specific control+asset findings after evaluation. Excepted findings appear in excepted_findings, not findings, so nothing is silently dropped.
- Input hashes: SHA-256 hashes of every input file are included in run.input_hashes for audit reproducibility. Given the same files, the same output is produced.
- Remediation groups: When multiple findings on the same asset share a fix plan, they are grouped together so the operator sees one remediation action, not redundant steps.
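The difference between exemptions and exceptions is easiest to see as an ordering question. The Go sketch below is a hypothetical model, not Stave's evaluator: `Finding`, `Result`, and `run` are invented for illustration, and the `isUnsafe` callback stands in for real control evaluation.

```go
package main

import "fmt"

// Finding identifies one control+asset pair that evaluated as unsafe.
type Finding struct{ ControlID, AssetID string }

// Result mirrors the three output partitions discussed above.
type Result struct {
	Findings       []Finding
	Excepted       []Finding
	ExemptedAssets []string
}

// run evaluates every control against every asset. Exempt assets are skipped
// before isUnsafe is ever called; excepted findings are still evaluated but
// recorded separately, so nothing is silently dropped.
func run(controls, assets []string, isUnsafe func(control, asset string) bool,
	exempt map[string]bool, except map[Finding]bool) Result {
	var r Result
	for _, a := range assets {
		if exempt[a] {
			r.ExemptedAssets = append(r.ExemptedAssets, a)
			continue
		}
		for _, c := range controls {
			if !isUnsafe(c, a) {
				continue
			}
			f := Finding{ControlID: c, AssetID: a}
			if except[f] {
				r.Excepted = append(r.Excepted, f)
			} else {
				r.Findings = append(r.Findings, f)
			}
		}
	}
	return r
}

func main() {
	alwaysUnsafe := func(control, asset string) bool { return true }
	r := run(
		[]string{"CTL.A", "CTL.B"},
		[]string{"bucket-1", "bucket-2"},
		alwaysUnsafe,
		map[string]bool{"bucket-2": true},
		map[Finding]bool{Finding{ControlID: "CTL.B", AssetID: "bucket-1"}: true},
	)
	fmt.Printf("%+v\n", r)
}
```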
Accessing the output
# JSON output for piping to jq or other tools
stave apply --observations observations --format json
# Extract just the finding control IDs
stave apply --observations observations --format json | jq '[.findings[].control_id]'
# Count violations
stave apply --observations observations --format json | jq '.summary.violations'
Full field-by-field reference: Output Schema (out.v0.1)
JSON Schema source: schemas/output/v1/output.schema.json
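Downstream tools written in Go can decode the same fields the jq examples above read. The sketch below is a minimal consumer under stated assumptions: the `Output` struct maps only `summary` counts and `findings[].control_id`, the integer types are a guess, and the sample document is hand-written for illustration and is not real Stave output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Output maps only the fields used below; all other fields are ignored.
type Output struct {
	Summary struct {
		AssetsEvaluated int `json:"assets_evaluated"`
		AttackSurface   int `json:"attack_surface"`
		Violations      int `json:"violations"`
	} `json:"summary"`
	Findings []struct {
		ControlID string `json:"control_id"`
	} `json:"findings"`
}

// controlIDs decodes an evaluation document and returns the violation count
// and the control ID of each finding.
func controlIDs(doc []byte) (int, []string, error) {
	var out Output
	if err := json.Unmarshal(doc, &out); err != nil {
		return 0, nil, err
	}
	ids := make([]string, 0, len(out.Findings))
	for _, f := range out.Findings {
		ids = append(ids, f.ControlID)
	}
	return out.Summary.Violations, ids, nil
}

func main() {
	// Hand-written sample, not real Stave output.
	sample := []byte(`{"summary":{"violations":1},"findings":[{"control_id":"CTL.S3.PUBLIC.001"}]}`)
	n, ids, err := controlIDs(sample)
	fmt.Println(n, ids, err)
}
```

In a pipeline, the bytes would come from the JSON written to stdout instead of an inline sample.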
What does Stave not do?
- No live scanning — it does not query cloud APIs during evaluation.
- No auto-remediation — it produces findings and fix guidance, not infrastructure changes.
- No plugin execution — it does not run arbitrary code, scripts, or third-party plugins.
- No runtime agents — nothing is deployed into your infrastructure.
Stave is a pure function: files in, findings out.
Are all controls YAML or are some implemented in Go?
All 246 controls are declarative YAML. Zero controls are implemented as Go functions.
Every control is a YAML file with an unsafe_predicate evaluated by the CEL engine. Adding a new control means writing a YAML file — no Go code, no compilation, no engine changes.
Compliance profiles (HIPAA, CIS, SOC 2, PCI-DSS, NIST, FedRAMP, GDPR, FFIEC, ISO 27001, NIST CSF) are implemented as compliance tags on existing YAML controls, filtered by --profile. The engine doesn't know which framework is being evaluated — it evaluates the same predicates and attaches the compliance requirement IDs to the findings.
Compound risk detection, duration tracking, severity grouping, and compliance citations are features of the evaluation engine, not separate control implementations. They work across all YAML controls automatically.
Google CEL Library
The predicate engine relies mostly on the google/cel-go library. Is it possible to open a PR that moves that functionality into the library? Is it appropriate to ask the library maintainers to implement it (is it in their scope and generic enough)?
No, it's not appropriate. Stave's CEL usage is domain-specific — it evaluates unsafe_predicate YAML structures (with all/any/field/op/value nodes) against asset property maps. This is a DSL built on top of CEL, not CEL itself.
What Stave does:
1. Parses ctrl.v1 YAML predicates into an internal UnsafePredicate tree
2. Translates field ops (eq, ne, in, missing, present, not_subset_of_field) into CEL expressions
3. Compiles the CEL expression against asset properties
4. Evaluates with the CEL runtime
The google/cel-go library provides steps 3 and 4 — the expression compiler and runtime. Steps 1 and 2 are stave-specific domain logic that wouldn't belong in the CEL library because:
- The unsafe_predicate YAML schema is stave's invention, not a CEL concept
- The field operator set (eq, ne, missing, present, not_subset_of_field) is stave's domain language
- The translation from YAML tree → CEL expression string is stave's compiler, not a general-purpose tool
- The AssetEvalContext that maps properties.storage.encryption.enabled paths to CEL variables is stave's binding layer
The CEL library maintainers' scope is the CEL specification — a general-purpose expression language. They wouldn't accept a YAML-to-CEL translator for infrastructure control predicates. That's application logic.
What could be contributed upstream (if it doesn't already exist) would be generic CEL utilities like custom type adapters or extension functions. But stave's current usage of cel-go is standard — it creates programs, registers variables, and evaluates. Nothing is missing from the library.
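To show why the translation step is application logic, here is a hedged Go sketch of turning a predicate tree into a CEL expression string. It is not Stave's compiler: the `Node` type and `toCEL` function are hypothetical, only `eq`, `ne`, `missing`, and `present` are handled, and mapping `missing` and `present` to CEL's `has()` macro is an assumption about one reasonable encoding. The sketch stops at producing the string; compiling and evaluating it would be the part cel-go provides.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Node is a hypothetical predicate tree: either a combinator (All/Any)
// or a single field check.
type Node struct {
	All, Any  []Node
	Field, Op string
	Value     any
}

// literal renders a scalar as a CEL literal.
func literal(v any) (string, error) {
	switch x := v.(type) {
	case string:
		return strconv.Quote(x), nil
	case bool:
		return strconv.FormatBool(x), nil
	case int:
		return strconv.Itoa(x), nil
	default:
		return "", fmt.Errorf("unsupported literal type %T", v)
	}
}

// join renders each child in parentheses and joins them with sep.
func join(children []Node, sep string) (string, error) {
	parts := make([]string, 0, len(children))
	for _, c := range children {
		s, err := toCEL(c)
		if err != nil {
			return "", err
		}
		parts = append(parts, "("+s+")")
	}
	return strings.Join(parts, sep), nil
}

// toCEL renders a predicate tree as a CEL expression string.
func toCEL(n Node) (string, error) {
	switch {
	case len(n.All) > 0:
		return join(n.All, " && ")
	case len(n.Any) > 0:
		return join(n.Any, " || ")
	}
	switch n.Op {
	case "missing":
		return "!has(" + n.Field + ")", nil
	case "present":
		return "has(" + n.Field + ")", nil
	case "eq", "ne":
		lit, err := literal(n.Value)
		if err != nil {
			return "", err
		}
		op := map[string]string{"eq": "==", "ne": "!="}[n.Op]
		return n.Field + " " + op + " " + lit, nil
	}
	return "", fmt.Errorf("unsupported op %q", n.Op)
}

func main() {
	tree := Node{All: []Node{
		{Field: "properties.storage.encryption.enabled", Op: "eq", Value: false},
		{Field: "properties.storage.kms_key", Op: "missing"},
	}}
	fmt.Println(toCEL(tree))
}
```

Everything in this sketch depends on the shape of the YAML predicate and the operator vocabulary, which is why it would not fit in a general-purpose expression library.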
How does the CI/CD workflow for Stave work? Where are the snapshots stored?
Stave's CI/CD integration works as a gatekeeper on locally stored snapshots — no cloud calls during evaluation.
How it works:
- An extractor (external to stave, any language) calls AWS APIs and produces obs.v0.1 JSON files
- Those JSON files are committed to the repo or stored as CI artifacts alongside the infrastructure code
- stave apply evaluates them in the pipeline — same as running locally
Where snapshots are stored — it depends on the workflow:
Option A: Committed to the repo (simplest)
my-infra-repo/
  observations/
    2026-03-28T000000Z.json
    2026-03-29T000000Z.json
  controls/          # or use built-in packs
  stave.yaml
The pipeline runs stave apply --observations observations/ on every PR. Snapshots are versioned with the code.
Option B: CI artifact from a prior step
A scheduled job runs the extractor, produces snapshots, and uploads them as artifacts. A downstream job downloads them and runs stave apply.
Option C: Mounted volume in Docker
The extractor writes to a directory, the stave container mounts it:
docker run --rm -v $(pwd)/snapshots:/work/observations stave-tutorials
stave apply --observations observations --max-unsafe 7d --format json
Stave itself never stores snapshots. It reads from a directory, evaluates, and writes findings to stdout. Where the snapshots live is the user's choice — repo, artifact store, S3, local disk.
The CI workflow in stave-guide/how-to/ci-cd-integration.md documents all the patterns: GitHub Actions (build from source or Docker), GitLab CI, baseline tracking, SARIF upload, and gating.
What is the purpose of the --now flag in the apply command?
--now overrides the current time used to calculate unsafe durations. Stave computes how long an asset has been in an unsafe state by measuring from when the violation was first observed to "now." Without --now, that's the real wall clock — which means the output changes every second, making it impossible to reproduce results or write golden tests.
With --now 2026-01-15T00:00:00Z, the evaluation is frozen in time: the same inputs always produce the same findings, same durations, same safety status. This is essential for:
- Golden tests — commit expected output, diff byte-for-byte
- CI reproducibility — same commit produces same result regardless of when CI runs
- Demo scenarios — the Docker demo pins --now so findings are stable
- Verification — stave apply verify uses --now to confirm deterministic output
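The effect of pinning the clock can be demonstrated with a short Go sketch. It is an illustration only: `unsafeDuration` is a hypothetical helper, and treating an empty string as "flag not set" is an assumption made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// unsafeDuration measures from firstUnsafe to "now". An empty nowFlag falls
// back to the wall clock, which makes the result differ on every run.
func unsafeDuration(firstUnsafe, nowFlag string) (string, error) {
	start, err := time.Parse(time.RFC3339, firstUnsafe)
	if err != nil {
		return "", err
	}
	now := time.Now().UTC()
	if nowFlag != "" {
		if now, err = time.Parse(time.RFC3339, nowFlag); err != nil {
			return "", err
		}
	}
	return now.Sub(start).String(), nil
}

func main() {
	pinned, _ := unsafeDuration("2026-01-01T00:00:00Z", "2026-01-15T00:00:00Z")
	live, _ := unsafeDuration("2026-01-01T00:00:00Z", "")
	fmt.Println("pinned:", pinned) // identical on every run
	fmt.Println("live:  ", live)   // depends on when the program runs
}
```

Only the pinned value can be committed as expected output and compared byte-for-byte, which is the property golden tests and reproducible CI runs rely on.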
Which schemas are validated using JSON Schema?
Output of stave schemas:
Data Contracts:
  control                        ctrl.v1
  observation                    obs.v0.1
  output                         out.v0.1
Diagnostic Contracts:
  diagnose                       diagnose.v1
  diff                           diff.v0.1
Command Output Contracts:
  baseline                       baseline.v0.1
  ci_diff                        ci_diff.v0.1
  enforce                        enforce.v0.1
  fix_loop                       fix_loop.v0.1
  gate                           gate.v0.1
  snapshot_archive               snapshot_archive.v0.1
  snapshot_plan                  snapshot_plan.v0.1
  snapshot_prune                 snapshot_prune.v0.1
  snapshot_quality               snapshot_quality.v0.1
  validate                       validate.v0.1
Artifact Contracts:
  bug_report                     bug-report.v0.1
  control_crosswalk_resolution   control-crosswalk-resolution.v1
  security_audit                 security-audit.v1
  security_audit_artifacts       security-audit-artifacts.v1
  security_audit_run_manifest    security-audit-run-manifest.v1
Validated with JSON Schema (4):
| Schema | Validation function |
|---|---|
| ctrl.v1 | ValidateControlYAML() (control loader) |
| obs.v0.1 | ValidateObservationJSON() (observation loader) |
| out.v0.1 | ValidateEvaluation() / ValidateVerification() (safety envelope) |
| diagnose.v1 | ValidateDiagnose() (safety envelope) |
Plus finding.v1 (internal, not in stave schemas output) — validated optionally in the finding writer.
No JSON Schema (16):
All command output contracts (baseline.v0.1, ci_diff.v0.1, enforce.v0.1, gate.v0.1, validate.v0.1, etc.), all artifact contracts (security-audit.v1, bug-report.v0.1, etc.), and diff.v0.1 exist only as version constants in kernel/schema.go. They're stamped into output JSON as schema_version fields but have no .schema.json file and no runtime validation.
The validated schemas are the ones on the input boundary (control YAML, observation JSON) and the output contract (evaluation output, diagnose output) — the core data contracts that external tools and CI pipelines consume.
How are the testscripts structured?
Stave uses testscript (from github.com/rogpeppe/go-internal) for end-to-end CLI tests. The test harness lives in cmd/stave/main_test.go:
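package main

import (
	"testing"

	"github.com/rogpeppe/go-internal/testscript"
)

// TestMain registers "stave" so scripts can call `exec stave ...`.
func TestMain(m *testing.M) {
	testscript.Main(m, map[string]func(){
		"stave": staveMain, // CLI entry point, defined elsewhere in this package
	})
}

// TestScripts runs every .txtar script under testdata/scripts.
func TestScripts(t *testing.T) {
	testscript.Run(t, testscript.Params{
		Dir:                 "testdata/scripts",
		RequireExplicitExec: true,
	})
}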
TestMain registers the stave binary as an in-process command via testscript.Main. This means each .txtar script can call exec stave ... and it runs the real CLI code in-process — no separate binary build needed, and coverage is collected.
TestScripts runs every .txtar file in cmd/stave/testdata/scripts/. Each script is a self-contained test scenario written in the txtar format: a sequence of shell-like commands followed by embedded files.
There are 21 scripts covering the full CLI surface:
| Script | What it tests |
|---|---|
| smoke.txtar | Binary starts, --version works, --help produces output |
| apply_pipeline.txtar | Full apply workflow: load controls + observations, produce findings |
| ci_workflow.txtar | ci baseline and ci diff commands |
| config_lifecycle.txtar | config get/set/show commands |
| controls_packs.txtar | controls list and pack resolution |
| determinism.txtar | Same inputs + --now produce identical output |
| diagnose_trace_explain.txtar | diagnose, trace, and explain commands |
| doctor_bug_report.txtar | doctor and bug-report commands |
| exit_codes.txtar | Exit code 0 (success), 3 (violations), 2 (input error) |
| help_discovery.txtar | Subcommand help text and flag documentation |
| json_validity.txtar | All JSON output is valid JSON |
| lint_fmt_graph.txtar | lint, fmt, and graph commands |
| profile_builtin.txtar | apply --profile with built-in controls |
| quiet_verbose.txtar | --quiet suppresses output, -v adds diagnostics |
| report_prompt.txtar | report and prompt commands |
| sanitize.txtar | --sanitize redacts infrastructure identifiers |
| sarif_output.txtar | --format sarif produces valid SARIF v2.1.0 |
| snapshot_commands.txtar | snapshot diff subcommand surface |
| snapshot_operations.txtar | Snapshot lifecycle operations with retention tiers |
| streams.txtar | stdout/stderr separation |
| validate_lint_fmt.txtar | validate command with lint and format checks |
To run them:
go test ./cmd/stave/ -run TestScripts -v
To run a single script:
go test ./cmd/stave/ -run TestScripts/smoke -v
These tests run as part of make test (which executes go test ./...). They are the primary integration test suite — each script exercises the real CLI binary against real control YAML and observation JSON files embedded in the .txtar archive.
How does security-audit differ from the Logic Trace (--trace)?
They serve different purposes at different layers.
stave security-audit evaluates the Stave binary itself — supply chain
integrity, build hardening, vulnerability assessment, SBOM generation. It
answers: "Is this tool trustworthy?" It produces evidence for auditors about
Stave's own security posture, not about the infrastructure Stave evaluates.
stave apply --trace records the evaluation engine's reasoning chain —
step-by-step decisions for every control × asset pair. It answers: "Why did
the engine reach this verdict?" It produces a trace.v0.1 JSON with exemption
checks, predicate evaluations, threshold checks, and verdict decisions.
stave prompt from-finding --trace-file takes that trace and wraps it in
an LLM-ready prompt for offline explainability. It answers: "How do I fix
this?"
| | security-audit | apply --trace | prompt --trace-file |
|---|---|---|---|
| Subject | The Stave binary | Infrastructure findings | Finding explanation |
| Question | "Is this tool secure?" | "Why did this fire?" | "How do I fix this?" |
| Output | SBOM, vuln report, build info | trace.v0.1 JSON | Markdown LLM prompt |
| Audience | Auditors, compliance | Security engineers | Operators, AI assistants |
| Layer | Meta (tool about itself) | Engine internals | User-facing guidance |
They are complementary:
- security-audit builds trust in the tool.
- --trace builds trust in the verdict.
- --trace-file bridges the verdict to remediation.
Example workflow:
# 1. Verify the tool itself is trustworthy
stave security-audit --sbom cyclonedx --format json
# 2. Evaluate infrastructure and record reasoning
stave apply --controls controls/s3 --observations obs/ \
--max-unsafe 168h --trace audit_trace.json --format json > eval.json
# 3. Generate explainable remediation prompt from trace
stave prompt from-finding \
--evaluation-file eval.json \
--asset-id my-bucket \
--controls controls/s3 \
--trace-file audit_trace.json
For LLM-driven remediation you can also consume eval.json directly:
controls carry triage terms (defect, infection, failure) so the
findings are already self-explanatory.