Skip to main content

Fact Export Reference

Stave evaluates CEL controls and exports facts. Two roles, one binary. This page documents the export side: the stave export-sir command, the three output formats, the predicate vocabulary, and the closed-world axioms emitted in SMT-LIB.

Command

stave export-sir [flags]

Computes the Stave Intermediate Representation (SIR) — the vendor-neutral fact set built by Stave's evaluation pipeline — and writes it to stdout in the requested format.

Flags

FlagDefaultPurpose
--controls, -icontrolsControl definitions directory
--observations, -oobservationsObservation snapshot directory
--format, -fjsonOne of json, jsonl, smt2
--now(current time)RFC3339 override for deterministic output
--strip-catalogfalseOmit control catalog metadata (has_severity, has_type, has_domain) — reduces SMT2/JSONL to observation-derived facts only
--closed-worldfalseEmit closed-world forall axioms in SMT2 (needed for negative proofs; can cause solver timeout on large fact sets)

Outputs

StreamContent
stdoutSIR document in the requested format
stderrErrors and progress (suppressed when not a TTY)

Exit codes

CodeMeaning
0Success
2Input error (bad flag, malformed --now, missing files)
4Internal error (load failure, builder error)
130SIGINT

Format: json (default)

Full nested SIR document. One top-level object with controls, assets, identities, temporal, and lifecycle sections. Lossless, suitable for tooling that wants the structured shape.

{
"controls": [{ "id": "CTL.S3.PUBLIC.001", "type": "unsafe_state", "classification": "state_assertion" }],
"assets": [{ "id": "res:aws:s3:bucket:example", "type": "storage_bucket", "vendor": "aws" }],
"identities": [],
"temporal": { "windows": [...] },
"lifecycle": { "now": "2026-01-02T00:00:00Z" }
}

Format: jsonl

One (subject, predicate, object) triple per line. Lossy projection optimised for Datalog/Soufflé and ASP/Clingo consumers that prefer flat predicate(s, o) facts.

Each line:

{"subject":"<id>","predicate":"<name>","object":"<value>","source":"<origin>","evidence":"<json-path>"}

Example excerpt from a public-bucket fixture:

{"subject":"CTL.S3.PUBLIC.001","predicate":"has_type","object":"unsafe_state","source":"control","evidence":"controls[0].type"}
{"subject":"res:aws:s3:bucket:example-public-bucket","predicate":"has_type","object":"storage_bucket","source":"asset","evidence":"assets[0].type"}
{"subject":"res:aws:s3:bucket:example-public-bucket","predicate":"has_vendor","object":"aws","source":"asset","evidence":"assets[0].vendor"}
{"subject":"res:aws:s3:bucket:example-public-bucket","predicate":"has_exposure_window","object":"true","source":"exposure","evidence":"temporal.windows[0]"}
{"subject":"res:aws:s3:bucket:example-public-bucket","predicate":"contributed_by","object":"CTL.S3.PUBLIC.001","source":"exposure","evidence":"temporal.windows[0].contributing_controls"}

source and evidence carry provenance — the kernel section the fact came from and a JSON-pointer-style locator into the original SIR — useful when an engine's output flags a triple and a human needs to track it back to its origin.

Loading into Soufflé

.decl has_type(s: symbol, o: symbol)
.input has_type(IO=file, filename="facts.jsonl")

The example under stave/examples/souffle-reachability/ ships a transform.sh that converts the JSONL into Soufflé's positional .facts format if you prefer that over the JSON loader.

Format: smt2

SMT-LIB v2 declarations + assertions. The output contains facts only — no (check-sat), no queries — so any SMT solver (Z3, cvc5, Yices) reads the same file. Reasoning programs append their own query to the file before invoking the solver.

; Stave SIR facts export — SMT-LIB v2
; Predicates declared as Bool functions over String args.
; Closed-world axioms emitted per predicate: the predicate is true
; ONLY for the (subject, object) pairs explicitly asserted; for
; every other input it is false. This is the standard fact-base
; encoding for SAT/SMT consumers.
; This file contains FACTS ONLY. Append your query (check-sat / get-model / etc.) before invoking the solver.
(set-logic ALL)

; --- Predicate declarations ---
(declare-fun allows_unauthenticated (String String) Bool)
(declare-fun can_assume (String String) Bool)
...

; --- Positive assertions ---
(assert (has_type "res:aws:s3:bucket:example" "storage_bucket"))
...

; --- Closed-world axioms ---
(assert (forall ((s String) (o String))
(=> (has_type s o)
(or (and (= s "res:aws:s3:bucket:example") (= o "storage_bucket"))
...))))

Closed-world semantics

For every declared predicate, the exporter emits a universally quantified axiom that constrains the predicate to be true only for the explicitly asserted (subject, object) pairs. This is foundational: without it, an SMT solver can pick arbitrary interpretations of can_assume, allows_unauthenticated, etc., and your reachability queries return spurious satisfying assignments.

This is the standard "completion" / "domain-closure" pattern for encoding a finite fact base into first-order logic. Engines like Soufflé and Clingo enforce closed-world by default; SMT solvers do not, hence the explicit axioms.

Predicate vocabulary

The predicate set is determined by the kernel projection — what Stave's evaluation pipeline produces — not by individual controls. Adding a control does not extend the predicate set; adding a fact type to the kernel does.

PredicateArityDomainMeaning
allows_unauthenticated(resource, "true"|"false")identityIdentity pool admits unauthenticated principals
can_assume(from, to)identityTrust edge admits the from principal into the to role
contributed_by(asset, control_id)exposureThis control fired on this asset within an exposure window
cross_account_assumes(from, to)identityfrom is in a different account than to
first_seen_at(asset, RFC3339)lifecycleEarliest snapshot the asset appears in
has_action(statement, action)policyIAM policy statement grants this action
has_exposure_window(asset, "true")exposureAsset has at least one open unsafe-duration window
has_forbidden_state(asset, value)controlForbidden-state control's metadata
has_intent_rationale(asset, text)controlAuthor-supplied justification when a control would otherwise fire
has_permission_action(identity, action)identityEffective permission edge action
has_permission_resource(identity, arn)identityEffective permission edge resource
has_privilege_level(identity, low|medium|high)identityCoarse privilege classification
has_resource(statement, arn)policyIAM policy statement targets this resource
has_severity(control, low|medium|high|critical)controlControl's severity metadata
has_tag(asset, tag_key:tag_value)assetProvider tag on the asset
has_type(id, type)universalAsset / control / identity type
has_vendor(asset, aws|gcp|...)assetCloud provider of the asset
is_decommissioned(asset, "true")lifecycleAsset has been removed since first seen
is_provisioned(asset, "true")lifecycleAsset is currently provisioned
last_seen_at(asset, RFC3339)lifecycleMost recent snapshot the asset appears in
maps_auth_to(pool, role)identityCognito authenticated-role mapping
maps_unauth_to(pool, role)identityCognito unauthenticated-role mapping
self_registration_unrestricted(pool, "true")identityUser pool admits self-registration without scope
trusts_service(role, service)identityTrust policy admits a service principal

The current count is 24 predicates. Run stave export-sir --format smt2 ... | grep '^(declare-fun' on any fixture to enumerate what got declared (only predicates that have at least one positive assertion appear in the output).

Scope of the exported fact set

The SIR is a curated projection, not a complete dump of every property a control reads. The CEL evaluation pipeline inside stave apply consumes the full property tree of every observation; the fact export ships a subset selected for the domains where external reasoning (Z3, cvc5, Soufflé, Clingo, Prolog) was prioritised.

Domain coverage measurement (against the embedded control catalog at the time of writing — re-derive with stave forge paths for a current snapshot):

Top-level domainsProperty paths
Catalog (all controls combined)1022,954
Reached by at least one SIR projector131,301 (44%)
Zero coverage891,653 (56%)

Domains with SIR projector coverage

ai, api, auth, compute, delegation, encryption, identity, k8s, network, s3_ref, s3_upload, storage, trail

Within these domains the projectors emit both the 59-entry scalar allowlist (cmd/exportsir/facts.go::propertyAllowlist) and relational predicates produced by 19 specialised projectors (IAM policy walk, trust-policy walk, Cognito mapping, role-chain edges, trail logging, delegation principals, etc.). The boundary inside each domain is intentional: e.g. properties.identity.policies.attached_policies is fully exported, but properties.identity.access_advisor.last_used_at is only exported when the unusedServiceFacts projector touches it.

Domains with zero SIR coverage

Top-of-list (highest control-path count first):

database (172), azure (154), search_service (151), governance (117), messaging (87), gcp (85), cdn (85), m365 (78), monitoring (71), cryptography (51), container (51), compliance (51), secret (47), audit (37), cloudflare (31), github (26), plus 73 smaller domains.

Three entire cloud vendors (azure, gcp, m365) and one cloud service domain (database, covering RDS / DynamoDB / Cosmos / Cloud SQL) have no SIR projector. Their controls evaluate normally inside stave apply; their property values do NOT enter any external proof. A Z3 (check-sat) against stave export-sir --format smt2 output reports satisfiability of the chains it can model — it cannot report satisfiability of a chain whose member predicates live in an uncovered domain.

Implication for proof claims

A solver verdict of UNSAT on the exported facts is a proof that the queried chain cannot fire given the predicates the SIR exposes. It is not a proof that the underlying infrastructure is safe along dimensions the SIR does not project.

Practitioners reading a UNSAT result must know which domains the proof covers. The fact export's coverage is the proof's coverage, no more.

Determinism

The exported file is byte-stable for a fixed (controls, observations, --now) triple. The exporter sorts:

  • predicate declarations alphabetically
  • positive assertions by (predicate, subject, object)
  • closed-world axiom branches in the same order

This means commits that change the export — for any reason — show up as targeted diffs in code review, not as gigabyte-of-noise shuffles. Run the export twice on the same input and compare with diff to confirm.

What is NOT exported

  • Control predicate bodies (the CEL expressions). Those stay inside the stave binary and run during apply.
  • Observation raw fields that didn't make it into the kernel projection. The SIR is the projected surface — provider-specific noise (timestamps in non-RFC3339 formats, regional ARNs the kernel didn't know about) is dropped.
  • Findings. Findings are the output of stave apply, not export-sir. The SIR is the input an external engine needs to compute its own verdict; findings are Stave's own verdict.

See also