This is where a contract-first architecture pays off. Since the obs.v0.1 schema is the only thing the engine cares about, you can use a "meta-prompt" that directs any LLM to generate the code for an extractor.
Here is a template for developers. You can paste this into Claude, ChatGPT, or GitHub Copilot to generate a working extractor in minutes.
Stave Extractor Jumpstart Template
Instructions for the Developer: Fill in the placeholders [ ] below and paste this entire prompt into an LLM.
The Prompt Template
Task: Build a Stave Observation Extractor
I need to build a data extractor for the Stave policy engine. The extractor's job is to fetch configuration data from a source and output a JSON file that conforms to the obs.v0.1 schema.
1. Environment & Tools
- Target Language: [e.g., Python 3.11, Rust, Go, Node.js]
- Data Source: [e.g., AWS SDK/Boto3, Terraform Plan JSON, GitHub API, Local CSV]
- Output Format: Standardized JSON (STDOUT or file)
2. The Contract (obs.v0.1)
The output MUST be a single JSON object with this exact structure:
{
  "schema": "obs.v0.1",
  "source_type": "[e.g., aws_s3, gcp_storage, custom_script]",
  "captured_at": "YYYY-MM-DDTHH:MM:SSZ",
  "assets": [
    {
      "id": "unique-identifier-of-the-resource",
      "type": "resource-category-type",
      "properties": {
        "key1": "value1",
        "key2": 123,
        "key3": true
      }
    }
  ]
}
3. Implementation Requirements
- Mapping: Iterate through the [Data Source] and map each resource to an entry in the `assets` array.
- Properties: The `properties` map should contain the raw configuration values needed for security invariants.
- No Extra Fields: The schema has `additionalProperties: false`. Do not add top-level fields outside of the ones listed above.
- Error Handling: If the [Data Source] is unreachable, exit with a non-zero code and a clear error message to STDERR.
4. Deliverable
Please provide:
- The full [Language] source code for the extractor.
- Instructions on how to run it.
- A sample command to pipe the output into Stave:
  [extractor_cmd] | stave evaluate --input -

Example: How a developer would use this for Python/S3
If a developer wanted to build a quick S3 extractor in Python, they would fill it out like this:
Target Language: Python 3.10 (Boto3)
Data Source: AWS S3 ListBuckets API
Source Type: aws_s3_bucket
The resulting LLM-generated code would look something like this:
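import json
import sys
from datetime import datetime, timezone

import boto3
from botocore.exceptions import BotoCoreError, ClientError


def extract():
    s3 = boto3.client('s3')
    observation = {
        "schema": "obs.v0.1",
        "source_type": "aws_s3_bucket",
        # Match the contract's YYYY-MM-DDTHH:MM:SSZ format (no microseconds)
        "captured_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "assets": []
    }
    try:
        response = s3.list_buckets()
        for bucket in response['Buckets']:
            # Fetch extra properties like versioning
            v_res = s3.get_bucket_versioning(Bucket=bucket['Name'])
            observation["assets"].append({
                "id": bucket['Name'],
                "type": "s3_bucket",
                "properties": {
                    "name": bucket['Name'],
                    "creation_date": bucket['CreationDate'].isoformat(),
                    # The API omits Status for buckets that never had versioning
                    "versioning": v_res.get('Status', 'Disabled')
                }
            })
    except (BotoCoreError, ClientError) as exc:
        # Prompt requirement: non-zero exit code and a clear message on STDERR
        print(f"error: could not read S3 configuration: {exc}", file=sys.stderr)
        sys.exit(1)
    print(json.dumps(observation, indent=2))


if __name__ == "__main__":
    extract()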
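Before piping a generated extractor's output into Stave, it can help to check it against the contract locally. The following is a minimal sketch, not the engine's own validation: it assumes the four top-level fields and three asset fields shown in the prompt template are the complete set, and that `captured_at` uses exactly the format written there. The function name `check_observation` is hypothetical.

```python
import re

# Assumption: these key sets mirror the prompt template above. The real
# obs.v0.1 schema file is authoritative and may differ.
TOP_LEVEL_KEYS = {"schema", "source_type", "captured_at", "assets"}
ASSET_KEYS = {"id", "type", "properties"}
TIMESTAMP_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")


def check_observation(obs):
    """Return a list of problems found; an empty list means none were found."""
    if not isinstance(obs, dict):
        return ["top level is not a JSON object"]
    problems = []
    for key in sorted(TOP_LEVEL_KEYS - set(obs)):
        problems.append(f"missing top-level field: {key}")
    # Mirrors additionalProperties: false at the top level
    for key in sorted(set(obs) - TOP_LEVEL_KEYS):
        problems.append(f"unexpected top-level field: {key}")
    if obs.get("schema") != "obs.v0.1":
        problems.append("schema must be 'obs.v0.1'")
    captured_at = obs.get("captured_at")
    if not (isinstance(captured_at, str) and TIMESTAMP_RE.match(captured_at)):
        problems.append("captured_at must look like YYYY-MM-DDTHH:MM:SSZ")
    assets = obs.get("assets")
    if not isinstance(assets, list):
        problems.append("assets must be an array")
        return problems
    for i, asset in enumerate(assets):
        if not isinstance(asset, dict):
            problems.append(f"assets[{i}] is not an object")
            continue
        for key in sorted(ASSET_KEYS - set(asset)):
            problems.append(f"assets[{i}] is missing field: {key}")
        if not isinstance(asset.get("properties", {}), dict):
            problems.append(f"assets[{i}].properties must be an object")
    return problems
```

Calling `check_observation(json.load(sys.stdin))` in a small wrapper script and exiting non-zero when the list is non-empty gives a quick pre-flight step between the extractor and `stave evaluate`.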
## Using Steampipe as an Extractor
Steampipe is the fastest way to build a Stave extractor without writing code. Steampipe exposes cloud APIs as SQL tables: you write a query, pipe the result through jq, and produce obs.v0.1 JSON directly.
Steampipe has 150+ plugins covering AWS, GCP, Azure, Kubernetes, GitHub, and dozens of other services. Every aws_s3_bucket, aws_iam_role, aws_vpc_security_group, and aws_opensearch_domain row maps directly to a Stave observation asset. This makes it the recommended extraction path for teams that want comprehensive coverage without building custom extractors.
# Example: extract S3 bucket observations via Steampipe
# Note: this filter assumes Steampipe prints a bare JSON array of rows. If your
# version wraps the result as {"columns": [...], "rows": [...]}, use .rows[] in place of .[]
steampipe query --output json \
  "SELECT name, region, versioning_enabled, server_side_encryption_configuration,
          logging, acl, policy, public_access_block_configuration
   FROM aws_s3_bucket" \
  | jq '{
      schema_version: "obs.v0.1",
      captured_at: (now | todate),
      assets: [.[] | {
        id: .name,
        type: "aws_s3_bucket",
        vendor: "aws",
        properties: {
          storage: {
            kind: "bucket",
            name: .name,
            versioning: { enabled: (.versioning_enabled // false) },
            encryption: { at_rest_enabled: (.server_side_encryption_configuration != null) },
            logging: { enabled: (.logging != null) },
            controls: { public_access_fully_blocked: (
              .public_access_block_configuration.BlockPublicAcls and
              .public_access_block_configuration.BlockPublicPolicy and
              .public_access_block_configuration.IgnorePublicAcls and
              .public_access_block_configuration.RestrictPublicBuckets
            )}
          }
        }
      }]
    }' > observations/$(date -u +%Y-%m-%dT%H%M%SZ).json
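Steampipe + jq + Stave is a complete pipeline with no custom code required: query cloud state, transform it to obs.v0.1, and evaluate offline. Only the query step needs network access, so the evaluation itself can run in an air-gapped environment.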
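If the jq filter grows hard to maintain, the same mapping can be expressed in Python. This is a sketch under stated assumptions rather than a reference implementation: the function names are hypothetical, the column names are taken from the query above, and the handling of a `{"rows": [...]}` wrapper is an assumption about newer Steampipe output that you should confirm against your installed version.

```python
from datetime import datetime, timezone

PAB_FLAGS = (
    "BlockPublicAcls",
    "BlockPublicPolicy",
    "IgnorePublicAcls",
    "RestrictPublicBuckets",
)


def rows_from_steampipe_json(payload):
    """Accept a bare list of rows or (assumed) a {"rows": [...]} wrapper."""
    if isinstance(payload, dict):
        return payload.get("rows", [])
    return payload


def steampipe_row_to_asset(row):
    """Map one aws_s3_bucket row (a dict) to an asset, mirroring the jq filter."""
    pab = row.get("public_access_block_configuration") or {}
    return {
        "id": row["name"],
        "type": "aws_s3_bucket",
        "vendor": "aws",
        "properties": {
            "storage": {
                "kind": "bucket",
                "name": row["name"],
                "versioning": {"enabled": bool(row.get("versioning_enabled"))},
                "encryption": {
                    "at_rest_enabled": row.get("server_side_encryption_configuration") is not None
                },
                "logging": {"enabled": row.get("logging") is not None},
                # A missing or null flag counts as "not blocked", as in jq
                "controls": {
                    "public_access_fully_blocked": all(bool(pab.get(f)) for f in PAB_FLAGS)
                },
            }
        },
    }


def build_observation(payload, captured_at=None):
    """Wrap the mapped rows in the same envelope the jq filter emits."""
    if captured_at is None:
        captured_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    rows = rows_from_steampipe_json(payload)
    return {
        "schema_version": "obs.v0.1",
        "captured_at": captured_at,
        "assets": [steampipe_row_to_asset(r) for r in rows],
    }
```

Feeding it `json.load(sys.stdin)` and printing `json.dumps(build_observation(...))` would slot into the pipeline in place of jq.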
## Using CloudQuery as an Extractor
CloudQuery syncs cloud configuration to a local database (Postgres, SQLite, or flat files) on a schedule. If your team already runs CloudQuery for asset inventory, you can query the synced data and transform it to obs.v0.1 without additional API calls.
CloudQuery is the better choice when you need repeatable, scheduled extraction in CI/CD pipelines or when you want to store historical snapshots in a data warehouse for trend analysis.
# Sync AWS resources to a local SQLite database
cloudquery sync aws-config.yml

# Extract S3 observations from the synced database
# SQLite has no boolean type, so sqlite3 -json prints flags as 0/1. jq treats 0
# as truthy, so compare explicitly instead of using the raw value with `and`.
sqlite3 -json cloudquery.db \
  "SELECT name, versioning_status, server_side_encryption_configuration,
          logging_target_bucket, block_public_acls, block_public_policy,
          ignore_public_acls, restrict_public_buckets
   FROM aws_s3_buckets" \
  | jq 'def on: . == true or . == 1;
    {
      schema_version: "obs.v0.1",
      captured_at: (now | todate),
      assets: [.[] | {
        id: .name,
        type: "aws_s3_bucket",
        vendor: "aws",
        properties: {
          storage: {
            kind: "bucket",
            name: .name,
            versioning: { enabled: (.versioning_status == "Enabled") },
            encryption: { at_rest_enabled: (.server_side_encryption_configuration != null) },
            logging: { enabled: (.logging_target_bucket != null) },
            controls: { public_access_fully_blocked: (
              (.block_public_acls | on) and (.block_public_policy | on) and
              (.ignore_public_acls | on) and (.restrict_public_buckets | on)
            )}
          }
        }
      }]
    }' > observations/$(date -u +%Y-%m-%dT%H%M%SZ).json
CloudQuery supports 100+ source plugins (AWS, GCP, Azure, K8s, GitHub, Terraform) and syncs to Postgres, SQLite, BigQuery, S3, or local files. Your team can query the same synced data for both Stave extraction and operational dashboards.
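For teams that would rather read the synced database directly than shell out to `sqlite3` and jq, the standard-library `sqlite3` module is enough. The sketch below assumes the `aws_s3_buckets` table and column names used in the query above and that boolean flags are stored as 0/1 integers; `is_on` and `extract_assets` are hypothetical helper names.

```python
import sqlite3

QUERY = """
SELECT name, versioning_status, server_side_encryption_configuration,
       logging_target_bucket, block_public_acls, block_public_policy,
       ignore_public_acls, restrict_public_buckets
FROM aws_s3_buckets
"""

FLAG_COLUMNS = (
    "block_public_acls",
    "block_public_policy",
    "ignore_public_acls",
    "restrict_public_buckets",
)


def is_on(value):
    """Treat only 1/True as enabled; NULL and 0 both count as disabled."""
    return value in (1, True)


def extract_assets(conn):
    """Read S3 bucket rows from an open SQLite connection and map them to assets."""
    conn.row_factory = sqlite3.Row  # lets us access columns by name
    assets = []
    for row in conn.execute(QUERY):
        assets.append({
            "id": row["name"],
            "type": "aws_s3_bucket",
            "vendor": "aws",
            "properties": {
                "storage": {
                    "kind": "bucket",
                    "name": row["name"],
                    "versioning": {"enabled": row["versioning_status"] == "Enabled"},
                    "encryption": {
                        "at_rest_enabled": row["server_side_encryption_configuration"] is not None
                    },
                    "logging": {"enabled": row["logging_target_bucket"] is not None},
                    "controls": {
                        "public_access_fully_blocked": all(is_on(row[c]) for c in FLAG_COLUMNS)
                    },
                }
            },
        })
    return assets
```

Opening the file with `sqlite3.connect("cloudquery.db")` and wrapping the returned list in the `schema_version` / `captured_at` envelope yields the same document as the shell pipeline.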
## Using AWS Config as an Extractor
AWS Config continuously records resource configuration changes. If your organization already has Config enabled, you can export configuration snapshots directly, with no additional tooling needed.
This is the lowest-friction path for teams that already have AWS Config deployed and don't want to add Steampipe or CloudQuery to their stack.
# Export current configuration for all S3 buckets
aws configservice select-resource-config \
  --expression "SELECT resourceId, resourceType, configuration
                WHERE resourceType = 'AWS::S3::Bucket'" \
  --output json \
  | jq '{
      schema_version: "obs.v0.1",
      captured_at: (now | todate),
      assets: [.Results[] | fromjson | {
        id: .resourceId,
        type: "aws_s3_bucket",
        vendor: "aws",
        properties: {
          storage: {
            kind: "bucket",
            name: .resourceId
          }
        }
      }]
    }' > observations/$(date -u +%Y-%m-%dT%H%M%SZ).json

# Or export a full configuration snapshot to S3
aws configservice deliver-config-snapshot \
  --delivery-channel-name default
# Then download and transform the snapshot JSON
AWS Config covers 380+ resource types across all AWS services. The select-resource-config API returns the full configuration as JSON, which is the same data Stave controls evaluate. For teams already paying for AWS Config, this is zero additional cost.
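One detail of the jq filter above is easy to miss: each entry in `Results` is a JSON-encoded string, not an object, which is why the filter calls `fromjson` before reading fields. A Python sketch of the same step follows; the function name `config_results_to_assets` is hypothetical, and the mapping deliberately stops at the fields the jq example uses rather than guessing at names inside `configuration`.

```python
import json


def config_results_to_assets(response):
    """Map a select-resource-config response (a dict) to a list of assets."""
    assets = []
    for raw in response.get("Results", []):
        # Each result is a JSON document serialized as a string
        item = json.loads(raw)
        assets.append({
            "id": item["resourceId"],
            "type": "aws_s3_bucket",
            "vendor": "aws",
            "properties": {
                "storage": {
                    "kind": "bucket",
                    "name": item["resourceId"],
                }
            },
        })
    return assets
```

Further properties would come from `item["configuration"]`; which keys to read depends on the resource type and on what your Stave controls expect, so inspect a real response before extending the mapping.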
## Choosing an Extraction Tool
Pick the tool your team already knows. Stave doesn't care how you produce obs.v0.1 JSON, only that it conforms to the schema.
| Tool | Best for | Setup | Multi-cloud |
|---|---|---|---|
| Steampipe | Ad-hoc queries, interactive exploration, quick start | Install + plugin | AWS, GCP, Azure, K8s, 150+ |
| CloudQuery | Scheduled CI/CD pipelines, historical snapshots, data warehouse | Config file + sync job | AWS, GCP, Azure, K8s, 100+ |
| AWS Config | Teams already using Config, zero new tooling | Already deployed | AWS only |
| Custom (Python/Go) | Niche sources, exact field control, LLM-generated | Write code | Anything |
All four paths produce the same obs.v0.1 JSON. Stave evaluates the same controls regardless of how the observations were created.
Why this is powerful for your project:
- Zero Onboarding: Developers don't need to learn the Stave Go codebase. They only need to know the short JSON contract.
- Language Freedom: One team can use Python, another Go, another SQL + jq, another AWS Config. The Stave core engine doesn't care.
- Unix Philosophy: Extractors write JSON to STDOUT and Stave reads it from STDIN, which reinforces the pipe workflow.
- Instant Scaffolding: The LLM prompt template is a language-agnostic scaffolder. Steampipe, CloudQuery, and AWS Config are zero-code alternatives.