This is a direct payoff of the Contract-First architecture. Since the obs.v0.1 schema is the only thing the engine cares about, you can use a "Meta-Prompt" that directs any LLM to generate the code for an Extractor.

Here is a template for developers. You can paste this into Claude, ChatGPT, or GitHub Copilot to generate a working extractor in minutes.

🚀 Stave Extractor Jumpstart Template

Instructions for the Developer: Fill in the placeholders [ ] below and paste this entire prompt into an LLM.

The Prompt Template

Task: Build a Stave Observation Extractor

I need to build a data extractor for the Stave policy engine. The extractor's job is to fetch configuration data from a source and output a JSON file that conforms to the obs.v0.1 schema.

1. Environment & Tools

Target Language: [e.g., Python 3.11, Rust, Go, Node.js]
Data Source: [e.g., AWS SDK/Boto3, Terraform Plan JSON, GitHub API, Local CSV]
Output Format: Standardized JSON (STDOUT or file)

2. The Contract (obs.v0.1)

The output MUST be a single JSON object with this exact structure:

{
  "schema": "obs.v0.1",
  "source_type": "[e.g., aws_s3, gcp_storage, custom_script]",
  "captured_at": "YYYY-MM-DDTHH:MM:SSZ",
  "assets": [
    {
      "id": "unique-identifier-of-the-resource",
      "type": "resource-category-type",
      "properties": {
        "key1": "value1",
        "key2": 123,
        "key3": true
      }
    }
  ]
}

3. Implementation Requirements

Mapping: Iterate through the [Data Source] and map each resource to an entry in the assets array.
Properties: The properties map should contain the raw configuration values needed for security invariants.
No Extra Fields: The schema has additionalProperties: false. Do not add top-level fields outside of the ones listed above.
Error Handling: If the [Data Source] is unreachable, exit with a non-zero code and a clear error message to STDERR.

4. Deliverable

Please provide:

The full [Language] source code for the extractor.
Instructions on how to run it.
A sample command to pipe the output into Stave: [extractor_cmd] | stave evaluate --input -

Example: How a developer would use this for Python/S3

If a developer wanted to build a quick S3 extractor in Python, they would fill it out like this:

Target Language: Python 3.10 (Boto3)

Data Source: AWS S3 ListBuckets API

Source Type: aws_s3_bucket

The resulting code would look like this (Generated by the LLM):

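import json
import sys
from datetime import datetime, timezone

import boto3
from botocore.exceptions import BotoCoreError, ClientError


def extract():
    s3 = boto3.client('s3')
    response = s3.list_buckets()

    observation = {
        "schema": "obs.v0.1",
        "source_type": "aws_s3_bucket",
        # Match the contract's YYYY-MM-DDTHH:MM:SSZ format (no microseconds)
        "captured_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "assets": []
    }

    for bucket in response['Buckets']:
        # Fetch extra properties like versioning (one API call per bucket)
        v_res = s3.get_bucket_versioning(Bucket=bucket['Name'])

        observation["assets"].append({
            "id": bucket['Name'],
            "type": "s3_bucket",
            "properties": {
                "name": bucket['Name'],
                "creation_date": bucket['CreationDate'].isoformat(),
                # 'Status' is absent when versioning was never enabled
                "versioning": v_res.get('Status', 'Disabled')
            }
        })

    print(json.dumps(observation, indent=2))


if __name__ == "__main__":
    try:
        extract()
    except (BotoCoreError, ClientError) as exc:
        # Template requirement: non-zero exit code and a clear message on STDERR
        print(f"error: could not read from AWS S3: {exc}", file=sys.stderr)
        sys.exit(1)
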
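
Before piping generated output into Stave, it can help to run a quick structural check against the contract. The sketch below is only an illustration built from the fields listed in section 2 of the prompt template; the function name `check_observation` is hypothetical, and the real obs.v0.1 schema may be stricter or differ in details not shown here.

```python
import re

# Field names taken from the contract in section 2 of the prompt template.
TOP_LEVEL_KEYS = {"schema", "source_type", "captured_at", "assets"}
ASSET_KEYS = {"id", "type", "properties"}
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")


def check_observation(obs):
    """Return a list of problems; an empty list means the structure looks right."""
    if not isinstance(obs, dict):
        return ["observation must be a JSON object"]

    problems = []
    missing = TOP_LEVEL_KEYS - set(obs)
    extra = set(obs) - TOP_LEVEL_KEYS
    if missing:
        problems.append(f"missing top-level fields: {sorted(missing)}")
    if extra:
        # The template states that extra top-level fields are not allowed.
        problems.append(f"unexpected top-level fields: {sorted(extra)}")
    if obs.get("schema") != "obs.v0.1":
        problems.append("schema must be 'obs.v0.1'")
    if not TIMESTAMP.match(str(obs.get("captured_at", ""))):
        problems.append("captured_at must look like YYYY-MM-DDTHH:MM:SSZ")

    assets = obs.get("assets")
    if not isinstance(assets, list):
        problems.append("assets must be an array")
        return problems

    for i, asset in enumerate(assets):
        if not isinstance(asset, dict):
            problems.append(f"assets[{i}] must be an object")
            continue
        absent = ASSET_KEYS - set(asset)
        if absent:
            problems.append(f"assets[{i}] is missing {sorted(absent)}")
        if not isinstance(asset.get("properties", {}), dict):
            problems.append(f"assets[{i}].properties must be an object")
    return problems
```

A wrapper script could load the extractor's output with `json.load`, call `check_observation`, print any problems to STDERR, and exit non-zero, so a malformed file never reaches the evaluation step.
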
## Using Steampipe as an Extractor

Steampipe is the fastest way to build a Stave extractor without writing code. Steampipe exposes cloud APIs as SQL tables — you write a query, pipe the result through jq, and produce obs.v0.1 JSON directly.

Steampipe has 150+ plugins covering AWS, GCP, Azure, Kubernetes, GitHub, and dozens of other services. Every aws_s3_bucket, aws_iam_role, aws_vpc_security_group, and aws_opensearch_domain row maps directly to a Stave observation asset. This makes it the recommended extraction path for teams that want comprehensive coverage without building custom extractors.

# Example: extract S3 bucket observations via Steampipe
steampipe query --output json \
  "SELECT name, region, versioning_enabled, server_side_encryption_configuration,
          logging, acl, policy, public_access_block_configuration
   FROM aws_s3_bucket" \
  | jq '{
      schema_version: "obs.v0.1",
      captured_at: (now | todate),
      assets: [.[] | {
        id: .name,
        type: "aws_s3_bucket",
        vendor: "aws",
        properties: {
          storage: {
            kind: "bucket",
            name: .name,
            versioning: { enabled: (.versioning_enabled // false) },
            encryption: { at_rest_enabled: (.server_side_encryption_configuration != null) },
            logging: { enabled: (.logging != null) },
            controls: { public_access_fully_blocked: (
              .public_access_block_configuration.BlockPublicAcls and
              .public_access_block_configuration.BlockPublicPolicy and
              .public_access_block_configuration.IgnorePublicAcls and
              .public_access_block_configuration.RestrictPublicBuckets
            )}
          }
        }
      }]
    }' > observations/$(date -u +%Y-%m-%dT%H%M%SZ).json

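Steampipe + jq + Stave form a complete pipeline: query cloud state, transform to obs.v0.1, then evaluate offline. Only the query step needs network access; once the observation file exists, evaluation can run in an air-gapped environment. No custom code required.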

## Using CloudQuery as an Extractor

CloudQuery syncs cloud configuration to a local database (Postgres, SQLite, or flat files) on a schedule. If your team already runs CloudQuery for asset inventory, you can query the synced data and transform it to obs.v0.1 without additional API calls.

CloudQuery is the better choice when you need repeatable, scheduled extraction in CI/CD pipelines or when you want to store historical snapshots in a data warehouse for trend analysis.

# Sync AWS resources to a local SQLite database
cloudquery sync aws-config.yml

# Extract S3 observations from the synced database
sqlite3 -json cloudquery.db \
  "SELECT name, versioning_status, server_side_encryption_configuration,
          logging_target_bucket, block_public_acls, block_public_policy,
          ignore_public_acls, restrict_public_buckets
   FROM aws_s3_buckets" \
  | jq '{
      schema_version: "obs.v0.1",
      captured_at: (now | todate),
      assets: [.[] | {
        id: .name,
        type: "aws_s3_bucket",
        vendor: "aws",
        properties: {
          storage: {
            kind: "bucket",
            name: .name,
            versioning: { enabled: (.versioning_status == "Enabled") },
            encryption: { at_rest_enabled: (.server_side_encryption_configuration != null) },
            logging: { enabled: (.logging_target_bucket != null) },
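            controls: { public_access_fully_blocked: (
              # SQLite has no boolean type, so these flags typically come back as 0/1.
              # jq treats 0 as truthy, so compare explicitly instead of chaining "and".
              [.block_public_acls, .block_public_policy,
               .ignore_public_acls, .restrict_public_buckets]
              | all(. == 1 or . == true)
            )}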
          }
        }
      }]
    }' > observations/$(date -u +%Y-%m-%dT%H%M%SZ).json

CloudQuery supports 100+ source plugins (AWS, GCP, Azure, K8s, GitHub, Terraform) and syncs to Postgres, SQLite, BigQuery, S3, or local files. Your team can query the same synced data for both Stave extraction and operational dashboards.
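
If you would rather not depend on jq, the same mapping can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not a reference implementation: the table and column names are copied from the query above and are assumptions about the CloudQuery schema version you have synced, and the function name `bucket_assets` is hypothetical. It also shows one detail worth watching for: SQLite stores booleans as integers, and while jq treats `0` as truthy, Python treats it as false, so `all(...)` gives the intended result here.

```python
import sqlite3

# Column names mirror the CloudQuery example above (assumed, not verified
# against a specific CloudQuery release).
QUERY = """
SELECT name, versioning_status, block_public_acls, block_public_policy,
       ignore_public_acls, restrict_public_buckets
FROM aws_s3_buckets
ORDER BY name
"""


def bucket_assets(conn):
    """Build the obs.v0.1 'assets' array from a synced CloudQuery SQLite database."""
    conn.row_factory = sqlite3.Row  # allow access to columns by name
    assets = []
    for row in conn.execute(QUERY):
        flags = [
            row["block_public_acls"],
            row["block_public_policy"],
            row["ignore_public_acls"],
            row["restrict_public_buckets"],
        ]
        assets.append({
            "id": row["name"],
            "type": "aws_s3_bucket",
            "vendor": "aws",
            "properties": {
                "storage": {
                    "kind": "bucket",
                    "name": row["name"],
                    "versioning": {"enabled": row["versioning_status"] == "Enabled"},
                    # 0 and NULL both count as "not blocked"
                    "controls": {"public_access_fully_blocked": all(flags)},
                }
            },
        })
    return assets
```

Calling `bucket_assets(sqlite3.connect("cloudquery.db"))` returns only the `assets` list; wrapping it with the top-level observation fields and serializing with `json.dumps` is left to the caller.
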

## Using AWS Config as an Extractor

AWS Config continuously records resource configuration changes. If your organization already has Config enabled, you can export configuration snapshots directly — no additional tooling needed.

This is the lowest-friction path for teams that already have AWS Config deployed and don't want to add Steampipe or CloudQuery to their stack.

# Export current configuration for all S3 buckets
aws configservice select-resource-config \
  --expression "SELECT resourceId, resourceType, configuration
                WHERE resourceType = 'AWS::S3::Bucket'" \
  --output json \
  | jq '{
      schema_version: "obs.v0.1",
      captured_at: (now | todate),
      assets: [.Results[] | fromjson | {
        id: .resourceId,
        type: "aws_s3_bucket",
        vendor: "aws",
        properties: {
          storage: {
            kind: "bucket",
            name: .resourceId
          }
        }
      }]
    }' > observations/$(date -u +%Y-%m-%dT%H%M%SZ).json

# Or export a full configuration snapshot to S3
aws configservice deliver-config-snapshot \
  --delivery-channel-name default
# Then download and transform the snapshot JSON

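AWS Config covers 380+ resource types across AWS services. The select-resource-config API returns each resource's full configuration as JSON, the same data Stave controls evaluate. The query above maps only the resource ID, so extend the properties block with the configuration fields your controls need. For teams already paying for AWS Config, this is zero additional cost.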
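
The `Results` array returned by select-resource-config holds JSON-encoded strings rather than objects, which is why the jq pipeline above calls `fromjson`. The Python sketch below shows the same decoding step and carries the configuration payload through. The function name `assets_from_config` and the `raw_configuration` key are hypothetical, chosen only for illustration; which property names your Stave controls expect is not something this example can tell you.

```python
import json


def assets_from_config(response):
    """Map a select-resource-config response to obs.v0.1-style assets."""
    assets = []
    for raw in response.get("Results", []):
        item = json.loads(raw)  # each result is a JSON-encoded string
        assets.append({
            "id": item["resourceId"],
            "type": "aws_s3_bucket",
            "vendor": "aws",
            "properties": {
                "storage": {"kind": "bucket", "name": item["resourceId"]},
                # Hypothetical key: keeps the untouched Config payload so a later
                # step can map it to the properties your controls read.
                "raw_configuration": item.get("configuration", {}),
            },
        })
    return assets
```

To use it, save the CLI output with `--output json`, load the file with `json.load`, and pass the resulting dictionary to `assets_from_config`.
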

## Choosing an Extraction Tool

Pick the tool your team already knows. Stave doesn't care how you produce obs.v0.1 JSON — only that it conforms to the schema.

| Tool | Best for | Setup | Cloud coverage |
| --- | --- | --- | --- |
| Steampipe | Ad-hoc queries, interactive exploration, quick start | Install + plugin | AWS, GCP, Azure, K8s (150+ plugins) |
| CloudQuery | Scheduled CI/CD pipelines, historical snapshots, data warehouse | Config file + sync job | AWS, GCP, Azure, K8s (100+ plugins) |
| AWS Config | Teams already using Config, zero new tooling | Already deployed | AWS only |
| Custom (Python/Go) | Niche sources, exact field control, LLM-generated | Write code | Anything |

All four paths produce the same obs.v0.1 JSON. Stave evaluates the same controls regardless of how the observations were created.

Why this is powerful for your project:

Zero Onboarding: Developers don't need to learn the Stave Go codebase. They only need to know the 10-line JSON contract.

Language Freedom: One team can use Python, another Go, another SQL + jq, another AWS Config. The Stave core engine doesn't care.

Unix Philosophy: Extractors write obs.v0.1 JSON to STDOUT and Stave reads it from STDIN, so any tool composes with a simple pipe.

Instant Scaffolding: The LLM prompt template is a language-agnostic scaffolder. Steampipe, CloudQuery, and AWS Config are zero-code alternatives.
