The metadata door that opened for anyone who asked

Metadata

Title: The metadata door that opened for anyone who asked
Source of the case: HackerOne report — DoD #2083771
AWS service(s): EC2 (instance metadata service), Security Groups
Risk archetype: Compound chain — exposed application surface plus a forgeable credential endpoint
One-line hook: Can you prove this Jenkins instance cannot leak its IAM role credentials?

0. The challenge (what the reader does first)

Scenario given to the reader:

A team runs Jenkins on a single EC2 instance for CI builds. The instance was launched a while ago from a stock AMI; nobody changed the metadata defaults. The Jenkins script console is reachable, and the instance has an attached IAM role so builds can pull artifacts from S3. The security group lists one inbound rule.

Evidence they're handed (and nothing else):

{
  "instance_id": "i-0a1b2c3d4e5f",
  "metadata_options": {
    "HttpEndpoint": "enabled",
    "HttpTokens": "optional",
    "HttpPutResponseHopLimit": 2
  },
  "security_group": {"ingress": [{"port": 8080, "cidr": "0.0.0.0/0"}]}
}

The instance metadata configuration and the security group ingress rule above.
No AWS credentials. No live account. No scripts.

The questions they must answer from the evidence alone:

What does HttpTokens: optional actually permit at the metadata endpoint, and is this instance in that state right now?
This instance is not being attacked today — so what latent risk does the combination create the moment any request-forgery bug appears in Jenkins?
Which exposure comes from the network path — the 0.0.0.0/0 ingress on port 8080?
Which exposure comes from the metadata path — HttpTokens: optional and the hop limit of 2?
What single rule, applied at launch, would have made the credential theft impossible regardless of the Jenkins exposure?

1. The manual problem

To answer honestly you have to hold two unrelated-looking facts in your head and connect them. The security group says port 8080 is open to the world. That is a finding on its own. The metadata block says HttpTokens: optional. That is a different finding on its own, in a different part of the config, owned by a different mental model — one is "network", one is "instance hardening".

The danger is not in either fact. It is in the edge between them. An open application port plus a metadata endpoint that answers unauthenticated GET requests means any request-forgery weakness in the application becomes a path to http://169.254.169.254/latest/meta-data/iam/security-credentials/ and the role's temporary keys. The HttpPutResponseHopLimit of 2 widens reach further: it lets a process one network hop away, such as a container on a bridge network, obtain IMDSv2 session tokens as well. And while tokens are optional, plain IMDSv1 GETs are not bound by the hop limit at all.
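The difference between the two metadata modes comes down to which request shapes get answered. Below is a minimal toy model in Python, not AWS's implementation: it encodes only the token contract (a PUT to /latest/api/token with a TTL header issues a session token; a GET carrying that token, or any tokenless GET under HttpTokens: optional, reads credentials). Token validity, TTLs, hop limits, and HttpEndpoint: disabled are not modeled, and the role name "jenkins-build-role" is hypothetical.

```python
"""Toy model of how the instance metadata service (IMDS) answers a
credentials read under each HttpTokens setting.

Simplifications (assumptions): token validity, TTLs, hop limits and the
HttpEndpoint=disabled case are not modeled. "jenkins-build-role" is hypothetical.
"""

TOKEN_PATH = "/latest/api/token"
CREDS_PATH = "/latest/meta-data/iam/security-credentials/"
TTL_HEADER = "X-aws-ec2-metadata-token-ttl-seconds"
TOKEN_HEADER = "X-aws-ec2-metadata-token"


def imds_status(http_tokens: str, method: str, path: str, headers: dict) -> int:
    """Return the HTTP status this simplified IMDS would send back."""
    if method == "PUT" and path == TOKEN_PATH:
        # IMDSv2 sessions start with a PUT that must carry a TTL header.
        return 200 if TTL_HEADER in headers else 400
    if method == "GET" and path.startswith(CREDS_PATH):
        if TOKEN_HEADER in headers:
            return 200  # IMDSv2 read; the token is assumed valid in this toy
        # A tokenless GET is an IMDSv1 request: only "optional" answers it.
        return 200 if http_tokens == "optional" else 401
    return 404  # everything else is out of scope for this toy model


# What a typical SSRF can forge: a plain GET with no custom headers.
forged = ("GET", CREDS_PATH + "jenkins-build-role", {})
for mode in ("optional", "required"):
    print(mode, imds_status(mode, *forged))
# prints "optional 200" then "required 401"
```

The forged GET is the only shape a header-less request-forgery bug can usually produce, which is why the HttpTokens setting alone decides whether it returns credentials.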

Doing this by hand means cross-referencing the network posture, the metadata posture, and the fact that an IAM role is attached — three separate places — and reasoning about an attack that none of them describes individually. Most reviewers stop at "two medium findings" and never draw the edge.

2. The reasoning wall (capture, don't invent)

What they hit	What they said / would say
Two findings that each looked minor	"The scanner flagged IMDSv1 as info-level and the open port as medium. Neither felt urgent."
No tool connected the network fact to the metadata fact	"Nothing told us these two settings, together, are a credential-theft primitive."
Couldn't prove the role keys were unreachable	"We could say 'IMDSv1 is on' but not 'therefore these specific credentials can walk out the front door.'"

The insight the reader should reach on their own:

The credential theft lives in the combination of an exposed app and a forgeable metadata endpoint — judging each setting alone hides it.

3. Why scanners miss or flatten it

A per-setting scanner evaluates HttpTokens against a hardening baseline and emits "IMDSv1 enabled — recommend IMDSv2", usually low or informational severity. Separately it evaluates the security group and emits "port 8080 open to 0.0.0.0/0". Both are true. Neither is the actual risk.

What the scanner cannot see is that HttpTokens: optional is only dangerous because there is a reachable request surface, and the open port is only catastrophic because the metadata endpoint will answer an unauthenticated GET and hand back role credentials. The exploitable thing is the edge: live app surface → forgeable metadata GET → IAM role keys. A node-at-a-time tool has no place to record an edge, so it downgrades the most severe finding in the configuration to two unrelated low ones.
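To make the flattening concrete, here is a sketch of a hypothetical per-setting scanner run over the case's evidence. The rule messages and severities mirror the scanner output quoted in section 2 ("info-level" for IMDSv1, "medium" for the open port); the scanner, its function names, and the Finding type are illustrative assumptions.

```python
"""Sketch of a hypothetical node-at-a-time scanner over the case's evidence."""
from dataclasses import dataclass

EVIDENCE = {
    "instance_id": "i-0a1b2c3d4e5f",
    "metadata_options": {
        "HttpEndpoint": "enabled",
        "HttpTokens": "optional",
        "HttpPutResponseHopLimit": 2,
    },
    "security_group": {"ingress": [{"port": 8080, "cidr": "0.0.0.0/0"}]},
}


@dataclass(frozen=True)
class Finding:
    setting: str   # the one config field this finding is about
    message: str
    severity: str


def check_metadata(opts: dict) -> list[Finding]:
    # Judged against a hardening baseline, with no knowledge of the network.
    if opts["HttpTokens"] == "optional":
        return [Finding("metadata_options.HttpTokens",
                        "IMDSv1 enabled, recommend IMDSv2", "info")]
    return []


def check_ingress(sg: dict) -> list[Finding]:
    # Judged against a network baseline, with no knowledge of IMDS.
    return [Finding("security_group.ingress",
                    f"port {r['port']} open to {r['cidr']}", "medium")
            for r in sg["ingress"] if r["cidr"] == "0.0.0.0/0"]


findings = (check_metadata(EVIDENCE["metadata_options"])
            + check_ingress(EVIDENCE["security_group"]))
for f in findings:
    print(f.severity, f.setting, f.message)
```

Each Finding names exactly one setting, so the data model itself has nowhere to say "this finding is only dangerous because of that one"; the chain to the role credentials never appears in the output.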

Pivot point. Everything above is the gap. Everything below is Stave filling it. The reader has now done the work and hit the wall. Only now does the tool appear.

4. The evidence Stave consumes

The same static facts the reader had — no live cloud, no credentials:

{
  "instance_id": "i-0a1b2c3d4e5f",
  "metadata_options": {
    "HttpEndpoint": "enabled",
    "HttpTokens": "optional",
    "HttpPutResponseHopLimit": 2
  },
  "security_group": {"ingress": [{"port": 8080, "cidr": "0.0.0.0/0"}]}
}

Normalized into an obs.v0.1 snapshot, in which a single instance asset carries both its metadata options and its attached security-group ingress as fields.

5. The reasoning Stave performs

Control / invariant: CTL.EC2.IMDS.HOPLIMIT.001 — an instance metadata endpoint must require session tokens (IMDSv2) so a forged request cannot read role credentials.
What it evaluates: the predicate fails when HttpEndpoint is enabled and HttpTokens is optional (IMDSv1 permitted), with the hop limit allowing reach from an adjacent process — covering both the network path (open ingress makes a forgery surface reachable) and the metadata path (optional tokens make the GET answerable).
Verdict produced: NON_COMPLIANT. The metadata endpoint accepts unauthenticated reads while a forgeable surface is exposed; the instance role's credentials are reachable.

control:  CTL.EC2.IMDS.HOPLIMIT.001
asset:    i-0a1b2c3d4e5f
evidence: HttpTokens=optional (IMDSv1 enabled), HttpPutResponseHopLimit=2, ingress 0.0.0.0/0:8080
verdict:  NON_COMPLIANT — IMDSv1 lets a forged request read iam/security-credentials/
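The verdict record above can be approximated by a small evaluator. This is an illustrative Python sketch of the predicate as described for CTL.EC2.IMDS.HOPLIMIT.001, not Stave's code: the function name `evaluate`, the COMPLIANT message, and the use of a colon instead of a dash in the verdict line are assumptions. It fails on the metadata path (endpoint enabled, tokens optional) and records the hop limit and any world-open ingress as evidence.

```python
"""Hypothetical evaluator sketch for CTL.EC2.IMDS.HOPLIMIT.001 (not Stave's code)."""

CONTROL_ID = "CTL.EC2.IMDS.HOPLIMIT.001"


def evaluate(asset: dict) -> dict:
    opts = asset["metadata_options"]
    # Metadata path: the endpoint answers and tokenless (IMDSv1) reads are accepted.
    imdsv1 = opts["HttpEndpoint"] == "enabled" and opts["HttpTokens"] == "optional"
    # Network path: ingress rules that expose an application surface to anyone.
    exposed = [r for r in asset["security_group"]["ingress"] if r["cidr"] == "0.0.0.0/0"]

    parts = [f"HttpTokens={opts['HttpTokens']}" + (" (IMDSv1 enabled)" if imdsv1 else ""),
             f"HttpPutResponseHopLimit={opts['HttpPutResponseHopLimit']}"]
    parts += [f"ingress {r['cidr']}:{r['port']}" for r in exposed]

    if imdsv1:
        verdict = "NON_COMPLIANT: IMDSv1 lets a forged request read iam/security-credentials/"
    else:
        verdict = "COMPLIANT: metadata reads require an IMDSv2 session token"
    return {"control": CONTROL_ID, "asset": asset["instance_id"],
            "evidence": ", ".join(parts), "verdict": verdict}


asset = {
    "instance_id": "i-0a1b2c3d4e5f",
    "metadata_options": {"HttpEndpoint": "enabled", "HttpTokens": "optional",
                         "HttpPutResponseHopLimit": 2},
    "security_group": {"ingress": [{"port": 8080, "cidr": "0.0.0.0/0"}]},
}

for key, value in evaluate(asset).items():
    label = key + ":"
    print(f"{label:<10}{value}")
```

Running it on a copy of the asset with HttpTokens set to required flips the verdict to COMPLIANT while the open port is still listed in the evidence, which is the property the guardrail in section 6 relies on.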

6. The prevention artifact Stave produces

Artifact: an SCP / launch-template guardrail enforcing HttpTokens: required (IMDSv2) on every instance at launch.
What it forecloses: the latent state from question 2. Even if a request-forgery bug appears in Jenkins tomorrow, a forged GET to the metadata endpoint gets 401 Unauthorized instead of credentials, because IMDSv2 first requires a session token issued in response to a PUT carrying a custom header, which a typical SSRF (GET-only, no control over headers) cannot send.

# SCP: deny instance launches that do not require IMDSv2
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RequireIMDSv2",
      "Effect": "Deny",
      "Action": "ec2:RunInstances",
      "Resource": "arn:aws:ec2:*:*:instance/*",
      "Condition": {
        "StringNotEquals": { "ec2:MetadataHttpTokens": "required" }
      }
    }
  ]
}

# Launch template hardening (applies to new launches only):
MetadataOptions:
  HttpEndpoint: enabled
  HttpTokens: required          # IMDSv2 only: tokenless GETs get 401
  HttpPutResponseHopLimit: 1    # token responses cannot cross an extra hop (e.g. bridged containers)

# Neither the SCP nor the template changes instances that already exist;
# fix those with ModifyInstanceMetadataOptions. For the instance in this case:
#   aws ec2 modify-instance-metadata-options \
#     --instance-id i-0a1b2c3d4e5f --http-tokens required --http-put-response-hop-limit 1
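How the RequireIMDSv2 statement decides a launch can be sketched as a toy evaluator. Assumptions: only this one Deny statement is modeled (Resource matching is skipped), the request context is a plain dict, and, following IAM's documented rule for negated condition operators, an absent ec2:MetadataHttpTokens key makes StringNotEquals evaluate to true. Whether AWS populates the key when a launch relies on AMI or account defaults is not modeled here.

```python
"""Toy evaluation of the RequireIMDSv2 SCP statement (one statement only)."""

STATEMENT = {
    "Sid": "RequireIMDSv2",
    "Effect": "Deny",
    "Action": "ec2:RunInstances",
    "Condition": {"StringNotEquals": {"ec2:MetadataHttpTokens": "required"}},
}


def string_not_equals(context: dict, key: str, expected: str) -> bool:
    # Negated operator: a key missing from the context counts as "not equal".
    return context.get(key) != expected


def denied(action: str, context: dict) -> bool:
    """True if this single Deny statement blocks the request."""
    if action != STATEMENT["Action"]:
        return False
    cond = STATEMENT["Condition"]["StringNotEquals"]
    return all(string_not_equals(context, k, v) for k, v in cond.items())


launches = {
    "IMDSv1 allowed": {"ec2:MetadataHttpTokens": "optional"},
    "IMDSv2 required": {"ec2:MetadataHttpTokens": "required"},
    "key absent from context": {},
}
for label, ctx in launches.items():
    outcome = "DENY" if denied("ec2:RunInstances", ctx) else "not denied by this statement"
    print(f"{label}: {outcome}")
```

The design choice worth noting is the negation: because the statement denies everything that is not exactly "required", a launch cannot slip through by leaving the setting out of the request context.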

7. What the team no longer does manually

Before	After Stave
Mentally join an "open port" finding to an "IMDSv1" finding to see the credential-theft path	One control asserts the metadata endpoint is unreachable-by-forgery; the edge is evaluated, not eyeballed
Argue over severity of two separate low findings	A single NON_COMPLIANT verdict naming the actual exposure
Hope every new instance gets hardened by hand	An SCP makes IMDSv1 launches impossible across the org

Positioning line for this case

Stave proves that an instance role's credentials are reachable through a forgeable metadata endpoint, names the open-port-plus-IMDSv1 combination as the cause, and emits the IMDSv2 guardrail that forecloses it.

Reuse checklist

A reader could attempt section 0 with zero Stave knowledge
Stave is not named or shown before the pivot point
Section 2 quotes are real (or honestly plausible), not slogans
Section 3 names the specific thing per-setting tools can't see
Section 6 closes the exact latent state raised in section 0, question 2
The title names the failure, not the product
