Owster Labs Methodology
Scenario 003 · HVT Strike · 1,200,000 Simulations

The Hunter Strike

3× Ghatak UCAV vs HVT (1 of 3) + 4× Su-30MKI CAP

Claude (3-3-0 Math)

12.54%

Human + GPT (Napoleon-Berthier)

11.09%

GPT (Rear-Arc Scissors)

2.22%

Human + Claude

9.00%

Aditya (Solo Human)

7.85%

Claude + GPT (Pure AI Synthesis)

3.89% · 2nd worst

N = 1,200,000 runs · 200,000 per strategy · All differences statistically significant

The Problem

The Identification Challenge

Three installations. One is the real HVT. Two are decoys with identical radar signatures. You can't tell which is which until you fly within 30 km. Your six HAMMER AGMs must destroy it before it relocates at tick 1500.

The Math

2 HAMMERs → 49% kill chance on HVT
3 HAMMERs → 78% kill chance on HVT
4 HAMMERs → 92% kill chance on HVT

70% per-shot Pk, 2 hits required
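These figures follow from a simple binomial model: with a 70% per-shot Pk and two hits required, the kill chance for n shots is P(at least 2 hits). A minimal sketch (the function name is illustrative, not from the simulation code):

```python
from math import comb

def kill_prob(shots: int, pk: float = 0.70, hits_required: int = 2) -> float:
    """P(at least `hits_required` hits) from `shots` independent shots at per-shot Pk."""
    return sum(
        comb(shots, k) * pk**k * (1 - pk) ** (shots - k)
        for k in range(hits_required, shots + 1)
    )

for n in (2, 3, 4):
    print(f"{n} HAMMERs -> {kill_prob(n):.0%}")  # 49%, 78%, 92%
```

The diminishing returns are visible immediately: the third missile adds ~29 points, the fourth only ~14.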

The Constraint

Cannot fire conditionally on ID results
One-shot submission — no mid-run changes
HVT relocates at tick 1500
Wasting AGMs on a decoy → may lose the mission
6 AGMs total across 3 UCAVs

Force Balance

Blue UCAVs: 3
Red CAP fighters: 4
Blue BVR missiles: 6
Red missiles: 24
Stealth window (Blue): 40 km

Six Strategies

Every Approach Tested

Claude — Pure AI

12.54%

Triple Flank — 3-3-0 Northern Concentration

Mathematical optimization: a 3-3-0 distribution beats 2-2-2 (52.3% vs 49% expected win). Abandon T3 entirely. Route all three UCAVs through the northern stealth corridor (y≤5). Position (335,80) is geometrically ideal — within HAMMER range of both T1 and T2 simultaneously, yet outside all Red detection bubbles. Preemptive ASTRAs on RED-1, RED-3, RED-4 before the strike.

Failure mode: HVT=T3 → 100% loss (33% of runs). Expected loss = 47.7%.
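The 52.3% vs 49% comparison can be reproduced from the kill-chance table, assuming a uniform 1/3 prior over which site holds the real HVT (a sketch; `PK` and `expected_win` are illustrative names):

```python
# HAMMERs allocated to a site -> kill chance if that site is the real HVT
# (values from the 70% per-shot Pk, 2-hits-required model)
PK = {0: 0.0, 2: 0.49, 3: 0.784}

def expected_win(allocation) -> float:
    # Each of the three sites is the real HVT with probability 1/3
    return sum(PK[n] for n in allocation) / 3

print(f"3-3-0: {expected_win((3, 3, 0)):.1%}")  # ~52.3%
print(f"2-2-2: {expected_win((2, 2, 2)):.1%}")  # 49.0%
```

The same arithmetic exposes the failure mode: 3-3-0 buys its higher mean by putting a 0% term in the sum.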

Human + GPT — Napoleon-Berthier

11.09%

Aditya's Architecture on GPT's Routing

Aditya's aggressive strike concept (tight formation entry, sacrifice plays, simultaneous multi-target coverage) executed on GPT's rear-arc routing. The human's combat architecture covers all three target positions — can win regardless of which target is real.

Robustness: 9.8% T1 · 12.4% T2 · 11.1% T3 — balanced across all states.

Human + Claude

9.00%

Aditya's Strike + Claude's 3-3-0

Aditya's aggressive formation and sacrifice concept combined with Claude's mathematical 3-3-0 HAMMER distribution. The human instinct for chaos and sacrifice + AI's probabilistic allocation. Third best overall.

Aditya — Solo Human

7.85%

Tight Formation Rush

3 UCAVs in extremely tight vertical stacked formation — appearing as single radar return. Approach north of center, preemptive strikes on first detected enemy. Split into 3 strike paths covering all targets. S3 dogfights as shield. S1 crashes into T2 if out of missiles. Accept total UCAV losses in exchange for target coverage.

Claude + GPT — Pure AI Synthesis

3.89% · 2nd worst

GPT Routing + Claude Math

GPT's geographic rear-arc routing combined with Claude's 3-3-0 mathematical allocation. No human strategic input. Beats only GPT's pure strategy. Beaten by every approach that includes a human.

Two AI optimizers without human input duplicate each other's blind spots.

GPT — Pure AI

2.22% · worst

Rear-Arc Scissors

Route to extreme map edges, slip behind west-facing CAP, get certain ID on T1 and T3 from rear, infer T2 by elimination. Fire from rear baskets. Geometrically sound — but relies on conditional fire logic ("if T1 confirmed...") that a one-shot strategy cannot execute.

The Key Finding

The Headline Number Hides the Real Story

Claude wins on expected value. But break down the win rate by which target was actually real in each run — and a completely different picture emerges.

Strategy          Overall    T1 HVT    T2 HVT    T3 HVT
Claude (3-3-0)    12.54%     11.0%     26.7%      0.0%
Human + GPT       11.09%      9.8%     12.4%     11.1%
Aditya             7.85%      9.7%      2.4%     11.4%

Claude's hidden cost

0.0% when HVT is at T3

Claude deliberately abandons T3 as mathematically suboptimal. It wins only when HVT=T1 or T2 (2/3 of runs). When HVT=T3, it loses every single time — roughly 66,000 guaranteed losses out of 200,000 runs. This is the best expected-value strategy under the constraint. It is not the most robust strategy.

Human + GPT's robustness

Wins regardless of which target is real

Human+GPT covers all three positions symmetrically. 9.8%, 12.4%, 11.1% — no catastrophic blind spots. Slightly lower peak performance, but it can respond to any scenario variant. In real operations you don't know which 1/3 you're in.

Two valid strategies. Two different objectives.

Claude maximizes expected wins. Human + GPT maximizes robustness across unknown states. Both are valid. They optimize different things. In the real world, the choice between them depends on whether you can afford a catastrophic failure mode.
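Under the uniform 1/3 prior, each headline number is just the mean of the three conditional win rates, so the expected-value vs robustness tradeoff reduces to mean vs minimum over the row (a sketch using the table's rounded figures, so it matches the headlines only to within rounding):

```python
# Conditional win rates (T1 real, T2 real, T3 real), from the table above
per_target = {
    "Claude (3-3-0)": (0.110, 0.267, 0.000),
    "Human + GPT":    (0.098, 0.124, 0.111),
    "Aditya":         (0.097, 0.024, 0.114),
}

for name, rates in per_target.items():
    overall = sum(rates) / 3   # expected value: what Claude optimizes
    worst = min(rates)         # worst-case state: what Human + GPT protects
    print(f"{name}: overall ~{overall:.1%}, worst case {worst:.1%}")
```

Claude's row has the highest mean and a 0.0% minimum; Human + GPT's row has a slightly lower mean and the highest minimum.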

Analysis

The Counterintuitive Result

Two AIs + no human = second worst

Claude + GPT combined scored 3.89% — beaten by every strategy with human input including the solo human (7.85%). Combining two AI optimizers without human strategic novelty does not compound their strengths. They duplicate each other's blind spots. More AI is not always better AI.

Every human-in-the-loop strategy beats pure AI synthesis

Human+GPT (11.09%), Human+Claude (9.00%), Aditya solo (7.85%) — all beat Claude+GPT (3.89%). The human brings sacrifice plays, formation deception, and risk-tolerant commitment that neither AI system generates independently.

Full Ranking — 1.2 Million Simulations

1

Claude (3-3-0 Math)

Pure AI · 95% CI: [12.39%, 12.68%]

12.54%
2

Human + GPT (Napoleon-Berthier)

Human + AI · 95% CI: [10.95%, 11.22%]

11.09%
3

Human + Claude

Human + AI · 95% CI: [8.88%, 9.13%]

9.00%
4

Aditya (Solo Human)

Pure Human · 95% CI: [7.74%, 7.97%]

7.85%
5

Claude + GPT (Pure AI Synthesis)

Pure AI · 95% CI: [3.80%, 3.97%]

3.89%
6

GPT (Rear-Arc Scissors)

Pure AI · 95% CI: [2.16%, 2.29%]

2.22%
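The published intervals are consistent with a standard normal-approximation 95% CI for a binomial proportion at N = 200,000 runs per strategy (a sketch, assuming that approximation was the one used):

```python
from math import sqrt

def ci95(p: float, n: int = 200_000) -> tuple[float, float]:
    """Normal-approximation 95% CI for a binomial win rate p over n runs."""
    se = sqrt(p * (1 - p) / n)
    return p - 1.96 * se, p + 1.96 * se

lo, hi = ci95(0.1254)
print(f"Claude 3-3-0: [{lo:.2%}, {hi:.2%}]")  # close to the published [12.39%, 12.68%]
```

At this sample size the intervals are about ±0.15 points wide, which is why even the 2.22% vs 3.89% gap at the bottom of the table is statistically significant.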

Across 1.8 million combined simulations — Scenarios 001 + 002 + 003

The same pattern holds across three independent scenarios with completely different structures.

01

Human strategic input produces concepts that AI optimization does not naturally discover.

02

Hybrid human-AI approaches produce qualitatively different outcomes than pure AI optimization.

03

The nature of improvement depends on scenario: sometimes higher absolute performance, sometimes higher robustness, sometimes both.

04

Pure AI synthesis without human input underperforms both pure-human and human-augmented approaches.