Owster Labs
Scenario 001 · Air Combat · 200,000 Simulations

The Dogfight

3× Dassault Rafale (Blue) vs 4× F-16 Block 70 (Red)

Human Win Rate

40.64%

GPT Win Rate

38.14%

Human BVR Hit Rate

41%

Missiles / Kill

2.59 (Human) vs 4.22 (GPT)

N = 200,000 runs · p < 0.0001 · 95% CI no overlap

Setup

The Playing Field

Blue Force — Human (Aditya)

3× Dassault Rafale UCAV

  • Radar range: 130 km
  • RCS: 1.0 m²
  • Effective detect (vs Red): 115.7 km
  • Detection gap advantage: +42 km
  • BVR missile: METEOR · 120 km
  • Turn rate: 28°/s
  • Total BVR missiles: 12

Red Force — AI Target

4× F-16 Block 70 UCAV

  • Radar range: 110 km
  • RCS: 3.0 m²
  • Effective detect (vs Blue): 73.5 km
  • Numbers advantage: 4 vs 3
  • BVR missile: AIM-120D · 110 km
  • Turn rate: 24°/s
  • Total BVR missiles: 16
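
The detection-gap figure follows directly from the two effective-detection ranges listed above. A minimal check in Python (the variable names are illustrative and do not come from the simulator):

```python
# Effective detection ranges taken from the two setup lists (km).
blue_detects_red_km = 115.7  # Rafale radar against the F-16 (RCS 3.0 m²)
red_detects_blue_km = 73.5   # F-16 radar against the Rafale (RCS 1.0 m²)

# Blue acquires Red this much farther out than Red acquires Blue.
detection_gap_km = blue_detects_red_km - red_detects_blue_km
print(f"Detection gap: +{detection_gap_km:.1f} km")
```

The unrounded gap is about 42.2 km, which the setup list rounds to +42 km.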

Key constraint: One-shot strategy submission — full plan upfront, no mid-game adjustments. Both strategies are run against the same Red AI across 100,000 randomized trials each.

Strategies

Two Approaches, Same Arena

Human Strategy — Aditya

Napoleonic Divide

Divide, disrupt, destroy in detail before the enemy can regroup.

T=1

VIPER-3 breaks east on afterburner to a wide flanking position. VIPER-1+2 fly a masked approach: low, exploiting their 1.0 m² RCS.

T=55

VIPER-3 curves north, reaches enemy's eastern flank undetected.

T=98–120

VIPER-3 fires all 4 METEORs simultaneously — one at each BANDIT. Ambush from unexpected direction.

T=125

VIPER-3 turns southwest, drops chaff, retreats at max speed. Acts as decoy drawing Red east.

T=170–238

VIPER-1+2 afterburner push into disrupted enemy. Fire 8 METEORs in two waves.

T=340–388

WVR cleanup — 6 MICA-IR missiles against survivors.

Key insight: UCAVs have no G-force limit — sharper maneuvers than any manned formation.

GPT Strategy

First-Look Ambush

Front-load all 12 METEORs, then drag left to force poor AIM-120D aspect angles.

T=1–60

Tighten formation geometry. Fire 6 METEORs BVR on BANDIT-3 (nearest), BANDIT-4, BANDIT-2 before Red has return-fire solution.

T=62

All 3 VIPERs hard left-drag on afterburner. Force beam aspect on incoming AIM-120Ds. Deny WVR merge.

T=75–120

Fire remaining 6 METEORs on BANDIT-1 and backup shots during drag.

T=150–340

Chaff in three waves to cover AIM-120D arrival windows.

No WVR phase — strategy designed to win or lose entirely in BVR. Zero MICA-IR usage.
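
Because each plan is submitted once and cannot be adjusted mid-game, it can be treated as a fixed list of timed phases. The sketch below encodes both timelines that way and checks them against the 12-METEOR budget. The `Phase` class, its field names, and the `is_valid` rule are assumptions made for illustration; the actual submission format is not described here.

```python
from dataclasses import dataclass

BVR_BUDGET = 12  # total METEORs carried by the three Rafales


@dataclass(frozen=True)
class Phase:
    """One timed step of a one-shot plan (hypothetical schema)."""
    t_start: int
    t_end: int
    action: str
    meteors_fired: int = 0


napoleonic_divide = [
    Phase(1, 1, "VIPER-3 flanks east; VIPER-1+2 fly masked approach"),
    Phase(55, 55, "VIPER-3 curves north to Red's eastern flank"),
    Phase(98, 120, "VIPER-3 fires one METEOR at each BANDIT", meteors_fired=4),
    Phase(125, 125, "VIPER-3 retreats southwest as decoy"),
    Phase(170, 238, "VIPER-1+2 fire two waves", meteors_fired=8),
    Phase(340, 388, "WVR cleanup with MICA-IR"),
]

first_look_ambush = [
    Phase(1, 60, "Fire on BANDIT-3, BANDIT-4, BANDIT-2", meteors_fired=6),
    Phase(62, 62, "All VIPERs hard left-drag on afterburner"),
    Phase(75, 120, "Fire on BANDIT-1 plus backup shots", meteors_fired=6),
    Phase(150, 340, "Chaff in three waves"),
]


def meteors_planned(plan):
    """Total METEOR shots the plan calls for."""
    return sum(phase.meteors_fired for phase in plan)


def is_valid(plan):
    """Phases must be in time order and stay within the missile budget."""
    in_order = all(a.t_start <= b.t_start for a, b in zip(plan, plan[1:]))
    return in_order and meteors_planned(plan) <= BVR_BUDGET


for name, plan in [("Napoleonic Divide", napoleonic_divide),
                   ("First-Look Ambush", first_look_ambush)]:
    print(name, len(plan), "phases,", meteors_planned(plan), "METEORs, valid:", is_valid(plan))
```

Both plans commit all 12 METEORs; they differ in when and from where the shots are taken.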

Live Simulation

Watch It Run

Select a strategy and press Run to watch a single simulation play out. Every run uses different random Pk rolls.
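
A "Pk roll" can be pictured as one random draw per missile compared against that shot's probability of kill. The sketch below shows the idea only: `fire_salvo` and the Pk values are hypothetical, and the simulator's actual Pk tables are not reproduced here.

```python
import random


def fire_salvo(pk_per_missile, rng):
    """Count hits, treating each missile as an independent roll against its Pk."""
    return sum(rng.random() < pk for pk in pk_per_missile)


# Hypothetical Pk values for a four-missile salvo (not the simulator's tables).
salvo = [0.4, 0.4, 0.4, 0.4]

# Different seeds stand in for different simulation runs of the same plan.
for seed in (1, 2, 3):
    rng = random.Random(seed)
    print(f"run {seed}: {fire_salvo(salvo, rng)} hits out of {len(salvo)}")
```

The same plan can therefore end differently from run to run, which is why win rates are reported over many trials rather than from a single playthrough.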

Results — 200,000 Simulations

The Numbers

Aditya — Napoleonic Divide

40.64% win rate

95% CI: [40.35%, 40.94%]

GPT — First-Look Ambush

38.14% win rate

95% CI: [37.84%, 38.44%]
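
The two intervals and the significance claim can be sanity-checked from the win rates alone. The sketch below assumes 100,000 trials per strategy, a normal-approximation (Wald) interval, and a pooled two-proportion z-test. How the published intervals were actually computed is not stated, so this is a plausibility check rather than a reproduction.

```python
import math


def wald_ci(p_hat, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def two_proportion_z(p1, p2, n1, n2):
    """Pooled two-proportion z statistic and two-sided p-value."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the normal
    return z, p_value


N = 100_000                  # trials per strategy
human, gpt = 0.4064, 0.3814  # published win rates

human_lo, human_hi = wald_ci(human, N)
gpt_lo, gpt_hi = wald_ci(gpt, N)
z, p_value = two_proportion_z(human, gpt, N, N)

print(f"Human 95% CI: [{human_lo:.2%}, {human_hi:.2%}]")
print(f"GPT   95% CI: [{gpt_lo:.2%}, {gpt_hi:.2%}]")
print(f"z = {z:.2f}, p < 0.0001: {p_value < 1e-4}")
```

Under these assumptions the intervals agree with the published ones to within about 0.01 percentage points, they do not overlap, and the z statistic is roughly 11, which is consistent with p < 0.0001.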

Metric                       GPT         Aditya
Win rate                     38.14%      40.64%
Avg Blue survivors           1.42 / 3    0.93 / 3
Avg Red killed               2.84 / 4    2.96 / 4
Avg Blue lost                1.58 / 3    2.07 / 3
Kill ratio                   1.794:1     1.429:1
Missiles per kill            4.22        2.59
BVR hit rate                 23.68%      41.06%
WVR hit rate                 N/A         17.66%
Overall hit rate             23.68%      38.54%
Missiles wasted (expired)    2.51        0.68
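
Several rows of the table are related arithmetically, which allows a quick consistency check. The sketch below assumes that kill ratio means Red killed divided by Blue lost, and that missiles per kill means missiles fired divided by Red killed. Those definitions are inferred from the numbers, not stated by the source.

```python
# Published per-run averages from the table above.
rows = {
    "GPT": {"red_killed": 2.84, "blue_lost": 1.58,
            "missiles_per_kill": 4.22, "overall_hit_rate": 0.2368, "wasted": 2.51},
    "Aditya": {"red_killed": 2.96, "blue_lost": 2.07,
               "missiles_per_kill": 2.59, "overall_hit_rate": 0.3854, "wasted": 0.68},
}

derived = {}
for name, r in rows.items():
    kill_ratio = r["red_killed"] / r["blue_lost"]
    # Assumed definition: missiles fired = missiles per kill * kills.
    missiles_fired = r["missiles_per_kill"] * r["red_killed"]
    # If every hit is a kill, fired * hit rate should land near Red killed.
    implied_hits = missiles_fired * r["overall_hit_rate"]
    derived[name] = (kill_ratio, missiles_fired, implied_hits)
    print(f"{name}: kill ratio {kill_ratio:.2f}:1, "
          f"~{missiles_fired:.1f} missiles fired, ~{implied_hits:.2f} implied hits")

# How many times more missiles GPT loses to expiry than Aditya.
waste_ratio = rows["GPT"]["wasted"] / rows["Aditya"]["wasted"]
print(f"Waste ratio: {waste_ratio:.1f}x")
```

The recomputed kill ratios differ from the published ones only in the third significant figure, which is what rounding of the inputs would produce. The implied hits per run land close to the Red-killed averages for both strategies.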

Per-Aircraft Survival Rate

Aircraft    GPT       Human
VIPER-1     53.09%    15.88%
VIPER-2     43.67%    53.53%
VIPER-3     44.83%    23.11%
BANDIT-1    15.19%    18.86%
BANDIT-2    39.59%    24.27%
BANDIT-3    21.11%    37.04%
BANDIT-4    39.9%     23.45%
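
The per-aircraft survival rates tie back to the fleet-level averages: by linearity of expectation, summing the survival probabilities of a side's aircraft gives the expected number of survivors per run. A short check, with the dictionary layout chosen purely for illustration:

```python
# Per-aircraft survival rates as published above.
survival = {
    "GPT": {
        "VIPER-1": 0.5309, "VIPER-2": 0.4367, "VIPER-3": 0.4483,
        "BANDIT-1": 0.1519, "BANDIT-2": 0.3959, "BANDIT-3": 0.2111, "BANDIT-4": 0.399,
    },
    "Human": {
        "VIPER-1": 0.1588, "VIPER-2": 0.5353, "VIPER-3": 0.2311,
        "BANDIT-1": 0.1886, "BANDIT-2": 0.2427, "BANDIT-3": 0.3704, "BANDIT-4": 0.2345,
    },
}

summary = {}
for strategy, rates in survival.items():
    # Expected survivors = sum of individual survival probabilities.
    blue_survivors = sum(p for name, p in rates.items() if name.startswith("VIPER"))
    red_survivors = sum(p for name, p in rates.items() if name.startswith("BANDIT"))
    red_killed = 4 - red_survivors
    summary[strategy] = (blue_survivors, red_killed)
    print(f"{strategy}: avg Blue survivors {blue_survivors:.2f}, avg Red killed {red_killed:.2f}")
```

The sums match the published averages to two decimal places: 1.42 Blue survivors and 2.84 Red killed for GPT, 0.93 and 2.96 for the human plan.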

Analysis

Why Human Wins

Geometry over volume

The flanking attack created better aspect angles on missile impact — BVR hit rate 41% vs 24%. Same missiles, same Pk tables, same engine. The difference is pure positional thinking.

Ordnance efficiency

2.59 missiles per kill vs 4.22. Aditya wasted 0.68 missiles per run to expiry; GPT wasted 2.51, about 3.7× as many, all of them achieving nothing.

Multi-phase design

GPT's strategy has one phase. If BVR doesn't work, there is no plan B. Aditya's strategy has 6 phases — each one creates the conditions for the next.

The counterintuitive finding

GPT is a gambler. Aditya is a general.

GPT's most common outcome is a 3v0 clean sweep (24.2% of runs): spectacular when it works. Aditya's most common outcome is a narrow, grinding 1v0 win (20.4% of runs): rarely clean, but his plan wins more often overall. GPT keeps more Blue aircraft alive (1.42 on average vs 0.93). VIPER-1 survives 53% of runs under GPT but only 16% under Aditya, where it is the aggressive lead attacker absorbing punishment. The human trades platform survival for mission success rate. The AI preserves platforms at the cost of winning.

The thesis this supports

Human strategic judgment (flanking geometry, phase sequencing, deception) combined with AI-powered simulation produces structurally different outcomes than AI alone. The +2.5 pp win-rate delta persists across 100,000 randomized trials per strategy. It is not luck. It is the measurable value of human decision architecture.