I built a Wingspan simulator with a hand-tuned AI. Every decision threshold is a number I guessed. Then I read a paper where Google DeepMind let an AI rewrite game-theory algorithms — and it found better ones than humans designed in 20 years. Here’s what that would look like applied to my project.
Wingspan is a competitive board game where 1–5 players collect birds, lay eggs, cache food, and chain bird powers together to score points across four rounds. It’s strategic, it has hidden information (your hand of bird cards), and every decision ripples through the rest of the game.
I built a Python simulator that plays the full game programmatically — all 446 birds across
four expansions, complete power chains, round goals, bonus cards, and a gym-style API
that returns (observation, reward, done, info)
after every action. It runs about 4.5 complete games per second with no UI.
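That four-tuple is the standard gym contract. A toy stand-in shows the shape of the loop — the class name, internals, and numbers here are illustrative, not the real engine:

```python
import random

class ToyWingspanEnv:
    """Minimal stand-in for the simulator's gym-style API.
    The real simulator plays full Wingspan games; this toy just
    demonstrates the (observation, reward, done, info) protocol."""

    def __init__(self, num_turns=8):
        self.num_turns = num_turns
        self.turn = 0
        self.score = 0

    def reset(self):
        self.turn = 0
        self.score = 0
        return self._observe()

    def _observe(self):
        # The real StateEncoder returns a feature vector; a toy tuple here.
        return (self.turn, self.score)

    def step(self, action):
        self.turn += 1
        reward = random.randint(0, 3)  # stand-in for points gained this turn
        self.score += reward
        done = self.turn >= self.num_turns
        return self._observe(), reward, done, {"score": self.score}

env = ToyWingspanEnv()
obs = env.reset()
done = False
while not done:
    obs, reward, done, info = env.step("play_bird")  # any action id
```

Anything that speaks this protocol — a hand-tuned heuristic, a PPO policy, an evolved scoring function — can drive the same loop.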
The goal was always to train an AI on it. The AI I have so far… well.
The simulator’s AI is a class called SmartAIPlayer.
“Smart” is generous. It works — it plays legal moves, finishes games, scores reasonable points.
But every decision it makes comes down to a hardcoded number that I picked because it
felt about right.
| Threshold | What It Controls | Why This Number? |
|---|---|---|
| birds < 3 | “Early game” — prioritize playing birds | Felt right |
| birds < 8 | “Mid game” — balance all actions | Felt right |
| food > 10 | Stop hoarding food | Seemed like enough |
| hand > 10 | Stop drawing cards | Seemed like enough |
| points × 2 | Base bird attractiveness score | Doubles felt impactful |
| cost × 0.5 | Food cost penalty | Half-penalty felt balanced |
| +3 / +6 | Bonus for birds with powers | Powers seem worth 3–6 points |
| 50% | Chance to prioritize migration birds | Not too often, not too rare |
Here’s the actual bird-scoring function. Every line is a judgment call:
```python
def _score_bird_play(self, bird, action, player, game):
    score = 0.0

    # Base score from bird points
    score += bird.points * 2  # Why 2? Felt impactful.

    # Bonus for birds with powers
    if bird.power:
        score += 3  # Why 3? Seemed fair.
        if bird.power.abstracted_power:
            score += 3  # Why 3 more? Gut feeling.

    # Migration bird bonus (50% chance)
    if bird.name in MIGRATION_BIRDS and random() < 0.50:
        score += 15  # Why 15? Big number = gets played.

    # Prefer cheaper birds
    score -= food_cost * 0.5  # Why 0.5? Half-penalty felt right.

    return score
```
And the egg-laying decision? The most strategically important action in the late game?
```python
def _select_best_egg_laying(self, actions, player, game):
    # For now, just pick randomly from valid egg actions
    # Could be enhanced to prefer birds with more egg capacity
    return random.choice(actions)  # Literally random.
```
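The comment already names the obvious upgrade. A sketch of it — the attribute names (`action.bird`, `egg_capacity`, `eggs`) are guesses about the simulator's data model, not its real API:

```python
import random
from types import SimpleNamespace

def select_best_egg_laying(actions):
    """Prefer laying eggs on the bird with the most free egg capacity.
    Attribute names here are assumptions, not the simulator's real API."""
    def free_capacity(action):
        return action.bird.egg_capacity - action.bird.eggs

    best = max(free_capacity(a) for a in actions)
    # Tie-break randomly among equally good options
    candidates = [a for a in actions if free_capacity(a) == best]
    return random.choice(candidates)

# Tiny demo with stand-in action objects:
a1 = SimpleNamespace(bird=SimpleNamespace(egg_capacity=4, eggs=3))  # 1 free slot
a2 = SimpleNamespace(bird=SimpleNamespace(egg_capacity=6, eggs=1))  # 5 free slots
chosen = select_best_egg_laying([a1, a2])
```

Even this is just a different guess — which is exactly why it belongs in the set of functions handed to the evolution loop rather than hand-tuned further.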
There are actually two heuristic AIs in the codebase — SmartAIPlayer
in the AI module and a separate smart_action_selector
in the simulator script. They evolved independently and disagree on the thresholds.
One says early game ends at 3 birds. The other uses round number instead.
One caps food at 10. The other at 4.
Two hand-tuned heuristics for the same game, written by the same person, couldn’t even converge on the same numbers.
I wrote a full explainer of this paper, but here’s the short version. Google DeepMind built AlphaEvolve, a system that takes a working algorithm, uses an LLM (Gemini) to propose code mutations, evaluates each variant by running it on actual games, keeps the winners, and repeats for hundreds of generations.
They pointed it at two families of game-theory algorithms that researchers had been refining for over 20 years. AlphaEvolve discovered two new variants that beat the hand-designed ones on 10 out of 11 benchmark games. The evolved algorithms contained tricks no human proposed — like throwing away the first 500 iterations of data (counterintuitive, but it works) and multiplying certain values by exactly 1.1 (no theoretical justification, but empirically superior).
The paper’s algorithms had the same structure as my Wingspan AI: hardcoded thresholds,
fixed orderings, manually chosen constants. The only difference is that theirs were designed by
PhD researchers and mine were designed by me guessing. AlphaEvolve beat both kinds.
If it can improve algorithms refined by experts over two decades,
it can definitely improve score += bird.points * 2.
The AlphaEvolve approach needs three things: a seed algorithm to start from, a fitness function to measure quality, and code to mutate. The Wingspan simulator already has all three.
**The seed algorithm:** SmartAIPlayer — it already plays legal, complete games. It’s not optimal, but it’s a working starting point. AlphaEvolve doesn’t need a good algorithm; it needs a functional one to evolve from.
**The fitness function:** Average final score across N games. The simulator already returns final_scores at game end. Run 100 games, take the mean — a higher average score means a better algorithm. Simple, fast, and already built.
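As code, the whole fitness function is a few lines; `run_game` is a placeholder for whatever entry point the simulator actually exposes:

```python
import statistics
from itertools import cycle

def fitness(ai_player, run_game, n_games=100):
    """Mean final score over n_games: the single number evolution optimizes.
    `run_game` is a stand-in for the simulator's game loop; it should
    return the AI's final score for one complete game."""
    scores = [run_game(ai_player) for _ in range(n_games)]
    return statistics.mean(scores)

# Demo with a fake game that scores 70, 80, 90 in rotation:
fake_scores = cycle([70, 80, 90])
result = fitness(None, lambda ai: next(fake_scores), n_games=6)
```

At ~4.5 games per second, a 100-game evaluation takes well under a minute — fast enough to score hundreds of candidate algorithms per day.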
**The code to mutate:** Three functions: _get_action_priorities() (what to do),
_score_bird_play() (which bird to play),
_select_best_food_to_gain() (which food to take).
These contain all the guessed numbers.
**The mutation engine:** AlphaEvolve itself used Gemini and isn’t publicly available, but the approach works with any code-capable LLM. Claude, GPT-4, or even a local model — it just needs to read a Python function and propose a modified version.
Here’s what the evolution might produce. The current scoring function is a flat linear formula. An evolved version might discover non-linear interactions, conditional logic, or phase-dependent weights — things that work but that no human would think to try:
The current version:

```python
score = bird.points * 2
score += 3 if bird.power else 0
score -= food_cost * 0.5
# 3 constants, all guesses
```
A hypothetical evolved version:

```python
score = bird.points * 2.3
score += 4.7 if bird.power else 0
score -= food_cost * 0.8
if round > 2 and eggs < 4:
    score += egg_capacity * 1.4  # non-obvious conditional the AI found
```
AlphaEvolve isn’t publicly available, but the idea is reproducible. There are three realistic ways to apply it to the Wingspan simulator, each at a different level of complexity.
**Path 1: Genetic algorithm.** Extract all 50+ thresholds into a parameter vector. Use a standard genetic algorithm to mutate values, run tournaments, keep winners. No LLM needed — just numerical optimization. The simplest path and the quickest to validate.
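A minimal sketch of that loop — a (1+λ) evolution strategy over the threshold vector. The toy fitness and starting values below are illustrative only; in practice `fitness` would run 100 simulated games:

```python
import random

def mutate(params, sigma=0.2):
    """Perturb each threshold by Gaussian noise scaled to its magnitude."""
    return [p + random.gauss(0, sigma * max(abs(p), 1)) for p in params]

def evolve(seed_params, fitness, generations=50, population=20):
    """Keep the best mutant each generation; never discard the champion."""
    best, best_fit = seed_params, fitness(seed_params)
    for _ in range(generations):
        for _ in range(population):
            candidate = mutate(best)
            f = fitness(candidate)
            if f > best_fit:
                best, best_fit = candidate, f
    return best, best_fit

# Demo on a toy fitness: score peaks when params approach (3, 10, 2.0)
target = [3, 10, 2.0]

def toy_fitness(p):
    return -sum((a - b) ** 2 for a, b in zip(p, target))

params, score = evolve([3, 10, 0.5], toy_fitness)
```

Swap `toy_fitness` for the 100-game mean and the same loop tunes the real thresholds.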
**Path 2: Reinforcement learning.** The gym-style API is already built. Hook up PPO or a similar algorithm via
stable-baselines3.
The StateEncoder already produces
observation vectors. Train against copies of itself.
**Path 3: DIY AlphaEvolve.** Send _score_bird_play() to the Claude API
with “propose a better version.” Run 100 games with each variant.
Keep the highest-scoring one. Repeat. This is literally what the paper does.
Path 1 (genetic algorithm) is the fastest to build and would already produce better thresholds than my guesses. Path 3 (DIY AlphaEvolve) is the most exciting because it can discover structural changes to the code — new conditionals, new factors, entirely new decision logic — not just better numbers. The paper’s most impressive discoveries were structural, not numerical.
This isn’t really about Wingspan. It’s about a pattern that shows up everywhere: a developer writes a heuristic with guessed constants, it works well enough, and nobody goes back to optimize it. Scheduling algorithms, recommendation scores, resource allocation weights, retry backoff timers — they’re all full of numbers someone picked because they felt right.
The AlphaEvolve paper shows that LLMs can do more than just write code to spec. They can explore the space of possible code and find solutions humans wouldn’t think to try. The Wingspan simulator is a perfect sandbox for that experiment — a complete game engine, a measurable fitness function, and an AI full of numbers that are begging to be replaced by better ones.
Every number in SmartAIPlayer is a guess.
The AlphaEvolve paper showed that AI can replace human guesses with empirically better values —
and discover entirely new decision logic in the process.
The simulator is ready. The fitness function is built. The only missing piece is the evolution loop.