A compact research and experimentation repository for micro-scale adversarial game environments built with Streamlit, Matplotlib, and lightweight search + learning agents.
This repository contains three standalone strategy arenas:
- Hex-Line Arena — a 7-hex connection game with dual win conditions.
- Quantum Match Arena — a 2×2 token-control game with placement and flip actions.
- Shift-3 Arena — a 5-square sliding-piece game with repetition-aware tactical play.
Each module is designed as a self-contained environment with:
- explicit game-state logic,
- legal move generation,
- terminal-state detection,
- heuristic evaluation,
- MCTS-based search,
- alpha-beta negamax support,
- tabular reinforcement-style updates,
- self-play training,
- visual analytics,
- agent save/load packaging.
- Project Overview
- Repository Highlights
- Included Arenas
- AI and Search Architecture
- Installation
- Running the Apps
- Training Workflow
- Saving and Loading Agents
- Repository Structure
- Dependencies
- Design Notes
- Research Use Cases
- Future Extensions
- License
Micro-State-Space explores how much strategic depth can be extracted from very small, fully specified game worlds.
The project is intentionally focused on micro state spaces rather than large-scale gameplay. That design choice makes the environments ideal for:
- rapid search experiments,
- self-play comparisons,
- heuristic tuning,
- policy/value approximation,
- explanation-friendly AI behavior,
- reproducible game-theoretic analysis.
Although the boards are compact, the decision spaces are non-trivial. Each arena combines:
- tactical forcing moves,
- board geometry constraints,
- tempo and mobility management,
- threat detection,
- and multiple layers of evaluation.
Each game can be run separately as its own Streamlit application.
The agents combine:
- MCTS + PUCT for guided exploration,
- Negamax + Alpha-Beta pruning for tactical lookahead,
- heuristic evaluation for policy/value estimates,
- Q-learning style tables for tabular reinforcement updates,
- self-play training for iterative improvement.
Each module includes visual tools such as:
- board renderers,
- move-history plots,
- training curves,
- state summaries,
- winner indicators,
- and compact diagnostics for tactical patterns.
Trained agents can be:
- serialized,
- bundled into a ZIP archive,
- downloaded,
- restored later,
- and reused across sessions.
Hex-Line is a minimalist connection game played on a 7-hex cluster:
- 1 center hex
- 6 surrounding outer hexes
The game is deliberately small, but the geometry creates meaningful tactical tension.
A player wins by either:
-
forming a line through the center:
[1, 0, 4][2, 0, 5][3, 0, 6]
-
controlling 4 or more outer hexes
This dual-win structure makes the game fast, decisive, and tie-free in spirit.
- Board size: 7 cells
- Players: 2
- Move type: place one token on an empty hex
- Termination: immediate win or exhaustion of legal moves
- State encoding: board tuple + current player
- History tracking: move list and event log
Hex-Line rewards:
- center control,
- outer-ring domination,
- fork creation,
- blocking line completion,
- and tactical tempo efficiency.
The Hex-Line agent includes:
- MCTS with PUCT selection,
- negamax with alpha-beta pruning,
- heuristic policy/value estimation,
- self-play reinforcement,
- tabular Q-learning updates,
- epsilon-based exploration,
- temperature-based sampling during training.
The module includes:
- board rendering,
- training charts,
- move statistics,
- agent serialization,
- and ZIP-based load/save utilities.
Quantum Match is a 2×2 grid token-control game with a shared pool of 4 tokens.
Each turn, a player may:
- PLACE a token from the shared pool onto an empty square, or
- FLIP an opponent-controlled token to their own color
A player wins when they control all 4 squares after a move.
- Board size: 4 cells
- Token pool: 4 total tokens
- Move types: place / flip
- State encoding: board tuple + current player + remaining pool
- Termination: all squares controlled by one player
- History tracking: structured action log
Quantum Match emphasizes:
- material conversion,
- tempo,
- board saturation,
- flip pressure,
- and short-horizon tactical dominance.
Because every move can either expand presence or reverse control, the game has a compact but highly dynamic decision surface.
The Quantum Match agent includes:
- MCTS + PUCT,
- alpha-beta negamax search,
- heuristic evaluation of ownership and flip pressure,
- self-play learning,
- Q-learning style value tables,
- exploration decay over time,
- policy table updates from game outcomes.
The module includes:
- board rendering,
- action-history plots,
- training analytics,
- agent serialization,
- and ZIP packaging for portable checkpoints.
Shift-3 is a 5-square linear sliding game where pieces do not merely get placed once; they can also be shifted left or right, which makes the board dynamic and potentially cyclical.
Each player can:
- PLACE a piece,
- or SLIDE an existing piece by one square left or right
The primary winning pattern is:
- Surround pattern:
[You, Opponent, You]
This gives the game a positional, pattern-based identity rather than a pure occupancy identity.
- Board size: 5 cells
- Move types: place / slide
- Repetition awareness: position cycling is tracked
- Termination safeguards: max move limit and repeated-state logic
- History tracking: full move record and event log
Shift-3 is particularly useful for studying:
- mobility constraints,
- positional loops,
- repetition handling,
- threat distance,
- sliding piece geometry,
- and constrained search in a small state space.
The Shift-3 agent includes:
- MCTS with PUCT,
- negamax + alpha-beta pruning,
- evaluation focused on surround threats and mobility,
- loop-aware tactical reasoning,
- self-play reinforcement,
- Q-learning style table updates,
- exploration and temperature control.
The module includes:
- board rendering,
- heatmap visualization,
- move-history analysis,
- training charts,
- agent save/load support,
- and ZIP packaging.
The repository uses a layered decision stack that is intentionally lightweight, inspectable, and appropriate for tiny game spaces.
Used to explore candidate actions by simulating outcomes and accumulating visit/value statistics.
The tree policy uses a PUCT-style formulation to balance:
- prior preference,
- visit counts,
- exploitation,
- exploration.
Used as a tactical search baseline to:
- evaluate forcing sequences,
- prune low-value branches,
- and provide deeper local reasoning than shallow heuristics alone.
Each game includes game-specific evaluation terms, such as:
- center control,
- outer-ring pressure,
- flip pressure,
- surround distance,
- mobility advantage,
- blocking potential,
- repetition risk.
Agents maintain compact state-action or state-value tables, making the system:
- easy to inspect,
- easy to serialize,
- and easy to experiment with.
The repository includes self-play training loops that can:
- generate episodes,
- collect outcome statistics,
- update tables,
- decay exploration,
- and visualize learning progress.
- Python 3.10 or later
- Streamlit
- Matplotlib
- Pandas
- NumPy
pip install streamlit matplotlib pandas numpyIf you are using a virtual environment:
python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS / Linux
source .venv/bin/activate
pip install streamlit matplotlib pandas numpyEach file is a standalone Streamlit app.
streamlit run hex_line.pystreamlit run quantum_match.pystreamlit run shift3.pyEach arena includes a self-play workflow in the sidebar.
Typical training flow:
- open the app,
- choose the number of training episodes,
- configure search depth / simulations / learning rate settings,
- start self-play training,
- review the chart outputs,
- save the resulting agents as a ZIP archive.
During training, the agents may:
- choose exploratory actions,
- decay epsilon over time,
- decay temperature over time,
- update Q-style tables,
- collect win/loss statistics,
- and store episode-level metrics.
The repository surfaces trends such as:
- win rates,
- draw rates,
- epsilon decay,
- temperature decay,
- average game length,
- move distributions,
- and training progress over episodes.
The repository supports portable agent bundles.
Each ZIP archive contains:
agent1.jsonagent2.jsonconfig.json
The saved data typically includes:
- learning rate,
- gamma,
- epsilon,
- temperature,
- search depth,
- MCTS simulations,
- Q-tables or state tables,
- and configuration metadata.
This makes it easy to:
- preserve trained agents,
- compare checkpoints,
- share experiment snapshots,
- and reload models for later analysis.
- train agents,
- download the ZIP,
- upload the ZIP later,
- resume evaluation or comparison.
The repository currently centers on three Python modules:
Micro-State-Space/
├── hex_line.py
├── quantum_match.py
└── shift3.py
-
hex_line.py
Implements the Hex-Line board, AI agent, visualizations, training loop, and save/load workflow. -
quantum_match.py
Implements the Quantum Match board, structured actions, AI agent, visualizations, training loop, and save/load workflow. -
shift3.py
Implements the Shift-3 board, sliding mechanics, repetition-aware logic, AI agent, visualizations, training loop, and save/load workflow.
The code base uses the following Python libraries:
- streamlit — web UI and app framework
- matplotlib — board rendering and analytics
- pandas — tabular history analysis
- numpy — numerical helpers, especially in Shift-3
- random, math, time, json, zipfile, io, copy, dataclasses, typing — standard library support for game logic, serialization, and training
The project intentionally uses compact boards to keep the state space small enough for:
- deep inspection,
- fast testing,
- and readable AI behavior.
The rules are encoded directly in Python, making the environments:
- reproducible,
- transparent,
- and easy to debug.
The code is suitable for examining:
- search-vs-heuristic tradeoffs,
- self-play dynamics,
- reinforcement from compact state spaces,
- and game-specific evaluation functions.
Rather than hiding logic inside a large neural network, the repository uses explicit, inspectable scoring terms and search procedures. That makes it especially valuable for educational and experimental use.
This repository can be used for:
- adversarial search demonstrations,
- AI agent benchmarking,
- heuristic evaluation experiments,
- tabular reinforcement learning tests,
- self-play comparisons,
- compact state-space analysis,
- teaching MCTS and alpha-beta pruning,
- studying repetition handling in board games,
- studying board geometry and tactical forcing patterns.
Possible directions for further development include:
- adding a unified launcher page for all three games,
- introducing stronger evaluation functions,
- recording and replaying complete game transcripts,
- exporting training logs to CSV or Parquet,
- adding head-to-head tournament modes,
- implementing deeper search variants,
- integrating visualization dashboards for agent comparisons,
- and building a shared configuration layer for all modules.
MIT License
Copyright (c) 2026 Devanik
Micro-State-Space is a technically focused collection of tiny adversarial environments built for clarity, experimentation, and controlled AI evaluation.
Its core value lies in the combination of:
- small boards,
- complete rule definitions,
- search-based decision making,
- self-play learning,
- rich visual analytics,
- and portable agent checkpoints.
Even though the environments are minimal, the design space is not. Each arena is compact enough to inspect, yet rich enough to study meaningful tactical and algorithmic behavior.
- Minimal state spaces with rich strategy
- Search-driven decision making (MCTS + Alpha-Beta)
- Lightweight reinforcement learning (tabular)
- Self-play and policy refinement
- Interactive AI visualization
