Territory
mp.territory provides native PettingZoo ports of Melting Pot's Territory
variants. Players claim resource walls with a paintbrush or short paint beam,
wait for paint to dry, and receive stochastic reward from active territory.
Screenshots
| Variant | Screenshot |
|---|---|
| Open | ![]() |
| Rooms | ![]() |
| Inside Out | ![]() |
API
See the generated Territory API reference for signatures and public objects.
Use the family dispatcher:
Short names and upstream Melting Pot config names are accepted:
| Short name | Upstream alias | Players | Topology | Global RGB |
|---|---|---|---|---|
| `open` | `territory__open` | 9 | bounded | (184, 312, 3) |
| `rooms` | `territory__rooms` | 9 | torus | (168, 168, 3) |
| `inside_out` | `territory__inside_out` | 5 | bounded | (184, 184, 3) |
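
Calling code may want to validate a variant name before building an environment. The sketch below keeps the table above as a small lookup that accepts either naming scheme. It is illustrative only: `VariantInfo`, `VARIANTS`, and `resolve_variant` are hypothetical names, not part of `mp.territory`, and the values are copied from the table.

```python
from typing import NamedTuple


class VariantInfo(NamedTuple):
    """One row of the variant table (hypothetical helper, not part of mp.territory)."""
    short_name: str
    upstream_alias: str
    players: int
    topology: str
    global_rgb_shape: tuple


VARIANTS = (
    VariantInfo("open", "territory__open", 9, "bounded", (184, 312, 3)),
    VariantInfo("rooms", "territory__rooms", 9, "torus", (168, 168, 3)),
    VariantInfo("inside_out", "territory__inside_out", 5, "bounded", (184, 184, 3)),
)

# Index every row under both its short name and its upstream alias.
_BY_NAME = {}
for _info in VARIANTS:
    _BY_NAME[_info.short_name] = _info
    _BY_NAME[_info.upstream_alias] = _info


def resolve_variant(name: str) -> VariantInfo:
    """Return the table row for a short name or an upstream alias."""
    try:
        return _BY_NAME[name]
    except KeyError:
        raise ValueError(f"unknown Territory variant: {name!r}") from None
```

Both `resolve_variant("rooms")` and `resolve_variant("territory__rooms")` return the same row, so downstream code can read the player count or topology without caring which name the caller used.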
Direct imports are also available:
```python
from mp.territory.open import TerritoryOpenConfig, parallel_env

config = TerritoryOpenConfig()
env = parallel_env(config=config, render_mode="rgb_array")
```
Agents are named player_0, player_1, and so on. Each infos[agent]
includes meltingpot_player_index, preserving Melting Pot's 1-based player ID.
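
A minimal random-action rollout might look like the sketch below. It assumes the environment follows the standard PettingZoo Parallel API (`reset`, `step`, `agents`, `action_space`). The helper name `random_rollout` and its parameters are hypothetical. Whether `meltingpot_player_index` is already present in the infos returned by `reset` is not stated here, so the sketch only reads it when the key exists.

```python
def random_rollout(env, max_steps=100, seed=0):
    """Step a PettingZoo parallel env with random actions.

    Returns per-agent total reward and the Melting Pot player index
    reported in each agent's info dict.
    """
    observations, infos = env.reset(seed=seed)
    totals = {agent: 0.0 for agent in env.agents}
    player_index = {}

    for _ in range(max_steps):
        if not env.agents:  # every agent has terminated or been truncated
            break
        # One random action per live agent, keyed by agent name.
        actions = {agent: env.action_space(agent).sample() for agent in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)
        for agent, reward in rewards.items():
            totals[agent] = totals.get(agent, 0.0) + reward
        for agent, info in infos.items():
            if "meltingpot_player_index" in info:
                player_index[agent] = info["meltingpot_player_index"]
    return totals, player_index
```

With the environment built as above, `random_rollout(env)` would return two dictionaries keyed by `player_0`, `player_1`, and so on.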
Actions and observations
The action space is Discrete(9):
| Action | Meaning |
|---|---|
| 0 | no-op |
| 1 | forward |
| 2 | backward |
| 3 | step left |
| 4 | step right |
| 5 | turn left |
| 6 | turn right |
| 7 | fire zap |
| 8 | fire claim paint |
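
Scripted policies and tests are often easier to read with named actions than with bare integers. The enum below is a hypothetical convenience that mirrors the table above and is not something `mp.territory` is known to export. The nine-agent list simply follows the `player_N` naming convention.

```python
from enum import IntEnum


class TerritoryAction(IntEnum):
    """Action indices from the table above (the enum itself is hypothetical)."""
    NOOP = 0
    FORWARD = 1
    BACKWARD = 2
    STEP_LEFT = 3
    STEP_RIGHT = 4
    TURN_LEFT = 5
    TURN_RIGHT = 6
    FIRE_ZAP = 7
    FIRE_CLAIM = 8


# Build one joint action: every agent idles except player_0,
# which fires claim paint.
agents = [f"player_{i}" for i in range(9)]
actions = {agent: int(TerritoryAction.NOOP) for agent in agents}
actions["player_0"] = int(TerritoryAction.FIRE_CLAIM)
```

A dictionary of this shape is what a PettingZoo parallel environment expects as the argument to `step`.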
Default observations are:
| Key | Shape | Type |
|---|---|---|
| `RGB` | (88, 88, 3) | `uint8` |
| `READY_TO_SHOOT` | () | `float64` |
Set observation_mode="global" to return the world RGB frame for every agent.
Mechanics
Resource walls start unclaimed. A player claims a resource either by facing it
with the paintbrush or by firing the claim beam. Claimed paint dries after 25
steps by default; active resources then deliver +1 reward stochastically at
rate 0.01 per step.
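
As a back-of-the-envelope check on these defaults, the expected payoff of holding territory can be estimated in closed form. The sketch assumes that every active resource pays out independently with probability 0.01 on each step once its paint has dried, that the paint is applied at step 0, and that nothing is zapped or repainted in the meantime. Those assumptions, the function name, and its parameter names are illustrative and are not taken from the `mp.territory` implementation.

```python
def expected_territory_reward(
    num_resources,
    horizon,
    dry_steps=25,
    reward_rate=0.01,
    reward_value=1.0,
):
    """Expected total reward from resources claimed at step 0 and held for `horizon` steps.

    Resources earn nothing while the paint dries, then pay `reward_value`
    with probability `reward_rate` on each remaining step (an assumption).
    """
    active_steps = max(0, horizon - dry_steps)
    return num_resources * active_steps * reward_rate * reward_value
```

Under these assumptions, ten resources held for 1,025 steps would be expected to yield about 100 reward in total, while any holding period of 25 steps or fewer yields nothing.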
Zap beams damage resources and sanction players. A resource is permanently destroyed after two zap hits and becomes passable floor. A player is frozen by the first zap sanction and permanently removed by the second. Removed players' claimed resources return to the unclaimed state.
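
The two-hit rules above can be pictured as simple hit counters. The sketch below is a simplified model with hypothetical function names. It ignores freeze durations and any recovery of hit counts over time, neither of which is specified here, and it is not the `mp.territory` implementation.

```python
def resource_state(zap_hits):
    """State of a resource wall after a number of zap hits."""
    if zap_hits >= 2:
        return "destroyed"  # permanently becomes passable floor
    return "damaged" if zap_hits == 1 else "intact"


def player_state(zap_sanctions):
    """State of a player after a number of zap sanctions."""
    if zap_sanctions >= 2:
        return "removed"  # permanently out of the episode
    return "frozen" if zap_sanctions == 1 else "active"


def release_claims(owners, removed_player):
    """Return a copy of a resource -> owner map with the removed player's claims cleared.

    `None` stands for the unclaimed state.
    """
    return {
        resource: (None if owner == removed_player else owner)
        for resource, owner in owners.items()
    }
```

For example, if `player_1` is removed, `release_claims({"wall_a": "player_1", "wall_b": "player_2"}, "player_1")` leaves `wall_a` unclaimed and `wall_b` unchanged.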
Reward strategy and failure modes
Agents maximize reward by claiming resources, defending them long enough for paint to dry, and expanding only when the expected active-territory reward beats the cost of travel and conflict. Good policies repair contested borders, avoid destroying valuable resources, and sanction selectively when it protects more future reward than it removes.
Bad equilibria include mutually destructive zap wars that turn resources into rubble, over-expansion that leaves claimed walls undefended, and local truces where everyone paints small safe corners while high-value contested resources remain inactive. Removed players also reset their territory, so repeated sanctions can lower total resource yield for everyone.
Playable notebooks
Launch a variant notebook with:
Controls are shared across the family: WASD/arrows move, Q/E turn, SPACE fires the zap beam, F or left Shift fires claim paint, TAB switches the active player, and ESC closes pygame.


