Clean Up¶
clean_up/clean_up is a native PettingZoo port of Melting Pot's seven-player
Clean Up substrate. Players collect apples in an orchard, clean river dirt with
a beam, and can zap each other.
Screenshot¶

API¶
See the generated Clean Up API reference for signatures and public objects.
Use the family dispatcher:
Or import the substrate directly:
from mp.clean_up.clean_up import CleanUpConfig, parallel_env
config = CleanUpConfig()
env = parallel_env(config=config, render_mode="rgb_array")
Agents are named player_0 through player_6. Each infos[agent] includes
meltingpot_player_index, preserving Melting Pot's 1-based player ID.
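A minimal sketch of a random rollout, assuming the environment follows the standard PettingZoo parallel API (`reset`, `step`, `agents`, `action_space`). The helper name `random_rollout` is hypothetical and not part of this package; it also collects the `meltingpot_player_index` reported in `infos` after reset.

```python
from typing import Any, Dict, Optional, Tuple


def random_rollout(
    env: Any, max_steps: int = 100, seed: int = 0
) -> Tuple[Dict[str, float], Dict[str, Optional[int]]]:
    """Step a PettingZoo parallel env with random actions.

    Returns per-agent summed rewards and the 1-based upstream player
    index found in each agent's info dict (None if the key is absent).
    """
    observations, infos = env.reset(seed=seed)
    player_ids = {
        agent: info.get("meltingpot_player_index") for agent, info in infos.items()
    }
    returns = {agent: 0.0 for agent in env.agents}
    for _ in range(max_steps):
        if not env.agents:
            # The parallel API empties env.agents once the episode is over.
            break
        actions = {agent: env.action_space(agent).sample() for agent in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)
        for agent, reward in rewards.items():
            returns[agent] = returns.get(agent, 0.0) + float(reward)
    return returns, player_ids
```

With the substrate imported as shown above, `random_rollout(env)` would be the intended call; the exact mapping from agent name to player index should be read from `infos` rather than assumed.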
Actions and observations¶
The action space is Discrete(9):
| Action | Meaning |
|---|---|
| 0 | no-op |
| 1 | forward |
| 2 | backward |
| 3 | step left |
| 4 | step right |
| 5 | turn left |
| 6 | turn right |
| 7 | fire zap |
| 8 | fire clean |
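For readability in agent code, the action indices listed above can be wrapped in an enum. This is a convenience sketch; the class name `CleanUpAction` is hypothetical and is not exported by the package.

```python
from enum import IntEnum


class CleanUpAction(IntEnum):
    """Named indices for the Discrete(9) action space."""

    NOOP = 0
    FORWARD = 1
    BACKWARD = 2
    STEP_LEFT = 3
    STEP_RIGHT = 4
    TURN_LEFT = 5
    TURN_RIGHT = 6
    FIRE_ZAP = 7
    FIRE_CLEAN = 8


# IntEnum members behave like ints, so they can be used directly as
# action values in an actions dict.
example_actions = {"player_0": CleanUpAction.FIRE_CLEAN, "player_1": CleanUpAction.FORWARD}
```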
Default per-agent observations are:
| Key | Shape | Type |
|---|---|---|
| RGB | (88, 88, 3) | uint8 |
| READY_TO_SHOOT | () | float64 |
| NUM_OTHERS_WHO_CLEANED_THIS_STEP | () | float64 |
state() and render() with render_mode="rgb_array" both return the global world
RGB frame with shape (168, 240, 3).
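The documented shapes and types can be checked with a small helper. This is a sketch under the assumption that each observation value exposes `.shape` and `.dtype` attributes, as NumPy arrays do; `EXPECTED_OBSERVATIONS`, `WORLD_RGB_SHAPE`, and `observation_mismatches` are hypothetical names introduced here.

```python
from typing import Any, Dict, List, Mapping, Tuple

# Default per-agent observation layout, copied from the table above.
EXPECTED_OBSERVATIONS: Dict[str, Tuple[Tuple[int, ...], str]] = {
    "RGB": ((88, 88, 3), "uint8"),
    "READY_TO_SHOOT": ((), "float64"),
    "NUM_OTHERS_WHO_CLEANED_THIS_STEP": ((), "float64"),
}

# Shape of the global world frame.
WORLD_RGB_SHAPE = (168, 240, 3)


def observation_mismatches(observation: Mapping[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the layout matches."""
    problems = []
    for key, (shape, dtype) in EXPECTED_OBSERVATIONS.items():
        if key not in observation:
            problems.append(f"{key}: missing")
            continue
        value = observation[key]
        actual_shape = tuple(getattr(value, "shape", ()))
        actual_dtype = str(getattr(value, "dtype", type(value).__name__))
        if actual_shape != shape or actual_dtype != dtype:
            problems.append(
                f"{key}: expected {shape} {dtype}, got {actual_shape} {actual_dtype}"
            )
    return problems
```

A check like this is most useful in a test that runs once after `reset`, so that a changed config is caught before training starts.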
Mechanics¶
Live apples give the collecting player +1, then return to a wait state. Apple
growth depends on river cleanliness and uses the upstream defaults:
max_apple_growth_rate=0.05, threshold_depletion=0.4, and
threshold_restoration=0.0.
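To build intuition for how these three defaults interact, here is a sketch that assumes the per-step growth probability interpolates linearly between the two thresholds as a function of the dirty fraction of the river. This page does not state the formula, so treat the linear form and the function name `apple_growth_probability` as assumptions to verify against the implementation.

```python
def apple_growth_probability(
    dirt_fraction: float,
    max_apple_growth_rate: float = 0.05,
    threshold_depletion: float = 0.4,
    threshold_restoration: float = 0.0,
) -> float:
    """Assumed linear model: full growth at the restoration threshold,
    zero growth at or above the depletion threshold."""
    if threshold_depletion == threshold_restoration:
        raise ValueError("thresholds must differ")
    interpolation = (dirt_fraction - threshold_depletion) / (
        threshold_restoration - threshold_depletion
    )
    # Clamp so the result stays within [0, max_apple_growth_rate].
    interpolation = min(max(interpolation, 0.0), 1.0)
    return max_apple_growth_rate * interpolation
```

Under this assumed model, a fully clean river gives the maximum rate of 0.05, a river that is 20% dirty gives half of it, and growth stops once 40% or more of the river is dirty.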
Initial dirt follows the upstream map. After frame 50, each step has a 0.5
probability of turning one clean river site dirty. The clean beam has
cooldown 2, length 3, and radius 1. The zap beam has cooldown 10, length 3,
and radius 1; zapped avatars respawn after 50 frames.
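The dirt spawn rule can be sketched as a single function. This is an illustration, not the environment's code: the name `maybe_spawn_dirt`, the uniform choice among clean sites, and the reading of "after frame 50" as `frame > 50` are all assumptions.

```python
import random
from typing import Optional, Sequence, Tuple

Site = Tuple[int, int]


def maybe_spawn_dirt(
    clean_sites: Sequence[Site],
    frame: int,
    rng: random.Random,
    start_frame: int = 50,
    probability: float = 0.5,
) -> Optional[Site]:
    """Return the clean river site that becomes dirty this step, or None."""
    if frame <= start_frame or not clean_sites:
        return None  # spawning has not started, or the river is fully dirty
    if rng.random() >= probability:
        return None  # no spawn on this step
    # Assumption: the site is picked uniformly among currently clean sites.
    return rng.choice(list(clean_sites))
```

At most one site changes per step under this rule, so with the default probability the river gains roughly one dirty site every two steps unless players clean it.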
Reward strategy and failure modes¶
The orchard pays immediately, but sustained high reward requires enough
players to clean the river so that apple growth remains high. Productive agents
alternate between harvesting and cleaning, using the
NUM_OTHERS_WHO_CLEANED_THIS_STEP observation to avoid having everyone clean at
once when apples are already plentiful.
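As a rough illustration of using that observation, a hand-written agent might switch roles based on how many other players cleaned on the previous step. The function `choose_role` and the `target_cleaners` threshold are hypothetical; the right threshold is something to tune, not a value from this environment.

```python
def choose_role(num_others_cleaned: float, target_cleaners: int = 2) -> str:
    """Pick 'clean' when too few others are cleaning, otherwise 'harvest'.

    num_others_cleaned is the NUM_OTHERS_WHO_CLEANED_THIS_STEP observation.
    target_cleaners is a hypothetical tuning knob, not an environment setting.
    """
    if num_others_cleaned < target_cleaners:
        return "clean"
    return "harvest"
```

A learned policy would not use a fixed rule like this, but the sketch shows why the observation helps agents avoid both under-cleaning and over-cleaning.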
The classic bad equilibrium is a tragedy of the commons: every player harvests while nobody cleans, apple growth collapses, and the group settles into low reward. The opposite failure is over-cleaning, where too many players maintain the river but leave ripe apples uncollected. Zap-heavy policing can also shrink the labor pool, since zapped avatars are out of play until they respawn, and make both problems worse.
Playable notebook¶
Launch the playable notebook with:
Controls:
- WASD or arrow keys move the active player.
- Q/E turn the active player.
- SPACE fires the zap beam.
- C fires the clean beam.
- TAB switches the controlled player.
- ESC closes the pygame window.
The notebook exposes higher apple growth and dirt spawn controls for playability. The environment defaults remain parity-oriented.