Clean Up

clean_up/clean_up is a native PettingZoo port of Melting Pot's seven-player Clean Up substrate. Players collect apples in an orchard, clean river dirt with a beam, and can zap each other.

Screenshot

[Image: Clean Up global state]

API

See the generated Clean Up API reference for signatures and public objects.

Use the family dispatcher:

from mp.clean_up import env, parallel_env

parallel = parallel_env("clean_up")
aec = env("clean_up")

Or import the substrate directly:

from mp.clean_up.clean_up import CleanUpConfig, parallel_env

config = CleanUpConfig()
env = parallel_env(config=config, render_mode="rgb_array")

Agents are named player_0 through player_6. Each infos[agent] includes meltingpot_player_index, preserving Melting Pot's 1-based player ID.
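The naming convention above implies a simple mapping between agent names and Melting Pot player IDs. The following is a minimal sketch, assuming that player_k corresponds to the 1-based index k + 1; the helper expected_player_index is hypothetical and not part of the package.

```python
NUM_PLAYERS = 7  # Clean Up is a seven-player substrate


def expected_player_index(agent: str) -> int:
    """Return the 1-based Melting Pot index assumed for an agent name.

    Assumption: "player_k" maps to index k + 1. Check the value against
    infos[agent]["meltingpot_player_index"] before relying on it.
    """
    prefix, _, suffix = agent.rpartition("_")
    if prefix != "player" or not suffix.isdigit():
        raise ValueError(f"unexpected agent name: {agent!r}")
    zero_based = int(suffix)
    if not 0 <= zero_based < NUM_PLAYERS:
        raise ValueError(f"agent index out of range: {agent!r}")
    return zero_based + 1


agents = [f"player_{i}" for i in range(NUM_PLAYERS)]
index_by_agent = {agent: expected_player_index(agent) for agent in agents}
```

A helper like this can serve as a sanity check in tests that compare agent names with the value reported in infos.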

Actions and observations

The action space is Discrete(9):

Action  Meaning
0       no-op
1       forward
2       backward
3       step left
4       step right
5       turn left
6       turn right
7       fire zap
8       fire clean
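When writing policies or tests, named constants can be easier to read than bare integers. The enum below is a sketch that mirrors the action table; the class name CleanUpAction and the member names are hypothetical and are not exported by the package.

```python
from enum import IntEnum


class CleanUpAction(IntEnum):
    """Hypothetical names for the Discrete(9) action indices."""

    NOOP = 0
    FORWARD = 1
    BACKWARD = 2
    STEP_LEFT = 3
    STEP_RIGHT = 4
    TURN_LEFT = 5
    TURN_RIGHT = 6
    FIRE_ZAP = 7
    FIRE_CLEAN = 8


# Actions that fire a beam and are therefore subject to a cooldown.
BEAM_ACTIONS = frozenset({CleanUpAction.FIRE_ZAP, CleanUpAction.FIRE_CLEAN})


def is_beam_action(action: int) -> bool:
    """Return True if the integer action fires the zap or clean beam."""
    return CleanUpAction(action) in BEAM_ACTIONS
```

Because IntEnum members compare equal to plain integers, they can be passed wherever an integer action is expected.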

Default per-agent observations are:

Key                                Shape        Type
RGB                                (88, 88, 3)  uint8
READY_TO_SHOOT                     ()           float64
NUM_OTHERS_WHO_CLEANED_THIS_STEP   ()           float64

Both state() and render() (with render_mode="rgb_array") return the global world RGB frame, which has shape (168, 240, 3).
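The pixel shapes can be related to grid cells with a little arithmetic. This sketch assumes 8-pixel square sprites, which is the upstream Melting Pot convention but is not stated on this page; the helper cells is hypothetical.

```python
SPRITE_SIZE = 8  # assumption: pixels per grid cell, as in upstream Melting Pot


def cells(pixel_shape, sprite_size=SPRITE_SIZE):
    """Convert a (height, width, channels) pixel shape to (rows, cols) of cells."""
    height, width, _channels = pixel_shape
    if height % sprite_size or width % sprite_size:
        raise ValueError("frame is not a whole number of sprites")
    return height // sprite_size, width // sprite_size


agent_view = cells((88, 88, 3))    # (11, 11): the window each player observes
world_view = cells((168, 240, 3))  # (21, 30): rows and columns of the full map
```

Under that assumption, each player sees an 11 by 11 window of a 21 by 30 world.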

Mechanics

Collecting a live apple gives the player a reward of +1, after which the apple returns to a wait state. Apple growth depends on river cleanliness and uses the upstream defaults: max_apple_growth_rate=0.05, threshold_depletion=0.4, and threshold_restoration=0.0.
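To make the role of the three growth parameters concrete, here is a sketch of one plausible growth rule. It assumes the growth probability is interpolated linearly between the two thresholds as a function of the fraction of river sites that are dirty, and clamped to the range from 0 to the maximum rate. This page does not spell out the formula, so treat the function (a hypothetical helper) as an illustration modeled on the upstream substrate rather than the port's exact code.

```python
MAX_APPLE_GROWTH_RATE = 0.05
THRESHOLD_DEPLETION = 0.4
THRESHOLD_RESTORATION = 0.0


def apple_growth_probability(
    dirt_fraction: float,
    max_rate: float = MAX_APPLE_GROWTH_RATE,
    depletion: float = THRESHOLD_DEPLETION,
    restoration: float = THRESHOLD_RESTORATION,
) -> float:
    """Per-step regrowth probability for a waiting apple (assumed linear rule).

    dirt_fraction is the share of river sites that are currently dirty.
    At or above `depletion`, growth stops; at or below `restoration`,
    growth runs at `max_rate`.
    """
    if restoration == depletion:
        raise ValueError("thresholds must differ")
    interpolation = (dirt_fraction - depletion) / (restoration - depletion)
    interpolation = min(max(interpolation, 0.0), 1.0)  # clamp to [0, 1]
    return max_rate * interpolation
```

With the defaults, a fully clean river gives the maximum rate of 0.05, a river that is 20% dirty gives half of that, and growth stops once 40% or more of the river is dirty.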

The initial dirt layout comes from the upstream map. After frame 50, one clean river site may become dirty on each step, with probability 0.5. The clean beam has cooldown 2, length 3, and radius 1. The zap beam has cooldown 10, length 3, and radius 1; zapped avatars respawn after 50 frames.
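The dirt spawn rule can be sketched as a small simulation. Several details here are assumptions: that "after frame 50" means frame numbers strictly greater than 50, that the site is chosen uniformly at random among clean sites, and that the river is a toy 4 by 5 grid. The function spawn_dirt_step is hypothetical.

```python
import random

DIRT_SPAWN_DELAY = 50          # no new dirt during the first 50 frames
DIRT_SPAWN_PROBABILITY = 0.5   # chance per step that one site becomes dirty


def spawn_dirt_step(frame, clean_sites, dirty_sites, rng):
    """Possibly move one river site from clean to dirty.

    Returns True if a site became dirty on this frame.
    """
    if frame <= DIRT_SPAWN_DELAY or not clean_sites:
        return False
    if rng.random() >= DIRT_SPAWN_PROBABILITY:
        return False
    site = rng.choice(sorted(clean_sites))  # sorted for reproducibility
    clean_sites.remove(site)
    dirty_sites.add(site)
    return True


rng = random.Random(0)
clean = {(row, col) for row in range(4) for col in range(5)}  # toy river
dirty = set()
spawn_frames = [
    frame for frame in range(1, 101)
    if spawn_dirt_step(frame, clean, dirty, rng)
]
```

Without any cleaning, the toy river accumulates at most one dirty site per step once the delay has passed.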

Reward strategy and failure modes

The orchard pays immediately, but sustained high reward requires enough players to clean the river so that apple growth remains high. Productive agents alternate between harvesting and cleaning, and use the NUM_OTHERS_WHO_CLEANED_THIS_STEP observation so that they do not all clean at once when apples are already plentiful.

Bad equilibria are the classic tragedy of the commons: every player harvests while nobody cleans, apple growth collapses, and the group settles into low reward. The opposite failure is over-cleaning, where too many players maintain the river but leave ripe apples uncollected. Zap-heavy policing can also reduce the labor pool and make both problems worse.
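The trade-off described above can be illustrated with a toy role-selection heuristic. Everything in it is hypothetical: the function name, the thresholds, and the apples_visible input, which a real agent would have to estimate from its RGB observation. It is a sketch of the idea, not a recommended or tuned policy.

```python
def choose_role(
    num_others_cleaned: float,
    apples_visible: int,
    *,
    min_cleaners: int = 2,
    plentiful_apples: int = 3,
) -> str:
    """Pick "clean" or "harvest" for the current step.

    num_others_cleaned is the NUM_OTHERS_WHO_CLEANED_THIS_STEP observation.
    When apples are plentiful, a single other cleaner is treated as enough;
    otherwise at least `min_cleaners` others must be cleaning before this
    agent switches to harvesting.
    """
    needed = 1 if apples_visible >= plentiful_apples else min_cleaners
    return "harvest" if num_others_cleaned >= needed else "clean"
```

The heuristic never harvests when nobody else is cleaning, which guards against the tragedy of the commons, and it releases agents to harvest once others cover the river, which guards against over-cleaning.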

Playable notebook

Launch the playable notebook with:

uv run marimo run notebooks/clean_up/clean_up_mo.py

Controls:

  • WASD or arrow keys move the active player.
  • Q/E turn the active player.
  • SPACE fires the zap beam.
  • C fires the clean beam.
  • TAB switches the controlled player.
  • ESC closes the pygame window.

The notebook exposes higher apple growth and dirt spawn controls for playability. The environment defaults remain parity-oriented.