Boat Race
mp.boat_race provides a native PettingZoo port of Melting Pot's
boat_race__eight_races substrate. Six players repeatedly choose partners,
board two-seat boats, and coordinate row/flail actions to cross the river and
reach the active apple bank.
API
See the generated Boat Race API reference for signatures and public objects.
Use the family dispatcher:
```python
from mp.boat_race import env, parallel_env

parallel = parallel_env("eight_races")
aec = env("boat_race__eight_races")
```
Direct imports are also available:
```python
from mp.boat_race.eight_races import BoatRaceEightRacesConfig, parallel_env

config = BoatRaceEightRacesConfig()
env = parallel_env(config=config, render_mode="rgb_array")
```
Agents are named player_0 through player_5. Each infos[agent] includes
meltingpot_player_index, preserving Melting Pot's 1-based player ID.
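As a rough illustration of how the parallel environment might be driven, the sketch below runs random actions through the standard PettingZoo parallel loop (`reset`, `action_space(agent).sample()`, `step`). `random_rollout` and `StubParallelEnv` are hypothetical names introduced here, not part of `mp.boat_race`; the stub only stands in for the real environment so the loop can be dry-run without the package, and its mapping of `player_0` to index 1 is an assumption about how the 1-based ID lines up with agent names.

```python
import random


def random_rollout(env, max_steps=100, seed=0):
    """Drive a PettingZoo-style parallel env with random actions.

    Returns the summed reward per agent.
    """
    observations, infos = env.reset(seed=seed)
    returns = {agent: 0.0 for agent in env.agents}
    for _ in range(max_steps):
        if not env.agents:  # parallel envs empty this list once the episode ends
            break
        actions = {agent: env.action_space(agent).sample() for agent in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)
        for agent, reward in rewards.items():
            returns[agent] = returns.get(agent, 0.0) + reward
    return returns


class _StubSpace:
    """Minimal stand-in for a Discrete(n) action space."""

    def __init__(self, n, rng):
        self.n = n
        self._rng = rng

    def sample(self):
        return self._rng.randrange(self.n)


class StubParallelEnv:
    """Hypothetical stand-in for the Boat Race parallel env (no game logic)."""

    def __init__(self, num_players=6, horizon=5):
        self.possible_agents = [f"player_{i}" for i in range(num_players)]
        self.agents = []
        self._horizon = horizon
        self._t = 0
        self._rng = random.Random(0)

    def reset(self, seed=None, options=None):
        self._rng = random.Random(seed)
        self.agents = list(self.possible_agents)
        self._t = 0
        observations = {agent: None for agent in self.agents}
        # Assumption: player_0 corresponds to 1-based Melting Pot index 1.
        infos = {
            agent: {"meltingpot_player_index": i + 1}
            for i, agent in enumerate(self.agents)
        }
        return observations, infos

    def action_space(self, agent):
        return _StubSpace(9, self._rng)  # Discrete(9), per the action table

    def step(self, actions):
        self._t += 1
        done = self._t >= self._horizon
        agents = list(self.agents)
        observations = {agent: None for agent in agents}
        rewards = {agent: 0.0 for agent in agents}
        terminations = {agent: False for agent in agents}
        truncations = {agent: done for agent in agents}
        infos = {agent: {} for agent in agents}
        if done:
            self.agents = []
        return observations, rewards, terminations, truncations, infos


if __name__ == "__main__":
    print(random_rollout(StubParallelEnv(), max_steps=50))
```

With the package installed, the same helper should accept the real environment, for example `random_rollout(parallel_env("eight_races"))`.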
Actions and observations
The action space is Discrete(9):
| Action | Meaning |
|---|---|
| 0 | no-op |
| 1 | forward |
| 2 | backward |
| 3 | step left |
| 4 | step right |
| 5 | turn left |
| 6 | turn right |
| 7 | row |
| 8 | flail |
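To avoid magic numbers in policies and tests, the action table can be mirrored as an enum. This is a minimal sketch: `BoatRaceAction` and `describe` are hypothetical names introduced here, and the package may or may not export an equivalent.

```python
from enum import IntEnum


class BoatRaceAction(IntEnum):
    """Integer codes from the action table above (hypothetical helper)."""

    NOOP = 0
    FORWARD = 1
    BACKWARD = 2
    STEP_LEFT = 3
    STEP_RIGHT = 4
    TURN_LEFT = 5
    TURN_RIGHT = 6
    ROW = 7
    FLAIL = 8


def describe(action: int) -> str:
    """Return a readable label for an integer action; raises ValueError if out of range."""
    return BoatRaceAction(action).name.lower().replace("_", " ")


if __name__ == "__main__":
    for action in BoatRaceAction:
        print(int(action), describe(action))
```

Because `IntEnum` members compare equal to plain integers, `BoatRaceAction.ROW` can be passed anywhere an action index is expected.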
Default per-agent observations contain RGB with shape (88, 88, 3).
Set observation_mode="global" to return the world RGB frame, with shape
(304, 208, 3), to every agent instead. state() and render_mode="rgb_array"
also return the global world frame.
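The two observation shapes differ noticeably in size, which matters when sizing replay buffers. The arithmetic below is a sketch that assumes one byte per channel value (uint8 frames); the dtype is an assumption, not something stated here, and `frame_bytes` and `step_bytes` are hypothetical helpers.

```python
from math import prod

LOCAL_SHAPE = (88, 88, 3)     # default per-agent frame
GLOBAL_SHAPE = (304, 208, 3)  # world frame with observation_mode="global"
NUM_PLAYERS = 6
BYTES_PER_VALUE = 1           # assumption: uint8 pixels


def frame_bytes(shape, bytes_per_value=BYTES_PER_VALUE):
    """Bytes needed to store one frame of the given shape."""
    return prod(shape) * bytes_per_value


def step_bytes(shape, num_players=NUM_PLAYERS):
    """Bytes needed to store one frame per player for a single step."""
    return num_players * frame_bytes(shape)


if __name__ == "__main__":
    print("local frame bytes:", frame_bytes(LOCAL_SHAPE))
    print("global frame bytes:", frame_bytes(GLOBAL_SHAPE))
    print("local step bytes:", step_bytes(LOCAL_SHAPE))
    print("global step bytes:", step_bytes(GLOBAL_SHAPE))
```

Under that assumption a global frame is roughly eight times the size of a per-agent frame, so storing global observations for all six agents is seldom worthwhile when a single `state()` frame would do.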
Mechanics
Each episode contains eight races. A race cycle has a 75-frame partner phase, a short semaphore transition, and a 225-frame boat race. Players board open seats during partner choice. Once both seats are occupied, the boat is full and the rowers' movement is locked until they land or are disqualified.
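The race cycle can be sketched as a frame-to-phase lookup. The length of the semaphore transition is not given above, so it is a required parameter here, and the value 10 in the usage lines is only a placeholder. The sketch also assumes cycles run back to back from frame 0 in the order partner, semaphore, race; `phase_at` is a hypothetical helper, not part of the package.

```python
PARTNER_FRAMES = 75
RACE_FRAMES = 225
NUM_RACES = 8


def phase_at(frame, semaphore_frames):
    """Map an episode frame to (race_index, phase), or None after the last race.

    semaphore_frames is an assumption supplied by the caller; the real
    transition length is not documented here.
    """
    if frame < 0:
        raise ValueError("frame must be non-negative")
    cycle = PARTNER_FRAMES + semaphore_frames + RACE_FRAMES
    race_index, offset = divmod(frame, cycle)
    if race_index >= NUM_RACES:
        return None
    if offset < PARTNER_FRAMES:
        phase = "partner"
    elif offset < PARTNER_FRAMES + semaphore_frames:
        phase = "semaphore"
    else:
        phase = "race"
    return race_index, phase


if __name__ == "__main__":
    for frame in (0, 74, 75, 85, 309, 310):
        print(frame, phase_at(frame, semaphore_frames=10))
```

A lookup like this can help a scripted agent decide when to head for a seat and when to start rowing, provided the real transition length is substituted for the placeholder.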
When both rowers row in the same frame, the boat moves one cell toward the
active bank. A flail stroke gives the boat a 0.1 chance to move; a rower who
rows while their partner flails receives -0.5. Row has a 5-frame cooldown.
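The row/flail rules can be written out as a single-frame resolution function. This is a sketch under stated assumptions rather than the substrate's actual logic: each flail stroke is assumed to roll its 0.1 chance independently, the boat is assumed to move at most one cell per frame, and the 5-frame row cooldown is left out. `resolve_stroke` and the string action labels are hypothetical.

```python
import random

ROW, FLAIL, IDLE = "row", "flail", "idle"
FLAIL_MOVE_PROB = 0.1    # chance that a flail stroke moves the boat
MISMATCH_PENALTY = -0.5  # reward for rowing while the partner flails


def resolve_stroke(action_a, action_b, rng, flail_move_prob=FLAIL_MOVE_PROB):
    """Resolve one frame for a full boat.

    Returns (moved, reward_a, reward_b).
    """
    # A rower is penalized only when they row and their partner flails.
    reward_a = MISMATCH_PENALTY if (action_a == ROW and action_b == FLAIL) else 0.0
    reward_b = MISMATCH_PENALTY if (action_b == ROW and action_a == FLAIL) else 0.0

    if action_a == ROW and action_b == ROW:
        moved = True  # synchronized rowing always advances one cell
    else:
        # Assumption: every flail stroke gets its own independent roll.
        flails = [a for a in (action_a, action_b) if a == FLAIL]
        moved = any(rng.random() < flail_move_prob for _ in flails)
    return moved, reward_a, reward_b


if __name__ == "__main__":
    rng = random.Random(0)
    print(resolve_stroke(ROW, ROW, rng))
    print(resolve_stroke(ROW, FLAIL, rng))
    print(resolve_stroke(FLAIL, FLAIL, rng))
```

Passing the random generator in explicitly keeps the function deterministic under a fixed seed, which makes it easier to compare scripted strategies.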
Respawning apples are active on one bank at a time and flip banks after each race. Single apples in the river reset between races. Players who fail to land before the race ends are disqualified; the episode ends early if all players are disqualified at a 100-frame check.
Reward strategy and failure modes
Optimal reward comes from quickly forming reliable pairs, boarding the same boat, rowing in sync, and reaching the active apple bank before the phase ends. Once a pair has landed, agents should gather nearby apples and be ready to adapt when the next race flips the active side.
Bad equilibria include mistrust cycles, where one partner flails while the other rows and the rower absorbs repeated penalties; congestion around boat seats; and assortative pairing failures, where some players are repeatedly stranded or disqualified while the same small coalition captures most of the apple reward.
Playable notebook
Launch the playable notebook with:
Controls: WASD/arrows move, Q/E turn, SPACE rows, F or left Shift flails, TAB switches the active player, and ESC closes pygame.