Daycare¶
daycare/daycare is a native PettingZoo port of Melting Pot's Daycare
substrate. A child and a parent coordinate around fruit: the child gets hungry
and needs bananas, while the parent can reliably pick fruit but perceives
bananas as apples.
API¶
See the generated Daycare API reference for signatures and public objects.
Use the family dispatcher:
from mp.daycare import env, parallel_env
parallel = parallel_env("daycare", render_mode="rgb_array")
aec = env("daycare")
Or import the variant directly:
from mp.daycare.daycare import DaycareConfig, parallel_env
config = DaycareConfig(observation_mode="global")
env = parallel_env(config=config)
Agents are named player_0 and player_1. By default, player_0 is the child
and player_1 is the parent.
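A random rollout is a quick way to confirm that an environment built with either import style steps correctly. The sketch below is not part of the package: `random_rollout` is a hypothetical helper, and it assumes the environment follows the PettingZoo parallel API as found in recent releases, where `reset` returns `(observations, infos)` and `step` returns five dictionaries keyed by agent name.

```python
import random
from typing import Any, Dict


def random_rollout(env: Any, max_steps: int = 100, seed: int = 0) -> Dict[str, float]:
    """Step a parallel-API env with random actions; return per-agent reward totals.

    Hypothetical helper. Assumes `env` exposes `agents`, `reset(seed=...)`
    and `step(actions)` in the PettingZoo parallel style.
    """
    rng = random.Random(seed)
    observations, infos = env.reset(seed=seed)
    totals = {agent: 0.0 for agent in env.agents}
    for _ in range(max_steps):
        if not env.agents:
            # Parallel envs empty `agents` once the episode is over.
            break
        # Daycare actions are the integers 0..8 (Discrete(9)).
        actions = {agent: rng.randrange(9) for agent in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)
        for agent, reward in rewards.items():
            totals[agent] = totals.get(agent, 0.0) + float(reward)
    return totals
```

With the package installed, `random_rollout(parallel_env("daycare"))` should return a dictionary keyed by `player_0` and `player_1`.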
Actions and observations¶
The action space is Discrete(9):
| Action | Meaning |
|---|---|
| 0 | no-op |
| 1 | forward |
| 2 | backward |
| 3 | step left |
| 4 | step right |
| 5 | turn left |
| 6 | turn right |
| 7 | eat |
| 8 | grasp or drop |
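Named constants can make policies and tests easier to read than bare integers. The enum below is an illustrative sketch: the integer values come from the action list above, but the `Action` class and its member names are hypothetical and are not exported by the package.

```python
from enum import IntEnum


class Action(IntEnum):
    """Hypothetical names for the nine Daycare action indices."""

    NOOP = 0
    FORWARD = 1
    BACKWARD = 2
    STEP_LEFT = 3
    STEP_RIGHT = 4
    TURN_LEFT = 5
    TURN_RIGHT = 6
    EAT = 7
    GRASP_OR_DROP = 8


# IntEnum members compare equal to their integer values. Wrap them in int()
# if the environment insists on plain Python ints.
actions = {"player_0": int(Action.EAT), "player_1": int(Action.GRASP_OR_DROP)}
```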
Default per-agent observations are:
| Key | Shape | Type |
|---|---|---|
| RGB | (88, 88, 3) | uint8 |
| HUNGER | () | float64 |
state() and render_mode="rgb_array" return the global world RGB frame with
shape (104, 160, 3). Pass observation_mode="global" to return a global RGB
observation for each agent.
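When wiring the environment into a training loop, it can help to check that each default observation matches the documented layout. This is a minimal sketch, assuming observation values are NumPy arrays or scalars that expose `.shape` and `.dtype`; `EXPECTED` and `observation_problems` are hypothetical names, not part of the package.

```python
from typing import Any, List, Mapping

# Default per-agent layout, copied from the observation table above.
EXPECTED = {
    "RGB": ((88, 88, 3), "uint8"),
    "HUNGER": ((), "float64"),
}


def observation_problems(obs: Mapping[str, Any]) -> List[str]:
    """Return human-readable mismatches; an empty list means obs matches."""
    problems = []
    for key, (shape, dtype) in EXPECTED.items():
        if key not in obs:
            problems.append(f"missing key {key!r}")
            continue
        value = obs[key]
        actual_shape = getattr(value, "shape", None)
        if actual_shape is None or tuple(actual_shape) != shape:
            problems.append(f"{key}: shape {actual_shape}, expected {shape}")
        actual_dtype = str(getattr(value, "dtype", None))
        if actual_dtype != dtype:
            problems.append(f"{key}: dtype {actual_dtype}, expected {dtype}")
    return problems
```

The check applies only to the default observation mode; with `observation_mode="global"` the RGB shape is expected to differ.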
Mechanics¶
Fruit sites are empty, apple trees, apple shrubs, banana trees, or banana
shrubs. Parent grasping succeeds on trees and shrubs. Child grasping fails on
trees and succeeds on shrubs with probability 0.3.
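The grasp rules can be summarized as a small lookup. The sketch below is a reading of the paragraph above, not the substrate's implementation: the site and role strings are hypothetical labels, and treating an empty site as an automatic failure is an assumption.

```python
import random

# Hypothetical labels for the five fruit-site types.
SITES = ("empty", "apple_tree", "apple_shrub", "banana_tree", "banana_shrub")


def grasp_success_probability(role: str, site: str) -> float:
    """Probability that a grasp by `role` at `site` succeeds."""
    if site not in SITES:
        raise ValueError(f"unknown site: {site!r}")
    if site == "empty":
        return 0.0  # assumption: nothing to grasp at an empty site
    if role == "parent":
        return 1.0  # parent succeeds on trees and shrubs
    if role == "child":
        # child fails on trees and succeeds on shrubs with probability 0.3
        return 0.3 if site.endswith("_shrub") else 0.0
    raise ValueError(f"unknown role: {role!r}")


def attempt_grasp(role: str, site: str, rng: random.Random) -> bool:
    """Sample one grasp outcome using the probabilities above."""
    return rng.random() < grasp_success_probability(role, site)
```

Under this reading the child needs a little over three attempts on average to pick from a shrub, which is why the parent is the more dependable picker.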
The child earns reward and resets hunger by eating bananas. The parent earns reward for eating fruit only while the child is active. If the child stays hungry for 200 live frames, it enters a wait state and respawns near the parent after 100 frames.
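The reward conditions and the child's timeout can be sketched as a small state machine. This is an illustrative model under stated assumptions, not the substrate's code: it assumes the 200-frame count covers consecutive hungry frames, that the count resets whenever the child is not hungry, and that respawning clears the count. All names are hypothetical, and reward magnitudes are not modeled.

```python
from dataclasses import dataclass

HUNGRY_FRAMES_BEFORE_WAIT = 200  # from the text above
WAIT_FRAMES_BEFORE_RESPAWN = 100  # from the text above


def earns_eating_reward(role: str, fruit: str, child_active: bool) -> bool:
    """Whether eating `fruit` yields reward for `role`."""
    if role == "child":
        return fruit == "banana"
    if role == "parent":
        return child_active  # any fruit, but only while the child is active
    raise ValueError(f"unknown role: {role!r}")


@dataclass
class ChildLifecycle:
    """Tracks the child's active and wait states frame by frame."""

    hungry_frames: int = 0
    wait_frames: int = 0
    active: bool = True

    def eat_banana(self) -> None:
        if self.active:
            self.hungry_frames = 0  # eating a banana resets hunger

    def tick(self, hungry: bool) -> None:
        """Advance one frame; `hungry` only matters while active."""
        if self.active:
            # Assumption: count consecutive hungry frames only.
            self.hungry_frames = self.hungry_frames + 1 if hungry else 0
            if self.hungry_frames >= HUNGRY_FRAMES_BEFORE_WAIT:
                self.active = False
                self.wait_frames = 0
        else:
            self.wait_frames += 1
            if self.wait_frames >= WAIT_FRAMES_BEFORE_RESPAWN:
                self.active = True
                self.hungry_frames = 0  # assumption: respawn clears hunger
```

In this model the parent loses 100 frames of reward eligibility every time the child times out, which is the cost a feeding strategy has to avoid.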
Reward strategy and failure modes¶
The best joint strategy keeps the child fed with bananas while letting the parent collect fruit whenever the child is active. The parent is the reliable picker and can move fruit to the child; the child should eat bananas to reset hunger and use shrubs opportunistically when direct pickup succeeds.
Bad equilibria include the parent eating while the child is absent, the child starving because bananas are misidentified or not delivered, and both agents chasing apples that do not reset child hunger. The asymmetric perceptions make naive policies brittle: the parent cannot visually distinguish bananas, so successful coordination often depends on location conventions.
Playable notebook¶
Launch the playable notebook with:
Controls:
- WASD or arrow keys move the active player.
- Q/E turn the active player.
- SPACE grasps or drops fruit.
- C eats held fruit.
- TAB switches the controlled player.
- ESC closes the pygame window.
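For scripted play or tests, the movement and interaction keys can be paired with action indices. The table below is a guess at that pairing, not the notebook's actual binding: it assumes W/S map to forward/backward, A/D to the step actions, and that each arrow key mirrors the corresponding WASD key. `KEY_TO_ACTION` and `action_for_key` are hypothetical names. TAB and ESC are left out because they control the viewer rather than an agent.

```python
# Assumed key-to-action pairing; action indices follow the Discrete(9) list.
KEY_TO_ACTION = {
    "w": 1, "up": 1,        # forward
    "s": 2, "down": 2,      # backward
    "a": 3, "left": 3,      # step left
    "d": 4, "right": 4,     # step right
    "q": 5,                 # turn left
    "e": 6,                 # turn right
    "c": 7,                 # eat held fruit
    "space": 8,             # grasp or drop
}


def action_for_key(key: str) -> int:
    """Return the action index for a key name; unmapped keys give no-op (0)."""
    return KEY_TO_ACTION.get(key.lower(), 0)
```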