Daycare

daycare/daycare is a native PettingZoo port of Melting Pot's Daycare substrate. A child and a parent coordinate around fruit: the child gets hungry and needs bananas, while the parent can reliably pick fruit but perceives bananas as apples.

Screenshot

Daycare global state

API

See the generated Daycare API reference for signatures and public objects.

Use the family dispatcher:

from mp.daycare import env, parallel_env

parallel = parallel_env("daycare", render_mode="rgb_array")
aec = env("daycare")

Or import the variant directly:

from mp.daycare.daycare import DaycareConfig, parallel_env

config = DaycareConfig(observation_mode="global")
env = parallel_env(config=config)

Agents are named player_0 and player_1. By default, player_0 is the child and player_1 is the parent.
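A random rollout is a quick way to check that the environment steps correctly. The sketch below assumes the object returned by parallel_env follows the standard PettingZoo parallel API (reset(seed=...), agents, action_space(agent), step(actions)). The helper name run_random_episode and its parameters are illustrative, not part of the package.

```python
from typing import Any, Dict


def run_random_episode(env: Any, max_steps: int = 500, seed: int = 0) -> Dict[str, float]:
    """Step a PettingZoo-style parallel env with random actions.

    Returns the summed reward per agent. `env` is assumed to follow the
    standard parallel API; nothing here is specific to Daycare.
    """
    observations, infos = env.reset(seed=seed)
    totals = {agent: 0.0 for agent in env.agents}
    for _ in range(max_steps):
        if not env.agents:
            # Parallel envs conventionally empty `agents` once the episode ends.
            break
        actions = {agent: env.action_space(agent).sample() for agent in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)
        for agent, reward in rewards.items():
            totals[agent] = totals.get(agent, 0.0) + float(reward)
    return totals
```

With the package installed, something like run_random_episode(parallel_env("daycare")) should return a dictionary keyed by player_0 and player_1.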

Actions and observations

The action space is Discrete(9):

Action  Meaning
0       no-op
1       forward
2       backward
3       step left
4       step right
5       turn left
6       turn right
7       eat
8       grasp or drop
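Named constants can make policies and logs easier to read than bare integers. The enum below mirrors the action table. The class and member names are hypothetical and are not claimed to exist in the package.

```python
from enum import IntEnum


class DaycareAction(IntEnum):
    """Hypothetical names for the nine discrete actions in the table."""

    NOOP = 0
    FORWARD = 1
    BACKWARD = 2
    STEP_LEFT = 3
    STEP_RIGHT = 4
    TURN_LEFT = 5
    TURN_RIGHT = 6
    EAT = 7
    GRASP_OR_DROP = 8


def describe(action: int) -> str:
    """Return a readable label; raises ValueError for values outside 0..8."""
    return DaycareAction(action).name.lower().replace("_", " ")
```

Because IntEnum members are plain integers, they can be passed wherever the environment expects an action index.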

Default per-agent observations are:

Key     Shape        Type
RGB     (88, 88, 3)  uint8
HUNGER  ()           float64

state() and render() (with render_mode="rgb_array") both return the global world RGB frame with shape (104, 160, 3). Pass observation_mode="global" to give each agent a global RGB observation instead of its default per-agent view.

Mechanics

Each fruit site is empty or holds an apple tree, apple shrub, banana tree, or banana shrub. Parent grasping always succeeds on both trees and shrubs. Child grasping always fails on trees and succeeds on shrubs only with probability 0.3.
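The grasp rules can be written as a small function for reasoning about how often the child can feed itself. This is a sketch of the rules as described here, not the environment's implementation. The role and site strings are hypothetical, and treating a grasp on an empty site as a failure is an assumption.

```python
import random

CHILD_SHRUB_SUCCESS = 0.3  # probability stated in the Mechanics section


def grasp_succeeds(role: str, site: str, rng: random.Random) -> bool:
    """Return True if a grasp by `role` on `site` yields fruit.

    `role` is "parent" or "child"; `site` is one of the hypothetical labels
    "empty", "apple_tree", "apple_shrub", "banana_tree", "banana_shrub".
    """
    if role not in ("parent", "child"):
        raise ValueError(f"unknown role: {role!r}")
    if site == "empty":
        return False  # assumption: nothing to pick from an empty site
    if role == "parent":
        return True  # parent picks reliably from trees and shrubs
    if site.endswith("_tree"):
        return False  # child cannot pick from trees
    return rng.random() < CHILD_SHRUB_SUCCESS
```

Under these rules the child needs a little over three shrub attempts per fruit on average (1 / 0.3), which is why delivery by the parent matters.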

The child earns reward and resets its hunger by eating bananas. The parent earns reward for eating fruit only while the child is active. If the child stays hungry for 200 live frames, it enters a wait state and respawns near the parent after 100 frames.
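The hunger and respawn timing can be pictured as a two-phase counter. The sketch below is one reading of the rule above, under stated assumptions: hungry frames are counted consecutively, eating a banana clears the count, and hunger resets on respawn. The class, function, and the base_reward value are hypothetical.

```python
from dataclasses import dataclass

HUNGRY_FRAMES_BEFORE_WAIT = 200   # from the Mechanics section
WAIT_FRAMES_BEFORE_RESPAWN = 100  # from the Mechanics section


@dataclass
class ChildTimer:
    """Tracks whether the child is active or in the wait state."""

    hungry_frames: int = 0
    wait_frames: int = 0
    waiting: bool = False

    @property
    def child_active(self) -> bool:
        return not self.waiting

    def tick(self, hungry: bool, ate_banana: bool = False) -> None:
        """Advance one frame."""
        if self.waiting:
            self.wait_frames += 1
            if self.wait_frames >= WAIT_FRAMES_BEFORE_RESPAWN:
                # Respawn: assume all counters reset.
                self.waiting = False
                self.wait_frames = 0
                self.hungry_frames = 0
            return
        if ate_banana or not hungry:
            self.hungry_frames = 0
            return
        self.hungry_frames += 1
        if self.hungry_frames >= HUNGRY_FRAMES_BEFORE_WAIT:
            self.waiting = True


def parent_eat_reward(timer: ChildTimer, base_reward: float = 1.0) -> float:
    """Parent is rewarded for eating only while the child is active."""
    return base_reward if timer.child_active else 0.0
```

The gating in parent_eat_reward is what ties the parent's incentive to the child's survival: a parent that lets the child time out earns nothing for up to 100 frames.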

Reward strategy and failure modes

The best joint strategy keeps the child fed with bananas while letting the parent collect fruit whenever the child is active. The parent is the reliable picker and can move fruit to the child; the child should eat bananas to reset hunger and use shrubs opportunistically when direct pickup succeeds.

Bad equilibria include the parent eating while the child is absent (which earns no reward), the child starving because bananas are misidentified or not delivered, and both agents chasing apples, which do not reset child hunger. The asymmetric perceptions make naive policies brittle: the parent cannot visually distinguish bananas from apples, so successful coordination often depends on location conventions.

Playable notebook

Launch the playable notebook with:

uv run marimo run notebooks/daycare/daycare_mo.py

Controls:

  • WASD or arrow keys move the active player.
  • Q/E turn the active player.
  • SPACE grasps or drops fruit.
  • C eats held fruit.
  • TAB switches the controlled player.
  • ESC closes the pygame window.
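For scripted input or a custom front end, the controls above can be expressed as a lookup from key names to action indices. The exact direction bound to each movement key is an assumption (the list only says that WASD and the arrow keys move the player), and the key-name strings and function name are hypothetical. TAB and ESC are interface controls rather than environment actions, so they are not in the table.

```python
# Action indices follow the Discrete(9) table in "Actions and observations".
KEY_TO_ACTION = {
    "W": 1, "UP": 1,        # forward (assumed binding)
    "S": 2, "DOWN": 2,      # backward (assumed binding)
    "A": 3, "LEFT": 3,      # step left (assumed binding)
    "D": 4, "RIGHT": 4,     # step right (assumed binding)
    "Q": 5,                 # turn left
    "E": 6,                 # turn right
    "C": 7,                 # eat held fruit
    "SPACE": 8,             # grasp or drop
}


def action_for_key(key: str) -> int:
    """Map a key name to an action index; unmapped keys become no-op (0)."""
    return KEY_TO_ACTION.get(key.upper(), 0)
```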