The Matrix: Prisoner's Dilemma Repeated

the_matrix/prisoners_dilemma_repeated is a member of the native Matrix-game family. It ports Melting Pot's two-player, repeated Prisoner's Dilemma in the Matrix substrate to a PettingZoo ParallelEnv.

Screenshot

[Image: The Matrix Prisoner's Dilemma repeated global state]

API

Use the family dispatcher:

from mp.the_matrix import env, parallel_env

# Parallel API: every agent submits an action each step.
parallel = parallel_env("prisoners_dilemma_repeated")
# AEC API: agents act one at a time.
aec = env("prisoners_dilemma_repeated")

The dispatcher also accepts the upstream name: "prisoners_dilemma_in_the_matrix__repeated".

Or import the variant directly:

from mp.the_matrix.prisoners_dilemma_repeated import (
    PrisonersDilemmaRepeatedConfig,
    parallel_env,
)

config = PrisonersDilemmaRepeatedConfig()
env = parallel_env(config=config, render_mode="rgb_array")

Agents are player_0 and player_1. Each infos[agent] includes meltingpot_player_index, preserving Melting Pot's 1-based player ID.
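As a minimal sketch of driving the environment, the helper below rolls out random actions through the standard PettingZoo Parallel API (reset, step, agents, action_space). The function name run_random_episode and its parameters are hypothetical, and the sketch assumes that this port accepts a seed argument to reset() and empties agents when an episode ends, as PettingZoo environments conventionally do.

```python
from typing import Dict


def run_random_episode(parallel, max_steps: int = 200, seed: int = 0) -> Dict[str, float]:
    """Roll out uniformly random actions and return each agent's total reward.

    `parallel` is any object that follows the PettingZoo Parallel API,
    for example the result of parallel_env("prisoners_dilemma_repeated").
    """
    observations, infos = parallel.reset(seed=seed)
    returns = {agent: 0.0 for agent in parallel.agents}

    for _ in range(max_steps):
        if not parallel.agents:  # by convention, empty once the episode has ended
            break
        actions = {a: parallel.action_space(a).sample() for a in parallel.agents}
        observations, rewards, terminations, truncations, infos = parallel.step(actions)
        for agent, reward in rewards.items():
            returns[agent] = returns.get(agent, 0.0) + float(reward)

    parallel.close()
    return returns
```

With the package installed, run_random_episode(parallel_env(config=config)) should return a dict keyed by player_0 and player_1.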

Actions and observations

The action space is Discrete(8):

Action  Meaning
0       no-op
1       forward
2       backward
3       step left
4       step right
5       turn left
6       turn right
7       fire interaction beam
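For readability in scripts, the indices above can be given names. The Action enum below is a hypothetical convenience and is not part of the package; only the integer values come from the table.

```python
from enum import IntEnum


class Action(IntEnum):
    """Hypothetical names for the Discrete(8) action indices."""

    NOOP = 0
    FORWARD = 1
    BACKWARD = 2
    STEP_LEFT = 3
    STEP_RIGHT = 4
    TURN_LEFT = 5
    TURN_RIGHT = 6
    FIRE_BEAM = 7


# IntEnum members compare equal to plain ints; wrap them in int() if an
# environment insists on a builtin integer type.
actions = {"player_0": int(Action.FORWARD), "player_1": int(Action.FIRE_BEAM)}
```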

Default per-agent observations are:

Key                      Shape        Type
RGB                      (40, 40, 3)  uint8
INVENTORY                (2,)         float64
READY_TO_SHOOT           ()           float64
INTERACTION_INVENTORIES  (2, 2)       float64

state() and render() (with render_mode="rgb_array") both return the global world RGB frame with shape (120, 184, 3). Use PrisonersDilemmaRepeatedConfig(observation_mode="global") to give every agent the global RGB frame as its observation.

Mechanics

Players collect two resource classes representing cooperate and defect. They start with one of each resource, but an interaction is blocked until both players have collected at least one resource in the current inventory cycle.

When an interaction beam hits a ready partner, the shooter is the row player and the target is the column player. Rewards are computed from normalized inventories and the Prisoner's Dilemma matrices:

row = [[3, 0], [5, 1]]
column = [[3, 5], [0, 1]]

The lower-reward player loses; ties go to the row player. In this repeated variant both players' inventories reset after an interaction, both players are removed, and both respawn after 5 frames. Interaction effects are displayed during the 16-frame freeze window.
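The reward rule can be sketched as follows. This assumes the bilinear form used by Melting Pot's "in the Matrix" games, where each inventory is normalized into a mixed strategy and the reward is x^T M y, and it assumes inventory index 0 is cooperate and index 1 is defect. The function names are hypothetical and do not refer to the package's internals.

```python
ROW_PAYOFF = [[3, 0], [5, 1]]     # row player's matrix, from the text above
COLUMN_PAYOFF = [[3, 5], [0, 1]]  # column player's matrix


def normalize(inventory):
    """Turn resource counts [cooperate, defect] into a mixed strategy."""
    total = sum(inventory)
    if total <= 0:
        raise ValueError("inventory must contain at least one resource")
    return [count / total for count in inventory]


def interaction_rewards(row_inventory, column_inventory):
    """Assumed bilinear payoff: reward = sum over i, j of x[i] * M[i][j] * y[j]."""
    x = normalize(row_inventory)
    y = normalize(column_inventory)
    row = sum(x[i] * ROW_PAYOFF[i][j] * y[j] for i in range(2) for j in range(2))
    col = sum(x[i] * COLUMN_PAYOFF[i][j] * y[j] for i in range(2) for j in range(2))
    return row, col


def loser(row_reward, column_reward):
    """The lower-reward player loses; ties go to the row player."""
    return "row" if row_reward < column_reward else "column"


print(interaction_rewards([1, 0], [0, 1]))  # pure cooperator shoots a pure defector: (0.0, 5.0)
print(interaction_rewards([3, 1], [1, 1]))  # mixed inventories: (1.875, 3.125)
```

Under these assumptions, a shooter holding only cooperate resources who hits a partner holding only defect resources earns 0 while the partner earns 5.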

Resources regenerate from their wait state with probability 0.02 after a 10-frame delay. Resources can also be destroyed by three interaction-beam hits. If all players are removed at once, dormant resources are immediately restored.
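To get a feel for these regrowth numbers, the sketch below estimates how long a depleted resource stays away. It assumes the 0.02 probability is rolled once per frame after the delay has elapsed, which the text above does not state explicitly; the function names are hypothetical.

```python
import random

REGEN_DELAY = 10          # frames a depleted resource waits before it can regrow
REGEN_PROBABILITY = 0.02  # chance to regrow, assumed to be rolled once per frame


def expected_regen_frames(delay=REGEN_DELAY, p=REGEN_PROBABILITY):
    """Fixed delay plus the mean of a geometric distribution (1 / p)."""
    return delay + 1.0 / p


def sample_regen_frames(rng, delay=REGEN_DELAY, p=REGEN_PROBABILITY):
    """Simulate one regrowth: wait out the delay, then roll every frame."""
    frames = delay
    while True:
        frames += 1
        if rng.random() < p:
            return frames


rng = random.Random(0)
samples = [sample_regen_frames(rng) for _ in range(20000)]
print(expected_regen_frames(), sum(samples) / len(samples))
```

Under the per-frame assumption the mean wait is about 60 frames, so resources that are consumed or destroyed stay scarce for a meaningful stretch of play.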

Episodes truncate at max_episode_steps=5000. Stochastic termination begins after frame 1000: a check runs every 100 frames and ends the episode with probability 0.1.
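The episode-length distribution implied by these settings can be sketched in a few lines. The constant FIRST_CHECK_FRAME is an assumption: the text does not say whether the first check falls on frame 1000 or one interval later, and this sketch places it at frame 1100.

```python
MAX_EPISODE_STEPS = 5000
FIRST_CHECK_FRAME = 1100  # assumption: first check one interval after frame 1000
CHECK_INTERVAL = 100
TERMINATION_PROBABILITY = 0.1


def survival_probability(frame):
    """Probability that the episode is still running after `frame` frames."""
    if frame >= MAX_EPISODE_STEPS:
        return 0.0  # truncation always ends the episode
    if frame < FIRST_CHECK_FRAME:
        return 1.0  # no termination check has happened yet
    checks_passed = (frame - FIRST_CHECK_FRAME) // CHECK_INTERVAL + 1
    return (1.0 - TERMINATION_PROBABILITY) ** checks_passed


def expected_episode_length():
    """E[T] = sum over t >= 0 of P(T > t), for a non-negative integer length T."""
    return sum(survival_probability(t) for t in range(MAX_EPISODE_STEPS))


print(survival_probability(1100), expected_episode_length())
```

Under this assumption the expected episode length comes out just under 2,000 frames, so most episodes end well before the 5000-step truncation.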

Reward strategy and failure modes

The efficient outcome is repeated mutual cooperation: both players collect cooperation resources and interact when both inventories support the cooperative cell. A defection resource can win the immediate interaction against a cooperator, but persistent defection drives the pair toward the lower mutual defection payoff.
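The claim that mutual cooperation is efficient while defection is individually tempting can be checked directly against the row matrix from the Mechanics section. The variable names follow the usual T, R, P, S labels for the Prisoner's Dilemma, and the index ordering (0 = cooperate, 1 = defect) is an assumption.

```python
ROW_PAYOFF = [[3, 0], [5, 1]]  # index 0 = cooperate, 1 = defect (assumed ordering)

reward = ROW_PAYOFF[0][0]      # R: both cooperate
sucker = ROW_PAYOFF[0][1]      # S: cooperate against a defector
temptation = ROW_PAYOFF[1][0]  # T: defect against a cooperator
punishment = ROW_PAYOFF[1][1]  # P: both defect

# Defining inequality of a Prisoner's Dilemma.
is_prisoners_dilemma = temptation > reward > punishment > sucker
# Sustained cooperation beats taking turns exploiting each other.
cooperation_beats_alternation = 2 * reward > temptation + sucker

print(is_prisoners_dilemma, cooperation_beats_alternation)  # True True
```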

Bad equilibria include both players stockpiling defect resources, refusing to interact unless they can exploit the other player, or repeatedly destroying resources so neither player becomes ready. Because both players are removed and must respawn after every interaction, exploitative cycles can also create long periods of waiting and respawn churn.

Playable notebook

Launch the playable notebook with:

uv run marimo run notebooks/the_matrix/prisoners_dilemma_repeated_mo.py

Controls:

  • WASD or arrow keys move the active player.
  • Q/E turn the active player.
  • SPACE fires the interaction beam.
  • TAB switches the controlled player.
  • ESC closes the pygame window.

The notebook exposes a resource-regrowth control to make interactive play easier. The environment's own defaults are not changed by it and remain parity-oriented.