Real Matrix

mp.real_matrix provides direct normal-form versions of the payoff games used by Melting Pot's in-the-matrix substrates. These environments are not spatial: at each step, a row player and a column player each choose one matrix action simultaneously.

Screenshot

[Image: Real Matrix Prisoner's Dilemma rendered payoff table]

API

Use the family dispatcher:

from mp.real_matrix import env, parallel_env

parallel = parallel_env("chicken")
aec = env("chicken")
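
The factory names above suggest a PettingZoo-style interface. As a minimal sketch, assuming the object returned by parallel_env follows the PettingZoo Parallel API (reset returning observations and infos, step returning a five-tuple, and an agents list that empties at episode end), a random-policy rollout could look like this. The rollout helper is hypothetical and not part of mp.real_matrix:

```python
def rollout(parallel, max_steps=100, seed=0):
    """Play random joint actions and return each agent's total reward.

    Assumption: `parallel` follows the PettingZoo Parallel API. This helper
    is illustrative and is not provided by mp.real_matrix.
    """
    observations, infos = parallel.reset(seed=seed)
    totals = {agent: 0.0 for agent in parallel.agents}
    for _ in range(max_steps):
        if not parallel.agents:
            # PettingZoo-style envs empty `agents` once the episode is over.
            break
        # One simultaneous action per live agent, sampled from its own space.
        actions = {
            agent: parallel.action_space(agent).sample()
            for agent in parallel.agents
        }
        observations, rewards, terminations, truncations, infos = parallel.step(actions)
        for agent, reward in rewards.items():
            totals[agent] += reward
    return totals
```

Under that assumption, rollout(parallel_env("chicken")) would return a dict of per-agent returns keyed by agent name.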

Spatial Matrix aliases also work. For example, both of these resolve to the same normal-form payoff game:

parallel_env("chicken_repeated")
parallel_env("chicken_in_the_matrix__arena")

Direct imports are available:

from mp.real_matrix.prisoners_dilemma import (
    PrisonersDilemmaConfig,
    parallel_env,
)

# Named `parallel` so it does not shadow the `env` factory shown above.
parallel = parallel_env(config=PrisonersDilemmaConfig())

Games

Short name                   Actions  Default horizon
bach_or_stravinsky           2        100
chicken                      2        100
prisoners_dilemma            2        100
stag_hunt                    2        100
pure_coordination            3        100
rationalizable_coordination  3        100
running_with_scissors        3        100

Actions and observations

Both agents use Discrete(num_actions). player_0 selects the row action and player_1 selects the column action.
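
To make the row/column convention concrete, here is a small sketch of how a joint action indexes a two-player payoff stack. The payoff numbers are textbook Prisoner's Dilemma values chosen for illustration, not the values Melting Pot uses, and joint_rewards is a hypothetical helper rather than a library function:

```python
# Illustrative payoffs only (assumption): action 0 = cooperate, 1 = defect.
# PAYOFFS[0][r][c] is the row player's payoff for row action r, column action c.
# PAYOFFS[1][r][c] is the column player's payoff for the same cell.
PAYOFFS = [
    [[3, 0], [5, 1]],  # row player (player_0)
    [[3, 5], [0, 1]],  # column player (player_1)
]


def joint_rewards(payoffs, row_action, column_action):
    """Look up both players' rewards for one simultaneous joint action."""
    return {
        "player_0": payoffs[0][row_action][column_action],
        "player_1": payoffs[1][row_action][column_action],
    }


# Row player cooperates, column player defects.
rewards = joint_rewards(PAYOFFS, row_action=0, column_action=1)
```

Both players read the same cell (row_action, column_action); only the leading index of the stack differs between them.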

Observations contain:

Key           Meaning
RGB           Rendered payoff matrix with the last submitted cell highlighted
PAYOFFS       Global (2, n, n) payoff stack, or local (n, n) own payoff matrix
LAST_ACTIONS  [row_action, column_action], initialized to [-1, -1]

Set observation_mode="local" on the config to give each player only its own payoff table in own-action/other-action orientation.
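
One plausible reading of the own-action/other-action orientation is that the column player's table is transposed so its first index is its own action. The sketch below illustrates that reading with plain lists; local_payoffs is a hypothetical helper, the payoff values are illustrative, and the library's actual layout may differ:

```python
# Illustrative global stack (assumption): index 0 = row player, 1 = column player,
# each table indexed [row_action][column_action].
STACK = [
    [[3, 0], [5, 1]],
    [[3, 5], [0, 1]],
]


def local_payoffs(stack, player_index):
    """Return one player's (n, n) table indexed [own_action][other_action]."""
    own = stack[player_index]
    if player_index == 0:
        # The row player's table is already indexed by its own action first.
        return [list(row) for row in own]
    # Column player: transpose so the first index becomes its own action.
    return [list(column) for column in zip(*own)]


row_view = local_payoffs(STACK, 0)
column_view = local_payoffs(STACK, 1)
```

Because this example game is symmetric, both players end up with the same local table, which is the point of the reorientation: a single policy can read either seat's observation the same way.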

Reward strategy and failure modes

Optimal behavior depends on the selected payoff game. Coordination games reward agents for selecting compatible actions; Stag Hunt rewards mutual commitment to the high-payoff action; Prisoner's Dilemma exposes the tension between mutual cooperation and unilateral defection; Chicken rewards avoiding mutual crash outcomes while not yielding too often.

Bad equilibria are the familiar matrix-game traps: mutual defection in Prisoner's Dilemma, safe but low-payoff hare hunting in Stag Hunt, miscoordination in pure coordination games, and alternating exploitation in asymmetric games. The rendered payoff table is useful for debugging whether a learned policy has found the efficient cell or a stable but lower-value equilibrium.
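
When debugging which cell a policy has settled on, it can help to enumerate the pure-strategy Nash equilibria of the payoff tables by brute force. The following is a minimal sketch using illustrative textbook payoffs (not Melting Pot's values); pure_nash_equilibria is a hypothetical helper, not part of mp.real_matrix:

```python
def pure_nash_equilibria(row_payoffs, col_payoffs):
    """Return every (row_action, column_action) cell where neither player
    can gain by unilaterally switching to another action."""
    n_rows, n_cols = len(row_payoffs), len(row_payoffs[0])
    equilibria = []
    for r in range(n_rows):
        for c in range(n_cols):
            # Row player compares against every other row in the same column.
            row_best = all(
                row_payoffs[r][c] >= row_payoffs[alt][c] for alt in range(n_rows)
            )
            # Column player compares against every other column in the same row.
            col_best = all(
                col_payoffs[r][c] >= col_payoffs[r][alt] for alt in range(n_cols)
            )
            if row_best and col_best:
                equilibria.append((r, c))
    return equilibria


# Illustrative Prisoner's Dilemma: 0 = cooperate, 1 = defect.
PD_ROW = [[3, 0], [5, 1]]
PD_COL = [[3, 5], [0, 1]]

# Illustrative Stag Hunt: 0 = stag, 1 = hare.
SH_ROW = [[4, 0], [3, 3]]
SH_COL = [[4, 3], [0, 3]]

pd_equilibria = pure_nash_equilibria(PD_ROW, PD_COL)  # [(1, 1)]
sh_equilibria = pure_nash_equilibria(SH_ROW, SH_COL)  # [(0, 0), (1, 1)]
```

With these numbers, the Prisoner's Dilemma has only mutual defection as a pure equilibrium, while Stag Hunt has both the efficient stag cell and the safe hare cell, matching the traps described above.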

Playable notebooks

Launch a notebook with:

uv run marimo run notebooks/real_matrix/prisoners_dilemma_mo.py

Controls are shared across the family: W/S or Up/Down choose the row action, A/D or Left/Right choose the column action, Space or Enter submits the joint action, and Esc closes pygame.