Real Matrix

mp.real_matrix provides direct normal-form versions of the payoff games used by Melting Pot's in-the-matrix substrates. These environments are not spatial: at each step, a row player and a column player simultaneously choose one action each from the payoff matrix.

Screenshot

Figure: rendered payoff table for the Real Matrix Prisoner's Dilemma.

API

Use the family dispatcher to construct any game by its short name:

from mp.real_matrix import env, parallel_env

parallel = parallel_env("chicken")
aec = env("chicken")

The dispatcher also accepts the names of the spatial in-the-matrix variants as aliases. For example, both of these calls resolve to the same normal-form payoff game:

parallel_env("chicken_repeated")
parallel_env("chicken_in_the_matrix__arena")
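The alias behavior described above can be pictured as a name-normalization step. The following is only a sketch of one way such a lookup could work, not the package's actual implementation: `resolve_game_name` is a hypothetical helper, and the suffix patterns it strips are assumptions inferred from the two example aliases.

```python
import re

# Short names taken from the Games table on this page.
SHORT_NAMES = {
    "bach_or_stravinsky",
    "chicken",
    "prisoners_dilemma",
    "stag_hunt",
    "pure_coordination",
    "rationalizable_coordination",
    "running_with_scissors",
}


def resolve_game_name(name: str) -> str:
    """Hypothetical helper: map an alias to a short game name."""
    # Assumed pattern 1: "<game>_in_the_matrix__<variant>".
    base = re.sub(r"_in_the_matrix__\w+$", "", name)
    # Assumed pattern 2: "<game>_repeated" or "<game>_arena".
    base = re.sub(r"_(repeated|arena)$", "", base)
    if base not in SHORT_NAMES:
        raise ValueError(f"unknown real-matrix game: {name!r}")
    return base


print(resolve_game_name("chicken_repeated"))
print(resolve_game_name("chicken_in_the_matrix__arena"))
```

Under these assumptions both calls print `chicken`, and an unrecognized name raises `ValueError` instead of silently falling back to a default game.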

Each game can also be imported directly from its own module:

from mp.real_matrix.prisoners_dilemma import (
    PrisonersDilemmaConfig,
    parallel_env,
)

env = parallel_env(config=PrisonersDilemmaConfig())

Games

Short name	Actions per player	Default horizon (steps)
`bach_or_stravinsky`	2	100
`chicken`	2	100
`prisoners_dilemma`	2	100
`stag_hunt`	2	100
`pure_coordination`	3	100
`rationalizable_coordination`	3	100
`running_with_scissors`	3	100

Actions and observations

Both agents use a Discrete(num_actions) action space, where num_actions is the game's action count from the Games table. player_0 selects the row action and player_1 selects the column action.

Observations contain:

Key	Meaning
`RGB`	Rendered payoff matrix with the last submitted cell highlighted
`PAYOFFS`	Global `(2, n, n)` payoff stack, or local `(n, n)` own payoff matrix
`LAST_ACTIONS`	`[row_action, column_action]`, initialized to `[-1, -1]`

Set observation_mode="local" on the config to give each player only its own payoff table, oriented so that the first index is the player's own action and the second is the other player's action.
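To make the two `PAYOFFS` layouts concrete, here is a small pure-Python sketch of how a global `(2, n, n)` stack and a local `(n, n)` own-action/other-action table relate. The payoff numbers are illustrative Prisoner's Dilemma values, not the package's actual payoffs, and the stack order (row player first) and the helper names are assumptions.

```python
# Illustrative Prisoner's Dilemma payoffs (action 0 = cooperate, 1 = defect).
# Assumption: these are NOT the package's values, only a familiar example.
ROW_PAYOFF = [[3, 0], [5, 1]]  # row player's payoff, indexed [row][column]
COL_PAYOFF = [[3, 5], [0, 1]]  # column player's payoff, indexed [row][column]

# Before any action is submitted, LAST_ACTIONS holds the sentinel pair.
INITIAL_LAST_ACTIONS = [-1, -1]


def global_payoffs():
    """Global view: a (2, n, n) stack, both tables indexed [row][column]."""
    return [ROW_PAYOFF, COL_PAYOFF]


def local_payoffs(player_index):
    """Local view: own (n, n) table indexed [own_action][other_action]."""
    if player_index == 0:
        # The row player's table is already own-action first.
        return [list(row) for row in ROW_PAYOFF]
    # The column player's table is transposed so its own action comes first.
    return [list(col) for col in zip(*COL_PAYOFF)]


print(local_payoffs(0))
print(local_payoffs(1))
```

Because this example game is symmetric, both players see the same local table, `[[3, 0], [5, 1]]`, even though their entries in the global stack differ. That is the practical appeal of the local orientation: one policy can be shared between the row and column roles.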

Reward strategy and failure modes

Optimal behavior depends on the selected payoff game. Coordination games reward agents for selecting compatible actions; Stag Hunt rewards mutual commitment to the high-payoff action; Prisoner's Dilemma exposes the tension between mutual cooperation and unilateral defection; Chicken rewards avoiding mutual crash outcomes while not yielding too often.

Bad equilibria are the familiar matrix-game traps: mutual defection in Prisoner's Dilemma, safe but low-payoff hare hunting in Stag Hunt, miscoordination in pure coordination games, and alternating exploitation in asymmetric games. The rendered payoff table is useful for debugging whether a learned policy has found the efficient cell or a stable but lower-value equilibrium.
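The distinction between the efficient cell and a stable but lower-value equilibrium can be checked mechanically. The sketch below enumerates pure-strategy Nash equilibria of a two-player matrix game and picks the cell with the highest total payoff. The payoff values are illustrative textbook numbers rather than the package's, and "efficient" is taken here to mean the maximum summed payoff, which is one possible definition.

```python
from itertools import product


def pure_nash_equilibria(row_payoff, col_payoff):
    """Return all (row, col) cells where neither player gains by deviating alone."""
    n_rows, n_cols = len(row_payoff), len(row_payoff[0])
    equilibria = []
    for r, c in product(range(n_rows), range(n_cols)):
        # Row player cannot do better by switching rows against column c.
        row_best = all(row_payoff[r][c] >= row_payoff[alt][c] for alt in range(n_rows))
        # Column player cannot do better by switching columns against row r.
        col_best = all(col_payoff[r][c] >= col_payoff[r][alt] for alt in range(n_cols))
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria


def efficient_cell(row_payoff, col_payoff):
    """Return the cell with the highest summed payoff (assumed efficiency notion)."""
    cells = product(range(len(row_payoff)), range(len(row_payoff[0])))
    return max(cells, key=lambda rc: row_payoff[rc[0]][rc[1]] + col_payoff[rc[0]][rc[1]])


# Illustrative payoffs, indexed [row_action][column_action].
PD_ROW, PD_COL = [[3, 0], [5, 1]], [[3, 5], [0, 1]]  # 0 = cooperate, 1 = defect
SH_ROW, SH_COL = [[4, 0], [3, 2]], [[4, 3], [0, 2]]  # 0 = stag, 1 = hare

print(pure_nash_equilibria(PD_ROW, PD_COL), efficient_cell(PD_ROW, PD_COL))
print(pure_nash_equilibria(SH_ROW, SH_COL), efficient_cell(SH_ROW, SH_COL))
```

With these numbers, the Prisoner's Dilemma has a single pure equilibrium at mutual defection `(1, 1)` while the efficient cell is mutual cooperation `(0, 0)`. Stag Hunt has two pure equilibria, `(0, 0)` and `(1, 1)`, and only the first is efficient. Comparing a trained policy's most frequent joint action against these two sets is a quick way to tell which trap, if any, it has settled into.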

Playable notebooks

Launch a notebook with:

uv run marimo run notebooks/real_matrix/prisoners_dilemma_mo.py

Controls are shared across the family: W/S or Up/Down choose the row action, A/D or Left/Right choose the column action, Space or Enter submits the joint action, and Esc closes the pygame window.