Real Matrix
mp.real_matrix provides direct normal-form versions of the payoff games used
by Melting Pot's in-the-matrix substrates. These environments are not spatial.
Each step is a single simultaneous move: the row player picks a row, the column
player picks a column, and both are paid from the selected cell.
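Here is a minimal sketch of a single step. The function pays both players from the cell picked by the row and column actions. The payoff numbers are textbook Prisoner's Dilemma values chosen for illustration. They are an assumption and not necessarily the values mp.real_matrix uses.

```python
import numpy as np

# Illustrative Prisoner's Dilemma payoffs (an assumption, not mp.real_matrix's values).
# Action 0 = cooperate, action 1 = defect.
# PAYOFFS[0] holds the row player's rewards, PAYOFFS[1] the column player's.
PAYOFFS = np.array(
    [
        [[3, 0], [5, 1]],  # row player: R=3, S=0 / T=5, P=1
        [[3, 5], [0, 1]],  # column player: mirror image of the row player
    ]
)


def step(payoffs: np.ndarray, row_action: int, col_action: int) -> tuple[int, int]:
    """Resolve one simultaneous move: both players are paid from the chosen cell."""
    return (
        int(payoffs[0, row_action, col_action]),
        int(payoffs[1, row_action, col_action]),
    )


print(step(PAYOFFS, 0, 0))  # (3, 3): mutual cooperation
print(step(PAYOFFS, 1, 0))  # (5, 0): row defects on a cooperating column player
```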
API
Use the family dispatcher:
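```python
from mp.real_matrix import env, parallel_env

# Both constructors take a game's short name (see the Games table).
parallel = parallel_env("chicken")  # parallel-API environment
aec = env("chicken")                # AEC-API environment
```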
Spatial Matrix aliases also work. An alias points at the same normal-form payoff game as the corresponding short name.
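The exact alias strings are not listed here. As a hypothetical sketch, a dispatcher could normalize an alias to its short name before looking up the game. The sketch assumes Melting Pot-style substrate names such as chicken_in_the_matrix__arena. Both resolve_game_name and the naming pattern are illustrations, not mp.real_matrix's implementation.

```python
# Short names taken from the Games table.
SHORT_NAMES = {
    "bach_or_stravinsky",
    "chicken",
    "prisoners_dilemma",
    "stag_hunt",
    "pure_coordination",
    "rationalizable_coordination",
    "running_with_scissors",
}


def resolve_game_name(name: str) -> str:
    """Map a short name or a (hypothetical) spatial alias to a short name."""
    # Drop a scenario/variant suffix such as "__arena" or "__repeated".
    base = name.split("__", 1)[0]
    # Drop the "_in_the_matrix" suffix assumed for spatial substrate names.
    base = base.removesuffix("_in_the_matrix")
    if base not in SHORT_NAMES:
        raise ValueError(f"unknown matrix game: {name!r}")
    return base


print(resolve_game_name("chicken"))                       # chicken
print(resolve_game_name("chicken_in_the_matrix__arena"))  # chicken
```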
Direct imports are available:
```python
from mp.real_matrix.prisoners_dilemma import (
    PrisonersDilemmaConfig,
    parallel_env,
)

env = parallel_env(config=PrisonersDilemmaConfig())
```
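The parallel_env/env pair suggests PettingZoo's Parallel and AEC APIs. The sketch below assumes the parallel environment follows the PettingZoo Parallel API:

- reset returns (observations, infos).
- step returns (observations, rewards, terminations, truncations, infos).

Under that assumption, a rollout loop could look like this. ToyMatrixParallelEnv is a hypothetical stand-in so the loop runs without mp installed, and ending the episode by truncation at the horizon is also an assumption. With the real package you would construct the environment with parallel_env(...) instead.

```python
import random


class ToyMatrixParallelEnv:
    """Hypothetical stand-in that mimics the PettingZoo Parallel API shape."""

    def __init__(self, payoffs, horizon=100):
        self.payoffs = payoffs  # payoffs[player][row_action][column_action]
        self.horizon = horizon
        self.possible_agents = ["player_0", "player_1"]
        self.agents = []
        self.t = 0

    def reset(self, seed=None, options=None):
        # seed and options are accepted for API shape only; the toy is deterministic.
        self.agents = list(self.possible_agents)
        self.t = 0
        observations = {a: {"LAST_ACTIONS": [-1, -1]} for a in self.agents}
        return observations, {a: {} for a in self.agents}

    def step(self, actions):
        row, col = actions["player_0"], actions["player_1"]
        self.t += 1
        rewards = {
            "player_0": self.payoffs[0][row][col],
            "player_1": self.payoffs[1][row][col],
        }
        observations = {a: {"LAST_ACTIONS": [row, col]} for a in self.agents}
        terminations = {a: False for a in self.agents}
        # The episode ends by truncation once the horizon is reached.
        truncations = {a: self.t >= self.horizon for a in self.agents}
        infos = {a: {} for a in self.agents}
        if all(truncations.values()):
            self.agents = []
        return observations, rewards, terminations, truncations, infos


# Illustrative Prisoner's Dilemma payoffs (0 = cooperate, 1 = defect).
pd_payoffs = [
    [[3, 0], [5, 1]],
    [[3, 5], [0, 1]],
]
rng = random.Random(0)
parallel = ToyMatrixParallelEnv(pd_payoffs, horizon=100)
observations, infos = parallel.reset(seed=0)
returns = {a: 0 for a in parallel.possible_agents}
steps = 0
while parallel.agents:
    # Uniform random policies; a trained policy would read observations here.
    actions = {a: rng.randrange(2) for a in parallel.agents}
    observations, rewards, terminations, truncations, infos = parallel.step(actions)
    for agent, reward in rewards.items():
        returns[agent] += reward
    steps += 1
print(steps)  # 100
```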
Games
| Short name | Actions | Default horizon |
|---|---|---|
| bach_or_stravinsky | 2 | 100 |
| chicken | 2 | 100 |
| prisoners_dilemma | 2 | 100 |
| stag_hunt | 2 | 100 |
| pure_coordination | 3 | 100 |
| rationalizable_coordination | 3 | 100 |
| running_with_scissors | 3 | 100 |
Actions and observations
Both agents have a Discrete(num_actions) action space. player_0 selects the row
action and player_1 selects the column action.
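The row/column convention can be made concrete with a small sketch that turns a joint action into the cell it selects. Both joint_action_to_cell and cell_to_flat_index are hypothetical helpers, not part of mp.real_matrix.

```python
def joint_action_to_cell(actions: dict[str, int], num_actions: int) -> tuple[int, int]:
    """Turn a joint action into the (row, column) cell it selects.

    player_0 chooses the row and player_1 chooses the column; each action must
    be a valid element of Discrete(num_actions), i.e. 0 <= a < num_actions.
    """
    row, col = actions["player_0"], actions["player_1"]
    for a in (row, col):
        if not 0 <= a < num_actions:
            raise ValueError(f"action {a} is outside Discrete({num_actions})")
    return row, col


def cell_to_flat_index(row: int, col: int, num_actions: int) -> int:
    """Row-major index of a cell, handy for tallying the n * n outcomes."""
    return row * num_actions + col


cell = joint_action_to_cell({"player_0": 2, "player_1": 0}, num_actions=3)
print(cell, cell_to_flat_index(*cell, num_actions=3))  # (2, 0) 6
```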
Observations contain:
| Key | Meaning |
|---|---|
| RGB | Rendered payoff matrix with the last submitted cell highlighted |
| PAYOFFS | Global mode: (2, n, n) stack of both players' payoff matrices. Local mode: the player's own (n, n) payoff matrix. Here n = num_actions. |
| LAST_ACTIONS | [row_action, column_action] from the previous step, initialized to [-1, -1] |
Set observation_mode="local" on the config to give each player only its own
payoff table. The table is oriented with the player's own action on the first
axis and the other player's action on the second.
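Here is a minimal sketch of how the PAYOFFS and LAST_ACTIONS entries could be assembled. It assumes the (2, n, n) stack is indexed [player, row_action, column_action]. Under that assumption, own-action/other-action orientation leaves player_0's table unchanged and transposes player_1's. payoff_observation and initial_observation are hypothetical helpers, and the RGB image is omitted.

```python
import numpy as np


def payoff_observation(payoffs: np.ndarray, player: int, mode: str) -> np.ndarray:
    """Build the PAYOFFS entry for one player.

    payoffs has shape (2, n, n) and is indexed [player, row_action, column_action].
    """
    if mode == "global":
        return payoffs  # both players' tables, shape (2, n, n)
    if mode == "local":
        own = payoffs[player]  # shape (n, n), still indexed [row, column]
        # player_0 already owns the rows; player_1's table is transposed so
        # that its own (column) action moves onto the first axis.
        return own if player == 0 else own.T
    raise ValueError(f"unknown observation_mode: {mode!r}")


def initial_observation(payoffs: np.ndarray, player: int, mode: str) -> dict:
    """Observation before any move (RGB omitted in this sketch)."""
    return {
        "PAYOFFS": payoff_observation(payoffs, player, mode),
        "LAST_ACTIONS": np.array([-1, -1]),  # no cell submitted yet
    }


# Asymmetric 3x3 example so the transpose is visible.
payoffs = np.arange(18).reshape(2, 3, 3)
local_p1 = initial_observation(payoffs, player=1, mode="local")["PAYOFFS"]
# Entry [own=2, other=0] is player_1's payoff when it plays 2 and player_0 plays 0.
print(local_p1[2, 0], payoffs[1, 0, 2])  # 11 11
```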
Reward strategy and failure modes
Optimal behavior depends on the selected payoff game. Coordination games reward agents for selecting compatible actions; Stag Hunt rewards mutual commitment to the high-payoff action; Prisoner's Dilemma exposes the tension between mutual cooperation and unilateral defection; Chicken rewards avoiding mutual crash outcomes while not yielding too often.
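Because "optimal" depends on the game, it can help to enumerate a payoff table's pure-strategy Nash equilibria before training. The brute-force check below is the standard textbook technique, not something mp.real_matrix is documented to expose. Ties count as best responses because the comparison uses >=. The Stag Hunt numbers are illustrative assumptions.

```python
import numpy as np


def pure_nash_equilibria(payoffs: np.ndarray) -> list[tuple[int, int]]:
    """Return all pure-strategy Nash equilibria (row, col) of a 2-player game.

    payoffs has shape (2, n, n): payoffs[0] for the row player,
    payoffs[1] for the column player.
    """
    row_pay, col_pay = payoffs[0], payoffs[1]
    n_rows, n_cols = row_pay.shape
    equilibria = []
    for r in range(n_rows):
        for c in range(n_cols):
            row_best = row_pay[r, c] >= row_pay[:, c].max()  # no better row given c
            col_best = col_pay[r, c] >= col_pay[r, :].max()  # no better column given r
            if row_best and col_best:
                equilibria.append((r, c))
    return equilibria


# Illustrative Stag Hunt payoffs (0 = stag, 1 = hare); not mp.real_matrix's values.
stag_hunt = np.array([
    [[4, 0], [3, 3]],
    [[4, 3], [0, 3]],
])
print(pure_nash_equilibria(stag_hunt))  # [(0, 0), (1, 1)]
```

The Stag Hunt example returns both the efficient (stag, stag) cell and the safe (hare, hare) cell. Being an equilibrium says nothing about which one learners will settle on.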
Typical failure modes are the familiar matrix-game traps:

- mutual defection in Prisoner's Dilemma;
- safe but low-payoff hare hunting in Stag Hunt;
- miscoordination in pure coordination games;
- alternating exploitation in asymmetric games.

The RGB observation highlights the last submitted cell. That makes the rendered payoff table a quick way to check whether a learned policy has found the efficient cell or settled into a stable but lower-value equilibrium.
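Beyond eyeballing the rendered table, a small diagnostic can compare the cell a policy plays most often with the cell of highest total payoff. This sketch assumes "efficient" means maximal summed reward and uses illustrative Stag Hunt payoffs. efficiency_report is a hypothetical helper.

```python
from collections import Counter

import numpy as np


def efficiency_report(payoffs: np.ndarray, history: list[tuple[int, int]]) -> dict:
    """Compare the most frequently played cell with the welfare-maximizing cell.

    payoffs has shape (2, n, n); history is a list of (row_action, column_action)
    pairs, e.g. collected from LAST_ACTIONS over an episode.
    Welfare here means the sum of both players' payoffs (an assumed metric).
    """
    welfare = payoffs.sum(axis=0)  # shape (n, n): total reward of each cell
    flat_best = int(welfare.argmax())  # first maximum if there are ties
    best_cell = (flat_best // welfare.shape[1], flat_best % welfare.shape[1])
    modal_cell, count = Counter(history).most_common(1)[0]
    return {
        "modal_cell": modal_cell,
        "modal_share": count / len(history),
        "best_cell": best_cell,
        "welfare_ratio": float(welfare[modal_cell] / welfare[best_cell]),
    }


stag_hunt = np.array([
    [[4, 0], [3, 3]],
    [[4, 3], [0, 3]],
])
history = [(1, 1)] * 90 + [(0, 0)] * 10
print(efficiency_report(stag_hunt, history))
# {'modal_cell': (1, 1), 'modal_share': 0.9, 'best_cell': (0, 0), 'welfare_ratio': 0.75}
```

A ratio of 0.75 here means the pair has locked into the safe hare equilibrium. It collects three quarters of the available welfare. The ratio is only meaningful when the best cell's welfare is positive.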
Playable notebooks
Launch a notebook with:
Controls are shared across the family:

- W/S or Up/Down choose the row action.
- A/D or Left/Right choose the column action.
- Space or Enter submits the joint action.
- Esc closes the pygame window.
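The selection logic behind these controls might look like the following sketch. The key names are plain strings rather than pygame key constants. Clamping at the matrix edge is an assumption; the real notebook may wrap around instead. handle_key is a hypothetical helper.

```python
ROW_UP, ROW_DOWN = {"w", "up"}, {"s", "down"}
COL_LEFT, COL_RIGHT = {"a", "left"}, {"d", "right"}
SUBMIT = {"space", "enter"}


def handle_key(state: dict, key: str, num_actions: int) -> dict:
    """Return the new selection state after one key press.

    state holds the highlighted cell ("row", "col"); clamping at the edges
    (rather than wrapping around) is an assumption of this sketch. Esc is not
    handled here: closing the window is left to the surrounding UI loop.
    """
    row, col = state["row"], state["col"]
    if key in ROW_UP:
        row = max(row - 1, 0)
    elif key in ROW_DOWN:
        row = min(row + 1, num_actions - 1)
    elif key in COL_LEFT:
        col = max(col - 1, 0)
    elif key in COL_RIGHT:
        col = min(col + 1, num_actions - 1)
    return {"row": row, "col": col, "submitted": key in SUBMIT}


state = {"row": 0, "col": 0}
for key in ["s", "s", "s", "d", "space"]:  # the third "down" press clamps at row 2
    state = handle_key(state, key, num_actions=3)
print(state)  # {'row': 2, 'col': 1, 'submitted': True}
```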