Robots, Games, Life, Markets |
![]() |
Reinforcement learning |
Game theory |
Evolutionary dynamics | Market calculation |
---|---|---|---|
agent | player | population | actor |
action | move | subspecies | PPF/CPF bundle |
policy | strategy | subspecies distribution | product lines |
Total reward | payoff | fitness | profit |
multi-agent Markov decision process |
game | game (Competition) | market |
environment | noncompetitive second player |
niche | niche |
environment dynamics | move by nature | move by Nature | exogenous shocks |
MDP | State-based infinite game 2 | ecology | industry |
episode | iteration | generation | timeless? (for complete markets) |
multi-agent multi-armed bandit | Matrix game | Matrix game | exchange |
Bellman optimality | equilibria | stable strategies / Liapunov stable states |
general equilibrium |
optimal substructure | subgame perfect equilibrium |
subgame perfect equilibrium |
partial equilibrium |
known dynamics & rewards | common knowledge | given fitness function | perfect information |
reward design | mechanism design | intelligent design 4 | matching theory? |
approximation ratio? | price of Anarchy | Cost of competition | Theory of the second best? |
coalition formation | coalition games | cultural evolution | coalition formation |
MDP: P-complete | Nash eq: PPAD-complete | ESS: Σ^𝑃_2 complete (NP^SAT) | Arrow-Debreu: PPAD |
Value iteration: O(|A| |S|^2) per iteration |
Approx: at most O(n^{log n/e^2}) |
? | O(n^2 log(1/h) for lateral exchange |
Dynamic Bellman learning | No learning 1 | Replicator dynamics as learning | Lateral exchange pricing |
agent focussed (process; planning; computational learning) |
game focussed (equilibria; perfect rationality) |
dynamics focussed (process; replication; change in mix) |
game focussed (equilibria; perfect rationality) |
Engineering | Normative | Descriptive | Thick |
Physics is the study of physics; economics studies economics. This terminology is confusing, since it’s extremely dubious for even physics to claim that their study is a complete model, structurally identical with the data-generating process. So to be painfully clear: The above is a map from theory to theory, not phenomenon to phenomenon.
(For making the correspondence really nice, you could frame evolution from the perspective of a single actor like the others - a hypothetical organism behind a veil of ignorance, maximising their expected fitness by selecting which subspecies to join. The subspecies distribution is then their chance of switching to a given subspecies.)
What to call the topic in common? ‘Distributed optimisation’? ‘Compositional optimisation’? 3
See also
- Mapping metaphysics, mathematics, and programming
- In Soviet Russia, Optimisation Problem Solves You (2012)
- An Analysis of Stochastic Game Theory for Multiagent Reinforcement Learning (2000)
- Learning Through Reinforcement and Replicator Dynamics (1997)
- Decentralized partially-observable Markov decision process
- Stochastic Recursive Variance Reduction for… Compositional Optimization (2020)
- I just found this superior treatment by Gwern.
- Proofs as programs, propositions as types, relational types as categories
- Physics, Topology, Logic and Computation: A Rosetta Stone (2009)
- Though there are new forms which do learn, including important relaxations like Counterfactual Regret Minimization. Thanks to Misha Yagudin for this point.
- often single-player, stochastic, discrete action, imperfect information
-
Compositional optimization can be used to formulate many important machine learning problems, e.g. reinforcement learning (Sutton and Barto, 1998), risk management (Dentcheva et al., 2017), multi-stage stochastic programming (Shapiro et al., 2009), deep neural nets (Yang et al., 2019), etc.
- Damnit Misha!
Comments
Tags: RL, game-theory, economics, rosetta-stone