Why the Exponential? From Max‑Entropy RL to the Boltzmann Distribution

Modern RL, attention mechanisms, classification, energy-based modeling, and statistical mechanics keep arriving at the same exponential shape:

\[ p(x)\;\propto\;\exp(\text{logits or reward}(x)/T)\quad\text{or}\quad p(x)\;\propto\;\exp(-E(x)/T). \]
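To make that shape concrete, here is a minimal sketch of the first form over a small finite set of options. The function name `boltzmann` and the example scores are illustrative assumptions, not something defined elsewhere in this post; the max-subtraction is a standard numerical-stability trick that leaves the probabilities unchanged.

```python
import math

def boltzmann(scores, temperature=1.0):
    """Return p(x) proportional to exp(score(x) / T) over a finite set of options."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    # Subtracting the max avoids overflow in exp(); the shift cancels on normalization.
    shift = max(scaled)
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical logits or rewards for three options.
scores = [2.0, 1.0, 0.0]

for T in (0.1, 1.0, 10.0):
    probs = boltzmann(scores, T)
    print(f"T={T}: {[round(p, 3) for p in probs]}")
```

For the energy form, pass negated energies: `boltzmann([-e for e in energies], T)`. Running the loop shows the qualitative pattern: a small `T` puts nearly all the mass on the highest-scoring option, while a large `T` pushes the distribution toward uniform.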

Why does the exponential keep showing up, and what does the "temperature" actually do?