Statistical and Algorithmic Foundations of Reinforcement Learning

Paper · arXiv 2507.14444 · Published July 19, 2025
Reinforcement Learning

Abstract As a paradigm for sequential decision making in unknown environments, reinforcement learning (RL) has received a flurry of attention in recent years. However, the explosion of model complexity in emerging applications and the presence of nonconvexity exacerbate the challenge of achieving efficient RL in sample-starved situations, where data collection is expensive, time-consuming, or even high-stakes (e.g., in clinical trials, autonomous systems, and online advertising). How to understand and enhance the sample and computational efficacies of RL algorithms is thus of great interest. In this tutorial, we aim to introduce several important algorithmic and theoretical developments in RL, highlighting the connections between new ideas and classical topics. Employing Markov Decision Processes as the central mathematical model, we cover several distinctive RL scenarios (i.e., RL with a simulator, online RL, offline RL, robust RL, and RL with human feedback), and present several mainstream RL approaches (i.e., model-based approach, value-based approach, and policy optimization). Our discussions gravitate around the issues of sample complexity, computational efficiency, as well as algorithm-dependent and information-theoretic lower bounds from a nonasymptotic viewpoint.

Introduction. Reinforcement learning (RL) [107, 109], which is frequently modeled as learning and decision making in a Markov decision process (MDP), is garnering growing interest in recent years due to its remarkable success in practice. A core objective of RL is to search for a policy— based on a collection of noisy data samples—that approximately maximizes the expected cumulative rewards in an MDP, without direct access to a precise description of the underlying model.1 In contemporary applications, it is increasingly more common to encounter environments with prohibitively large state and action space, thus exacerbating the challenge of collecting abundant samples, when data collection is expensive, time-consuming, or even high-stake (such as clinical trials, online advertisements, autonomous systems, to name just a few).

Discussion / Conclusion. In this tutorial, we have presented a small sample of the state-of-the-art advances in various RL settings and RL approaches, offering a fruitful interplay between statistical methodologies and RL algorithm development. We hope that this tutorial can equip new researchers with the core toolkits of modern RL, while motivating further theoretical and algorithmic investigations. Before concluding, there are several remarks that we would like to make. First, while the algorithms introduced in this tutorial enjoy strong (often optimal) theoretical guarantees, many of them are designed under idealized mathematical assumptions. In practical largescale RL applications—such as robotics, recommendation systems, and game playing—these conditions might not be met. As a result, practitioners often turn to alternative approaches that, while lacking optimal theoretical support, offer superior empirical performance in more complex scenarios.