Reinforcement Learning: An Overview
- URL: http://arxiv.org/abs/2412.05265v3
- Date: Mon, 19 May 2025 15:12:39 GMT
- Title: Reinforcement Learning: An Overview
- Authors: Kevin Murphy
- Abstract summary: This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement learning and sequential decision making. It covers value-based methods, policy-based methods, model-based methods, multi-agent RL, LLMs and RL, and various other topics.
- Score: 6.146578707999203
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement learning and sequential decision making, covering value-based methods, policy-based methods, model-based methods, multi-agent RL, LLMs and RL, and various other topics (e.g., offline RL, hierarchical RL, intrinsic reward).
Related papers
- Statistical and Algorithmic Foundations of Reinforcement Learning [45.707617428078585]
Reinforcement learning (RL) has received a flurry of attention in recent years. We aim to introduce several important developments in RL, highlighting the connections between new ideas and classical topics.
arXiv Detail & Related papers (2025-07-19T02:42:41Z)
- Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities [62.05713042908654]
This paper provides a review of advances in the alignment of Large Language Models (LLMs) through the lens of inverse reinforcement learning (IRL). We highlight the necessity of constructing neural reward models from human data and discuss the formal and practical implications of this paradigm shift.
arXiv Detail & Related papers (2025-07-17T14:22:24Z)
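The reward-modeling step mentioned in the entry above is typically framed as learning from pairwise human preferences. Below is a minimal, self-contained sketch of a Bradley-Terry preference loss on a toy linear reward model; the feature vectors and data are hypothetical stand-ins for learned response embeddings, not anything taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: each response is represented by a fixed feature vector.
# In practice these would come from an LLM; here they are random.
dim = 8
w = np.zeros(dim)                      # parameters of a linear reward model r(x) = w @ x
chosen = rng.normal(size=(64, dim))    # features of human-preferred responses
rejected = rng.normal(size=(64, dim))  # features of dispreferred responses

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.1
for _ in range(200):
    # Bradley-Terry objective: maximize log sigmoid(r(chosen) - r(rejected)).
    margin = chosen @ w - rejected @ w
    p = sigmoid(margin)
    grad = ((p - 1.0)[:, None] * (chosen - rejected)).mean(axis=0)
    w -= lr * grad

print("training preference accuracy:", (chosen @ w > rejected @ w).mean())
```

The learned reward can then serve as the optimization target for an RL step (e.g., a policy-gradient update), which is the usual bridge from preference data to post-training.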
- A Technical Survey of Reinforcement Learning Techniques for Large Language Models [33.38582292895673]
Reinforcement Learning (RL) has emerged as a transformative approach for aligning and enhancing Large Language Models (LLMs). RLHF remains dominant for alignment, and outcome-based RL such as RLVR significantly improves stepwise reasoning. Persistent challenges such as reward hacking, computational costs, and scalable feedback collection underscore the need for continued innovation.
arXiv Detail & Related papers (2025-07-05T19:13:00Z)
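In contrast with learned reward models, outcome-based schemes such as RLVR score an output with a programmatic check. Here is a minimal sketch of such a verifiable reward for math-style answers; the `\boxed{}`-extraction convention and the example strings are my own illustrative assumptions, not details from the survey.

```python
import re

def verifiable_reward(completion: str, reference_answer: str) -> float:
    """Return 1.0 if the final boxed answer matches the reference, else 0.0.

    The simplest form of an outcome-based (RLVR-style) reward:
    no learned model, just a programmatic correctness check.
    """
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference_answer.strip() else 0.0

# Hypothetical usage with made-up completions.
print(verifiable_reward(r"The sum is \boxed{42}", "42"))    # 1.0
print(verifiable_reward(r"I think it is \boxed{41}", "42")) # 0.0
```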
- Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models [22.796496516709514]
This paper provides a systematic review of recent advances in reinforcement learning (RL)-based reasoning for Multimodal Large Language Models (MLLMs). We highlight two main RL paradigms, value-model-free and value-model-based methods, and analyze how RL enhances reasoning abilities by optimizing reasoning trajectories and aligning multimodal information. We provide an extensive overview of benchmark datasets, evaluation protocols, and current limitations, and propose future research directions to address challenges such as sparse rewards, inefficient cross-modal reasoning, and real-world deployment constraints.
arXiv Detail & Related papers (2025-04-30T03:14:28Z)
- Introduction to Reinforcement Learning [2.52299400625445]
Reinforcement Learning (RL) focuses on training agents to make decisions by interacting with their environment to maximize cumulative rewards. This paper provides an overview of RL, covering its core concepts, methodologies, and resources for further learning.
arXiv Detail & Related papers (2024-08-13T23:08:06Z)
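To make the core concepts in the entry above concrete, here is a minimal tabular Q-learning loop on a tiny hypothetical chain environment (the environment and hyperparameters are illustrative, not from the paper): the agent interacts with the environment and updates action values toward the reward plus the discounted value of the next state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny deterministic chain: states 0..4, actions 0 (left) and 1 (right).
# Reaching state 4 yields reward 1 and ends the episode.
n_states, n_actions, goal = 5, 2, 4

def step(s, a):
    s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
    return s_next, float(s_next == goal), s_next == goal

Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.5, 0.9, 0.2

for _ in range(500):
    s = int(rng.integers(goal))              # random non-goal start state
    for _ in range(50):                      # cap on episode length
        # epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
        s_next, r, done = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        target = r + (0.0 if done else gamma * Q[s_next].max())
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
        if done:
            break

print("greedy policy in non-goal states (0=left, 1=right):", Q[:goal].argmax(axis=1))
```

The same value-based recipe, with a neural network in place of the table, is the starting point for the deep RL methods surveyed in the main manuscript.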
- Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods [18.771658054884693]
Large language models (LLMs) have emerged as a promising avenue to augment reinforcement learning (RL) in aspects such as multi-task learning, sample efficiency, and high-level task planning.
We propose a structured taxonomy to systematically categorize LLMs' functionalities in RL, including four roles: information processor, reward designer, decision-maker, and generator.
arXiv Detail & Related papers (2024-03-30T08:28:08Z)
- ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z)
- Towards an Information Theoretic Framework of Context-Based Offline Meta-Reinforcement Learning [48.79569442193824]
We show that COMRL algorithms are essentially optimizing the same mutual information objective between the task variable $M$ and its latent representation $Z$ by implementing various approximate bounds.
As demonstrations, we propose a supervised and a self-supervised implementation of $I(Z; M)$, and empirically show that the corresponding optimization algorithms exhibit remarkable generalization across a broad spectrum of RL benchmarks.
This work lays the information theoretic foundation for COMRL methods, leading to a better understanding of task representation learning in the context of reinforcement learning.
arXiv Detail & Related papers (2024-02-04T09:58:42Z)
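A common way to optimize a mutual-information objective like $I(Z; M)$ is through a contrastive (InfoNCE-style) lower bound, where encodings of the same task are treated as positives and other tasks in the batch as negatives. The sketch below is a generic illustration of that bound with random placeholder encodings; it is not the specific supervised or self-supervised implementations proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def info_nce(z_query, z_key, temperature=0.1):
    """InfoNCE lower bound on mutual information.

    Row i of z_query and row i of z_key are two encodings of the same
    task (positive pair); all other rows in the batch act as negatives.
    """
    q = z_query / np.linalg.norm(z_query, axis=1, keepdims=True)
    k = z_key / np.linalg.norm(z_key, axis=1, keepdims=True)
    logits = q @ k.T / temperature                     # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Loss is the negative log-probability of the matching (diagonal) pair.
    return -np.mean(np.diag(log_probs))

# Placeholder task encodings: 16 tasks, 32-dimensional latent Z.
z1 = rng.normal(size=(16, 32))
z2 = z1 + 0.1 * rng.normal(size=(16, 32))   # noisy second view of the same tasks
print("InfoNCE loss:", info_nce(z1, z2))
```

Minimizing this loss pushes the task representation $Z$ to be predictive of the task identity $M$, which is the shared objective the paper attributes to COMRL methods.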
- Masked Modeling for Self-supervised Representation Learning on Vision and Beyond [69.64364187449773]
Masked modeling has emerged as a distinctive approach that involves predicting parts of the original data that are proportionally masked during training.
We elaborate on the details of techniques within masked modeling, including diverse masking strategies, recovering targets, network architectures, and more.
We conclude by discussing the limitations of current techniques and pointing out several potential avenues for advancing masked modeling research.
arXiv Detail & Related papers (2023-12-31T12:03:21Z)
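The masked-modeling recipe described above reduces to: randomly mask a proportion of the input, predict the masked parts, and penalize reconstruction error on the masked positions only. Here is a minimal sketch on flat vectors with a toy linear predictor; the masking ratio, data, and model are arbitrary placeholders, and real systems use transformer encoders rather than this toy setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data with low-rank structure, so masked entries are predictable
# from the visible ones (stand-ins for image patches / tokens).
n, dim, rank, mask_ratio = 128, 64, 8, 0.75
x = rng.normal(size=(n, rank)) @ rng.normal(size=(rank, dim))

W = np.zeros((dim, dim))                   # toy linear reconstruction model
lr = 0.5
for _ in range(300):
    # Randomly mask a fixed proportion of positions in each sample.
    mask = rng.random(x.shape) < mask_ratio
    x_visible = np.where(mask, 0.0, x)     # masked positions are zeroed out
    x_hat = x_visible @ W                  # predict the full signal from visible parts
    # Reconstruction loss is computed on the masked positions only.
    err = np.where(mask, x_hat - x, 0.0)
    loss = (err ** 2).sum() / mask.sum()
    W -= lr * (x_visible.T @ err) / mask.sum()

print("final masked-reconstruction loss:", round(float(loss), 4))
```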
- Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective [61.4025671743675]
Off-policy learning to rank methods often make strong assumptions about how users generate the click data.
We show that offline reinforcement learning can adapt to various click models without complex debiasing techniques and prior knowledge of the model.
Results on various large-scale datasets demonstrate that CUOLR consistently outperforms the state-of-the-art off-policy learning to rank algorithms.
arXiv Detail & Related papers (2023-06-13T03:46:22Z)
- Reinforcement Learning with Partial Parametric Model Knowledge [3.3598755777055374]
We adapt reinforcement learning methods for continuous control to bridge the gap between complete ignorance and perfect knowledge of the environment.
Our method, Partial Knowledge Least Squares Policy Iteration (PLSPI), takes inspiration from both model-free RL and model-based control.
arXiv Detail & Related papers (2023-04-26T01:04:35Z)
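For reference, the classical building block behind Least Squares Policy Iteration is LSTD-Q: given a batch of transitions and a feature map, the Q-function weights for the current policy solve a linear system. The sketch below shows one policy-evaluation step on made-up data; how PLSPI injects partial parametric model knowledge into this pipeline is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, gamma = 6, 2, 0.95

def phi(s, a):
    """One-hot state-action features (a simple, standard choice)."""
    f = np.zeros(n_states * n_actions)
    f[s * n_actions + a] = 1.0
    return f

# Made-up batch of transitions (s, a, r, s') from some behavior policy.
S = rng.integers(n_states, size=500)
A = rng.integers(n_actions, size=500)
S_next = (S + A) % n_states                      # toy deterministic dynamics
R = (S_next == 0).astype(float)                  # reward for landing on state 0

def lstdq(policy, reg=1e-6):
    """Solve A w = b for the Q-weights of `policy` (LSTD-Q)."""
    d = n_states * n_actions
    A_mat, b = reg * np.eye(d), np.zeros(d)
    for s, a, r, s_next in zip(S, A, R, S_next):
        f = phi(s, a)
        f_next = phi(s_next, policy[s_next])
        A_mat += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A_mat, b)

# One round of policy iteration: evaluate the current policy, then act greedily.
policy = np.zeros(n_states, dtype=int)
w = lstdq(policy)
Q = w.reshape(n_states, n_actions)
policy = Q.argmax(axis=1)
print("greedy policy after one LSPI iteration:", policy)
```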
- A Tutorial on Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL. We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task. We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
- Inverse Reinforcement Learning for Text Summarization [52.765898203824975]
We introduce inverse reinforcement learning (IRL) as an effective paradigm for training abstractive summarization models.
Experimental results across datasets in different domains demonstrate the superiority of our proposed IRL model for summarization over MLE and RL baselines.
arXiv Detail & Related papers (2022-12-19T23:45:05Z)
- A Comprehensive Survey of Data Augmentation in Visual Reinforcement Learning [53.35317176453194]
Data augmentation (DA) has become a widely used technique in visual RL for acquiring sample-efficient and generalizable policies.
We present a principled taxonomy of the existing augmentation techniques used in visual RL and conduct an in-depth discussion on how to better leverage augmented data.
As the first comprehensive survey of DA in visual RL, this work is expected to offer valuable guidance to this emerging field.
arXiv Detail & Related papers (2022-10-10T11:01:57Z)
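One of the most widely used augmentations in this line of work is a simple random shift of the observation image (pad, then crop back at a random offset), applied before observations reach the agent. A minimal NumPy sketch is below; the pad size and image shape are arbitrary placeholders rather than settings from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_shift(img, pad=4):
    """Pad an image by `pad` pixels on each side and crop back to the
    original size at a random offset (a common augmentation in visual RL)."""
    h, w, c = img.shape
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w, :]

obs = rng.integers(0, 256, size=(84, 84, 3), dtype=np.uint8)  # fake observation
aug1, aug2 = random_shift(obs), random_shift(obs)
print(obs.shape, aug1.shape, (aug1 != aug2).any())  # same shape, different crops
```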
- Unsupervised Representation Learning in Deep Reinforcement Learning: A Review [1.2016264781280588]
This review addresses the problem of learning abstract representations of the measurement data in the context of Deep Reinforcement Learning (DRL).
This review provides a comprehensive and complete overview of unsupervised representation learning in DRL by describing the main Deep Learning tools used for learning representations of the world.
arXiv Detail & Related papers (2022-08-27T09:38:56Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
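The two-policy idea can be sketched as a roll-in scheme: a fixed guide policy controls the first part of each episode, the learning (exploration) policy takes over from there, and the hand-off point is moved earlier as learning progresses. The toy environment and policies below are placeholders meant only to show that control flow, not the actual JSRL training loop.

```python
import numpy as np

rng = np.random.default_rng(0)

horizon, goal = 20, 10

def step(s, a):
    s_next = s + 1 if a == 1 else max(s - 1, 0)
    return s_next, float(s_next == goal), s_next == goal

def guide_policy(s):        # pre-existing policy (e.g., from offline data or demos)
    return 1

def exploration_policy(s):  # the policy being learned; here just random
    return int(rng.random() < 0.5)

def rollout(switch_step):
    """Guide policy acts for the first `switch_step` steps, then the
    exploration policy takes over (a JSRL-style roll-in scheme)."""
    s, total = 0, 0.0
    for t in range(horizon):
        a = guide_policy(s) if t < switch_step else exploration_policy(s)
        s, r, done = step(s, a)
        total += r
        if done:
            break
    return total

# A simple curriculum: start with long guide roll-ins, then shorten them.
for switch_step in [10, 7, 4, 0]:
    returns = [rollout(switch_step) for _ in range(200)]
    print(f"switch at t={switch_step:2d}: mean return = {np.mean(returns):.2f}")
```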
- PoBRL: Optimizing Multi-Document Summarization by Blending Reinforcement Learning Policies [68.8204255655161]
We propose a reinforcement learning based framework PoBRL for solving multi-document summarization.
Our strategy decouples this multi-objective optimization into different subproblems that can be solved individually by reinforcement learning.
Our empirical analysis shows state-of-the-art performance on several multi-document datasets.
arXiv Detail & Related papers (2021-05-18T02:55:42Z)
- Provable Multi-Objective Reinforcement Learning with Generative Models [98.19879408649848]
We study the problem of single-policy MORL, which learns an optimal policy given a preference over the objectives.
Existing methods require strong assumptions such as exact knowledge of the multi-objective decision process.
We propose a new algorithm called model-based envelop value iteration (EVI), which generalizes the enveloped multi-objective $Q$-learning algorithm.
arXiv Detail & Related papers (2020-11-19T22:35:31Z)
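Single-policy multi-objective RL typically conditions values on a preference vector $w$ and acts greedily with respect to the scalarized value $w^\top Q(s, a)$. The snippet below shows only that scalarization step on made-up vector-valued Q-estimates; it is not an implementation of the proposed EVI algorithm.

```python
import numpy as np

n_actions, n_objectives = 4, 2
# Made-up vector-valued Q-estimates for one state: Q[a] is a reward vector.
Q = np.array([[1.0, 0.0],
              [0.8, 0.5],
              [0.3, 0.9],
              [0.0, 1.0]])

def greedy_action(Q, w):
    """Pick the action maximizing the preference-weighted scalarized value."""
    return int((Q @ w).argmax())

for w in [np.array([1.0, 0.0]), np.array([0.5, 0.5]), np.array([0.0, 1.0])]:
    print("preference", w, "-> action", greedy_action(Q, w))
```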
- MOReL: Model-Based Offline Reinforcement Learning [49.30091375141527]
In offline reinforcement learning (RL), the goal is to learn a highly rewarding policy based solely on a dataset of historical interactions with the environment.
We present MOReL, an algorithmic framework for model-based offline RL.
We show that MOReL matches or exceeds state-of-the-art results in widely studied offline RL benchmarks.
arXiv Detail & Related papers (2020-05-12T17:52:43Z)
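The central mechanism in MOReL-style model-based offline RL is pessimism: learn a dynamics model (typically an ensemble) from the offline data, and divert any state-action pair where the ensemble members disagree too much to an absorbing low-reward state, so the planner stays where the model is trustworthy. The sketch below illustrates that uncertainty gate with toy ensemble predictions; the threshold and penalty values are illustrative assumptions, not numbers from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

ensemble_size, state_dim = 4, 3
threshold, penalty = 0.5, -100.0   # illustrative values, not from the paper

def ensemble_predict(s, a):
    """Stand-in for an ensemble of learned dynamics models: each member
    returns its own prediction of the next state."""
    base = s + 0.1 * a
    noise_scale = 0.05 if abs(a) < 1.0 else 1.0   # large disagreement off-support
    return base + noise_scale * rng.normal(size=(ensemble_size, state_dim))

def pessimistic_step(s, a, reward_fn):
    preds = ensemble_predict(s, a)
    disagreement = np.max(np.linalg.norm(preds - preds.mean(axis=0), axis=1))
    if disagreement > threshold:
        # Unknown region: move to an absorbing "halt" state with a large penalty.
        return None, penalty, True
    s_next = preds.mean(axis=0)
    return s_next, reward_fn(s_next), False

reward_fn = lambda s: -float(np.sum(s ** 2))
s0 = np.zeros(state_dim)
print("in-support action reward :", pessimistic_step(s0, 0.5, reward_fn)[1])
print("off-support action reward:", pessimistic_step(s0, 5.0, reward_fn)[1])
```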
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences arising from its use.