Good Actions Succeed, Bad Actions Generalize: A Case Study on Why RL Generalizes Better
- URL: http://arxiv.org/abs/2503.15693v1
- Date: Wed, 19 Mar 2025 21:03:27 GMT
- Title: Good Actions Succeed, Bad Actions Generalize: A Case Study on Why RL Generalizes Better
- Authors: Meng Song,
- Abstract summary: Supervised learning (SL) and reinforcement learning (RL) are widely used to train general-purpose agents for complex tasks.<n>This paper provides a direct comparison between SL and RL in terms of zero-shot generalization.
- Score: 0.3021678014343889
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Supervised learning (SL) and reinforcement learning (RL) are both widely used to train general-purpose agents for complex tasks, yet their generalization capabilities and underlying mechanisms are not yet fully understood. In this paper, we provide a direct comparison between SL and RL in terms of zero-shot generalization. Using the Habitat visual navigation task as a testbed, we evaluate Proximal Policy Optimization (PPO) and Behavior Cloning (BC) agents across two levels of generalization: state-goal pair generalization within seen environments and generalization to unseen environments. Our experiments show that PPO consistently outperforms BC across both zero-shot settings and performance metrics-success rate and SPL. Interestingly, even though additional optimal training data enables BC to match PPO's zero-shot performance in SPL, it still falls significantly behind in success rate. We attribute this to a fundamental difference in how models trained by these algorithms generalize: BC-trained models generalize by imitating successful trajectories, whereas TD-based RL-trained models generalize through combinatorial experience stitching-leveraging fragments of past trajectories (mostly failed ones) to construct solutions for new tasks. This allows RL to efficiently find solutions in vast state space and discover novel strategies beyond the scope of human knowledge. Besides providing empirical evidence and understanding, we also propose practical guidelines for improving the generalization capabilities of RL and SL through algorithm design.
Related papers
- Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining [74.83412846804977]
Reinforcement learning (RL)-based fine-tuning has become a crucial step in post-training language models.
We present a systematic end-to-end study of RL fine-tuning for mathematical reasoning by training models entirely from scratch.
arXiv Detail & Related papers (2025-04-10T17:15:53Z) - SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training [127.47044960572659]
Supervised fine-tuning (SFT) and reinforcement learning (RL) are widely used post-training techniques for foundation models.<n>This paper studies the difference between SFT and RL on generalization and memorization.<n>We show that RL, especially when trained with an outcome-based reward, generalizes across both rule-based textual and visual variants.
arXiv Detail & Related papers (2025-01-28T18:59:44Z) - SigmaRL: A Sample-Efficient and Generalizable Multi-Agent Reinforcement Learning Framework for Motion Planning [0.6668116630521236]
This paper introduces an open-source, decentralized framework named SigmaRL, designed to enhance both sample efficiency and generalization of multi-agent Reinforcement Learning (RL)
We propose five strategies to design information-dense observations, focusing on general features that are applicable to most traffic scenarios.
We train our RL agents using these strategies on an intersection and evaluate their generalization through numerical experiments across completely unseen traffic scenarios, including a new intersection, an on-ramp, and a roundabout.
arXiv Detail & Related papers (2024-08-14T16:16:51Z) - IMEX-Reg: Implicit-Explicit Regularization in the Function Space for Continual Learning [17.236861687708096]
Continual learning (CL) remains one of the long-standing challenges for deep neural networks due to catastrophic forgetting of previously acquired knowledge.
Inspired by how humans learn using strong inductive biases, we propose IMEX-Reg to improve the generalization performance of experience rehearsal in CL under low buffer regimes.
arXiv Detail & Related papers (2024-04-28T12:25:09Z) - PEAR: Primitive Enabled Adaptive Relabeling for Boosting Hierarchical Reinforcement Learning [25.84621883831624]
Hierarchical reinforcement learning (HRL) has the potential to solve complex long horizon tasks using temporal abstraction and increased exploration.<n>We present primitive enabled adaptive relabeling (PEAR)<n>We first perform adaptive relabeling on a few expert demonstrations to generate efficient subgoal supervision.<n>We then jointly optimize HRL agents by employing reinforcement learning (RL) and imitation learning (IL)
arXiv Detail & Related papers (2023-06-10T09:41:30Z) - Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment.
We propose a new method to solve it, using unsupervised model-based RL, for pre-training the agent.
We show robust performance on the Real-Word RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z) - Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z) - Contextualize Me -- The Case for Context in Reinforcement Learning [49.794253971446416]
Contextual Reinforcement Learning (cRL) provides a framework to model such changes in a principled manner.
We show how cRL contributes to improving zero-shot generalization in RL through meaningful benchmarks and structured reasoning about generalization tasks.
arXiv Detail & Related papers (2022-02-09T15:01:59Z) - Improving Zero-shot Generalization in Offline Reinforcement Learning
using Generalized Similarity Functions [34.843526573355746]
Reinforcement learning (RL) agents are widely used for solving complex sequential decision making tasks, but exhibit difficulty in generalizing to scenarios not seen during training.
We show that performance of online algorithms for generalization in RL can be hindered in the offline setting due to poor estimation of similarity between observations.
We propose a new theoretically-motivated framework called Generalized Similarity Functions (GSF), which uses contrastive learning to train an offline RL agent to aggregate observations based on the similarity of their expected future behavior.
arXiv Detail & Related papers (2021-11-29T15:42:54Z) - Cross-Trajectory Representation Learning for Zero-Shot Generalization in
RL [21.550201956884532]
generalize policies learned on a few tasks over a high-dimensional observation space to similar tasks not seen during training.
Many promising approaches to this challenge consider RL as a process of training two functions simultaneously.
We propose Cross-Trajectory Representation Learning (CTRL), a method that runs within an RL agent and conditions its encoder to recognize behavioral similarity in observations.
arXiv Detail & Related papers (2021-06-04T00:43:10Z) - Dynamics Generalization via Information Bottleneck in Deep Reinforcement
Learning [90.93035276307239]
We propose an information theoretic regularization objective and an annealing-based optimization method to achieve better generalization ability in RL agents.
We demonstrate the extreme generalization benefits of our approach in different domains ranging from maze navigation to robotic tasks.
This work provides a principled way to improve generalization in RL by gradually removing information that is redundant for task-solving.
arXiv Detail & Related papers (2020-08-03T02:24:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.