Enhancing Analogical Reasoning in the Abstraction and Reasoning Corpus via Model-Based RL
- URL: http://arxiv.org/abs/2408.14855v1
- Date: Tue, 27 Aug 2024 08:15:20 GMT
- Title: Enhancing Analogical Reasoning in the Abstraction and Reasoning Corpus via Model-Based RL
- Authors: Jihwan Lee, Woochang Sim, Sejin Kim, Sundong Kim
- Abstract summary: We show that model-based reinforcement learning is a suitable approach for the task of analogical reasoning.
We compare DreamerV3, a model-based RL method, with Proximal Policy Optimization, a model-free RL method.
Our results indicate that model-based RL not only outperforms model-free RL in learning and generalizing from single tasks but also shows significant advantages in reasoning across similar tasks.
- Score: 6.143939145442195
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper demonstrates that model-based reinforcement learning (model-based RL) is a suitable approach for the task of analogical reasoning. We hypothesize that model-based RL can solve analogical reasoning tasks more efficiently through the creation of internal models. To test this, we compared DreamerV3, a model-based RL method, with Proximal Policy Optimization, a model-free RL method, on the Abstraction and Reasoning Corpus (ARC) tasks. Our results indicate that model-based RL not only outperforms model-free RL in learning and generalizing from single tasks but also shows significant advantages in reasoning across similar tasks.
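To make the experimental setup concrete, the sketch below casts a single ARC demonstration pair as an episodic RL environment in which the agent repaints one cell per step and is rewarded for reproducing the target grid. This is a hypothetical interface written against the Gymnasium API, not the authors' code: the environment name, action encoding, reward shaping, and the assumption that input and output grids share a shape are all illustrative choices.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class ARCGridEnv(gym.Env):
    """Hypothetical sketch (not the authors' implementation): one ARC
    demonstration pair cast as an episodic RL task. The agent repaints one
    cell per step and succeeds when the working grid matches the target."""

    def __init__(self, input_grid, target_grid, num_colors=10, max_steps=100):
        self.input_grid = np.asarray(input_grid, dtype=np.int64)
        self.target_grid = np.asarray(target_grid, dtype=np.int64)
        self.max_steps = max_steps
        height, width = self.target_grid.shape
        # Observation: the current working grid; action: (row, col, color) to paint.
        self.observation_space = spaces.Box(0, num_colors - 1, shape=(height, width), dtype=np.int64)
        self.action_space = spaces.MultiDiscrete([height, width, num_colors])

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.grid = self.input_grid.copy()
        self.steps = 0
        return self.grid.copy(), {}

    def step(self, action):
        row, col, color = action
        self.grid[row, col] = color
        self.steps += 1
        solved = np.array_equal(self.grid, self.target_grid)
        reward = 1.0 if solved else -0.01  # sparse success bonus, small step cost (assumed shaping)
        terminated = solved
        truncated = self.steps >= self.max_steps
        return self.grid.copy(), float(reward), terminated, truncated, {}
```

With a common interface like this, a model-free baseline such as PPO (e.g. from stable-baselines3, which supports MultiDiscrete action spaces) and a model-based agent such as DreamerV3 can be trained on the same tasks and compared on sample efficiency and on transfer to similar tasks, which is the comparison the paper reports.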
Related papers
- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? [67.30809748319486]
Reinforcement Learning with Verifiable Rewards (RLVR) has recently demonstrated notable success in enhancing the reasoning capabilities of LLMs.
We re-examine this assumption by measuring the pass@$k$ metric with large values of $k$ to explore the reasoning capability boundary of the models.
We find that RL does not, in fact, elicit fundamentally new reasoning patterns.
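For reference, pass@$k$ is usually estimated with the standard unbiased estimator from the code-generation literature: drawing $n$ samples per problem of which $c$ are correct, $\text{pass@}k = \mathbb{E}_{\text{problems}}\big[1 - \binom{n-c}{k}/\binom{n}{k}\big]$, so sweeping $k$ up to large values probes how much of the base model's sampling distribution already contains a correct solution. (This is the common definition, quoted for orientation rather than from this paper's notation.)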
arXiv Detail & Related papers (2025-04-18T17:59:56Z) - Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining [74.83412846804977]
Reinforcement learning (RL)-based fine-tuning has become a crucial step in post-training language models.
We present a systematic end-to-end study of RL fine-tuning for mathematical reasoning by training models entirely from scratch.
arXiv Detail & Related papers (2025-04-10T17:15:53Z) - An Empirical Study on Eliciting and Improving R1-like Reasoning Models [90.52239241349504]
Scaling RL training has become a central technique for implementing such reasoning models.
We demonstrate that our RL training approach consistently improves the Qwen2.5-32B base models.
We also explore the use of tool manipulation, finding that it significantly boosts the reasoning performance of large reasoning models.
arXiv Detail & Related papers (2025-03-06T15:34:27Z) - Model-based Offline Reinforcement Learning with Lower Expectile Q-Learning [6.345851712811528]
We introduce a novel model-based offline RL method, Lower Expectile Q-learning (LEQ)
LEQ provides a low-bias model-based value estimation via lower expectile regression of $\lambda$-returns.
Our studies demonstrate that lower expectile regression, $\lambda$-returns, and critic training on offline data are all crucial for LEQ.
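For context, the two ingredients named above have standard textbook forms (written here in generic notation, not necessarily the paper's): the $\lambda$-return mixes $n$-step targets recursively as $G_t^{\lambda} = r_t + \gamma\big[(1-\lambda)\,V(s_{t+1}) + \lambda\,G_{t+1}^{\lambda}\big]$, and expectile regression fits a critic $Q$ by minimizing $\mathbb{E}\big[\,|\tau - \mathbb{1}(G_t^{\lambda} - Q(s_t,a_t) < 0)|\,(G_t^{\lambda} - Q(s_t,a_t))^2\,\big]$. Choosing $\tau < 0.5$ (a lower expectile) weights errors where the target falls below the prediction more heavily, pushing the critic toward a conservative estimate of the return.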
arXiv Detail & Related papers (2024-06-30T13:44:59Z) - Provable Reward-Agnostic Preference-Based Reinforcement Learning [61.39541986848391]
Preference-based Reinforcement Learning (PbRL) is a paradigm in which an RL agent learns to optimize a task using pair-wise preference-based feedback over trajectories.
We propose a theoretical reward-agnostic PbRL framework that acquires exploratory trajectories enabling accurate learning of the hidden reward function.
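As background, pairwise preference feedback in PbRL is commonly modeled with a Bradley-Terry style link: given trajectories $\tau_1, \tau_2$ with (unknown) cumulative rewards $r(\tau_1), r(\tau_2)$, the annotator prefers $\tau_1$ with probability $P(\tau_1 \succ \tau_2) = \exp(r(\tau_1)) / \big(\exp(r(\tau_1)) + \exp(r(\tau_2))\big)$. This is the standard modeling assumption in the PbRL literature, stated here for orientation rather than as this paper's exact setup.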
arXiv Detail & Related papers (2023-05-29T15:00:09Z) - Learning a model is paramount for sample efficiency in reinforcement learning control of PDEs [5.488334211013093]
We show that learning an actuated model in parallel to training the RL agent significantly reduces the total amount of required data sampled from the real system.
We also show that iteratively updating the model is of major importance to avoid biases in the RL training.
arXiv Detail & Related papers (2023-02-14T16:14:39Z) - When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL)
The bounds we derive reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z) - Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective [142.36200080384145]
We propose a single objective that jointly optimizes a latent-space model and policy to achieve high returns while remaining self-consistent.
We demonstrate that the resulting algorithm matches or improves the sample-efficiency of the best prior model-based and model-free RL methods.
arXiv Detail & Related papers (2022-09-18T03:51:58Z) - Causal Dynamics Learning for Task-Independent State Abstraction [61.707048209272884]
We introduce Causal Dynamics Learning for Task-Independent State Abstraction (CDL)
CDL learns a theoretically proved causal dynamics model that removes unnecessary dependencies between state variables and the action.
A state abstraction can then be derived from the learned dynamics.
arXiv Detail & Related papers (2022-06-27T17:02:53Z) - Offline Reinforcement Learning with Causal Structured World Models [9.376353239574243]
We show that causal world-models can outperform plain world-models for offline RL.
We propose a practical algorithm, oFfline mOdel-based reinforcement learning with CaUsal Structure (FOCUS)
arXiv Detail & Related papers (2022-06-03T09:53:57Z) - INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL [90.06845886194235]
We propose a modified objective for model-based reinforcement learning (RL)
We integrate a term inspired by variational empowerment into a state-space model based on mutual information.
We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds.
arXiv Detail & Related papers (2022-04-18T23:09:23Z) - PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided
Exploration [15.173628100049129]
This work studies a model-based algorithm for both Kernelized Nonlinear Regulators (KNR) and linear Markov Decision Processes (MDPs)
For both models, our algorithm guarantees polynomial sample complexity while only using access to a planning oracle.
Our method can also perform reward-free exploration efficiently.
arXiv Detail & Related papers (2021-07-15T15:49:30Z) - Reinforcement Learning as One Big Sequence Modeling Problem [84.84564880157149]
Reinforcement learning (RL) is typically concerned with estimating stationary policies or single-step models.
We view RL as a sequence modeling problem, with the goal being to predict a sequence of actions that leads to a sequence of high rewards.
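A minimal sketch of what "RL as sequence modeling" means in practice: flatten each trajectory into a single discrete token stream so that an off-the-shelf autoregressive transformer can model it like text. The uniform binning and token layout below are illustrative assumptions, not the paper's exact discretization.

```python
import numpy as np

def trajectory_to_tokens(states, actions, rewards, n_bins=100, low=-1.0, high=1.0):
    """Flatten one trajectory into the token stream
    (s_0 dims, a_0 dims, r_0, s_1 dims, a_1 dims, r_1, ...)."""
    def discretize(x):
        # Clip to a fixed range and map each value to one of n_bins integer tokens.
        x = np.clip(np.asarray(x, dtype=np.float64), low, high)
        bins = ((x - low) / (high - low) * n_bins).astype(np.int64)
        return np.minimum(bins, n_bins - 1)

    tokens = []
    for s, a, r in zip(states, actions, rewards):
        tokens.extend(discretize(s).ravel().tolist())
        tokens.extend(discretize(a).ravel().tolist())
        tokens.append(int(discretize(r)))
    return tokens
```

Control then reduces to decoding: conditioned on the tokens seen so far, sample or search over future action tokens that the model predicts will lead to high-reward tokens (e.g., with beam search).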
arXiv Detail & Related papers (2021-06-03T17:58:51Z) - Bellman: A Toolbox for Model-Based Reinforcement Learning in TensorFlow [14.422129911404472]
Bellman aims to fill the gap in tooling for model-based RL and introduces the first thoroughly designed and tested model-based RL toolbox.
Our modular approach makes it possible to combine a wide range of environment models with generic model-based agent classes that recover state-of-the-art algorithms.
arXiv Detail & Related papers (2021-03-26T11:32:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.