Decision Stacks: Flexible Reinforcement Learning via Modular Generative
Models
- URL: http://arxiv.org/abs/2306.06253v2
- Date: Sun, 29 Oct 2023 21:48:34 GMT
- Title: Decision Stacks: Flexible Reinforcement Learning via Modular Generative
Models
- Authors: Siyan Zhao and Aditya Grover
- Abstract summary: Decision Stacks is a generative framework that decomposes goal-conditioned policy agents into 3 generative modules.
These modules simulate the temporal evolution of observations, rewards, and actions via independent generative models that can be learned in parallel via teacher forcing.
Our framework guarantees both expressivity and flexibility in designing individual modules to account for key factors such as architectural bias, optimization objective and dynamics, transferability across domains, and inference speed.
- Score: 37.79386205079626
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning presents an attractive paradigm to reason about
several distinct aspects of sequential decision making, such as specifying
complex goals, planning future observations and actions, and critiquing their
utilities. However, the combined integration of these capabilities poses
competing algorithmic challenges in retaining maximal expressivity while
allowing for flexibility in modeling choices for efficient learning and
inference. We present Decision Stacks, a generative framework that decomposes
goal-conditioned policy agents into 3 generative modules. These modules
simulate the temporal evolution of observations, rewards, and actions via
independent generative models that can be learned in parallel via teacher
forcing. Our framework guarantees both expressivity and flexibility in
designing individual modules to account for key factors such as architectural
bias, optimization objective and dynamics, transferability across domains, and
inference speed. Our empirical results demonstrate the effectiveness of
Decision Stacks for offline policy optimization for several MDP and POMDP
environments, outperforming existing methods and enabling flexible generative
decision making.
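As a rough illustration of the decomposition described in the abstract, the toy sketch below fits three separate modules for observations, rewards, and actions, each on ground-truth inputs from an offline dataset (teacher forcing), and then chains them at inference time. It is not the authors' implementation: the 1-D environment, the lookup-table "models", and every name in it (`make_trajectory`, `fit_table`, `rollout`) are assumptions made for illustration, and the action module here conditions only on observations to keep the example short.

```python
from statistics import mean


def sign(x):
    """Return -1, 0, or +1."""
    return (x > 0) - (x < 0)


def make_trajectory(goal, start=0, horizon=4):
    """Hypothetical 1-D task: o_{t+1} = o_t + a_t, reward = -|goal - o_{t+1}|."""
    obs, rewards, actions = [start], [], []
    for _ in range(horizon):
        a = sign(goal - obs[-1])  # expert steps toward the goal
        obs.append(obs[-1] + a)
        actions.append(a)
        rewards.append(-abs(goal - obs[-1]))
    return {"goal": goal, "obs": obs, "rewards": rewards, "actions": actions}


def fit_table(pairs):
    """Stand-in for a generative model: average the targets seen per input."""
    buckets = {}
    for key, target in pairs:
        buckets.setdefault(key, []).append(target)
    return {key: mean(values) for key, values in buckets.items()}


dataset = [make_trajectory(g) for g in (-3, -1, 2, 3)]

# Teacher forcing: every module is fit on ground-truth inputs from the data,
# never on another module's predictions, so the three fits are independent
# and could run in parallel.
obs_model = fit_table(
    (sign(t["goal"] - o), nxt - o)
    for t in dataset for o, nxt in zip(t["obs"], t["obs"][1:])
)
reward_model = fit_table(
    (t["goal"] - nxt, r)
    for t in dataset for nxt, r in zip(t["obs"][1:], t["rewards"])
)
action_model = fit_table(
    (nxt - o, a)
    for t in dataset for o, nxt, a in zip(t["obs"], t["obs"][1:], t["actions"])
)


def rollout(goal, start, horizon):
    """Chain the modules: observations first, then rewards, then actions."""
    obs = [start]
    for _ in range(horizon):
        obs.append(obs[-1] + obs_model[sign(goal - obs[-1])])
    rewards = [reward_model[goal - o] for o in obs[1:]]
    # Simplification: actions are inferred from observation pairs only.
    actions = [action_model[nxt - o] for o, nxt in zip(obs, obs[1:])]
    return obs, rewards, actions


# Goal -2 does not appear in the toy dataset.
obs, rewards, actions = rollout(goal=-2, start=0, horizon=4)
print(obs, rewards, actions)
```

Because the modules only meet at inference time, any one of the three tables could be swapped for a different model class without refitting the other two, which is the kind of modularity the abstract emphasizes.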
Related papers
- Closed-form merging of parameter-efficient modules for Federated Continual Learning [9.940242741914748]
We introduce LoRM, an alternating optimization strategy that trains one LoRA matrix at a time.
This allows solving for each unknown variable individually, thus finding a unique solution.
Our method demonstrates state-of-the-art performance across a range of FCIL scenarios.
arXiv Detail & Related papers (2024-10-23T15:30:13Z)
- On the Modeling Capabilities of Large Language Models for Sequential Decision Making [52.128546842746246]
Large pretrained models are showing increasingly better performance in reasoning and planning tasks.
We evaluate their ability to produce decision-making policies, either directly, by generating actions, or indirectly.
In environments with unfamiliar dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward modeling capabilities.
arXiv Detail & Related papers (2024-10-08T03:12:57Z)
- Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm.
HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies.
HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z)
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
- Attitudes and Latent Class Choice Models using Machine learning [0.0]
We present a method of efficiently incorporating attitudinal indicators in the specification of Latent Class Choice Models (LCCM).
This formulation overcomes structural equations in its capability of exploring the relationship between the attitudinal indicators and the decision choice.
We test our proposed framework for estimating a Car-Sharing (CS) service subscription choice with stated preference data from Copenhagen, Denmark.
arXiv Detail & Related papers (2023-02-20T10:03:01Z)
- Latent Variable Representation for Reinforcement Learning [131.03944557979725]
It remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of model-based reinforcement learning.
We provide a representation view of the latent variable models for state-action value functions, which allows both a tractable variational learning algorithm and an effective implementation of the optimism/pessimism principle.
In particular, we propose a computationally efficient planning algorithm with UCB exploration by incorporating kernel embeddings of latent variable models.
arXiv Detail & Related papers (2022-12-17T00:26:31Z)
- Revisiting GANs by Best-Response Constraint: Perspective, Methodology, and Application [49.66088514485446]
Best-Response Constraint (BRC) is a general learning framework to explicitly formulate the potential dependency of the generator on the discriminator.
We show that, even with different motivations and formulations, a variety of existing GANs can all be uniformly improved by our flexible BRC methodology.
arXiv Detail & Related papers (2022-05-20T12:42:41Z)