The Quality-Diversity Transformer: Generating Behavior-Conditioned
Trajectories with Decision Transformers
- URL: http://arxiv.org/abs/2303.16207v3
- Date: Wed, 13 Sep 2023 17:07:30 GMT
- Title: The Quality-Diversity Transformer: Generating Behavior-Conditioned
Trajectories with Decision Transformers
- Authors: Valentin Macé, Raphaël Boige, Felix Chalumeau, Thomas Pierrot, Guillaume Richard, Nicolas Perrin-Gilbert
- Abstract summary: Quality-Diversity algorithms have proven effective in generating repertoires of diverse and efficient policies.
In uncertain environments, policies can lack robustness and repeatability.
We present a new approach to achieve behavior-conditioned trajectory generation based on two mechanisms.
- Score: 3.185440619417202
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the context of neuroevolution, Quality-Diversity algorithms have proven
effective in generating repertoires of diverse and efficient policies by
relying on the definition of a behavior space. A natural goal induced by the
creation of such a repertoire is trying to achieve behaviors on demand, which
can be done by running the corresponding policy from the repertoire. However,
in uncertain environments, two problems arise. First, policies can lack
robustness and repeatability, meaning that multiple episodes under slightly
different conditions often result in very different behaviors. Second, due to
the discrete nature of the repertoire, solutions vary discontinuously. Here we
present a new approach to achieve behavior-conditioned trajectory generation
based on two mechanisms: First, MAP-Elites Low-Spread (ME-LS), which constrains
the selection of solutions to those that are the most consistent in the
behavior space. Second, the Quality-Diversity Transformer (QDT), a
Transformer-based model conditioned on continuous behavior descriptors, which
trains on a dataset generated by policies from a ME-LS repertoire and learns to
autoregressively generate sequences of actions that achieve target behaviors.
Results show that ME-LS produces consistent and robust policies, and that its
combination with the QDT yields a single policy capable of achieving diverse
behaviors on demand with high accuracy.
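
The ME-LS selection criterion described above lends itself to a small illustration: score each candidate by the spread of the behavior descriptors it produces across repeated, noisy episodes, and prefer low-spread (consistent) solutions when filling repertoire cells. The sketch below is a minimal, hypothetical rendering of that idea, not the paper's implementation; the `rollout` helper and the median/mean aggregation choices are assumptions.

```python
import numpy as np

def spread(descriptors: np.ndarray) -> float:
    """Average pairwise Euclidean distance between the behavior descriptors
    observed across repeated episodes (lower = more consistent behavior)."""
    n = len(descriptors)
    dists = [np.linalg.norm(descriptors[i] - descriptors[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists)) if dists else 0.0

def evaluate_with_spread(policy, rollout, n_episodes: int = 5):
    """Run a policy for several episodes and summarize it for repertoire insertion.
    `rollout(policy)` is a hypothetical helper returning (fitness, behavior_descriptor)."""
    results = [rollout(policy) for _ in range(n_episodes)]
    fitnesses = np.array([f for f, _ in results])
    descriptors = np.stack([d for _, d in results])
    return {
        "fitness": float(np.median(fitnesses)),   # aggregation choice is an assumption
        "descriptor": descriptors.mean(axis=0),   # used for cell assignment
        "spread": spread(descriptors),            # ME-LS-style consistency score
    }
```

In a MAP-Elites-style loop, such a summary would let the spread act as a tie-breaker or hard constraint, so that within a cell more repeatable solutions displace less repeatable ones of comparable fitness; the resulting low-spread repertoire is what the QDT is then trained on.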
Related papers
- Q-value Regularized Transformer for Offline Reinforcement Learning [70.13643741130899]
We propose a Q-value regularized Transformer (QT) to enhance the state-of-the-art in offline reinforcement learning (RL).
QT learns an action-value function and integrates a term maximizing action-values into the training loss of Conditional Sequence Modeling (CSM).
Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional Dynamic Programming (DP) and CSM methods.
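
The QT objective described above, a conditional sequence-modeling loss plus a term that pushes predicted actions toward high learned Q-values, can be sketched as follows. This is an illustrative reading of the summary, not the authors' code; `q_net`, the MSE imitation term, and `alpha` are placeholder assumptions.

```python
import torch.nn.functional as F

def qt_style_loss(pred_actions, target_actions, states, q_net, alpha=0.5):
    """Illustrative combined objective: imitate dataset actions (CSM term)
    while encouraging actions that a learned critic scores highly.
    `q_net(states, actions)` is a placeholder module returning per-sample Q estimates."""
    csm_loss = F.mse_loss(pred_actions, target_actions)  # conditional sequence modeling term
    q_term = q_net(states, pred_actions).mean()          # action-value maximization term
    return csm_loss - alpha * q_term                      # alpha trades off the two terms
```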
- OIL-AD: An Anomaly Detection Framework for Sequential Decision Sequences [16.828732283348817]
We propose an unsupervised method named Offline Learning based Anomaly Detection (OIL-AD).
OIL-AD detects anomalies in decision-making sequences using two extracted behaviour features: action optimality and sequential association.
Our experiments show that OIL-AD can achieve outstanding online anomaly detection performance with up to 34.8% improvement in F1 score over comparable baselines.
- Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both local (stepwise) and global (trajectory-level) training: motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
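
The inference-time division of labor described above, with the stepwise policy as generator and the energy model as trajectory-level critic, can be illustrated with a simple sample-and-rescore loop. All names below (`policy_step`, `energy`) are hypothetical placeholders; the actual method's sampling scheme may differ.

```python
import numpy as np

def generate_and_select(policy_step, energy, x0, horizon=50, n_candidates=16, rng=None):
    """Sample-and-rescore sketch: the stepwise policy rolls out candidate series,
    the trajectory-level energy model keeps the most plausible one (lowest energy)."""
    if rng is None:
        rng = np.random.default_rng(0)
    candidates = []
    for _ in range(n_candidates):
        x, traj = x0, [x0]
        for _ in range(horizon):
            x = policy_step(x, rng)       # local, forward-looking transition model
            traj.append(x)
        candidates.append(np.stack(traj))
    scores = [energy(traj) for traj in candidates]  # global, trajectory-level critic
    return candidates[int(np.argmin(scores))]
```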
- Integrating LLMs and Decision Transformers for Language Grounded Generative Quality-Diversity [0.0]
Quality-Diversity is a branch of optimization that is often applied to problems from the Reinforcement Learning and control domains.
We propose using a Large Language Model to augment the repertoire with natural language descriptions of trajectories.
We also propose an LLM-based approach to evaluating the performance of such generative agents.
- Dichotomy of Control: Separating What You Can Control from What You Cannot [129.62135987416164]
We propose a future-conditioned supervised learning framework that separates mechanisms within a policy's control (actions) from those beyond a policy's control (environment stochasticity).
We show that DoC yields policies that are consistent with their conditioning inputs, ensuring that conditioning a learned policy on a desired high-return future outcome will correctly induce high-return behavior.
- Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning [5.09191791549438]
Recent sequence-modeling approaches have achieved state-of-the-art results in several of the mostly deterministic offline Atari and D4RL benchmarks. However, in stochastic environments, conditioning such models on desired outcomes can make them overly optimistic about events the policy does not control.
We propose a method that addresses this optimism bias by explicitly disentangling the policy and world models.
We demonstrate our method's superior performance on a variety of autonomous driving tasks in simulation.
- Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models [67.78935378952146]
GenRL is a framework for solving sequential decision-making problems.
It exploits the combination of reinforcement learning and latent variable generative models.
We experimentally determine the characteristics of generative models that have most influence on the performance of the final policy training.
- Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments.
We propose a novel model-free actor-critic algorithm to learn robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
- A New Representation of Successor Features for Transfer across Dissimilar Environments [60.813074750879615]
Many real-world RL problems require transfer among environments with different dynamics.
We propose an approach based on successor features in which we model successor feature functions with Gaussian Processes.
Our theoretical analysis proves the convergence of this approach as well as the bounded error on modelling successor feature functions.
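
Modeling successor feature functions with Gaussian Processes can be sketched with an off-the-shelf GP regressor: fit one GP per successor-feature dimension on state-action inputs, then recombine predictions with a task's reward weights to obtain Q-values. This is a generic illustration of the idea, not the paper's formulation; the data arrays and kernel choice are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def fit_successor_feature_gps(state_actions, successor_features):
    """Fit one GP per successor-feature dimension.
    state_actions: (n, d) inputs; successor_features: (n, k) SF targets."""
    gps = []
    for j in range(successor_features.shape[1]):
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-4)
        gp.fit(state_actions, successor_features[:, j])
        gps.append(gp)
    return gps

def predict_q_values(gps, state_actions, reward_weights):
    """Q(s, a) = w . psi(s, a): combine predicted successor features with a task's
    reward weights; each GP also exposes predictive uncertainty if needed."""
    psi = np.column_stack([gp.predict(state_actions) for gp in gps])
    return psi @ reward_weights
```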
- QED: using Quality-Environment-Diversity to evolve resilient robot swarms [12.18340575383456]
In swarm robotics, any of the robots in a swarm may be affected by different faults, resulting in significant performance declines.
One model-free approach to fault recovery involves two phases: during simulation, a quality-diversity algorithm evolves a behaviourally diverse archive of controllers; after a fault occurs, this archive is searched online for a controller that still performs well on the damaged robots.
However, the impact of environmental diversity is often ignored in the choice of a suitable behavioural descriptor.
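
A naive sketch of the recovery phase mentioned above: once a fault occurs, re-evaluate controllers from the evolved archive on the damaged robot and keep the best performer. Practical systems typically search the archive far more sample-efficiently (e.g., with Bayesian optimization); the names below are hypothetical placeholders.

```python
def recover_from_fault(archive, evaluate_on_damaged_robot, n_trials=1):
    """Exhaustive recovery-phase search over an evolved archive of controllers:
    re-test each one on the faulty robot and return the best-scoring controller."""
    best_controller, best_score = None, float("-inf")
    for controller in archive:
        score = sum(evaluate_on_damaged_robot(controller)
                    for _ in range(n_trials)) / n_trials
        if score > best_score:
            best_controller, best_score = controller, score
    return best_controller, best_score
```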