The Quality-Diversity Transformer: Generating Behavior-Conditioned
Trajectories with Decision Transformers
- URL: http://arxiv.org/abs/2303.16207v3
- Date: Wed, 13 Sep 2023 17:07:30 GMT
- Title: The Quality-Diversity Transformer: Generating Behavior-Conditioned
Trajectories with Decision Transformers
- Authors: Valentin Macé, Raphaël Boige, Felix Chalumeau, Thomas Pierrot, Guillaume Richard, Nicolas Perrin-Gilbert
- Abstract summary: Quality-Diversity algorithms have proven effective in generating repertoires of diverse and efficient policies.
In uncertain environments, policies can lack robustness and repeatability.
We present a new approach to achieve behavior-conditioned trajectory generation based on two mechanisms.
- Score: 3.185440619417202
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the context of neuroevolution, Quality-Diversity algorithms have proven
effective in generating repertoires of diverse and efficient policies by
relying on the definition of a behavior space. A natural goal induced by the
creation of such a repertoire is trying to achieve behaviors on demand, which
can be done by running the corresponding policy from the repertoire. However,
in uncertain environments, two problems arise. First, policies can lack
robustness and repeatability, meaning that multiple episodes under slightly
different conditions often result in very different behaviors. Second, due to
the discrete nature of the repertoire, solutions vary discontinuously. Here we
present a new approach to achieve behavior-conditioned trajectory generation
based on two mechanisms: First, MAP-Elites Low-Spread (ME-LS), which constrains
the selection of solutions to those that are the most consistent in the
behavior space. Second, the Quality-Diversity Transformer (QDT), a
Transformer-based model conditioned on continuous behavior descriptors, which
trains on a dataset generated by policies from a ME-LS repertoire and learns to
autoregressively generate sequences of actions that achieve target behaviors.
Results show that ME-LS produces consistent and robust policies, and that its
combination with the QDT yields a single policy capable of achieving diverse
behaviors on demand with high accuracy.
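
The ME-LS selection criterion described above lends itself to a small illustration: score each candidate by the spread of the behavior descriptors it produces across repeated, noisy episodes, and prefer low-spread (consistent) solutions when filling repertoire cells. The sketch below is a minimal, hypothetical rendering of that idea, not the paper's implementation; the `rollout` helper and the median/mean aggregation choices are assumptions.

```python
import numpy as np

def spread(descriptors: np.ndarray) -> float:
    """Average pairwise Euclidean distance between the behavior descriptors
    observed across repeated episodes (lower = more consistent behavior)."""
    n = len(descriptors)
    dists = [np.linalg.norm(descriptors[i] - descriptors[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists)) if dists else 0.0

def evaluate_with_spread(policy, rollout, n_episodes: int = 5):
    """Run a policy for several episodes and summarize it for repertoire insertion.
    `rollout(policy)` is a hypothetical helper returning (fitness, behavior_descriptor)."""
    results = [rollout(policy) for _ in range(n_episodes)]
    fitnesses = np.array([f for f, _ in results])
    descriptors = np.stack([d for _, d in results])
    return {
        "fitness": float(np.median(fitnesses)),   # aggregation choice is an assumption
        "descriptor": descriptors.mean(axis=0),   # used for cell assignment
        "spread": spread(descriptors),            # ME-LS-style consistency score
    }
```

In a MAP-Elites-style loop, such a summary would let the spread act as a tie-breaker or hard constraint, so that within a cell more repeatable solutions displace less repeatable ones of comparable fitness; the resulting low-spread repertoire is what the QDT is then trained on.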
Related papers
- Q-value Regularized Transformer for Offline Reinforcement Learning [70.13643741130899]
We propose a Q-value regularized Transformer (QT) to enhance the state-of-the-art in offline reinforcement learning (RL).
QT learns an action-value function and integrates a term maximizing action-values into the training loss of Conditional Sequence Modeling (CSM).
Empirical evaluations on D4RL benchmark datasets demonstrate the superiority of QT over traditional Dynamic Programming (DP) and CSM methods.
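
The QT objective described above, a conditional sequence-modeling loss plus a term that pushes predicted actions toward high learned Q-values, can be sketched as follows. This is an illustrative reading of the summary, not the authors' code; `q_net`, the MSE imitation term, and `alpha` are placeholder assumptions.

```python
import torch.nn.functional as F

def qt_style_loss(pred_actions, target_actions, states, q_net, alpha=0.5):
    """Illustrative combined objective: imitate dataset actions (CSM term)
    while encouraging actions that a learned critic scores highly.
    `q_net(states, actions)` is a placeholder module returning per-sample Q estimates."""
    csm_loss = F.mse_loss(pred_actions, target_actions)  # conditional sequence modeling term
    q_term = q_net(states, pred_actions).mean()          # action-value maximization term
    return csm_loss - alpha * q_term                      # alpha trades off the two terms
```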
- OIL-AD: An Anomaly Detection Framework for Sequential Decision Sequences [16.828732283348817]
We propose an unsupervised method named Offline Learning based Anomaly Detection (OIL-AD).
OIL-AD detects anomalies in decision-making sequences using two extracted behaviour features: action optimality and sequential association.
Our experiments show that OIL-AD can achieve outstanding online anomaly detection performance with up to 34.8% improvement in F1 score over comparable baselines.
- Time-series Generation by Contrastive Imitation [87.51882102248395]
We study a generative framework that seeks to combine the strengths of both local (stepwise) and global (trajectory-level) training: motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy.
At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality.
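
The inference-time division of labor described above, with the stepwise policy as generator and the energy model as trajectory-level critic, can be illustrated with a simple sample-and-rescore loop. All names below (`policy_step`, `energy`) are hypothetical placeholders; the actual method's sampling scheme may differ.

```python
import numpy as np

def generate_and_select(policy_step, energy, x0, horizon=50, n_candidates=16, rng=None):
    """Sample-and-rescore sketch: the stepwise policy rolls out candidate series,
    the trajectory-level energy model keeps the most plausible one (lowest energy)."""
    if rng is None:
        rng = np.random.default_rng(0)
    candidates = []
    for _ in range(n_candidates):
        x, traj = x0, [x0]
        for _ in range(horizon):
            x = policy_step(x, rng)       # local, forward-looking transition model
            traj.append(x)
        candidates.append(np.stack(traj))
    scores = [energy(traj) for traj in candidates]  # global, trajectory-level critic
    return candidates[int(np.argmin(scores))]
```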
- Integrating LLMs and Decision Transformers for Language Grounded Generative Quality-Diversity [0.0]
Quality-Diversity is a branch of optimization that is often applied to problems from the Reinforcement Learning and control domains.
We propose using a Large Language Model to augment the repertoire with natural language descriptions of trajectories.
We also propose an LLM-based approach to evaluating the performance of such generative agents.
- Dichotomy of Control: Separating What You Can Control from What You Cannot [129.62135987416164]
We propose a future-conditioned supervised learning framework that separates mechanisms within a policy's control (actions) from those beyond a policy's control (environment stochasticity).
We show that DoC yields policies that are consistent with their conditioning inputs, ensuring that conditioning a learned policy on a desired high-return future outcome will correctly induce high-return behavior.
- Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning [5.09191791549438]
Recent sequence-modeling approaches have achieved state-of-the-art results in several of the mostly deterministic offline Atari and D4RL benchmarks. However, in stochastic environments, conditioning such models on desired outcomes can make them overly optimistic about events the policy does not control.
We propose a method that addresses this optimism bias by explicitly disentangling the policy and world models.
We demonstrate our method's superior performance on a variety of autonomous driving tasks in simulation.
- Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models [67.78935378952146]
GenRL is a framework for solving sequential decision-making problems.
It exploits the combination of reinforcement learning and latent variable generative models.
We experimentally determine the characteristics of generative models that have most influence on the performance of the final policy training.
- Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization [63.75188254377202]
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments.
We propose a novel model-free actor-critic algorithm to learn robust policies without modeling the disturbance in advance.
Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
- A New Representation of Successor Features for Transfer across Dissimilar Environments [60.813074750879615]
Many real-world RL problems require transfer among environments with different dynamics.
We propose an approach based on successor features in which we model successor feature functions with Gaussian Processes.
Our theoretical analysis proves the convergence of this approach as well as the bounded error on modelling successor feature functions.
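
Modeling successor feature functions with Gaussian Processes can be sketched with an off-the-shelf GP regressor: fit one GP per successor-feature dimension on state-action inputs, then recombine predictions with a task's reward weights to obtain Q-values. This is a generic illustration of the idea, not the paper's formulation; the data arrays and kernel choice are assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def fit_successor_feature_gps(state_actions, successor_features):
    """Fit one GP per successor-feature dimension.
    state_actions: (n, d) inputs; successor_features: (n, k) SF targets."""
    gps = []
    for j in range(successor_features.shape[1]):
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-4)
        gp.fit(state_actions, successor_features[:, j])
        gps.append(gp)
    return gps

def predict_q_values(gps, state_actions, reward_weights):
    """Q(s, a) = w . psi(s, a): combine predicted successor features with a task's
    reward weights; each GP also exposes predictive uncertainty if needed."""
    psi = np.column_stack([gp.predict(state_actions) for gp in gps])
    return psi @ reward_weights
```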
- QED: using Quality-Environment-Diversity to evolve resilient robot swarms [12.18340575383456]
In swarm robotics, any of the robots in a swarm may be affected by different faults, resulting in significant performance declines.
One model-free approach to fault recovery involves two phases: during simulation, a quality-diversity algorithm evolves a behaviourally diverse archive of controllers; after a fault occurs, this archive is searched online for a controller that still performs well on the damaged robots.
However, the impact of environmental diversity is often ignored in the choice of a suitable behavioural descriptor.
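
A naive sketch of the recovery phase mentioned above: once a fault occurs, re-evaluate controllers from the evolved archive on the damaged robot and keep the best performer. Practical systems typically search the archive far more sample-efficiently (e.g., with Bayesian optimization); the names below are hypothetical placeholders.

```python
def recover_from_fault(archive, evaluate_on_damaged_robot, n_trials=1):
    """Exhaustive recovery-phase search over an evolved archive of controllers:
    re-test each one on the faulty robot and return the best-scoring controller."""
    best_controller, best_score = None, float("-inf")
    for controller in archive:
        score = sum(evaluate_on_damaged_robot(controller)
                    for _ in range(n_trials)) / n_trials
        if score > best_score:
            best_controller, best_score = controller, score
    return best_controller, best_score
```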