Integrating LLMs and Decision Transformers for Language Grounded
Generative Quality-Diversity
- URL: http://arxiv.org/abs/2308.13278v1
- Date: Fri, 25 Aug 2023 10:00:06 GMT
- Title: Integrating LLMs and Decision Transformers for Language Grounded
Generative Quality-Diversity
- Authors: Achkan Salehi and Stephane Doncieux
- Abstract summary: Quality-Diversity is a branch of stochastic optimization that is often applied to problems from the Reinforcement Learning and control domains.
We propose to leverage a Large Language Model to augment the repertoire with natural language descriptions of trajectories.
We also propose an LLM-based approach to evaluating the performance of such generative agents.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Quality-Diversity is a branch of stochastic optimization that is often
applied to problems from the Reinforcement Learning and control domains in
order to construct repertoires of well-performing policies/skills that exhibit
diversity with respect to a behavior space. Such archives are usually composed
of a finite number of reactive agents, each of which is associated with a unique
behavior descriptor, and instantiating behavior descriptors outside of that
coarsely discretized space is not straightforward. While a few recent works
suggest solutions to that issue, the trajectory that is generated is not easily
customizable beyond the specification of a target behavior descriptor. We
propose to jointly solve those problems in environments where semantic
information about static scene elements is available by leveraging a Large
Language Model to augment the repertoire with natural language descriptions of
trajectories, and training a policy conditioned on those descriptions. Thus,
our method allows a user to not only specify an arbitrary target behavior
descriptor, but also provide the model with a high-level textual prompt to
shape the generated trajectory. We also propose an LLM-based approach to
evaluating the performance of such generative agents. Furthermore, we develop a
benchmark based on simulated robot navigation in a 2D maze that we use for
experimental validation.
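To make the described pipeline concrete, below is a minimal, hypothetical sketch of the repertoire-annotation step: each QD archive entry pairs a behavior descriptor with its trajectory, and an LLM is asked to describe the trajectory relative to labeled static scene elements. All names here (`RepertoireEntry`, `annotate_repertoire`, `llm_describe`) are illustrative assumptions, not the authors' code.

```python
from dataclasses import dataclass
from typing import Callable, List

import numpy as np


@dataclass
class RepertoireEntry:
    behavior_descriptor: np.ndarray  # e.g., the final (x, y) position in the maze
    trajectory: np.ndarray           # (T, state_dim) states visited by the policy
    description: str = ""            # filled in by the LLM annotation step


def annotate_repertoire(entries: List[RepertoireEntry],
                        scene_semantics: str,
                        llm_describe: Callable[[str], str]) -> None:
    """Ask an LLM to describe each trajectory relative to labeled static
    scene elements; `llm_describe` stands in for any LLM completion call."""
    for entry in entries:
        # Subsample roughly 10 waypoints so the prompt stays short.
        step = max(1, len(entry.trajectory) // 10)
        waypoints = entry.trajectory[::step].round(2).tolist()
        prompt = (
            f"Scene elements: {scene_semantics}\n"
            f"Trajectory waypoints: {waypoints}\n"
            "Describe this trajectory in one sentence."
        )
        entry.description = llm_describe(prompt)
```

The resulting (description, behavior descriptor) pairs would then condition the policy, e.g., by embedding the text with a frozen sentence encoder and feeding it alongside the target descriptor to a Decision Transformer, so that a free-form prompt can shape the generated trajectory.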
Related papers
- Generative Context Distillation [48.91617280112579]
Generative Context Distillation (GCD) is a lightweight prompt internalization method that employs a joint training approach.
We demonstrate that our approach effectively internalizes complex prompts across various agent-based application scenarios.
arXiv Detail & Related papers (2024-11-24T17:32:20Z)
- State- and context-dependent robotic manipulation and grasping via uncertainty-aware imitation learning [9.369039142989875]
We introduce a Learning from Demonstration (LfD) approach to acquire context-dependent grasping and manipulation strategies.
We propose a state-dependent approach that automatically returns to the demonstrations, avoiding unpredictable behavior.
The approach is evaluated against the LASA handwriting dataset and on a real 7-DoF robot.
arXiv Detail & Related papers (2024-10-31T15:32:32Z)
- Enhancing adversarial robustness in Natural Language Inference using explanations [41.46494686136601]
We cast the spotlight on the underexplored task of Natural Language Inference (NLI).
We validate the usage of natural language explanation as a model-agnostic defence strategy through extensive experimentation.
We study the correlation of widely used language generation metrics with human perception, so that they can serve as a proxy for robust NLI models.
arXiv Detail & Related papers (2024-09-11T17:09:49Z)
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
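The summary above names the key ingredient: an entropy bonus applied at the granularity of individual tokens rather than whole responses. A generic sketch of such a token-level, entropy-regularized policy-gradient loss follows; it illustrates the idea only and is not ETPO's exact objective.

```python
import torch
import torch.nn.functional as F


def token_level_pg_loss(logits: torch.Tensor,      # (B, T, vocab) policy logits
                        actions: torch.Tensor,     # (B, T) sampled token ids (long)
                        advantages: torch.Tensor,  # (B, T) per-token advantages
                        beta: float = 0.01) -> torch.Tensor:
    """REINFORCE-style loss where each generated token is an action,
    with a per-token entropy bonus weighted by beta."""
    log_probs = F.log_softmax(logits, dim=-1)
    # Log-probability of the tokens that were actually sampled.
    action_logp = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    # Token-level entropy of the policy, encouraging exploration.
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1)
    # Maximize advantage-weighted log-likelihood plus the entropy bonus.
    return -(action_logp * advantages + beta * entropy).mean()
```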
- HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments [93.94020724735199]
HAZARD consists of three unexpected disaster scenarios, including fire, flood, and wind.
This benchmark enables us to evaluate autonomous agents' decision-making capabilities across various pipelines.
arXiv Detail & Related papers (2024-01-23T18:59:43Z)
- Intuitive or Dependent? Investigating LLMs' Behavior Style to Conflicting Prompts [9.399159332152013]
This study investigates the behaviors of Large Language Models (LLMs) when faced with conflicting prompts versus their internal memory.
This will help to understand LLMs' decision mechanisms and also benefit real-world applications, such as retrieval-augmented generation (RAG).
arXiv Detail & Related papers (2023-09-29T17:26:03Z)
- DisCLIP: Open-Vocabulary Referring Expression Generation [37.789850573203694]
We build on CLIP, a large-scale visual-semantic model, to guide an LLM to generate a contextual description of a target concept in an image.
We measure the quality of the generated text by evaluating the capability of a receiver model to accurately identify the described object within the scene.
Our results highlight the potential of using pre-trained visual-semantic models for generating high-quality contextual descriptions.
arXiv Detail & Related papers (2023-05-30T15:13:17Z)
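The receiver-based evaluation described above can be sketched with an off-the-shelf CLIP model: the generated expression counts as successful if CLIP ranks the target region above the distractors. The cropping-based setup and checkpoint choice here are assumptions, not DisCLIP's exact protocol.

```python
from typing import List

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def receiver_identifies_target(description: str,
                               region_crops: List[Image.Image],
                               target_index: int) -> bool:
    """Return True if the description alone lets the receiver model
    rank the target region above all distractor regions."""
    inputs = processor(text=[description], images=region_crops,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_text has shape (1, num_regions): similarity between the
    # description and each candidate region crop.
    scores = outputs.logits_per_text[0]
    return int(scores.argmax()) == target_index
```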
- Guiding the PLMs with Semantic Anchors as Intermediate Supervision: Towards Interpretable Semantic Parsing [57.11806632758607]
We propose to combine current pretrained language models with a hierarchical decoder network.
By taking the first-principle structures as the semantic anchors, we propose two novel intermediate supervision tasks.
We conduct extensive experiments on several semantic parsing benchmarks and demonstrate that our approach consistently outperforms the baselines.
arXiv Detail & Related papers (2022-10-04T07:27:29Z)
- Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning [131.1852444489217]
This paper presents Object-aware REgularizatiOn (OREO), a technique that regularizes an imitation policy in an object-aware manner.
Our main idea is to encourage a policy to uniformly attend to all semantic objects, in order to prevent the policy from exploiting nuisance variables strongly correlated with expert actions.
arXiv Detail & Related papers (2021-10-27T01:56:23Z)
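A compact way to see the regularizer summarized above: if the policy exposes an attention distribution over object slots, penalizing the gap between its entropy and the maximum (uniform) entropy pushes it to attend to all objects equally. The slot-attention interface is a hypothetical stand-in for however the objects are actually extracted.

```python
import torch
import torch.nn.functional as F


def uniform_object_attention_penalty(attn_logits: torch.Tensor) -> torch.Tensor:
    """attn_logits: (batch, num_objects) unnormalized policy attention over
    object slots. The penalty is zero exactly when the attention over
    objects is uniform (maximum entropy)."""
    log_p = F.log_softmax(attn_logits, dim=-1)
    entropy = -(log_p.exp() * log_p).sum(dim=-1)                 # (batch,)
    max_entropy = torch.log(torch.tensor(float(attn_logits.shape[-1])))
    return (max_entropy - entropy).mean()
```

Added to a behavior-cloning loss with a small weight, a term like this keeps the policy from concentrating on a single spuriously predictive object.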
- Composable Learning with Sparse Kernel Representations [110.19179439773578]
We present a reinforcement learning algorithm for learning sparse non-parametric controllers in a Reproducing Kernel Hilbert Space.
We improve the sample complexity of this approach by imposing structure on the state-action function through a normalized advantage function.
We demonstrate the performance of this algorithm on learning obstacle-avoidance policies in multiple simulations of a robot equipped with a laser scanner while navigating in a 2D environment.
arXiv Detail & Related papers (2021-03-26T13:58:23Z)
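The normalized advantage function mentioned in this entry imposes the quadratic form Q(s, a) = V(s) - 0.5 (a - mu(s))^T P(s) (a - mu(s)) with P(s) positive definite, so mu(s) is the greedy action by construction. A small numpy illustration follows; the paper's kernel parameterization of V, mu, and P is omitted here.

```python
import numpy as np


def naf_q_value(v_s: float, mu_s: np.ndarray, P_s: np.ndarray,
                action: np.ndarray) -> float:
    """Q(s, a) under the normalized advantage structure: the quadratic
    advantage is <= 0 and equals 0 exactly at a = mu(s)."""
    delta = action - mu_s
    advantage = -0.5 * delta @ P_s @ delta
    return v_s + advantage


# Example with a 2-D action space: the greedy action is mu(s) itself.
mu = np.array([0.3, -0.1])
P = np.array([[2.0, 0.0], [0.0, 1.0]])   # positive definite
assert naf_q_value(1.0, mu, P, mu) >= naf_q_value(1.0, mu, P, mu + 0.5)
```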
This list is automatically generated from the titles and abstracts of the papers on this site.