Efficiently Generating Expressive Quadruped Behaviors via Language-Guided Preference Learning
- URL: http://arxiv.org/abs/2502.03717v1
- Date: Thu, 06 Feb 2025 02:07:18 GMT
- Title: Efficiently Generating Expressive Quadruped Behaviors via Language-Guided Preference Learning
- Authors: Jaden Clark, Joey Hejna, Dorsa Sadigh,
- Abstract summary: Expressive robotic behavior is essential for the widespread acceptance of robots in social environments.
Current methods either rely on natural language input, which is efficient but low-resolution, or learn from human preferences, which, although high-resolution, is sample inefficient.
This paper introduces a novel approach that leverages priors generated by pre-trained Language-Guided Preference Learning (LGPL)
Our method, termed Language-Guided Preference Learning (LGPL), uses LLMs to generate initial behavior samples, which are then refined through preference-based feedback to learn behaviors that closely align with human expectations.
- Score: 25.841585208296998
- License:
- Abstract: Expressive robotic behavior is essential for the widespread acceptance of robots in social environments. Recent advancements in learned legged locomotion controllers have enabled more dynamic and versatile robot behaviors. However, determining the optimal behavior for interactions with different users across varied scenarios remains a challenge. Current methods either rely on natural language input, which is efficient but low-resolution, or learn from human preferences, which, although high-resolution, is sample inefficient. This paper introduces a novel approach that leverages priors generated by pre-trained LLMs alongside the precision of preference learning. Our method, termed Language-Guided Preference Learning (LGPL), uses LLMs to generate initial behavior samples, which are then refined through preference-based feedback to learn behaviors that closely align with human expectations. Our core insight is that LLMs can guide the sampling process for preference learning, leading to a substantial improvement in sample efficiency. We demonstrate that LGPL can quickly learn accurate and expressive behaviors with as few as four queries, outperforming both purely language-parameterized models and traditional preference learning approaches. Website with videos: https://lgpl-gaits.github.io/
Related papers
- MAPLE: A Framework for Active Preference Learning Guided by Large Language Models [9.37268652939886]
We introduce MAPLE, a framework for large language model-guided Bayesian active preference learning.
Our results demonstrate that MAPLE accelerates the learning process and effectively improves humans' ability to answer queries.
arXiv Detail & Related papers (2024-12-10T05:55:14Z) - Large Language Models are In-context Preference Learners [15.84585737510038]
We show that Large Language Models (LLMs) have native preference-learning capabilities that allow them to achieve sample-efficient preference learning.
We propose In-Context Preference Learning (ICPL), which uses in-context learning capabilities of LLMs to reduce human query inefficiency.
arXiv Detail & Related papers (2024-10-22T17:53:34Z) - MetaAlign: Align Large Language Models with Diverse Preferences during Inference Time [50.41806216615488]
Large Language Models (LLMs) acquire extensive knowledge and remarkable abilities from extensive text corpora.
To make LLMs more usable, aligning them with human preferences is essential.
We propose an effective method, textbf MetaAlign, which aims to help LLMs dynamically align with various explicit or implicit preferences specified at inference time.
arXiv Detail & Related papers (2024-10-18T05:31:13Z) - Verbalized Machine Learning: Revisiting Machine Learning with Language Models [63.10391314749408]
We introduce the framework of verbalized machine learning (VML)
VML constrains the parameter space to be human-interpretable natural language.
We empirically verify the effectiveness of VML, and hope that VML can serve as a stepping stone to stronger interpretability.
arXiv Detail & Related papers (2024-06-06T17:59:56Z) - Batch Active Learning of Reward Functions from Human Preferences [33.39413552270375]
Preference-based learning enables reliable labeling by querying users with preference questions.
Active querying methods are commonly employed in preference-based learning to generate more informative data.
We develop a set of novel algorithms that enable efficient learning of reward functions using as few data samples as possible.
arXiv Detail & Related papers (2024-02-24T08:07:48Z) - Active Preference Learning for Large Language Models [12.093302163058436]
We develop an active learning strategy for DPO to make better use of preference labels.
We propose a practical acquisition function for prompt/completion pairs based on the predictive entropy of the language model.
We demonstrate how our approach improves both the rate of learning and final performance of fine-tuning on pairwise preference data.
arXiv Detail & Related papers (2024-02-12T23:09:00Z) - Aligning Language Models with Offline Learning from Human Feedback [5.539080592071948]
We propose an offline learning from human feedback framework to align language models without interacting with environments.
Specifically, we explore filtering alignment (FA), reward-weighted regression (RWR), and conditional alignment (CA) to align language models to human preferences.
arXiv Detail & Related papers (2023-08-23T10:41:07Z) - Pre-Trained Language Models for Interactive Decision-Making [72.77825666035203]
We describe a framework for imitation learning in which goals and observations are represented as a sequence of embeddings.
We demonstrate that this framework enables effective generalization across different environments.
For test tasks involving novel goals or novel scenes, initializing policies with language models improves task completion rates by 43.6%.
arXiv Detail & Related papers (2022-02-03T18:55:52Z) - Recall and Learn: Fine-tuning Deep Pretrained Language Models with Less
Forgetting [66.45372974713189]
We propose a recall and learn mechanism, which adopts the idea of multi-task learning and jointly learns pretraining tasks and downstream tasks.
Experiments show that our method achieves state-of-the-art performance on the GLUE benchmark.
We provide open-source RecAdam, which integrates the proposed mechanisms into Adam to facility the NLP community.
arXiv Detail & Related papers (2020-04-27T08:59:57Z) - Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimize model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z) - On the interaction between supervision and self-play in emergent
communication [82.290338507106]
We investigate the relationship between two categories of learning signals with the ultimate goal of improving sample efficiency.
We find that first training agents via supervised learning on human data followed by self-play outperforms the converse.
arXiv Detail & Related papers (2020-02-04T02:35:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.