Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization
- URL: http://arxiv.org/abs/2310.12103v3
- Date: Tue, 4 Jun 2024 08:39:33 GMT
- Title: Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization
- Authors: Li Ding, Jenny Zhang, Jeff Clune, Lee Spector, Joel Lehman
- Abstract summary: This paper introduces Quality Diversity through Human Feedback (QDHF), a novel approach that progressively infers diversity metrics from human judgments of similarity among solutions.
Empirical studies show that QDHF significantly outperforms state-of-the-art methods in automatic diversity discovery.
In open-ended generative tasks, QDHF substantially enhances the diversity of text-to-image generation from a diffusion model.
- Score: 13.436983663467938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement Learning from Human Feedback (RLHF) has shown potential in qualitative tasks where easily defined performance measures are lacking. However, optimizing for average human preferences, as RLHF commonly does, has drawbacks, especially in generative tasks that demand diverse model responses. Meanwhile, Quality Diversity (QD) algorithms excel at identifying diverse and high-quality solutions but often rely on manually crafted diversity metrics. This paper introduces Quality Diversity through Human Feedback (QDHF), a novel approach that progressively infers diversity metrics from human judgments of similarity among solutions, thereby enhancing the applicability and effectiveness of QD algorithms in complex and open-ended domains. Empirical studies show that QDHF significantly outperforms state-of-the-art methods in automatic diversity discovery and matches the efficacy of QD with manually crafted diversity metrics on standard benchmarks in robotics and reinforcement learning. Notably, in open-ended generative tasks, QDHF substantially enhances the diversity of text-to-image generation from a diffusion model and is more favorably received in user studies. We conclude by analyzing QDHF's scalability, robustness, and quality of derived diversity metrics, emphasizing its strength in open-ended optimization tasks. Code and tutorials are available at https://liding.info/qdhf.
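To make the core mechanism concrete, below is a minimal, illustrative sketch (not the authors' implementation) of how a diversity metric could be learned from human similarity judgments and then used as the behavior descriptor in a MAP-Elites-style archive. The feature dimensionality, the linear projection, the triplet format, and the archive bounds/resolution are all assumptions made for the example.

```python
# Sketch of the QDHF idea: learn a low-dimensional diversity embedding from human
# similarity triplets, then discretize that embedding into archive cells.
import numpy as np
import torch
import torch.nn as nn

class DiversityMetric(nn.Module):
    """Projects solution features to a 2-D latent space used as QD descriptors."""
    def __init__(self, feat_dim: int, latent_dim: int = 2):
        super().__init__()
        self.proj = nn.Linear(feat_dim, latent_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)

def fit_from_triplets(model, anchors, positives, negatives, epochs=100):
    """Triplet loss: the item humans judged more similar should embed closer to the anchor."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.TripletMarginLoss(margin=1.0)
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(anchors), model(positives), model(negatives))
        loss.backward()
        opt.step()
    return model

def map_elites_cell(descriptor, bounds=(-3.0, 3.0), resolution=10):
    """Discretize a latent descriptor into an archive cell index."""
    lo, hi = bounds
    clipped = np.clip(descriptor, lo, hi - 1e-9)
    return tuple(((clipped - lo) / (hi - lo) * resolution).astype(int))

# Usage with synthetic data: features of solutions judged by humans as
# (anchor, more-similar, less-similar) triplets.
feat_dim = 8
model = DiversityMetric(feat_dim)
a, p, n = (torch.randn(64, feat_dim) for _ in range(3))
fit_from_triplets(model, a, p, n)

archive = {}  # cell index -> (fitness, solution features)
for sol_feat, fitness in [(torch.randn(feat_dim), np.random.rand()) for _ in range(200)]:
    cell = map_elites_cell(model(sol_feat).detach().numpy())
    if cell not in archive or fitness > archive[cell][0]:
        archive[cell] = (fitness, sol_feat)
```

In QDHF proper, the diversity metric is refined progressively as new human judgments arrive; the sketch above only shows a single fitting pass.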
Related papers
- Explicit Diversity Conditions for Effective Question Answer Generation with Large Language Models [37.641872859809844]
Question Answer Generation (QAG) is an effective data augmentation technique to improve the accuracy of question answering systems.
We present explicit diversity conditions for QAG, focusing on spatial aspects, question types, and entities.
In particular, QA pairs generated with explicit diversity conditions, when used to train the downstream QA model, result in an average 4.1% exact match and 4.5% F1 improvement over QAG without such conditions.
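As a rough illustration of what explicit diversity conditions can look like in practice, the sketch below enumerates prompts over document regions, question types, and target entities. The template wording and the condition values are assumptions for illustration, not the paper's exact setup.

```python
# Instead of sampling many outputs and hoping for variety, each prompt explicitly
# fixes a document region, a question type, and a target entity.
from itertools import product

QUESTION_TYPES = ["what", "who", "when", "where", "why", "how"]
DOC_REGIONS = ["first paragraph", "middle paragraphs", "last paragraph"]

def build_qag_prompts(document: str, entities: list[str]) -> list[str]:
    prompts = []
    for region, qtype, entity in product(DOC_REGIONS, QUESTION_TYPES, entities):
        prompts.append(
            f"Read the {region} of the document below and write one '{qtype}' "
            f"question whose answer is about '{entity}', followed by its answer.\n\n"
            f"Document:\n{document}"
        )
    return prompts

# Each prompt is sent to an LLM; the resulting QA pairs cover distinct regions,
# question types, and entities by construction rather than by sampling temperature.
```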
arXiv Detail & Related papers (2024-06-26T00:12:08Z)
- Large Language Models as In-context AI Generators for Quality-Diversity [8.585387103144825]
In-context QD aims to generate interesting solutions using few-shot and many-shot prompting with quality-diverse examples from the QD archive as context.
In-context QD displays promising results compared to both QD baselines and similar strategies developed for single-objective optimization.
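A hedged sketch of the in-context QD loop described above: elites drawn from a QD archive are formatted as few-shot examples so that an LLM can propose new candidates. The archive layout (cell -> (fitness, solution text)) and the prompt wording are illustrative assumptions.

```python
import random

def incontext_qd_prompt(archive: dict, task: str, k: int = 8) -> str:
    # Sampling one elite per archive cell keeps the few-shot context itself
    # quality-diverse, not just high-quality.
    cells = random.sample(list(archive), min(k, len(archive)))
    shots = "\n".join(f"Example (quality={archive[c][0]:.2f}): {archive[c][1]}" for c in cells)
    return (
        f"Task: {task}\n"
        f"Existing high-quality, mutually diverse solutions:\n{shots}\n"
        "Propose one new high-quality solution that differs from all examples above."
    )

# The LLM's reply is then evaluated, assigned a descriptor cell, and inserted into
# the archive if it improves on the incumbent, closing the QD loop.
```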
arXiv Detail & Related papers (2024-04-24T10:35:36Z)
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
- MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences [101.57443597426374]
Reinforcement Learning from Human Feedback (RLHF) aligns language models to human preferences by employing a singular reward model derived from preference data.
We learn a mixture of preference distributions via an expectation-maximization algorithm to better represent diverse human preferences.
Our algorithm achieves an average improvement of more than 16% in win-rates over conventional RLHF algorithms.
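The sketch below illustrates, under simplifying assumptions, how a mixture of preference distributions can be fit with expectation-maximization: each mixture component is a linear Bradley-Terry reward model, and responsibilities are re-estimated from how well each component explains each pairwise comparison. This is a toy version of the EM step described above, not the MaxMin-RLHF implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def em_preference_mixture(phi_win, phi_lose, K=2, iters=50, lr=0.1):
    """phi_win/phi_lose: (N, D) feature vectors of the preferred / rejected response."""
    N, D = phi_win.shape
    W = np.random.randn(K, D) * 0.01          # one linear reward model per component
    pi = np.full(K, 1.0 / K)                  # mixture weights
    for _ in range(iters):
        # E-step: responsibility of each component for each comparison.
        margins = (phi_win - phi_lose) @ W.T            # (N, K) reward margins
        lik = sigmoid(margins) * pi                     # Bradley-Terry likelihood x prior
        resp = lik / lik.sum(axis=1, keepdims=True)     # (N, K)
        # M-step: responsibility-weighted gradient ascent per component, update pi.
        for k in range(K):
            grad = ((resp[:, k] * (1 - sigmoid(margins[:, k])))[:, None]
                    * (phi_win - phi_lose)).mean(axis=0)
            W[k] += lr * grad
        pi = resp.mean(axis=0)
    return W, pi
```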
arXiv Detail & Related papers (2024-02-14T03:56:27Z)
- Diversify Question Generation with Retrieval-Augmented Style Transfer [68.00794669873196]
We propose RAST, a framework for Retrieval-Augmented Style Transfer.
The objective is to utilize the style of diverse templates for question generation.
We develop a novel Reinforcement Learning (RL) based approach that maximizes a weighted combination of diversity reward and consistency reward.
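The weighted reward combination mentioned above can be sketched as follows; the weight and both reward signals are placeholders rather than RAST's actual definitions.

```python
def combined_reward(diversity_reward: float, consistency_reward: float,
                    beta: float = 0.5) -> float:
    """Higher beta favors stylistic diversity; lower beta favors staying
    faithful to the source passage and answer."""
    return beta * diversity_reward + (1.0 - beta) * consistency_reward

# In an RL fine-tuning loop, this scalar would serve as the reward when updating
# the question-generation policy (e.g., with REINFORCE or PPO).
```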
arXiv Detail & Related papers (2023-10-23T02:27:31Z)
- Quality-Diversity through AI Feedback [10.423093353553217]
Quality-diversity (QD) search algorithms aim at continually improving and diversifying a population of candidates.
Recent developments in language models (LMs) have enabled guiding search through AI feedback.
QDAIF is a step towards AI systems that can independently search, diversify, evaluate, and improve.
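A minimal sketch of quality-diversity search guided by AI feedback, in the spirit of the summary above: a language model scores both the quality of a candidate and its position along a single diversity axis, and the archive keeps the best candidate per niche. The prompts, the single diversity attribute, and the binning are assumptions; `llm` stands in for any text-completion call that returns a numeric string.

```python
def ai_feedback(llm, text: str) -> tuple[float, float]:
    """Ask an LM for a quality score and a position on one diversity axis (both in [0, 1])."""
    quality = float(llm(f"Rate the quality of this story from 0 to 1:\n{text}"))
    tone = float(llm(f"Rate how dark (0) vs. lighthearted (1) this story is:\n{text}"))
    return quality, tone

def qdaif_step(llm, archive: dict, candidate: str, bins: int = 10) -> dict:
    quality, tone = ai_feedback(llm, candidate)
    cell = min(int(tone * bins), bins - 1)       # discretize the diversity axis
    if cell not in archive or quality > archive[cell][0]:
        archive[cell] = (quality, candidate)     # keep the best candidate per niche
    return archive
```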
arXiv Detail & Related papers (2023-10-19T12:13:58Z)
- Efficient Quality-Diversity Optimization through Diverse Quality Species [3.428706362109921]
We propose Diverse Quality Species (DQS) as an alternative to archive-based Quality-Diversity (QD) algorithms.
We show that a diverse population of solutions can be found without needing an archive or defining the range of behaviors in advance.
arXiv Detail & Related papers (2023-04-14T23:15:51Z)
- Source-free Domain Adaptation Requires Penalized Diversity [60.04618512479438]
Source-free domain adaptation (SFDA) was introduced to address knowledge transfer between different domains in the absence of source data.
In unsupervised SFDA, the diversity is limited to learning a single hypothesis on the source or learning multiple hypotheses with a shared feature extractor.
We propose a novel unsupervised SFDA algorithm that promotes representational diversity through the use of separate feature extractors.
arXiv Detail & Related papers (2023-04-06T00:20:19Z)
- Uncertainty-Driven Action Quality Assessment [67.20617610820857]
We propose a novel probabilistic model, named Uncertainty-Driven AQA (UD-AQA), to capture the diversity among multiple judge scores.
We estimate the uncertainty of each prediction and use it to re-weight the AQA regression loss.
Our proposed method achieves competitive results on three benchmarks: the Olympic-event datasets MTL-AQA and FineDiving, and the surgical-skill dataset JIGSAWS.
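As a rough illustration of uncertainty-driven re-weighting, the sketch below has the model predict both a score and a log-variance and trains with a Gaussian negative log-likelihood, so that predictions flagged as uncertain contribute less squared error. The architecture and the exact weighting form are assumptions, not the UD-AQA model.

```python
import torch
import torch.nn as nn

class ScoreHead(nn.Module):
    """Predicts a score and a log-variance (uncertainty) from pooled video features."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.mean = nn.Linear(feat_dim, 1)
        self.log_var = nn.Linear(feat_dim, 1)

    def forward(self, feats: torch.Tensor):
        return self.mean(feats), self.log_var(feats)

def uncertainty_weighted_loss(pred_mean, pred_log_var, target):
    # Gaussian NLL: high-variance (uncertain) predictions get a down-weighted
    # squared error plus a penalty for claiming to be uncertain.
    se = (pred_mean - target) ** 2
    return (0.5 * torch.exp(-pred_log_var) * se + 0.5 * pred_log_var).mean()
```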
arXiv Detail & Related papers (2022-07-29T07:21:15Z)
- Learning to Walk Autonomously via Reset-Free Quality-Diversity [73.08073762433376]
Quality-Diversity algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills.
Existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions.
This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments.
arXiv Detail & Related papers (2022-04-07T14:07:51Z)
- Diversity Policy Gradient for Sample Efficient Quality-Diversity Optimization [7.8499505363825755]
Aiming for diversity in addition to performance is a convenient way to deal with the exploration-exploitation trade-off.
This paper proposes a novel algorithm, QDPG, which combines the strength of Policy Gradient algorithms and Quality Diversity approaches.
arXiv Detail & Related papers (2020-06-15T16:04:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.