Related papers: Quality-Diversity through AI Feedback

Quality-Diversity through AI Feedback

URL: http://arxiv.org/abs/2310.13032v4
Date: Thu, 7 Dec 2023 19:56:21 GMT
Title: Quality-Diversity through AI Feedback
Authors: Herbie Bradley, Andrew Dai, Hannah Teufel, Jenny Zhang, Koen Oostermeijer, Marco Bellagente, Jeff Clune, Kenneth Stanley, Gr\'egory Schott, Joel Lehman
Abstract summary: Quality-diversity (QD) search algorithms aim at continually improving and diversifying a population of candidates. Recent developments in language models (LMs) have enabled guiding search through AI feedback. QDAIF is a step towards AI systems that can independently search, diversify, evaluate, and improve.
Score: 10.423093353553217
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In many text-generation problems, users may prefer not only a single response, but a diverse range of high-quality outputs from which to choose. Quality-diversity (QD) search algorithms aim at such outcomes, by continually improving and diversifying a population of candidates. However, the applicability of QD to qualitative domains, like creative writing, has been limited by the difficulty of algorithmically specifying measures of quality and diversity. Interestingly, recent developments in language models (LMs) have enabled guiding search through AI feedback, wherein LMs are prompted in natural language to evaluate qualitative aspects of text. Leveraging this development, we introduce Quality-Diversity through AI Feedback (QDAIF), wherein an evolutionary algorithm applies LMs to both generate variation and evaluate the quality and diversity of candidate text. When assessed on creative writing domains, QDAIF covers more of a specified search space with high-quality samples than do non-QD controls. Further, human evaluation of QDAIF-generated creative texts validates reasonable agreement between AI and human evaluation. Our results thus highlight the potential of AI feedback to guide open-ended search for creative and original solutions, providing a recipe that seemingly generalizes to many domains and modalities. In this way, QDAIF is a step towards AI systems that can independently search, diversify, evaluate, and improve, which are among the core skills underlying human society's capacity for innovation.

Related papers

The AI Imperative: Scaling High-Quality Peer Review in Machine Learning [49.87236114682497]
We argue that AI-assisted peer review must become an urgent research and infrastructure priority.<n>We propose specific roles for AI in enhancing factual verification, guiding reviewer performance, assisting authors in quality improvement, and supporting ACs in decision-making.
arXiv Detail & Related papers (2025-06-09T18:37:14Z)
What Makes a Good Natural Language Prompt? [72.3282960118995]
We conduct a meta-analysis surveying more than 150 prompting-related papers from leading NLP and AI conferences from 2022 to 2025.<n>We propose a property- and human-centric framework for evaluating prompt quality, encompassing 21 properties categorized into six dimensions.<n>We then empirically explore multi-property prompt enhancements in reasoning tasks, observing that single-property enhancements often have the greatest impact.
arXiv Detail & Related papers (2025-06-07T23:19:27Z)
On Benchmarking Human-Like Intelligence in Machines [77.55118048492021]
We argue that current AI evaluation paradigms are insufficient for assessing human-like cognitive capabilities. We identify a set of key shortcomings: a lack of human-validated labels, inadequate representation of human response variability and uncertainty, and reliance on simplified and ecologically-invalid tasks.
arXiv Detail & Related papers (2025-02-27T20:21:36Z)
Validity Arguments For Constructed Response Scoring Using Generative Artificial Intelligence Applications [0.0]
generative AI is particularly appealing because it reduces the effort required for handcrafting features in traditional AI scoring. We compare the validity evidence needed in scoring systems using human ratings, feature-based natural language processing AI scoring engines, and generative AI.
arXiv Detail & Related papers (2025-01-04T16:59:29Z)
AI-generated Image Quality Assessment in Visual Communication [72.11144790293086]
AIGI-VC is a quality assessment database for AI-generated images in visual communication. The dataset consists of 2,500 images spanning 14 advertisement topics and 8 emotion types. It provides coarse-grained human preference annotations and fine-grained preference descriptions, benchmarking the abilities of IQA methods in preference prediction, interpretation, and reasoning.
arXiv Detail & Related papers (2024-12-20T08:47:07Z)
AI-Generated Image Quality Assessment Based on Task-Specific Prompt and Multi-Granularity Similarity [62.00987205438436]
We propose a novel quality assessment method for AIGIs named TSP-MGS. It designs task-specific prompts and measures multi-granularity similarity between AIGIs and the prompts. Experiments on the commonly used AGIQA-1K and AGIQA-3K benchmarks demonstrate the superiority of the proposed TSP-MGS.
arXiv Detail & Related papers (2024-11-25T04:47:53Z)
The Future of Learning in the Age of Generative AI: Automated Question Generation and Assessment with Large Language Models [0.0]
Large language models (LLMs) and generative AI have revolutionized natural language processing (NLP) This chapter explores the transformative potential of LLMs in automated question generation and answer assessment.
arXiv Detail & Related papers (2024-10-12T15:54:53Z)
Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA [43.116608441891096]
Humans outperform AI systems in knowledge-grounded abductive and conceptual reasoning. State-of-the-art LLMs like GPT-4 and LLaMA show superior performance on targeted information retrieval.
arXiv Detail & Related papers (2024-10-09T03:53:26Z)
Decoding AI and Human Authorship: Nuances Revealed Through NLP and Statistical Analysis [0.0]
This research explores the nuanced differences in texts produced by AI and those written by humans. The study investigates various linguistic traits, patterns of creativity, and potential biases inherent in human-written and AI- generated texts.
arXiv Detail & Related papers (2024-07-15T18:09:03Z)
Understanding and Evaluating Human Preferences for AI Generated Images with Instruction Tuning [58.41087653543607]
We first establish a novel Image Quality Assessment (IQA) database for AIGIs, termed AIGCIQA2023+. This paper presents a MINT-IQA model to evaluate and explain human preferences for AIGIs from Multi-perspectives with INstruction Tuning.
arXiv Detail & Related papers (2024-05-12T17:45:11Z)
Large Language Models as In-context AI Generators for Quality-Diversity [8.585387103144825]
In-context QD aims to generate interesting solutions using few-shot and many-shot prompting with quality-diverse examples from the QD archive as context. In-context QD displays promising results compared to both QD baselines and similar strategies developed for single-objective optimization.
arXiv Detail & Related papers (2024-04-24T10:35:36Z)
Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization [13.436983663467938]
This paper introduces Quality Diversity through Human Feedback (QDHF), a novel approach that progressively infers diversity metrics from human judgments of similarity among solutions. Empirical studies show that QDHF significantly outperforms state-of-the-art methods in automatic diversity discovery. In open-ended generative tasks, QDHF substantially enhances the diversity of text-to-image generation from a diffusion model.
arXiv Detail & Related papers (2023-10-18T16:46:16Z)
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate [57.71597869337909]
We build a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models. Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments.
arXiv Detail & Related papers (2023-08-14T15:13:04Z)
Towards Robust Text-Prompted Semantic Criterion for In-the-Wild Video Quality Assessment [54.31355080688127]
We introduce a text-prompted Semantic Affinity Quality Index (SAQI) and its localized version (SAQI-Local) using Contrastive Language-Image Pre-training (CLIP) BVQI-Local demonstrates unprecedented performance, surpassing existing zero-shot indices by at least 24% on all datasets. We conduct comprehensive analyses to investigate different quality concerns of distinct indices, demonstrating the effectiveness and rationality of our design.
arXiv Detail & Related papers (2023-04-28T08:06:05Z)
The Role of AI in Drug Discovery: Challenges, Opportunities, and Strategies [97.5153823429076]
The benefits, challenges and drawbacks of AI in this field are reviewed. The use of data augmentation, explainable AI, and the integration of AI with traditional experimental methods are also discussed.
arXiv Detail & Related papers (2022-12-08T23:23:39Z)
Improving the Question Answering Quality using Answer Candidate Filtering based on Natural-Language Features [117.44028458220427]
We address the problem of how the Question Answering (QA) quality of a given system can be improved. Our main contribution is an approach capable of identifying wrong answers provided by a QA system. In particular, our approach has shown its potential while removing in many cases the majority of incorrect answers.
arXiv Detail & Related papers (2021-12-10T11:09:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.