Related papers: Improving Diversity of Demographic Representation in Large Language Models via Collective-Critiques and Self-Voting

Improving Diversity of Demographic Representation in Large Language Models via Collective-Critiques and Self-Voting

URL: http://arxiv.org/abs/2310.16523v1
Date: Wed, 25 Oct 2023 10:17:17 GMT
Title: Improving Diversity of Demographic Representation in Large Language Models via Collective-Critiques and Self-Voting
Authors: Preethi Lahoti, Nicholas Blumm, Xiao Ma, Raghavendra Kotikalapudi, Sahitya Potluri, Qijun Tan, Hansa Srinivasan, Ben Packer, Ahmad Beirami, Alex Beutel, Jilin Chen
Abstract summary: This paper formalizes diversity of representation in generative large language models. We present evaluation datasets and propose metrics to measure diversity in generated responses along people and culture axes. We find that LLMs understand the notion of diversity, and that they can reason and critique their own responses for that goal.
Score: 19.79214899011072
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A crucial challenge for generative large language models (LLMs) is diversity: when a user's prompt is under-specified, models may follow implicit assumptions while generating a response, which may result in homogenization of the responses, as well as certain demographic groups being under-represented or even erased from the generated responses. In this paper, we formalize diversity of representation in generative LLMs. We present evaluation datasets and propose metrics to measure diversity in generated responses along people and culture axes. We find that LLMs understand the notion of diversity, and that they can reason and critique their own responses for that goal. This finding motivated a new prompting technique called collective-critique and self-voting (CCSV) to self-improve people diversity of LLMs by tapping into its diversity reasoning capabilities, without relying on handcrafted examples or prompt tuning. Extensive empirical experiments with both human and automated evaluations show that our proposed approach is effective at improving people and culture diversity, and outperforms all baseline methods by a large margin.

Related papers

Evaluating the Diversity and Quality of LLM Generated Content [72.84945252821908]
We introduce a framework for measuring effective semantic diversity--diversity among outputs that meet quality thresholds. Although preference-tuned models exhibit reduced lexical and syntactic diversity, they produce greater effective semantic diversity than SFT or base models. These findings have important implications for applications that require diverse yet high-quality outputs.
arXiv Detail & Related papers (2025-04-16T23:02:23Z)
NoveltyBench: Evaluating Language Models for Humanlike Diversity [21.6078675947446]
NoveltyBench is a benchmark designed to evaluate the ability of language models to produce multiple distinct and high-quality outputs. We evaluate 20 leading language models and find that current state-of-the-art systems generate significantly less diversity than human writers.
arXiv Detail & Related papers (2025-04-07T16:14:23Z)
Exploring Robustness of LLMs to Paraphrasing Based on Sociodemographic Factors [7.312170216336085]
We extend the SocialIQA dataset to create diverse paraphrased sets conditioned on sociodemographic factors.<n>We find that demographic-based paraphrasing significantly impacts the performance of language models.
arXiv Detail & Related papers (2025-01-14T17:50:06Z)
One fish, two fish, but not the whole sea: Alignment reduces language models' conceptual diversity [2.5975241792179378]
Researchers have proposed using large language models (LLMs) as replacements for humans in behavioral research. It is debated whether post-training alignment (RLHF or RLAIF) affects models' internal diversity. We use a new way of measuring the conceptual diversity of synthetically-generated LLM "populations" by relating the internal variability of simulated individuals to the population-level variability.
arXiv Detail & Related papers (2024-11-07T04:38:58Z)
Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks. Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval. This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z)
Capturing Bias Diversity in LLMs [1.9685736810241874]
This paper presents research on enhancements to Large Language Models (LLMs) through the addition of diversity in its generated outputs. By developing multiple customised instances of a GPT model, each reflecting biases in specific demographic characteristics including gender, age, and race, we propose, develop and evaluate a framework for a more nuanced and representative AI dialogue which we call BiasGPT. In this paper, through experiments, we demonstrate the capabilities of a GPT model to embed different biases which, when combined, can open the possibilities of more inclusive AI technologies.
arXiv Detail & Related papers (2024-10-09T17:07:50Z)
PersLLM: A Personified Training Approach for Large Language Models [66.16513246245401]
We propose PersLLM, integrating psychology-grounded principles of personality: social practice, consistency, and dynamic development. We incorporate personality traits directly into the model parameters, enhancing the model's resistance to induction, promoting consistency, and supporting the dynamic evolution of personality.
arXiv Detail & Related papers (2024-07-17T08:13:22Z)
The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention [61.80236015147771]
We quantify the trade-off between using diversity interventions and preserving demographic factuality in T2I models. Experiments on DoFaiR reveal that diversity-oriented instructions increase the number of different gender and racial groups. We propose Fact-Augmented Intervention (FAI) to reflect on verbalized or retrieved factual information about gender and racial compositions of generation subjects in history.
arXiv Detail & Related papers (2024-06-29T09:09:42Z)
Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning [28.654890118684957]
Generative Commonsense Reasoning (GCR) requires a model to reason about a situation using commonsense knowledge. The diversity of the generation is equally important because it reflects the model's ability to use a range of commonsense knowledge facts. We propose a simple method that diversifies the LLM generations, while preserving their quality.
arXiv Detail & Related papers (2024-04-25T17:52:39Z)
Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment [84.32768080422349]
Alignment with human preference prevents large language models from generating misleading or toxic content. We propose a new formulation of prompt diversity, implying a linear correlation with the final performance of LLMs after fine-tuning.
arXiv Detail & Related papers (2024-03-17T07:08:55Z)
On the steerability of large language models toward data-driven personas [98.9138902560793]
Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented. Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
arXiv Detail & Related papers (2023-11-08T19:01:13Z)
Improving Factuality and Reasoning in Language Models through Multiagent Debate [95.10641301155232]
We present a complementary approach to improve language responses where multiple language model instances propose and debate their individual responses and reasoning processes over multiple rounds to arrive at a common final answer. Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks. Our approach may be directly applied to existing black-box models and uses identical procedure and prompts for all tasks we investigate.
arXiv Detail & Related papers (2023-05-23T17:55:11Z)
Measuring and Improving Semantic Diversity of Dialogue Generation [21.59385143783728]
We introduce a new automatic evaluation metric to measure the semantic diversity of generated responses. We show that our proposed metric captures human judgments on response diversity better than existing lexical-level diversity metrics. We also propose a simple yet effective learning method that improves the semantic diversity of generated responses.
arXiv Detail & Related papers (2022-10-11T18:36:54Z)
Evaluating the Evaluation of Diversity in Natural Language Generation [43.05127848086264]
We propose a framework for evaluating diversity metrics in natural language generation systems. Our framework can advance the understanding of different diversity metrics, an essential step on the road towards better NLG systems.
arXiv Detail & Related papers (2020-04-06T20:44:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.