Improving Diversity of Demographic Representation in Large Language
Models via Collective-Critiques and Self-Voting
- URL: http://arxiv.org/abs/2310.16523v1
- Date: Wed, 25 Oct 2023 10:17:17 GMT
- Title: Improving Diversity of Demographic Representation in Large Language
Models via Collective-Critiques and Self-Voting
- Authors: Preethi Lahoti, Nicholas Blumm, Xiao Ma, Raghavendra Kotikalapudi,
Sahitya Potluri, Qijun Tan, Hansa Srinivasan, Ben Packer, Ahmad Beirami, Alex
Beutel, Jilin Chen
- Abstract summary: This paper formalizes diversity of representation in generative large language models.
We present evaluation datasets and propose metrics to measure diversity in generated responses along people and culture axes.
We find that LLMs understand the notion of diversity, and that they can reason and critique their own responses for that goal.
- Score: 19.79214899011072
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A crucial challenge for generative large language models (LLMs) is diversity:
when a user's prompt is under-specified, models may follow implicit assumptions
while generating a response, which may result in homogenization of the
responses, as well as certain demographic groups being under-represented or
even erased from the generated responses. In this paper, we formalize diversity
of representation in generative LLMs. We present evaluation datasets and
propose metrics to measure diversity in generated responses along people and
culture axes. We find that LLMs understand the notion of diversity, and that
they can reason and critique their own responses for that goal. This finding
motivated a new prompting technique called collective-critique and self-voting
(CCSV) to self-improve people diversity of LLMs by tapping into its diversity
reasoning capabilities, without relying on handcrafted examples or prompt
tuning. Extensive empirical experiments with both human and automated
evaluations show that our proposed approach is effective at improving people
and culture diversity, and outperforms all baseline methods by a large margin.
Related papers
- PersLLM: A Personified Training Approach for Large Language Models [63.75008885222351]
We propose PersLLM, integrating psychology-grounded principles of personality: social practice, consistency, and dynamic development.
We incorporate personality traits directly into the model parameters, enhancing the model's resistance to induction, promoting consistency, and supporting the dynamic evolution of personality.
arXiv Detail & Related papers (2024-07-17T08:13:22Z) - The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention [61.80236015147771]
We quantify the trade-off between using diversity interventions and preserving demographic factuality in T2I models.
Experiments on DoFaiR reveal that diversity-oriented instructions increase the number of different gender and racial groups.
We propose Fact-Augmented Intervention (FAI), which instructs a Large Language Model (LLM) to reflect on verbalized or retrieved factual information about gender and racial compositions of generation subjects in history.
arXiv Detail & Related papers (2024-06-29T09:09:42Z) - Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning [28.654890118684957]
Generative Commonsense Reasoning (GCR) requires a model to reason about a situation using commonsense knowledge.
The diversity of the generation is equally important because it reflects the model's ability to use a range of commonsense knowledge facts.
We propose a simple method that diversifies the LLM generations, while preserving their quality.
arXiv Detail & Related papers (2024-04-25T17:52:39Z) - CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs' (Lack of) Multicultural Knowledge [69.82940934994333]
We introduce CulturalTeaming, an interactive red-teaming system that leverages human-AI collaboration to build challenging evaluation dataset.
Our study reveals that CulturalTeaming's various modes of AI assistance support annotators in creating cultural questions.
CULTURALBENCH-V0.1 is a compact yet high-quality evaluation dataset with users' red-teaming attempts.
arXiv Detail & Related papers (2024-04-10T00:25:09Z) - Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment [84.32768080422349]
Alignment with human preference prevents large language models from generating misleading or toxic content.
We propose a new formulation of prompt diversity, implying a linear correlation with the final performance of LLMs after fine-tuning.
arXiv Detail & Related papers (2024-03-17T07:08:55Z) - How Far Can We Extract Diverse Perspectives from Large Language Models? [17.66104821305835]
We investigate Large Language Models' capacity for generating diverse perspectives on subjective topics.
Motivated by how humans develop their opinions through their values, we propose a criteria-based prompting technique.
We find that LLMs can generate diverse opinions according to the degree of task subjectivity.
arXiv Detail & Related papers (2023-11-16T11:23:38Z) - On the steerability of large language models toward data-driven personas [98.9138902560793]
Large language models (LLMs) are known to generate biased responses where the opinions of certain groups and populations are underrepresented.
Here, we present a novel approach to achieve controllable generation of specific viewpoints using LLMs.
arXiv Detail & Related papers (2023-11-08T19:01:13Z) - Improving Factuality and Reasoning in Language Models through Multiagent
Debate [95.10641301155232]
We present a complementary approach to improve language responses where multiple language model instances propose and debate their individual responses and reasoning processes over multiple rounds to arrive at a common final answer.
Our findings indicate that this approach significantly enhances mathematical and strategic reasoning across a number of tasks.
Our approach may be directly applied to existing black-box models and uses identical procedure and prompts for all tasks we investigate.
arXiv Detail & Related papers (2023-05-23T17:55:11Z) - Measuring and Improving Semantic Diversity of Dialogue Generation [21.59385143783728]
We introduce a new automatic evaluation metric to measure the semantic diversity of generated responses.
We show that our proposed metric captures human judgments on response diversity better than existing lexical-level diversity metrics.
We also propose a simple yet effective learning method that improves the semantic diversity of generated responses.
arXiv Detail & Related papers (2022-10-11T18:36:54Z) - Evaluating the Evaluation of Diversity in Natural Language Generation [43.05127848086264]
We propose a framework for evaluating diversity metrics in natural language generation systems.
Our framework can advance the understanding of different diversity metrics, an essential step on the road towards better NLG systems.
arXiv Detail & Related papers (2020-04-06T20:44:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.