A survey of diversity quantification in natural language processing: The why, what, where and how
- URL: http://arxiv.org/abs/2507.20858v1
- Date: Mon, 28 Jul 2025 14:12:34 GMT
- Title: A survey of diversity quantification in natural language processing: The why, what, where and how
- Authors: Louis Estève, Marie-Catherine de Marneffe, Nurit Melnik, Agata Savary, Olha Kanishcheva,
- Abstract summary: We survey articles in the ACL Anthology from the past 6 years with "diversity" or "diverse" in their title. We put forward a unified taxonomy of why, what on, where, and how diversity is measured in NLP. We believe that this study paves the way towards a better formalization of diversity in NLP.
- Score: 2.5833049611832273
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The concept of diversity has received increased consideration in Natural Language Processing (NLP) in recent years. This is due to various motivations, like promoting inclusion, approximating human linguistic behavior, and increasing systems' performance. Diversity has however often been addressed in an ad hoc manner in NLP, and with few explicit links to other domains where this notion is better theorized. We survey articles in the ACL Anthology from the past 6 years with "diversity" or "diverse" in their title. We find a wide range of settings in which diversity is quantified, often highly specialized and using inconsistent terminology. We put forward a unified taxonomy of why, what on, where, and how diversity is measured in NLP. Diversity measures are cast into a unified framework from ecology and economics (Stirling, 2007) with three dimensions of diversity: variety, balance and disparity. We discuss the trends which emerge from this systematized approach. We believe that this study paves the way towards a better formalization of diversity in NLP, which should bring a better understanding of this notion and better comparability across approaches.
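Stirling's (2007) heuristic combines these three dimensions into a single score, typically written as a sum over unordered pairs of categories of (d_ij)^alpha * (p_i * p_j)^beta, where p_i is the proportion of category i (balance), d_ij is the pairwise disparity between categories, and the number of categories captures variety. Below is a minimal Python sketch of that heuristic; the category names, proportions and disparities are purely illustrative and not taken from the surveyed papers.

```python
from itertools import combinations

def stirling_diversity(proportions, disparity, alpha=1.0, beta=1.0):
    """Stirling (2007) diversity heuristic.

    Sums (d_ij ** alpha) * (p_i * p_j) ** beta over unordered category pairs.
    proportions: dict mapping category -> share of the data (should sum to 1)
    disparity:   dict mapping frozenset({i, j}) -> distance between categories
    """
    total = 0.0
    for i, j in combinations(proportions, 2):
        d_ij = disparity.get(frozenset((i, j)), 1.0)  # default: maximally distinct
        total += (d_ij ** alpha) * (proportions[i] * proportions[j]) ** beta
    return total

# Toy example: three languages in a corpus with hand-set pairwise disparities.
shares = {"en": 0.6, "de": 0.3, "fi": 0.1}
dists = {frozenset(("en", "de")): 0.3,
         frozenset(("en", "fi")): 0.9,
         frozenset(("de", "fi")): 0.8}
print(stirling_diversity(shares, dists))  # higher = more diverse
```

Setting beta = 0 reduces the score to a pure disparity sum, while alpha = 0 ignores distances and behaves like a balance index; variety enters through the number of pairs being summed.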
Related papers
- Evaluating the Diversity and Quality of LLM Generated Content [72.84945252821908]
We introduce a framework for measuring effective semantic diversity: diversity among outputs that meet quality thresholds (a minimal sketch of this quality-filtered notion appears after this list). Although preference-tuned models exhibit reduced lexical and syntactic diversity, they produce greater effective semantic diversity than SFT or base models. These findings have important implications for applications that require diverse yet high-quality outputs.
arXiv Detail & Related papers (2025-04-16T23:02:23Z) - What is "Typological Diversity" in NLP? [7.58293347591642]
We introduce metrics to approximate the diversity of language selection along several axes.
We show that skewed language selection can lead to overestimated multilingual performance.
arXiv Detail & Related papers (2024-02-06T18:29:39Z) - Diversify Question Generation with Retrieval-Augmented Style Transfer [68.00794669873196]
We propose RAST, a framework for Retrieval-Augmented Style Transfer.
The objective is to utilize the style of diverse templates for question generation.
We develop a novel Reinforcement Learning (RL) based approach that maximizes a weighted combination of diversity reward and consistency reward.
arXiv Detail & Related papers (2023-10-23T02:27:31Z) - Exploring Diversity in Back Translation for Low-Resource Machine Translation [85.03257601325183]
Back translation is one of the most widely used methods for improving the performance of neural machine translation systems.
Recent research has sought to enhance the effectiveness of this method by increasing the 'diversity' of the generated translations.
This work puts forward a more nuanced framework for understanding diversity in training data, splitting it into lexical diversity and syntactic diversity.
arXiv Detail & Related papers (2022-06-01T15:21:16Z) - Semantic Diversity in Dialogue with Natural Language Inference [19.74618235525502]
This paper makes two substantial contributions to improving diversity in dialogue generation.
First, we propose a novel metric which uses Natural Language Inference (NLI) to measure the semantic diversity of a set of model responses for a conversation (an NLI-based sketch of this idea appears after this list).
Second, we demonstrate how to iteratively improve the semantic diversity of a sampled set of responses via a new generation procedure called Diversity Threshold Generation.
arXiv Detail & Related papers (2022-05-03T13:56:32Z) - What can phylogenetic metrics tell us about useful diversity in evolutionary algorithms? [62.997667081978825]
Phylogenetic diversity metrics are a class of measures widely used in biology.
We find that, in most cases, phylogenetic metrics behave meaningfully differently from other diversity metrics.
arXiv Detail & Related papers (2021-08-28T06:49:14Z) - Diversity in Sociotechnical Machine Learning Systems [2.9973947110286163]
There has been a surge of recent interest in sociocultural diversity in machine learning (ML) research.
We present a taxonomy of different diversity concepts from philosophy of science, and explicate the distinct rationales underlying these concepts.
We provide an overview of mechanisms by which diversity can benefit group performance.
arXiv Detail & Related papers (2021-07-19T21:26:38Z) - Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games [44.30509625560908]
In open-ended learning algorithms, there are no widely accepted definitions for diversity, making it hard to construct and evaluate the diverse policies.
We propose a unified measure of diversity in multi-agent open-ended learning based on both Behavioral Diversity (BD) and Response Diversity (RD).
We show that many current diversity measures fall in one of the categories of BD or RD but not both.
With this unified diversity measure, we design the corresponding diversity-promoting objective and population effectivity when seeking the best responses in open-ended learning.
arXiv Detail & Related papers (2021-06-09T10:11:06Z) - Gender Bias in Multilingual Embeddings and Cross-Lingual Transfer [101.58431011820755]
We study gender bias in multilingual embeddings and how it affects transfer learning for NLP applications.
We create a multilingual dataset for bias analysis and propose several ways for quantifying bias in multilingual representations.
arXiv Detail & Related papers (2020-05-02T04:34:37Z) - Evaluating the Evaluation of Diversity in Natural Language Generation [43.05127848086264]
We propose a framework for evaluating diversity metrics in natural language generation systems.
Our framework can advance the understanding of different diversity metrics, an essential step on the road towards better NLG systems.
arXiv Detail & Related papers (2020-04-06T20:44:10Z)
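Several of the papers above measure diversity as an average pairwise distance over a set of generated outputs; the first related paper additionally restricts the set to outputs that pass a quality threshold before measuring diversity. The sketch below illustrates that quality-filtered pattern in outline only: `embed` and `quality_score` are placeholder callables standing in for whatever embedding model and quality judge one chooses, not APIs from the paper.

```python
import numpy as np

def effective_semantic_diversity(outputs, embed, quality_score, threshold=0.5):
    """Mean pairwise cosine distance among outputs whose quality passes a threshold.

    outputs:       list of generated strings
    embed:         callable str -> 1-D numpy array (placeholder embedding model)
    quality_score: callable str -> float in [0, 1] (placeholder quality judge)
    """
    kept = [o for o in outputs if quality_score(o) >= threshold]
    if len(kept) < 2:
        return 0.0  # fewer than two usable outputs: no diversity to measure
    vecs = np.stack([embed(o) for o in kept]).astype(float)
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = vecs @ vecs.T
    n = len(kept)
    # average cosine distance over all distinct pairs
    return float((n * n - sims.sum()) / (n * (n - 1)))
```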
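The dialogue paper above instead derives semantic diversity from Natural Language Inference: responses that entail one another count as semantically close, contradictory or unrelated responses as far apart. The sketch below is one illustrative way to turn NLI labels into a diversity score; `nli` is a placeholder for any premise-hypothesis classifier, and the label-to-distance mapping is an assumption rather than the paper's exact metric.

```python
def nli_semantic_diversity(responses, nli):
    """Average pairwise distance between responses, with distances read off NLI labels.

    responses: list of strings (e.g. candidate dialogue responses)
    nli:       callable (premise, hypothesis) -> "entailment" | "neutral" | "contradiction"
               (placeholder NLI model)
    """
    label_to_distance = {"entailment": 0.0, "neutral": 0.5, "contradiction": 1.0}
    pairs = [(a, b) for i, a in enumerate(responses) for b in responses[i + 1:]]
    if not pairs:
        return 0.0
    # score both directions and average, since entailment is not symmetric
    dists = [(label_to_distance[nli(a, b)] + label_to_distance[nli(b, a)]) / 2
             for a, b in pairs]
    return sum(dists) / len(dists)
```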
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.