Evaluating the Diversity and Quality of LLM Generated Content
- URL: http://arxiv.org/abs/2504.12522v1
- Date: Wed, 16 Apr 2025 23:02:23 GMT
- Title: Evaluating the Diversity and Quality of LLM Generated Content
- Authors: Alexander Shypula, Shuo Li, Botong Zhang, Vishakh Padmakumar, Kayo Yin, Osbert Bastani
- Abstract summary: We introduce a framework for measuring effective semantic diversity--diversity among outputs that meet quality thresholds. Although preference-tuned models exhibit reduced lexical and syntactic diversity, they produce greater effective semantic diversity than SFT or base models. These findings have important implications for applications that require diverse yet high-quality outputs.
- Score: 72.84945252821908
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work suggests that preference-tuning techniques--including Reinforcement Learning from Human Feedback (RLHF) methods like PPO and GRPO, as well as alternatives like DPO--reduce diversity, creating a dilemma given that such models are widely deployed in applications requiring diverse outputs. To address this, we introduce a framework for measuring effective semantic diversity--diversity among outputs that meet quality thresholds--which better reflects the practical utility of large language models (LLMs). Using open-ended tasks that require no human intervention, we find counterintuitive results: although preference-tuned models--especially those trained via RL--exhibit reduced lexical and syntactic diversity, they produce greater effective semantic diversity than SFT or base models, not from increasing diversity among high-quality outputs, but from generating more high-quality outputs overall. We discover that preference tuning reduces syntactic diversity while preserving semantic diversity--revealing a distinction between diversity in form and diversity in content that traditional metrics often overlook. Our analysis further shows that smaller models are consistently more parameter-efficient at generating unique content within a fixed sampling budget, offering insights into the relationship between model scaling and diversity. These findings have important implications for applications that require diverse yet high-quality outputs, from creative assistance to synthetic data generation.
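To make the core metric concrete, here is a minimal sketch of how effective semantic diversity could be computed, assuming a per-output quality scorer and sentence embeddings are available. The function signature, the cosine-distance measure, and the mean-pairwise aggregation are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def effective_semantic_diversity(quality_scores, embeddings, threshold):
    """Sketch: diversity among only those outputs that clear a quality bar.

    quality_scores : (n,) per-output quality scores, e.g. from an automatic judge
    embeddings     : (n, d) semantic embeddings, one row per generated output
    threshold      : minimum score for an output to count as high quality
    """
    keep = np.asarray(quality_scores) >= threshold
    if keep.sum() < 2:
        return 0.0  # fewer than two high-quality outputs: no measurable diversity
    emb = np.asarray(embeddings, dtype=float)[keep]
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)  # rows -> unit vectors
    sim = emb @ emb.T                                  # pairwise cosine similarity
    pairs = sim[np.triu_indices(len(emb), k=1)]        # distinct pairs only
    return float(np.mean(1.0 - pairs))                 # mean pairwise cosine distance
```

On this reading, a model can score higher simply by producing more outputs that survive the quality filter, which is exactly the effect the abstract attributes to preference-tuned models.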
Related papers
- Modifying Large Language Model Post-Training for Diverse Creative Writing [12.872333448726595]
In creative writing generation, we investigate post-training approaches to promote both output diversity and quality. Our core idea is to include deviation in the training objective to facilitate learning from rare high-quality instances. Our best model, with 8B parameters, achieves diversity on par with a human-created dataset while matching the output quality of the best instruction-tuned models.
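The summary gives no formula, but a deviation-weighted loss might look like the hedged sketch below, where `deviations` (how far a response strays from the other responses to the same prompt, e.g. in embedding space) is an assumed quantity, not necessarily the paper's definition.

```python
import torch

def deviation_weighted_nll(token_logprobs, deviations):
    """Hypothetical sketch: upweight rare, high-deviation training samples.

    token_logprobs : (batch, seq) log-probs of reference tokens under the model
    deviations     : (batch,) nonnegative distance of each response from its
                     prompt's other responses (e.g. embedding-centroid distance)
    """
    per_sample_nll = -token_logprobs.sum(dim=1)   # standard NLL per sample
    weights = deviations / deviations.sum()       # rarer samples count more
    return (weights * per_sample_nll).sum()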
arXiv Detail & Related papers (2025-03-21T13:21:45Z)
- Measuring Data Diversity for Instruction Tuning: A Systematic Analysis and A Reliable Metric [48.81957145701228]
We propose NovelSum, a new diversity metric based on sample-level "novelty".
We show that NovelSum accurately captures diversity variations and achieves a 0.97 correlation with instruction-tuned model performance.
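The exact NovelSum formula is not given here; a rough approximation of a sample-level novelty metric, under the assumption that novelty is distance to near neighbors in embedding space, could look like this:

```python
import numpy as np

def novelty_sum(embeddings, k=5):
    """Rough approximation (not the official NovelSum formula): a sample's
    novelty is its mean cosine distance to its k nearest neighbors, and
    dataset diversity is the sum of per-sample novelties."""
    emb = np.asarray(embeddings, dtype=float)
    if len(emb) < 2:
        return 0.0
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)
    dist = 1.0 - emb @ emb.T                 # pairwise cosine distances
    np.fill_diagonal(dist, np.inf)           # a sample is not its own neighbor
    k = min(k, len(emb) - 1)
    nearest = np.sort(dist, axis=1)[:, :k]   # k smallest distances per sample
    return float(nearest.mean(axis=1).sum())
```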
arXiv Detail & Related papers (2025-02-24T14:20:22Z)
- Prompt as Free Lunch: Enhancing Diversity in Source-Free Cross-domain Few-shot Learning through Semantic-Guided Prompting [9.116108409344177]
The source-free cross-domain few-shot learning task aims to transfer pretrained models to target domains using minimal samples. We propose the SeGD-VPT framework, which is divided into two phases. The first phase increases feature diversity by adding diversity prompts to each support sample, generating varied inputs.
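As a loose illustration only (the actual SeGD-VPT prompt design is not described here), "diversity prompts" can be pictured as perturbation vectors that turn one support sample into several distinct inputs:

```python
import torch

def add_diversity_prompts(support_feature, num_prompts=4, scale=0.1):
    """Loose illustration: perturb a single support sample's feature vector
    (1-D tensor of size d) with several prompt vectors to synthesize varied
    inputs from one shot. num_prompts and scale are arbitrary assumptions;
    in a real framework the prompts would be learned, not random."""
    d = support_feature.shape[-1]
    prompts = torch.randn(num_prompts, d) * scale    # stand-in for learned prompts
    return support_feature.unsqueeze(0) + prompts    # (num_prompts, d) variants
```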
arXiv Detail & Related papers (2024-12-01T11:00:38Z)
- Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning [28.654890118684957]
Generative Commonsense Reasoning (GCR) requires a model to reason about a situation using commonsense knowledge.
The diversity of the generation is equally important because it reflects the model's ability to use a range of commonsense knowledge facts.
We propose a simple method that diversifies LLM generations while preserving their quality.
arXiv Detail & Related papers (2024-04-25T17:52:39Z)
- Quality Diversity through Human Feedback: Towards Open-Ended Diversity-Driven Optimization [13.436983663467938]
This paper introduces Quality Diversity through Human Feedback (QDHF), a novel approach that progressively infers diversity metrics from human judgments of similarity among solutions.
Empirical studies show that QDHF significantly outperforms state-of-the-art methods in automatic diversity discovery.
In open-ended generative tasks, QDHF substantially enhances the diversity of text-to-image generation from a diffusion model.
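A plausible reading of "inferring diversity metrics from human judgments of similarity" is a latent projection trained with a triplet objective; the sketch below assumes that setup (the projection architecture and loss are our assumptions, not necessarily QDHF's exact formulation).

```python
import torch
import torch.nn as nn

class LatentDiversityMetric(nn.Module):
    """Sketch: project solution embeddings into a low-dimensional space whose
    axes serve as diversity descriptors for quality-diversity search."""

    def __init__(self, input_dim, latent_dim=2):
        super().__init__()
        self.proj = nn.Linear(input_dim, latent_dim)

    def forward(self, x):
        return self.proj(x)

def similarity_triplet_loss(model, anchor, similar, dissimilar, margin=1.0):
    # A human judged `similar` closer to `anchor` than `dissimilar`;
    # train the latent space to respect that ranking.
    za, zs, zd = model(anchor), model(similar), model(dissimilar)
    d_pos = (za - zs).pow(2).sum(dim=-1)
    d_neg = (za - zd).pow(2).sum(dim=-1)
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()
```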
arXiv Detail & Related papers (2023-10-18T16:46:16Z)
- Leveraging Diffusion Disentangled Representations to Mitigate Shortcuts in Underspecified Visual Tasks [92.32670915472099]
We propose an ensemble diversification framework that exploits synthetic counterfactuals generated with Diffusion Probabilistic Models (DPMs).
We show that diffusion-guided diversification can lead models to avert attention from shortcut cues, achieving ensemble diversity performance comparable to previous methods requiring additional data collection.
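One way to realize diversification on synthetic counterfactuals is to fit all ensemble members on labeled data while penalizing their agreement on DPM-generated counterfactuals; the disagreement term below is our assumption, not necessarily the paper's objective.

```python
import torch.nn.functional as F

def diversification_loss(models, x, y, counterfactual_x, weight=1.0):
    """Hedged sketch: fit every ensemble member on labeled data while pushing
    members to disagree on diffusion-generated counterfactuals, so no two
    members latch onto the same shortcut cue."""
    fit = sum(F.cross_entropy(m(x), y) for m in models)
    probs = [F.softmax(m(counterfactual_x), dim=-1) for m in models]
    # Penalize pairwise agreement (expected prediction overlap) on counterfactuals.
    agree = sum((p * q).sum(dim=-1).mean()
                for i, p in enumerate(probs) for q in probs[i + 1:])
    return fit + weight * agree
```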
arXiv Detail & Related papers (2023-10-03T17:37:52Z)
- Source-free Domain Adaptation Requires Penalized Diversity [60.04618512479438]
Source-free domain adaptation (SFDA) was introduced to address knowledge transfer between different domains in the absence of source data.
In unsupervised SFDA, diversity is limited to learning a single hypothesis on the source or learning multiple hypotheses with a shared feature extractor.
We propose a novel unsupervised SFDA algorithm that promotes representational diversity through the use of separate feature extractors.
arXiv Detail & Related papers (2023-04-06T00:20:19Z)
- Exploring Diversity in Back Translation for Low-Resource Machine Translation [85.03257601325183]
Back translation is one of the most widely used methods for improving the performance of neural machine translation systems.
Recent research has sought to enhance the effectiveness of this method by increasing the 'diversity' of the generated translations.
This work puts forward a more nuanced framework for understanding diversity in training data, splitting it into lexical diversity and syntactic diversity.
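The lexical/syntactic split echoes the distinction the main paper draws. A standard way to quantify the lexical side is distinct-n; applying the same statistic to POS-tag sequences gives a crude syntactic analogue (our extrapolation, not necessarily this paper's exact measures):

```python
def distinct_n(sequences, n=2):
    """Distinct-n: unique n-grams / total n-grams across all sequences.
    Feed token sequences for lexical diversity; feeding POS-tag sequences
    instead gives a rough syntactic analogue."""
    ngrams = [tuple(seq[i:i + n])
              for seq in sequences
              for i in range(len(seq) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)

# Example: two outputs that share half their bigrams.
print(distinct_n([["the", "cat", "sat"], ["the", "cat", "ran"]]))  # 0.75
```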
arXiv Detail & Related papers (2022-06-01T15:21:16Z)
- Informed Sampling for Diversity in Concept-to-Text NLG [8.883733362171034]
We propose an Imitation Learning approach to explore the level of diversity that a language generation model can reliably produce.
Specifically, we augment the decoding process with a meta-classifier trained to distinguish which words at any given timestep will lead to high-quality output.
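The decoding augmentation can be pictured as reranking candidate tokens with the meta-classifier; in this sketch, `meta_classifier` is a hypothetical callable scoring (prefix, candidate-token) pairs, since the paper's interface is not given here.

```python
import numpy as np

def guided_decode_step(logits, prefix, meta_classifier, top_k=10):
    """Illustrative sketch: restrict to the top-k tokens by logit, then let a
    meta-classifier pick the candidate most likely to keep the partial
    output on a path to high-quality text."""
    candidates = np.argsort(logits)[-top_k:]                   # top-k token ids
    scores = [meta_classifier(prefix, tok) for tok in candidates]
    return int(candidates[int(np.argmax(scores))])
```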
arXiv Detail & Related papers (2020-04-29T17:43:24Z)
- Towards Multimodal Response Generation with Exemplar Augmentation and Curriculum Optimization [73.45742420178196]
We propose a novel multimodal response generation framework with exemplar augmentation and curriculum optimization.
Our model achieves significant improvements compared to strong baselines in terms of diversity and relevance.
arXiv Detail & Related papers (2020-04-26T16:29:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.