Mind the Gap: Conformative Decoding to Improve Output Diversity of Instruction-Tuned Large Language Models
- URL: http://arxiv.org/abs/2507.20956v1
- Date: Mon, 28 Jul 2025 16:04:25 GMT
- Title: Mind the Gap: Conformative Decoding to Improve Output Diversity of Instruction-Tuned Large Language Models
- Authors: Max Peeperkorn, Tom Kouwenhoven, Dan Brown, Anna Jordanous
- Abstract summary: This paper investigates the "diversity gap" for a writing-prompt narrative generation task. Results show significant decreases in diversity due to instruction-tuning. We present a new decoding strategy, conformative decoding, which guides an instruct model using its more diverse base model to reintroduce output diversity.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instruction-tuning large language models (LLMs) reduces the diversity of their outputs, which has implications for many tasks, particularly creative ones. This paper investigates the "diversity gap" for a writing-prompt narrative generation task, measuring the gap with current diversity metrics across various open-weight and open-source LLMs. The results show significant decreases in diversity due to instruction-tuning. We explore the diversity loss at each fine-tuning stage for the OLMo and OLMo 2 models to further understand how output diversity is affected. The results indicate that DPO has the most substantial impact on diversity. Motivated by these findings, we present a new decoding strategy, conformative decoding, which guides an instruct model using its more diverse base model to reintroduce output diversity. We show that conformative decoding typically increases diversity while maintaining or even improving quality.
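The abstract does not spell out how the base model guides the instruct model, so the following is a minimal sketch of one plausible reading, assuming a simple interpolation of next-token logits at each decoding step. The mixing weight `alpha` and the specific OLMo 2 checkpoints are illustrative assumptions, not the paper's published formulation.

```python
# Hedged sketch: pull the instruct model's next-token distribution toward
# its more diverse base model by interpolating logits. `alpha` and the
# checkpoint names are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "allenai/OLMo-2-1124-7B"               # assumed base checkpoint
INSTRUCT_ID = "allenai/OLMo-2-1124-7B-Instruct"  # assumed instruct checkpoint

tok = AutoTokenizer.from_pretrained(INSTRUCT_ID)
base = AutoModelForCausalLM.from_pretrained(BASE_ID).eval()
instruct = AutoModelForCausalLM.from_pretrained(INSTRUCT_ID).eval()

@torch.no_grad()
def conformative_generate(prompt, max_new_tokens=128, alpha=0.5, temperature=1.0):
    ids = tok(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logits_i = instruct(ids).logits[:, -1, :]  # instruct model's view
        logits_b = base(ids).logits[:, -1, :]      # base model's view
        # Blend toward the base model to reintroduce diversity.
        logits = (1 - alpha) * logits_i + alpha * logits_b
        probs = torch.softmax(logits / temperature, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tok.eos_token_id:
            break
    return tok.decode(ids[0], skip_special_tokens=True)

print(conformative_generate("Write a story about a lighthouse keeper."))
```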
Related papers
- The Price of Format: Diversity Collapse in LLMs [32.616432249190716]
Large language models (LLMs) employ structured templates, such as role markers and special tokens, to enforce format consistency during inference. We systematically evaluate this effect across tasks like story completion and free-form generation, finding that diversity collapse persists even under high-temperature sampling. To contextualize these findings, we fine-tune the same model using a range of structured prompts and then evaluate them across three axes: downstream task performance, alignment behavior, and output diversity.
arXiv Detail & Related papers (2025-05-25T02:52:35Z)
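Diversity collapse of this kind is typically quantified with surface-level metrics. Below is a minimal sketch of one such metric, distinct-n; the paper's actual evaluation setup may differ, and the sample strings are made up.

```python
# Minimal distinct-n sketch: the fraction of unique n-grams across a batch
# of sampled outputs. Values near 0 signal collapsed diversity even when
# sampling at high temperature. The example strings are made up.
def distinct_n(texts, n=2):
    ngrams = []
    for text in texts:
        toks = text.split()
        ngrams += [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)

samples = ["once upon a time", "once upon a time", "in a land far away"]
print(distinct_n(samples, n=2))  # low values indicate diversity collapse
```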
- Diverse, not Short: A Length-Controlled Self-Learning Framework for Improving Response Diversity of Language Models [8.023589594229914]
We show that common diversity metrics, and even reward models used for preference optimization, systematically bias models toward shorter outputs. We introduce Diverse, not Short (Diverse-NS), a length-controlled self-learning framework that improves response diversity while maintaining length parity.
arXiv Detail & Related papers (2025-05-22T05:29:47Z)
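In the spirit of the length-parity idea, here is a hedged sketch that builds preference pairs only from candidates of comparable length, so a diversity score cannot be gamed by brevity. The tolerance and the pluggable `diversity_score` are assumptions for illustration, not Diverse-NS's published procedure.

```python
# Hedged sketch of length-controlled pair construction: candidates are only
# compared when their lengths fall within a relative tolerance, so "more
# diverse" cannot simply reduce to "shorter".
def length_matched_pairs(candidates, diversity_score, tol=0.1):
    pairs = []
    for i, a in enumerate(candidates):
        for b in candidates[i + 1:]:
            la, lb = len(a.split()), len(b.split())
            if abs(la - lb) / max(la, lb, 1) <= tol:  # enforce length parity
                chosen, rejected = sorted((a, b), key=diversity_score, reverse=True)
                pairs.append((chosen, rejected))
    return pairs

# Toy usage: score diversity by unique-word ratio.
score = lambda t: len(set(t.split())) / max(len(t.split()), 1)
print(length_matched_pairs(["the cat sat on the mat", "a dog ran in the park"], score))
```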
- Evaluating the Diversity and Quality of LLM Generated Content [72.84945252821908]
We introduce a framework for measuring effective semantic diversity: diversity among outputs that meet quality thresholds. Although preference-tuned models exhibit reduced lexical and syntactic diversity, they produce greater effective semantic diversity than SFT or base models. These findings have important implications for applications that require diverse yet high-quality outputs.
arXiv Detail & Related papers (2025-04-16T23:02:23Z)
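A hedged sketch of the quality-gated measurement the abstract describes: compute pairwise semantic distance only over outputs that clear a quality threshold. The `embed` and `quality` callables and the threshold value are assumptions, not the paper's exact framework.

```python
import numpy as np

# Hedged sketch: diversity among outputs that meet a quality threshold,
# measured as mean pairwise cosine distance between embeddings.
def effective_semantic_diversity(outputs, embed, quality, threshold=0.7):
    kept = [o for o in outputs if quality(o) >= threshold]
    if len(kept) < 2:
        return 0.0  # fewer than two high-quality outputs: no diversity
    vecs = np.stack([embed(o) for o in kept]).astype(float)
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize
    sims = vecs @ vecs.T
    upper = (1.0 - sims)[np.triu_indices(len(kept), k=1)]
    return float(upper.mean())
```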
- Exploring and Controlling Diversity in LLM-Agent Conversation [17.38671584773247]
We propose Adaptive Prompt Pruning (APP), a novel method that allows users to control diversity through a single parameter. Extensive experiments show that APP effectively controls output diversity, and we propose a method to balance the control trade-offs.
arXiv Detail & Related papers (2024-12-30T17:25:58Z)
- Improving Structural Diversity of Blackbox LLMs via Chain-of-Specification Prompting [28.971248570622603]
We propose a diversity metric called structural diversity, where the user provides a mapping from generated text to features capturing the kinds of diversity that they care about.
In our experiments, we show that for structural diversity in the poetry and code domains, chain-of-specification (CoS) prompting significantly improves diversity compared to several baselines.
arXiv Detail & Related papers (2024-08-12T14:34:06Z)
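A hedged sketch of the metric's shape: the user supplies a feature map, and diversity is the fraction of distinct feature combinations covered by a set of generations. The poem feature map below is a made-up example of such a mapping, not one from the paper.

```python
# Hedged sketch: structural diversity as the fraction of distinct
# user-defined feature combinations across generated texts.
def structural_diversity(texts, feature_map):
    return len({feature_map(t) for t in texts}) / max(len(texts), 1)

def poem_features(poem):  # hypothetical user-supplied feature mapping
    lines = poem.strip().splitlines()
    rhymed = (len(lines) >= 2 and
              lines[0].split()[-1][-2:] == lines[1].split()[-1][-2:])
    return (len(lines), rhymed)  # (line count, crude rhyme flag)

# Two generations with different structures -> diversity of 1.0.
print(structural_diversity(
    ["roses are red\nviolets are blue", "a lone haiku"], poem_features))
```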
- Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment [84.32768080422349]
Alignment with human preference prevents large language models from generating misleading or toxic content.
We propose a new formulation of prompt diversity that implies a linear correlation with the final performance of LLMs after fine-tuning.
arXiv Detail & Related papers (2024-03-17T07:08:55Z)
- Mitigating Shortcut Learning with Diffusion Counterfactuals and Diverse Ensembles [104.60508550106618]
We propose DiffDiv, an ensemble diversification framework exploiting Diffusion Probabilistic Models (DPMs). We show that DPMs can generate images with novel feature combinations, even when trained on samples displaying correlated input features. We show that DPM-guided diversification is sufficient to remove dependence on shortcut cues, without a need for additional supervised signals.
arXiv Detail & Related papers (2023-11-23T15:47:33Z)
- Diversify Question Generation with Retrieval-Augmented Style Transfer [68.00794669873196]
We propose RAST, a framework for Retrieval-Augmented Style Transfer.
The objective is to utilize the style of diverse templates for question generation.
We develop a novel Reinforcement Learning (RL) based approach that maximizes a weighted combination of diversity reward and consistency reward.
arXiv Detail & Related papers (2023-10-23T02:27:31Z)
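The abstract specifies the reward's overall shape, a weighted combination of diversity and consistency. Here is a minimal hedged sketch of that combination; both reward callables and the weight `beta` are illustrative assumptions, not RAST's published implementation.

```python
# Hedged sketch of the reward shape: a weighted combination of a diversity
# reward (does the question adopt a new style?) and a consistency reward
# (does it still match the source passage?).
def rast_reward(question, source, r_diversity, r_consistency, beta=0.5):
    return beta * r_diversity(question) + (1 - beta) * r_consistency(question, source)

# Toy usage with stand-in reward functions.
r_div = lambda q: 1.0 if q.lower().startswith("why") else 0.2
r_con = lambda q, s: 1.0 if any(w in q.lower() for w in s.lower().split()) else 0.0
print(rast_reward("Why is the sky blue?", "the sky appears blue", r_div, r_con))
```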
- Exploring Diversity in Back Translation for Low-Resource Machine Translation [85.03257601325183]
Back translation is one of the most widely used methods for improving the performance of neural machine translation systems.
Recent research has sought to enhance the effectiveness of this method by increasing the 'diversity' of the generated translations.
This work puts forward a more nuanced framework for understanding diversity in training data, splitting it into lexical diversity and syntactic diversity.
arXiv Detail & Related papers (2022-06-01T15:21:16Z)
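A hedged sketch of the split this framework proposes: lexical diversity via type-token ratio, and syntactic diversity via distinct structural patterns. Real implementations would use POS tags or parses; the capitalization-based signature below is a stand-in assumption.

```python
# Hedged sketch of the two notions of diversity in the framework.
def lexical_diversity(sentences):
    tokens = [w.lower() for s in sentences for w in s.split()]
    return len(set(tokens)) / max(len(tokens), 1)  # type-token ratio

def syntactic_diversity(sentences):
    def shape(s):  # crude structural signature; a POS tagger would be better
        return tuple(w[0].isupper() for w in s.split())
    return len({shape(s) for s in sentences}) / max(len(sentences), 1)

translations = ["The cat sat.", "Sat the cat.", "The dog ran."]
print(lexical_diversity(translations), syntactic_diversity(translations))
```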
- The Effect of Diversity in Meta-Learning [79.56118674435844]
Few-shot learning aims to learn representations that can tackle novel tasks given a small number of examples.
Recent studies show that task distribution plays a vital role in the model's performance.
We study different task distributions on a myriad of models and datasets to evaluate the effect of task diversity on meta-learning algorithms.
arXiv Detail & Related papers (2022-01-27T19:39:07Z)
- Informed Sampling for Diversity in Concept-to-Text NLG [8.883733362171034]
We propose an Imitation Learning approach to explore the level of diversity that a language generation model can reliably produce.
Specifically, we augment the decoding process with a meta-classifier trained to distinguish which words at any given timestep will lead to high-quality output.
arXiv Detail & Related papers (2020-04-29T17:43:24Z)
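A hedged sketch of the decoding augmentation this abstract describes: a meta-classifier scores candidate next words for their likelihood of leading to high-quality output, and the generator's distribution is reweighted accordingly. The classifier interface and exponential mixing rule are assumptions, not the paper's exact method.

```python
import math

# Hedged sketch: reweight the generator's next-word distribution with a
# meta-classifier that predicts which words lead to high-quality output.
def rerank_step(lm_probs, context, quality_classifier, top_k=10, gamma=1.0):
    """lm_probs: dict mapping candidate word -> probability from the LM."""
    top = sorted(lm_probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    scored = {w: p * math.exp(gamma * quality_classifier(context, w)) for w, p in top}
    z = sum(scored.values())
    return {w: s / z for w, s in scored.items()}

# Toy usage with a stand-in classifier that favors longer content words.
clf = lambda ctx, w: 1.0 if len(w) > 3 else -1.0
print(rerank_step({"the": 0.5, "lighthouse": 0.3, "a": 0.2}, "The keeper saw", clf))
```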