Improving Diversity in Language Models: When Temperature Fails, Change the Loss
- URL: http://arxiv.org/abs/2508.09654v1
- Date: Wed, 13 Aug 2025 09:37:53 GMT
- Title: Improving Diversity in Language Models: When Temperature Fails, Change the Loss
- Authors: Alexandre Verine, Florian Le Bronnec, Kunhao Zheng, Alexandre Allauzen, Yann Chevaleyre, Benjamin Negrevergne
- Abstract summary: We propose rethinking loss functions in language models by leveraging the Precision-Recall framework. Our results demonstrate that this approach achieves a substantially better trade-off between Precision and Recall than merely combining negative log-likelihood training with temperature scaling.
- Score: 81.73385878967899
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Increasing diversity in language models is a challenging yet essential objective. A common approach is to raise the decoding temperature. In this work, we investigate this approach through a simplistic yet common case to provide insights into why decreasing temperature can improve quality (Precision), while increasing it often fails to boost coverage (Recall). Our analysis reveals that for a model to be effectively tunable through temperature adjustments, it must be trained toward coverage. To address this, we propose rethinking loss functions in language models by leveraging the Precision-Recall framework. Our results demonstrate that this approach achieves a substantially better trade-off between Precision and Recall than merely combining negative log-likelihood training with temperature scaling. These findings offer a pathway toward more versatile and robust language modeling techniques.
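The abstract's central observation concerns decoding temperature: dividing the logits by a temperature T before the softmax sharpens the distribution when T < 1 (favoring quality/Precision) and flattens it when T > 1 (aiming for coverage/Recall). A minimal sketch, using hypothetical next-token logits rather than any model from the paper, illustrates this mechanism:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Rescale logits by 1/T, then apply a numerically stable softmax."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical next-token logits for a 3-token vocabulary.
logits = [3.0, 1.0, 0.5]

p_low = softmax_with_temperature(logits, 0.5)   # sharpened: mass concentrates on the top token
p_std = softmax_with_temperature(logits, 1.0)   # unmodified softmax
p_high = softmax_with_temperature(logits, 2.0)  # flattened: closer to uniform

for name, p in [("T=0.5", p_low), ("T=1.0", p_std), ("T=2.0", p_high)]:
    print(name, np.round(p, 3))
```

The paper's point is that this second direction often fails in practice: raising T spreads probability mass, but if training never pushed mass toward under-covered modes, the flattened distribution still misses them, which is why the authors argue for coverage-oriented loss functions instead.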
Related papers
- Simulated Annealing Enhances Theory-of-Mind Reasoning in Autoregressive Language Models [1.4323566945483497]
Theory of Mind (ToM) tasks crucially depend on reasoning about latent mental states of oneself and others. We show that strong ToM capability can be recovered directly from the base model without any additional weight updates or verifications.
arXiv Detail & Related papers (2026-01-18T05:51:30Z) - ReLearn: Unlearning via Learning for Large Language Models [64.2802606302194]
We propose ReLearn, a data augmentation and fine-tuning pipeline for effective unlearning. This framework introduces Knowledge Forgetting Rate (KFR) and Knowledge Retention Rate (KRR) to measure knowledge-level preservation. Our experiments show that ReLearn successfully achieves targeted forgetting while preserving high-quality output.
arXiv Detail & Related papers (2025-02-16T16:31:00Z) - Optimizing Temperature for Language Models with Multi-Sample Inference [47.14991144052361]
This paper addresses the challenge of automatically identifying the (near-)optimal temperature for different large language models. We provide a comprehensive analysis of temperature's role in performance optimization, considering variations in model architectures, datasets, task types, model sizes, and predictive accuracy. We propose a novel entropy-based metric for automated temperature optimization, which consistently outperforms fixed-temperature baselines.
arXiv Detail & Related papers (2025-02-07T19:35:25Z) - Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models [102.72940700598055]
In reasoning tasks, even a minor error can cascade into inaccurate results.
We develop a method that avoids introducing external resources, relying instead on perturbations to the input.
Our training approach randomly masks certain tokens within the chain of thought, a technique we found to be particularly effective for reasoning tasks.
arXiv Detail & Related papers (2024-03-04T16:21:54Z) - A Study on the Calibration of In-context Learning [27.533223818505682]
We study in-context learning (ICL), a prevalent method for adapting static language models through tailored prompts.
We observe that, with an increasing number of ICL examples, models initially exhibit increased miscalibration before achieving better calibration.
We explore recalibration techniques and find that a scaling-binning calibrator can reduce calibration errors consistently.
arXiv Detail & Related papers (2023-12-07T03:37:39Z) - On the Analysis of Cross-Lingual Prompt Tuning for Decoder-based Multilingual Model [49.81429697921861]
We study the interaction between parameter-efficient fine-tuning (PEFT) and cross-lingual tasks in multilingual autoregressive models.
We show that prompt tuning is more effective in enhancing the performance of low-resource languages than fine-tuning.
arXiv Detail & Related papers (2023-11-14T00:43:33Z) - Shattering the Agent-Environment Interface for Fine-Tuning Inclusive Language Models [24.107358120517336]
In this work, we adopt a novel perspective wherein a pre-trained language model is itself simultaneously a policy, reward function, and transition function.
An immediate consequence of this is that reward learning and language model fine-tuning can be performed jointly and directly, without requiring any further downstream policy optimization.
arXiv Detail & Related papers (2023-05-19T06:21:15Z) - FUN with Fisher: Improving Generalization of Adapter-Based Cross-lingual Transfer with Scheduled Unfreezing [60.629222280633606]
We investigate scheduled unfreezing algorithms for fine-tuning task adapters.
Experiments show scheduled unfreezing methods close the gap to full fine-tuning and achieve stronger cross-lingual transfer performance.
We propose a general scheduled unfreezing algorithm that achieves an average of 2 points improvement over four datasets.
arXiv Detail & Related papers (2023-01-13T11:26:53Z) - Adaptive Temperature Scaling for Robust Calibration of Deep Neural Networks [0.7219077740523682]
We focus on the task of confidence scaling, specifically on post-hoc methods that generalize Temperature Scaling.
We show that when data is plentiful, complex models such as neural networks yield better performance, but they are prone to failure when the amount of data is limited.
We propose Entropy-based Temperature Scaling, a simple method that scales the confidence of a prediction according to its entropy.
arXiv Detail & Related papers (2022-07-31T16:20:06Z) - Contextual Temperature for Language Modeling [14.485125883455975]
We propose contextual temperature, which learns an optimal temperature trajectory for each vocabulary over the context.
Experimental results confirm that the proposed method significantly improves state-of-the-art language models.
In-depth analyses show that the behaviour of the learned temperature schedules varies dramatically by vocabulary.
arXiv Detail & Related papers (2020-12-25T13:50:03Z) - Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.