Multiple Choice Learning of Low Rank Adapters for Language Modeling
- URL: http://arxiv.org/abs/2507.10419v1
- Date: Mon, 14 Jul 2025 16:00:51 GMT
- Title: Multiple Choice Learning of Low Rank Adapters for Language Modeling
- Authors: Victor Letzelter, Hugo Malard, Mathieu Fontaine, Gaël Richard, Slim Essid, Andrei Bursuc, Patrick Pérez
- Abstract summary: We propose LoRA-MCL, a training scheme that extends next-token prediction in language models with a method designed to decode diverse, plausible sentence continuations at inference time. We demonstrate with extensive experiments on real-world visual and audio captioning tasks that our method achieves high diversity and relevance in generated outputs.
- Score: 40.380297530862656
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose LoRA-MCL, a training scheme that extends next-token prediction in language models with a method designed to decode diverse, plausible sentence continuations at inference time. Traditional language modeling is an intrinsically ill-posed problem: given a context, multiple futures may be equally plausible. Our approach leverages Multiple Choice Learning (MCL) and the Winner-Takes-All (WTA) loss to efficiently handle ambiguity through Low-Rank Adaptation (LoRA). We provide a theoretical interpretation of applying Multiple Choice Learning to Language Modeling, assuming the data is generated from a mixture of distributions. To illustrate the proposed approach, we use data sampled from mixtures of Markov chains. We then demonstrate with extensive experiments on real-world visual and audio captioning tasks that our method achieves high diversity and relevance in generated outputs.
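Below is a minimal PyTorch sketch of the winner-takes-all objective at the heart of LoRA-MCL, assuming K hypothesis streams of next-token logits (in the paper, each stream would come from a separate low-rank adapter on a shared backbone); shapes and names are illustrative, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def wta_loss(head_logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Winner-takes-all loss over K hypothesis heads.

    head_logits: (K, batch, seq_len, vocab) next-token logits, one stream
                 per hypothesis (e.g. per LoRA adapter).
    targets:     (batch, seq_len) ground-truth token ids.
    Only the best-scoring head per sample receives gradient, which is what
    lets the heads specialize on different plausible continuations.
    """
    K = head_logits.shape[0]
    # Per-head, per-sample sequence NLL: (K, batch)
    nll = torch.stack([
        F.cross_entropy(
            head_logits[k].transpose(1, 2),  # (batch, vocab, seq_len)
            targets,
            reduction="none",
        ).mean(dim=-1)
        for k in range(K)
    ])
    winner = nll.argmin(dim=0)                        # best head per sample
    return nll.gather(0, winner.unsqueeze(0)).mean()  # backprop through winners only
```

At inference time, each adapter can then be decoded independently, yielding K diverse candidate continuations for the same context.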
Related papers
- Multi-Hypothesis Distillation of Multilingual Neural Translation Models for Low-Resource Languages [2.2061683015812026]
We argue that the teacher model's output distribution holds valuable insights for the student. We present Multi-Hypothesis Distillation (MHD), a sequence-level KD method that generates multiple translations for each source sentence (a minimal generation sketch follows this entry).
arXiv Detail & Related papers (2025-07-29T07:59:20Z)
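A minimal sketch of the multi-hypothesis generation step, assuming k-best beam search as the hypothesis source and an illustrative Hugging Face teacher checkpoint; the paper's exact decoding setup may differ.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Illustrative teacher checkpoint; any seq2seq NMT model works the same way.
teacher_name = "Helsinki-NLP/opus-mt-en-fr"
tokenizer = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForSeq2SeqLM.from_pretrained(teacher_name)

def teacher_hypotheses(src: str, k: int = 4) -> list[str]:
    """Return k teacher translations to serve as sequence-level KD targets."""
    inputs = tokenizer(src, return_tensors="pt")
    outputs = teacher.generate(
        **inputs,
        num_beams=k,
        num_return_sequences=k,  # one distillation target per hypothesis
        max_new_tokens=128,
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```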
- The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs [54.59207567677249]
Large language models (LLMs) still struggle across tasks outside of high-resource languages. In this work, we investigate cross-lingual transfer to lower-resource languages where task-specific post-training data is scarce.
arXiv Detail & Related papers (2025-05-23T20:28:31Z)
- Unlocking the Potential of Model Merging for Low-Resource Languages [66.7716891808697]
Adapting large language models to new languages typically involves continual pre-training (CT) followed by supervised fine-tuning (SFT).
We propose model merging as an alternative for low-resource languages, combining models with distinct capabilities into a single model without additional training.
Experiments based on Llama-2-7B demonstrate that model merging effectively endows LLMs for low-resource languages with task-solving abilities, outperforming CT-then-SFT in scenarios with extremely scarce data (a minimal merging sketch follows this entry).
arXiv Detail & Related papers (2024-07-04T15:14:17Z)
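As a sketch of the simplest merging scheme, weighted parameter averaging of same-architecture checkpoints; the paper may use a more elaborate merging method.

```python
import torch

def average_merge(state_dicts: list[dict], weights: list[float] | None = None) -> dict:
    """Merge same-architecture checkpoints by weighted parameter averaging.

    No extra training is needed: the merged model is a convex combination
    of models with distinct capabilities (e.g. a language-adapted model
    and a task-tuned model).
    """
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged
```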
- Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters [21.19251212483406]
Large language models (LLMs) have revolutionized natural language processing and broadened their applicability across diverse commercial applications.
This paper explores a training recipe for an assistant model in speculative decoding, which is leveraged to draft future tokens that are then verified by the target LLM.
We show that language-specific draft models, optimized through a targeted pretrain-and-finetune strategy, bring a substantial inference-time speedup compared to previous methods (see the draft-and-verify sketch after this entry).
arXiv Detail & Related papers (2024-06-24T16:06:50Z)
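A simplified draft-and-verify step, using greedy verification rather than the rejection-sampling acceptance rule common in speculative decoding; model handles and shapes are illustrative.

```python
import torch

@torch.no_grad()
def speculative_step(target, draft, input_ids, k: int = 4):
    """One draft-then-verify step (greedy variant, batch size 1).

    The cheap draft model proposes k tokens; the target model scores the
    extended sequence in a single forward pass and keeps the longest prefix
    of drafted tokens that matches its own greedy choices.
    """
    # 1) Draft k tokens autoregressively with the small model.
    drafted = draft.generate(input_ids, max_new_tokens=k, do_sample=False)
    proposal = drafted[:, input_ids.shape[1]:]              # (1, k)

    # 2) Verify all k proposals with one target forward pass.
    logits = target(drafted).logits                         # (1, len, vocab)
    start = input_ids.shape[1] - 1                          # logits at i predict token i+1
    preds = logits[:, start:start + k, :].argmax(dim=-1)    # (1, k)

    # 3) Accept the longest matching prefix, then add one target-chosen token.
    matches = (preds == proposal)[0].long()
    n_accept = int(matches.cumprod(dim=0).sum())
    if n_accept == k:                                       # all accepted: bonus token
        next_tok = logits[:, -1:, :].argmax(dim=-1)
    else:                                                   # first mismatch: target's token
        next_tok = preds[:, n_accept:n_accept + 1]
    return torch.cat([input_ids, proposal[:, :n_accept], next_tok], dim=-1)
```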
- RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations [38.79058788596755]
We introduce RAVEL, a dataset that enables tightly controlled, quantitative comparisons between interpretability methods.
We use the resulting conceptual framework to define Multi-task Distributed Alignment Search (MDAS), a new interpretability method.
With Llama2-7B as the target language model, MDAS achieves state-of-the-art results on RAVEL.
arXiv Detail & Related papers (2024-02-27T17:25:37Z)
- Exploiting Multilingualism in Low-resource Neural Machine Translation via Adversarial Learning [3.2258463207097017]
Generative Adversarial Networks (GANs) offer a promising approach for Neural Machine Translation (NMT).
As in bilingual models, GAN-based multilingual NMT considers only one reference translation for each sentence during model training.
This article proposes a Denoising Adversarial Auto-encoder-based Sentence Interpolation (DAASI) approach to perform sentence interpolation (a generic latent-interpolation sketch follows this entry).
arXiv Detail & Related papers (2023-03-31T12:34:14Z)
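The abstract does not spell out the architecture, so the following shows only the generic latent-interpolation idea behind auto-encoder-based sentence mixing, with hypothetical encoder/decoder callables, not the paper's exact DAASI model.

```python
import torch

def interpolate_sentences(encoder, decoder, ids_a, ids_b, alpha: float = 0.5):
    """Generic latent-space sentence interpolation (illustrative only).

    Encode two sentences with a (denoising) auto-encoder, mix their latent
    codes, and decode the mixture as a new synthetic training sentence.
    """
    with torch.no_grad():
        z_a = encoder(ids_a)                     # latent code of sentence A
        z_b = encoder(ids_b)                     # latent code of sentence B
        z_mix = alpha * z_a + (1 - alpha) * z_b  # in-between point in latent space
        return decoder(z_mix)                    # decoded synthetic sentence
```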
- Modeling Sequential Sentence Relation to Improve Cross-lingual Dense Retrieval [87.11836738011007]
We propose a multilingual language model called the masked sentence model (MSM).
MSM consists of a sentence encoder to generate the sentence representations, and a document encoder applied to a sequence of sentence vectors from a document.
To train the model, we propose a masked sentence prediction task, which masks and predicts the sentence vector via a hierarchical contrastive loss with sampled negatives (see the sketch after this entry).
arXiv Detail & Related papers (2023-02-03T09:54:27Z)
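A minimal sketch of the masked-sentence contrastive objective described above; the masking scheme, similarity function, and doc_encoder interface are assumptions, not the paper's exact design.

```python
import torch
import torch.nn.functional as F

def masked_sentence_loss(doc_encoder, sent_vecs, mask_idx, negatives, tau=0.05):
    """Masked sentence prediction with a contrastive loss (sketch).

    sent_vecs: (n_sents, d) outputs of the sentence encoder.
    mask_idx:  index of the sentence slot to mask and predict.
    negatives: (n_neg, d) sampled negative sentence vectors.
    """
    target = sent_vecs[mask_idx]                          # positive candidate
    masked = sent_vecs.clone()
    masked[mask_idx] = 0.0                                # mask the sentence slot
    ctx = doc_encoder(masked.unsqueeze(0))[0, mask_idx]   # contextual prediction (d,)

    candidates = torch.cat([target.unsqueeze(0), negatives], dim=0)
    logits = F.cosine_similarity(ctx.unsqueeze(0), candidates) / tau
    # The positive sits at index 0 of the candidate list.
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))
```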
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types (one generic realization is sketched after this entry).
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
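The abstract gives only the high-level objective, so this is one generic way sparse discrete typing is often realized (Gumbel-softmax type assignment plus a sparsity penalty), not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def sparse_type_assignment(token_states, type_proj, gamma: float = 1e-2):
    """Illustrative sparse latent typing head.

    token_states: (seq_len, d) encoder outputs.
    type_proj:    torch.nn.Linear(d, n_types + 1); type 0 means "not a keyword".
    A Gumbel-softmax picks a discrete latent type per token, and a penalty
    on non-null assignments keeps the keyword selection sparse.
    """
    logits = type_proj(token_states)                       # (seq_len, n_types + 1)
    assign = F.gumbel_softmax(logits, tau=1.0, hard=True)  # discrete but differentiable
    sparsity = gamma * assign[:, 1:].sum()                 # penalize non-null types
    return assign, sparsity
```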
- Data Augmentation for Spoken Language Understanding via Pretrained Language Models [113.56329266325902]
Training of spoken language understanding (SLU) models often faces the problem of data scarcity.
We put forward a data augmentation method using pretrained language models to boost the variability and accuracy of generated utterances (a minimal sampling sketch follows this entry).
arXiv Detail & Related papers (2020-04-29T04:07:12Z)
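A minimal sketch of LM-based utterance augmentation, using off-the-shelf GPT-2 and a made-up prompt format; the paper's conditioning scheme and fine-tuning recipe may differ.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative generator; the paper's exact model and fine-tuning differ.
tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2")

def augment(seed_utterance: str, n: int = 5) -> list[str]:
    """Sample n new utterances from a seed, to enlarge SLU training data."""
    prompt = f"Paraphrase: {seed_utterance} =>"  # hypothetical prompt format
    ids = tok(prompt, return_tensors="pt").input_ids
    out = lm.generate(
        ids,
        do_sample=True,            # sampling provides the variability we want
        top_p=0.9,
        num_return_sequences=n,
        max_new_tokens=30,
        pad_token_id=tok.eos_token_id,
    )
    return [tok.decode(o[ids.shape[1]:], skip_special_tokens=True) for o in out]
```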