Stop Jostling: Adaptive Negative Sampling Reduces the Marginalization of Low-Resource Language Tokens by Cross-Entropy Loss
- URL: http://arxiv.org/abs/2601.22439v1
- Date: Fri, 30 Jan 2026 01:17:53 GMT
- Authors: Galim Turumtaev
- Abstract summary: This work is the first to show how negative sampling can be applied to improve the representation of rare tokens by limiting the harmful influence of excessive marginalization.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural language models often struggle with low-resource languages due to the limited availability of training data, making tokens from these languages rare in the training set. This paper addresses a specific challenge during training: rare tokens are disproportionately affected by marginalization, which prevents them from learning effectively. We propose a thresholding technique that reduces the impact of this marginalization, allowing rare tokens to benefit from more meaningful alignment. Through experiments with a character-level language model, we demonstrate that this method significantly improves performance on low-resource language validation data. This work is the first to show how negative sampling can be applied to improve the representation of rare tokens by limiting the harmful influence of excessive marginalization, offering a new approach to enhancing language model performance for underrepresented languages.
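The abstract describes the mechanism only at a high level, and no code accompanies this summary. The sketch below is a hypothetical PyTorch illustration of one way such a thresholded negative-sampling loss could look: the function name `thresholded_cross_entropy`, the `threshold` parameter, and the rule of dropping low-probability non-target tokens from the softmax normalization are all assumptions for illustration, not the paper's published formulation.

```python
import torch
import torch.nn.functional as F

def thresholded_cross_entropy(logits, targets, threshold=1e-4):
    """Cross-entropy where a non-target token only acts as a negative
    if the model already assigns it probability above `threshold`.

    Hypothetical sketch; the thresholding rule is an assumption,
    not the paper's actual method. Expects logits of shape
    (batch, vocab_size) and targets of shape (batch,).
    """
    probs = F.softmax(logits, dim=-1)
    # Keep only negatives that are still "competing" with the target;
    # rare tokens with tiny probability are excluded from the
    # normalization and thus receive no further downward gradient.
    keep = probs.detach() > threshold
    # The target token always stays in the normalization.
    keep.scatter_(-1, targets.unsqueeze(-1), True)
    masked_logits = logits.masked_fill(~keep, float("-inf"))
    return F.cross_entropy(masked_logits, targets)

# Example usage with a character-level model:
# logits = model(x)                                  # (batch, vocab_size)
# loss = thresholded_cross_entropy(logits, y, threshold=1e-4)
```

The `detach()` keeps the keep/drop decision out of the backward pass, so the threshold selects negatives without itself contributing gradients; under this assumed scheme, rare tokens to which the model already assigns negligible probability stop being pushed further down by every unrelated training example.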
Related papers
- Low-Resource English-Tigrinya MT: Leveraging Multilingual Models, Custom Tokenizers, and Clean Evaluation Benchmarks
Despite advances in Neural Machine Translation (NMT), low-resource languages like Tigrinya remain underserved. This paper investigates transfer learning techniques using multilingual pretrained models to enhance translation quality for morphologically rich, low-resource languages.
arXiv Detail & Related papers (2025-09-24T15:02:57Z)
- Controlling Language Confusion in Multilingual LLMs
Large language models often suffer from language confusion, a phenomenon in which responses are partially or entirely generated in unintended languages. In this work, we apply ORPO, which adds penalties for unwanted output styles to standard SFT, effectively suppressing language-confused generations.
arXiv Detail & Related papers (2025-05-25T12:15:31Z)
- SMILE: Speech Meta In-Context Learning for Low-Resource Language Automatic Speech Recognition
Speech Meta In-Context LEarning (SMILE) is an innovative framework that combines meta-learning with speech in-context learning (SICL). We show that SMILE consistently outperforms baseline methods in training-free few-shot multilingual ASR tasks.
arXiv Detail & Related papers (2024-09-16T16:04:16Z)
- Enhancing Idiomatic Representation in Multiple Languages via an Adaptive Contrastive Triplet Loss
We propose an approach to model idiomaticity using a triplet loss that incorporates the asymmetric contribution of component words to an idiomatic meaning for training language models.
Our proposed method is evaluated on a SemEval challenge and significantly outperforms previous alternatives on many metrics.
arXiv Detail & Related papers (2024-06-21T14:21:41Z)
- Problematic Tokens: Tokenizer Bias in Large Language Models
This paper traces the roots of disparities to the tokenization process inherent to large language models.
Specifically, it explores how the tokenizer's vocabulary, often used to speed up the tokenization process, inadequately represents non-English languages.
We aim to dissect the tokenization mechanics of GPT-4o, illustrating how its simplified token-handling methods amplify associated security and ethical issues.
arXiv Detail & Related papers (2024-06-17T05:13:25Z)
- No Train but Gain: Language Arithmetic for training-free Language Adapters enhancement
We introduce a novel method called language arithmetic, which enables training-free post-processing.
The effectiveness of the proposed solution is demonstrated on three downstream tasks in a MAD-X-based set of cross-lingual schemes.
arXiv Detail & Related papers (2024-04-24T08:52:40Z)
- A Simple Contrastive Learning Objective for Alleviating Neural Text Degeneration
We propose a new contrastive token learning objective that inherits the advantages of cross-entropy and unlikelihood training (a generic sketch of this style of objective appears after this list).
Comprehensive experiments on language modeling and open-domain dialogue generation tasks show that the proposed contrastive token objective yields less repetitive texts.
arXiv Detail & Related papers (2022-05-05T08:50:50Z)
- Token-wise Curriculum Learning for Neural Machine Translation
Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from the training data in the early stages of training.
We propose a novel token-wise curriculum learning approach that creates sufficient amounts of easy samples.
Our approach can consistently outperform baselines on 5 language pairs, especially for low-resource languages.
arXiv Detail & Related papers (2021-03-20T03:57:59Z)
- On Negative Interference in Multilingual Models: Findings and A Meta-Learning Treatment
We show that, contrary to previous belief, negative interference also impacts low-resource languages.
We present a meta-learning algorithm that obtains better cross-lingual transferability and alleviates negative interference.
arXiv Detail & Related papers (2020-10-06T20:48:58Z)
- Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
arXiv Detail & Related papers (2020-07-29T19:38:35Z)
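For the "A Simple Contrastive Learning Objective for Alleviating Neural Text Degeneration" entry above, a generic combination of cross-entropy with an unlikelihood penalty can make the idea concrete. This is an illustrative sketch only; the function name, the `negatives` argument (e.g. ids of recently generated tokens), and the `alpha` weight are assumptions, and the cited paper's actual contrastive token objective may differ.

```python
import torch
import torch.nn.functional as F

def cross_entropy_with_unlikelihood(logits, targets, negatives, alpha=1.0):
    """Generic sketch combining cross-entropy on the gold token with an
    unlikelihood penalty on specified negative tokens. Illustrative
    only; not the cited paper's exact contrastive token objective.

    logits: (batch, vocab), targets: (batch,), negatives: (batch, n_neg).
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Likelihood term: standard negative log-likelihood of the target.
    nll = -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Unlikelihood term: -log(1 - p(neg)) pushes negatives' probability down.
    neg_probs = log_probs.exp().gather(-1, negatives)
    unlikelihood = -torch.log1p(-neg_probs.clamp(max=1 - 1e-6)).sum(-1)
    return (nll + alpha * unlikelihood).mean()
```

The unlikelihood term grows as a negative token's probability rises, which is what discourages the repetitive generations the entry mentions; the clamp simply guards against `log(0)` when a negative saturates.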
This list is automatically generated from the titles and abstracts of the papers on this site.