Fluent Alignment with Disfluent Judges: Post-training for Lower-resource Languages
- URL: http://arxiv.org/abs/2512.08777v1
- Date: Tue, 09 Dec 2025 16:31:48 GMT
- Title: Fluent Alignment with Disfluent Judges: Post-training for Lower-resource Languages
- Authors: David Samuel, Lilja Øvrelid, Erik Velldal, Andrey Kutuzov
- Abstract summary: We develop a fluent preference-aligned language model without instruction-tuning data in the target language. Our approach uses an on-policy training method, which we compare with two common approaches. We conduct a case study on Norwegian Bokmål and evaluate fluency through native-speaker assessments.
- Score: 16.671158083515373
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a post-training method for lower-resource languages that preserves the fluency of language models even when aligned by disfluent reward models. Preference optimization is now a well-researched topic, but previous work has mostly addressed models for English and Chinese. Lower-resource languages lack both datasets written by native speakers and language models capable of generating fluent synthetic data. Thus, in this work, we focus on developing a fluent preference-aligned language model without any instruction-tuning data in the target language. Our approach uses an on-policy training method, which we compare with two common approaches: supervised finetuning on machine-translated data and multilingual finetuning. We conduct a case study on Norwegian Bokmål and evaluate fluency through native-speaker assessments. The results show that the on-policy aspect is crucial and outperforms the alternatives without relying on any hard-to-obtain data.
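The abstract does not spell out the on-policy recipe, but a common ingredient of such methods is constructing preference pairs from the model's own samples, ranked by an external (here possibly disfluent) reward model. A minimal sketch with toy stand-ins for the policy and the reward model (none of these function names come from the paper):

```python
import random

def build_preference_pairs(prompts, sample, reward, n=4):
    """For each prompt, draw n on-policy samples from the current model
    and pair the highest- and lowest-reward completions."""
    pairs = []
    for prompt in prompts:
        completions = [sample(prompt) for _ in range(n)]
        ranked = sorted(completions, key=reward)
        pairs.append((prompt, ranked[-1], ranked[0]))  # (prompt, chosen, rejected)
    return pairs

# Toy stand-ins for the policy model and the (possibly disfluent) reward model.
random.seed(0)
def toy_sample(prompt):
    return random.choice([
        "Ja.",
        "Ja, det stemmer.",
        "Ja, det stemmer, og her er en lengre forklaring.",
    ])

def toy_reward(text):
    return len(text)  # placeholder preference signal, not a real reward model

pairs = build_preference_pairs(["Stemmer dette?"], toy_sample, toy_reward)
```

Because the completions are sampled from the policy itself, only the ranking (not the text) comes from the reward model, which is what lets a disfluent judge align a fluent generator.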
Related papers
- BhashaKritika: Building Synthetic Pretraining Data at Scale for Indic Languages [4.279942349440352]
We present a systematic study on the generation and evaluation of synthetic multilingual pretraining data for Indic languages. We construct a large-scale synthetic dataset, BhashaKritika, comprising 540B tokens using 5 different techniques for 10 languages. We analyze how language choice, both in the prompt instructions and document grounding, affects data quality.
arXiv Detail & Related papers (2025-11-13T14:12:44Z)
- Adapting Language Models to Indonesian Local Languages: An Empirical Study of Language Transferability on Zero-Shot Settings [1.1556013985948772]
We evaluate transferability of pre-trained language models to low-resource Indonesian local languages. We group the target languages into three categories: seen, partially seen, and unseen. Multilingual models perform best on seen languages, moderately on partially seen ones, and poorly on unseen languages. We find that MAD-X significantly improves performance, especially for seen and partially seen languages, without requiring labeled data in the target language.
arXiv Detail & Related papers (2025-07-02T12:17:55Z)
- A Comparison of Language Modeling and Translation as Multilingual Pretraining Objectives [13.581385765600265]
Pretrained language models (PLMs) display impressive performance and have captured the attention of the NLP community.
This paper proposes a comparison of multilingual pretraining objectives in a controlled methodological environment.
arXiv Detail & Related papers (2024-07-22T09:16:30Z)
- Learning Phonotactics from Linguistic Informants [54.086544221761486]
Our model iteratively selects or synthesizes a data-point according to one of a range of information-theoretic policies.
We find that the information-theoretic policies that our model uses to select items to query the informant achieve sample efficiency comparable to, or greater than, fully supervised approaches.
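The paper's information-theoretic policies are more elaborate, but the simplest instance of the idea is uncertainty sampling: query the informant about the candidate form whose predicted acceptability is closest to 0.5, i.e. has maximal entropy. A sketch with invented beliefs (names and numbers are illustrative, not from the paper):

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a Bernoulli probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def select_query(candidates, prob_acceptable):
    """Query the form the learner is most uncertain about, i.e. whose
    predicted acceptability is closest to 0.5 (maximum entropy)."""
    return max(candidates, key=lambda c: entropy(prob_acceptable[c]))

# Toy beliefs: probability that each candidate form is phonotactically legal.
beliefs = {"blick": 0.9, "bnick": 0.5, "bzick": 0.1}
query = select_query(beliefs, beliefs)
```

Asking about the maximum-entropy item yields the largest expected reduction in uncertainty per query, which is where the sample-efficiency gains over passive supervision come from.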
arXiv Detail & Related papers (2024-05-08T00:18:56Z)
- Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment [39.94156255629528]
We evaluate a simple approach for zero-shot cross-lingual alignment.
Cross-lingually aligned models are preferred by humans over unaligned models.
A different-language reward model sometimes yields better aligned models than a same-language reward model.
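One way to read the findings above: zero-shot cross-lingual alignment can be as simple as reranking target-language candidates with a reward model trained on another language's preference data. A toy sketch (the reward function is an invented stand-in, not the paper's model):

```python
def best_of_n(prompt, candidates, reward_model):
    """Rerank target-language candidates with a reward model trained on
    preference data from a *different* language."""
    return max(candidates, key=lambda c: reward_model(prompt, c))

# Toy stand-in for an English-trained reward model: it knows nothing about
# the target language, only surface features such as answer length.
def toy_reward(prompt, completion):
    return -abs(len(completion.split()) - 12)  # prefers ~12-word answers

candidates = [
    "Kort svar.",
    "Et litt lengre og mer utfyllende svar som forklarer resonnementet bak konklusjonen.",
    "Et middels langt svar som gir hovedpoenget og en kort begrunnelse.",
]
best = best_of_n("Forklar fotosyntesen.", candidates, toy_reward)
```

The transfer works to the extent that the preferences the reward model encodes (helpfulness, completeness) are language-independent, even when its surface fluency judgments are not.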
arXiv Detail & Related papers (2024-04-18T16:52:36Z)
- Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing [68.47787275021567]
Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data.
We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between latent variables using Optimal Transport.
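The paper's alignment operates on latent variables, but the core primitive such methods rely on is an entropy-regularized optimal-transport solver. A self-contained Sinkhorn sketch on toy discrete distributions (illustrative only; production implementations use log-domain updates for numerical stability):

```python
import math

def sinkhorn(a, b, cost, reg=0.1, iters=200):
    """Entropy-regularized optimal transport: returns a coupling matrix
    whose row/column marginals match the discrete distributions a and b."""
    K = [[math.exp(-c / reg) for c in row] for row in cost]
    n, m = len(a), len(b)
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Two toy "latent" distributions; almost all mass flows through the
# zero-cost diagonal cells of the cost matrix.
plan = sinkhorn([0.5, 0.5], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]])
```

The total cost of the resulting plan is differentiable in the inputs, which is what makes OT usable as a trainable cross-lingual divergence.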
arXiv Detail & Related papers (2023-07-09T04:52:31Z)
- Improving Cross-lingual Information Retrieval on Low-Resource Languages via Optimal Transport Distillation [21.057178077747754]
In this work, we propose OPTICAL: Optimal Transport distillation for low-resource Cross-lingual information retrieval.
By separating the cross-lingual knowledge from knowledge of query document matching, OPTICAL only needs bitext data for distillation training.
Experimental results show that, with minimal training data, OPTICAL significantly outperforms strong baselines on low-resource languages.
arXiv Detail & Related papers (2023-01-29T22:30:36Z)
- Language Models are Few-shot Multilingual Learners [66.11011385895195]
We evaluate the multilingual skills of the GPT and T5 models in conducting multi-class classification on non-English languages.
We show that, given a few English examples as context, pre-trained language models can predict not only English test samples but also non-English ones.
arXiv Detail & Related papers (2021-09-16T03:08:22Z)
- Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z)
- Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages [81.90356787324481]
Spelling normalization for low-resource languages is a challenging task because the patterns are hard to predict.
This work compares a neural model and character language models with varying amounts of target-language data.
Our usage scenario is interactive correction with nearly zero training examples, improving the models as more data is collected.
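A character language model of the kind compared here can rank correction candidates with almost no data. A toy sketch using an add-one-smoothed character trigram model over an invented four-word lexicon (the words and smoothing choice are illustrative, not the paper's setup):

```python
import math
from collections import Counter

def train_char_lm(words, n=3):
    """Count character n-grams (with boundary padding) from a small
    target-language word list."""
    counts = Counter()
    for word in words:
        padded = "^" * (n - 1) + word + "$"
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts

def score(word, counts, n=3):
    """Sum of add-one-smoothed log n-gram counts; higher = more plausible."""
    padded = "^" * (n - 1) + word + "$"
    return sum(math.log(counts[padded[i:i + n]] + 1)
               for i in range(len(padded) - n + 1))

# Tiny toy lexicon standing in for the interactively collected data.
lm = train_char_lm(["gracias", "gracia", "grande", "grano"])
best = max(["grasias", "gracias"], key=lambda w: score(w, lm))
```

Because the model is just n-gram counts, every newly confirmed correction can be folded in immediately, which matches the interactive, data-starved usage scenario described above.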
arXiv Detail & Related papers (2020-10-20T17:31:07Z)
- Cross-lingual Spoken Language Understanding with Regularized Representation Alignment [71.53159402053392]
We propose a regularization approach to align word-level and sentence-level representations across languages without any external resource.
Experiments on the cross-lingual spoken language understanding task show that our model outperforms current state-of-the-art methods in both few-shot and zero-shot scenarios.
arXiv Detail & Related papers (2020-09-30T08:56:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.