Related papers: Steering Large Language Models for Machine Translation Personalization

Steering Large Language Models for Machine Translation Personalization

URL: http://arxiv.org/abs/2505.16612v2
Date: Tue, 14 Oct 2025 08:28:41 GMT
Title: Steering Large Language Models for Machine Translation Personalization
Authors: Daniel Scalena, Gabriele Sarti, Arianna Bisazza, Elisabetta Fersini, Malvina Nissim,
Abstract summary: We explore various strategies for personalizing automatically generated translations when few examples are available.<n>We focus on contrastive steering with sparse autoencoder (SAE) latents to identify salient personalization properties.<n>We demonstrate that contrastive SAE steering yields robust style conditioning and translation quality, resulting in higher inference-time computational efficiency.
Score: 20.181629685548454
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models have simplified the production of personalized translations reflecting predefined stylistic constraints. However, these systems still struggle when stylistic requirements are implicitly represented by a set of examples, such as texts produced by a specific human translator. In this work, we explore various strategies for personalizing automatically generated translations when few examples are available, with a focus on the challenging domain of literary translation. We begin by determining the feasibility of the task and how style information is encoded within model representations. Then, we evaluate various prompting strategies and inference-time interventions for steering model generations towards a personalized style, with a particular focus on contrastive steering with sparse autoencoder (SAE) latents to identify salient personalization properties. We demonstrate that contrastive SAE steering yields robust style conditioning and translation quality, resulting in higher inference-time computational efficiency than prompting approaches. We further examine the impact of steering on model activations, finding that layers encoding personalization properties are impacted similarly by prompting and SAE steering, suggesting a similar mechanism at play.

Related papers

Improving Multilingual Language Models by Aligning Representations through Steering [14.358813505154316]
Large language models (LLMS) process non-English tokens within their layer representations.<n>We demonstrate that steering a single model layer can notably enhance performance.<n>We highlight how advanced techniques like supervised fine tuning (textscsft) and reinforcement learning from human feedback improve multilingual capabilities.
arXiv Detail & Related papers (2025-05-19T00:14:43Z)
Interpretable Face Anti-Spoofing: Enhancing Generalization with Multimodal Large Language Models [58.936893810674896]
Face Anti-Spoofing (FAS) is essential for ensuring the security and reliability of facial recognition systems.<n>We introduce a multimodal large language model framework for FAS, termed Interpretable Face Anti-Spoofing (I-FAS)<n>We propose a Spoof-aware Captioning and Filtering (SCF) strategy to generate high-quality captions for FAS images.
arXiv Detail & Related papers (2025-01-03T09:25:04Z)
Paraphrase-Aligned Machine Translation [7.258916315600866]
Large Language Models (LLMs) have demonstrated significant capabilities in machine translation.<n>We propose ParaAlign Translator, a method that fine-tunes LLMs to paraphrase sentences, aligning their structures with those of the target language systems.<n> Experimental results demonstrate that the proposed method enhances the LLaMA-3-8B model's performance in both resource-rich and low-resource scenarios.
arXiv Detail & Related papers (2024-12-08T12:17:26Z)
Towards Visual Text Design Transfer Across Languages [49.78504488452978]
We introduce a novel task of Multimodal Style Translation (MuST-Bench) MuST-Bench is a benchmark designed to evaluate the ability of visual text generation models to perform translation across different writing systems. In response, we introduce SIGIL, a framework for multimodal style translation that eliminates the need for style descriptions.
arXiv Detail & Related papers (2024-10-24T15:15:01Z)
SignAttention: On the Interpretability of Transformer Models for Sign Language Translation [2.079808290618441]
This paper presents the first comprehensive interpretability analysis of a Transformer-based Sign Language Translation model. We examine the attention mechanisms within the model to understand how it processes and aligns visual input with sequential glosses. This work contributes to a deeper understanding of SLT models, paving the way for the development of more transparent and reliable translation systems.
arXiv Detail & Related papers (2024-10-18T14:38:37Z)
Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception [63.03288425612792]
We propose bfAnyRef, a general MLLM model that can generate pixel-wise object perceptions and natural language descriptions from multi-modality references. Our model achieves state-of-the-art results across multiple benchmarks, including diverse modality referring segmentation and region-level referring expression generation.
arXiv Detail & Related papers (2024-03-05T13:45:46Z)
Towards Effective Disambiguation for Machine Translation with Large Language Models [65.80775710657672]
We study the capabilities of large language models to translate "ambiguous sentences" Experiments show that our methods can match or outperform state-of-the-art systems such as DeepL and NLLB in four out of five language directions.
arXiv Detail & Related papers (2023-09-20T22:22:52Z)
Reference-less Analysis of Context Specificity in Translation with Personalised Language Models [3.527589066359829]
This work investigates what extent rich character and film annotations can be leveraged to personalise language models (LMs) We build LMs which leverage rich contextual information to reduce perplexity by up to 6.5% compared to a non-contextual model. Our results suggest that the degree to which professional translations in our domain are context-specific can be preserved to a better extent by a contextual machine translation model.
arXiv Detail & Related papers (2023-03-29T12:19:23Z)
Building Multilingual Machine Translation Systems That Serve Arbitrary X-Y Translations [75.73028056136778]
We show how to practically build MNMT systems that serve arbitrary X-Y translation directions. We also examine our proposed approach in an extremely large-scale data setting to accommodate practical deployment scenarios.
arXiv Detail & Related papers (2022-06-30T02:18:15Z)
Examining Scaling and Transfer of Language Model Architectures for Machine Translation [51.69212730675345]
Language models (LMs) process sequences in a single stack of layers, and encoder-decoder models (EncDec) utilize separate layer stacks for input and output processing. In machine translation, EncDec has long been the favoured approach, but with few studies investigating the performance of LMs.
arXiv Detail & Related papers (2022-02-01T16:20:15Z)
GTAE: Graph-Transformer based Auto-Encoders for Linguistic-Constrained Text Style Transfer [119.70961704127157]
Non-parallel text style transfer has attracted increasing research interests in recent years. Current approaches still lack the ability to preserve the content and even logic of original sentences. We propose a method called Graph Transformer based Auto-GTAE, which models a sentence as a linguistic graph and performs feature extraction and style transfer at the graph level.
arXiv Detail & Related papers (2021-02-01T11:08:45Z)
On Learning Text Style Transfer with Direct Rewards [101.97136885111037]
Lack of parallel corpora makes it impossible to directly train supervised models for the text style transfer task. We leverage semantic similarity metrics originally used for fine-tuning neural machine translation models. Our model provides significant gains in both automatic and human evaluation over strong baselines.
arXiv Detail & Related papers (2020-10-24T04:30:02Z)
Incorporating Stylistic Lexical Preferences in Generative Language Models [10.62343151429147]
We present an approach to induce certain target-author attributes by incorporating continuous multi-dimensional lexical preferences of an author into generative language models. Our experiments demonstrate that the proposed approach can generate text that distinctively aligns with a given target author's lexical style.
arXiv Detail & Related papers (2020-10-22T09:24:05Z)
Linguistically-Informed Transformations (LIT): A Method for Automatically Generating Contrast Sets [13.706520309917634]
We propose a Linguistically-Informed Transformation (LIT) method to automatically generate contrast sets. Experiments show that current pretrained language models struggle on our automatically generated contrast sets. We improve models' performance on the contrast sets by apply-ing LIT to augment the training data, without affecting performance on the original data.
arXiv Detail & Related papers (2020-10-16T18:23:05Z)
Exemplar-Controllable Paraphrasing and Translation using Bitext [57.92051459102902]
We adapt models from prior work to be able to learn solely from bilingual text (bitext) Our single proposed model can perform four tasks: controlled paraphrase generation in both languages and controlled machine translation in both language directions.
arXiv Detail & Related papers (2020-10-12T17:02:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.