Languages are Modalities: Cross-Lingual Alignment via Encoder Injection
- URL: http://arxiv.org/abs/2510.27254v1
- Date: Fri, 31 Oct 2025 07:43:21 GMT
- Title: Languages are Modalities: Cross-Lingual Alignment via Encoder Injection
- Authors: Rajan Agarwal, Aarush Gupta
- Abstract summary: We present a compute-efficient language-as-modality method that conditions an instruction-tuned decoder without changing the tokenizer or retraining the decoder. LLINK substantially improves bilingual retrieval and achieves 81.3% preference over the base model. We find that improvements can be attributed to reduced tokenization inflation and stronger cross-lingual alignment.
- Score: 0.8461674097042394
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Instruction-tuned Large Language Models (LLMs) underperform on low-resource, non-Latin scripts due to tokenizer fragmentation and weak cross-lingual coupling. We present LLINK (Latent Language Injection for Non-English Knowledge), a compute-efficient language-as-modality method that conditions an instruction-tuned decoder without changing the tokenizer or retraining the decoder. First, we align sentence embeddings from a frozen multilingual encoder to the decoder's latent embedding space at a reserved position via a lightweight contrastive projector. Second, the vector is expanded into K soft slots and trained with minimal adapters so the frozen decoder consumes the signal. LLINK substantially improves bilingual retrieval and achieves 81.3% preference over the base model and 63.6% over direct fine-tuning in LLM-judged Q&A evaluations. We further find that improvements can be attributed to reduced tokenization inflation and stronger cross-lingual alignment, despite the model having residual weaknesses in numeric fidelity. Treating low-resource languages as a modality offers a practical path to stronger cross-lingual alignment in lightweight LLMs.
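The two-stage injection described in the abstract can be sketched in miniature. This is a hedged toy, not the paper's implementation: the dimensions, the value of K, and the random weights below are placeholders (the actual projector is trained contrastively and the adapters are learned; none of those sizes are given here).

```python
import random

random.seed(0)

ENC_DIM, DEC_DIM, K = 8, 16, 4  # toy sizes; the paper's dimensions and K are not given here

def linear(in_dim, out_dim):
    """Random weight matrix standing in for a trained linear map."""
    return [[random.gauss(0, in_dim ** -0.5) for _ in range(in_dim)]
            for _ in range(out_dim)]

def apply(W, x):
    """Matrix-vector product over plain lists."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Stage 1: a lightweight projector maps a frozen multilingual encoder's
# sentence embedding into the decoder's latent embedding space
# (trained contrastively in the paper; random weights here).
projector = linear(ENC_DIM, DEC_DIM)

# Stage 2: the projected vector is expanded into K "soft slots": K vectors
# in decoder space, consumed at reserved positions by the frozen decoder
# (via minimal adapters in the paper).
slot_expanders = [linear(DEC_DIM, DEC_DIM) for _ in range(K)]

def inject(sentence_embedding):
    z = apply(projector, sentence_embedding)
    return [apply(E, z) for E in slot_expanders]

slots = inject([random.gauss(0, 1) for _ in range(ENC_DIM)])
# yields K soft-slot vectors, each of decoder width
```

The point of the sketch is the data flow: one encoder vector becomes K decoder-space vectors, so the decoder itself never needs a new tokenizer or retraining.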
Related papers
- MaDiS: Taming Masked Diffusion Language Models for Sign Language Generation [78.75809158246723]
We present MaDiS, a masked-diffusion-based language model for SLG that captures bidirectional context and supports efficient parallel multi-token generation. We also introduce a tri-level cross-modal pretraining scheme that jointly learns from token-, latent-, and 3D-space objectives. MaDiS achieves superior performance across multiple metrics, including DTW error and two newly introduced metrics, SiBLEU and SiCLIP, while reducing inference latency by nearly 30%.
arXiv Detail & Related papers (2026-01-27T13:06:47Z) - Language steering in latent space to mitigate unintended code-switching [1.1330938617817454]
Large Language Models (LLMs) often exhibit unintended code-switching, reducing reliability in downstream tasks. We propose latent-space language steering, a lightweight inference-time method that identifies language directions via PCA on parallel translations. Our approach mitigates code-switching while preserving semantics with negligible computational overhead.
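The steering recipe in this summary can be sketched with a toy PCA via power iteration. Everything below is a hypothetical stand-in: the "hidden states" are synthetic with a language shift injected by hand, whereas a real setup would read them off the LLM's residual stream on parallel translations, and the `steer` helper is illustrative only.

```python
import random

random.seed(1)

DIM, N = 6, 40  # toy hidden-state width and number of parallel sentence pairs

# Hypothetical paired hidden states for the same sentences in two languages,
# differing by a fixed "language shift" plus noise.
lang_shift = [2.0, -1.0, 0.5, 0.0, 0.0, 0.0]
en = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(N)]
xx = [[e + s + random.gauss(0, 0.1) for e, s in zip(vec, lang_shift)]
      for vec in en]

# Difference vectors between translation pairs; their top principal
# component approximates the language direction.
diffs = [[x - e for x, e in zip(xv, ev)] for xv, ev in zip(xx, en)]

def top_pc(rows, iters=50):
    """Power iteration on X^T X: returns the unit top principal direction."""
    v = [1.0] * len(rows[0])
    for _ in range(iters):
        proj = [sum(r_i * v_i for r_i, v_i in zip(r, v)) for r in rows]         # X v
        v = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(len(v))]  # X^T X v
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    return v

direction = top_pc(diffs)

def steer(hidden, alpha):
    """Shift a hidden state along the language direction at inference time."""
    return [h + alpha * d for h, d in zip(hidden, direction)]
```

With a dominant shift like this, the top principal component of the pairwise differences recovers the injected language direction almost exactly, which is what makes a single steering vector per language pair plausible.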
arXiv Detail & Related papers (2025-10-11T19:49:38Z) - Transformer-Encoder Trees for Efficient Multilingual Machine Translation and Speech Translation [2.7023796303812193]
We propose a novel hierarchical Transformer Tree (TET) combined with non-autoregressive encoder-only models trained with Connectionist Temporal Classification for multilingual translation. For speech translation, combining TET with a non-autoregressive speech recognition backbone (wav2vec2) shows promising results in terms of translation quality compared to autoregressive systems, while being 7-14 times faster.
arXiv Detail & Related papers (2025-09-22T15:52:18Z) - LLM Encoder vs. Decoder: Robust Detection of Chinese AI-Generated Text with LoRA [4.104443734934105]
We compare encoder-based Transformers (Chinese BERT-large and RoBERTa-wwm-ext-large), a decoder-only LLM (Alibaba's Qwen2.5-7B/DeepSeek-R1-Distill-Qwen-7B fine-tuned via Low-Rank Adaptation, LoRA), and a FastText baseline. Experiments reveal that although encoder models nearly memorize training data, they suffer significant performance degradation under distribution shifts.
arXiv Detail & Related papers (2025-08-31T07:51:22Z) - Adapting Whisper for Code-Switching through Encoding Refining and Language-Aware Decoding [27.499426765845705]
Code-switching automatic speech recognition (ASR) faces challenges due to the language confusion resulting from accents, auditory similarity, and seamless language switches. We adapt Whisper, a large-scale multilingual pre-trained speech recognition model, to CS from both the encoder and decoder parts.
arXiv Detail & Related papers (2024-12-21T07:06:44Z) - Bridging the Language Gap: Enhancing Multilingual Prompt-Based Code Generation in LLMs via Zero-Shot Cross-Lingual Transfer [5.355430735475281]
This paper investigates the complexities of multilingual prompt-based code generation. Our evaluations reveal significant disparities in code quality for non-English prompts. We propose a zero-shot cross-lingual approach using a neural projection technique.
arXiv Detail & Related papers (2024-08-19T05:11:46Z) - Speculative Contrastive Decoding [55.378200871224074]
Large language models (LLMs) exhibit exceptional performance in language tasks, yet their auto-regressive inference is limited by high computational requirements and sub-optimal due to exposure bias.
Inspired by speculative decoding and contrastive decoding, we introduce Speculative Contrastive Decoding (SCD), a straightforward yet powerful decoding approach.
arXiv Detail & Related papers (2023-11-15T14:15:30Z) - Learning Language-Specific Layers for Multilingual Machine Translation [1.997704019887898]
We introduce Language-Specific Transformer Layers (LSLs), which allow us to increase model capacity while keeping the amount of computation and the number of parameters used in the forward pass constant.
We study the best way to place these layers using a neural architecture search inspired approach, and achieve an improvement of 1.3 chrF (1.5 spBLEU) points over not using LSLs on a separate decoder architecture, and 1.9 chrF (2.2 spBLEU) on a shared decoder one.
arXiv Detail & Related papers (2023-05-04T09:18:05Z) - LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers [71.76680102779765]
Automatic speech recognition (ASR) and speech translation (ST) can both use neural transducers as the model structure.
We propose LAMASSU, a streaming language-agnostic multilingual speech recognition and translation model using neural transducers.
arXiv Detail & Related papers (2022-11-05T04:03:55Z) - Zero-Shot Cross-lingual Semantic Parsing [56.95036511882921]
We study cross-lingual semantic parsing as a zero-shot problem without parallel data for 7 test languages.
We propose a multi-task encoder-decoder model to transfer parsing knowledge to additional languages using only English-Logical form paired data.
Our system frames zero-shot parsing as a latent-space alignment problem and finds that pre-trained models can be improved to generate logical forms with minimal cross-lingual transfer penalty.
arXiv Detail & Related papers (2021-04-15T16:08:43Z) - Improving Target-side Lexical Transfer in Multilingual Neural Machine Translation [104.10726545151043]
Multilingual data has been found to be more beneficial for NMT models that translate from the LRL to a target language than for those that translate into the LRLs.
Our experiments show that DecSDE leads to consistent gains of up to 1.8 BLEU on translation from English to four different languages.
arXiv Detail & Related papers (2020-10-04T19:42:40Z) - FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
To tackle this issue, we propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
arXiv Detail & Related papers (2020-09-10T22:42:15Z) - Inducing Language-Agnostic Multilingual Representations [61.97381112847459]
Cross-lingual representations have the potential to make NLP techniques available to the vast majority of languages in the world.
We examine three approaches for this: (i) re-aligning the vector spaces of target languages to a pivot source language; (ii) removing language-specific means and variances, which yields better discriminativeness of embeddings as a by-product; and (iii) increasing input similarity across languages by removing morphological contractions and sentence reordering.
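Approach (ii) above, removing language-specific means and variances, amounts to per-language standardization of the embedding space. A minimal sketch, assuming embeddings arrive as plain lists of floats grouped by language (the function name and data are illustrative, not from the paper):

```python
def standardize(embs):
    """Per-language standardization: subtract the language's mean vector and
    divide by its per-dimension standard deviation, so embeddings from
    different languages share a common center and scale."""
    dim, n = len(embs[0]), len(embs)
    means = [sum(v[j] for v in embs) / n for j in range(dim)]
    stds = [(sum((v[j] - means[j]) ** 2 for v in embs) / n) ** 0.5 or 1.0
            for j in range(dim)]  # guard against zero variance
    return [[(v[j] - means[j]) / stds[j] for j in range(dim)] for v in embs]

# Applied separately to each language's embeddings before comparison:
demo = standardize([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

After this step each language's embedding cloud is zero-mean and unit-variance per dimension, which is why the paper observes better discriminativeness as a by-product.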
arXiv Detail & Related papers (2020-08-20T17:58:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.