Related papers: Graph-Assisted Culturally Adaptable Idiomatic Translation for Indic Languages

Graph-Assisted Culturally Adaptable Idiomatic Translation for Indic Languages

URL: http://arxiv.org/abs/2505.21937v1
Date: Wed, 28 May 2025 03:42:16 GMT
Title: Graph-Assisted Culturally Adaptable Idiomatic Translation for Indic Languages
Authors: Pratik Rakesh Singh, Kritarth Prasad, Mohammadi Zaki, Pankaj Wasnik,
Abstract summary: Translating multi-word expressions (MWEs) and idioms requires a deep understanding of both the source and target languages.<n>Traditional static knowledge graphs (KGs) and prompt-based approaches struggle to capture these complex relationships.<n>We propose an adaptive graph neural network (GNN) based methodology that learns intricate mappings between idiomatic expressions.
Score: 3.2498796510544636
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Translating multi-word expressions (MWEs) and idioms requires a deep understanding of the cultural nuances of both the source and target languages. This challenge is further amplified by the one-to-many nature of idiomatic translations, where a single source idiom can have multiple target-language equivalents depending on cultural references and contextual variations. Traditional static knowledge graphs (KGs) and prompt-based approaches struggle to capture these complex relationships, often leading to suboptimal translations. To address this, we propose IdiomCE, an adaptive graph neural network (GNN) based methodology that learns intricate mappings between idiomatic expressions, effectively generalizing to both seen and unseen nodes during training. Our proposed method enhances translation quality even in resource-constrained settings, facilitating improved idiomatic translation in smaller models. We evaluate our approach on multiple idiomatic translation datasets using reference-less metrics, demonstrating significant improvements in translating idioms from English to various Indian languages.

Related papers

SONAR-SLT: Multilingual Sign Language Translation via Language-Agnostic Sentence Embedding Supervision [14.416218321809824]
Sign language translation (SLT) is typically trained with text in a single spoken language.<n>We employ language-agnostic, multimodal embeddings trained on text and speech from multiple languages to supervise SLT.<n>Our results demonstrate that language-agnostic embedding supervision, combined with coupled augmentation, provides a scalable and semantically robust alternative to traditional SLT training.
arXiv Detail & Related papers (2025-10-22T09:17:31Z)
Enhancing Entertainment Translation for Indian Languages using Adaptive Context, Style and LLMs [3.55026004901472]
We introduce an algorithm to estimate the context and style of the current session and use these estimations to generate a prompt that guides a Large Language Model (LLM) to generate high-quality translations.<n>Our method is both language and LLM-agnostic, making it a general-purpose tool.
arXiv Detail & Related papers (2024-12-29T11:33:51Z)
MoCE: Adaptive Mixture of Contextualization Experts for Byte-based Neural Machine Translation [13.70446799743065]
Byte-based machine translation systems have shown significant potential in massively multilingual settings.<n> Unicode encoding maps each character to specific byte(s) eliminating the emergence of unknown words, even in new languages.<n>Local contextualization has proven effective in assigning initial semantics to tokens, improving sentence comprehension.<n>We propose Mixture of Contextualization Experts (MoCE), adaptively selecting and mixing attention heads, which are treated as contextualization experts.
arXiv Detail & Related papers (2024-11-03T08:15:43Z)
Improving Multilingual Neural Machine Translation by Utilizing Semantic and Linguistic Features [18.76505158652759]
We propose to exploit both semantic and linguistic features between multiple languages to enhance multilingual translation. On the encoder side, we introduce a disentangling learning task that aligns encoder representations by disentangling semantic and linguistic features. On the decoder side, we leverage a linguistic encoder to integrate low-level linguistic features to assist in the target language generation.
arXiv Detail & Related papers (2024-08-02T17:10:12Z)
Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues. We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations. To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
arXiv Detail & Related papers (2023-10-10T23:47:25Z)
Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing [68.47787275021567]
Cross-lingual semantic parsing transfers parsing capability from a high-resource language (e.g., English) to low-resource languages with scarce training data. We propose a new approach to cross-lingual semantic parsing by explicitly minimizing cross-lingual divergence between latent variables using Optimal Transport.
arXiv Detail & Related papers (2023-07-09T04:52:31Z)
Iterative Translation Refinement with Large Language Models [25.90607157524168]
We propose iteratively prompting a large language model to self-correct a translation. We also discuss the challenges in evaluation and relation to human performance and translationese.
arXiv Detail & Related papers (2023-06-06T16:51:03Z)
Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel approach of few-shot prompting that decomposes the translation process into a sequence of word chunk translations. We show that DecoMT outperforms the strong few-shot prompting BLOOM model with an average improvement of 8 chrF++ scores across the examined languages.
arXiv Detail & Related papers (2023-05-22T14:52:47Z)
Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings. Our model operates on parallel data in $N$ languages. We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
CLSE: Corpus of Linguistically Significant Entities [58.29901964387952]
We release a Corpus of Linguistically Significant Entities (CLSE) annotated by experts. CLSE covers 74 different semantic types to support various applications from airline ticketing to video games. We create a linguistically representative NLG evaluation benchmark in three languages: French, Marathi, and Russian.
arXiv Detail & Related papers (2022-11-04T12:56:12Z)
DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus. We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z)
Cross-lingual Text Classification with Heterogeneous Graph Neural Network [2.6936806968297913]
Cross-lingual text classification aims at training a classifier on the source language and transferring the knowledge to target languages. Recent multilingual pretrained language models (mPLM) achieve impressive results in cross-lingual classification tasks. We propose a simple yet effective method to incorporate heterogeneous information within and across languages for cross-lingual text classification.
arXiv Detail & Related papers (2021-05-24T12:45:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.