Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings
- URL: http://arxiv.org/abs/2507.06506v1
- Date: Wed, 09 Jul 2025 03:09:14 GMT
- Title: Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings
- Authors: Russell Taylor, Benjamin Herbert, Michael Sana,
- Abstract summary: This research proposes a novel approach for translating puns from English to French by combining state-of-the-art large language models with specialized techniques for wordplay generation.<n>Our methodology's primary objective is to capture the linguistic creativity and humor of the source text wordplay, rather than simply duplicating its vocabulary.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Translating wordplay across languages presents unique challenges that have long confounded both professional human translators and machine translation systems. This research proposes a novel approach for translating puns from English to French by combining state-of-the-art large language models with specialized techniques for wordplay generation. Our methodology employs a three-stage approach. First, we establish a baseline using multiple frontier large language models with feedback based on a new contrastive learning dataset. Second, we implement a guided chain-of-thought pipeline with combined phonetic-semantic embeddings. Third, we implement a multi-agent generator-discriminator framework for evaluating and regenerating puns with feedback. Moving beyond the limitations of literal translation, our methodology's primary objective is to capture the linguistic creativity and humor of the source text wordplay, rather than simply duplicating its vocabulary. Our best runs earned first and second place in the CLEF JOKER 2025 Task 2 competition where they were evaluated manually by expert native French speakers. This research addresses a gap between translation studies and computational linguistics by implementing linguistically-informed techniques for wordplay translation, advancing our understanding of how language models can be leveraged to handle the complex interplay between semantic ambiguity, phonetic similarity, and the implicit cultural and linguistic awareness needed for successful humor.
Related papers
- (Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts [56.7988577327046]
We introduce TransAgents, a novel multi-agent framework that simulates the roles and collaborative practices of a human translation company.<n>Our findings highlight the potential of multi-agent collaboration in enhancing translation quality, particularly for longer texts.
arXiv Detail & Related papers (2024-05-20T05:55:08Z) - Enhancing Cross-lingual Transfer via Phonemic Transcription Integration [57.109031654219294]
PhoneXL is a framework incorporating phonemic transcriptions as an additional linguistic modality for cross-lingual transfer.
Our pilot study reveals phonemic transcription provides essential information beyond the orthography to enhance cross-lingual transfer.
arXiv Detail & Related papers (2023-07-10T06:17:33Z) - Beyond Contrastive Learning: A Variational Generative Model for
Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z) - Revamping Multilingual Agreement Bidirectionally via Switched
Back-translation for Multilingual Neural Machine Translation [107.83158521848372]
multilingual agreement (MA) has shown its importance for multilingual neural machine translation (MNMT)
We present textbfBidirectional textbfMultilingual textbfAgreement via textbfSwitched textbfBack-textbftranslation (textbfBMA-SBT)
It is a novel and universal multilingual agreement framework for fine-tuning pre-trained MNMT models.
arXiv Detail & Related papers (2022-09-28T09:14:58Z) - VECO: Variable and Flexible Cross-lingual Pre-training for Language
Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z) - Learning Contextualised Cross-lingual Word Embeddings and Alignments for
Extremely Low-Resource Languages Using Parallel Corpora [63.5286019659504]
We propose a new approach for learning contextualised cross-lingual word embeddings based on a small parallel corpus.
Our method obtains word embeddings via an LSTM encoder-decoder model that simultaneously translates and reconstructs an input sentence.
arXiv Detail & Related papers (2020-10-27T22:24:01Z) - A Deep Reinforced Model for Zero-Shot Cross-Lingual Summarization with
Bilingual Semantic Similarity Rewards [40.17497211507507]
Cross-lingual text summarization is a practically important but under-explored task.
We propose an end-to-end cross-lingual text summarization model.
arXiv Detail & Related papers (2020-06-27T21:51:38Z) - On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z) - Investigating Language Impact in Bilingual Approaches for Computational
Language Documentation [28.838960956506018]
This paper investigates how the choice of translation language affects the posterior documentation work.
We create 56 bilingual pairs that we apply to the task of low-resource unsupervised word segmentation and alignment.
Our results suggest that incorporating clues into the neural models' input representation increases their translation and alignment quality.
arXiv Detail & Related papers (2020-03-30T10:30:34Z) - A Common Semantic Space for Monolingual and Cross-Lingual
Meta-Embeddings [10.871587311621974]
This paper presents a new technique for creating monolingual and cross-lingual meta-embeddings.
Existing word vectors are projected to a common semantic space using linear transformations and averaging.
The resulting cross-lingual meta-embeddings also exhibit excellent cross-lingual transfer learning capabilities.
arXiv Detail & Related papers (2020-01-17T15:42:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.