PHRASED: Phrase Dictionary Biasing for Speech Translation
- URL: http://arxiv.org/abs/2506.09175v1
- Date: Tue, 10 Jun 2025 18:42:38 GMT
- Title: PHRASED: Phrase Dictionary Biasing for Speech Translation
- Authors: Peidong Wang, Jian Xue, Rui Zhao, Junkun Chen, Aswin Shanmugam Subramanian, Jinyu Li
- Abstract summary: We propose a phrase dictionary biasing method to leverage pairs of phrases mapping from the source language to the target language. We apply the phrase dictionary biasing method to two types of widely adopted models: a transducer-based streaming speech translation model and a multimodal large language model.
- Score: 41.03459069364749
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Phrases are essential to understanding the core concepts in conversations. However, due to their rare occurrence in training data, correct translation of phrases is challenging in speech translation tasks. In this paper, we propose a phrase dictionary biasing method that leverages pairs of phrases mapping from the source language to the target language. We apply the phrase dictionary biasing method to two types of widely adopted models: a transducer-based streaming speech translation model and a multimodal large language model. Experimental results show that phrase dictionary biasing outperforms phrase list biasing by a relative 21% for the streaming speech translation model. In addition, phrase dictionary biasing enables multimodal large language models to use external phrase information, achieving an 85% relative improvement in phrase recall.
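The abstract describes the method only at a high level. As a minimal sketch of the underlying idea, assuming word-level tokens and a simple additive log-score bonus (the `PhraseDictionary` class, the German-English example entry, and the bonus value are illustrative assumptions, not the paper's actual formulation), a phrase dictionary bias during decoding could look like this:

```python
class PhraseDictionary:
    """Source-language phrases mapped to preferred target-language translations."""

    def __init__(self, entries: dict):
        self.entries = entries  # e.g. {"Herzinfarkt": "heart attack"}

    def active_targets(self, source_text: str) -> list:
        """Target phrases whose source side occurs in the recognized source text."""
        return [tgt for src, tgt in self.entries.items() if src in source_text]

    def next_token_bonus(self, hypothesis: list, source_text: str,
                         bonus: float = 2.0) -> dict:
        """Log-score bonuses for tokens that start or extend an active target phrase.

        A phrase is 'extended' when the tail of the current hypothesis matches a
        prefix of that phrase; the next phrase token then receives the bonus.
        """
        boosts = {}
        for phrase in self.active_targets(source_text):
            tokens = phrase.split()
            for k in range(len(tokens)):
                if k == 0 or hypothesis[-k:] == tokens[:k]:
                    boosts[tokens[k]] = max(boosts.get(tokens[k], 0.0), bonus)
        return boosts


if __name__ == "__main__":
    biaser = PhraseDictionary({"Herzinfarkt": "heart attack"})
    # A decoder would add these bonuses to its next-token log-probabilities.
    print(biaser.next_token_bonus(["he", "had", "a"], "Er hatte einen Herzinfarkt"))
    # {'heart': 2.0}
    print(biaser.next_token_bonus(["he", "had", "a", "heart"], "Er hatte einen Herzinfarkt"))
    # {'heart': 2.0, 'attack': 2.0}
```

In the actual systems the bias would presumably operate on subword units inside the transducer's beam search, or be injected as context into the multimodal LLM; the sketch only conveys the dictionary lookup and prefix-extension logic.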
Related papers
- Dictionaries to the Rescue: Cross-Lingual Vocabulary Transfer for Low-Resource Languages Using Bilingual Dictionaries [22.562544826766917]
Cross-lingual vocabulary transfer plays a promising role in adapting pre-trained language models to new languages.
Existing approaches that utilize monolingual or parallel corpora face challenges when applied to languages with limited resources.
arXiv Detail & Related papers (2025-06-02T10:52:52Z) - Languages in Multilingual Speech Foundation Models Align Both Phonetically and Semantically [58.019484208091534]
Cross-lingual alignment in pretrained language models (LMs) has enabled efficient transfer in text-based LMs.
It remains an open question whether findings and methods from text-based cross-lingual alignment apply to speech.
arXiv Detail & Related papers (2025-05-26T07:21:20Z) - Optimizing Rare Word Accuracy in Direct Speech Translation with a Retrieval-and-Demonstration Approach [25.166206924366527]
We propose a retrieval-and-demonstration approach to enhance rare word translation accuracy in direct speech translation models.
First, we adapt existing ST models to incorporate retrieved examples for rare word translation.
We then develop a cross-modal (speech-to-speech, speech-to-text, text-to-text) retriever to locate suitable examples.
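The summary only names the retriever. As a rough, hypothetical illustration of retrieving demonstration examples by embedding similarity (the embedding bank and dimensions below are made up, and the paper's retriever is cross-modal and trained, which this sketch does not capture), a nearest-neighbour lookup could be:

```python
import numpy as np

def retrieve_examples(query_emb: np.ndarray,
                      example_embs: np.ndarray,
                      k: int = 3) -> np.ndarray:
    """Return indices of the k examples most similar to the query (cosine similarity).

    query_emb:    (d,) embedding of the utterance to translate
    example_embs: (n, d) embeddings of candidate demonstration examples
    """
    q = query_emb / np.linalg.norm(query_emb)
    e = example_embs / np.linalg.norm(example_embs, axis=1, keepdims=True)
    scores = e @ q
    return np.argsort(-scores)[:k]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bank = rng.normal(size=(100, 16))               # pretend example embeddings
    query = bank[42] + 0.01 * rng.normal(size=16)   # query close to example 42
    print(retrieve_examples(query, bank))           # example 42 should rank first
```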
arXiv Detail & Related papers (2024-09-13T17:38:03Z) - Contextualized Automatic Speech Recognition with Dynamic Vocabulary [41.892863381787684]
This paper proposes a dynamic vocabulary where bias tokens can be added during inference.
Each entry in the bias list is represented as a single token, rather than as a sequence of existing subword tokens.
Experimental results demonstrate that the proposed method reduces the bias-phrase WER on English and Japanese datasets by 3.1 to 4.9 points.
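As a rough sketch of the dynamic-vocabulary idea, assuming a word-level base vocabulary (the `DynamicVocabulary` wrapper and the medical-phrase example are hypothetical, not the paper's implementation), each bias phrase gets a single new token id appended beyond the base vocabulary at inference time:

```python
class DynamicVocabulary:
    """Appends one new token per bias phrase to a fixed base vocabulary at inference time."""

    def __init__(self, base_vocab: list):
        self.base_vocab = list(base_vocab)
        self.bias_tokens = {}  # phrase -> dynamic token id

    def add_bias_phrases(self, phrases: list) -> None:
        """Register each bias phrase as one token with an id beyond the base vocabulary."""
        for phrase in phrases:
            if phrase not in self.bias_tokens:
                self.bias_tokens[phrase] = len(self.base_vocab) + len(self.bias_tokens)

    def decode(self, token_ids: list) -> str:
        """Map ids back to text, expanding dynamic ids into their full phrases."""
        inverse = {i: p for p, i in self.bias_tokens.items()}
        pieces = [self.base_vocab[i] if i < len(self.base_vocab) else inverse[i]
                  for i in token_ids]
        return " ".join(pieces)


if __name__ == "__main__":
    vocab = DynamicVocabulary(["the", "patient", "reported", "a"])
    vocab.add_bias_phrases(["myocardial infarction"])  # one token, not several subwords
    print(vocab.decode([0, 1, 2, 3, 4]))
    # the patient reported a myocardial infarction
```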
arXiv Detail & Related papers (2024-05-22T05:03:39Z) - CB-Conformer: Contextual biasing Conformer for biased word recognition [33.28780163232423]
We introduce the Contextual Biasing Module and the Self-Adaptive Language Model to vanilla Conformer.
Our proposed method brings a 15.34% character error rate reduction, a 14.13% biased word recall increase, and a 6.80% biased word F1-score increase compared with the base Conformer.
arXiv Detail & Related papers (2023-04-19T12:26:04Z) - Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z) - Dict-BERT: Enhancing Language Model Pre-training with Dictionary [42.0998323292348]
Pre-trained language models (PLMs) aim to learn universal language representations by conducting self-supervised training tasks on large-scale corpora.
In this work, we focus on enhancing language model pre-training by leveraging definitions of rare words in dictionaries.
We propose two novel self-supervised pre-training tasks on word and sentence-level alignment between input text sequence and rare word definitions.
arXiv Detail & Related papers (2021-10-13T04:29:14Z) - On the Language Coverage Bias for Neural Machine Translation [81.81456880770762]
Language coverage bias is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice.
By carefully designing experiments, we provide comprehensive analyses of the language coverage bias in the training data.
We propose two simple and effective approaches to alleviate the language coverage bias problem.
arXiv Detail & Related papers (2021-06-07T01:55:34Z) - Unsupervised Cross-lingual Representation Learning for Speech Recognition [63.85924123692923]
XLSR learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.
We build on wav2vec 2.0 which is trained by solving a contrastive task over masked latent speech representations.
Experiments show that cross-lingual pretraining significantly outperforms monolingual pretraining.
arXiv Detail & Related papers (2020-06-24T18:25:05Z) - Fast and Robust Unsupervised Contextual Biasing for Speech Recognition [16.557586847398778]
We propose an alternative approach that does not entail an explicit contextual language model.
We derive the bias score for every word in the system vocabulary from the training corpus.
We show significant improvement in recognition accuracy when the relevant context is available.
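The summary states only that a per-word bias score is derived from the training corpus. One plausible reading, assuming a negative log relative-frequency formulation (an assumption for illustration, not the paper's exact definition), is to boost words in proportion to how rare they were in training:

```python
import math
from collections import Counter

def bias_scores(training_corpus: list, vocabulary: list, floor: int = 1) -> dict:
    """Assign higher scores to vocabulary words that are rarer in the training corpus.

    Score = -log(relative frequency), with unseen words floored at `floor` counts
    so they receive the largest (but finite) boost.
    """
    counts = Counter(w for sent in training_corpus for w in sent.split())
    total = sum(counts.values())
    return {w: -math.log(max(counts[w], floor) / total) for w in vocabulary}


if __name__ == "__main__":
    corpus = ["turn on the light", "turn off the light", "call doctor smith"]
    scores = bias_scores(corpus, ["the", "smith", "zyrtec"])
    print(scores)  # rare or unseen words ("smith", "zyrtec") score higher than "the"
```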
arXiv Detail & Related papers (2020-05-04T17:29:59Z) - On the Importance of Word Order Information in Cross-lingual Sequence Labeling [80.65425412067464]
Cross-lingual models fitted to the word order of the source language might fail to handle target languages with different word orders.
We investigate whether making models insensitive to the word order of the source language can improve the adaptation performance in target languages.
arXiv Detail & Related papers (2020-01-30T03:35:44Z)