English-to-Chinese Transliteration with Phonetic Back-transliteration
- URL: http://arxiv.org/abs/2112.10321v1
- Date: Mon, 20 Dec 2021 03:29:28 GMT
- Title: English-to-Chinese Transliteration with Phonetic Back-transliteration
- Authors: Shi Cheng, Zhuofei Ding and Songpeng Yan
- Abstract summary: Transliteration is a task of translating named entities from a language to another, based on phonetic similarity.
In this work, we incorporate phonetic information into neural networks in two ways: we synthesize extra data using forward and back-translation but in a phonetic manner, and we pre-train models on a phonetic task before learning transliteration.
Our experiments include three language pairs and six directions, namely English to and from Chinese, Hebrew and Thai.
- Score: 0.9281671380673306
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transliteration is a task of translating named entities from a language to
another, based on phonetic similarity. The task has embraced deep learning
approaches in recent years, yet, most ignore the phonetic features of the
involved languages. In this work, we incorporate phonetic information into
neural networks in two ways: we synthesize extra data using forward and
back-translation but in a phonetic manner; and we pre-train models on a
phonetic task before learning transliteration. Our experiments include three
language pairs and six directions, namely English to and from Chinese, Hebrew
and Thai. Results indicate that our proposed approach brings benefits to the
model and achieves better or similar performance when compared to state of the
art.
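The data-synthesis idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the phonetic step is carried out, so the sketch assumes a generic back-translation loop in which a target-to-source model proposes a source-side spelling for each target-side name. The names `synthesize_pairs`, `toy_back_model`, and `TOY_PINYIN` are hypothetical, and the tiny toneless pinyin table only stands in for a trained phonetic model.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical toy table (toneless pinyin), a stand-in for a trained
# target-to-source phonetic model; not taken from the paper.
TOY_PINYIN = {"马": "ma", "丁": "ding", "琳": "lin", "达": "da"}


def toy_back_model(name: str) -> str:
    """Romanize a Chinese name, or return "" if any character is unknown."""
    if any(ch not in TOY_PINYIN for ch in name):
        return ""
    return "".join(TOY_PINYIN[ch] for ch in name)


def synthesize_pairs(
    target_names: Iterable[str],
    back_model: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Build synthetic (source, target) pairs from target-side names only.

    Each target name is passed through the back model; names for which the
    model produces no output are skipped.
    """
    pairs = []
    for name in target_names:
        guess = back_model(name)
        if guess:
            pairs.append((guess, name))
    return pairs


if __name__ == "__main__":
    for source, target in synthesize_pairs(["马丁", "琳达", "未知"], toy_back_model):
        print(source, target)
```

In a real setup the synthetic pairs would be mixed with the genuine training pairs, and the back model would be a learned system rather than a lookup table.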
Related papers
- Cross-Lingual Transfer from Related Languages: Treating Low-Resource Maltese as Multilingual Code-Switching [9.435669487585917]
We focus on Maltese, a Semitic language, with substantial influences from Arabic, Italian, and English, and notably written in Latin script.
We present a novel dataset annotated with word-level etymology.
We show that conditional transliteration based on word etymology yields the best results, surpassing fine-tuning with raw Maltese or Maltese processed with non-selective pipelines.
arXiv Detail & Related papers (2024-01-30T11:04:36Z)
- Enhancing Cross-lingual Transfer via Phonemic Transcription Integration [57.109031654219294]
PhoneXL is a framework incorporating phonemic transcriptions as an additional linguistic modality for cross-lingual transfer.
Our pilot study reveals phonemic transcription provides essential information beyond the orthography to enhance cross-lingual transfer.
arXiv Detail & Related papers (2023-07-10T06:17:33Z)
- Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding [55.989376102986654]
This paper studies a transferable phoneme embedding framework that aims to deal with the cross-lingual text-to-speech problem under the few-shot setting.
We propose a framework that consists of a phoneme-based TTS model and a codebook module to project phonemes from different languages into a learned latent space.
arXiv Detail & Related papers (2022-06-27T11:24:40Z)
- Discovering Phonetic Inventories with Crosslingual Automatic Speech Recognition [71.49308685090324]
This paper investigates the influence of different factors (i.e., model architecture, phonotactic model, type of speech representation) on phone recognition in an unknown language.
We find that unique sounds, similar sounds, and tone languages remain a major challenge for phonetic inventory discovery.
arXiv Detail & Related papers (2022-01-26T22:12:55Z)
- Cross-lingual Low Resource Speaker Adaptation Using Phonological Features [2.8080708404213373]
We train a language-agnostic multispeaker model conditioned on a set of phonologically derived features common across different languages.
With as few as 32 and 8 utterances of target speaker data, we obtain high speaker similarity scores and naturalness comparable to the corresponding literature.
arXiv Detail & Related papers (2021-11-17T12:33:42Z)
- DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z)
- Consecutive Decoding for Speech-to-text Translation [51.155661276936044]
COnSecutive Transcription and Translation (COSTT) is an integral approach for speech-to-text translation.
The key idea is to generate source transcript and target translation text with a single decoder.
Our method is verified on three mainstream datasets.
arXiv Detail & Related papers (2020-09-21T10:10:45Z)
- Detect Language of Transliterated Texts [0.0]
Informal transliteration from other languages to English is prevalent in social media threads, instant messaging, and discussion forums.
We propose a Language Identification (LID) system, with an approach for feature extraction.
We tokenize the words into phonetic syllables and use a simple Long Short-term Memory (LSTM) network architecture to detect the language of transliterated texts.
arXiv Detail & Related papers (2020-04-26T10:28:02Z)
- Towards Zero-shot Learning for Automatic Phonemic Transcription [82.9910512414173]
A more challenging problem is to build phonemic transcribers for languages with zero training data.
Our model is able to recognize unseen phonemes in the target language without any training data.
It achieves 7.7% better phoneme error rate on average over a standard multilingual model.
arXiv Detail & Related papers (2020-02-26T20:38:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.