The First Parallel Corpora for Kurdish Sign Language
- URL: http://arxiv.org/abs/2305.06747v1
- Date: Thu, 11 May 2023 12:10:20 GMT
- Title: The First Parallel Corpora for Kurdish Sign Language
- Authors: Zina Kamal and Hossein Hassani
- Abstract summary: Kurdish Sign Language (KuSL) is the natural language of the Kurdish Deaf people.
We propose an avatar-based automatic translation of Kurdish texts in the Sorani (Central Kurdish) dialect into the Kurdish Sign Language.
We tested the outcome understandability and evaluated it using the Bilingual Evaluation Understudy (BLEU).
- Score: 0.76146285961466
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Kurdish Sign Language (KuSL) is the natural language of the Kurdish Deaf
people. We work on automatic translation between spoken Kurdish and KuSL. Sign
languages evolve rapidly and follow grammatical rules that differ from spoken
languages. Consequently, those differences should be considered during any
translation. We proposed an avatar-based automatic translation of Kurdish texts
in the Sorani (Central Kurdish) dialect into the Kurdish Sign Language. We
developed the first parallel corpora for that pair that we use to train a
Statistical Machine Translation (SMT) engine. We tested the outcome
understandability and evaluated it using the Bilingual Evaluation Understudy
(BLEU). Results showed 53.8% accuracy. Compared to the previous experiments in
the field, the result is considerably high. We suspect the reason to be the
similarity between the structure of the two pairs. We plan to make the
resources publicly available under the CC BY-NC-SA 4.0 license on the Kurdish-BLARK
(https://kurdishblark.github.io/).
Related papers
- How Robust is Neural Machine Translation to Language Imbalance in Multilingual Tokenizer Training? [86.48323488619629]
We analyze how translation performance changes as the data ratios among languages vary in the tokenizer training corpus.
We find that while relatively better performance is often observed when languages are more equally sampled, the downstream performance is more robust to language imbalance than we usually expected.
arXiv Detail & Related papers (2022-04-29T17:50:36Z)
- Linking Emergent and Natural Languages via Corpus Transfer [98.98724497178247]
We propose a novel way to establish a link by corpus transfer between emergent languages and natural languages.
Our approach showcases non-trivial transfer benefits for two different tasks -- language modeling and image captioning.
We also introduce a novel metric to predict the transferability of an emergent language by translating emergent messages to natural language captions grounded on the same images.
arXiv Detail & Related papers (2022-03-24T21:24:54Z)
- Part of Speech Tagging (POST) of a Low-resource Language using another Language (Developing a POS-Tagged Lexicon for Kurdish (Sorani) using a Tagged Persian (Farsi) Corpus) [0.76146285961466]
Part of Speech Tagging (POST) is essential in developing tagged corpora.
The Kurdish language currently lacks publicly available tagged corpora of proper sizes.
We use a tagged corpus (Bijankhan corpus) in Persian (Farsi) as a close language to Kurdish to develop a POS-tagged lexicon.
arXiv Detail & Related papers (2022-01-30T11:49:43Z)
- From Good to Best: Two-Stage Training for Cross-lingual Machine Reading Comprehension [51.953428342923885]
We develop a two-stage approach to enhance the model performance.
The first stage targets recall: we design a hard-learning (HL) algorithm to maximize the likelihood that the top-k predictions contain the accurate answer.
The second stage focuses on precision: an answer-aware contrastive learning mechanism is developed to learn the fine difference between the accurate answer and other candidates.
arXiv Detail & Related papers (2021-12-09T07:31:15Z)
- ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback [70.5469946314539]
ChrEnTranslate is an online machine translation demonstration system for translation between English and an endangered language Cherokee.
It supports both statistical and neural translation models as well as provides quality estimation to inform users of reliability.
arXiv Detail & Related papers (2021-07-30T17:58:54Z)
- Central Kurdish machine translation: First large scale parallel corpus and experiments [2.099922236065961]
We present the first large scale parallel corpus of Central Kurdish-English, Awta, containing 229,222 pairs of manually aligned translations.
Our best performing systems achieve 22.72 and 16.81 in BLEU score for Ku$\rightarrow$EN and En$\rightarrow$Ku, respectively.
arXiv Detail & Related papers (2021-06-17T08:41:53Z)
- Towards Machine Translation for the Kurdish Language [0.0]
Machine translation is the task of translating texts from one language to another using computers.
Kurdish, an Indo-European language, has received little attention in this realm due to the language being less-resourced.
We describe the available scarce parallel data suitable for training a neural machine translation model for Sorani Kurdish-English translation.
arXiv Detail & Related papers (2020-10-12T21:28:57Z)
- Leveraging Multilingual News Websites for Building a Kurdish Parallel Corpus [0.6445605125467573]
We present a corpus containing 12,327 translation pairs in the two major dialects of Kurdish, Sorani and Kurmanji.
We also provide 1,797 and 650 translation pairs in English-Kurmanji and English-Sorani.
arXiv Detail & Related papers (2020-10-04T11:52:50Z)
- Unsupervised Cross-lingual Representation Learning for Speech Recognition [63.85924123692923]
XLSR learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.
We build on wav2vec 2.0 which is trained by solving a contrastive task over masked latent speech representations.
Experiments show that cross-lingual pretraining significantly outperforms monolingual pretraining.
arXiv Detail & Related papers (2020-06-24T18:25:05Z)
- Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation [61.88012735215636]
Unsupervised neural machine translation (UNMT) has recently achieved remarkable results for several language pairs.
UNMT can only translate between a single language pair and cannot produce translation results for multiple language pairs at the same time.
In this paper, we empirically introduce a simple method to translate between thirteen languages using a single encoder and a single decoder.
arXiv Detail & Related papers (2020-04-21T17:26:16Z)
- Using Punkt for Sentence Segmentation in non-Latin Scripts: Experiments on Kurdish (Sorani) Texts [0.76146285961466]
Punkt is an unsupervised machine learning method.
We used Punkt to segment a Kurdish corpus of Sorani dialect, written in Persian-Arabic script.
In our experiment, we achieved an F1 score of 91.10% and had an Error Rate of 16.32%.
arXiv Detail & Related papers (2020-04-09T06:44:08Z)