Persian Ezafe Recognition Using Transformers and Its Role in
Part-Of-Speech Tagging
- URL: http://arxiv.org/abs/2009.09474v2
- Date: Sun, 4 Oct 2020 19:52:11 GMT
- Title: Persian Ezafe Recognition Using Transformers and Its Role in
Part-Of-Speech Tagging
- Authors: Ehsan Doostmohammadi, Minoo Nassajian, Adel Rahimi
- Abstract summary: Ezafe is a grammatical particle in some Iranian languages that links two words together.
We use different machine learning methods to achieve state-of-the-art results in the task of ezafe recognition.
- Score: 1.5469452301122177
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ezafe is a grammatical particle in some Iranian languages that links two
words together. Despite the important information it conveys, it is
almost always left unwritten in Persian script, resulting in mistakes in reading
complex sentences and errors in natural language processing tasks. In this
paper, we experiment with different machine learning methods to achieve
state-of-the-art results in the task of ezafe recognition. Transformer-based
methods, BERT and XLM-RoBERTa, achieve the best results, with the latter
surpassing the previous state of the art by 2.68 percentage points in F1-score.
We moreover use ezafe information to improve Persian part-of-speech tagging
results, show that such information is not useful to transformer-based methods,
and explain why that might be the case.
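Ezafe recognition is commonly framed as binary token classification: each word is labeled according to whether it carries an (unwritten) ezafe, and systems are scored with token-level F1 on the positive class. The sketch below illustrates that framing with a toy sentence and a deliberately naive predict-everywhere baseline; the example data and baseline are illustrative assumptions, not the paper's models or dataset.

```python
# Ezafe recognition framed as binary token classification.
# Each token is labeled 1 if it carries an (unwritten) ezafe, else 0.
# Toy example: "ketâb-e bozorg-e man" ("my big book") -- the first two
# words carry ezafe, the last one does not.

def f1_score(gold, pred):
    """Token-level F1 for the positive (ezafe) class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Gold labels for the toy sentence above.
tokens = ["ketâb", "bozorg", "man"]
gold = [1, 1, 0]

# Naive baseline: predict ezafe on every token.
pred = [1] * len(tokens)

print(round(f1_score(gold, pred), 2))  # 0.8
```

A transformer-based recognizer replaces the naive baseline with per-token predictions from a fine-tuned encoder (e.g. XLM-RoBERTa with a token-classification head), but the labeling scheme and the F1 evaluation stay the same.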
Related papers
- PERCORE: A Deep Learning-Based Framework for Persian Spelling Correction with Phonetic Analysis [0.0]
This research introduces a state-of-the-art Persian spelling correction system that seamlessly integrates deep learning techniques with phonetic analysis.
Our methodology effectively combines deep contextual analysis with phonetic insights, adeptly correcting both non-word and real-word spelling errors.
A thorough evaluation on a wide-ranging dataset confirms our system's superior performance compared to existing methods.
arXiv Detail & Related papers (2024-07-20T07:41:04Z) - Persian Speech Emotion Recognition by Fine-Tuning Transformers [1.0152838128195467]
We present two models, one based on spectrograms and the other on the audio itself, fine-tuned using the shEMO dataset.
These models significantly enhance the accuracy of previous systems, increasing it from approximately 65% to 80%.
To investigate the effect of multilinguality on the fine-tuning process, these same models are fine-tuned twice.
arXiv Detail & Related papers (2024-02-11T23:23:31Z) - Adversarial Training For Low-Resource Disfluency Correction [50.51901599433536]
We propose an adversarially-trained sequence-tagging model for Disfluency Correction (DC).
We show the benefit of our proposed technique, which crucially depends on synthetically generated disfluent data, by evaluating it for DC in three Indian languages.
Our technique also performs well in removing stuttering disfluencies in ASR transcripts introduced by speech impairments.
arXiv Detail & Related papers (2023-06-10T08:58:53Z) - DisfluencyFixer: A tool to enhance Language Learning through Speech To
Speech Disfluency Correction [50.51901599433536]
DisfluencyFixer is a tool that performs speech-to-speech disfluency correction in English and Hindi.
Our proposed system removes disfluencies from input speech and returns fluent speech as output.
arXiv Detail & Related papers (2023-05-26T14:13:38Z) - Evaluating Persian Tokenizers [6.10917825357379]
This article presents a study of the most widely used tokenizers for Persian.
It compares and evaluates their performance on Persian texts using a simple algorithm and a pre-tagged Persian dependency dataset.
After evaluating the tokenizers with the F1-score, the hybrid version of the Farsi Verb and Hazm with bounded-morpheme fixing showed the best performance, with an F1-score of 98.97%.
arXiv Detail & Related papers (2022-02-22T13:27:24Z) - ViraPart: A Text Refinement Framework for ASR and NLP Tasks in Persian [0.0]
We propose ViraPart, a framework that uses an embedded ParsBERT at its core for text clarification.
The proposed model achieves averaged macro F1-scores of 96.90%, 92.13%, and 98.50% for ZWNJ recognition, punctuation restoration, and Persian ezafe construction, respectively.
arXiv Detail & Related papers (2021-10-18T08:20:40Z) - On the Language Coverage Bias for Neural Machine Translation [81.81456880770762]
Language coverage bias is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice.
By carefully designing experiments, we provide comprehensive analyses of the language coverage bias in the training data.
We propose two simple and effective approaches to alleviate the language coverage bias problem.
arXiv Detail & Related papers (2021-06-07T01:55:34Z) - The Challenges of Persian User-generated Textual Content: A Machine
Learning-Based Approach [0.0]
This research applies machine learning-based approaches to tackle the hurdles that come with Persian user-generated textual content.
The presented approach uses machine-translated datasets to conduct sentiment analysis for the Persian language.
The experimental results show state-of-the-art performance compared to previous efforts.
arXiv Detail & Related papers (2021-01-20T11:57:59Z) - Worse WER, but Better BLEU? Leveraging Word Embedding as Intermediate in
Multitask End-to-End Speech Translation [127.54315184545796]
Speech translation (ST) aims to learn transformations from speech in the source language to the text in the target language.
We propose to improve the multitask ST model by utilizing word embedding as the intermediate.
arXiv Detail & Related papers (2020-05-21T14:22:35Z) - Curriculum Pre-training for End-to-End Speech Translation [51.53031035374276]
We propose a curriculum pre-training method that includes an elementary course for transcription learning and two advanced courses for understanding the utterance and mapping words in two languages.
Experiments show that our curriculum pre-training method leads to significant improvements on En-De and En-Fr speech translation benchmarks.
arXiv Detail & Related papers (2020-04-21T15:12:07Z) - Sign Language Transformers: Joint End-to-end Sign Language Recognition
and Translation [59.38247587308604]
We introduce a novel transformer based architecture that jointly learns Continuous Sign Language Recognition and Translation.
We evaluate the recognition and translation performances of our approaches on the challenging RWTH-PHOENIX-Weather-2014T dataset.
Our translation networks outperform both sign video to spoken language and gloss to spoken language translation models.
arXiv Detail & Related papers (2020-03-30T21:35:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.