Machine Learning Approaches for Amharic Parts-of-speech Tagging
- URL: http://arxiv.org/abs/2001.03324v1
- Date: Fri, 10 Jan 2020 06:40:49 GMT
- Title: Machine Learning Approaches for Amharic Parts-of-speech Tagging
- Authors: Ibrahim Gashaw and H L. Shashirekha
- Abstract summary: Performance of the current POS taggers in Amharic is not as good as that of the contemporary POS taggers available for English and other European languages.
The aim of this work is to improve POS tagging performance for the Amharic language, which was never above 91%.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Part-of-speech (POS) tagging is one of the basic but necessary
tools required for many Natural Language Processing (NLP)
applications such as word sense disambiguation, information retrieval,
information processing, parsing, question answering, and machine translation.
Performance of the current POS taggers in Amharic is not as good as that of the
contemporary POS taggers available for English and other European languages.
The aim of this work is to improve POS tagging performance for the Amharic
language, which was never above 91%. We examined the use of morphological
knowledge, an extension of the existing annotated data, feature extraction,
parameter tuning by grid search, and the tagging algorithms, and obtained a
significant performance difference from previous works. We used three
different datasets for the POS experiments.
Related papers
- LexMatcher: Dictionary-centric Data Collection for LLM-based Machine Translation [67.24113079928668]
We present LexMatcher, a method for data curation driven by the coverage of senses found in bilingual dictionaries.
Our approach outperforms the established baselines on the WMT2022 test sets.
arXiv Detail & Related papers (2024-06-03T15:30:36Z)
- Cross-lingual Contextualized Phrase Retrieval [63.80154430930898]
We propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval.
We train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning.
On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher.
arXiv Detail & Related papers (2024-03-25T14:46:51Z)
- DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages [49.38663048447942]
We propose DIALECTBENCH, the first-ever large-scale benchmark for NLP on varieties.
This allows for a comprehensive evaluation of NLP system performance on different language varieties.
We provide substantial evidence of performance disparities between standard and non-standard language varieties.
arXiv Detail & Related papers (2024-03-16T20:18:36Z)
- Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning [76.43827771613127]
In this paper, we investigate task-specific preferences between pairs of input texts as a new alternative way for such auxiliary data annotation.
We propose a novel multi-task learning framework, called prefer-to-classify (P2C), which can enjoy the cooperative effect of learning both the given classification task and the auxiliary preferences.
arXiv Detail & Related papers (2023-06-08T04:04:47Z)
- MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages [7.86385861664505]
We present MasakhaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages.
We discuss the challenges in annotating POS for these languages using the UD (universal dependencies) guidelines.
arXiv Detail & Related papers (2023-05-23T12:15:33Z)
- Incorporating External POS Tagger for Punctuation Restoration [11.573672075002007]
Punctuation restoration is an important post-processing step in automatic speech recognition.
Part-of-speech (POS) taggers provide informative tags, suggesting each input token's syntactic role.
We incorporate an external POS tagger and fuse its predicted labels into the existing language model to provide syntactic information.
arXiv Detail & Related papers (2021-06-12T09:58:06Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
To tackle this issue, we propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
- Reliable Part-of-Speech Tagging of Historical Corpora through Set-Valued Prediction [21.67895423776014]
We consider POS tagging within the framework of set-valued prediction.
We find that extending state-of-the-art POS taggers to set-valued prediction yields more precise and robust taggings.
arXiv Detail & Related papers (2020-08-04T07:21:36Z)
- Adversarial Transfer Learning for Punctuation Restoration [58.2201356693101]
Adversarial multi-task learning is introduced to learn task invariant knowledge for punctuation prediction.
Experiments are conducted on IWSLT2011 datasets.
arXiv Detail & Related papers (2020-04-01T06:19:56Z)
- Is POS Tagging Necessary or Even Helpful for Neural Dependency Parsing? [22.93722845643562]
We show that POS tagging can still significantly improve parsing performance when using the Stack joint framework.
Considering that it is much cheaper to annotate POS tags than parse trees, we also investigate the utilization of large-scale heterogeneous POS tag data.
arXiv Detail & Related papers (2020-03-06T13:47:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.