AsPOS: Assamese Part of Speech Tagger using Deep Learning Approach
- URL: http://arxiv.org/abs/2212.07043v1
- Date: Wed, 14 Dec 2022 05:36:18 GMT
- Title: AsPOS: Assamese Part of Speech Tagger using Deep Learning Approach
- Authors: Dhrubajyoti Pathak, Sukumar Nandi, Priyankoo Sarmah
- Abstract summary: Part of Speech (POS) tagging is crucial to Natural Language Processing (NLP).
We present a Deep Learning (DL)-based POS tagger for Assamese.
We attain an F1 score of 86.52%.
- Score: 7.252817150901275
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Part of Speech (POS) tagging is crucial to Natural Language Processing (NLP).
It is a well-studied topic in several resource-rich languages. However, for many
languages that are historically and literarily rich, the development of
computational linguistic resources is still in its infancy. Assamese, a scheduled
language of India spoken by more than 25 million people, falls into this
category. In this paper, we present a Deep Learning (DL)-based POS tagger for
Assamese. The development process is divided into two phases. In the first
phase, several pre-trained word embeddings are employed to train tagging models,
which allows us to compare the performance of these embeddings on the POS
tagging task. The top-performing model from the first phase is then used to
annotate a new set of sentences. In the second phase, the model is trained
further on this freshly annotated data. Finally, we attain an F1 score of
86.52%. The model may serve as a baseline for
further study on DL-based Assamese POS tagging.
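The abstract gives no implementation details, but the two-phase setup can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch example of a Bi-LSTM tagger initialised from a pre-trained embedding matrix; the class name, hyperparameters, and training helper are assumptions made for illustration, not the authors' actual code.

```python
# Minimal sketch (assumption: PyTorch; names and hyperparameters are illustrative,
# not taken from the paper).
import torch
import torch.nn as nn


class BiLSTMTagger(nn.Module):
    """Bi-LSTM POS tagger initialised from a pre-trained word embedding matrix."""

    def __init__(self, embedding_matrix, num_tags, hidden_dim=128, finetune_embeddings=True):
        super().__init__()
        embedding_matrix = torch.as_tensor(embedding_matrix, dtype=torch.float)
        # Load pre-trained word vectors (e.g. fastText/Word2Vec trained on Assamese text).
        self.embedding = nn.Embedding.from_pretrained(
            embedding_matrix, freeze=not finetune_embeddings
        )
        embed_dim = embedding_matrix.size(1)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) word indices into the embedding matrix
        embedded = self.embedding(token_ids)    # (batch, seq_len, embed_dim)
        contextual, _ = self.lstm(embedded)     # (batch, seq_len, 2 * hidden_dim)
        return self.classifier(contextual)      # (batch, seq_len, num_tags) tag scores


def train_step(model, optimizer, token_ids, gold_tags, pad_tag_id=-100):
    """One supervised update; padding positions are ignored via pad_tag_id."""
    criterion = nn.CrossEntropyLoss(ignore_index=pad_tag_id)
    optimizer.zero_grad()
    logits = model(token_ids)
    loss = criterion(logits.reshape(-1, logits.size(-1)), gold_tags.reshape(-1))
    loss.backward()
    optimizer.step()
    return loss.item()

# Phase 1 (sketch): train one BiLSTMTagger per candidate embedding and keep the
# model with the best validation F1. Phase 2 (sketch): tag new raw sentences with
# that model, correct the output, and continue training on the enlarged dataset.
```

In this reading, phase one amounts to a comparison of embedding sets under a fixed tagger architecture, and phase two is a round of annotation followed by continued training on the expanded corpus.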
Related papers
- Exploring transfer learning for Deep NLP systems on rarely annotated languages [0.0]
This thesis investigates the application of transfer learning for Part-of-Speech (POS) tagging between Hindi and Nepali.
We assess whether multitask learning in Hindi, with auxiliary tasks such as gender and singular/plural tagging, can contribute to improved POS tagging accuracy.
arXiv Detail & Related papers (2024-10-15T13:33:54Z)
- A New Method for Cross-Lingual-based Semantic Role Labeling [5.992526851963307]
A deep learning algorithm is proposed to train a semantic role labeling model for English and Persian.
The results show significant improvements compared to Niksirt et al.'s model.
The development of cross-lingual methods for semantic role labeling holds promise.
arXiv Detail & Related papers (2024-08-28T16:06:12Z)
- Breaking the Script Barrier in Multilingual Pre-Trained Language Models with Transliteration-Based Post-Training Alignment [50.27950279695363]
The transfer performance is often hindered when a low-resource target language is written in a different script than the high-resource source language.
Inspired by recent work that uses transliteration to address this problem, our paper proposes a transliteration-based post-pretraining alignment (PPA) method.
arXiv Detail & Related papers (2024-06-28T08:59:24Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Part-of-Speech Tagging of Odia Language Using statistical and Deep Learning-Based Approaches [0.0]
This research work presents conditional random field (CRF)- and deep learning-based approaches (CNN and Bi-LSTM) to develop an Odia part-of-speech tagger.
It has been observed that the Bi-LSTM model with character sequence features and pre-trained word vectors achieved a state-of-the-art result.
arXiv Detail & Related papers (2022-07-07T12:15:23Z)
- Por Qué Não Utiliser Alla Språk? Mixed Training with Gradient Optimization in Few-Shot Cross-Lingual Transfer [2.7213511121305465]
We propose a one-step mixed training method that trains on both source and target data.
We use one model to handle all target languages simultaneously to avoid excessively language-specific models.
Our proposed method achieves state-of-the-art performance on all tasks and outperforms target-adapting by a large margin.
arXiv Detail & Related papers (2022-04-29T04:05:02Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
- Continual Learning in Multilingual NMT via Language-Specific Embeddings [92.91823064720232]
The proposed approach replaces the shared vocabulary with a small language-specific vocabulary and fine-tunes the new embeddings on the new language's parallel data.
Because the parameters of the original model are not modified, its performance on the initial languages does not degrade.
arXiv Detail & Related papers (2021-10-20T10:38:57Z)
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts [103.79021395138423]
Massively multilingual language models such as multilingual BERT (mBERT) and XLM-R offer state-of-the-art cross-lingual transfer performance on a range of NLP tasks.
Due to their limited capacity and large differences in pretraining data, there is a profound performance gap between resource-rich and resource-poor target languages.
We propose novel data-efficient methods that enable quick and effective adaptation of pretrained multilingual models to such low-resource languages and unseen scripts.
arXiv Detail & Related papers (2020-12-31T11:37:28Z)
- Grounded Compositional Outputs for Adaptive Language Modeling [59.02706635250856]
A language model's vocabulary, typically selected before training and permanently fixed later, affects its size.
We propose a fully compositional output embedding layer for language models.
To our knowledge, the result is the first word-level language model with a size that does not depend on the training vocabulary.
arXiv Detail & Related papers (2020-09-24T07:21:14Z)
- Multilingual Jointly Trained Acoustic and Written Word Embeddings [22.63696520064212]
We extend the idea of jointly trained acoustic word embeddings (AWEs) and acoustically grounded word embeddings (AGWEs) to multiple low-resource languages.
We jointly train an AWE model and an AGWE model using phonetically transcribed data from multiple languages.
The pre-trained models can then be used for unseen zero-resource languages, or fine-tuned on data from low-resource languages.
arXiv Detail & Related papers (2020-06-24T19:16:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.