AlbMoRe: A Corpus of Movie Reviews for Sentiment Analysis in Albanian
- URL: http://arxiv.org/abs/2306.08526v1
- Date: Wed, 14 Jun 2023 14:21:55 GMT
- Title: AlbMoRe: A Corpus of Movie Reviews for Sentiment Analysis in Albanian
- Authors: Erion Çano
- Abstract summary: AlbMoRe is a corpus of 800 movie reviews in Albanian.
Each text is labeled as positive or negative and can be used for sentiment analysis research.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Lack of available resources such as text corpora for low-resource languages
seriously hinders research on natural language processing and computational
linguistics. This paper presents AlbMoRe, a corpus of 800 sentiment annotated
movie reviews in Albanian. Each text is labeled as positive or negative and can
be used for sentiment analysis research. Preliminary results based on
traditional machine learning classifiers trained with the AlbMoRe samples are
also reported. They can serve as comparison baselines for future research
experiments.
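The abstract does not name the classifiers behind the baselines. The sketch below is an illustrative assumption only. It shows one traditional classifier of that kind: a multinomial Naive Bayes over bag-of-words counts with add-one smoothing, using only the Python standard library. The class and function names and the four English toy reviews are hypothetical placeholders. They are not AlbMoRe samples or the author's code.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep runs of word characters; in Python 3, \w is
    # Unicode-aware, so precomposed Albanian letters such as "ë" stay in tokens.
    return re.findall(r"\w+", text.lower())

class NaiveBayesSentiment:
    """Hypothetical baseline: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        self.labels = sorted(set(labels))
        self.n_docs = len(labels)
        self.doc_counts = Counter(labels)  # number of documents per class
        self.word_counts = {c: Counter() for c in self.labels}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.labels}
        return self

    def predict(self, text):
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in self.labels:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / self.n_docs)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # skip words never seen during training
                count = self.word_counts[c][tok]
                score += math.log((count + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_label, best_score = c, score
        return best_label

def accuracy(model, texts, labels):
    # Fraction of texts whose predicted label matches the gold label.
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)

# Hypothetical English toy reviews standing in for labeled AlbMoRe texts.
train_texts = [
    "a wonderful film with great acting",
    "great story and wonderful music",
    "a boring film with terrible acting",
    "terrible story and boring music",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
print(model.predict("great acting"))         # positive
print(model.predict("boring and terrible"))  # negative
```

To approximate a baseline on the real corpus, one would replace the toy lists with the 800 AlbMoRe texts and their positive/negative labels. Accuracy would then be reported on a held-out split. The paper's actual classifiers, features, and data split are not stated in this listing.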
Related papers
- Strategies for Arabic Readability Modeling [9.976720880041688] (2024-07-03T11:54:11Z)
  Automatic readability assessment is relevant to building NLP applications for education, content analysis, and accessibility.
  We present a set of experimental results on Arabic readability assessment using a diverse range of approaches.
- AlbNews: A Corpus of Headlines for Topic Modeling in Albanian [0.0] (2024-02-06T14:24:28Z)
  AlbNews is a collection of 600 topically labeled news headlines and 2600 unlabeled ones in Albanian.
  The data can be freely used for conducting topic modeling research.
- AlbNER: A Corpus for Named Entity Recognition in Albanian [0.0] (2023-09-15T20:03:19Z)
  This paper presents AlbNER, a corpus of 900 sentences with labeled named entities, collected from Albanian Wikipedia articles.
  Preliminary results with BERT and RoBERTa variants fine-tuned and tested with AlbNER data indicate that model size has a slight impact on NER performance, whereas language transfer has a significant one.
- No Language Left Behind: Scaling Human-Centered Machine Translation [69.28110770760506] (2022-07-11T07:33:36Z)
  We create datasets and models aimed at narrowing the performance gap between low- and high-resource languages.
  We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks.
  Our model achieves a relative improvement of 44% BLEU over the previous state of the art.
- Offensive Language Detection in Under-resourced Algerian Dialectal Arabic Language [0.0] (2022-03-18T15:42:21Z)
  We focus on Algerian dialectal Arabic, which is one of the under-resourced languages.
  Due to the scarcity of work on this language, we have built a new corpus of more than 8.7k texts manually annotated as normal, abusive, and offensive.
- Intent Classification Using Pre-Trained Embeddings For Low Resource Languages [67.40810139354028] (2021-10-18T13:06:59Z)
  Building Spoken Language Understanding systems that do not rely on language-specific Automatic Speech Recognition is an important yet less explored problem in language processing.
  We present a comparative study aimed at employing a pre-trained acoustic model to perform Spoken Language Understanding in low-resource scenarios.
  We perform experiments across three languages (English, Sinhala, and Tamil), each with a different data size to simulate high-, medium-, and low-resource scenarios.
- Survey of Low-Resource Machine Translation [65.52755521004794] (2021-09-01T16:57:58Z)
  There are currently around 7000 languages spoken in the world, and almost all language pairs lack significant resources for training machine translation models.
  There has been increasing interest in research addressing the challenge of producing useful translation models when very little translated training data is available.
- Bambara Language Dataset for Sentiment Analysis [0.0] (2021-08-05T11:07:18Z)
  Africa is home to many languages and dialects; however, they are still underrepresented and not fully exploited for analytical studies and research purposes.
  In this paper, we present the first common-crawl-based Bambara dialectal dataset dedicated to sentiment analysis.
- Leveraging Pre-trained Language Model for Speech Sentiment Analysis [58.78839114092951] (2021-06-11T20:15:21Z)
  We explore the use of pre-trained language models to learn sentiment information from written texts for speech sentiment analysis.
  We propose a pseudo-label-based semi-supervised training strategy using a language model on an end-to-end speech sentiment approach.
- Improving Cross-Lingual Reading Comprehension with Self-Training [62.73937175625953] (2021-05-08T08:04:30Z)
  Current state-of-the-art models even surpass human performance on several benchmarks.
  Previous works have revealed the abilities of pre-trained multilingual models for zero-shot cross-lingual reading comprehension.
  This paper further utilizes unlabeled data to improve performance.
- ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension [53.037401638264235] (2019-12-29T07:27:23Z)
  We present an evaluation server, ORB, that reports performance on seven diverse reading comprehension datasets.
  The evaluation server places no restrictions on how models are trained, so it is a suitable test bed for exploring training paradigms and representation learning.