An Enhanced Corpus for Arabic Newspapers Comments
- URL: http://arxiv.org/abs/2102.09965v1
- Date: Mon, 8 Feb 2021 10:15:44 GMT
- Title: An Enhanced Corpus for Arabic Newspapers Comments
- Authors: Hichem Rahab, Abdelhafid Zitouni, Mahieddine Djoudi (TECHNÉ - EA 6316)
- Abstract summary: We propose an enhanced approach to creating a dedicated corpus for Algerian Arabic newspaper comments.
A corpus is created by collecting comments from the websites of three well-known Algerian newspapers.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose an enhanced approach to creating a dedicated corpus
for Algerian Arabic newspaper comments. The approach extends an existing one
by enriching the available corpus and by adding an annotation step that follows
the Model Annotate Train Test Evaluate Revise (MATTER) approach. A corpus is
built by collecting comments from the websites of three well-known Algerian
newspapers. Three classifiers (support vector machines, naïve Bayes, and
k-nearest neighbors) were used to classify comments as positive or negative.
To measure the influence of stemming on the results, classification was
tested with and without stemming. The results show that stemming does not
considerably improve classification, owing to the nature of Algerian comments,
which are tied to the Algerian Arabic dialect. These promising results motivate
us to improve our approach, especially in handling non-Arabic sentences,
in particular dialectal and French ones.
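The with-and-without-stemming comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy light stemmer (`light_stem` and its hypothetical `PREFIXES`/`SUFFIXES` lists), the four invented training comments, and the from-scratch multinomial naive Bayes (one of the three classifiers named; SVM and k-NN are omitted) are all assumptions introduced for illustration.

```python
# Minimal sketch (assumptions, not the paper's code): a toy Arabic light stemmer,
# a from-scratch multinomial naive Bayes, and a tiny invented comment set.
import math
from collections import Counter

# Hypothetical affix lists; longer prefixes come first so "وال" wins over "و".
PREFIXES = ("وال", "بال", "ال", "و")
SUFFIXES = ("ات", "ون", "ين", "ها", "ة")


def light_stem(token: str) -> str:
    """Strip at most one prefix and one suffix, keeping at least two letters."""
    for p in PREFIXES:
        if token.startswith(p) and len(token) - len(p) >= 2:
            token = token[len(p):]
            break
    for s in SUFFIXES:
        if token.endswith(s) and len(token) - len(s) >= 2:
            token = token[: -len(s)]
            break
    return token


def tokenize(text: str, stem: bool = False) -> list[str]:
    """Whitespace tokenization, optionally followed by light stemming."""
    tokens = text.split()
    return [light_stem(t) for t in tokens] if stem else tokens


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.log_priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc: list[str]) -> str:
        v = len(self.vocab)

        def log_score(c: str) -> float:
            s = self.log_priors[c]
            for t in doc:
                if t in self.vocab:  # tokens unseen in training are ignored
                    s += math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
            return s

        # On a tie, max() keeps the first class in sorted order.
        return max(self.classes, key=log_score)


# Invented toy data (NOT from the corpus described in the paper).
TRAIN = [
    ("المقال رائع والكاتب ممتاز", "pos"),
    ("خبر جميل ومفيد", "pos"),
    ("المقال سيء والكاتب فاشل", "neg"),
    ("خبر كاذب ومضلل", "neg"),
]
TEST = [("مقال رائع", "pos"), ("خبر كاذب", "neg")]


def accuracy(train, test, stem: bool) -> float:
    """Train on `train` and return accuracy on `test`, with or without stemming."""
    model = NaiveBayes().fit([tokenize(t, stem) for t, _ in train], [y for _, y in train])
    hits = sum(model.predict(tokenize(t, stem)) == y for t, y in test)
    return hits / len(test)


if __name__ == "__main__":
    print("without stemming:", accuracy(TRAIN, TEST, stem=False))
    print("with stemming:   ", accuracy(TRAIN, TEST, stem=True))
```

Only the `stem` flag changes between the two runs, so any accuracy difference is attributable to stemming alone. Swapping `NaiveBayes` for an SVM or k-NN classifier would keep the same experimental structure.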
Related papers
- FASSILA: A Corpus for Algerian Dialect Fake News Detection and Sentiment Analysis [0.0]
The Algerian dialect (AD) faces challenges due to the absence of annotated corpora.
This study outlines the development process of a specialized corpus for Fake News (FN) detection and sentiment analysis (SA) in AD called FASSILA.
Date: 2024-11-07T10:39:10Z
- Strategies for Arabic Readability Modeling [9.976720880041688]
Automatic readability assessment is relevant to building NLP applications for education, content analysis, and accessibility.
We present a set of experimental results on Arabic readability assessment using a diverse range of approaches.
Date: 2024-07-03T11:54:11Z
- Arabic Sentiment Analysis with Noisy Deep Explainable Model [48.22321420680046]
This paper proposes an explainable sentiment classification framework for the Arabic language.
The proposed framework can explain specific predictions by training a local surrogate explainable model.
We carried out experiments on public benchmark Arabic SA datasets.
Date: 2023-09-24T19:26:53Z
- Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning [76.43827771613127]
In this paper, we investigate task-specific preferences between pairs of input texts as a new alternative form of auxiliary data annotation.
We propose a novel multi-task learning framework, called prefer-to-classify (P2C), which can enjoy the cooperative effect of learning both the given classification task and the auxiliary preferences.
Date: 2023-06-08T04:04:47Z
- Offensive Language Detection in Under-resourced Algerian Dialectal Arabic Language [0.0]
We focus on Algerian dialectal Arabic, one of the under-resourced languages.
Given the scarcity of work on this language, we have built a new corpus of more than 8.7k texts manually annotated as normal, abusive, or offensive.
Date: 2022-03-18T15:42:21Z
- Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information [55.75018546938499]
We propose the speaker embedding-aware neural diarization (SEND) method, which predicts the power set encoded labels.
Our method achieves a lower diarization error rate than target-speaker voice activity detection.
Date: 2021-11-28T12:51:04Z
- Effect of Word Embedding Variable Parameters on Arabic Sentiment Analysis Performance [0.0]
Social media such as Twitter and Facebook have led to a growing number of comments containing users' opinions.
This study discusses three parameters (window size, vector dimension, and negative sampling) for Arabic sentiment analysis.
Four binary classifiers (Logistic Regression, Decision Tree, Support Vector Machine and Naive Bayes) are used to detect sentiment.
Date: 2021-01-08T08:31:00Z
- Hierarchical Bi-Directional Self-Attention Networks for Paper Review Rating Recommendation [81.55533657694016]
We propose a Hierarchical bi-directional self-attention Network framework (HabNet) for paper review rating prediction and recommendation.
Specifically, we leverage the hierarchical structure of the paper reviews with three levels of encoders: sentence encoder (level one), intra-review encoder (level two), and inter-review encoder (level three).
We are able to identify useful predictors to make the final acceptance decision, as well as to help discover the inconsistency between numerical review ratings and text sentiment conveyed by reviewers.
Date: 2020-11-02T08:07:50Z
- Predicting the Humorousness of Tweets Using Gaussian Process Preference Learning [56.18809963342249]
We present a probabilistic approach that learns to rank and rate the humorousness of short texts by exploiting human preference judgments and automatically sourced linguistic annotations.
We report system performance for the campaign's two subtasks, humour detection and funniness score prediction, and discuss some issues arising from the conversion between the numeric scores used in the HAHA@IberLEF 2019 data and the pairwise judgment annotations required for our method.
Date: 2020-08-03T13:05:42Z
- SANA : Sentiment Analysis on Newspapers comments in Algeria [0.0]
Our work focuses on comments posted on Algerian newspaper websites.
Two corpora were used: SANA and OCA.
For classification, we adopt support vector machines, naive Bayes, and k-nearest neighbors.
Date: 2020-05-31T08:02:23Z
- Automatic Discourse Segmentation: an evaluation in French [65.00134288222509]
We describe some discursive segmentation methods as well as a preliminary evaluation of the segmentation quality.
We have developed three models based solely on resources simultaneously available in several languages: marker lists and statistical POS tagging.
Date: 2020-02-10T21:35:39Z
This list is automatically generated from the titles and abstracts of the papers in this site.