ARAACOM: ARAbic Algerian Corpus for Opinion Mining
- URL: http://arxiv.org/abs/2001.08010v1
- Date: Wed, 22 Jan 2020 13:45:34 GMT
- Title: ARAACOM: ARAbic Algerian Corpus for Opinion Mining
- Authors: Zitouni Abdelhafid (LIRE), Hichem Rahab (ICOSI, LIRE), Abdelhafid
Zitouni (LIRE), Mahieddine Djoudi (TECHNÉ - EA 6316)
- Abstract summary: Opinion mining on the web is becoming an increasingly attractive task.
In this paper, we propose our approach to opinion mining in Arabic Algerian newspapers.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is no longer necessary to make an enormous effort to distribute
forms to thousands of people, collect them, and then convert them into
electronic format in order to track people's opinions on a subject. Many web
sites can now reach a wide audience with far less effort. Most web
sites invite their visitors to leave feedback about how they feel about the
site or about events. This produces a large amount of data, which needs powerful
means to exploit. Opinion mining on the web is becoming an increasingly attractive task,
due to the growing need for individuals and societies to track the mood of
people toward various subjects of daily life (sports, politics,
television, ...). Much work in opinion mining has been done for Western
languages, especially English, whereas such work for the Arabic language is still very scarce.
In this paper, we propose our approach to opinion mining in Arabic Algerian
newspapers. CCS Concepts: • Information systems → Sentiment analysis;
• Computing methodologies → Natural language processing
Related papers
- ArMeme: Propagandistic Content in Arabic Memes [9.48177009736915]
We develop an Arabic memes dataset with manual annotations of propagandistic content.
We provide a comprehensive analysis aiming to develop computational tools for their detection.
arXiv Detail & Related papers (2024-06-06T09:56:49Z)
- Utilizing deep learning models for the identification of enhancers and super-enhancers based on genomic and epigenomic features [0.0]
This paper provides an extensive examination of a sizable dataset of English tweets focusing on nine widely recognized cryptocurrencies.
Our primary objective was to conduct a psycholinguistic and emotion analysis of social media content associated with these cryptocurrencies.
The study involved comparing linguistic characteristics across the diverse digital coins, shedding light on the distinctive linguistic patterns that emerge within each coin's community.
arXiv Detail & Related papers (2024-01-15T04:58:50Z)
- Hate speech detection in algerian dialect using deep learning [0.0]
We propose a complete approach for detecting hate speech on online Algerian messages.
This corpus contains more than 13.5K documents in Algerian dialect written in Arabic, labeled as hateful or non-hateful.
arXiv Detail & Related papers (2023-09-20T19:54:48Z)
- AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages [45.88640066767242]
Africa is home to over 2,000 languages from more than six language families and has the highest linguistic diversity among all continents.
Yet, there is little NLP research conducted on African languages. Crucial to enabling such research is the availability of high-quality annotated datasets.
In this paper, we introduce AfriSenti, a sentiment analysis benchmark that contains a total of >110,000 tweets in 14 African languages.
arXiv Detail & Related papers (2023-02-17T15:40:12Z)
- BERTuit: Understanding Spanish language in Twitter through a native transformer [70.77033762320572]
We present BERTuit, the largest transformer proposed so far for the Spanish language, pre-trained on a massive dataset of 230M Spanish tweets.
Our motivation is to provide a powerful resource to better understand Spanish Twitter and to be used on applications focused on this social network.
arXiv Detail & Related papers (2022-04-07T14:28:51Z)
- A New Generation of Perspective API: Efficient Multilingual Character-level Transformers [66.9176610388952]
We present the fundamentals behind the next version of the Perspective API from Google Jigsaw.
At the heart of the approach is a single multilingual token-free Charformer model.
We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings.
arXiv Detail & Related papers (2022-02-22T20:55:31Z)
- Sentiment Classification in Swahili Language Using Multilingual BERT [0.04297070083645048]
This study uses the current state-of-the-art model, multilingual BERT, to perform sentiment classification on Swahili datasets.
The data was created by extracting and annotating 8.2k reviews and comments on different social media platforms and the ISEAR emotion dataset.
The model was fine-tuned and achieved a best accuracy of 87.59%.
arXiv Detail & Related papers (2021-04-19T01:47:00Z)
- hBert + BiasCorp -- Fighting Racism on the Web [58.768804813646334]
We are releasing BiasCorp, a dataset containing 139,090 comments and news segments from three specific sources: Fox News, BreitbartNews and YouTube.
In this work, we present hBERT, where we modify certain layers of the pretrained BERT model with the new Hopfield Layer.
We are also releasing a JavaScript library and a Chrome Extension Application, to help developers make use of our trained model in web applications.
arXiv Detail & Related papers (2021-04-06T02:17:20Z)
- Factorization of Fact-Checks for Low Resource Indian Languages [44.94080515860928]
We introduce FactDRIL: the first large scale multilingual Fact-checking dataset for Regional Indian languages.
Our dataset consists of 9,058 samples in English and 5,155 samples in Hindi, with the remaining 8,222 samples distributed across various regional languages.
We expect this dataset will be a valuable resource and serve as a starting point to fight proliferation of fake news in low resource languages.
arXiv Detail & Related papers (2021-02-23T16:47:41Z)
- Mere account mein kitna balance hai? -- On building voice enabled Banking Services for Multilingual Communities [47.955173277834795]
We present our initial exploratory work towards building voice enabled banking services for multilingual societies.
Code Mixing is a phenomenon where lexical items from one language are embedded in the utterance of another.
We investigate various training strategies for building speech based intent recognition systems.
arXiv Detail & Related papers (2020-10-09T01:20:09Z)
- SANA : Sentiment Analysis on Newspapers comments in Algeria [0.0]
Our work focuses on comments posted on Algerian newspaper websites.
Two corpora were used: SANA and OCA.
For classification, we adopt support vector machines, naive Bayes, and k-nearest neighbors.
arXiv Detail & Related papers (2020-05-31T08:02:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.