Towards A Sentiment Analyzer for Low-Resource Languages
- URL: http://arxiv.org/abs/2011.06382v1
- Date: Thu, 12 Nov 2020 13:50:00 GMT
- Title: Towards A Sentiment Analyzer for Low-Resource Languages
- Authors: Dian Indriani, Arbi Haza Nasution, Winda Monika and Salhazan Nasution
- Abstract summary: This research analyses user sentiment towards a particular trending topic that was being actively and widely discussed at the time.
We use the hashtag #kpujangancurang, which was a trending topic during the 2019 Indonesian presidential election.
The research uses the RapidMiner tool to obtain the Twitter data and compares Naive Bayes, K-Nearest Neighbor, Decision Tree, and Multi-Layer Perceptron classification methods for classifying the sentiment of the tweets.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Twitter is one of the most influential social media platforms, with
millions of active users. It is commonly used for microblogging, which allows
users to share messages, ideas, thoughts and more. As a result, millions of
interactions, such as short messages or tweets, flow among Twitter users
discussing topics happening worldwide. This research aims to analyse the
sentiment of users towards a particular trending topic that was being actively
and widely discussed at the time. We chose the hashtag #kpujangancurang, which
was a trending topic during the 2019 Indonesian presidential election. We use
the hashtag to obtain a set of data from Twitter and investigate the positive
or negative sentiment expressed in users' tweets. This research uses the
RapidMiner tool to obtain the Twitter data and compares Naive Bayes, K-Nearest
Neighbor, Decision Tree, and Multi-Layer Perceptron classification methods for
classifying the sentiment of the Twitter data. The experiment uses 200 labeled
data items in total. Overall, Naive Bayes and Multi-Layer Perceptron
classification outperformed the other two methods across 11 experiments with
different training-testing split sizes. These two classifiers have the
potential to be used in creating a sentiment analyzer for low-resource
languages with small corpora.
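The evaluation idea in the abstract (train a classifier on a small labeled set, then measure accuracy at several training-testing split sizes) can be sketched as follows. This is a minimal illustration, not the authors' RapidMiner workflow: it implements only a multinomial Naive Bayes classifier in plain Python, the example texts and labels are made up for the sketch, and the split fractions are assumptions, since the abstract does not state which 11 splits were used. All names (`NaiveBayes`, `accuracy_for_split`, `TEXTS`, `LABELS`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer; real tweets would need more cleanup."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy_for_split(texts, labels, train_fraction):
    """Train on the first part of the data and test on the remainder."""
    cut = int(len(texts) * train_fraction)
    model = NaiveBayes().fit(texts[:cut], labels[:cut])
    test = list(zip(texts[cut:], labels[cut:]))
    correct = sum(model.predict(t) == y for t, y in test)
    return correct / len(test)


# Made-up toy examples for illustration only (NOT the paper's dataset).
TEXTS = [
    "fair election good result", "fraud election bad result",
    "good honest count", "bad unfair count",
    "honest fair process", "fraud unfair process",
    "good fair count", "bad fraud count",
]
LABELS = ["positive", "negative"] * 4

if __name__ == "__main__":
    # Assumed split fractions; the paper's actual 11 splits are not listed.
    for fraction in (0.5, 0.75):
        print(fraction, accuracy_for_split(TEXTS, LABELS, fraction))
```

With a real corpus, the data would normally be shuffled before splitting, and the same loop could be repeated for each classifier being compared.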
Related papers
- ViralBERT: A User Focused BERT-Based Approach to Virality Prediction [11.992815669875924]
We propose ViralBERT, which can be used to predict the virality of tweets using content- and user-based features.
We employ a method of concatenating numerical features such as hashtags and follower numbers to tweet text, and utilise two BERT modules.
We collect a dataset of 330k tweets to train ViralBERT and validate the efficacy of our model using baselines from current studies in this field.
arXiv Detail & Related papers (2022-05-17T21:40:24Z)
- Manipulating Twitter Through Deletions [64.33261764633504]
Research into influence campaigns on Twitter has mostly relied on identifying malicious activities from tweets obtained via public APIs.
Here, we provide the first exhaustive, large-scale analysis of anomalous deletion patterns involving more than a billion deletions by over 11 million accounts.
We find that a small fraction of accounts delete a large number of tweets daily.
First, limits on tweet volume are circumvented, allowing certain accounts to flood the network with over 26 thousand daily tweets.
Second, coordinated networks of accounts engage in repetitive likes and unlikes of content that is eventually deleted, which can manipulate ranking algorithms.
arXiv Detail & Related papers (2022-03-25T20:07:08Z)
- Sentiment Analysis and Sarcasm Detection of Indian General Election Tweets [0.0]
Social media usage has increased to an all-time high level in today's digital world.
Analysing the sentiments and opinions of the common public is very important for both the government and the business people.
In this paper, we have worked towards analysing the sentiments of the people of India during the Lok Sabha election 2019 using Twitter data.
arXiv Detail & Related papers (2022-01-03T17:30:00Z)
- Identification of Twitter Bots based on an Explainable ML Framework: the US 2020 Elections Case Study [72.61531092316092]
This paper focuses on the design of a novel system for identifying Twitter bots based on labeled Twitter data.
A supervised machine learning (ML) framework is adopted using an Extreme Gradient Boosting (XGBoost) algorithm.
Our study also deploys Shapley Additive Explanations (SHAP) for explaining the ML model predictions.
arXiv Detail & Related papers (2021-12-08T14:12:24Z)
- News consumption and social media regulations policy [70.31753171707005]
We analyze two social media that enforced opposite moderation methods, Twitter and Gab, to assess the interplay between news consumption and content regulation.
Our results show that the presence of moderation pursued by Twitter produces a significant reduction of questionable content.
The lack of clear regulation on Gab results in the tendency of the user to engage with both types of content, showing a slight preference for the questionable ones which may account for a dissing/endorsement behavior.
arXiv Detail & Related papers (2021-06-07T19:26:32Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as the informal, and noisy linguistic style, remain challenging to many natural language processing (NLP) tasks.
This study fulfils an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- Understanding Information Spreading Mechanisms During COVID-19 Pandemic by Analyzing the Impact of Tweet Text and User Features for Retweet Prediction [6.658785818853953]
COVID-19 has affected the world economy and the daily life routine of almost everyone.
Social media platforms enable users to share information with other users who can reshare this information.
We propose two CNN and RNN based models and evaluate the performance of these models on a publicly available TweetsCOV19 dataset.
arXiv Detail & Related papers (2021-05-26T15:55:58Z)
- Sentiment Analysis on Social Media Content [0.0]
The aim of this paper is to present a model that can perform sentiment analysis of real data collected from Twitter.
Data on Twitter is highly unstructured, which makes it difficult to analyze.
Our proposed model differs from prior work in this field because it combines supervised and unsupervised machine learning algorithms.
arXiv Detail & Related papers (2020-07-04T17:03:30Z)
- Echo Chambers on Social Media: A comparative analysis [64.2256216637683]
We introduce an operational definition of echo chambers and perform a massive comparative analysis on 1B pieces of content produced by 1M users on four social media platforms.
We infer the leaning of users about controversial topics and reconstruct their interaction networks by analyzing different features.
We find support for the hypothesis that platforms implementing news feed algorithms like Facebook may elicit the emergence of echo-chambers.
arXiv Detail & Related papers (2020-04-20T20:00:27Z)
- Forensic Authorship Analysis of Microblogging Texts Using N-Grams and Stylometric Features [63.48764893706088]
This work aims at identifying authors of tweet messages, which are limited to 280 characters.
We use for our experiments a self-captured database of 40 users, with 120 to 200 tweets per user.
Results using this small set are promising, with the different features providing a classification accuracy between 92% and 98.5%.
arXiv Detail & Related papers (2020-03-24T19:32:11Z)
- Investigating Classification Techniques with Feature Selection For Intention Mining From Twitter Feed [0.0]
Micro-blogging service Twitter has more than 200 million registered users who exchange more than 65 million posts per day.
Most of the tweets are written informally and often in slang language.
This paper investigates the problem of selecting features that affect extracting user's intention from Twitter feeds.
arXiv Detail & Related papers (2020-01-22T11:55:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.