Sentiment Analysis for Roman Urdu Text over Social Media, a Comparative
Study
- URL: http://arxiv.org/abs/2010.16408v1
- Date: Mon, 5 Oct 2020 16:19:00 GMT
- Title: Sentiment Analysis for Roman Urdu Text over Social Media, a Comparative
Study
- Authors: Irfan Qutab, Khawar Iqbal Malik, Hira Arooj
- Abstract summary: Roman Urdu is one of the most dominant languages on social networks in Pakistan and India.
In this article we addressed the prior concepts and strategies used to examine the sentiment of Roman Urdu text.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the present century, data volume is increasing enormously. The data may
take the form of images, text, voice, and video. One factor in this huge growth of
data is the use of social media, where everyone posts data on a daily basis
while chatting, exchanging information, and uploading personal and
official credentials. Sentiment research seeks to uncover abstract knowledge
in published texts in which users communicate their emotions and thoughts about
shared content, including blogs, news, and social networks. Roman Urdu is
one of the most dominant languages on social networks in Pakistan and India. Roman
Urdu is among the varieties of Urdu, the world's third-largest language, yet
not enough work has been done on it. In this article we
addressed the prior concepts and strategies used to examine the sentiment of
Roman Urdu text and reported their results as well.
Related papers
- From Statistical Methods to Pre-Trained Models; A Survey on Automatic Speech Recognition for Resource Scarce Urdu Language [41.272055304311905]
This paper focuses on the resource-constrained Urdu language, which is widely spoken across South Asian nations.
It outlines current research trends, technological advancements, and potential directions for future studies in Urdu ASR.
arXiv Detail & Related papers (2024-11-20T17:39:56Z)
- Reddit is all you need: Authorship profiling for Romanian [49.1574468325115]
Authorship profiling is the process of identifying an author's characteristics based on their writings.
In this paper, we introduce a corpus of short texts in the Romanian language, annotated with certain author characteristic keywords.
arXiv Detail & Related papers (2024-10-13T16:27:31Z)
- Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering [50.52792174648067]
This initiative seeks to bridge the gap between textual and visual comprehension.
We propose a new multi-task Urdu scene text dataset comprising over 1000 natural scene images.
We provide fine-grained annotations for text instances, addressing the limitations of previous datasets.
arXiv Detail & Related papers (2024-05-21T06:48:26Z)
- Urdu Speech and Text Based Sentiment Analyzer [1.4630964945453113]
This work presented a new multi-class Urdu dataset based on user evaluations.
Our proposed dataset includes 10,000 reviews that have been carefully classified into two categories by human experts: positive, negative.
Five different lexicon- and rule-based algorithms, including Naive Bayes, Stanza, TextBlob, VADER, and Flair, are employed, and the experimental results show that Flair, with an accuracy of 70%, outperforms the other tested algorithms.
arXiv Detail & Related papers (2022-07-19T10:11:22Z)
- A Survey on sentiment analysis in Persian: A Comprehensive System Perspective Covering Challenges and Advances in Resources, and Methods [0.0]
The main target of this paper is to provide a comprehensive literature survey for state-of-the-art advances in Persian sentiment analysis.
A detailed survey of the sentiment analysis methods used for Persian texts is presented, and previous relevant works on Persian Language are discussed.
According to the state-of-the-art development of English sentiment analysis, some issues and challenges not being addressed in Persian texts are listed.
arXiv Detail & Related papers (2021-04-30T04:31:21Z)
- Factorization of Fact-Checks for Low Resource Indian Languages [44.94080515860928]
We introduce FactDRIL: the first large scale multilingual Fact-checking dataset for Regional Indian languages.
Our dataset consists of 9,058 samples in English, 5,155 samples in Hindi, and the remaining 8,222 samples distributed across various regional languages.
We expect this dataset will be a valuable resource and serve as a starting point to fight the proliferation of fake news in low-resource languages.
arXiv Detail & Related papers (2021-02-23T16:47:41Z)
- BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation [42.34923623457615]
The Bias in Open-Ended Language Generation (BOLD) dataset consists of 23,679 English text generation prompts.
An examination of text generated from three popular language models reveals that the majority of these models exhibit a larger social bias than human-written Wikipedia text.
arXiv Detail & Related papers (2021-01-27T22:07:03Z)
- Named Entity Recognition for Social Media Texts with Semantic Augmentation [70.44281443975554]
Existing approaches for named entity recognition suffer from data sparsity problems when conducted on short and informal texts.
We propose a neural-based approach to NER for social media texts where both local (from running text) and augmented semantics are taken into account.
arXiv Detail & Related papers (2020-10-29T10:06:46Z)
- PoliWAM: An Exploration of a Large Scale Corpus of Political Discussions on WhatsApp Messenger [1.2301855531996841]
WhatsApp Messenger is one of the most popular channels for spreading information with a current reach of more than 180 countries and 2 billion people.
In the recent past, several countries have witnessed its effectiveness and influence in political and social campaigns.
We observe a high surge in information and propaganda flow during election campaigning.
arXiv Detail & Related papers (2020-10-26T00:35:57Z)
- Characterising User Content on a Multi-lingual Social Network [9.13241181020543]
We present our characterisation of a multilingual social network in India called ShareChat.
We collect an exhaustive dataset across 72 weeks before and during the Indian general elections of 2019 across 14 languages.
We find that Telugu, Malayalam, Tamil and Kannada languages tend to be dominant in soliciting political images.
arXiv Detail & Related papers (2020-04-23T22:25:48Z)
- ParsEL 1.0: Unsupervised Entity Linking in Persian Social Media Texts [6.866104126509981]
A large portion of social media data is natural language text.
Recently, FarsBase, a Persian knowledge graph, has been introduced containing almost half a million entities.
In this paper, we propose an unsupervised Persian Entity Linking system.
arXiv Detail & Related papers (2020-04-22T19:34:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.