Urdu Speech and Text Based Sentiment Analyzer
- URL: http://arxiv.org/abs/2207.09163v1
- Date: Tue, 19 Jul 2022 10:11:22 GMT
- Title: Urdu Speech and Text Based Sentiment Analyzer
- Authors: Waqar Ahmad, Maryam Edalati
- Abstract summary: This work presents a new multi-class Urdu dataset based on user evaluations.
Our proposed dataset includes 10,000 reviews that have been carefully classified into two categories by human experts: positive and negative.
Five different lexicon- and rule-based algorithms, including Naive Bayes, Stanza, TextBlob, VADER, and Flair, are employed, and the experimental results show that Flair, with an accuracy of 70%, outperforms the other tested algorithms.
- Score: 1.4630964945453113
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Discovering what other people think has always been a key aspect of our
information-gathering strategy. People can now actively utilize information
technology to seek out and comprehend the ideas of others, thanks to the
increased availability and popularity of opinion-rich resources such as online
review sites and personal blogs. Because it is central to understanding
people's opinions, sentiment analysis (SA) is an important task.
Existing research, however, focuses primarily on the English
language, with little work devoted to low-resource languages.
For sentiment analysis, this work presents a new multi-class Urdu dataset
based on user evaluations. The data was collected from Twitter.
Our proposed dataset includes 10,000 reviews that have been carefully
classified into two categories by human experts: positive and negative. The
primary purpose of this research is to construct a manually annotated dataset
for Urdu sentiment analysis and to establish a baseline result. Five
different lexicon- and rule-based algorithms, including Naive Bayes, Stanza,
TextBlob, VADER, and Flair, are employed, and the experimental results show that
Flair, with an accuracy of 70%, outperforms the other tested algorithms.
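The baseline setup described in the abstract (train a classifier on positive/negative labels, then report accuracy) can be sketched as follows. This is a minimal illustration only, not the authors' code: the multinomial Naive Bayes with add-one smoothing, the whitespace tokenizer, the function names `train_nb`, `predict_nb`, and `accuracy`, and the handful of Urdu sentences are all assumptions invented for the example. The paper's actual preprocessing, features, and data split are not given in the abstract.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Collect per-class document and word counts for multinomial Naive Bayes."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.split()  # assumption: plain whitespace tokenization
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in text.split():
            if token not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the gold label."""
    correct = sum(predict_nb(model, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Invented toy data, for illustration only (not taken from the paper's dataset).
train_docs = ["فلم بہت اچھی ہے", "یہ فلم بری ہے", "کھانا بہت اچھا تھا", "سروس بہت بری تھی"]
train_labels = ["positive", "negative", "positive", "negative"]
test_docs = ["اچھی فلم", "بری سروس"]
test_labels = ["positive", "negative"]

model = train_nb(train_docs, train_labels)
print(accuracy(model, test_docs, test_labels))
```

On a real dataset the same `accuracy` function would be applied to each tested system's predictions on a held-out split, which is how a comparison such as "Flair reaches 70%" is typically produced.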
Related papers
- Lexicon-Based Sentiment Analysis on Text Polarities with Evaluation of Classification Models [1.342834401139078]
This work uses a lexicon-based method to perform sentiment analysis and shows an evaluation of classification models trained over textual data.
The lexicon-based methods identify the intensity of emotion and subjectivity at word levels.
This work is based on a multi-class problem of text being labeled as positive, negative, or neutral.
arXiv Detail & Related papers (2024-09-19T15:31:12Z)
- UltraFeedback: Boosting Language Models with Scaled AI Feedback [99.4633351133207]
We present UltraFeedback, a large-scale, high-quality, and diversified AI feedback dataset.
Our work validates the effectiveness of scaled AI feedback data in constructing strong open-source chat language models.
arXiv Detail & Related papers (2023-10-02T17:40:01Z)
- Constructing Colloquial Dataset for Persian Sentiment Analysis of Social Microblogs [0.0]
This paper first constructs a user opinion dataset called ITRC-Opinion in a collaborative environment and insource way.
Our dataset contains 60,000 informal and colloquial Persian texts from social microblogs such as Twitter and Instagram.
Second, this study proposes a new architecture based on the convolutional neural network (CNN) model for more effective sentiment analysis of colloquial text in social microblog posts.
arXiv Detail & Related papers (2023-06-22T05:51:22Z)
- Leveraging ChatGPT As Text Annotation Tool For Sentiment Analysis [6.596002578395151]
ChatGPT is a new product of OpenAI and has emerged as the most popular AI product.
This study explores the use of ChatGPT as a tool for data labeling for different sentiment analysis tasks.
arXiv Detail & Related papers (2023-06-18T12:20:42Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as the informal and noisy linguistic style, remain challenging to many natural language processing (NLP) tasks.
This study carries out an assessment of existing language models in distinguishing the sentiment expressed in tweets by using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- Subsentence Extraction from Text Using Coverage-Based Deep Learning Language Models [3.3461339691835277]
We propose a coverage-based sentiment and subsentence extraction system.
The predicted subsentence consists of auxiliary information expressing a sentiment.
Our approach outperforms the state-of-the-art approaches by a large margin in subsentence prediction.
arXiv Detail & Related papers (2021-04-20T06:24:49Z)
- Sentiment Analysis for YouTube Comments in Roman Urdu [0.0]
In Pakistan, a huge amount of data is written in Roman Urdu and is scattered across social sites including Twitter, YouTube, Facebook and similar applications.
In this study, the dataset is gathered from YouTube comments.
The dataset contains the comments of people over different Pakistani dramas and TV shows.
arXiv Detail & Related papers (2021-02-19T18:15:52Z)
- Vyaktitv: A Multimodal Peer-to-Peer Hindi Conversations based Dataset for Personality Assessment [50.15466026089435]
We present a novel peer-to-peer Hindi conversation dataset, Vyaktitv.
It consists of high-quality audio and video recordings of the participants, with Hinglish textual transcriptions for each conversation.
The dataset also contains a rich set of socio-demographic features, like income, cultural orientation, amongst several others, for all the participants.
arXiv Detail & Related papers (2020-08-31T17:44:28Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
- Information-Theoretic Probing for Linguistic Structure [74.04862204427944]
We propose an information-theoretic operationalization of probing as estimating mutual information.
We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research.
arXiv Detail & Related papers (2020-04-07T01:06:36Z)
- ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension [53.037401638264235]
We present an evaluation server, ORB, that reports performance on seven diverse reading comprehension datasets.
The evaluation server places no restrictions on how models are trained, so it is a suitable test bed for exploring training paradigms and representation learning.
arXiv Detail & Related papers (2019-12-29T07:27:23Z)