Sentiment Analysis for YouTube Comments in Roman Urdu
- URL: http://arxiv.org/abs/2102.10075v1
- Date: Fri, 19 Feb 2021 18:15:52 GMT
- Title: Sentiment Analysis for YouTube Comments in Roman Urdu
- Authors: Tooba Tehreem, Hira Tahir (National University of Computer and Emerging
Sciences, Islamabad, Pakistan)
- Abstract summary: In Pakistan, a huge amount of data is written in Roman Urdu and is scattered across social sites including Twitter, YouTube, Facebook and similar applications.
In this study, the dataset is gathered from YouTube comments.
The dataset contains people's comments on different Pakistani dramas and TV shows.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Sentiment analysis is a vast area in the machine learning domain. A lot of
work has been done on English-language datasets and their analysis. In
Pakistan, a huge amount of data is written in Roman Urdu and is scattered
across social sites including Twitter, YouTube, Facebook and similar
applications. In this study, the dataset is gathered from YouTube comments.
The dataset contains people's comments on different Pakistani dramas and TV
shows. It is labeled for multi-class classification, grouping the comments
into positive, negative and neutral sentiment. In this study, a comparative
analysis is done for five supervised learning algorithms: linear regression,
SVM, KNN, multilayer perceptron and Naïve Bayes classifier. Accuracy, recall,
precision and F-measure are used to measure performance. Results show that the
accuracy of SVM is 64 percent, which is better than the rest of the list.
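
The four performance measures named in the abstract can be illustrated with a minimal sketch for the three-class setting (positive, negative, neutral). The abstract does not say how per-class scores are averaged, so macro-averaging is an assumption here, and the function name `evaluate` and the toy labels are hypothetical, not taken from the paper.

```python
from collections import Counter

# Class names follow the abstract; everything else is illustrative.
LABELS = ("positive", "negative", "neutral")


def evaluate(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall and F-measure.

    Macro-averaging is an assumption: the abstract does not state
    which averaging scheme the authors used.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gets a false positive
            fn[truth] += 1   # true class gets a false negative

    def safe_div(num, den):
        # A class that never appears contributes 0.0 instead of raising.
        return num / den if den else 0.0

    precisions = [safe_div(tp[c], tp[c] + fp[c]) for c in labels]
    recalls = [safe_div(tp[c], tp[c] + fn[c]) for c in labels]
    f_measures = [safe_div(2 * p * r, p + r) for p, r in zip(precisions, recalls)]

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_measure": sum(f_measures) / n,
    }


# Toy labels for illustration only; these are not data from the paper.
y_true = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
y_pred = ["positive", "negative", "negative", "negative", "neutral", "positive"]
metrics = evaluate(y_true, y_pred)
print(metrics)
```

With imbalanced classes, macro-averaged scores can differ noticeably from accuracy, which is one reason to report all four measures when comparing classifiers.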
Related papers
- IndiBias: A Benchmark Dataset to Measure Social Biases in Language Models for Indian Context [32.48196952339581]
We introduce IndiBias, a benchmark dataset for evaluating social biases in the Indian context.
The included bias dimensions encompass gender, religion, caste, age, region, physical appearance, and occupation.
Our dataset contains 800 sentence pairs and 300s for bias measurement across different demographics.
arXiv Detail & Related papers (2024-03-29T12:32:06Z)
- Into the LAIONs Den: Investigating Hate in Multimodal Datasets [67.21783778038645]
This paper investigates the effect of scaling datasets on hateful content through a comparative audit of two datasets: LAION-400M and LAION-2B.
We found that hate content increased by nearly 12% with dataset scale, measured both qualitatively and quantitatively.
We also found that filtering dataset contents based on Not Safe For Work (NSFW) values calculated based on images alone does not exclude all the harmful content in alt-text.
arXiv Detail & Related papers (2023-11-06T19:00:05Z)
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
- Constructing Colloquial Dataset for Persian Sentiment Analysis of Social Microblogs [0.0]
This paper first constructs a user opinion dataset called ITRC-Opinion in a collaborative environment and insource way.
Our dataset contains 60,000 informal and colloquial Persian texts from social microblogs such as Twitter and Instagram.
Second, this study proposes a new architecture based on the convolutional neural network (CNN) model for more effective sentiment analysis of colloquial text in social microblog posts.
arXiv Detail & Related papers (2023-06-22T05:51:22Z)
- SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization Evaluation [52.186343500576214]
We introduce SEAHORSE, a dataset for multilingual, multifaceted summarization evaluation.
SEAHORSE consists of 96K summaries with human ratings along 6 dimensions of text quality.
We show that metrics trained with SEAHORSE achieve strong performance on the out-of-domain meta-evaluation benchmarks TRUE and mFACE.
arXiv Detail & Related papers (2023-05-22T16:25:07Z)
- Urdu Speech and Text Based Sentiment Analyzer [1.4630964945453113]
This work presented a new multi-class Urdu dataset based on user evaluations.
Our proposed dataset includes 10,000 reviews that have been carefully classified into two categories by human experts: positive, negative.
Five different lexicon- and rule-based algorithms including Naive Bayes, Stanza, Textblob, Vader, and Flair are employed, and the experimental results show that Flair, with an accuracy of 70%, outperforms the other tested algorithms.
arXiv Detail & Related papers (2022-07-19T10:11:22Z)
- L3CubeMahaSent: A Marathi Tweet-based Sentiment Analysis Dataset [0.0]
This paper presents the first major publicly available Marathi Sentiment Analysis dataset - L3CubeMahaSent.
It is curated using tweets extracted from various Maharashtrian personalities' Twitter accounts.
Our dataset consists of 16,000 distinct tweets classified in three broad classes viz. positive, negative, and neutral.
arXiv Detail & Related papers (2021-03-21T14:22:13Z)
- Hate Speech detection in the Bengali language: A dataset and its baseline evaluation [0.8793721044482612]
This paper presents a new dataset of 30,000 user comments tagged by crowdsourcing and verified by experts.
All the comments are collected from YouTube and Facebook comment sections and classified into seven categories.
A total of 50 annotators annotated each comment three times and the majority vote was taken as the final annotation.
arXiv Detail & Related papers (2020-12-17T15:53:54Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
- Trawling for Trolling: A Dataset [56.1778095945542]
We present a dataset that models trolling as a subcategory of offensive content.
The dataset has 12,490 samples, split across 5 classes: Normal, Profanity, Trolling, Derogatory and Hate Speech.
arXiv Detail & Related papers (2020-08-02T17:23:55Z)
- ORB: An Open Reading Benchmark for Comprehensive Evaluation of Machine Reading Comprehension [53.037401638264235]
We present an evaluation server, ORB, that reports performance on seven diverse reading comprehension datasets.
The evaluation server places no restrictions on how models are trained, so it is a suitable test bed for exploring training paradigms and representation learning.
arXiv Detail & Related papers (2019-12-29T07:27:23Z)