Sentiment Analysis of Code-Mixed Social Media Text (Hinglish)
- URL: http://arxiv.org/abs/2102.12149v1
- Date: Wed, 24 Feb 2021 09:15:34 GMT
- Title: Sentiment Analysis of Code-Mixed Social Media Text (Hinglish)
- Authors: Gaurav Singh
- Abstract summary: Various stages involved in performing the sentiment analysis were data consolidation, data cleaning, data transformation and modelling.
The models were created using various machine learning algorithms such as SVM, KNN, Decision Trees, Random Forests, Naive Bayes, Logistic Regression, and ensemble voting classifiers.
- Score: 4.081440927534578
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper discusses the results obtained for different techniques applied
for performing the sentiment analysis of social media (Twitter) code-mixed text
written in Hinglish. The various stages involved in performing the sentiment
analysis were data consolidation, data cleaning, data transformation and
modelling. Various data cleaning techniques were applied: the data was cleaned in
five iterations, and the results of the experiments were noted after each
iteration. Data was transformed using count vectorizer, one-hot vectorizer,
tf-idf vectorizer, doc2vec, word2vec and fasttext embeddings. The models were
created using various machine learning algorithms such as SVM, KNN, Decision
Trees, Random Forests, Naive Bayes, Logistic Regression, and ensemble voting
classifiers. The data was obtained from a task on the CodaLab competition website,
listed as Task 9 on the SemEval-2020 competition website. The models
were evaluated using the macro F1-score. The best F1-score of 69.07
was achieved using an ensemble voting classifier.
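The two ideas the abstract closes on, ensemble voting and the macro F1-score, can be illustrated with a minimal pure-Python sketch. This is not the author's code: the labels, the three model outputs, and the helper names `majority_vote` and `macro_f1` are hypothetical, and hard (majority) voting is assumed because the abstract does not say which voting scheme was used. The metric follows the usual definition (per-class F1, averaged with equal weight per class), which is what scikit-learn's `f1_score(..., average="macro")` computes.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Hard voting: for each sample, keep the label most models predicted.

    Ties go to the label predicted by the earliest model in the list.
    """
    voted = []
    for sample_preds in zip(*predictions_per_model):
        counts = Counter(sample_preds)
        best = max(counts.values())
        voted.append(next(p for p in sample_preds if counts[p] == best))
    return voted


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (made up for illustration, not from the SemEval-2020 Task 9 set).
y_true = ["positive", "negative", "neutral", "positive", "negative", "neutral"]
model_a = ["positive", "negative", "neutral", "neutral", "negative", "positive"]
model_b = ["positive", "neutral", "neutral", "positive", "negative", "neutral"]
model_c = ["negative", "negative", "positive", "positive", "negative", "neutral"]

ensemble = majority_vote([model_a, model_b, model_c])

for name, preds in [("model_a", model_a), ("model_b", model_b),
                    ("model_c", model_c), ("ensemble", ensemble)]:
    print(f"{name}: macro F1 = {macro_f1(y_true, preds):.4f}")
```

In this toy setup each model errs on different samples, so the vote corrects every individual mistake; real classifiers make correlated errors and the gain is usually much smaller.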
Related papers
- Unsupervised Sentiment Analysis of Plastic Surgery Social Media Posts [91.3755431537592]
The massive collection of user posts across social media platforms is primarily untapped for artificial intelligence (AI) use cases.
Natural language processing (NLP) is a subfield of AI that leverages bodies of documents, known as corpora, to train computers in human-like language understanding.
This study demonstrates that the applied results of unsupervised analysis allow a computer to predict either negative, positive, or neutral user sentiment towards plastic surgery.
arXiv Detail & Related papers (2023-07-05T20:16:20Z)
- MSeg: A Composite Dataset for Multi-domain Semantic Segmentation [100.17755160696939]
We present MSeg, a composite dataset that unifies semantic segmentation datasets from different domains.
We reconcile the taxonomies and bring the pixel-level annotations into alignment by relabeling more than 220,000 object masks in more than 80,000 images.
A model trained on MSeg ranks first on the WildDash-v1 leaderboard for robust semantic segmentation, with no exposure to WildDash data during training.
arXiv Detail & Related papers (2021-12-27T16:16:35Z)
- Benchmarking Multimodal AutoML for Tabular Data with Text Fields [83.43249184357053]
We assemble 18 multimodal data tables that each contain some text fields.
Our benchmark enables researchers to evaluate their own methods for supervised learning with numeric, categorical, and text features.
arXiv Detail & Related papers (2021-11-04T09:29:16Z)
- Detecting Handwritten Mathematical Terms with Sensor Based Data [71.84852429039881]
We propose a solution to the UbiComp 2021 Challenge by Stabilo in which handwritten mathematical terms are supposed to be automatically classified.
The input data set contains data of different writers, with label strings constructed from a total of 15 different possible characters.
arXiv Detail & Related papers (2021-09-12T19:33:34Z)
- KBCNMUJAL@HASOC-Dravidian-CodeMix-FIRE2020: Using Machine Learning for Detection of Hate Speech and Offensive Code-Mixed Social Media text [1.0499611180329804]
This paper describes the system submitted by our team, KBCNMUJAL, for Task 2 of the shared task Hate Speech and Offensive Content Identification in Indo-European languages.
Datasets for two Dravidian languages, viz. Malayalam and Tamil, of 4000 observations each, were shared by the HASOC organizers.
The best performing classification models developed for both languages are applied on test datasets.
arXiv Detail & Related papers (2021-02-19T11:08:02Z)
- Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application [63.10266319378212]
We propose a method for measuring complex variables on a continuous, interval spectrum by combining supervised deep learning with the Constructing Measures approach to faceted Rasch item response theory (IRT).
We demonstrate this new method on a dataset of 50,000 social media comments sourced from YouTube, Twitter, and Reddit and labeled by 11,000 U.S.-based Amazon Mechanical Turk workers.
arXiv Detail & Related papers (2020-09-22T02:15:05Z)
- Decision Tree J48 at SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text (Hinglish) [3.007778295477907]
This system uses Weka as a tool for providing the classifier for the classification of tweets.
Python is used for loading the data from the provided files and cleaning it.
The system performance was assessed using the official competition evaluation metric F1-score.
arXiv Detail & Related papers (2020-08-26T06:30:43Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
- Deep Learning Brasil -- NLP at SemEval-2020 Task 9: Overview of Sentiment Analysis of Code-Mixed Tweets [0.2294014185517203]
In this paper, we describe a methodology to predict sentiment in code-mixed tweets (Hindi-English).
Our team called verissimo.manoel in CodaLab developed an approach based on an ensemble of four models.
The final classification algorithm was an ensemble of some predictions of all softmax values from these four models.
arXiv Detail & Related papers (2020-07-28T16:42:41Z)
- Voice@SRIB at SemEval-2020 Task 9 and 12: Stacked Ensembling method for Sentiment and Offensiveness detection in Social Media [2.9008108937701333]
We train embeddings and ensembling methods for the Sentimix and OffensEval tasks.
We evaluate our models on macro F1-score, precision, accuracy, and recall on the datasets.
arXiv Detail & Related papers (2020-07-20T11:54:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.