Decision Tree J48 at SemEval-2020 Task 9: Sentiment Analysis for
Code-Mixed Social Media Text (Hinglish)
- URL: http://arxiv.org/abs/2008.11398v1
- Date: Wed, 26 Aug 2020 06:30:43 GMT
- Title: Decision Tree J48 at SemEval-2020 Task 9: Sentiment Analysis for
Code-Mixed Social Media Text (Hinglish)
- Authors: Gaurav Singh
- Abstract summary: This system uses Weka to provide the classifier for classifying tweets.
Python is used to load the data from the provided files and clean it.
The system performance was assessed using the official competition evaluation metric, F1-score.
- Score: 3.007778295477907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper describes the design of a system submitted to
SemEval-2020 Task 9, which required sentiment analysis of code-mixed Hindi and
English text. The system uses Weka to provide the classifier for the tweets,
and Python is used to load the data from the provided files and clean
it. Only part of the training data was provided to the system for classifying
the tweets in the test data set on which the system was evaluated. The
system performance was assessed using the official competition evaluation
metric, F1-score. The classifier was trained on two sets of training data, which
resulted in F1 scores of 0.4972 and 0.5316.
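As a rough illustration of the evaluation metric mentioned in the abstract, the sketch below computes a per-class F1-score and averages it over the classes. The label set (positive, negative, neutral), the macro-averaging, and the function names are assumptions made for this example; the abstract does not state how the official F1-score was averaged, and this is not the author's code.

```python
# Assumed three-way sentiment label set; not stated in the abstract.
LABELS = ("positive", "negative", "neutral")


def per_class_f1(gold, pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores (macro-averaging, assumed)."""
    return sum(per_class_f1(gold, pred, label) for label in labels) / len(labels)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the task.
    gold = ["positive", "negative", "neutral", "positive"]
    pred = ["positive", "neutral", "neutral", "negative"]
    print(round(macro_f1(gold, pred), 4))
```

With the toy labels above, the positive and neutral classes each score 2/3 and the negative class scores 0, so the macro average is 4/9.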
Related papers
- SEAHORSE: A Multilingual, Multifaceted Dataset for Summarization
Evaluation [52.186343500576214]
We introduce SEAHORSE, a dataset for multilingual, multifaceted summarization evaluation.
SEAHORSE consists of 96K summaries with human ratings along 6 dimensions of text quality.
We show that metrics trained with SEAHORSE achieve strong performance on the out-of-domain meta-evaluation benchmarks TRUE and mFACE.
arXiv Detail & Related papers (2023-05-22T16:25:07Z)
- Using Rater and System Metadata to Explain Variance in the VoiceMOS
Challenge 2022 Dataset [71.93633698146002]
The VoiceMOS 2022 challenge provided a dataset of synthetic voice conversion and text-to-speech samples with subjective labels.
This study looks at the amount of variance that can be explained in subjective ratings of speech quality from metadata and the distribution imbalances of the dataset.
arXiv Detail & Related papers (2022-09-14T00:45:49Z)
- UrduFake@FIRE2020: Shared Track on Fake News Identification in Urdu [62.6928395368204]
This paper gives the overview of the first shared task at FIRE 2020 on fake news detection in the Urdu language.
The goal is to identify fake news using a dataset composed of 900 annotated news articles for training and 400 news articles for testing.
The dataset contains news in five domains: (i) Health, (ii) Sports, (iii) Showbiz, (iv) Technology, and (v) Business.
arXiv Detail & Related papers (2022-07-25T03:46:51Z)
- Segment-level Metric Learning for Few-shot Bioacoustic Event Detection [56.59107110017436]
We propose a segment-level few-shot learning framework that utilizes both the positive and negative events during model optimization.
Our system achieves an F-measure of 62.73 on the DCASE 2022 challenge task 5 (DCASE2022-T5) validation set, outperforming the baseline prototypical network (34.02) by a large margin.
arXiv Detail & Related papers (2022-07-15T22:41:30Z)
- Detecting Handwritten Mathematical Terms with Sensor Based Data [71.84852429039881]
We propose a solution to the UbiComp 2021 Challenge by Stabilo in which handwritten mathematical terms are supposed to be automatically classified.
The input data set contains data of different writers, with label strings constructed from a total of 15 different possible characters.
arXiv Detail & Related papers (2021-09-12T19:33:34Z)
- Sentiment Analysis of Code-Mixed Social Media Text (Hinglish) [4.081440927534578]
Various stages involved in performing the sentiment analysis were data consolidation, data cleaning, data transformation and modelling.
The models were created using various machine learning algorithms such as SVM, KNN, Decision Trees, Random Forests, Naive Bayes, Logistic Regression, and ensemble voting classifiers.
arXiv Detail & Related papers (2021-02-24T09:15:34Z)
- KBCNMUJAL@HASOC-Dravidian-CodeMix-FIRE2020: Using Machine Learning for
Detection of Hate Speech and Offensive Code-Mixed Social Media text [1.0499611180329804]
This paper describes the system submitted by our team, KBCNMUJAL, for Task 2 of the shared task Hate Speech and Offensive Content Identification in Indo-European languages.
Datasets for two Dravidian languages, viz. Malayalam and Tamil, of 4000 observations each, were shared by the HASOC organizers.
The best performing classification models developed for both languages are applied on test datasets.
arXiv Detail & Related papers (2021-02-19T11:08:02Z)
- WESSA at SemEval-2020 Task 9: Code-Mixed Sentiment Analysis using
Transformers [0.0]
We describe our system submitted for SemEval 2020 Task 9, Sentiment Analysis for Code-Mixed Social Media Text.
Our best-performing system is a Transfer Learning-based model that fine-tunes "XLM-RoBERTa".
For later submissions, our system manages to achieve a 75.9% average F1-Score on the test set using CodaLab username "ahmed0sultan".
arXiv Detail & Related papers (2020-09-21T13:59:24Z)
- NAYEL at SemEval-2020 Task 12: TF/IDF-Based Approach for Automatic
Offensive Language Detection in Arabic Tweets [0.0]
The proposed system aims to automatically identify the Offensive Language in Arabic Tweets.
A machine learning based approach has been used to design our system.
The best-performing system and the last-ranked system reported F1-scores of 90.17% and 44.51% on the test set, respectively.
arXiv Detail & Related papers (2020-07-27T07:44:00Z)
- Voice@SRIB at SemEval-2020 Task 9 and 12: Stacked Ensembling method for
Sentiment and Offensiveness detection in Social Media [2.9008108937701333]
We train embeddings and ensembling methods for the Sentimix and OffensEval tasks.
We evaluate our models on macro F1-score, precision, accuracy, and recall on the datasets.
arXiv Detail & Related papers (2020-07-20T11:54:43Z)
- Overview of the TREC 2019 Fair Ranking Track [65.15263872493799]
The goal of the TREC Fair Ranking track was to develop a benchmark for evaluating retrieval systems in terms of fairness to different content providers.
This paper presents an overview of the track, including the task definition, descriptions of the data and the annotation process.
arXiv Detail & Related papers (2020-03-25T21:34:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.