SemEval-2020 Task 9: Overview of Sentiment Analysis of Code-Mixed Tweets
- URL: http://arxiv.org/abs/2008.04277v1
- Date: Mon, 10 Aug 2020 17:17:52 GMT
- Title: SemEval-2020 Task 9: Overview of Sentiment Analysis of Code-Mixed Tweets
- Authors: Parth Patwa and Gustavo Aguilar and Sudipta Kar and Suraj Pandey and
Srinivas PYKL and Björn Gambäck and Tanmoy Chakraborty and Thamar Solorio
and Amitava Das
- Abstract summary: We present the results of the SemEval-2020 Task 9 on Sentiment Analysis of Code-Mixed Tweets (SentiMix 2020).
We release and describe our Hinglish (Hindi-English) and Spanglish (Spanish-English) corpora annotated with word-level language identification and sentence-level sentiment labels.
- Score: 29.74702868712367
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we present the results of the SemEval-2020 Task 9 on Sentiment
Analysis of Code-Mixed Tweets (SentiMix 2020). We also release and describe our
Hinglish (Hindi-English) and Spanglish (Spanish-English) corpora annotated with
word-level language identification and sentence-level sentiment labels. These
corpora comprise 20K and 19K examples, respectively. The sentiment
labels are Positive, Negative, and Neutral. SentiMix attracted 89 submissions
in total: 61 teams participated in the Hinglish contest and 28
submitted systems to the Spanglish competition. The best performance achieved
was 75.0% F1 score for Hinglish and 80.6% F1 for Spanglish. We observe that
BERT-like models and ensemble methods are the most common and successful
approaches among the participants.
Related papers
- SemEval 2024 -- Task 10: Emotion Discovery and Reasoning its Flip in
Conversation (EDiReF) [61.49972925493912]
SemEval-2024 Task 10 is a shared task centred on identifying emotions in code-mixed dialogues.
This task comprises three distinct subtasks - emotion recognition in conversation for code-mixed dialogues, emotion flip reasoning for code-mixed dialogues, and emotion flip reasoning for English dialogues.
A total of 84 participants engaged in this task, with the most adept systems attaining F1-scores of 0.70, 0.79, and 0.76 for the respective subtasks.
arXiv Detail & Related papers (2024-02-29T08:20:06Z)
- Transformer-based Model for Word Level Language Identification in
Code-mixed Kannada-English Texts [55.41644538483948]
We propose the use of a Transformer-based model for word-level language identification in code-mixed Kannada-English texts.
The proposed model on the CoLI-Kenglish dataset achieves a weighted F1-score of 0.84 and a macro F1-score of 0.61.
arXiv Detail & Related papers (2022-11-26T02:39:19Z)
- Overview of Abusive and Threatening Language Detection in Urdu at FIRE
2021 [50.591267188664666]
We present two shared tasks of abusive and threatening language detection for the Urdu language.
We present two manually annotated datasets containing tweets labelled as (i) Abusive and Non-Abusive, and (ii) Threatening and Non-Threatening.
For both subtasks, an m-BERT-based transformer model showed the best performance.
arXiv Detail & Related papers (2022-07-14T07:38:13Z)
- WLV-RIT at HASOC-Dravidian-CodeMix-FIRE2020: Offensive Language
Identification in Code-switched YouTube Comments [16.938836887702923]
This paper describes the WLV-RIT entry to the Hate Speech and Offensive Content Identification in Indo-European languages task 2020.
The HASOC 2020 organizers provided participants with datasets containing code-mixed social media posts in Dravidian languages (Malayalam-English and Tamil-English).
Our system achieved a weighted average F1 score of 0.89 on the test set and ranked 5th out of 12 participants.
arXiv Detail & Related papers (2020-11-01T16:52:08Z)
- LT3 at SemEval-2020 Task 9: Cross-lingual Embeddings for Sentiment
Analysis of Hinglish Social Media Text [1.0152838128195465]
We investigate two approaches to solve the task of Hinglish sentiment analysis.
The first approach uses cross-lingual embeddings resulting from projecting Hinglish and pre-trained English FastText word embeddings.
The second approach incorporates pre-trained English embeddings that are incrementally retrained with a set of Hinglish tweets.
arXiv Detail & Related papers (2020-10-21T14:03:16Z)
- NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching
language using a simple deep-learning classifier [63.137661897716555]
Code-switching is a phenomenon in which two or more languages are used in the same message.
We use a standard convolutional neural network model to predict the sentiment of tweets in a blend of Spanish and English languages.
arXiv Detail & Related papers (2020-09-07T19:57:09Z)
- SemEval-2020 Task 10: Emphasis Selection for Written Text in Visual
Media [50.29389719723529]
We present the main findings and compare the results of SemEval-2020 Task 10, Emphasis Selection for Written Text in Visual Media.
The goal of this shared task is to design automatic methods for emphasis selection.
The analysis of systems submitted to the task indicates that BERT and RoBERTa were the most common choice of pre-trained models used.
arXiv Detail & Related papers (2020-08-07T17:24:53Z)
- Reed at SemEval-2020 Task 9: Fine-Tuning and Bag-of-Words Approaches to
Code-Mixed Sentiment Analysis [1.2147145617662432]
We explore the task of sentiment analysis on Hinglish (code-mixed Hindi-English) tweets as participants of Task 9 of the SemEval-2020 competition, known as the SentiMix task.
We had two main approaches: 1) applying transfer learning by fine-tuning pre-trained BERT models and 2) training feedforward neural networks on bag-of-words representations.
During the evaluation phase of the competition, we obtained an F-score of 71.3% with our best model, which placed 4th out of 62 entries in the official system rankings.
arXiv Detail & Related papers (2020-07-26T05:48:46Z)
- NITS-Hinglish-SentiMix at SemEval-2020 Task 9: Sentiment Analysis For
Code-Mixed Social Media Text Using an Ensemble Model [1.1265248232450553]
This work proposes a system named NITS-Hinglish-SentiMix to viably complete the sentiment analysis of code-mixed Hinglish text.
The proposed framework has recorded an F-Score of 0.617 on the test data.
arXiv Detail & Related papers (2020-07-23T15:45:12Z)
- CS-Embed at SemEval-2020 Task 9: The effectiveness of code-switched word
embeddings for sentiment analysis [0.5908471365011942]
We present word embeddings trained on code-switched tweets, specifically those that make use of Spanish and English, known as Spanglish.
We utilise them to train a sentiment classifier that achieves an F-1 score of 0.722.
This is higher than the competition baseline of 0.656, with our team ranking 14th out of 29 participating teams.
arXiv Detail & Related papers (2020-06-08T13:48:17Z)