CS-Embed at SemEval-2020 Task 9: The effectiveness of code-switched word
embeddings for sentiment analysis
- URL: http://arxiv.org/abs/2006.04597v2
- Date: Mon, 7 Sep 2020 10:39:45 GMT
- Authors: Frances Adriana Laureano De Leon, Florimond Guéniat, and Harish Tayyar Madabushi
- Abstract summary: We present word embeddings trained on code-switched tweets, specifically those that mix Spanish and English, known as Spanglish.
We use them to train a sentiment classifier that achieves an F-1 score of 0.722.
This is higher than the competition baseline of 0.656, and our team ranked 14th out of 29 participating teams.
- Score: 0.5908471365011942
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The growing popularity and applications of sentiment analysis of social media
posts have naturally led to sentiment analysis of posts written in multiple
languages, a practice known as code-switching. While recent research into
code-switched posts has focused on the use of multilingual word embeddings,
these embeddings were not trained on code-switched data. In this work, we
present word embeddings trained on code-switched tweets, specifically those
that make use of Spanish and English, known as Spanglish. We explore the
embedding space to discover how they capture the meanings of words in both
languages. We test the effectiveness of these embeddings by participating in
SemEval 2020 Task 9: Sentiment Analysis on Code-Mixed Social Media
Text. We utilised them to train a sentiment classifier that achieves an F-1
score of 0.722. This is higher than the baseline for the competition of 0.656,
with our team (CodaLab username francesita) ranking 14th out of 29
participating teams.
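As a rough illustration of how an F-1 score such as the 0.722 reported above can be computed for a three-class sentiment task, the sketch below averages per-class F1 weighted by class support. The abstract does not state which averaging scheme was used, so the weighted average is an assumption here, and the function names and toy labels are hypothetical.

```python
from collections import Counter


def f1_per_class(gold, pred):
    """Return {label: F1} for every label that appears in gold or pred."""
    labels = set(gold) | set(pred)
    scores = {}
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def weighted_f1(gold, pred):
    """Average per-class F1, weighting each class by its share of gold labels.

    Assumption: weighted averaging; the source only says "F-1 score".
    """
    support = Counter(gold)
    scores = f1_per_class(gold, pred)
    return sum(scores[label] * n for label, n in support.items()) / len(gold)


# Toy labels invented for illustration only (not data from the task).
gold = ["positive", "negative", "neutral", "positive", "neutral", "negative"]
pred = ["positive", "neutral", "neutral", "positive", "negative", "negative"]
print(round(weighted_f1(gold, pred), 3))
```

A classifier's predictions on the test set would be passed as `pred` and the reference labels as `gold`; comparing the result with the baseline's score gives the kind of comparison the abstract describes.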
Related papers
- Cross-lingual Contextualized Phrase Retrieval [63.80154430930898]
We propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval.
We train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning.
On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher.
(2024-03-25T14:46:51Z)
- SemEval 2024 -- Task 10: Emotion Discovery and Reasoning its Flip in
Conversation (EDiReF) [61.49972925493912]
SemEval-2024 Task 10 is a shared task centred on identifying emotions in code-mixed dialogues.
This task comprises three distinct subtasks - emotion recognition in conversation for code-mixed dialogues, emotion flip reasoning for code-mixed dialogues, and emotion flip reasoning for English dialogues.
A total of 84 participants engaged in this task, with the most adept systems attaining F1-scores of 0.70, 0.79, and 0.76 for the respective subtasks.
(2024-02-29T08:20:06Z)
- Transformer-based Model for Word Level Language Identification in
Code-mixed Kannada-English Texts [55.41644538483948]
We propose the use of a Transformer-based model for word-level language identification in code-mixed Kannada-English texts.
The proposed model on the CoLI-Kenglish dataset achieves a weighted F1-score of 0.84 and a macro F1-score of 0.61.
(2022-11-26T02:39:19Z)
- Sentiment-Aware Word and Sentence Level Pre-training for Sentiment
Analysis [64.70116276295609]
SentiWSP is a Sentiment-aware pre-trained language model with combined Word-level and Sentence-level Pre-training tasks.
SentiWSP achieves new state-of-the-art performance on various sentence-level and aspect-level sentiment classification benchmarks.
(2022-10-18T12:25:29Z)
- NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching
language using a simple deep-learning classifier [63.137661897716555]
Code-switching is a phenomenon in which two or more languages are used in the same message.
We use a standard convolutional neural network model to predict the sentiment of tweets in a blend of Spanish and English languages.
(2020-09-07T19:57:09Z)
- C1 at SemEval-2020 Task 9: SentiMix: Sentiment Analysis for Code-Mixed
Social Media Text using Feature Engineering [0.9646922337783134]
This paper describes our feature engineering approach to sentiment analysis in code-mixed social media text for SemEval-2020 Task 9: SentiMix.
We obtain a weighted F1 score of 0.65 for the "Hinglish" task and 0.63 for the "Spanglish" task.
(2020-08-09T00:46:26Z)
- Writer Identification Using Microblogging Texts for Social Media
Forensics [53.180678723280145]
We evaluate popular stylometric features, widely used in literary analysis, and specific Twitter features like URLs, hashtags, replies or quotes.
We test varying sized author sets and varying amounts of training/test texts per author.
(2020-07-31T00:23:18Z)
- JUNLP@SemEval-2020 Task 9:Sentiment Analysis of Hindi-English code mixed
data using Grid Search Cross Validation [3.5169472410785367]
We focus on working out a plausible solution to the domain of Code-Mixed Sentiment Analysis.
This work was done as participation in the SemEval-2020 Sentimix Task.
(2020-07-24T15:06:48Z)
- BAKSA at SemEval-2020 Task 9: Bolstering CNN with Self-Attention for
Sentiment Analysis of Code Mixed Text [4.456122555367167]
We present an ensemble architecture of convolutional neural net (CNN) and self-attention based LSTM for sentiment analysis of code-mixed tweets.
We achieved F1 scores of 0.707 and 0.725 on Hindi-English (Hinglish) and Spanish-English (Spanglish) datasets, respectively.
(2020-07-21T14:05:51Z)
- Voice@SRIB at SemEval-2020 Task 9 and 12: Stacked Ensembling method for
Sentiment and Offensiveness detection in Social Media [2.9008108937701333]
We train embeddings and ensembling methods for the Sentimix and OffensEval tasks.
We evaluate our models on macro F1-score, precision, accuracy, and recall on the datasets.
(2020-07-20T11:54:43Z)