Related papers: NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching language using a simple deep-learning classifier

NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching language using a simple deep-learning classifier

URL: http://arxiv.org/abs/2009.03397v1
Date: Mon, 7 Sep 2020 19:57:09 GMT
Title: NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching language using a simple deep-learning classifier
Authors: Jason Angel, Segun Taofeek Aroyehun, Antonio Tamayo and Alexander Gelbukh
Abstract summary: Code-switching is a phenomenon in which two or more languages are used in the same message. We use a standard convolutional neural network model to predict the sentiment of tweets in a blend of Spanish and English languages.
Score: 63.137661897716555
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Code-switching is a phenomenon in which two or more languages are used in the same message. Nowadays, it is quite common to find messages with languages mixed in social media. This phenomenon presents a challenge for sentiment analysis. In this paper, we use a standard convolutional neural network model to predict the sentiment of tweets in a blend of Spanish and English languages. Our simple approach achieved a F1-score of 0.71 on test set on the competition. We analyze our best model capabilities and perform error analysis to expose important difficulties for classifying sentiment in a code-switching setting.

Related papers

Adapting Language Balance in Code-Switching Speech [60.296574524609575]
Large foundational models still struggle against code-switching test cases.<n>We use differentiable surrogates to mitigate context bias during generation.<n>Experiments with Arabic and Chinese-English showed that the models are able to predict the switching places more correctly.
arXiv Detail & Related papers (2025-10-21T15:23:55Z)
Mechanistic Understanding and Mitigation of Language Confusion in English-Centric Large Language Models [56.61984030508691]
We present the first mechanistic interpretability study of language confusion.<n>We show that confusion points (CPs) are central to this phenomenon.<n>We show that editing a small set of critical neurons, identified via comparative analysis with a multilingual-tuned counterpart, substantially mitigates confusion.
arXiv Detail & Related papers (2025-05-22T11:29:17Z)
Investigating and Scaling up Code-Switching for Multilingual Language Model Pre-Training [58.696660064190475]
We find that the existence of code-switching, alternating between different languages within a context, is key to multilingual capabilities. To better explore the power of code-switching for language alignment during pre-training, we investigate the strategy of synthetic code-switching.
arXiv Detail & Related papers (2025-04-02T15:09:58Z)
Training Neural Networks as Recognizers of Formal Languages [87.06906286950438]
Formal language theory pertains specifically to recognizers. It is common to instead use proxy tasks that are similar in only an informal sense. We correct this mismatch by training and evaluating neural networks directly as binary classifiers of strings.
arXiv Detail & Related papers (2024-11-11T16:33:25Z)
Transformer-based Model for Word Level Language Identification in Code-mixed Kannada-English Texts [55.41644538483948]
We propose the use of a Transformer based model for word-level language identification in code-mixed Kannada English texts. The proposed model on the CoLI-Kenglish dataset achieves a weighted F1-score of 0.84 and a macro F1-score of 0.61.
arXiv Detail & Related papers (2022-11-26T02:39:19Z)
Reducing language context confusion for end-to-end code-switching automatic speech recognition [50.89821865949395]
We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model. By calculating the respective attention of multiple languages, our method can efficiently transfer language knowledge from rich monolingual data.
arXiv Detail & Related papers (2022-01-28T14:39:29Z)
Zero-Shot Cross-lingual Semantic Parsing [56.95036511882921]
We study cross-lingual semantic parsing as a zero-shot problem without parallel data for 7 test languages. We propose a multi-task encoder-decoder model to transfer parsing knowledge to additional languages using only English-Logical form paired data. Our system frames zero-shot parsing as a latent-space alignment problem and finds that pre-trained models can be improved to generate logical forms with minimal cross-lingual transfer penalty.
arXiv Detail & Related papers (2021-04-15T16:08:43Z)
kk2018 at SemEval-2020 Task 9: Adversarial Training for Code-Mixing Sentiment Classification [18.41476971318978]
Code switching is a linguistic phenomenon that may occur within a multilingual setting where speakers share more than one language. In this work, the domain transfer learning from state-of-the-art uni-language model ERNIE is tested on the code-mixing dataset. adversarial training with a multi-lingual model is used to achieve 1st place of SemEval-2020 Task 9 Hindi-English sentiment classification competition.
arXiv Detail & Related papers (2020-09-08T12:20:04Z)
ULD@NUIG at SemEval-2020 Task 9: Generative Morphemes with an Attention Model for Sentiment Analysis in Code-Mixed Text [1.4926515182392508]
We present the Generative Morphemes with Attention (GenMA) Model sentiment analysis system contributed to SemEval 2020 Task 9 SentiMix. The system aims to predict the sentiments of the given English-Hindi code-mixed tweets without using word-level language tags.
arXiv Detail & Related papers (2020-07-27T23:58:54Z)
IUST at SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text using Deep Neural Networks and Linear Baselines [6.866104126509981]
We develop a system to predict the sentiment of a given code-mixed tweet. Our best performing method obtains an F1 score of 0.751 for the Spanish-English sub-task and 0.706 over the Hindi-English sub-task.
arXiv Detail & Related papers (2020-07-24T18:48:37Z)
JUNLP@SemEval-2020 Task 9:Sentiment Analysis of Hindi-English code mixed data using Grid Search Cross Validation [3.5169472410785367]
We focus on working out a plausible solution to the domain of Code-Mixed Sentiment Analysis. This work was done as participation in the SemEval-2020 Sentimix Task.
arXiv Detail & Related papers (2020-07-24T15:06:48Z)
IIT Gandhinagar at SemEval-2020 Task 9: Code-Mixed Sentiment Classification Using Candidate Sentence Generation and Selection [1.2301855531996841]
Code-mixing adds to the challenge of analyzing the sentiment of the text due to the non-standard writing style. We present a candidate sentence generation and selection based approach on top of the Bi-LSTM based neural classifier. The proposed approach shows an improvement in the system performance as compared to the Bi-LSTM based neural classifier.
arXiv Detail & Related papers (2020-06-25T14:59:47Z)
Rnn-transducer with language bias for end-to-end Mandarin-English code-switching speech recognition [58.105818353866354]
We propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate the problem. We use the language identities to bias the model to predict the CS points. This promotes the model to learn the language identity information directly from transcription, and no additional LID model is needed.
arXiv Detail & Related papers (2020-02-19T12:01:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.