Unsupervised Sentiment Analysis for Code-mixed Data
- URL: http://arxiv.org/abs/2001.11384v1
- Date: Mon, 20 Jan 2020 06:12:12 GMT
- Title: Unsupervised Sentiment Analysis for Code-mixed Data
- Authors: Siddharth Yadav, Tanmoy Chakraborty
- Abstract summary: We introduce methods that use different kinds of multilingual and cross-lingual embeddings to efficiently transfer knowledge from monolingual text to code-mixed text.
Our methods beat the state of the art on English-Spanish code-mixed sentiment analysis by an absolute 3% F1-score.
- Score: 33.939487457110566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code-mixing is the practice of alternating between two or more languages.
Mostly observed in multilingual societies, it is becoming more common and
therefore more important. Most sentiment analysis research has been
monolingual, and most monolingual methods perform poorly on code-mixed text. In this work,
we introduce methods that use different kinds of multilingual and cross-lingual
embeddings to efficiently transfer knowledge from monolingual text to
code-mixed text for sentiment analysis. Our methods can
handle code-mixed text through zero-shot learning. Our methods beat
the state of the art on English-Spanish code-mixed sentiment analysis by an absolute
3% F1-score. We achieve 0.58 F1-score (without a parallel corpus)
and 0.62 F1-score (with a parallel corpus) on the same benchmark in a zero-shot
way, compared to 0.68 F1-score in supervised settings. Our code is publicly
available.
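The zero-shot idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the 2-d embedding table `EMB` is hand-made and hypothetical, and a nearest-centroid classifier is swapped in for whatever model the paper uses. It only shows how a classifier trained on monolingual (English) text can label code-mixed text when both languages share one embedding space.

```python
from math import sqrt

# Hypothetical, hand-made 2-d "cross-lingual" embeddings: translation pairs
# sit close together. A real system would load aligned pretrained vectors.
EMB = {
    "good": (1.0, 0.1), "bueno": (0.9, 0.2),
    "love": (0.9, 0.0), "amo": (1.0, 0.1),
    "bad": (-1.0, 0.1), "malo": (-0.9, 0.2),
    "hate": (-0.9, 0.0), "odio": (-1.0, 0.1),
    "movie": (0.0, 1.0), "pelicula": (0.1, 0.9),
}


def embed(sentence):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vecs = [EMB[tok] for tok in sentence.lower().split() if tok in EMB]
    if not vecs:
        return (0.0, 0.0)
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def train_centroids(examples):
    """Compute one mean sentence vector per label from (text, label) pairs."""
    sums = {}
    for text, label in examples:
        vec = embed(text)
        total, count = sums.get(label, ((0.0, 0.0), 0))
        sums[label] = (tuple(t + v for t, v in zip(total, vec)), count + 1)
    return {
        label: tuple(t / count for t in total)
        for label, (total, count) in sums.items()
    }


def predict(centroids, sentence):
    """Return the label whose centroid is most similar to the sentence."""
    vec = embed(sentence)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Training data is English only; no code-mixed example is ever seen.
english_train = [
    ("good movie", "positive"),
    ("love movie", "positive"),
    ("bad movie", "negative"),
    ("hate movie", "negative"),
]
centroids = train_centroids(english_train)

# Zero-shot evaluation on English-Spanish code-mixed sentences.
code_mixed = ["amo this pelicula", "the movie was malo"]
predictions = [predict(centroids, s) for s in code_mixed]
print(predictions)  # ['positive', 'negative']
```

The sketch works only because `amo` and `malo` were placed next to `love` and `bad` by construction; in practice the quality of the cross-lingual alignment determines how well the transfer holds up.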
Related papers
- Transformer-based Model for Word Level Language Identification in Code-mixed Kannada-English Texts [55.41644538483948]
We propose the use of a Transformer-based model for word-level language identification in code-mixed Kannada-English texts.
The proposed model on the CoLI-Kenglish dataset achieves a weighted F1-score of 0.84 and a macro F1-score of 0.61.
arXiv Detail & Related papers (2022-11-26T02:39:19Z)
- Sentiment Classification of Code-Switched Text using Pre-trained Multilingual Embeddings and Segmentation [1.290382979353427]
We propose a multi-step natural language processing algorithm for code-switched sentiment analysis.
The proposed algorithm can be expanded for sentiment analysis of multiple languages with limited human expertise.
arXiv Detail & Related papers (2022-10-29T01:52:25Z)
- CMSAOne@Dravidian-CodeMix-FIRE2020: A Meta Embedding and Transformer model for Code-Mixed Sentiment Analysis on Social Media Text [9.23545668304066]
Code-mixing (CM) is a frequently observed phenomenon that uses multiple languages in an utterance or sentence.
Sentiment analysis (SA) is a fundamental step in NLP and is well studied for monolingual text.
This paper proposes a meta embedding with a transformer method for sentiment analysis on the Dravidian code-mixed dataset.
arXiv Detail & Related papers (2021-01-22T08:48:27Z)
- FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding [85.29270319872597]
We propose an enhanced fusion method that takes cross-lingual data as input for XLM finetuning.
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
To tackle this issue, we propose an additional KL-divergence self-teaching loss for model training, based on auto-generated soft pseudo-labels for translated text in the target language.
arXiv Detail & Related papers (2020-09-10T22:42:15Z)
- NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching language using a simple deep-learning classifier [63.137661897716555]
Code-switching is a phenomenon in which two or more languages are used in the same message.
We use a standard convolutional neural network model to predict the sentiment of tweets in a blend of Spanish and English languages.
arXiv Detail & Related papers (2020-09-07T19:57:09Z)
- C1 at SemEval-2020 Task 9: SentiMix: Sentiment Analysis for Code-Mixed Social Media Text using Feature Engineering [0.9646922337783134]
This paper describes our feature engineering approach to sentiment analysis in code-mixed social media text for SemEval-2020 Task 9: SentiMix.
We obtain a weighted F1 score of 0.65 for the "Hinglish" task and 0.63 for the "Spanglish" task.
arXiv Detail & Related papers (2020-08-09T00:46:26Z)
- Voice@SRIB at SemEval-2020 Task 9 and 12: Stacked Ensembling method for Sentiment and Offensiveness detection in Social Media [2.9008108937701333]
We train embeddings and ensembling methods for the Sentimix and OffensEval tasks.
We evaluate our models on macro F1-score, precision, accuracy, and recall on the datasets.
arXiv Detail & Related papers (2020-07-20T11:54:43Z)
- A Sentiment Analysis Dataset for Code-Mixed Malayalam-English [0.8454131372606295]
This paper presents a new gold standard corpus for sentiment analysis of code-mixed text in Malayalam-English annotated by voluntary annotators.
We use this new corpus to provide the benchmark for sentiment analysis in Malayalam-English code-mixed texts.
arXiv Detail & Related papers (2020-05-30T07:32:37Z)
- A Multi-Perspective Architecture for Semantic Code Search [58.73778219645548]
We propose a novel multi-perspective cross-lingual neural framework for code--text matching.
Our experiments on the CoNaLa dataset show that our proposed model yields better performance than previous approaches.
arXiv Detail & Related papers (2020-05-06T04:46:11Z)
- Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation [61.88012735215636]
Unsupervised neural machine translation (UNMT) has recently achieved remarkable results for several language pairs.
UNMT can only translate between a single language pair and cannot produce translation results for multiple language pairs at the same time.
In this paper, we empirically introduce a simple method to translate between thirteen languages using a single encoder and a single decoder.
arXiv Detail & Related papers (2020-04-21T17:26:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.