CMSAOne@Dravidian-CodeMix-FIRE2020: A Meta Embedding and Transformer
model for Code-Mixed Sentiment Analysis on Social Media Text
- URL: http://arxiv.org/abs/2101.09004v1
- Date: Fri, 22 Jan 2021 08:48:27 GMT
- Authors: Suman Dowlagar, Radhika Mamidi
- Abstract summary: Code-mixing (CM) is a frequently observed phenomenon that uses multiple languages in an utterance or sentence.
Sentiment analysis (SA) is a fundamental step in NLP and is well studied for monolingual text.
This paper proposes a meta embedding with a transformer method for sentiment analysis on the Dravidian code-mixed dataset.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code-mixing (CM) is a frequently observed phenomenon that uses multiple
languages in an utterance or sentence. CM is mostly practiced on various social
media platforms and in informal conversations. Sentiment analysis (SA) is a
fundamental step in NLP and is well studied for monolingual text.
Code-mixing adds a challenge to sentiment analysis due to its non-standard
representations. This paper proposes a meta embedding with a transformer method
for sentiment analysis on the Dravidian code-mixed dataset. In our method, we
used meta embeddings to capture rich text representations. We used the proposed
method for the Task: "Sentiment Analysis for Dravidian Languages in Code-Mixed
Text", and it achieved F1 scores of $0.58$ and $0.66$ for the given Dravidian
code-mixed datasets. The code is available on GitHub:
https://github.com/suman101112/fire-2020-Dravidian-CodeMix.
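The abstract does not say how the meta embeddings are built. One common way to form a meta embedding is to concatenate each token's vectors from several embedding sources, which is what the minimal sketch below does. The function name `meta_embed`, the toy tables, and the zero-vector fallback for unseen tokens are all illustrative assumptions, not the authors' implementation; in a full model the resulting matrix would be passed to a transformer encoder.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def meta_embed(
    token: str,
    sources: Sequence[Dict[str, Vector]],
    dims: Sequence[int],
) -> Vector:
    """Concatenate the token's vector from every embedding source.

    A source that lacks the token contributes a zero vector of its own
    dimensionality (an assumption made for this sketch).
    """
    combined: Vector = []
    for table, dim in zip(sources, dims):
        combined.extend(table.get(token, [0.0] * dim))
    return combined


# Hypothetical toy embedding tables; real ones would be pre-trained.
source_a = {"padam": [0.1, 0.2], "super": [0.3, 0.4]}   # 2-dimensional
source_b = {"super": [0.5, 0.6, 0.7]}                   # 3-dimensional

sentence = ["padam", "super"]
matrix = [meta_embed(tok, [source_a, source_b], [2, 3]) for tok in sentence]
# Each row has 2 + 3 = 5 values, one row per token.
```

Concatenation keeps every source's information intact at the cost of a wider input; averaging is an alternative when all sources share one dimensionality.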
Related papers
- Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting [78.48355455324688]
We propose a novel zero-shot synthetic code detector based on the similarity between the code and its rewritten variants.
Our results demonstrate a notable enhancement over existing synthetic content detectors designed for general texts.
arXiv Detail & Related papers (2024-05-25T08:57:28Z)
- Transformer-based Model for Word Level Language Identification in Code-mixed Kannada-English Texts [55.41644538483948]
We propose the use of a Transformer based model for word-level language identification in code-mixed Kannada English texts.
The proposed model on the CoLI-Kenglish dataset achieves a weighted F1-score of 0.84 and a macro F1-score of 0.61.
arXiv Detail & Related papers (2022-11-26T02:39:19Z)
- Interactive Code Generation via Test-Driven User-Intent Formalization [60.90035204567797]
Large language models (LLMs) produce code from informal natural language (NL) intent.
It is hard to define a notion of correctness since natural language can be ambiguous and lacks a formal semantics.
We describe a language-agnostic abstract algorithm and a concrete implementation TiCoder.
arXiv Detail & Related papers (2022-08-11T17:41:08Z)
- M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation [66.92823764664206]
We propose M-Adapter, a novel Transformer-based module, to adapt speech representations to text.
While shrinking the speech sequence, M-Adapter produces features desired for speech-to-text translation.
Our experimental results show that our model outperforms a strong baseline by up to 1 BLEU.
arXiv Detail & Related papers (2022-07-03T04:26:53Z)
- IIITT@Dravidian-CodeMix-FIRE2021: Transliterate or translate? Sentiment analysis of code-mixed text in Dravidian languages [0.0]
This research paper bestows a tiny contribution to this research in the form of sentiment analysis of code-mixed social media comments in the popular Dravidian languages Kannada, Tamil and Malayalam.
It describes the work for the shared task conducted by Dravidian-CodeMix at FIRE 2021 by employing pre-trained models like ULMFiT and multilingual BERT fine-tuned on the code-mixed dataset.
The results are recorded in this research paper where the best models stood 4th, 5th and 10th ranks in the Tamil, Kannada and Malayalam tasks respectively.
arXiv Detail & Related papers (2021-11-15T16:57:59Z)
- Contextual Hate Speech Detection in Code Mixed Text using Transformer Based Approaches [0.0]
We propose automated techniques for hate speech detection in code mixed text from Twitter.
While regular approaches analyze the text independently, we also make use of content text in the form of parent tweets.
We show that the dual-encoder approach using independent representations yields better performance.
arXiv Detail & Related papers (2021-10-18T14:05:36Z)
- Exploiting BERT For Multimodal Target Sentiment Classification Through Input Space Translation [75.82110684355979]
We introduce a two-stream model that translates images in input space using an object-aware transformer.
We then leverage the translation to construct an auxiliary sentence that provides multimodal information to a language model.
We achieve state-of-the-art performance on two multimodal Twitter datasets.
arXiv Detail & Related papers (2021-08-03T18:02:38Z)
- JUNLP@Dravidian-CodeMix-FIRE2020: Sentiment Classification of Code-Mixed Tweets using Bi-Directional RNN and Language Tags [14.588109573710431]
This paper uses bi-directional LSTMs along with language tagging to facilitate sentiment tagging of code-mixed Tamil texts extracted from social media.
The presented algorithm garnered precision, recall, and F1 scores of 0.59, 0.66, and 0.58 respectively.
arXiv Detail & Related papers (2020-10-20T08:10:29Z)
- NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching language using a simple deep-learning classifier [63.137661897716555]
Code-switching is a phenomenon in which two or more languages are used in the same message.
We use a standard convolutional neural network model to predict the sentiment of tweets in a blend of Spanish and English languages.
arXiv Detail & Related papers (2020-09-07T19:57:09Z)
- C1 at SemEval-2020 Task 9: SentiMix: Sentiment Analysis for Code-Mixed Social Media Text using Feature Engineering [0.9646922337783134]
This paper describes our feature engineering approach to sentiment analysis in code-mixed social media text for SemEval-2020 Task 9: SentiMix.
We are able to obtain a weighted F1 score of 0.65 for the "Hinglish" task and 0.63 for the "Spanglish" tasks.
arXiv Detail & Related papers (2020-08-09T00:46:26Z)
- Unsupervised Sentiment Analysis for Code-mixed Data [33.939487457110566]
We introduce methods that use different kinds of multilingual and cross-lingual embeddings to efficiently transfer knowledge from monolingual text to code-mixed text.
Our methods beat state-of-the-art on English-Spanish code-mixed sentiment analysis by absolute 3% F1-score.
arXiv Detail & Related papers (2020-01-20T06:12:12Z)