Related papers: CMSAOne@Dravidian-CodeMix-FIRE2020: A Meta Embedding and Transformer model for Code-Mixed Sentiment Analysis on Social Media Text

URL: http://arxiv.org/abs/2101.09004v1
Date: Fri, 22 Jan 2021 08:48:27 GMT
Title: CMSAOne@Dravidian-CodeMix-FIRE2020: A Meta Embedding and Transformer model for Code-Mixed Sentiment Analysis on Social Media Text
Authors: Suman Dowlagar, Radhika Mamidi
Abstract summary: Code-mixing (CM) is a frequently observed phenomenon in which multiple languages are used within a single utterance or sentence. Sentiment analysis (SA) is a fundamental step in NLP and is well studied for monolingual text. This paper proposes a method that combines meta embeddings with a transformer for sentiment analysis on the Dravidian code-mixed dataset.
Score: 9.23545668304066
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Code-mixing (CM) is a frequently observed phenomenon in which multiple languages are used within a single utterance or sentence. CM is mostly practiced on social media platforms and in informal conversations. Sentiment analysis (SA) is a fundamental step in NLP and is well studied for monolingual text. Code-mixing adds a challenge to sentiment analysis because of its non-standard representations. This paper proposes a method that combines meta embeddings with a transformer for sentiment analysis on the Dravidian code-mixed dataset. In our method, we used meta embeddings to capture rich text representations. We used the proposed method for the task "Sentiment Analysis for Dravidian Languages in Code-Mixed Text", and it achieved F1 scores of 0.58 and 0.66 on the given Dravidian code-mixed datasets. The code is available on GitHub: https://github.com/suman101112/fire-2020-Dravidian-CodeMix.
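
Illustrative sketch (not taken from the paper): the abstract does not say how the meta embeddings are built. One common construction, assumed here purely for illustration, is to L2-normalize the vectors that several source embeddings assign to a token and concatenate them. The tables `emb_a` and `emb_b`, the example tokens, and the helper names `l2_normalize` and `meta_embed` are hypothetical and are not from the authors' repository.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def meta_embed(token, sources, dims):
    """Concatenate the normalized vectors that each source table gives a token.

    sources: list of dicts mapping token -> list of floats (hypothetical tables)
    dims:    dimensionality of each table, used to zero-fill missing tokens
    """
    parts = []
    for table, dim in zip(sources, dims):
        vec = table.get(token, [0.0] * dim)  # out-of-vocabulary -> zero vector
        parts.extend(l2_normalize(vec))
    return parts


# Toy embedding tables, invented for this example only.
emb_a = {"padam": [3.0, 4.0], "super": [1.0, 0.0]}
emb_b = {"padam": [0.0, 2.0, 0.0]}

vector = meta_embed("padam", [emb_a, emb_b], [2, 3])
print(len(vector))  # 2 + 3 = 5 dimensions
```

In a setup like this, the concatenated vectors would serve as token inputs to a transformer encoder. Normalizing each source first keeps a single embedding with large magnitudes from dominating the combined representation.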

Related papers

Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting [78.48355455324688]
We propose a novel zero-shot synthetic code detector based on the similarity between the code and its rewritten variants. Our results demonstrate a notable enhancement over existing synthetic content detectors designed for general texts.
arXiv Detail & Related papers (2024-05-25T08:57:28Z)
Transformer-based Model for Word Level Language Identification in Code-mixed Kannada-English Texts [55.41644538483948]
We propose the use of a Transformer-based model for word-level language identification in code-mixed Kannada-English texts. On the CoLI-Kenglish dataset, the proposed model achieves a weighted F1-score of 0.84 and a macro F1-score of 0.61.
arXiv Detail & Related papers (2022-11-26T02:39:19Z)
Interactive Code Generation via Test-Driven User-Intent Formalization [60.90035204567797]
Large language models (LLMs) produce code from informal natural language (NL) intent. It is hard to define a notion of correctness since natural language can be ambiguous and lacks a formal semantics. We describe a language-agnostic abstract algorithm and a concrete implementation TiCoder.
arXiv Detail & Related papers (2022-08-11T17:41:08Z)
M-Adapter: Modality Adaptation for End-to-End Speech-to-Text Translation [66.92823764664206]
We propose M-Adapter, a novel Transformer-based module, to adapt speech representations to text. While shrinking the speech sequence, M-Adapter produces features desired for speech-to-text translation. Our experimental results show that our model outperforms a strong baseline by up to 1 BLEU.
arXiv Detail & Related papers (2022-07-03T04:26:53Z)
IIITT@Dravidian-CodeMix-FIRE2021: Transliterate or translate? Sentiment analysis of code-mixed text in Dravidian languages [0.0]
This paper makes a small contribution to this research area in the form of sentiment analysis of code-mixed social media comments in the popular Dravidian languages Kannada, Tamil and Malayalam. It describes work for the shared task conducted by Dravidian-CodeMix at FIRE 2021, employing pre-trained models such as ULMFiT and multilingual BERT fine-tuned on the code-mixed dataset. The best models ranked 4th, 5th and 10th in the Tamil, Kannada and Malayalam tasks respectively.
arXiv Detail & Related papers (2021-11-15T16:57:59Z)
Contextual Hate Speech Detection in Code Mixed Text using Transformer Based Approaches [0.0]
We propose automated techniques for hate speech detection in code mixed text from Twitter. While regular approaches analyze the text independently, we also make use of content text in the form of parent tweets. We show that the dual-encoder approach using independent representations yields better performance.
arXiv Detail & Related papers (2021-10-18T14:05:36Z)
Exploiting BERT For Multimodal Target Sentiment Classification Through Input Space Translation [75.82110684355979]
We introduce a two-stream model that translates images in input space using an object-aware transformer. We then leverage the translation to construct an auxiliary sentence that provides multimodal information to a language model. We achieve state-of-the-art performance on two multimodal Twitter datasets.
arXiv Detail & Related papers (2021-08-03T18:02:38Z)
Deep Graph Matching and Searching for Semantic Code Retrieval [76.51445515611469]
We propose an end-to-end deep graph matching and searching model based on graph neural networks. We first represent both natural language query texts and programming language code snippets with the unified graph-structured data. In particular, DGMS not only captures more structural information for individual query texts or code snippets but also learns the fine-grained similarity between them.
arXiv Detail & Related papers (2020-10-24T14:16:50Z)
JUNLP@Dravidian-CodeMix-FIRE2020: Sentiment Classification of Code-Mixed Tweets using Bi-Directional RNN and Language Tags [14.588109573710431]
This paper uses bi-directional LSTMs along with language tagging to facilitate sentiment tagging of code-mixed Tamil texts extracted from social media. The presented algorithm garnered precision, recall, and F1 scores of 0.59, 0.66, and 0.58 respectively.
arXiv Detail & Related papers (2020-10-20T08:10:29Z)
Consecutive Decoding for Speech-to-text Translation [51.155661276936044]
COnSecutive Transcription and Translation (COSTT) is an integral approach for speech-to-text translation. The key idea is to generate source transcript and target translation text with a single decoder. Our method is verified on three mainstream datasets.
arXiv Detail & Related papers (2020-09-21T10:10:45Z)
NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching language using a simple deep-learning classifier [63.137661897716555]
Code-switching is a phenomenon in which two or more languages are used in the same message. We use a standard convolutional neural network model to predict the sentiment of tweets written in a blend of Spanish and English.
arXiv Detail & Related papers (2020-09-07T19:57:09Z)
C1 at SemEval-2020 Task 9: SentiMix: Sentiment Analysis for Code-Mixed Social Media Text using Feature Engineering [0.9646922337783134]
This paper describes our feature engineering approach to sentiment analysis in code-mixed social media text for SemEval-2020 Task 9: SentiMix. We obtain a weighted F1 score of 0.65 for the "Hinglish" task and 0.63 for the "Spanglish" task.
arXiv Detail & Related papers (2020-08-09T00:46:26Z)
Unsupervised Sentiment Analysis for Code-mixed Data [33.939487457110566]
We introduce methods that use different kinds of multilingual and cross-lingual embeddings to efficiently transfer knowledge from monolingual text to code-mixed text. Our methods beat the state of the art on English-Spanish code-mixed sentiment analysis by an absolute 3% in F1-score.
arXiv Detail & Related papers (2020-01-20T06:12:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.