IIT Gandhinagar at SemEval-2020 Task 9: Code-Mixed Sentiment
Classification Using Candidate Sentence Generation and Selection
- URL: http://arxiv.org/abs/2006.14465v3
- Date: Thu, 23 Jul 2020 05:03:02 GMT
- Title: IIT Gandhinagar at SemEval-2020 Task 9: Code-Mixed Sentiment
Classification Using Candidate Sentence Generation and Selection
- Authors: Vivek Srivastava, Mayank Singh
- Abstract summary: Code-mixing adds to the challenge of analyzing the sentiment of the text due to the non-standard writing style.
We present a candidate sentence generation and selection-based approach on top of a Bi-LSTM-based neural classifier.
The proposed approach improves system performance compared to the Bi-LSTM-based neural classifier alone.
- Score: 1.2301855531996841
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Code-mixing is the phenomenon of using multiple languages in the same
utterance of text or speech. It is a frequently used pattern of communication
on various platforms such as social media sites, online gaming, and product
reviews. Sentiment analysis of monolingual text is a well-studied task.
Code-mixing adds to the challenge of analyzing the sentiment of the text due to
its non-standard writing style. We present a candidate sentence generation and
selection-based approach on top of a Bi-LSTM-based neural classifier to
classify Hinglish code-mixed text into one of three sentiment classes:
positive, negative, or neutral. The proposed approach improves system
performance compared to the Bi-LSTM-based neural classifier alone. The results
present an opportunity to study other nuances of code-mixing in textual data,
such as humor detection and intent classification.
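The abstract describes the pipeline only at a high level. The minimal Python/PyTorch sketch below illustrates the general idea of scoring several candidate versions of a code-mixed sentence with a Bi-LSTM classifier and keeping the most confident prediction; the candidate generator (generate_candidates), the encode helper, and the confidence-based selection rule are illustrative assumptions, not the authors' exact method.

# Hypothetical sketch: Bi-LSTM sentiment classifier with candidate-sentence
# selection. The candidate generator and the confidence-based selection rule
# are illustrative assumptions, not the method reported in the paper.
import torch
import torch.nn as nn

LABELS = ["positive", "negative", "neutral"]

class BiLSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_classes=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):              # token_ids: (batch, seq_len)
        emb = self.embedding(token_ids)        # (batch, seq_len, embed_dim)
        out, _ = self.bilstm(emb)              # (batch, seq_len, 2*hidden_dim)
        pooled = out.mean(dim=1)               # simple mean pooling over time
        return self.fc(pooled)                 # (batch, num_classes)

def generate_candidates(tokens):
    """Placeholder: return rewritten variants of a code-mixed sentence
    (e.g. normalised or transliterated versions). The real generation
    strategy is described in the paper and is not reproduced here."""
    return [tokens]  # identity fallback

def classify_with_candidates(model, encode, tokens):
    """Score every candidate sentence and keep the most confident prediction.
    `encode` is an assumed helper mapping a token list to a (1, seq_len)
    LongTensor of vocabulary ids."""
    best_conf, best_label = -1.0, None
    with torch.no_grad():
        for cand in generate_candidates(tokens):
            logits = model(encode(cand))
            probs = torch.softmax(logits, dim=-1)
            conf, label = probs.max(dim=-1)
            if conf.item() > best_conf:
                best_conf, best_label = conf.item(), LABELS[label.item()]
    return best_label, best_conf

# Usage sketch (vocabulary and encode are assumed to exist):
# model = BiLSTMClassifier(vocab_size=30000)
# label, conf = classify_with_candidates(model, encode,
#                                        "yeh movie bahut achhi hai".split())

Here selection simply falls back to classifier confidence; the paper's own generation and selection criteria should be consulted for the actual pipeline.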
Related papers
- Code-Mixed Text to Speech Synthesis under Low-Resource Constraints [6.544954579068865]
We describe our approaches for building production-quality code-mixed Hindi-English TTS systems for e-commerce applications.
We propose a data-oriented approach by utilizing monolingual data sets in individual languages.
We show that such single-script bilingual training without any code-mixing works well on purely code-mixed test sets.
arXiv Detail & Related papers (2023-12-02T10:40:38Z) - Like a Good Nearest Neighbor: Practical Content Moderation and Text
Classification [66.02091763340094]
Like a Good Nearest Neighbor (LaGoNN) is a modification to SetFit that introduces no learnable parameters but alters input text with information from its nearest neighbor.
LaGoNN is effective at flagging undesirable content and at text classification, and improves the performance of SetFit.
arXiv Detail & Related papers (2023-02-17T15:43:29Z) - A Vector Quantized Approach for Text to Speech Synthesis on Real-World
Spontaneous Speech [94.64927912924087]
We train TTS systems using real-world speech from YouTube and podcasts.
A recent Text-to-Speech architecture is designed for multiple code generation and monotonic alignment.
We show that this recent Text-to-Speech architecture outperforms existing TTS systems in several objective and subjective measures.
arXiv Detail & Related papers (2023-02-08T17:34:32Z) - SpeechLM: Enhanced Speech Pre-Training with Unpaired Textual Data [100.46303484627045]
We propose a cross-modal Speech and Language Model (SpeechLM) to align speech and text pre-training with a pre-defined unified representation.
Specifically, we introduce two alternative discrete tokenizers to bridge the speech and text modalities.
We evaluate SpeechLM on various spoken language processing tasks including speech recognition, speech translation, and universal representation evaluation framework SUPERB.
arXiv Detail & Related papers (2022-09-30T09:12:10Z) - Spoken Style Learning with Multi-modal Hierarchical Context Encoding for
Conversational Text-to-Speech Synthesis [59.27994987902646]
The study of learning spoken styles from historical conversations is still in its infancy.
Existing work considers only the transcripts of historical conversations, neglecting the spoken styles in the historical speech.
We propose a spoken style learning approach with multi-modal hierarchical context encoding.
arXiv Detail & Related papers (2021-06-11T08:33:52Z) - JUNLP@Dravidian-CodeMix-FIRE2020: Sentiment Classification of Code-Mixed
Tweets using Bi-Directional RNN and Language Tags [14.588109573710431]
This paper uses bi-directional LSTMs along with language tagging to facilitate sentiment tagging of code-mixed Tamil texts extracted from social media.
The presented algorithm garnered precision, recall, and F1 scores of 0.59, 0.66, and 0.58 respectively.
arXiv Detail & Related papers (2020-10-20T08:10:29Z) - HPCC-YNU at SemEval-2020 Task 9: A Bilingual Vector Gating Mechanism for
Sentiment Analysis of Code-Mixed Text [10.057804086733576]
This paper presents a system that uses a bilingual vector gating mechanism over bilingual resources to complete the task.
We achieved fifth place in Spanglish and 19th place in Hinglish.
arXiv Detail & Related papers (2020-10-10T08:02:15Z) - gundapusunil at SemEval-2020 Task 9: Syntactic Semantic LSTM
Architecture for SENTIment Analysis of Code-MIXed Data [7.538482310185133]
We developed a system for SemEval-2020 Task 9 on Sentiment Analysis for Code-Mixed Social Media Text.
Our system first generates two types of embeddings for the social media text.
arXiv Detail & Related papers (2020-10-09T07:07:04Z) - NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching
language using a simple deep-learning classifier [63.137661897716555]
Code-switching is a phenomenon in which two or more languages are used in the same message.
We use a standard convolutional neural network model to predict the sentiment of tweets in a blend of Spanish and English languages.
arXiv Detail & Related papers (2020-09-07T19:57:09Z) - ULD@NUIG at SemEval-2020 Task 9: Generative Morphemes with an Attention
Model for Sentiment Analysis in Code-Mixed Text [1.4926515182392508]
We present the Generative Morphemes with Attention (GenMA) model, a sentiment analysis system contributed to SemEval-2020 Task 9 SentiMix.
The system aims to predict the sentiments of the given English-Hindi code-mixed tweets without using word-level language tags.
arXiv Detail & Related papers (2020-07-27T23:58:54Z) - Unsupervised Cross-Modal Audio Representation Learning from Unstructured
Multilingual Text [69.55642178336953]
We present an approach to unsupervised audio representation learning.
Based on a triplet neural network architecture, we harness semantically related cross-modal information to estimate audio track-relatedness.
We show that our approach is invariant to the variety of annotation styles as well as to the different languages of this collection.
arXiv Detail & Related papers (2020-03-27T07:37:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.