ULD@NUIG at SemEval-2020 Task 9: Generative Morphemes with an Attention
Model for Sentiment Analysis in Code-Mixed Text
- URL: http://arxiv.org/abs/2008.01545v1
- Date: Mon, 27 Jul 2020 23:58:54 GMT
- Title: ULD@NUIG at SemEval-2020 Task 9: Generative Morphemes with an Attention
Model for Sentiment Analysis in Code-Mixed Text
- Authors: Koustava Goswami, Priya Rani, Bharathi Raja Chakravarthi, Theodorus
Fransen, and John P. McCrae
- Abstract summary: We present the Generative Morphemes with Attention (GenMA) model, a sentiment analysis system submitted to SemEval-2020 Task 9: SentiMix.
The system aims to predict the sentiment of English-Hindi code-mixed tweets without using word-level language tags.
- Score: 1.4926515182392508
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code mixing is a common phenomenon in multilingual societies where people
switch from one language to another for various reasons. Recent advances in
public communication over different social media sites have led to an increase
in the frequency of code-mixed usage in written language. In this paper, we
present the Generative Morphemes with Attention (GenMA) model, a sentiment
analysis system submitted to SemEval-2020 Task 9: SentiMix. The system aims to
predict the sentiment of English-Hindi code-mixed tweets without using
word-level language tags, instead inferring them automatically with a
morphological model. The system is based on a novel deep neural network (DNN)
architecture, which outperforms the baseline F1-score on both the test and
validation datasets. Our results can be found under the user name "koustava"
on the "Sentimix Hindi English" page.
Related papers
- Generative Spoken Language Model based on continuous word-sized audio
tokens [52.081868603603844]
We introduce a Generative Spoken Language Model based on word-sized continuous-valued audio embeddings.
The resulting model is the first generative language model built from word-sized continuous embeddings.
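The summary names only the core modeling choice. One plausible reading is a language model that regresses the next continuous embedding rather than producing a softmax over discrete tokens; the sketch below shows that idea under assumed dimensions and an assumed regression loss.

```python
# Hedged sketch of a language model over continuous word-sized
# embeddings: regress the next embedding vector instead of predicting
# a discrete token. The LSTM backbone and MSE loss are assumptions.
import torch
import torch.nn as nn

class ContinuousLM(nn.Module):
    def __init__(self, dim=256, hidden=512):
        super().__init__()
        self.rnn = nn.LSTM(dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, dim)      # predicts next embedding

    def forward(self, embs):                    # (batch, seq, dim)
        h, _ = self.rnn(embs)
        return self.proj(h)

model = ContinuousLM()
x = torch.randn(2, 10, 256)                     # word-sized audio embeddings
pred = model(x[:, :-1])                         # predict steps 2..10
loss = nn.functional.mse_loss(pred, x[:, 1:])
```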
arXiv Detail & Related papers (2023-10-08T16:46:14Z)
- Transformer-based Model for Word Level Language Identification in
Code-mixed Kannada-English Texts [55.41644538483948]
We propose a Transformer-based model for word-level language identification in code-mixed Kannada-English texts.
On the CoLI-Kenglish dataset, the proposed model achieves a weighted F1-score of 0.84 and a macro F1-score of 0.61.
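Word-level language identification is naturally framed as token classification. Below is a minimal, assumed sketch of that framing with a small Transformer encoder; the label set and all sizes are illustrative, not the paper's configuration.

```python
# Hedged sketch of word-level language ID as token classification with
# a Transformer encoder. Labels are illustrative (e.g., Kannada,
# English, mixed, other); sizes are assumptions.
import torch
import torch.nn as nn

class TokenLIDModel(nn.Module):
    def __init__(self, vocab_size=8000, d_model=128, n_labels=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=0)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.tagger = nn.Linear(d_model, n_labels)  # one label per word

    def forward(self, ids):
        mask = ids == 0                       # True where padding
        h = self.encoder(self.embed(ids), src_key_padding_mask=mask)
        return self.tagger(h)                 # (batch, seq, n_labels)

tags = TokenLIDModel()(torch.randint(1, 8000, (2, 12))).argmax(-1)
```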
arXiv Detail & Related papers (2022-11-26T02:39:19Z)
- Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z)
- Paraphrastic Representations at Scale [134.41025103489224]
We release trained models for English, Arabic, German, French, Spanish, Russian, Turkish, and Chinese languages.
We train these models on large amounts of data, achieving significantly improved performance from the original papers.
arXiv Detail & Related papers (2021-04-30T16:55:28Z)
- Sentiment Analysis of Persian-English Code-mixed Texts [0.0]
Due to the unstructured nature of social media data, we are observing more instances of multilingual and code-mixed texts.
In this study we collect, label and thus create a dataset of Persian-English code-mixed tweets.
We introduce a model that uses pretrained BERT embeddings together with translation models to automatically learn the polarity scores of these tweets.
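The summary does not specify the exact checkpoint or classifier head. A minimal sketch of the BERT-embeddings-as-features part, assuming multilingual BERT and a simple linear head on the [CLS] vector:

```python
# Illustrative sketch: multilingual BERT embeddings as features for
# tweet polarity. The checkpoint name and linear head are assumptions,
# not the authors' exact setup; the translation-model component is
# omitted here.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
bert = AutoModel.from_pretrained("bert-base-multilingual-cased")
head = torch.nn.Linear(bert.config.hidden_size, 3)  # neg/neutral/pos

def polarity_logits(tweet):
    batch = tok(tweet, return_tensors="pt", truncation=True)
    with torch.no_grad():
        cls = bert(**batch).last_hidden_state[:, 0]  # [CLS] embedding
    return head(cls)
```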
arXiv Detail & Related papers (2021-02-25T06:05:59Z)
- gundapusunil at SemEval-2020 Task 9: Syntactic Semantic LSTM
Architecture for SENTIment Analysis of Code-MIXed Data [7.538482310185133]
We have developed a system for SemEval 2020: Task 9 on Sentiment Analysis for Code-Mixed Social Media Text.
Our system first generates two types of embeddings for the social media text.
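The summary says only that two types of embeddings are generated before classification. Reading the title's "Syntactic Semantic LSTM" loosely, a sketch of concatenating two per-token embedding tables into an LSTM follows; the split into these two particular embedding types is an assumption.

```python
# Hypothetical sketch of a dual-embedding LSTM classifier: two
# embeddings per token (labeled "semantic" and "syntactic" here as an
# assumption), concatenated and fed to an LSTM.
import torch
import torch.nn as nn

class DualEmbeddingLSTM(nn.Module):
    def __init__(self, vocab=10000, dim=64, hidden=64, n_classes=3):
        super().__init__()
        self.sem = nn.Embedding(vocab, dim, padding_idx=0)   # semantic
        self.syn = nn.Embedding(vocab, dim, padding_idx=0)   # syntactic
        self.lstm = nn.LSTM(2 * dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_classes)

    def forward(self, ids):
        x = torch.cat([self.sem(ids), self.syn(ids)], dim=-1)
        h, _ = self.lstm(x)
        return self.out(h[:, -1])               # last hidden state
```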
arXiv Detail & Related papers (2020-10-09T07:07:04Z)
- kk2018 at SemEval-2020 Task 9: Adversarial Training for Code-Mixing
Sentiment Classification [18.41476971318978]
Code switching is a linguistic phenomenon that may occur within a multilingual setting where speakers share more than one language.
In this work, domain transfer learning from the state-of-the-art monolingual model ERNIE is tested on the code-mixed dataset.
Adversarial training with a multilingual model is then used to achieve 1st place in the SemEval-2020 Task 9 Hindi-English sentiment classification competition.
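The summary does not say which adversarial-training scheme was used. One common instantiation for text models is fast-gradient perturbation of the embedding weights (FGM), sketched here under that assumption:

```python
# Hedged sketch of embedding-space adversarial training (FGM-style),
# one common way to realize "adversarial training" for text models.
# The summary does not confirm this exact method; model.embed is an
# assumed attribute name for the model's embedding layer.
import torch

def adversarial_step(model, loss_fn, ids, labels, epsilon=1e-2):
    loss = loss_fn(model(ids), labels)
    loss.backward()                              # clean-input gradients
    grad = model.embed.weight.grad
    delta = epsilon * grad / (grad.norm() + 1e-12)
    model.embed.weight.data.add_(delta)          # perturb embeddings
    adv_loss = loss_fn(model(ids), labels)       # loss on perturbed input
    adv_loss.backward()                          # accumulate adv gradients
    model.embed.weight.data.sub_(delta)          # restore embeddings
    # caller then runs optimizer.step() and zeroes gradients
```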
arXiv Detail & Related papers (2020-09-08T12:20:04Z)
- NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching
language using a simple deep-learning classifier [63.137661897716555]
Code-switching is a phenomenon in which two or more languages are used in the same message.
We use a standard convolutional neural network model to predict the sentiment of tweets written in a blend of Spanish and English.
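A "standard convolutional neural network" for text usually means parallel n-gram filters over word embeddings with max-pooling. A minimal sketch in that spirit, with assumed filter sizes and dimensions:

```python
# Minimal sketch of a standard CNN text classifier; the filter widths
# and dimensions are illustrative, not the NLP-CIC configuration.
import torch
import torch.nn as nn

class CNNSentiment(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, n_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, 64, k) for k in (2, 3, 4))  # n-gram filters
        self.out = nn.Linear(64 * 3, n_classes)

    def forward(self, ids):
        x = self.embed(ids).transpose(1, 2)       # (batch, emb, seq)
        pooled = [c(x).relu().max(dim=2).values for c in self.convs]
        return self.out(torch.cat(pooled, dim=1))
```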
arXiv Detail & Related papers (2020-09-07T19:57:09Z)
- JUNLP@SemEval-2020 Task 9: Sentiment Analysis of Hindi-English code mixed
data using Grid Search Cross Validation [3.5169472410785367]
We focus on working out a plausible solution for code-mixed sentiment analysis.
This work was done as part of our participation in the SemEval-2020 SentiMix task.
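The title names grid-search cross-validation as the tuning method. As a concrete illustration of that technique (the pipeline, features, and grid values below are assumptions, not the authors' search space):

```python
# Illustrative grid-search cross-validation for a sentiment classifier
# over TF-IDF features; the estimator and grid values are assumptions.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

pipe = Pipeline([("tfidf", TfidfVectorizer()),
                 ("clf", LogisticRegression(max_iter=1000))])
grid = {"tfidf__ngram_range": [(1, 1), (1, 2)],
        "clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, grid, cv=5, scoring="f1_macro")
# search.fit(train_texts, train_labels)  # texts are code-mixed tweets
```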
arXiv Detail & Related papers (2020-07-24T15:06:48Z)
- IIT Gandhinagar at SemEval-2020 Task 9: Code-Mixed Sentiment
Classification Using Candidate Sentence Generation and Selection [1.2301855531996841]
Code-mixing adds to the challenge of analyzing the sentiment of the text due to its non-standard writing style.
We present a candidate-sentence generation and selection approach built on top of a Bi-LSTM-based neural classifier.
The proposed approach improves system performance compared to the plain Bi-LSTM neural classifier.
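A minimal sketch of the generate-then-select idea as described: score each candidate variant of a noisy code-mixed sentence with a Bi-LSTM classifier and keep the most confident prediction. Candidate generation itself (e.g., spelling or normalization variants) is assumed and left to the caller.

```python
# Hedged sketch of candidate-sentence selection over a Bi-LSTM
# classifier; architecture sizes and the selection criterion
# (max confidence) are assumptions.
import torch
import torch.nn as nn

class BiLSTMSentiment(nn.Module):
    def __init__(self, vocab_size=10000, emb=100, hidden=64, n_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb, padding_idx=0)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, ids):
        h, _ = self.lstm(self.embed(ids))
        return self.out(h.mean(dim=1))          # mean-pooled states

def select_best(model, candidate_ids):          # list of 1-D id tensors
    probs = [model(c.unsqueeze(0)).softmax(-1) for c in candidate_ids]
    best = max(probs, key=lambda p: p.max().item())  # most confident
    return best.argmax().item()                 # predicted sentiment
```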
arXiv Detail & Related papers (2020-06-25T14:59:47Z)
- Rnn-transducer with language bias for end-to-end Mandarin-English
code-switching speech recognition [58.105818353866354]
We propose an improved recurrent neural network transducer (RNN-T) model with language bias to alleviate this problem.
We use language identities to bias the model to predict code-switching points.
This encourages the model to learn language-identity information directly from the transcription, so no additional LID model is needed.
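The "learn language identity from transcription" idea can be shown at the data level: tag the target transcription with language-identity tokens wherever the language switches, so the transducer predicts switch points as ordinary symbols. The tag names and tokenization below are assumptions for illustration.

```python
# Sketch of language-bias tagging at the data level: insert <en>/<zh>
# markers into the target transcription wherever the language changes.
# Tag names and the word-level tokenization are assumptions.
def tag_transcript(tokens, is_english):
    """Insert language markers at every code-switch point."""
    tagged, prev = [], None
    for tok, en in zip(tokens, is_english):
        lang = "<en>" if en else "<zh>"
        if lang != prev:
            tagged.append(lang)                 # mark the switch point
            prev = lang
        tagged.append(tok)
    return tagged

print(tag_transcript(["我", "想", "buy", "一个", "phone"],
                     [False, False, True, False, True]))
# ['<zh>', '我', '想', '<en>', 'buy', '<zh>', '一个', '<en>', 'phone']
```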
arXiv Detail & Related papers (2020-02-19T12:01:33Z)