kk2018 at SemEval-2020 Task 9: Adversarial Training for Code-Mixing
Sentiment Classification
- URL: http://arxiv.org/abs/2009.03673v2
- Date: Wed, 9 Sep 2020 02:20:46 GMT
- Title: kk2018 at SemEval-2020 Task 9: Adversarial Training for Code-Mixing
Sentiment Classification
- Authors: Jiaxiang Liu, Xuyi Chen, Shikun Feng, Shuohuan Wang, Xuan Ouyang, Yu
Sun, Zhengjie Huang, Weiyue Su
- Abstract summary: Code switching is a linguistic phenomenon that may occur within a multilingual setting where speakers share more than one language.
In this work, the domain transfer learning from state-of-the-art uni-language model ERNIE is tested on the code-mixing dataset.
Adversarial training with a multilingual model is used to achieve first place in the SemEval-2020 Task 9 Hindi-English sentiment classification competition.
- Score: 18.41476971318978
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Code switching is a linguistic phenomenon that may occur within a
multilingual setting where speakers share more than one language. With
increasing communication between groups speaking different languages, this
phenomenon is becoming more and more common. However, there is little research
and data in this area, especially for code-mixing sentiment classification. In
this work, domain transfer learning from the state-of-the-art uni-language
model ERNIE is tested on the code-mixing dataset and, surprisingly, achieves a
strong baseline. Furthermore, adversarial training with a multilingual model
is used to achieve first place in the SemEval-2020 Task 9 Hindi-English
sentiment classification competition.
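The abstract does not specify which adversarial training variant was used. A common choice for text classifiers is the Fast Gradient Method (FGM), which perturbs the input (in practice, the model's token embeddings) along the loss gradient, rescaled to a fixed L2 norm, and trains on the clean and perturbed inputs together. The sketch below illustrates this idea on a toy logistic-regression classifier in NumPy; the function names, the epsilon value, and the use of raw feature vectors instead of embeddings are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgm_perturbation(grad, epsilon=0.1):
    """FGM: rescale the loss gradient w.r.t. the input to a fixed L2 norm."""
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return np.zeros_like(grad)
    return epsilon * grad / norm

def adversarial_train_step(w, x, y, lr=0.1, epsilon=0.1):
    """One training step combining the clean gradient and the gradient
    at the adversarially perturbed input x + r (logistic loss)."""
    # Clean forward/backward: L = -[y log p + (1-y) log(1-p)]
    p = sigmoid(x @ w)
    grad_w_clean = (p - y) * x      # dL/dw at the clean input
    grad_x = (p - y) * w            # dL/dx, used to build the perturbation
    # Adversarial forward/backward at the perturbed input
    r = fgm_perturbation(grad_x, epsilon)
    p_adv = sigmoid((x + r) @ w)
    grad_w_adv = (p_adv - y) * (x + r)
    # Descend on the combined clean + adversarial gradient
    return w - lr * (grad_w_clean + grad_w_adv)
```

In the paper's setting the perturbation would be applied to the multilingual model's embedding layer during fine-tuning rather than to raw input features, with the rest of the training loop unchanged.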
Related papers
- cantnlp@LT-EDI-2023: Homophobia/Transphobia Detection in Social Media
Comments using Spatio-Temporally Retrained Language Models [0.9012198585960441]
This paper describes our multiclass classification system developed as part of the LT-EDI@RANLP-2023 shared task.
We used a BERT-based language model to detect homophobic and transphobic content in social media comments across five language conditions.
We developed the best performing seven-label classification system for Malayalam based on weighted macro averaged F1 score.
arXiv Detail & Related papers (2023-08-20T21:30:34Z) - T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text
Classification [50.675552118811]
Cross-lingual text classification is typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest.
We propose revisiting the classic "translate-and-test" pipeline to neatly separate the translation and classification stages.
arXiv Detail & Related papers (2023-06-08T07:33:22Z) - Adversarial synthesis based data-augmentation for code-switched spoken
language identification [0.0]
Spoken Language Identification (LID) is an important sub-task of Automatic Speech Recognition (ASR).
This study focuses on Indic languages code-mixed with English.
A Generative Adversarial Network (GAN)-based data augmentation technique is applied using Mel spectrograms of the audio data.
arXiv Detail & Related papers (2022-05-30T06:41:13Z) - Reducing language context confusion for end-to-end code-switching
automatic speech recognition [50.89821865949395]
We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model.
By calculating the respective attention of multiple languages, our method can efficiently transfer language knowledge from rich monolingual data.
arXiv Detail & Related papers (2022-01-28T14:39:29Z) - Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representation from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z) - VECO: Variable and Flexible Cross-lingual Pre-training for Language
Understanding and Generation [77.82373082024934]
We plug a cross-attention module into the Transformer encoder to explicitly build the interdependence between languages.
It can effectively avoid the degeneration of predicting masked words only conditioned on the context in its own language.
The proposed cross-lingual model delivers new state-of-the-art results on various cross-lingual understanding tasks of the XTREME benchmark.
arXiv Detail & Related papers (2020-10-30T03:41:38Z) - NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching
language using a simple deep-learning classifier [63.137661897716555]
Code-switching is a phenomenon in which two or more languages are used in the same message.
We use a standard convolutional neural network model to predict the sentiment of tweets in a blend of Spanish and English languages.
arXiv Detail & Related papers (2020-09-07T19:57:09Z) - Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text
Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
arXiv Detail & Related papers (2020-07-29T19:38:35Z) - ULD@NUIG at SemEval-2020 Task 9: Generative Morphemes with an Attention
Model for Sentiment Analysis in Code-Mixed Text [1.4926515182392508]
We present the Generative Morphemes with Attention (GenMA) Model sentiment analysis system contributed to SemEval 2020 Task 9 SentiMix.
The system aims to predict the sentiments of the given English-Hindi code-mixed tweets without using word-level language tags.
arXiv Detail & Related papers (2020-07-27T23:58:54Z) - IIT Gandhinagar at SemEval-2020 Task 9: Code-Mixed Sentiment
Classification Using Candidate Sentence Generation and Selection [1.2301855531996841]
Code-mixing adds to the challenge of analyzing the sentiment of the text due to the non-standard writing style.
We present a candidate sentence generation and selection based approach on top of the Bi-LSTM based neural classifier.
The proposed approach shows an improvement in the system performance as compared to the Bi-LSTM based neural classifier.
arXiv Detail & Related papers (2020-06-25T14:59:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.