Palomino-Ochoa at SemEval-2020 Task 9: Robust System based on
Transformer for Code-Mixed Sentiment Classification
- URL: http://arxiv.org/abs/2011.09448v1
- Date: Wed, 18 Nov 2020 18:25:58 GMT
- Title: Palomino-Ochoa at SemEval-2020 Task 9: Robust System based on
Transformer for Code-Mixed Sentiment Classification
- Authors: Daniel Palomino and Jose Ochoa-Luna
- Abstract summary: We present a transfer learning system to perform a mixed Spanish-English sentiment classification task.
Our proposal uses the state-of-the-art language model BERT and embeds it within a ULMFiT transfer learning pipeline.
- Score: 1.6244541005112747
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a transfer learning system to perform a mixed Spanish-English
sentiment classification task. Our proposal uses the state-of-the-art language
model BERT and embeds it within a ULMFiT transfer learning pipeline. This
combination allows us to predict the polarity of code-mixed
(English-Spanish) tweets. Thus, among 29 submitted systems, our approach
(referred to as dplominop) is ranked 4th on the Sentimix Spanglish test set of
SemEval 2020 Task 9. In fact, our system yields a weighted F1 score of
0.755, which can be easily reproduced -- the source code and implementation
details are made available.
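The paper's pipeline pairs BERT with ULMFiT-style fine-tuning. As an illustration only (not the authors' released code), the sketch below shows one way to combine a Hugging Face BERT classifier with the ingredients ULMFiT prescribes: discriminative layer-wise learning rates, a warmup-then-decay schedule standing in for slanted triangular rates, and gradual unfreezing. The checkpoint name, label set, learning rates, and step counts are assumptions rather than values from the released implementation.

```python
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          get_linear_schedule_with_warmup)

MODEL_NAME = "bert-base-multilingual-cased"   # assumption: any BERT checkpoint could be used
LABELS = ["negative", "neutral", "positive"]  # SentiMix polarity classes

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS))

# ULMFiT ingredient 1 -- discriminative learning rates:
# earlier (lower) layers get geometrically smaller learning rates.
base_lr, decay = 2e-5, 0.95
layers = [model.bert.embeddings] + list(model.bert.encoder.layer)
param_groups = [{"params": layer.parameters(),
                 "lr": base_lr * decay ** (len(layers) - 1 - i)}
                for i, layer in enumerate(layers)]
head_params = list(model.bert.pooler.parameters()) + list(model.classifier.parameters())
param_groups.append({"params": head_params, "lr": base_lr})
optimizer = torch.optim.AdamW(param_groups, lr=base_lr)

# ULMFiT ingredient 2 -- a slanted-triangular-like schedule:
# short linear warmup followed by linear decay.
num_training_steps = 1000  # assumption: depends on dataset and batch size
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=int(0.1 * num_training_steps),
    num_training_steps=num_training_steps)

# ULMFiT ingredient 3 -- gradual unfreezing:
# start with only the classification head trainable, then unfreeze one more
# encoder block per epoch, from the top of the encoder downwards.
def unfreeze_top_k(k: int) -> None:
    for p in model.parameters():
        p.requires_grad = False
    for p in head_params:
        p.requires_grad = True
    for block in list(model.bert.encoder.layer)[-k:]:
        for p in block.parameters():
            p.requires_grad = True

# Training skeleton (data loading omitted; `train_loader` is assumed to yield
# dicts from tokenizer(..., return_tensors="pt") that include a "labels" key):
# for epoch in range(num_epochs):
#     unfreeze_top_k(epoch + 1)
#     for batch in train_loader:
#         loss = model(**batch).loss
#         loss.backward()
#         optimizer.step(); scheduler.step(); optimizer.zero_grad()
```

The commented skeleton at the end indicates where the unfreezing step and the scheduler would plug into an ordinary PyTorch training loop.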
Related papers
- Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024 [61.189875635090225]
Large Language Models (LLMs) are currently under exploration for various tasks, including Automatic Speech Recognition (ASR), Machine Translation (MT), and even End-to-End Speech Translation (ST).
arXiv Detail & Related papers (2024-06-24T16:38:17Z)
- HIT-SCIR at MMNLU-22: Consistency Regularization for Multilingual Spoken Language Understanding [56.756090143062536]
We propose to use consistency regularization based on a hybrid data augmentation strategy.
We conduct experiments on the MASSIVE dataset under both full-dataset and zero-shot settings.
Our proposed method improves the performance on both intent detection and slot filling tasks.
arXiv Detail & Related papers (2023-01-05T11:21:15Z)
- BJTU-WeChat's Systems for the WMT22 Chat Translation Task [66.81525961469494]
This paper introduces the joint submission of the Beijing Jiaotong University and WeChat AI to the WMT'22 chat translation task for English-German.
Based on the Transformer, we apply several effective variants.
Our systems achieve 0.810 and 0.946 COMET scores.
arXiv Detail & Related papers (2022-11-28T02:35:04Z)
- Transformer-based Model for Word Level Language Identification in Code-mixed Kannada-English Texts [55.41644538483948]
We propose the use of a Transformer based model for word-level language identification in code-mixed Kannada English texts.
The proposed model on the CoLI-Kenglish dataset achieves a weighted F1-score of 0.84 and a macro F1-score of 0.61.
arXiv Detail & Related papers (2022-11-26T02:39:19Z)
- Tencent AI Lab - Shanghai Jiao Tong University Low-Resource Translation System for the WMT22 Translation Task [49.916963624249355]
This paper describes Tencent AI Lab - Shanghai Jiao Tong University (TAL-SJTU) Low-Resource Translation systems for the WMT22 shared task.
We participate in the general translation task on English$\Leftrightarrow$Livonian.
Our system is based on M2M100 with novel techniques that adapt it to the target language pair.
arXiv Detail & Related papers (2022-10-17T04:34:09Z)
- Explicit Alignment Objectives for Multilingual Bidirectional Encoders [111.65322283420805]
We present a new method for learning multilingual encoders, AMBER (Aligned Multilingual Bi-directional EncodeR).
AMBER is trained on additional parallel data using two explicit alignment objectives that align the multilingual representations at different granularities.
Experimental results show that AMBER obtains gains of up to 1.1 average F1 score on sequence tagging and up to 27.3 average accuracy on retrieval over the XLMR-large model.
arXiv Detail & Related papers (2020-10-15T18:34:13Z)
- WESSA at SemEval-2020 Task 9: Code-Mixed Sentiment Analysis using Transformers [0.0]
We describe our system submitted for SemEval 2020 Task 9, Sentiment Analysis for Code-Mixed Social Media Text.
Our best performing system is a Transfer Learning-based model that fine-tunes "XLM-RoBERTa".
For later submissions, our system achieves a 75.9% average F1-score on the test set under the CodaLab username "ahmed0sultan".
arXiv Detail & Related papers (2020-09-21T13:59:24Z)
- LIMSI_UPV at SemEval-2020 Task 9: Recurrent Convolutional Neural Network for Code-mixed Sentiment Analysis [8.8561720398658]
This paper describes the participation of LIMSI UPV team in SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text.
The proposed approach competed in the SentiMix Hindi-English subtask, which addresses the problem of predicting the sentiment of a given Hindi-English code-mixed tweet.
We propose a Recurrent Convolutional Neural Network that combines a recurrent and a convolutional network to better capture the semantics of the text.
arXiv Detail & Related papers (2020-08-30T13:52:24Z)
- Deep Learning Brasil -- NLP at SemEval-2020 Task 9: Overview of Sentiment Analysis of Code-Mixed Tweets [0.2294014185517203]
In this paper, we describe a methodology to predict sentiment in code-mixed (Hindi-English) tweets.
Our team, registered as verissimo.manoel on CodaLab, developed an approach based on an ensemble of four models.
The final classification was obtained by combining the softmax outputs of these four models (see the sketch after this list).
arXiv Detail & Related papers (2020-07-28T16:42:41Z)
- Reed at SemEval-2020 Task 9: Fine-Tuning and Bag-of-Words Approaches to Code-Mixed Sentiment Analysis [1.2147145617662432]
We explore the task of sentiment analysis on Hinglish (code-mixed Hindi-English) tweets as participants of Task 9 of the SemEval-2020 competition, known as the SentiMix task.
We had two main approaches: 1) applying transfer learning by fine-tuning pre-trained BERT models and 2) training feedforward neural networks on bag-of-words representations.
During the evaluation phase of the competition, we obtained an F-score of 71.3% with our best model, which placed 4th out of 62 entries in the official system rankings.
arXiv Detail & Related papers (2020-07-26T05:48:46Z)
- Yseop at SemEval-2020 Task 5: Cascaded BERT Language Model for Counterfactual Statement Analysis [0.0]
We use a BERT base model for the classification task and build a hybrid BERT Multi-Layer Perceptron system to handle the sequence identification task.
Our experiments show that while introducing syntactic and semantic features does little to improve the system on the classification task, using these features as cascaded linear inputs when fine-tuning the sequence-delimiting ability of the model ensures it outperforms other similar-purpose complex systems, such as BiLSTM-CRF, in the second task.
arXiv Detail & Related papers (2020-05-18T08:19:18Z)
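Several of the systems listed above combine multiple classifiers by averaging their softmax outputs (most explicitly the Deep Learning Brasil entry). The sketch below is a minimal, purely illustrative version of that kind of softmax-averaging ensemble; the models and batches are placeholders and do not reflect any specific system's code.

```python
import torch

def ensemble_predict(models, batch):
    """Average the class probabilities of several models and return the argmax label."""
    probs = []
    with torch.no_grad():
        for model in models:
            logits = model(batch)                  # assumed: each model maps a batch to logits
            probs.append(torch.softmax(logits, dim=-1))
    avg_probs = torch.stack(probs).mean(dim=0)     # shape: (batch_size, num_classes)
    return avg_probs.argmax(dim=-1)                # predicted class index per example
```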