UPB at SemEval-2020 Task 9: Identifying Sentiment in Code-Mixed Social
Media Texts using Transformers and Multi-Task Learning
- URL: http://arxiv.org/abs/2009.02780v1
- Date: Sun, 6 Sep 2020 17:19:18 GMT
- Title: UPB at SemEval-2020 Task 9: Identifying Sentiment in Code-Mixed Social
Media Texts using Transformers and Multi-Task Learning
- Authors: George-Eduard Zaharia, George-Alexandru Vlad, Dumitru-Clementin
Cercel, Traian Rebedea, Costin-Gabriel Chiru
- Abstract summary: We describe the systems developed by our team for SemEval-2020 Task 9.
We aim to cover two well-known code-mixed language pairs: Hindi-English and Spanish-English.
Our approach achieves promising performance on the Hindi-English task, with an average F1-score of 0.6850.
For the Spanish-English task, we obtained an average F1-score of 0.7064 ranking our team 17th out of 29 participants.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sentiment analysis is a process widely used in today's opinion mining
campaigns. It has applications in a variety of fields, especially in collecting
information about the attitude or satisfaction of users concerning a particular
subject. However, the task becomes noticeably more difficult when it is applied
in cultures that tend to combine two languages to express ideas and thoughts. By
interleaving words from two languages, users can express themselves with ease,
but at the cost of making the text far less intelligible, not only for readers
unfamiliar with this practice, but also for standard opinion mining algorithms.
In this paper, we describe the systems developed by our team for SemEval-2020
Task 9, which covers two well-known code-mixed language pairs: Hindi-English and
Spanish-English.
We address this issue with a solution that takes advantage of several neural
network approaches, as well as pre-trained word embeddings. Our approach
(multilingual BERT) achieves promising performance on the Hindi-English task,
with an average F1-score of 0.6850 registered on the competition leaderboard,
ranking our team 16th out of 62 participants. For the Spanish-English task, we
obtained an average F1-score of 0.7064, ranking our team 17th out of 29
participants, by using another multilingual Transformer-based model,
XLM-RoBERTa.
Related papers
- 1-800-SHARED-TASKS @ NLU of Devanagari Script Languages: Detection of Language, Hate Speech, and Targets using LLMs [0.0]
This paper presents a detailed system description of our entry for the CHiPSAL 2025 shared task.
We focus on language detection, hate speech identification, and target detection in Devanagari script languages.
arXiv Detail & Related papers (2024-11-11T10:34:36Z)
- Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations [59.056367787688146]
This paper pioneers exploring and training powerful Multilingual Math Reasoning (xMR) LLMs.
By utilizing translation, we construct the first multilingual math reasoning instruction dataset, MGSM8KInstruct, encompassing ten distinct languages.
arXiv Detail & Related papers (2023-10-31T08:09:20Z)
- Romanian Multiword Expression Detection Using Multilingual Adversarial Training and Lateral Inhibition [0.17188280334580194]
This paper describes our improvements in automatically identifying Romanian multiword expressions on the corpus released for the PARSEME v1.2 shared task.
Our approach assumes a multilingual perspective based on the recently introduced lateral inhibition layer and adversarial training to boost the performance of the employed multilingual language models.
arXiv Detail & Related papers (2023-04-22T09:10:49Z)
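The adversarial-training component named in the entry above is often realized with a gradient reversal layer that pushes an encoder toward language-invariant features. The sketch below shows only that generic DANN-style mechanism under assumed dimensions; it does not reproduce the paper's lateral inhibition layer.

```python
# Sketch of DANN-style adversarial training: gradients from a language
# discriminator are reversed before reaching the encoder, encouraging
# language-invariant features. Dimensions are assumptions.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None  # flip the gradient sign

class LanguageDiscriminator(nn.Module):
    """Predicts the source language from (gradient-reversed) sentence features."""
    def __init__(self, hidden=768, num_languages=2, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.classifier = nn.Linear(hidden, num_languages)

    def forward(self, features):
        return self.classifier(GradReverse.apply(features, self.lambd))
```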
- Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning [98.60739735409243]
Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks.
We introduce XSGD for cross-lingual alignment pretraining, a parallel and large-scale multilingual conversation dataset.
To facilitate aligned cross-lingual representations, we develop an efficient prompt-tuning-based method for learning alignment prompts.
arXiv Detail & Related papers (2023-04-03T18:46:01Z)
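Prompt tuning, as used in the entry above, keeps the backbone frozen and learns only a small set of "soft prompt" vectors prepended to the input embeddings. A minimal sketch, assuming an XLM-RoBERTa backbone and an illustrative prompt length; it is not the paper's exact alignment method.

```python
# Sketch of prompt tuning: trainable soft-prompt vectors are prepended to the
# token embeddings of a frozen multilingual encoder. Prompt length, backbone,
# and initialization are assumptions.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

base = AutoModel.from_pretrained("xlm-roberta-base")
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
for p in base.parameters():
    p.requires_grad = False  # freeze the backbone; only the prompt is trained

n_prompt = 20
soft_prompt = nn.Parameter(torch.randn(n_prompt, base.config.hidden_size) * 0.02)

def encode(texts):
    batch = tokenizer(texts, padding=True, return_tensors="pt")
    tok_emb = base.embeddings.word_embeddings(batch["input_ids"])  # (B, L, H)
    prompts = soft_prompt.unsqueeze(0).expand(len(texts), -1, -1)  # (B, P, H)
    mask = torch.cat([torch.ones(len(texts), n_prompt, dtype=torch.long),
                      batch["attention_mask"]], dim=1)
    return base(inputs_embeds=torch.cat([prompts, tok_emb], dim=1),
                attention_mask=mask).last_hidden_state
```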
- BJTU-WeChat's Systems for the WMT22 Chat Translation Task [66.81525961469494]
This paper introduces the joint submission of the Beijing Jiaotong University and WeChat AI to the WMT'22 chat translation task for English-German.
Based on the Transformer, we apply several effective variants.
Our systems achieve 0.810 and 0.946 COMET scores.
arXiv Detail & Related papers (2022-11-28T02:35:04Z)
- NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching language using a simple deep-learning classifier [63.137661897716555]
Code-switching is a phenomenon in which two or more languages are used in the same message.
We use a standard convolutional neural network model to predict the sentiment of tweets in a blend of Spanish and English languages.
arXiv Detail & Related papers (2020-09-07T19:57:09Z)
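A "standard convolutional neural network" for tweet sentiment, as in the NLP-CIC entry above, typically convolves over word embeddings with several filter widths and max-pools over time. A minimal PyTorch sketch with illustrative sizes:

```python
# Sketch of a word-level CNN sentiment classifier (Kim-style): parallel
# convolutions over embeddings, max-pooling over time, then a linear layer.
# Vocabulary size, filter settings, and the random batch are assumptions.
import torch
import torch.nn as nn

class CNNSentiment(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=100, n_filters=100,
                 kernel_sizes=(3, 4, 5), num_classes=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, k) for k in kernel_sizes)
        self.fc = nn.Linear(n_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                        # (B, L)
        x = self.embedding(token_ids).transpose(1, 2)    # (B, E, L)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))         # (B, num_classes)

logits = CNNSentiment()(torch.randint(1, 30000, (8, 40)))  # 8 tweets, 40 tokens
```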
- IUST at SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text using Deep Neural Networks and Linear Baselines [6.866104126509981]
We develop a system to predict the sentiment of a given code-mixed tweet.
Our best performing method obtains an F1 score of 0.751 for the Spanish-English sub-task and 0.706 over the Hindi-English sub-task.
arXiv Detail & Related papers (2020-07-24T18:48:37Z)
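The "linear baselines" mentioned in the IUST entry above can be as simple as TF-IDF features plus logistic regression; character n-grams are a common choice for code-mixed text because they sidestep mixed-language vocabularies. A scikit-learn sketch with toy data:

```python
# Sketch of a linear baseline: character n-gram TF-IDF + logistic regression.
# The feature settings and the two toy training examples are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

baseline = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5)),
    LogisticRegression(max_iter=1000),
)
baseline.fit(["bohot bura tha yaar", "what a great movie yaar"],
             ["negative", "positive"])
print(baseline.predict(["movie was accha"]))
```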
- JUNLP@SemEval-2020 Task 9: Sentiment Analysis of Hindi-English code mixed data using Grid Search Cross Validation [3.5169472410785367]
We focus on working out a plausible solution to the domain of Code-Mixed Sentiment Analysis.
This work was done as participation in the SemEval-2020 Sentimix Task.
arXiv Detail & Related papers (2020-07-24T15:06:48Z)
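Grid-search cross-validation, as named in the JUNLP entry above, exhaustively evaluates hyperparameter combinations with k-fold cross-validation. A scikit-learn sketch; the pipeline and parameter grid are assumptions, not the paper's setup:

```python
# Sketch of grid-search cross-validation over a TF-IDF + linear SVM pipeline.
# The parameter grid is an illustrative assumption.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

pipe = Pipeline([("tfidf", TfidfVectorizer()), ("svm", LinearSVC())])
grid = GridSearchCV(
    pipe,
    param_grid={"tfidf__ngram_range": [(1, 1), (1, 2)], "svm__C": [0.1, 1, 10]},
    cv=5,
    scoring="f1_macro",
)
# grid.fit(train_texts, train_labels)  # then grid.best_params_ / grid.best_estimator_
```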
- CoSDA-ML: Multi-Lingual Code-Switching Data Augmentation for Zero-Shot Cross-Lingual NLP [68.2650714613869]
We propose a data augmentation framework to generate multi-lingual code-switching data to fine-tune mBERT.
Compared with the existing work, our method does not rely on bilingual sentences for training, and requires only one training process for multiple target languages.
arXiv Detail & Related papers (2020-06-11T13:15:59Z)
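The core idea of code-switching data augmentation as summarized above can be illustrated by replacing random words with translations from a bilingual lexicon; the toy English-Hindi lexicon and replacement rate below are assumptions, not CoSDA-ML's actual dictionaries.

```python
# Sketch of dictionary-based code-switching augmentation: randomly substitute
# words with bilingual-lexicon translations. Lexicon and rate are toy values.
import random

EN_HI = {"movie": "film", "very": "bahut", "good": "accha", "friend": "yaar"}

def code_switch(sentence, lexicon, rate=0.3, seed=None):
    rng = random.Random(seed)
    return " ".join(
        lexicon[w] if w in lexicon and rng.random() < rate else w
        for w in sentence.split()
    )

print(code_switch("the movie was very good my friend", EN_HI, rate=0.5, seed=0))
```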
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
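Multi-task learning with BERT, as in the Kungfupanda entry above, usually means one shared encoder with a small classification head per sub-task. A minimal sketch; the task names and label counts are assumptions:

```python
# Sketch of BERT-based multi-task learning: a shared encoder feeds one linear
# classification head per sub-task. Task names and label counts are assumptions.
import torch.nn as nn
from transformers import AutoModel

class MultiTaskBert(nn.Module):
    def __init__(self, model_name, task_classes):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden, n) for task, n in task_classes.items()})

    def forward(self, input_ids, attention_mask, task):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        return self.heads[task](out.last_hidden_state[:, 0])  # [CLS] logits

model = MultiTaskBert("bert-base-cased",
                      {"subtask_a": 2, "subtask_b": 2, "subtask_c": 3})
```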