Deep Learning Brasil -- NLP at SemEval-2020 Task 9: Overview of
Sentiment Analysis of Code-Mixed Tweets
- URL: http://arxiv.org/abs/2008.01544v1
- Date: Tue, 28 Jul 2020 16:42:41 GMT
- Title: Deep Learning Brasil -- NLP at SemEval-2020 Task 9: Overview of
Sentiment Analysis of Code-Mixed Tweets
- Authors: Manoel Veríssimo dos Santos Neto, Ayrton Denner da Silva Amaral,
Nádia Félix Felipe da Silva, Anderson da Silva Soares
- Abstract summary: In this paper, we describe a methodology to predict sentiment in code-mixed tweets (Hindi-English).
Our team, called verissimo.manoel on CodaLab, developed an approach based on an ensemble of four models.
The final classifier was an ensemble built from predictions over the softmax values of these four models.
- Score: 0.2294014185517203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we describe a methodology to predict sentiment in code-mixed
tweets (Hindi-English). Our team, called verissimo.manoel on CodaLab, developed
an approach based on an ensemble of four models (MultiFiT, BERT, ALBERT, and
XLNet). The final classifier was an ensemble built from predictions over the
softmax values of these four models. This architecture was used and
evaluated in the context of the SemEval 2020 challenge (Task 9), where our system
achieved an F1 score of 72.7%.
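The abstract does not say exactly how the softmax values of the four models were combined. As a minimal sketch, assuming plain averaging of the per-class probabilities (soft voting), the idea could look like the following. The label order, the function names, and the probability values are all hypothetical and serve only as illustration; they are not taken from the paper.

```python
from typing import List, Sequence

# Assumed label set and order, for illustration only.
LABELS = ["negative", "neutral", "positive"]


def average_softmax(per_model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average class-probability vectors from several models (soft voting)."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    if any(len(p) != n_classes for p in per_model_probs):
        raise ValueError("all models must score the same set of classes")
    return [sum(p[c] for p in per_model_probs) / n_models for c in range(n_classes)]


def predict_label(per_model_probs: Sequence[Sequence[float]],
                  labels: Sequence[str] = LABELS) -> str:
    """Return the label whose averaged probability is highest."""
    avg = average_softmax(per_model_probs)
    best = max(range(len(avg)), key=avg.__getitem__)
    return labels[best]


# Hypothetical softmax outputs for a single tweet; the numbers are made up.
example = {
    "MultiFiT": [0.20, 0.50, 0.30],
    "BERT":     [0.10, 0.30, 0.60],
    "ALBERT":   [0.15, 0.35, 0.50],
    "XLNet":    [0.25, 0.40, 0.35],
}
prediction = predict_label(list(example.values()))
print(prediction)
```

Averaging probabilities rather than counting hard votes lets a confident model outweigh several uncertain ones. A weighted average or a selected subset of models would be equally plausible readings of the abstract.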
Related papers
- InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning [58.7966588457529]
InfiMM-WebMath-40B is a high-quality dataset of interleaved image-text documents.
It comprises 24 million web pages, 85 million associated image URLs, and 40 billion text tokens, all meticulously extracted and filtered from CommonCrawl.
Our evaluations on text-only benchmarks show that, despite utilizing only 40 billion tokens, our dataset significantly enhances the performance of our 1.3B model.
Our models set a new state-of-the-art among open-source models on multi-modal math benchmarks such as MathVerse and We-Math.
arXiv Detail & Related papers (2024-09-19T08:41:21Z)
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944]
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
arXiv Detail & Related papers (2023-01-22T18:22:55Z)
- UrduFake@FIRE2020: Shared Track on Fake News Identification in Urdu [62.6928395368204]
This paper gives an overview of the first shared task at FIRE 2020 on fake news detection in the Urdu language.
The goal is to identify fake news using a dataset composed of 900 annotated news articles for training and 400 news articles for testing.
The dataset contains news in five domains: (i) Health, (ii) Sports, (iii) Showbiz, (iv) Technology, and (v) Business.
arXiv Detail & Related papers (2022-07-25T03:46:51Z)
- Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2020 [62.6928395368204]
The task was posed as a binary classification problem, in which the goal is to differentiate between real and fake news.
We provided a dataset divided into 900 annotated news articles for training and 400 news articles for testing.
42 teams from 6 different countries (India, China, Egypt, Germany, Pakistan, and the UK) registered for the task.
arXiv Detail & Related papers (2022-07-25T03:41:32Z)
- Unifying Language Learning Paradigms [96.35981503087567]
We present a unified framework for pre-training models that are universally effective across datasets and setups.
We show how different pre-training objectives can be cast as one another and how interpolating between different objectives can be effective.
Our model also achieves strong results at in-context learning, outperforming 175B GPT-3 on zero-shot SuperGLUE and tripling the performance of T5-XXL on one-shot summarization.
arXiv Detail & Related papers (2022-05-10T19:32:20Z)
- HFL at SemEval-2022 Task 8: A Linguistics-inspired Regression Model with
Data Augmentation for Multilingual News Similarity [16.454545004093735]
This paper describes our system designed for SemEval-2022 Task 8: Multilingual News Article Similarity.
We proposed a linguistics-inspired model trained with a few task-specific strategies.
Our system ranked 1st on the leaderboard while achieving a Pearson's Correlation Coefficient of 0.818 on the official evaluation set.
arXiv Detail & Related papers (2022-04-11T03:08:37Z)
- Sentiment Analysis of Code-Mixed Social Media Text (Hinglish) [4.081440927534578]
The stages involved in performing the sentiment analysis were data consolidation, data cleaning, data transformation, and modelling.
The models were created using various machine learning algorithms such as SVM, KNN, Decision Trees, Random Forests, Naive Bayes, Logistic Regression, and ensemble voting classifiers.
arXiv Detail & Related papers (2021-02-24T09:15:34Z)
- Palomino-Ochoa at SemEval-2020 Task 9: Robust System based on
Transformer for Code-Mixed Sentiment Classification [1.6244541005112747]
We present a transfer learning system to perform a mixed Spanish-English sentiment classification task.
Our proposal uses the state-of-the-art language model BERT and embeds it within a ULMFiT transfer learning pipeline.
arXiv Detail & Related papers (2020-11-18T18:25:58Z)
- Phonemer at WNUT-2020 Task 2: Sequence Classification Using COVID
Twitter BERT and Bagging Ensemble Technique based on Plurality Voting [0.0]
We develop a system that automatically identifies whether an English Tweet related to the novel coronavirus (COVID-19) is informative or not.
Our final approach achieved an F1-score of 0.9037 and we were ranked sixth overall with F1-score as the evaluation criteria.
arXiv Detail & Related papers (2020-10-01T10:54:54Z)
- NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching
language using a simple deep-learning classifier [63.137661897716555]
Code-switching is a phenomenon in which two or more languages are used in the same message.
We use a standard convolutional neural network model to predict the sentiment of tweets in a blend of Spanish and English languages.
arXiv Detail & Related papers (2020-09-07T19:57:09Z)
- LIMSI_UPV at SemEval-2020 Task 9: Recurrent Convolutional Neural Network
for Code-mixed Sentiment Analysis [8.8561720398658]
This paper describes the participation of the LIMSI UPV team in SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text.
The proposed approach competed in the SentiMix Hindi-English subtask, which addresses the problem of predicting the sentiment of a given Hindi-English code-mixed tweet.
We propose a Recurrent Convolutional Neural Network that combines a recurrent neural network and a convolutional network to better capture the semantics of the text.
arXiv Detail & Related papers (2020-08-30T13:52:24Z)