LT3 at SemEval-2020 Task 9: Cross-lingual Embeddings for Sentiment
Analysis of Hinglish Social Media Text
- URL: http://arxiv.org/abs/2010.11019v1
- Date: Wed, 21 Oct 2020 14:03:16 GMT
- Title: LT3 at SemEval-2020 Task 9: Cross-lingual Embeddings for Sentiment
Analysis of Hinglish Social Media Text
- Authors: Pranaydeep Singh and Els Lefever
- Abstract summary: We investigate two approaches to solve the task of Hinglish sentiment analysis.
The first approach uses cross-lingual embeddings resulting from projecting Hinglish and pre-trained English FastText word embeddings into the same space.
The second approach incorporates pre-trained English embeddings that are incrementally retrained with a set of Hinglish tweets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper describes our contribution to the SemEval-2020 Task 9 on Sentiment
Analysis for Code-mixed Social Media Text. We investigated two approaches to
solve the task of Hinglish sentiment analysis. The first approach uses
cross-lingual embeddings resulting from projecting Hinglish and pre-trained
English FastText word embeddings in the same space. The second approach
incorporates pre-trained English embeddings that are incrementally retrained
with a set of Hinglish tweets. The results show that the second approach
performs best, with an F1-score of 70.52% on the held-out test data.
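The projection-based first approach can be sketched as an orthogonal (Procrustes) mapping between two embedding spaces, a standard technique for cross-lingual alignment. The dimensions, random "embeddings", and variable names below are illustrative assumptions, not the authors' actual setup:

```python
import numpy as np

def procrustes_align(src, tgt):
    """Learn an orthogonal map W so that src @ W approximates tgt.

    src, tgt: (n, d) arrays of embeddings for n seed-dictionary word
    pairs (e.g. Hinglish -> English). The closed-form solution is
    W = U V^T, from the SVD of src^T @ tgt.
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

# Toy example: a random rotation plays the role of the "true"
# relation between the Hinglish and English spaces.
rng = np.random.default_rng(0)
d = 5
q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal matrix
hinglish = rng.normal(size=(100, d))          # stand-in Hinglish vectors
english = hinglish @ q                        # their "English" counterparts

W = procrustes_align(hinglish, english)
projected = hinglish @ W                      # Hinglish vectors mapped into English space
print(np.allclose(projected, english, atol=1e-8))
```

Because the toy target space is an exact rotation of the source, the learned map recovers it; with real FastText embeddings the fit is only approximate and depends on the seed dictionary.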
Related papers
- Strategies for improving low resource speech to text translation relying
on pre-trained ASR models [59.90106959717875]
This paper presents techniques and findings for improving the performance of low-resource speech-to-text translation (ST).
We conducted experiments on both simulated and real low-resource setups, on the language pairs English-Portuguese and Tamasheq-French, respectively.
arXiv Detail & Related papers (2023-05-31T21:58:07Z)
- Tencent AI Lab - Shanghai Jiao Tong University Low-Resource Translation System for the WMT22 Translation Task [49.916963624249355]
This paper describes Tencent AI Lab - Shanghai Jiao Tong University (TAL-SJTU) Low-Resource Translation systems for the WMT22 shared task.
We participate in the general translation task on English$\Leftrightarrow$Livonian.
Our system is based on M2M100 with novel techniques that adapt it to the target language pair.
arXiv Detail & Related papers (2022-10-17T04:34:09Z) - RuArg-2022: Argument Mining Evaluation [69.87149207721035]
This paper is the organizers' report on the first competition of argumentation analysis systems dealing with Russian-language texts.
A corpus containing 9,550 sentences (comments on social media posts) on three topics related to the COVID-19 pandemic was prepared.
The system that won first place in both tasks used the NLI (Natural Language Inference) variant of the BERT architecture.
arXiv Detail & Related papers (2022-06-18T17:13:37Z) - Methods for Detoxification of Texts for the Russian Language [55.337471467610094]
We introduce the first study of automatic detoxification of Russian texts to combat offensive language.
We test two types of models: an unsupervised approach that performs local corrections, and a supervised approach based on the pre-trained GPT-2 language model.
The results show that the tested approaches can be successfully used for detoxification, although there is room for improvement.
arXiv Detail & Related papers (2021-05-19T10:37:44Z) - NLP-CIC at SemEval-2020 Task 9: Analysing sentiment in code-switching
language using a simple deep-learning classifier [63.137661897716555]
Code-switching is a phenomenon in which two or more languages are used in the same message.
We use a standard convolutional neural network model to predict the sentiment of tweets in a blend of Spanish and English languages.
arXiv Detail & Related papers (2020-09-07T19:57:09Z) - C1 at SemEval-2020 Task 9: SentiMix: Sentiment Analysis for Code-Mixed
Social Media Text using Feature Engineering [0.9646922337783134]
This paper describes our feature engineering approach to sentiment analysis in code-mixed social media text for SemEval-2020 Task 9: SentiMix.
We obtain a weighted F1 score of 0.65 for the "Hinglish" task and 0.63 for the "Spanglish" task.
arXiv Detail & Related papers (2020-08-09T00:46:26Z) - Reed at SemEval-2020 Task 9: Fine-Tuning and Bag-of-Words Approaches to
Code-Mixed Sentiment Analysis [1.2147145617662432]
We explore the task of sentiment analysis on Hinglish (code-mixed Hindi-English) tweets as participants of Task 9 of the SemEval-2020 competition, known as the SentiMix task.
We had two main approaches: 1) applying transfer learning by fine-tuning pre-trained BERT models and 2) training feedforward neural networks on bag-of-words representations.
During the evaluation phase of the competition, we obtained an F-score of 71.3% with our best model, which placed 4th out of 62 entries in the official system rankings.
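The bag-of-words approach described in the entry above can be sketched as a count-vectorizer feeding a small feedforward network. This is a minimal illustration assuming scikit-learn; the tweets, labels, and hyperparameters are invented, and the labels are two-way rather than the task's three-way (positive/neutral/negative) scheme:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# Tiny invented code-mixed examples; 1 = positive, 0 = negative.
# The real SentiMix training data is far larger.
tweets = [
    "yeh movie bahut achhi thi, loved it",
    "kya mast song hai, awesome",
    "great match tha yaar, maza aa gaya",
    "yeh film bakwas thi, total waste",
    "bahut boring lecture, hated it",
    "worst service ever, bilkul bekar",
]
labels = [1, 1, 1, 0, 0, 0]

# Bag-of-words token counts feeding a small feedforward network.
clf = make_pipeline(
    CountVectorizer(),                       # sparse token-count features
    MLPClassifier(hidden_layer_sizes=(16,),  # one small hidden layer
                  max_iter=2000, random_state=0),
)
clf.fit(tweets, labels)
preds = clf.predict(["kitna achha movie, loved it", "bakwas thi yaar"])
print(preds)
```

A real submission would tune the vocabulary, network size, and regularization on the shared-task validation split rather than on toy data.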
arXiv Detail & Related papers (2020-07-26T05:48:46Z) - IUST at SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social
Media Text using Deep Neural Networks and Linear Baselines [6.866104126509981]
We develop a system to predict the sentiment of a given code-mixed tweet.
Our best-performing method obtains an F1 score of 0.751 for the Spanish-English sub-task and 0.706 for the Hindi-English sub-task.
arXiv Detail & Related papers (2020-07-24T18:48:37Z) - JUNLP@SemEval-2020 Task 9:Sentiment Analysis of Hindi-English code mixed
data using Grid Search Cross Validation [3.5169472410785367]
We focus on developing a plausible solution for code-mixed sentiment analysis.
This work was done as participation in the SemEval-2020 Sentimix Task.
arXiv Detail & Related papers (2020-07-24T15:06:48Z) - CS-Embed at SemEval-2020 Task 9: The effectiveness of code-switched word
embeddings for sentiment analysis [0.5908471365011942]
We present word embeddings trained on code-switched tweets, specifically those that mix Spanish and English, known as Spanglish.
We utilise them to train a sentiment classifier that achieves an F-1 score of 0.722.
This is higher than the competition baseline of 0.656; our team ranked 14th out of 29 participating teams.
arXiv Detail & Related papers (2020-06-08T13:48:17Z) - Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for
Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves a 91.51% F1 score on English Sub-task A, which is comparable to the first-place result.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.