Voice@SRIB at SemEval-2020 Task 9 and 12: Stacked Ensembling method for
Sentiment and Offensiveness detection in Social Media
- URL: http://arxiv.org/abs/2007.10021v3
- Date: Sun, 11 Oct 2020 10:02:35 GMT
- Title: Voice@SRIB at SemEval-2020 Task 9 and 12: Stacked Ensembling method for
Sentiment and Offensiveness detection in Social Media
- Authors: Abhishek Singh and Surya Pratap Singh Parmar
- Abstract summary: We describe embedding training and ensembling methods for the SentiMix and OffensEval tasks.
We evaluate our models on macro F1-score, precision, accuracy, and recall on the datasets.
- Score: 2.9008108937701333
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: On social-media platforms such as Twitter, Facebook, and Reddit, people
often prefer to express their opinions in code-mixed language such as
Spanish-English or Hindi-English. In this paper, we describe the different
models we used for the SentiMix and OffensEval tasks, including external
datasets for training embeddings and our ensembling methods. Pre-trained
embeddings usually help in multiple tasks such as sentence classification and
machine translation. In this experiment, we applied our trained code-mixed
embeddings and Twitter pre-trained embeddings to the SemEval tasks. We evaluate
our models on macro F1-score, precision, accuracy, and recall on the datasets.
We intend to show that hyper-parameter tuning and data pre-processing steps
substantially improve the scores. In our experiments, we achieve a macro
F1-score of 0.886 on the OffensEval Greek-language subtask post-evaluation,
whereas the highest score during the evaluation period was 0.852. We stood
third in the Spanglish competition with our best F1-score of 0.756. Our CodaLab
username is asking28.
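As a concrete illustration of the stacked ensembling and the evaluation metrics named in the abstract, here is a minimal, hypothetical sketch using scikit-learn; the toy code-mixed tweets, TF-IDF features, and base learners are placeholders, not the authors' actual pipeline.

```python
# A minimal sketch of stacked ensembling scored with macro F1, precision,
# recall, and accuracy. The toy tweets, TF-IDF features, and base learners
# are placeholders, not the authors' actual pipeline.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical code-mixed (Spanish-English / Hindi-English) tweets.
texts = ["me encanta this movie", "this pelicula was terrible",
         "kya mast song hai", "bahut boring film yaar"] * 25
labels = ["positive", "negative", "positive", "negative"] * 25

X = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(texts)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=0.3, random_state=42, stratify=labels)

# Base learners' out-of-fold predictions feed a logistic-regression
# meta-learner, which is what makes the ensemble "stacked".
stack = StackingClassifier(
    estimators=[("nb", MultinomialNB()),
                ("rf", RandomForestClassifier(n_estimators=100))],
    final_estimator=LogisticRegression(max_iter=1000))
stack.fit(X_tr, y_tr)
pred = stack.predict(X_te)

print("macro-F1 :", f1_score(y_te, pred, average="macro"))
print("precision:", precision_score(y_te, pred, average="macro"))
print("recall   :", recall_score(y_te, pred, average="macro"))
print("accuracy :", accuracy_score(y_te, pred))
```

Note that StackingClassifier fits the meta-learner on cross-validated predictions of the base models, so the stack does not simply memorize the base learners' training-set outputs.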
Related papers
- Cross-lingual Contextualized Phrase Retrieval [63.80154430930898]
We propose a new task formulation of dense retrieval, cross-lingual contextualized phrase retrieval.
We train our Cross-lingual Contextualized Phrase Retriever (CCPR) using contrastive learning.
On the phrase retrieval task, CCPR surpasses baselines by a significant margin, achieving a top-1 accuracy that is at least 13 points higher.
arXiv Detail & Related papers (2024-03-25T14:46:51Z)
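To illustrate the contrastive learning mentioned in the CCPR entry above, here is a minimal sketch of the standard InfoNCE objective with in-batch negatives; the encoders, dimensions, and batch contents are assumptions, not details from the paper.

```python
# Sketch of a contrastive (InfoNCE-style) objective with in-batch negatives,
# as commonly used to train dense retrievers; CCPR's actual phrase encoders
# and negative sampling are not reproduced here.
import torch
import torch.nn.functional as F

def info_nce(queries: torch.Tensor, keys: torch.Tensor,
             tau: float = 0.05) -> torch.Tensor:
    """queries[i] should score highest against its positive keys[i];
    every other key in the batch serves as a negative."""
    sims = F.normalize(queries, dim=-1) @ F.normalize(keys, dim=-1).T / tau
    targets = torch.arange(queries.size(0))
    return F.cross_entropy(sims, targets)

# Hypothetical batch: 8 source-phrase and 8 aligned target-phrase embeddings.
q = torch.randn(8, 256)
k = torch.randn(8, 256)
print(info_nce(q, k))
```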
- Unify word-level and span-level tasks: NJUNLP's Participation for the WMT2023 Quality Estimation Shared Task [59.46906545506715]
We introduce the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task.
Our team submitted predictions for the English-German language pair on both sub-tasks.
Our models achieved the best results in English-German for both word-level and fine-grained error span detection sub-tasks.
arXiv Detail & Related papers (2023-09-23T01:52:14Z)
- SESCORE2: Learning Text Generation Evaluation via Synthesizing Realistic Mistakes [93.19166902594168]
We propose SESCORE2, a self-supervised approach for training a model-based metric for text generation evaluation.
The key idea is to synthesize realistic model mistakes by perturbing sentences retrieved from a corpus.
We evaluate SESCORE2 and previous methods on four text generation tasks across three languages.
arXiv Detail & Related papers (2022-12-19T09:02:16Z)
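As a toy illustration of the mistake-synthesis idea in the SESCORE2 entry above, the sketch below applies one random corruption to a sentence; the real method retrieves edits from a corpus and estimates their severity, which is not reproduced here.

```python
# Toy sketch of synthesizing "model mistakes" by corrupting a sentence.
# SESCORE2 retrieves realistic edits from a corpus and estimates their
# severity; this placeholder only shows the perturbation idea.
import random

def perturb(sentence: str, rng: random.Random) -> str:
    """Apply one random corruption: drop, duplicate, or swap a word."""
    words = sentence.split()
    op = rng.choice(["drop", "duplicate", "swap"])
    i = rng.randrange(len(words))
    if op == "drop" and len(words) > 1:
        del words[i]
    elif op == "duplicate":
        words.insert(i, words[i])
    elif op == "swap" and len(words) > 1:
        j = rng.randrange(len(words))
        words[i], words[j] = words[j], words[i]
    return " ".join(words)

rng = random.Random(0)
print(perturb("the cat sat on the mat", rng))
```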
- BJTU-WeChat's Systems for the WMT22 Chat Translation Task [66.81525961469494]
This paper introduces the joint submission of Beijing Jiaotong University and WeChat AI to the WMT'22 chat translation task for English-German.
Building on the Transformer, we apply several effective variants.
Our systems achieve COMET scores of 0.810 and 0.946.
arXiv Detail & Related papers (2022-11-28T02:35:04Z)
- Transformer-based Model for Word Level Language Identification in Code-mixed Kannada-English Texts [55.41644538483948]
We propose the use of a Transformer-based model for word-level language identification in code-mixed Kannada-English texts.
The proposed model on the CoLI-Kenglish dataset achieves a weighted F1-score of 0.84 and a macro F1-score of 0.61.
arXiv Detail & Related papers (2022-11-26T02:39:19Z)
- Intent Classification Using Pre-Trained Embeddings For Low Resource Languages [67.40810139354028]
Building Spoken Language Understanding systems that do not rely on language-specific Automatic Speech Recognition is an important yet under-explored problem in language processing.
We present a comparative study aimed at employing a pre-trained acoustic model to perform Spoken Language Understanding in low resource scenarios.
We perform experiments across three languages: English, Sinhala, and Tamil, each with different data sizes to simulate high-, medium-, and low-resource scenarios.
arXiv Detail & Related papers (2021-10-18T13:06:59Z)
- PSG@HASOC-Dravidian CodeMixFIRE2021: Pretrained Transformers for Offensive Language Identification in Tanglish [0.0]
This paper describes the system submitted to Dravidian-Codemix-HASOC2021: Hate Speech and Offensive Language Identification in Dravidian languages.
This task aims to identify offensive content in code-mixed comments/posts in Dravidian languages collected from social media.
arXiv Detail & Related papers (2021-10-06T15:23:40Z)
- KBCNMUJAL@HASOC-Dravidian-CodeMix-FIRE2020: Using Machine Learning for Detection of Hate Speech and Offensive Code-Mixed Social Media text [1.0499611180329804]
This paper describes the system submitted by our team, KBCNMUJAL, for Task 2 of the shared task Hate Speech and Offensive Content Identification in Indo-European languages.
Datasets for two Dravidian languages, viz. Malayalam and Tamil, each of 4,000 observations, were shared by the HASOC organizers.
The best performing classification models developed for both languages are applied on test datasets.
arXiv Detail & Related papers (2021-02-19T11:08:02Z)
- Phonemer at WNUT-2020 Task 2: Sequence Classification Using COVID Twitter BERT and Bagging Ensemble Technique based on Plurality Voting [0.0]
We develop a system that automatically identifies whether an English Tweet related to the novel coronavirus (COVID-19) is informative or not.
Our final approach achieved an F1-score of 0.9037, and we ranked sixth overall with F1-score as the evaluation criterion.
arXiv Detail & Related papers (2020-10-01T10:54:54Z)
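The bagging-with-plurality-voting idea in the Phonemer entry above can be sketched in a few lines; the per-model prediction lists below are hypothetical stand-ins for the bagged COVID-Twitter-BERT models.

```python
# Minimal sketch of plurality voting over an ensemble's predictions. The
# three per-model prediction lists are hypothetical stand-ins for the bagged
# COVID-Twitter-BERT models in the Phonemer system.
from collections import Counter

def plurality_vote(per_model_preds: list[list[str]]) -> list[str]:
    """For each example, return the label most ensemble members predict."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*per_model_preds)]

model_a = ["informative", "uninformative", "informative"]
model_b = ["informative", "informative", "informative"]
model_c = ["uninformative", "informative", "informative"]
print(plurality_vote([model_a, model_b, model_c]))
# -> ['informative', 'informative', 'informative']
```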
- WESSA at SemEval-2020 Task 9: Code-Mixed Sentiment Analysis using Transformers [0.0]
We describe our system submitted for SemEval 2020 Task 9, Sentiment Analysis for Code-Mixed Social Media Text.
Our best performing system is a transfer-learning model that fine-tunes "XLM-RoBERTa".
For later submissions, our system achieves a 75.9% average F1-score on the test set, under the CodaLab username "ahmed0sultan".
arXiv Detail & Related papers (2020-09-21T13:59:24Z)
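To illustrate the XLM-RoBERTa fine-tuning described in the WESSA entry above, here is a minimal sketch using the Hugging Face transformers library; the three-way label set, example tweets, and bare-bones training loop are placeholders, not the system's actual configuration.

```python
# Hedged sketch of fine-tuning XLM-RoBERTa for code-mixed sentiment with
# Hugging Face transformers. The label set, example tweets, and bare-bones
# loop are placeholders, not the WESSA configuration.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=3)  # positive / neutral / negative

texts = ["la movie was awesome", "yeh film bakwaas thi"]  # toy examples
labels = torch.tensor([0, 2])

enc = tok(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few illustrative steps, not a real schedule
    loss = model(**enc, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```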
- IUST at SemEval-2020 Task 9: Sentiment Analysis for Code-Mixed Social Media Text using Deep Neural Networks and Linear Baselines [6.866104126509981]
We develop a system to predict the sentiment of a given code-mixed tweet.
Our best performing method obtains an F1 score of 0.751 for the Spanish-English sub-task and 0.706 over the Hindi-English sub-task.
arXiv Detail & Related papers (2020-07-24T18:48:37Z)