Duluth at SemEval-2020 Task 12: Offensive Tweet Identification in
English with Logistic Regression
- URL: http://arxiv.org/abs/2007.12946v1
- Date: Sat, 25 Jul 2020 14:49:31 GMT
- Title: Duluth at SemEval-2020 Task 12: Offensive Tweet Identification in
English with Logistic Regression
- Authors: Ted Pedersen
- Abstract summary: This paper describes the Duluth systems that participated in SemEval-2020 Task 12, Multilingual Offensive Language Identification in Social Media (OffensEval-2020).
We trained our models on the distantly supervised training data made available by the task organizers and used no other resources.
As might be expected, we did not rank highly in the comparative evaluation: 79th of 85 in Task A, 34th of 43 in Task B, and 24th of 39 in Task C.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes the Duluth systems that participated in SemEval-2020
Task 12, Multilingual Offensive Language Identification in Social Media
(OffensEval-2020). We participated in the three English language tasks. Our
systems provide a simple Machine Learning baseline using logistic regression.
We trained our models on the distantly supervised training data made available
by the task organizers and used no other resources. As might be expected, we did
not rank highly in the comparative evaluation: 79th of 85 in Task A, 34th of 43
in Task B, and 24th of 39 in Task C. We carried out a qualitative analysis of
our results and found that the class labels in the gold standard data are
somewhat noisy. We hypothesize that the extremely high accuracy (> 90%) of the
top-ranked systems may reflect methods that learn the training data very well
but may not generalize to the task of identifying offensive language in
English. This analysis includes examples of tweets that, despite being mildly
redacted, are still offensive.
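A baseline of the kind the abstract describes (logistic regression over tweet text) can be sketched as follows. This is a minimal illustration only, not the authors' released system: the toy in-memory data, the TF-IDF feature choice, and the OFF/NOT label set (OLID-style Task A labels) are assumptions for the sketch.

```python
# Minimal logistic-regression baseline for offensive tweet
# identification (Task A style: OFF vs. NOT).
# Illustrative sketch; toy data stands in for the distantly
# supervised training corpus used in the shared task.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny stand-in training set (hypothetical examples).
train_tweets = [
    "have a wonderful day everyone",
    "you are a complete idiot",
    "loved the game last night",
    "shut up you awful person",
]
train_labels = ["NOT", "OFF", "NOT", "OFF"]

# Word unigram/bigram TF-IDF features feeding a logistic
# regression classifier: a simple ML baseline.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(max_iter=1000),
)
model.fit(train_tweets, train_labels)

# Predict labels for unseen tweets.
preds = model.predict(["what an idiot", "great day"])
print(list(preds))
```

With real OffensEval data one would train on the full distantly supervised corpus and report macro-averaged F1, as the shared task did.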
Related papers
- Unify word-level and span-level tasks: NJUNLP's Participation for the
WMT2023 Quality Estimation Shared Task [59.46906545506715]
We introduce the NJUNLP team to the WMT 2023 Quality Estimation (QE) shared task.
Our team submitted predictions for the English-German language pair on both sub-tasks.
Our models achieved the best results in English-German for both word-level and fine-grained error span detection sub-tasks.
arXiv Detail & Related papers (2023-09-23T01:52:14Z)
- Overview of Abusive and Threatening Language Detection in Urdu at FIRE 2021 [50.591267188664666]
We present two shared tasks of abusive and threatening language detection for the Urdu language.
We present two manually annotated datasets containing tweets labelled as (i) Abusive and Non-Abusive, and (ii) Threatening and Non-Threatening.
For both subtasks, m-Bert based transformer model showed the best performance.
arXiv Detail & Related papers (2022-07-14T07:38:13Z)
- No Language Left Behind: Scaling Human-Centered Machine Translation [69.28110770760506]
We create datasets and models aimed at narrowing the performance gap between low and high-resource languages.
We propose multiple architectural and training improvements to counteract overfitting while training on thousands of tasks.
Our model achieves an improvement of 44% BLEU relative to the previous state-of-the-art.
arXiv Detail & Related papers (2022-07-11T07:33:36Z)
- HFL at SemEval-2022 Task 8: A Linguistics-inspired Regression Model with Data Augmentation for Multilingual News Similarity [16.454545004093735]
This paper describes our system designed for SemEval-2022 Task 8: Multilingual News Article Similarity.
We proposed a linguistics-inspired model trained with a few task-specific strategies.
Our system ranked 1st on the leaderboard while achieving a Pearson's Correlation Coefficient of 0.818 on the official evaluation set.
arXiv Detail & Related papers (2022-04-11T03:08:37Z)
- Intent Classification Using Pre-Trained Embeddings For Low Resource Languages [67.40810139354028]
Building Spoken Language Understanding systems that do not rely on language specific Automatic Speech Recognition is an important yet less explored problem in language processing.
We present a comparative study aimed at employing a pre-trained acoustic model to perform Spoken Language Understanding in low resource scenarios.
We perform experiments across three different languages: English, Sinhala, and Tamil each with different data sizes to simulate high, medium, and low resource scenarios.
arXiv Detail & Related papers (2021-10-18T13:06:59Z)
- Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive Language Identification using Pre-trained Language Models [11.868582973877626]
This paper describes Galileo's performance in SemEval-2020 Task 12 on detecting and categorizing offensive language in social media.
For Offensive Language Identification, we proposed a multi-lingual method using Pre-trained Language Models, ERNIE and XLM-R.
For offensive language categorization, we proposed a knowledge distillation method trained on soft labels generated by several supervised models.
arXiv Detail & Related papers (2020-10-07T17:40:19Z)
- Garain at SemEval-2020 Task 12: Sequence based Deep Learning for Categorizing Offensive Language in Social Media [3.236217153362305]
SemEval-2020 Task 12 was OffensEval: Multilingual Offensive Language Identification in Social Media.
My system, trained on 25% of the whole dataset, achieved a macro-averaged F1 score of 47.763%.
arXiv Detail & Related papers (2020-09-02T17:09:29Z)
- Meta-Learning with Context-Agnostic Initialisations [86.47040878540139]
We introduce a context-adversarial component into the meta-learning process.
This produces an initialisation for fine-tuning to target which is context-agnostic and task-generalised.
We evaluate our approach on three commonly used meta-learning algorithms and two problems.
arXiv Detail & Related papers (2020-07-29T08:08:38Z)
- Duluth at SemEval-2019 Task 6: Lexical Approaches to Identify and Categorize Offensive Tweets [0.0]
This paper describes the systems that participated in SemEval-2019 Task 6, Identifying and Categorizing Offensive Language in Social Media (OffensEval).
For the most part these systems took traditional Machine Learning approaches that built classifiers from lexical features found in manually labeled training data.
Our best systems in each of the three OffensEval tasks placed in the middle of the comparative evaluation, ranking 57th of 103 in task A, 39th of 75 in task B, and 44th of 65 in task C.
arXiv Detail & Related papers (2020-07-25T14:56:10Z)
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.