Duluth at SemEval-2019 Task 6: Lexical Approaches to Identify and
Categorize Offensive Tweets
- URL: http://arxiv.org/abs/2007.12949v1
- Date: Sat, 25 Jul 2020 14:56:10 GMT
- Title: Duluth at SemEval-2019 Task 6: Lexical Approaches to Identify and
Categorize Offensive Tweets
- Authors: Ted Pedersen
- Abstract summary: This paper describes the systems that participated in SemEval-2019 Task 6, Identifying and Categorizing Offensive Language in Social Media (OffensEval).
For the most part these systems took traditional Machine Learning approaches that built classifiers from lexical features found in manually labeled training data.
Our best systems in each of the three OffensEval tasks placed in the middle of the comparative evaluation, ranking 57th of 103 in task A, 39th of 75 in task B, and 44th of 65 in task C.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes the Duluth systems that participated in SemEval-2019
Task 6, Identifying and Categorizing Offensive Language in Social Media
(OffensEval). For the most part these systems took traditional Machine Learning
approaches that built classifiers from lexical features found in manually
labeled training data. However, our most successful system for classifying a
tweet as offensive (or not) was a rule-based black-list approach, and we also
experimented with combining the training data from two different but related
SemEval tasks. Our best systems in each of the three OffensEval tasks placed in
the middle of the comparative evaluation, ranking 57th of 103 in task A, 39th
of 75 in task B, and 44th of 65 in task C.
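
The rule-based black-list idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the word list, the tokenizer, and the function names are hypothetical, and the `OFF`/`NOT` output labels are assumed here to mirror the task A classes. A tweet is flagged as offensive if any of its tokens appears in the list.

```python
import re

# Hypothetical placeholder terms; the actual list used by the Duluth
# system is not reproduced here.
BLACKLIST = frozenset({"badword", "slur"})

# Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(tweet):
    """Split a tweet into lowercase word tokens."""
    return TOKEN_RE.findall(tweet.lower())


def classify(tweet, blacklist=BLACKLIST):
    """Return 'OFF' if any token is on the black-list, else 'NOT'."""
    return "OFF" if any(tok in blacklist for tok in tokenize(tweet)) else "NOT"
```

Matching whole tokens rather than substrings is a design choice in this sketch: it avoids flagging longer words that merely contain a listed term, at the cost of missing inflected or obfuscated variants.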
Related papers
- ThangDLU at #SMM4H 2024: Encoder-decoder models for classifying text data on social disorders in children and adolescents [49.00494558898933]
This paper describes our participation in Task 3 and Task 5 of the #SMM4H (Social Media Mining for Health) 2024 Workshop.
Task 3 is a multi-class classification task centered on tweets discussing the impact of outdoor environments on symptoms of social anxiety.
Task 5 involves a binary classification task focusing on tweets reporting medical disorders in children.
We applied transfer learning from pre-trained encoder-decoder models such as BART-base and T5-small to identify the labels of a set of given tweets.
arXiv Detail & Related papers (2024-04-30T17:06:20Z)
- SemEval-2024 Shared Task 6: SHROOM, a Shared-task on Hallucinations and Related Observable Overgeneration Mistakes [48.83290963506378]
This paper presents the results of the SHROOM, a shared task focused on detecting hallucinations.
We observe a number of key trends in how this approach was tackled.
While a majority of the teams did outperform our proposed baseline system, the performances of top-scoring systems are still consistent with a random handling of the more challenging items.
arXiv Detail & Related papers (2024-03-12T15:06:22Z)
- MarsEclipse at SemEval-2023 Task 3: Multi-Lingual and Multi-Label Framing Detection with Contrastive Learning [21.616089539381996]
This paper describes our system for SemEval-2023 Task 3 Subtask 2 on Framing Detection.
We used a multi-label contrastive loss for fine-tuning large pre-trained language models in a multi-lingual setting.
Our system was ranked first on the official test set and on the official shared task leaderboard for five of the six languages.
arXiv Detail & Related papers (2023-04-20T18:42:23Z)
- Distant finetuning with discourse relations for stance classification [55.131676584455306]
We propose a new method to extract data with silver labels from raw text to finetune a model for stance classification.
We also propose a 3-stage training framework in which the noise level of the data used for finetuning decreases from stage to stage.
Our approach ranks 1st among 26 competing teams in the stance classification track of the NLPCC 2021 shared task Argumentative Text Understanding for AI Debater.
arXiv Detail & Related papers (2022-04-27T04:24:35Z)
- CAiRE in DialDoc21: Data Augmentation for Information-Seeking Dialogue System [55.43871578056878]
In the DialDoc21 competition, our system achieved a 74.95 F1 score and a 60.74 Exact Match score in subtask 1, and a 37.72 SacreBLEU score in subtask 2.
arXiv Detail & Related papers (2021-06-07T11:40:55Z)
- Identifying and Categorizing Offensive Language in Social Media [0.0]
This study provides a description of a classification system built for SemEval 2019 Task 6: OffensEval.
We trained machine learning and deep learning models along with data preprocessing and sampling techniques to come up with the best results.
arXiv Detail & Related papers (2021-04-10T22:53:43Z)
- Garain at SemEval-2020 Task 12: Sequence based Deep Learning for Categorizing Offensive Language in Social Media [3.236217153362305]
SemEval-2020 Task 12 was OffensEval: Multilingual Offensive Language Identification in Social Media.
Trained on 25% of the whole dataset, my system achieved a macro-averaged F1 score of 47.763%.
arXiv Detail & Related papers (2020-09-02T17:09:29Z)
- Duluth at SemEval-2020 Task 12: Offensive Tweet Identification in English with Logistic Regression [0.0]
This paper describes the Duluth systems that participated in SemEval-2020 Task 12, Multilingual Offensive Language Identification in Social Media (OffensEval-2020).
We trained our models on the distantly supervised training data made available by the task organizers and used no other resources.
As might be expected we did not rank highly in the comparative evaluation: 79th of 85 in Task A, 34th of 43 in Task B, and 24th of 39 in Task C.
arXiv Detail & Related papers (2020-07-25T14:49:31Z)
- problemConquero at SemEval-2020 Task 12: Transformer and Soft label-based approaches [2.434159858639793]
We present various systems submitted by our team problemConquero for SemEval-2020 Shared Task 12 Multilingual Offensive Language Identification in Social Media.
We participated in all three sub-tasks of OffensEval-2020, and our final submissions during the evaluation phase included transformer-based approaches and a soft label-based approach.
arXiv Detail & Related papers (2020-07-21T15:06:58Z)
- Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation [63.98724740606457]
We present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge.
Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten different fine-grained classes.
Task 1b concerns the classification of data into three higher-level classes using low-complexity solutions.
arXiv Detail & Related papers (2020-07-16T15:07:14Z)
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection [55.445023584632175]
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves 91.51% F1 score in English Sub-task A, which is comparable to the first place.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)