ML_LTU at SemEval-2022 Task 4: T5 Towards Identifying Patronizing and
Condescending Language
- URL: http://arxiv.org/abs/2204.07432v1
- Date: Fri, 15 Apr 2022 12:00:25 GMT
- Title: ML_LTU at SemEval-2022 Task 4: T5 Towards Identifying Patronizing and
Condescending Language
- Authors: Tosin Adewumi, Lama Alkhaled, Hamam Alkhaled, Foteini Liwicki and
Marcus Liwicki
- Abstract summary: This paper describes the system used by the Machine Learning Group of LTU in subtask 1 of the SemEval-2022 Task 4: Patronizing and Condescending Language (PCL) Detection.
Our system consists of finetuning a pretrained Text-to-Text Transfer Transformer (T5) and innovatively reducing its out-of-class predictions.
- Score: 1.3445335428144554
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper describes the system used by the Machine Learning Group of LTU in
subtask 1 of the SemEval-2022 Task 4: Patronizing and Condescending Language
(PCL) Detection. Our system consists of finetuning a pretrained
Text-to-Text Transfer Transformer (T5) and innovatively reducing its
out-of-class predictions. The main contributions of this paper are 1) the
description of the implementation details of the T5 model we used, 2) analysis
of the successes & struggles of the model in this task, and 3) ablation studies
beyond the official submission to ascertain the relative importance of data
split. Our model achieves an F1 score of 0.5452 on the official test set.
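As a concrete illustration of the system described above, the sketch below casts a binary PCL example into T5's text-to-text format and folds any generated string that falls outside the label set back into it. The checkpoint name, task prefix, label strings, and the fallback-to-majority-class rule are illustrative assumptions, not details taken from the authors' implementation.

    # Illustrative sketch only: finetuning T5 for binary PCL detection as a
    # text-to-text task. Checkpoint, prompt prefix, label words and the
    # out-of-class fallback rule are assumptions, not the authors' settings.
    import torch
    from transformers import T5ForConditionalGeneration, T5TokenizerFast

    MODEL_NAME = "t5-base"                        # assumed checkpoint size
    LABELS = ("not patronizing", "patronizing")   # assumed target strings

    tokenizer = T5TokenizerFast.from_pretrained(MODEL_NAME)
    model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME)

    def encode(text, label_id):
        # Build (inputs, targets) tensors for one training example.
        enc = tokenizer("pcl classification: " + text, truncation=True,
                        max_length=256, return_tensors="pt")
        targets = tokenizer(LABELS[label_id], return_tensors="pt").input_ids
        return enc, targets

    def predict(text):
        # Generate a label word; anything outside the label set is mapped to
        # the negative class, one simple way to reduce out-of-class
        # predictions at inference time.
        enc = tokenizer("pcl classification: " + text, truncation=True,
                        max_length=256, return_tensors="pt")
        out = model.generate(**enc, max_new_tokens=8)
        decoded = tokenizer.decode(out[0], skip_special_tokens=True).strip()
        return decoded if decoded in LABELS else LABELS[0]

    # One illustrative training step on a toy example.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    enc, targets = encode("These poor families need our help to survive.", 1)
    loss = model(input_ids=enc.input_ids, attention_mask=enc.attention_mask,
                 labels=targets).loss
    loss.backward()
    optimizer.step()

The snippet only shows the mechanics of casting classification as generation; the actual label wording, prompt, and out-of-class handling would follow whatever the paper's finetuning setup and validation results dictate.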
Related papers
- Document Attribution: Examining Citation Relationships using Large Language Models [62.46146670035751]
We propose a zero-shot approach that frames attribution as a straightforward textual entailment task.
We also explore the role of the attention mechanism in enhancing the attribution process.
arXiv Detail & Related papers (2025-05-09T04:40:11Z)
- Multilingual E5 Text Embeddings: A Technical Report [63.503320030117145]
Three embedding models of different sizes are provided, offering a balance between inference efficiency and embedding quality.
We introduce a new instruction-tuned embedding model, whose performance is on par with state-of-the-art, English-only models of similar sizes.
arXiv Detail & Related papers (2024-02-08T13:47:50Z)
- Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks.
This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z)
- Large Language Models in the Workplace: A Case Study on Prompt Engineering for Job Type Classification [58.720142291102135]
This case study investigates the task of job classification in a real-world setting.
The goal is to determine whether an English-language job posting is appropriate for a graduate or entry-level position.
arXiv Detail & Related papers (2023-03-13T14:09:53Z)
- Conciseness: An Overlooked Language Task [11.940413163824887]
We define the task and show that it is different from related tasks such as summarization and simplification.
We demonstrate that conciseness is a difficult task for which zero-shot setups with large neural language models often do not perform well.
arXiv Detail & Related papers (2022-11-08T09:47:11Z)
- Evaluation of Transfer Learning for Polish with a Text-to-Text Model [54.81823151748415]
We introduce a new benchmark for assessing the quality of text-to-text models for Polish.
The benchmark consists of diverse tasks and datasets: KLEJ benchmark adapted for text-to-text, en-pl translation, summarization, and question answering.
We present plT5 - a general-purpose text-to-text model for Polish that can be fine-tuned on various Natural Language Processing (NLP) tasks with a single training objective.
arXiv Detail & Related papers (2022-05-18T09:17:14Z)
- RoBLEURT Submission for the WMT2021 Metrics Task [72.26898579202076]
We present our submission to the Shared Metrics Task: RoBLEURT.
Our model reaches state-of-the-art correlations with the WMT 2020 human annotations on 8 out of 10 to-English language pairs.
arXiv Detail & Related papers (2022-04-28T08:49:40Z)
- PALI-NLP at SemEval-2022 Task 4: Discriminative Fine-tuning of Deep Transformers for Patronizing and Condescending Language Detection [4.883341580669763]
We propose a novel Transformer-based model and its ensembles to accurately understand the language context for PCL detection.
To facilitate comprehension of the subtle and subjective nature of PCL, two fine-tuning strategies are applied.
The system achieves remarkable results on the official ranking, namely 1st in Subtask 1 and 5th in Subtask 2.
arXiv Detail & Related papers (2022-03-09T10:05:10Z)
- IITK@Detox at SemEval-2021 Task 5: Semi-Supervised Learning and Dice Loss for Toxic Spans Detection [2.1012672709024294]
We present our approach and findings for SemEval-2021 Task 5 - Toxic Spans Detection.
The task's main aim was to identify spans to which a given text's toxicity could be attributed.
Our paper investigates two techniques, semi-supervised learning and learning with Self-Adjusting Dice Loss, for tackling these challenges; an illustrative sketch of such a Dice loss appears after this list.
arXiv Detail & Related papers (2021-04-04T08:39:55Z)
- ZJUKLAB at SemEval-2021 Task 4: Negative Augmentation with Language Model for Reading Comprehension of Abstract Meaning [16.151203366447962]
We describe the algorithms used to train our models, the tuning process, and how the best model was selected.
Inspired by the similarity between the ReCAM task and language pre-training, we propose a simple yet effective technique, namely negative augmentation with a language model.
Our models achieve the 4th rank on both official test sets of Subtask 1 and Subtask 2 with an accuracy of 87.9% and an accuracy of 92.8%, respectively.
arXiv Detail & Related papers (2021-02-25T13:03:05Z)
- mT5: A massively multilingual pre-trained text-to-text transformer [60.0210636815514]
"Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on English-language NLP tasks.
We introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages.
arXiv Detail & Related papers (2020-10-22T17:58:14Z)
- KaLM at SemEval-2020 Task 4: Knowledge-aware Language Models for Comprehension And Generation [4.94950858749529]
We propose a novel way to search for evidence and choose different large-scale pre-trained models as the backbone for three subtasks.
The results show that our evidence-searching approach improves model performance on the commonsense explanation task.
arXiv Detail & Related papers (2020-05-24T15:09:21Z)
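The Self-Adjusting Dice Loss mentioned in the IITK@Detox entry above (after Li et al., "Dice Loss for Data-imbalanced NLP Tasks") is designed for label-imbalanced problems. Below is a minimal, hedged sketch of one common formulation; the hyperparameters alpha and gamma are illustrative defaults, not values reported by any system listed here.

    # Illustrative self-adjusting Dice loss for binary classification.
    # alpha and gamma are assumed defaults, not values from the cited papers.
    import torch

    def self_adjusting_dice_loss(logits, targets, alpha=1.0, gamma=1.0):
        # logits: (batch, 2) raw scores; targets: (batch,) with values in {0, 1}.
        probs = torch.softmax(logits, dim=-1)
        p_true = probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # prob. of true class
        # (1 - p) ** alpha down-weights easy, well-classified examples,
        # which is what makes the loss "self-adjusting".
        weighted = ((1.0 - p_true) ** alpha) * p_true
        dice = (2.0 * weighted + gamma) / (weighted + 1.0 + gamma)
        return (1.0 - dice).mean()

    # Toy usage on random data.
    logits = torch.randn(4, 2, requires_grad=True)
    targets = torch.tensor([0, 1, 1, 0])
    loss = self_adjusting_dice_loss(logits, targets)
    loss.backward()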