UMass PCL at SemEval-2022 Task 4: Pre-trained Language Model Ensembles
for Detecting Patronizing and Condescending Language
- URL: http://arxiv.org/abs/2204.08304v1
- Date: Mon, 18 Apr 2022 13:22:10 GMT
- Title: UMass PCL at SemEval-2022 Task 4: Pre-trained Language Model Ensembles
for Detecting Patronizing and Condescending Language
- Authors: David Koleczek, Alex Scarlatos, Siddha Karakare, Preshma Linet Pereira
- Abstract summary: Patronizing and condescending language (PCL) is everywhere, but rarely is the focus on its use by media towards vulnerable communities.
In this paper, we describe our system for detecting such language which was submitted to SemEval 2022 Task 4: Patronizing and Condescending Language Detection.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Patronizing and condescending language (PCL) is everywhere, but rarely is the
focus on its use by media towards vulnerable communities. Accurately detecting
PCL of this form is a difficult task due to limited labeled data and how subtle
it can be. In this paper, we describe our system for detecting such language
which was submitted to SemEval 2022 Task 4: Patronizing and Condescending
Language Detection. Our approach combines an ensemble of pre-trained language
models, data augmentation, and threshold optimization for detection.
Experimental results on the evaluation dataset released by the competition
hosts show that our work is reliably able to detect PCL, achieving an F1 score
of 55.47% on the binary classification task and a macro F1 score of 36.25% on
the fine-grained, multi-label detection task.
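The abstract names three ingredients (a pre-trained LM ensemble, data augmentation, and a tuned decision threshold) without giving details, so the following is only a minimal sketch of the threshold-optimization step, assuming averaged ensemble probabilities on a validation set; the function name and grid are illustrative, not the authors' actual procedure.

```python
import numpy as np
from sklearn.metrics import f1_score

def tune_threshold(val_probs, val_labels):
    """Sweep candidate cutoffs and keep the one that maximizes validation F1."""
    best_t, best_f1 = 0.5, -1.0
    for t in np.linspace(0.05, 0.95, 91):
        f1 = f1_score(val_labels, (val_probs >= t).astype(int))
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Hypothetical usage: average the positive-class probabilities produced by
# each model in the ensemble, then pick the F1-maximizing threshold.
# ensemble_probs = np.mean(np.stack(per_model_probs), axis=0)
# threshold, val_f1 = tune_threshold(ensemble_probs, val_labels)
```

Tuning the cutoff rather than defaulting to 0.5 matters for F1 on imbalanced data such as PCL, where positive examples are rare.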
Related papers
- KInIT at SemEval-2024 Task 8: Fine-tuned LLMs for Multilingual Machine-Generated Text Detection
SemEval-2024 Task 8 is focused on multigenerator, multidomain, and multilingual black-box machine-generated text detection.
Our submitted method achieved competitive results, ranking fourth, just under 1 percentage point behind the winner.
arXiv Detail & Related papers (2024-02-21T10:09:56Z)
- DeMuX: Data-efficient Multilingual Learning
DEMUX is a framework that prescribes exact data-points to label from vast amounts of unlabelled multilingual data.
Our end-to-end framework is language-agnostic, accounts for model representations, and supports multilingual target configurations.
arXiv Detail & Related papers (2023-11-10T20:09:08Z)
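The DeMuX summary does not describe its acquisition function; one plausible, purely illustrative reading of "accounts for model representations" is distance-based selection in the encoder's embedding space, sketched below with hypothetical names.

```python
import numpy as np

def select_to_label(unlabeled_reps, target_reps, budget):
    """Rank unlabeled points by how close their model representations are to
    the target data, and return indices of the `budget` closest points."""
    u = unlabeled_reps / np.linalg.norm(unlabeled_reps, axis=1, keepdims=True)
    t = target_reps / np.linalg.norm(target_reps, axis=1, keepdims=True)
    nearest_sim = (u @ t.T).max(axis=1)   # similarity to nearest target point
    return np.argsort(-nearest_sim)[:budget]
```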
- BEIKE NLP at SemEval-2022 Task 4: Prompt-Based Paragraph Classification for Patronizing and Condescending Language Detection
The PCL detection task aims to identify language that is patronizing or condescending towards vulnerable communities in the general media.
In this paper, we present our solution, which exploits the power of prompt-based learning for paragraph classification.
arXiv Detail & Related papers (2022-08-02T08:38:47Z)
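BEIKE NLP's exact template and verbalizer are not given in the summary; the snippet below is a generic sketch of prompt-based classification with a masked LM, where the template and the verbalizer words ("patronizing" vs. "respectful") are assumptions of mine.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "roberta-base"   # illustrative backbone choice
tok = AutoTokenizer.from_pretrained(name)
mlm = AutoModelForMaskedLM.from_pretrained(name).eval()

@torch.no_grad()
def pcl_score(paragraph):
    """Score a paragraph by the mask-fill preference for the positive verbalizer."""
    prompt = f"{paragraph} This text is {tok.mask_token} towards vulnerable groups."
    enc = tok(prompt, return_tensors="pt", truncation=True)
    mask_pos = (enc.input_ids[0] == tok.mask_token_id).nonzero()[0].item()
    logits = mlm(**enc).logits[0, mask_pos]
    # Compare the first subword of each (assumed) verbalizer word.
    pos_id = tok(" patronizing", add_special_tokens=False).input_ids[0]
    neg_id = tok(" respectful", add_special_tokens=False).input_ids[0]
    return torch.softmax(logits[[pos_id, neg_id]], dim=0)[0].item()
```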
- Bridging the Gap between Language Models and Cross-Lingual Sequence Labeling
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling (xSL) tasks.
Despite the great success, we draw an empirical observation that there is a training objective gap between pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for xSL named Cross-lingual Language Informative Span Masking (CLISM) to eliminate the objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage the consistency between representations of input parallel sequences.
arXiv Detail & Related papers (2022-04-11T15:55:20Z)
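The summary describes CACR only at a high level; below is a minimal InfoNCE-style sketch of a contrastive consistency objective over parallel sequence pairs, which may differ from the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_consistency_loss(src_reps, tgt_reps, temperature=0.07):
    """Pull together representations of parallel (source, target) sequence
    pairs and push apart non-parallel pairs within the batch."""
    src = F.normalize(src_reps, dim=-1)
    tgt = F.normalize(tgt_reps, dim=-1)
    logits = src @ tgt.T / temperature      # (B, B) similarity matrix
    labels = torch.arange(src.size(0))      # pair i in src is parallel to i in tgt
    return F.cross_entropy(logits, labels)
```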
- Few-Shot Cross-lingual Transfer for Coarse-grained De-identification of Code-Mixed Clinical Texts
Pre-trained language models (LMs) have shown great potential for cross-lingual transfer in low-resource settings.
We show the few-shot cross-lingual transfer property of LMs for named entity recognition (NER) and apply it to solve a low-resource, real-world challenge: de-identification of code-mixed (Spanish-Catalan) clinical notes in the stroke domain.
arXiv Detail & Related papers (2022-04-10T21:46:52Z)
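As a concrete illustration of few-shot cross-lingual transfer for token-level de-identification, here is a sketch of continuing to fine-tune a multilingual LM on a handful of target-language examples; the model name, label set, and helper function are hypothetical, not the paper's setup.

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

labels = ["O", "B-NAME", "I-NAME", "B-DATE", "I-DATE"]   # illustrative PHI labels
name = "xlm-roberta-base"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name, num_labels=len(labels))
optim = torch.optim.AdamW(model.parameters(), lr=2e-5)

def few_shot_step(words, word_label_ids):
    """One gradient step on a single few-shot example (words + per-word labels)."""
    enc = tok(words, is_split_into_words=True, return_tensors="pt", truncation=True)
    aligned, prev = [], None
    for w in enc.word_ids(0):
        # Label only the first subword of each word; ignore the rest (-100).
        aligned.append(word_label_ids[w] if w is not None and w != prev else -100)
        prev = w
    loss = model(**enc, labels=torch.tensor([aligned])).loss
    loss.backward(); optim.step(); optim.zero_grad()
    return loss.item()
```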
- SATLab at SemEval-2022 Task 4: Trying to Detect Patronizing and Condescending Language with only Character and Word N-grams
A logistic regression model fed only with character and word n-grams is proposed for the SemEval-2022 Task 4 on Patronizing and Condescending Language Detection.
It obtained an average level of performance, well above a system that guesses without any knowledge of the task, but much lower than the best teams.
arXiv Detail & Related papers (2022-03-10T13:09:48Z)
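A system of this kind is easy to reproduce in spirit; the pipeline below sketches character and word n-gram features feeding a logistic regression, with n-gram ranges and weighting that are my assumptions rather than SATLab's reported settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

model = Pipeline([
    ("features", FeatureUnion([
        # Character n-grams within word boundaries plus unigrams/bigrams of words.
        ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5))),
        ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    ])),
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
# model.fit(train_texts, train_labels); preds = model.predict(test_texts)
```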
- A Hierarchical Model for Spoken Language Recognition
Spoken language recognition (SLR) refers to the automatic process used to determine the language present in a speech sample.
We propose a novel hierarchical approach where two PLDA models are trained, one to generate scores for clusters of highly related languages and a second one to generate scores conditional to each cluster.
We show that this hierarchical approach consistently outperforms the non-hierarchical one for detection of highly related languages.
arXiv Detail & Related papers (2022-01-04T22:10:36Z)
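PLDA back-ends are not part of the standard Python toolkits, so the sketch below only captures the score combination the summary describes, with `cluster_scorer` and `within_scorers` as stand-ins for the two trained PLDA models.

```python
def hierarchical_language_scores(x, cluster_scorer, within_scorers):
    """Combine a top-level cluster score with a within-cluster language score:
    score(lang) = log p(cluster) + log p(lang | cluster). The scorer callables
    are placeholders for the paper's two trained PLDA models."""
    scores = {}
    for cluster, logp_c in cluster_scorer(x).items():
        for lang, logp_l in within_scorers[cluster](x).items():
            scores[lang] = logp_c + logp_l
    return scores
```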
- On Cross-Lingual Retrieval with Multilingual Text Encoders
We study the suitability of state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks.
We benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR experiments.
We evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., we learn to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments.
arXiv Detail & Related papers (2021-12-21T08:10:27Z)
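The unsupervised ad-hoc setting the paper benchmarks reduces to embedding queries and documents with a multilingual encoder and ranking by similarity; the model name below is an illustrative choice, not necessarily one the paper evaluates.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

enc = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def rank_documents(query, docs):
    """Rank documents against a (possibly other-language) query by cosine similarity."""
    q = enc.encode([query], normalize_embeddings=True)
    d = enc.encode(docs, normalize_embeddings=True)
    sims = (q @ d.T)[0]          # cosine similarity via normalized dot product
    return np.argsort(-sims)     # document indices, best match first
```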
- From Good to Best: Two-Stage Training for Cross-lingual Machine Reading Comprehension
We develop a two-stage approach to enhance model performance.
The first stage targets recall: we design a hard-learning (HL) algorithm to maximize the likelihood that the top-k predictions contain the accurate answer.
The second stage focuses on precision: an answer-aware contrastive learning mechanism is developed to learn the fine difference between the accurate answer and other candidates.
arXiv Detail & Related papers (2021-12-09T07:31:15Z)
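The second (precision) stage can be illustrated with a small contrastive objective that treats the accurate answer as the positive and the other top-k candidates as negatives; the shapes and temperature below are my assumptions, not the paper's exact mechanism.

```python
import torch
import torch.nn.functional as F

def answer_aware_contrastive(query_rep, gold_rep, neg_reps, temperature=0.1):
    """Pull the question representation toward the accurate answer span and
    away from the other candidate spans (index 0 is the gold answer)."""
    cands = F.normalize(torch.cat([gold_rep.unsqueeze(0), neg_reps]), dim=-1)
    q = F.normalize(query_rep, dim=-1)
    logits = (cands @ q) / temperature   # (1 + k,) similarities
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))
```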
- UPV at CheckThat! 2021: Mitigating Cultural Differences for Identifying Multilingual Check-worthy Claims
In this paper, we propose a language identification task as an auxiliary task to mitigate unintended bias.
Our results show that joint training of language identification and check-worthy claim detection tasks can provide performance gains for some of the selected languages.
arXiv Detail & Related papers (2021-09-19T21:46:16Z)
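Joint training of the two tasks amounts to adding a weighted auxiliary loss from a language-identification head on a shared encoder; the 0.2 weight below is illustrative, not the paper's value.

```python
import torch.nn.functional as F

def joint_loss(claim_logits, claim_labels, lang_logits, lang_labels, aux_weight=0.2):
    """Main check-worthiness loss plus a weighted auxiliary language-ID loss."""
    return (F.cross_entropy(claim_logits, claim_labels)
            + aux_weight * F.cross_entropy(lang_logits, lang_labels))
```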
- Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection
We build an offensive language detection system, which combines multi-task learning with BERT-based models.
Our model achieves a 91.51% F1 score on English Sub-task A, which is comparable to the first-place result.
arXiv Detail & Related papers (2020-04-28T11:27:24Z)
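Architecturally, this kind of multi-task setup is a shared BERT encoder with one classification head per sub-task; the definition below is a sketch under that assumption, not the team's released code.

```python
import torch.nn as nn
from transformers import AutoModel

class MultiTaskOffense(nn.Module):
    """Shared BERT encoder with one head per OffensEval sub-task (sketch)."""
    def __init__(self, name="bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(name)
        hidden = self.encoder.config.hidden_size
        self.head_a = nn.Linear(hidden, 2)   # Sub-task A: offensive vs. not
        self.head_b = nn.Linear(hidden, 2)   # Sub-task B: targeted vs. untargeted

    def forward(self, **enc):
        pooled = self.encoder(**enc).last_hidden_state[:, 0]  # [CLS] representation
        return self.head_a(pooled), self.head_b(pooled)
```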