Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation
via Attention Regularization
- URL: http://arxiv.org/abs/2309.02311v1
- Date: Tue, 5 Sep 2023 15:27:22 GMT
- Title: Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation
via Attention Regularization
- Authors: Helena Bonaldi, Giuseppe Attanasio, Debora Nozza, Marco Guerini
- Abstract summary: Recent computational approaches for combating online hate speech involve the automatic generation of counter narratives.
This paper introduces novel attention regularization methodologies to improve the generalization capabilities of PLMs.
Regularized models produce better counter narratives than state-of-the-art approaches in most cases.
- Score: 31.40751207207214
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent computational approaches for combating online hate speech involve the
automatic generation of counter narratives by adapting Pretrained
Transformer-based Language Models (PLMs) with human-curated data. This process,
however, can produce in-domain overfitting, resulting in models generating
acceptable narratives only for hatred similar to training data, with little
portability to other targets or to real-world toxic language. This paper
introduces novel attention regularization methodologies to improve the
generalization capabilities of PLMs for counter narrative generation.
Overfitting to training-specific terms is then discouraged, resulting in more
diverse and richer narratives. We experiment with two attention-based
regularization techniques on a benchmark English dataset. Regularized models
produce better counter narratives than state-of-the-art approaches in most
cases, both in terms of automatic metrics and human evaluation, especially when
hateful targets are not present in the training data. This work paves the way
for better and more flexible counter-speech generation models, a task for which
datasets are highly challenging to produce.
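The abstract does not spell out the two regularizers, but a common form of attention-based regularization is an entropy penalty on the self-attention distributions (in the spirit of EAR-style approaches), added to the language-modeling loss during fine-tuning. The sketch below is a minimal illustration of that idea in PyTorch; the GPT-2 backbone, the alpha weight, and the averaging scheme are assumptions for demonstration, not the paper's exact recipe.

```python
# Minimal, illustrative sketch of entropy-based attention regularization added
# to causal-LM fine-tuning. The GPT-2 backbone, alpha, and the averaging scheme
# are assumptions for demonstration, not the paper's exact setup.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token


def attention_entropy(attentions, attention_mask, eps=1e-9):
    """Mean entropy of the self-attention rows, averaged over layers, heads,
    and non-padding query positions."""
    mask = attention_mask.float().unsqueeze(1)            # (batch, 1, seq)
    per_layer = []
    for layer_attn in attentions:                         # each: (batch, heads, seq, seq)
        ent = -(layer_attn * (layer_attn + eps).log()).sum(dim=-1)  # (batch, heads, seq)
        mean_ent = (ent * mask).sum() / (mask.sum() * layer_attn.size(1))
        per_layer.append(mean_ent)
    return torch.stack(per_layer).mean()


def regularized_loss(batch, alpha=0.01):
    """Cross-entropy LM loss plus a term that rewards higher attention entropy,
    discouraging the model from over-focusing on a few (training-specific) tokens."""
    out = model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        labels=batch["input_ids"],      # in practice, set padding positions to -100
        output_attentions=True,
    )
    entropy = attention_entropy(out.attentions, batch["attention_mask"])
    return out.loss - alpha * entropy   # minimizing this loss pushes entropy up
```

During fine-tuning, alpha would typically be tuned on held-out data; larger values push attention to spread more evenly over the input, which matches the abstract's goal of discouraging overfitting to training-specific terms.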
Related papers
- A Target-Aware Analysis of Data Augmentation for Hate Speech Detection [3.858155067958448]
Hate speech is one of the main threats posed by the widespread use of social networks.
We investigate the possibility of augmenting existing data with generative language models, reducing target imbalance.
For some hate categories such as origin, religion, and disability, hate speech classification using augmented data for training improves by more than 10% F1 over the no augmentation baseline.
arXiv Detail & Related papers (2024-10-10T15:46:27Z) - FairFlow: An Automated Approach to Model-based Counterfactual Data Augmentation For NLP [7.41244589428771]
This paper proposes FairFlow, an automated approach to generating parallel data for training counterfactual text generator models.
We show that FairFlow significantly overcomes the limitations of dictionary-based word-substitution approaches whilst maintaining good performance.
arXiv Detail & Related papers (2024-07-23T12:29:37Z) - Fine-tuning Language Models for Factuality [96.5203774943198]
The capabilities of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines.
Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations'.
In this work, we fine-tune language models to be more factual, without human labeling.
arXiv Detail & Related papers (2023-11-14T18:59:15Z) - Error Norm Truncation: Robust Training in the Presence of Data Noise for Text Generation Models [39.37532848489779]
We propose Error Norm Truncation (ENT), a robust enhancement to the standard training objective that truncates noisy data (a minimal sketch of the idea appears after this list).
We show that ENT improves generation quality over standard training and previous soft and hard truncation methods.
arXiv Detail & Related papers (2023-10-02T01:30:27Z) - Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z) - Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning [53.92465205531759]
Controlled automated story generation seeks to generate natural language stories satisfying constraints from natural language critiques or preferences.
We train a contrastive bi-encoder model to align stories with human critiques, building a general purpose preference model.
We further fine-tune the contrastive reward model using a prompt-learning technique to increase story generation robustness.
arXiv Detail & Related papers (2022-10-14T13:21:33Z) - Combating high variance in Data-Scarce Implicit Hate Speech Classification [0.0]
In this paper, we explore various optimization and regularization techniques and develop a novel RoBERTa-based model that achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-08-29T13:45:21Z) - Towards Generalized Models for Task-oriented Dialogue Modeling on Spoken Conversations [22.894541507068933]
This paper presents our approach to build generalized models for the Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations Challenge of DSTC-10.
We employ extensive data augmentation strategies on written data, including artificial error injection and round-trip text-speech transformation.
Our approach ranks third on the objective evaluation and second on the final official human evaluation.
arXiv Detail & Related papers (2022-03-08T12:26:57Z) - Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units [56.52704348773307]
We propose a novel LSTM-based generative speech LM based on linguistic units including syllables and phonemes.
With a limited dataset, orders of magnitude smaller than that required by contemporary generative models, our model closely approximates babbling speech.
We show the effect of training with auxiliary text LMs, multitask learning objectives, and auxiliary articulatory features.
arXiv Detail & Related papers (2021-10-31T22:48:30Z) - Few-shot learning through contextual data augmentation [74.20290390065475]
Machine translation models need to adapt to new data to maintain their performance over time.
We show that adaptation on the scale of one to five examples is possible.
Our model reports better accuracy scores than a reference system trained with an average of 313 parallel examples.
arXiv Detail & Related papers (2021-03-31T09:05:43Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z)
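For the Error Norm Truncation (ENT) entry above, the summary describes truncating noisy data relative to the standard training objective. One way this idea is commonly realized is to drop target tokens whose error norm, the L2 distance between the predicted token distribution and the one-hot reference, exceeds a threshold. The sketch below illustrates that mechanism for a token-level cross-entropy loss; the threshold value and the hard-masking choice are assumptions, not the paper's exact procedure.

```python
# Illustrative sketch of error-norm-based truncation for a token-level
# cross-entropy loss. The fixed threshold and the hard-masking choice are
# assumptions; the published method may weight or schedule this differently.
import torch
import torch.nn.functional as F


def error_norm_truncated_loss(logits, targets, threshold=1.3, ignore_index=-100):
    """logits: (batch, seq, vocab); targets: (batch, seq).
    Tokens whose error norm ||p - onehot(y)||_2 exceeds `threshold` are treated
    as noisy and excluded from the loss (the norm is at most sqrt(2) ~= 1.414)."""
    probs = logits.softmax(dim=-1)
    valid = targets.ne(ignore_index)
    one_hot = F.one_hot(targets.clamp(min=0), num_classes=logits.size(-1)).to(probs.dtype)
    error_norm = (probs - one_hot).norm(p=2, dim=-1)            # (batch, seq)
    keep = (valid & (error_norm <= threshold)).float()          # 1 = trusted token

    token_loss = F.cross_entropy(
        logits.transpose(1, 2),                                 # (batch, vocab, seq)
        targets,
        ignore_index=ignore_index,
        reduction="none",
    )                                                           # (batch, seq)
    return (token_loss * keep).sum() / keep.sum().clamp(min=1.0)
```

In practice the threshold (or a warm-up period before truncation starts) would be tuned on validation data rather than fixed as above.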
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.