People Make Better Edits: Measuring the Efficacy of LLM-Generated
Counterfactually Augmented Data for Harmful Language Detection
- URL: http://arxiv.org/abs/2311.01270v3
- Date: Sun, 25 Feb 2024 11:17:42 GMT
- Title: People Make Better Edits: Measuring the Efficacy of LLM-Generated
Counterfactually Augmented Data for Harmful Language Detection
- Authors: Indira Sen, Dennis Assenmacher, Mattia Samory, Isabelle Augenstein,
Wil van der Aalst, Claudia Wagner
- Abstract summary: It is imperative that NLP models are robust to spurious features.
Past work has attempted to tackle such spurious features using training data augmentation, including Counterfactually Augmented Data (CADs).
We assess whether generating CADs can be automated using generative NLP models.
- Score: 35.89913036572029
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: NLP models are used in a variety of critical social computing tasks, such as
detecting sexist, racist, or otherwise hateful content. Therefore, it is
imperative that these models are robust to spurious features. Past work has
attempted to tackle such spurious features using training data augmentation,
including Counterfactually Augmented Data (CADs). CADs introduce minimal
changes to existing training data points and flip their labels; training on
them may reduce model dependency on spurious features. However, manually
generating CADs can be time-consuming and expensive. Hence, in this work, we
assess whether this task can be automated using generative NLP models. We
automatically generate CADs using Polyjuice, ChatGPT, and Flan-T5, and evaluate
their usefulness in improving model robustness compared to manually generated
CADs. Testing both model performance on multiple out-of-domain test sets and
the efficacy of individual data points, we find that while manual CADs are
still the most effective, CADs generated by ChatGPT come a close second. One
key reason for the lower performance of the automated methods is that the
changes they introduce are often insufficient to flip the original label.
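The paper does not ship code, but the pipeline it evaluates is straightforward to sketch. Below is a minimal, hypothetical example of the Flan-T5 baseline: prompt the model for a minimal label-flipping edit, then use a reference classifier to check whether the label actually flipped, which is the failure mode identified above. The prompt wording, model size, and classifier path are illustrative assumptions, not the authors' setup.

```python
# Sketch: generate a counterfactual edit with Flan-T5, then verify the flip.
# Prompt wording, model size, and the classifier path are assumptions.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

GEN_MODEL = "google/flan-t5-base"  # paper uses Flan-T5; exact size assumed
tokenizer = AutoTokenizer.from_pretrained(GEN_MODEL)
generator = AutoModelForSeq2SeqLM.from_pretrained(GEN_MODEL)

def generate_cad(text: str, source_label: str, target_label: str) -> str:
    """Ask the model for a minimal edit that flips source_label to target_label."""
    prompt = (
        f"The following text is labeled '{source_label}'. "
        f"Make the smallest possible edit so that it should be labeled "
        f"'{target_label}', keeping everything else unchanged.\n\n"
        f"Text: {text}\n\nEdited text:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = generator.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Hypothetical reference classifier, e.g. one fine-tuned on the original
# harmful-language data; used only to test whether the label really flipped.
clf = pipeline("text-classification", model="path/to/finetuned-classifier")

original = "Example post labeled as sexist."  # placeholder data point
edited = generate_cad(original, "sexist", "not sexist")
flipped = clf(edited)[0]["label"] != clf(original)[0]["label"]
print(edited, "| label flipped:", flipped)
```

In the paper's terms, a low flip rate under a check like this is exactly why the automated CADs underperform the manual ones.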
Related papers
- Forewarned is Forearmed: Leveraging LLMs for Data Synthesis through Failure-Inducing Exploration [90.41908331897639]
Large language models (LLMs) have significantly benefited from training on diverse, high-quality task-specific data.
We present a novel approach, ReverseGen, designed to automatically generate effective training samples.
arXiv Detail & Related papers (2024-10-22T06:43:28Z)
- PairCFR: Enhancing Model Training on Paired Counterfactually Augmented Data through Contrastive Learning [49.60634126342945]
Counterfactually Augmented Data (CAD) involves creating new data samples by applying minimal yet sufficient modifications to flip the label of existing data samples to other classes.
Recent research reveals that training with CAD may lead models to overly focus on modified features while ignoring other important contextual information.
We employ contrastive learning to promote global feature alignment in addition to learning counterfactual clues (a minimal sketch of this contrastive pairing appears after this list).
arXiv Detail & Related papers (2024-06-09T07:29:55Z)
- With a Little Push, NLI Models can Robustly and Efficiently Predict Faithfulness [19.79160738554967]
Conditional language models still generate unfaithful output that is not supported by their input.
We show that pure NLI models can outperform more complex metrics when combining task-adaptive data augmentation with robust inference procedures.
arXiv Detail & Related papers (2023-05-26T11:00:04Z)
- AutoCAD: Automatically Generating Counterfactuals for Mitigating Shortcut Learning [70.70393006697383]
We present AutoCAD, a fully automatic and task-agnostic CAD generation framework.
arXiv Detail & Related papers (2022-11-29T13:39:53Z)
- Discover, Explanation, Improvement: An Automatic Slice Detection Framework for Natural Language Processing [72.14557106085284]
Slice detection models (SDMs) automatically identify underperforming groups of datapoints.
This paper proposes a benchmark named "Discover, Explain, Improve (DEIM)" for classification NLP tasks.
Our evaluation shows that Edisa can accurately select error-prone datapoints with informative semantic features.
arXiv Detail & Related papers (2022-11-08T19:00:00Z)
- Counterfactually Augmented Data and Unintended Bias: The Case of Sexism and Hate Speech Detection [35.29235215101502]
Over-relying on core features may lead to unintended model bias.
We test models for sexism and hate speech detection on challenging data.
Using a diverse set of CADs, both construct-driven and construct-agnostic, reduces such unintended bias.
arXiv Detail & Related papers (2022-05-09T12:39:26Z)
- How Does Counterfactually Augmented Data Impact Models for Social Computing Constructs? [35.29235215101502]
We investigate the benefits of counterfactually augmented data (CAD) for social NLP models by focusing on three social computing constructs -- sentiment, sexism, and hate speech.
We find that while models trained on CAD show lower in-domain performance, they generalize better out-of-domain.
arXiv Detail & Related papers (2021-09-14T23:46:39Z)
- Machine Unlearning of Features and Labels [72.81914952849334]
We propose the first scenarios for unlearning features and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
arXiv Detail & Related papers (2021-08-26T04:42:24Z)
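As a companion to the PairCFR entry above: the contrastive idea it summarizes can be sketched with a standard supervised contrastive term over a batch of original/counterfactual pairs. Because an original and its counterfactual carry opposite labels, the term pushes them apart in representation space while pulling same-label examples together. This is a generic sketch under that assumption, not PairCFR's exact objective, encoder, or loss weighting.

```python
# Sketch: supervised contrastive loss over (original, counterfactual) pairs.
# In training this term would be added to the usual cross-entropy loss.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(z: torch.Tensor, labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """z: (batch, dim) sentence embeddings; labels: (batch,) class ids."""
    z = F.normalize(z, dim=1)
    sim = z @ z.T / temperature                        # pairwise similarities
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float("-inf"))    # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0                             # anchors with a positive
    return (-pos_log_prob[valid] / pos_counts[valid]).mean()

# Usage: 4 originals and their counterfactual edits with flipped labels.
z = torch.randn(8, 128)                                # e.g. encoder outputs
labels = torch.tensor([0, 1, 0, 1, 0, 1, 0, 1])
print(supervised_contrastive_loss(z, labels))
```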
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.