Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models
- URL: http://arxiv.org/abs/2310.07589v1
- Date: Wed, 11 Oct 2023 15:30:35 GMT
- Title: Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models
- Authors: Luiza Pozzobon, Beyza Ermis, Patrick Lewis, Sara Hooker
- Abstract summary: Goodtriever is a flexible methodology that matches the current state of the art in toxicity mitigation.
By incorporating a retrieval-based approach at decoding time, Goodtriever enables toxicity-controlled text generation.
- Score: 11.805944680474823
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Considerable effort has been dedicated to mitigating toxicity, but existing
methods often require drastic modifications to model parameters or the use of
computationally intensive auxiliary models. Furthermore, previous approaches
have often neglected the crucial factor of language's evolving nature over
time. In this work, we present a comprehensive perspective on toxicity
mitigation that takes into account its changing nature. We introduce
Goodtriever, a flexible methodology that matches the current state-of-the-art
toxicity mitigation while achieving 43% relative latency reduction during
inference and being more computationally efficient. By incorporating a
retrieval-based approach at decoding time, Goodtriever enables
toxicity-controlled text generation. Our research advocates for an increased
focus on adaptable mitigation techniques, which better reflect the data drift
models face when deployed in the wild. Code and data are available at
https://github.com/for-ai/goodtriever.
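The abstract describes a retrieval-based approach applied at decoding time. A common way to realize this (in the style of kNN-LM / DExperts ensembles) is to retrieve nearest neighbors from a non-toxic "expert" datastore and a toxic "anti-expert" datastore, then combine them with the base model's next-token distribution. The sketch below is a minimal, hypothetical illustration of that idea, not the paper's actual implementation; all function names, the `alpha` weighting, and the datastore layout are assumptions for illustration.

```python
import numpy as np

def knn_probs(query, keys, values, vocab_size, k=4, temp=1.0):
    """Turn the k nearest datastore entries into a next-token distribution.

    keys: (N, d) cached hidden states; values: (N,) next-token ids.
    """
    dists = np.linalg.norm(keys - query, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest] / temp)
    weights /= weights.sum()
    probs = np.zeros(vocab_size)
    for w, tok in zip(weights, values[nearest]):
        probs[tok] += w  # mass accumulates on retrieved tokens
    return probs

def controlled_step(base_logits, query, toxic_ds, nontoxic_ds,
                    vocab_size, alpha=2.0):
    """Ensemble the base LM with a non-toxic (expert) and a toxic
    (anti-expert) datastore, product-of-experts style."""
    base = np.exp(base_logits - base_logits.max())
    base /= base.sum()
    p_plus = knn_probs(query, *nontoxic_ds, vocab_size)
    p_minus = knn_probs(query, *toxic_ds, vocab_size)
    eps = 1e-8
    # Boost tokens favored by the non-toxic store, penalize tokens
    # favored by the toxic store.
    scores = (np.log(base + eps)
              + alpha * (np.log(p_plus + eps) - np.log(p_minus + eps)))
    out = np.exp(scores - scores.max())
    return out / out.sum()
```

Under this framing, adapting to language drift only requires adding or removing datastore entries; no model parameters change, which is consistent with the abstract's emphasis on flexibility and efficiency.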
Related papers
- Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z)
- Harmful algal bloom forecasting. A comparison between stream and batch learning [0.7067443325368975]
Harmful Algal Blooms (HABs) pose risks to public health and the shellfish industry.
This study develops a machine learning workflow for predicting the number of cells of a toxic dinoflagellate.
The model DoME emerged as the most effective and interpretable predictor, outperforming the other algorithms.
arXiv Detail & Related papers (2024-02-20T15:01:11Z)
- Deep Ensembles Meets Quantile Regression: Uncertainty-aware Imputation for Time Series [49.992908221544624]
Time series data often exhibit numerous missing values; filling them in is the time series imputation task.
Previous deep learning methods have been shown to be effective for time series imputation.
We propose a non-generative time series imputation method that produces accurate imputations with inherent uncertainty.
arXiv Detail & Related papers (2023-12-03T05:52:30Z)
- On Practical Aspects of Aggregation Defenses against Data Poisoning Attacks [58.718697580177356]
Attacks on deep learning models with malicious training samples are known as data poisoning.
Recent advances in defense strategies against data poisoning have highlighted the effectiveness of aggregation schemes in achieving certified poisoning robustness.
Here we focus on Deep Partition Aggregation, a representative aggregation defense, and assess its practical aspects, including efficiency, performance, and robustness.
arXiv Detail & Related papers (2023-06-28T17:59:35Z)
- Temporal Robustness against Data Poisoning [69.01705108817785]
Data poisoning considers cases when an adversary manipulates the behavior of machine learning algorithms through malicious training data.
We propose a temporal threat model of data poisoning with two novel metrics, earliness and duration, which respectively measure how far in advance an attack started and how long it lasted.
arXiv Detail & Related papers (2023-02-07T18:59:19Z)
- Cyberbullying Classifiers are Sensitive to Model-Agnostic Perturbations [15.152559543181523]
This study is the first to investigate the effect of adversarial behavior and augmentation for cyberbullying detection.
We demonstrate that model-agnostic lexical substitutions significantly hurt performance.
Augmentations proposed in prior work on toxicity prove to be less effective.
arXiv Detail & Related papers (2022-01-17T12:48:27Z)
- ToxCCIn: Toxic Content Classification with Interpretability [16.153683223016973]
Explanations are important for tasks like offensive language or toxicity detection on social media.
We propose a technique to improve the interpretability of transformer models, based on a simple and powerful assumption.
We find this approach effective; it can produce explanations that exceed the quality of those provided by Logistic Regression analysis.
arXiv Detail & Related papers (2021-03-01T22:17:10Z)
- Gaussian Process Nowcasting: Application to COVID-19 Mortality Reporting [2.8712862578745018]
Updating observations of a signal due to the delays in the measurement process is a common problem in signal processing.
We present a flexible approach using a latent Gaussian process that is capable of describing the changing auto-correlation structure present in the reporting time-delay surface.
This approach also yields robust estimates of uncertainty for the estimated nowcasted numbers of deaths.
arXiv Detail & Related papers (2021-02-22T18:32:44Z)
- An Optimal Control Approach to Learning in SIDARTHE Epidemic model [67.22168759751541]
We propose a general approach for learning time-variant parameters of dynamic compartmental models from epidemic data.
We forecast the epidemic evolution in Italy and France.
arXiv Detail & Related papers (2020-10-28T10:58:59Z)
- RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models [93.151822563361]
Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language which hinders their safe deployment.
We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration.
arXiv Detail & Related papers (2020-09-24T03:17:19Z)
- DeepHazard: neural network for time-varying risks [0.6091702876917281]
We propose a new flexible method for survival prediction: DeepHazard, a neural network for time-varying risks.
Our approach is tailored for a wide range of continuous hazards forms, with the only restriction of being additive in time.
Numerical examples illustrate that our approach outperforms existing state-of-the-art methodology in terms of predictive capability evaluated through the C-index metric.
arXiv Detail & Related papers (2020-07-26T21:01:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.