Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented
Models
- URL: http://arxiv.org/abs/2310.07589v1
- Date: Wed, 11 Oct 2023 15:30:35 GMT
- Title: Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented
Models
- Authors: Luiza Pozzobon, Beyza Ermis, Patrick Lewis, Sara Hooker
- Abstract summary: Goodtriever is a flexible methodology that matches the current state-of-the-art toxicity mitigation.
By incorporating a retrieval-based approach at decoding time, Goodtriever enables toxicity-controlled text generation.
- Score: 11.805944680474823
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Considerable effort has been dedicated to mitigating toxicity, but existing
methods often require drastic modifications to model parameters or the use of
computationally intensive auxiliary models. Furthermore, previous approaches
have often neglected the crucial factor of language's evolving nature over
time. In this work, we present a comprehensive perspective on toxicity
mitigation that takes into account its changing nature. We introduce
Goodtriever, a flexible methodology that matches the current state-of-the-art
toxicity mitigation while achieving 43% relative latency reduction during
inference and being more computationally efficient. By incorporating a
retrieval-based approach at decoding time, Goodtriever enables
toxicity-controlled text generation. Our research advocates for an increased
focus on adaptable mitigation techniques, which better reflect the data drift
models face when deployed in the wild. Code and data are available at
https://github.com/for-ai/goodtriever.
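How the retrieval component steers generation is not spelled out in the abstract above. The sketch below is therefore only one plausible reading, stated as an assumption: next-token probabilities from the base LM are reshaped at decoding time by nearest-neighbour lookups over separate non-toxic and toxic datastores (kNN-LM style). All function names, the distance-to-probability conversion, and the product-of-experts-style combination rule are illustrative, not the paper's exact formulation.

```python
import numpy as np

def knn_next_token_probs(query, keys, next_token_ids, vocab_size, k=8, temperature=1.0):
    """Turn the k nearest datastore entries (hidden-state key -> next token) into a distribution."""
    dists = np.linalg.norm(keys - query, axis=1)            # distance from the current context to every key
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-(dists[nearest] - dists[nearest].min()) / temperature)  # closer neighbours weigh more
    probs = np.full(vocab_size, 1e-8)                        # small floor so every token keeps nonzero mass
    for idx, w in zip(nearest, weights):
        probs[next_token_ids[idx]] += w
    return probs / probs.sum()

def toxicity_controlled_step(p_lm, p_nontoxic, p_toxic, alpha=2.0, eps=1e-8):
    """Hypothetical combination rule: boost tokens favoured by the non-toxic datastore,
    penalise tokens favoured by the toxic one, then renormalise."""
    logits = np.log(p_lm + eps) + alpha * (np.log(p_nontoxic + eps) - np.log(p_toxic + eps))
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()
```

Under this reading, `p_nontoxic` and `p_toxic` would each come from `knn_next_token_probs` over keys recorded from non-toxic and toxic corpora, and `alpha` would control how strongly retrieval reshapes the base distribution; the actual mechanism is described in the paper and repository linked above.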
Related papers
- Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture [58.60915132222421]
We introduce an approach that is both general and parameter-efficient for face forgery detection.
We design a forgery-style mixture formulation that augments the diversity of forgery source domains.
We show that the designed model achieves state-of-the-art generalizability with significantly reduced trainable parameters.
arXiv Detail & Related papers (2024-08-23T01:53:36Z)
- Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors (a minimal low-rank adapter of the kind studied here is sketched at the end of this page).
arXiv Detail & Related papers (2024-05-28T20:43:53Z)
- Harmful algal bloom forecasting. A comparison between stream and batch learning [0.7067443325368975]
Harmful Algal Blooms (HABs) pose risks to public health and the shellfish industry.
This study develops a machine learning workflow for predicting the number of cells of a toxic dinoflagellate.
The model DoME emerged as the most effective and interpretable predictor, outperforming the other algorithms.
arXiv Detail & Related papers (2024-02-20T15:01:11Z)
- Deep Ensembles Meets Quantile Regression: Uncertainty-aware Imputation for Time Series [45.76310830281876]
We propose Quantile Sub-Ensembles, a novel method to estimate uncertainty with ensemble of quantile-regression-based task networks.
Our method not only produces accurate imputations that are robust to high missing rates, but is also computationally efficient due to the fast training of its non-generative model.
arXiv Detail & Related papers (2023-12-03T05:52:30Z)
- On Practical Aspects of Aggregation Defenses against Data Poisoning Attacks [58.718697580177356]
Attacks on deep learning models with malicious training samples are known as data poisoning.
Recent advances in defense strategies against data poisoning have highlighted the effectiveness of aggregation schemes in achieving certified poisoning robustness.
Here we focus on Deep Partition Aggregation, a representative aggregation defense, and assess its practical aspects, including efficiency, performance, and robustness (the partition-and-vote idea behind such defenses is sketched at the end of this page).
arXiv Detail & Related papers (2023-06-28T17:59:35Z)
- Temporal Robustness against Data Poisoning [69.01705108817785]
Data poisoning considers cases when an adversary manipulates the behavior of machine learning algorithms through malicious training data.
We propose a temporal threat model of data poisoning with two novel metrics, earliness and duration, which respectively measure how far in advance an attack started and how long it lasted.
arXiv Detail & Related papers (2023-02-07T18:59:19Z)
- Cyberbullying Classifiers are Sensitive to Model-Agnostic Perturbations [15.152559543181523]
This study is the first to investigate the effect of adversarial behavior and augmentation for cyberbullying detection.
We demonstrate that model-agnostic lexical substitutions significantly hurt performance.
Augmentations proposed in prior work on toxicity prove to be less effective.
arXiv Detail & Related papers (2022-01-17T12:48:27Z)
- ToxCCIn: Toxic Content Classification with Interpretability [16.153683223016973]
Explanations are important for tasks like offensive language or toxicity detection on social media.
We propose a technique to improve the interpretability of transformer models, based on a simple and powerful assumption.
We find this approach effective and show that it can produce explanations that exceed the quality of those provided by Logistic Regression analysis.
arXiv Detail & Related papers (2021-03-01T22:17:10Z)
- An Optimal Control Approach to Learning in SIDARTHE Epidemic model [67.22168759751541]
We propose a general approach for learning time-variant parameters of dynamic compartmental models from epidemic data.
We forecast the epidemic evolution in Italy and France.
arXiv Detail & Related papers (2020-10-28T10:58:59Z)
- RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models [93.151822563361]
Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language which hinders their safe deployment.
We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration (the scoring loop behind this kind of evaluation is sketched at the end of this page).
arXiv Detail & Related papers (2020-09-24T03:17:19Z)
- DeepHazard: neural network for time-varying risks [0.6091702876917281]
We propose a new flexible method for survival prediction: DeepHazard, a neural network for time-varying risks.
Our approach is tailored for a wide range of continuous hazards forms, with the only restriction of being additive in time.
Numerical examples illustrate that our approach outperforms existing state-of-the-art methodology in terms of predictive capability evaluated through the C-index metric.
arXiv Detail & Related papers (2020-07-26T21:01:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.
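For the low-rank fine-tuning entry above, a minimal sketch of the kind of adapter such work studies: the pre-trained weight is frozen and only a low-rank update B·A is trained (a LoRA-style layer). This is an illustrative module, not code from that paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update, scaled by alpha / r."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)                       # pre-trained weight stays frozen
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)    # the low-rank factors are the
        self.B = nn.Parameter(torch.zeros(out_features, r))          # only trainable parameters
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

Because only A and B are updated, fine-tuning touches a tiny fraction of the parameters, which is why the fairness paper above asks how well such updates capture shifts away from the pre-trained data distribution.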
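For the aggregation-defense entry, a schematic of the partition-and-vote idea behind defenses such as Deep Partition Aggregation, as commonly described: the training set is split into disjoint partitions by a stable content hash, one base classifier is trained per partition, and predictions are taken by majority vote, so a single poisoned sample can influence at most one voter. The helper names and `train_fn` are placeholders.

```python
import hashlib
from collections import Counter

def partition(dataset, num_partitions):
    """Assign each sample to one bucket via a stable content hash (independent of dataset order)."""
    buckets = [[] for _ in range(num_partitions)]
    for sample in dataset:
        h = int(hashlib.md5(repr(sample).encode()).hexdigest(), 16)
        buckets[h % num_partitions].append(sample)
    return buckets

def train_aggregate(dataset, num_partitions, train_fn):
    """Train one base classifier per disjoint partition; train_fn(bucket) returns a callable model."""
    return [train_fn(bucket) for bucket in partition(dataset, num_partitions)]

def predict(models, x):
    """Majority vote across base classifiers; each poisoned training sample corrupts at most one vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]
```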
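For the RealToxicityPrompts entry, a sketch of the evaluation pattern it describes: sample several continuations per prompt, score each with a toxicity classifier, and summarise the per-prompt maxima. `generate` and `score_toxicity` are placeholders for a language model and an external toxicity scorer; the 25-sample count and 0.5 threshold follow the common setup but should be treated as assumptions here.

```python
def toxic_degeneration_eval(prompts, generate, score_toxicity, num_samples=25, threshold=0.5):
    """Return (expected maximum toxicity, fraction of prompts with at least one continuation
    scoring above the threshold) over a set of prompts."""
    maxima = []
    for prompt in prompts:
        scores = [score_toxicity(generate(prompt)) for _ in range(num_samples)]
        maxima.append(max(scores))
    expected_max = sum(maxima) / len(maxima)
    toxicity_probability = sum(m >= threshold for m in maxima) / len(maxima)
    return expected_max, toxicity_probability
```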