Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models
- URL: http://arxiv.org/abs/2310.07589v1
- Date: Wed, 11 Oct 2023 15:30:35 GMT
- Title: Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models
- Authors: Luiza Pozzobon, Beyza Ermis, Patrick Lewis, Sara Hooker
- Abstract summary: Goodtriever is a flexible methodology that matches the current state of the art in toxicity mitigation.
By incorporating a retrieval-based approach at decoding time, Goodtriever enables toxicity-controlled text generation.
- Score: 11.805944680474823
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Considerable effort has been dedicated to mitigating toxicity, but existing
methods often require drastic modifications to model parameters or the use of
computationally intensive auxiliary models. Furthermore, previous approaches
have often neglected the crucial factor of language's evolving nature over
time. In this work, we present a comprehensive perspective on toxicity
mitigation that takes into account its changing nature. We introduce
Goodtriever, a flexible methodology that matches the current state-of-the-art
toxicity mitigation while achieving 43% relative latency reduction during
inference and being more computationally efficient. By incorporating a
retrieval-based approach at decoding time, Goodtriever enables
toxicity-controlled text generation. Our research advocates for an increased
focus on adaptable mitigation techniques, which better reflect the data drift
models face when deployed in the wild. Code and data are available at
https://github.com/for-ai/goodtriever.
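The abstract describes a retrieval-based approach applied at decoding time. A common way to realize this (in the style of kNN-LM / DExperts ensembles) is to retrieve nearest neighbors from a non-toxic "expert" datastore and a toxic "anti-expert" datastore, then combine them with the base model's next-token distribution. The sketch below is a minimal, hypothetical illustration of that idea, not the paper's actual implementation; all function names, the `alpha` weighting, and the datastore layout are assumptions for illustration.

```python
import numpy as np

def knn_probs(query, keys, values, vocab_size, k=4, temp=1.0):
    """Turn the k nearest datastore entries into a next-token distribution.

    keys: (N, d) cached hidden states; values: (N,) next-token ids.
    """
    dists = np.linalg.norm(keys - query, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest] / temp)
    weights /= weights.sum()
    probs = np.zeros(vocab_size)
    for w, tok in zip(weights, values[nearest]):
        probs[tok] += w  # mass accumulates on retrieved tokens
    return probs

def controlled_step(base_logits, query, toxic_ds, nontoxic_ds,
                    vocab_size, alpha=2.0):
    """Ensemble the base LM with a non-toxic (expert) and a toxic
    (anti-expert) datastore, product-of-experts style."""
    base = np.exp(base_logits - base_logits.max())
    base /= base.sum()
    p_plus = knn_probs(query, *nontoxic_ds, vocab_size)
    p_minus = knn_probs(query, *toxic_ds, vocab_size)
    eps = 1e-8
    # Boost tokens favored by the non-toxic store, penalize tokens
    # favored by the toxic store.
    scores = (np.log(base + eps)
              + alpha * (np.log(p_plus + eps) - np.log(p_minus + eps)))
    out = np.exp(scores - scores.max())
    return out / out.sum()
```

Under this framing, adapting to language drift only requires adding or removing datastore entries; no model parameters change, which is consistent with the abstract's emphasis on flexibility and efficiency.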
Related papers
- Low-rank finetuning for LLMs: A fairness perspective [54.13240282850982]
Low-rank approximation techniques have become the de facto standard for fine-tuning Large Language Models.
This paper investigates the effectiveness of these methods in capturing the shift of fine-tuning datasets from the initial pre-trained data distribution.
We show that low-rank fine-tuning inadvertently preserves undesirable biases and toxic behaviors.
arXiv Detail & Related papers (2024-05-28T20:43:53Z)
- Harmful algal bloom forecasting. A comparison between stream and batch learning [0.7067443325368975]
Harmful Algal Blooms (HABs) pose risks to public health and the shellfish industry.
This study develops a machine learning workflow for predicting the number of cells of a toxic dinoflagellate.
The model DoME emerged as the most effective and interpretable predictor, outperforming the other algorithms.
arXiv Detail & Related papers (2024-02-20T15:01:11Z)
- Deep Ensembles Meets Quantile Regression: Uncertainty-aware Imputation for Time Series [49.992908221544624]
Time series data often exhibit numerous missing values; filling them in is the time series imputation task.
Previous deep learning methods have been shown to be effective for time series imputation.
We propose a non-generative time series imputation method that produces accurate imputations with inherent uncertainty.
arXiv Detail & Related papers (2023-12-03T05:52:30Z)
- On Practical Aspects of Aggregation Defenses against Data Poisoning Attacks [58.718697580177356]
Attacks on deep learning models with malicious training samples are known as data poisoning.
Recent advances in defense strategies against data poisoning have highlighted the effectiveness of aggregation schemes in achieving certified poisoning robustness.
Here we focus on Deep Partition Aggregation, a representative aggregation defense, and assess its practical aspects, including efficiency, performance, and robustness.
arXiv Detail & Related papers (2023-06-28T17:59:35Z)
- Temporal Robustness against Data Poisoning [69.01705108817785]
Data poisoning considers cases when an adversary manipulates the behavior of machine learning algorithms through malicious training data.
We propose a temporal threat model of data poisoning with two novel metrics, earliness and duration, which respectively measure how far in advance an attack started and how long it lasted.
arXiv Detail & Related papers (2023-02-07T18:59:19Z)
- Cyberbullying Classifiers are Sensitive to Model-Agnostic Perturbations [15.152559543181523]
This study is the first to investigate the effect of adversarial behavior and augmentation for cyberbullying detection.
We demonstrate that model-agnostic lexical substitutions significantly hurt performance.
Augmentations proposed in prior work on toxicity prove to be less effective.
arXiv Detail & Related papers (2022-01-17T12:48:27Z)
- ToxCCIn: Toxic Content Classification with Interpretability [16.153683223016973]
Explanations are important for tasks like offensive language or toxicity detection on social media.
We propose a technique to improve the interpretability of transformer models, based on a simple and powerful assumption.
We find this approach effective; it can produce explanations that exceed the quality of those provided by Logistic Regression analysis.
arXiv Detail & Related papers (2021-03-01T22:17:10Z)
- Gaussian Process Nowcasting: Application to COVID-19 Mortality Reporting [2.8712862578745018]
Updating observations of a signal due to the delays in the measurement process is a common problem in signal processing.
We present a flexible approach using a latent Gaussian process that is capable of describing the changing auto-correlation structure present in the reporting time-delay surface.
This approach also yields robust estimates of uncertainty for the estimated nowcasted numbers of deaths.
arXiv Detail & Related papers (2021-02-22T18:32:44Z)
- An Optimal Control Approach to Learning in SIDARTHE Epidemic model [67.22168759751541]
We propose a general approach for learning time-variant parameters of dynamic compartmental models from epidemic data.
We forecast the epidemic evolution in Italy and France.
arXiv Detail & Related papers (2020-10-28T10:58:59Z)
- RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models [93.151822563361]
Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language which hinders their safe deployment.
We investigate the extent to which pretrained LMs can be prompted to generate toxic language, and the effectiveness of controllable text generation algorithms at preventing such toxic degeneration.
arXiv Detail & Related papers (2020-09-24T03:17:19Z)
- DeepHazard: neural network for time-varying risks [0.6091702876917281]
We propose a new flexible method for survival prediction: DeepHazard, a neural network for time-varying risks.
Our approach is tailored for a wide range of continuous hazards forms, with the only restriction of being additive in time.
Numerical examples illustrate that our approach outperforms existing state-of-the-art methodology in terms of predictive capability evaluated through the C-index metric.
arXiv Detail & Related papers (2020-07-26T21:01:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.