Unified Detoxifying and Debiasing in Language Generation via
Inference-time Adaptive Optimization
- URL: http://arxiv.org/abs/2210.04492v2
- Date: Fri, 2 Jun 2023 04:13:08 GMT
- Authors: Zonghan Yang, Xiaoyuan Yi, Peng Li, Yang Liu, Xing Xie
- Abstract summary: Pre-trained language models (PLMs) have prospered in various natural language generation (NLG) tasks due to their ability to generate fairly fluent text.
These models are observed to capture and reproduce harmful contents in training corpora, typically toxic language and social biases, raising severe moral issues.
We propose the first unified framework of detoxifying and debiasing called UDDIA, which jointly formalizes these two problems as rectifying the output space.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Warning: this paper contains model outputs exhibiting offensiveness and
biases. Recently pre-trained language models (PLMs) have prospered in various
natural language generation (NLG) tasks due to their ability to generate fairly
fluent text. Nevertheless, these models are observed to capture and reproduce
harmful contents in training corpora, typically toxic language and social
biases, raising severe moral issues. Prior works on ethical NLG tackle
detoxifying and debiasing separately, which is problematic since we find
debiased models still exhibit toxicity while detoxified ones even exacerbate
social biases. To address such a challenge, we propose the first unified
framework of detoxifying and debiasing called UDDIA, which jointly formalizes
these two problems as rectifying the output space. We theoretically interpret
our framework as learning a text distribution mixing weighted attributes.
Besides, UDDIA conducts adaptive optimization of only a few parameters during
decoding based on a parameter-efficient tuning schema without any training
data. This leads to minimal generation quality loss and improved rectification
performance with acceptable computational cost. Experimental results
demonstrate that compared to several strong baselines, UDDIA achieves debiasing
and detoxifying simultaneously and better balances efficiency and
effectiveness, taking a further step towards practical ethical NLG.
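The abstract describes rectifying the output space at inference time by mixing weighted attribute distributions and adaptively tuning a small number of parameters during decoding. The sketch below is a toy, hypothetical illustration of that general idea, not the paper's actual UDDIA algorithm: the function names (`rectify`, `adaptive_rectify`) and the "exposure" stopping heuristic are assumptions made for illustration.

```python
import math

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]

def rectify(base_logits, attr_logits, alpha=1.0):
    """Down-weight tokens favoured by an undesired-attribute model by
    subtracting its logits (scaled by alpha) from the base logits,
    then renormalising - a simple weighted-attribute mixture."""
    mixed = [b - alpha * a for b, a in zip(base_logits, attr_logits)]
    return softmax(mixed)

def adaptive_rectify(base_logits, attr_logits, max_exposure=0.2,
                     step=0.5, max_alpha=10.0):
    """Toy stand-in for inference-time adaptive optimisation: raise the
    rectification strength alpha until the probability-weighted overlap
    with the attribute distribution falls below a budget."""
    attr_probs = softmax(attr_logits)
    alpha = 0.0
    while alpha <= max_alpha:
        probs = rectify(base_logits, attr_logits, alpha)
        exposure = sum(p * a for p, a in zip(probs, attr_probs))
        if exposure <= max_exposure:
            return probs, alpha
        alpha += step
    return probs, alpha
```

In this toy setting, a token that the attribute model scores highly loses probability mass as alpha grows, while tokens the attribute model is indifferent to are left largely untouched.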
Related papers
- Language Rectified Flow: Advancing Diffusion Language Generation with Probabilistic Flows [53.31856123113228]
This paper proposes Language Rectified Flow, a method based on a reformulation of standard probabilistic flow models.
Experiments and ablation studies demonstrate that our method can be general, effective, and beneficial for many NLP tasks.
arXiv Detail & Related papers (2024-03-25T17:58:22Z)
- Fine-Grained Detoxification via Instance-Level Prefixes for Large Language Models [26.474136481185724]
The paper proposes fine-grained detoxification via instance-level prefixes (FGDILP) to mitigate toxic text without additional cost.
FGDILP contrasts the contextualized representation in attention space using a positive prefix-prepended prompt.
We validate that FGDILP enables controlled text generation with regard to toxicity at both the utterance and context levels.
arXiv Detail & Related papers (2024-02-23T09:04:48Z)
- Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models [25.212449683397647]
This paper studies the integration of a contrastive learning objective for fine-tuning LLMs for implicit knowledge editing and controlled text generation.
To facilitate training the model in a self-supervised fashion, we leverage an off-the-shelf LLM for training data generation.
arXiv Detail & Related papers (2024-01-16T16:49:39Z)
- An Empirical Analysis of Parameter-Efficient Methods for Debiasing Pre-Trained Language Models [55.14405248920852]
We conduct experiments with prefix tuning, prompt tuning, and adapter tuning on different language models and bias types to evaluate their debiasing performance.
We find that the parameter-efficient methods are effective in mitigating gender bias, where adapter tuning is consistently the most effective.
We also find that prompt tuning is more suitable for GPT-2 than for BERT, and that the methods are less effective at mitigating racial and religious bias.
arXiv Detail & Related papers (2023-06-06T23:56:18Z)
- A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity [84.6421260559093]
This study is the largest set of experiments to validate, quantify, and expose undocumented intuitions about text pretraining.
Our findings indicate there does not exist a one-size-fits-all solution to filtering training data.
arXiv Detail & Related papers (2023-05-22T15:57:53Z)
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR)
Specifically, we propose to inject the standard Gaussian noise and regularize hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
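The idea of injecting Gaussian noise and regularising hidden representations can be illustrated with a minimal sketch. This is a toy, self-contained interpretation, assuming a generic `layer_fn` mapping one hidden vector to another; it is not the LNSR paper's actual implementation.

```python
import random

def noise_stability_penalty(layer_fn, hidden, sigma=0.01, trials=4):
    """Perturb a hidden vector with Gaussian noise and penalise how far
    the layer's output moves (mean squared L2 distance over trials).
    A stable layer yields a small penalty; a brittle one a large one."""
    base = layer_fn(hidden)
    total = 0.0
    for _ in range(trials):
        noisy = [h + random.gauss(0.0, sigma) for h in hidden]
        out = layer_fn(noisy)
        total += sum((a - b) ** 2 for a, b in zip(out, base))
    return total / trials
```

In a real fine-tuning loop this penalty would be added to the task loss, encouraging hidden representations that are insensitive to small perturbations.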
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
- NoiER: An Approach for Training more Reliable Fine-Tuned Downstream Task Models [54.184609286094044]
We propose noise entropy regularisation (NoiER) as an efficient learning paradigm that solves the problem without auxiliary models and additional data.
The proposed approach improved traditional OOD detection evaluation metrics by 55% on average compared to the original fine-tuned models.
arXiv Detail & Related papers (2021-08-29T06:58:28Z)
- Understanding and Improving Lexical Choice in Non-Autoregressive Translation [98.11249019844281]
We propose to expose the raw data to NAT models to restore the useful information of low-frequency words.
Our approach pushes the SOTA NAT performance on the WMT14 English-German and WMT16 Romanian-English datasets up to 27.8 and 33.8 BLEU points, respectively.
arXiv Detail & Related papers (2020-12-29T03:18:50Z) - Text Generation by Learning from Demonstrations [17.549815256968877]
Current approaches to text generation largely rely on autoregressive models and maximum likelihood estimation.
We propose GOLD: an easy-to-optimize algorithm that learns from expert demonstrations by importance weighting.
According to both automatic and human evaluation, models trained by GOLD outperform those trained by MLE and policy gradient.
arXiv Detail & Related papers (2020-09-16T17:58:37Z)
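Learning from demonstrations by importance weighting, as in the GOLD summary above, can be sketched in a toy form. The weight approximation below (the model's own token probability) is an illustrative simplification, not the paper's exact objective.

```python
import math

def importance_weighted_nll(token_logprobs):
    """Sum the negative log-likelihood of demonstration tokens, each
    scaled by an importance weight approximated here by the model's own
    probability of the token, so tokens the model already finds likely
    dominate the objective (unlike plain MLE, which weights uniformly)."""
    loss = 0.0
    for lp in token_logprobs:
        weight = math.exp(lp)  # ~ p_model(token), the importance weight
        loss += -weight * lp
    return loss
```

A token the model assigns probability 1 contributes zero loss, while uncertain tokens contribute in proportion to both their weight and their surprisal.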
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.