Contrastive Perplexity for Controlled Generation: An Application in
Detoxifying Large Language Models
- URL: http://arxiv.org/abs/2401.08491v2
- Date: Wed, 24 Jan 2024 23:04:02 GMT
- Title: Contrastive Perplexity for Controlled Generation: An Application in
Detoxifying Large Language Models
- Authors: Tassilo Klein, Moin Nabi
- Abstract summary: This paper studies the integration of a contrastive learning objective for fine-tuning LLMs for implicit knowledge editing and controlled text generation.
To facilitate training the model in a self-supervised fashion, we leverage an off-the-shelf LLM for training data generation.
- Score: 25.212449683397647
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The generation of undesirable and factually incorrect content by large
language models poses a significant challenge and remains largely an unsolved
issue. This paper studies the integration of a contrastive learning objective
for fine-tuning LLMs for implicit knowledge editing and controlled text
generation. Optimizing the training objective entails aligning text
perplexities in a contrastive fashion. To facilitate training the model in a
self-supervised fashion, we leverage an off-the-shelf LLM for training data
generation. We showcase applicability in the domain of detoxification. Herein,
the proposed approach leads to a significant decrease in the generation of
toxic content while preserving general utility for downstream tasks such as
commonsense reasoning and reading comprehension. The proposed approach is
conceptually simple but empirically powerful.
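For illustration, below is a minimal sketch of what a contrastive-perplexity style fine-tuning objective could look like: the log-perplexity of a non-toxic target text is pushed below that of a toxic counterpart (e.g., produced by an off-the-shelf LLM, as described above) by a margin. The base model, the margin form, and the hinge loss are assumptions for this sketch, not the paper's exact formulation.
```python
# Minimal sketch of a contrastive-perplexity style objective (an assumption,
# not the paper's exact formulation): push the log-perplexity of a non-toxic
# text below that of its toxic counterpart by a margin.
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

def sequence_nll(model, input_ids, attention_mask):
    # Mean per-token negative log-likelihood; perplexity = exp(nll).
    out = model(input_ids=input_ids, attention_mask=attention_mask, labels=input_ids)
    return out.loss

def contrastive_perplexity_loss(model, pos_batch, neg_batch, margin=1.0):
    # `margin` is a placeholder hyperparameter.
    nll_pos = sequence_nll(model, **pos_batch)  # non-toxic target text
    nll_neg = sequence_nll(model, **neg_batch)  # toxic counterpart, e.g. generated
                                                # by an off-the-shelf LLM
    return F.relu(margin + nll_pos - nll_neg)

tokenizer = AutoTokenizer.from_pretrained("gpt2")        # placeholder base model
model = AutoModelForCausalLM.from_pretrained("gpt2")
pos = dict(tokenizer("a polite, helpful reply", return_tensors="pt"))
neg = dict(tokenizer("a toxic paraphrase of that reply", return_tensors="pt"))
loss = contrastive_perplexity_loss(model, pos, neg)
loss.backward()  # combine with a standard optimizer step during fine-tuning
```
In practice the positive/negative pairs would come from the self-supervised data generation step described in the abstract; the hinge-style contrast on mean per-token negative log-likelihoods is one simple way to "align text perplexities in a contrastive fashion."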
Related papers
- Learning-to-Defer for Extractive Question Answering [0.0]
We introduce an adapted two-stage Learning-to-Defer mechanism for question answering that enhances decision-making by enabling selective deferral to human experts or larger models without retraining the language model.
Our results demonstrate that deferring a minimal number of queries allows smaller models to achieve performance comparable to their larger counterparts while preserving computational efficiency.
arXiv Detail & Related papers (2024-10-21T08:21:00Z)
- Large Language Models can be Strong Self-Detoxifiers [82.6594169242814]
Self-disciplined Autoregressive Sampling (SASA) is a lightweight controlled decoding algorithm for toxicity reduction in large language models (LLMs).
SASA tracks the margin of the current output to steer the generation away from the toxic subspace by adjusting the autoregressive sampling strategy; a hedged sketch of this style of margin-guided decoding is given after the related-papers list below.
SASA is evaluated on LLMs of different scales and natures, namely Llama-3.1-Instruct (8B), Llama-2 (7B), and GPT2-L, on the RealToxicityPrompts, BOLD, and AttaQ benchmarks.
arXiv Detail & Related papers (2024-10-04T17:45:15Z)
- Self-training Large Language Models through Knowledge Detection [26.831873737733737]
Large language models (LLMs) often necessitate extensive labeled datasets and training compute to achieve impressive performance across downstream tasks.
This paper explores a self-training paradigm, where the LLM autonomously curates its own labels and selectively trains on unknown data samples.
Empirical evaluations demonstrate significant improvements in reducing hallucination in generation across multiple subjects.
arXiv Detail & Related papers (2024-06-17T07:25:09Z)
- ALMol: Aligned Language-Molecule Translation LLMs through Offline Preference Contrastive Optimisation [2.296475290901356]
We focus on machine language-molecule translation and deploy a novel training approach called contrastive preference optimisation.
Our results demonstrate that our models achieve up to a 32% improvement compared to counterpart models.
arXiv Detail & Related papers (2024-05-14T13:59:24Z)
- Debiasing Multimodal Large Language Models [61.6896704217147]
Large Vision-Language Models (LVLMs) have become indispensable tools in computer vision and natural language processing.
Our investigation reveals a noteworthy bias in the generated content, where the output is primarily influenced by the prior of the underlying Large Language Model (LLM) rather than by the input image.
To rectify these biases and redirect the model's focus toward vision information, we introduce two simple, training-free strategies.
arXiv Detail & Related papers (2024-03-08T12:35:07Z)
- Self-Detoxifying Language Models via Toxification Reversal [11.238212967733165]
Language model detoxification aims to minimize the risk of generating offensive or harmful content in pretrained language models (PLMs).
We propose a more lightweight approach that enables the PLM itself to achieve "self-detoxification".
Our method is built upon the observation that prepending a negative steering prompt can effectively induce PLMs to generate toxic content.
arXiv Detail & Related papers (2023-10-14T12:51:38Z)
- Are Large Language Models Really Robust to Word-Level Perturbations? [68.60618778027694]
We propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools.
Longer conversations reveal a language model's comprehensive grasp of language through its proficiency in understanding questions.
Our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations that are commonplace in daily language usage.
arXiv Detail & Related papers (2023-09-20T09:23:46Z)
- Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies [104.32199881187607]
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks.
A promising approach to rectify these flaws is self-correction, where the LLM itself is prompted or guided to fix problems in its own output.
This paper presents a comprehensive review of this emerging class of techniques.
arXiv Detail & Related papers (2023-08-06T18:38:52Z)
- Large Language Models with Controllable Working Memory [64.71038763708161]
Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP).
What further sets these models apart is the massive amounts of world knowledge they internalize during pretraining.
How the model's world knowledge interacts with the factual information presented in the context remains underexplored.
arXiv Detail & Related papers (2022-11-09T18:58:29Z)
- Unified Detoxifying and Debiasing in Language Generation via Inference-time Adaptive Optimization [32.50246008433889]
Pre-trained language models (PLMs) have prospered in various natural language generation (NLG) tasks due to their ability to generate fairly fluent text.
These models are observed to capture and reproduce harmful content from their training corpora, typically toxic language and social biases, raising serious ethical concerns.
We propose the first unified framework of detoxifying and debiasing called UDDIA, which jointly formalizes these two problems as rectifying the output space.
arXiv Detail & Related papers (2022-10-10T08:45:25Z)
- A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation [53.8171136907856]
We introduce a set of simple yet effective data augmentation strategies dubbed cutoff.
cutoff relies on sampling consistency and thus adds little computational overhead.
cutoff consistently outperforms adversarial training and achieves state-of-the-art results on the IWSLT2014 German-English dataset.
arXiv Detail & Related papers (2020-09-29T07:08:35Z)
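As referenced in the SASA entry above, the following is a hypothetical sketch of margin-guided controlled decoding in that general spirit; it is not the authors' algorithm. The linear toxicity classifier (`w`, `b`), the steering strength `beta`, and the base model are all placeholders introduced for illustration.
```python
# Hypothetical sketch of margin-guided controlled decoding (not SASA's actual
# algorithm): a linear toxicity classifier over hidden states re-weights the
# next-token distribution so sampling drifts away from the toxic region.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")       # placeholder base model
model = AutoModelForCausalLM.from_pretrained("gpt2")
hidden_size = model.config.n_embd
w = torch.randn(hidden_size)  # placeholder weights; a real classifier would be
b = torch.zeros(1)            # trained on labeled toxic/non-toxic continuations
beta = 5.0                    # steering strength (assumption)

@torch.no_grad()
def steered_sample(prompt, max_new_tokens=20):
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        out = model(ids, output_hidden_states=True)
        logits = out.logits[:, -1, :]             # next-token logits
        h = out.hidden_states[-1][:, -1, :]       # last hidden state
        margin = -(h @ w + b)                     # >0: context leans non-toxic
        # Estimate how each candidate token's embedding would shift the margin,
        # down-weight tokens that push toward the toxic side, and steer harder
        # when the current margin is small or negative.
        shift = -(model.get_input_embeddings().weight @ w)
        strength = beta * torch.sigmoid(-margin)
        adjusted = logits + strength * torch.tanh(shift)
        next_id = torch.multinomial(torch.softmax(adjusted, dim=-1), 1)
        ids = torch.cat([ids, next_id], dim=-1)
    return tokenizer.decode(ids[0], skip_special_tokens=True)

print(steered_sample("The conversation continued with"))
```
Because the adjustment happens only at sampling time, this family of methods requires no fine-tuning of the base model, which is what makes such controlled-decoding approaches lightweight.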
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.