Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification
- URL: http://arxiv.org/abs/2410.05559v1
- Date: Mon, 7 Oct 2024 23:38:58 GMT
- Title: Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification
- Authors: Tao Meng, Ninareh Mehrabi, Palash Goyal, Anil Ramakrishna, Aram Galstyan, Richard Zemel, Kai-Wei Chang, Rahul Gupta, Charith Peris
- Abstract summary: We propose a constraint learning schema for fine-tuning Large Language Models (LLMs) with attribute control.
We show that our approach leads to an LLM that produces fewer inappropriate responses while achieving competitive performance on benchmarks and a toxicity detection task.
- Score: 76.14641982122696
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a constraint learning schema for fine-tuning Large Language Models (LLMs) with attribute control. Given a training corpus and control criteria formulated as a sequence-level constraint on model outputs, our method fine-tunes the LLM on the training corpus while enhancing constraint satisfaction with minimal impact on its utility and generation quality. Specifically, our approach regularizes the LLM training by penalizing the KL divergence between the desired output distribution, which satisfies the constraints, and the LLM's posterior. This regularization term can be approximated by an auxiliary model trained to decompose the sequence-level constraints into token-level guidance, allowing the term to be measured by a closed-form formulation. To further improve efficiency, we design a parallel scheme for concurrently updating both the LLM and the auxiliary model. We evaluate the empirical performance of our approach by controlling the toxicity when training an LLM. We show that our approach leads to an LLM that produces fewer inappropriate responses while achieving competitive performance on benchmarks and a toxicity detection task.
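The following is a minimal sketch of the core idea: fine-tune on the corpus while penalizing the token-level KL divergence between the desired output distribution, approximated here by an auxiliary model, and the LLM's own predictions. It assumes HuggingFace-style models that return `.logits`; the function names, the weight `beta`, and the frozen auxiliary model are illustrative, and the paper's parallel scheme for updating the auxiliary model alongside the LLM is omitted.

```python
import torch
import torch.nn.functional as F

def constrained_lm_loss(model, aux_model, input_ids, labels, beta=0.1):
    # Standard language-modeling loss on the training corpus.
    logits = model(input_ids).logits                       # (B, T, V)
    nll = F.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        labels[:, 1:].reshape(-1),
        ignore_index=-100,
    )

    # Token-level guidance: the auxiliary model approximates the desired
    # (constraint-satisfying) output distribution at each position.
    with torch.no_grad():
        q = F.softmax(aux_model(input_ids).logits[:, :-1], dim=-1)

    # Closed-form per-token KL(q || p), averaged over batch and time.
    log_p = F.log_softmax(logits[:, :-1], dim=-1)
    kl = (q * (q.clamp_min(1e-9).log() - log_p)).sum(-1).mean()

    return nll + beta * kl
```

Larger `beta` trades utility for constraint satisfaction; setting it to zero recovers ordinary fine-tuning.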
Related papers
- RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning [35.446870721902904]
Large language models (LLMs) deployed as agents solve user-specified tasks over multiple steps while keeping the required manual engagement to a minimum.
We propose an end-to-end reinforcement learning method for teaching models to leverage execution feedback in the realm of code synthesis.
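As a hedged illustration of what "execution feedback" can mean in practice, the helper below runs a candidate program against unit tests and returns a scalar reward suitable for a policy-gradient loop; it is generic infrastructure, not RLEF's exact formulation.

```python
import os
import subprocess
import tempfile

def execution_reward(candidate_code: str, tests: str, timeout: float = 5.0) -> float:
    """Return 1.0 if the candidate passes all tests, else 0.0."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(candidate_code + "\n" + tests)
        path = f.name
    try:
        result = subprocess.run(["python", path], capture_output=True, timeout=timeout)
        return 1.0 if result.returncode == 0 else 0.0
    except subprocess.TimeoutExpired:
        return 0.0  # hanging programs score zero
    finally:
        os.unlink(path)
```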
arXiv Detail & Related papers (2024-10-02T23:25:17Z)
- Directed Exploration in Reinforcement Learning from Linear Temporal Logic [59.707408697394534]
Linear temporal logic (LTL) is a powerful language for task specification in reinforcement learning.
We show that the synthesized reward signal remains fundamentally sparse, making exploration challenging.
We show how better exploration can be achieved by further leveraging the specification and casting its corresponding Limit Deterministic Büchi Automaton (LDBA) as a Markov reward process.
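One standard way to densify such a sparse automaton reward, shown below as a hedged sketch, is potential-based shaping over the automaton's states; `dist_to_accept`, a map from each LDBA state to its graph distance from an accepting state, is a hypothetical helper.

```python
def shaped_reward(prev_q, next_q, accepted, dist_to_accept, gamma=0.99):
    base = 1.0 if accepted else 0.0          # the sparse LTL acceptance reward
    # Potential-based shaping: states closer to acceptance get higher potential,
    # which densifies the signal without changing the optimal policy.
    phi_prev = -dist_to_accept[prev_q]
    phi_next = -dist_to_accept[next_q]
    return base + gamma * phi_next - phi_prev
```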
arXiv Detail & Related papers (2024-08-18T14:25:44Z)
- CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models [68.64605538559312]
In this paper, we analyze the MLLM instruction tuning from both theoretical and empirical perspectives.
Inspired by our findings, we propose a measurement to quantitatively evaluate the learning balance.
In addition, we introduce an auxiliary loss regularization method to promote updating of the generation distribution of MLLMs.
arXiv Detail & Related papers (2024-07-29T23:18:55Z)
- Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to producing errors, hallucinations, and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding the decoding process of LLMs with deliberative planning.
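A minimal sketch of deliberative decoding as best-first search follows: partial sequences are expanded token by token and ranked by accumulated log-probability plus a value estimate, in the spirit of A*. Here `expand` (top-k continuations with log-probs) and `q_value` are hypothetical stand-ins for the framework's components.

```python
import heapq

def best_first_decode(prompt_ids, expand, q_value, eos_id, max_steps=256, beam=4):
    # Frontier items: (negated f-score, tie-breaker, accumulated logprob g, sequence).
    start = list(prompt_ids)
    frontier = [(-q_value(start), 0, 0.0, start)]
    counter, best = 1, start
    for _ in range(max_steps):
        if not frontier:
            break
        _, _, g, seq = heapq.heappop(frontier)
        best = seq
        if seq[-1] == eos_id:
            return seq
        for token, logp in expand(seq, k=beam):
            new_seq = seq + [token]
            f = (g + logp) + q_value(new_seq)  # f = g + h, A*-style scoring
            heapq.heappush(frontier, (-f, counter, g + logp, new_seq))
            counter += 1
    return best  # best partial sequence if EOS was never reached
```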
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
- AdpQ: A Zero-shot Calibration Free Adaptive Post Training Quantization Method for LLMs [22.25748046511075]
AdpQ is a novel zero-shot adaptive post-training quantization (PTQ) method for Large Language Models (LLMs).
It achieves state-of-the-art performance in low-precision quantization without requiring any calibration data.
Our results match the accuracy of existing methods on various LLM benchmarks while reducing quantization time by at least 10x.
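For context, the snippet below shows a generic symmetric round-to-nearest PTQ baseline of the kind AdpQ improves on; AdpQ's adaptive, calibration-free treatment of salient weights is not reproduced here, so this is only a hedged reference point.

```python
import torch

def quantize_dequantize(w: torch.Tensor, bits: int = 4) -> torch.Tensor:
    # Symmetric per-row round-to-nearest quantization of a weight matrix.
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale  # dequantized weights, ready to compare against w
```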
arXiv Detail & Related papers (2024-05-22T05:32:11Z)
- Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement [67.1393112206885]
Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks.
We introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level.
We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks.
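A hedged sketch of the general recipe, an entropy-augmented policy-gradient loss applied per token rather than per sequence, is given below; the tensor names and the weight `alpha` are illustrative, not ETPO's exact update.

```python
import torch

def entropy_regularized_pg_loss(log_probs, advantages, entropies, alpha=0.01):
    """
    log_probs:  (B, T) log pi(a_t | s_t) of the tokens actually taken
    advantages: (B, T) per-token advantage estimates
    entropies:  (B, T) policy entropy at each decoding step
    """
    pg_term = -(log_probs * advantages.detach()).mean()  # token-level policy gradient
    entropy_bonus = entropies.mean()                     # keeps exploration alive
    return pg_term - alpha * entropy_bonus
```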
arXiv Detail & Related papers (2024-02-09T07:45:26Z)
- Amortizing intractable inference in large language models [56.92471123778389]
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
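One concrete distribution-matching objective, sketched below under the assumption of GFlowNet-style fine-tuning, is a trajectory-balance loss: it drives the model's sequence probability toward the unnormalized posterior, with a learned scalar `log_z` estimating the log partition function.

```python
import torch

def trajectory_balance_loss(log_pf: torch.Tensor,
                            log_reward: torch.Tensor,
                            log_z: torch.Tensor) -> torch.Tensor:
    # log_pf: summed token log-probs of a sampled sequence under the model.
    # log_reward: log of the unnormalized posterior density of that sequence.
    # At the optimum, log_z + log_pf == log_reward for every sample.
    return ((log_z + log_pf - log_reward) ** 2).mean()
```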
arXiv Detail & Related papers (2023-10-06T16:36:08Z)
- Constrained Model-based Reinforcement Learning with Robust Cross-Entropy Method [30.407700996710023]
This paper studies the constrained/safe reinforcement learning problem with sparse indicator signals for constraint violations.
We employ a neural network ensemble to estimate prediction uncertainty and use model predictive control as the basic control framework.
The results show that our approach learns to complete the tasks with a much smaller number of constraint violations than state-of-the-art baselines.
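A compact sketch of the planning loop, assuming a hypothetical `ensemble_rollout` that returns predicted returns and constraint-violation counts per candidate, is shown below; the constants are illustrative.

```python
import numpy as np

def cem_plan(state, ensemble_rollout, horizon=15, pop=200, n_elite=20,
             iters=5, act_dim=2, penalty=1e3):
    mean = np.zeros((horizon, act_dim))
    std = np.ones((horizon, act_dim))
    for _ in range(iters):
        # Sample candidate action sequences from the current Gaussian.
        actions = mean + std * np.random.randn(pop, horizon, act_dim)
        returns, violations = ensemble_rollout(state, actions)
        # Penalize predicted violations so "safe" plans dominate the elites.
        scores = returns - penalty * violations
        elite = actions[np.argsort(scores)[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean[0]  # MPC: execute only the first action, then replan
```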
arXiv Detail & Related papers (2020-10-15T18:19:35Z)
- Teaching the Old Dog New Tricks: Supervised Learning with Constraints [18.88930622054883]
Adding constraint support in Machine Learning has the potential to address outstanding issues in data-driven AI systems.
Existing approaches typically apply constrained optimization techniques to ML training, enforce constraint satisfaction by adjusting the model design, or use constraints to correct the output.
Here, we investigate a different, complementary, strategy based on "teaching" constraint satisfaction to a supervised ML method via the direct use of a state-of-the-art constraint solver.
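A toy sketch of this "teaching" strategy, alternating an ordinary supervised step with a solver projection of the predictions onto the feasible set, appears below; `project_feasible` is a hypothetical wrapper around the constraint solver, and the real algorithm is more refined.

```python
def teach_with_solver(model, X, y, project_feasible, rounds=5):
    # model: any estimator with scikit-learn-style fit/predict.
    for _ in range(rounds):
        model.fit(X, y)                          # standard supervised step
        y = project_feasible(model.predict(X))   # solver enforces the constraints
    return model
```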
arXiv Detail & Related papers (2020-02-25T09:47:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.