Large Language Models to Diffusion Finetuning
- URL: http://arxiv.org/abs/2501.15781v1
- Date: Mon, 27 Jan 2025 04:59:29 GMT
- Title: Large Language Models to Diffusion Finetuning
- Authors: Edoardo Cetin, Tianyu Zhao, Yujin Tang,
- Abstract summary: We show our finetuned models achieve monotonically increasing accuracy, directly translating to improved performance across downstream tasks.<n>Our method is universally applicable to any foundation model pre-trained with a cross-entropy loss.
- Score: 20.251827725749607
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a new finetuning method to provide pre-trained large language models (LMs) the ability to scale test-time compute through the diffusion framework. By increasing the number of diffusion steps, we show our finetuned models achieve monotonically increasing accuracy, directly translating to improved performance across downstream tasks. Furthermore, our finetuned models can expertly answer questions on specific topics by integrating powerful guidance techniques, and autonomously determine the compute required for a given problem by leveraging adaptive ODE solvers. Our method is universally applicable to any foundation model pre-trained with a cross-entropy loss and does not modify any of its original weights, fully preserving its strong single-step generation capabilities. We show our method is more effective and fully compatible with traditional finetuning approaches, introducing an orthogonal new direction to unify the strengths of the autoregressive and diffusion frameworks.
Related papers
- Generalized Interpolating Discrete Diffusion [65.74168524007484]
Masked diffusion is a popular choice due to its simplicity and effectiveness.
We derive the theoretical backbone of a family of general interpolating discrete diffusion processes.
Exploiting GIDD's flexibility, we explore a hybrid approach combining masking and uniform noise.
arXiv Detail & Related papers (2025-03-06T14:30:55Z) - Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains.
Current model merging techniques focus on merging all available models simultaneously, with weight matrices-based methods being the predominant approaches.
We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z) - Stochastic Control for Fine-tuning Diffusion Models: Optimality, Regularity, and Convergence [11.400431211239958]
Diffusion models have emerged as powerful tools for generative modeling.<n>We propose a control framework for fine-tuning diffusion models.<n>We show that PI-FT achieves global convergence at a linear rate.
arXiv Detail & Related papers (2024-12-24T04:55:46Z) - SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [52.6922833948127]
In this work, we investigate the importance of parameters in pre-trained diffusion models.
We propose a novel model fine-tuning method to make full use of these ineffective parameters.
Our method enhances the generative capabilities of pre-trained models in downstream applications.
arXiv Detail & Related papers (2024-09-10T16:44:47Z) - Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain the stability in terms of zero-shot generalization of VLMs, dubbed OrthSR.
For the first time, we revisit the CLIP and CoOp with our method to effectively improve the model on few-shot image classficiation scenario.
arXiv Detail & Related papers (2024-07-11T10:35:53Z) - MAST: Model-Agnostic Sparsified Training [4.962431253126472]
We introduce a novel optimization problem formulation that departs from the conventional way of minimizing machine learning model loss as a black-box function.
Unlike traditional formulations, the proposed approach explicitly incorporates an initially pre-trained model and random sketch operators.
We present several variants of the Gradient Descent (SGD) method adapted to the new problem formulation.
arXiv Detail & Related papers (2023-11-27T18:56:03Z) - Training-free Linear Image Inverses via Flows [17.291903204982326]
We propose a training-free method for solving linear inverse problems by using pretrained flow models.
Our approach requires no problem-specific tuning across an extensive suite of noisy linear inverse problems on high-dimensional datasets.
arXiv Detail & Related papers (2023-09-25T22:13:16Z) - Phasic Content Fusing Diffusion Model with Directional Distribution
Consistency for Few-Shot Model Adaption [73.98706049140098]
We propose a novel phasic content fusing few-shot diffusion model with directional distribution consistency loss.
Specifically, we design a phasic training strategy with phasic content fusion to help our model learn content and style information when t is large.
Finally, we propose a cross-domain structure guidance strategy that enhances structure consistency during domain adaptation.
arXiv Detail & Related papers (2023-09-07T14:14:11Z) - Likelihood-Based Diffusion Language Models [13.916640262862215]
We take the first steps towards closing the likelihood gap between autoregressive and diffusion-based language models.
We pursue this goal through algorithmic improvements, scaling laws, and increased compute.
We release Plaid 1B, a large diffusion language model which outperforms GPT-2 124M in likelihood on benchmark datasets.
arXiv Detail & Related papers (2023-05-30T16:43:31Z) - Exploiting Diffusion Prior for Real-World Image Super-Resolution [75.5898357277047]
We present a novel approach to leverage prior knowledge encapsulated in pre-trained text-to-image diffusion models for blind super-resolution.
By employing our time-aware encoder, we can achieve promising restoration results without altering the pre-trained synthesis model.
arXiv Detail & Related papers (2023-05-11T17:55:25Z) - Bayesian Prompt Learning for Image-Language Model Generalization [64.50204877434878]
We use the regularization ability of Bayesian methods to frame prompt learning as a variational inference problem.
Our approach regularizes the prompt space, reduces overfitting to the seen prompts and improves the prompt generalization on unseen prompts.
We demonstrate empirically on 15 benchmarks that Bayesian prompt learning provides an appropriate coverage of the prompt space.
arXiv Detail & Related papers (2022-10-05T17:05:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.