David helps Goliath: Inference-Time Collaboration Between Small
Specialized and Large General Diffusion LMs
- URL: http://arxiv.org/abs/2305.14771v2
- Date: Wed, 14 Feb 2024 17:45:41 GMT
- Title: David helps Goliath: Inference-Time Collaboration Between Small
Specialized and Large General Diffusion LMs
- Authors: Xiaochuang Han, Sachin Kumar, Yulia Tsvetkov, Marjan Ghazvininejad
- Abstract summary: Diffusion-based language models are emerging as a promising alternative to autoregressive LMs.
We propose methods to scale a recently proposed diffusion model SSD-LM from 0.4B to 13B parameters.
We show that SSD-2 facilitates novel ensembles with 100x smaller models that can be customized and deployed by individual users.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion-based language models are emerging as a promising alternative to
autoregressive LMs: they approach the competence of autoregressive LMs while
offering nuanced controllability at inference time. While autoregressive LMs
have benefited immensely from scaling and instruction-based learning, existing
studies of diffusion LMs have been conducted on a smaller scale. Starting with
a recently proposed diffusion model SSD-LM, in this work we first explore
methods to scale it from 0.4B to 13B parameters, proposing techniques to
improve its training and inference efficiency, and to finetune the model to
follow instructions. Armed with a more powerful, general-purpose diffusion LM,
we introduce the primary contribution of this work -- SSD-2 -- an approach to
easily ensemble at inference time a large general-purpose diffusion LM with
smaller, but specialized and contextualized diffusion LMs. We show that SSD-2
facilitates novel ensembles with 100x smaller models that can be customized and
deployed by individual users. We find that compared to autoregressive models,
the collaboration between diffusion LMs is more effective, leading to
higher-quality model responses due to their ability to dynamically incorporate
bi-directional contexts.
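The inference-time collaboration described above can be illustrated with a minimal sketch. Here the two models are stand-ins: at each denoising step, per-token logits from a large general-purpose model and a small specialized model are mixed with a weighting coefficient. All names (`ensemble_logits`, `weight_small`) and the random logits are hypothetical; the actual SSD-2 mechanism is defined in the paper.

```python
import numpy as np

def ensemble_logits(logits_large, logits_small, weight_small=0.5):
    """Mix per-token logits from a large general model and a small
    specialized model at one denoising step (hypothetical weighting)."""
    return (1.0 - weight_small) * logits_large + weight_small * logits_small

# Toy setup: 4 token positions over a 5-token vocabulary,
# with logits from two stand-in models.
rng = np.random.default_rng(0)
logits_large = rng.normal(size=(4, 5))
logits_small = rng.normal(size=(4, 5))

mixed = ensemble_logits(logits_large, logits_small, weight_small=0.3)
# Softmax over the vocabulary, then a greedy read-out for this step.
probs = np.exp(mixed) / np.exp(mixed).sum(-1, keepdims=True)
tokens = probs.argmax(-1)
```

Because diffusion LMs predict all positions at each step, this mixing can happen per step with full bi-directional context, which is what the abstract credits for the higher-quality responses.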
Related papers
- Scaling Diffusion Language Models via Adaptation from Autoregressive Models [105.70889434492143]
Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling.
We show that we can convert AR models ranging from 127M to 7B parameters into diffusion models DiffuGPT and DiffuLLaMA, using less than 200B tokens for training.
Our experimental results reveal that these models outperform earlier DLMs and are competitive with their AR counterparts.
arXiv Detail & Related papers (2024-10-23T14:04:22Z)
- A Multi-LLM Debiasing Framework [85.17156744155915]
Large Language Models (LLMs) are powerful tools with the potential to benefit society immensely, yet, they have demonstrated biases that perpetuate societal inequalities.
Recent research has shown a growing interest in multi-LLM approaches, which have been demonstrated to be effective in improving the quality of reasoning.
We propose a novel multi-LLM debiasing framework aimed at reducing bias in LLMs.
arXiv Detail & Related papers (2024-09-20T20:24:50Z)
- Table-to-Text Generation with Pretrained Diffusion Models [0.0]
Diffusion models have demonstrated significant potential in achieving state-of-the-art performance across various text generation tasks.
We investigate their application to the table-to-text problem by adapting the diffusion model to the task and conducting an in-depth analysis.
Our findings reveal that diffusion models achieve comparable results in the table-to-text domain.
arXiv Detail & Related papers (2024-09-10T15:36:53Z)
- Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models [0.8133739801185272]
Alignment of reasoning abilities between smaller and larger language models is largely conducted via Supervised Fine-Tuning (SFT).
We propose the Self-refine Instruction-tuning method, which prompts smaller language models to self-refine their abilities.
Results obtained on commonsense and math reasoning tasks show that this approach significantly outperforms Instruction-tuning in both in-domain and out-domain scenarios.
arXiv Detail & Related papers (2024-05-01T09:10:27Z)
- Weak-to-Strong Extrapolation Expedites Alignment [135.12769233630362]
We propose a method called ExPO to boost models' alignment with human preference.
We demonstrate that ExPO consistently improves off-the-shelf DPO/RLHF models.
We shed light on the essence of ExPO: amplifying the reward signal learned during alignment training.
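A common reading of weak-to-strong extrapolation is a simple operation in weight space: starting from an aligned model, step further along the direction leading away from its weaker (e.g., SFT) initialization. The sketch below is a hypothetical illustration under that assumption; the function name, parameter dictionaries, and coefficient `alpha` are all illustrative, not the paper's API.

```python
import numpy as np

def expo_extrapolate(theta_weak, theta_aligned, alpha=1.0):
    """Extrapolate past the aligned model along the direction from the
    weaker model: theta = theta_aligned + alpha * (theta_aligned - theta_weak).
    (Hypothetical sketch of weight-space extrapolation.)"""
    return {k: theta_aligned[k] + alpha * (theta_aligned[k] - theta_weak[k])
            for k in theta_aligned}

# Toy one-layer "models" represented as named weight arrays.
theta_sft = {"w": np.array([1.0, 2.0])}
theta_dpo = {"w": np.array([1.5, 2.5])}
theta_expo = expo_extrapolate(theta_sft, theta_dpo, alpha=2.0)
```

With `alpha=0` this returns the aligned model unchanged; larger `alpha` pushes the weights further in the alignment direction, which is where the amplified reward signal would come from under this reading.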
arXiv Detail & Related papers (2024-04-25T17:39:50Z)
- Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model [3.300814846990438]
Large Language Models (LLMs) have become increasingly popular due to their ability to process and generate natural language.
As they are trained on massive datasets of text, LLMs can inherit harmful biases and produce outputs that are not aligned with human values.
This paper studies two main approaches to LLM alignment: Reinforcement Learning with Human Feedback (RLHF) and contrastive learning-based methods like Direct Preference Optimization (DPO).
By analyzing the stability and robustness of RLHF and DPO, we propose MPO, a novel method that mitigates the weaknesses of both approaches.
arXiv Detail & Related papers (2024-03-28T14:15:10Z)
- Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
- Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models [77.83923746319498]
We propose a framework called Diff-Instruct to instruct the training of arbitrary generative models.
We show that Diff-Instruct results in state-of-the-art single-step diffusion-based models.
Experiments on refining GAN models show that the Diff-Instruct can consistently improve the pre-trained generators of GAN models.
arXiv Detail & Related papers (2023-05-29T04:22:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.