David helps Goliath: Inference-Time Collaboration Between Small
Specialized and Large General Diffusion LMs
- URL: http://arxiv.org/abs/2305.14771v2
- Date: Wed, 14 Feb 2024 17:45:41 GMT
- Authors: Xiaochuang Han, Sachin Kumar, Yulia Tsvetkov, Marjan Ghazvininejad
- Abstract summary: Diffusion-based language models are emerging as a promising alternative to autoregressive LMs.
We propose methods to scale a recently proposed diffusion model SSD-LM from 0.4B to 13B parameters.
We show that SSD-2 facilitates novel ensembles with 100x smaller models that can be customized and deployed by individual users.
- Score: 49.822063966687175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion-based language models are emerging as a promising alternative to
autoregressive LMs: they approach the competence of autoregressive LMs while
offering nuanced controllability at inference time. While autoregressive LMs
have benefited immensely from scaling and instruction-based learning, existing
studies of diffusion LMs have been conducted on a smaller scale. Starting with
a recently proposed diffusion model SSD-LM, in this work we first explore
methods to scale it from 0.4B to 13B parameters, proposing techniques to
improve its training and inference efficiency, and to finetune the model to
follow instructions. Armed with a more powerful, general-purpose diffusion LM,
we introduce the primary contribution of this work, SSD-2: an approach to
easily ensemble at inference time a large general-purpose diffusion LM with
smaller but specialized and contextualized diffusion LMs. We show that SSD-2
facilitates novel ensembles with 100x smaller models that can be customized and
deployed by individual users. We find that compared to autoregressive models,
the collaboration between diffusion LMs is more effective, leading to
higher-quality model responses due to their ability to dynamically incorporate
bi-directional contexts.
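The inference-time collaboration described above can be sketched as a per-step mix of the two models' vocabulary predictions. This is an illustrative sketch only: the mixing rule, the `weight` parameter, and the function name `ensemble_step` are assumptions for exposition, not the exact combination used in SSD-2.

```python
import math

def ensemble_step(logits_large, logits_small, weight=0.5):
    """Mix per-step vocabulary logits from a large general diffusion LM
    and a small specialized one, then renormalize with a softmax.

    Hypothetical sketch: the actual SSD-2 rule (weighting scheme, where
    in the diffusion step the mix happens) may differ.
    """
    # weighted sum of logits, position by position over the vocabulary
    mixed = [(1.0 - weight) * g + weight * s
             for g, s in zip(logits_large, logits_small)]
    # numerically stable softmax over the mixed logits
    m = max(mixed)
    exps = [math.exp(x - m) for x in mixed]
    z = sum(exps)
    return [e / z for e in exps]
```

At each diffusion step, the ensembled distribution would then drive the denoising update in place of either model's individual prediction.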
Related papers
- TESS 2: A Large-Scale Generalist Diffusion Language Model [24.91689676432666]
TESS 2 is an instruction-following diffusion language model that outperforms contemporary instruction-tuned diffusion models.
We find that adaptation training as well as the choice of the base model is crucial for training good instruction-following diffusion models.
We propose reward guidance, a novel and modular inference-time guidance procedure to align model outputs without needing to train the underlying model.
arXiv Detail & Related papers (2025-02-19T17:50:31Z)
- Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search [57.28671084993782]
Large language models (LLMs) have demonstrated remarkable reasoning capabilities across diverse domains.
Recent studies have shown that increasing test-time computation enhances LLMs' reasoning capabilities.
We propose a two-stage training paradigm: 1) a small-scale format tuning stage to internalize the COAT reasoning format and 2) a large-scale self-improvement stage leveraging reinforcement learning.
arXiv Detail & Related papers (2025-02-04T17:26:58Z)
- Scaling Diffusion Language Models via Adaptation from Autoregressive Models [105.70889434492143]
Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling.
We show that we can convert AR models ranging from 127M to 7B parameters into the diffusion models DiffuGPT and DiffuLLaMA, using less than 200B tokens for training.
Our experimental results reveal that these models outperform earlier DLMs and are competitive with their AR counterparts.
arXiv Detail & Related papers (2024-10-23T14:04:22Z)
- A Multi-LLM Debiasing Framework [85.17156744155915]
Large Language Models (LLMs) are powerful tools with the potential to benefit society immensely; yet they have demonstrated biases that perpetuate societal inequalities.
Recent research shows growing interest in multi-LLM approaches, which have proven effective at improving the quality of reasoning.
We propose a novel multi-LLM debiasing framework aimed at reducing bias in LLMs.
arXiv Detail & Related papers (2024-09-20T20:24:50Z)
- Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model [3.300814846990438]
Large Language Models (LLMs) have become increasingly popular due to their ability to process and generate natural language.
As they are trained on massive datasets of text, LLMs can inherit harmful biases and produce outputs that are not aligned with human values.
This paper studies two main approaches to LLM alignment: Reinforcement Learning from Human Feedback (RLHF) and contrastive-learning-based methods such as Direct Preference Optimization (DPO).
By analyzing the stability and robustness of RLHF and DPO, we propose MPO, a novel method that mitigates the weaknesses of both approaches.
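The DPO objective mentioned above can be sketched for a single preference pair. The helper name `dpo_loss` and its argument names are illustrative; `beta` is the usual hyperparameter controlling deviation from the reference model.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    logp_* are sequence log-probabilities under the policy being
    trained; ref_logp_* are the same quantities under a frozen
    reference model.
    """
    # implicit reward margin between the chosen and rejected responses
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(margin)), written stably for non-negative margins
    return math.log1p(math.exp(-margin))
```

Increasing the policy's log-probability of the chosen response relative to the reference lowers the loss, which is the contrastive behavior MPO builds on.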
arXiv Detail & Related papers (2024-03-28T14:15:10Z)
- Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
- Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models [77.83923746319498]
We propose a framework called Diff-Instruct to instruct the training of arbitrary generative models.
We show that Diff-Instruct results in state-of-the-art single-step diffusion-based models.
Experiments on refining GAN models show that Diff-Instruct can consistently improve the pre-trained generators of GAN models.
arXiv Detail & Related papers (2023-05-29T04:22:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.