David helps Goliath: Inference-Time Collaboration Between Small
Specialized and Large General Diffusion LMs
- URL: http://arxiv.org/abs/2305.14771v2
- Date: Wed, 14 Feb 2024 17:45:41 GMT
- Authors: Xiaochuang Han, Sachin Kumar, Yulia Tsvetkov, Marjan Ghazvininejad
- Abstract summary: Diffusion-based language models are emerging as a promising alternative to autoregressive LMs.
We propose methods to scale a recently proposed diffusion model SSD-LM from 0.4B to 13B parameters.
We show that SSD-2 facilitates novel ensembles with 100x smaller models that can be customized and deployed by individual users.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion-based language models are emerging as a promising alternative to
autoregressive LMs: they approach the competence of autoregressive LMs while
offering nuanced controllability at inference time. While autoregressive LMs
have benefited immensely from scaling and instruction-based learning, existing
studies of diffusion LMs have been conducted on a smaller scale. Starting with
a recently proposed diffusion model SSD-LM, in this work we first explore
methods to scale it from 0.4B to 13B parameters, proposing techniques to
improve its training and inference efficiency, and to finetune the model to
follow instructions. Armed with a more powerful, general purpose diffusion LM,
we introduce the primary contribution of this work -- SSD-2 -- an approach to
easily ensemble at inference time a large general-purpose diffusion LM with
smaller, but specialized and contextualized diffusion LMs. We show that SSD-2
facilitates novel ensembles with 100x smaller models that can be customized and
deployed by individual users. We find that compared to autoregressive models,
the collaboration between diffusion LMs is more effective, leading to
higher-quality model responses due to their ability to dynamically incorporate
bi-directional contexts.
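The inference-time collaboration described in the abstract can be sketched as mixing the token-level distributions of the two diffusion LMs at each denoising step. The sketch below is a minimal toy illustration of that idea, not the authors' SSD-2 implementation: the names `predict_logits`, `ensemble_denoise_step`, the toy linear "network", and the mixing weight `lam` are all illustrative assumptions.

```python
import numpy as np

VOCAB = 8          # toy vocabulary size
SEQ_LEN = 4        # toy sequence length
rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def predict_logits(params, noisy_simplex):
    """Stand-in for a diffusion LM's denoising network: maps a noisy
    per-position vocabulary simplex to per-position token logits.
    Here it is just a linear map, for illustration only."""
    return noisy_simplex @ params

def ensemble_denoise_step(noisy, big_params, small_params, lam=0.3):
    """One denoising step that mixes the two models' predicted token
    distributions; lam weights the small specialized model."""
    p_big = softmax(predict_logits(big_params, noisy))
    p_small = softmax(predict_logits(small_params, noisy))
    # Convex mixture of the two distributions; a product-of-experts
    # combination would be another plausible choice.
    return (1 - lam) * p_big + lam * p_small

big_params = rng.normal(size=(VOCAB, VOCAB))    # "Goliath"
small_params = rng.normal(size=(VOCAB, VOCAB))  # "David"
noisy = softmax(rng.normal(size=(SEQ_LEN, VOCAB)))
mixed = ensemble_denoise_step(noisy, big_params, small_params)
assert mixed.shape == (SEQ_LEN, VOCAB)
assert np.allclose(mixed.sum(axis=-1), 1.0)  # each position stays a distribution
```

Because diffusion LMs expose a full distribution over every position at every step, this kind of mixing can use bi-directional context, which the abstract credits for the ensemble's advantage over autoregressive collaboration.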
Related papers
- Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models [0.8133739801185272]
Alignment of reasoning abilities between smaller and larger Language Models is largely achieved via Supervised Fine-Tuning (SFT).
We propose the Self-refine Instruction-tuning method, which enables Smaller Language Models to self-refine their abilities.
Results obtained on commonsense and math reasoning tasks show that this approach significantly outperforms Instruction-tuning in both in-domain and out-domain scenarios.
arXiv Detail & Related papers (2024-05-01T09:10:27Z) - Weak-to-Strong Extrapolation Expedites Alignment [135.12769233630362]
We propose a method called ExPO to boost models' alignment with human preference.
We demonstrate that ExPO consistently improves off-the-shelf DPO/RLHF models.
We shed light on the essence of ExPO: it amplifies the reward signal learned during alignment training.
arXiv Detail & Related papers (2024-04-25T17:39:50Z) - Neural Flow Diffusion Models: Learnable Forward Process for Improved Diffusion Modelling [2.1779479916071067]
We introduce a novel framework that enhances diffusion models by supporting a broader range of forward processes.
We also propose a novel parameterization technique for learning the forward process.
Results underscore NFDM's versatility and its potential for a wide range of applications.
arXiv Detail & Related papers (2024-04-19T15:10:54Z) - Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model [3.300814846990438]
Large Language Models (LLMs) have become increasingly popular due to their ability to process and generate natural language.
As they are trained on massive datasets of text, LLMs can inherit harmful biases and produce outputs that are not aligned with human values.
This paper studies two main approaches to LLM alignment: Reinforcement Learning with Human Feedback (RLHF) and contrastive learning-based methods like Direct Preference Optimization (DPO).
By analyzing the stability and robustness of RLHF and DPO, we propose MPO, a novel method that mitigates the weaknesses of both approaches.
arXiv Detail & Related papers (2024-03-28T14:15:10Z) - Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z) - Towards Detailed Text-to-Motion Synthesis via Basic-to-Advanced
Hierarchical Diffusion Model [60.27825196999742]
We propose a novel Basic-to-Advanced Hierarchical Diffusion Model, named B2A-HDM, to collaboratively exploit low-dimensional and high-dimensional diffusion models for detailed motion synthesis.
Specifically, the basic diffusion model in low-dimensional latent space provides the intermediate denoising result that is consistent with the textual description.
The advanced diffusion model in high-dimensional latent space focuses on the following detail-enhancing denoising process.
arXiv Detail & Related papers (2023-12-18T06:30:39Z) - Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z) - Diff-Instruct: A Universal Approach for Transferring Knowledge From
Pre-trained Diffusion Models [77.83923746319498]
We propose a framework called Diff-Instruct to instruct the training of arbitrary generative models.
We show that Diff-Instruct results in state-of-the-art single-step diffusion-based models.
Experiments on refining GAN models show that the Diff-Instruct can consistently improve the pre-trained generators of GAN models.
arXiv Detail & Related papers (2023-05-29T04:22:57Z)