MDNS: Masked Diffusion Neural Sampler via Stochastic Optimal Control
- URL: http://arxiv.org/abs/2508.10684v2
- Date: Sat, 08 Nov 2025 03:10:03 GMT
- Title: MDNS: Masked Diffusion Neural Sampler via Stochastic Optimal Control
- Authors: Yuchen Zhu, Wei Guo, Jaemoo Choi, Guan-Horng Liu, Yongxin Chen, Molei Tao
- Abstract summary: We study the problem of learning a neural sampler to generate samples from discrete state spaces where the target probability mass function $\pi\propto\mathrm{e}^{-U}$ is known up to a normalizing constant. We propose $\textbf{M}$asked $\textbf{D}$iffusion $\textbf{N}$eural $\textbf{S}$ampler ($\textbf{MDNS}$), a novel framework for training discrete neural samplers by aligning two path measures through a family of learning objectives.
- Score: 48.504188275208556
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of learning a neural sampler to generate samples from discrete state spaces where the target probability mass function $\pi\propto\mathrm{e}^{-U}$ is known up to a normalizing constant, which is an important task in fields such as statistical physics, machine learning, combinatorial optimization, etc. To better address this challenging task when the state space has a large cardinality and the distribution is multi-modal, we propose $\textbf{M}$asked $\textbf{D}$iffusion $\textbf{N}$eural $\textbf{S}$ampler ($\textbf{MDNS}$), a novel framework for training discrete neural samplers by aligning two path measures through a family of learning objectives, theoretically grounded in the stochastic optimal control of the continuous-time Markov chains. We validate the efficiency and scalability of MDNS through extensive experiments on various distributions with distinct statistical properties, where MDNS learns to accurately sample from the target distributions despite the extremely high problem dimensions and outperforms other learning-based baselines by a large margin. A comprehensive study of ablations and extensions is also provided to demonstrate the efficacy and potential of the proposed framework. Our code is available at https://github.com/yuchen-zhu-zyc/MDNS.
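To make the problem setup concrete, the sketch below builds a toy unnormalized target $\pi\propto\mathrm{e}^{-U}$ on $\{0,1\}^d$ and runs the masked-diffusion generation loop that a trained sampler would drive. The energy, the `denoiser` stand-in, and the one-site-per-step unmasking schedule are illustrative assumptions, not the paper's actual model or training objective.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16           # problem dimension (toy scale; the paper targets much larger d)
MASK = -1        # placeholder token for "not yet revealed" coordinates

def U(x):
    """Toy Ising-style energy on {0,1}^d: neighboring equal spins are favored."""
    s = 2 * x - 1                       # map {0,1} -> {-1,+1}
    return -np.sum(s[:-1] * s[1:])

def denoiser(x_masked):
    """Stand-in for the learned network: per-site Bernoulli probabilities
    for the masked coordinates. A trained MDNS model would be used here."""
    return np.full(d, 0.5)

# Masked-diffusion generation: start fully masked, reveal one site per step.
x = np.full(d, MASK)
for _ in range(d):
    probs = denoiser(x)
    masked = np.flatnonzero(x == MASK)
    i = rng.choice(masked)              # pick a masked coordinate to unmask
    x[i] = rng.random() < probs[i]      # sample its value in {0, 1}

print("sample:", x, "energy U(x) =", U(x))
```

MDNS itself learns the reverse-time CTMC that fills in masked coordinates by aligning two path measures; the loop above only shows the generation mechanics such a model plugs into.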
Related papers
- Proximal Diffusion Neural Sampler [45.335824474545795]
We propose a framework named $\textbf{P}$roximal $\textbf{D}$iffusion $\textbf{N}$eural $\textbf{S}$ampler (PDNS) that tackles the optimal control problem via the proximal point method on the space of path measures. PDNS decomposes the learning process into a series of simpler subproblems that create a path gradually approaching the desired distribution. We demonstrate the effectiveness and robustness of PDNS through extensive experiments on both continuous and discrete sampling tasks.
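Read as a hedged sketch (the exact functional $\mathcal{L}$ and regularization weight $\lambda$ below are assumptions, not taken from the abstract), the proximal point method on path measures iterates

$$\mathbb{P}^{k+1} \;=\; \operatorname*{arg\,min}_{\mathbb{P}} \; \mathcal{L}(\mathbb{P}) \;+\; \frac{1}{\lambda}\,\mathrm{KL}\!\left(\mathbb{P} \,\Vert\, \mathbb{P}^{k}\right),$$

so each subproblem only has to move a bounded KL distance from the previous iterate, which is how the sequence of simpler subproblems gradually approaches the target distribution.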
arXiv Detail & Related papers (2025-10-04T14:44:47Z) - MaskPro: Linear-Space Probabilistic Learning for Strict (N:M)-Sparsity on Large Language Models [53.36415620647177]
Semi-structured sparsity offers a promising solution by strategically retaining $N$ elements out of every $M$ weights. Existing (N:M)-compatible approaches typically fall into two categories: rule-based layerwise greedy search, which suffers from considerable errors, and gradient-driven learning, which incurs prohibitive training costs. We propose a novel linear-space probabilistic framework named MaskPro, which aims to learn a prior categorical distribution for every $M$ consecutive weights and subsequently leverages this distribution to generate the (N:M)-sparsity through an $N$-way sampling.
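As a hedged illustration of that sampling step, the Gumbel-top-$k$ trick below is one standard way to draw $N$ positions without replacement from a categorical distribution over $M$ slots; MaskPro's actual estimator and training procedure are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 2, 4                        # (2:4) semi-structured sparsity
logits = rng.normal(size=M)        # learned categorical parameters for one weight group

# Draw N of the M positions without replacement via the Gumbel-top-k trick.
gumbel = -np.log(-np.log(rng.random(M)))
keep = np.argsort(logits + gumbel)[-N:]

mask = np.zeros(M, dtype=bool)
mask[keep] = True                  # exactly N weights kept out of every M
print(mask)
```

Sampling the mask from per-group logits keeps the search in a linear-size parameter space rather than over all $\binom{M}{N}$ candidate masks per group.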
arXiv Detail & Related papers (2025-06-15T15:02:59Z) - Generative Diffusion Models for Resource Allocation in Wireless Networks [77.36145730415045]
We train a policy to imitate an expert and generate new samples from the optimal distribution. We achieve near-optimal performance through the sequential execution of the generated samples. We present numerical results in a case study of power control.
arXiv Detail & Related papers (2025-04-28T21:44:31Z) - Amortized Bayesian Multilevel Models [9.831471158899644]
Multilevel models (MLMs) are a central building block of the Bayesian workflow. MLMs pose significant computational challenges, often rendering their estimation and evaluation intractable within reasonable time constraints. Recent advances in simulation-based inference offer promising solutions for addressing complex probabilistic models using deep generative networks. We explore a family of neural network architectures that leverage the probabilistic factorization of multilevel models to facilitate efficient neural network training and subsequent near-instant posterior inference on unseen datasets.
arXiv Detail & Related papers (2024-08-23T17:11:04Z) - Fast, Distribution-free Predictive Inference for Neural Networks with Coverage Guarantees [25.798057062452443]
This paper introduces a novel, computationally-efficient algorithm for predictive inference (PI)
It requires no distributional assumptions on the data and can be computed faster than existing bootstrap-type methods for neural networks.
arXiv Detail & Related papers (2023-06-11T04:03:58Z) - Sample Efficient Reinforcement Learning in Mixed Systems through Augmented Samples and Its Applications to Queueing Networks [22.20726152012564]
This paper considers a class of reinforcement learning problems involving systems with two types of states: stochastic and pseudo-stochastic.
We propose a sample efficient method that accelerates learning by generating augmented data samples.
arXiv Detail & Related papers (2023-05-25T21:29:11Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - Coded Stochastic ADMM for Decentralized Consensus Optimization with Edge Computing [113.52575069030192]
Big data, including data from applications with high security requirements, is often collected and stored on multiple heterogeneous devices, such as mobile devices, drones, and vehicles.
Due to the limitations of communication costs and security requirements, it is of paramount importance to extract information in a decentralized manner instead of aggregating data to a fusion center.
We consider the problem of learning model parameters in a multi-agent system with data locally processed via distributed edge nodes.
A class of mini-batch alternating direction method of multipliers (ADMM) algorithms is explored to develop the distributed learning model.
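For reference, the classical consensus ADMM iteration that such mini-batch variants build on takes the form below, where node $i$ holds a local loss $f_i$, $\bar\theta$ is the consensus variable, and $\rho>0$ is the penalty parameter (a generic textbook sketch; the coded and stochastic modifications of this paper are not shown):

$$\begin{aligned} \theta_i^{t+1} &= \operatorname*{arg\,min}_{\theta_i}\; f_i(\theta_i) + \langle \lambda_i^{t},\, \theta_i - \bar\theta^{t}\rangle + \tfrac{\rho}{2}\,\lVert \theta_i - \bar\theta^{t}\rVert^2, \\ \bar\theta^{t+1} &= \tfrac{1}{n}\sum_{i=1}^{n} \theta_i^{t+1}, \\ \lambda_i^{t+1} &= \lambda_i^{t} + \rho\,\big(\theta_i^{t+1} - \bar\theta^{t+1}\big). \end{aligned}$$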
arXiv Detail & Related papers (2020-10-02T10:41:59Z) - Deep Networks and the Multiple Manifold Problem [15.144495799445824]
We study the multiple manifold problem, a binary classification task modeled on applications in machine vision, in which a deep fully-connected neural network is trained to separate two low-dimensional submanifolds of the unit sphere.
We prove for a simple manifold configuration that the network succeeds when its depth $L$ is large relative to certain geometric and statistical properties of the data and its width grows as a sufficiently large polynomial in $L$.
Our analysis demonstrates concrete benefits of depth and width in the context of a practically-motivated model problem.
arXiv Detail & Related papers (2020-08-25T19:20:00Z) - Towards Deep Learning Models Resistant to Large Perturbations [0.0]
Adversarial robustness has proven to be a required property of machine learning algorithms.
We show that the well-established algorithm called "adversarial training" fails to train a deep neural network given a large, but reasonable, perturbation magnitude.
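For context, "adversarial training" here refers to the standard min-max objective (the usual $\ell_\infty$-ball formulation popularized by Madry et al.; the norm and threat model are the conventional choices, not details taken from this abstract):

$$\min_{\theta}\; \mathbb{E}_{(x,y)\sim\mathcal{D}} \Big[\, \max_{\lVert \delta \rVert_\infty \le \epsilon} \ell\big(f_\theta(x+\delta),\, y\big) \Big],$$

and the claim above is that optimizing this objective breaks down once the perturbation radius $\epsilon$ is large yet still reasonable.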
arXiv Detail & Related papers (2020-03-30T12:03:09Z) - Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)