SymmetricDiffusers: Learning Discrete Diffusion on Finite Symmetric Groups
- URL: http://arxiv.org/abs/2410.02942v1
- Date: Thu, 3 Oct 2024 19:37:40 GMT
- Title: SymmetricDiffusers: Learning Discrete Diffusion on Finite Symmetric Groups
- Authors: Yongxing Zhang, Donglin Yang, Renjie Liao,
- Abstract summary: We introduce a novel discrete diffusion model that simplifies the task of learning a complicated distribution over $S_n$.
Our model achieves state-of-the-art or comparable performances on solving tasks including sorting 4-digit MNIST images.
- Score: 14.925722398371498
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Finite symmetric groups $S_n$ are essential in fields such as combinatorics, physics, and chemistry. However, learning a probability distribution over $S_n$ poses significant challenges due to its intractable size and discrete nature. In this paper, we introduce SymmetricDiffusers, a novel discrete diffusion model that simplifies the task of learning a complicated distribution over $S_n$ by decomposing it into learning simpler transitions of the reverse diffusion using deep neural networks. We identify the riffle shuffle as an effective forward transition and provide empirical guidelines for selecting the diffusion length based on the theory of random walks on finite groups. Additionally, we propose a generalized Plackett-Luce (PL) distribution for the reverse transition, which is provably more expressive than the PL distribution. We further introduce a theoretically grounded "denoising schedule" to improve sampling and learning efficiency. Extensive experiments show that our model achieves state-of-the-art or comparable performances on solving tasks including sorting 4-digit MNIST images, jigsaw puzzles, and traveling salesman problems. Our code is released at https://github.com/NickZhang53/SymmetricDiffusers.
Related papers
- Learning Mixture Models via Efficient High-dimensional Sparse Fourier Transforms [13.490333761774542]
We give a $rm poly(d,k)$ time and sample algorithm for efficiently learning the parameters of a mixture of $k$ spherical distributions in $d$ dimensions.<n>Our method succeeds whenever the cluster distributions have a characteristic function with sufficiently heavy tails.
arXiv Detail & Related papers (2026-01-08T17:47:58Z) - When Scores Learn Geometry: Rate Separations under the Manifold Hypothesis [33.93481564069631]
diffusion models and inverse problems are often interpreted as learning the data distribution in the low-noise limit.<n>We argue that their success arises from implicitly learning the data manifold rather than the full distribution.<n>We show that concentration on data support can be achieved with a score error of $o(sigma-2)$, whereas recovering the specific data distribution requires a much stricter $o(1)$ error.
arXiv Detail & Related papers (2025-09-29T15:18:43Z) - On the Complexity Theory of Masked Discrete Diffusion: From $\mathrm{poly}(1/ε)$ to Nearly $ε$-Free [49.34727933066799]
Masked discrete diffusion is a flexible paradigm for text generation in which tokens are corrupted by special mask symbols before being denoised.<n>We show that Euler samplers can achieve $epsilon$-accuracy in total variation (TV) with $tildeO(d2epsilon-3/2)$ discrete score evaluations.<n>We then propose a Mask-Aware Truncated Uniformization (MATU) approach that both removes bounded-score assumptions and preserves unbiased discrete score approximation.
arXiv Detail & Related papers (2025-09-26T03:50:17Z) - MDNS: Masked Diffusion Neural Sampler via Stochastic Optimal Control [48.504188275208556]
We study the problem of learning a neural sampler to generate samples from discrete state spaces where the target probability mass function $piproptomathrme-U$ is known up to normalizing constant.<n>We propose $textbfM$asked $textbfDiffusion, a novel framework for discrete neural samplers by aligning measures through a family of learning objectives.
arXiv Detail & Related papers (2025-08-14T14:27:16Z) - A Distributional-Lifting Theorem for PAC Learning [16.985620991607345]
Distributional assumptions facilitate the design of efficient algorithms but limit their reach and relevance.<n>Recent work of Blanc, Lange, Malik, and Tan considered the special case of lifting uniform-distribution learners.<n>We show that their approach is information-theoretically intractable with access only to random examples.<n>We then take a different approach that sidesteps the need to learn $Dstar$, yielding a lifter that works in the standard PAC model.
arXiv Detail & Related papers (2025-06-19T23:28:38Z) - Generalization Dynamics of Linear Diffusion Models [8.107431208836426]
We analytically study the memorisation-to-generalisation transition in a simple model using linear denoisers.<n>Our work clarifies how sample complexity governs generalisation in a simple model of diffusion-based generative models.
arXiv Detail & Related papers (2025-05-30T16:31:58Z) - Non-asymptotic bounds for forward processes in denoising diffusions: Ornstein-Uhlenbeck is hard to beat [49.1574468325115]
This paper presents explicit non-asymptotic bounds on the forward diffusion error in total variation (TV)
We parametrise multi-modal data distributions in terms of the distance $R$ to their furthest modes and consider forward diffusions with additive and multiplicative noise.
arXiv Detail & Related papers (2024-08-25T10:28:31Z) - Discrete generative diffusion models without stochastic differential equations: a tensor network approach [1.5839621757142595]
Diffusion models (DMs) are a class of generative machine learning methods.
We show how to use networks (TNs) to efficiently define and sample such discrete models''
arXiv Detail & Related papers (2024-07-15T18:00:11Z) - Amortizing intractable inference in diffusion models for vision, language, and control [89.65631572949702]
This paper studies amortized sampling of the posterior over data, $mathbfxsim prm post(mathbfx)propto p(mathbfx)r(mathbfx)$, in a model that consists of a diffusion generative model prior $p(mathbfx)$ and a black-box constraint or function $r(mathbfx)$.
We prove the correctness of a data-free learning objective, relative trajectory balance, for training a diffusion model that samples from
arXiv Detail & Related papers (2024-05-31T16:18:46Z) - Learning general Gaussian mixtures with efficient score matching [16.06356123715737]
We study the problem of learning mixtures of $k$ Gaussians in $d$ dimensions.
We make no separation assumptions on the underlying mixture components.
We give an algorithm that draws $dmathrmpoly(k/varepsilon)$ samples from the target mixture, runs in sample-polynomial time, and constructs a sampler.
arXiv Detail & Related papers (2024-04-29T17:30:36Z) - Learning Mixtures of Gaussians Using Diffusion Models [9.118706387430883]
We give a new algorithm for learning mixtures of $k$ Gaussians to TV error.
Our approach is analytic and relies on the framework of diffusion models.
arXiv Detail & Related papers (2024-04-29T17:00:20Z) - Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Longtailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space.
arXiv Detail & Related papers (2024-03-11T13:44:49Z) - SwinGNN: Rethinking Permutation Invariance in Diffusion Models for Graph Generation [15.977241867213516]
Diffusion models based on permutation-equivariant networks can learn permutation-invariant distributions for graph data.
We propose a non-invariant diffusion model, called $textitSwinGNN$, which employs an efficient edge-to-edge 2-WL message passing network.
arXiv Detail & Related papers (2023-07-04T10:58:42Z) - Sampling, Diffusions, and Stochastic Localization [10.871336306134395]
Diffusions are a successful technique to sample from high-dimensional expository distributions.<n>The drift of the diffusion process is typically represented as a neural network.<n>An algorithmic version of Markov localization was recently proposed in order to sample from certain statistical mechanics models.
arXiv Detail & Related papers (2023-05-18T04:01:40Z) - Unite and Conquer: Plug & Play Multi-Modal Synthesis using Diffusion
Models [54.1843419649895]
We propose a solution based on denoising diffusion probabilistic models (DDPMs)
Our motivation for choosing diffusion models over other generative models comes from the flexible internal structure of diffusion models.
Our method can unite multiple diffusion models trained on multiple sub-tasks and conquer the combined task.
arXiv Detail & Related papers (2022-12-01T18:59:55Z) - On-Demand Sampling: Learning Optimally from Multiple Distributions [63.20009081099896]
Social and real-world considerations have given rise to multi-distribution learning paradigms.
We establish the optimal sample complexity of these learning paradigms and give algorithms that meet this sample complexity.
Our algorithm design and analysis are enabled by our extensions of online learning techniques for solving zero-sum games.
arXiv Detail & Related papers (2022-10-22T19:07:26Z) - Diffusion models as plug-and-play priors [98.16404662526101]
We consider the problem of inferring high-dimensional data $mathbfx$ in a model that consists of a prior $p(mathbfx)$ and an auxiliary constraint $c(mathbfx,mathbfy)$.
The structure of diffusion models allows us to perform approximate inference by iterating differentiation through the fixed denoising network enriched with different amounts of noise.
arXiv Detail & Related papers (2022-06-17T21:11:36Z) - Embedding Propagation: Smoother Manifold for Few-Shot Classification [131.81692677836202]
We propose to use embedding propagation as an unsupervised non-parametric regularizer for manifold smoothing in few-shot classification.
We empirically show that embedding propagation yields a smoother embedding manifold.
We show that embedding propagation consistently improves the accuracy of the models in multiple semi-supervised learning scenarios by up to 16% points.
arXiv Detail & Related papers (2020-03-09T13:51:09Z) - Stein Variational Inference for Discrete Distributions [70.19352762933259]
We propose a simple yet general framework that transforms discrete distributions to equivalent piecewise continuous distributions.
Our method outperforms traditional algorithms such as Gibbs sampling and discontinuous Hamiltonian Monte Carlo.
We demonstrate that our method provides a promising tool for learning ensembles of binarized neural network (BNN)
In addition, such transform can be straightforwardly employed in gradient-free kernelized Stein discrepancy to perform goodness-of-fit (GOF) test on discrete distributions.
arXiv Detail & Related papers (2020-03-01T22:45:41Z) - Neural Bayes: A Generic Parameterization Method for Unsupervised
Representation Learning [175.34232468746245]
We introduce a parameterization method called Neural Bayes.
It allows computing statistical quantities that are in general difficult to compute.
We show two independent use cases for this parameterization.
arXiv Detail & Related papers (2020-02-20T22:28:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.