Multivariate Variational Autoencoder
- URL: http://arxiv.org/abs/2511.07472v1
- Date: Wed, 12 Nov 2025 01:01:09 GMT
- Title: Multivariate Variational Autoencoder
- Authors: Mehmet Can Yavuz
- Abstract summary: We present a VAE variant that preserves Gaussian tractability while lifting the diagonal posterior restriction. MVAE factorizes each posterior covariance, where a \emph{global} coupling matrix $\mathbf{C}$ induces dataset-wide latent correlations. We release a fully reproducible implementation with training/evaluation scripts and sweep utilities to facilitate fair comparison and reuse.
- Score: 0.08460698440162889
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the Multivariate Variational Autoencoder (MVAE), a VAE variant that preserves Gaussian tractability while lifting the diagonal posterior restriction. MVAE factorizes each posterior covariance, where a \emph{global} coupling matrix $\mathbf{C}$ induces dataset-wide latent correlations and \emph{per-sample} diagonal scales modulate local uncertainty. This yields a full-covariance family with analytic KL and an efficient reparameterization via $\mathbf{L}=\mathbf{C}\mathrm{diag}(\boldsymbol{\sigma})$. Across Larochelle-style MNIST variants, Fashion-MNIST, CIFAR-10, and CIFAR-100, MVAE consistently matches or improves reconstruction (MSE~$\downarrow$) and delivers robust gains in calibration (NLL/Brier/ECE~$\downarrow$) and unsupervised structure (NMI/ARI~$\uparrow$) relative to diagonal-covariance VAEs with matched capacity, especially at mid-range latent sizes. Latent-plane visualizations further indicate smoother, more coherent factor traversals and sharper local detail. We release a fully reproducible implementation with training/evaluation scripts and sweep utilities to facilitate fair comparison and reuse.
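For intuition, here is a minimal PyTorch sketch of the posterior family described in the abstract: one coupling matrix $\mathbf{C}$ shared across the dataset, per-sample diagonal scales, the reparameterization $z = \mu + \mathbf{L}\epsilon$ with $\mathbf{L}=\mathbf{C}\mathrm{diag}(\boldsymbol{\sigma})$, and the closed-form KL to a standard normal. The class and method names, and the identity initialization of $\mathbf{C}$, are illustrative assumptions, not details of the released implementation.

```python
import torch
import torch.nn as nn

class MVAEPosteriorHead(nn.Module):
    """Sketch of the MVAE posterior factorization L = C diag(sigma(x)):
    a single global coupling matrix C plus per-sample diagonal scales.
    Names and initialization are assumptions for illustration."""

    def __init__(self, latent_dim: int):
        super().__init__()
        # Global coupling shared across the dataset; identity init keeps
        # early training close to a diagonal-covariance VAE (our assumption,
        # not a detail stated in the abstract).
        self.C = nn.Parameter(torch.eye(latent_dim))

    def rsample(self, mu: torch.Tensor, log_sigma: torch.Tensor) -> torch.Tensor:
        # Reparameterization: z = mu + L eps, with L = C diag(sigma).
        eps = torch.randn_like(mu)
        return mu + (eps * log_sigma.exp()) @ self.C.T

    def kl(self, mu: torch.Tensor, log_sigma: torch.Tensor) -> torch.Tensor:
        # Analytic KL( N(mu, L L^T) || N(0, I) )
        #   = 0.5 * ( tr(L L^T) + ||mu||^2 - d - log det(L L^T) ),
        # where tr(L L^T) = sum_j sigma_j^2 ||C[:, j]||^2 and
        # log det(L L^T) = 2 log|det C| + 2 sum_j log sigma_j.
        sigma2 = (2.0 * log_sigma).exp()              # (batch, d)
        col_norms2 = (self.C ** 2).sum(dim=0)         # (d,)
        trace = (sigma2 * col_norms2).sum(dim=-1)     # (batch,)
        logdet = 2.0 * (torch.slogdet(self.C).logabsdet
                        + log_sigma.sum(dim=-1))      # (batch,)
        d = mu.shape[-1]
        return 0.5 * (trace + (mu ** 2).sum(dim=-1) - d - logdet)
```

Note that sampling and the KL both cost only a matrix-vector product beyond the diagonal case, which is consistent with the abstract's claim of preserved Gaussian tractability.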
Related papers
- Anisotropic local law for non-separable sample covariance matrices [10.181748307494608]
We establish local laws for sample covariance matrices $K = N^{-1}\sum_{i=1}^{N} g_i g_i^*$ where the random vectors $g_1, \ldots, g_N \in \mathbb{R}^n$ are independent with common covariance $\Sigma$. We discuss several classes of non-separable examples satisfying our assumptions, including conditionally mean-zero distributions, the random features model $g = \sigma(Xw)$ arising in machine learning, and Gaussian measures with …
arXiv Detail & Related papers (2026-02-20T03:28:51Z) - Decoupling Variance and Scale-Invariant Updates in Adaptive Gradient Descent for Unified Vector and Matrix Optimization [14.136955342888987]
We reformulate the AdaGrad update and decompose it into a variance adaptation term and a scale-invariant term. This produces $\textbf{DeVA}$ ($\textbf{V}$ariance $\textbf{A}$daptation), a framework that bridges vector-based variance adaptation and matrix spectral optimization.
arXiv Detail & Related papers (2026-02-06T17:06:42Z) - Singular Bayesian Neural Networks [1.2891210250935148]
Bayesian neural networks promise calibrated uncertainty but require $O(mn)$ parameters for standard mean-field Gaussian posteriors. We induce a posterior that is singular with respect to the Lebesgue measure, concentrating on the rank-$r$ manifold. We derive PAC-Bayes generalization bounds whose complexity term scales as $\sqrt{r(m+n)}$ instead of $\sqrt{mn}$, and prove loss bounds that decompose the error into optimization and rank-induced bias.
arXiv Detail & Related papers (2026-01-30T23:06:34Z) - Robust Layerwise Scaling Rules by Proper Weight Decay Tuning [50.11170157029911]
In modern scale-invariant architectures, training quickly enters a decay-governed steady state. We introduce a weight-decay scaling rule for AdamW that preserves sublayer gain across widths. Our results extend $\mu$P beyond the near-init regime by explicitly controlling the steady-state scales set by weight decay.
arXiv Detail & Related papers (2025-10-17T02:58:35Z) - Structured Sparse Transition Matrices to Enable State Tracking in State-Space Models [68.31088463716269]
We propose a structured sparse parametrization of transition matrices in state-space models (SSMs). Our method, PD-SSM, parametrizes the transition matrix as the product of a column one-hot matrix ($P$) and a complex-valued diagonal matrix ($D$). The model significantly outperforms a wide collection of modern SSM variants on various FSA state tracking tasks.
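As a rough illustration of the $P\,D$ factorization just described (not the paper's differentiable implementation), such a transition matrix can be assembled as below; the function name and the hard index-based one-hot construction are assumptions:

```python
import torch

def pd_transition(col_to_row: torch.Tensor, d_diag: torch.Tensor) -> torch.Tensor:
    """Build A = P @ D with P column one-hot and D complex diagonal.

    col_to_row[j] gives the row index of the single 1 in column j of P;
    d_diag holds the complex diagonal of D. Purely illustrative names.
    """
    n = col_to_row.shape[0]
    P = torch.zeros(n, n, dtype=d_diag.dtype)
    P[col_to_row, torch.arange(n)] = 1.0   # exactly one nonzero per column
    return P @ torch.diag(d_diag)          # A x scales entry j by d_j, then routes it to row col_to_row[j]

# Example: a 3-state transition that rotates state entries and rescales them.
A = pd_transition(torch.tensor([2, 0, 1]), torch.tensor([1j, 0.5 + 0j, -1 + 0j]))
```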
arXiv Detail & Related papers (2025-09-26T12:46:30Z) - Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixtures [53.51230405648361]
We study the dynamics of gradient EM and employ tensor decomposition to characterize the geometric landscape of the likelihood loss. This is the first global convergence and recovery result for EM or gradient EM beyond the special case of $m=2$.
arXiv Detail & Related papers (2025-06-06T23:32:38Z) - FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA [68.44043212834204]
Low-Rank Adaptation (LoRA) is widely used for efficient fine-tuning of language models in federated learning (FL).
arXiv Detail & Related papers (2025-05-19T07:32:56Z) - Batch, match, and patch: low-rank approximations for score-based variational inference [8.840147522046651]
Black-box variational inference scales poorly to high-dimensional problems. We extend the batch-and-match (BaM) framework for score-based BBVI. We evaluate this approach on a variety of synthetic target distributions and real-world problems in high-dimensional inference.
arXiv Detail & Related papers (2024-10-29T17:42:56Z) - Accurate and Scalable Stochastic Gaussian Process Regression via Learnable Coreset-based Variational Inference [8.077736581030264]
We introduce a novel inductive variational inference method for Gaussian process ($\mathcal{GP}$) regression, by deriving a posterior over a learnable set of coresets. Unlike former free-form variational families for inference, our coreset-based variational $\mathcal{GP}$ (CVGP) is defined in terms of the $\mathcal{GP}$ prior and the (weighted) data likelihood.
arXiv Detail & Related papers (2023-11-02T17:22:22Z) - Transformers as Support Vector Machines [54.642793677472724]
We establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem.
We characterize the implicit bias of 1-layer transformers optimized with gradient descent.
We believe these findings inspire the interpretation of transformers as a hierarchy of SVMs that separates and selects optimal tokens.
arXiv Detail & Related papers (2023-08-31T17:57:50Z) - Higher Order Gauge Equivariant CNNs on Riemannian Manifolds and Applications [7.322121417864824]
We introduce a higher order generalization of the gauge equivariant convolution, dubbed a gauge equivariant Volterra network (GEVNet).
This allows us to model spatially extended nonlinear interactions within a given field while still maintaining equivariance to global isometries.
In the neuroimaging data experiments, the resulting two-part architecture is used to automatically discriminate between patients with Lewy Body Disease (DLB), Alzheimer's Disease (AD) and Parkinson's Disease (PD) from diffusion magnetic resonance images (dMRI).
arXiv Detail & Related papers (2023-05-26T06:02:31Z) - Training $\beta$-VAE by Aggregating a Learned Gaussian Posterior with a Decoupled Decoder [0.553073476964056]
Current practices in VAE training often result in a trade-off between the reconstruction fidelity and the continuity/disentanglement of the latent space.
We present intuitions and a careful analysis of the antagonistic mechanism of the two losses, and propose a simple yet effective two-stage method for training a VAE.
We evaluate the method using a medical dataset intended for 3D skull reconstruction and shape completion, and the results indicate promising generative capabilities of the VAE trained using the proposed method.
arXiv Detail & Related papers (2022-09-29T13:49:57Z) - Improving the Sample-Complexity of Deep Classification Networks with Invariant Integration [77.99182201815763]
Leveraging prior knowledge on intraclass variance due to transformations is a powerful method to improve the sample complexity of deep neural networks.
We propose a novel monomial selection algorithm based on pruning methods to allow an application to more complex problems.
We demonstrate the improved sample complexity on the Rotated-MNIST, SVHN and CIFAR-10 datasets.
arXiv Detail & Related papers (2022-02-08T16:16:11Z) - Unbiased Gradient Estimation for Variational Auto-Encoders using Coupled Markov Chains [34.77971292478243]
The variational auto-encoder (VAE) is a deep latent variable model that has two neural networks in an autoencoder-like architecture.
We develop a training scheme for VAEs by introducing unbiased estimators of the log-likelihood gradient.
We show experimentally that VAEs fitted with unbiased estimators exhibit better predictive performance.
arXiv Detail & Related papers (2020-10-05T08:11:55Z)