Multivariate Variational Autoencoder
- URL: http://arxiv.org/abs/2511.07472v1
- Date: Wed, 12 Nov 2025 01:01:09 GMT
- Title: Multivariate Variational Autoencoder
- Authors: Mehmet Can Yavuz
- Abstract summary: We present a VAE variant that preserves Gaussian tractability while lifting the diagonal posterior restriction. MVAE factorizes each posterior covariance, where a \emph{global} coupling matrix $\mathbf{C}$ induces dataset-wide latent correlations. We release a fully reproducible implementation with training/evaluation scripts and sweep utilities to facilitate fair comparison and reuse.
- Score: 0.08460698440162889
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the Multivariate Variational Autoencoder (MVAE), a VAE variant that preserves Gaussian tractability while lifting the diagonal posterior restriction. MVAE factorizes each posterior covariance, where a \emph{global} coupling matrix $\mathbf{C}$ induces dataset-wide latent correlations and \emph{per-sample} diagonal scales modulate local uncertainty. This yields a full-covariance family with analytic KL and an efficient reparameterization via $\mathbf{L}=\mathbf{C}\mathrm{diag}(\boldsymbol{\sigma})$. Across Larochelle-style MNIST variants, Fashion-MNIST, CIFAR-10, and CIFAR-100, MVAE consistently matches or improves reconstruction (MSE~$\downarrow$) and delivers robust gains in calibration (NLL/Brier/ECE~$\downarrow$) and unsupervised structure (NMI/ARI~$\uparrow$) relative to diagonal-covariance VAEs with matched capacity, especially at mid-range latent sizes. Latent-plane visualizations further indicate smoother, more coherent factor traversals and sharper local detail. We release a fully reproducible implementation with training/evaluation scripts and sweep utilities to facilitate fair comparison and reuse.
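For intuition, here is a minimal PyTorch sketch of the posterior family described in the abstract: one coupling matrix $\mathbf{C}$ shared across the dataset, per-sample diagonal scales, the reparameterization $z = \mu + \mathbf{L}\epsilon$ with $\mathbf{L}=\mathbf{C}\mathrm{diag}(\boldsymbol{\sigma})$, and the closed-form KL to a standard normal. The class and method names, and the identity initialization of $\mathbf{C}$, are illustrative assumptions, not details of the released implementation.

```python
import torch
import torch.nn as nn

class MVAEPosteriorHead(nn.Module):
    """Sketch of the MVAE posterior factorization L = C diag(sigma(x)):
    a single global coupling matrix C plus per-sample diagonal scales.
    Names and initialization are assumptions for illustration."""

    def __init__(self, latent_dim: int):
        super().__init__()
        # Global coupling shared across the dataset; identity init keeps
        # early training close to a diagonal-covariance VAE (our assumption,
        # not a detail stated in the abstract).
        self.C = nn.Parameter(torch.eye(latent_dim))

    def rsample(self, mu: torch.Tensor, log_sigma: torch.Tensor) -> torch.Tensor:
        # Reparameterization: z = mu + L eps, with L = C diag(sigma).
        eps = torch.randn_like(mu)
        return mu + (eps * log_sigma.exp()) @ self.C.T

    def kl(self, mu: torch.Tensor, log_sigma: torch.Tensor) -> torch.Tensor:
        # Analytic KL( N(mu, L L^T) || N(0, I) )
        #   = 0.5 * ( tr(L L^T) + ||mu||^2 - d - log det(L L^T) ),
        # where tr(L L^T) = sum_j sigma_j^2 ||C[:, j]||^2 and
        # log det(L L^T) = 2 log|det C| + 2 sum_j log sigma_j.
        sigma2 = (2.0 * log_sigma).exp()              # (batch, d)
        col_norms2 = (self.C ** 2).sum(dim=0)         # (d,)
        trace = (sigma2 * col_norms2).sum(dim=-1)     # (batch,)
        logdet = 2.0 * (torch.slogdet(self.C).logabsdet
                        + log_sigma.sum(dim=-1))      # (batch,)
        d = mu.shape[-1]
        return 0.5 * (trace + (mu ** 2).sum(dim=-1) - d - logdet)
```

Note that sampling and the KL both cost only a matrix-vector product beyond the diagonal case, which is consistent with the abstract's claim of preserved Gaussian tractability.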
Related papers
- Anisotropic local law for non-separable sample covariance matrices [10.181748307494608]
We establish local laws for sample covariance matrices $K = N^{-1}\sum_{i=1}^{N} g_i g_i^*$ where the random vectors $g_1, \ldots, g_N \in \mathbb{R}^n$ are independent with common covariance $\Sigma$. We discuss several classes of non-separable examples satisfying our assumptions, including conditionally mean-zero distributions, the random features model $g = \sigma(Xw)$ arising in machine learning, and Gaussian measures with …
arXiv Detail & Related papers (2026-02-20T03:28:51Z) - Decoupling Variance and Scale-Invariant Updates in Adaptive Gradient Descent for Unified Vector and Matrix Optimization [14.136955342888987]
We reformulate the AdaGrad update and decompose it into a variance adaptation term and a scale-invariant term. This produces $\textbf{DeVA}$ ($\textbf{V}$ariance $\textbf{A}$daptation), a framework that bridges vector-based variance adaptation and matrix spectral optimization.
arXiv Detail & Related papers (2026-02-06T17:06:42Z) - Singular Bayesian Neural Networks [1.2891210250935148]
Bayesian neural networks promise calibrated uncertainty but require $O(mn)$ parameters for standard mean-field Gaussian posteriors. We induce a posterior that is singular with respect to the Lebesgue measure, concentrating on the rank-$r$ manifold. We derive PAC-Bayes generalization bounds whose complexity term scales as $\sqrt{r(m+n)}$ instead of $\sqrt{mn}$, and prove loss bounds that decompose the error into optimization and rank-induced bias.
arXiv Detail & Related papers (2026-01-30T23:06:34Z) - Robust Layerwise Scaling Rules by Proper Weight Decay Tuning [50.11170157029911]
In modern scale-invariant architectures, training quickly enters a decay-governed steady state. We introduce a weight-decay scaling rule for AdamW that preserves sublayer gain across widths. Our results extend $\mu$P beyond the near-init regime by explicitly controlling the steady-state scales set by weight decay.
arXiv Detail & Related papers (2025-10-17T02:58:35Z) - Structured Sparse Transition Matrices to Enable State Tracking in State-Space Models [68.31088463716269]
We propose a structured sparse parametrization of transition matrices in state-space models (SSMs). Our method, PD-SSM, parametrizes the transition matrix as the product of a column one-hot matrix ($P$) and a complex-valued diagonal matrix ($D$). The model significantly outperforms a wide collection of modern SSM variants on various FSA state tracking tasks.
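As a rough illustration of the $P\,D$ factorization just described (not the paper's differentiable implementation), such a transition matrix can be assembled as below; the function name and the hard index-based one-hot construction are assumptions:

```python
import torch

def pd_transition(col_to_row: torch.Tensor, d_diag: torch.Tensor) -> torch.Tensor:
    """Build A = P @ D with P column one-hot and D complex diagonal.

    col_to_row[j] gives the row index of the single 1 in column j of P;
    d_diag holds the complex diagonal of D. Purely illustrative names.
    """
    n = col_to_row.shape[0]
    P = torch.zeros(n, n, dtype=d_diag.dtype)
    P[col_to_row, torch.arange(n)] = 1.0   # exactly one nonzero per column
    return P @ torch.diag(d_diag)          # A x scales entry j by d_j, then routes it to row col_to_row[j]

# Example: a 3-state transition that rotates state entries and rescales them.
A = pd_transition(torch.tensor([2, 0, 1]), torch.tensor([1j, 0.5 + 0j, -1 + 0j]))
```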
arXiv Detail & Related papers (2025-09-26T12:46:30Z) - Global Convergence of Gradient EM for Over-Parameterized Gaussian Mixtures [53.51230405648361]
We study the dynamics of gradient EM and employ tensor decomposition to characterize the geometric landscape of the likelihood loss. This is the first global convergence and recovery result for EM or gradient EM beyond the special case of $m=2$.
arXiv Detail & Related papers (2025-06-06T23:32:38Z) - FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA [68.44043212834204]
Low-Rank Adaptation (LoRA) is widely used for efficient fine-tuning of language models in federated learning (FL).
arXiv Detail & Related papers (2025-05-19T07:32:56Z) - Batch, match, and patch: low-rank approximations for score-based variational inference [8.840147522046651]
Black-box variational inference scales poorly to high-dimensional problems. We extend the batch-and-match (BaM) framework for score-based BBVI. We evaluate this approach on a variety of synthetic target distributions and real-world problems in high-dimensional inference.
arXiv Detail & Related papers (2024-10-29T17:42:56Z) - Accurate and Scalable Stochastic Gaussian Process Regression via Learnable Coreset-based Variational Inference [8.077736581030264]
We introduce a novel inductive variational inference method for Gaussian process ($\mathcal{GP}$) regression, by deriving a posterior over a learnable set of coresets. Unlike former free-form variational families for inference, our coreset-based variational $\mathcal{GP}$ (CVGP) is defined in terms of the $\mathcal{GP}$ prior and the (weighted) data likelihood.
arXiv Detail & Related papers (2023-11-02T17:22:22Z) - Transformers as Support Vector Machines [54.642793677472724]
We establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem.
We characterize the implicit bias of 1-layer transformers optimized with gradient descent.
We believe these findings inspire the interpretation of transformers as a hierarchy of SVMs that separates and selects optimal tokens.
arXiv Detail & Related papers (2023-08-31T17:57:50Z) - Higher Order Gauge Equivariant CNNs on Riemannian Manifolds and Applications [7.322121417864824]
We introduce a higher order generalization of the gauge equivariant convolution, dubbed a gauge equivariant Volterra network (GEVNet).
This allows us to model spatially extended nonlinear interactions within a given field while still maintaining equivariance to global isometries.
In the neuroimaging data experiments, the resulting two-part architecture is used to automatically discriminate between patients with Lewy Body Disease (DLB), Alzheimer's Disease (AD) and Parkinson's Disease (PD) from diffusion magnetic resonance images (dMRI).
arXiv Detail & Related papers (2023-05-26T06:02:31Z) - Training $\beta$-VAE by Aggregating a Learned Gaussian Posterior with a Decoupled Decoder [0.553073476964056]
Current practices in VAE training often result in a trade-off between the reconstruction fidelity and the continuity/disentanglement of the latent space.
We present intuitions and a careful analysis of the antagonistic mechanism of the two losses, and propose a simple yet effective two-stage method for training a VAE.
We evaluate the method using a medical dataset intended for 3D skull reconstruction and shape completion, and the results indicate promising generative capabilities of the VAE trained using the proposed method.
arXiv Detail & Related papers (2022-09-29T13:49:57Z) - Improving the Sample-Complexity of Deep Classification Networks with Invariant Integration [77.99182201815763]
Leveraging prior knowledge on intraclass variance due to transformations is a powerful method to improve the sample complexity of deep neural networks.
We propose a novel monomial selection algorithm based on pruning methods to allow an application to more complex problems.
We demonstrate the improved sample complexity on the Rotated-MNIST, SVHN and CIFAR-10 datasets.
arXiv Detail & Related papers (2022-02-08T16:16:11Z) - Unbiased Gradient Estimation for Variational Auto-Encoders using Coupled Markov Chains [34.77971292478243]
The variational auto-encoder (VAE) is a deep latent variable model that has two neural networks in an autoencoder-like architecture.
We develop a training scheme for VAEs by introducing unbiased estimators of the log-likelihood gradient.
We show experimentally that VAEs fitted with unbiased estimators exhibit better predictive performance.
arXiv Detail & Related papers (2020-10-05T08:11:55Z)