Related papers: Multi-Component VAE with Gaussian Markov Random Field

Multi-Component VAE with Gaussian Markov Random Field

URL: http://arxiv.org/abs/2507.12165v1
Date: Wed, 16 Jul 2025 11:53:08 GMT
Title: Multi-Component VAE with Gaussian Markov Random Field
Authors: Fouad Oubari, Mohamed El-Baha, Raphael Meunier, Rodrigue Décatoire, Mathilde Mougeot,
Abstract summary: We introduce a novel generative framework embedding Gaussian Markov Random Fields into both prior and posterior distributions.<n>This design choice explicitly models cross-component relationships, enabling richer representation and faithful reproduction of complex interactions.<n>Our results indicate that the GMRF MCVAE is especially suited for practical applications demanding robust and realistic modeling of multi-component coherence.
Score: 1.2233362977312945
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Multi-component datasets with intricate dependencies, like industrial assemblies or multi-modal imaging, challenge current generative modeling techniques. Existing Multi-component Variational AutoEncoders typically rely on simplified aggregation strategies, neglecting critical nuances and consequently compromising structural coherence across generated components. To explicitly address this gap, we introduce the Gaussian Markov Random Field Multi-Component Variational AutoEncoder , a novel generative framework embedding Gaussian Markov Random Fields into both prior and posterior distributions. This design choice explicitly models cross-component relationships, enabling richer representation and faithful reproduction of complex interactions. Empirically, our GMRF MCVAE achieves state-of-the-art performance on a synthetic Copula dataset specifically constructed to evaluate intricate component relationships, demonstrates competitive results on the PolyMNIST benchmark, and significantly enhances structural coherence on the real-world BIKED dataset. Our results indicate that the GMRF MCVAE is especially suited for practical applications demanding robust and realistic modeling of multi-component coherence

Related papers

CoVAE: correlated multimodal generative modeling [1.9336815376402718]
We introduce Correlated Variational Autoencoders (CoVAE), a new generative architecture that captures the correlations between modalities.<n>We test CoVAE on a number of real and synthetic data sets demonstrating both accurate cross-modal reconstruction and effective quantification of the associated uncertainties.
arXiv Detail & Related papers (2026-03-02T15:14:59Z)
NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching [64.10695425442164]
We introduce NExT-OMNI, an open-source omnimodal foundation model that achieves unified modeling through discrete flow paradigms.<n>Trained on large-scale interleaved text, image, video, and audio data, NExT-OMNI delivers competitive performance on multimodal generation and understanding benchmarks.<n>To advance further research, we release training details, data protocols, and open-source both the code and model checkpoints.
arXiv Detail & Related papers (2025-10-15T16:25:18Z)
FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation [50.438552588818]
We propose textbfFindRec (textbfFlexible unified textbfinformation textbfdisentanglement for multi-modal sequential textbfRecommendation)<n>A Stein kernel-based Integrated Information Coordination Module (IICM) theoretically guarantees distribution consistency between multimodal features and ID streams.<n>A cross-modal expert routing mechanism that adaptively filters and combines multimodal features based on their contextual relevance.
arXiv Detail & Related papers (2025-07-07T04:09:45Z)
Gated Recursive Fusion: A Stateful Approach to Scalable Multimodal Transformers [0.0]
Gated Recurrent Fusion (GRF) is a novel architecture that captures the power of cross-modal attention within a linearly scalable, recurrent pipeline.<n>Our work presents a robust and efficient paradigm for powerful, scalable multimodal representation learning.
arXiv Detail & Related papers (2025-07-01T09:33:38Z)
High-Fidelity Scientific Simulation Surrogates via Adaptive Implicit Neural Representations [51.90920900332569]
Implicit neural representations (INRs) offer a compact and continuous framework for modeling spatially structured data.<n>Recent approaches address this by introducing additional features along rigid geometric structures.<n>We propose a simple yet effective alternative: Feature-Adaptive INR (FA-INR)
arXiv Detail & Related papers (2025-06-07T16:45:17Z)
ArtGS: Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting [66.29782808719301]
Building articulated objects is a key challenge in computer vision.<n>Existing methods often fail to effectively integrate information across different object states.<n>We introduce ArtGS, a novel approach that leverages 3D Gaussians as a flexible and efficient representation.
arXiv Detail & Related papers (2025-02-26T10:25:32Z)
Single Domain Generalization with Model-aware Parametric Batch-wise Mixup [22.709796153794507]
Single Domain Generalization remains a formidable challenge in the field of machine learning.<n>We propose a novel data augmentation approach, named as Model-aware Parametric Batch-wise Mixup.<n>By exploiting inter-feature correlations, the parameterized mixup generator introduces additional versatility in combining features across a batch of instances.
arXiv Detail & Related papers (2025-02-22T03:45:18Z)
Enhancing Non-Intrusive Load Monitoring with Features Extracted by Independent Component Analysis [0.0]
A novel neural network architecture is proposed to address the challenges in energy disaggregation algorithms.<n>Our results demonstrate that the model is less prone to overfitting, exhibits low complexity, and effectively decomposes signals with many individual components.
arXiv Detail & Related papers (2025-01-28T09:45:06Z)
Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning [51.54046200512198]
Retrieval-augmented generation (RAG) is extensively utilized to incorporate external, current knowledge into large language models.<n>A standard RAG pipeline may comprise several components, such as query rewriting, document retrieval, document filtering, and answer generation.<n>To overcome these challenges, we propose treating the RAG pipeline as a multi-agent cooperative task, with each component regarded as an RL agent.
arXiv Detail & Related papers (2025-01-25T14:24:50Z)
A Markov Random Field Multi-Modal Variational AutoEncoder [1.2233362977312945]
This work introduces a novel multimodal VAE that incorporates a Markov Random Field (MRF) into both the prior and posterior distributions.<n>Our approach is specifically designed to model and leverage the intricacies of these relationships, enabling a more faithful representation of multimodal data.
arXiv Detail & Related papers (2024-08-18T19:27:30Z)
Recurrent Complex-Weighted Autoencoders for Unsupervised Object Discovery [62.43562856605473]
We argue for the computational advantages of a recurrent architecture with complex-valued weights. We propose a fully convolutional autoencoder, SynCx, that performs iterative constraint satisfaction.
arXiv Detail & Related papers (2024-05-27T15:47:03Z)
Modality-Collaborative Transformer with Hybrid Feature Reconstruction for Robust Emotion Recognition [35.15390769958969]
We propose a unified framework, Modality-Collaborative Transformer with Hybrid Feature Reconstruction (MCT-HFR) MCT-HFR consists of a novel attention-based encoder which concurrently extracts and dynamically balances the intra- and inter-modality relations. During model training, LFI leverages complete features as supervisory signals to recover local missing features, while GFA is designed to reduce the global semantic gap between pairwise complete and incomplete representations.
arXiv Detail & Related papers (2023-12-26T01:59:23Z)
A Pareto-optimal compositional energy-based model for sampling and optimization of protein sequences [55.25331349436895]
Deep generative models have emerged as a popular machine learning-based approach for inverse problems in the life sciences. These problems often require sampling new designs that satisfy multiple properties of interest in addition to learning the data distribution.
arXiv Detail & Related papers (2022-10-19T19:04:45Z)
Adversarial Audio Synthesis with Complex-valued Polynomial Networks [60.231877895663956]
Time-frequency (TF) representations in audio have been increasingly modeled real-valued networks. We introduce complex-valued networks called APOLLO, that integrate such complex-valued representations in a natural way. APOLLO results in $17.5%$ improvement over adversarial methods and $8.2%$ over the state-of-the-art diffusion models on SC09 in audio generation.
arXiv Detail & Related papers (2022-06-14T12:58:59Z)
Revealing the Invisible with Model and Data Shrinking for Composite-database Micro-expression Recognition [49.463864096615254]
We analyze the influence of learning complexity, including the input complexity and model complexity. We propose a recurrent convolutional network (RCN) to explore the shallower-architecture and lower-resolution input data. We develop three parameter-free modules to integrate with RCN without increasing any learnable parameters.
arXiv Detail & Related papers (2020-06-17T06:19:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.