Bi-level Doubly Variational Learning for Energy-based Latent Variable Models
- URL: http://arxiv.org/abs/2203.14702v1
- Date: Thu, 24 Mar 2022 04:13:38 GMT
- Title: Bi-level Doubly Variational Learning for Energy-based Latent Variable Models
- Authors: Ge Kan, Jinhu Lü, Tian Wang, Baochang Zhang, Aichun Zhu, Lei Huang, Guodong Guo, Hichem Snoussi
- Abstract summary: Energy-based latent variable models (EBLVMs) are more expressive than conventional energy-based models.
We propose Bi-level doubly variational learning (BiDVL) to facilitate learning EBLVMs.
Our model achieves impressive image generation performance compared with related works.
- Score: 46.75117861209482
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Energy-based latent variable models (EBLVMs) are more expressive than
conventional energy-based models. However, their potential on visual tasks is
limited by a training process based on maximum likelihood estimation, which
requires sampling from two intractable distributions. In this paper, we propose
Bi-level doubly variational learning (BiDVL), which is based on a new bi-level
optimization framework and two tractable variational distributions that
facilitate learning EBLVMs. In particular, we introduce a decoupled EBLVM,
consisting of a marginal energy-based distribution and a structural posterior,
to handle the difficulties of learning deep EBLVMs on images. By choosing a
symmetric KL divergence in the lower level of our framework, a compact BiDVL
for visual tasks can be obtained. Our model achieves impressive image
generation performance compared with related works. It also demonstrates
significant capacity for image reconstruction and out-of-distribution
detection at test time.
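For orientation, here is a rough schematic of the bi-level structure described above. The notation is ours and deliberately simplified; the paper's exact objectives differ in detail. The EBLVM defines $p_\theta(x, z) \propto \exp(-\mathcal{E}_\theta(x, z))$; the upper level updates the energy parameters $\theta$ through variational surrogates $q_\phi$ and $q_\psi$ for the two intractable distributions (the model distribution and the posterior), while the lower level fits those surrogates under a symmetric KL divergence:
$$
\min_{\theta}\; \mathcal{L}\big(\theta;\, q_{\phi^{*}(\theta)},\, q_{\psi^{*}(\theta)}\big)
\quad \text{s.t.} \quad
\big(\phi^{*}(\theta),\, \psi^{*}(\theta)\big) \in \arg\min_{\phi,\,\psi}\;
D_{\mathrm{KL}}\big(q_{\phi,\psi}\,\|\,p_{\theta}\big) + D_{\mathrm{KL}}\big(p_{\theta}\,\|\,q_{\phi,\psi}\big).
$$
The symmetric lower-level divergence is the choice the abstract credits with yielding a compact BiDVL objective for visual tasks.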
Related papers
- MoCa: Modality-aware Continual Pre-training Makes Better Bidirectional Multimodal Embeddings [75.0617088717528]
MoCa is a framework for transforming pre-trained VLM backbones into effective bidirectional embedding models.
MoCa consistently improves performance across MMEB and ViDoRe-v2 benchmarks, achieving new state-of-the-art results.
arXiv Detail & Related papers (2025-06-29T06:41:00Z)
- Boosting the Generalization and Reasoning of Vision Language Models with Curriculum Reinforcement Learning [12.728451197053321]
We propose Curriculum Reinforcement Finetuning (Curr-ReFT), a novel post-training paradigm specifically designed for small-scale vision-language models (VLMs)
Curr-ReFT comprises two sequential stages: Curriculum Reinforcement Learning and Rejected Sampling-based Self-improvement.
Our experiments demonstrate that models trained with the Curr-ReFT paradigm achieve state-of-the-art performance across various visual tasks.
arXiv Detail & Related papers (2025-03-10T08:48:50Z)
- TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models [6.8298782282181865]
We introduce Temporally Adaptive Interpolated Distillation (TAID), a novel knowledge distillation approach.
We show TAID's superior performance across various model sizes and architectures in both instruction tuning and pre-training scenarios.
These results demonstrate TAID's effectiveness in creating high-performing and efficient models, advancing the development of more accessible AI technologies.
arXiv Detail & Related papers (2025-01-28T13:31:18Z)
- LSSInst: Improving Geometric Modeling in LSS-Based BEV Perception with Instance Representation [10.434754671492723]
We propose LSSInst, a two-stage object detector incorporating BEV and instance representations in tandem.
The proposed detector exploits fine-grained pixel-level features that can be flexibly integrated into existing LSS-based BEV networks.
Our proposed framework has excellent generalization ability and performance, boosting the performance of modern LSS-based BEV perception methods without bells and whistles.
arXiv Detail & Related papers (2024-11-09T13:03:54Z)
- VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks [60.5257456681402]
We study the potential for building universal embeddings capable of handling a wide range of downstream tasks.
We build a series of VLM2Vec models on SoTA VLMs such as Phi-3.5-V and LLaVA-1.6, and evaluate them on MMEB's evaluation split.
Our results show that VLM2Vec achieves an absolute average improvement of 10% to 20% over existing multimodal embedding models.
arXiv Detail & Related papers (2024-10-07T16:14:05Z)
- Hyperbolic Learning with Multimodal Large Language Models [8.98815579836401]
We address the challenges of scaling multi-modal hyperbolic models by orders of magnitude in terms of parameters (billions) and training complexity using the BLIP-2 architecture.
We propose a novel training strategy for a hyperbolic version of BLIP-2 that achieves performance comparable to its Euclidean counterpart while maintaining stability throughout training and providing a meaningful indication of uncertainty with each embedding.
arXiv Detail & Related papers (2024-08-09T14:39:15Z)
- Robust Training of Federated Models with Extremely Label Deficiency [84.00832527512148]
Federated semi-supervised learning (FSSL) has emerged as a powerful paradigm for collaboratively training machine learning models using distributed data with label deficiency.
We propose a novel twin-model paradigm, called Twin-sight, designed to enhance mutual guidance by providing insights from different perspectives of labeled and unlabeled data.
Our comprehensive experiments on four benchmark datasets provide substantial evidence that Twin-sight can significantly outperform state-of-the-art methods across various experimental settings.
arXiv Detail & Related papers (2024-02-22T10:19:34Z)
- Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood [64.95663299945171]
Training energy-based models (EBMs) on high-dimensional data can be both challenging and time-consuming.
There exists a noticeable gap in sample quality between EBMs and other generative frameworks like GANs and diffusion models.
We propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs; the recovery-likelihood idea is sketched below.
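For context, a sketch of the recovery-likelihood idea that CDRL builds on; this is recalled from the earlier diffusion recovery likelihood literature rather than taken from this abstract. Given an EBM $p_\theta(x) \propto \exp(f_\theta(x))$ and a noisy observation $\tilde{x} = x + \sigma\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, one maximizes the conditional
$$
p_{\theta}(x \mid \tilde{x}) \;\propto\; \exp\!\Big(f_{\theta}(x) - \tfrac{1}{2\sigma^{2}}\,\big\|\tilde{x} - x\big\|^{2}\Big),
$$
which is much easier to sample from than the marginal $p_\theta(x)$: the quadratic term keeps samples near $\tilde{x}$, and for small $\sigma$ the conditional is close to Gaussian. Chaining such conditionals across noise levels gives the "series of EBMs" mentioned above.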
arXiv Detail & Related papers (2023-09-10T22:05:24Z)
- Bilevel Generative Learning for Low-Light Vision [64.77933848939327]
We propose a generic low-light vision solution by introducing a generative block to convert data from the RAW to the RGB domain.
This novel approach connects diverse vision problems by explicitly depicting data generation, a first in the field.
We develop two types of learning strategies targeting different goals, namely low cost and high accuracy, to acquire a new bilevel generative learning paradigm.
arXiv Detail & Related papers (2023-08-07T07:59:56Z)
- Persistently Trained, Diffusion-assisted Energy-based Models [18.135784288023928]
We introduce diffusion data and learn a joint EBM, called diffusion-assisted EBMs, through persistent training.
We show that persistently trained EBMs can simultaneously achieve long-run stability, post-training image generation, and superior out-of-distribution detection.
arXiv Detail & Related papers (2023-04-21T02:29:18Z)
- Latent Diffusion Energy-Based Model for Interpretable Text Modeling [104.85356157724372]
We introduce a novel symbiosis between the diffusion models and latent space EBMs in a variational learning framework.
We develop a geometric clustering-based regularization jointly with the information bottleneck to further improve the quality of the learned latent space.
arXiv Detail & Related papers (2022-06-13T03:41:31Z)
- Bi-level Score Matching for Learning Energy-based Latent Variable Models [46.7000048886801]
Score matching (SM) provides a compelling approach to learning energy-based models (EBMs) by avoiding computation of the partition function (see the identity below).
This paper presents a bi-level score matching (BiSM) method to learn EBLVMs with general structures.
We show that BiSM is comparable to the widely adopted contrastive divergence and SM methods when they are applicable.
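As a reminder of why score matching sidesteps the partition function (a standard identity, not specific to BiSM): for an EBM $p_{\theta}(x) = \exp(-\mathcal{E}_{\theta}(x))/Z_{\theta}$, the normalizer $Z_{\theta}$ does not depend on $x$, so
$$
\nabla_{x} \log p_{\theta}(x) \;=\; -\nabla_{x} \mathcal{E}_{\theta}(x) \;-\; \underbrace{\nabla_{x} \log Z_{\theta}}_{=\,0} \;=\; -\nabla_{x} \mathcal{E}_{\theta}(x),
$$
and the SM objective, which matches this model score to the data score, can be evaluated without ever computing $Z_{\theta}$.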
arXiv Detail & Related papers (2020-10-15T16:24:04Z)
- ICE-BeeM: Identifiable Conditional Energy-Based Deep Models Based on Nonlinear ICA [11.919315372249802]
We consider the identifiability theory of probabilistic models.
We show that our model can be used for the estimation of the components in the framework of Independently Modulated Component Analysis.
arXiv Detail & Related papers (2020-02-26T14:43:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.