Conditional Generative Modeling via Learning the Latent Space
- URL: http://arxiv.org/abs/2010.03132v2
- Date: Fri, 9 Oct 2020 03:29:17 GMT
- Title: Conditional Generative Modeling via Learning the Latent Space
- Authors: Sameera Ramasinghe, Kanchana Ranasinghe, Salman Khan, Nick Barnes, and
Stephen Gould
- Abstract summary: We propose a novel framework for conditional generation in multimodal spaces.
It uses latent variables to model generalizable learning patterns.
At inference, the latent variables are optimized to find optimal solutions corresponding to multiple output modes.
- Score: 54.620761775441046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although deep learning has achieved appealing results on several machine
learning tasks, most of the models are deterministic at inference, limiting
their application to single-modal settings. We propose a novel general-purpose
framework for conditional generation in multimodal spaces that uses latent
variables to model generalizable learning patterns while minimizing a family of
regression cost functions. At inference, the latent variables are optimized to
find optimal solutions corresponding to multiple output modes. Compared to
existing generative solutions in multimodal spaces, our approach demonstrates
faster and more stable convergence, and can learn better representations for
downstream tasks. Importantly, it provides a simple generic model that can beat
highly engineered pipelines tailored using domain expertise on a variety of
tasks, while generating diverse outputs. Our code will be released.
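As a rough illustration of the inference-time procedure the abstract describes, the sketch below freezes a (hypothetical) conditional generator and optimizes only the latent variable against a squared regression cost; the function names, the single linear layer standing in for the network, and the finite-difference gradients are all simplifications, not the paper's actual architecture or optimizer.

```python
import numpy as np

def generator(z, x, W):
    """Hypothetical frozen conditional generator: maps an input x and a latent z
    to an output (a single linear layer stands in for the paper's network)."""
    return W @ np.concatenate([x, z])

def optimize_latent(x, y_target, W, z_dim=2, steps=300, lr=0.5, seed=0):
    """Inference-time step: keep the weights W frozen and update only the latent
    z by gradient descent on a squared regression cost (finite-difference
    gradients keep the sketch dependency-free)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=z_dim)
    eps = 1e-4
    for _ in range(steps):
        base = np.sum((generator(z, x, W) - y_target) ** 2)
        grad = np.zeros_like(z)
        for i in range(z_dim):
            zp = z.copy()
            zp[i] += eps
            grad[i] = (np.sum((generator(zp, x, W) - y_target) ** 2) - base) / eps
        z -= lr * grad  # only the latent moves; the generator stays fixed
    return z
```

Restarting the latent optimization from different random initializations of `z` is what lets the same frozen model reach different output modes for a single conditioning input `x`.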
Related papers
- Deep Multivariate Models with Parametric Conditionals [47.20275199636936]
We consider deep multivariate models for heterogeneous collections of random variables. We propose to represent the joint probability distribution by means of conditional probability distributions for each group of variables conditioned on the rest. Their learning can be approached as training a parametrised Markov chain kernel by maximising the data likelihood of its limiting distribution.
arXiv Detail & Related papers (2026-02-02T11:01:48Z) - NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching [64.10695425442164]
We introduce NExT-OMNI, an open-source omnimodal foundation model that achieves unified modeling through discrete flow paradigms. Trained on large-scale interleaved text, image, video, and audio data, NExT-OMNI delivers competitive performance on multimodal generation and understanding benchmarks. To advance further research, we release training details and data protocols, and open-source both the code and model checkpoints.
arXiv Detail & Related papers (2025-10-15T16:25:18Z) - Define latent spaces by example: optimisation over the outputs of generative models [37.62017041960412]
Many downstream tasks require a higher level of control than unconstrained sampling. We introduce surrogate latent spaces: non-parametric, low-dimensional Euclidean embeddings that can be extracted from any generative model. Our approach is architecture-agnostic, incurs almost no additional computational cost, and generalises across modalities.
arXiv Detail & Related papers (2025-09-28T10:50:06Z) - OmniBridge: Unified Multimodal Understanding, Generation, and Retrieval via Latent Space Alignment [79.98946571424607]
We present OmniBridge, a unified framework that supports vision-language understanding, generation, and retrieval within a unified architecture. To address the challenge of task interference, we propose a two-stage decoupled training strategy. Experiments demonstrate that OmniBridge achieves competitive or state-of-the-art performance in all three tasks.
arXiv Detail & Related papers (2025-09-23T13:57:55Z) - Multitask Learning with Stochastic Interpolants [13.301909784310894]
We propose a framework for learning maps between probability distributions that broadly generalizes the time dynamics of flow and diffusion models. We generalize interpolants by replacing the scalar time variable with vectors, matrices, or linear operators. This approach enables the construction of versatile generative models capable of fulfilling multiple tasks without task-specific training.
arXiv Detail & Related papers (2025-08-06T16:25:19Z) - Spatial Reasoners for Continuous Variables in Any Domain [49.83744014336816]
We present a framework to perform spatial reasoning over continuous variables with generative denoising models. We provide interfaces to control variable mapping from arbitrary data domains, generative model paradigms, and inference strategies.
arXiv Detail & Related papers (2025-07-14T19:46:54Z) - Continual Learning for Generative AI: From LLMs to MLLMs and Beyond [56.29231194002407]
We present a comprehensive survey of continual learning methods for mainstream generative AI models. We categorize these approaches into three paradigms: architecture-based, regularization-based, and replay-based. We analyze continual learning setups for different generative models, including training objectives, benchmarks, and core backbones.
arXiv Detail & Related papers (2025-06-16T02:27:25Z) - CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning [24.981279071712173]
We introduce CAFe, a contrastive-autoregressive fine-tuning framework that enhances LVLMs for both representation and generative tasks.
Our approach unifies these traditionally separate tasks, achieving state-of-the-art results in both multimodal retrieval and multimodal generative benchmarks.
arXiv Detail & Related papers (2025-03-25T17:57:17Z) - Few-shot Steerable Alignment: Adapting Rewards and LLM Policies with Neural Processes [50.544186914115045]
Large language models (LLMs) are increasingly embedded in everyday applications.
Ensuring their alignment with the diverse preferences of individual users has become a critical challenge.
We present a novel framework for few-shot steerable alignment.
arXiv Detail & Related papers (2024-12-18T16:14:59Z) - EmbedLLM: Learning Compact Representations of Large Language Models [28.49433308281983]
We propose EmbedLLM, a framework designed to learn compact vector representations of Large Language Models.
We introduce an encoder-decoder approach for learning such embeddings, along with a systematic framework to evaluate their effectiveness.
Empirical results show that EmbedLLM outperforms prior methods in model routing both in accuracy and latency.
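As a toy illustration of the compact-representation idea this entry describes (the names and setup below are hypothetical, and a linear SVD stands in for EmbedLLM's learned encoder-decoder), one can compress each model's per-question correctness profile into a low-dimensional embedding and use the reconstructed profiles for routing:

```python
import numpy as np

def embed_models(correctness, k=2):
    """Compress per-model correctness profiles (rows = models, columns =
    benchmark questions) into k-dim embeddings. The SVD acts as the optimal
    *linear* autoencoder; EmbedLLM's actual encoder-decoder is learned."""
    U, S, Vt = np.linalg.svd(correctness, full_matrices=False)
    embeddings = U[:, :k] * S[:k]  # encoder output: one compact vector per model
    decoder = Vt[:k]               # decoder: embedding -> per-question scores
    return embeddings, decoder

def route(question_idx, embeddings, decoder):
    """Routing: pick the model whose reconstructed profile scores highest
    on the given question."""
    predicted = embeddings @ decoder[:, question_idx]
    return int(np.argmax(predicted))
```

The point of the embedding is that downstream decisions such as routing operate on the small vectors rather than the full correctness matrix.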
arXiv Detail & Related papers (2024-10-03T05:43:24Z) - A Practitioner's Guide to Continual Multimodal Pretraining [83.63894495064855]
Multimodal foundation models serve numerous applications at the intersection of vision and language.
To keep models updated, research into continual pretraining mainly explores scenarios with either infrequent, indiscriminate updates on large-scale new data, or frequent, sample-level updates.
We introduce FoMo-in-Flux, a continual multimodal pretraining benchmark with realistic compute constraints and practical deployment requirements.
arXiv Detail & Related papers (2024-08-26T17:59:01Z) - DiffSG: A Generative Solver for Network Optimization with Diffusion Model [75.27274046562806]
Diffusion generative models can consider a broader range of solutions and exhibit stronger generalization by learning parameters.
We propose a new framework, which leverages intrinsic distribution learning of diffusion generative models to learn high-quality solutions.
arXiv Detail & Related papers (2024-08-13T07:56:21Z) - Concrete Subspace Learning based Interference Elimination for Multi-task
Model Fusion [86.6191592951269]
Merging models fine-tuned from a common, extensively pretrained large model, each specialized for a different task, has been demonstrated to be a cheap and scalable strategy for constructing a multitask model that performs well across diverse tasks.
We propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to tackle the interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z) - Diffusion-Generative Multi-Fidelity Learning for Physical Simulation [24.723536390322582]
We develop a diffusion-generative multi-fidelity learning method based on stochastic differential equations (SDEs), where the generation is a continuous denoising process.
By conditioning on additional inputs (temporal or spatial variables), our model can efficiently learn and predict multi-dimensional solution arrays.
arXiv Detail & Related papers (2023-11-09T18:59:05Z) - Towards Robust Multi-Modal Reasoning via Model Selection [7.6621866737827045]
The LLM serves as the "brain" of the agent, orchestrating multiple tools for collaborative multi-step task solving.
We propose the $M^3$ framework as a plug-in with negligible runtime overhead at test time.
Our experiments reveal that our framework enables dynamic model selection, considering both user inputs and subtask dependencies.
arXiv Detail & Related papers (2023-10-12T16:06:18Z) - Generative Model for Models: Rapid DNN Customization for Diverse Tasks
and Resource Constraints [28.983470365172057]
NN-Factory is a one-for-all framework to generate customized lightweight models for diverse edge scenarios.
The main components of NN-Factory include a modular supernet with pretrained modules that can be conditionally activated to accomplish different tasks.
NN-Factory is able to generate high-quality task- and resource-specific models within a few seconds, faster than conventional model customization approaches by orders of magnitude.
arXiv Detail & Related papers (2023-08-29T03:28:14Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model
Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
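As a bare-bones sketch of the column-wise iterative imputation pattern this entry describes (a plain least-squares fit stands in for HyperImpute's automatically selected per-column learners; the function name is hypothetical), each column's missing entries are repeatedly re-predicted from the other columns:

```python
import numpy as np

def iterative_impute(X, n_iters=10):
    """Impute NaNs in a 2-D array by repeatedly regressing each column on the
    others. Missing cells start at the column mean and are refined each sweep."""
    X = X.astype(float).copy()
    mask = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[mask] = np.take(col_means, np.where(mask)[1])  # crude initialization
    for _ in range(n_iters):
        for j in range(X.shape[1]):
            miss = mask[:, j]
            if not miss.any():
                continue
            others = np.delete(X, j, axis=1)
            # fit column j on the remaining columns using the observed rows
            A = np.column_stack([others[~miss], np.ones((~miss).sum())])
            coef, *_ = np.linalg.lstsq(A, X[~miss, j], rcond=None)
            # re-predict the missing cells of column j
            X[miss, j] = np.column_stack([others[miss], np.ones(miss.sum())]) @ coef
    return X
```

HyperImpute's contribution is choosing and configuring a different learner per column automatically; the fixed linear model above only illustrates the surrounding iteration.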
arXiv Detail & Related papers (2022-06-15T19:10:35Z) - Twist Decoding: Diverse Generators Guide Each Other [116.20780037268801]
We introduce Twist decoding, a simple and general inference algorithm that generates text while benefiting from diverse models.
Our method does not assume the vocabulary, tokenization or even generation order is shared.
arXiv Detail & Related papers (2022-05-19T01:27:53Z) - SAGE: Generating Symbolic Goals for Myopic Models in Deep Reinforcement
Learning [18.37286885057802]
We propose an algorithm combining learning and planning to exploit a previously unusable class of incomplete models.
This combines the strengths of symbolic planning and neural learning approaches in a novel way that outperforms competing methods on variations of taxi world and Minecraft.
arXiv Detail & Related papers (2022-03-09T22:55:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.