Conditional Generative Modeling via Learning the Latent Space
- URL: http://arxiv.org/abs/2010.03132v2
- Date: Fri, 9 Oct 2020 03:29:17 GMT
- Title: Conditional Generative Modeling via Learning the Latent Space
- Authors: Sameera Ramasinghe, Kanchana Ranasinghe, Salman Khan, Nick Barnes, and
Stephen Gould
- Abstract summary: We propose a novel framework for conditional generation in multimodal spaces.
It uses latent variables to model generalizable learning patterns.
At inference, the latent variables are optimized to find optimal solutions corresponding to multiple output modes.
- Score: 54.620761775441046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although deep learning has achieved appealing results on several machine
learning tasks, most of the models are deterministic at inference, limiting
their application to single-modal settings. We propose a novel general-purpose
framework for conditional generation in multimodal spaces that uses latent
variables to model generalizable learning patterns while minimizing a family of
regression cost functions. At inference, the latent variables are optimized to
find optimal solutions corresponding to multiple output modes. Compared to
existing generative solutions in multimodal spaces, our approach demonstrates
faster and more stable convergence, and can learn better representations for
downstream tasks. Importantly, it provides a simple generic model that can beat
highly engineered pipelines tailored using domain expertise on a variety of
tasks, while generating diverse outputs. Our code will be released.
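To make the mechanism concrete, below is a minimal, hypothetical PyTorch sketch of inference-time latent optimization: a trained conditional generator g(x, z) is kept frozen while several randomly initialized latent vectors are refined by gradient descent on a regression cost, each seed potentially settling into a different output mode. All names, shapes, and hyperparameters are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of inference-time latent optimization (illustrative only;
# not the authors' released code). Assumes a trained conditional generator
# g(x, z) and a differentiable per-sample regression cost(y, x).
import torch

def optimize_latents(g, cost, x, num_modes=5, latent_dim=16, steps=200, lr=1e-2):
    """Refine several random latents so each converges to a plausible output mode."""
    z = torch.randn(num_modes, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    x_batch = x.unsqueeze(0).expand(num_modes, *x.shape)   # reuse one input
    for _ in range(steps):
        opt.zero_grad()
        y = g(x_batch, z)                  # generator stays frozen; only z moves
        cost(y, x_batch).sum().backward()  # minimize the regression cost w.r.t. z
        opt.step()
    with torch.no_grad():
        return g(x_batch, z)               # one candidate output per latent seed
```

Because the latents start from different random seeds, the optimized set can cover several distinct modes of the conditional output distribution, which is the multimodal behavior the abstract describes.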
Related papers
- EmbedLLM: Learning Compact Representations of Large Language Models [28.49433308281983]
We propose EmbedLLM, a framework designed to learn compact vector representations of Large Language Models.
We introduce an encoder-decoder approach for learning such embeddings, along with a systematic framework to evaluate their effectiveness.
Empirical results show that EmbedLLM outperforms prior methods in model routing in both accuracy and latency.
arXiv Detail & Related papers (2024-10-03T05:43:24Z)
- A Practitioner's Guide to Continual Multimodal Pretraining [83.63894495064855]
Multimodal foundation models serve numerous applications at the intersection of vision and language.
To keep models updated, research into continual pretraining mainly explores scenarios with either infrequent, indiscriminate updates on large-scale new data, or frequent, sample-level updates.
We introduce FoMo-in-Flux, a continual multimodal pretraining benchmark with realistic compute constraints and practical deployment requirements.
arXiv Detail & Related papers (2024-08-26T17:59:01Z)
- DiffSG: A Generative Solver for Network Optimization with Diffusion Model [75.27274046562806]
Diffusion generative models can consider a broader range of solutions and exhibit stronger generalization by learning the underlying distribution of high-quality solutions.
We propose a new framework, which leverages intrinsic distribution learning of diffusion generative models to learn high-quality solutions.
arXiv Detail & Related papers (2024-08-13T07:56:21Z)
- Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion [86.6191592951269]
Merging models fine-tuned from a common, extensively pretrained large model but specialized for different tasks has been demonstrated to be a cheap and scalable strategy for constructing a multitask model that performs well across diverse tasks (a naive weight-averaging baseline is sketched after this entry).
We propose the CONtinuous relaxation of discrete (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to tackle the interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z)
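For context on the entry above, the cheap merging strategy it refers to can be illustrated by naive parameter averaging of same-architecture checkpoints. This is a hypothetical baseline sketch, not the Concrete subspace method itself; `average_merge` and its arguments are invented names for exposition.

```python
# Illustrative baseline: (weighted) parameter averaging of models fine-tuned
# from one common pretrained architecture. The Concrete subspace method goes
# further, resolving cross-task interference in a learned low-dimensional
# subspace rather than averaging naively.
import torch

def average_merge(state_dicts, weights=None):
    """Merge same-shaped checkpoints by weighted parameter averaging."""
    weights = weights or [1.0 / len(state_dicts)] * len(state_dicts)
    return {
        key: sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
        for key in state_dicts[0]
    }

# Usage: merged = average_merge([torch.load(p) for p in checkpoint_paths])
```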
- Diffusion-Generative Multi-Fidelity Learning for Physical Simulation [24.723536390322582]
We develop a diffusion-generative multi-fidelity learning method based on stochastic differential equations (SDEs), where the generation is a continuous denoising process.
By conditioning on additional inputs (temporal or spatial variables), our model can efficiently learn and predict multi-dimensional solution arrays.
arXiv Detail & Related papers (2023-11-09T18:59:05Z)
- Towards Robust Multi-Modal Reasoning via Model Selection [7.6621866737827045]
The LLM serves as the "brain" of the agent, orchestrating multiple tools for collaborative multi-step task solving.
We propose the $\textit{M3}$ framework as a plug-in with negligible runtime overhead at test-time.
Our experiments reveal that our framework enables dynamic model selection, considering both user inputs and subtask dependencies.
arXiv Detail & Related papers (2023-10-12T16:06:18Z)
- Generative Model for Models: Rapid DNN Customization for Diverse Tasks and Resource Constraints [28.983470365172057]
NN-Factory is a one-for-all framework to generate customized lightweight models for diverse edge scenarios.
The main components of NN-Factory include a modular supernet with pretrained modules that can be conditionally activated to accomplish different tasks.
NN-Factory is able to generate high-quality task- and resource-specific models within a few seconds, faster than conventional model customization approaches by orders of magnitude.
arXiv Detail & Related papers (2023-08-29T03:28:14Z)
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces (a generic sketch of column-wise iterative imputation follows this entry).
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
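For context, column-wise iterative imputation, the family that HyperImpute generalizes with automatic per-column model selection, can be sketched as below. Using a single RandomForestRegressor for every column is an illustrative assumption, not HyperImpute's actual configuration.

```python
# Illustrative round-robin iterative imputation over a numeric 2-D array.
# HyperImpute chooses a model per column automatically; a fixed regressor
# stands in for that choice here.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def iterative_impute(X, n_iters=5):
    X = X.copy()
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.where(missing)[1])  # mean-fill to start
    for _ in range(n_iters):
        for j in range(X.shape[1]):
            if not missing[:, j].any():
                continue  # column was fully observed
            obs = ~missing[:, j]
            other = np.delete(X, j, axis=1)  # predict column j from the rest
            model = RandomForestRegressor(n_estimators=50)
            model.fit(other[obs], X[obs, j])
            X[missing[:, j], j] = model.predict(other[missing[:, j]])
    return X
```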
- Twist Decoding: Diverse Generators Guide Each Other [116.20780037268801]
We introduce Twist decoding, a simple and general inference algorithm that generates text while benefiting from diverse models.
Our method does not assume the vocabulary, tokenization or even generation order is shared.
arXiv Detail & Related papers (2022-05-19T01:27:53Z)
- SAGE: Generating Symbolic Goals for Myopic Models in Deep Reinforcement Learning [18.37286885057802]
We propose an algorithm combining learning and planning to exploit a previously unusable class of incomplete models.
This combines the strengths of symbolic planning and neural learning approaches in a novel way that outperforms competing methods on variations of taxi world and Minecraft.
arXiv Detail & Related papers (2022-03-09T22:55:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.