Learning Structured Output Representations from Attributes using Deep
Conditional Generative Models
- URL: http://arxiv.org/abs/2305.00980v1
- Date: Sun, 30 Apr 2023 17:25:31 GMT
- Title: Learning Structured Output Representations from Attributes using Deep
Conditional Generative Models
- Authors: Mohamed Debbagh
- Abstract summary: This paper recreates the Conditional Variational Auto-encoder architecture and trains it on images conditioned on attributes.
We attempt to generate new faces with distinct attributes such as hair color and glasses, as well as different bird species samples.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Structured output representation is a generative task explored in computer
vision that often requires mapping low-dimensional features to
high-dimensional structured outputs. The loss of complex spatial information in
deterministic approaches such as Convolutional Neural Networks (CNN) leads to
uncertainty and ambiguous structures within a single output representation. A
probabilistic approach through deep Conditional Generative Models (CGM) is
presented by Sohn et al. in which a particular model known as the Conditional
Variational Auto-encoder (CVAE) is introduced and explored. While the original
paper focuses on the task of image segmentation, this paper adopts the CVAE
framework for the task of controlled output representation through attributes.
This approach allows us to learn a disentangled multimodal prior distribution,
resulting in a more controlled and robust approach to sample generation. In this
work, we recreate the CVAE architecture and train it on images conditioned on
various attributes obtained from two image datasets: the Large-scale CelebFaces
Attributes (CelebA) dataset and the Caltech-UCSD Birds (CUB-200-2011) dataset.
We attempt to generate new faces with distinct attributes such as hair color
and glasses, as well as different bird species samples with various attributes.
We further introduce strategies for improving generalized sample generation by
applying a weighted term to the variational lower bound.
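The weighted variational lower bound mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract does not say which term is weighted or how, so this example assumes a scalar weight on the KL term (in the style of beta-weighted VAE objectives), a diagonal Gaussian recognition network q(z | image, attributes), a diagonal Gaussian conditional prior p(z | attributes), and a Bernoulli pixel likelihood. All function and parameter names (`gaussian_kl`, `bernoulli_nll`, `weighted_cvae_loss`, `kl_weight`) are hypothetical and are not taken from the paper.

```python
import math


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians given means and log-variances."""
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        kl += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return kl


def bernoulli_nll(target, probs, eps=1e-7):
    """Negative log-likelihood of pixel values in [0, 1] under Bernoulli outputs."""
    nll = 0.0
    for t, p in zip(target, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        nll -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return nll


def weighted_cvae_loss(target, probs, mu_q, logvar_q, mu_p, logvar_p, kl_weight=1.0):
    """Negative lower bound with an assumed scalar weight on the KL term.

    Returns (total, reconstruction, kl). With kl_weight = 1.0 this is the
    standard single-sample CVAE objective.
    """
    recon = bernoulli_nll(target, probs)
    kl = gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
    return recon + kl_weight * kl, recon, kl


if __name__ == "__main__":
    # Toy values: a 2-pixel "image" and a 1-dimensional latent variable.
    total, recon, kl = weighted_cvae_loss(
        target=[1.0, 0.0], probs=[0.9, 0.2],
        mu_q=[0.5], logvar_q=[0.0],
        mu_p=[0.0], logvar_p=[0.0],
        kl_weight=0.5,
    )
    print(total, recon, kl)
```

Under this assumption, a weight below 1 favors reconstruction quality, while a weight above 1 pulls the recognition distribution closer to the attribute-conditioned prior, which is the distribution sampled from at generation time.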
Related papers
- Neural Clustering based Visual Representation Learning [61.72646814537163]
Clustering is one of the most classic approaches in machine learning and data analysis.
We propose feature extraction with clustering (FEC), which views feature extraction as a process of selecting representatives from data.
FEC alternates between grouping pixels into individual clusters to abstract representatives and updating the deep features of pixels with current representatives.
arXiv Detail & Related papers (2024-03-26T06:04:50Z)
- Attribute-Aware Deep Hashing with Self-Consistency for Large-Scale Fine-Grained Image Retrieval [65.43522019468976]
We propose attribute-aware hashing networks with self-consistency for generating attribute-aware hash codes.
We develop an encoder-decoder structure network of a reconstruction task to unsupervisedly distill high-level attribute-specific vectors.
Our models are equipped with a feature decorrelation constraint upon these attribute vectors to strengthen their representative abilities.
arXiv Detail & Related papers (2023-11-21T08:20:38Z)
- T1: Scaling Diffusion Probabilistic Fields to High-Resolution on Unified Visual Modalities [69.16656086708291]
Diffusion Probabilistic Field (DPF) models the distribution of continuous functions defined over metric spaces.
We propose a new model comprising a view-wise sampling algorithm to focus on local structure learning.
The model can be scaled to generate high-resolution data while unifying multiple modalities.
arXiv Detail & Related papers (2023-05-24T03:32:03Z)
- The geometry of hidden representations of large transformer models [43.16765170255552]
Large transformers are powerful architectures used for self-supervised data analysis across various data types.
We show that the semantic structure of the dataset emerges from a sequence of transformations between one representation and the next.
We show that the semantic information of the dataset is better expressed at the end of the first peak, and this phenomenon can be observed across many models trained on diverse datasets.
arXiv Detail & Related papers (2023-02-01T07:50:26Z)
- DeepDC: Deep Distance Correlation as a Perceptual Image Quality Evaluator [53.57431705309919]
ImageNet pre-trained deep neural networks (DNNs) show notable transferability for building effective image quality assessment (IQA) models.
We develop a novel full-reference IQA (FR-IQA) model based exclusively on pre-trained DNN features.
We conduct comprehensive experiments to demonstrate the superiority of the proposed quality model on five standard IQA datasets.
arXiv Detail & Related papers (2022-11-09T14:57:27Z)
- Multi-Facet Clustering Variational Autoencoders [9.150555507030083]
High-dimensional data, such as images, typically feature multiple interesting characteristics one could cluster over.
We introduce Multi-Facet Clustering Variational Autoencoders (MFCVAE).
MFCVAE learns multiple clusterings simultaneously, and is trained fully unsupervised and end-to-end.
arXiv Detail & Related papers (2021-06-09T17:36:38Z)
- Analogous to Evolutionary Algorithm: Designing a Unified Sequence Model [58.17021225930069]
We explain the rationality of Vision Transformer by analogy with the proven practical Evolutionary Algorithm (EA).
We propose a more efficient EAT model, and design task-related heads to deal with different tasks more flexibly.
Our approach achieves state-of-the-art results on the ImageNet classification task compared with recent vision transformer works.
arXiv Detail & Related papers (2021-05-31T16:20:03Z)
- MOGAN: Morphologic-structure-aware Generative Learning from a Single Image [59.59698650663925]
Recently proposed generative models complete training based on only one image.
We introduce a MOrphologic-structure-aware Generative Adversarial Network named MOGAN that produces random samples with diverse appearances.
Our approach focuses on internal features including the maintenance of rational structures and variation on appearance.
arXiv Detail & Related papers (2021-03-04T12:45:23Z)
- VAE-Info-cGAN: Generating Synthetic Images by Combining Pixel-level and Feature-level Geospatial Conditional Inputs [0.0]
We present a conditional generative model for synthesizing semantically rich images simultaneously conditioned on a pixel-level condition (PLC) and a feature-level condition (FLC).
Experiments on a GPS dataset show that the proposed model can accurately generate various forms of macroscopic aggregates across different geographic locations.
arXiv Detail & Related papers (2020-12-08T03:46:19Z)
- Generating Annotated High-Fidelity Images Containing Multiple Coherent Objects [10.783993190686132]
We propose a multi-object generation framework that can synthesize images with multiple objects without explicitly requiring contextual information.
We demonstrate how coherency and fidelity are preserved with our method through experiments on the Multi-MNIST and CLEVR datasets.
arXiv Detail & Related papers (2020-06-22T11:33:55Z)
- Network Bending: Expressive Manipulation of Deep Generative Models [0.2062593640149624]
We introduce a new framework for manipulating and interacting with deep generative models that we call network bending.
We show how it allows for the direct manipulation of semantically meaningful aspects of the generative process as well as allowing for a broad range of expressive outcomes.
arXiv Detail & Related papers (2020-05-25T21:48:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.