Appearance Codes using Joint Embedding Learning of Multiple Modalities
- URL: http://arxiv.org/abs/2311.11427v1
- Date: Sun, 19 Nov 2023 21:24:34 GMT
- Title: Appearance Codes using Joint Embedding Learning of Multiple Modalities
- Authors: Alex Zhang and Evan Dogariu
- Abstract summary: A major limitation of appearance codes is the need to re-train new codes for every scene at inference.
We propose a framework that learns a joint embedding space for the appearance and structure of the scene by enforcing a contrastive loss constraint between different modalities.
We show that our approach achieves generations of similar quality without learning appearance codes for any unseen images at inference.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The use of appearance codes in recent work on generative modeling
has enabled novel view renders with variable appearance and illumination, such
as day-time and night-time renders of a scene. A major limitation of this
technique is the need to re-train new appearance codes for every scene at
inference, so in this work we address this problem by proposing a framework
that learns a joint embedding space for the appearance and structure of the
scene by enforcing a contrastive loss constraint between different modalities.
We apply our framework to a simple Variational Auto-Encoder model on the
RADIATE dataset (Sheeny et al., 2021) and qualitatively demonstrate that we can
generate new renders of night-time photos using day-time appearance codes
without additional optimization iterations. Additionally, we compare our model
to a baseline VAE that uses the standard per-image appearance code technique
and show that our approach achieves generations of similar quality without
learning appearance codes for any unseen images at inference.
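The core idea is to align appearance and structure embeddings in a shared space with a contrastive objective, so an appearance code learned for one image (e.g. a day-time photo) can be reused for another without per-image optimization. Below is a minimal sketch of that contrastive alignment; the `JointEmbedding` module, its projection heads, the feature dimensions, and the CLIP-style symmetric InfoNCE loss are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the contrastive joint-embedding idea:
# two heads map an image's appearance and structure features into a shared space,
# and a symmetric InfoNCE-style loss pulls matching pairs together.
# All names, sizes, and the exact loss form are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, in_dim: int = 512, embed_dim: int = 128):
        super().__init__()
        # Hypothetical projection heads; in the paper the encoders sit inside a VAE.
        self.appearance_head = nn.Linear(in_dim, embed_dim)
        self.structure_head = nn.Linear(in_dim, embed_dim)

    def forward(self, appearance_feat, structure_feat):
        za = F.normalize(self.appearance_head(appearance_feat), dim=-1)
        zs = F.normalize(self.structure_head(structure_feat), dim=-1)
        return za, zs

def contrastive_loss(za, zs, temperature: float = 0.07):
    """Symmetric cross-entropy over cosine-similarity logits (CLIP-style)."""
    logits = za @ zs.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(za.size(0), device=za.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage with random features standing in for encoder outputs.
model = JointEmbedding()
a = torch.randn(8, 512)   # appearance features for a batch of images
s = torch.randn(8, 512)   # structure features for the same images
za, zs = model(a, s)
loss = contrastive_loss(za, zs)
loss.backward()
```

Under this kind of alignment, the appearance embedding of a day-time image can be paired with the structure embedding of a night-time scene at decode time, which is the reuse the paper demonstrates qualitatively.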
Related papers
- Are CLIP features all you need for Universal Synthetic Image Origin Attribution? [13.96698277726253]
We propose a framework that incorporates features from large pre-trained foundation models to perform Open-Set origin attribution of synthetic images.
We show that our method leads to remarkable attribution performance, even in the low-data regime.
arXiv Detail & Related papers (2024-08-17T09:54:21Z)
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
- Semantically Consistent Video Inpainting with Conditional Diffusion Models [16.42354856518832]
We present a framework for solving video inpainting problems with conditional video diffusion models.
We introduce inpainting-specific sampling schemes which capture crucial long-range dependencies in the context.
We devise a novel method for conditioning on the known pixels in incomplete frames.
arXiv Detail & Related papers (2024-04-30T23:49:26Z)
- CodeEnhance: A Codebook-Driven Approach for Low-Light Image Enhancement [97.95330185793358]
Low-light image enhancement (LLIE) aims to improve low-illumination images.
Existing methods face two challenges: uncertainty in restoration from diverse brightness degradations and loss of texture and color information.
We propose a novel enhancement approach, CodeEnhance, by leveraging quantized priors and image refinement.
arXiv Detail & Related papers (2024-04-08T07:34:39Z)
- Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries.
We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework.
We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
arXiv Detail & Related papers (2023-11-30T10:36:19Z)
- SAMPLING: Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image [60.52991173059486]
We introduce SAMPLING, a Scene-adaptive Hierarchical Multiplane Images Representation for Novel View Synthesis from a Single Image.
Our method demonstrates considerable performance gains in large-scale unbounded outdoor scenes using a single image on the KITTI dataset.
arXiv Detail & Related papers (2023-09-12T15:33:09Z)
- Similarity Min-Max: Zero-Shot Day-Night Domain Adaptation [52.923298434948606]
Low-light conditions not only hamper human visual experience but also degrade the model's performance on downstream vision tasks.
This paper addresses a more complicated scenario with broader applicability, i.e., zero-shot day-night domain adaptation.
We propose a similarity min-max paradigm that considers them under a unified framework.
arXiv Detail & Related papers (2023-07-17T18:50:15Z)
- Few-shot Neural Radiance Fields Under Unconstrained Illumination [40.384916810850385]
We introduce a new challenge for synthesizing novel view images in practical environments with limited input multi-view images and varying lighting conditions.
NeRF, one of the pioneering works for this task, demands an extensive set of multi-view images taken under constrained illumination.
We suggest ExtremeNeRF, which utilizes multi-view albedo consistency, supported by geometric alignment.
arXiv Detail & Related papers (2023-03-21T10:32:27Z)
- Enhance Images as You Like with Unpaired Learning [8.104571453311442]
We propose a lightweight one-path conditional generative adversarial network (cGAN) to learn a one-to-many relation from low-light to normal-light image space.
Our network learns to generate a collection of enhanced images from a given input conditioned on various reference images.
Our model achieves competitive visual and quantitative results on par with fully supervised methods on both noisy and clean datasets.
arXiv Detail & Related papers (2021-10-04T03:00:44Z)
- Crowdsampling the Plenoptic Function [56.10020793913216]
We present a new approach to novel view synthesis under time-varying illumination from crowdsampled internet photos.
We introduce a new DeepMPI representation, motivated by observations on the sparsity structure of the plenoptic function.
Our method can synthesize the same compelling parallax and view-dependent effects as previous MPI methods, while simultaneously interpolating along changes in reflectance and illumination with time.
arXiv Detail & Related papers (2020-07-30T02:52:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.