Physically Disentangled Representations
- URL: http://arxiv.org/abs/2204.05281v1
- Date: Mon, 11 Apr 2022 17:36:40 GMT
- Title: Physically Disentangled Representations
- Authors: Tzofi Klinghoffer, Kushagra Tiwary, Arkadiusz Balata, Vivek Sharma,
Ramesh Raskar
- Abstract summary: inverse rendering can be used to learn physically disentangled representations of scenes without supervision.
We show the utility of inverse rendering in learning representations that yield improved accuracy on downstream clustering, linear classification, and segmentation tasks.
- Score: 13.234029150635658
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art methods in generative representation learning yield semantic
disentanglement, but typically do not consider physical scene parameters, such
as geometry, albedo, lighting, or camera. We posit that inverse rendering, a
way to reverse the rendering process to recover scene parameters from an image,
can also be used to learn physically disentangled representations of scenes
without supervision. In this paper, we show the utility of inverse rendering in
learning representations that yield improved accuracy on downstream clustering,
linear classification, and segmentation tasks with the help of our novel
Leave-One-Out, Cycle Contrastive loss (LOOCC), which improves disentanglement
of scene parameters and robustness to out-of-distribution lighting and
viewpoints. We perform a comparison of our method with other generative
representation learning methods across a variety of downstream tasks, including
face attribute classification, emotion recognition, identification, face
segmentation, and car classification. Our physically disentangled
representations yield higher accuracy than semantically disentangled
alternatives across all tasks and by as much as 18%. We hope that this work
will motivate future research in applying advances in inverse rendering and 3D
understanding to representation learning.
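The abstract does not spell out the LOOCC formulation, so the sketch below is only one plausible reading under stated assumptions: an encoder that emits a separate embedding per physical factor (geometry, albedo, lighting, camera), with an InfoNCE term applied per factor so that a view in which the other factors are perturbed still matches the anchor on the left-out factor. All names (`FactoredEncoder`, `leave_one_out_cycle_contrastive`, the perturbation scheme) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Physical scene factors named in the abstract.
FACTORS = ["geometry", "albedo", "lighting", "camera"]

class FactoredEncoder(nn.Module):
    """Toy encoder mapping an image to one embedding per physical factor.
    A real inverse-rendering backbone would predict actual scene parameters."""
    def __init__(self, backbone_dim=512, factor_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(            # stand-in for a CNN backbone
            nn.Flatten(), nn.LazyLinear(backbone_dim), nn.ReLU()
        )
        self.heads = nn.ModuleDict(
            {f: nn.Linear(backbone_dim, factor_dim) for f in FACTORS}
        )

    def forward(self, x):
        h = self.backbone(x)
        return {f: F.normalize(head(h), dim=-1) for f, head in self.heads.items()}

def info_nce(anchor, positive, temperature=0.1):
    """Standard InfoNCE over a batch: positives on the diagonal, other rows are negatives."""
    logits = anchor @ positive.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)

def leave_one_out_cycle_contrastive(enc, images, views_by_factor, temperature=0.1):
    """Hypothetical LOOCC-style objective: for each factor, contrast the anchor's
    factor embedding against the same factor extracted from a view in which the
    other factors were perturbed, so only the left-out factor should match."""
    z_anchor = enc(images)
    loss = 0.0
    for f in FACTORS:
        z_view = enc(views_by_factor[f])                  # view preserving factor f only
        loss = loss + info_nce(z_anchor[f], z_view[f], temperature)
    return loss / len(FACTORS)

if __name__ == "__main__":
    enc = FactoredEncoder()
    imgs = torch.randn(8, 3, 64, 64)
    # Placeholder "re-rendered" views; a real pipeline would re-render each scene
    # with all factors except the named one perturbed.
    views = {f: imgs + 0.05 * torch.randn_like(imgs) for f in FACTORS}
    print(float(leave_one_out_cycle_contrastive(enc, imgs, views)))
```

The per-factor heads give the physically disentangled representation described in the abstract; the frozen, concatenated factor embeddings could then be evaluated with downstream clustering, linear classification, or segmentation probes.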
Related papers
- When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv Detail & Related papers (2024-10-14T17:59:58Z)
- 3D Facial Expressions through Analysis-by-Neural-Synthesis [30.2749903946587]
SMIRK (Spatial Modeling for Image-based Reconstruction of Kinesics) faithfully reconstructs expressive 3D faces from images.
We identify two key limitations in existing methods: shortcomings in their self-supervised training formulation, and a lack of expression diversity in the training images.
Our qualitative, quantitative and particularly our perceptual evaluations demonstrate that SMIRK achieves the new state-of-the art performance on accurate expression reconstruction.
arXiv Detail & Related papers (2024-04-05T14:00:07Z)
- Hyperbolic Contrastive Learning for Visual Representations beyond Objects [30.618032825306187]
We focus on learning representations for objects and scenes that preserve the structure among them.
Motivated by the observation that visually similar objects are close in the representation space, we argue that the scenes and objects should instead follow a hierarchical structure.
arXiv Detail & Related papers (2022-12-01T16:58:57Z)
- A domain adaptive deep learning solution for scanpath prediction of paintings [66.46953851227454]
This paper focuses on eye-movement analysis of viewers during the visual experience of a set of paintings.
We introduce a new approach to predicting human visual attention, which underpins several human cognitive functions.
The proposed new architecture ingests images and returns scanpaths, a sequence of points featuring a high likelihood of catching viewers' attention.
arXiv Detail & Related papers (2022-09-22T22:27:08Z)
- Unsupervised Part Discovery from Contrastive Reconstruction [90.88501867321573]
The goal of self-supervised visual representation learning is to learn strong, transferable image representations.
We propose an unsupervised approach to object part discovery and segmentation.
Our method yields semantic parts consistent across fine-grained but visually distinct categories.
arXiv Detail & Related papers (2021-11-11T17:59:42Z)
- Self-Supervised Representation Learning from Flow Equivariance [97.13056332559526]
We present a new self-supervised learning representation framework that can be directly deployed on a video stream of complex scenes.
Our representations, learned from high-resolution raw video, can be readily used for downstream tasks on static images.
arXiv Detail & Related papers (2021-01-16T23:44:09Z)
- Unsupervised Learning Facial Parameter Regressor for Action Unit Intensity Estimation via Differentiable Renderer [51.926868759681014]
We present a framework to predict the facial parameters based on a bone-driven face model (BDFM) under different views.
The proposed framework consists of a feature extractor, a generator, and a facial parameter regressor.
arXiv Detail & Related papers (2020-08-20T09:49:13Z)
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)