Related papers: RISP: Rendering-Invariant State Predictor with Differentiable Simulation and Rendering for Cross-Domain Parameter Estimation

URL: http://arxiv.org/abs/2205.05678v1
Date: Wed, 11 May 2022 17:59:51 GMT
Title: RISP: Rendering-Invariant State Predictor with Differentiable Simulation and Rendering for Cross-Domain Parameter Estimation
Authors: Pingchuan Ma, Tao Du, Joshua B. Tenenbaum, Wojciech Matusik, Chuang Gan
Abstract summary: Existing solutions require massive training data or lack generalizability to unknown rendering configurations. We propose a novel approach that marries domain randomization and differentiable rendering gradients to address this problem. Our approach achieves significantly lower reconstruction errors and has better generalizability among unknown rendering configurations.
Score: 110.4255414234771
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: This work considers identifying parameters characterizing a physical system's dynamic motion directly from a video whose rendering configurations are inaccessible. Existing solutions require massive training data or lack generalizability to unknown rendering configurations. We propose a novel approach that marries domain randomization and differentiable rendering gradients to address this problem. Our core idea is to train a rendering-invariant state-prediction (RISP) network that transforms image differences into state differences independent of rendering configurations, e.g., lighting, shadows, or material reflectance. To train this predictor, we formulate a new loss on rendering variances using gradients from differentiable rendering. Moreover, we present an efficient, second-order method to compute the gradients of this loss, allowing it to be integrated seamlessly into modern deep learning frameworks. We evaluate our method in rigid-body and deformable-body simulation environments using four tasks: state estimation, system identification, imitation learning, and visuomotor control. We further demonstrate the efficacy of our approach on a real-world example: inferring the state and action sequences of a quadrotor from a video of its motion sequences. Compared with existing methods, our approach achieves significantly lower reconstruction errors and has better generalizability among unknown rendering configurations.
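Illustrative sketch (not from the paper): the toy example below shows why penalizing the gradient of a state prediction with respect to a rendering parameter can favor a rendering-invariant predictor. Everything in it is an assumption made for illustration: the two-pixel renderer, the two-weight predictor, the `gamma` weight, and the squared-gradient form of the penalty. It also uses a central finite difference in place of the differentiable-rendering gradients described in the abstract.

```python
def render(state, brightness):
    """Hypothetical toy renderer: pixel 0 encodes the state, pixel 1 is a
    reference patch; both scale with the rendering parameter `brightness`."""
    return [brightness * state, brightness]


def predict(weights, image):
    """Hypothetical state predictor mixing a raw pixel and a normalized ratio."""
    w_raw, w_ratio = weights
    return w_raw * image[0] + w_ratio * image[0] / image[1]


def rendering_gradient(weights, state, brightness, eps=1e-4):
    """Central finite difference of the prediction w.r.t. brightness.
    (Stand-in for an analytic differentiable-rendering gradient.)"""
    hi = predict(weights, render(state, brightness + eps))
    lo = predict(weights, render(state, brightness - eps))
    return (hi - lo) / (2.0 * eps)


def loss(weights, samples, gamma):
    """Mean squared state error plus gamma times the squared rendering gradient."""
    total = 0.0
    for state, brightness in samples:
        error = predict(weights, render(state, brightness)) - state
        grad = rendering_gradient(weights, state, brightness)
        total += error ** 2 + gamma * grad ** 2
    return total / len(samples)


# Training data uses a single brightness; test data uses an unseen one.
train = [(s, 1.0) for s in (0.5, 1.0, 1.5)]
test = [(s, 2.0) for s in (0.5, 1.0, 1.5)]

raw_only = (1.0, 0.0)    # relies on the raw pixel value
ratio_only = (0.0, 1.0)  # relies on the brightness-normalized ratio

for name, w in (("raw_only", raw_only), ("ratio_only", ratio_only)):
    print(name,
          "train error:", loss(w, train, 0.0),
          "train loss with penalty:", loss(w, train, 1.0),
          "test error:", loss(w, test, 0.0))
```

In this toy setup both predictors fit the training brightness equally well, so the state error alone cannot tell them apart; only the gradient penalty separates them, and the predictor it prefers is the one that still works at the unseen brightness.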

Related papers

MIRReS: Multi-bounce Inverse Rendering using Reservoir Sampling [17.435649250309904]
We present MIRReS, a novel two-stage inverse rendering framework. Our method extracts an explicit geometry (triangular mesh) in stage one, and introduces a more realistic physically-based inverse rendering model. Our method effectively estimates indirect illumination, including self-shadowing and internal reflections.
arXiv Detail & Related papers (2024-06-24T07:00:57Z)
Rasterized Edge Gradients: Handling Discontinuities Differentiably [25.85191317712521]
We present a novel method for computing gradients at discontinuities for rendering approximations. Our method elegantly simplifies the traditionally complex problem through a carefully designed approximation strategy. We showcase our method in human head scene reconstruction, demonstrating handling of camera images and segmentation masks.
arXiv Detail & Related papers (2024-05-03T22:42:00Z)
Near-realtime Facial Animation by Deep 3D Simulation Super-Resolution [7.14576106770047]
We present a neural network-based simulation framework that can efficiently and realistically enhance a facial performance produced by a low-cost, realtime physics-based simulation. We use face animation as an exemplar of such a simulation domain, where creating this semantic congruence is achieved by simply dialing in the same muscle actuation controls and skeletal pose in the two simulators. Our proposed neural network super-resolution framework generalizes from this training set to unseen expressions, compensates for modeling discrepancies between the two simulations due to limited resolution or cost-cutting approximations in the real-time variant, and does not require any semantic descriptors or parameters to
arXiv Detail & Related papers (2023-05-05T00:09:24Z)
CbwLoss: Constrained Bidirectional Weighted Loss for Self-supervised Learning of Depth and Pose [13.581694284209885]
Photometric differences are used to train neural networks for estimating depth and camera pose from unlabeled monocular videos. In this paper, we deal with moving objects and occlusions utilizing the difference of the flow fields and depth structure generated by affine transformation and view synthesis. We mitigate the effect of textureless regions on model optimization by measuring differences between features with more semantic and contextual information without adding networks.
arXiv Detail & Related papers (2022-12-12T12:18:24Z)
Differentiable Rendering with Perturbed Optimizers [85.66675707599782]
Reasoning about 3D scenes from their 2D image projections is one of the core problems in computer vision. Our work highlights the link between some well-known differentiable formulations and randomly smoothed renderings. We apply our method to 3D scene reconstruction and demonstrate its advantages on the tasks of 6D pose estimation and 3D mesh reconstruction.
arXiv Detail & Related papers (2021-10-18T08:56:23Z)
Inverting Generative Adversarial Renderer for Face Reconstruction [58.45125455811038]
In this work, we introduce a novel Generative Adversarial Renderer (GAR). Instead of relying on graphics rules, GAR learns to model complicated real-world images and is capable of producing realistic images. Our method achieves state-of-the-art performance on multiple face reconstruction datasets.
arXiv Detail & Related papers (2021-05-06T04:16:06Z)
gradSim: Differentiable simulation for system identification and visuomotor control [66.37288629125996]
We present gradSim, a framework that overcomes the dependence on 3D supervision by leveraging differentiable multiphysics simulation and differentiable rendering. Our unified graph enables learning in challenging visuomotor control tasks, without relying on state-based (3D) supervision.
arXiv Detail & Related papers (2021-04-06T16:32:01Z)
Efficient and Differentiable Shadow Computation for Inverse Problems [64.70468076488419]
Differentiable geometric computation has received increasing interest for image-based inverse problems. We propose an efficient approach for differentiable visibility and soft shadow computation. As our formulation is differentiable, it can be used to solve inverse problems such as texture, illumination, rigid pose, and deformation recovery from images.
arXiv Detail & Related papers (2021-04-01T09:29:05Z)
Deep Variational Network Toward Blind Image Restoration [60.45350399661175]
Blind image restoration (IR) is a common yet challenging problem in computer vision. We propose a novel blind image restoration method that aims to integrate the advantages of both classical model-based and deep learning-based approaches. Experiments on two typical blind IR tasks, namely image denoising and super-resolution, demonstrate that the proposed method achieves superior performance over the current state of the art.
arXiv Detail & Related papers (2020-08-25T03:30:53Z)
Monocular Real-Time Volumetric Performance Capture [28.481131687883256]
We present the first approach to volumetric performance capture and novel-view rendering at real-time speed from monocular video. Our system reconstructs a fully textured 3D human from each frame by leveraging Pixel-Aligned Implicit Function (PIFu). We also introduce an Online Hard Example Mining (OHEM) technique that effectively suppresses failure modes due to the rare occurrence of challenging examples.
arXiv Detail & Related papers (2020-07-28T04:45:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.