Illumination Normalization by Partially Impossible Encoder-Decoder Cost Function
- URL: http://arxiv.org/abs/2011.03428v2
- Date: Mon, 9 Nov 2020 15:43:42 GMT
- Title: Illumination Normalization by Partially Impossible Encoder-Decoder Cost Function
- Authors: Steve Dias Da Cruz, Bertram Taetz, Thomas Stifter, Didier Stricker
- Abstract summary: We introduce a new strategy for the cost function formulation of encoder-decoder networks to average out all the unimportant information in the input images.
Our method exploits the availability of identical sceneries under different illumination and environmental conditions.
Its applicability is assessed on three publicly available datasets.
- Score: 13.618797548020462
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Images recorded during the lifetime of computer vision based systems undergo
a wide range of illumination and environmental conditions affecting the
reliability of previously trained machine learning models. Image normalization
is hence a valuable preprocessing component to enhance the models' robustness.
To this end, we introduce a new strategy for the cost function formulation of
encoder-decoder networks to average out all the unimportant information in the
input images (e.g. environmental features and illumination changes) to focus on
the reconstruction of the salient features (e.g. class instances). Our method
exploits the availability of identical sceneries under different illumination
and environmental conditions for which we formulate a partially impossible
reconstruction target: the input image will not convey enough information to
reconstruct the target in its entirety. Its applicability is assessed on three
publicly available datasets. We combine the triplet loss as a regularizer in
the latent space representation and a nearest neighbour search to improve the
generalization to unseen illuminations and class instances. The importance of
the aforementioned post-processing is highlighted on an automotive application.
To this end, we release a synthetic dataset of sceneries from three different
passenger compartments where each scenery is rendered under ten different
illumination and environmental conditions: see https://sviro.kl.dfki.de
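The two ingredients of the abstract can be illustrated with a minimal NumPy sketch: the reconstruction target is drawn from a *different* illumination variant of the same scene (so the input can never convey enough information to reconstruct it exactly), and at test time a nearest-neighbour search in latent space maps unseen illuminations onto known scenes. All names, shapes, and the identity "encoder" below are illustrative assumptions, not the paper's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

def partially_impossible_target(variants, input_idx):
    """Draw the reconstruction target from the *other* illumination
    variants of the same scene: the input cannot convey the target's
    illumination, so only the shared scene content is reconstructible."""
    candidates = [i for i in range(len(variants)) if i != input_idx]
    return variants[rng.choice(candidates)]

def mse(a, b):
    return float(np.mean((a - b) ** 2))

# Toy data: one scene rendered under 10 illuminations (flattened 8x8 images).
variants = rng.random((10, 64))
x = variants[0]                                   # network input
target = partially_impossible_target(variants, 0)
# In training one would minimize mse(decoder(encoder(x)), target);
# here we just evaluate the cost between input and target directly.
loss = mse(x, target)

# Nearest-neighbour search in a (here: identity) latent space, as a
# stand-in for mapping an unseen illumination onto a known scene.
train_latents = variants                          # stand-in for encoder outputs
query = variants[3] + 0.01 * rng.standard_normal(64)
nn_idx = int(np.argmin(np.linalg.norm(train_latents - query, axis=1)))
```

Because the target's illumination is unpredictable from the input, the optimal reconstruction tends toward the average over all illumination variants, which is exactly the illumination-normalized image the method aims for.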
Related papers
- Drive-1-to-3: Enriching Diffusion Priors for Novel View Synthesis of Real Vehicles [81.29018359825872]
This paper consolidates a set of good practices to finetune large pretrained models for a real-world task.
Specifically, we develop several strategies to account for discrepancies between the synthetic data and real driving data.
Our insights lead to effective finetuning that results in a 68.8% reduction in FID for novel view synthesis over prior art.
arXiv Detail & Related papers (2024-12-19T03:39:13Z)
- IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations [64.07859467542664]
Capturing geometric and material information from images remains a fundamental challenge in computer vision and graphics.
Traditional optimization-based methods often require hours of computational time to reconstruct geometry, material properties, and environmental lighting from dense multi-view inputs.
We introduce IDArb, a diffusion-based model designed to perform intrinsic decomposition on an arbitrary number of images under varying illuminations.
arXiv Detail & Related papers (2024-12-16T18:52:56Z)
- KRONC: Keypoint-based Robust Camera Optimization for 3D Car Reconstruction [58.04846444985808]
This paper introduces KRONC, a novel approach aimed at inferring view poses by leveraging prior knowledge about the object to reconstruct and its representation through semantic keypoints.
With a focus on vehicle scenes, KRONC is able to estimate the position of the views as a solution to a light optimization problem targeting the convergence of keypoints' back-projections to a singular point.
arXiv Detail & Related papers (2024-09-09T08:08:05Z)
- UpFusion: Novel View Diffusion from Unposed Sparse View Observations [66.36092764694502]
UpFusion can perform novel view synthesis and infer 3D representations for an object given a sparse set of reference images.
We show that this mechanism allows generating high-fidelity novel views while improving the synthesis quality given additional (unposed) images.
arXiv Detail & Related papers (2023-12-11T18:59:55Z)
- Neural Radiance Transfer Fields for Relightable Novel-view Synthesis with Global Illumination [63.992213016011235]
We propose a method for scene relighting under novel views by learning a neural precomputed radiance transfer function.
Our method can be solely supervised on a set of real images of the scene under a single unknown lighting condition.
Results show that the recovered disentanglement of scene parameters improves significantly over the current state of the art.
arXiv Detail & Related papers (2022-07-27T16:07:48Z)
- Spatio-Temporal Outdoor Lighting Aggregation on Image Sequences using Transformer Networks [23.6427456783115]
In this work, we focus on outdoor lighting estimation by aggregating individual noisy estimates from images.
Recent work based on deep neural networks has shown promising results for single-image lighting estimation, but suffers from limited robustness.
We tackle this problem by combining lighting estimates from several image views sampled in the angular and temporal domain of an image sequence.
arXiv Detail & Related papers (2022-02-18T14:11:16Z)
- Learning Efficient Photometric Feature Transform for Multi-view Stereo [37.26574529243778]
We learn to convert the per-pixel photometric information at each view into spatially distinctive and view-invariant low-level features.
Our framework automatically adapts to and makes efficient use of the geometric information available in different forms of input data.
arXiv Detail & Related papers (2021-03-27T02:53:15Z)
- Scene relighting with illumination estimation in the latent space on an encoder-decoder scheme [68.8204255655161]
In this report we present the methods we explored to achieve that goal.
Our models are trained on a rendered dataset of artificial locations with varied scene content, light source location and color temperature.
With this dataset, we used a network with illumination estimation component aiming to infer and replace light conditions in the latent space representation of the concerned scenes.
arXiv Detail & Related papers (2020-06-03T15:25:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.