GALA: Toward Geometry-and-Lighting-Aware Object Search for Compositing
- URL: http://arxiv.org/abs/2204.00125v1
- Date: Thu, 31 Mar 2022 22:36:08 GMT
- Title: GALA: Toward Geometry-and-Lighting-Aware Object Search for Compositing
- Authors: Sijie Zhu, Zhe Lin, Scott Cohen, Jason Kuen, Zhifei Zhang, Chen Chen
- Abstract summary: GALA is a generic foreground object search method with discriminative modeling on geometry and lighting compatibility.
It generalizes well on large-scale open-world datasets, i.e. Pixabay and Open Images.
In addition, our method can effectively handle non-box scenarios, where users only provide background images without any input bounding box.
- Score: 43.14411954867784
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compositing-aware object search aims to find the most compatible objects for
compositing given a background image and a query bounding box. Previous works
focus on learning compatibility between the foreground object and background,
but fail to learn other important factors from large-scale data, i.e. geometry
and lighting. To move a step further, this paper proposes GALA
(Geometry-and-Lighting-Aware), a generic foreground object search method with
discriminative modeling on geometry and lighting compatibility for open-world
image compositing. Remarkably, it achieves state-of-the-art results on the CAIS
dataset and generalizes well on large-scale open-world datasets, i.e. Pixabay
and Open Images. In addition, our method can effectively handle non-box
scenarios, where users only provide background images without any input
bounding box. A web demo (see supplementary materials) is built to showcase
applications of the proposed method for compositing-aware search and automatic
location/scale prediction for the foreground object.
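The paper's architecture is not reproduced here; as a minimal sketch, assuming the search is implemented as a dual-encoder retrieval in which the background-plus-query-box and each foreground candidate are embedded into a shared space (a common setup for compositing-aware search, and the names below are hypothetical), ranking candidates reduces to cosine similarity:

```python
import numpy as np

def rank_foregrounds(query_emb, fg_embs):
    """Rank candidate foreground embeddings by cosine similarity to the query.

    query_emb: (D,) embedding of the background image plus query box.
    fg_embs:   (N, D) embeddings of candidate foreground objects.
    Returns candidate indices sorted from most to least compatible,
    together with the sorted similarity scores.
    """
    q = query_emb / np.linalg.norm(query_emb)
    f = fg_embs / np.linalg.norm(fg_embs, axis=1, keepdims=True)
    scores = f @ q               # cosine similarity per candidate
    order = np.argsort(-scores)  # descending compatibility
    return order, scores[order]
```

In such a setup, geometry and lighting compatibility would be learned into the embeddings themselves, so retrieval at query time stays a cheap nearest-neighbor search.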
Related papers
- RMAFF-PSN: A Residual Multi-Scale Attention Feature Fusion Photometric Stereo Network [37.759675702107586]
Predicting accurate surface normal maps of objects from two-dimensional images is challenging in regions with complex structures and spatial material variations.
We propose a method that calibrates feature information from different resolution stages and scales of the image.
This approach preserves more physical information, such as texture and geometry of the object in complex regions.
arXiv Detail & Related papers (2024-04-11T14:05:37Z)
- TopNet: Transformer-based Object Placement Network for Image Compositing [43.14411954867784]
Local clues in background images are important for determining the compatibility of placing an object at a certain location/scale.
We propose to learn the correlation between object features and all local background features with a transformer module.
Our new formulation generates a 3D heatmap indicating the plausibility of all location/scale combinations in one network forward pass.
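Given such a 3-D heatmap over location/scale combinations, placement prediction reduces to a single argmax over the volume. A minimal sketch of that readout step (array shapes and names are assumptions for illustration, not TopNet's actual API):

```python
import numpy as np

def best_placement(heatmap, scales):
    """Pick the most plausible (scale, y, x) from a 3-D plausibility heatmap.

    heatmap: (S, H, W) array with one plausibility score per
             scale/location combination, as produced in one forward pass.
    scales:  length-S sequence of actual scale values per slice.
    """
    s, y, x = np.unravel_index(int(np.argmax(heatmap)), heatmap.shape)
    return scales[s], y, x
```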
arXiv Detail & Related papers (2023-04-06T20:58:49Z)
- Designing An Illumination-Aware Network for Deep Image Relighting [69.750906769976]
We present an Illumination-Aware Network (IAN) which follows the guidance from hierarchical sampling to progressively relight a scene from a single image.
In addition, an Illumination-Aware Residual Block (IARB) is designed to approximate the physical rendering process.
Experimental results show that our proposed method produces better quantitative and qualitative relighting results than previous state-of-the-art methods.
arXiv Detail & Related papers (2022-07-21T16:21:24Z)
- Towards High-Fidelity Single-view Holistic Reconstruction of Indoor Scenes [50.317223783035075]
We present a new framework to reconstruct holistic 3D indoor scenes from single-view images.
We propose an instance-aligned implicit function (InstPIFu) for detailed object reconstruction.
Our code and model will be made publicly available.
arXiv Detail & Related papers (2022-07-18T14:54:57Z)
- NeROIC: Neural Rendering of Objects from Online Image Collections [42.02832046768925]
We present a novel method to acquire object representations from online image collections, capturing high-quality geometry and material properties of arbitrary objects.
This enables various object-centric rendering applications such as novel-view synthesis, relighting, and harmonized background composition.
arXiv Detail & Related papers (2022-01-07T16:45:15Z)
- Unbiased IoU for Spherical Image Object Detection [45.17996641893818]
We first identify that spherical rectangles are unbiased bounding boxes for objects in spherical images, and then propose an analytical method for IoU calculation without any approximations.
Based on the unbiased representation and calculation, we also present an anchor free object detection algorithm for spherical images.
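As a toy illustration of why spherical area, not planar area, is the right quantity: for latitude-longitude boxes (a simplification; the paper's unbiased representation uses spherical rectangles bounded by great circles) the area on the unit sphere has a closed form, so IoU can be computed exactly rather than via a planar approximation:

```python
import math

def latlon_box_area(lon_min, lon_max, lat_min, lat_max):
    # Exact area on the unit sphere of a latitude-longitude box (radians).
    return (lon_max - lon_min) * (math.sin(lat_max) - math.sin(lat_min))

def latlon_iou(a, b):
    """IoU of two lat-lon boxes (lon_min, lon_max, lat_min, lat_max), radians."""
    ilon = min(a[1], b[1]) - max(a[0], b[0])
    ilat_lo, ilat_hi = max(a[2], b[2]), min(a[3], b[3])
    inter = 0.0
    if ilon > 0 and ilat_hi > ilat_lo:
        inter = ilon * (math.sin(ilat_hi) - math.sin(ilat_lo))
    union = latlon_box_area(*a) + latlon_box_area(*b) - inter
    return inter / union if union > 0 else 0.0
```

The `sin(lat)` weighting is what a planar IoU misses: equal angular extents cover less area near the poles than near the equator.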
arXiv Detail & Related papers (2021-08-18T08:18:37Z)
- Scene Inference for Object Illumination Editing [24.529871334658573]
We apply a physically-based rendering method to create a large-scale, high-quality dataset, named IH dataset.
We also propose a deep learning-based SI-GAN method, a multi-task collaborative network, to edit object illumination.
Our proposed SI-GAN provides a practical and effective solution for image-based object illumination editing, and experiments validate the superiority of our method over state-of-the-art methods.
arXiv Detail & Related papers (2021-07-31T05:02:52Z)
- DONet: Learning Category-Level 6D Object Pose and Size Estimation from Depth Observation [53.55300278592281]
We propose a method of Category-level 6D Object Pose and Size Estimation (COPSE) from a single depth image.
Our framework makes inferences based on the rich geometric information of the object in the depth channel alone.
Our framework competes with state-of-the-art approaches that require labeled real-world images.
arXiv Detail & Related papers (2021-06-27T10:41:50Z)
- Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images [69.5662419067878]
Grounding referring expressions in RGBD images is an emerging field.
We present a novel task of 3D visual grounding in single-view RGBD image where the referred objects are often only partially scanned due to occlusion.
Our approach first fuses the language and the visual features at the bottom level to generate a heatmap that localizes the relevant regions in the RGBD image.
Then our approach conducts an adaptive feature learning based on the heatmap and performs the object-level matching with another visio-linguistic fusion to finally ground the referred object.
arXiv Detail & Related papers (2021-03-14T11:18:50Z)
- Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve [54.054575408582565]
We propose to leverage existing large-scale datasets of 3D models to understand the underlying 3D structure of objects seen in an image.
We present Mask2CAD, which jointly detects objects in real-world images and, for each detected object, optimizes for the most similar CAD model and its pose.
This produces a clean, lightweight representation of the objects in an image.
arXiv Detail & Related papers (2020-07-26T00:08:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.