Learning to Zoom and Unzoom
- URL: http://arxiv.org/abs/2303.15390v1
- Date: Mon, 27 Mar 2023 17:03:30 GMT
- Title: Learning to Zoom and Unzoom
- Authors: Chittesh Thavamani, Mengtian Li, Francesco Ferroni, Deva Ramanan
- Abstract summary: We "learn to zoom" in on the input image, compute spatial features, and then "unzoom" to revert any deformations.
We demonstrate this versatility by evaluating on a variety of tasks and datasets.
- Score: 49.587516562644836
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many perception systems in mobile computing, autonomous navigation, and AR/VR
face strict compute constraints that are particularly challenging for
high-resolution input images. Previous works propose nonuniform downsamplers
that "learn to zoom" on salient image regions, reducing compute while retaining
task-relevant image information. However, for tasks with spatial labels (such
as 2D/3D object detection and semantic segmentation), such distortions may harm
performance. In this work (LZU), we "learn to zoom" in on the input image,
compute spatial features, and then "unzoom" to revert any deformations. To
enable efficient and differentiable unzooming, we approximate the zooming warp
with a piecewise bilinear mapping that is invertible. LZU can be applied to any
task with 2D spatial input and any model with 2D spatial features, and we
demonstrate this versatility by evaluating on a variety of tasks and datasets:
object detection on Argoverse-HD, semantic segmentation on Cityscapes, and
monocular 3D object detection on nuScenes. Interestingly, we observe boosts in
performance even when high-resolution sensor data is unavailable, implying that
LZU can be used to "learn to upsample" as well.
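The zoom/unzoom idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a separable warp built from hand-written per-axis saliency values, and the names `build_knots` and `warp_points` are hypothetical. Each axis uses a monotone piecewise-linear map, so the 2D warp is a simple special case of a piecewise bilinear mapping, and it is inverted by swapping the roles of the two knot arrays.

```python
import numpy as np

def build_knots(saliency, eps=1e-3):
    """Return (src, dst) knots of a monotone piecewise-linear 1D warp on [0, 1].

    Bins with higher saliency receive a wider interval on the zoomed axis.
    """
    w = np.asarray(saliency, dtype=float) + eps            # strictly positive bin weights
    src = np.linspace(0.0, 1.0, w.size + 1)                # uniform knots, original axis
    dst = np.concatenate(([0.0], np.cumsum(w) / w.sum()))  # knots, zoomed axis
    return src, dst

def warp_points(points, knots_x, knots_y, inverse=False):
    """Apply the separable warp (or its inverse) to an (N, 2) array of (x, y) in [0, 1]."""
    points = np.asarray(points, dtype=float)
    out = np.empty_like(points)
    for axis, (src, dst) in enumerate((knots_x, knots_y)):
        # Inverting a monotone piecewise-linear map only swaps the knot roles.
        xp, fp = (dst, src) if inverse else (src, dst)
        out[:, axis] = np.interp(points[:, axis], xp, fp)
    return out

# Hypothetical saliency: the third column bin and the second row bin are salient.
knots_x = build_knots([0.1, 0.1, 1.0, 0.1])
knots_y = build_knots([0.1, 1.0, 0.1])

rng = np.random.default_rng(0)
pts = rng.random((5, 2))                                        # points in the original image
zoomed = warp_points(pts, knots_x, knots_y)                     # "zoom"
restored = warp_points(zoomed, knots_x, knots_y, inverse=True)  # "unzoom"

print(np.diff(knots_x[1]))           # widths of the zoomed bins along x
print(np.abs(restored - pts).max())  # round-trip error
```

In a real pipeline the same maps would drive image and feature resampling rather than point coordinates; the sketch only shows why a piecewise-linear approximation makes the inverse cheap to compute.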
Related papers
- Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes [65.22070581594426]
"Implicit-Zoo" is a large-scale dataset requiring thousands of GPU training days to facilitate research and development in this field.
We showcase two immediate benefits: it enables us to (1) learn token locations for transformer models and (2) directly regress 3D camera poses of 2D images with respect to NeRF models.
This in turn leads to improved performance in all three tasks of image classification, semantic segmentation, and 3D pose regression, thereby unlocking new avenues for research.
arXiv Detail & Related papers (2024-06-25T10:20:44Z)
- Dual-Camera Smooth Zoom on Mobile Phones [55.4114152554769]
We introduce a new task, i.e., dual-camera smooth zoom (DCSZ), to achieve a smooth zoom preview.
The frame interpolation (FI) technique is a potential solution but struggles with ground-truth collection.
We suggest a data factory solution where continuous virtual cameras are assembled to generate DCSZ data by rendering reconstructed 3D models of the scene.
arXiv Detail & Related papers (2024-04-07T10:28:01Z)
- Perspective-aware Convolution for Monocular 3D Object Detection [2.33877878310217]
We propose a novel perspective-aware convolutional layer that captures long-range dependencies in images.
By enforcing convolutional kernels to extract features along the depth axis of every image pixel, we incorporate perspective information into the network architecture.
We demonstrate improved performance on the KITTI3D dataset, achieving a 23.9% average precision in the easy benchmark.
arXiv Detail & Related papers (2023-08-24T17:25:36Z)
- Parametric Depth Based Feature Representation Learning for Object Detection and Segmentation in Bird's Eye View [44.78243406441798]
This paper focuses on leveraging geometry information, such as depth, to model such feature transformation.
We first lift the 2D image features to the 3D space defined for the ego vehicle via a predicted parametric depth distribution for each pixel in each view.
We then aggregate the 3D feature volume based on the 3D space occupancy derived from depth to the BEV frame.
arXiv Detail & Related papers (2023-07-09T06:07:22Z)
- Aug3D-RPN: Improving Monocular 3D Object Detection by Synthetic Images with Virtual Depth [64.29043589521308]
We propose a rendering module to augment the training data by synthesizing images with virtual-depths.
The rendering module takes as input the RGB image and its corresponding sparse depth image, and outputs a variety of photo-realistic synthetic images.
Besides, we introduce an auxiliary module to improve the detection model by jointly optimizing it through a depth estimation task.
arXiv Detail & Related papers (2021-07-28T11:00:47Z)
- VR3Dense: Voxel Representation Learning for 3D Object Detection and Monocular Dense Depth Reconstruction [0.951828574518325]
We introduce a method for jointly training 3D object detection and monocular dense depth reconstruction neural networks.
It takes as inputs a LiDAR point-cloud and a single RGB image during inference and produces object pose predictions as well as a densely reconstructed depth map.
While our object detection is trained in a supervised manner, the depth prediction network is trained with both self-supervised and supervised loss functions.
arXiv Detail & Related papers (2021-04-13T04:25:54Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
- ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection [69.68263074432224]
We present a novel framework named ZoomNet for stereo imagery-based 3D detection.
The pipeline of ZoomNet begins with an ordinary 2D object detection model which is used to obtain pairs of left-right bounding boxes.
To further exploit the abundant texture cues in RGB images for more accurate disparity estimation, we introduce a conceptually straight-forward module -- adaptive zooming.
arXiv Detail & Related papers (2020-03-01T17:18:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.