Scale Normalized Image Pyramids with AutoFocus for Object Detection
- URL: http://arxiv.org/abs/2102.05646v1
- Date: Wed, 10 Feb 2021 18:57:53 GMT
- Title: Scale Normalized Image Pyramids with AutoFocus for Object Detection
- Authors: Bharat Singh, Mahyar Najibi, Abhishek Sharma and Larry S. Davis
- Abstract summary: A scale normalized image pyramid (SNIP) is generated that, like human vision, only attends to objects within a fixed size range at different scales.
We propose an efficient spatial sub-sampling scheme which only operates on fixed-size sub-regions likely to contain objects.
The resulting algorithm is referred to as AutoFocus and results in a 2.5-5 times speed-up during inference when used with SNIP.
- Score: 75.71320993452372
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present an efficient foveal framework to perform object detection. A scale
normalized image pyramid (SNIP) is generated that, like human vision, only
attends to objects within a fixed size range at different scales. Such a
restriction of objects' size during training affords better learning of
object-sensitive filters, and therefore, results in better accuracy. However,
the use of an image pyramid increases the computational cost. Hence, we propose
an efficient spatial sub-sampling scheme which only operates on fixed-size
sub-regions likely to contain objects (as object locations are known during
training). The resulting approach, referred to as Scale Normalized Image
Pyramid with Efficient Resampling or SNIPER, yields up to 3 times speed-up
during training. Unfortunately, as object locations are unknown during
inference, the entire image pyramid still needs processing. To this end, we
adopt a coarse-to-fine approach, and predict the locations and extent of
object-like regions which will be processed in successive scales of the image
pyramid. Intuitively, it's akin to our active human-vision that first skims
over the field-of-view to spot interesting regions for further processing and
only recognizes objects at the right resolution. The resulting algorithm is
referred to as AutoFocus and results in a 2.5-5 times speed-up during inference
when used with SNIP.
Related papers
- Temporal Lidar Depth Completion [0.08192907805418582]
We show how a state-of-the-art method PENet can be modified to benefit from recurrency.
Our algorithm achieves state-of-the-art results on the KITTI depth completion dataset.
arXiv Detail & Related papers (2024-06-17T08:25:31Z) - LocaliseBot: Multi-view 3D object localisation with differentiable
rendering for robot grasping [9.690844449175948]
We focus on object pose estimation.
Our approach relies on three pieces of information: multiple views of the object, the camera's parameters at those viewpoints, and 3D CAD models of objects.
We show that the estimated object pose results in 99.65% grasp accuracy with the ground truth grasp candidates.
arXiv Detail & Related papers (2023-11-14T14:27:53Z) - Diff-DOPE: Differentiable Deep Object Pose Estimation [29.703385848843414]
We introduce Diff-DOPE, a 6-DoF pose refiner that takes as input an image, a 3D textured model of an object, and an initial pose of the object.
The method uses differentiable rendering to update the object pose to minimize the visual error between the image and the projection of the model.
We show that this simple, yet effective, idea is able to achieve state-of-the-art results on pose estimation datasets.
arXiv Detail & Related papers (2023-09-30T18:52:57Z) - MV-ROPE: Multi-view Constraints for Robust Category-level Object Pose and Size Estimation [23.615122326731115]
We propose a novel solution that makes use of RGB video streams.
Our framework consists of three modules: a scale-aware monocular dense SLAM solution, a lightweight object pose predictor, and an object-level pose graph.
Our experimental results demonstrate that when utilizing public dataset sequences with high-quality depth information, the proposed method exhibits comparable performance to state-of-the-art RGB-D methods.
arXiv Detail & Related papers (2023-08-17T08:29:54Z) - Learning to segment from object sizes [0.0]
We propose an algorithm for training a deep segmentation network from a dataset of a few pixel-wise annotated images and many images with known object sizes.
The algorithm minimizes a discrete (non-differentiable) loss function defined over the object sizes by sampling the gradient and then using the standard back-propagation algorithm.
arXiv Detail & Related papers (2022-07-01T09:34:44Z) - Unseen Object 6D Pose Estimation: A Benchmark and Baselines [62.8809734237213]
We propose a new task that enables and facilitates algorithms to estimate the 6D pose estimation of novel objects during testing.
We collect a dataset with both real and synthetic images and up to 48 unseen objects in the test set.
By training an end-to-end 3D correspondences network, our method finds corresponding points between an unseen object and a partial view RGBD image accurately and efficiently.
arXiv Detail & Related papers (2022-06-23T16:29:53Z) - ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose
Estimation [76.31125154523056]
We present a discrete descriptor, which can represent the object surface densely.
We also propose a coarse to fine training strategy, which enables fine-grained correspondence prediction.
arXiv Detail & Related papers (2022-03-17T16:16:24Z) - Category-Level Metric Scale Object Shape and Pose Estimation [73.92460712829188]
We propose a framework that jointly estimates a metric scale shape and pose from a single RGB image.
We validated our method on both synthetic and real-world datasets to evaluate category-level object pose and shape.
arXiv Detail & Related papers (2021-09-01T12:16:46Z) - A Simple and Effective Use of Object-Centric Images for Long-Tailed
Object Detection [56.82077636126353]
We take advantage of object-centric images to improve object detection in scene-centric images.
We present a simple yet surprisingly effective framework to do so.
Our approach can improve the object detection (and instance segmentation) accuracy of rare objects by 50% (and 33%) relatively.
arXiv Detail & Related papers (2021-02-17T17:27:21Z) - Supervised Training of Dense Object Nets using Optimal Descriptors for
Industrial Robotic Applications [57.87136703404356]
Dense Object Nets (DONs) by Florence, Manuelli and Tedrake introduced dense object descriptors as a novel visual object representation for the robotics community.
In this paper we show that given a 3D model of an object, we can generate its descriptor space image, which allows for supervised training of DONs.
We compare the training methods on generating 6D grasps for industrial objects and show that our novel supervised training approach improves the pick-and-place performance in industry-relevant tasks.
arXiv Detail & Related papers (2021-02-16T11:40:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.