SALISA: Saliency-based Input Sampling for Efficient Video Object
Detection
- URL: http://arxiv.org/abs/2204.02397v1
- Date: Tue, 5 Apr 2022 17:59:51 GMT
- Title: SALISA: Saliency-based Input Sampling for Efficient Video Object
Detection
- Authors: Babak Ehteshami Bejnordi, Amirhossein Habibian, Fatih Porikli, Amir
Ghodrati
- Abstract summary: We propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detection.
We show that SALISA significantly improves the detection of small objects.
- Score: 58.22508131162269
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: High-resolution images are widely adopted for high-performance object
detection in videos. However, processing high-resolution inputs comes with high
computation costs, and naive down-sampling of the input to reduce the
computation costs quickly degrades the detection performance. In this paper, we
propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for
video object detection that allows for heavy down-sampling of unimportant
background regions while preserving the fine-grained details of a
high-resolution image. The resulting image is spatially smaller, leading to
reduced computational costs while enabling a performance comparable to a
high-resolution input. To achieve this, we propose a differentiable resampling
module based on a thin plate spline spatial transformer network (TPS-STN). This
module is regularized by a novel loss to provide an explicit supervision signal
to learn to "magnify" salient regions. We report state-of-the-art results in
the low compute regime on the ImageNet-VID and UA-DETRAC video object detection
datasets. We demonstrate that on both datasets, the mAP of an EfficientDet-D1
(EfficientDet-D2) matches that of EfficientDet-D2 (EfficientDet-D3) at a much
lower computational cost. We also show that SALISA significantly improves the
detection of small objects. In particular, SALISA with an EfficientDet-D1
detector improves the detection of small objects by $77\%$ and, remarkably,
also outperforms the EfficientDet-D3 baseline.
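A minimal sketch of the resampling idea is given below. It is a simplified
stand-in, not the paper's implementation: the TPS-STN warp is replaced by an
axis-separable inverse-CDF sampler driven by the saliency marginals, and the
function name, blend ratio, and shapes are all assumptions.

```python
import torch
import torch.nn.functional as F

def saliency_resample(image, saliency, out_h, out_w, uniform_mix=0.3):
    """Warp `image` to (out_h, out_w) so salient regions keep more pixels.

    Axis-separable inverse-CDF sampling: row/column densities come from the
    saliency marginals, so high-saliency rows/columns are sampled densely.
    image: (B, C, H, W); saliency: non-negative (B, 1, H, W).
    """
    B, _, H, W = image.shape
    sal = saliency.clamp_min(0) + 1e-6
    sal = sal / sal.sum(dim=(2, 3), keepdim=True)
    # Blend with a uniform density so background is shrunk, never discarded.
    sal = (1 - uniform_mix) * sal + uniform_mix / (H * W)

    def axis_coords(density, n_out):
        # density: (B, L) along one axis -> (B, n_out) coords in [-1, 1]
        cdf = torch.cumsum(density, dim=1)
        cdf = cdf / cdf[:, -1:]
        q = torch.linspace(0, 1, n_out, device=density.device)
        idx = torch.searchsorted(cdf, q.expand(B, -1).contiguous())
        idx = idx.clamp(max=density.shape[1] - 1).float()
        return idx / (density.shape[1] - 1) * 2 - 1

    xs = axis_coords(sal.sum(dim=2).squeeze(1), out_w)  # column density
    ys = axis_coords(sal.sum(dim=3).squeeze(1), out_h)  # row density
    grid = torch.stack(torch.broadcast_tensors(xs[:, None, :],
                                               ys[:, :, None]), dim=-1)
    return F.grid_sample(image, grid, align_corners=True)

# Usage: downsample 512x512 to 256x256 while magnifying a salient blob.
img = torch.randn(1, 3, 512, 512)
sal = torch.zeros(1, 1, 512, 512)
sal[..., 180:260, 300:380] = 1.0          # pretend detector saliency
small = saliency_resample(img, sal, 256, 256)
```

Boxes predicted on the warped image would still need to be mapped back through
the inverse of the sampling grid before evaluation on the original frame.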
Related papers
- ESOD: Efficient Small Object Detection on High-Resolution Images [36.80623357577051]
Small objects are usually sparsely distributed and locally clustered.
Massive feature extraction computations are wasted on the non-target background area of images.
We propose to reuse the detector's backbone to conduct feature-level object-seeking and patch-slicing.
arXiv Detail & Related papers (2024-07-23T12:21:23Z)
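A toy sketch of the patch-slicing step described above; the map resolution,
patch size, and threshold are illustrative assumptions, not values from the
paper.

```python
import torch
import torch.nn.functional as F

def select_patches(objectness, patch=128, thresh=0.5, stride_to_image=8):
    """Slice only patches that plausibly contain objects.

    Sketch of feature-level object seeking: a coarse objectness map (assumed
    at 1/8 image resolution) is max-pooled per patch cell, and only cells
    whose peak exceeds `thresh` are returned as (y0, x0, y1, x1) image boxes.
    """
    cell = patch // stride_to_image          # patch size on the feature map
    pooled = F.max_pool2d(objectness[None, None], cell, cell)[0, 0]
    boxes = []
    for iy, ix in (pooled > thresh).nonzero(as_tuple=False).tolist():
        boxes.append((iy * patch, ix * patch,
                      (iy + 1) * patch, (ix + 1) * patch))
    return boxes

# Usage: a 1024x1024 image gives a 128x128 objectness map and 8x8 cells.
boxes = select_patches(torch.rand(128, 128))
```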
- DM3D: Distortion-Minimized Weight Pruning for Lossless 3D Object Detection [42.07920565812081]
We propose a novel post-training weight pruning scheme for 3D object detection.
It determines redundant parameters in the pretrained model that lead to minimal distortion in both locality and confidence.
The framework minimizes the distortion of the network's detections so as to maximally maintain detection precision.
arXiv Detail & Related papers (2024-07-02T09:33:32Z)
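A hedged sketch of distortion-aware post-training pruning on a single linear
layer; the |w| * activation-norm proxy below is a generic surrogate, not the
paper's locality/confidence distortion measure.

```python
import torch

@torch.no_grad()
def prune_by_output_distortion(layer, calib_x, sparsity=0.5):
    """Zero the weights whose removal distorts the layer output least.

    The score |w_ij| * ||x_j||_2 over a calibration batch estimates how much
    each weight contributes to the output; the lowest-scoring fraction is
    removed. layer: nn.Linear; calib_x: (N, in_features) activations.
    """
    act_norm = calib_x.norm(dim=0)                 # (in_features,)
    score = layer.weight.abs() * act_norm          # (out, in) distortion proxy
    k = max(1, int(score.numel() * sparsity))
    cutoff = score.flatten().kthvalue(k).values    # k-th smallest score
    layer.weight[score <= cutoff] = 0.0            # drop low-impact weights

# Usage on a toy layer with random calibration data.
fc = torch.nn.Linear(64, 32)
prune_by_output_distortion(fc, torch.randn(256, 64), sparsity=0.5)
print((fc.weight == 0).float().mean())             # ~0.5 sparsity
```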
- 3D Small Object Detection with Dynamic Spatial Pruning [62.72638845817799]
We propose an efficient feature pruning strategy for 3D small object detection.
We present a multi-level 3D detector named DSPDet3D which benefits from high spatial resolution.
It takes less than 2 seconds to directly process a whole building consisting of more than 4500k points while detecting almost all objects.
arXiv Detail & Related papers (2023-05-05T17:57:04Z)
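One dynamic spatial pruning step might look like the following sketch; the
scoring head and keep ratio are hypothetical, not the paper's design.

```python
import torch

def prune_points(coords, feats, scorer, keep_ratio=0.3):
    """One dynamic spatial pruning step: keep only promising points.

    A tiny per-point head scores every point and only the top fraction is
    forwarded, so deeper (more expensive) layers see far fewer points.
    coords: (N, 3) point coordinates; feats: (N, C) point features.
    """
    scores = scorer(feats).squeeze(-1)          # (N,) foreground-ness
    k = max(1, int(keep_ratio * feats.shape[0]))
    keep = scores.topk(k).indices               # indices of retained points
    return coords[keep], feats[keep]

# Usage: shrink 10k points to 3k before the next detector stage.
coords, feats = torch.randn(10_000, 3), torch.randn(10_000, 64)
coords, feats = prune_points(coords, feats, torch.nn.Linear(64, 1))
```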
- Efficient Decoder-free Object Detection with Transformers [75.00499377197475]
Vision transformers (ViTs) are changing the landscape of object detection approaches.
We propose a decoder-free fully transformer-based (DFFT) object detector.
DFFT_SMALL achieves high efficiency in both training and inference stages.
arXiv Detail & Related papers (2022-06-14T13:22:19Z)
- TRACER: Extreme Attention Guided Salient Object Tracing Network [3.2434811678562676]
We propose TRACER, which detects salient objects with explicit edges by incorporating attention guided tracing modules.
A comparison with 13 existing methods reveals that TRACER achieves state-of-the-art performance on five benchmark datasets.
arXiv Detail & Related papers (2021-12-14T13:20:07Z)
- QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection [17.775203579232144]
We propose a novel query mechanism to accelerate the inference speed of feature-pyramid based object detectors.
The pipeline first predicts the coarse locations of small objects on low-resolution features and then computes the accurate detection results using high-resolution features.
On the popular COCO dataset, the proposed method improves the detection mAP by 1.0 and mAP-small by 2.0, while high-resolution inference is accelerated by 3.0x on average.
arXiv Detail & Related papers (2021-03-16T15:30:20Z)
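The coarse-to-fine query pipeline described above can be sketched as follows;
the paper implements the cascaded sparse query with sparse convolution,
whereas plain indexing and placeholder heads are used here.

```python
import torch

def cascaded_sparse_query(low_res, high_res, coarse_head, fine_head,
                          thresh=0.5):
    """Coarse-to-fine query cascade for small objects.

    Likely small-object cells are found on the cheap low-resolution map, and
    the expensive head runs only on the matching high-resolution positions.
    low_res: (C, h, w); high_res: (C, 2h, 2w) feature maps of one image.
    """
    heat = coarse_head(low_res[None])[0, 0].sigmoid()    # (h, w) scores
    ys, xs = (heat > thresh).nonzero(as_tuple=True)      # query cells
    # Each low-res cell maps to a 2x2 block on the high-res map.
    queries = [high_res[:, 2 * y:2 * y + 2, 2 * x:2 * x + 2].reshape(-1)
               for y, x in zip(ys.tolist(), xs.tolist())]
    if not queries:
        return torch.empty(0, 5)
    return fine_head(torch.stack(queries))               # one row per query

# Usage with toy maps: 4 box offsets + 1 score per queried cell.
C = 16
low, high = torch.randn(C, 20, 20), torch.randn(C, 40, 40)
out = cascaded_sparse_query(low, high,
                            torch.nn.Conv2d(C, 1, 1),
                            torch.nn.Linear(4 * C, 5))
```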
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first, depth estimation is performed and a pseudo-LiDAR point cloud representation is computed from the depth estimates; then, object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
- Reinforced Axial Refinement Network for Monocular 3D Object Detection [160.34246529816085]
Monocular 3D object detection aims to extract the 3D position and properties of objects from a 2D input image.
Conventional approaches sample 3D bounding boxes from the space and infer the relationship between the target object and each of them; however, the probability of drawing effective samples in 3D space is relatively small.
We propose to start with an initial prediction and refine it gradually towards the ground truth, with only one 3D parameter changed in each step.
This requires a policy that receives a reward only after several steps, so we adopt reinforcement learning to optimize it.
arXiv Detail & Related papers (2020-08-31T17:10:48Z)
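A minimal sketch of the step-wise refinement at inference time; the feature
size, step size, and greedy action selection are assumptions (the paper
trains the policy with reinforcement learning on a delayed reward).

```python
import torch

def refine_box(feat, box, policy, steps=20, delta=0.1):
    """Greedy inference for step-wise 3D box refinement.

    Each step changes exactly ONE of the 7 box parameters
    (x, y, z, w, h, l, yaw) by +/- delta, walking an initial prediction
    toward the ground truth. `policy` is a placeholder network over the
    14 (parameter, direction) actions.
    """
    box = box.clone()
    for _ in range(steps):
        logits = policy(torch.cat([feat, box]))
        param, direction = divmod(logits.argmax().item(), 2)
        box[param] += delta if direction == 0 else -delta
    return box

# Usage with a toy image feature and an untrained linear policy.
policy = torch.nn.Linear(32 + 7, 14)
refined = refine_box(torch.randn(32), torch.zeros(7), policy)
```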
- Spatial-Spectral Residual Network for Hyperspectral Image Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet).
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and spectral separable 3D convolutions to extract spatial and spectral information, which not only reduces otherwise unaffordable memory usage and computational cost, but also makes the network easier to train.
arXiv Detail & Related papers (2020-01-14T03:34:55Z)
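The separable 3D convolution mentioned in the SSRNet summary above can be
sketched directly; channel counts below are illustrative, not the paper's
configuration.

```python
import torch
import torch.nn as nn

class SeparableConv3d(nn.Module):
    """Spatial/spectral separable 3D convolution for hyperspectral cubes.

    A full k*k*k kernel is factorized into a 1*k*k spatial convolution
    followed by a k*1*1 spectral convolution, cutting per-channel-pair
    weights from k^3 to k^2 + k.
    """
    def __init__(self, channels, k=3):
        super().__init__()
        self.spatial = nn.Conv3d(channels, channels, (1, k, k),
                                 padding=(0, k // 2, k // 2))
        self.spectral = nn.Conv3d(channels, channels, (k, 1, 1),
                                  padding=(k // 2, 0, 0))

    def forward(self, x):            # x: (B, C, bands, H, W)
        return self.spectral(torch.relu(self.spatial(x)))

# Usage: a 31-band hyperspectral patch keeps its shape.
y = SeparableConv3d(8)(torch.randn(1, 8, 31, 32, 32))
print(y.shape)                       # torch.Size([1, 8, 31, 32, 32])
```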
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information shown and is not responsible for any consequences of its use.