Modality-Buffet for Real-Time Object Detection
- URL: http://arxiv.org/abs/2011.08726v1
- Date: Tue, 17 Nov 2020 15:57:06 GMT
- Title: Modality-Buffet for Real-Time Object Detection
- Authors: Nicolai Dorka, Johannes Meyer, Wolfram Burgard
- Abstract summary: Real-time object detection in videos using lightweight hardware is a crucial component of many robotic tasks.
One option is to have a very lightweight model that can predict from all modalities at once for each frame.
We formulate this task as a sequential decision making problem and use reinforcement learning (RL) to generate a policy that decides from the RGB input which detector out of a portfolio of different object detectors to take for the next prediction.
- Score: 25.89199578900324
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Real-time object detection in videos using lightweight hardware is a crucial
component of many robotic tasks. Detectors using different modalities and with
varying computational complexities offer different trade-offs. One option is to
have a very lightweight model that can predict from all modalities at once for
each frame. However, in some situations (e.g., in static scenes) it might be
better to have a more complex but more accurate model and to extrapolate from
previous predictions for the frames coming in at processing time. We formulate
this task as a sequential decision making problem and use reinforcement
learning (RL) to generate a policy that decides from the RGB input which
detector out of a portfolio of different object detectors to take for the next
prediction. The objective of the RL agent is to maximize the accuracy of the
predictions per image. We evaluate the approach on the Waymo Open Dataset and
show that it exceeds the performance of each single detector.
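The trade-off described in the abstract can be shown with a toy simulation. A slow but accurate detector covers several incoming frames per call, and those frames are served by extrapolated, increasingly stale predictions. A fast detector covers only one frame. This is a minimal sketch built entirely on assumptions:

- The two-detector portfolio and its accuracy numbers are invented.
- Accuracy is assumed to decay linearly with staleness.
- The state is a hand-labelled scene type ("static" / "dynamic"), whereas the paper's policy decides from the RGB input.
- The learner is an epsilon-greedy contextual bandit, a one-step simplification of the RL formulation, not the authors' method.

All names (`Detector`, `frame_rewards`, `train_policy`) are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Detector:
    name: str
    latency: int     # frames covered by one call (frames arriving while it runs)
    accuracy: float  # accuracy of a fresh, non-extrapolated prediction


# Hypothetical portfolio; the numbers are invented for illustration only.
PORTFOLIO = [
    Detector("fast", latency=1, accuracy=0.60),
    Detector("accurate", latency=4, accuracy=0.90),
]


def frame_rewards(det: Detector, motion: float) -> list[float]:
    """Per-frame accuracy for the frames covered by one call of `det`.

    Toy assumption: the k-th covered frame is served by a prediction that is
    k frames stale (extrapolated), and each frame of staleness costs `motion`.
    """
    return [max(0.0, det.accuracy - motion * k) for k in range(det.latency)]


def train_policy(scene_motion: dict[str, float], episodes: int = 500,
                 epsilon: float = 0.1, seed: int = 0) -> dict[str, str]:
    """Epsilon-greedy contextual bandit: learn which detector to pick per scene type."""
    rng = random.Random(seed)
    states = list(scene_motion)
    q = {(s, d.name): 0.0 for s in states for d in PORTFOLIO}  # estimated mean reward
    n = {key: 0 for key in q}                                  # visit counts
    for _ in range(episodes):
        state = rng.choice(states)
        if rng.random() < epsilon:
            det = rng.choice(PORTFOLIO)                              # explore
        else:
            det = max(PORTFOLIO, key=lambda d: q[(state, d.name)])  # exploit
        rewards = frame_rewards(det, scene_motion[state])
        reward = sum(rewards) / len(rewards)  # mean per-frame accuracy of this call
        key = (state, det.name)
        n[key] += 1
        q[key] += (reward - q[key]) / n[key]  # incremental mean update
    return {s: max(PORTFOLIO, key=lambda d: q[(s, d.name)]).name for s in states}


if __name__ == "__main__":
    print(train_policy({"static": 0.0, "dynamic": 0.25}))
```

With these invented numbers the learned policy should pick `accurate` for static scenes (mean accuracy 0.9 vs 0.6) and `fast` for dynamic ones (0.525 vs 0.6), mirroring the abstract's intuition. Averaging per call is a simplification. In a full sequential formulation over a fixed-length video, summing per-frame rewards would account for the different latencies directly.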
Related papers
- CorrDiff: Adaptive Delay-aware Detector with Temporal Cue Inputs for Real-time Object Detection [11.714072240331518]
CorrDiff is designed to tackle the challenge of delays in real-time detection systems.
It is able to utilize runtime-estimated temporal cues to predict objects' locations for multiple future frames.
It meets the stringent real-time processing requirements on all kinds of devices.
arXiv Detail & Related papers (2025-01-09T10:34:25Z)
- Practical Video Object Detection via Feature Selection and Aggregation [18.15061460125668]
Video object detection (VOD) must handle the high cross-frame variation in object appearance and the diverse deterioration in some frames.
Most contemporary aggregation methods are tailored to two-stage detectors and suffer from high computational costs.
This study proposes a very simple yet potent strategy of feature selection and aggregation that gains significant accuracy at marginal computational expense.
arXiv Detail & Related papers (2024-07-29T02:12:11Z)
- XTrack: Multimodal Training Boosts RGB-X Video Object Trackers [88.72203975896558]
It is crucial to ensure that knowledge gained from multimodal sensing is effectively shared.
Samples that are similar across modalities have more knowledge to share than dissimilar ones.
We propose a method for RGB-X trackers during inference, with an average +3% precision improvement over the current SOTA.
arXiv Detail & Related papers (2024-05-28T03:00:58Z)
- PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest [65.48057241587398]
PoIFusion is a framework to fuse information of RGB images and LiDAR point clouds at the points of interest (PoIs).
Our approach maintains the view of each modality and obtains multi-modal features by computation-friendly projection and computation.
We conducted extensive experiments on nuScenes and Argoverse2 datasets to evaluate our approach.
arXiv Detail & Related papers (2024-03-14T09:28:12Z)
- DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion [82.2425759608975]
Infrared-visible object detection aims to achieve robust, even full-day, object detection by fusing the complementary information of infrared and visible images.
We propose a Dynamic Adaptive Multispectral Detection Transformer (DAMSDet) to address these two challenges.
Experiments on four public datasets demonstrate significant improvements compared to other state-of-the-art methods.
arXiv Detail & Related papers (2024-03-01T07:03:27Z)
- Bi-directional Adapter for Multi-modal Tracking [67.01179868400229]
We propose a novel multi-modal visual prompt tracking model based on a universal bi-directional adapter.
We develop a simple but effective light feature adapter to transfer modality-specific information from one modality to another.
Our model achieves superior tracking performance in comparison with both the full fine-tuning methods and the prompt learning-based methods.
arXiv Detail & Related papers (2023-12-17T05:27:31Z)
- Identifying Light-curve Signals with a Deep Learning Based Object Detection Algorithm. II. A General Light Curve Classification Framework [0.0]
We present a novel deep learning framework for classifying light curves using a weakly supervised object detection model.
Our framework identifies the optimal windows for both light curves and power spectra automatically, and zooms in on their corresponding data.
We train our model on datasets obtained from both space-based and ground-based multi-band observations of variable stars and transients.
arXiv Detail & Related papers (2023-11-14T11:08:34Z)
- 3D Video Object Detection with Learnable Object-Centric Global Optimization [65.68977894460222]
Correspondence-based optimization is the cornerstone for 3D scene reconstruction but is less studied in 3D video object detection.
We propose BA-Det, an end-to-end optimizable object detector with object-centric temporal correspondence learning and featuremetric object bundle adjustment.
arXiv Detail & Related papers (2023-03-27T17:39:39Z)
- 2nd Place Solution for Waymo Open Dataset Challenge - Real-time 2D Object Detection [26.086623067939605]
In this report, we introduce a real-time method to detect the 2D objects from images.
We leverage TensorRT to optimize the inference time of our detection pipeline.
Our framework achieves a latency of 45.8 ms/frame on an Nvidia Tesla V100 GPU.
arXiv Detail & Related papers (2021-06-16T11:32:03Z)
- RMOPP: Robust Multi-Objective Post-Processing for Effective Object Detection [0.0]
RMOPP is a statistically driven, post-processing algorithm that allows for simultaneous optimization of precision and recall.
We provide a compelling test case on YOLOv2 using the MS-COCO dataset.
arXiv Detail & Related papers (2021-02-09T00:02:38Z)
- End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
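The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture.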
arXiv Detail & Related papers (2020-05-26T17:06:38Z)
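The set-based loss in the DETR summary rests on a one-to-one bipartite matching between predictions and ground-truth objects. Each target is assigned to exactly one prediction, and leftover predictions are trained as "no object". This is a minimal sketch of that matching step under simplifying assumptions:

- DETR solves the assignment with the Hungarian algorithm. In Python, `scipy.optimize.linear_sum_assignment` is the usual solver. This sketch instead uses exhaustive search over permutations, which is only practical for a handful of objects.
- DETR's matching cost also includes a generalized-IoU term. This sketch keeps only the class-probability and L1 box terms.

The names `match_cost` and `bipartite_match` are hypothetical.

```python
from itertools import permutations

Box = tuple[float, float, float, float]    # (cx, cy, w, h), normalized coordinates
Prediction = tuple[dict[str, float], Box]  # (class probabilities, box)
Target = tuple[str, Box]                   # (class label, box)


def match_cost(probs: dict[str, float], pred_box: Box,
               label: str, gt_box: Box, box_weight: float = 1.0) -> float:
    """Lower is better: favor high probability for the true class, penalize L1 box error."""
    l1 = sum(abs(p - g) for p, g in zip(pred_box, gt_box))
    return -probs.get(label, 0.0) + box_weight * l1


def bipartite_match(preds: list[Prediction], targets: list[Target]) -> list[tuple[int, int]]:
    """Minimum-cost one-to-one assignment of targets to predictions (exhaustive search).

    Returns (prediction index, target index) pairs ordered by target index.
    Unmatched predictions would be supervised as "no object".
    """
    if len(preds) < len(targets):
        raise ValueError("need at least as many predictions as targets")
    best_perm, best_cost = None, float("inf")
    # perm[t] is the index of the prediction assigned to target t
    for perm in permutations(range(len(preds)), len(targets)):
        cost = sum(match_cost(*preds[p], *targets[t]) for t, p in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return [(p, t) for t, p in enumerate(best_perm)]


if __name__ == "__main__":
    preds = [
        ({"car": 0.9, "person": 0.05}, (0.2, 0.2, 0.1, 0.1)),
        ({"car": 0.1, "person": 0.8}, (0.7, 0.6, 0.05, 0.2)),
        ({"car": 0.2, "person": 0.1}, (0.5, 0.5, 0.3, 0.3)),
    ]
    targets = [("person", (0.7, 0.62, 0.05, 0.2)), ("car", (0.21, 0.2, 0.1, 0.1))]
    print(bipartite_match(preds, targets))  # [(1, 0), (0, 1)]; prediction 2 is "no object"
```

Because the matching is one-to-one, duplicate predictions of the same object cannot both be rewarded. This is what lets DETR drop hand-designed steps such as non-maximum suppression.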