Diffusion-Based Particle-DETR for BEV Perception
- URL: http://arxiv.org/abs/2312.11578v1
- Date: Mon, 18 Dec 2023 09:52:14 GMT
- Title: Diffusion-Based Particle-DETR for BEV Perception
- Authors: Asen Nachkov, Martin Danelljan, Danda Pani Paudel, Luc Van Gool
- Abstract summary: Bird-Eye-View (BEV) is one of the most widely-used scene representations for visual perception in Autonomous Vehicles (AVs)
Recent diffusion-based methods offer a promising approach to uncertainty modeling for visual perception but fail to effectively detect small objects in the large coverage of the BEV.
Here, we address this problem by combining the diffusion paradigm with current state-of-the-art 3D object detectors in BEV.
- Score: 94.88305708174796
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Bird-Eye-View (BEV) is one of the most widely-used scene representations
for visual perception in Autonomous Vehicles (AVs) due to its well suited
compatibility to downstream tasks. For the enhanced safety of AVs, modeling
perception uncertainty in BEV is crucial. Recent diffusion-based methods offer
a promising approach to uncertainty modeling for visual perception but fail to
effectively detect small objects in the large coverage of the BEV. Such
degradation of performance can be attributed primarily to the specific network
architectures and the matching strategy used when training. Here, we address
this problem by combining the diffusion paradigm with current state-of-the-art
3D object detectors in BEV. We analyze the unique challenges of this approach,
which do not exist with deterministic detectors, and present a simple technique
based on object query interpolation that allows the model to learn positional
dependencies even in the presence of the diffusion noise. Based on this, we
present a diffusion-based DETR model for object detection that bears
similarities to particle methods. Abundant experimentation on the NuScenes
dataset shows equal or better performance for our generative approach, compared
to deterministic state-of-the-art methods. Our source code will be made
publicly available.
Related papers
- Open-Set Deepfake Detection: A Parameter-Efficient Adaptation Method with Forgery Style Mixture [58.60915132222421]
We introduce an approach that is both general and parameter-efficient for face forgery detection.
We design a forgery-style mixture formulation that augments the diversity of forgery source domains.
We show that the designed model achieves state-of-the-art generalizability with significantly reduced trainable parameters.
arXiv Detail & Related papers (2024-08-23T01:53:36Z) - FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection [33.225938984092274]
We propose a Foreground Self-Distillation (FSD) scheme that effectively avoids the issue of distribution discrepancies.
We also design two Point Cloud Intensification ( PCI) strategies to compensate for the sparsity of point clouds.
We develop a Multi-Scale Foreground Enhancement (MSFE) module to extract and fuse multi-scale foreground features.
arXiv Detail & Related papers (2024-07-14T09:39:44Z) - Instance-aware Multi-Camera 3D Object Detection with Structural Priors
Mining and Self-Boosting Learning [93.71280187657831]
Camera-based bird-eye-view (BEV) perception paradigm has made significant progress in the autonomous driving field.
We propose IA-BEV, which integrates image-plane instance awareness into the depth estimation process within a BEV-based detector.
arXiv Detail & Related papers (2023-12-13T09:24:42Z) - Diffusion-based 3D Object Detection with Random Boxes [58.43022365393569]
Existing anchor-based 3D detection methods rely on empiricals setting of anchors, which makes the algorithms lack elegance.
Our proposed Diff3Det migrates the diffusion model to proposal generation for 3D object detection by considering the detection boxes as generative targets.
In the inference stage, the model progressively refines a set of random boxes to the prediction results.
arXiv Detail & Related papers (2023-09-05T08:49:53Z) - Unsupervised Video Anomaly Detection with Diffusion Models Conditioned
on Compact Motion Representations [17.816344808780965]
unsupervised video anomaly detection (VAD) problem involves classifying each frame in a video as normal or abnormal, without any access to labels.
To accomplish this, proposed method employs conditional diffusion models, where the input data is features extracted from pre-trained network.
Our method utilizes a data-driven threshold and considers a high reconstruction error as an indicator of anomalous events.
arXiv Detail & Related papers (2023-07-04T07:36:48Z) - CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion
Models [72.93652777646233]
Camouflaged Object Detection (COD) is a challenging task in computer vision due to the high similarity between camouflaged objects and their surroundings.
We propose a new paradigm that treats COD as a conditional mask-generation task leveraging diffusion models.
Our method, dubbed CamoDiffusion, employs the denoising process of diffusion models to iteratively reduce the noise of the mask.
arXiv Detail & Related papers (2023-05-29T07:49:44Z) - DiffBEV: Conditional Diffusion Model for Bird's Eye View Perception [14.968177102647783]
We propose an end-to-end framework, named DiffBEV, to exploit the potential of diffusion model to generate a more comprehensive BEV representation.
In practice, we design three types of conditions to guide the training of the diffusion model which denoises the coarse samples and refines the semantic feature.
We show that DiffBEV achieves a 25.9% mIoU on the nuScenes dataset, which is 6.2% higher than the best-performing existing approach.
arXiv Detail & Related papers (2023-03-15T02:42:48Z) - Cycle and Semantic Consistent Adversarial Domain Adaptation for Reducing
Simulation-to-Real Domain Shift in LiDAR Bird's Eye View [110.83289076967895]
We present a BEV domain adaptation method based on CycleGAN that uses prior semantic classification in order to preserve the information of small objects of interest during the domain adaptation process.
The quality of the generated BEVs has been evaluated using a state-of-the-art 3D object detection framework at KITTI 3D Object Detection Benchmark.
arXiv Detail & Related papers (2021-04-22T12:47:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.