Efficient Transformer-based 3D Object Detection with Dynamic Token Halting
- URL: http://arxiv.org/abs/2303.05078v2
- Date: Wed, 11 Oct 2023 17:46:03 GMT
- Title: Efficient Transformer-based 3D Object Detection with Dynamic Token Halting
- Authors: Mao Ye, Gregory P. Meyer, Yuning Chai, Qiang Liu
- Abstract summary: We propose an effective approach for accelerating transformer-based 3D object detectors by dynamically halting tokens at different layers.
Although halting a token is a non-differentiable operation, our method allows for differentiable end-to-end learning.
Our framework allows halted tokens to be reused to inform the model's predictions through a straightforward token recycling mechanism.
- Score: 19.88560740238657
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Balancing efficiency and accuracy is a long-standing problem for deploying
deep learning models. The trade-off is even more important for real-time
safety-critical systems like autonomous vehicles. In this paper, we propose an
effective approach for accelerating transformer-based 3D object detectors by
dynamically halting tokens at different layers depending on their contribution
to the detection task. Although halting a token is a non-differentiable
operation, our method allows for differentiable end-to-end learning by
leveraging an equivalent differentiable forward-pass. Furthermore, our
framework allows halted tokens to be reused to inform the model's predictions
through a straightforward token recycling mechanism. Our method significantly
improves the Pareto frontier of efficiency versus accuracy when compared with
the existing approaches. By halting tokens and increasing model capacity, we
are able to improve the baseline model's performance without increasing the
model's latency on the Waymo Open Dataset.
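The abstract describes the mechanism only at a high level. As a rough illustration, the sketch below shows one way a per-layer halting decision and token recycling could be wired up in PyTorch; the module names, the straight-through relaxation, and the recycling helper are assumptions for illustration, not the authors' implementation.
```python
import torch
import torch.nn as nn


class HaltingLayer(nn.Module):
    """One transformer layer with a learned per-token halting score (illustrative only)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.halt_head = nn.Linear(dim, 1)  # scores each token's contribution

    def forward(self, tokens, keep_mask):
        # Halted tokens are excluded from attention via the key padding mask.
        x, _ = self.attn(tokens, tokens, tokens, key_padding_mask=~keep_mask)
        tokens = tokens + x
        tokens = tokens + self.ffn(tokens)
        # Hard halting is non-differentiable; a straight-through estimator lets
        # gradients flow through the soft score (a stand-in for the paper's
        # equivalent differentiable forward pass).
        score = torch.sigmoid(self.halt_head(tokens)).squeeze(-1)   # (B, N)
        hard_keep = (score > 0.5).float()
        soft_keep = hard_keep + score - score.detach()
        keep_mask = keep_mask & hard_keep.bool()
        return tokens * soft_keep.unsqueeze(-1), keep_mask


def recycle_halted(tokens, keep_mask, bank):
    """Hypothetical recycling helper: halted tokens are stored so the detection
    head can still read their features instead of discarding them."""
    bank.append(tokens[~keep_mask])
    return bank
```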
Related papers
- MOREL: Enhancing Adversarial Robustness through Multi-Objective Representation Learning [1.534667887016089]
Deep neural networks (DNNs) are vulnerable to slight adversarial perturbations.
We show that strong feature representation learning during training can significantly enhance the original model's robustness.
We propose MOREL, a multi-objective feature representation learning approach, encouraging classification models to produce similar features for inputs within the same class, despite perturbations.
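As a rough sketch of that idea (not MOREL's actual objective), a training loss can combine classification with a term that pulls together the features produced for clean and perturbed inputs of the same class; the cosine-similarity choice and the weighting below are assumptions.
```python
import torch.nn.functional as F


def robust_representation_loss(model, x_clean, x_adv, labels, alpha: float = 0.5):
    """Illustrative multi-objective loss: classify both views correctly while
    aligning their feature representations. `model` is assumed to return
    (features, logits); `alpha` is a made-up trade-off weight."""
    feat_clean, logits_clean = model(x_clean)
    feat_adv, logits_adv = model(x_adv)
    ce = F.cross_entropy(logits_clean, labels) + F.cross_entropy(logits_adv, labels)
    align = 1.0 - F.cosine_similarity(feat_clean, feat_adv, dim=-1).mean()
    return ce + alpha * align
```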
arXiv Detail & Related papers (2024-10-02T16:05:03Z)
- Distribution Discrepancy and Feature Heterogeneity for Active 3D Object Detection [18.285299184361598]
LiDAR-based 3D object detection is a critical technology for the development of autonomous driving and robotics.
We propose a novel and effective active learning (AL) method called Distribution Discrepancy and Feature Heterogeneity (DDFH)
It simultaneously considers geometric features and model embeddings, assessing information from both the instance-level and frame-level perspectives.
arXiv Detail & Related papers (2024-09-09T08:26:11Z)
- Efficient Point Transformer with Dynamic Token Aggregating for Point Cloud Processing [19.73918716354272]
We propose an efficient point TransFormer with Dynamic Token Aggregating (DTA-Former) for point cloud representation and processing.
It achieves SOTA performance while being up to 30× faster than prior point Transformers on the ModelNet40, ShapeNet, and airborne MultiSpectral LiDAR (MS-LiDAR) datasets.
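The summary does not spell out the aggregation operator, so the snippet below is only a generic token-merging sketch in the same spirit: reduce N point tokens to K by assigning each token to its most similar "anchor" token and averaging. The importance proxy and assignment rule are assumptions.
```python
import torch
import torch.nn.functional as F


def aggregate_tokens(tokens: torch.Tensor, num_kept: int) -> torch.Tensor:
    """Generic token-aggregation sketch (not DTA-Former's operator).
    tokens: (N, D); returns (num_kept, D) merged tokens."""
    scores = tokens.norm(dim=-1)                          # crude importance proxy
    anchors = tokens[scores.topk(num_kept).indices]       # (K, D)
    sim = F.normalize(tokens, dim=-1) @ F.normalize(anchors, dim=-1).T  # (N, K)
    assign = sim.argmax(dim=-1)                           # nearest anchor per token
    merged = torch.zeros_like(anchors)
    counts = torch.zeros(num_kept, 1)
    merged.index_add_(0, assign, tokens)
    counts.index_add_(0, assign, torch.ones(tokens.size(0), 1))
    return merged / counts.clamp(min=1.0)
```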
arXiv Detail & Related papers (2024-05-23T20:50:50Z)
- Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features [69.47588461101925]
We propose a method to adapt 3D object detectors to new driving environments.
Our approach enhances LiDAR-based detection models using spatially quantized historical features.
Experiments on real-world datasets demonstrate significant improvements.
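The summary leaves the feature construction implicit; below is a minimal sketch of the "spatially quantized historical features" idea, assuming past-traversal points are binned into voxels and a simple per-voxel statistic (here a hit count) is appended to the current scan's point features. The voxel size and statistic are assumptions.
```python
import torch


def build_historical_grid(past_points: torch.Tensor, voxel_size: float = 0.5) -> dict:
    """Quantize past-traversal points (M, 3+) into voxels and store a hit count per cell."""
    coords = torch.floor(past_points[:, :3] / voxel_size).long()
    keys, counts = torch.unique(coords, dim=0, return_counts=True)
    return {tuple(k.tolist()): int(c) for k, c in zip(keys, counts)}


def append_historical_feature(points: torch.Tensor, grid: dict, voxel_size: float = 0.5):
    """Look up each current point's voxel in the historical grid and append the statistic."""
    coords = torch.floor(points[:, :3] / voxel_size).long()
    hist = torch.tensor([[grid.get(tuple(c.tolist()), 0)] for c in coords],
                        dtype=points.dtype)
    return torch.cat([points, hist], dim=-1)
```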
arXiv Detail & Related papers (2023-09-21T15:00:31Z)
- Value function estimation using conditional diffusion models for control [62.27184818047923]
We propose a simple algorithm called Diffused Value Function (DVF).
It learns a joint multi-step model of the environment-robot interaction dynamics using a diffusion model.
We show how DVF can be used to efficiently capture the state visitation measure for multiple controllers.
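Hedged sketch only: if a generative model (a diffusion model in DVF's case) can sample states several steps ahead under a given controller, a value estimate is a discounted average of rewards over those samples. The `future_model` and `reward_fn` interfaces below are hypothetical, not DVF's API.
```python
def diffused_value_estimate(state, future_model, reward_fn,
                            horizon: int = 16, num_samples: int = 64,
                            gamma: float = 0.99) -> float:
    """Monte Carlo value estimate from a learned multi-step generative model.
    future_model.sample(state, k, n) is assumed to return n sampled states k
    steps ahead; reward_fn maps a batch of states to a tensor of rewards."""
    value = 0.0
    for k in range(1, horizon + 1):
        states_k = future_model.sample(state, k, num_samples)   # hypothetical API
        value += (gamma ** k) * float(reward_fn(states_k).mean())
    return value
```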
arXiv Detail & Related papers (2023-06-09T18:40:55Z)
- Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep.
We demonstrate the efficacy of our framework in reducing compute -- a potential speedup of up to 3× -- while provably maintaining high performance.
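A toy sketch of confidence-based early exiting in that spirit (not CALM's calibrated criterion): decode a tentative token after each layer and stop once the softmax confidence clears a threshold. The single-sequence loop and the 0.9 threshold are assumptions.
```python
import torch


def decode_with_early_exit(layers, lm_head, hidden, threshold: float = 0.9):
    """hidden: (1, seq_len, dim) for a single sequence. Returns (token id, layers used)."""
    token, depth = None, 0
    for depth, layer in enumerate(layers, start=1):
        hidden = layer(hidden)
        probs = torch.softmax(lm_head(hidden[:, -1]), dim=-1)
        conf, token = probs.max(dim=-1)
        if conf.item() >= threshold:        # confident enough: skip remaining layers
            break
    return token, depth
```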
arXiv Detail & Related papers (2022-07-14T17:00:19Z)
- Efficient Few-Shot Object Detection via Knowledge Inheritance [62.36414544915032]
Few-shot object detection (FSOD) aims at learning a generic detector that can adapt to unseen tasks with scarce training samples.
We present an efficient pretrain-transfer framework (PTF) baseline with no computational increment.
We also propose an adaptive length re-scaling (ALR) strategy to alleviate the vector length inconsistency between the predicted novel weights and the pretrained base weights.
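A minimal sketch of the length re-scaling idea (the exact ALR rule is not given in the summary, so the target statistic below is an assumption): rescale the predicted novel-class weight vectors so their norms match the average norm of the pretrained base-class weights.
```python
import torch


def adaptive_length_rescale(novel_w: torch.Tensor, base_w: torch.Tensor) -> torch.Tensor:
    """novel_w: (num_novel, D) predicted weights; base_w: (num_base, D) pretrained weights."""
    target_len = base_w.norm(dim=-1).mean()               # typical base-class vector length
    current_len = novel_w.norm(dim=-1, keepdim=True)      # per-novel-class vector length
    return novel_w * (target_len / current_len.clamp(min=1e-8))
```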
arXiv Detail & Related papers (2022-03-23T06:24:31Z)
- Dynamic Iterative Refinement for Efficient 3D Hand Pose Estimation [87.54604263202941]
We propose a tiny deep neural network of which partial layers are iteratively exploited for refining its previous estimations.
We employ learned gating criteria to decide whether to exit from the weight-sharing loop, allowing per-sample adaptation in our model.
Our method consistently outperforms state-of-the-art 2D/3D hand pose estimation approaches in terms of both accuracy and efficiency for widely used benchmarks.
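An illustrative sketch of the weight-sharing refinement loop with a learned exit gate (the module sizes and 0.5 gate threshold are assumptions, not the paper's architecture):
```python
import torch
import torch.nn as nn


class IterativeRefiner(nn.Module):
    """Repeatedly refine an estimate with one shared block; a gate decides when to exit."""

    def __init__(self, dim: int, max_iters: int = 4):
        super().__init__()
        self.block = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.gate = nn.Linear(dim, 1)
        self.max_iters = max_iters

    def forward(self, feat: torch.Tensor):
        used = 0
        for used in range(1, self.max_iters + 1):
            feat = feat + self.block(feat)                    # shared-weight refinement step
            if torch.sigmoid(self.gate(feat)).mean() > 0.5:   # learned exit criterion
                break
        return feat, used
```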
arXiv Detail & Related papers (2021-11-11T23:31:34Z)
- Efficient Transformer based Method for Remote Sensing Image Change Detection [17.553240434628087]
High-resolution remote sensing change detection (CD) remains challenging due to the complexity of objects in the scene.
We propose a bitemporal image transformer (BiT) to efficiently and effectively model contexts within the spatial-temporal domain.
The BiT-based model significantly outperforms the purely convolutional baseline while using about 3 times fewer computations and model parameters.
arXiv Detail & Related papers (2021-02-27T13:08:46Z)
- Data-efficient Weakly-supervised Learning for On-line Object Detection under Domain Shift in Robotics [24.878465999976594]
Several object detection methods have been proposed in the literature, the vast majority based on Deep Convolutional Neural Networks (DCNNs).
These methods have important limitations for robotics: learning solely on off-line data may introduce biases and prevent adaptation to novel tasks.
In this work, we investigate how weakly-supervised learning can cope with these problems.
arXiv Detail & Related papers (2020-12-28T16:36:11Z)
- Tracking Performance of Online Stochastic Learners [57.14673504239551]
Online algorithms are popular in large-scale learning settings due to their ability to compute updates on the fly, without the need to store and process data in large batches.
When a constant step-size is used, these algorithms also have the ability to adapt to drifts in problem parameters, such as data or model properties, and track the optimal solution with reasonable accuracy.
We establish a link between steady-state performance derived under stationarity assumptions and the tracking performance of online learners under random walk models.
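A toy experiment making the tracking claim concrete (the quadratic loss, drift scale, and step size are illustrative assumptions): a constant step-size learner follows a randomly drifting optimum with a bounded steady-state error.
```python
import torch


def average_tracking_error(steps: int = 2000, mu: float = 0.1, drift: float = 0.01) -> float:
    """Constant step-size SGD on 0.5 * ||w - w*||^2 while w* follows a random walk."""
    w_star = torch.zeros(2)       # drifting optimum
    w = torch.zeros(2)            # online learner's iterate
    errors = []
    for _ in range(steps):
        w_star = w_star + drift * torch.randn(2)     # random-walk drift of the optimum
        grad = (w - w_star) + 0.05 * torch.randn(2)  # noisy gradient
        w = w - mu * grad                            # constant step-size update
        errors.append((w - w_star).norm().item())
    return sum(errors[steps // 2:]) / (steps - steps // 2)  # steady-state average error
```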
arXiv Detail & Related papers (2020-04-04T14:16:27Z)