Think Twice before Driving: Towards Scalable Decoders for End-to-End
Autonomous Driving
- URL: http://arxiv.org/abs/2305.06242v1
- Date: Wed, 10 May 2023 15:22:02 GMT
- Title: Think Twice before Driving: Towards Scalable Decoders for End-to-End
Autonomous Driving
- Authors: Xiaosong Jia, Penghao Wu, Li Chen, Jiangwei Xie, Conghui He, Junchi
Yan, Hongyang Li
- Abstract summary: Existing methods usually adopt the decoupled encoder-decoder paradigm.
In this work, we aim to alleviate the problem by two principles.
We first predict a coarse-grained future position and action based on the encoder features.
Then, conditioned on the position and action, the future scene is imagined to check the ramifications of driving accordingly.
- Score: 74.28510044056706
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: End-to-end autonomous driving has made impressive progress in recent years.
Existing methods usually adopt the decoupled encoder-decoder paradigm, where
the encoder extracts hidden features from raw sensor data, and the decoder
outputs the ego-vehicle's future trajectories or actions. Under such a
paradigm, the encoder does not have access to the intended behavior of the ego
agent, leaving the burden of finding out safety-critical regions from the
massive receptive field and inferring about future situations to the decoder.
Even worse, the decoder is usually composed of several simple multi-layer
perceptrons (MLPs) or GRUs, while the encoder is delicately designed (e.g., a
combination of heavy ResNets or Transformers). Such an imbalanced resource-task
division hampers the learning process.
In this work, we aim to alleviate the aforementioned problem by two
principles: (1) fully utilizing the capacity of the encoder; (2) increasing the
capacity of the decoder. Concretely, we first predict a coarse-grained future
position and action based on the encoder features. Then, conditioned on the
position and action, the future scene is imagined to check the ramifications of
driving accordingly. We also retrieve the encoder features around the
predicted coordinate to obtain fine-grained information about the
safety-critical region. Finally, based on the predicted future and the
retrieved salient features, we refine the coarse-grained position and action by
predicting their offset from the ground truth. The above refinement module can be
stacked in a cascaded fashion, which extends the capacity of the decoder with
spatial-temporal prior knowledge about the conditioned future. We conduct
experiments on the CARLA simulator and achieve state-of-the-art performance in
closed-loop benchmarks. Extensive ablation studies demonstrate the
effectiveness of each proposed module.
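To make the coarse-then-refine decoder described above concrete, here is a minimal, hypothetical PyTorch-style sketch of such a cascade. All module names, feature shapes, and the grid-sample feature retrieval are assumptions made for illustration; this is not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RefinementBlock(nn.Module):
    """Imagine the future conditioned on the current plan, peek at the planned
    locations in the BEV feature map, then predict an offset to the plan."""

    def __init__(self, channels, horizon):
        super().__init__()
        # "Imagination": condition a global scene feature on the current plan.
        self.imagine = nn.Sequential(
            nn.Linear(channels + horizon * 2, 256), nn.ReLU(),
            nn.Linear(256, 256),
        )
        # Offset head: fuses the imagined feature with retrieved local features.
        self.offset_head = nn.Sequential(
            nn.Linear(256 + horizon * channels, 256), nn.ReLU(),
            nn.Linear(256, horizon * 2),
        )

    def forward(self, bev_feat, waypoints):
        b, c, _, _ = bev_feat.shape
        t = waypoints.shape[1]
        pooled = bev_feat.mean(dim=(2, 3))                              # (B, C)
        imagined = self.imagine(torch.cat([pooled, waypoints.flatten(1)], dim=1))
        # Retrieve fine-grained features around each predicted coordinate
        # (waypoints assumed to lie in the [-1, 1] grid_sample convention).
        grid = waypoints.view(b, t, 1, 2)
        local = F.grid_sample(bev_feat, grid, align_corners=True)       # (B, C, T, 1)
        local = local.squeeze(-1).transpose(1, 2).flatten(1)            # (B, T*C)
        return self.offset_head(torch.cat([imagined, local], dim=1)).view(b, t, 2)


class CoarseToFineDecoder(nn.Module):
    """Illustrative decoder: coarse prediction, then stacked look-ahead refinements.

    Hypothetical shapes: bev_feat is an encoder BEV feature map (B, C, H, W);
    waypoints are (B, T, 2) future ego positions in normalized BEV coordinates.
    """

    def __init__(self, channels=256, horizon=4, num_refinements=3):
        super().__init__()
        self.horizon = horizon
        # Coarse head: global pooling -> MLP -> T future (x, y) waypoints.
        self.coarse_head = nn.Sequential(
            nn.Linear(channels, 256), nn.ReLU(),
            nn.Linear(256, horizon * 2),
        )
        # One refinement stage per "think twice" step (weights not shared).
        self.refine_blocks = nn.ModuleList(
            RefinementBlock(channels, horizon) for _ in range(num_refinements)
        )

    def forward(self, bev_feat):
        b = bev_feat.shape[0]
        pooled = bev_feat.mean(dim=(2, 3))                              # (B, C)
        waypoints = self.coarse_head(pooled).view(b, self.horizon, 2)
        outputs = [waypoints]
        for block in self.refine_blocks:
            waypoints = waypoints + block(bev_feat, waypoints)          # predicted offset
            outputs.append(waypoints)
        return outputs  # every stage can be supervised against the ground truth
```

Each refinement stage consumes the plan produced by the previous one, so stacking more stages deepens the decoder while every stage keeps a spatial-temporal prior about where the current plan would take the vehicle.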
Related papers
- Learning Linear Block Error Correction Codes [62.25533750469467]
We propose for the first time a unified encoder-decoder training of binary linear block codes.
We also propose a novel Transformer model in which the self-attention masking is performed in a differentiable fashion for the efficient backpropagation of the code gradient.
arXiv Detail & Related papers (2024-05-07T06:47:12Z)
- Take an Irregular Route: Enhance the Decoder of Time-Series Forecasting Transformer [9.281993269355544]
We propose FPPformer to utilize bottom-up and top-down architectures in the encoder and decoder to build a full and rational hierarchy.
Extensive experiments with six state-of-the-art benchmarks verify the promising performance of FPPformer.
arXiv Detail & Related papers (2023-12-10T06:50:56Z)
- DEED: Dynamic Early Exit on Decoder for Accelerating Encoder-Decoder Transformer Models [22.276574156358084]
We build a multi-exit encoder-decoder transformer model which is trained with deep supervision so that each of its decoder layers is capable of generating plausible predictions.
We show our approach can reduce overall inference latency by 30%-60% with comparable or even higher accuracy compared to baselines.
arXiv Detail & Related papers (2023-11-15T01:01:02Z)
- Regress Before Construct: Regress Autoencoder for Point Cloud Self-supervised Learning [18.10704604275133]
Masked Autoencoders (MAE) have demonstrated promising performance in self-supervised learning for 2D and 3D computer vision.
We propose Point Regress AutoEncoder (Point-RAE), a new scheme for regressive autoencoders for point cloud self-supervised learning.
Our approach is efficient during pre-training and generalizes well on various downstream tasks.
arXiv Detail & Related papers (2023-09-25T17:23:33Z)
- Challenging Decoder helps in Masked Auto-Encoder Pre-training for Dense Passage Retrieval [10.905033385938982]
The masked auto-encoder (MAE) pre-training architecture has emerged as the most promising.
We propose a novel token-importance-aware masking strategy based on pointwise mutual information to intensify the challenge of the decoder.
arXiv Detail & Related papers (2023-05-22T16:27:10Z)
- Dense Coding with Locality Restriction for Decoder: Quantum Encoders vs. Super-Quantum Encoders [67.12391801199688]
We investigate dense coding by imposing various locality restrictions on our decoder.
In this task, the sender Alice and the receiver Bob share an entangled state.
arXiv Detail & Related papers (2021-09-26T07:29:54Z)
- Decoder Fusion RNN: Context and Interaction Aware Decoders for Trajectory Prediction [53.473846742702854]
We propose a recurrent, attention-based approach for motion forecasting.
Decoder Fusion RNN (DF-RNN) is composed of a recurrent behavior encoder, an inter-agent multi-headed attention module, and a context-aware decoder.
We demonstrate the efficacy of our method by testing it on the Argoverse motion forecasting dataset and show its state-of-the-art performance on the public benchmark.
arXiv Detail & Related papers (2021-08-12T15:53:37Z)
- Split Learning Meets Koopman Theory for Wireless Remote Monitoring and Prediction [76.88643211266168]
We propose to train an autoencoder whose encoder and decoder are split and stored at a state sensor and its remote observer, respectively.
This autoencoder not only decreases the remote monitoring payload size by reducing the state representation dimension, but also learns the system dynamics by lifting it via a Koopman operator.
Numerical results under a non-linear cart-pole environment demonstrate that the proposed split learning of a Koopman autoencoder can locally predict future states, and the prediction accuracy increases with the representation dimension and transmission power.
arXiv Detail & Related papers (2021-04-16T13:34:01Z)
- On Sparsifying Encoder Outputs in Sequence-to-Sequence Models [90.58793284654692]
We take Transformer as the testbed and introduce a layer of gates in between the encoder and the decoder.
The gates are regularized using the expected value of the sparsity-inducing L0 penalty.
We investigate the effects of this sparsification on two machine translation and two summarization tasks.
arXiv Detail & Related papers (2020-04-24T16:57:52Z)
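As an aside on the last entry, gating encoder outputs with an expected-L0 regularizer is commonly realized with Hard Concrete gates (Louizos et al., 2018). The sketch below is a small, self-contained illustration of that general recipe under assumed hyperparameters and an input-dependent gate logit; it is not code from the cited paper.

```python
import math

import torch
import torch.nn as nn


class HardConcreteGate(nn.Module):
    """Per-position stochastic gate with a differentiable expected-L0 penalty.

    Illustrative only: a gate of this family sits between encoder and decoder,
    multiplies each encoder output by a value in [0, 1], and the expected
    number of open gates is added to the training loss.
    """

    def __init__(self, d_model, beta=0.5, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.log_alpha = nn.Linear(d_model, 1)  # gate logit from the state itself
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self, enc_out):
        # enc_out: (batch, src_len, d_model)
        log_alpha = self.log_alpha(enc_out)                              # (B, S, 1)
        if self.training:
            # Reparameterized sample from the (binary) Concrete distribution.
            u = torch.rand_like(log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((torch.log(u) - torch.log(1 - u) + log_alpha) / self.beta)
        else:
            s = torch.sigmoid(log_alpha)
        # Stretch to (gamma, zeta) and clamp to [0, 1] so exact zeros are possible.
        z = (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)
        # Expected L0 cost: probability that each gate is non-zero.
        l0 = torch.sigmoid(log_alpha - self.beta * math.log(-self.gamma / self.zeta)).mean()
        return enc_out * z, l0


# Usage sketch:
#   gated_states, l0_penalty = gate(encoder_states)
#   loss = task_loss + lambda_l0 * l0_penalty
```

The expected-L0 term returned alongside the gated states is added to the task loss with a small weight, encouraging many gates to close exactly so the decoder attends over a shorter effective source sequence.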