Related papers: DRoPE: Directional Rotary Position Embedding for Efficient Agent Interaction Modeling

URL: http://arxiv.org/abs/2503.15029v1
Date: Wed, 19 Mar 2025 09:23:09 GMT
Title: DRoPE: Directional Rotary Position Embedding for Efficient Agent Interaction Modeling
Authors: Jianbo Zhao, Taiyu Ban, Zhihao Liu, Hangning Zhou, Xiyang Wang, Qibin Zhou, Hailong Qin, Mu Yang, Lei Liu, Bin Li
Abstract summary: Directional Rotary Position Embedding (DRoPE) is a novel adaptation of Rotary Position Embedding (RoPE) originally developed in natural language processing. DRoPE overcomes limitations by introducing a uniform identity scalar into RoPE's 2D rotary transformation. Empirical evaluations confirm DRoPE's good performance and significantly reduced space complexity.
Score: 9.86959003425198
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Accurate and efficient modeling of agent interactions is essential for trajectory generation, the core of autonomous driving systems. Existing methods (scene-centric, agent-centric, and query-centric frameworks) each present distinct advantages and drawbacks, creating an impossible triangle among accuracy, computational time, and memory efficiency. To break this limitation, we propose Directional Rotary Position Embedding (DRoPE), a novel adaptation of Rotary Position Embedding (RoPE), originally developed in natural language processing. Unlike traditional relative position embedding (RPE), which introduces significant space complexity, RoPE efficiently encodes relative positions without explicitly increasing complexity but faces inherent limitations in handling angular information due to periodicity. DRoPE overcomes this limitation by introducing a uniform identity scalar into RoPE's 2D rotary transformation, aligning rotation angles with realistic agent headings to naturally encode relative angular information. We theoretically analyze DRoPE's correctness and efficiency, demonstrating its capability to simultaneously optimize trajectory generation accuracy, time complexity, and space complexity. Empirical evaluations against various state-of-the-art trajectory generation models confirm DRoPE's good performance and significantly reduced space complexity, indicating both theoretical soundness and practical effectiveness. The video documentation is available at https://drope-traj.github.io/.
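The abstract's central idea, that rotating queries and keys by agent headings yields a score depending only on the relative heading, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `freqs` parameter, and the reading of "uniform identity scalar" as "every 2D pair rotates with frequency 1" are assumptions made for illustration only.

```python
import math

def rotate_pairs(vec, angle, freqs):
    """Rotate the p-th consecutive (x, y) pair of `vec` by angle * freqs[p]."""
    out = []
    for p in range(len(vec) // 2):
        x, y = vec[2 * p], vec[2 * p + 1]
        c, s = math.cos(angle * freqs[p]), math.sin(angle * freqs[p])
        out.extend([c * x - s * y, s * x + c * y])
    return out

def heading_score(q, k, heading_q, heading_k, freqs):
    """Dot product of q and k after rotating each by its agent's heading."""
    rq = rotate_pairs(q, heading_q, freqs)
    rk = rotate_pairs(k, heading_k, freqs)
    return sum(a * b for a, b in zip(rq, rk))

# Hypothetical 4-dimensional query/key vectors (two 2D pairs each).
q = [0.3, -1.2, 0.7, 0.5]
k = [1.1, 0.4, -0.6, 0.9]

unit = [1.0, 1.0]    # assumed reading of the "uniform identity scalar"
mixed = [1.0, 0.5]   # RoPE-style per-pair frequencies, for contrast

# Same relative heading (0.5 rad) at two different absolute headings.
same_rel_a = heading_score(q, k, 0.2, 0.7, unit)
same_rel_b = heading_score(q, k, 1.9, 2.4, unit)

# One heading wrapped by a full turn: the same physical direction.
wrapped_unit = heading_score(q, k, 0.2 + 2 * math.pi, 0.7, unit)
plain_mixed = heading_score(q, k, 0.2, 0.7, mixed)
wrapped_mixed = heading_score(q, k, 0.2 + 2 * math.pi, 0.7, mixed)
```

With unit frequencies the score is unchanged both by a shared shift of the two headings and by wrapping a heading through a full turn. With non-unit frequencies the wrapped heading produces a different score, which is one way to read the periodicity limitation the abstract mentions.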

Related papers

ReCoM: Realistic Co-Speech Motion Generation with Recurrent Embedded Transformer [58.49950218437718]
We present ReCoM, an efficient framework for generating high-fidelity and generalizable human body motions synchronized with speech. The core innovation lies in the Recurrent Embedded Transformer (RET), which integrates Dynamic Embedding Regularization (DER) into a Vision Transformer (ViT) core architecture. To enhance model robustness, we incorporate the proposed DER strategy, which equips the model with dual capabilities of noise resistance and cross-domain generalization.
arXiv Detail & Related papers (2025-03-27T16:39:40Z)
Benchmarking Rotary Position Embeddings for Automatic Speech Recognition [17.360059094663182]
Rotary Position Embedding (RoPE) encodes relative and absolute positional information in Transformer-based models. RoPE consistently achieves lower error rates compared to the currently widely used relative positional embedding. To facilitate further research, we release the implementation and all experimental recipes through the SpeechBrain toolkit.
arXiv Detail & Related papers (2025-01-10T15:30:46Z)
Bench2Drive-R: Turning Real World Data into Reactive Closed-Loop Autonomous Driving Benchmark by Generative Model [63.336123527432136]
We introduce Bench2Drive-R, a generative framework that enables reactive closed-loop evaluation. Unlike existing video generative models for autonomous driving, the proposed designs are tailored for interactive simulation. We compare the generation quality of Bench2Drive-R with existing generative models and achieve state-of-the-art performance.
arXiv Detail & Related papers (2024-12-11T06:35:18Z)
Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding [121.08841110022607]
Existing agent-centric methods have demonstrated outstanding performance on public benchmarks. We introduce the K-nearest neighbor attention with relative pose encoding (KNARPE), a novel attention mechanism allowing the pairwise-relative representation to be used by Transformers. By sharing contexts among agents and reusing the unchanged contexts, our approach is as efficient as scene-centric methods, while performing on par with state-of-the-art agent-centric methods.
arXiv Detail & Related papers (2023-10-19T17:59:01Z)
Rotation-Invariant Transformer for Point Cloud Matching [42.5714375149213]
We introduce RoITr, a Rotation-Invariant Transformer to cope with the pose variations in the point cloud matching task. We propose a global transformer with rotation-invariant cross-frame spatial awareness learned by the self-attention mechanism. RoITr surpasses the existing methods by at least 13 and 5 percentage points in terms of Inlier Ratio and Registration Recall.
arXiv Detail & Related papers (2023-03-14T20:55:27Z)
SPE-Net: Boosting Point Cloud Analysis via Rotation Robustness Enhancement [118.20816888815658]
We propose a novel deep architecture tailored for 3D point cloud applications, named SPE-Net. The embedded 'Selective Position variant' procedure relies on an attention mechanism that can effectively attend to the underlying rotation condition of the input. We demonstrate the merits of the SPE-Net and the associated hypothesis on four benchmarks, showing evident improvements on both rotated and unrotated test data over SOTA methods.
arXiv Detail & Related papers (2022-11-15T15:59:09Z)
Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects. The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
Attention-based Proposals Refinement for 3D Object Detection [0.0]
This paper takes a more data-driven approach to ROI feature extraction using the attention mechanism. Experiments on the KITTI validation set show that our method achieves competitive performance of 84.84 AP for class Car at moderate difficulty.
arXiv Detail & Related papers (2022-01-18T15:50:31Z)
Rethinking and Improving Relative Position Encoding for Vision Transformer [61.559777439200744]
Relative position encoding (RPE) is important for transformers to capture the sequence ordering of input tokens. We propose new relative position encoding methods dedicated to 2D images, called image RPE (iRPE).
arXiv Detail & Related papers (2021-07-29T17:55:10Z)
Conformer-based End-to-end Speech Recognition With Rotary Position Embedding [11.428057887454008]
We introduce rotary position embedding (RoPE) in the convolution-augmented transformer (conformer). RoPE encodes absolute positional information into the input sequence by a rotation matrix, and then naturally incorporates explicit relative position information into a self-attention module. Our model achieves a relative word error rate reduction of 8.70% and 7.27% over the conformer on test-clean and test-other sets of the LibriSpeech corpus respectively.
arXiv Detail & Related papers (2021-07-13T08:07:22Z)
RoFormer: Enhanced Transformer with Rotary Position Embedding [9.01819510933327]
We propose a novel method named Rotary Position Embedding (RoPE) to effectively leverage the positional information. RoPE encodes the absolute position with a rotation matrix and meanwhile incorporates the explicit relative position dependency in the self-attention formulation. We evaluate the enhanced transformer with rotary position embedding, also called RoFormer, on various long text classification benchmark datasets.
arXiv Detail & Related papers (2021-04-20T09:54:06Z)

This list is automatically generated from the titles and abstracts of the papers on this site.