Related papers: GeoPE:A Unified Geometric Positional Embedding for Structured Tensors

URL: http://arxiv.org/abs/2512.04963v1
Date: Thu, 04 Dec 2025 16:31:12 GMT
Title: GeoPE:A Unified Geometric Positional Embedding for Structured Tensors
Authors: Yupu Yao, Bowen Yang,
Abstract summary: We introduce Geometric Positional Embedding (GeoPE), a framework that extends rotations to 3D Euclidean space using quaternions. To overcome non-commutativity and ensure symmetry, GeoPE constructs a unified rotational operator by computing the geometric mean in the Lie algebra. Experiments on image classification, object detection, and 3D semantic segmentation demonstrate that GeoPE consistently outperforms existing 2D RoPE variants.
Score: 12.459742491179947
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Standard Vision Transformers flatten 2D images into 1D sequences, disrupting the natural spatial topology. While Rotary Positional Embedding (RoPE) excels in 1D, it inherits this limitation, often treating spatially distant patches (e.g., at row edges) as sequence neighbors. Existing 2D approaches typically treat spatial axes independently, failing to decouple this false sequential proximity from true spatial distance. To restore the 2D spatial manifold, we introduce Geometric Positional Embedding (GeoPE), a framework that extends rotations to 3D Euclidean space using quaternions. To overcome non-commutativity and ensure symmetry, GeoPE constructs a unified rotational operator by computing the geometric mean in the Lie algebra. This creates a geometrically coupled encoding that effectively separates spatial dimensions. Extensive experiments on image classification, object detection, and 3D semantic segmentation demonstrate that GeoPE consistently outperforms existing 2D RoPE variants and significantly enhances shape bias, confirming its ability to capture true geometric structure.
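The two geometric points in the abstract can be made concrete with a small pure-Python sketch. `flat_index` shows the row-edge problem: in a row-major flattening, the last patch of one row and the first patch of the next get consecutive indices even though they sit at opposite ends of the image. The rest illustrates, under our own assumptions, the kind of quaternion machinery the abstract describes: per-axis rotors built with the exponential map, the non-commutativity of their Hamilton product, and an order-independent combination formed as the log-Euclidean mean exp((log q_a + log q_b) / 2) in the Lie algebra of pure quaternions. The names `position_rotor` and `geometric_mean`, the choice of the i and j axes, the half-angle convention, the `freq` value, and applying the rotor to 3-dimensional sub-blocks of query/key vectors are all hypothetical; this is not the authors' GeoPE operator, whose exact definition and frequency schedule are given in the paper.

```python
import math

# Quaternions are (w, x, y, z) tuples; pure Python so the sketch has no dependencies.

def flat_index(row, col, width):
    """Row-major sequence index that a standard ViT assigns to patch (row, col)."""
    return row * width + col

def qmul(a, b):
    """Hamilton product a*b (non-commutative in general)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def qconj(q):
    w, x, y, z = q
    return (w, -x, -y, -z)

def qexp(v):
    """Exponential map: pure quaternion (0, vx, vy, vz) -> unit quaternion."""
    _, x, y, z = v
    theta = math.sqrt(x * x + y * y + z * z)
    if theta < 1e-12:
        return (1.0, 0.0, 0.0, 0.0)
    s = math.sin(theta) / theta
    return (math.cos(theta), s * x, s * y, s * z)

def qlog(q):
    """Logarithm of a unit quaternion back into the Lie algebra (pure quaternions)."""
    w, x, y, z = q
    n = math.sqrt(x * x + y * y + z * z)
    if n < 1e-12:
        return (0.0, 0.0, 0.0, 0.0)
    theta = math.atan2(n, w)
    return (0.0, theta * x / n, theta * y / n, theta * z / n)

def geometric_mean(qa, qb):
    """Log-Euclidean mean exp((log qa + log qb) / 2); symmetric in qa and qb."""
    la, lb = qlog(qa), qlog(qb)
    return qexp(tuple(0.5 * (u + v) for u, v in zip(la, lb)))

def position_rotor(row, col, freq=0.1):
    """Hypothetical 2D positional rotor: column angle about i, row angle about j."""
    q_col = qexp((0.0, 0.5 * col * freq, 0.0, 0.0))
    q_row = qexp((0.0, 0.0, 0.5 * row * freq, 0.0))
    return geometric_mean(q_col, q_row)

def rotate(vec3, q):
    """Rotate a 3-vector (e.g. one query/key sub-block) by q: q * (0, v) * conj(q)."""
    _, x, y, z = qmul(qmul(q, (0.0, *vec3)), qconj(q))
    return (x, y, z)

if __name__ == "__main__":
    W = 14  # width of a hypothetical 14x14 patch grid
    # Last patch of row 0 and first patch of row 1: adjacent in 1D, 13 columns apart in 2D.
    print(flat_index(0, W - 1, W), flat_index(1, 0, W))  # prints "13 14"
    qi = qexp((0.0, 0.35, 0.0, 0.0))
    qj = qexp((0.0, 0.0, 0.2, 0.0))
    print(qmul(qi, qj) == qmul(qj, qi))  # prints "False": composition order matters
```

Because the mean only adds Lie-algebra elements before exponentiating, swapping the two axis rotors yields the same operator, whereas their Hamilton product does not have this property. Each rotor is a unit quaternion, so `rotate` preserves vector norms, as a rotary encoding should.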

Related papers

Spherical Geometry Diffusion: Generating High-quality 3D Face Geometry via Sphere-anchored Representations [18.442834011472005]
A fundamental challenge in text-to-3D face generation is achieving high-quality geometry. We introduce the Spherical Geometry Representation, a novel face representation that anchors geometric signals to uniform spherical coordinates. We then introduce Spherical Diffusion Geometry, a conditional diffusion framework built upon this 2D map.
arXiv Detail & Related papers (2026-01-19T20:15:45Z)
COREA: Coarse-to-Fine 3D Representation Alignment Between Relightable 3D Gaussians and SDF via Bidirectional 3D-to-3D Supervision [15.632917458525851]
We present COREA, the first unified framework that jointly learns relightable 3D Gaussians and a Signed Distance Field (SDF) for accurate geometry reconstruction and faithful relighting. Experiments on standard benchmarks demonstrate that COREA achieves superior performance in novel-view synthesis, mesh reconstruction, and PBR within a unified framework.
arXiv Detail & Related papers (2025-12-08T02:41:42Z)
SPAN: Spatial-Projection Alignment for Monocular 3D Object Detection [49.12928389918159]
Existing monocular 3D detectors typically tame the pronounced nonlinear regression of 3D bounding boxes through a decoupled prediction paradigm. We propose a novel Spatial-Projection Alignment (SPAN) with two pivotal components. SPAN enforces an explicit global spatial constraint between the predicted and ground-truth 3D bounding boxes, thereby rectifying spatial drift caused by decoupled attribute regression. 3D-2D Projection Alignment ensures that the projected 3D box is aligned tightly within its corresponding 2D detection bounding box on the image plane, mitigating projection misalignment overlooked in previous works.
arXiv Detail & Related papers (2025-11-10T04:48:48Z)
GeoPurify: A Data-Efficient Geometric Distillation Framework for Open-Vocabulary 3D Segmentation [57.8059956428009]
Recent attempts to transfer features from 2D Vision-Language Models to 3D semantic segmentation expose a persistent trade-off. We propose GeoPurify that applies a small Student Affinity Network to 2D VLM-generated 3D point features using geometric priors distilled from a 3D self-supervised teacher model. Benefiting from latent geometric information and the learned affinity network, GeoPurify effectively mitigates the trade-off and achieves superior data efficiency.
arXiv Detail & Related papers (2025-10-02T16:37:56Z)
Seeing 3D Through 2D Lenses: 3D Few-Shot Class-Incremental Learning via Cross-Modal Geometric Rectification [59.17489431187807]
We propose a framework that enhances 3D geometric fidelity by leveraging CLIP's hierarchical spatial semantics. Our method significantly improves 3D few-shot class-incremental learning, achieving superior geometric coherence and robustness to texture bias.
arXiv Detail & Related papers (2025-09-18T13:45:08Z)
Geo2Vec: Shape- and Distance-Aware Neural Representation of Geospatial Entities [13.206124101350847]
We introduce Geo2Vec, a novel method inspired by signed distance fields (SDF) that operates directly in the original space. A neural network trained to approximate the SDF produces compact, geometry-aware, and unified representations for all geo-entity types. Empirical results show that Geo2Vec consistently outperforms existing methods in representing shape and location, capturing topological and distance relationships, and achieving greater efficiency in real-world GeoAI applications.
arXiv Detail & Related papers (2025-08-26T07:12:28Z)
GeoGS3D: Single-view 3D Reconstruction via Geometric-aware Diffusion Model and Gaussian Splatting [81.03553265684184]
We introduce GeoGS3D, a framework for reconstructing detailed 3D objects from single-view images. We propose a novel metric, Gaussian Divergence Significance (GDS), to prune unnecessary operations during optimization. Experiments demonstrate that GeoGS3D generates images with high consistency across views and reconstructs high-quality 3D objects.
arXiv Detail & Related papers (2024-03-15T12:24:36Z)
NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space [77.6067460464962]
Monocular 3D Semantic Scene Completion (SSC) has garnered significant attention in recent years due to its potential to predict complex semantics and geometry shapes from a single image, requiring no 3D inputs. We identify several critical issues in current state-of-the-art methods, including the Feature Ambiguity of projected 2D features in the ray to the 3D space, the Pose Ambiguity of the 3D convolution, and the Imbalance in the 3D convolution across different depth levels. We devise a novel Normalized Device Coordinates scene completion network (NDC-Scene) that directly extends the 2
arXiv Detail & Related papers (2023-09-26T02:09:52Z)
Learning Pose Image Manifolds Using Geometry-Preserving GANs and Elasticae [13.202747831999414]
Geometric Style-GAN (Geom-SGAN) maps images to low-dimensional latent representations. Euler's elasticae smoothly interpolate between directed points (points + tangent directions) in the low-dimensional latent space.
arXiv Detail & Related papers (2023-05-17T18:45:56Z)
Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology. Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)

This list is automatically generated from the titles and abstracts of the papers on this site.