Related papers: Rectifying Geometry-Induced Similarity Distortions for Real-World Aerial-Ground Person Re-Identification

URL: http://arxiv.org/abs/2601.21405v1
Date: Thu, 29 Jan 2026 08:41:42 GMT
Title: Rectifying Geometry-Induced Similarity Distortions for Real-World Aerial-Ground Person Re-Identification
Authors: Kailash A. Hambarde, Hugo Proença
Abstract summary: Aerial-ground person re-identification (AG-ReID) is fundamentally challenged by extreme viewpoint and distance discrepancies. Existing methods rely on geometry-aware feature learning or appearance-conditioned prompting. We introduce Geometry-Induced Query-Key Transformation (GIQT), a lightweight low-rank module that rectifies the similarity space by conditioning query-key interactions on camera geometry.
Score: 4.039576422478934
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Aerial-ground person re-identification (AG-ReID) is fundamentally challenged by extreme viewpoint and distance discrepancies between aerial and ground cameras, which induce severe geometric distortions and invalidate the assumption of a shared similarity space across views. Existing methods primarily rely on geometry-aware feature learning or appearance-conditioned prompting, while implicitly assuming that the geometry-invariant dot-product similarity used in attention mechanisms remains reliable under large viewpoint and scale variations. We argue that this assumption does not hold. Extreme camera geometry systematically distorts the query-key similarity space and degrades attention-based matching, even when feature representations are partially aligned. To address this issue, we introduce Geometry-Induced Query-Key Transformation (GIQT), a lightweight low-rank module that explicitly rectifies the similarity space by conditioning query-key interactions on camera geometry. Rather than modifying feature representations or the attention formulation itself, GIQT adapts the similarity computation to compensate for dominant geometry-induced anisotropic distortions. Building on this local similarity rectification, we further incorporate a geometry-conditioned prompt generation mechanism that provides global, view-adaptive representation priors derived directly from camera geometry. Experiments on four aerial-ground person re-identification benchmarks demonstrate that the proposed framework consistently improves robustness under extreme and previously unseen geometric conditions, while introducing minimal computational overhead compared to state-of-the-art methods.
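
The abstract describes GIQT only at a high level: a low-rank, geometry-conditioned adjustment to query-key similarity that leaves the features themselves unchanged. The following is a minimal sketch of one way such an adjustment could look, not the authors' implementation. It assumes the rectified score takes the form Q (I + U(g) V(g)^T) K^T / sqrt(d), where U(g) and V(g) are rank-r factors produced by linear maps from a camera-geometry vector g. The class name, the linear maps `W_u` and `W_v`, and the contents of g (for example elevation angle and distance) are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class LowRankGeometrySimilarity:
    """Hypothetical geometry-conditioned similarity (illustration only).

    scores = Q (I + U(g) V(g)^T) K^T / sqrt(dim)
    """

    def __init__(self, dim, rank, geo_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.dim, self.rank = dim, rank
        # Assumed linear maps from the geometry vector to the low-rank factors.
        self.W_u = rng.normal(scale=0.02, size=(geo_dim, dim * rank))
        self.W_v = rng.normal(scale=0.02, size=(geo_dim, dim * rank))

    def factors(self, g):
        """Build the (dim, rank) factors U and V from geometry vector g."""
        U = (g @ self.W_u).reshape(self.dim, self.rank)
        V = (g @ self.W_v).reshape(self.dim, self.rank)
        return U, V

    def scores(self, Q, K, g):
        """Scaled dot-product scores plus a low-rank geometry correction."""
        U, V = self.factors(g)
        base = Q @ K.T
        # Q U V^T K^T computed as (Q U)(K V)^T, so no dim x dim matrix is formed.
        correction = (Q @ U) @ (K @ V).T
        return (base + correction) / np.sqrt(self.dim)

    def attend(self, Q, K, values, g):
        """Attention output using the geometry-adjusted scores."""
        return softmax(self.scores(Q, K, g)) @ values


rng = np.random.default_rng(1)
Q = rng.normal(size=(3, 8))        # 3 query tokens
K = rng.normal(size=(5, 8))        # 5 key tokens
values = rng.normal(size=(5, 4))
g = np.array([0.9, 0.3, -0.5])     # made-up geometry descriptor

sim = LowRankGeometrySimilarity(dim=8, rank=2, geo_dim=3)
out = sim.attend(Q, K, values, g)  # shape (3, 4)
```

In this sketch the correction costs two extra projections of size dim x rank per layer, which is one plausible reading of "lightweight". With a zero geometry vector both factors vanish and the scores reduce to standard scaled dot-product attention.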

Related papers

ArGEnT: Arbitrary Geometry-encoded Transformer for Operator Learning [2.757490632589873]
We propose Arbitrary Geometry-encoded Transformer (ArGEnT), a geometry-aware attention-based architecture for operator learning on arbitrary domains. By combining flexible geometry encoding with operator-learning capabilities, ArGEnT provides a scalable surrogate modeling framework for optimization, uncertainty, and data-driven modeling of complex physical systems.
arXiv Detail & Related papers (2026-02-12T06:22:59Z)
HyperAlign: Hyperbolic Entailment Cones for Adaptive Text-to-Image Alignment Assessment [84.65251073657883]
We propose HyperAlign, an adaptive text-to-image alignment assessment framework based on hyperbolic entailment geometry. First, we extract Euclidean features using CLIP and map them to hyperbolic space. Second, we design a dynamic-supervision entailment modeling mechanism that transforms discrete entailment logic into continuous geometric structure supervision. Third, we propose an adaptive modulation regressor that utilizes hyperbolic geometric features to generate sample-level modulation parameters.
arXiv Detail & Related papers (2026-01-08T05:41:06Z)
ARGUS: Adaptive Rotation-Invariant Geometric Unsupervised System [0.0]
This paper introduces Argus, a framework that reconceptualizes drift detection as tracking local statistics over a fixed spatial partition of the data manifold. Voronoi tessellations over canonical orthonormal frames yield drift metrics that are invariant to transformations. A graph-theoretic characterization of drift propagation is developed that distinguishes coherent distributional shifts from isolated perturbations.
arXiv Detail & Related papers (2026-01-03T22:39:20Z)
Seamlessly Natural: Image Stitching with Natural Appearance Preservation [0.6089774484591287]
SENA prioritizes structural fidelity in challenging real-world scenes characterized by parallax and depth variation. SENA addresses fundamental limitations through three key contributions. Experiments conducted on challenging datasets demonstrate that SENA achieves alignment accuracy comparable to leading homography-based methods.
arXiv Detail & Related papers (2026-01-03T18:40:35Z)
Dense Semantic Matching with VGGT Prior [49.42199006453071]
We propose an approach that retains VGGT's intrinsic strengths by reusing early feature stages, fine-tuning later ones, and adding a semantic head for bidirectional correspondences. Our approach achieves superior geometry awareness, matching reliability, and manifold preservation, outperforming previous baselines.
arXiv Detail & Related papers (2025-09-25T14:56:11Z)
CP$^2$: Leveraging Geometry for Conformal Prediction via Canonicalization [51.716834831684004]
We study the problem of conformal prediction (CP) under geometric data shifts. We propose integrating geometric information, such as geometric pose, into the conformal procedure to reinstate its guarantees.
arXiv Detail & Related papers (2025-06-19T10:12:02Z)
Geometry-Editable and Appearance-Preserving Object Composition [67.98806888489385]
General object composition (GOC) aims to seamlessly integrate a target object into a background scene with desired geometric properties. Recent approaches derive semantic embeddings and integrate them into advanced diffusion models to enable geometry-editable generation. We introduce a Disentangled Geometry-editable and Appearance-preserving Diffusion model that first leverages semantic embeddings to implicitly capture desired geometric transformations.
arXiv Detail & Related papers (2025-05-27T09:05:28Z)
Disentangled Representation Learning with the Gromov-Monge Gap [65.73194652234848]
Learning disentangled representations from unlabelled data is a fundamental challenge in machine learning. We introduce a novel approach to disentangled representation learning based on quadratic optimal transport. We demonstrate the effectiveness of our approach for quantifying disentanglement across four standard benchmarks.
arXiv Detail & Related papers (2024-07-10T16:51:32Z)
Adaptive Surface Normal Constraint for Geometric Estimation from Monocular Images [56.86175251327466]
We introduce a novel approach to learn geometries such as depth and surface normal from images while incorporating geometric context. Our approach extracts geometric context that encodes the geometric variations present in the input image and correlates depth estimation with geometric constraints. Our method unifies depth and surface normal estimations within a cohesive framework, which enables the generation of high-quality 3D geometry from images.
arXiv Detail & Related papers (2024-02-08T17:57:59Z)
GeoDeformer: Geometric Deformable Transformer for Action Recognition [22.536307401874105]
Vision transformers have recently emerged as an effective alternative to convolutional networks for action recognition. This paper proposes a novel approach, GeoDeformer, designed to capture the variations inherent in action video by integrating geometric comprehension directly into the ViT architecture.
arXiv Detail & Related papers (2023-11-29T16:55:55Z)
