Following the Clues: Experiments on Person Re-ID using Cross-Modal Intelligence
- URL: http://arxiv.org/abs/2507.01504v3
- Date: Tue, 15 Jul 2025 14:54:52 GMT
- Title: Following the Clues: Experiments on Person Re-ID using Cross-Modal Intelligence
- Authors: Robert Aufschläger, Youssef Shoeb, Azarm Nowzad, Michael Heigl, Fabian Bally, Martin Schramm
- Abstract summary: cRID is a cross-modal framework combining Large Vision-Language Models, Graph Attention Networks, and representation learning. Our approach focuses on identifying and leveraging interpretable features, enabling the detection of semantically meaningful PII beyond low-level appearance cues. Our experiments show improved performance in practical cross-dataset Re-ID scenarios.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The collection and release of street-level recordings as Open Data play a vital role in advancing autonomous driving systems and AI research. However, these datasets pose significant privacy risks, particularly for pedestrians, due to the presence of Personally Identifiable Information (PII) that extends beyond biometric traits such as faces. In this paper, we present cRID, a novel cross-modal framework combining Large Vision-Language Models, Graph Attention Networks, and representation learning to detect textual describable clues of PII and enhance person re-identification (Re-ID). Our approach focuses on identifying and leveraging interpretable features, enabling the detection of semantically meaningful PII beyond low-level appearance cues. We conduct a systematic evaluation of PII presence in person image datasets. Our experiments show improved performance in practical cross-dataset Re-ID scenarios, notably from Market-1501 to CUHK03-np (detected), highlighting the framework's practical utility. Code is available at https://github.com/RAufschlaeger/cRID.
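The abstract names Graph Attention Networks as one of cRID's components but gives no architectural detail, so the following is only a generic single-head GAT layer sketch in NumPy; all function names, shapes, and parameters are illustrative assumptions, not cRID's actual implementation (see the linked repository for that):

```python
import numpy as np

def gat_layer(H, A, W, a, slope=0.2):
    """One generic graph-attention layer (sketch, not cRID's code).

    H: (N, F) node features; A: (N, N) adjacency with self-loops;
    W: (F, Fp) linear projection; a: (2*Fp,) attention vector.
    """
    Z = H @ W                                   # projected features (N, Fp)
    N = Z.shape[0]
    E = np.full((N, N), -np.inf)                # -inf masks non-edges in softmax
    for i in range(N):
        for j in range(N):
            if A[i, j]:
                s = a @ np.concatenate([Z[i], Z[j]])
                E[i, j] = s if s > 0 else slope * s   # LeakyReLU activation
    E -= E.max(axis=1, keepdims=True)           # stabilised softmax per node
    att = np.exp(E)
    att /= att.sum(axis=1, keepdims=True)
    return att @ Z                              # attention-weighted aggregation
```

Self-loops in `A` guarantee every row of the attention matrix is a valid distribution; without them a node with no neighbours would produce NaNs.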
Related papers
- When Person Re-Identification Meets Event Camera: A Benchmark Dataset and An Attribute-guided Re-Identification Framework [82.9994467829281]
This paper introduces a large-scale RGB-event based person ReID dataset, called EvReID. The dataset contains 118,988 image pairs and covers 1200 pedestrian identities, with data collected across multiple seasons, scenes, and lighting conditions. We also evaluate 15 state-of-the-art person ReID algorithms, laying a solid foundation for future research in terms of both data and benchmarking.
arXiv Detail & Related papers (2025-07-18T05:04:59Z)
- Multilinear subspace learning for person re-identification based fusion of high order tensor features [2.03240755905453]
PRe-ID aims to identify and track target individuals who have already been detected in a network of cameras. To this end, two powerful features, Convolutional Neural Networks (CNN) and Local Maximal Occurrence (LOMO), are modeled on multidimensional data. A new tensor fusion scheme is introduced to leverage and combine these two types of features in a single tensor.
arXiv Detail & Related papers (2025-05-09T23:39:27Z)
- Keypoint Promptable Re-Identification [76.31113049256375]
Occluded Person Re-Identification (ReID) is a metric learning task that involves matching occluded individuals based on their appearance.
We introduce Keypoint Promptable ReID (KPR), a novel formulation of the ReID problem that explicitly complements the input bounding box with a set of semantic keypoints.
We release custom keypoint labels for four popular ReID benchmarks. Experiments on person retrieval, but also on pose tracking, demonstrate that our method systematically surpasses previous state-of-the-art approaches.
arXiv Detail & Related papers (2024-07-25T15:20:58Z)
- ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning [57.91881829308395]
Identity-preserving text-to-image generation (ID-T2I) has received significant attention due to its wide range of application scenarios like AI portrait and advertising.
We present ID-Aligner, a general feedback learning framework to enhance ID-T2I performance.
arXiv Detail & Related papers (2024-04-23T18:41:56Z)
- Using Skew to Assess the Quality of GAN-generated Image Features [3.300324211572204]
The Fréchet Inception Distance (FID) has been widely adopted due to its conceptual simplicity, fast computation time, and strong correlation with human perception.
In this paper we explore the importance of third moments in image feature data and use this information to define a new measure, which we call the Skew Inception Distance (SID).
arXiv Detail & Related papers (2023-10-31T17:05:02Z)
- Data-Driven but Privacy-Conscious: Pedestrian Dataset De-identification via Full-Body Person Synthesis [16.394031759681678]
We motivate and introduce the Pedestrian Dataset De-Identification (PDI) task.
PDI evaluates the degree of de-identification and downstream task training performance for a given de-identification method.
We show how our data is able to narrow the synthetic-to-real performance gap in a privacy-conscious manner.
arXiv Detail & Related papers (2023-06-20T17:39:24Z)
- Robust Saliency-Aware Distillation for Few-shot Fine-grained Visual Recognition [57.08108545219043]
Recognizing novel sub-categories with scarce samples is an essential and challenging research topic in computer vision.
Existing literature addresses this challenge by employing local-based representation approaches.
This article proposes a novel model, Robust Saliency-aware Distillation (RSaD), for few-shot fine-grained visual recognition.
arXiv Detail & Related papers (2023-05-12T00:13:17Z)
- Learning Transferable Pedestrian Representation from Multimodal Information Supervision [174.5150760804929]
VAL-PAT is a novel framework that learns transferable representations to enhance various pedestrian analysis tasks with multimodal information.
We first perform pre-training on the LUPerson-TA dataset, where each image contains text and attribute annotations.
We then transfer the learned representations to various downstream tasks, including person reID, person attribute recognition and text-based person search.
arXiv Detail & Related papers (2023-04-12T01:20:58Z)
- Benchmarking person re-identification datasets and approaches for practical real-world implementations [1.0079626733116613]
Person Re-Identification (Re-ID) has attracted substantial research attention.
However, when Re-ID models are deployed in new cities or environments, the task of searching for people within a network of security cameras is likely to face a significant domain shift.
This paper introduces a complete methodology to evaluate Re-ID approaches and training datasets with respect to their suitability for unsupervised deployment for live operations.
arXiv Detail & Related papers (2022-12-20T03:45:38Z)
- Stronger Baseline for Person Re-Identification [6.087398773657721]
Person re-identification (re-ID) aims to identify the same person of interest across non-overlapping capturing cameras.
We propose a Stronger Baseline for person re-ID, an enhanced version of the current prevailing method, namely, Strong Baseline.
arXiv Detail & Related papers (2021-12-02T08:50:03Z)
- One-Shot Object Affordance Detection in the Wild [76.46484684007706]
Affordance detection refers to identifying the potential action possibilities of objects in an image.
We devise a One-Shot Affordance Detection Network (OSAD-Net) that estimates the human action purpose and then transfers it to help detect the common affordance from all candidate images.
With complex scenes and rich annotations, our PADv2 dataset can be used as a test bed to benchmark affordance detection methods.
arXiv Detail & Related papers (2021-08-08T14:53:10Z)
- Unsupervised Pre-training for Person Re-identification [90.98552221699508]
We present a large-scale unlabeled person re-identification (Re-ID) dataset, "LUPerson".
We make the first attempt of performing unsupervised pre-training for improving the generalization ability of the learned person Re-ID feature representation.
arXiv Detail & Related papers (2020-12-07T14:48:26Z)
- A Convolutional Baseline for Person Re-Identification Using Vision and Language Descriptions [24.794592610444514]
In real-world surveillance scenarios, frequently no visual information will be available about the queried person.
A two-stream deep convolutional neural network framework supervised by a cross-entropy loss is presented.
The learnt visual representations are more robust and perform 22% better during retrieval compared to a single-modality system.
arXiv Detail & Related papers (2020-02-20T10:12:02Z)
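Retrieval in the Re-ID papers listed above reduces to nearest-neighbour search in a learned embedding space, typically scored by rank-1 accuracy. A minimal sketch of that metric under cosine similarity; the function and variable names are illustrative, not taken from any of the cited papers:

```python
import numpy as np

def rank1_accuracy(query, gallery, q_ids, g_ids):
    """Fraction of queries whose nearest gallery embedding shares their identity."""
    # L2-normalise so the dot product equals cosine similarity.
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = q @ g.T                       # (num_query, num_gallery) similarity matrix
    top1 = sims.argmax(axis=1)           # index of best gallery match per query
    return float(np.mean(np.asarray(g_ids)[top1] == np.asarray(q_ids)))
```

Full benchmark protocols (e.g. on Market-1501 or CUHK03-np) additionally exclude same-camera matches and report mAP over the whole ranking, but the core matching step is this nearest-neighbour lookup.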
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.