Related papers: Leveraging Prior Knowledge of Diffusion Model for Person Search

Leveraging Prior Knowledge of Diffusion Model for Person Search

URL: http://arxiv.org/abs/2510.01841v1
Date: Thu, 02 Oct 2025 09:36:26 GMT
Title: Leveraging Prior Knowledge of Diffusion Model for Person Search
Authors: Giyeol Kim, Sooyoung Yang, Jihyong Oh, Myungjoo Kang, Chanho Eom,
Abstract summary: Person search aims to jointly perform person detection and re-identification by localizing and identifying a query person within a gallery of uncropped scene images.<n>Existing methods predominantly utilize ImageNet pre-trained backbones, which may be suboptimal for capturing the complex spatial context.<n>We propose DiffPS, a novel framework that leverages a pre-trained diffusion model while eliminating the optimization conflict between two sub-tasks.
Score: 21.72593483340187
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Person search aims to jointly perform person detection and re-identification by localizing and identifying a query person within a gallery of uncropped scene images. Existing methods predominantly utilize ImageNet pre-trained backbones, which may be suboptimal for capturing the complex spatial context and fine-grained identity cues necessary for person search. Moreover, they rely on a shared backbone feature for both person detection and re-identification, leading to suboptimal features due to conflicting optimization objectives. In this paper, we propose DiffPS (Diffusion Prior Knowledge for Person Search), a novel framework that leverages a pre-trained diffusion model while eliminating the optimization conflict between two sub-tasks. We analyze key properties of diffusion priors and propose three specialized modules: (i) Diffusion-Guided Region Proposal Network (DGRPN) for enhanced person localization, (ii) Multi-Scale Frequency Refinement Network (MSFRN) to mitigate shape bias, and (iii) Semantic-Adaptive Feature Aggregation Network (SFAN) to leverage text-aligned diffusion features. DiffPS sets a new state-of-the-art on CUHK-SYSU and PRW.

Related papers

Bidirectional Cross-Perception for Open-Vocabulary Semantic Segmentation in Remote Sensing Imagery [1.0742675209112622]
Training-free open-vocabulary semantic segmentation (OVSS) methods typically fuse CLIP and vision foundation models (VFMs)<n>We propose a spatial-regularization-aware dual-branch collaborative inference framework for training-free OVSS, termed SDCI.<n> Experiments on multiple remote sensing semantic segmentation benchmarks demonstrate that our method achieves better performance than existing approaches.
arXiv Detail & Related papers (2026-01-29T01:46:03Z)
Background Matters Too: A Language-Enhanced Adversarial Framework for Person Re-Identification [1.409283414986451]
We argue that background semantics are as important as the foreground semantics in ReID.<n>This paper proposes an end-to-end framework that jointly models foreground and background information.
arXiv Detail & Related papers (2025-09-03T05:38:22Z)
Towards Robust and Realistic Human Pose Estimation via WiFi Signals [85.60557095666934]
WiFi-based human pose estimation is a challenging task that bridges discrete and subtle WiFi signals to human skeletons.<n>This paper revisits this problem and reveals two critical yet overlooked issues: 1) cross-domain gap, i.e., due to significant variations between source-target domain pose distributions; and 2) structural fidelity gap, i.e., predicted skeletal poses manifest distorted topology.<n>This paper fills these gaps by reformulating the task into a novel two-phase framework dubbed DT-Pose: Domain-consistent representation learning and Topology-constrained Pose decoding.
arXiv Detail & Related papers (2025-01-16T09:38:22Z)
Unity in Diversity: Multi-expert Knowledge Confrontation and Collaboration for Generalizable Vehicle Re-identification [60.20318058777603]
Generalizable vehicle re-identification (ReID) seeks to develop models that can adapt to unknown target domains without the need for fine-tuning or retraining.<n>Previous works have mainly focused on extracting domain-invariant features by aligning data distributions between source domains.<n>We propose a two-stage Multi-expert Knowledge Confrontation and Collaboration (MiKeCoCo) method to solve this unique problem.
arXiv Detail & Related papers (2024-07-10T04:06:39Z)
PSDiff: Diffusion Model for Person Search with Iterative and Collaborative Refinement [59.6260680005195]
We present a novel Person Search framework based on the Diffusion model, PSDiff.<n> PSDiff formulates the person search as a dual denoising process from noisy boxes and ReID embeddings to ground truths.<n>Following the new paradigm, we further design a new Collaborative Denoising Layer (CDL) to optimize detection and ReID sub-tasks in an iterative and collaborative way.
arXiv Detail & Related papers (2023-09-20T08:16:39Z)
Prompting Diffusion Representations for Cross-Domain Semantic Segmentation [101.04326113360342]
diffusion-pretraining achieves extraordinary domain generalization results for semantic segmentation. We introduce a scene prompt and a prompt randomization strategy to help further disentangle the domain-invariant information when training the segmentation head.
arXiv Detail & Related papers (2023-07-05T09:28:25Z)
An Interactively Reinforced Paradigm for Joint Infrared-Visible Image Fusion and Saliency Object Detection [59.02821429555375]
This research focuses on the discovery and localization of hidden objects in the wild and serves unmanned systems. Through empirical analysis, infrared and visible image fusion (IVIF) enables hard-to-find objects apparent. multimodal salient object detection (SOD) accurately delineates the precise spatial location of objects within the picture.
arXiv Detail & Related papers (2023-05-17T06:48:35Z)
OIMNet++: Prototypical Normalization and Localization-aware Learning for Person Search [34.460973847554364]
We address the task of person search, that is, localizing and re-identifying query persons from a set of raw scene images. Recent approaches are typically built upon OIMNet, a pioneer work on person search, that learns joint person representations for performing both detection and person re-identification tasks. We introduce a novel normalization layer, dubbed ProtoNorm, that calibrates features from pedestrian proposals, while considering a long-tail distribution of person IDs.
arXiv Detail & Related papers (2022-07-21T06:34:03Z)
Hierarchical Deep CNN Feature Set-Based Representation Learning for Robust Cross-Resolution Face Recognition [59.29808528182607]
Cross-resolution face recognition (CRFR) is important in intelligent surveillance and biometric forensics. Existing shallow learning-based and deep learning-based methods focus on mapping the HR-LR face pairs into a joint feature space. In this study, we desire to fully exploit the multi-level deep convolutional neural network (CNN) feature set for robust CRFR.
arXiv Detail & Related papers (2021-03-25T14:03:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.