A Theory-Inspired Framework for Few-Shot Cross-Modal Sketch Person Re-Identification
- URL: http://arxiv.org/abs/2511.18677v1
- Date: Mon, 24 Nov 2025 01:26:46 GMT
- Title: A Theory-Inspired Framework for Few-Shot Cross-Modal Sketch Person Re-Identification
- Authors: Yunpeng Gong, Yongjie Hou, Jiangming Shi, Kim Long Diep, Min Jiang,
- Abstract summary: Sketch-based person re-identification aims to match hand-drawn sketches with RGB surveillance images. We introduce KTCAA, a framework for few-shot cross-modal generalization. We show that KTCAA achieves state-of-the-art performance, particularly in data-scarce conditions.
- Score: 5.499165736807566
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Sketch-based person re-identification aims to match hand-drawn sketches with RGB surveillance images, but remains challenging due to significant modality gaps and limited annotated data. To address this, we introduce KTCAA, a theoretically grounded framework for few-shot cross-modal generalization. Motivated by generalization theory, we identify two key factors influencing target domain risk: (1) domain discrepancy, which quantifies the alignment difficulty between source and target distributions; and (2) perturbation invariance, which evaluates the model's robustness to modality shifts. Based on these insights, we propose two components: (1) Alignment Augmentation (AA), which applies localized sketch-style transformations to simulate target distributions and facilitate progressive alignment; and (2) Knowledge Transfer Catalyst (KTC), which enhances invariance by introducing worst-case perturbations and enforcing consistency. These modules are jointly optimized under a meta-learning paradigm that transfers alignment knowledge from data-rich RGB domains to sketch-based scenarios. Experiments on multiple benchmarks demonstrate that KTCAA achieves state-of-the-art performance, particularly in data-scarce conditions.
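The abstract does not give implementation details, so what follows is only a minimal NumPy sketch of the two component ideas under loudly stated assumptions. The "localized sketch-style transformation" of AA is approximated by an inverted gradient-magnitude edge map pasted into one random rectangle, and the "worst-case perturbation" of KTC is approximated by a single FGSM-style step on a toy linear identity classifier, paired with a KL consistency term. All names (`sketch_style`, `alignment_augment`, `patch_frac`, `worst_case_perturb`, `consistency_loss`) are hypothetical, the meta-learning loop is not modeled, and this is not the authors' KTCAA code.

```python
import numpy as np

# Hypothetical illustration only: names, shapes, and the specific transforms
# below are assumptions, not the KTCAA authors' implementation.
rng = np.random.default_rng(0)


def sketch_style(patch):
    """Assumed stand-in for a sketch-style transform: inverted gradient-magnitude
    edge map of the grayscale patch, copied to 3 channels."""
    gray = patch.mean(axis=-1)
    gy, gx = np.gradient(gray)
    edges = np.hypot(gx, gy)
    edges = edges / (edges.max() + 1e-8)  # scale to [0, 1)
    return np.repeat((1.0 - edges)[..., None], 3, axis=-1)  # dark strokes on white


def alignment_augment(img, patch_frac=0.5, rng=rng):
    """AA-like step (assumed): sketch-style one random rectangle of an
    (H, W, 3) image with values in [0, 1], leaving the rest RGB."""
    h, w, _ = img.shape
    ph, pw = max(2, int(h * patch_frac)), max(2, int(w * patch_frac))
    y = rng.integers(0, h - ph + 1)  # upper bound is exclusive
    x = rng.integers(0, w - pw + 1)
    out = img.copy()
    out[y:y + ph, x:x + pw] = sketch_style(img[y:y + ph, x:x + pw])
    return out


def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()


def worst_case_perturb(W, x, y, eps):
    """KTC-like step (assumed): one FGSM-style step that increases the
    cross-entropy of a linear identity classifier with logits W @ x."""
    p = softmax(W @ x)
    onehot = np.zeros_like(p)
    onehot[y] = 1.0
    grad_x = W.T @ (p - onehot)  # d CE / d x when logits = W @ x
    return x + eps * np.sign(grad_x)


def consistency_loss(W, x, x_adv):
    """KL(p(x) || p(x_adv)): penalizes predictions that change under the perturbation."""
    p, q = softmax(W @ x), softmax(W @ x_adv)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))


if __name__ == "__main__":
    img = rng.random((32, 16, 3))               # toy "RGB image"
    aug = alignment_augment(img)                # partially sketch-styled copy
    W = 0.01 * rng.normal(size=(5, aug.size))   # 5 hypothetical identities
    x = aug.ravel()
    x_adv = worst_case_perturb(W, x, y=2, eps=0.01)
    print("consistency loss:", consistency_loss(W, x, x_adv))
```

In a full training loop, a term like `consistency_loss` would presumably be added to the identity loss so the model is pushed toward invariance under the perturbation; how KTCAA weights these terms and schedules them under meta-learning is not stated in the abstract.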
Related papers
- Agreement Disagreement Guided Knowledge Transfer for Cross-Scene Hyperspectral Imaging [13.858601384061197]
We propose an Agreement Disagreement Guided Knowledge Transfer (ADGKT) framework to enhance cross-scene transfer. The framework includes GradVac and LogitNorm, which align gradient directions to mitigate conflicts between source and target domains. The disagreement component consists of a Disagreement Restriction (DiR) and an ensemble strategy, which capture diverse predictive target features.
(arXiv, 2025-12-08T02:25:27Z)
- Concept Regions Matter: Benchmarking CLIP with a New Cluster-Importance Approach [20.898059440239603]
Cluster-based Concept Importance (CCI) is a novel interpretability method. CCI sets a new state of the art on faithfulness benchmarks. We present a comprehensive evaluation of eighteen CLIP variants.
(arXiv, 2025-11-17T05:01:24Z)
- Domain Adaptation via Feature Refinement [0.3867363075280543]
We propose Domain Adaptation via Feature Refinement (DAFR2), a simple yet effective framework for unsupervised domain adaptation under distribution shift. The proposed method combines three key components: adaptation of Batch Normalization statistics using unlabeled target data, feature distillation from a source-trained model, and hypothesis transfer.
(arXiv, 2025-08-22T06:32:19Z)
- NDCG-Consistent Softmax Approximation with Accelerated Convergence [67.10365329542365]
We propose novel loss formulations that align directly with ranking metrics. We integrate the proposed RG losses with the highly efficient Alternating Least Squares (ALS) optimization method. Empirical evaluations on real-world datasets demonstrate that our approach achieves comparable or superior ranking performance.
(arXiv, 2025-06-11T06:59:17Z)
- Symmetrical Bidirectional Knowledge Alignment for Zero-Shot Sketch-Based Image Retrieval [69.46139774646308]
This paper studies the problem of zero-shot sketch-based image retrieval (ZS-SBIR).
It aims to use sketches from unseen categories as queries to match images of the same category.
We propose a novel Symmetrical Bidirectional Knowledge Alignment method (SBKA) for zero-shot sketch-based image retrieval.
(arXiv, 2023-12-16T04:50:34Z)
- Invariant Representation via Decoupling Style and Spurious Features from Images [27.965593857283316]
This paper considers the out-of-distribution (OOD) generalization problem under the setting that both style distribution shift and spurious features exist and domain labels are missing.
We propose a structural causal model (SCM) for the image generation process, which captures both style distribution shift and spurious features.
The proposed SCM enables us to design a new framework called IRSS, which can gradually separate style distribution and spurious features from images.
(arXiv, 2023-12-11T09:14:42Z)
- Relation Matters: Foreground-aware Graph-based Relational Reasoning for Domain Adaptive Object Detection [81.07378219410182]
We propose a new and general framework for domain adaptive object detection (DomainD), named Foreground-aware Graph-based Relational Reasoning (FGRR).
FGRR incorporates graph structures into the detection pipeline to explicitly model the intra- and inter-domain foreground object relations.
Empirical results demonstrate that the proposed FGRR exceeds the state-of-the-art on four DomainD benchmarks.
(arXiv, 2022-06-06T05:12:48Z)
- BDA-SketRet: Bi-Level Domain Adaptation for Zero-Shot SBIR [52.78253400327191]
BDA-SketRet is a novel framework performing a bi-level domain adaptation for aligning the spatial and semantic features of the visual data pairs.
Experimental results on the extended Sketchy, TU-Berlin, and QuickDraw exhibit sharp improvements over the literature.
(arXiv, 2022-01-17T18:45:55Z)
- Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of semi-supervised learning (SSL) and domain adaptation (DA).
(arXiv, 2021-12-12T06:11:16Z)
- G$^2$DA: Geometry-Guided Dual-Alignment Learning for RGB-Infrared Person Re-Identification [3.909938091041451]
RGB-IR person re-identification aims to retrieve a person of interest across heterogeneous modalities.
This paper presents a Geometry-Guided Dual-Alignment learning framework (G$^2$DA) to tackle sample-level modality differences.
(arXiv, 2021-06-15T03:14:31Z)
- Deep Semantic Matching with Foreground Detection and Cycle-Consistency [103.22976097225457]
We address weakly supervised semantic matching based on a deep network.
We explicitly estimate the foreground regions to suppress the effect of background clutter.
We develop cycle-consistent losses to enforce the predicted transformations across multiple images to be geometrically plausible and consistent.
(arXiv, 2020-03-31T22:38:09Z)