Related papers: Minimizing the Pretraining Gap: Domain-aligned Text-Based Person Retrieval

Minimizing the Pretraining Gap: Domain-aligned Text-Based Person Retrieval

URL: http://arxiv.org/abs/2507.10195v1
Date: Mon, 14 Jul 2025 12:03:04 GMT
Title: Minimizing the Pretraining Gap: Domain-aligned Text-Based Person Retrieval
Authors: Shuyu Yang, Yaxiong Wang, Yongrui Li, Li Zhu, Zhedong Zheng,
Abstract summary: We introduce a unified text-based person retrieval pipeline considering domain adaptation at both image and region levels.<n>Our dual-level adaptation method has achieved state-of-the-art results on the CUHK-PEDES, ICFG-PEDES, and RSTPReid datasets.
Score: 24.544672733180196
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this work, we focus on text-based person retrieval, which aims to identify individuals based on textual descriptions. Given the significant privacy issues and the high cost associated with manual annotation, synthetic data has become a popular choice for pretraining models, leading to notable advancements. However, the considerable domain gap between synthetic pretraining datasets and real-world target datasets, characterized by differences in lighting, color, and viewpoint, remains a critical obstacle that hinders the effectiveness of the pretrain-finetune paradigm. To bridge this gap, we introduce a unified text-based person retrieval pipeline considering domain adaptation at both image and region levels. In particular, it contains two primary components, i.e., Domain-aware Diffusion (DaD) for image-level adaptation and Multi-granularity Relation Alignment (MRA) for region-level adaptation. As the name implies, Domain-aware Diffusion is to migrate the distribution of images from the pretraining dataset domain to the target real-world dataset domain, e.g., CUHK-PEDES. Subsequently, MRA performs a meticulous region-level alignment by establishing correspondences between visual regions and their descriptive sentences, thereby addressing disparities at a finer granularity. Extensive experiments show that our dual-level adaptation method has achieved state-of-the-art results on the CUHK-PEDES, ICFG-PEDES, and RSTPReid datasets, outperforming existing methodologies. The dataset, model, and code are available at https://github.com/Shuyu-XJTU/MRA.

Related papers

Exploiting Aggregation and Segregation of Representations for Domain Adaptive Human Pose Estimation [50.31351006532924]
Human pose estimation (HPE) has received increasing attention recently due to its wide application in motion analysis, virtual reality, healthcare, etc.<n>It suffers from the lack of labeled diverse real-world datasets due to the time- and labor-intensive annotation.<n>We introduce a novel framework that capitalizes on both representation aggregation and segregation for domain adaptive human pose estimation.
arXiv Detail & Related papers (2024-12-29T17:59:45Z)
Stratified Domain Adaptation: A Progressive Self-Training Approach for Scene Text Recognition [1.2878987353423252]
Unsupervised domain adaptation (UDA) has become increasingly prevalent in scene text recognition (STR) We introduce the Stratified Domain Adaptation (StrDA) approach, which examines the gradual escalation of the domain gap for the learning process. We propose a novel method for employing domain discriminators to estimate the out-of-distribution and domain discriminative levels of data samples.
arXiv Detail & Related papers (2024-10-13T16:40:48Z)
One-Shot Domain Adaptive and Generalizable Semantic Segmentation with Class-Aware Cross-Domain Transformers [96.51828911883456]
Unsupervised sim-to-real domain adaptation (UDA) for semantic segmentation aims to improve the real-world test performance of a model trained on simulated data. Traditional UDA often assumes that there are abundant unlabeled real-world data samples available during training for the adaptation. We explore the one-shot unsupervised sim-to-real domain adaptation (OSUDA) and generalization problem, where only one real-world data sample is available.
arXiv Detail & Related papers (2022-12-14T15:54:15Z)
Domain-Agnostic Prior for Transfer Semantic Segmentation [197.9378107222422]
Unsupervised domain adaptation (UDA) is an important topic in the computer vision community. We present a mechanism that regularizes cross-domain representation learning with a domain-agnostic prior (DAP) Our research reveals that UDA benefits much from better proxies, possibly from other data modalities.
arXiv Detail & Related papers (2022-04-06T09:13:25Z)
Stagewise Unsupervised Domain Adaptation with Adversarial Self-Training for Road Segmentation of Remote Sensing Images [93.50240389540252]
Road segmentation from remote sensing images is a challenging task with wide ranges of application potentials. We propose a novel stagewise domain adaptation model called RoadDA to address the domain shift (DS) issue in this field. Experiment results on two benchmarks demonstrate that RoadDA can efficiently reduce the domain gap and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-08-28T09:29:14Z)
Multi-Source Domain Adaptation with Collaborative Learning for Semantic Segmentation [32.95273803359897]
Multi-source unsupervised domain adaptation(MSDA) aims at adapting models trained on multiple labeled source domains to an unlabeled target domain. We propose a novel multi-source domain adaptation framework based on collaborative learning for semantic segmentation.
arXiv Detail & Related papers (2021-03-08T12:51:42Z)
Cross-Domain Facial Expression Recognition: A Unified Evaluation Benchmark and Adversarial Graph Learning [85.6386289476598]
We develop a novel adversarial graph representation adaptation (AGRA) framework for cross-domain holistic-local feature co-adaptation. We conduct extensive and fair evaluations on several popular benchmarks and show that the proposed AGRA framework outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2020-08-03T15:00:31Z)
Contextual-Relation Consistent Domain Adaptation for Semantic Segmentation [44.19436340246248]
This paper presents an innovative local contextual-relation consistent domain adaptation technique. It aims to achieve local-level consistencies during the global-level alignment. Experiments demonstrate its superior segmentation performance as compared with state-of-the-art methods.
arXiv Detail & Related papers (2020-07-05T19:00:46Z)
Unsupervised Domain Adaptation with Multiple Domain Discriminators and Adaptive Self-Training [22.366638308792734]
Unsupervised Domain Adaptation (UDA) aims at improving the generalization capability of a model trained on a source domain to perform well on a target domain for which no labeled data is available. We propose an approach to adapt a deep neural network trained on synthetic data to real scenes addressing the domain shift between the two different data distributions.
arXiv Detail & Related papers (2020-04-27T11:48:03Z)
Phase Consistent Ecological Domain Adaptation [76.75730500201536]
We focus on the task of semantic segmentation, where annotated synthetic data are aplenty, but annotating real data is laborious. The first criterion, inspired by visual psychophysics, is that the map between the two image domains be phase-preserving. The second criterion aims to leverage ecological statistics, or regularities in the scene which are manifest in any image of it, regardless of the characteristics of the illuminant or the imaging sensor.
arXiv Detail & Related papers (2020-04-10T06:58:03Z)
Focus on Semantic Consistency for Cross-domain Crowd Understanding [34.560447389853614]
Some domain adaptation algorithms try to liberate it by training models with synthetic data. We found that a mass of estimation errors in the background areas impede the performance of the existing methods. In this paper, we propose a domain adaptation method to eliminate it.
arXiv Detail & Related papers (2020-02-20T08:51:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.