Contextualized Multimodal Lifelong Person Re-Identification in Hybrid Clothing States
- URL: http://arxiv.org/abs/2509.11247v1
- Date: Sun, 14 Sep 2025 12:46:39 GMT
- Title: Contextualized Multimodal Lifelong Person Re-Identification in Hybrid Clothing States
- Authors: Robert Long, Rongxin Jiang, Mingrui Yan
- Abstract summary: Person Re-Identification (ReID) faces several challenges in real-world surveillance systems due to clothing changes (CCReID). Existing methods either develop models for a single application or treat CCReID as its own sub-problem. We introduce the LReID-Hybrid task, with the goal of developing a model that achieves both SC and CC matching while learning in a continual setting.
- Score: 2.6399783378460158
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Person Re-Identification (ReID) faces several challenges in real-world surveillance systems due to clothing changes (CCReID) and the need to maintain continual learning (LReID). Existing methods either develop models for a single application, mostly a same-cloth (SC) setting, or treat CCReID as its own separate sub-problem. In this work, we introduce the LReID-Hybrid task, with the goal of developing a model that achieves both SC and CC matching while learning in a continual setting. Mismatched representations and forgetting from one task to the next are significant issues; we address them with CMLReID, a CLIP-based framework composed of two novel components: (1) Context-Aware Semantic Prompt (CASP), which generates adaptive prompts and incorporates context to align multi-grained visual cues with the semantic text space; and (2) Adaptive Knowledge Fusion and Projection (AKFP), which produces robust SC/CC prototypes through a dual-path learner that aligns features with our Clothing-State-Aware Projection Loss. Experiments on a wide range of datasets illustrate that CMLReID outperforms state-of-the-art methods with strong robustness and generalization despite clothing variations and a sophisticated sequential learning process.
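The abstract describes AKFP only at a high level (a dual-path learner producing SC/CC prototypes, trained with a Clothing-State-Aware Projection Loss), without equations or code. The sketch below is a minimal, hypothetical reading of that description, not the authors' implementation: the module names, the per-identity prototype banks, and the prototype-contrastive formulation are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


class ClothingStateAwareProjection(torch.nn.Module):
    """Sketch of a dual-path projection head with per-identity SC/CC
    prototypes. Every design choice here is an assumption; the paper's
    actual AKFP formulation is not given in the abstract."""

    def __init__(self, feat_dim: int, proj_dim: int, num_ids: int):
        super().__init__()
        # Two projection paths: one tuned to same-cloth (SC) cues,
        # one to cloth-changing (CC) cues.
        self.sc_proj = torch.nn.Linear(feat_dim, proj_dim)
        self.cc_proj = torch.nn.Linear(feat_dim, proj_dim)
        # Learnable prototypes, one per identity and clothing state.
        self.sc_protos = torch.nn.Parameter(torch.randn(num_ids, proj_dim))
        self.cc_protos = torch.nn.Parameter(torch.randn(num_ids, proj_dim))

    def forward(self, feats, ids, is_cc, temperature: float = 0.07):
        """feats: (B, feat_dim) backbone features (e.g. CLIP image features).
        ids: (B,) identity labels. is_cc: (B,) bool, True for cloth-changing
        views. Returns a prototype-contrastive loss."""
        # Route each sample through the path matching its clothing state.
        z_sc = F.normalize(self.sc_proj(feats), dim=-1)
        z_cc = F.normalize(self.cc_proj(feats), dim=-1)
        z = torch.where(is_cc.unsqueeze(-1), z_cc, z_sc)

        # Select the prototype bank matching each sample's clothing state.
        protos = torch.where(
            is_cc.unsqueeze(-1).unsqueeze(-1),
            F.normalize(self.cc_protos, dim=-1).unsqueeze(0),
            F.normalize(self.sc_protos, dim=-1).unsqueeze(0),
        )  # (B, num_ids, proj_dim)

        # Cosine similarity to every identity prototype, followed by a
        # softmax cross-entropy that pulls each feature toward its own
        # identity's state-specific prototype.
        logits = torch.einsum("bd,bnd->bn", z, protos) / temperature
        return F.cross_entropy(logits, ids)


# Example usage with dummy shapes (hypothetical dimensions):
# head = ClothingStateAwareProjection(feat_dim=768, proj_dim=256, num_ids=500)
# loss = head(torch.randn(32, 768),
#             torch.randint(0, 500, (32,)),
#             torch.rand(32) > 0.5)
```

One plausible rationale for the dual-path layout is that SC matching can rely on clothing texture while CC matching must not, so separating the projections lets each path specialize before the shared prototype objective aligns them per identity.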
Related papers
- R2LED: Equipping Retrieval and Refinement in Lifelong User Modeling with Semantic IDs for CTR Prediction [23.668401664583758]
We propose a novel paradigm that equips retrieval and refinement in Lifelong User Modeling with SEmantic IDs (R2LED). For the retrieval stage, we introduce a Multi-route Mixed Retrieval mechanism that efficiently retrieves candidates from both collaborative and semantic views. For refinement, we design a Bi-level Fusion Refinement, including a target-aware cross-attention for route-level fusion and a gate mechanism for SID-level fusion.
arXiv Detail & Related papers (2026-02-06T11:27:20Z) - LibContinual: A Comprehensive Library towards Realistic Continual Learning [62.34449396069085]
A fundamental challenge in Continual Learning (CL) is catastrophic forgetting, where adapting to new tasks degrades the performance on previous ones. We propose LibContinual, a comprehensive and reproducible library designed to serve as a foundational platform for realistic CL.
arXiv Detail & Related papers (2025-12-26T13:59:13Z) - Exploiting ID-Text Complementarity via Ensembling for Sequential Recommendation [29.942561497927116]
We study the complementarity of ID and modality features in Sequential Recommendation models. We propose a new SR method that preserves ID-text complementarity through independent model training, then harnesses it through a simple ensembling strategy.
arXiv Detail & Related papers (2025-12-19T17:24:12Z) - The Best of the Two Worlds: Harmonizing Semantic and Hash IDs for Sequential Recommendation [51.62815306481903]
We propose a novel framework that harmonizes the SID and HID. Specifically, we devise a dual-branch modeling architecture that enables the model to capture the multi-granular semantics within SID while preserving the unique collaborative identity of HID. Experiments on three real-world datasets show that the framework balances recommendation quality for both head and tail items while surpassing existing baselines.
arXiv Detail & Related papers (2025-12-11T07:50:53Z) - Exploring a Unified Vision-Centric Contrastive Alternatives on Multi-Modal Web Documents [99.62178668680578]
We propose Vision-Centric Contrastive Learning (VC2L), a unified framework that models text, images, and their combinations using a single vision transformer. VC2L operates entirely in pixel space by rendering all inputs, whether textual, visual, or combined, as images. To capture complex cross-modal relationships in web documents, VC2L employs a snippet-level contrastive learning objective that aligns consecutive multimodal segments.
arXiv Detail & Related papers (2025-10-21T14:59:29Z) - Continual Learning for VLMs: A Survey and Taxonomy Beyond Forgetting [70.83781268763215]
Vision-language models (VLMs) have achieved impressive performance across diverse multimodal tasks by leveraging large-scale pre-training. VLMs face unique challenges such as cross-modal feature drift, parameter interference due to shared architectures, and zero-shot capability erosion. This survey aims to serve as a comprehensive and diagnostic reference for researchers developing lifelong vision-language systems.
arXiv Detail & Related papers (2025-08-06T09:03:10Z) - CogDual: Enhancing Dual Cognition of LLMs via Reinforcement Learning with Implicit Rule-Based Rewards [53.36917093757101]
Role-Playing Language Agents (RPLAs) have emerged as a significant application direction for Large Language Models (LLMs). We introduce CogDual, a novel RPLA adopting a cognize-then-respond reasoning paradigm. By jointly modeling external situational awareness and internal self-awareness, CogDual generates responses with improved character consistency and contextual alignment.
arXiv Detail & Related papers (2025-07-23T02:26:33Z) - Try Harder: Hard Sample Generation and Learning for Clothes-Changing Person Re-ID [4.256800812615341]
Hard samples pose a significant challenge in person re-identification (ReID) tasks. Their inherent ambiguity or similarity, coupled with the lack of explicit definitions, makes them a fundamental bottleneck. We propose a novel multimodal-guided Hard Sample Generation and Learning framework.
arXiv Detail & Related papers (2025-07-15T09:14:01Z) - CKAA: Cross-subspace Knowledge Alignment and Aggregation for Robust Continual Learning [80.18781219542016]
Continual Learning (CL) empowers AI models to continuously learn from sequential task streams. Recent parameter-efficient fine-tuning (PEFT)-based CL methods have garnered increasing attention due to their superior performance. We propose Cross-subspace Knowledge Alignment and Aggregation (CKAA) to enhance robustness against misleading task-ids.
arXiv Detail & Related papers (2025-07-13T03:11:35Z) - PS-ReID: Advancing Person Re-Identification and Precise Segmentation with Multimodal Retrieval [38.530536338075684]
Person re-identification (ReID) plays a critical role in applications such as security surveillance and criminal investigations. We propose PS-ReID, a multimodal model that combines image and text inputs to enhance ReID performance. Experimental results demonstrate that PS-ReID significantly outperforms unimodal query-based models in both ReID and segmentation tasks.
arXiv Detail & Related papers (2025-03-27T15:14:03Z) - Compositional Image Retrieval via Instruction-Aware Contrastive Learning [40.54022628032561]
Composed Image Retrieval (CIR) involves retrieving a target image based on a composed query of an image paired with text that specifies modifications or changes to the visual reference. In practice, due to the scarcity of annotated data in downstream tasks, Zero-Shot CIR (ZS-CIR) is desirable. We propose a novel embedding method utilizing an instruction-tuned Multimodal LLM (MLLM) to generate composed representations.
arXiv Detail & Related papers (2024-12-07T22:46:52Z) - Cross-Modal Few-Shot Learning: a Generative Transfer Learning Framework [58.362064122489166]
This paper introduces the Cross-modal Few-Shot Learning task, which aims to recognize instances across multiple modalities while relying on scarce labeled data. We propose a Generative Transfer Learning (GTL) framework by simulating how humans abstract and generalize concepts. We show that GTL achieves state-of-the-art performance across seven multi-modal datasets spanning RGB-Sketch, RGB-Infrared, and RGB-Depth.
arXiv Detail & Related papers (2024-10-14T16:09:38Z) - Interactive Continual Learning: Fast and Slow Thinking [19.253164551254734]
This paper presents a novel Interactive Continual Learning framework, enabled by collaborative interactions among models of various sizes.
To improve memory retrieval in System1, we introduce the CL-vMF mechanism, based on the von Mises-Fisher (vMF) distribution.
Comprehensive evaluation of our proposed ICL demonstrates significant resistance to forgetting and superior performance relative to existing methods.
arXiv Detail & Related papers (2024-03-05T03:37:28Z) - VILLS -- Video-Image Learning to Learn Semantics for Person Re-Identification [51.89551385538251]
We propose VILLS (Video-Image Learning to Learn Semantics), a self-supervised method that jointly learns spatial and temporal features from images and videos.
VILLS first designs a local semantic extraction module that adaptively extracts semantically consistent and robust spatial features.
Then, VILLS designs a unified feature learning and adaptation module to represent image and video modalities in a consistent feature space.
arXiv Detail & Related papers (2023-11-27T19:30:30Z)