FlexiReID: Adaptive Mixture of Expert for Multi-Modal Person Re-Identification
- URL: http://arxiv.org/abs/2510.15595v1
- Date: Fri, 17 Oct 2025 12:41:05 GMT
- Title: FlexiReID: Adaptive Mixture of Expert for Multi-Modal Person Re-Identification
- Authors: Zhen Sun, Lei Tan, Yunhang Shen, Chengmao Cai, Xing Sun, Pingyang Dai, Liujuan Cao, Rongrong Ji
- Abstract summary: FlexiReID is a flexible framework that supports seven retrieval modes across four modalities. We construct CIRS-PEDES, a unified dataset extending four popular Re-ID datasets to include all four modalities.
- Score: 88.61193805417024
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal person re-identification (Re-ID) aims to match pedestrian images across different modalities. However, most existing methods focus on limited cross-modal settings and fail to support arbitrary query-retrieval combinations, hindering practical deployment. We propose FlexiReID, a flexible framework that supports seven retrieval modes across four modalities: RGB, infrared, sketch, and text. FlexiReID introduces an adaptive mixture-of-experts (MoE) mechanism to dynamically integrate diverse modality features and a cross-modal query fusion module to enhance multimodal feature extraction. To facilitate comprehensive evaluation, we construct CIRS-PEDES, a unified dataset extending four popular Re-ID datasets to include all four modalities. Extensive experiments demonstrate that FlexiReID achieves state-of-the-art performance and offers strong generalization in complex scenarios.
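The abstract describes an adaptive mixture-of-experts mechanism that dynamically weights diverse modality features. The paper's listing here does not specify the architecture, so the following is only a minimal, hypothetical sketch of the general MoE gating idea: a learned gate scores each expert per input, and the output is the gate-weighted sum of expert outputs. All names (`AdaptiveMoE`, `gate_w`), dimensions, and the linear experts are illustrative assumptions, not FlexiReID's actual design.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    # Numerically stable softmax along the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class AdaptiveMoE:
    """Toy mixture-of-experts: a gating network weights per-expert outputs.

    Hypothetical sketch; real MoE layers use learned, trained parameters
    and typically nonlinear experts.
    """

    def __init__(self, dim, num_experts):
        self.gate_w = rng.normal(size=(dim, num_experts)) * 0.1
        self.experts = [rng.normal(size=(dim, dim)) * 0.1
                        for _ in range(num_experts)]

    def forward(self, x):
        # x: (batch, dim) fused query features from the modalities.
        weights = softmax(x @ self.gate_w)                    # (batch, E)
        outs = np.stack([x @ e for e in self.experts], axis=1)  # (batch, E, dim)
        # Each sample's output is its gate-weighted combination of experts.
        return (weights[..., None] * outs).sum(axis=1)        # (batch, dim)

moe = AdaptiveMoE(dim=8, num_experts=4)
feats = rng.normal(size=(2, 8))
out = moe.forward(feats)
print(out.shape)  # (2, 8)
```

Because the gate is input-dependent, different queries (e.g. a sketch query versus a text query) can route to different expert mixtures, which is the intuition behind "adaptive" integration of modality features.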
Related papers
- PRISM: Personalized Recommendation via Information Synergy Module [12.797662213207936]
PRISM is a plug-and-play framework for sequential recommendation (SR). It decomposes multimodal information into unique, redundant, and synergistic components. Experiments on four datasets and three SR backbones demonstrate its effectiveness and versatility.
arXiv Detail & Related papers (2026-01-16T02:17:54Z) - OmniSegmentor: A Flexible Multi-Modal Learning Framework for Semantic Segmentation [74.55725909072903]
We propose a novel multi-modal learning framework, termed OmniSegmentor. Based on ImageNet, we assemble a large-scale dataset for multi-modal pretraining, called ImageNeXt. We introduce a universal multi-modal pretraining framework that consistently amplifies the model's perceptual capabilities across various scenarios.
arXiv Detail & Related papers (2025-09-18T15:52:44Z) - ReID5o: Achieving Omni Multi-modal Person Re-identification in a Single Model [38.4111384634895]
We investigate a new challenging problem called Omni Multi-modal Person Re-identification (OM-ReID). We construct ORBench, the first high-quality multi-modal dataset comprising 1,000 unique identities across five modalities. We also propose ReID5o, a novel multi-modal learning framework for person ReID.
arXiv Detail & Related papers (2025-06-11T04:26:13Z) - Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts [31.395361664653677]
We propose Flex-MoE, a new framework designed to flexibly incorporate arbitrary modality combinations.
We evaluate Flex-MoE on the ADNI dataset, which encompasses four modalities in the Alzheimer's Disease domain, as well as on the MIMIC-IV dataset.
arXiv Detail & Related papers (2024-10-10T09:37:21Z) - All in One Framework for Multimodal Re-identification in the Wild [58.380708329455466]
A multimodal learning paradigm for ReID is introduced, referred to as All-in-One (AIO).
AIO harnesses a frozen pre-trained big model as an encoder, enabling effective multimodal retrieval without additional fine-tuning.
Experiments on cross-modal and multimodal ReID reveal that AIO not only adeptly handles various modal data but also excels in challenging contexts.
arXiv Detail & Related papers (2024-05-08T01:04:36Z) - Dynamic Enhancement Network for Partial Multi-modality Person Re-identification [52.70235136651996]
We design a novel dynamic enhancement network (DENet), which allows missing arbitrary modalities while maintaining the representation ability of multiple modalities.
Since the missing state might be changeable, we design a dynamic enhancement module, which dynamically enhances modality features according to the missing state in an adaptive manner.
arXiv Detail & Related papers (2023-05-25T06:22:01Z) - FM-ViT: Flexible Modal Vision Transformers for Face Anti-Spoofing [88.6654909354382]
We present a pure transformer-based framework, dubbed the Flexible Modal Vision Transformer (FM-ViT) for face anti-spoofing.
FM-ViT can flexibly target any single-modal (i.e., RGB) attack scenarios with the help of available multi-modal data.
Experiments demonstrate that the single model trained based on FM-ViT can not only flexibly evaluate different modal samples, but also outperforms existing single-modal frameworks by a large margin.
arXiv Detail & Related papers (2023-05-05T04:28:48Z) - Flexible-Modal Face Anti-Spoofing: A Benchmark [66.18359076810549]
Face anti-spoofing (FAS) plays a vital role in securing face recognition systems from presentation attacks.
We establish the first flexible-modal FAS benchmark with the principle of 'train one for all'.
We also investigate prevalent deep models and feature fusion strategies for flexible-modal FAS.
arXiv Detail & Related papers (2022-02-16T16:55:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information (including all content) and is not responsible for any consequences of its use.