Reference-Aided Part-Aligned Feature Disentangling for Video Person
Re-Identification
- URL: http://arxiv.org/abs/2103.11319v1
- Date: Sun, 21 Mar 2021 06:53:57 GMT
- Title: Reference-Aided Part-Aligned Feature Disentangling for Video Person
Re-Identification
- Authors: Guoqing Zhang, Yuhao Chen, Yang Dai, Yuhui Zheng, Yi Wu
- Abstract summary: We propose a Reference-Aided Part-Aligned (RAPA) framework to disentangle robust features of different parts.
By using both modules, the informative parts of pedestrians in videos are well aligned and a more discriminative feature representation is generated.
- Score: 18.13546384207381
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, video-based person re-identification (re-ID) has drawn increasing
attention in the computer vision community because of its practical application
prospects. Due to the inaccurate person detections and pose changes, pedestrian
misalignment significantly increases the difficulty of feature extraction and
matching. To address this problem, in this paper, we propose a
\textbf{R}eference-\textbf{A}ided \textbf{P}art-\textbf{A}ligned
(\textbf{RAPA}) framework to disentangle robust features of different parts.
Firstly, in order to obtain better references between different videos, a
pose-based reference feature learning module is introduced. Secondly, an
effective relation-based part feature disentangling module is explored to align
frames within each video. By using both modules, the informative parts of
pedestrians in videos are well aligned and a more discriminative feature
representation is generated. Comprehensive experiments on three widely used
benchmarks, i.e., the iLIDS-VID, PRID-2011 and MARS datasets, verify the
effectiveness of the proposed framework. Our code will be made publicly
available.
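The paper's code is not yet released, but the core idea of part-aligned matching can be illustrated with a toy sketch: split each frame's feature map into horizontal stripes, pool each stripe into a part-level descriptor, and compare two videos part by part rather than with a single global vector. This is a minimal illustration of the general technique, not RAPA itself; the stripe pooling, shapes, and distance function below are assumptions for demonstration only.

```python
import numpy as np

def part_features(frame_maps, num_parts=4):
    """Pool each frame's feature map into part-level descriptors.

    frame_maps: (T, H, W, C) array of per-frame feature maps.
    Splits the height axis into horizontal stripes (a simple stand-in
    for learned part alignment), average-pools each stripe spatially,
    then averages over time. Returns a (num_parts, C) array.
    """
    stripes = np.array_split(frame_maps, num_parts, axis=1)
    parts = np.stack([s.mean(axis=(1, 2)) for s in stripes])  # (P, T, C)
    return parts.mean(axis=1)                                 # (P, C)

def part_aligned_distance(fa, fb):
    """Mean cosine distance over corresponding part descriptors."""
    fa = fa / (np.linalg.norm(fa, axis=1, keepdims=True) + 1e-12)
    fb = fb / (np.linalg.norm(fb, axis=1, keepdims=True) + 1e-12)
    return 1.0 - float(np.mean(np.sum(fa * fb, axis=1)))

rng = np.random.default_rng(0)
video_a = rng.normal(size=(8, 16, 8, 32))  # 8 frames, 16x8 maps, 32 channels
video_b = rng.normal(size=(8, 16, 8, 32))

d_self = part_aligned_distance(part_features(video_a), part_features(video_a))
d_cross = part_aligned_distance(part_features(video_a), part_features(video_b))
print(d_self, d_cross)  # self-distance is ~0; unrelated videos are farther apart
```

In a real re-ID pipeline the stripe split would be replaced by learned, pose- or relation-guided part localization (as RAPA's two modules do), so that the k-th part of one video is compared against the semantically matching part of another even under misalignment.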
Related papers
- Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial
Margin Contrastive Learning [35.404100473539195]
Text-video retrieval aims to rank relevant text/video higher than irrelevant ones.
Recent contrastive learning methods have shown promising results for text-video retrieval.
This paper improves contrastive learning using two novel techniques.
arXiv Detail & Related papers (2023-09-20T06:08:11Z)
- Mitigating Representation Bias in Action Recognition: Algorithms and
Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects.
We tackle this problem from two different angles: algorithm and dataset.
We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z)
- Correspondence Matters for Video Referring Expression Comprehension [64.60046797561455]
Video Referring Expression Comprehension (REC) aims to localize the referent objects described in the sentence to visual regions in the video frames.
Existing methods suffer from two problems: 1) inconsistent localization results across video frames; 2) confusion between the referent and contextual objects.
We propose a novel Dual Correspondence Network (dubbed as DCNet) which explicitly enhances the dense associations in both the inter-frame and cross-modal manners.
arXiv Detail & Related papers (2022-07-21T10:31:39Z)
- Exploring Motion and Appearance Information for Temporal Sentence
Grounding [52.01687915910648]
We propose a Motion-Appearance Reasoning Network (MARN) to solve temporal sentence grounding.
We develop separate motion and appearance branches to learn motion-guided and appearance-guided object relations.
Our proposed MARN significantly outperforms previous state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2022-01-03T02:44:18Z)
- Video-Text Pre-training with Learned Regions [59.30893505895156]
Video-Text pre-training aims at learning transferable representations from large-scale video-text pairs.
We propose a module for video-text learning, RegionLearner, which can take into account the structure of objects during pre-training on large-scale video-text pairs.
arXiv Detail & Related papers (2021-12-02T13:06:53Z)
- Support-Set Based Cross-Supervision for Video Grounding [98.29089558426399]
The Support-set Based Cross-Supervision (Sscs) module can improve existing methods during the training phase without extra inference cost.
The proposed Sscs module contains two main components, i.e., discriminative contrastive objective and generative caption objective.
We extensively evaluate Sscs on three challenging datasets, and show that our method can improve current state-of-the-art methods by large margins.
arXiv Detail & Related papers (2021-08-24T08:25:26Z)
- Learning Multi-Granular Hypergraphs for Video-Based Person
Re-Identification [110.52328716130022]
Video-based person re-identification (re-ID) is an important research topic in computer vision.
We propose a novel graph-based framework, namely Multi-Granular Hypergraph (MGH), to achieve better representational capabilities.
90.0% top-1 accuracy on MARS is achieved using MGH, outperforming state-of-the-art schemes.
arXiv Detail & Related papers (2021-04-30T11:20:02Z)
- FOCAL: A Forgery Localization Framework based on Video Coding
Self-Consistency [26.834506269499094]
This paper presents a video forgery localization framework that verifies the self-consistency of coding traces between and within video frames.
The overall framework was validated in two typical forgery scenarios: temporal and spatial splicing.
Experimental results show an improvement to the state-of-the-art on temporal splicing localization and also promising performance in the newly tackled case of spatial splicing.
arXiv Detail & Related papers (2020-08-24T13:55:14Z)
- Exploiting Visual Semantic Reasoning for Video-Text Retrieval [14.466809435818984]
We propose a Visual Semantic Enhanced Reasoning Network (ViSERN) to exploit reasoning between frame regions.
We perform reasoning by novel random walk rule-based graph convolutional networks to generate region features involved with semantic relations.
With the benefit of reasoning, semantic interactions between regions are considered, while the impact of redundancy is suppressed.
arXiv Detail & Related papers (2020-06-16T02:56:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.