Local-Global Associative Frame Assemble in Video Re-ID
- URL: http://arxiv.org/abs/2110.12018v1
- Date: Fri, 22 Oct 2021 19:07:39 GMT
- Title: Local-Global Associative Frame Assemble in Video Re-ID
- Authors: Qilei Li, Jiabo Huang, Shaogang Gong
- Abstract summary: Noisy and unrepresentative frames in automatically generated object bounding boxes from video sequences cause challenges in learning discriminative representations in video re-identification (Re-ID).
Most existing methods tackle this problem by assessing the importance of video frames according to either their local part alignments or global appearance correlations separately.
In this work, we explore jointly both local alignments and global correlations with further consideration of their mutual promotion/reinforcement.
- Score: 57.7470971197962
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Noisy and unrepresentative frames in automatically generated object bounding
boxes from video sequences cause significant challenges in learning
discriminative representations in video re-identification (Re-ID). Most
existing methods tackle this problem by assessing the importance of video
frames according to either their local part alignments or global appearance
correlations separately. However, given the diverse and unknown sources of
noise which usually co-exist in captured video data, existing methods have not
been satisfactorily effective. In this work, we explore jointly both local
alignments and global correlations with further consideration of their mutual
promotion/reinforcement so as to better assemble complementary discriminative
Re-ID information within all the relevant frames in video tracklets.
Specifically, we concurrently optimise a local aligned quality (LAQ) module
that distinguishes the quality of each frame based on local alignments, and a
global correlated quality (GCQ) module that estimates global appearance
correlations. With the help of a local-assembled global appearance prototype,
we associate LAQ and GCQ to exploit their mutual complement. Extensive
experiments demonstrate the superiority of the proposed model against
state-of-the-art methods on five Re-ID benchmarks, including MARS, Duke-Video,
Duke-SI, iLIDS-VID, and PRID2011.
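To make the above concrete, here is a minimal sketch of how LAQ-style and GCQ-style frame weighting could be associated through a local-assembled prototype. It is an illustration only: the function, score definitions, and tensor shapes are assumptions, not the authors' actual implementation.

import torch
import torch.nn.functional as F

def assemble_tracklet(frame_feats, part_feats, anchor_parts):
    """Hypothetical sketch of joint local/global frame weighting.

    frame_feats:  (T, D)    per-frame global features
    part_feats:   (T, P, D) per-frame local part features
    anchor_parts: (P, D)    reference part features (e.g. tracklet mean)
    """
    # Local aligned quality (LAQ): a frame scores high when its parts
    # align well with the reference parts (cosine similarity per part).
    part_sim = F.cosine_similarity(
        part_feats, anchor_parts.unsqueeze(0), dim=-1)       # (T, P)
    laq = F.softmax(part_sim.mean(dim=1), dim=0)             # (T,)

    # Local-assembled global prototype: LAQ-weighted frame average.
    prototype = (laq.unsqueeze(1) * frame_feats).sum(dim=0)  # (D,)

    # Global correlated quality (GCQ): correlation of each frame with
    # the prototype, so the two cues can reinforce each other.
    gcq = F.softmax(frame_feats @ prototype, dim=0)          # (T,)

    # Final tracklet representation: GCQ-weighted aggregation.
    return (gcq.unsqueeze(1) * frame_feats).sum(dim=0)       # (D,)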
Related papers
- Global Meets Local: Effective Multi-Label Image Classification via Category-Aware Weak Supervision [37.761378069277676]
This paper builds a unified framework to perform effective noisy-proposal suppression.
We develop a cross-granularity attention module to explore the complementary information between global and local features (a generic form is sketched after this entry).
Our framework achieves superior performance over state-of-the-art methods.
arXiv Detail & Related papers (2022-11-23T05:39:17Z)
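Illustration only: one plausible form of a cross-granularity attention module, in which a global feature attends over local features. Names and shapes are assumptions, not the paper's actual design.

import torch
import torch.nn as nn

class CrossGranularityAttention(nn.Module):
    """Hypothetical sketch: a global feature queries local features."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, global_feat, local_feats):
        # global_feat: (B, 1, D), local_feats: (B, N, D)
        # The global token gathers complementary local evidence.
        fused, _ = self.attn(global_feat, local_feats, local_feats)
        return fused + global_feat   # residual fusion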
- Context Sensing Attention Network for Video-based Person Re-identification [20.865710012336724]
Video-based person re-identification (ReID) is challenging due to the presence of various interferences in video frames.
Recent approaches handle this problem using temporal aggregation strategies.
We propose a novel Context Sensing Attention Network (CSA-Net), which improves both the frame feature extraction and temporal aggregation steps (a generic attention-based aggregation is sketched after this entry).
arXiv Detail & Related papers (2022-07-06T12:48:27Z)
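As background for the temporal aggregation step mentioned in the CSA-Net entry above, here is a generic attention-weighted temporal pooling sketch. This is a common baseline form, not CSA-Net itself; all names are hypothetical.

import torch
import torch.nn as nn

class TemporalAttentionPool(nn.Module):
    """Hypothetical sketch of attention-based temporal aggregation."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # per-frame importance logit

    def forward(self, frame_feats):
        # frame_feats: (B, T, D) features of T frames per tracklet
        w = torch.softmax(self.score(frame_feats), dim=1)  # (B, T, 1)
        return (w * frame_feats).sum(dim=1)                # (B, D)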
- Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene Segmentation [58.74791043631219]
We propose a novel framework STswinCL that explores the complementary intra- and inter-video relations to boost segmentation performance.
We extensively validate our approach on two public surgical video benchmarks, the EndoVis18 Challenge and the CaDIS dataset.
Experimental results demonstrate the promising performance of our method, which consistently exceeds previous state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-29T05:52:23Z)
- Global2Local: A Joint-Hierarchical Attention for Video Captioning [123.12188554567079]
We propose a novel joint-hierarchical attention model for video captioning, which embeds the key clips, the key frames and the key regions jointly into the captioning model.
Such a joint-hierarchical attention model first conducts a global selection to identify key frames, followed by a Gumbel sampling operation to identify further key regions based on the key frames (a minimal Gumbel-softmax selection is sketched after this entry).
arXiv Detail & Related papers (2022-03-13T14:31:54Z)
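A minimal sketch of the Gumbel sampling step named in the Global2Local entry above, using the standard straight-through Gumbel-softmax. The tensor layout is an assumption, not the paper's exact operation.

import torch
import torch.nn.functional as F

def gumbel_select(region_logits, tau=1.0, hard=True):
    """Hypothetical sketch: differentiable selection of one key region
    per frame via the Gumbel-softmax trick.

    region_logits: (B, T, R) scores for R candidate regions per frame.
    Returns one-hot-like weights of the same shape.
    """
    # gumbel_softmax adds Gumbel noise and softmaxes with temperature tau;
    # hard=True returns a straight-through one-hot sample.
    return F.gumbel_softmax(region_logits, tau=tau, hard=hard, dim=-1)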
- Exploring Global Diversity and Local Context for Video Summarization [4.452227592307381]
Video summarization aims to automatically generate a diverse and concise summary which is useful in large-scale video processing.
Most methods tend to adopt a self-attention mechanism across video frames, which fails to model the diversity of video frames.
We propose global diverse attention, which instead computes the affinities with the squared Euclidean distance (a minimal version is sketched after this entry).
arXiv Detail & Related papers (2022-01-27T06:56:01Z)
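One plausible reading of the global diverse attention described above, with affinities derived from negative squared Euclidean distances rather than dot products. Names and the normalisation choice are assumptions, not the authors' exact formulation.

import torch

def diverse_attention(feats):
    """Hypothetical sketch: distance-based attention over frames.

    feats: (T, D) frame features.
    """
    # Pairwise squared Euclidean distances between frames: (T, T).
    d2 = torch.cdist(feats, feats, p=2).pow(2)
    # Closer frames get higher affinity; softmax normalises per row.
    attn = torch.softmax(-d2, dim=-1)
    return attn @ feats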
- Context-aware Biaffine Localizing Network for Temporal Sentence Grounding [61.18824806906945]
This paper addresses the problem of temporal sentence grounding (TSG).
TSG aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query.
We propose a novel localization framework that scores all pairs of start and end indices within the video simultaneously with a biaffine mechanism (a generic biaffine scorer is sketched after this entry).
arXiv Detail & Related papers (2021-03-22T03:13:05Z)
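A generic biaffine scorer over all (start, end) pairs, illustrating the mechanism named in the entry above. Dimensions and parameterisation are assumptions, not the paper's design.

import torch
import torch.nn as nn

class BiaffineScorer(nn.Module):
    """Hypothetical sketch of a biaffine mechanism that scores every
    (start, end) index pair of a video at once."""
    def __init__(self, dim):
        super().__init__()
        # Bilinear term plus linear terms for start/end representations.
        self.W = nn.Parameter(torch.randn(dim, dim) * dim ** -0.5)
        self.u = nn.Linear(dim, 1)
        self.v = nn.Linear(dim, 1)

    def forward(self, starts, ends):
        # starts, ends: (T, D) boundary-aware clip features.
        bilinear = starts @ self.W @ ends.T              # (T, T)
        scores = bilinear + self.u(starts) + self.v(ends).T
        return scores   # scores[i, j]: segment from clip i to clip j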
- Watching You: Global-guided Reciprocal Learning for Video-based Person Re-identification [82.6971648465279]
We propose a novel Global-guided Reciprocal Learning framework for video-based person Re-ID.
Our approach can achieve better performance than other state-of-the-art approaches.
arXiv Detail & Related papers (2021-03-07T12:27:42Z)
- Multi-Granularity Reference-Aided Attentive Feature Aggregation for Video-based Person Re-identification [98.7585431239291]
Video-based person re-identification aims at matching the same person across video clips.
In this paper, we propose an attentive feature aggregation module, namely the Multi-Granularity Reference-aided Attentive Feature Aggregation (MG-RAFA) module.
Our framework achieves state-of-the-art performance on three benchmark datasets.
arXiv Detail & Related papers (2020-03-27T03:49:21Z)