Vision Transformer based Random Walk for Group Re-Identification
- URL: http://arxiv.org/abs/2410.05808v1
- Date: Tue, 8 Oct 2024 08:41:14 GMT
- Title: Vision Transformer based Random Walk for Group Re-Identification
- Authors: Guoqing Zhang, Tianqi Liu, Wenxuan Fang, Yuhui Zheng
- Abstract summary: Group re-identification (re-ID) aims to match groups with the same people under different cameras.
We propose a novel vision transformer based random walk framework for group re-ID.
- Score: 15.63292108454152
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Group re-identification (re-ID) aims to match groups containing the same people across different cameras, which mainly involves the challenges of changes in group membership and group layout. Most existing methods use the k-nearest neighbor algorithm to update node features to account for changes in group membership, but these methods cannot handle changes in group layout. To this end, we propose a novel vision transformer based random walk framework for group re-ID. Specifically, we design a vision transformer based on a monocular depth estimation algorithm that constructs a graph from the average depth values of pedestrian features, fully accounting for the impact of camera distance on the relationships between group members. In addition, we propose a random walk module that reconstructs the graph by calculating affinity scores between target and gallery images, removing pedestrians who do not belong to the current group. Experimental results show that our framework outperforms most existing methods.
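The abstract's random walk module can be illustrated with a minimal sketch: starting from a precomputed affinity matrix between group members, scores are propagated along the graph and low-affinity pedestrians are pruned from the group. The update rule, function names, and the threshold below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def random_walk_refine(affinity, alpha=0.9, iters=20):
    """Refine pairwise affinity scores by random-walk propagation.

    affinity: (n, n) nonnegative matrix of raw affinities between
    pedestrians (a generic stand-in for the paper's target/gallery scores).
    Each iteration blends propagated scores with the originals:
        W <- alpha * P @ W + (1 - alpha) * affinity
    where P is the row-normalized transition matrix.
    """
    P = affinity / affinity.sum(axis=1, keepdims=True)
    W = affinity.astype(float).copy()
    for _ in range(iters):
        W = alpha * P @ W + (1 - alpha) * affinity
    return W

def prune_members(probe_idx, W, threshold=0.2):
    """Keep only pedestrians whose refined affinity to the probe
    exceeds a threshold (hypothetical value, for illustration)."""
    return [j for j, s in enumerate(W[probe_idx])
            if j != probe_idx and s >= threshold]
```

The blend factor `alpha` trades off graph smoothing against fidelity to the raw scores; with `alpha = 0` no propagation happens at all, while values near 1 let multi-hop context dominate.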
Related papers
- The Research of Group Re-identification from Multiple Cameras [0.4955551943523977]
Group re-identification is very challenging since it is affected not only by the view-point and human pose variations of traditional re-identification tasks.
This paper introduces a novel approach which leverages the multi-granularity information inside groups to facilitate group re-identification.
arXiv Detail & Related papers (2024-07-19T18:28:13Z) - AggNet: Learning to Aggregate Faces for Group Membership Verification [20.15673797674449]
In some face recognition applications, we are interested to verify whether an individual is a member of a group, without revealing their identity.
Some existing methods propose a mechanism for quantizing precomputed face descriptors into discrete embeddings and aggregating them into one group representation.
We propose a deep architecture that jointly learns face descriptors and the aggregation mechanism for better end-to-end performances.
arXiv Detail & Related papers (2022-06-17T10:48:34Z) - Green Hierarchical Vision Transformer for Masked Image Modeling [54.14989750044489]
We present an efficient approach for Masked Image Modeling with hierarchical Vision Transformers (ViTs).
We design a Group Window Attention scheme following the Divide-and-Conquer strategy.
We further improve the grouping strategy via the Dynamic Programming algorithm to minimize the overall cost of the attention on the grouped patches.
arXiv Detail & Related papers (2022-05-26T17:34:42Z) - Causal Scene BERT: Improving object detection by searching for challenging groups of data [125.40669814080047]
Computer vision applications rely on learning-based perception modules parameterized with neural networks for tasks like object detection.
These modules frequently have low expected error overall but high error on atypical groups of data due to biases inherent in the training process.
Our main contribution is a pseudo-automatic method to discover such groups in foresight by performing causal interventions on simulated scenes.
arXiv Detail & Related papers (2022-02-08T05:14:16Z) - Learning Multi-Attention Context Graph for Group-Based Re-Identification [214.84551361855443]
Learning to re-identify or retrieve a group of people across non-overlapped camera systems has important applications in video surveillance.
In this work, we consider employing context information for identifying groups of people, i.e., group re-id.
We propose a novel unified framework based on graph neural networks to simultaneously address the group-based re-id tasks.
arXiv Detail & Related papers (2021-04-29T09:57:47Z) - Learning Spatial Context with Graph Neural Network for Multi-Person Pose Grouping [71.59494156155309]
Bottom-up approaches for image-based multi-person pose estimation consist of two stages: keypoint detection and grouping.
In this work, we formulate the grouping task as a graph partitioning problem, where we learn the affinity matrix with a Graph Neural Network (GNN).
The learned geometry-based affinity is further fused with appearance-based affinity to achieve robust keypoint association.
arXiv Detail & Related papers (2021-04-06T09:21:14Z) - Group-CAM: Group Score-Weighted Visual Explanations for Deep Convolutional Networks [4.915848175689936]
We propose an efficient saliency map generation method, called Group score-weighted Class Activation Mapping (Group-CAM).
Group-CAM is efficient yet effective, which only requires dozens of queries to the network while producing target-related saliency maps.
arXiv Detail & Related papers (2021-03-25T14:16:02Z) - Joint Noise-Tolerant Learning and Meta Camera Shift Adaptation for Unsupervised Person Re-Identification [60.36551512902312]
Unsupervised person re-identification (re-ID) aims to learn discriminative models from unlabeled data.
One popular method is to obtain pseudo-labels by clustering and use them to optimize the model.
In this paper, we propose a unified framework to solve both problems.
arXiv Detail & Related papers (2021-03-08T09:13:06Z) - Overcoming Data Sparsity in Group Recommendation [52.00998276970403]
Group recommender systems should be able to accurately learn not only users' personal preferences but also the preference aggregation strategy.
In this paper, we take the Bipartite Graph Embedding Model (BGEM), the self-attention mechanism and Graph Convolutional Networks (GCNs) as basic building blocks to learn group and user representations in a unified way.
arXiv Detail & Related papers (2020-10-02T07:11:19Z) - Deep Grouping Model for Unified Perceptual Parsing [36.73032339428497]
The perceptual-based grouping process produces a hierarchical and compositional image representation.
We propose a deep grouping model (DGM) that tightly marries the two types of representations and defines a bottom-up and a top-down process for feature exchanging.
The model achieves state-of-the-art results while having a small computational overhead compared to other contextual-based segmentation models.
arXiv Detail & Related papers (2020-03-25T21:16:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.