Generalizable Person Re-Identification via Viewpoint Alignment and
Fusion
- URL: http://arxiv.org/abs/2212.02398v1
- Date: Mon, 5 Dec 2022 16:24:09 GMT
- Title: Generalizable Person Re-Identification via Viewpoint Alignment and
Fusion
- Authors: Bingliang Jiao, Lingqiao Liu, Liying Gao, Guosheng Lin, Ruiqi Wu,
Shizhou Zhang, Peng Wang, and Yanning Zhang
- Abstract summary: This work proposes to use a 3D dense pose estimation model and a texture mapping module to map pedestrian images to canonical view images.
Due to the imperfection of the texture mapping module, the canonical view images may lose the discriminative detail clues from the original images.
We show that our method can lead to superior performance over the existing approaches in various evaluation settings.
- Score: 74.30861504619851
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In the current person Re-identification (ReID) methods, most domain
generalization works focus on dealing with style differences between domains
while largely ignoring unpredictable camera view change, which we identify as
another major factor leading to poor generalization of ReID methods. To
tackle the viewpoint change, this work proposes to use a 3D dense pose
estimation model and a texture mapping module to map the pedestrian images to
canonical view images. Due to the imperfection of the texture mapping module,
the canonical view images may lose the discriminative detail clues from the
original images, and thus directly using them for ReID will inevitably result
in poor performance. To handle this issue, we propose to fuse the original
image and canonical view image via a transformer-based module. The key insight
of this design is that the cross-attention mechanism in the transformer could
be an ideal solution to align the discriminative texture clues from the
original image with the canonical view image, which could compensate for the
low-quality texture information of the canonical view image. Through extensive
experiments, we show that our method can lead to superior performance over the
existing approaches in various evaluation settings.
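The cross-attention fusion described in the abstract can be illustrated with a minimal, framework-free sketch. The single-head attention below is an assumption for illustration only (the function name `cross_attention`, the token counts, the feature dimension, and the residual fusion are all hypothetical choices, not the paper's actual architecture): canonical-view tokens act as queries over original-image tokens, pulling discriminative texture detail back into the canonical representation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(canon_feats, orig_feats):
    """Single-head cross-attention: canonical-view tokens (queries)
    attend over original-image tokens (keys/values)."""
    d_k = canon_feats.shape[-1]
    scores = canon_feats @ orig_feats.T / np.sqrt(d_k)   # (Nc, No)
    weights = softmax(scores, axis=-1)                   # rows sum to 1
    return weights @ orig_feats                          # (Nc, d)

# Toy example: 4 canonical-view tokens, 6 original-image tokens, dim 8.
rng = np.random.default_rng(0)
canon = rng.normal(size=(4, 8))
orig = rng.normal(size=(6, 8))

# Residual fusion: canonical features compensated with attended texture.
fused = canon + cross_attention(canon, orig)
print(fused.shape)  # prints (4, 8)
```

In practice a learned transformer block would add query/key/value projections, multiple heads, and a feed-forward layer; the sketch only shows why the attention weights can align each canonical-view token with the most relevant original-image regions.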
Related papers
- SHIC: Shape-Image Correspondences with no Keypoint Supervision [106.99157362200867]
Canonical surface mapping generalizes keypoint detection by assigning each pixel of an object to a corresponding point in a 3D template.
Popularised by DensePose for the analysis of humans, authors have attempted to apply the concept to more categories.
We introduce SHIC, a method to learn canonical maps without manual supervision, which achieves better results than supervised methods for most categories.
arXiv Detail & Related papers (2024-07-26T17:58:59Z)
- Breaking the Frame: Visual Place Recognition by Overlap Prediction [53.17564423756082]
We propose a novel visual place recognition approach based on overlap prediction, called VOP.
VOP processes co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backbone.
Our approach uses a voting mechanism to assess overlap scores for potential database images.
arXiv Detail & Related papers (2024-06-23T20:00:20Z)
- Face Feature Visualisation of Single Morphing Attack Detection [13.680968065638108]
This paper proposes an explainable visualisation of different face feature extraction algorithms.
It enables the detection of bona fide and morphing images for single morphing attack detection.
The visualisation may help to develop a Graphical User Interface for border policies.
arXiv Detail & Related papers (2023-04-25T17:51:23Z)
- Semantic Layout Manipulation with High-Resolution Sparse Attention [106.59650698907953]
We tackle the problem of semantic image layout manipulation, which aims to manipulate an input image by editing its semantic label map.
A core problem of this task is how to transfer visual details from the input images to the new semantic layout while making the resulting image visually realistic.
We propose a high-resolution sparse attention module that effectively transfers visual details to new layouts at a resolution up to 512x512.
arXiv Detail & Related papers (2020-12-14T06:50:43Z)
- Learning Edge-Preserved Image Stitching from Large-Baseline Deep Homography [32.28310831466225]
We propose an image stitching learning framework, which consists of a large-baseline deep homography module and an edge-preserved deformation module.
Our method is superior to the existing learning methods and shows competitive performance with state-of-the-art traditional methods.
arXiv Detail & Related papers (2020-12-11T08:43:30Z)
- Cross-Descriptor Visual Localization and Mapping [81.16435356103133]
Visual localization and mapping is the key technology underlying the majority of Mixed Reality and robotics systems.
We present three novel scenarios for localization and mapping which require the continuous update of feature representations.
Our data-driven approach is agnostic to the feature descriptor type, has low computational requirements, and scales linearly with the number of description algorithms.
arXiv Detail & Related papers (2020-12-02T18:19:51Z)
- Look here! A parametric learning based approach to redirect visual attention [49.609412873346386]
We introduce an automatic method to make an image region more attention-capturing via subtle image edits.
Our model predicts a distinct set of global parametric transformations to be applied to the foreground and background image regions.
Our edits enable inference at interactive rates on any image size, and easily generalize to videos.
arXiv Detail & Related papers (2020-08-12T16:08:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.