Pose-Diversified Augmentation with Diffusion Model for Person Re-Identification
- URL: http://arxiv.org/abs/2406.16042v1
- Date: Sun, 23 Jun 2024 07:48:21 GMT
- Title: Pose-Diversified Augmentation with Diffusion Model for Person Re-Identification
- Authors: Inès Hyeonsu Kim, JoungBin Lee, Soowon Son, Woojeong Jin, Kyusun Cho, Junyoung Seo, Min-Seop Kwak, Seokju Cho, JeongYeol Baek, Byeongwon Lee, Seungryong Kim
- Abstract summary: Person re-identification (Re-ID) often faces challenges due to variations in human poses and camera viewpoints.
Previous methods have attempted to address these issues through data augmentation.
We propose Diff-ID, a novel data augmentation approach that incorporates sparse and underrepresented human pose and camera viewpoint examples.
- Score: 28.794827024749658
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Person re-identification (Re-ID) often faces challenges due to variations in human poses and camera viewpoints, which significantly affect the appearance of individuals across images. Existing datasets frequently lack diversity and scalability in these aspects, hindering the generalization of Re-ID models to new camera systems. Previous methods have attempted to address these issues through data augmentation; however, they rely on human poses already present in the training dataset, failing to effectively reduce the human pose bias in the dataset. We propose Diff-ID, a novel data augmentation approach that incorporates sparse and underrepresented human pose and camera viewpoint examples into the training data, addressing the limited diversity in the original training data distribution. Our objective is to augment a training dataset that enables existing Re-ID models to learn features unbiased by human pose and camera viewpoint variations. To achieve this, we leverage the knowledge of pre-trained large-scale diffusion models. Using the SMPL model, we simultaneously capture both the desired human poses and camera viewpoints, enabling realistic human rendering. The depth information provided by the SMPL model indirectly conveys the camera viewpoints. By conditioning the diffusion model on both the human pose and camera viewpoint concurrently through the SMPL model, we generate realistic images with diverse human poses and camera viewpoints. Qualitative results demonstrate the effectiveness of our method in addressing human pose bias and enhancing the generalizability of Re-ID models compared to other data augmentation-based Re-ID approaches. The performance gains achieved by training Re-ID models on our offline augmented dataset highlight the potential of our proposed framework in improving the scalability and generalizability of person Re-ID models.
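The abstract notes that depth rendered from the SMPL model indirectly conveys the camera viewpoint. The toy sketch below illustrates only that idea and is not the Diff-ID pipeline: it assumes a random box-shaped point cloud as a stand-in for an SMPL mesh, a simple pinhole camera, and hypothetical helper names (`yaw_rotation`, `render_depth`) and parameters (`cam_dist`, `focal`, `size`) chosen for illustration.

```python
import numpy as np

def yaw_rotation(angle_rad):
    """Rotation matrix about the vertical (y) axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def render_depth(points, yaw_rad, cam_dist=3.0, focal=50.0, size=64):
    """Z-buffer depth map of a point cloud seen by a pinhole camera.

    Rotating the body by yaw_rad is equivalent to orbiting the camera
    around it; the body is then placed cam_dist in front of the camera.
    Pixels that no point projects to stay at +inf.
    """
    cam = points @ yaw_rotation(yaw_rad).T
    cam[:, 2] += cam_dist
    z = cam[:, 2]
    u = np.round(focal * cam[:, 0] / z + size / 2).astype(int)
    v = np.round(-focal * cam[:, 1] / z + size / 2).astype(int)
    depth = np.full((size, size), np.inf)
    inside = (u >= 0) & (u < size) & (v >= 0) & (v < size) & (z > 0)
    # Keep the nearest depth for every pixel (unbuffered in-place minimum).
    np.minimum.at(depth, (v[inside], u[inside]), z[inside])
    return depth

# Toy "body": wide in x, tall in y, thin in z (an assumption, not SMPL).
rng = np.random.default_rng(0)
body = rng.uniform(-1.0, 1.0, size=(5000, 3)) * np.array([0.25, 0.85, 0.10])

front = render_depth(body, 0.0)        # frontal viewpoint
side = render_depth(body, np.pi / 2)   # side viewpoint

# The covered area and the depth range both change with the viewpoint.
print(np.isfinite(front).sum(), np.isfinite(side).sum())
```

Under these assumptions, the same body produces a wider silhouette and a shallower depth range from the front than from the side, so a depth map used as a conditioning signal carries viewpoint information alongside pose.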
Related papers
- Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training [51.87027943520492]
We present a novel paradigm, Diffusion-ReID, to efficiently augment and generate diverse images based on known identities.
Benefiting from our proposed paradigm, we first create a new large-scale person Re-ID dataset, Diff-Person, which consists of over 777K images from 5,183 identities.
arXiv Detail & Related papers (2024-06-10T06:26:03Z)
- DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception [66.88792390480343]
We propose DEEM, a simple and effective approach that utilizes the generative feedback of diffusion models to align the semantic distributions of the image encoder.
DEEM exhibits enhanced robustness and a superior capacity to alleviate hallucinations while utilizing fewer trainable parameters, less pre-training data, and a smaller base model size.
arXiv Detail & Related papers (2024-05-24T05:46:04Z)
- Data Augmentation in Human-Centric Vision [54.97327269866757]
This survey presents a comprehensive analysis of data augmentation techniques in human-centric vision tasks.
It delves into a wide range of research areas including person ReID, human parsing, human pose estimation, and pedestrian detection.
Our work categorizes data augmentation methods into two main types: data generation and data perturbation.
arXiv Detail & Related papers (2024-03-13T16:05:18Z)
- Diffusion Models Trained with Large Data Are Transferable Visual Models [49.84679952948808]
We show that it is possible to achieve remarkable transferable performance on fundamental vision perception tasks using a moderate amount of target data.
Results showcase the remarkable transferability of the backbone of diffusion models across diverse tasks and real-world datasets.
arXiv Detail & Related papers (2024-03-10T04:23:24Z)
- Pose Invariant Person Re-Identification using Robust Pose-transformation GAN [11.338815177557645]
Person re-identification (re-ID) aims to retrieve a person's images from an image gallery, given a single instance of the person of interest.
Despite several advancements, learning discriminative identity-sensitive and viewpoint invariant features for robust Person Re-identification is a major challenge owing to large pose variation of humans.
This paper proposes a re-ID pipeline that utilizes the image generation capability of Generative Adversarial Networks combined with pose regression and feature fusion to achieve pose invariant feature learning.
arXiv Detail & Related papers (2021-04-11T15:47:03Z)
- Unsupervised Pre-training for Person Re-identification [90.98552221699508]
We present a large-scale unlabeled person re-identification (Re-ID) dataset, "LUPerson".
We make the first attempt at performing unsupervised pre-training to improve the generalization ability of the learned person Re-ID feature representation.
arXiv Detail & Related papers (2020-12-07T14:48:26Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
- MirrorNet: A Deep Bayesian Approach to Reflective 2D Pose Estimation from Human Images [42.27703025887059]
The main problem with the standard supervised approach is that it often yields anatomically implausible poses.
We propose a semi-supervised method that can make effective use of images with and without pose annotations.
The results of experiments show that the proposed reflective architecture makes estimated poses anatomically plausible.
arXiv Detail & Related papers (2020-04-08T05:02:48Z)
- A Robust Pose Transformational GAN for Pose Guided Person Image Synthesis [9.570395744724461]
We propose a simple yet effective pose transformation GAN by utilizing the Residual Learning method without any additional feature learning to generate a given human image in any arbitrary pose.
Using effective data augmentation techniques and cleverly tuning the model, we achieve robustness in terms of illumination, occlusion, distortion and scale.
We present a detailed study, both qualitative and quantitative, to demonstrate the superiority of our model over the existing methods on two large datasets.
arXiv Detail & Related papers (2020-01-05T15:32:35Z)