How Image Generation Helps Visible-to-Infrared Person Re-Identification?
- URL: http://arxiv.org/abs/2210.01585v1
- Date: Tue, 4 Oct 2022 13:09:29 GMT
- Title: How Image Generation Helps Visible-to-Infrared Person Re-Identification?
- Authors: Honghu Pan and Yongyong Chen and Yunqi He and Xin Li and Zhenyu He
- Abstract summary: Flow2Flow is a framework that can jointly achieve training sample expansion and cross-modality image generation for V2I person ReID.
For the purpose of identity alignment and modality alignment of generated images, we develop adversarial training strategies to train Flow2Flow.
Experimental results on SYSU-MM01 and RegDB demonstrate that both training sample expansion and cross-modality image generation can significantly improve V2I ReID accuracy.
- Score: 15.951145523749735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compared to visible-to-visible (V2V) person re-identification (ReID), the
visible-to-infrared (V2I) person ReID task is more challenging due to the lack
of sufficient training samples and the large cross-modality discrepancy.
To this end, we propose Flow2Flow, a unified framework that can jointly
achieve training sample expansion and cross-modality image generation for V2I
person ReID.
Specifically, Flow2Flow learns bijective transformations from the visible
image domain and the infrared image domain to a shared isotropic Gaussian
domain, using an invertible flow-based generator for each modality.
With Flow2Flow, we can generate pseudo training samples by transforming
latent Gaussian noise into visible or infrared images, and generate
cross-modality images by transforming existing-modality images into latent
Gaussian noise and then into the missing modality.
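The minimal sketch below (not the authors' code; the toy element-wise affine layer, the feature dimension, and all variable names are assumptions) illustrates the two uses of a shared Gaussian latent space: sampling noise to expand the training set, and chaining one generator's forward pass with the other generator's inverse pass for cross-modality generation.

```python
import torch
import torch.nn as nn


class ToyInvertibleFlow(nn.Module):
    """Element-wise affine bijection between an image domain and the latent domain.

    A stand-in for the real invertible flow-based generators in Flow2Flow.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(dim))
        self.shift = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # image domain -> shared isotropic Gaussian latent domain
        return (x - self.shift) * torch.exp(-self.log_scale)

    def inverse(self, z: torch.Tensor) -> torch.Tensor:
        # latent domain -> image domain (exact inverse of forward)
        return z * torch.exp(self.log_scale) + self.shift


dim = 16                            # toy feature dimension, not a real image size
flow_vis = ToyInvertibleFlow(dim)   # visible-domain generator
flow_ir = ToyInvertibleFlow(dim)    # infrared-domain generator

# 1) Training-sample expansion: latent Gaussian noise -> pseudo visible/infrared samples.
z = torch.randn(4, dim)
pseudo_vis = flow_vis.inverse(z)
pseudo_ir = flow_ir.inverse(z)

# 2) Cross-modality generation: visible sample -> shared latent -> infrared sample.
x_vis = torch.randn(4, dim)         # placeholder for real visible inputs
x_ir_hat = flow_ir.inverse(flow_vis(x_vis))

print(pseudo_vis.shape, pseudo_ir.shape, x_ir_hat.shape)
```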
For the purpose of identity alignment and modality alignment of generated
images, we develop adversarial training strategies to train Flow2Flow.
Specifically, we design an image encoder and a modality discriminator for
each modality.
The image encoder encourages the generated images to be similar to real
images of the same identity via identity adversarial training, and the modality
discriminator makes the generated images indistinguishable in modality from
real images via modality adversarial training.
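As a rough illustration of the modality-adversarial part only (a hedged sketch; the discriminator architecture, the binary cross-entropy loss form, and the function names are assumptions rather than the paper's exact formulation), a per-modality discriminator can be trained to separate real from generated images while the generator is trained to fool it:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# One modality discriminator per modality; only the infrared one is shown here.
disc_ir = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))


def modality_discriminator_loss(real_ir: torch.Tensor, fake_ir: torch.Tensor) -> torch.Tensor:
    # Discriminator step: label real infrared images 1 and generated ones 0.
    real_logit = disc_ir(real_ir)
    fake_logit = disc_ir(fake_ir.detach())
    return (F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit))
            + F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit)))


def modality_generator_loss(fake_ir: torch.Tensor) -> torch.Tensor:
    # Generator step: make generated images modality-indistinguishable from real ones.
    fake_logit = disc_ir(fake_ir)
    return F.binary_cross_entropy_with_logits(fake_logit, torch.ones_like(fake_logit))

# The identity-adversarial term would analogously pull an image encoder's features of a
# generated image toward the features of real images sharing the same identity.
```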
Experimental results on SYSU-MM01 and RegDB demonstrate that both training
sample expansion and cross-modality image generation can significantly improve
V2I ReID accuracy.
Related papers
- Uni-Renderer: Unifying Rendering and Inverse Rendering Via Dual Stream Diffusion [25.5139351758218]
Rendering and inverse rendering are pivotal tasks in computer vision and graphics.
We propose a data-driven method that jointly models rendering and inverse rendering as two conditional generation tasks.
We will open-source our training and inference code to the public, fostering further research and development in this area.
arXiv Detail & Related papers (2024-12-19T16:57:45Z)
- TokenFlow: Unified Image Tokenizer for Multimodal Understanding and Generation [26.29803524047736]
TokenFlow is a novel unified image tokenizer that bridges the gap between multimodal understanding and generation.
We demonstrate for the first time that discrete visual input can surpass LLaVA-1.5 13B in understanding performance.
We also establish state-of-the-art performance in autoregressive image generation, with a GenEval score of 0.55 at 256×256 resolution.
arXiv Detail & Related papers (2024-12-04T06:46:55Z)
- DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior [70.46245698746874]
We present DiffBIR, a general restoration pipeline that can handle different blind image restoration tasks.
DiffBIR decouples the blind image restoration problem into two stages: 1) degradation removal, which removes image-independent content; and 2) information regeneration, which generates the lost image content.
In the first stage, we use restoration modules to remove degradations and obtain high-fidelity restored results.
For the second stage, we propose IRControlNet that leverages the generative ability of latent diffusion models to generate realistic details.
arXiv Detail & Related papers (2023-08-29T07:11:52Z)
- DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability [75.9781362556431]
We propose DiffDis to unify the cross-modal generative and discriminative pretraining into one single framework under the diffusion process.
We show that DiffDis outperforms single-task models on both the image generation and the image-text discriminative tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z)
- Unsupervised Misaligned Infrared and Visible Image Fusion via Cross-Modality Image Generation and Registration [59.02821429555375]
We present a robust cross-modality generation-registration paradigm for unsupervised misaligned infrared and visible image fusion.
To better fuse the registered infrared images and visible images, we present a Feature Interaction Fusion Module (IFM).
arXiv Detail & Related papers (2022-05-24T07:51:57Z)
- Towards Homogeneous Modality Learning and Multi-Granularity Information Exploration for Visible-Infrared Person Re-Identification [16.22986967958162]
Visible-infrared person re-identification (VI-ReID) is a challenging and essential task, which aims to retrieve a set of person images over visible and infrared camera views.
Previous methods attempt to apply generative adversarial networks (GANs) to generate modality-consistent data.
In this work, we address the cross-modality matching problem with Aligned Grayscale Modality (AGM), a unified dark-line spectrum that reformulates visible-infrared dual-mode learning as a gray-gray single-mode learning problem.
arXiv Detail & Related papers (2022-04-11T03:03:19Z)
- Diverse Image Inpainting with Bidirectional and Autoregressive Transformers [55.21000775547243]
We propose BAT-Fill, an image inpainting framework with a novel bidirectional autoregressive transformer (BAT).
BAT-Fill inherits the merits of transformers and CNNs in a two-stage manner, which allows it to generate high-resolution contents without being constrained by the quadratic complexity of attention in transformers.
arXiv Detail & Related papers (2021-04-26T03:52:27Z)
- IMAGINE: Image Synthesis by Image-Guided Model Inversion [79.4691654458141]
We introduce an inversion-based method, denoted as IMAge-Guided model INvErsion (IMAGINE), to generate high-quality and diverse images.
We leverage the knowledge of image semantics from a pre-trained classifier to achieve plausible generations.
IMAGINE enables the synthesis procedure to simultaneously 1) enforce semantic specificity constraints during the synthesis, 2) produce realistic images without generator training, and 3) give users intuitive control over the generation process.
arXiv Detail & Related papers (2021-04-13T02:00:24Z)
- SFANet: A Spectrum-aware Feature Augmentation Network for Visible-Infrared Person Re-Identification [12.566284647658053]
We propose a novel spectrum-aware feature augmentation network named SFANet for the cross-modality matching problem.
Learning with grayscale-spectrum images, our model can markedly reduce modality discrepancy and detect inner structure relations.
At the feature level, we improve the conventional two-stream network by balancing the number of specific and sharable convolutional blocks.
arXiv Detail & Related papers (2021-02-24T08:57:32Z)
- Self-Supervised Linear Motion Deblurring [112.75317069916579]
Deep convolutional neural networks are state-of-the-art for image deblurring.
We present a differentiable reblur model for self-supervised motion deblurring.
Our experiments demonstrate that self-supervised single-image deblurring is feasible.
arXiv Detail & Related papers (2020-02-10T20:15:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences arising from their use.