Image-Based Virtual Try-On: A Survey
- URL: http://arxiv.org/abs/2311.04811v3
- Date: Wed, 1 May 2024 14:56:23 GMT
- Title: Image-Based Virtual Try-On: A Survey
- Authors: Dan Song, Xuanpu Zhang, Juan Zhou, Weizhi Nie, Ruofeng Tong, Mohan Kankanhalli, An-An Liu,
- Abstract summary: Image-based virtual try-on aims to synthesize a naturally dressed person image with a clothing image, showing both research significance and commercial potential.
We provide a comprehensive analysis of the state-of-the-art techniques and methodologies in aspects of pipeline architecture, person representation and key modules such as try-on indication.
We propose a new semantic criteria with CLIP, and evaluate representative methods with uniformly implemented evaluation metrics on the same dataset.
- Score: 38.6177665201224
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image-based virtual try-on aims to synthesize a naturally dressed person image with a clothing image, which revolutionizes online shopping and inspires related topics within image generation, showing both research significance and commercial potential. However, there is a gap between current research progress and commercial applications and an absence of comprehensive overview of this field to accelerate the development. In this survey, we provide a comprehensive analysis of the state-of-the-art techniques and methodologies in aspects of pipeline architecture, person representation and key modules such as try-on indication, clothing warping and try-on stage. We propose a new semantic criteria with CLIP, and evaluate representative methods with uniformly implemented evaluation metrics on the same dataset. In addition to quantitative and qualitative evaluation of current open-source methods, unresolved issues are highlighted and future research directions are prospected to identify key trends and inspire further exploration. The uniformly implemented evaluation metrics, dataset and collected methods will be made public available at https://github.com/little-misfit/Survey-Of-Virtual-Try-On.
Related papers
- Place Recognition Meet Multiple Modalitie: A Comprehensive Review, Current Challenges and Future Directions [2.4775350526606355]
We review recent advancements in place recognition, emphasizing three methodological paradigms.<n>CNN-based approaches, Transformer-based frameworks, and cross-modal strategies are discussed.<n>We identify current research challenges and outline prospective directions, including domain adaptation, real-time performance, and lifelong learning, to inspire future advancements in this domain.
arXiv Detail & Related papers (2025-05-20T08:16:37Z) - A Survey on All-in-One Image Restoration: Taxonomy, Evaluation and Future Trends [67.43992456058541]
Image restoration (IR) refers to the process of improving visual quality of images while removing degradation, such as noise, blur, weather effects, and so on.
Traditional IR methods typically target specific types of degradation, which limits their effectiveness in real-world scenarios with complex distortions.
The all-in-one image restoration (AiOIR) paradigm has emerged, offering a unified framework that adeptly addresses multiple degradation types.
arXiv Detail & Related papers (2024-10-19T11:11:09Z) - Mismatched: Evaluating the Limits of Image Matching Approaches and Benchmarks [9.388897214344572]
Three-dimensional (3D) reconstruction from two-dimensional images is an active research field in computer vision.
Traditionally, parametric techniques have been employed for this task.
Recent advancements have seen a shift towards learning-based methods.
arXiv Detail & Related papers (2024-08-29T11:16:34Z) - Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z) - Advances in Diffusion Models for Image Data Augmentation: A Review of Methods, Models, Evaluation Metrics and Future Research Directions [6.2719115566879236]
Diffusion Models (DMs) have emerged as a powerful tool for image data augmentation.
DMs generate realistic and diverse images by learning the underlying data distribution.
Current challenges and future research directions in the field are discussed.
arXiv Detail & Related papers (2024-07-04T18:06:48Z) - Cross-view geo-localization: a survey [1.3686993145787065]
Cross-view geo-localization has garnered notable attention in the realm of computer vision, spurred by the widespread availability of copious geotagged datasets.
This paper provides a thorough survey of cutting-edge methodologies, techniques, and associated challenges that are integral to this domain.
arXiv Detail & Related papers (2024-06-14T05:14:54Z) - Deep Learning-Based Object Pose Estimation: A Comprehensive Survey [73.74933379151419]
We discuss the recent advances in deep learning-based object pose estimation.
Our survey also covers multiple input data modalities, degrees-of-freedom of output poses, object properties, and downstream tasks.
arXiv Detail & Related papers (2024-05-13T14:44:22Z) - Local Feature Matching Using Deep Learning: A Survey [19.322545965903608]
Local feature matching enjoys wide-ranging applications in the realm of computer vision, encompassing domains such as image retrieval, 3D reconstruction, and object recognition.
In recent years, the introduction of deep learning models has sparked widespread exploration into local feature matching techniques.
The paper also explores the practical application of local feature matching in diverse domains such as Structure from Motion, Remote Sensing Image Registration, and Medical Image Registration.
arXiv Detail & Related papers (2024-01-31T04:32:41Z) - Revisiting Self-supervised Learning of Speech Representation from a
Mutual Information Perspective [68.20531518525273]
We take a closer look into existing self-supervised methods of speech from an information-theoretic perspective.
We use linear probes to estimate the mutual information between the target information and learned representations.
We explore the potential of evaluating representations in a self-supervised fashion, where we estimate the mutual information between different parts of the data without using any labels.
arXiv Detail & Related papers (2024-01-16T21:13:22Z) - Dataset and Case Studies for Visual Near-Duplicates Detection in the
Context of Social Media [11.569861200214294]
Tracking visually-similar content is an important task for studying and analyzing social phenomena related to the spread of such content.
We build a dataset of social media images and evaluate visual near-duplicates retrieval methods based on image retrieval and several advanced visual feature extraction methods.
arXiv Detail & Related papers (2022-03-14T15:10:30Z) - Deep Co-Attention Network for Multi-View Subspace Learning [73.3450258002607]
We propose a deep co-attention network for multi-view subspace learning.
It aims to extract both the common information and the complementary information in an adversarial setting.
In particular, it uses a novel cross reconstruction loss and leverages the label information to guide the construction of the latent representation.
arXiv Detail & Related papers (2021-02-15T18:46:44Z) - Recent Progress in Appearance-based Action Recognition [73.6405863243707]
Action recognition is a task to identify various human actions in a video.
Recent appearance-based methods have achieved promising progress towards accurate action recognition.
arXiv Detail & Related papers (2020-11-25T10:18:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.