Mobile Fitting Room: On-device Virtual Try-on via Diffusion Models
- URL: http://arxiv.org/abs/2402.01877v1
- Date: Fri, 2 Feb 2024 20:05:45 GMT
- Title: Mobile Fitting Room: On-device Virtual Try-on via Diffusion Models
- Authors: Justin Blalock, David Munechika, Harsha Karanth, Alec Helbling,
Pratham Mehta, Seongmin Lee, Duen Horng Chau
- Abstract summary: Mobile Fitting Room is the first on-device diffusion-based virtual try-on system.
A usage scenario highlights how our tool can provide a seamless, interactive virtual try-on experience for customers.
- Score: 19.10976982327356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing digital landscape of fashion e-commerce calls for interactive and
user-friendly interfaces for virtually trying on clothes. Traditional try-on
methods grapple with challenges in adapting to diverse backgrounds, poses, and
subjects. While newer methods, utilizing the recent advances of diffusion
models, have achieved higher-quality image generation, the human-centered
dimensions of mobile interface delivery and privacy concerns remain largely
unexplored. We present Mobile Fitting Room, the first on-device diffusion-based
virtual try-on system. To address multiple inter-related technical challenges
such as high-quality garment placement and model compression for mobile
devices, we present a novel technical pipeline and an interface design that
enables privacy preservation and user customization. A usage scenario
highlights how our tool can provide a seamless, interactive virtual try-on
experience for customers and provide a valuable service for fashion e-commerce
businesses.
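The abstract names model compression for mobile devices as one of the key technical challenges. The paper does not disclose its compression method here, but as a purely illustrative sketch, post-training weight quantization is one common way to shrink a model for on-device inference; the helper below is a hypothetical, minimal example of symmetric int8 quantization, not the authors' pipeline:

```python
def quantize_int8(weights):
    # Symmetric per-tensor quantization: w ~= scale * q, with q in [-127, 127].
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights from the int8 codes.
    return [v * scale for v in q]

weights = [0.82, -1.54, 0.03, 2.00, -0.41]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Each stored weight drops from 32 bits to 8, and the rounding error is bounded by half a quantization step (`scale / 2`), which is why quantization is a popular first step toward fitting diffusion models into a mobile memory budget.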
Related papers
- Self-Supervised Vision Transformer for Enhanced Virtual Clothes Try-On [21.422611451978863]
We introduce an innovative approach for virtual clothes try-on, utilizing a self-supervised Vision Transformer (ViT) and a diffusion model.
Our method emphasizes detail enhancement by contrasting local clothing image embeddings, generated by ViT, with their global counterparts.
The experimental results showcase substantial advancements in the realism and precision of details in virtual try-on experiences.
arXiv Detail & Related papers (2024-06-15T07:46:22Z)
- AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario [50.62711489896909]
AnyFit surpasses all baselines on high-resolution benchmarks and real-world data by a large margin.
AnyFit's impressive performance on high-fidelity virtual try-on in any scenario from any image paves a new path for future research within the fashion community.
arXiv Detail & Related papers (2024-05-28T13:33:08Z)
- Generating Human Interaction Motions in Scenes with Text Control [66.74298145999909]
We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models.
Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model.
To facilitate training, we embed annotated navigation and interaction motions within scenes.
arXiv Detail & Related papers (2024-04-16T16:04:38Z)
- Systematic Adaptation of Communication-focused Machine Learning Models from Real to Virtual Environments for Human-Robot Collaboration [1.392250707100996]
This paper presents a systematic framework for the real to virtual adaptation using limited size of virtual dataset.
Hand gesture recognition, a topic of much research and subsequent commercialization in the real world, has been made possible by the creation of large, labelled datasets.
arXiv Detail & Related papers (2023-07-21T03:24:55Z)
- LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On [35.4056826207203]
This work introduces LaDI-VTON, the first Latent Diffusion Textual Inversion-enhanced model for the Virtual Try-ON task.
The proposed architecture relies on a latent diffusion model extended with a novel additional autoencoder module.
We show that our approach outperforms the competitors by a consistent margin, achieving a significant milestone for the task.
arXiv Detail & Related papers (2023-05-22T21:38:06Z)
- Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing [40.70752781891058]
We propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images.
We tackle this problem by proposing a new architecture based on latent diffusion models.
Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets.
arXiv Detail & Related papers (2023-04-04T18:03:04Z)
- Multiface: A Dataset for Neural Face Rendering [108.44505415073579]
In this work, we present Multiface, a new multi-view, high-resolution human face dataset.
We introduce Mugsy, a large-scale multi-camera apparatus to capture high-resolution synchronized videos of a facial performance.
The goal of Multiface is to close the gap in accessibility to high quality data in the academic community and to enable research in VR telepresence.
arXiv Detail & Related papers (2022-07-22T17:55:39Z)
- The Gesture Authoring Space: Authoring Customised Hand Gestures for Grasping Virtual Objects in Immersive Virtual Environments [81.5101473684021]
This work proposes a hand gesture authoring tool for object-specific grab gestures, allowing virtual objects to be grabbed as in the real world.
The presented solution uses template matching for gesture recognition and requires no technical knowledge to design and create custom tailored hand gestures.
The study showed that gestures created with the proposed approach are perceived by users as a more natural input modality than the others.
arXiv Detail & Related papers (2022-07-03T18:33:33Z)
- FitGAN: Fit- and Shape-Realistic Generative Adversarial Networks for Fashion [5.478764356647437]
We present FitGAN, a generative adversarial model that accounts for garments' entangled size and fit characteristics at scale.
Our model learns disentangled item representations and generates realistic images reflecting the true fit and shape properties of fashion articles.
arXiv Detail & Related papers (2022-06-23T15:10:28Z)
- Cloth Interactive Transformer for Virtual Try-On [106.21605249649957]
We propose a novel two-stage cloth interactive transformer (CIT) method for the virtual try-on task.
In the first stage, we design a CIT matching block, aiming to precisely capture the long-range correlations between the cloth-agnostic person information and the in-shop cloth information.
In the second stage, we put forth a CIT reasoning block for establishing global mutual interactive dependencies among person representation, the warped clothing item, and the corresponding warped cloth mask.
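The matching block described above relates cloth-agnostic person features to in-shop cloth features via long-range correlations. As a hypothetical toy illustration (not the CIT implementation), that core mechanism resembles scaled dot-product cross-attention, where person feature vectors act as queries over the cloth feature vectors:

```python
import math

def cross_attention(person_feats, cloth_feats):
    # Toy scaled dot-product cross-attention: each person feature vector
    # (query) attends over the cloth feature vectors (keys and values).
    d = len(cloth_feats[0])
    out = []
    for q in person_feats:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in cloth_feats]
        m = max(scores)                      # stabilize the softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]  # attention distribution
        out.append([sum(w * v[j] for w, v in zip(weights, cloth_feats))
                    for j in range(d)])
    return out

# One person feature attending over two one-hot cloth features:
attended = cross_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy setup the output row is a convex combination of the cloth features, weighted more heavily toward the cloth vector most correlated with the person query; this captures, in miniature, the idea of long-range person-to-cloth correlation that the matching block is designed around.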
arXiv Detail & Related papers (2021-04-12T14:45:32Z)
- ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation [75.0278287071591]
ThreeDWorld (TDW) is a platform for interactive multi-modal physical simulation.
TDW enables simulation of high-fidelity sensory data and physical interactions between mobile agents and objects in rich 3D environments.
We present initial experiments enabled by TDW in emerging research directions in computer vision, machine learning, and cognitive science.
arXiv Detail & Related papers (2020-07-09T17:33:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences of its use.