Mobile Fitting Room: On-device Virtual Try-on via Diffusion Models
- URL: http://arxiv.org/abs/2402.01877v1
- Date: Fri, 2 Feb 2024 20:05:45 GMT
- Title: Mobile Fitting Room: On-device Virtual Try-on via Diffusion Models
- Authors: Justin Blalock, David Munechika, Harsha Karanth, Alec Helbling,
Pratham Mehta, Seongmin Lee, Duen Horng Chau
- Abstract summary: Mobile Fitting Room is the first on-device diffusion-based virtual try-on system.
A usage scenario highlights how our tool can provide a seamless, interactive virtual try-on experience for customers.
- Score: 19.10976982327356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing digital landscape of fashion e-commerce calls for interactive and
user-friendly interfaces for virtually trying on clothes. Traditional try-on
methods grapple with challenges in adapting to diverse backgrounds, poses, and
subjects. While newer methods, utilizing the recent advances of diffusion
models, have achieved higher-quality image generation, the human-centered
dimensions of mobile interface delivery and privacy concerns remain largely
unexplored. We present Mobile Fitting Room, the first on-device diffusion-based
virtual try-on system. To address multiple inter-related technical challenges
such as high-quality garment placement and model compression for mobile
devices, we present a novel technical pipeline and an interface design that
enables privacy preservation and user customization. A usage scenario
highlights how our tool can provide a seamless, interactive virtual try-on
experience for customers and provide a valuable service for fashion e-commerce
businesses.
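The abstract mentions model compression for mobile devices but gives no implementation details here. As a rough illustration only, the sketch below shows one common route to on-device deployment: post-training dynamic quantization of a denoiser followed by export for the PyTorch Mobile runtime. `TinyDenoiser`, its shapes, and the file name are hypothetical stand-ins, not the authors' pipeline.

```python
import torch
import torch.nn as nn
from torch.utils.mobile_optimizer import optimize_for_mobile

class TinyDenoiser(nn.Module):
    """Hypothetical stand-in for a try-on diffusion denoiser."""
    def __init__(self, dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, 512), nn.SiLU(),  # latent plus timestep in
            nn.Linear(512, 512), nn.SiLU(),
            nn.Linear(512, dim),                 # predicted noise out
        )

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, t], dim=-1))

model = TinyDenoiser().eval()

# Post-training dynamic quantization: Linear weights are stored as int8,
# shrinking those layers' footprint roughly 4x with no retraining.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Trace and optimize for the PyTorch Mobile (lite interpreter) runtime.
example = (torch.randn(1, 256), torch.randn(1, 1))
traced = torch.jit.trace(quantized, example)
optimize_for_mobile(traced)._save_for_lite_interpreter("denoiser_mobile.ptl")
```

Dynamic quantization is usually only a first step; a production mobile diffusion model would likely also need distillation, pruning, or fewer sampling steps, none of which are shown here.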
Related papers
- Foundations and Recent Trends in Multimodal Mobile Agents: A Survey [57.677161006710065]
Mobile agents are essential for automating tasks in complex and dynamic mobile environments.
Recent advancements enhance real-time adaptability and multimodal interaction.
We categorize these advancements into two main approaches: prompt-based methods and training-based methods.
arXiv Detail & Related papers (2024-11-04T11:50:58Z)
- SKT: Integrating State-Aware Keypoint Trajectories with Vision-Language Models for Robotic Garment Manipulation [82.61572106180705]
This paper presents a unified approach using vision-language models (VLMs) to improve keypoint prediction across various garment categories.
We created a large-scale synthetic dataset using advanced simulation techniques, allowing scalable training without extensive real-world data.
Experimental results indicate that the VLM-based method significantly enhances keypoint detection accuracy and task success rates.
arXiv Detail & Related papers (2024-09-26T17:26:16Z)
- GlamTry: Advancing Virtual Try-On for High-End Accessories [0.0]
Existing virtual try-on models focus primarily on clothing items, but there is a gap in the market for accessories.
This research explores the application of techniques from 2D virtual try-on models for clothing, such as VITON-HD, and integrates them with other computer vision models.
Results demonstrate improved location prediction compared to the original clothing model, even with a small dataset.
arXiv Detail & Related papers (2024-09-22T18:29:32Z)
- Self-Supervised Vision Transformer for Enhanced Virtual Clothes Try-On [21.422611451978863]
We introduce an innovative approach for virtual clothes try-on, utilizing a self-supervised Vision Transformer (ViT) and a diffusion model.
Our method emphasizes detail enhancement by contrasting local clothing image embeddings, generated by the ViT, with their global counterparts; a sketch of this idea follows the entry.
The experimental results showcase substantial advancements in the realism and precision of details in virtual try-on experiences.
arXiv Detail & Related papers (2024-06-15T07:46:22Z)
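The entry above contrasts local clothing patch embeddings with their global counterpart, but the exact objective is not given here. A minimal InfoNCE-style sketch of that idea follows, assuming the global embedding is a CLS token and the locals are patch tokens; `MiniViT` and every hyperparameter are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MiniViT(nn.Module):
    """Toy ViT encoder: patchify, transformer, then (global CLS, local patch) embeddings."""
    def __init__(self, img: int = 64, patch: int = 16, dim: int = 128):
        super().__init__()
        n = (img // patch) ** 2
        self.proj = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):
        tokens = self.proj(x).flatten(2).transpose(1, 2)           # (B, N, D)
        tokens = torch.cat([self.cls.expand(len(x), -1, -1), tokens], 1) + self.pos
        out = self.encoder(tokens)
        return out[:, 0], out[:, 1:]                               # global, locals

def local_global_contrastive(global_emb, local_emb, tau: float = 0.1):
    """InfoNCE-style loss: each image's patch embeddings should match its own
    global embedding rather than the global embeddings of other images."""
    g = F.normalize(global_emb, dim=-1)                            # (B, D)
    l = F.normalize(local_emb, dim=-1)                             # (B, N, D)
    logits = torch.einsum("bnd,kd->bnk", l, g) / tau               # (B, N, B)
    target = torch.arange(len(g), device=g.device)                 # positive: own image
    target = target.view(-1, 1).expand(-1, l.size(1))              # (B, N)
    return F.cross_entropy(logits.reshape(-1, len(g)), target.reshape(-1))

vit = MiniViT()
g_emb, l_emb = vit(torch.randn(8, 3, 64, 64))
loss = local_global_contrastive(g_emb, l_emb)
```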
- AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario [50.62711489896909]
AnyFit surpasses all baselines on high-resolution benchmarks and real-world data by a wide margin.
AnyFit's impressive performance on high-fidelity virtual try-on, in any scenario and from any image, paves a new path for future research within the fashion community.
arXiv Detail & Related papers (2024-05-28T13:33:08Z)
- Generating Human Interaction Motions in Scenes with Text Control [66.74298145999909]
We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models.
Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model.
To facilitate training, we embed annotated navigation and interaction motions within scenes.
arXiv Detail & Related papers (2024-04-16T16:04:38Z)
- Systematic Adaptation of Communication-focused Machine Learning Models from Real to Virtual Environments for Human-Robot Collaboration [1.392250707100996]
This paper presents a systematic framework for real-to-virtual adaptation using a virtual dataset of limited size.
Hand gesture recognition, a topic of extensive research and subsequent commercialization in the real world, has been made possible by the creation of large, labelled datasets.
arXiv Detail & Related papers (2023-07-21T03:24:55Z)
- Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing [40.70752781891058]
We propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images.
We tackle this problem by proposing a new architecture based on latent diffusion models.
Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets.
arXiv Detail & Related papers (2023-04-04T18:03:04Z)
- The Gesture Authoring Space: Authoring Customised Hand Gestures for Grasping Virtual Objects in Immersive Virtual Environments [81.5101473684021]
This work proposes a hand gesture authoring tool for object-specific grab gestures, allowing virtual objects to be grabbed as in the real world.
The presented solution uses template matching for gesture recognition (sketched after this entry) and requires no technical knowledge to design and create custom-tailored hand gestures.
The study showed that users perceive gestures created with the proposed approach as a more natural input modality than the alternatives.
arXiv Detail & Related papers (2022-07-03T18:33:33Z)
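The gesture-authoring entry above relies on template matching but gives no detail here. One plausible reading is sketched below: a template is a recorded set of hand-joint positions, and a live pose matches when its normalized distance to a template falls under a threshold. The function, the 21-joint layout, and the threshold are all assumptions for illustration.

```python
import numpy as np

def match_gesture(live_joints: np.ndarray,
                  templates: dict[str, np.ndarray],
                  threshold: float = 0.15):
    """Return the name of the closest template within `threshold`, else None."""
    def normalize(j: np.ndarray) -> np.ndarray:
        j = j - j.mean(axis=0)            # translation invariance
        return j / np.linalg.norm(j)      # scale invariance

    live = normalize(live_joints)
    best_name, best_dist = None, threshold
    for name, template in templates.items():
        dist = np.linalg.norm(live - normalize(template))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name

# Usage with a fabricated 21-joint hand pose (x, y, z per joint):
rng = np.random.default_rng(0)
fist = rng.normal(size=(21, 3))
templates = {"fist_grab": fist}
print(match_gesture(fist + 0.01 * rng.normal(size=(21, 3)), templates))  # fist_grab
```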
- FitGAN: Fit- and Shape-Realistic Generative Adversarial Networks for Fashion [5.478764356647437]
We present FitGAN, a generative adversarial model that accounts for garments' entangled size and fit characteristics at scale.
Our model learns disentangled item representations and generates realistic images reflecting the true fit and shape properties of fashion articles.
arXiv Detail & Related papers (2022-06-23T15:10:28Z)
- Cloth Interactive Transformer for Virtual Try-On [106.21605249649957]
We propose a novel two-stage cloth interactive transformer (CIT) method for the virtual try-on task.
In the first stage, we design a CIT matching block, aiming to precisely capture the long-range correlations between the cloth-agnostic person information and the in-shop cloth information; a cross-attention sketch of this idea follows the entry.
In the second stage, we put forth a CIT reasoning block for establishing global mutual interactive dependencies among the person representation, the warped clothing item, and the corresponding warped cloth mask.
arXiv Detail & Related papers (2021-04-12T14:45:32Z)
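Cross-attention is the natural mechanism for the first-stage matching described in the entry above, where person tokens query in-shop cloth tokens. The block below is an assumed illustration of that idea, not the paper's CIT block, and all dimensions are made up.

```python
import torch
import torch.nn as nn

class MatchingBlockSketch(nn.Module):
    """Cross-attention sketch: each cloth-agnostic person token attends to every
    in-shop cloth token, modeling long-range person/cloth correlations."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, person_tokens, cloth_tokens):
        # Queries come from the person stream; keys and values from the cloth stream.
        attended, _ = self.attn(person_tokens, cloth_tokens, cloth_tokens)
        x = self.norm1(person_tokens + attended)
        return self.norm2(x + self.ffn(x))

block = MatchingBlockSketch()
person = torch.randn(2, 196, 256)   # e.g. a flattened person feature map
cloth = torch.randn(2, 196, 256)    # e.g. a flattened in-shop cloth feature map
fused = block(person, cloth)        # (2, 196, 256)
```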
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.