Mobile Fitting Room: On-device Virtual Try-on via Diffusion Models
- URL: http://arxiv.org/abs/2402.01877v1
- Date: Fri, 2 Feb 2024 20:05:45 GMT
- Title: Mobile Fitting Room: On-device Virtual Try-on via Diffusion Models
- Authors: Justin Blalock, David Munechika, Harsha Karanth, Alec Helbling,
Pratham Mehta, Seongmin Lee, Duen Horng Chau
- Abstract summary: Mobile Fitting Room is the first on-device diffusion-based virtual try-on system.
A usage scenario highlights how our tool can provide a seamless, interactive virtual try-on experience for customers.
- Score: 19.10976982327356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The growing digital landscape of fashion e-commerce calls for interactive and
user-friendly interfaces for virtually trying on clothes. Traditional try-on
methods grapple with challenges in adapting to diverse backgrounds, poses, and
subjects. While newer methods, utilizing the recent advances of diffusion
models, have achieved higher-quality image generation, the human-centered
dimensions of mobile interface delivery and privacy concerns remain largely
unexplored. We present Mobile Fitting Room, the first on-device diffusion-based
virtual try-on system. To address multiple inter-related technical challenges
such as high-quality garment placement and model compression for mobile
devices, we present a novel technical pipeline and an interface design that
enables privacy preservation and user customization. A usage scenario
highlights how our tool can provide a seamless, interactive virtual try-on
experience for customers and provide a valuable service for fashion e-commerce
businesses.
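The abstract names model compression for mobile devices as one of the key technical challenges. The paper does not disclose its compression method here, but as a purely illustrative sketch, post-training weight quantization is one common way to shrink a model for on-device inference; the helper below is a hypothetical, minimal example of symmetric int8 quantization, not the authors' pipeline:

```python
def quantize_int8(weights):
    # Symmetric per-tensor quantization: w ~= scale * q, with q in [-127, 127].
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights from the int8 codes.
    return [v * scale for v in q]

weights = [0.82, -1.54, 0.03, 2.00, -0.41]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Each stored weight drops from 32 bits to 8, and the rounding error is bounded by half a quantization step (`scale / 2`), which is why quantization is a popular first step toward fitting diffusion models into a mobile memory budget.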
Related papers
- Self-Supervised Vision Transformer for Enhanced Virtual Clothes Try-On [21.422611451978863]
We introduce an innovative approach for virtual clothes try-on, utilizing a self-supervised Vision Transformer (ViT) and a diffusion model.
Our method emphasizes detail enhancement by contrasting local clothing image embeddings, generated by ViT, with their global counterparts.
The experimental results showcase substantial advancements in the realism and precision of details in virtual try-on experiences.
arXiv Detail & Related papers (2024-06-15T07:46:22Z)
- AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario [50.62711489896909]
AnyFit surpasses all baselines on high-resolution benchmarks and real-world data by a large margin.
AnyFit's impressive performance on high-fidelity virtual try-on in any scenario from any image paves a new path for future research within the fashion community.
arXiv Detail & Related papers (2024-05-28T13:33:08Z)
- Generating Human Interaction Motions in Scenes with Text Control [66.74298145999909]
We present TeSMo, a method for text-controlled scene-aware motion generation based on denoising diffusion models.
Our approach begins with pre-training a scene-agnostic text-to-motion diffusion model.
To facilitate training, we embed annotated navigation and interaction motions within scenes.
arXiv Detail & Related papers (2024-04-16T16:04:38Z)
- Systematic Adaptation of Communication-focused Machine Learning Models from Real to Virtual Environments for Human-Robot Collaboration [1.392250707100996]
This paper presents a systematic framework for the real to virtual adaptation using limited size of virtual dataset.
Hand gesture recognition, a topic of much research and subsequent commercialization in the real world, has been made possible by the creation of large, labelled datasets.
arXiv Detail & Related papers (2023-07-21T03:24:55Z)
- LaDI-VTON: Latent Diffusion Textual-Inversion Enhanced Virtual Try-On [35.4056826207203]
This work introduces LaDI-VTON, the first Latent Diffusion Textual Inversion-enhanced model for the Virtual Try-ON task.
The proposed architecture relies on a latent diffusion model extended with a novel additional autoencoder module.
We show that our approach outperforms the competitors by a consistent margin, achieving a significant milestone for the task.
arXiv Detail & Related papers (2023-05-22T21:38:06Z)
- Multimodal Garment Designer: Human-Centric Latent Diffusion Models for Fashion Image Editing [40.70752781891058]
We propose the task of multimodal-conditioned fashion image editing, guiding the generation of human-centric fashion images.
We tackle this problem by proposing a new architecture based on latent diffusion models.
Given the lack of existing datasets suitable for the task, we also extend two existing fashion datasets.
arXiv Detail & Related papers (2023-04-04T18:03:04Z)
- Multiface: A Dataset for Neural Face Rendering [108.44505415073579]
In this work, we present Multiface, a new multi-view, high-resolution human face dataset.
We introduce Mugsy, a large-scale multi-camera apparatus to capture high-resolution synchronized videos of a facial performance.
The goal of Multiface is to close the gap in accessibility to high quality data in the academic community and to enable research in VR telepresence.
arXiv Detail & Related papers (2022-07-22T17:55:39Z)
- The Gesture Authoring Space: Authoring Customised Hand Gestures for Grasping Virtual Objects in Immersive Virtual Environments [81.5101473684021]
This work proposes a hand gesture authoring tool for object-specific grab gestures, allowing virtual objects to be grabbed as in the real world.
The presented solution uses template matching for gesture recognition and requires no technical knowledge to design and create custom tailored hand gestures.
The study showed that gestures created with the proposed approach are perceived by users as a more natural input modality than the others.
arXiv Detail & Related papers (2022-07-03T18:33:33Z)
- FitGAN: Fit- and Shape-Realistic Generative Adversarial Networks for Fashion [5.478764356647437]
We present FitGAN, a generative adversarial model that accounts for garments' entangled size and fit characteristics at scale.
Our model learns disentangled item representations and generates realistic images reflecting the true fit and shape properties of fashion articles.
arXiv Detail & Related papers (2022-06-23T15:10:28Z)
- Cloth Interactive Transformer for Virtual Try-On [106.21605249649957]
We propose a novel two-stage cloth interactive transformer (CIT) method for the virtual try-on task.
In the first stage, we design a CIT matching block, aiming to precisely capture the long-range correlations between the cloth-agnostic person information and the in-shop cloth information.
In the second stage, we put forth a CIT reasoning block for establishing global mutual interactive dependencies among person representation, the warped clothing item, and the corresponding warped cloth mask.
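The matching block described above relates cloth-agnostic person features to in-shop cloth features via long-range correlations. As a hypothetical toy illustration (not the CIT implementation), that core mechanism resembles scaled dot-product cross-attention, where person feature vectors act as queries over the cloth feature vectors:

```python
import math

def cross_attention(person_feats, cloth_feats):
    # Toy scaled dot-product cross-attention: each person feature vector
    # (query) attends over the cloth feature vectors (keys and values).
    d = len(cloth_feats[0])
    out = []
    for q in person_feats:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in cloth_feats]
        m = max(scores)                      # stabilize the softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]  # attention distribution
        out.append([sum(w * v[j] for w, v in zip(weights, cloth_feats))
                    for j in range(d)])
    return out

# One person feature attending over two one-hot cloth features:
attended = cross_attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
```

In this toy setup the output row is a convex combination of the cloth features, weighted more heavily toward the cloth vector most correlated with the person query; this captures, in miniature, the idea of long-range person-to-cloth correlation that the matching block is designed around.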
arXiv Detail & Related papers (2021-04-12T14:45:32Z)
- ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation [75.0278287071591]
ThreeDWorld (TDW) is a platform for interactive multi-modal physical simulation.
TDW enables simulation of high-fidelity sensory data and physical interactions between mobile agents and objects in rich 3D environments.
We present initial experiments enabled by TDW in emerging research directions in computer vision, machine learning, and cognitive science.
arXiv Detail & Related papers (2020-07-09T17:33:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listed content) and is not responsible for any consequences of its use.