DM-VTON: Distilled Mobile Real-time Virtual Try-On
- URL: http://arxiv.org/abs/2308.13798v1
- Date: Sat, 26 Aug 2023 07:46:27 GMT
- Title: DM-VTON: Distilled Mobile Real-time Virtual Try-On
- Authors: Khoi-Nguyen Nguyen-Ngoc, Thanh-Tung Phan-Nguyen, Khanh-Duy Le, Tam V. Nguyen, Minh-Triet Tran, and Trung-Nghia Le
- Abstract summary: Distilled Mobile Real-time Virtual Try-On (DM-VTON) is a novel virtual try-on framework designed to achieve simplicity and efficiency.
We introduce an efficient Mobile Generative Module within the Student network, significantly reducing the runtime.
Experimental results show that the proposed method can achieve 40 frames per second on a single Nvidia Tesla T4 GPU.
- Score: 16.35842298296878
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The fashion e-commerce industry has witnessed significant growth in recent
years, prompting the exploration of image-based virtual try-on techniques to bring
Augmented Reality (AR) experiences to online shopping platforms. However,
existing research has largely overlooked a crucial aspect: the runtime of
the underlying machine-learning model. While existing methods prioritize
output quality, they often disregard execution time, which restricts
their use to a limited range of devices. To address this
gap, we propose Distilled Mobile Real-time Virtual Try-On (DM-VTON), a novel
virtual try-on framework designed to achieve simplicity and efficiency. Our
approach is based on a knowledge distillation scheme that leverages a strong
Teacher network as supervision to guide a Student network without relying on
human parsing. Notably, we introduce an efficient Mobile Generative Module
within the Student network, significantly reducing the runtime while ensuring
high-quality output. Additionally, we propose Virtual Try-on-guided Pose for
Data Synthesis to address the limited pose variation observed in training
images. Experimental results show that the proposed method achieves 40
frames per second on a single Nvidia Tesla T4 GPU and occupies only 37 MB of
memory, while producing output quality nearly on par with state-of-the-art
methods. DM-VTON thus paves the way for real-time AR applications, as well as
for generating lifelike clothed human figures tailored to diverse specialized
training tasks.
https://sites.google.com/view/ltnghia/research/DMVTON
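To make the teacher-student scheme above concrete, here is a minimal PyTorch sketch of parser-free distillation, assuming a hypothetical lightweight student generator (a stand-in for the Mobile Generative Module) and any pretrained parser-based teacher; the actual DM-VTON generators, warping modules, and loss terms are more elaborate.

```python
# Minimal sketch of parser-free teacher-student distillation.
# MobileStudentTryOn and the L1-only loss are illustrative stand-ins,
# not the authors' actual modules.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MobileStudentTryOn(nn.Module):
    """Lightweight generator (stand-in for the Mobile Generative Module)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )
    def forward(self, person, garment):
        return self.net(torch.cat([person, garment], dim=1))

def distillation_step(student, teacher, person, garment, optimizer):
    with torch.no_grad():
        # The teacher may rely on human parsing internally; the student
        # only ever sees the raw person and garment images.
        target = teacher(person, garment)
    pred = student(person, garment)
    loss = F.l1_loss(pred, target)   # pixel-level supervision from the teacher
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because only the student is deployed, the parsing-dependent teacher never runs at inference time, which is what keeps the runtime and memory footprint small.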
Related papers
- MiniVLN: Efficient Vision-and-Language Navigation by Progressive Knowledge Distillation [17.27883003990266]
Vision-and-Language Navigation (VLN) is a core task in Embodied AI.
This paper introduces a two-stage knowledge distillation framework, producing a student model, MiniVLN.
Our findings indicate that the two-stage distillation approach is more effective in narrowing the performance gap between the teacher model and the student model.
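As a rough illustration of the two-stage pattern (not MiniVLN's exact stages or losses), one common recipe distills intermediate features first and softened output distributions second:

```python
# Generic two-stage distillation, only to illustrate the pattern;
# MiniVLN's actual stages and losses may differ.
import torch
import torch.nn.functional as F

def stage1_feature_distill(student, teacher, batch):
    # Stage 1: align intermediate representations. Assumes both models
    # expose a .features() method (hypothetical interface).
    s_feat = student.features(batch)
    with torch.no_grad():
        t_feat = teacher.features(batch)
    return F.mse_loss(s_feat, t_feat)

def stage2_logit_distill(student, teacher, batch, T=2.0):
    # Stage 2: match softened output distributions (Hinton-style KD).
    s_logits = student(batch)
    with torch.no_grad():
        t_logits = teacher(batch)
    log_s = F.log_softmax(s_logits / T, dim=-1)
    soft_t = F.softmax(t_logits / T, dim=-1)
    return F.kl_div(log_s, soft_t, reduction="batchmean") * (T * T)
```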
arXiv Detail & Related papers (2024-09-27T14:54:54Z)
- Affordance-Guided Reinforcement Learning via Visual Prompting [51.361977466993345]
Keypoint-based Affordance Guidance for Improvements (KAGI) is a method leveraging rewards shaped by vision-language models (VLMs) for autonomous RL.
On real-world manipulation tasks specified by natural language descriptions, KAGI improves the sample efficiency of autonomous RL and enables successful task completion in 20K online fine-tuning steps.
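A minimal sketch of the VLM-shaped reward idea follows; `vlm_keypoint_score` is a hypothetical stand-in for a model that rates how well an observation matches the language-specified task, and KAGI's actual keypoint-based shaping differs in detail.

```python
# Sketch of VLM-shaped reward for autonomous RL. vlm_keypoint_score is a
# hypothetical callable, not part of KAGI's published interface.
def shaped_reward(observation, task_description, sparse_reward,
                  vlm_keypoint_score, alpha=0.1):
    # Dense shaping term from the VLM, added to the environment's
    # sparse task reward.
    dense = vlm_keypoint_score(observation, task_description)  # in [0, 1]
    return sparse_reward + alpha * dense
```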
arXiv Detail & Related papers (2024-07-14T21:41:29Z)
- TSCM: A Teacher-Student Model for Vision Place Recognition Using Cross-Metric Knowledge Distillation [6.856317526681759]
Visual place recognition plays a pivotal role in autonomous exploration and navigation of mobile robots.
Existing methods achieve strong performance by exploiting powerful yet large networks.
We propose a high-performance teacher and lightweight student distillation framework called TSCM.
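One simplified reading of cross-metric distillation for place-recognition descriptors is sketched below: the student is trained to reproduce the teacher's within-batch similarity structure rather than its raw embeddings. This is a guess at the spirit of the loss, not TSCM's exact formulation.

```python
# Teacher-student descriptor distillation for place recognition (simplified).
import torch.nn.functional as F

def cross_metric_loss(student_desc, teacher_desc):
    # L2-normalize the global descriptors.
    s = F.normalize(student_desc, dim=-1)
    t = F.normalize(teacher_desc, dim=-1)
    # Pairwise similarity matrices within the batch.
    s_sim = s @ s.t()
    t_sim = t @ t.t()
    # The student should reproduce the teacher's similarity structure.
    return F.mse_loss(s_sim, t_sim)
```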
arXiv Detail & Related papers (2024-04-02T02:29:41Z)
- Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning [58.3994826169858]
We introduce RoboFuME, a reset-free fine-tuning system for robotic reinforcement learning.
Our key insight is to utilize offline reinforcement learning techniques to enable efficient online fine-tuning of a pre-trained policy.
Our method can incorporate data from an existing robot dataset and improve on a target task within as little as 3 hours of autonomous real-world experience.
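The overall recipe can be sketched as two phases; every name below is a placeholder rather than RoboFuME's API.

```python
# High-level sketch of the offline-pretrain / online-fine-tune recipe.
# All method names (offline_update, act_and_collect, etc.) are hypothetical.
def pretrain_and_finetune(offline_dataset, env, make_policy,
                          offline_steps=100_000, online_steps=20_000):
    policy = make_policy()
    # Phase 1: offline RL on prior robot data (e.g., a CQL/IQL-style update).
    for _ in range(offline_steps):
        batch = offline_dataset.sample()
        policy.offline_update(batch)
    # Phase 2: autonomous online fine-tuning, mixing in the offline data.
    for _ in range(online_steps):
        transition = policy.act_and_collect(env)
        batch = offline_dataset.sample_mixed(transition)
        policy.online_update(batch)
    return policy
```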
arXiv Detail & Related papers (2023-10-23T17:50:08Z)
- Real-Time Onboard Object Detection for Augmented Reality: Enhancing Head-Mounted Display with YOLOv8 [2.1530718840070784]
This paper introduces a software architecture for real-time object detection using machine learning (ML) in an augmented reality (AR) environment.
We show the image processing pipeline for the YOLOv8 model and the techniques used to make it real-time on the resource-limited edge computing platform of the headset.
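For reference, a minimal YOLOv8 inference loop using the open-source ultralytics package looks like the following; the paper's onboard pipeline adds headset-specific capture, scheduling, and rendering steps not shown here.

```python
# Minimal YOLOv8 webcam inference loop with OpenCV; assumes the
# ultralytics package, not the paper's headset-specific code.
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")          # nano variant, suited to edge devices
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # A smaller input size trades accuracy for real-time latency.
    results = model.predict(frame, imgsz=320, verbose=False)
    annotated = results[0].plot()    # draw detection boxes onto the frame
    cv2.imshow("detections", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```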
arXiv Detail & Related papers (2023-06-06T09:35:45Z)
- Dynamic Contrastive Distillation for Image-Text Retrieval [90.05345397400144]
We present a novel plug-in dynamic contrastive distillation (DCD) framework to compress image-text retrieval models.
We successfully apply our proposed DCD strategy to two state-of-the-art vision-language pretrained models, i.e., ViLT and METER.
Experiments on MS-COCO and Flickr30K benchmarks show the effectiveness and efficiency of our DCD framework.
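A simplified sketch of contrastive distillation for retrieval: the student's image-text similarity matrix is pulled toward the teacher's. DCD's dynamic, per-sample weighting is omitted here for brevity.

```python
# Contrastive distillation on image-text similarity matrices (simplified;
# DCD's dynamic weighting is not modeled).
import torch.nn.functional as F

def contrastive_distill_loss(s_img, s_txt, t_img, t_txt, T=0.05):
    s_img, s_txt = F.normalize(s_img, dim=-1), F.normalize(s_txt, dim=-1)
    t_img, t_txt = F.normalize(t_img, dim=-1), F.normalize(t_txt, dim=-1)
    s_logits = s_img @ s_txt.t() / T    # student image-to-text similarities
    t_logits = t_img @ t_txt.t() / T    # teacher image-to-text similarities
    return F.kl_div(
        F.log_softmax(s_logits, dim=-1),
        F.softmax(t_logits, dim=-1),
        reduction="batchmean",
    )
```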
arXiv Detail & Related papers (2022-07-04T14:08:59Z)
- ARShoe: Real-Time Augmented Reality Shoe Try-on System on Smartphones [14.494454213703111]
This work proposes a real-time augmented reality virtual shoe try-on system for smartphones, namely ARShoe.
ARShoe adopts a novel multi-branch network to realize pose estimation and segmentation simultaneously.
For training and evaluation, we construct the first large-scale foot benchmark with multiple labels related to the virtual shoe try-on task.
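A multi-branch design of this kind can be sketched as a shared backbone with separate pose and segmentation heads; the layer sizes below are invented for illustration, not ARShoe's actual architecture.

```python
# Illustrative multi-branch head: shared backbone feeding a pose
# (keypoint heatmap) branch and a segmentation branch.
import torch.nn as nn

class MultiBranchNet(nn.Module):
    def __init__(self, num_keypoints=6, num_classes=2):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Pose branch: one heatmap per keypoint.
        self.pose_head = nn.Conv2d(64, num_keypoints, 1)
        # Segmentation branch: per-pixel class logits.
        self.seg_head = nn.Conv2d(64, num_classes, 1)

    def forward(self, x):
        feat = self.backbone(x)
        return self.pose_head(feat), self.seg_head(feat)
```

Sharing one backbone means both tasks run in a single forward pass, which is what makes simultaneous pose estimation and segmentation feasible on a phone.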
arXiv Detail & Related papers (2021-08-24T03:54:45Z)
- PlayVirtual: Augmenting Cycle-Consistent Virtual Trajectories for Reinforcement Learning [84.30765628008207]
We propose a novel method, dubbed PlayVirtual, which augments cycle-consistent virtual trajectories to enhance the data efficiency for RL feature representation learning.
Our method outperforms the current state-of-the-art methods by a large margin on both benchmarks.
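The cycle-consistency idea can be sketched as follows: roll a latent state forward with a learned forward dynamics model, traverse the same actions backward, and penalize failure to return to the start. PlayVirtual's actual action sampling and auxiliary losses are more involved.

```python
# Cycle-consistency over a virtual latent trajectory (simplified sketch).
import torch.nn.functional as F

def cycle_consistency_loss(z, actions, forward_model, backward_model):
    # z: latent state (B, D); actions: action tensors for the virtual rollout.
    z_t = z
    for a in actions:                 # virtual forward trajectory
        z_t = forward_model(z_t, a)
    for a in reversed(actions):       # traverse the same actions backward
        z_t = backward_model(z_t, a)
    return F.mse_loss(z_t, z)         # should land back at the start
```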
arXiv Detail & Related papers (2021-06-08T07:37:37Z)
- Generative Adversarial Simulator [2.3986080077861787]
We introduce a simulator-free approach to knowledge distillation in the context of reinforcement learning.
A key challenge is having the student learn the multiplicity of cases that correspond to a given action.
This is the first demonstration of simulator-free knowledge distillation between a teacher and a student policy.
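A minimal sketch of simulator-free policy distillation: a generator stands in for the environment by synthesizing observations, and the student matches the teacher's action distribution on them. All interfaces here are hypothetical.

```python
# Simulator-free policy distillation on generated states (sketch).
import torch
import torch.nn.functional as F

def distill_on_generated_states(generator, teacher_policy, student_policy,
                                optimizer, batch_size=64, z_dim=128):
    z = torch.randn(batch_size, z_dim)
    with torch.no_grad():
        states = generator(z)                # synthetic observations
        t_probs = teacher_policy(states)     # assumed action probabilities
    s_log_probs = torch.log_softmax(student_policy(states), dim=-1)
    loss = F.kl_div(s_log_probs, t_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```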
arXiv Detail & Related papers (2020-11-23T15:31:12Z)
- Visual Imitation Made Easy [102.36509665008732]
We present an alternate interface for imitation that simplifies the data collection process while allowing for easy transfer to robots.
We use commercially available reacher-grabber assistive tools both as a data collection device and as the robot's end-effector.
We experimentally evaluate on two challenging tasks: non-prehensile pushing and prehensile stacking, with 1000 diverse demonstrations for each task.
arXiv Detail & Related papers (2020-08-11T17:58:50Z)
- Intrinsic Reward Driven Imitation Learning via Generative Model [48.97800481338626]
Most inverse reinforcement learning (IRL) methods fail to outperform the demonstrator in a high-dimensional environment.
We propose a novel reward learning module to generate intrinsic reward signals via a generative model.
Empirical results show that our method outperforms state-of-the-art IRL methods on multiple Atari games, even with a one-life demonstration.
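One simplified reading of a generative intrinsic reward is sketched below: score each transition by how surprising it is to a generative model trained on the demonstration. This illustrates the flavor of the approach, not the paper's exact module.

```python
# Intrinsic reward from a generative dynamics model (simplified sketch;
# gen_model is a hypothetical model trained on demonstration transitions).
import torch
import torch.nn.functional as F

def intrinsic_reward(state, action, next_state, gen_model):
    with torch.no_grad():
        pred_next = gen_model(state, action)   # model's imagined next state
        # Larger prediction error -> larger exploration-style bonus.
        return F.mse_loss(pred_next, next_state,
                          reduction="none").mean(dim=-1)
```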
arXiv Detail & Related papers (2020-06-26T15:39:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences arising from its use.