ShineOn: Illuminating Design Choices for Practical Video-based Virtual
Clothing Try-on
- URL: http://arxiv.org/abs/2012.10495v2
- Date: Wed, 13 Jan 2021 00:14:31 GMT
- Title: ShineOn: Illuminating Design Choices for Practical Video-based Virtual
Clothing Try-on
- Authors: Gaurav Kuppa, Andrew Jong, Vera Liu, Ziwei Liu, and Teng-Sheng Moh
- Abstract summary: We build a series of scientific experiments to isolate effective design choices in video synthesis for virtual clothing try-on.
Specifically, we investigate the effect of different pose annotations, self-attention layer placement, and activation functions.
GELU and ReLU activation functions are the most effective in our experiments despite the appeal of newer activations such as Swish and Sine.
- Score: 8.909228149756993
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Virtual try-on has garnered interest as a neural rendering benchmark task to
evaluate complex object transfer and scene composition. Recent works in virtual
clothing try-on feature a plethora of possible architectural and data
representation choices. However, they present little clarity on quantifying the
isolated visual effect of each choice, nor do they specify the hyperparameter
details that are key to experimental reproduction. Our work, ShineOn,
takes a bottom-up approach to the try-on task and aims to shine light on
the visual and quantitative effects of each experiment. We build a series of
scientific experiments to isolate effective design choices in video synthesis
for virtual clothing try-on. Specifically, we investigate the effect of
different pose annotations, self-attention layer placement, and activation
functions on the quantitative and qualitative performance of video virtual
try-on. We find that DensePose annotations not only enhance face details but
also decrease memory usage and training time. Next, we find that attention
layers improve face and neck quality. Finally, we show that GELU and ReLU
activation functions are the most effective in our experiments despite the
appeal of newer activations such as Swish and Sine. We will release a
well-organized code base, hyperparameters, and model checkpoints to support the
reproducibility of our results. We expect our extensive experiments and code to
greatly inform future design choices in video virtual try-on. Our code may be
accessed at https://github.com/andrewjong/ShineOn-Virtual-Tryon.
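The activation-function comparison is straightforward to reproduce as a single configurable choice in an otherwise fixed block. Below is a minimal PyTorch sketch of such an ablation; the Sine module and make_activation helper are illustrative names and are not taken from the ShineOn code base.

```python
import torch
import torch.nn as nn


class Sine(nn.Module):
    """SIREN-style periodic activation sin(w0 * x). Hypothetical helper, not from ShineOn."""
    def __init__(self, w0: float = 30.0):
        super().__init__()
        self.w0 = w0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sin(self.w0 * x)


def make_activation(name: str) -> nn.Module:
    # One entry per activation compared in the paper's ablation.
    return {
        "relu": nn.ReLU(inplace=True),
        "gelu": nn.GELU(),
        "swish": nn.SiLU(),  # Swish with beta = 1
        "sine": Sine(),
    }[name.lower()]


class ConvBlock(nn.Module):
    """Conv -> norm -> activation; the activation is the only varied factor."""
    def __init__(self, in_ch: int, out_ch: int, activation: str = "gelu"):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            make_activation(activation),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)


# Identical blocks that differ only in their activation function.
x = torch.randn(1, 64, 32, 32)
for act in ["relu", "gelu", "swish", "sine"]:
    y = ConvBlock(64, 64, activation=act)(x)
    print(act, tuple(y.shape))
```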
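The paper's second design axis, where to place self-attention, can be prototyped by wrapping multi-head attention over the spatial positions of a feature map and inserting it at different decoder depths. The SelfAttention2d module below is a hedged stand-in for that idea, not ShineOn's actual attention layer or its placement.

```python
import torch
import torch.nn as nn


class SelfAttention2d(nn.Module):
    """Multi-head self-attention over the spatial positions of a CNN feature map.
    Illustrative stand-in; ShineOn's attention module and placement may differ."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)      # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attn_out)      # residual + norm
        return tokens.transpose(1, 2).reshape(b, c, h, w)


# Placement ablation: attention between an early and a late decoder stage.
decoder_stage = nn.Sequential(
    nn.Conv2d(128, 64, 3, padding=1), nn.GELU(),
    SelfAttention2d(64),                           # <- placement under study
    nn.Conv2d(64, 3, 3, padding=1),
)
print(decoder_stage(torch.randn(1, 128, 32, 32)).shape)
```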
Related papers
- ViViD: Video Virtual Try-on using Diffusion Models [46.710863047471264]
Video virtual try-on aims to transfer a clothing item onto the video of a target person.
Previous video-based try-on solutions can only generate blurry results of low visual quality.
We present ViViD, a novel framework employing powerful diffusion models to tackle the task of video virtual try-on.
arXiv Detail & Related papers (2024-05-20T05:28:22Z)
- Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation [57.60490773016364]
We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation.
Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem.
Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation.
arXiv Detail & Related papers (2023-12-20T22:36:37Z)
- AnyDoor: Zero-shot Object-level Image Customization [63.44307304097742]
This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations.
Our model is trained only once and effortlessly generalizes to diverse object-scene combinations at the inference stage.
arXiv Detail & Related papers (2023-07-18T17:59:02Z)
- Vision Transformer Visualization: What Neurons Tell and How Neurons Behave? [33.87454837848252]
We propose an effective visualization technique to assist us in exposing the information carried in neurons and feature embeddings across vision transformers (ViTs).
Our approach departs from the computational process of ViTs with a focus on visualizing the local and global information in input images and the latent feature embeddings at multiple levels.
Next, we develop a rigorous framework to perform effective visualizations across layers, exposing the effects of ViT filters and their grouping/clustering behaviors over object patches.
arXiv Detail & Related papers (2022-10-14T08:56:24Z)
- ARShoe: Real-Time Augmented Reality Shoe Try-on System on Smartphones [14.494454213703111]
This work proposes a real-time augmented reality virtual shoe try-on system for smartphones, namely ARShoe.
ARShoe adopts a novel multi-branch network to realize pose estimation and segmentation simultaneously.
For training and evaluation, we construct the first large-scale foot benchmark with multiple labels relevant to the virtual shoe try-on task.
arXiv Detail & Related papers (2021-08-24T03:54:45Z)
- Agents that Listen: High-Throughput Reinforcement Learning with Multiple Sensory Systems [6.952659395337689]
We introduce a new version of the VizDoom simulator to create a highly efficient learning environment that provides raw audio observations.
We train our agent to play the full game of Doom and find that it can consistently defeat a traditional vision-based adversary.
arXiv Detail & Related papers (2021-07-05T18:00:50Z)
- Cloth Interactive Transformer for Virtual Try-On [106.21605249649957]
We propose a novel two-stage cloth interactive transformer (CIT) method for the virtual try-on task.
In the first stage, we design a CIT matching block, aiming to precisely capture the long-range correlations between the cloth-agnostic person information and the in-shop cloth information.
In the second stage, we put forth a CIT reasoning block to establish global mutual interactive dependencies among the person representation, the warped clothing item, and the corresponding warped cloth mask.
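The first-stage matching block described above amounts to cross-attention from a cloth-agnostic person representation to in-shop cloth features. A minimal sketch of that mechanism follows, assuming tokenized feature maps and hypothetical module names; it illustrates the idea and is not the CIT implementation.

```python
import torch
import torch.nn as nn


class MatchingBlock(nn.Module):
    """Cross-attention from person features (queries) to cloth features (keys/values).
    A hedged illustration of the first-stage matching idea; not the authors' code."""
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, person_tokens: torch.Tensor, cloth_tokens: torch.Tensor) -> torch.Tensor:
        # person_tokens: (B, Np, dim) cloth-agnostic person representation
        # cloth_tokens:  (B, Nc, dim) in-shop cloth representation
        attended, _ = self.cross_attn(person_tokens, cloth_tokens, cloth_tokens)
        return self.norm(person_tokens + attended)


person = torch.randn(2, 196, 256)  # e.g. 14x14 person feature tokens (assumed shape)
cloth = torch.randn(2, 196, 256)   # e.g. 14x14 cloth feature tokens (assumed shape)
print(MatchingBlock()(person, cloth).shape)  # torch.Size([2, 196, 256])
```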
arXiv Detail & Related papers (2021-04-12T14:45:32Z)
- CharacterGAN: Few-Shot Keypoint Character Animation and Reposing [64.19520387536741]
We introduce CharacterGAN, a generative model that can be trained on only a few samples of a given character.
Our model generates novel poses based on keypoint locations, which can be modified in real time while providing interactive feedback.
We show that our approach outperforms recent baselines and creates realistic animations for diverse characters.
arXiv Detail & Related papers (2021-02-05T12:38:15Z)
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms a visual-only state-of-the-art method, MoCo.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)