ShineOn: Illuminating Design Choices for Practical Video-based Virtual
Clothing Try-on
- URL: http://arxiv.org/abs/2012.10495v2
- Date: Wed, 13 Jan 2021 00:14:31 GMT
- Title: ShineOn: Illuminating Design Choices for Practical Video-based Virtual
Clothing Try-on
- Authors: Gaurav Kuppa, Andrew Jong, Vera Liu, Ziwei Liu, and Teng-Sheng Moh
- Abstract summary: We build a series of scientific experiments to isolate effective design choices in video synthesis for virtual clothing try-on.
Specifically, we investigate the effect of different pose annotations, self-attention layer placement, and activation functions.
GELU and ReLU activation functions are the most effective in our experiments despite the appeal of newer activations such as Swish and Sine.
- Score: 8.909228149756993
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Virtual try-on has garnered interest as a neural rendering benchmark task to
evaluate complex object transfer and scene composition. Recent works in virtual
clothing try-on feature a plethora of possible architectural and data
representation choices. However, they present little clarity on quantifying the
isolated visual effect of each choice, nor do they specify the hyperparameter
details that are key to experimental reproduction. Our work, ShineOn,
takes a bottom-up approach to the try-on task and aims to shine light on
the visual and quantitative effects of each experiment. We build a series of
scientific experiments to isolate effective design choices in video synthesis
for virtual clothing try-on. Specifically, we investigate the effect of
different pose annotations, self-attention layer placement, and activation
functions on the quantitative and qualitative performance of video virtual
try-on. We find that DensePose annotations not only enhance face details but
also decrease memory usage and training time. Next, we find that attention
layers improve face and neck quality. Finally, we show that GELU and ReLU
activation functions are the most effective in our experiments despite the
appeal of newer activations such as Swish and Sine. We will release a
well-organized code base, hyperparameters, and model checkpoints to support the
reproducibility of our results. We expect our extensive experiments and code to
greatly inform future design choices in video virtual try-on. Our code may be
accessed at https://github.com/andrewjong/ShineOn-Virtual-Tryon.
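The activation-function comparison is straightforward to reproduce as a single configurable choice in an otherwise fixed block. Below is a minimal PyTorch sketch of such an ablation; the Sine module and make_activation helper are illustrative names and are not taken from the ShineOn code base.

```python
import torch
import torch.nn as nn


class Sine(nn.Module):
    """SIREN-style periodic activation sin(w0 * x). Hypothetical helper, not from ShineOn."""
    def __init__(self, w0: float = 30.0):
        super().__init__()
        self.w0 = w0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sin(self.w0 * x)


def make_activation(name: str) -> nn.Module:
    # One entry per activation compared in the paper's ablation.
    return {
        "relu": nn.ReLU(inplace=True),
        "gelu": nn.GELU(),
        "swish": nn.SiLU(),  # Swish with beta = 1
        "sine": Sine(),
    }[name.lower()]


class ConvBlock(nn.Module):
    """Conv -> norm -> activation; the activation is the only varied factor."""
    def __init__(self, in_ch: int, out_ch: int, activation: str = "gelu"):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            make_activation(activation),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)


# Identical blocks that differ only in their activation function.
x = torch.randn(1, 64, 32, 32)
for act in ["relu", "gelu", "swish", "sine"]:
    y = ConvBlock(64, 64, activation=act)(x)
    print(act, tuple(y.shape))
```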
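The paper's second design axis, where to place self-attention, can be prototyped by wrapping multi-head attention over the spatial positions of a feature map and inserting it at different decoder depths. The SelfAttention2d module below is a hedged stand-in for that idea, not ShineOn's actual attention layer or its placement.

```python
import torch
import torch.nn as nn


class SelfAttention2d(nn.Module):
    """Multi-head self-attention over the spatial positions of a CNN feature map.
    Illustrative stand-in; ShineOn's attention module and placement may differ."""
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)      # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attn_out)      # residual + norm
        return tokens.transpose(1, 2).reshape(b, c, h, w)


# Placement ablation: attention between an early and a late decoder stage.
decoder_stage = nn.Sequential(
    nn.Conv2d(128, 64, 3, padding=1), nn.GELU(),
    SelfAttention2d(64),                           # <- placement under study
    nn.Conv2d(64, 3, 3, padding=1),
)
print(decoder_stage(torch.randn(1, 128, 32, 32)).shape)
```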
Related papers
- ViViD: Video Virtual Try-on using Diffusion Models [46.710863047471264]
Video virtual try-on aims to transfer a clothing item onto the video of a target person.
Previous video-based try-on solutions can only generate blurry results of low visual quality.
We present ViViD, a novel framework employing powerful diffusion models to tackle the task of video virtual try-on.
arXiv Detail & Related papers (2024-05-20T05:28:22Z)
- Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation [57.60490773016364]
We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation.
Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem.
Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation.
arXiv Detail & Related papers (2023-12-20T22:36:37Z)
- AnyDoor: Zero-shot Object-level Image Customization [63.44307304097742]
This work presents AnyDoor, a diffusion-based image generator with the power to teleport target objects to new scenes at user-specified locations.
Our model is trained only once and effortlessly generalizes to diverse object-scene combinations at the inference stage.
arXiv Detail & Related papers (2023-07-18T17:59:02Z)
- Vision Transformer Visualization: What Neurons Tell and How Neurons Behave? [33.87454837848252]
We propose an effective visualization technique to assist us in exposing the information carried in neurons and feature embeddings across vision transformers (ViTs).
Our approach departs from the computational process of ViTs with a focus on visualizing the local and global information in input images and the latent feature embeddings at multiple levels.
Next, we develop a rigorous framework to perform effective visualizations across layers, exposing the effects of ViT filters and their grouping/clustering behaviors over object patches.
arXiv Detail & Related papers (2022-10-14T08:56:24Z)
- ARShoe: Real-Time Augmented Reality Shoe Try-on System on Smartphones [14.494454213703111]
This work proposes a real-time augmented reality virtual shoe try-on system for smartphones, namely ARShoe.
ARShoe adopts a novel multi-branch network to realize pose estimation and segmentation simultaneously.
For training and evaluation, we construct the first large-scale foot benchmark with multiple labels relevant to the virtual shoe try-on task.
arXiv Detail & Related papers (2021-08-24T03:54:45Z)
- Agents that Listen: High-Throughput Reinforcement Learning with Multiple Sensory Systems [6.952659395337689]
We introduce a new version of the VizDoom simulator to create a highly efficient learning environment that provides raw audio observations.
We train our agent to play the full game of Doom and find that it can consistently defeat a traditional vision-based adversary.
arXiv Detail & Related papers (2021-07-05T18:00:50Z)
- Cloth Interactive Transformer for Virtual Try-On [106.21605249649957]
We propose a novel two-stage cloth interactive transformer (CIT) method for the virtual try-on task.
In the first stage, we design a CIT matching block, aiming to precisely capture the long-range correlations between the cloth-agnostic person information and the in-shop cloth information.
In the second stage, we put forth a CIT reasoning block to establish global mutual interactive dependencies among the person representation, the warped clothing item, and the corresponding warped cloth mask.
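The first-stage matching block described above amounts to cross-attention from a cloth-agnostic person representation to in-shop cloth features. A minimal sketch of that mechanism follows, assuming tokenized feature maps and hypothetical module names; it illustrates the idea and is not the CIT implementation.

```python
import torch
import torch.nn as nn


class MatchingBlock(nn.Module):
    """Cross-attention from person features (queries) to cloth features (keys/values).
    A hedged illustration of the first-stage matching idea; not the authors' code."""
    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, person_tokens: torch.Tensor, cloth_tokens: torch.Tensor) -> torch.Tensor:
        # person_tokens: (B, Np, dim) cloth-agnostic person representation
        # cloth_tokens:  (B, Nc, dim) in-shop cloth representation
        attended, _ = self.cross_attn(person_tokens, cloth_tokens, cloth_tokens)
        return self.norm(person_tokens + attended)


person = torch.randn(2, 196, 256)  # e.g. 14x14 person feature tokens (assumed shape)
cloth = torch.randn(2, 196, 256)   # e.g. 14x14 cloth feature tokens (assumed shape)
print(MatchingBlock()(person, cloth).shape)  # torch.Size([2, 196, 256])
```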
arXiv Detail & Related papers (2021-04-12T14:45:32Z)
- CharacterGAN: Few-Shot Keypoint Character Animation and Reposing [64.19520387536741]
We introduce CharacterGAN, a generative model that can be trained on only a few samples of a given character.
Our model generates novel poses based on keypoint locations, which can be modified in real time while providing interactive feedback.
We show that our approach outperforms recent baselines and creates realistic animations for diverse characters.
arXiv Detail & Related papers (2021-02-05T12:38:15Z)
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms a visual-only state-of-the-art method, MoCo.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)