Related papers: AvatarVTON: 4D Virtual Try-On for Animatable Avatars

URL: http://arxiv.org/abs/2510.04822v1
Date: Mon, 06 Oct 2025 14:06:34 GMT
Title: AvatarVTON: 4D Virtual Try-On for Animatable Avatars
Authors: Zicheng Jiang, Jixin Gao, Shengfeng He, Xinzhe Li, Yulong Zheng, Zhaotong Yang, Junyu Dong, Yong Du
Abstract summary: AvatarVTON generates realistic try-on results from a single in-shop garment image. It supports dynamic garment interactions under single-view supervision. It is well-suited for AR/VR, gaming, and digital-human applications.
Score: 67.13031660684457
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose AvatarVTON, the first 4D virtual try-on framework that generates realistic try-on results from a single in-shop garment image, enabling free pose control, novel-view rendering, and diverse garment choices. Unlike existing methods, AvatarVTON supports dynamic garment interactions under single-view supervision, without relying on multi-view garment captures or physics priors. The framework consists of two key modules: (1) a Reciprocal Flow Rectifier, a prior-free optical-flow correction strategy that stabilizes avatar fitting and ensures temporal coherence; and (2) a Non-Linear Deformer, which decomposes Gaussian maps into view-pose-invariant and view-pose-specific components, enabling adaptive, non-linear garment deformations. To establish a benchmark for 4D virtual try-on, we extend existing baselines with unified modules for fair qualitative and quantitative comparisons. Extensive experiments show that AvatarVTON achieves high fidelity, diversity, and dynamic garment realism, making it well-suited for AR/VR, gaming, and digital-human applications.

Related papers

SteadyDancer: Harmonized and Coherent Human Image Animation with First-Frame Preservation [50.792027578906804]
We introduce SteadyDancer, an Image-to-Video (I2V) paradigm-based framework that achieves harmonized and coherent animation. Experiments demonstrate that SteadyDancer achieves state-of-the-art performance in both appearance fidelity and motion control.
arXiv Detail & Related papers (2025-11-24T17:15:55Z)
EVA: Expressive Virtual Avatars from Multi-view Videos [51.33851869426057]
We introduce Expressive Virtual Avatars (EVA), an actor-specific, fully controllable, and expressive human avatar framework. EVA achieves high-fidelity, lifelike renderings in real time while enabling independent control of facial expressions, body movements, and hand gestures. This work represents a significant advancement towards fully drivable digital human models.
arXiv Detail & Related papers (2025-05-21T11:22:52Z)
SEGA: Drivable 3D Gaussian Head Avatar from a Single Image [15.117619290414064]
We propose SEGA, a novel approach for 3D drivable Gaussian head Avatar creation. SEGA seamlessly combines priors derived from large-scale 2D datasets with 3D priors learned from multi-view, multi-expression, and multi-ID data. Experiments show our method outperforms state-of-the-art approaches in generalization ability, identity preservation, and expression realism.
arXiv Detail & Related papers (2025-04-19T18:23:31Z)
UniViTAR: Unified Vision Transformer with Native Resolution [37.63387029787732]
We introduce UniViTAR, a family of homogeneous vision foundation models tailored for unified visual modality and native resolution scenario. A progressive training paradigm is introduced, which strategically combines two core mechanisms. In parallel, a hybrid training framework further synergizes sigmoid-based contrastive loss with feature distillation from a frozen teacher model.
arXiv Detail & Related papers (2025-04-02T14:59:39Z)
VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction [103.0918705283309]
Virtual Try-On (VTON) is a transformative technology in e-commerce and fashion design, enabling realistic digital visualization of clothing on individuals. We propose VTON 360, a novel 3D VTON method that addresses the open challenge of achieving high-fidelity VTON that supports any-view rendering.
arXiv Detail & Related papers (2025-03-15T15:08:48Z)
ITVTON: Virtual Try-On Diffusion Transformer Based on Integrated Image and Text [1.7071356210178177]
ITVTON is an efficient framework that leverages the Diffusion Transformer (DiT) as its single generator to improve image fidelity. ITVTON effectively captures garment and person images along the width dimension and incorporates textual descriptions from both. Experiments on 10,257 image pairs from IGPair confirm ITVTON's robustness in real-world scenarios.
arXiv Detail & Related papers (2025-01-28T07:24:15Z)
Improving Diffusion Models for Authentic Virtual Try-on in the Wild [53.96244595495942]
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment. We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images. We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
arXiv Detail & Related papers (2024-03-08T08:12:18Z)
MonoHuman: Animatable Human Neural Field from Monocular Video [30.113937856494726]
We propose a novel framework MonoHuman, which robustly renders view-consistent and high-fidelity avatars under arbitrary novel poses. Our key insight is to model the deformation field with bi-directional constraints and explicitly leverage the off-the-peg information to reason the feature for coherent results.
arXiv Detail & Related papers (2023-04-04T17:55:03Z)
Drivable Volumetric Avatars using Texel-Aligned Features [52.89305658071045]
Photo telepresence requires both high-fidelity body modeling and faithful driving to enable dynamically synthesized appearance. We propose an end-to-end framework that addresses two core challenges in modeling and driving full-body avatars of real people.
arXiv Detail & Related papers (2022-07-20T09:28:16Z)
Single Stage Virtual Try-on via Deformable Attention Flows [51.70606454288168]
Virtual try-on aims to generate a photo-realistic fitting result given an in-shop garment and a reference person image. We develop a novel Deformable Attention Flow (DAFlow) which applies the deformable attention scheme to multi-flow estimation. Our proposed method achieves state-of-the-art performance both qualitatively and quantitatively.
arXiv Detail & Related papers (2022-07-19T10:01:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.