Related papers: Eevee: Towards Close-up High-resolution Video-based Virtual Try-on

Eevee: Towards Close-up High-resolution Video-based Virtual Try-on

URL: http://arxiv.org/abs/2511.18957v1
Date: Mon, 24 Nov 2025 10:19:56 GMT
Title: Eevee: Towards Close-up High-resolution Video-based Virtual Try-on
Authors: Jianhao Zeng, Yancheng Bai, Ruidong Chen, Xuanpu Zhang, Lei Sun, Dongyang Jin, Ryan Xu, Nannan Zhang, Dan Song, Xiangxiang Chu,
Abstract summary: We introduce a high-resolution dataset for video-based virtual try-on.<n>This dataset includes full-shot and close-up try-on videos of real human models.<n>We propose a new garment consistency metric VGID that quantifies the preservation of both texture and structure.
Score: 23.37783900582483
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video virtual try-on technology provides a cost-effective solution for creating marketing videos in fashion e-commerce. However, its practical adoption is hindered by two critical limitations. First, the reliance on a single garment image as input in current virtual try-on datasets limits the accurate capture of realistic texture details. Second, most existing methods focus solely on generating full-shot virtual try-on videos, neglecting the business's demand for videos that also provide detailed close-ups. To address these challenges, we introduce a high-resolution dataset for video-based virtual try-on. This dataset offers two key features. First, it provides more detailed information on the garments, which includes high-fidelity images with detailed close-ups and textual descriptions; Second, it uniquely includes full-shot and close-up try-on videos of real human models. Furthermore, accurately assessing consistency becomes significantly more critical for the close-up videos, which demand high-fidelity preservation of garment details. To facilitate such fine-grained evaluation, we propose a new garment consistency metric VGID (Video Garment Inception Distance) that quantifies the preservation of both texture and structure. Our experiments validate these contributions. We demonstrate that by utilizing the detailed images from our dataset, existing video generation models can extract and incorporate texture features, significantly enhancing the realism and detail fidelity of virtual try-on results. Furthermore, we conduct a comprehensive benchmark of recent models. The benchmark effectively identifies the texture and structural preservation problems among current methods.

Related papers

The devil is in the details: Enhancing Video Virtual Try-On via Keyframe-Driven Details Injection [90.30501870268911]
KeyTailor is a novel framework for realistic try-on video.<n>It uses an instruction-guided sampling strategy to filter informative frames from the input video.<n>Our dataset ViT-HD comprises 15, 070 high-quality video samples at a resolution of 810*1080, covering diverse garments.
arXiv Detail & Related papers (2025-12-23T13:15:31Z)
Generative Video Matting [57.186684844156595]
Video matting has traditionally been limited by the lack of high-quality ground-truth data.<n>Most existing video matting datasets provide only human-annotated imperfect alpha and foreground annotations.<n>We introduce a novel video matting approach that can effectively leverage the rich priors from pre-trained video diffusion models.
arXiv Detail & Related papers (2025-08-11T12:18:55Z)
DreamVVT: Mastering Realistic Video Virtual Try-On in the Wild via a Stage-Wise Diffusion Transformer Framework [26.661935208583756]
virtual try-on (VVT) technology has garnered considerable academic interest owing to its promising applications in e-commerce advertising and entertainment.<n>We propose DreamVVT, which is inherently capable of leveraging diverse unpaired human-centric data to enhance adaptability in real-world scenarios.<n>In the first stage, we sample representative frames from the input video and utilize a multi-frame try-on model integrated with a vision-language model (VLM), to synthesize high-fidelity and semantically consistent try-on images.<n>In the second stage, skeleton maps together with fine-grained motion and appearance descriptions are
arXiv Detail & Related papers (2025-08-04T18:27:55Z)
RealVVT: Towards Photorealistic Video Virtual Try-on via Spatio-Temporal Consistency [26.410982262831975]
RealVVT is a photoRealistic Video Virtual Try-on framework tailored to bolster stability and realism within dynamic video contexts.<n>Our approach outperforms existing state-of-the-art models in both single-image and video VTO tasks.
arXiv Detail & Related papers (2025-01-15T09:22:38Z)
Multi-subject Open-set Personalization in Video Generation [110.02124633005516]
We present Video Alchemist $-$ a video model with built-in multi-subject, open-set personalization capabilities.<n>Our model is built on a new Diffusion Transformer module that fuses each conditional reference image and its corresponding subject-level text prompt.<n>Our method significantly outperforms existing personalization methods in both quantitative and qualitative evaluations.
arXiv Detail & Related papers (2025-01-10T18:59:54Z)
WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models [132.77237314239025]
Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos. Traditional image-based methods, relying on warping and blending, struggle with complex human movements and occlusions. We reconceptualize video try-on as a process of generating videos conditioned on garment descriptions and human motion. Our solution, WildVidFit, employs image-based controlled diffusion models for a streamlined, one-stage approach.
arXiv Detail & Related papers (2024-07-15T11:21:03Z)
ViViD: Video Virtual Try-on using Diffusion Models [46.710863047471264]
Video virtual try-on aims to transfer a clothing item onto the video of a target person. Previous video-based try-on solutions can only generate low visual quality and blurring results. We present ViViD, a novel framework employing powerful diffusion models to tackle the task of video virtual try-on.
arXiv Detail & Related papers (2024-05-20T05:28:22Z)
Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models [4.038493506169702]
This study emphasizes the challenges of preserving intricate texture details and distinctive features of the target person and the clothes in various scenarios. Various existing approaches are explored, highlighting the limitations and unresolved aspects. It then proposes a novel diffusion-based solution that addresses garment texture preservation and user identity retention during virtual try-on.
arXiv Detail & Related papers (2024-03-12T07:15:29Z)
Improving Diffusion Models for Authentic Virtual Try-on in the Wild [53.96244595495942]
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment. We propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images. We present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity.
arXiv Detail & Related papers (2024-03-08T08:12:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.