Generating Fit Check Videos with a Handheld Camera
- URL: http://arxiv.org/abs/2505.23886v1
- Date: Thu, 29 May 2025 17:58:49 GMT
- Title: Generating Fit Check Videos with a Handheld Camera
- Authors: Bowei Chen, Brian Curless, Ira Kemelmacher-Shlizerman, Steven M. Seitz
- Abstract summary: We propose a more convenient solution that enables full-body video capture using handheld mobile devices. Our approach takes as input two static photos (front and back) of you in a mirror, along with an IMU motion reference that you perform while holding your mobile phone. We enable rendering into a new scene, with consistent illumination and shadows.
- Score: 21.020454186769655
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-captured full-body videos are popular, but most deployments require mounted cameras, carefully framed shots, and repeated practice. We propose a more convenient solution that enables full-body video capture using handheld mobile devices. Our approach takes as input two static photos (front and back) of you in a mirror, along with an IMU motion reference that you perform while holding your mobile phone, and synthesizes a realistic video of you performing a similar target motion. We enable rendering into a new scene, with consistent illumination and shadows. We propose a novel video diffusion-based model to achieve this. Specifically, we propose a parameter-free frame generation strategy and a multi-reference attention mechanism that effectively integrate appearance information from both the front and back selfies into the video diffusion model. Additionally, we introduce an image-based fine-tuning strategy to enhance frame sharpness and improve the generation of shadows and reflections, achieving a more realistic human-scene composition.
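The abstract names a multi-reference attention mechanism that injects appearance from the front and back mirror selfies into the video diffusion model. The paper's implementation is not reproduced here; the following is a minimal sketch, assuming a token-based latent diffusion backbone, of how attention over two concatenated reference-token streams might look. The module name, tensor shapes, and the simple key/value concatenation are illustrative assumptions.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiReferenceAttention(nn.Module):
    """Cross-attention from video frame tokens to two reference images.

    Hypothetical sketch: keys/values from the front and back selfie
    features are concatenated, so every frame token can attend to
    appearance cues from either reference view.
    """

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_kv = nn.Linear(dim, 2 * dim, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, frame_tokens, front_tokens, back_tokens):
        # frame_tokens: (B, N, C) latent tokens of the frames being denoised
        # front_tokens / back_tokens: (B, M, C) encoded reference selfies
        refs = torch.cat([front_tokens, back_tokens], dim=1)   # (B, 2M, C)
        q = self.to_q(frame_tokens)
        k, v = self.to_kv(refs).chunk(2, dim=-1)

        def split_heads(x):
            b, n, c = x.shape
            return x.view(b, n, self.num_heads, c // self.num_heads).transpose(1, 2)

        q, k, v = map(split_heads, (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v)           # (B, H, N, C/H)
        out = out.transpose(1, 2).reshape(frame_tokens.shape)
        return frame_tokens + self.proj(out)                    # residual update
```
Concatenating the two reference streams lets each frame token draw on whichever view is most informative for a given body region; the paper's actual mechanism may weight or route the references differently.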
Related papers
- Moaw: Unleashing Motion Awareness for Video Diffusion Models [71.34328578845721]
Moaw is a framework that unleashes motion awareness for video diffusion models.
We train a diffusion model for motion perception, shifting its modality from image-to-video generation to video-to-dense-tracking.
We then construct a motion-labeled dataset to identify features that encode the strongest motion information, and inject them into a structurally identical video generation model.
arXiv Detail & Related papers (2026-01-19T06:45:46Z)
- ReLumix: Extending Image Relighting to Video via Video Diffusion Models [5.890782804843724]
Controlling illumination during video post-production is a crucial yet elusive goal in computational photography.
This paper introduces ReLumix, a novel framework that decouples relighting from temporal synthesis.
Although trained on synthetic data, ReLumix shows competitive generalization to real-world videos.
arXiv Detail & Related papers (2025-09-28T09:35:33Z)
- Stable Video-Driven Portraits [52.008400639227034]
Portrait animation aims to generate photo-realistic videos from a single source image by reenacting the expression and pose from a driving video.
Recent advances using diffusion models have demonstrated improved quality but remain constrained by weak control signals and architectural limitations.
We propose a novel diffusion-based framework that leverages masked facial regions, specifically the eyes, nose, and mouth, from the driving video as strong motion control cues (a rough sketch of such region masking follows this entry).
arXiv Detail & Related papers (2025-09-22T08:11:08Z)
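One plausible way to realize the masked facial-region control described above is to rasterize eye, nose, and mouth regions from detected landmarks into a binary mask and feed the masked driving frames to the diffusion model. The 68-point landmark indices and the OpenCV convex-hull filling below are illustrative assumptions, not the paper's pipeline.
```python
import cv2
import numpy as np

# Hypothetical landmark index groups for a 68-point face model; the actual
# paper may use a different detector and region definition.
REGIONS = {
    "left_eye":  list(range(36, 42)),
    "right_eye": list(range(42, 48)),
    "nose":      list(range(27, 36)),
    "mouth":     list(range(48, 68)),
}

def facial_region_mask(landmarks: np.ndarray, height: int, width: int) -> np.ndarray:
    """Rasterize eyes/nose/mouth landmarks into a binary control mask.

    landmarks: (68, 2) array of (x, y) points from any face-landmark detector.
    Returns a (height, width) uint8 mask with 1 inside the selected regions.
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    for indices in REGIONS.values():
        pts = landmarks[indices].astype(np.int32)
        hull = cv2.convexHull(pts)          # enclose the region with a convex hull
        cv2.fillConvexPoly(mask, hull, 1)   # paint the region into the mask
    return mask

# The masked driving frame (frame * mask[..., None]) could then serve as the
# strong motion control signal fed to the video diffusion model.
```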
- SelfHVD: Self-Supervised Handheld Video Deblurring for Mobile Phones [54.427316707517406]
We propose a self-supervised method for handheld video deblurring, driven by sharp clues in the video.
To train the deblurring model, we extract the sharp clues from the video and take them as misalignment labels of neighboring blurry frames.
We construct a synthetic and a real-world handheld video dataset for handheld video deblurring.
arXiv Detail & Related papers (2025-08-12T03:38:14Z)
- DreamJourney: Perpetual View Generation with Video Diffusion Models [91.88716097573206]
Perpetual view generation aims to synthesize a long-term video corresponding to an arbitrary camera trajectory solely from a single input image.
Recent methods commonly utilize a pre-trained text-to-image diffusion model to synthesize new content of previously unseen regions along the camera movement.
We present DreamJourney, a two-stage framework that leverages the world simulation capacity of video diffusion models for a new perpetual scene view generation task.
arXiv Detail & Related papers (2025-06-21T12:51:34Z)
- CamMimic: Zero-Shot Image To Camera Motion Personalized Video Generation Using Diffusion Models [47.65379612084075]
CamMimic is designed to seamlessly transfer the camera motion observed in a given reference video onto any scene of the user's choice.
In the absence of an established metric for assessing camera motion transfer between unrelated scenes, we propose CameraScore.
arXiv Detail & Related papers (2025-04-13T08:04:11Z)
- ReCamMaster: Camera-Controlled Generative Rendering from A Single Video [72.42376733537925]
ReCamMaster is a camera-controlled generative video re-rendering framework.
It reproduces the dynamic scene of an input video at novel camera trajectories.
Our method also finds promising applications in video stabilization, super-resolution, and outpainting.
arXiv Detail & Related papers (2025-03-14T17:59:31Z)
- Long Context Tuning for Video Generation [63.060794860098795]
Long Context Tuning (LCT) is a training paradigm that expands the context window of pre-trained single-shot video diffusion models.
Our method expands full attention mechanisms from individual shots to encompass all shots within a scene (a rough sketch of such an attention mask follows this entry).
Experiments demonstrate coherent multi-shot scenes and emerging capabilities, including compositional generation and interactive shot extension.
arXiv Detail & Related papers (2025-03-13T17:40:07Z)
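Going from single-shot to scene-level attention can be pictured as swapping a block-diagonal attention mask (each shot attends only to its own tokens) for a full mask over all shots' tokens. The helper below is a hypothetical illustration of that mask construction, not LCT's training code.
```python
import torch

def scene_attention_mask(shot_lengths: list[int], full_context: bool) -> torch.Tensor:
    """Build a boolean attention mask over the concatenated tokens of all shots.

    shot_lengths: number of tokens in each shot of the scene.
    full_context=False -> block-diagonal mask (single-shot attention).
    full_context=True  -> all-ones mask (every shot attends to every other shot).
    """
    total = sum(shot_lengths)
    if full_context:
        return torch.ones(total, total, dtype=torch.bool)

    mask = torch.zeros(total, total, dtype=torch.bool)
    start = 0
    for length in shot_lengths:
        mask[start:start + length, start:start + length] = True  # within-shot block
        start += length
    return mask

# Example: a 3-shot scene with 4, 6, and 5 latent tokens per shot.
single_shot = scene_attention_mask([4, 6, 5], full_context=False)
scene_level = scene_attention_mask([4, 6, 5], full_context=True)
```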
- SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints [43.14498014617223]
We propose a plug-and-play module that enhances a pre-trained text-to-video model for multi-camera video generation.
We introduce a multi-view synchronization module to maintain appearance and geometry consistency across different viewpoints.
Our method enables intriguing extensions, such as re-rendering a video from novel viewpoints.
arXiv Detail & Related papers (2024-12-10T18:55:17Z)
- ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning [32.08206580711449]
We present ReCapture, a method for generating new videos with novel camera trajectories from a single user-provided video.
Our method allows us to re-generate the reference video, with all its existing scene motion, from vastly different angles and with cinematic camera motion.
arXiv Detail & Related papers (2024-11-07T18:59:45Z)
- DreamVideo: Composing Your Dream Videos with Customized Subject and Motion [52.7394517692186]
We present DreamVideo, a novel approach to generating personalized videos from a few static images of the desired subject.
DreamVideo decouples this task into two stages, subject learning and motion learning, by leveraging a pre-trained video diffusion model.
In motion learning, we architect a motion adapter and fine-tune it on the given videos to effectively model the target motion pattern.
arXiv Detail & Related papers (2023-12-07T16:57:26Z)
- VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference (a rough sketch of this residual objective follows this entry).
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
arXiv Detail & Related papers (2023-12-01T06:50:11Z)
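A motion distillation objective built from residual vectors between consecutive frames could, in spirit, look like the sketch below: the frame-to-frame residual of the model's denoised latents is aligned with the residual of the reference video's latents, and only temporal-attention parameters receive gradients. The cosine-distance comparison and the parameter-selection heuristic are illustrative assumptions, not the paper's exact formulation.
```python
import torch
import torch.nn.functional as F

def motion_distillation_loss(pred_latents: torch.Tensor,
                             ref_latents: torch.Tensor) -> torch.Tensor:
    """Align frame-to-frame residuals of predicted and reference video latents.

    pred_latents, ref_latents: (B, T, C, H, W) video latents.
    Residual vectors between consecutive frames serve as the motion reference;
    here they are compared with a cosine distance (an illustrative choice).
    """
    pred_residual = pred_latents[:, 1:] - pred_latents[:, :-1]   # (B, T-1, C, H, W)
    ref_residual = ref_latents[:, 1:] - ref_latents[:, :-1]
    cos = F.cosine_similarity(pred_residual.flatten(2), ref_residual.flatten(2), dim=-1)
    return (1.0 - cos).mean()

def temporal_attention_parameters(model: torch.nn.Module):
    """Select only temporal-attention parameters for one-shot tuning (heuristic)."""
    for name, param in model.named_parameters():
        param.requires_grad_("temporal" in name and "attn" in name)
    return [p for p in model.parameters() if p.requires_grad]
```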
- MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation [64.77240137998862]
MicroCinema is a framework for high-quality and coherent text-to-video generation.
We introduce a Divide-and-Conquer strategy that splits text-to-video generation into a two-stage process: text-to-image generation followed by image&text-to-video generation (a rough sketch of this two-stage pipeline follows this entry).
We show that MicroCinema achieves SOTA zero-shot FVD of 342.86 on UCF-101 and 377.40 on MSR-VTT.
arXiv Detail & Related papers (2023-11-30T18:59:30Z)
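The divide-and-conquer pipeline, first generating a key image from text and then animating it conditioned on both that image and the text, can be composed roughly as below using public diffusers checkpoints as stand-ins. The checkpoint names are assumptions, a GPU is assumed, and MicroCinema's own appearance-injection and training components are not reproduced.
```python
import torch
from diffusers import StableDiffusionPipeline, I2VGenXLPipeline
from diffusers.utils import export_to_video

prompt = "a corgi surfing a wave at sunset"

# Stage 1: text-to-image generates a key frame that fixes the appearance.
t2i = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")
key_frame = t2i(prompt).images[0]

# Stage 2: image&text-to-video animates the key frame, guided by the same text.
i2v = I2VGenXLPipeline.from_pretrained(
    "ali-vilab/i2vgen-xl", torch_dtype=torch.float16, variant="fp16"
).to("cuda")
frames = i2v(prompt=prompt, image=key_frame, num_inference_steps=50).frames[0]

export_to_video(frames, "corgi.mp4", fps=8)
```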
- Dreamix: Video Diffusion Models are General Video Editors [22.127604561922897]
Text-driven image and video diffusion models have recently achieved unprecedented generation realism.
We present the first diffusion-based method that is able to perform text-based motion and appearance editing of general videos.
arXiv Detail & Related papers (2023-02-02T18:58:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.