Related papers: Real-Time Person Image Synthesis Using a Flow Matching Model

URL: http://arxiv.org/abs/2505.03562v1
Date: Tue, 06 May 2025 14:13:44 GMT
Title: Real-Time Person Image Synthesis Using a Flow Matching Model
Authors: Jiwoo Jeong, Kirok Kim, Wooju Kim, Nam-Joon Kim
Abstract summary: Pose-Guided Person Image Synthesis (PGPIS) generates realistic person images conditioned on a target pose and a source image. Recent diffusion-based methods have shown impressive image quality in PGPIS. Our approach enables faster, more stable, and more efficient training and sampling.
Score: 3.149883354098941
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Pose-Guided Person Image Synthesis (PGPIS) generates realistic person images conditioned on a target pose and a source image. This task plays a key role in various real-world applications, such as sign language video generation, AR/VR, gaming, and live streaming. In these scenarios, real-time PGPIS is critical for providing immediate visual feedback and maintaining user immersion. However, achieving real-time performance remains a significant challenge due to the complexity of synthesizing high-fidelity images from diverse and dynamic human poses. Recent diffusion-based methods have shown impressive image quality in PGPIS, but their slow sampling speeds hinder deployment in time-sensitive applications. This latency is particularly problematic in tasks like generating sign language videos during live broadcasts, where rapid image updates are required. Therefore, developing a fast and reliable PGPIS model is a crucial step toward enabling real-time interactive systems. To address this challenge, we propose a generative model based on flow matching (FM). Our approach enables faster, more stable, and more efficient training and sampling. Furthermore, the proposed model supports conditional generation and can operate in latent space, making it especially suitable for real-time PGPIS applications where both speed and quality are critical. We evaluate our proposed method, Real-Time Person Image Synthesis Using a Flow Matching Model (RPFM), on the widely used DeepFashion dataset for PGPIS tasks. Our results show that RPFM achieves near-real-time sampling speeds while maintaining performance comparable to the state-of-the-art models. Our methodology trades off a slight, acceptable decrease in generated-image accuracy for over a twofold increase in generation speed, thereby ensuring real-time performance.

Related papers

StreamAvatar: Streaming Diffusion Models for Real-Time Interactive Human Avatars [32.75338796722652]
We propose a two-stage autoregressive adaptation and acceleration framework to adapt a high-fidelity human video diffusion model for real-time, interactive streaming. We develop a one-shot, interactive, human avatar model capable of generating both natural talking and listening behaviors with coherent gestures. Our method achieves state-of-the-art performance, surpassing existing approaches in generation quality, real-time efficiency, and interaction naturalness.
arXiv Detail & Related papers (2025-12-26T15:41:24Z)
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length [57.458450695137664]
We present Live Avatar, an algorithm-system co-designed framework for efficient, high-fidelity, and infinite-length avatar generation. Live Avatar is the first to achieve practical, real-time, high-fidelity avatar generation at this scale.
arXiv Detail & Related papers (2025-12-04T11:11:24Z)
Dual-domain Adaptation Networks for Realistic Image Super-resolution [81.34345637776408]
Realistic image super-resolution (SR) focuses on transforming real-world low-resolution (LR) images into high-resolution (HR) ones. Current methods struggle with limited real-world LR-HR data, impacting the learning of basic image features. We introduce a novel approach, which is able to efficiently adapt pre-trained image SR models from simulated to real-world datasets.
arXiv Detail & Related papers (2025-11-21T12:57:23Z)
Design, Implementation and Evaluation of a Real-Time Remote Photoplethysmography (rPPG) Acquisition System for Non-Invasive Vital Sign Monitoring [10.154892578360151]
This paper presents a real-time remote photoplethysmography (rPPG) system optimized for low-power devices. It is designed to extract physiological signals, such as heart rate (HR), respiratory rate (RR), and oxygen saturation, from facial video streams.
arXiv Detail & Related papers (2025-08-26T08:12:57Z)
Underlying Semantic Diffusion for Effective and Efficient In-Context Learning [113.4003355229632]
Underlying Semantic Diffusion (US-Diffusion) is an enhanced diffusion model that boosts underlying semantics learning, computational efficiency, and in-context learning capabilities. We present a Feedback-Aided Learning (FAL) framework, which leverages feedback signals to guide the model in capturing semantic details. We also propose a plug-and-play Efficient Sampling Strategy (ESS) for dense sampling at time steps with high-noise levels.
arXiv Detail & Related papers (2025-03-06T03:06:22Z)
Online hand gesture recognition using Continual Graph Transformers [1.3927943269211591]
We propose a novel online recognition system designed for real-time skeleton sequence streaming. Our approach achieves state-of-the-art accuracy and significantly reduces false positive rates, making it a compelling solution for real-time applications. The proposed system can be seamlessly integrated into various domains, including human-robot collaboration and assistive technologies.
arXiv Detail & Related papers (2025-02-20T17:27:55Z)
XR-VIO: High-precision Visual Inertial Odometry with Fast Initialization for XR Applications [34.2082611110639]
This paper presents a novel approach to Visual Inertial Odometry (VIO) focusing on the initialization and feature matching modules. Existing methods for gyroscopes often suffer from poor stability in visual Structure from Motion (SfM) or in solving a huge number of parameters simultaneously. By tightly coupling measurements, we enhance the robustness and accuracy of visual SfM. In terms of feature matching, we introduce a hybrid method that combines optical flow and descriptor-based matching.
arXiv Detail & Related papers (2025-02-03T12:17:51Z)
Generative Adversarial Network on Motion-Blur Image Restoration [0.0]
We will focus on leveraging Generative Adversarial Networks (GANs) to effectively deblur images affected by motion blur. A GAN-based adversarialflow model is defined, trained, and evaluated on the GoPro dataset. Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index Measure (SSIM) are the two evaluation metrics used to provide quantitative measures of image quality.
arXiv Detail & Related papers (2024-12-27T06:12:50Z)
Time Step Generating: A Universal Synthesized Deepfake Image Detector [0.4488895231267077]
We propose a universal synthetic image detector, Time Step Generating (TSG). TSG does not rely on pre-trained models' reconstructing ability, specific datasets, or sampling algorithms. We test the proposed TSG on the large-scale GenImage benchmark and it achieves significant improvements in both accuracy and generalizability.
arXiv Detail & Related papers (2024-11-17T09:39:50Z)
Cross-Scan Mamba with Masked Training for Robust Spectral Imaging [51.557804095896174]
We propose the Cross-Scanning Mamba, named CS-Mamba, that employs a Spatial-Spectral SSM for global-local balanced context encoding. Experiment results show that our CS-Mamba achieves state-of-the-art performance and the masked training method can better reconstruct smooth features to improve the visual quality.
arXiv Detail & Related papers (2024-08-01T15:14:10Z)
DGNet: Dynamic Gradient-Guided Network for Water-Related Optics Image Enhancement [77.0360085530701]
Underwater image enhancement (UIE) is a challenging task due to the complex degradation caused by underwater environments. Previous methods often idealize the degradation process, and neglect the impact of medium noise and object motion on the distribution of image features. Our approach utilizes predicted images to dynamically update pseudo-labels, adding a dynamic gradient to optimize the network's gradient space.
arXiv Detail & Related papers (2023-12-12T06:07:21Z)
Physics-Driven Turbulence Image Restoration with Stochastic Refinement [80.79900297089176]
Image distortion by atmospheric turbulence is a critical problem in long-range optical imaging systems. Fast and physics-grounded simulation tools have been introduced to help the deep-learning models adapt to real-world turbulence conditions. This paper proposes the Physics-integrated Restoration Network (PiRN) to help the network to disentangle the stochasticity from the degradation and the underlying image.
arXiv Detail & Related papers (2023-07-20T05:49:21Z)
Recovering Continuous Scene Dynamics from A Single Blurry Image with Events [58.7185835546638]
An Implicit Video Function (IVF) is learned to represent a single motion blurred image with concurrent events. A dual attention transformer is proposed to efficiently leverage merits from both modalities. The proposed network is trained only with the supervision of ground-truth images of limited referenced timestamps.
arXiv Detail & Related papers (2023-04-05T18:44:17Z)
Motion-aware Dynamic Graph Neural Network for Video Compressive Sensing [14.67994875448175]
Video snapshot compressive imaging (SCI) utilizes a 2D detector to capture sequential video frames and compress them into a single measurement. Most existing reconstruction methods are incapable of efficiently capturing long-range spatial and temporal dependencies. We propose a flexible and robust approach based on the graph neural network (GNN) to efficiently model non-local interactions between pixels in space and time regardless of the distance.
arXiv Detail & Related papers (2022-03-01T12:13:46Z)
TimeLens: Event-based Video Frame Interpolation [54.28139783383213]
We introduce Time Lens, a novel method that leverages the advantages of both synthesis-based and flow-based approaches. We show an up to 5.21 dB improvement in terms of PSNR over state-of-the-art frame-based and event-based methods.
arXiv Detail & Related papers (2021-06-14T10:33:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.