Face Consistency Benchmark for GenAI Video
- URL: http://arxiv.org/abs/2505.11425v1
- Date: Fri, 16 May 2025 16:41:44 GMT
- Title: Face Consistency Benchmark for GenAI Video
- Authors: Michal Podstawski, Malgorzata Kudelska, Haohong Wang
- Abstract summary: This paper introduces the Face Consistency Benchmark (FCB), a framework for evaluating and comparing the consistency of characters in AI-generated videos. This work represents a crucial step toward improving character consistency in AI video generation technologies.
- Score: 1.137903861863692
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Video generation driven by artificial intelligence has advanced significantly, enabling the creation of dynamic and realistic content. However, maintaining character consistency across video sequences remains a major challenge, with current models struggling to ensure coherence in appearance and attributes. This paper introduces the Face Consistency Benchmark (FCB), a framework for evaluating and comparing the consistency of characters in AI-generated videos. By providing standardized metrics, the benchmark highlights gaps in existing solutions and promotes the development of more reliable approaches. This work represents a crucial step toward improving character consistency in AI video generation technologies.
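The abstract describes standardized metrics for character consistency but does not publish the exact formula here. Benchmarks of this kind commonly score identity stability as the mean pairwise cosine similarity of per-frame face embeddings. A minimal sketch under that assumption, with the embedding extraction step (normally a face-recognition model) left out:

```python
import numpy as np

def face_consistency_score(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine similarity of per-frame face embeddings.

    embeddings: (num_frames, dim) array, one face embedding per frame.
    Returns a score in [-1, 1]; higher means a more consistent identity.
    """
    # L2-normalize each embedding so dot products equal cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = normed @ normed.T  # (F, F) matrix of cosine similarities
    f = sim.shape[0]
    # Average over off-diagonal pairs only (self-similarity is always 1.0).
    return float((sim.sum() - f) / (f * (f - 1)))

# The same embedding in every frame yields a perfectly consistent identity.
rng = np.random.default_rng(0)
stable = np.tile(rng.normal(size=(1, 128)), (8, 1))
print(round(face_consistency_score(stable), 3))  # → 1.0
```

The function name and the choice of cosine similarity are illustrative, not taken from the FCB paper; the actual benchmark may combine several metrics.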
Related papers
- EndoGen: Conditional Autoregressive Endoscopic Video Generation [51.97720772069513]
We propose the first conditional endoscopic video generation framework, namely EndoGen. Specifically, we build an autoregressive model with a tailored Spatiotemporal Grid-Frame Patterning strategy. We demonstrate the effectiveness of our framework in generating high-quality, conditionally guided endoscopic content.
arXiv Detail & Related papers (2025-07-23T10:32:20Z) - VQ-Insight: Teaching VLMs for AI-Generated Video Quality Understanding via Progressive Visual Reinforcement Learning [21.35520258725298]
VQ-Insight is a novel reasoning-style framework for AIGC video quality assessment. It combines image quality warm-up, general task-specific temporal learning, and joint optimization with the video generation model. It consistently outperforms state-of-the-art baselines in preference comparison, multi-dimension scoring, and natural video scoring.
arXiv Detail & Related papers (2025-06-23T12:20:14Z) - InfLVG: Reinforce Inference-Time Consistent Long Video Generation with GRPO [73.33751812982342]
InfLVG is an inference-time framework that enables coherent long video generation without requiring additional long-form video data. We show that InfLVG can extend video length by up to 9×, achieving strong consistency and semantic fidelity across scenes.
arXiv Detail & Related papers (2025-05-23T07:33:25Z) - A Survey: Spatiotemporal Consistency in Video Generation [72.82267240482874]
Video generation schemes, by leveraging dynamic visual generation methods, push the boundaries of Artificial Intelligence Generated Content (AIGC). Recent works have aimed at addressing the spatiotemporal consistency issue in video generation, yet few literature reviews have been organized from this perspective. We systematically review recent advances in video generation, covering five key aspects: foundation models, information representations, generation schemes, post-processing techniques, and evaluation metrics.
arXiv Detail & Related papers (2025-02-25T05:20:51Z) - Enhance-A-Video: Better Generated Video for Free [57.620595159855064]
We introduce a training-free approach to enhance the coherence and quality of DiT-based generated videos. Our approach can be easily applied to most DiT-based video generation frameworks without any retraining or fine-tuning.
arXiv Detail & Related papers (2025-02-11T12:22:35Z) - Scalable Framework for Classifying AI-Generated Content Across Modalities [0.0]
This paper presents a scalable framework that integrates perceptual hashing, similarity measurement, and pseudo-labeling. Comprehensive evaluations on the Defactify4 dataset demonstrate competitive performance in text and image classification tasks. These results highlight the framework's potential for real-world applications as generative AI continues to evolve.
arXiv Detail & Related papers (2025-02-01T09:28:40Z) - Understanding Long Videos via LLM-Powered Entity Relation Graphs [51.13422967711056]
GraphVideoAgent is a framework that maps and monitors the evolving relationships between visual entities throughout the video sequence. Our approach demonstrates remarkable effectiveness when tested against industry benchmarks.
arXiv Detail & Related papers (2025-01-27T10:57:24Z) - RepVideo: Rethinking Cross-Layer Representation for Video Generation [53.701548524818534]
We propose RepVideo, an enhanced representation framework for text-to-video diffusion models. By accumulating features from neighboring layers to form enriched representations, this approach captures more stable semantic information. Our experiments demonstrate that RepVideo not only significantly enhances the ability to generate accurate spatial appearances, but also improves temporal consistency in video generation.
arXiv Detail & Related papers (2025-01-15T18:20:37Z) - Advancing Video Quality Assessment for AIGC [17.23281750562252]
We propose a novel loss function that combines mean absolute error with cross-entropy loss to mitigate inter-frame quality inconsistencies.
We also introduce the innovative S2CNet technique to retain critical content, while leveraging adversarial training to enhance the model's generalization capabilities.
arXiv Detail & Related papers (2024-09-23T10:36:22Z) - TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation [97.96178992465511]
We argue that, as time progresses, generated videos should incorporate the emergence of new concepts and transitions in their relations, as real-world videos do.
To assess the Temporal Compositionality of video generation models, we propose TC-Bench, a benchmark of meticulously crafted text prompts, corresponding ground truth videos, and robust evaluation metrics.
arXiv Detail & Related papers (2024-06-12T21:41:32Z) - AIGCBench: Comprehensive Evaluation of Image-to-Video Content Generated by AI [1.1035305628305816]
This paper introduces AIGCBench, a pioneering comprehensive benchmark designed to evaluate a variety of video generation tasks.
It provides a varied and open-domain image-text dataset for evaluating different state-of-the-art algorithms under equivalent conditions.
We employ a novel text combiner and GPT-4 to create rich text prompts, which are then used to generate images via advanced Text-to-Image models.
arXiv Detail & Related papers (2024-01-03T10:08:40Z) - Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution [65.91317390645163]
Upscale-A-Video is a text-guided latent diffusion framework for video upscaling.
It ensures temporal coherence within short sequences by integrating temporal layers into the U-Net and the VAE-Decoder.
It also offers greater flexibility by allowing text prompts to guide texture creation and adjustable noise levels to balance restoration and generation.
arXiv Detail & Related papers (2023-12-11T18:54:52Z) - HRVGAN: High Resolution Video Generation using Spatio-Temporal GAN [0.0]
We propose a novel deep generative network architecture designed specifically for high-resolution video synthesis. Our approach integrates key concepts from Wasserstein Generative Adversarial Networks (WGANs). Our training objective combines a pixel-wise mean squared error loss with an adversarial loss to balance frame-level accuracy and video realism.
arXiv Detail & Related papers (2020-08-17T20:45:59Z)
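The HRVGAN summary above describes a training objective combining pixel-wise mean squared error with an adversarial loss; the actual weighting and critic architecture are not given here. A minimal numpy sketch of such a combined generator objective, with `adv_weight` and the WGAN-style generator term as assumptions:

```python
import numpy as np

def combined_generator_loss(fake, real, critic_scores, adv_weight=0.01):
    """Pixel-wise MSE plus a WGAN-style adversarial term.

    fake, real: (frames, H, W, C) video tensors.
    critic_scores: critic outputs on the generated video (higher = more real).
    adv_weight: assumed hyperparameter balancing accuracy vs. realism.
    """
    mse = np.mean((fake - real) ** 2)  # frame-level accuracy
    adv = -np.mean(critic_scores)      # WGAN generator term: raise critic scores
    return mse + adv_weight * adv

# Toy example: a perfect reconstruction leaves only the adversarial term.
rng = np.random.default_rng(0)
real = rng.random((4, 8, 8, 3))
loss = combined_generator_loss(real, real, critic_scores=np.ones(4))
print(round(loss, 4))  # → -0.01 (MSE is 0, adversarial term is -0.01)
```

In a real training loop both terms would be computed on framework tensors with gradients; this sketch only shows how the two losses are balanced.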
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.