Related papers: Is this Generated Person Existed in Real-world? Fine-grained Detecting and Calibrating Abnormal Human-body

Is this Generated Person Existed in Real-world? Fine-grained Detecting and Calibrating Abnormal Human-body

URL: http://arxiv.org/abs/2411.14205v1
Date: Thu, 21 Nov 2024 15:13:38 GMT
Title: Is this Generated Person Existed in Real-world? Fine-grained Detecting and Calibrating Abnormal Human-body
Authors: Zeqing Wang, Qingyang Ma, Wentao Wan, Haojie Li, Keze Wang, Yonghong Tian,
Abstract summary: Existing text-to-image or text-to-video models often generate low-quality human photos that might differ considerably from real-world body structures. This paper introduces a simple yet challenging task, i.e., textbfFine-grained textbfHuman-body textbfAbnormality textbfDetection textbf(D). We propose a framework, named HumanCalibrator, which identifies and repairs abnormalities in human body structures while preserving the other content.
Score: 40.77110649866136
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent improvements in visual synthesis have significantly enhanced the depiction of generated human photos, which are pivotal due to their wide applicability and demand. Nonetheless, the existing text-to-image or text-to-video models often generate low-quality human photos that might differ considerably from real-world body structures, referred to as "abnormal human bodies". Such abnormalities, typically deemed unacceptable, pose considerable challenges in the detection and repair of them within human photos. These challenges require precise abnormality recognition capabilities, which entail pinpointing both the location and the abnormality type. Intuitively, Visual Language Models (VLMs) that have obtained remarkable performance on various visual tasks are quite suitable for this task. However, their performance on abnormality detection in human photos is quite poor. Hence, it is quite important to highlight this task for the research community. In this paper, we first introduce a simple yet challenging task, i.e., \textbf{F}ine-grained \textbf{H}uman-body \textbf{A}bnormality \textbf{D}etection \textbf{(FHAD)}, and construct two high-quality datasets for evaluation. Then, we propose a meticulous framework, named HumanCalibrator, which identifies and repairs abnormalities in human body structures while preserving the other content. Experiments indicate that our HumanCalibrator achieves high accuracy in abnormality detection and accomplishes an increase in visual comparisons while preserving the other visual content.

Related papers

Human Body Restoration with One-Step Diffusion Model and A New Benchmark [74.66514054623669]
We propose a high-quality dataset automated cropping and filtering (HQ-ACF) pipeline. This pipeline leverages existing object detection datasets and other unlabeled images to automatically crop and filter high-quality human images. We also propose emphOSDHuman, a novel one-step diffusion model for human body restoration.
arXiv Detail & Related papers (2025-02-03T14:48:40Z)
WonderHuman: Hallucinating Unseen Parts in Dynamic 3D Human Reconstruction [51.22641018932625]
We present WonderHuman to reconstruct dynamic human avatars from a monocular video for high-fidelity novel view synthesis. Our method achieves SOTA performance in producing photorealistic renderings from the given monocular video.
arXiv Detail & Related papers (2025-02-03T04:43:41Z)
GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data [61.05815629606135]
Given a single in-the-wild human photo, it remains a challenging task to reconstruct a high-fidelity 3D human model. GeneMAN builds upon a comprehensive collection of high-quality human data. GeneMAN could generate high-quality 3D human models from a single image input, outperforming prior state-of-the-art methods.
arXiv Detail & Related papers (2024-11-27T18:59:54Z)
Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search [25.907668574771705]
We propose a new task, text-based person anomaly search, locating pedestrians engaged in both routine or anomalous activities via text. To enable the training and evaluation of this new task, we construct a large-scale image-text Pedestrian Anomaly Behavior benchmark. Experiments on the proposed benchmark show that synthetic training data facilitates the fine-grained behavior retrieval, and the proposed pose-aware method arrives at 84.93% recall@1 accuracy.
arXiv Detail & Related papers (2024-11-26T09:50:15Z)
Detecting Human Artifacts from Text-to-Image Models [16.261759535724778]
This dataset contains images containing images containing images containing a human body. Images include images of poorly generated human bodies, including distorted and missing parts of the human body.
arXiv Detail & Related papers (2024-11-21T05:02:13Z)
PoseWatch: A Transformer-based Architecture for Human-centric Video Anomaly Detection Using Spatio-temporal Pose Tokenization [2.3349787245442966]
Video Anomaly Detection (VAD) presents a significant challenge in computer vision. Human-centric VAD faces additional complexities, including variations in human behavior, potential biases in data, and substantial privacy concerns related to human subjects. Recent advancements have focused on pose-based VAD, which leverages human pose as a high-level feature to mitigate privacy concerns, reduce appearance biases, and minimize background interference. In this paper, we introduce PoseWatch, a novel transformer-based architecture designed specifically for human-centric pose-based VAD.
arXiv Detail & Related papers (2024-08-27T16:40:14Z)
HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance [80.97360194728705]
AbHuman is the first large-scale synthesized human benchmark focusing on anatomical anomalies. HumanRefiner is a novel plug-and-play approach for the coarse-to-fine refinement of human anomalies in text-to-image generation.
arXiv Detail & Related papers (2024-07-09T15:14:41Z)
HINT: Learning Complete Human Neural Representations from Limited Viewpoints [69.76947323932107]
We propose a NeRF-based algorithm able to learn a detailed and complete human model from limited viewing angles. As a result, our method can reconstruct complete humans even from a few viewing angles, increasing performance by more than 15% PSNR.
arXiv Detail & Related papers (2024-05-30T05:43:09Z)
AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation [55.179287851188036]
We introduce a novel all-in-one-stage framework, AiOS, for expressive human pose and shape recovery without an additional human detection step. We first employ a human token to probe a human location in the image and encode global features for each instance. Then, we introduce a joint-related token to probe the human joint in the image and encoder a fine-grained local feature.
arXiv Detail & Related papers (2024-03-26T17:59:23Z)
Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation [24.49857926071974]
Vanilla text-to-image diffusion models struggle with generating accurate human images. Existing methods address this issue mostly by fine-tuning the model with extra images or adding additional controls. This paper explores the integration of human-centric priors directly into the model fine-tuning stage.
arXiv Detail & Related papers (2024-03-08T11:59:32Z)
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion [114.15397904945185]
We propose a unified framework, HyperHuman, that generates in-the-wild human images of high realism and diverse layouts. Our model enforces the joint learning of image appearance, spatial relationship, and geometry in a unified network. Our framework yields the state-of-the-art performance, generating hyper-realistic human images under diverse scenarios.
arXiv Detail & Related papers (2023-10-12T17:59:34Z)
Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction [66.10717041384625]
Zolly is the first 3DHMR method focusing on perspective-distorted images. We propose a new camera model and a novel 2D representation, termed distortion image, which describes the 2D dense distortion scale of the human body. We extend two real-world datasets tailored for this task, all containing perspective-distorted human images.
arXiv Detail & Related papers (2023-03-24T04:22:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.