CelebV-HQ: A Large-Scale Video Facial Attributes Dataset
- URL: http://arxiv.org/abs/2207.12393v1
- Date: Mon, 25 Jul 2022 17:57:07 GMT
- Title: CelebV-HQ: A Large-Scale Video Facial Attributes Dataset
- Authors: Hao Zhu, Wayne Wu, Wentao Zhu, Liming Jiang, Siwei Tang, Li Zhang,
Ziwei Liu, Chen Change Loy
- Abstract summary: CelebV-HQ contains 35,666 video clips with a resolution of at least 512x512, involving 15,653 identities.
We conduct a comprehensive analysis in terms of age, ethnicity, brightness stability, motion smoothness, head pose diversity, and data quality.
Its versatility and potential are validated on two representative tasks, i.e., unconditional video generation and video facial attribute editing.
- Score: 94.31308012569062
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large-scale datasets have played indispensable roles in the recent success of
face generation/editing and significantly facilitated the advances of emerging
research fields. However, the academic community still lacks a video dataset
with diverse facial attribute annotations, which is crucial for the research on
face-related videos. In this work, we propose a large-scale, high-quality, and
diverse video dataset with rich facial attribute annotations, named the
High-Quality Celebrity Video Dataset (CelebV-HQ). CelebV-HQ contains 35,666
video clips with a resolution of at least 512x512, involving 15,653
identities. All clips are manually labeled with 83 facial attributes, covering
appearance, action, and emotion. We conduct a comprehensive analysis in terms
of age, ethnicity, brightness stability, motion smoothness, head pose
diversity, and data quality to demonstrate the diversity and temporal coherence
of CelebV-HQ. Moreover, its versatility and potential are validated on two
representative tasks, i.e., unconditional video generation and video facial
attribute editing. Furthermore, we envision the future potential of CelebV-HQ,
as well as the new opportunities and challenges it would bring to related
research directions. Data, code, and models are publicly available. Project
page: https://celebv-hq.github.io.
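For orientation, here is a minimal sketch of how the per-clip labels over the 83 facial attributes might be consumed. The annotation file name (celebvhq_info.json) and the JSON schema in the comments are assumptions for illustration; the actual release format may differ, so consult the project page.

```python
import json

# Hypothetical annotation file; the schema below is an assumption, e.g.:
# {"clips": {"clip_0001": {"attributes": {"smile": 1, "talking": 0, ...}}}}
with open("celebvhq_info.json") as f:
    meta = json.load(f)

clips = meta["clips"]

def clips_with_attribute(name):
    """Return the ids of clips labeled positive for one of the 83 attributes."""
    return [cid for cid, info in clips.items()
            if info["attributes"].get(name, 0) == 1]

smiling = clips_with_attribute("smile")
print(f"{len(smiling)} of {len(clips)} clips labeled with 'smile'")
```

A filter like this is the typical entry point for the two validation tasks the paper names: selecting attribute-conditioned subsets for video facial attribute editing, or sampling the full set for unconditional video generation.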
Related papers
- FaVChat: Unlocking Fine-Grained Facial Video Understanding with Multimodal Large Language Models [12.029771909598647]
FaVChat is the first VMLLM specifically designed for fine-grained facial video understanding.
We construct a large-scale facial video dataset comprising over 60k videos, with the majority annotated with 83 fine-grained facial attributes.
We employ a progressive training paradigm, transitioning from video summarization to a high-quality subset of video QA, gradually increasing task complexity to enhance the model's fine-grained visual perception.
arXiv Detail & Related papers (2025-03-12T08:33:46Z) - MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation [62.85764872989189]
There is no publicly available dataset tailored for the analysis, evaluation, and training of long video generation models.
We present MovieBench: A Hierarchical Movie-Level dataset for Long Video Generation.
The dataset will be public and continuously maintained, aiming to advance the field of long video generation.
arXiv Detail & Related papers (2024-11-22T10:25:08Z) - FaceVid-1K: A Large-Scale High-Quality Multiracial Human Face Video Dataset [15.917564646478628]
We create a high-quality multiracial face collection named FaceVid-1K.
We conduct experiments with several well-established video generation models, including text-to-video, image-to-video, and unconditional video generation.
We obtain the corresponding performance benchmarks and compare them with those of models trained on public datasets to demonstrate the superiority of our dataset.
arXiv Detail & Related papers (2024-09-23T07:27:02Z) - CelebV-Text: A Large-Scale Facial Text-Video Dataset [91.22496444328151]
CelebV-Text is a large-scale, diverse, and high-quality dataset of facial text-video pairs.
CelebV-Text comprises 70,000 in-the-wild face video clips with diverse visual content, each paired with 20 texts generated using the proposed semi-automatic text generation strategy.
The superiority of CelebV-Text over other datasets is demonstrated via comprehensive statistical analysis of the videos, texts, and text-video relevance.
arXiv Detail & Related papers (2023-03-26T13:06:35Z) - NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory [92.98552727430483]
Narrations-as-Queries (NaQ) is a data augmentation strategy that transforms standard video-text narrations into training data for a video query localization model.
NaQ improves multiple top models by substantial margins, even doubling their accuracy.
We also demonstrate unique properties of our approach such as the ability to perform zero-shot and few-shot NLQ, and improved performance on queries about long-tail object categories.
arXiv Detail & Related papers (2023-01-02T16:40:15Z) - MEVID: Multi-view Extended Videos with Identities for Video Person Re-Identification [17.72434646703505]
We present the Multi-view Extended Videos with Identities (MEVID) dataset for large-scale, video person re-identification (ReID) in the wild.
We label the identities of 158 unique people wearing 598 outfits, taken from 8,092 tracklets with an average length of about 590 frames.
Being based on the MEVA video dataset, we also inherit data that is intentionally demographically balanced with respect to the continental United States.
arXiv Detail & Related papers (2022-11-09T03:07:31Z) - Multiface: A Dataset for Neural Face Rendering [108.44505415073579]
In this work, we present Multiface, a new multi-view, high-resolution human face dataset.
We introduce Mugsy, a large scale multi-camera apparatus to capture high-resolution synchronized videos of a facial performance.
The goal of Multiface is to close the gap in accessibility to high quality data in the academic community and to enable research in VR telepresence.
arXiv Detail & Related papers (2022-07-22T17:55:39Z) - Video Person Re-identification using Attribute-enhanced Features [49.68392018281875]
We propose a novel network architecture named Attribute Salience Assisted Network (ASA-Net) for attribute-assisted video person Re-ID.
To learn a better separation of the target from the background, we propose to learn visual attention from middle-level attributes instead of high-level identities.
arXiv Detail & Related papers (2021-08-16T07:41:27Z) - Robust Character Labeling in Movie Videos: Data Resources and Self-supervised Feature Adaptation [39.373699774220775]
We present a dataset of over 169,000 face tracks curated from 240 Hollywood movies with weak labels.
We propose an offline algorithm based on nearest-neighbor search in the embedding space to mine hard-examples from these tracks.
Overall, we find that multiview correlation-based adaptation yields more discriminative and robust face embeddings.
arXiv Detail & Related papers (2020-08-25T22:07:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.