Benchmarking and Analyzing Generative Data for Visual Recognition
- URL: http://arxiv.org/abs/2307.13697v1
- Date: Tue, 25 Jul 2023 17:59:59 GMT
- Title: Benchmarking and Analyzing Generative Data for Visual Recognition
- Authors: Bo Li, Haotian Liu, Liangyu Chen, Yong Jae Lee, Chunyuan Li, Ziwei Liu
- Abstract summary: This work delves into the impact of generative images, primarily comparing paradigms that harness external data.
We devise GenBench, a benchmark comprising 22 datasets with 2548 categories, to appraise generative data across various visual recognition tasks.
Our exhaustive benchmark and analysis spotlight generative data's promise in visual recognition, while identifying key challenges for future investigation.
- Score: 66.55174903469722
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Advancements in large pre-trained generative models have expanded their
potential as effective data generators in visual recognition. This work delves
into the impact of generative images, primarily comparing paradigms that
harness external data (i.e., generative vs. retrieval vs. original).
Our key contributions are: 1) GenBench Construction: We devise
GenBench, a broad benchmark comprising 22 datasets with 2548
categories, to appraise generative data across various visual recognition
tasks. 2) CLER Score: To address the insufficient correlation of
existing metrics (e.g., FID, CLIP score) with downstream recognition
performance, we propose CLER, a training-free metric indicating
generative data's efficiency for recognition tasks prior to training.
3) New Baselines: Comparisons of generative data with retrieved data
from the same external pool help to elucidate the unique traits of generative
data. 4) External Knowledge Injection: By fine-tuning special token
embeddings for each category via Textual Inversion, performance improves across
17 datasets, except when dealing with low-resolution reference images.
Our exhaustive benchmark and analysis spotlight generative data's promise in
visual recognition, while identifying key challenges for future investigation.
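As a concrete illustration of the CLER score (contribution 2), here is a minimal sketch of a training-free, CLER-style check. The abstract only states that CLER is training-free and correlates with downstream recognition performance, so the specific formulation below (zero-shot CLIP agreement between each generated image and its intended category, via the Hugging Face transformers CLIP API) is an assumption, not the paper's exact definition.

```python
# Hypothetical CLER-style score: fraction of generated images that a frozen
# CLIP model assigns to the category they were generated for. The prompt
# template and backbone choice are assumptions for illustration.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

def cler_style_score(image_paths, image_labels, category_names):
    """image_labels[i] indexes into category_names; returns mean top-1 agreement."""
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    prompts = [f"a photo of a {name}" for name in category_names]
    correct = 0
    with torch.no_grad():
        for path, label in zip(image_paths, image_labels):
            image = Image.open(path).convert("RGB")
            inputs = processor(text=prompts, images=image,
                               return_tensors="pt", padding=True)
            # logits_per_image: (1, num_categories) image-text similarities
            logits = model(**inputs).logits_per_image
            correct += int(logits.argmax(dim=-1).item() == label)
    return correct / len(image_paths)
```

A score of this kind can rank candidate generators or prompts before any downstream training is run, which is the role the abstract describes for CLER.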
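Similarly, for the retrieval baseline (contribution 3), here is a hedged sketch of retrieving per-category images from an external pool using CLIP features. The pool construction, prompt template, and k are assumptions; the abstract only says that retrieved and generative data draw on the same external pool.

```python
# Hypothetical retrieval baseline: for each category, select the top-k pool
# images by cosine similarity between a category prompt and precomputed CLIP
# image embeddings. pool_features and k are illustrative assumptions.
import torch
from transformers import CLIPModel, CLIPProcessor

def retrieve_per_category(pool_features, category_names, k=100):
    """pool_features: (N, D) tensor of CLIP image embeddings of the pool.
    Returns {category name: indices of its k most similar pool images}."""
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    with torch.no_grad():
        inputs = processor(text=[f"a photo of a {n}" for n in category_names],
                           return_tensors="pt", padding=True)
        text_feats = model.get_text_features(**inputs)   # (C, D)
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)
    pool = pool_features / pool_features.norm(dim=-1, keepdim=True)
    sims = text_feats @ pool.T                           # (C, N) cosine sims
    topk = sims.topk(k, dim=-1).indices
    return {name: topk[i].tolist() for i, name in enumerate(category_names)}
```

Training the same recognition model on these retrieved images versus generated ones is what isolates the unique traits of generative data in the comparison above.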
Related papers
- Weak-Annotation of HAR Datasets using Vision Foundation Models [9.948823510429902]
We propose a novel, clustering-based annotation pipeline to significantly reduce the amount of data that needs to be annotated by a human annotator.
We show that using our approach, the annotation of centroid clips suffices to achieve average labelling accuracies close to 90% across three publicly available HAR benchmark datasets.
arXiv Detail & Related papers (2024-08-09T16:46:53Z)
- Diffusion Models as Data Mining Tools [87.77999285241219]
This paper demonstrates how to use generative models trained for image synthesis as tools for visual data mining.
We show that after finetuning conditional diffusion models to synthesize images from a specific dataset, we can use these models to define a typicality measure.
This measure assesses how typical visual elements are for different data labels, such as geographic location, time stamps, semantic labels, or even the presence of a disease.
arXiv Detail & Related papers (2024-07-20T17:14:31Z)
- Attribute-Aware Deep Hashing with Self-Consistency for Large-Scale Fine-Grained Image Retrieval [65.43522019468976]
We propose attribute-aware hashing networks with self-consistency for generating attribute-aware hash codes.
We develop an encoder-decoder network trained on a reconstruction task to distill high-level attribute-specific vectors in an unsupervised manner.
We further impose a feature-decorrelation constraint on these attribute vectors to strengthen their representational ability.
arXiv Detail & Related papers (2023-11-21T08:20:38Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
- Leveraging Data Recasting to Enhance Tabular Reasoning [21.970920861791015]
Prior work has mostly relied on two data generation strategies.
The first is human annotation, which yields linguistically diverse data but is difficult to scale.
The second is synthetic generation, which is scalable and cost-effective but lacks inventiveness.
arXiv Detail & Related papers (2022-11-23T00:04:57Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Intermediate Training on Question Answering Datasets Improves Generative Data Augmentation [32.83012699501051]
We improve generative data augmentation by formulating data generation as a context generation task.
We cast downstream tasks into question answering format and adapt the fine-tuned context generators to the target task domain.
We demonstrate substantial performance improvements in few-shot and zero-shot settings.
arXiv Detail & Related papers (2022-05-25T09:28:21Z)
- View Distillation with Unlabeled Data for Extracting Adverse Drug Effects from User-Generated Data [21.0706831551535]
We present an algorithm for identifying Adverse Drug Reactions in social media data.
Our model relies on the properties of the problem and the characteristics of contextual word embeddings.
We evaluate our model on the largest publicly available ADR dataset.
arXiv Detail & Related papers (2021-05-24T15:38:08Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the newly constructed dataset can significantly improve the ability of the learned FER model.
To reduce the cost of training on this much larger dataset, we further apply a dataset distillation strategy to compress it into several informative class-wise images.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)