Benchmarking and Analyzing Generative Data for Visual Recognition
- URL: http://arxiv.org/abs/2307.13697v1
- Date: Tue, 25 Jul 2023 17:59:59 GMT
- Title: Benchmarking and Analyzing Generative Data for Visual Recognition
- Authors: Bo Li, Haotian Liu, Liangyu Chen, Yong Jae Lee, Chunyuan Li, Ziwei Liu
- Abstract summary: This work delves into the impact of generative images, primarily comparing paradigms that harness external data.
We devise GenBench, a benchmark comprising 22 datasets with 2548 categories, to appraise generative data across various visual recognition tasks.
Our exhaustive benchmark and analysis spotlight generative data's promise in visual recognition, while identifying key challenges for future investigation.
- Score: 66.55174903469722
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Advancements in large pre-trained generative models have expanded their
potential as effective data generators in visual recognition. This work delves
into the impact of generative images, primarily comparing paradigms that
harness external data (i.e., generative vs. retrieval vs. original).
Our key contributions are: 1) GenBench Construction: We devise
GenBench, a broad benchmark comprising 22 datasets with 2548
categories, to appraise generative data across various visual recognition
tasks. 2) CLER Score: To address the insufficient correlation of
existing metrics (e.g., FID, CLIP score) with downstream recognition
performance, we propose CLER, a training-free metric indicating
generative data's efficiency for recognition tasks prior to training.
3) New Baselines: Comparisons of generative data with retrieved data
from the same external pool help to elucidate the unique traits of generative
data. 4) External Knowledge Injection: By fine-tuning special token
embeddings for each category via Textual Inversion, performance improves across
17 datasets, except when dealing with low-resolution reference images.
Our exhaustive benchmark and analysis spotlight generative data's promise in
visual recognition, while identifying key challenges for future investigation.
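The abstract describes CLER as a training-free score computed before any training, meant to predict how useful generated images will be for a recognition task. The paper's exact formula is not given here, so the following is only a minimal sketch of the general idea under an assumption: that usefulness can be approximated by how often a CLIP-style zero-shot classifier assigns each generated image to its intended category. The function name `cler_like_score` and the use of precomputed, L2-normalizable feature matrices are illustrative choices, not the authors' implementation.

```python
import numpy as np

def cler_like_score(image_feats: np.ndarray,
                    text_feats: np.ndarray,
                    labels: list[int]) -> float:
    """Fraction of generated images whose nearest class-prompt
    embedding (by cosine similarity) matches their intended label.

    image_feats: (n_images, d) features of generated images.
    text_feats:  (n_classes, d) features of class-name prompts.
    labels:      intended class index for each generated image.
    """
    # L2-normalize so the dot product equals cosine similarity.
    img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    sims = img @ txt.T                # (n_images, n_classes)
    preds = sims.argmax(axis=1)       # zero-shot class prediction
    return float((preds == np.asarray(labels)).mean())

# Toy usage with hand-made 2-D features standing in for real
# CLIP embeddings of generated images and class prompts.
image_feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]])
text_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
score = cler_like_score(image_feats, text_feats, [0, 1, 0])
```

In practice the feature matrices would come from a frozen vision-language encoder; the point of a metric like this is that it requires no downstream training, unlike measuring accuracy after fine-tuning.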
Related papers
- Assessing Brittleness of Image-Text Retrieval Benchmarks from Vision-Language Models Perspective [44.045767657945895]
We focus on examining the brittleness of the ITR evaluation pipeline with a focus on concept granularity.
To investigate the performance of VLMs on coarse and fine-grained datasets, we introduce a taxonomy of perturbations.
The results demonstrate that although perturbations generally degrade model performance, the fine-grained datasets exhibit a smaller performance drop than their standard counterparts.
arXiv Detail & Related papers (2024-07-21T18:08:44Z)
- Attribute-Aware Deep Hashing with Self-Consistency for Large-Scale Fine-Grained Image Retrieval [65.43522019468976]
We propose attribute-aware hashing networks with self-consistency for generating attribute-aware hash codes.
We develop an encoder-decoder network on a reconstruction task to distill high-level attribute-specific vectors in an unsupervised manner.
Our models are equipped with a feature decorrelation constraint upon these attribute vectors to strengthen their representative abilities.
arXiv Detail & Related papers (2023-11-21T08:20:38Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
- Leveraging Data Recasting to Enhance Tabular Reasoning [21.970920861791015]
Prior work has mostly relied on two data generation strategies.
The first is human annotation, which yields linguistically diverse data but is difficult to scale.
The second is synthetic generation, which is scalable and cost-effective but lacks inventiveness.
arXiv Detail & Related papers (2022-11-23T00:04:57Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Intermediate Training on Question Answering Datasets Improves Generative Data Augmentation [32.83012699501051]
We improve generative data augmentation by formulating data generation as a context generation task.
We cast downstream tasks into question answering format and adapt the fine-tuned context generators to the target task domain.
We demonstrate substantial performance improvements in few-shot and zero-shot settings.
arXiv Detail & Related papers (2022-05-25T09:28:21Z)
- View Distillation with Unlabeled Data for Extracting Adverse Drug Effects from User-Generated Data [21.0706831551535]
We present an algorithm for identifying Adverse Drug Reactions in social media data.
Our model relies on the properties of the problem and the characteristics of contextual word embeddings.
We evaluate our model on the largest publicly available ADR dataset.
arXiv Detail & Related papers (2021-05-24T15:38:08Z)
- Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG).
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models and verifies the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To tackle this, we propose applying a dataset distillation strategy to compress the created dataset into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.