Benchmarking and Analyzing Generative Data for Visual Recognition
- URL: http://arxiv.org/abs/2307.13697v1
- Date: Tue, 25 Jul 2023 17:59:59 GMT
- Title: Benchmarking and Analyzing Generative Data for Visual Recognition
- Authors: Bo Li, Haotian Liu, Liangyu Chen, Yong Jae Lee, Chunyuan Li, Ziwei Liu
- Abstract summary: This work delves into the impact of generative images, primarily comparing paradigms that harness external data.
We devise GenBench, a benchmark comprising 22 datasets with 2548 categories, to appraise generative data across various visual recognition tasks.
Our exhaustive benchmark and analysis spotlight generative data's promise in visual recognition, while identifying key challenges for future investigation.
- Score: 66.55174903469722
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Advancements in large pre-trained generative models have expanded their
potential as effective data generators in visual recognition. This work delves
into the impact of generative images, primarily comparing paradigms that
harness external data (i.e., generative vs. retrieval vs. original).
Our key contributions are: 1) GenBench Construction: We devise
GenBench, a broad benchmark comprising 22 datasets with 2548
categories, to appraise generative data across various visual recognition
tasks. 2) CLER Score: To address the insufficient correlation of
existing metrics (e.g., FID, CLIP score) with downstream recognition
performance, we propose CLER, a training-free metric indicating
generative data's efficiency for recognition tasks prior to training.
3) New Baselines: Comparisons of generative data with retrieved data
from the same external pool help to elucidate the unique traits of generative
data. 4) External Knowledge Injection: By fine-tuning special token
embeddings for each category via Textual Inversion, performance improves across
17 datasets, except when dealing with low-resolution reference images.
Our exhaustive benchmark and analysis spotlight generative data's promise in
visual recognition, while identifying key challenges for future investigation.
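The abstract describes CLER as a training-free score computed before any training, meant to predict how useful generated images will be for a recognition task. The paper's exact formula is not given here, so the following is only a minimal sketch of the general idea under an assumption: that usefulness can be approximated by how often a CLIP-style zero-shot classifier assigns each generated image to its intended category. The function name `cler_like_score` and the use of precomputed, L2-normalizable feature matrices are illustrative choices, not the authors' implementation.

```python
import numpy as np

def cler_like_score(image_feats: np.ndarray,
                    text_feats: np.ndarray,
                    labels: list[int]) -> float:
    """Fraction of generated images whose nearest class-prompt
    embedding (by cosine similarity) matches their intended label.

    image_feats: (n_images, d) features of generated images.
    text_feats:  (n_classes, d) features of class-name prompts.
    labels:      intended class index for each generated image.
    """
    # L2-normalize so the dot product equals cosine similarity.
    img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    sims = img @ txt.T                # (n_images, n_classes)
    preds = sims.argmax(axis=1)       # zero-shot class prediction
    return float((preds == np.asarray(labels)).mean())

# Toy usage with hand-made 2-D features standing in for real
# CLIP embeddings of generated images and class prompts.
image_feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]])
text_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
score = cler_like_score(image_feats, text_feats, [0, 1, 0])
```

In practice the feature matrices would come from a frozen vision-language encoder; the point of a metric like this is that it requires no downstream training, unlike measuring accuracy after fine-tuning.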
Related papers
- Assessing Brittleness of Image-Text Retrieval Benchmarks from Vision-Language Models Perspective [44.045767657945895]
We focus on examining the brittleness of the ITR evaluation pipeline with a focus on concept granularity.
To investigate the performance of VLMs on coarse and fine-grained datasets, we introduce a taxonomy of perturbations.
The results demonstrate that although perturbations generally degrade model performance, the fine-grained datasets exhibit a smaller performance drop than their standard counterparts.
arXiv Detail & Related papers (2024-07-21T18:08:44Z)
- Attribute-Aware Deep Hashing with Self-Consistency for Large-Scale Fine-Grained Image Retrieval [65.43522019468976]
We propose attribute-aware hashing networks with self-consistency for generating attribute-aware hash codes.
We develop an encoder-decoder network on a reconstruction task to distill high-level attribute-specific vectors in an unsupervised manner.
Our models are equipped with a feature decorrelation constraint upon these attribute vectors to strengthen their representative abilities.
arXiv Detail & Related papers (2023-11-21T08:20:38Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
- Leveraging Data Recasting to Enhance Tabular Reasoning [21.970920861791015]
Prior work has mostly relied on two data generation strategies.
The first is human annotation, which yields linguistically diverse data but is difficult to scale.
The second is synthetic generation, which is scalable and cost-effective but lacks inventiveness.
arXiv Detail & Related papers (2022-11-23T00:04:57Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Intermediate Training on Question Answering Datasets Improves Generative Data Augmentation [32.83012699501051]
We improve generative data augmentation by formulating data generation as a context generation task.
We cast downstream tasks into question answering format and adapt the fine-tuned context generators to the target task domain.
We demonstrate substantial performance improvements in few-shot and zero-shot settings.
arXiv Detail & Related papers (2022-05-25T09:28:21Z)
- View Distillation with Unlabeled Data for Extracting Adverse Drug Effects from User-Generated Data [21.0706831551535]
We present an algorithm for identifying Adverse Drug Reactions in social media data.
Our model relies on the properties of the problem and the characteristics of contextual word embeddings.
We evaluate our model on the largest publicly available ADR dataset.
arXiv Detail & Related papers (2021-05-24T15:38:08Z)
- Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG).
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models and verifies the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z)
- Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
To tackle this, we propose applying a dataset distillation strategy to compress the created dataset into several informative class-wise images.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
arXiv Detail & Related papers (2020-05-18T09:36:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.