Tune It or Don't Use It: Benchmarking Data-Efficient Image
Classification
- URL: http://arxiv.org/abs/2108.13122v1
- Date: Mon, 30 Aug 2021 11:24:51 GMT
- Title: Tune It or Don't Use It: Benchmarking Data-Efficient Image
Classification
- Authors: Lorenzo Brigato, Björn Barz, Luca Iocchi, Joachim Denzler
- Abstract summary: We design a benchmark for data-efficient image classification consisting of six diverse datasets spanning various domains.
We re-evaluate the standard cross-entropy baseline and eight methods for data-efficient deep learning published between 2017 and 2021 at renowned venues.
Tuning the learning rate, weight decay, and batch size on a separate validation split results in a highly competitive baseline.
- Score: 9.017660524497389
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data-efficient image classification using deep neural networks in settings
where only small amounts of labeled data are available has been an active
research area in recent years. However, an objective comparison between
published methods is difficult, since existing works use different datasets for
evaluation and often compare against untuned baselines with default
hyper-parameters. We design a benchmark for data-efficient image classification
consisting of six diverse datasets spanning various domains (e.g., natural
images, medical imagery, satellite data) and data types (RGB, grayscale,
multispectral). Using this benchmark, we re-evaluate the standard cross-entropy
baseline and eight methods for data-efficient deep learning published between
2017 and 2021 at renowned venues. For a fair and realistic comparison, we
carefully tune the hyper-parameters of all methods on each dataset.
Surprisingly, we find that tuning learning rate, weight decay, and batch size
on a separate validation split results in a highly competitive baseline, which
outperforms all but one specialized method and performs competitively with the
remaining one.
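Since the abstract's central finding is a tuning protocol rather than a new model, a minimal sketch of such a protocol may help. The following Python sketch grid-searches learning rate, weight decay, and batch size for a plain cross-entropy baseline, selecting by accuracy on a held-out validation split; the tiny CNN, synthetic data, grid values, and epoch budget are placeholder assumptions for self-containedness, not the paper's actual search space or training setup.

```python
# Illustrative sketch only: grid-search learning rate, weight decay, and
# batch size for a cross-entropy baseline, selecting on a validation split.
# Model, data, grid, and epoch budget are placeholder assumptions.
import itertools

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-in data; in the benchmark these would be a small labeled
# training split and a disjoint validation split of one of the six datasets.
train_set = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))
val_set = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))

def make_model() -> nn.Module:
    # Tiny CNN for demonstration; the benchmark uses standard architectures.
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
    )

def val_accuracy(model: nn.Module) -> float:
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in DataLoader(val_set, batch_size=64):
            correct += (model(x).argmax(dim=1) == y).sum().item()
            total += y.numel()
    return correct / total

def train_baseline(lr: float, weight_decay: float, batch_size: int,
                   epochs: int = 5) -> nn.Module:
    model = make_model()
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9,
                          weight_decay=weight_decay)
    loss_fn = nn.CrossEntropyLoss()  # the standard baseline objective
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        model.train()
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

# Exhaustive grid over the three hyper-parameters the abstract names.
grid = itertools.product([0.1, 0.01, 0.001],  # learning rate
                         [1e-3, 1e-4, 1e-5],  # weight decay
                         [16, 32, 64])        # batch size
best = max(grid, key=lambda cfg: val_accuracy(train_baseline(*cfg)))
print("best (lr, weight_decay, batch_size):", best)
```

Typically the selected configuration would then be evaluated once on a held-out test split that played no role in the search.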
Related papers
- Rethinking Large-scale Dataset Compression: Shifting Focus From Labels to Images [60.42768987736088]
We introduce a benchmark that equitably evaluates methodologies across both distillation and pruning literatures.
Our benchmark reveals that in the mainstream dataset distillation setting for large-scale datasets, even randomly selected subsets can achieve surprisingly competitive performance.
We propose a new framework for dataset compression, termed Prune, Combine, and Augment (PCA), which focuses on leveraging image data exclusively.
arXiv Detail & Related papers (2025-02-10T13:11:40Z)
- Efficient Curation of Invertebrate Image Datasets Using Feature Embeddings and Automatic Size Comparison [5.480305055542485]
We present a method for curating large-scale image datasets of invertebrates.
Our approach is based on extracting feature embeddings with pretrained deep neural networks.
We also show that a simple area-based size comparison approach can detect many common erroneous images.
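As a loose illustration of the embedding step, the sketch below encodes images with a pretrained torchvision ResNet-18 and flags the samples farthest from their class centroid as candidate errors. The backbone choice, cosine distance, and top-k flagging are assumptions for illustration, not the paper's actual pipeline, and the area-based size check is omitted.

```python
# Illustrative sketch: encode images with a pretrained backbone and flag
# the samples farthest from their class centroid as candidate labeling or
# imaging errors. Backbone, distance, and top-k flagging are assumptions;
# the paper's area-based size comparison is omitted here.
import torch
from torchvision.models import ResNet18_Weights, resnet18

weights = ResNet18_Weights.DEFAULT
backbone = resnet18(weights=weights)
backbone.fc = torch.nn.Identity()  # drop the classifier -> 512-d embeddings
backbone.eval()
preprocess = weights.transforms()  # the matching inference transforms

@torch.no_grad()
def embed(images):
    """images: list of PIL.Image -> L2-normalized (N, 512) embeddings."""
    batch = torch.stack([preprocess(im) for im in images])
    return torch.nn.functional.normalize(backbone(batch), dim=1)

def flag_outliers(images, labels, top_k=10):
    """Return indices of the top_k samples farthest from their class centroid."""
    feats = embed(images)
    labels = torch.as_tensor(labels)
    dists = torch.empty(len(images))
    for c in labels.unique():
        mask = labels == c
        centroid = torch.nn.functional.normalize(
            feats[mask].mean(dim=0, keepdim=True), dim=1)
        dists[mask] = 1.0 - (feats[mask] @ centroid.T).squeeze(1)  # cosine distance
    return dists.topk(top_k).indices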
arXiv Detail & Related papers (2024-12-20T12:35:41Z)
- Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
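A minimal sketch of the general idea, in simplified form: perturb intermediate features with Gaussian noise during training so the classifier sees diversified features. The paper learns class-wise semantic augmentation distributions; the isotropic noise and its scale here are simplifying assumptions, not the paper's method.

```python
# Illustrative sketch of feature-level augmentation: add Gaussian noise to
# the penultimate features during training. The paper learns class-wise
# semantic augmentation distributions; the isotropic noise and its scale
# here are simplifying assumptions.
import torch
from torch import nn

class FeatureAugClassifier(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, num_classes: int,
                 noise_std: float = 0.1):
        super().__init__()
        self.backbone = backbone  # any feature extractor producing (N, feat_dim)
        self.head = nn.Linear(feat_dim, num_classes)
        self.noise_std = noise_std

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)
        if self.training:  # diversify features only at training time
            feats = feats + self.noise_std * torch.randn_like(feats)
        return self.head(feats)
```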
arXiv Detail & Related papers (2023-09-01T11:15:50Z)
- Exploring Data Redundancy in Real-world Image Classification through Data Selection [20.389636181891515]
Deep learning models often require large amounts of data for training, leading to increased costs.
We present two data valuation metrics based on Synaptic Intelligence and gradient norms, respectively, to study redundancy in real-world image data.
Online and offline data selection algorithms are then proposed via clustering and grouping based on the examined data values.
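As a rough illustration of the gradient-norm metric, the sketch below scores each training example by the norm of the loss gradient it induces; low-scoring examples are candidates for removal as redundant. This is one plausible reading of such a score, not the paper's exact implementation, and the Synaptic Intelligence variant is omitted.

```python
# Illustrative sketch of a gradient-norm data value: score each example by
# the norm of the loss gradient it induces. One plausible reading of the
# metric, not the paper's implementation.
import torch
from torch import nn

def gradient_norm_scores(model: nn.Module, dataset, device: str = "cpu"):
    loss_fn = nn.CrossEntropyLoss()
    model.to(device).train()
    scores = []
    for x, y in dataset:  # one example at a time for per-sample gradients
        model.zero_grad()
        logits = model(x.unsqueeze(0).to(device))
        loss_fn(logits, torch.tensor([y], device=device)).backward()
        sq = sum(p.grad.pow(2).sum() for p in model.parameters()
                 if p.grad is not None)
        scores.append(sq.sqrt().item())
    return scores  # e.g. keep the highest-scoring 80% for training
```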
arXiv Detail & Related papers (2023-06-25T03:31:05Z)
- Evaluating Graph Neural Networks for Link Prediction: Current Pitfalls and New Benchmarking [66.83273589348758]
Link prediction attempts to predict whether an unseen edge exists based on only a portion of edges of a graph.
A flurry of methods have been introduced in recent years that attempt to make use of graph neural networks (GNNs) for this task.
New and diverse datasets have also been created to better evaluate the effectiveness of these new models.
arXiv Detail & Related papers (2023-06-18T01:58:59Z)
- Image Classification with Small Datasets: Overview and Benchmark [0.0]
We systematically organize and connect past studies to consolidate a community that is currently fragmented and scattered.
We propose a common benchmark that allows for an objective comparison of approaches.
We use this benchmark to re-evaluate the standard cross-entropy baseline and ten existing methods published between 2017 and 2021 at renowned venues.
arXiv Detail & Related papers (2022-12-23T17:11:16Z)
- Multi-dataset Pretraining: A Unified Model for Semantic Segmentation [97.61605021985062]
We propose a unified framework, termed Multi-Dataset Pretraining, to take full advantage of the fragmented annotations of different datasets.
This is achieved by first pretraining the network via the proposed pixel-to-prototype contrastive loss over multiple datasets.
To better model the relationships among images and classes from different datasets, we extend the pixel-level embeddings via cross-dataset mixing.
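A rough sketch of what a pixel-to-prototype contrastive objective can look like, assuming one learnable prototype per class and a batch of sampled pixel embeddings; the temperature, normalization, and single-prototype setup are assumptions, not the paper's exact formulation, and cross-dataset mixing is omitted.

```python
# Illustrative sketch of a pixel-to-prototype contrastive objective: each
# sampled pixel embedding is pulled toward its class prototype and pushed
# away from the others. Temperature, normalization, and the single
# prototype per class are assumptions; cross-dataset mixing is omitted.
import torch
import torch.nn.functional as F

def pixel_to_prototype_loss(pixel_feats: torch.Tensor,   # (N, D) pixel embeddings
                            pixel_labels: torch.Tensor,  # (N,) class per pixel
                            prototypes: torch.Tensor,    # (C, D) class prototypes
                            temperature: float = 0.1) -> torch.Tensor:
    pixel_feats = F.normalize(pixel_feats, dim=1)
    prototypes = F.normalize(prototypes, dim=1)
    logits = pixel_feats @ prototypes.T / temperature  # (N, C) similarities
    return F.cross_entropy(logits, pixel_labels)

# Usage sketch: prototypes would be an nn.Parameter of shape (num_classes, D),
# optimized jointly with the segmentation network.
```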
arXiv Detail & Related papers (2021-06-08T06:13:11Z)
- How to distribute data across tasks for meta-learning? [59.608652082495624]
We show that the optimal number of data points per task depends on the budget, but it converges to a unique constant value for large budgets.
Our results suggest a simple and efficient procedure for data collection.
arXiv Detail & Related papers (2021-03-15T15:38:47Z)
- A pipeline for fair comparison of graph neural networks in node classification tasks [4.418753792543564]
Graph neural networks (GNNs) have been investigated for potential applicability in multiple fields that employ graph data.
There are no standard training settings to ensure fair comparisons among new methods.
We introduce a standard, reproducible benchmark to which the same training settings can be applied for node classification.
arXiv Detail & Related papers (2020-12-19T07:43:05Z)
- Adversarial Learning for Personalized Tag Recommendation [61.76193196463919]
We propose an end-to-end deep network which can be trained on large-scale datasets.
A joint training of user-preference and visual encoding allows the network to efficiently integrate the visual preference with tagging behavior.
We demonstrate the effectiveness of the proposed model on two different large-scale and publicly available datasets.
arXiv Detail & Related papers (2020-04-01T20:41:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.