Few-Shot Image Classification Benchmarks are Too Far From Reality: Build Back Better with Semantic Task Sampling
- URL: http://arxiv.org/abs/2205.05155v1
- Date: Tue, 10 May 2022 20:25:43 GMT
- Title: Few-Shot Image Classification Benchmarks are Too Far From Reality: Build Back Better with Semantic Task Sampling
- Authors: Etienne Bennequin, Myriam Tami, Antoine Toubhans, Celine Hudelot
- Abstract summary: We introduce a new benchmark for Few-Shot Image Classification using the Danish Fungi 2020 dataset.
This benchmark offers a wide variety of evaluation tasks spanning various levels of fine-grainedness.
Our experiments reveal a correlation between the difficulty of a task and the semantic similarity between its classes.
- Score: 4.855663359344748
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Every day, a new method is published to tackle Few-Shot Image Classification,
showing ever-better performance on academic benchmarks. Nevertheless, we
observe that these benchmarks do not accurately represent the real
industrial use cases that we have encountered. In this work, through both
qualitative and quantitative studies, we expose that the widely used benchmark
tieredImageNet is strongly biased towards tasks composed of very semantically
dissimilar classes, e.g. bathtub, cabbage, pizza, schipperke, and cardoon. This
makes tieredImageNet (and similar benchmarks) ill-suited for evaluating a
model's ability to solve real-life use cases, which usually involve more
fine-grained classification. We mitigate this bias using semantic information
about the classes of tieredImageNet and generate an improved, balanced
benchmark. Going further, we also introduce a new benchmark for Few-Shot Image
Classification using the Danish Fungi 2020 dataset. This benchmark offers a
wide variety of evaluation tasks spanning various levels of fine-grainedness.
Moreover, it includes many-way tasks (e.g. composed of 100 classes), a
challenging setting that is nonetheless very common in industrial applications.
Our experiments reveal a correlation between the difficulty of a task and the semantic
similarity between its classes, as well as a heavy performance drop of
state-of-the-art methods on many-way few-shot classification, raising questions
about the scaling abilities of these methods. We hope that our work will
encourage the community to further question the quality of standard evaluation
processes and their relevance to real-life applications.
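To make the described bias and the proposed remedy concrete, below is a minimal sketch (not the authors' code) of semantic task sampling, assuming classes map to WordNet noun synsets as tieredImageNet classes do; the helper names and the greedy selection strategy are illustrative assumptions.

```python
# Minimal sketch of semantic task sampling (illustrative, not the paper's code).
# Assumes each class name maps to a WordNet noun synset, as in tieredImageNet.
import random
from itertools import combinations

from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")


def class_similarity(name_a: str, name_b: str) -> float:
    """Wu-Palmer similarity between the first noun synsets of two class names."""
    syns_a = wn.synsets(name_a, pos=wn.NOUN)
    syns_b = wn.synsets(name_b, pos=wn.NOUN)
    if not syns_a or not syns_b:
        return 0.0
    return syns_a[0].wup_similarity(syns_b[0]) or 0.0


def mean_task_similarity(classes: list[str]) -> float:
    """Average pairwise similarity: a proxy for how fine-grained a task is."""
    pairs = list(combinations(classes, 2))
    return sum(class_similarity(a, b) for a, b in pairs) / len(pairs)


def sample_semantic_task(all_classes: list[str], n_way: int) -> list[str]:
    """Greedily grow a task around a random seed class, keeping semantically
    close classes together instead of sampling them uniformly at random."""
    seed = random.choice(all_classes)
    task = [seed]
    candidates = [c for c in all_classes if c != seed]
    while len(task) < n_way:
        # Pick the candidate most similar, on average, to the classes chosen so far.
        best = max(candidates, key=lambda c: sum(class_similarity(c, t) for t in task))
        task.append(best)
        candidates.remove(best)
    return task


if __name__ == "__main__":
    classes = ["bathtub", "cabbage", "pizza", "schipperke", "cardoon"]
    print(mean_task_similarity(classes))           # low: a semantically scattered task
    print(sample_semantic_task(classes, n_way=3))  # a more coherent subset
```

Under uniform sampling, most tasks look like the scattered five classes above; constraining sampling with a similarity score along these lines instead yields tasks whose difficulty tracks the semantic closeness of their classes, which is the regime the abstract argues real applications occupy.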
Related papers
- Inherent Trade-Offs between Diversity and Stability in Multi-Task Benchmarks [20.24270790628136]
We examine multi-task benchmarks in machine learning through the lens of social choice theory.
We show that the more diverse a benchmark is, the more sensitive it is to trivial changes.
arXiv Detail & Related papers (2024-05-02T20:28:54Z)
- Classes Are Not Equal: An Empirical Study on Image Recognition Fairness [100.36114135663836]
We experimentally demonstrate that classes are not equal and the fairness issue is prevalent for image classification models across various datasets.
Our findings reveal that models tend to exhibit greater prediction biases for classes that are more challenging to recognize.
Data augmentation and representation learning algorithms improve overall performance by promoting fairness to some degree in image classification.
arXiv Detail & Related papers (2024-02-28T07:54:50Z)
- When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards [9.751405901938895]
We show that under existing leaderboards, the relative performance of LLMs is highly sensitive to minute details.
We show that for popular multiple-choice question benchmarks (e.g., MMLU), minor perturbations to the benchmark, such as changing the order of choices or the method of answer selection, result in ranking changes of up to 8 positions.
arXiv Detail & Related papers (2024-02-01T19:12:25Z)
- Fine-Grained ImageNet Classification in the Wild [0.0]
Robustness tests can uncover several vulnerabilities and biases which go unnoticed during the typical model evaluation stage.
In our work, we perform fine-grained classification on closely related categories, which are identified with the help of hierarchical knowledge.
arXiv Detail & Related papers (2023-03-04T12:25:07Z)
- Class-Incremental Learning: A Survey [84.30083092434938]
Class-Incremental Learning (CIL) enables the learner to incorporate the knowledge of new classes incrementally.
However, the learner tends to catastrophically forget the characteristics of former classes, and its performance drastically degrades.
We provide a rigorous and unified evaluation of 17 methods in benchmark image classification tasks to find out the characteristics of different algorithms.
arXiv Detail & Related papers (2023-02-07T17:59:05Z)
- Benchopt: Reproducible, efficient and collaborative optimization benchmarks [67.29240500171532]
Benchopt is a framework to automate, reproduce and publish optimization benchmarks in machine learning.
Benchopt simplifies benchmarking for the community by providing an off-the-shelf tool for running, sharing and extending experiments.
arXiv Detail & Related papers (2022-06-27T16:19:24Z)
- A Closer Look at Few-Shot Video Classification: A New Baseline and Benchmark [33.86872697028233]
We present an in-depth study on few-shot video classification by making three contributions.
First, we perform a consistent comparative study on the existing metric-based methods to figure out their limitations in representation learning.
Second, we discover that there is a high correlation between the novel action class and the ImageNet object class, which is problematic in the few-shot recognition setting.
Third, we present a new benchmark with more base data to facilitate future few-shot video classification without pre-training.
arXiv Detail & Related papers (2021-10-24T06:01:46Z)
- A Contrastive Learning Approach to Auroral Identification and Classification [0.8399688944263843]
We present a novel application of unsupervised learning to the task of auroral image classification.
We modify and adapt the Simple framework for Contrastive Learning of Representations (SimCLR) algorithm to learn representations of auroral images.
Our approach exceeds an established threshold for operational purposes, demonstrating readiness for deployment and utilization.
arXiv Detail & Related papers (2021-09-28T17:51:25Z)
- Multi-Label Image Classification with Contrastive Learning [57.47567461616912]
We show that a direct application of contrastive learning can hardly improve performance in multi-label cases.
We propose a novel framework for multi-label classification with contrastive learning in a fully supervised setting.
arXiv Detail & Related papers (2021-07-24T15:00:47Z)
- Region Comparison Network for Interpretable Few-shot Image Classification [97.97902360117368]
Few-shot image classification has been proposed to effectively use only a limited number of labeled examples to train models for new classes.
We propose a metric learning based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works.
We also present a new way to generalize the interpretability from the level of tasks to categories.
arXiv Detail & Related papers (2020-09-08T07:29:05Z)
- I Am Going MAD: Maximum Discrepancy Competition for Comparing Classifiers Adaptively [135.7695909882746]
We introduce the MAximum Discrepancy (MAD) competition, which adaptively samples a small test set from an arbitrarily large corpus of unlabeled images.
Human labeling on the resulting model-dependent image sets reveals the relative performance of the competing classifiers.
arXiv Detail & Related papers (2020-02-25T03:32:29Z)