Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An
Approach
- URL: http://arxiv.org/abs/2108.02399v1
- Date: Thu, 5 Aug 2021 06:28:32 GMT
- Title: Webly Supervised Fine-Grained Recognition: Benchmark Datasets and An
Approach
- Authors: Zeren Sun, Yazhou Yao, Xiu-Shen Wei, Yongshun Zhang, Fumin Shen,
Jianxin Wu, Jian Zhang, Heng-Tao Shen
- Abstract summary: We construct two new benchmark webly supervised fine-grained datasets, WebFG-496 and WebiNat-5089.
WebiNat-5089 contains 5089 sub-categories and more than 1.1 million web training images, making it the largest webly supervised fine-grained dataset to date.
As a minor contribution, we also propose a novel webly supervised method (termed "Peer-learning") for benchmarking these datasets.
- Score: 115.91099791629104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning from the web can ease the extreme dependence of deep learning on
large-scale manually labeled datasets. Especially for fine-grained recognition,
which aims to distinguish subordinate categories, leveraging free web data can
significantly reduce labeling costs. Despite its significant
practical and research value, the webly supervised fine-grained recognition
problem has not been extensively studied in the computer vision community, largely
due to the lack of high-quality datasets. To fill this gap, in this paper we
construct two new benchmark webly supervised fine-grained datasets, termed
WebFG-496 and WebiNat-5089. Concretely, WebFG-496 consists of
three sub-datasets containing a total of 53,339 web training images with 200
species of birds (Web-bird), 100 types of aircraft (Web-aircraft), and 196
models of cars (Web-car). WebiNat-5089 contains 5089 sub-categories and
more than 1.1 million web training images, making it the largest webly
supervised fine-grained dataset to date. As a minor contribution, we also propose
a novel webly supervised method (termed "Peer-learning") for benchmarking
these datasets. Comprehensive experimental results and analyses on the two new
benchmark datasets demonstrate that the proposed method achieves superior
performance over the competing baseline models and state-of-the-art methods. Our
benchmark datasets and the source code of Peer-learning have been made
available at
https://github.com/NUST-Machine-Intelligence-Laboratory/weblyFG-dataset.
Related papers
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
- Decorrelating Structure via Adapters Makes Ensemble Learning Practical for Semi-supervised Learning [50.868594148443215]
In computer vision, traditional ensemble learning methods exhibit either low training efficiency or limited performance.
We propose a lightweight, loss-function-free, and architecture-agnostic ensemble learning method, Decorrelating Structure via Adapters (DSA), for various visual tasks.
arXiv Detail & Related papers (2024-08-08T01:31:38Z)
- The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale [30.955171096569618]
FineWeb is a 15-trillion token dataset derived from 96 Common Crawl snapshots.
FineWeb-Edu is a 1.3-trillion token collection of educational text filtered from FineWeb.
arXiv Detail & Related papers (2024-06-25T13:50:56Z)
- From Categories to Classifiers: Name-Only Continual Learning by Exploring the Web [118.67589717634281]
Continual learning often relies on the availability of extensive annotated datasets, an assumption that is unrealistically time-consuming and costly in practice.
We explore a novel paradigm termed name-only continual learning where time and cost constraints prohibit manual annotation.
Our proposed solution leverages the expansive and ever-evolving internet to query and download uncurated webly-supervised data for image classification.
arXiv Detail & Related papers (2023-11-19T10:43:43Z)
- ELFIS: Expert Learning for Fine-grained Image Recognition Using Subsets [6.632855264705276]
We propose ELFIS, an expert learning framework for Fine-Grained Visual Recognition.
A set of neural network-based experts is trained on the meta-categories and integrated into a multi-task framework.
Experiments show accuracy improvements of up to +1.3% on the SoTA FGVR benchmarks using both CNNs and transformer-based networks.
arXiv Detail & Related papers (2023-03-16T12:45:19Z)
- GROWN+UP: A Graph Representation Of a Webpage Network Utilizing Pre-training [0.2538209532048866]
We introduce an agnostic deep graph neural network feature extractor that can ingest webpage structures, pre-train self-supervised on massive unlabeled data, and fine-tune to arbitrary tasks on webpages effectually.
We show that our pre-trained model achieves state-of-the-art results using multiple datasets on two very different benchmarks: webpage boilerplate removal and genre classification.
arXiv Detail & Related papers (2022-08-03T13:37:27Z)
- The Klarna Product Page Dataset: Web Element Nomination with Graph Neural Networks and Large Language Models [51.39011092347136]
We introduce the Klarna Product Page dataset, a collection of webpages that surpasses existing datasets in richness and variety.
We empirically benchmark a range of Graph Neural Networks (GNNs) on the web element nomination task.
Second, we introduce a training refinement procedure that involves identifying a small number of relevant elements from each page.
Third, we introduce the Challenge Nomination Training Procedure, a novel training approach that further boosts nomination accuracy.
arXiv Detail & Related papers (2021-11-03T12:13:52Z)
- On The State of Data In Computer Vision: Human Annotations Remain Indispensable for Developing Deep Learning Models [0.0]
High-quality labeled datasets play a crucial role in fueling the development of machine learning (ML).
Since the emergence of the ImageNet dataset and the AlexNet model in 2012, the size of new open-source labeled vision datasets has remained roughly constant.
Only a minority of publications in the computer vision community tackle supervised learning on datasets that are orders of magnitude larger than ImageNet.
arXiv Detail & Related papers (2021-07-31T00:08:21Z)
- NWPU-Crowd: A Large-Scale Benchmark for Crowd Counting and Localization [101.13851473792334]
We construct a large-scale congested crowd counting and localization dataset, NWPU-Crowd, consisting of 5,109 images, in a total of 2,133,375 annotated heads with points and boxes.
Compared with other real-world datasets, it contains various illumination scenes and has the largest density range (0 to 20,033).
We describe the data characteristics, evaluate the performance of some mainstream state-of-the-art (SOTA) methods, and analyze the new problems that arise on the new data.
arXiv Detail & Related papers (2020-01-10T09:26:04Z)