From Categories to Classifier: Name-Only Continual Learning by Exploring
the Web
- URL: http://arxiv.org/abs/2311.11293v1
- Date: Sun, 19 Nov 2023 10:43:43 GMT
- Title: From Categories to Classifier: Name-Only Continual Learning by Exploring
the Web
- Authors: Ameya Prabhu, Hasan Abed Al Kader Hammoud, Ser-Nam Lim, Bernard
Ghanem, Philip H.S. Torr, Adel Bibi
- Abstract summary: Continual learning often relies on the availability of extensive annotated datasets, an assumption that is unrealistic in practice because annotation is time-consuming and costly.
We explore a novel paradigm termed name-only continual learning where time and cost constraints prohibit manual annotation.
Our proposed solution leverages the expansive and ever-evolving internet to query and download uncurated webly-supervised data for image classification.
- Score: 125.75085825742092
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continual Learning (CL) often relies on the availability of extensive
annotated datasets, an assumption that is unrealistic in practice because
annotation is time-consuming and costly. We explore a novel paradigm termed
name-only continual learning, where time and cost constraints prohibit manual
annotation. In this
scenario, learners adapt to new category shifts using only category names
without the luxury of annotated training data. Our proposed solution leverages
the expansive and ever-evolving internet to query and download uncurated
webly-supervised data for image classification. We investigate the reliability
of our web data and find them comparable, and in some cases superior, to
manually annotated datasets. Additionally, we show that by harnessing the web,
we can create support sets that surpass state-of-the-art name-only
classification methods, which build their support sets using generative models
or image retrieval from LAION-5B, achieving up to a 25% boost in accuracy. When
applied
across varied continual learning contexts, our method consistently exhibits a
small performance gap in comparison to models trained on manually annotated
datasets. We present EvoTrends, a class-incremental dataset made from the web
to capture real-world trends, created in just minutes. Overall, this paper
underscores the potential of using uncurated webly-supervised data to mitigate
the challenges associated with manual data labeling in continual learning.
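The data-collection recipe the abstract describes — turn bare category names into search queries, download whatever the web returns, and use it uncurated — can be sketched as follows. This is a minimal illustration of the idea, not the authors' pipeline; `fake_search`, the query templates, and the URL scheme are placeholders standing in for a real image-search backend.

```python
# Minimal sketch of name-only data collection: expand each category name into
# search queries and gather a deduplicated, uncurated pool of image URLs.

def build_queries(category, templates=("a photo of a {}", "{} image")):
    """Expand a bare category name into several search queries."""
    return [t.format(category) for t in templates]

def collect_pool(categories, search_fn, per_query=50):
    """Query the web per category; return {category: [urls]} with duplicates removed."""
    pool = {}
    for cat in categories:
        seen, urls = set(), []
        for q in build_queries(cat):
            for url in search_fn(q, per_query):
                if url not in seen:   # uncurated: no label verification, only dedup
                    seen.add(url)
                    urls.append(url)
        pool[cat] = urls
    return pool

# Stub backend standing in for a real image-search API.
def fake_search(query, k):
    return [f"http://img.example/{query.replace(' ', '_')}/{i}.jpg" for i in range(k)]

pool = collect_pool(["zebra", "e-scooter"], fake_search, per_query=3)
```

The collected pool would then be used directly as (noisy) training data for the new categories, with no manual annotation step.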
Related papers
- Just Say the Name: Online Continual Learning with Category Names Only via Data Generation [15.163200258819712]
We present an online continual learning framework - Generative Name only Continual Learning (G-NoCL).
G-NoCL employs the novel sample complexity-guided data ensembling technique DIverSity and COmplexity enhancing ensemBlER (DISCOBER) to optimally sample training data from generated data.
arXiv Detail & Related papers (2024-03-16T08:28:42Z)
- A Self Supervised StyleGAN for Image Annotation and Classification with Extremely Limited Labels [35.43549147657739]
We propose SS-StyleGAN, a self-supervised approach for image annotation and classification suitable for extremely small annotated datasets.
We show that the proposed method attains strong classification results using small labeled datasets of sizes 50 and even 10.
arXiv Detail & Related papers (2023-12-26T09:46:50Z)
- Bridging the Gap: Learning Pace Synchronization for Open-World Semi-Supervised Learning [44.91863420044712]
In open-world semi-supervised learning, a machine learning model is tasked with uncovering novel categories from unlabeled data.
We introduce 1) the adaptive synchronizing marginal loss which imposes class-specific negative margins to alleviate the model bias towards seen classes, and 2) the pseudo-label contrastive clustering which exploits pseudo-labels predicted by the model to group unlabeled data from the same category together.
Our method balances the learning pace between seen and novel classes, achieving a remarkable 3% average accuracy increase on the ImageNet dataset.
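The class-specific negative margin idea in point 1) can be illustrated with a small sketch: subtracting a margin from the logits of seen classes before the softmax shifts probability mass toward novel classes, countering the model's bias. This is a hedged toy example, not the paper's actual loss; the logit and margin values are made up.

```python
import numpy as np

def margin_cross_entropy(logits, label, margins):
    """Cross-entropy where each class logit is shifted by a class-specific margin.
    Negative margins on seen classes lower their logits, reducing the bias
    toward predicting seen classes over novel ones."""
    z = logits + margins              # apply class-specific (possibly negative) margins
    z = z - z.max()                   # subtract max for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return -np.log(p[label])

logits = np.array([4.0, 1.0, 0.5])    # class 0 = seen, over-confident
margins = np.array([-2.0, 0.0, 0.0])  # penalize the seen class only

# Loss for the true (novel) class 2, without and with the margin:
loss_plain = margin_cross_entropy(logits, 2, np.zeros(3))
loss_margin = margin_cross_entropy(logits, 2, margins)
```

With the negative margin applied to the seen class, the novel class receives more probability mass and its loss drops, which is the intended debiasing effect.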
arXiv Detail & Related papers (2023-09-21T09:44:39Z)
- Towards Open-Domain Topic Classification [69.21234350688098]
We introduce an open-domain topic classification system that accepts a user-defined taxonomy in real time.
Users can classify a text snippet with respect to any candidate labels they want and get an instant response from our web interface.
arXiv Detail & Related papers (2023-06-29T20:25:28Z)
- Harnessing the Power of Text-image Contrastive Models for Automatic Detection of Online Misinformation [50.46219766161111]
We develop a self-learning model to explore contrastive learning in the domain of misinformation identification.
Our model shows superior performance in detecting non-matched image-text pairs when the training data is insufficient.
arXiv Detail & Related papers (2023-04-19T02:53:59Z)
- Annotation Curricula to Implicitly Train Non-Expert Annotators [56.67768938052715]
Voluntary studies often require annotators to familiarize themselves with the task, its annotation scheme, and the data domain.
This can be overwhelming at the start, mentally taxing, and can induce errors into the resulting annotations.
We propose annotation curricula, a novel approach to implicitly train annotators.
arXiv Detail & Related papers (2021-06-04T09:48:28Z)
- ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-supervised Continual Learning [52.831894583501395]
Continual learning assumes the incoming data are fully labeled, which might not be applicable in real applications.
We propose deep Online Replay with Discriminator Consistency (ORDisCo) to interdependently learn a classifier with a conditional generative adversarial network (GAN).
We show ORDisCo achieves significant performance improvement on various semi-supervised learning benchmark datasets for SSCL.
arXiv Detail & Related papers (2021-01-02T09:04:14Z)
- A Survey on Deep Learning with Noisy Labels: How to train your model when you cannot trust on the annotations? [21.562089974755125]
Several approaches have been proposed to improve the training of deep learning models in the presence of noisy labels.
This paper presents a survey of the main techniques in the literature, classifying the algorithms into the following groups: robust losses, sample weighting, sample selection, meta-learning, and combined approaches.
arXiv Detail & Related papers (2020-12-05T15:45:20Z)
- SLADE: A Self-Training Framework For Distance Metric Learning [75.54078592084217]
We present a self-training framework, SLADE, to improve retrieval performance by leveraging additional unlabeled data.
We first train a teacher model on the labeled data and use it to generate pseudo labels for the unlabeled data.
We then train a student model on both labels and pseudo labels to generate final feature embeddings.
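The three-step recipe above (teacher on labeled data, pseudo-labels for unlabeled data, student on both) can be sketched with a toy nearest-centroid classifier standing in for the deep models. This illustrates the general self-training pattern, not SLADE's actual architecture; the data and model are made up for the example.

```python
import numpy as np

def fit_centroids(X, y):
    """'Train' a model: one centroid per class (toy stand-in for a deep model)."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def predict(classes, centroids, X):
    """Label each point by its nearest class centroid."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return classes[d.argmin(axis=1)]

# Labeled data (two well-separated classes) and unlabeled data.
X_lab = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [4.8, 5.1]])
y_lab = np.array([0, 0, 1, 1])
X_unl = np.array([[0.1, -0.1], [5.2, 4.9]])

# 1) Train the teacher on labeled data; 2) pseudo-label the unlabeled data.
classes, teacher = fit_centroids(X_lab, y_lab)
y_pseudo = predict(classes, teacher, X_unl)

# 3) Train the student on labels and pseudo-labels combined.
classes_s, student = fit_centroids(np.vstack([X_lab, X_unl]),
                                   np.concatenate([y_lab, y_pseudo]))
```

In the real framework the student's features, not centroid labels, are the product: the extra pseudo-labeled data sharpens the learned embedding.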
arXiv Detail & Related papers (2020-11-20T08:26:10Z)
- Move-to-Data: A new Continual Learning approach with Deep CNNs, Application for image-class recognition [0.0]
The model is pre-trained during a "training recording phase" and then adjusted to newly arriving data.
We propose a fast continual learning layer at the end of the neural network.
arXiv Detail & Related papers (2020-06-12T13:04:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.