FoPro: Few-Shot Guided Robust Webly-Supervised Prototypical Learning
- URL: http://arxiv.org/abs/2212.00465v1
- Date: Thu, 1 Dec 2022 12:39:03 GMT
- Title: FoPro: Few-Shot Guided Robust Webly-Supervised Prototypical Learning
- Authors: Yulei Qin, Xingyu Chen, Chao Chen, Yunhang Shen, Bo Ren, Yun Gu, Jie
Yang, Chunhua Shen
- Abstract summary: We propose a Few-shot guided Prototypical (FoPro) representation learning method.
FoPro is trained on web datasets guided by a few real-world examples and evaluated on real-world datasets.
Our method achieves state-of-the-art performance on three fine-grained datasets and two large-scale datasets.
- Score: 82.75157675790553
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, webly supervised learning (WSL) has been studied to leverage
numerous and accessible data from the Internet. Most existing methods focus on
learning noise-robust models from web images while neglecting the performance
drop caused by the differences between web domain and real-world domain.
However, only by tackling the performance gap above can we fully exploit the
practical value of web datasets. To this end, we propose a Few-shot guided
Prototypical (FoPro) representation learning method, which only needs a few
labeled examples from reality and can significantly improve the performance in
the real-world domain. Specifically, we initialize each class center with
few-shot real-world data as the "realistic" prototype. Then, the intra-class
distance between web instances and "realistic" prototypes is narrowed by
contrastive learning. Finally, we measure the image-prototype distance with a
learnable metric. Prototypes are polished by adjacent high-quality web images
and used to remove distant out-of-distribution samples. In experiments,
FoPro is trained on web datasets guided by a few real-world examples and
evaluated on real-world datasets. Our method achieves state-of-the-art
performance on three fine-grained datasets and two large-scale datasets.
Compared with existing WSL methods under the same few-shot settings, FoPro
still excels in real-world generalization. Code is available at
https://github.com/yuleiqin/fopro.
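The pipeline described above (initializing each class center from few-shot real-world data, pulling web instances toward their "realistic" prototype contrastively, and discarding web samples far from their prototype) can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the released implementation: the function names are invented for this sketch, plain cosine similarity stands in for the paper's learnable metric, and the fixed keep ratio for out-of-distribution removal is a placeholder.

```python
import numpy as np

def init_prototypes(few_shot_feats, labels, num_classes):
    """Initialize each class prototype as the mean of its few-shot
    real-world features (the "realistic" prototype), L2-normalized."""
    dim = few_shot_feats.shape[1]
    protos = np.zeros((num_classes, dim))
    for c in range(num_classes):
        protos[c] = few_shot_feats[labels == c].mean(axis=0)
    return protos / np.linalg.norm(protos, axis=1, keepdims=True)

def proto_contrastive_loss(web_feats, web_labels, protos, temp=0.1):
    """Instance-prototype contrastive loss: pull each web embedding
    toward its class prototype, push it away from the others."""
    feats = web_feats / np.linalg.norm(web_feats, axis=1, keepdims=True)
    logits = feats @ protos.T / temp              # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(web_labels)), web_labels].mean()

def filter_ood(web_feats, web_labels, protos, keep_ratio=0.8):
    """Keep only the web samples closest to their class prototype;
    cosine similarity stands in for the paper's learnable metric."""
    feats = web_feats / np.linalg.norm(web_feats, axis=1, keepdims=True)
    sims = (feats * protos[web_labels]).sum(axis=1)
    k = int(len(sims) * keep_ratio)
    return np.argsort(-sims)[:k]                  # indices to retain
```

In the actual method the prototypes are further polished by adjacent high-quality web images during training; here they stay fixed for simplicity.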
Related papers
- Enhancing Environmental Robustness in Few-shot Learning via Conditional Representation Learning [27.549889991320203]
Few-shot learning has been extensively utilized to overcome the scarcity of training data in domain-specific visual recognition.
In real-world scenarios, environmental factors such as complex backgrounds, varying lighting conditions, long-distance shooting, and moving targets often cause test images to exhibit numerous incomplete targets or noise disruptions.
We propose a novel conditional representation learning network (CRLNet) that integrates the interactions between training and testing images as conditional information in their respective representation processes.
arXiv Detail & Related papers (2025-02-03T09:18:03Z) - Can Out-of-Domain data help to Learn Domain-Specific Prompts for Multimodal Misinformation Detection? [14.722270908687216]
Domain-specific Prompt tuning can exploit out-of-domain data during training to improve fake news detection of all desired domains simultaneously.
Experiments on the large-scale NewsCLIPpings and VERITE benchmarks demonstrate that DPOD achieves state-of-the-art performance for this challenging task.
arXiv Detail & Related papers (2023-11-27T08:49:26Z) - CAPro: Webly Supervised Learning with Cross-Modality Aligned Prototypes [93.71909293023663]
Cross-modality Aligned Prototypes (CAPro) is a unified contrastive learning framework to learn visual representations with correct semantics.
CAPro achieves new state-of-the-art performance and exhibits robustness to open-set recognition.
arXiv Detail & Related papers (2023-10-15T07:20:22Z) - What Makes for Effective Few-shot Point Cloud Classification? [18.62689395276194]
We show that 3D few-shot learning is more challenging with unordered structures, high intra-class variances, and subtle inter-class differences.
We propose a novel plug-and-play component called Cross-Instance Adaptation (CIA) module, to address the high intra-class variances and subtle inter-class differences issues.
arXiv Detail & Related papers (2023-03-31T15:55:06Z) - Internet Explorer: Targeted Representation Learning on the Open Web [121.02587846761627]
Modern vision models typically rely on fine-tuning general-purpose models pre-trained on large, static datasets.
We propose dynamically utilizing the Internet to quickly train a small-scale model that does extremely well on the task at hand.
Our approach, called Internet Explorer, explores the web in a self-supervised manner to progressively find relevant examples that improve performance on a desired target dataset.
arXiv Detail & Related papers (2023-02-27T18:59:55Z) - Pushing the Limits of Simple Pipelines for Few-Shot Learning: External
Data and Fine-Tuning Make a Difference [74.80730361332711]
Few-shot learning is an important and topical problem in computer vision.
We show that a simple transformer-based pipeline yields surprisingly good performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-15T02:55:58Z) - What Stops Learning-based 3D Registration from Working in the Real
World? [53.68326201131434]
This work identifies the sources of 3D point cloud registration failures, analyzes the reasons behind them, and proposes solutions.
Ultimately, this translates to a best-practice 3D registration network (BPNet), constituting the first learning-based method able to handle previously-unseen objects in real-world data.
Our model generalizes to real data without any fine-tuning, reaching an accuracy of up to 67% on point clouds of unseen objects obtained with a commercial sensor.
arXiv Detail & Related papers (2021-11-19T19:24:27Z) - Revisiting Contrastive Methods for Unsupervised Learning of Visual
Representations [78.12377360145078]
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection.
In this paper, we first study how biases in the dataset affect existing methods.
We show that current contrastive approaches work surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed, and (iii) general versus domain-specific datasets.
arXiv Detail & Related papers (2021-06-10T17:59:13Z) - Virtual to Real adaptation of Pedestrian Detectors [9.432150710329607]
ViPeD is a new synthetically generated set of images collected with the graphical engine of the video game GTA V (Grand Theft Auto V).
We propose two different Domain Adaptation techniques suitable for the pedestrian detection task, but possibly applicable to general object detection.
Experiments show that the network trained with ViPeD can generalize over unseen real-world scenarios better than the detector trained over real-world data.
arXiv Detail & Related papers (2020-01-09T14:50:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.