Multimodal Prototypical Networks for Few-shot Learning
- URL: http://arxiv.org/abs/2011.08899v1
- Date: Tue, 17 Nov 2020 19:32:59 GMT
- Title: Multimodal Prototypical Networks for Few-shot Learning
- Authors: Frederik Pahde, Mihai Puscas, Tassilo Klein, Moin Nabi
- Abstract summary: Cross-modal feature generation framework is used to enrich the low populated embedding space in few-shot scenarios.
We show that in such cases nearest neighbor classification is a viable approach and outperform state-of-the-art single-modal and multimodal few-shot learning methods.
- Score: 20.100480009813953
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although providing exceptional results for many computer vision tasks,
state-of-the-art deep learning algorithms catastrophically struggle in low data
scenarios. However, if data in additional modalities (e.g., text) exist, this can
compensate for the lack of data and improve the classification results. To
overcome this data scarcity, we design a cross-modal feature generation
framework capable of enriching the sparsely populated embedding space in few-shot
scenarios, leveraging data from the auxiliary modality. Specifically, we train
a generative model that maps text data into the visual feature space to obtain
more reliable prototypes. This allows us to exploit data from additional
modalities (e.g. text) during training while the ultimate task at test time
remains classification with exclusively visual data. We show that in such cases
nearest neighbor classification is a viable approach and outperforms
state-of-the-art single-modal and multimodal few-shot learning methods on the
CUB-200 and Oxford-102 datasets.
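As a rough illustration of the idea in the abstract, the sketch below builds class prototypes from the few real visual features plus features generated from text, and classifies queries by nearest prototype. The generator architecture, dimensions, and all names are placeholders, not the authors' implementation.

```python
# Minimal sketch of the cross-modal prototype idea (not the authors' code).
# Assumes precomputed visual features for the few support images and text
# embeddings for the class descriptions; the generator is a placeholder MLP
# assumed to have been trained to map text embeddings into the visual space.
import torch
import torch.nn as nn

class TextToVisualGenerator(nn.Module):
    """Maps a text embedding into the visual feature space (hypothetical architecture)."""
    def __init__(self, text_dim=300, visual_dim=512, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(text_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, visual_dim),
        )

    def forward(self, t):
        return self.net(t)

def build_prototypes(support_feats, support_labels, text_embs, text_labels, generator, n_classes):
    """Average real visual features and generated (text-conditioned) features per class."""
    with torch.no_grad():
        gen_feats = generator(text_embs)              # text -> visual feature space
    protos = []
    for c in range(n_classes):
        real = support_feats[support_labels == c]     # few real visual features
        fake = gen_feats[text_labels == c]            # additional generated features
        protos.append(torch.cat([real, fake]).mean(dim=0))
    return torch.stack(protos)                        # (n_classes, visual_dim)

def classify(query_feats, prototypes):
    """Nearest-prototype (nearest-neighbor) classification with Euclidean distance."""
    dists = torch.cdist(query_feats, prototypes)      # (n_query, n_classes)
    return dists.argmin(dim=1)
```

At test time only the visual query features and the prototypes are needed, matching the abstract's claim that classification uses exclusively visual data.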
Related papers
- Improve Meta-learning for Few-Shot Text Classification with All You Can Acquire from the Tasks [10.556477506959888]
Existing methods often encounter difficulties in drawing accurate class prototypes from support set samples.
Recent approaches attempt to incorporate external knowledge or pre-trained language models to augment data, but this requires additional resources.
We propose a novel solution by adequately leveraging the information within the task itself.
arXiv Detail & Related papers (2024-10-14T12:47:11Z) - Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification [34.37262622415682]
We propose a new adaptation framework called Data Adaptive Traceback.
Specifically, we utilize a zero-shot method to extract the subset of the pre-training data most related to the downstream task.
We adopt a pseudo-label-based semi-supervised technique to reuse the pre-training images and a vision-language contrastive learning method to address the confirmation bias issue in semi-supervised learning.
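A minimal sketch of the zero-shot traceback step as described, assuming a CLIP-like model has already produced L2-normalized image embeddings for the pre-training pool and text embeddings for the downstream class prompts; the semi-supervised reuse and contrastive learning steps are not shown, and all names are placeholders.

```python
# Select the pre-training images most related to the downstream classes and
# assign them zero-shot pseudo-labels (illustrative simplification).
import torch

def trace_back_subset(pretrain_img_embs, class_text_embs, top_k=1000):
    sims = pretrain_img_embs @ class_text_embs.T          # (N_pretrain, n_classes)
    best_sim, pseudo_labels = sims.max(dim=1)             # best class per image
    keep = best_sim.topk(top_k).indices                   # most task-related images
    return keep, pseudo_labels[keep]
```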
arXiv Detail & Related papers (2024-07-11T18:01:58Z) - Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
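The reverse-diffusion sampling loop below is only a generic toy stand-in for prompt-conditioned generative planning; the actual MTDiff model uses a Transformer backbone and learned prompts, neither of which is reproduced here.

```python
# Toy DDPM-style sampler that denoises a random trajectory conditioned on a
# task prompt embedding (a generic sketch, not the MTDiff architecture).
import torch
import torch.nn as nn

class NoisePredictor(nn.Module):
    def __init__(self, traj_dim, prompt_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(traj_dim + 1 + prompt_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, traj_dim),
        )

    def forward(self, x_t, t, prompt):
        # t: (B, 1) float timestep, prompt: (B, prompt_dim) task embedding
        return self.net(torch.cat([x_t, t, prompt], dim=-1))

@torch.no_grad()
def sample_plan(model, prompt, traj_dim, steps=50):
    """Reverse diffusion: start from noise and iteratively denoise into a plan."""
    betas = torch.linspace(1e-4, 2e-2, steps)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(prompt.shape[0], traj_dim)
    for i in reversed(range(steps)):
        t = torch.full((prompt.shape[0], 1), float(i))
        eps = model(x, t, prompt)
        coef = betas[i] / torch.sqrt(1.0 - alpha_bar[i])
        x = (x - coef * eps) / torch.sqrt(alphas[i])
        if i > 0:
            x = x + torch.sqrt(betas[i]) * torch.randn_like(x)
    return x
```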
arXiv Detail & Related papers (2023-05-29T05:20:38Z) - Metric Based Few-Shot Graph Classification [18.785949422663233]
Few-shot learning allows employing modern deep learning models in scarce data regimes without sacrificing their effectiveness.
We show that a simple distance metric learning baseline with a state-of-the-art graph embedder obtains competitive results on the task.
We also propose a MixUp-based online data augmentation technique acting in the latent space and show its effectiveness on the task.
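A minimal sketch of MixUp applied in the latent space, assuming the graph embeddings have already been produced by the embedder; the interpolation of embeddings and one-hot labels follows the standard MixUp recipe.

```python
# Convexly combine pairs of graph embeddings and their one-hot labels.
import torch

def latent_mixup(z, y_onehot, alpha=0.2):
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(z.size(0))
    z_mix = lam * z + (1 - lam) * z[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return z_mix, y_mix
```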
arXiv Detail & Related papers (2022-06-08T06:29:46Z) - CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance.
In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
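The paper's exact training objective is not given in this summary; the sketch below shows one plausible (assumed) way to use attribution maps during training, adding an L1 penalty on input-gradient attributions to the cross-entropy loss.

```python
# One possible instantiation (an assumption, not necessarily the paper's loss):
# penalize the magnitude of input-gradient attribution maps so the model
# relies on fewer input regions.
import torch
import torch.nn.functional as F

def loss_with_attribution_penalty(model, x, y, lam=1e-3):
    x = x.clone().requires_grad_(True)
    logits = model(x)
    ce = F.cross_entropy(logits, y)
    # Attribution map: gradient of the target-class scores w.r.t. the input.
    attribution, = torch.autograd.grad(
        logits.gather(1, y.unsqueeze(1)).sum(), x, create_graph=True)
    return ce + lam * attribution.abs().mean()
```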
arXiv Detail & Related papers (2022-05-30T13:34:46Z) - CvS: Classification via Segmentation For Small Datasets [52.821178654631254]
This paper presents CvS, a cost-effective classifier for small datasets that derives classification labels from predicted segmentation maps.
We evaluate the effectiveness of our framework on diverse problems, showing that CvS achieves much higher classification accuracy than previous methods when given only a handful of examples.
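A small sketch of how a classification label might be derived from a predicted segmentation map, here by a majority vote over non-background pixels; the segmentation network and the exact aggregation rule are assumptions.

```python
# Derive a per-image class label from dense segmentation logits.
import torch

def label_from_segmentation(seg_logits, background_idx=0):
    """seg_logits: (B, n_classes, H, W) -> per-image class label."""
    pred = seg_logits.argmax(dim=1)                      # (B, H, W) pixel labels
    labels = []
    for p in pred:
        counts = torch.bincount(p.flatten(), minlength=seg_logits.size(1))
        counts[background_idx] = 0                       # ignore background pixels
        labels.append(counts.argmax())
    return torch.stack(labels)
```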
arXiv Detail & Related papers (2021-10-29T18:41:15Z) - Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts.
We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data.
We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
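A rough illustration of combining instance attribution with feature attribution: gradient similarity serves as a simple stand-in for the paper's training-data attribution, and input-times-gradient saliency can then be inspected on the retrieved examples. This is an assumption-laden simplification, not the proposed estimator.

```python
# Instance attribution (which training examples most influenced a prediction)
# approximated by parameter-gradient cosine similarity, plus a simple
# feature attribution (input x gradient) for inspecting retrieved examples.
import torch
import torch.nn.functional as F

def example_gradient(model, x, y):
    """Flattened loss gradient w.r.t. parameters for one (x, y) pair; y is a 0-dim label tensor."""
    loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    grads = torch.autograd.grad(loss, [p for p in model.parameters() if p.requires_grad])
    return torch.cat([g.flatten() for g in grads])

def most_influential_training_points(model, test_x, test_y, train_set, top_k=5):
    g_test = example_gradient(model, test_x, test_y)
    sims = torch.stack([
        F.cosine_similarity(g_test, example_gradient(model, x, y), dim=0)
        for x, y in train_set])
    return sims.topk(top_k).indices        # candidate artifact-bearing examples

def feature_attribution(model, x, y):
    """Input x gradient saliency for one example."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
    grad, = torch.autograd.grad(loss, x)
    return (grad * x).detach()
```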
arXiv Detail & Related papers (2021-07-01T09:26:13Z) - Robustness to Missing Features using Hierarchical Clustering with Split Neural Networks [39.29536042476913]
We propose a simple yet effective approach that clusters similar input features together using hierarchical clustering.
We evaluate this approach on a series of benchmark datasets and show promising improvements even with simple imputation techniques.
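A minimal sketch of the feature-grouping step using SciPy's hierarchical clustering on feature correlations; the correlation-based distance is an assumption, and the per-cluster split networks and their training are not shown.

```python
# Group input features by hierarchical clustering so each cluster can feed
# its own sub-network.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_features(X, n_clusters=4):
    """X: (n_samples, n_features) array. Returns a cluster id per feature."""
    corr = np.corrcoef(X, rowvar=False)              # feature-feature correlation
    dist = 1.0 - np.abs(corr)                        # similar features -> small distance
    condensed = dist[np.triu_indices_from(dist, k=1)]
    Z = linkage(condensed, method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```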
arXiv Detail & Related papers (2020-11-19T00:35:08Z) - Omni-supervised Facial Expression Recognition via Distilled Data [120.11782405714234]
We propose omni-supervised learning to exploit reliable samples in a large amount of unlabeled data for network training.
We experimentally verify that the new dataset can significantly improve the ability of the learned FER model.
To keep training on the enlarged dataset tractable, we propose to apply a dataset distillation strategy that compresses the created dataset into several informative class-wise images.
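One simple reading of the reliable-sample selection idea is sketched below: a model trained on the labeled data pseudo-labels the unlabeled pool and only high-confidence predictions are kept. The dataset distillation step is not sketched, and the thresholding rule is an assumption.

```python
# Keep only high-confidence pseudo-labeled samples from an unlabeled pool.
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_reliable(model, unlabeled_loader, threshold=0.95):
    kept_images, kept_labels = [], []
    for images in unlabeled_loader:                  # loader yields image batches
        probs = F.softmax(model(images), dim=1)
        conf, pseudo = probs.max(dim=1)
        mask = conf >= threshold
        kept_images.append(images[mask])
        kept_labels.append(pseudo[mask])
    return torch.cat(kept_images), torch.cat(kept_labels)
```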
arXiv Detail & Related papers (2020-05-18T09:36:51Z) - Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity-inducing adversarial loss for learning latent variables and thereby obtain the diversity in output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy under a shift in the data distribution.
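As a rough, non-adversarial stand-in for the diversity-inducing objective, the sketch below adds a penalty on the pairwise agreement of ensemble members' predictive distributions to the average cross-entropy; the paper's adversarial optimization over latent variables is not reproduced.

```python
# Average cross-entropy over ensemble members plus a penalty on how much
# their predictive distributions agree (simplified illustration only).
import torch
import torch.nn.functional as F

def ensemble_loss(members, x, y, div_weight=0.1):
    logits = [m(x) for m in members]
    ce = sum(F.cross_entropy(l, y) for l in logits) / len(logits)
    probs = [F.softmax(l, dim=1) for l in logits]
    agreement, n_pairs = 0.0, 0
    for i in range(len(probs)):
        for j in range(i + 1, len(probs)):
            agreement = agreement + (probs[i] * probs[j]).sum(dim=1).mean()
            n_pairs += 1
    return ce + div_weight * agreement / max(n_pairs, 1)
```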
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.