Gestalt-Guided Image Understanding for Few-Shot Learning
- URL: http://arxiv.org/abs/2302.03922v1
- Date: Wed, 8 Feb 2023 07:39:18 GMT
- Title: Gestalt-Guided Image Understanding for Few-Shot Learning
- Authors: Kun Song, Yuchen Wu, Jiansheng Chen, Tianyu Hu, and Huimin Ma
- Abstract summary: This paper introduces Gestalt psychology to few-shot learning and proposes a plug-and-play method called GGIU.
We design Totality-Guided Image Understanding and Closure-Guided Image Understanding to extract image features.
Our method can improve the performance of existing models effectively and flexibly without retraining or fine-tuning.
- Score: 19.83265038667386
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Due to the scarcity of available data, deep learning does not perform well on
few-shot learning tasks. Humans, however, can quickly learn the features of a new
category from very few samples, yet previous work has rarely considered how to
mimic this human cognitive behavior and apply it to few-shot
learning. This paper introduces Gestalt psychology to few-shot learning and
proposes Gestalt-Guided Image Understanding, a plug-and-play method called
GGIU. Referring to the principle of totality and the law of closure in Gestalt
psychology, we design Totality-Guided Image Understanding and Closure-Guided
Image Understanding to extract image features. After that, a feature estimation
module is used to estimate the accurate features of images. Extensive
experiments demonstrate that our method can improve the performance of existing
models effectively and flexibly without retraining or fine-tuning. Our code is
released on https://github.com/skingorz/GGIU.
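The abstract describes fusing a totality-guided feature (from the whole image) with closure-guided features (from partial views) via a feature estimation module. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation: `encode` stands in for a frozen pre-trained extractor, and `alpha` is an assumed mixing weight.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(image):
    """Stand-in for a frozen, pre-trained feature extractor (hypothetical):
    flattens the image and keeps a fixed-length, L2-normalised prefix."""
    flat = image.reshape(-1).astype(np.float64)[:64]
    return flat / (np.linalg.norm(flat) + 1e-8)

def closure_views(image, n_views=4, size=24):
    """Law of closure: random crops act as incomplete views of the object."""
    h, w = image.shape
    ys = rng.integers(0, h - size + 1, n_views)
    xs = rng.integers(0, w - size + 1, n_views)
    return [image[y:y + size, x:x + size] for y, x in zip(ys, xs)]

def estimate_feature(image, alpha=0.5):
    """Principle of totality + law of closure: mix the whole-image feature
    with the mean crop feature; alpha is a hypothetical mixing weight."""
    totality = encode(image)
    closure = np.mean([encode(v) for v in closure_views(image)], axis=0)
    fused = alpha * totality + (1.0 - alpha) * closure
    return fused / np.linalg.norm(fused)

image = rng.random((32, 32))
feat = estimate_feature(image)
print(feat.shape)  # (64,)
```

In a prototype-based few-shot classifier, such an estimated feature would replace the single whole-image embedding at inference time, which is consistent with the paper's claim that the method plugs in without retraining.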
Related papers
- L-WISE: Boosting Human Image Category Learning Through Model-Based Image Selection And Enhancement [12.524893323311108]
We propose to augment visual learning in humans in a way that improves human categorization accuracy at test time.
Our learning augmentation approach consists of (i) selecting images based on their model-estimated recognition difficulty, and (ii) using image perturbations that aid recognition for novice learners.
To the best of our knowledge, this is the first application of ANNs to increase visual learning performance in humans by enhancing category-specific features.
arXiv Detail & Related papers (2024-12-12T23:57:01Z) - Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation [70.95783968368124]
We introduce a novel multi-modal autoregressive model, dubbed InstaManip.
We propose an innovative group self-attention mechanism to break down the in-context learning process into two separate stages.
Our method surpasses previous few-shot image manipulation models by a notable margin.
arXiv Detail & Related papers (2024-12-02T01:19:21Z) - Mixture of Self-Supervised Learning [2.191505742658975]
Self-supervised learning works by using a pretext task which will be trained on the model before being applied to a specific task.
Previous studies have only used one type of transformation as a pretext task.
This raises the question of how performance is affected when more than one pretext task is used and a gating network is employed to combine all pretext tasks.
arXiv Detail & Related papers (2023-07-27T14:38:32Z) - MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used to mitigate the greedy data needs of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z) - Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner.
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
arXiv Detail & Related papers (2022-07-25T17:58:16Z) - Learning an Adaptation Function to Assess Image Visual Similarities [0.0]
We focus here on the specific task of learning visual image similarities when analogy matters.
We propose to compare different supervised, semi-supervised and self-supervised networks, pre-trained on distinct scales and contents datasets.
Our experiments conducted on the Totally Looks Like image dataset highlight the interest of our method, increasing the best model's @1 retrieval score by 2.25x.
arXiv Detail & Related papers (2022-06-03T07:15:00Z) - LibFewShot: A Comprehensive Library for Few-shot Learning [78.58842209282724]
Few-shot learning, especially few-shot image classification, has received increasing attention and witnessed significant advances in recent years.
Some recent studies implicitly show that many generic techniques or tricks, such as data augmentation, pre-training, knowledge distillation, and self-supervision, may greatly boost the performance of a few-shot learning method.
We propose a comprehensive library for few-shot learning (LibFewShot) by re-implementing seventeen state-of-the-art few-shot learning methods in a unified framework with the same single codebase in PyTorch.
arXiv Detail & Related papers (2021-09-10T14:12:37Z) - AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation [3.6790362352712873]
We propose AugNet, a new deep learning training paradigm to learn image features from a collection of unlabeled pictures.
Our experiments demonstrate that the method is able to represent the image in low dimensional space.
Unlike many deep-learning-based image retrieval algorithms, our approach does not require access to external annotated datasets.
arXiv Detail & Related papers (2021-06-11T09:02:30Z) - Learning to Focus: Cascaded Feature Matching Network for Few-shot Image Recognition [38.49419948988415]
Deep networks can learn to accurately recognize objects of a category by training on a large number of images.
A meta-learning challenge known as low-shot image recognition arises when only a few annotated images are available for learning a recognition model for a category.
Our method, called Cascaded Feature Matching Network (CFMN), is proposed to solve this problem.
Experiments for few-shot learning on two standard datasets, miniImageNet and Omniglot, have confirmed the effectiveness of our method.
arXiv Detail & Related papers (2021-01-13T11:37:28Z) - Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z) - Memory-Efficient Incremental Learning Through Feature Adaptation [71.1449769528535]
We introduce an approach for incremental learning that preserves feature descriptors of training images from previously learned classes.
Keeping the much lower-dimensional feature embeddings of images reduces the memory footprint significantly.
Experimental results show that our method achieves state-of-the-art classification accuracy in incremental learning benchmarks.
arXiv Detail & Related papers (2020-04-01T21:16:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.