Gestalt-Guided Image Understanding for Few-Shot Learning
- URL: http://arxiv.org/abs/2302.03922v1
- Date: Wed, 8 Feb 2023 07:39:18 GMT
- Title: Gestalt-Guided Image Understanding for Few-Shot Learning
- Authors: Kun Song, Yuchen Wu, Jiansheng Chen, Tianyu Hu, and Huimin Ma
- Abstract summary: This paper introduces Gestalt psychology to few-shot learning and proposes a plug-and-play method called GGIU.
We design Totality-Guided Image Understanding and Closure-Guided Image Understanding to extract image features.
Our method can improve the performance of existing models effectively and flexibly without retraining or fine-tuning.
- Score: 19.83265038667386
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Due to the scarcity of available data, deep learning does not perform well on
few-shot learning tasks. Humans, however, can quickly learn the features of a new
category from very few samples, yet previous work has rarely considered how to
mimic this human cognitive behavior and apply it to few-shot
learning. This paper introduces Gestalt psychology to few-shot learning and
proposes Gestalt-Guided Image Understanding, a plug-and-play method called
GGIU. Referring to the principle of totality and the law of closure in Gestalt
psychology, we design Totality-Guided Image Understanding and Closure-Guided
Image Understanding to extract image features. After that, a feature estimation
module is used to estimate the accurate features of images. Extensive
experiments demonstrate that our method can improve the performance of existing
models effectively and flexibly without retraining or fine-tuning. Our code is
released on https://github.com/skingorz/GGIU.
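The abstract describes fusing a totality-guided feature (from the whole image) with closure-guided features (from partial views) via a feature estimation module. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' implementation: `encode` stands in for a frozen pre-trained extractor, and `alpha` is an assumed mixing weight.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(image):
    """Stand-in for a frozen, pre-trained feature extractor (hypothetical):
    flattens the image and keeps a fixed-length, L2-normalised prefix."""
    flat = image.reshape(-1).astype(np.float64)[:64]
    return flat / (np.linalg.norm(flat) + 1e-8)

def closure_views(image, n_views=4, size=24):
    """Law of closure: random crops act as incomplete views of the object."""
    h, w = image.shape
    ys = rng.integers(0, h - size + 1, n_views)
    xs = rng.integers(0, w - size + 1, n_views)
    return [image[y:y + size, x:x + size] for y, x in zip(ys, xs)]

def estimate_feature(image, alpha=0.5):
    """Principle of totality + law of closure: mix the whole-image feature
    with the mean crop feature; alpha is a hypothetical mixing weight."""
    totality = encode(image)
    closure = np.mean([encode(v) for v in closure_views(image)], axis=0)
    fused = alpha * totality + (1.0 - alpha) * closure
    return fused / np.linalg.norm(fused)

image = rng.random((32, 32))
feat = estimate_feature(image)
print(feat.shape)  # (64,)
```

In a prototype-based few-shot classifier, such an estimated feature would replace the single whole-image embedding at inference time, which is consistent with the paper's claim that the method plugs in without retraining.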
Related papers
- L-WISE: Boosting Human Image Category Learning Through Model-Based Image Selection And Enhancement [12.524893323311108]
We propose to augment visual learning in humans in a way that improves human categorization accuracy at test time.
Our learning augmentation approach consists of (i) selecting images based on their model-estimated recognition difficulty, and (ii) using image perturbations that aid recognition for novice learners.
To the best of our knowledge, this is the first application of ANNs to increase visual learning performance in humans by enhancing category-specific features.
arXiv Detail & Related papers (2024-12-12T23:57:01Z) - Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation [70.95783968368124]
We introduce a novel multi-modal autoregressive model, dubbed InstaManip.
We propose an innovative group self-attention mechanism to break down the in-context learning process into two separate stages.
Our method surpasses previous few-shot image manipulation models by a notable margin.
arXiv Detail & Related papers (2024-12-02T01:19:21Z) - Mixture of Self-Supervised Learning [2.191505742658975]
Self-supervised learning works by using a pretext task which will be trained on the model before being applied to a specific task.
Previous studies have only used one type of transformation as a pretext task.
This raises the question of how performance is affected when more than one pretext task is used and a gating network is employed to combine all pretext tasks.
arXiv Detail & Related papers (2023-07-27T14:38:32Z) - MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used to mitigate the greedy data needs of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z) - Exploring CLIP for Assessing the Look and Feel of Images [87.97623543523858]
We introduce Contrastive Language-Image Pre-training (CLIP) models for assessing both the quality perception (look) and abstract perception (feel) of images in a zero-shot manner.
Our results show that CLIP captures meaningful priors that generalize well to different perceptual assessments.
arXiv Detail & Related papers (2022-07-25T17:58:16Z) - Learning an Adaptation Function to Assess Image Visual Similarities [0.0]
We focus here on the specific task of learning visual image similarities when analogy matters.
We propose to compare different supervised, semi-supervised and self-supervised networks, pre-trained on distinct scales and contents datasets.
Our experiments conducted on the Totally Looks Like image dataset highlight the interest of our method, increasing the best model's @1 retrieval score by 2.25x.
arXiv Detail & Related papers (2022-06-03T07:15:00Z) - LibFewShot: A Comprehensive Library for Few-shot Learning [78.58842209282724]
Few-shot learning, especially few-shot image classification, has received increasing attention and witnessed significant advances in recent years.
Some recent studies implicitly show that many generic techniques or tricks, such as data augmentation, pre-training, knowledge distillation, and self-supervision, may greatly boost the performance of a few-shot learning method.
We propose a comprehensive library for few-shot learning (LibFewShot) by re-implementing seventeen state-of-the-art few-shot learning methods in a unified framework with the same single codebase in PyTorch.
arXiv Detail & Related papers (2021-09-10T14:12:37Z) - AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation [3.6790362352712873]
We propose AugNet, a new deep learning training paradigm to learn image features from a collection of unlabeled pictures.
Our experiments demonstrate that the method is able to represent the image in low dimensional space.
Unlike many deep-learning-based image retrieval algorithms, our approach does not require access to external annotated datasets.
arXiv Detail & Related papers (2021-06-11T09:02:30Z) - Learning to Focus: Cascaded Feature Matching Network for Few-shot Image Recognition [38.49419948988415]
Deep networks can learn to accurately recognize objects of a category by training on a large number of images.
A meta-learning challenge known as low-shot image recognition arises when only a few annotated images are available for learning a recognition model for a category.
Our method, called Cascaded Feature Matching Network (CFMN), is proposed to solve this problem.
Experiments for few-shot learning on two standard datasets, miniImageNet and Omniglot, have confirmed the effectiveness of our method.
arXiv Detail & Related papers (2021-01-13T11:37:28Z) - Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z) - Memory-Efficient Incremental Learning Through Feature Adaptation [71.1449769528535]
We introduce an approach for incremental learning that preserves feature descriptors of training images from previously learned classes.
Keeping the much lower-dimensional feature embeddings of images reduces the memory footprint significantly.
Experimental results show that our method achieves state-of-the-art classification accuracy in incremental learning benchmarks.
arXiv Detail & Related papers (2020-04-01T21:16:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.