Tree Structure-Aware Few-Shot Image Classification via Hierarchical
Aggregation
- URL: http://arxiv.org/abs/2207.06989v1
- Date: Thu, 14 Jul 2022 15:17:19 GMT
- Title: Tree Structure-Aware Few-Shot Image Classification via Hierarchical
Aggregation
- Authors: Min Zhang and Siteng Huang and Wenbin Li and Donglin Wang
- Abstract summary: We focus on how to learn additional feature representations for few-shot image classification through pretext tasks.
This additional knowledge can further improve the performance of few-shot learning.
We present a plug-in Hierarchical Tree Structure-aware (HTS) method, which learns the relationship of FSL and pretext tasks.
- Score: 27.868736254566397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we focus on how to learn additional feature
representations for few-shot image classification through pretext tasks
(e.g., rotation or color permutation). This additional knowledge
generated by pretext tasks can further improve the performance of few-shot
learning (FSL) as it differs from human-annotated supervision (i.e., class
labels of FSL tasks). To solve this problem, we present a plug-in Hierarchical
Tree Structure-aware (HTS) method, which not only learns the relationship of
FSL and pretext tasks, but more importantly, can adaptively select and
aggregate feature representations generated by pretext tasks to maximize the
performance of FSL tasks. A hierarchical tree constructing component and a
gated selection aggregating component are introduced to construct the tree
structure and find richer transferable knowledge that can rapidly adapt to
novel classes with a few labeled images. Extensive experiments show that our
HTS can significantly enhance multiple few-shot methods to achieve new
state-of-the-art performance on four benchmark datasets. The code is available
at: https://github.com/remiMZ/HTS-ECCV22.
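As a hedged illustration (not the authors' released code), the gated selection-aggregation idea, adaptively choosing how much of a pretext-task representation to mix into the FSL feature, can be sketched with a learned element-wise sigmoid gate; the feature dimension and the names `fsl_feat`, `pretext_feat`, `W`, and `b` below are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_aggregate(fsl_feat, pretext_feat, W, b):
    """Blend an FSL feature with a pretext-task feature via a learned gate.

    gate = sigmoid(W @ [fsl; pretext] + b) is applied element-wise, so the
    model can adaptively select how much pretext knowledge to keep per
    feature dimension.
    """
    joint = np.concatenate([fsl_feat, pretext_feat])   # shape (2d,)
    gate = sigmoid(W @ joint + b)                      # shape (d,), in (0, 1)
    return gate * fsl_feat + (1.0 - gate) * pretext_feat

# Toy usage: 4-dim features with randomly initialized gate parameters.
rng = np.random.default_rng(0)
d = 4
fsl = rng.normal(size=d)
pretext = rng.normal(size=d)      # e.g. features from a rotation pretext task
W = rng.normal(size=(d, 2 * d))
b = np.zeros(d)
fused = gated_aggregate(fsl, pretext, W, b)
print(fused.shape)  # (4,)
```

Because the gate lies in (0, 1), each fused element is a convex combination of the corresponding FSL and pretext features, so the aggregation can never stray outside the range spanned by its inputs.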
Related papers
- Learning Visual Hierarchies with Hyperbolic Embeddings [28.35250955426006]
We introduce a learning paradigm that can encode user-defined multi-level visual hierarchies in hyperbolic space without requiring explicit hierarchical labels.
We show significant improvements in hierarchical retrieval tasks, demonstrating the capability of our model in capturing visual hierarchies.
arXiv Detail & Related papers (2024-11-26T14:58:06Z)
- SAN: Structure-Aware Network for Complex and Long-tailed Chinese Text Recognition [9.190324058948987]
We propose a structure-aware network utilizing the hierarchical composition information to improve the recognition performance of complex characters.
Experiments demonstrate that the proposed approach can significantly improve the performances of complex characters and tail characters, yielding a better overall performance.
arXiv Detail & Related papers (2024-11-10T07:41:00Z)
- HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding [18.95003393925676]
When classifying categories at different hierarchy levels, traditional uni-modal approaches focus primarily on image features, revealing limitations in complex scenarios.
Recent studies integrating Vision-Language Models (VLMs) with class hierarchies have shown promise, yet they fall short of fully exploiting the hierarchical relationships.
We propose a novel framework that effectively combines CLIP with a deeper exploitation of the Hierarchical class structure via Graph representation learning.
arXiv Detail & Related papers (2023-11-23T15:42:42Z)
- Many or Few Samples? Comparing Transfer, Contrastive and Meta-Learning in Encrypted Traffic Classification [68.19713459228369]
We compare transfer learning, meta-learning, and contrastive learning against reference tree-based Machine Learning (ML) models and monolithic DL models.
We show that (i) larger datasets yield more general representations and (ii) contrastive learning is the best-performing methodology.
While tree-based ML models fit small tasks well but cannot handle large ones, DL methods that reuse learned representations are reaching tree-based performance even on small tasks.
arXiv Detail & Related papers (2023-05-21T11:20:49Z)
- Deep Hierarchical Semantic Segmentation [76.40565872257709]
Hierarchical semantic segmentation (HSS) aims at a structured, pixel-wise description of visual observations in terms of a class hierarchy.
The proposed HSSN casts HSS as a pixel-wise multi-label classification task, bringing only minimal architecture changes to current segmentation models.
With hierarchy-induced margin constraints, HSSN reshapes the pixel embedding space, so as to generate well-structured pixel representations.
arXiv Detail & Related papers (2022-03-27T15:47:44Z)
- Multi-level Second-order Few-shot Learning [111.0648869396828]
We propose a Multi-level Second-order (MlSo) few-shot learning network for supervised or unsupervised few-shot image classification and few-shot action recognition.
We leverage so-called power-normalized second-order base learner streams combined with features that express multiple levels of visual abstraction.
We demonstrate respectable results on standard datasets such as Omniglot, mini-ImageNet, tiered-ImageNet, Open MIC, fine-grained datasets such as CUB Birds, Stanford Dogs and Cars, and action recognition datasets such as HMDB51, UCF101, and mini-MIT.
arXiv Detail & Related papers (2022-01-15T19:49:00Z)
- Attribute Propagation Network for Graph Zero-shot Learning [57.68486382473194]
We introduce the attribute propagation network (APNet), which is composed of 1) a graph propagation model generating attribute vector for each class and 2) a parameterized nearest neighbor (NN) classifier.
APNet achieves either compelling performance or new state-of-the-art results in experiments with two zero-shot learning settings and five benchmark datasets.
arXiv Detail & Related papers (2020-09-24T16:53:40Z)
- HOSE-Net: Higher Order Structure Embedded Network for Scene Graph Generation [20.148175528691905]
This paper presents a novel structure-aware embedding-to-classifier (SEC) module to incorporate both local and global structural information of relationships into the output space.
We also propose a hierarchical semantic aggregation (HSA) module to reduce the number of subspaces by introducing higher order structural information.
The proposed HOSE-Net achieves the state-of-the-art performance on two popular benchmarks of Visual Genome and VRD.
arXiv Detail & Related papers (2020-08-12T07:58:13Z)
- Group Based Deep Shared Feature Learning for Fine-grained Image Classification [31.84610555517329]
We present a new deep network architecture that explicitly models shared features and removes their effect to achieve enhanced classification results.
We call this framework Group based deep Shared Feature Learning (GSFL) and the resulting learned network as GSFL-Net.
A key benefit of our specialized autoencoder is that it is versatile and can be combined with state-of-the-art fine-grained feature extraction models and trained together with them to improve their performance directly.
arXiv Detail & Related papers (2020-04-04T00:01:11Z)
- Adversarial Continual Learning [99.56738010842301]
We propose a hybrid continual learning framework that learns a disjoint representation for task-invariant and task-specific features.
Our model combines architecture growth to prevent forgetting of task-specific skills and an experience replay approach to preserve shared skills.
arXiv Detail & Related papers (2020-03-21T02:08:17Z)
- TAFSSL: Task-Adaptive Feature Sub-Space Learning for few-shot classification [50.358839666165764]
We show that the Task-Adaptive Feature Sub-Space Learning (TAFSSL) can significantly boost the performance in Few-Shot Learning scenarios.
Specifically, we show that on the challenging miniImageNet and tieredImageNet benchmarks, TAFSSL can improve the current state-of-the-art in both transductive and semi-supervised FSL settings by more than 5%.
arXiv Detail & Related papers (2020-03-14T16:59:17Z)
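As a hedged, illustrative sketch (not the TAFSSL release), the core idea of projecting a few-shot task's features onto a low-dimensional, task-adaptive sub-space can be approximated with plain PCA via SVD; the feature dimension, sample count, and component count below are assumptions:

```python
import numpy as np

def task_adaptive_subspace(features, n_components):
    """Project the features of one few-shot task onto their top principal
    components, a PCA-style stand-in for task-adaptive sub-space learning.

    features: (n_samples, d) array of support + query embeddings.
    Returns the (n_samples, n_components) projected features.
    """
    # Center the features, then take the SVD of the centered matrix;
    # the rows of vt are the principal directions of this task.
    centered = features - features.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Toy usage: 10 task embeddings of dimension 64, reduced to 5 dimensions.
rng = np.random.default_rng(1)
feats = rng.normal(size=(10, 64))
reduced = task_adaptive_subspace(feats, n_components=5)
print(reduced.shape)  # (10, 5)
```

Computing the sub-space from the task's own (support and query) features, rather than reusing a fixed global projection, is what makes the reduction task-adaptive.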
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.