ExpNet: A unified network for Expert-Level Classification
- URL: http://arxiv.org/abs/2211.15672v1
- Date: Tue, 29 Nov 2022 12:20:25 GMT
- Title: ExpNet: A unified network for Expert-Level Classification
- Authors: Junde Wu, Huihui Fang, Yehui Yang, Yu Zhang, Haoyi Xiong, Huazhu Fu,
Yanwu Xu
- Abstract summary: We propose Expert Network (ExpNet) to address the unique challenges of expert-level classification through a unified network.
In ExpNet, we hierarchically decouple the part and context features and individually process them using a novel attentive mechanism, called Gaze-Shift.
We conduct the experiments over three representative expert-level classification tasks: FGVC, disease classification, and artwork attributes classification.
- Score: 40.109357254623085
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Different from general visual classification, some classification
tasks are more challenging because they require assigning professional
categories to the images. In this paper, we call them expert-level
classification. Previous fine-grained visual classification (FGVC) work has
made many efforts on some of its specific sub-tasks. However, these methods are
difficult to extend to the general case, which relies on a comprehensive
analysis of part-global correlation and hierarchical feature interaction. In
this paper, we propose the Expert Network (ExpNet) to address the unique
challenges of expert-level classification through a unified network. In ExpNet,
we hierarchically decouple the part and context features and process them
individually using a novel attentive mechanism called Gaze-Shift. In each
stage, Gaze-Shift produces a focal-part feature for subsequent abstraction and
memorizes a context-related embedding. We then fuse the final focal embedding
with all memorized context-related embeddings to make the prediction. Such an
architecture realizes dual-track processing of partial and global information
with hierarchical feature interaction. We conduct experiments on three
representative expert-level classification tasks: FGVC, disease
classification, and artwork attribute classification. In these experiments,
our ExpNet shows superior performance compared to the state of the art across
a wide range of fields, indicating its effectiveness and generalization. The
code will be made publicly available.
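The abstract describes a stage-wise, dual-track pipeline: at each stage, Gaze-Shift splits the features into a focal-part feature (passed to the next stage for further abstraction) and a context-related embedding (memorized), and the final focal embedding is fused with all memorized context embeddings for prediction. The following is a minimal, hypothetical sketch of that control flow in plain Python; the attention operator, the top-half token selection, the number of stages, and the concatenation-based fusion are all illustrative assumptions, not the paper's actual Gaze-Shift design.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_vec(vecs):
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

def gaze_shift(tokens):
    """Toy stand-in for Gaze-Shift: split tokens into focal-part tokens
    (kept for further abstraction) and one pooled context embedding
    (memorized). The real operator in the paper may differ."""
    query = mean_vec(tokens)                       # mean token as query
    weights = softmax([dot(t, query) for t in tokens])
    order = sorted(range(len(tokens)), key=lambda i: -weights[i])
    k = max(1, len(tokens) // 2)                   # keep top-half tokens (assumption)
    focal = [tokens[i] for i in order[:k]]
    d = len(tokens[0])
    # Attention-weighted pooling over all tokens -> context embedding.
    context = [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(d)]
    return focal, context

def expnet_forward(tokens, num_stages=3):
    """Dual-track processing: abstract focal tokens stage by stage while
    memorizing one context embedding per stage, then fuse everything."""
    memorized = []
    for _ in range(num_stages):
        tokens, context = gaze_shift(tokens)
        memorized.append(context)
    focal_embedding = mean_vec(tokens)
    # Fuse the final focal embedding with all memorized context embeddings
    # (concatenation here; the paper's fusion may be learned).
    fused = focal_embedding + [x for c in memorized for x in c]
    return fused

# 16 deterministic 8-dim "tokens" standing in for stage-1 features.
tokens = [[((i * 7 + j * 3) % 11) / 10.0 for j in range(8)] for i in range(16)]
out = expnet_forward(tokens)
print(len(out))  # 32 = (1 focal + 3 context embeddings) * 8 dims
```

The point of the sketch is the information flow, not the operator details: the focal track shrinks at every stage while the context track accumulates one embedding per stage, so the classifier sees both the most abstract part feature and the context from every level of the hierarchy.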
Related papers
- A separability-based approach to quantifying generalization: which layer is best? [0.0]
Generalization to unseen data remains poorly understood for deep learning classification and foundation models.
We provide a new method for evaluating the capacity of networks to represent a sampled domain.
We find that (i) high classification accuracy does not imply high generalizability; and (ii) deeper layers in a model do not always generalize the best.
arXiv Detail & Related papers (2024-05-02T17:54:35Z)
- Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping [33.405667735101595]
We propose a Visual Hierarchy Mapper (Hi-Mapper) for enhancing the structured understanding of the pre-trained Deep Neural Networks (DNNs)
Hi-Mapper investigates the hierarchical organization of the visual scene by 1) pre-defining a hierarchy tree through the encapsulation of probability densities; and 2) learning the hierarchical relations in hyperbolic space with a novel hierarchical contrastive loss.
arXiv Detail & Related papers (2024-04-01T07:45:42Z)
- AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation [33.25304533086283]
Open-vocabulary semantic segmentation is a challenging task that requires segmenting novel object categories at inference time.
Recent studies have explored vision-language pre-training to handle this task, but suffer from unrealistic assumptions in practical scenarios.
This work proposes a novel attribute decomposition-aggregation framework, AttrSeg, inspired by human cognition in understanding new concepts.
arXiv Detail & Related papers (2023-08-31T19:34:09Z)
- Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models [64.24227572048075]
We propose a Knowledge-Aware Prompt Tuning (KAPT) framework for vision-language models.
Our approach takes inspiration from human intelligence in which external knowledge is usually incorporated into recognizing novel categories of objects.
arXiv Detail & Related papers (2023-08-22T04:24:45Z)
- Open-world Instance Segmentation: Top-down Learning with Bottom-up Supervision [83.57156368908836]
We propose a novel approach to open-world instance segmentation called Bottom-Up and Top-Down Open-world (UDOS).
UDOS first predicts parts of objects using a top-down network trained with weak supervision from bottom-up segmentations.
UDOS enjoys both the speed and efficiency of top-down architectures and the ability to generalize to unseen categories from bottom-up supervision.
arXiv Detail & Related papers (2023-03-09T18:55:03Z)
- Association Graph Learning for Multi-Task Classification with Category Shifts [68.58829338426712]
We focus on multi-task classification, where related classification tasks share the same label space and are learned simultaneously.
We learn an association graph to transfer knowledge among tasks for missing classes.
Our method consistently performs better than representative baselines.
arXiv Detail & Related papers (2022-10-10T12:37:41Z)
- Deep Hierarchical Semantic Segmentation [76.40565872257709]
Hierarchical semantic segmentation (HSS) aims at a structured, pixel-wise description of visual observations in terms of a class hierarchy.
HSSN casts HSS as a pixel-wise multi-label classification task, only bringing minimal architecture change to current segmentation models.
With hierarchy-induced margin constraints, HSSN reshapes the pixel embedding space, so as to generate well-structured pixel representations.
arXiv Detail & Related papers (2022-03-27T15:47:44Z)
- Visual Boundary Knowledge Translation for Foreground Segmentation [57.32522585756404]
We make an attempt towards building models that explicitly account for visual boundary knowledge, in the hope of reducing the training effort for segmenting unseen categories.
With only tens of labeled samples as guidance, Trans-Net achieves results on par with fully supervised methods.
arXiv Detail & Related papers (2021-08-01T07:10:25Z)
- Classification of Consumer Belief Statements From Social Media [0.0]
We study how complex expert annotations can be leveraged successfully for classification.
We find that automated class abstraction approaches perform remarkably well against a domain-expert baseline on text classification tasks.
arXiv Detail & Related papers (2021-06-29T15:25:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.