Categorical Knowledge Fused Recognition: Fusing Hierarchical Knowledge with Image Classification through Aligning and Deep Metric Learning
- URL: http://arxiv.org/abs/2407.20600v2
- Date: Sun, 12 Jan 2025 08:15:52 GMT
- Authors: Yunfeng Zhao, Huiyu Zhou, Fei Wu, Xifeng Wu
- Abstract summary: We propose a novel deep metric learning based method to fuse prior knowledge about image categories with mainstream backbone image classification models.
The proposed method is effective in enhancing the reasoning aspect of image recognition in terms of weakly-supervised object localization performance.
- Score: 18.534970504136254
- Abstract: Image classification is a fundamental computer vision task and an important baseline for deep metric learning. For decades, efforts have been made to enhance image classification accuracy using deep learning models, while less attention has been paid to the reasoning aspect of the recognition, i.e., predictions could be made because of the background or other surrounding objects rather than the target object. Hierarchical knowledge about image categories depicts inter-class similarities and dissimilarities. Effective fusion of such knowledge with deep learning image classification models is promising for improving target object identification and enhancing the reasoning aspect of the recognition. In this paper, we propose a novel deep metric learning based method to effectively fuse prior knowledge about image categories with mainstream backbone image classification models and enhance the reasoning aspect of the recognition in an end-to-end manner. Existing image classification methods that incorporate deep metric learning mainly focus on whether sampled images are from the same class. A new triplet loss function term that aligns distances in the model latent space with those in the knowledge space is presented and incorporated in the proposed method to facilitate the dual-modality fusion. The proposed method was evaluated in extensive experiments on the CIFAR-10, CIFAR-100, Mini-ImageNet, and ImageNet-1K datasets, and the results indicate that it is effective in enhancing the reasoning aspect of image recognition in terms of weakly-supervised object localization performance.
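The abstract describes a triplet loss term that aligns latent-space distances with knowledge-space distances but gives no formula. The following is only a minimal sketch of one way such a term could look, not the authors' formulation. Everything in it is an assumption made for illustration: the toy class hierarchy `PARENT`, the tree path length used as the knowledge distance, the Euclidean latent distance, the margin value, and the function names.

```python
import math

# Hypothetical toy hierarchy, child -> parent (an assumption, not from the paper).
PARENT = {
    "cat": "mammal", "dog": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal",
    "truck": "vehicle", "vehicle": "entity", "animal": "entity",
}

def ancestors(node):
    """Return the path from node up to the root, inclusive."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def knowledge_distance(a, b):
    """Number of edges on the tree path between classes a and b."""
    path_a, path_b = ancestors(a), ancestors(b)
    depth_in_b = {n: i for i, n in enumerate(path_b)}
    for i, n in enumerate(path_a):
        if n in depth_in_b:
            return i + depth_in_b[n]
    raise ValueError("classes do not share a root")

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def aligned_triplet_loss(anchor, other1, other2, margin=0.2):
    """Hinge loss that asks the latent distance ordering to match the
    knowledge distance ordering. Each argument is (embedding, class_label)."""
    (za, ca), (z1, c1), (z2, c2) = anchor, other1, other2
    k1, k2 = knowledge_distance(ca, c1), knowledge_distance(ca, c2)
    if k1 == k2:
        return 0.0  # the hierarchy gives no ordering to enforce
    if k1 > k2:
        z1, z2 = z2, z1  # make z1 the sample that is closer in knowledge space
    return max(0.0, euclidean(za, z1) - euclidean(za, z2) + margin)
```

In this sketch the loss is zero when the sample that is closer in the hierarchy is also closer in the latent space by at least the margin, and positive otherwise. A trainable version would compute the same quantity on batches of backbone embeddings inside an automatic differentiation framework.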
Related papers
- Learn and Search: An Elegant Technique for Object Lookup using Contrastive Learning [6.912349403119665]
"Learn and Search" is a novel approach for object lookup that leverages the power of contrastive learning to enhance the efficiency and effectiveness of retrieval systems.
"Learn and Search" achieves superior Similarity Grid Accuracy, showcasing its efficacy in discerning regions of utmost similarity within an image.
arXiv Detail & Related papers (2024-03-12T00:58:19Z)
- Introspective Deep Metric Learning [91.47907685364036]
We propose an introspective deep metric learning framework for uncertainty-aware comparisons of images.
The proposed IDML framework improves the performance of deep metric learning through uncertainty modeling.
arXiv Detail & Related papers (2023-09-11T16:21:13Z)
- Mitigating Bias: Enhancing Image Classification by Improving Model Explanations [9.791305104409057]
Deep learning models tend to rely heavily on simple and easily discernible features in the background of images.
We introduce a mechanism that encourages the model to allocate sufficient attention to the foreground.
Our findings highlight the importance of foreground attention in enhancing model understanding and representation of the main concepts within images.
arXiv Detail & Related papers (2023-07-04T04:46:44Z)
- Introspective Deep Metric Learning for Image Retrieval [80.29866561553483]
We argue that a good similarity model should consider the semantic discrepancies with caution to better deal with ambiguous images for more robust training.
We propose to represent an image using not only a semantic embedding but also an accompanying uncertainty embedding, which describes the semantic characteristics and ambiguity of an image, respectively.
The proposed IDML framework improves the performance of deep metric learning through uncertainty modeling and attains state-of-the-art results on the widely used CUB-200-2011, Cars196, and Stanford Online Products datasets.
arXiv Detail & Related papers (2022-05-09T17:51:44Z)
- LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even under drastically limited number of annotations.
arXiv Detail & Related papers (2022-04-06T17:48:18Z)
- Hybrid Optimized Deep Convolution Neural Network based Learning Model for Object Detection [0.0]
Object identification is one of the most fundamental and difficult issues in computer vision.
In recent years, deep learning-based object detection techniques have grabbed the public's interest.
In this study, a unique deep learning classification technique is used to create an autonomous object detecting system.
The suggested framework has a detection accuracy of 0.9864, which is greater than current techniques.
arXiv Detail & Related papers (2022-03-02T04:39:37Z)
- Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs.
We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
arXiv Detail & Related papers (2021-12-21T17:10:21Z)
- Unifying Remote Sensing Image Retrieval and Classification with Robust Fine-tuning [3.6526118822907594]
We aim at unifying remote sensing image retrieval and classification with a new large-scale training and testing dataset, SF300.
We show that our framework systematically achieves a boost of retrieval and classification performance on nine different datasets compared to an ImageNet pretrained baseline.
arXiv Detail & Related papers (2021-02-26T11:01:30Z)
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
- DeepEMD: Differentiable Earth Mover's Distance for Few-Shot Learning [122.51237307910878]
We develop methods for few-shot image classification from a new perspective of optimal matching between image regions.
We employ the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations.
To generate the important weights of elements in the formulation, we design a cross-reference mechanism.
arXiv Detail & Related papers (2020-03-15T08:13:16Z)