Related papers: Categorical Knowledge Fused Recognition: Fusing Hierarchical Knowledge with Image Classification through Aligning and Deep Metric Learning

URL: http://arxiv.org/abs/2407.20600v2
Date: Sun, 12 Jan 2025 08:15:52 GMT
Title: Categorical Knowledge Fused Recognition: Fusing Hierarchical Knowledge with Image Classification through Aligning and Deep Metric Learning
Authors: Yunfeng Zhao, Huiyu Zhou, Fei Wu, Xifeng Wu
Abstract summary: We propose a novel deep metric learning based method to fuse prior knowledge about image categories with mainstream backbone image classification models. The proposed method is effective in enhancing the reasoning aspect of image recognition in terms of weakly-supervised object localization performance.
Score: 18.534970504136254
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Image classification is a fundamental computer vision task and an important baseline for deep metric learning. For decades, efforts have focused on improving image classification accuracy with deep learning models, while less attention has been paid to the reasoning aspect of recognition, i.e., predictions may be driven by the background or surrounding objects rather than the target object. Hierarchical knowledge about image categories depicts inter-class similarities and dissimilarities. Effective fusion of such knowledge with deep learning image classification models is promising for improving target object identification and enhancing the reasoning aspect of recognition. In this paper, we propose a novel deep metric learning based method to effectively fuse prior knowledge about image categories with mainstream backbone image classification models and enhance the reasoning aspect of recognition in an end-to-end manner. Existing image classification methods that incorporate deep metric learning mainly focus on whether sampled images are from the same class. A new triplet loss function term that aligns distances in the model latent space with those in knowledge space is presented and incorporated in the proposed method to facilitate the dual-modality fusion. The proposed method was evaluated in extensive experiments on the CIFAR-10, CIFAR-100, Mini-ImageNet, and ImageNet-1K datasets, and the results indicate that it is effective in enhancing the reasoning aspect of image recognition in terms of weakly-supervised object localization performance.
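
The abstract mentions a triplet loss term that aligns latent-space distances with knowledge-space distances but does not give its formula. The following is only an illustrative sketch of one way such a term could look, not the authors' formulation: it assumes the knowledge distance is the edge count between classes in a category tree, and it uses a hinge whose margin scales with the knowledge-distance gap. The toy hierarchy `PARENT`, the function names, and the `scale` parameter are all hypothetical.

```python
import numpy as np

# Hypothetical toy hierarchy: child -> parent (the root has parent None).
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "cat": "animal",
    "dog": "animal",
    "truck": "vehicle",
}


def path_to_root(node, parent=PARENT):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def tree_distance(a, b, parent=PARENT):
    """Number of edges on the path between two nodes of the hierarchy."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for steps_from_b, n in enumerate(path_to_root(b, parent)):
        if n in steps_from_a:  # first common ancestor seen from b
            return steps_from_a[n] + steps_from_b
    raise ValueError("nodes do not share a root")


def aligned_triplet_loss(z_a, z_p, z_n, k_ap, k_an, scale=0.5):
    """Hinge loss whose margin grows with the knowledge gap k_an - k_ap.

    z_a, z_p, z_n: latent embeddings of anchor, positive, negative.
    k_ap, k_an: knowledge-space distances anchor-positive, anchor-negative.
    This margin rule is an assumption made for illustration only.
    """
    d_ap = np.linalg.norm(z_a - z_p)  # latent distance anchor-positive
    d_an = np.linalg.norm(z_a - z_n)  # latent distance anchor-negative
    margin = scale * (k_an - k_ap)
    return max(0.0, d_ap - d_an + margin)


if __name__ == "__main__":
    k_ap = tree_distance("cat", "dog")    # siblings under "animal"
    k_an = tree_distance("cat", "truck")  # related only through the root
    z_cat = np.array([0.0, 0.0])
    z_dog = np.array([1.0, 0.0])
    z_truck = np.array([1.5, 0.0])
    print(aligned_triplet_loss(z_cat, z_dog, z_truck, k_ap, k_an))
```

In this toy tree, "cat" and "dog" are 2 edges apart and "cat" and "truck" are 4 edges apart, so the sketch asks the truck embedding to sit farther from the cat embedding than the dog embedding does, by a margin proportional to that gap. A plain triplet loss would use a fixed margin and only the same-class/different-class signal.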

Related papers

Learn and Search: An Elegant Technique for Object Lookup using Contrastive Learning [6.912349403119665]
"Learn and Search" is a novel approach for object lookup that leverages the power of contrastive learning to enhance the efficiency and effectiveness of retrieval systems. "Learn and Search" achieves superior Similarity Grid Accuracy, showcasing its efficacy in discerning regions of utmost similarity within an image.
arXiv Detail & Related papers (2024-03-12T00:58:19Z)
Stitching Gaps: Fusing Situated Perceptual Knowledge with Vision Transformers for High-Level Image Classification [0.1843404256219181]
We leverage situated perceptual knowledge of cultural images to enhance performance and interpretability in AC image classification. This resource captures situated perceptual semantics gleaned from over 14,000 cultural images labeled with ACs. We demonstrate the synergy and complementarity between KGE embeddings' situated perceptual knowledge and deep visual model's sensory-perceptual understanding for AC image classification.
arXiv Detail & Related papers (2024-02-29T16:46:48Z)
Introspective Deep Metric Learning [91.47907685364036]
We propose an introspective deep metric learning framework for uncertainty-aware comparisons of images. The proposed IDML framework improves the performance of deep metric learning through uncertainty modeling.
arXiv Detail & Related papers (2023-09-11T16:21:13Z)
Label-Free Event-based Object Recognition via Joint Learning with Image Reconstruction from Events [42.71383489578851]
We study label-free event-based object recognition where category labels and paired images are not available. Our method first reconstructs images from events and performs object recognition through Contrastive Language-Image Pre-training (CLIP). Since the category information is essential in reconstructing images, we propose category-guided attraction loss and category-agnostic repulsion loss.
arXiv Detail & Related papers (2023-08-18T08:28:17Z)
Mitigating Bias: Enhancing Image Classification by Improving Model Explanations [9.791305104409057]
Deep learning models tend to rely heavily on simple and easily discernible features in the background of images. We introduce a mechanism that encourages the model to allocate sufficient attention to the foreground. Our findings highlight the importance of foreground attention in enhancing model understanding and representation of the main concepts within images.
arXiv Detail & Related papers (2023-07-04T04:46:44Z)
EAML: Ensemble Self-Attention-based Mutual Learning Network for Document Image Classification [1.1470070927586016]
We design a self-attention-based fusion module that serves as a block in our ensemble trainable network. It allows the network to simultaneously learn the discriminant features of image and text modalities throughout the training stage. This is the first work to leverage a mutual learning approach along with a self-attention-based fusion module to perform document image classification.
arXiv Detail & Related papers (2023-05-11T16:05:03Z)
Introspective Deep Metric Learning for Image Retrieval [80.29866561553483]
We argue that a good similarity model should consider the semantic discrepancies with caution to better deal with ambiguous images for more robust training. We propose to represent an image using not only a semantic embedding but also an accompanying uncertainty embedding, which describes the semantic characteristics and ambiguity of an image, respectively. The proposed IDML framework improves the performance of deep metric learning through uncertainty modeling and attains state-of-the-art results on the widely used CUB-200-2011, Cars196, and Stanford Online Products datasets.
arXiv Detail & Related papers (2022-05-09T17:51:44Z)
LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image. We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion. We show that having such a prior in the feature extractor helps in landmark detection, even under drastically limited number of annotations.
arXiv Detail & Related papers (2022-04-06T17:48:18Z)
Hybrid Optimized Deep Convolution Neural Network based Learning Model for Object Detection [0.0]
Object identification is one of the most fundamental and difficult issues in computer vision. In recent years, deep learning-based object detection techniques have grabbed the public's interest. In this study, a unique deep learning classification technique is used to create an autonomous object detecting system. The suggested framework has a detection accuracy of 0.9864, which is greater than current techniques.
arXiv Detail & Related papers (2022-03-02T04:39:37Z)
Contrastive Object Detection Using Knowledge Graph Embeddings [72.17159795485915]
We compare the error statistics of the class embeddings learned from a one-hot approach with semantically structured embeddings from natural language processing or knowledge graphs. We propose a knowledge-embedded design for keypoint-based and transformer-based object detection architectures.
arXiv Detail & Related papers (2021-12-21T17:10:21Z)
Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching. We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
Deep Relational Metric Learning [84.95793654872399]
This paper presents a deep relational metric learning framework for image clustering and retrieval. We learn an ensemble of features that characterizes an image from different aspects to model both interclass and intraclass distributions. Experiments on the widely-used CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate that our framework improves existing deep metric learning methods and achieves very competitive results.
arXiv Detail & Related papers (2021-08-23T09:31:18Z)
Unifying Remote Sensing Image Retrieval and Classification with Robust Fine-tuning [3.6526118822907594]
We aim at unifying remote sensing image retrieval and classification with a new large-scale training and testing dataset, SF300. We show that our framework systematically achieves a boost of retrieval and classification performance on nine different datasets compared to an ImageNet pretrained baseline.
arXiv Detail & Related papers (2021-02-26T11:01:30Z)
Learning semantic Image attributes using Image recognition and knowledge graph embeddings [0.3222802562733786]
We propose a shared learning approach to learn semantic attributes of images by combining a knowledge graph embedding model with the recognized attributes of images. The proposed approach is a step towards bridging the gap between frameworks which learn from large amounts of data and frameworks which use a limited set of predicates to infer new knowledge.
arXiv Detail & Related papers (2020-09-12T15:18:48Z)
Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning. Current contrastive models are ineffective at localizing the foreground object. We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
DeepEMD: Differentiable Earth Mover's Distance for Few-Shot Learning [122.51237307910878]
We develop methods for few-shot image classification from a new perspective of optimal matching between image regions. We employ the Earth Mover's Distance (EMD) as a metric to compute a structural distance between dense image representations. To generate the important weights of elements in the formulation, we design a cross-reference mechanism.
arXiv Detail & Related papers (2020-03-15T08:13:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.