A Semantic-Enhanced Heterogeneous Graph Learning Method for Flexible Objects Recognition
- URL: http://arxiv.org/abs/2503.22079v1
- Date: Fri, 28 Mar 2025 01:55:43 GMT
- Title: A Semantic-Enhanced Heterogeneous Graph Learning Method for Flexible Objects Recognition
- Authors: Kunshan Yang, Wenwei Luo, Yuguo Hu, Jiafu Yan, Mengmeng Jing, Lin Zuo
- Abstract summary: We propose a semantic-enhanced heterogeneous graph learning method for flexible objects recognition. An adaptive scanning module is employed to extract semantic context, facilitating the matching of flexible objects with varying shapes and sizes. A graph generation module aggregates global visual and local semantic node features, improving the recognition of flexible objects.
- Score: 3.5624857747396814
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Flexible objects recognition remains a significant challenge due to the objects' inherently diverse shapes and sizes, translucent attributes, and subtle inter-class differences. Graph-based models, such as graph convolution networks and graph vision models, are promising for flexible objects recognition because they can capture variable relations within flexible objects. These methods, however, often focus on global visual relationships or fail to align semantic and visual information. To alleviate these limitations, we propose a semantic-enhanced heterogeneous graph learning method. First, an adaptive scanning module is employed to extract discriminative semantic context, facilitating the matching of flexible objects with varying shapes and sizes while aligning semantic and visual nodes to enhance cross-modal feature correlation. Second, a heterogeneous graph generation module aggregates global visual and local semantic node features, improving the recognition of flexible objects. Additionally, we introduce FSCW, a large-scale flexible dataset curated from existing sources. We validate our method through extensive experiments on flexible datasets (FDA and FSCW) and challenge benchmarks (CIFAR-100 and ImageNet-Hard), demonstrating competitive performance.
Related papers
- Semantic-Aligned Learning with Collaborative Refinement for Unsupervised VI-ReID [82.12123628480371]
Unsupervised person re-identification (USL-VI-ReID) seeks to match pedestrian images of the same individual across different modalities without human annotations for model learning.
Previous methods unify pseudo-labels of cross-modality images through label association algorithms and then design a contrastive learning framework for global feature learning.
We propose a Semantic-Aligned Learning with Collaborative Refinement (SALCR) framework, which builds up an objective for specific fine-grained patterns emphasized by each modality.
arXiv Detail & Related papers (2025-04-27T13:58:12Z) - Semantic-Spatial Feature Fusion with Dynamic Graph Refinement for Remote Sensing Image Captioning [11.015244501780078]
This paper presents a semantic-spatial feature fusion with dynamic graph refinement (SFDR) method.
The proposed SFDR method significantly enhances the quality of the generated descriptions.
Experimental results on three benchmark datasets demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2025-03-30T14:14:41Z) - Improving vision-language alignment with graph spiking hybrid Networks [10.88584928028832]
This paper proposes a comprehensive visual semantic representation module, necessitating the utilization of panoptic segmentation to generate fine-grained semantic features. We propose a novel Graph Spiking Hybrid Network (GSHN) that integrates the complementary advantages of Spiking Neural Networks (SNNs) and Graph Attention Networks (GATs) to encode visual semantic information.
arXiv Detail & Related papers (2025-01-31T11:55:17Z) - Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z) - A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well-known that a lack of data diversity will impair model performance.
We propose a simple yet effective data augmentation approach by leveraging advancements in generative models.
Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
arXiv Detail & Related papers (2024-08-01T07:40:00Z) - Flexible ViG: Learning the Self-Saliency for Flexible Object Recognition [3.5624857747396814]
Recognizing flexible objects poses significant challenges due to their inherently diverse shapes and sizes, translucent attributes, ambiguous boundaries, and subtle inter-class differences.
We propose the Flexible Vision Graph Neural Network (FViG) to optimize the self-saliency and thereby improve the discrimination of the representations for flexible objects.
arXiv Detail & Related papers (2024-06-06T08:55:06Z) - AMOSL: Adaptive Modality-wise Structure Learning in Multi-view Graph Neural Networks For Enhanced Unified Representation [22.84527318463151]
Multi-view Graph Neural Networks (MVGNNs) excel at leveraging diverse modalities for learning object representation.
Existing methods assume identical local topology structures across modalities that overlook real-world discrepancies.
We propose adaptive modality-wise structure learning (AMoSL) to address these issues.
arXiv Detail & Related papers (2024-06-04T14:24:30Z) - Visual Commonsense based Heterogeneous Graph Contrastive Learning [79.22206720896664]
We propose a heterogeneous graph contrastive learning method to better finish the visual reasoning task.
Our method is designed in a plug-and-play way, so that it can be quickly and easily combined with a wide range of representative methods.
arXiv Detail & Related papers (2023-11-11T12:01:18Z) - Shape-Erased Feature Learning for Visible-Infrared Person
Re-Identification [90.39454748065558]
Body shape is one of the significant modality-shared cues for VI-ReID.
We propose a shape-erased feature learning paradigm that decorrelates modality-shared features in two subspaces.
Experiments on SYSU-MM01, RegDB, and HITSZ-VCM datasets demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-04-09T10:22:10Z) - Learning the Implicit Semantic Representation on Graph-Structured Data [57.670106959061634]
Existing representation learning methods in graph convolutional networks are mainly designed by describing the neighborhood of each node as a perceptual whole.
We propose Semantic Graph Convolutional Networks (SGCN), which explore the implicit semantics by learning latent semantic-paths in graphs.
arXiv Detail & Related papers (2021-01-16T16:18:43Z) - Tensor Graph Convolutional Networks for Multi-relational and Robust Learning [74.05478502080658]
This paper introduces a tensor-graph convolutional network (TGCN) for scalable semi-supervised learning (SSL) from data associated with a collection of graphs that are represented by a tensor.
The proposed architecture achieves markedly improved performance relative to standard GCNs, copes with state-of-the-art adversarial attacks, and leads to remarkable SSL performance over protein-to-protein interaction networks.
arXiv Detail & Related papers (2020-03-15T02:33:21Z)