Learnable Expansion of Graph Operators for Multi-Modal Feature Fusion
- URL: http://arxiv.org/abs/2410.01506v4
- Date: Fri, 28 Feb 2025 13:52:40 GMT
- Title: Learnable Expansion of Graph Operators for Multi-Modal Feature Fusion
- Authors: Dexuan Ding, Lei Wang, Liyun Zhu, Tom Gedeon, Piotr Koniusz
- Abstract summary: In computer vision tasks, features often come from diverse representations, domains, and modalities. We shift from high-dimensional feature space to a lower-dimensional, interpretable graph space by constructing relationship graphs. We demonstrate the effectiveness of our graph-based fusion method on video anomaly detection.
- Score: 32.09145985103859
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In computer vision tasks, features often come from diverse representations, domains (e.g., indoor and outdoor), and modalities (e.g., text, images, and videos). Effectively fusing these features is essential for robust performance, especially with the availability of powerful pre-trained models like vision-language models. However, common fusion methods, such as concatenation, element-wise operations, and non-linear techniques, often fail to capture structural relationships, deep feature interactions, and suffer from inefficiency or misalignment of features across domains or modalities. In this paper, we shift from high-dimensional feature space to a lower-dimensional, interpretable graph space by constructing relationship graphs that encode feature relationships at different levels, e.g., clip, frame, patch, token, etc. To capture deeper interactions, we expand graphs through iterative graph relationship updates and introduce a learnable graph fusion operator to integrate these expanded relationships for more effective fusion. Our approach is relationship-centric, operates in a homogeneous space, and is mathematically principled, resembling element-wise relationship score aggregation via multilinear polynomials. We demonstrate the effectiveness of our graph-based fusion method on video anomaly detection, showing strong performance across multi-representational, multi-modal, and multi-domain feature fusion tasks.
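The pipeline described in the abstract (build relationship graphs, expand them, fuse them with learnable weights) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes cosine similarity as the relationship score, matrix powers as the "iterative graph relationship updates", and a weighted sum of element-wise products as the fusion operator. The function names `relationship_graph`, `expand_graph`, and `fuse_graphs`, and the fixed `weights` array standing in for learned parameters, are all hypothetical.

```python
import numpy as np


def relationship_graph(features):
    """Cosine-similarity graph between the rows of `features` (n_items x dim)."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)  # guard against zero rows
    return unit @ unit.T


def expand_graph(graph, num_hops):
    """Stack [A, A^2, ..., A^num_hops]; an assumed form of graph expansion."""
    powers = [graph]
    for _ in range(num_hops - 1):
        powers.append(powers[-1] @ graph)
    return np.stack(powers)


def fuse_graphs(expanded_a, expanded_b, weights):
    """Weighted sum of element-wise products over all pairs of expanded graphs.

    weights[p, q] scales A^(p+1) * B^(q+1) (element-wise), so each fused entry
    is a polynomial that is linear in each modality's relationship scores.
    """
    return np.einsum("pq,pij,qij->ij", weights, expanded_a, expanded_b)


# Two modalities with different feature dimensions for the same 8 items
# (e.g. 8 video clips). Random data, for illustration only.
rng = np.random.default_rng(0)
visual = rng.normal(size=(8, 512))
text = rng.normal(size=(8, 128))

expanded_visual = expand_graph(relationship_graph(visual), num_hops=3)
expanded_text = expand_graph(relationship_graph(text), num_hops=3)

# Placeholder for learnable parameters; in training these would be optimized.
weights = np.full((3, 3), 1.0 / 9.0)
fused = fuse_graphs(expanded_visual, expanded_text, weights)
print(fused.shape)
```

Both modalities end up as 8 x 8 graphs regardless of their original feature dimension, which is one way to read the abstract's claim that fusion happens in a homogeneous space.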
Related papers
- Multi-Relation Graph-Kernel Strengthen Network for Graph-Level Clustering [10.67474681549171]
We propose a novel Multi-Relation Graph-Kernel Strengthen Network for Graph-Level Clustering (MGSN).
MGSN constructs multi-relation graphs to capture diverse semantic relationships between nodes and graphs.
A relation-aware representation refinement strategy is designed, which adaptively aligns multi-relation information across views.
arXiv Detail & Related papers (2025-04-02T11:17:15Z) - KGIF: Optimizing Relation-Aware Recommendations with Knowledge Graph Information Fusion [16.971592142597544]
This study introduces a specialized framework designed to merge entity and relation embeddings explicitly through a tailored self-attention mechanism.
This explicit fusion enhances the interplay between user-item interactions and item-attribute relationships, providing a nuanced balance between user-centric and item-centric representations.
The contributions of this work include an innovative method for explicit information fusion, improved robustness for sparse knowledge graphs, and the ability to generate explainable recommendations through interpretable path visualization.
arXiv Detail & Related papers (2025-01-07T22:19:15Z) - From Primes to Paths: Enabling Fast Multi-Relational Graph Analysis [5.008498268411793]
Multi-relational networks capture intricate relationships in data and have diverse applications across fields such as biomedical, financial, and social sciences.
This work extends the Prime Adjacency Matrices framework, which employs prime numbers to represent distinct relations within a network uniquely.
arXiv Detail & Related papers (2024-11-17T18:43:01Z) - MS-IMAP -- A Multi-Scale Graph Embedding Approach for Interpretable Manifold Learning [1.8124328823188354]
This paper introduces a framework for multi-scale graph network embedding based on spectral graph wavelets.
We show that in Paley-Wiener spaces on graphs, the spectral graph wavelets operator provides greater flexibility and control over smoothness.
An additional key advantage of the proposed embedding is its ability to establish a correspondence between the embedding and input feature spaces.
arXiv Detail & Related papers (2024-06-04T20:48:33Z) - From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion [66.33467192279514]
We introduce a text-guided multi-modality image fusion method that leverages the high-level semantics from textual descriptions to integrate semantics from infrared and visible images.
Our method not only produces visually superior fusion results but also achieves a higher detection mAP over existing methods, achieving state-of-the-art results.
arXiv Detail & Related papers (2023-12-31T08:13:47Z) - Hierarchical Aggregations for High-Dimensional Multiplex Graph Embedding [7.271256448682229]
HMGE is a novel embedding method based on hierarchical aggregation for high-dimensional multiplex graphs.
We leverage mutual information between local patches and global summaries to train the model without supervision.
Detailed experiments on synthetic and real-world data illustrate the suitability of our approach to downstream supervised tasks.
arXiv Detail & Related papers (2023-12-28T05:39:33Z) - Efficient Graphics Representation with Differentiable Indirection [17.025494260380476]
We introduce differentiable indirection -- a novel learned primitive that employs differentiable multi-scale lookup tables.
In all cases, differentiable indirection seamlessly integrates into existing architectures, trains rapidly, and yields both versatile and efficient results.
arXiv Detail & Related papers (2023-09-12T16:05:45Z) - Multi-Spectral Image Stitching via Spatial Graph Reasoning [52.27796682972484]
We propose a spatial graph reasoning based multi-spectral image stitching method.
We embed multi-scale complementary features from the same view position into a set of nodes.
By introducing long-range coherence along spatial and channel dimensions, the complementarity of pixel relations and channel interdependencies aids in the reconstruction of aligned multi-view features.
arXiv Detail & Related papers (2023-07-31T15:04:52Z) - FMGNN: Fused Manifold Graph Neural Network [102.61136611255593]
Graph representation learning has been widely studied and demonstrated effectiveness in various graph tasks.
We propose the Fused Manifold Graph Neural Network (FMGNN), a novel GNN architecture that embeds graphs into different manifolds during training.
Our experiments demonstrate that FMGNN yields superior performance over strong baselines on the benchmarks of node classification and link prediction tasks.
arXiv Detail & Related papers (2023-04-03T15:38:53Z) - Transformer-based Dual Relation Graph for Multi-label Image Recognition [56.12543717723385]
We propose a novel Transformer-based Dual Relation learning framework.
We explore two aspects of correlation, i.e., structural relation graph and semantic relation graph.
Our approach achieves new state-of-the-art on two popular multi-label recognition benchmarks.
arXiv Detail & Related papers (2021-10-10T07:14:52Z) - Inter-domain Multi-relational Link Prediction [19.094154079752123]
When related graphs coexist, it is of great benefit to build a larger graph via integrating the smaller ones.
The integration requires predicting hidden relational connections between entities belonging to different graphs.
We propose a new approach to tackle the inter-domain link prediction problem by softly aligning the entity distributions between different domains.
arXiv Detail & Related papers (2021-06-11T05:10:31Z) - Mutual Graph Learning for Camouflaged Object Detection [31.422775969808434]
A major challenge is that intrinsic similarities between foreground objects and background surroundings make the features extracted by deep models indistinguishable.
We design a novel Mutual Graph Learning model, which generalizes the idea of conventional mutual learning from regular grids to the graph domain.
In contrast to most mutual learning approaches that use a shared function to model all between-task interactions, MGL is equipped with typed functions for handling different complementary relations.
arXiv Detail & Related papers (2021-04-03T10:14:39Z) - Multi-view Graph Learning by Joint Modeling of Consistency and Inconsistency [65.76554214664101]
Graph learning has emerged as a promising technique for multi-view clustering with its ability to learn a unified and robust graph from multiple views.
We propose a new multi-view graph learning framework, which for the first time simultaneously models multi-view consistency and multi-view inconsistency in a unified objective function.
Experiments on twelve multi-view datasets have demonstrated the robustness and efficiency of the proposed approach.
arXiv Detail & Related papers (2020-08-24T06:11:29Z) - GraphOpt: Learning Optimization Models of Graph Formation [72.75384705298303]
We propose an end-to-end framework that learns an implicit model of graph structure formation and discovers an underlying optimization mechanism.
The learned objective can serve as an explanation for the observed graph properties, thereby lending itself to transfer across different graphs within a domain.
GraphOpt poses link formation in graphs as a sequential decision-making process and solves it using a maximum entropy inverse reinforcement learning algorithm.
arXiv Detail & Related papers (2020-07-07T16:51:39Z) - Tensor Graph Convolutional Networks for Multi-relational and Robust Learning [74.05478502080658]
This paper introduces a tensor-graph convolutional network (TGCN) for scalable semi-supervised learning (SSL) from data associated with a collection of graphs that are represented by a tensor.
The proposed architecture achieves markedly improved performance relative to standard GCNs, copes with state-of-the-art adversarial attacks, and leads to remarkable SSL performance over protein-to-protein interaction networks.
arXiv Detail & Related papers (2020-03-15T02:33:21Z)