GeoMM: On Geodesic Perspective for Multi-modal Learning
- URL: http://arxiv.org/abs/2505.11216v1
- Date: Fri, 16 May 2025 13:12:41 GMT
- Title: GeoMM: On Geodesic Perspective for Multi-modal Learning
- Authors: Shibin Mei, Hang Wang, Bingbing Ni,
- Abstract summary: This paper introduces geodesic distance as a novel distance metric in multi-modal learning for the first time.<n>Our approach incorporates a comprehensive series of strategies to adapt geodesic distance for the current multimodal learning.
- Score: 55.41612200877861
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Geodesic distance serves as a reliable means of measuring distance in nonlinear spaces, and such nonlinear manifolds are prevalent in the current multimodal learning. In these scenarios, some samples may exhibit high similarity, yet they convey different semantics, making traditional distance metrics inadequate for distinguishing between positive and negative samples. This paper introduces geodesic distance as a novel distance metric in multi-modal learning for the first time, to mine correlations between samples, aiming to address the limitations of common distance metric. Our approach incorporates a comprehensive series of strategies to adapt geodesic distance for the current multimodal learning. Specifically, we construct a graph structure to represent the adjacency relationships among samples by thresholding distances between them and then apply the shortest-path algorithm to obtain geodesic distance within this graph. To facilitate efficient computation, we further propose a hierarchical graph structure through clustering and combined with incremental update strategies for dynamic status updates. Extensive experiments across various downstream tasks validate the effectiveness of our proposed method, demonstrating its capability to capture complex relationships between samples and improve the performance of multimodal learning models.
Related papers
- Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation [3.031375888004876]
This paper proposes a novel multimodal approach to Emotion Recognition in Conversation (ERC)<n>It constructs a long-distance graph neural network and a short-distance graph neural network to obtain multimodal features of distant and nearby utterances.<n> Experimental results on the IEMOCAP and MELD datasets demonstrate that our model outperforms existing benchmarks.
arXiv Detail & Related papers (2025-07-21T03:12:54Z) - RelCon: Relative Contrastive Learning for a Motion Foundation Model for Wearable Data [14.097517115921184]
We present RelCon, a novel self-supervised Relative Contrastive learning approach for training a motion foundation model from wearable accelerometry sensors.<n>First, a learnable distance measure is trained to capture motif similarity and domain-specific semantic information.<n>We are the first to show the generalizability of a foundation model with motion data from wearables across distinct evaluation tasks.
arXiv Detail & Related papers (2024-11-27T23:51:53Z) - Hierarchical Joint Graph Learning and Multivariate Time Series
Forecasting [0.16492989697868887]
We introduce a method of representing multivariate signals as nodes in a graph with edges indicating interdependency between them.
We leverage graph neural networks (GNN) and attention mechanisms to efficiently learn the underlying relationships within the time series data.
The effectiveness of our proposed model is evaluated across various real-world benchmark datasets designed for long-term forecasting tasks.
arXiv Detail & Related papers (2023-11-21T14:24:21Z) - Structured Optimal Variational Inference for Dynamic Latent Space Models [16.531262817315696]
We consider a latent space model for dynamic networks, where our objective is to estimate the pairwise inner products plus the intercept of the latent positions.
To balance posterior inference and computational scalability, we consider a structured mean-field variational inference framework.
arXiv Detail & Related papers (2022-09-29T22:10:42Z) - On the Versatile Uses of Partial Distance Correlation in Deep Learning [47.11577420740119]
This paper revisits a (less widely known) from statistics, called distance correlation (and its partial variant), designed to evaluate correlation between feature spaces of different dimensions.
We describe the steps necessary to carry out its deployment for large scale models.
This opens the door to a surprising array of applications ranging from conditioning one deep model w.r.t. another, learning disentangled representations as well as optimizing diverse models that would directly be more robust to adversarial attacks.
arXiv Detail & Related papers (2022-07-20T06:36:11Z) - Data-heterogeneity-aware Mixing for Decentralized Learning [63.83913592085953]
We characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes.
We propose a metric that quantifies the ability of a graph to mix the current gradients.
Motivated by our analysis, we propose an approach that periodically and efficiently optimize the metric.
arXiv Detail & Related papers (2022-04-13T15:54:35Z) - Consistency and Diversity induced Human Motion Segmentation [231.36289425663702]
We propose a novel Consistency and Diversity induced human Motion (CDMS) algorithm.
Our model factorizes the source and target data into distinct multi-layer feature spaces.
A multi-mutual learning strategy is carried out to reduce the domain gap between the source and target data.
arXiv Detail & Related papers (2022-02-10T06:23:56Z) - Deep Relational Metric Learning [84.95793654872399]
This paper presents a deep relational metric learning framework for image clustering and retrieval.
We learn an ensemble of features that characterizes an image from different aspects to model both interclass and intraclass distributions.
Experiments on the widely-used CUB-200-2011, Cars196, and Stanford Online Products datasets demonstrate that our framework improves existing deep metric learning methods and achieves very competitive results.
arXiv Detail & Related papers (2021-08-23T09:31:18Z) - Multiple Object Tracking with Correlation Learning [16.959379957515974]
We propose to exploit the local correlation module to model the topological relationship between targets and their surrounding environment.
Specifically, we establish dense correspondences of each spatial location and its context, and explicitly constrain the correlation volumes through self-supervised learning.
Our approach demonstrates the effectiveness of correlation learning with the superior performance and obtains state-of-the-art MOTA of 76.5% and IDF1 of 73.6% on MOT17.
arXiv Detail & Related papers (2021-04-08T06:48:02Z) - GELATO: Geometrically Enriched Latent Model for Offline Reinforcement
Learning [54.291331971813364]
offline reinforcement learning approaches can be divided into proximal and uncertainty-aware methods.
In this work, we demonstrate the benefit of combining the two in a latent variational model.
Our proposed metrics measure both the quality of out of distribution samples as well as the discrepancy of examples in the data.
arXiv Detail & Related papers (2021-02-22T19:42:40Z) - MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise CT
Prostate Segmentation via Online Sampling [66.01558025094333]
We propose a two-stage framework, with the first stage to quickly localize the prostate region and the second stage to precisely segment the prostate.
We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network.
Our method can effectively learn more representative voxel-level features compared with the conventional learning methods with cross-entropy or Dice loss.
arXiv Detail & Related papers (2020-05-15T10:37:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.