BrokenBind: Universal Modality Exploration beyond Dataset Boundaries
- URL: http://arxiv.org/abs/2602.06451v1
- Date: Fri, 06 Feb 2026 07:26:49 GMT
- Title: BrokenBind: Universal Modality Exploration beyond Dataset Boundaries
- Authors: Zhuo Huang, Runnan Chen, Bo Han, Gang Niu, Masashi Sugiyama, Tongliang Liu
- Abstract summary: We introduce BrokenBind, which focuses on binding modalities that come from different datasets. Under our framework, any two modalities can be bound together, free from dataset limitations.
- Score: 112.81381711545043
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-modal learning combines various modalities to provide a comprehensive understanding of real-world problems. A common strategy is to directly bind different modalities together in a specific joint embedding space. However, the capability of existing methods is restricted to the modalities present in the given dataset, so they are biased when generalizing to modalities that are absent in downstream tasks. As a result of this inflexibility, the viability of previous methods is seriously hindered by the cost of acquiring multi-modal datasets. In this paper, we introduce BrokenBind, which focuses on binding modalities that come from different datasets. To achieve this, BrokenBind simultaneously leverages multiple datasets containing the modalities of interest and one shared modality. Although the datasets do not correspond to each other due to distribution mismatch, we can capture their relationship to generate pseudo embeddings that fill in the missing modalities of interest, enabling flexible and generalized multi-modal learning. Under our framework, any two modalities can be bound together, free from dataset limitations, to achieve universal modality exploration. Further, to reveal the capability of our method, we study intensified scenarios where more than two datasets are needed for modality binding and show the effectiveness of BrokenBind in low-data regimes. Through extensive evaluation, we demonstrate the superiority of BrokenBind over well-known multi-modal baseline methods.
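The core mechanism sketched in the abstract, bridging two disjoint datasets through a modality they share, can be illustrated with a small example. The snippet below is only an illustration of that general idea, not the authors' implementation: the encoder outputs are random stand-ins, and the soft k-nearest-neighbour rule for generating pseudo embeddings is an assumption.

```python
# Hedged sketch: bridging two datasets through a shared (text) modality.
# Dataset A provides paired (image, text) embeddings; dataset B provides
# paired (audio, text) embeddings. The k-NN pseudo-embedding rule below is
# an illustrative assumption, not the paper's actual procedure.
import torch
import torch.nn.functional as F

def pseudo_image_embeddings(text_a, image_a, text_b, k=5):
    """For each dataset-B sample, build a pseudo image embedding by softly averaging
    the image embeddings of its k nearest dataset-A neighbours in the shared text space."""
    text_a = F.normalize(text_a, dim=-1)
    text_b = F.normalize(text_b, dim=-1)
    sim = text_b @ text_a.t()                  # (N_b, N_a) cosine similarity in text space
    topk = sim.topk(k, dim=-1)                 # nearest dataset-A texts for every B sample
    weights = topk.values.softmax(dim=-1)      # soft weights over the retrieved neighbours
    neighbours = image_a[topk.indices]         # (N_b, k, D) neighbour image embeddings
    return (weights.unsqueeze(-1) * neighbours).sum(dim=1)

# Toy usage: dataset A pairs images with text, dataset B pairs audio with text.
image_a, text_a = torch.randn(1000, 512), torch.randn(1000, 512)
audio_b, text_b = torch.randn(200, 512), torch.randn(200, 512)
pseudo_image_b = pseudo_image_embeddings(text_a, image_a, text_b)
print(pseudo_image_b.shape)  # torch.Size([200, 512]): every audio sample gains an image-side embedding
```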
Related papers
- Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models [63.032359320629105]
We introduce Unpaired Multimodal, a modality-agnostic training paradigm in which a single model alternately processes inputs from different modalities while sharing parameters across them. We show that using unpaired data from auxiliary modalities consistently improves downstream performance across diverse unimodal targets such as image and audio.
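A minimal sketch of what such an alternating, parameter-sharing loop could look like, assuming modality-specific input projections feeding a shared trunk; the layer sizes, classification heads, and alternation schedule are illustrative and not taken from the paper.

```python
# Hedged sketch of a modality-agnostic loop that alternates unpaired batches
# from two modalities through one shared trunk. All sizes are illustrative.
import torch
import torch.nn as nn

class SharedTrunkModel(nn.Module):
    def __init__(self, dim_img=768, dim_audio=128, hidden=256, n_classes=10):
        super().__init__()
        self.proj = nn.ModuleDict({           # modality-specific input projections
            "image": nn.Linear(dim_img, hidden),
            "audio": nn.Linear(dim_audio, hidden),
        })
        self.trunk = nn.Sequential(           # parameters shared across modalities
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, hidden)
        )
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x, modality):
        return self.head(self.trunk(self.proj[modality](x)))

model = SharedTrunkModel()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Unpaired batches: image and audio samples come from different datasets.
batches = [("image", torch.randn(32, 768), torch.randint(0, 10, (32,))),
           ("audio", torch.randn(32, 128), torch.randint(0, 10, (32,)))]
for step in range(4):                          # alternate modalities every step
    modality, x, y = batches[step % 2]
    loss = loss_fn(model(x, modality), y)
    opt.zero_grad(); loss.backward(); opt.step()
```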
arXiv Detail & Related papers (2025-10-09T17:32:23Z)
- Maximal Matching Matters: Preventing Representation Collapse for Robust Cross-Modal Retrieval [0.5999777817331317]
Cross-modal image-text retrieval is challenging because of the diverse possible associations between content from different modalities. Traditional methods learn a single-vector embedding to represent the semantics of each sample. Set-based approaches, which represent each sample with multiple embeddings, offer a promising alternative.
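A hedged sketch of one way to score a pair of embedding sets with a maximal one-to-one matching; the Hungarian solver, set size, and averaging are illustrative choices, not necessarily the paper's exact formulation.

```python
# Hedged sketch of set-based cross-modal similarity: each image/text sample is
# represented by a small set of embeddings, and similarity is the score of a
# maximal (one-to-one) matching between the two sets.
import numpy as np
from scipy.optimize import linear_sum_assignment

def set_similarity(img_set, txt_set):
    """img_set: (k, d), txt_set: (k, d) L2-normalised embedding sets."""
    sim = img_set @ txt_set.T                   # pairwise cosine similarities
    row, col = linear_sum_assignment(-sim)      # maximise total matched similarity
    return sim[row, col].mean()

rng = np.random.default_rng(0)
img_set = rng.normal(size=(4, 64)); img_set /= np.linalg.norm(img_set, axis=1, keepdims=True)
txt_set = rng.normal(size=(4, 64)); txt_set /= np.linalg.norm(txt_set, axis=1, keepdims=True)
print(set_similarity(img_set, txt_set))
```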
arXiv Detail & Related papers (2025-06-26T17:55:34Z)
- I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts [33.97906750476949]
We propose I2MoE (Interpretable Multimodal Interaction-aware Mixture of Experts) to enhance modality fusion. I2MoE explicitly models diverse multimodal interactions and provides interpretation at both the local and global level. I2MoE is flexible enough to be combined with different fusion techniques, consistently improves task performance, and provides interpretation across various real-world scenarios.
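A small sketch of the general pattern of interaction-aware mixture-of-experts fusion: unimodal experts plus a cross-modal expert, weighted by a learned gate whose per-sample weights can be inspected. The expert and gate definitions are assumptions for illustration; this is not the I2MoE architecture.

```python
# Hedged sketch of interaction-aware mixture-of-experts fusion; all modules
# here are illustrative stand-ins, not the published I2MoE design.
import torch
import torch.nn as nn

class InteractionMoE(nn.Module):
    def __init__(self, d=128, out=10):
        super().__init__()
        self.expert_a = nn.Linear(d, out)            # modality-A-only expert
        self.expert_b = nn.Linear(d, out)            # modality-B-only expert
        self.expert_ab = nn.Linear(2 * d, out)       # cross-modal interaction expert
        self.gate = nn.Linear(2 * d, 3)              # soft weights over the 3 experts

    def forward(self, a, b):
        ab = torch.cat([a, b], dim=-1)
        outs = torch.stack([self.expert_a(a), self.expert_b(b), self.expert_ab(ab)], dim=1)
        w = self.gate(ab).softmax(dim=-1)            # per-sample expert weights (interpretable)
        return (w.unsqueeze(-1) * outs).sum(dim=1), w

model = InteractionMoE()
logits, weights = model(torch.randn(8, 128), torch.randn(8, 128))
print(logits.shape, weights.shape)  # torch.Size([8, 10]) torch.Size([8, 3])
```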
arXiv Detail & Related papers (2025-05-25T15:34:29Z)
- Anchors Aweigh! Sail for Optimal Unified Multi-Modal Representations [22.45586503859047]
A unified representation space in multi-modal learning is essential for effectively integrating diverse data sources. Recent binding methods, such as ImageBind, typically rely on a single, fixed anchor modality for aligning multi-modal data. We argue for adaptive anchor binding methods, exemplified by our framework CentroBind.
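A sketch of the contrast between a fixed anchor and an adaptive one: here every modality embedding is pulled toward the per-sample centroid of all modalities with a symmetric InfoNCE loss. This illustrates the idea of adaptive anchors only; it is not the published CentroBind objective.

```python
# Hedged sketch of centroid-style adaptive-anchor binding (illustrative only).
import torch
import torch.nn.functional as F

def info_nce(x, y, tau=0.07):
    x, y = F.normalize(x, dim=-1), F.normalize(y, dim=-1)
    logits = x @ y.t() / tau
    targets = torch.arange(x.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

def centroid_binding_loss(modality_embs):
    """modality_embs: list of (B, D) embeddings, one per modality, same samples."""
    anchor = torch.stack(modality_embs, dim=0).mean(dim=0)   # adaptive per-sample anchor
    return sum(info_nce(e, anchor) for e in modality_embs) / len(modality_embs)

embs = [torch.randn(16, 256) for _ in range(3)]              # e.g. image / text / audio
print(centroid_binding_loss(embs))
```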
arXiv Detail & Related papers (2024-10-02T23:19:23Z)
- Mutual Information-based Representations Disentanglement for Unaligned Multimodal Language Sequences [25.73415065546444]
A key challenge with unaligned multimodal language sequences is integrating information from various modalities to obtain a refined multimodal joint representation.
We propose a Mutual Information-based Representations Disentanglement (MIRD) method for unaligned multimodal language sequences.
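A rough sketch, under strong assumptions, of mutual-information-driven disentanglement: a small critic gives a Donsker-Varadhan-style lower bound on I(shared; private), which the encoders would minimise while the critic maximises it. The critic, bound, and adversarial schedule are illustrative and differ from the actual MIRD estimator.

```python
# Hedged sketch: estimate and penalise mutual information between shared and
# modality-specific codes. Illustrative only; not the MIRD method itself.
import math
import torch
import torch.nn as nn

class MICritic(nn.Module):
    """Scores (shared, private) code pairs; higher on true pairs than shuffled ones."""
    def __init__(self, d=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * d, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, shared, private):
        joint = self.net(torch.cat([shared, private], dim=-1))        # true pairs
        shuffled = private[torch.randperm(private.size(0))]
        marginal = self.net(torch.cat([shared, shuffled], dim=-1))    # broken pairs
        # Donsker-Varadhan lower bound on I(shared; private).
        return joint.mean() - (torch.logsumexp(marginal, dim=0) - math.log(marginal.size(0)))

critic = MICritic()
shared, private = torch.randn(32, 64), torch.randn(32, 64)   # stand-ins for encoder outputs
mi_lower_bound = critic(shared, private)
# In training, the critic ascends this bound while the encoders descend it
# (plus the task loss), pushing shared and private codes apart informationally.
print(mi_lower_bound)
```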
arXiv Detail & Related papers (2024-09-19T02:12:26Z)
- Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models [6.610033827647869]
In real-world scenarios, consistently acquiring complete multimodal data presents significant challenges.
This often leads to the issue of missing modalities, where data for certain modalities are absent.
We propose a novel framework integrating parameter-efficient fine-tuning of unimodal pretrained models with a self-supervised joint-embedding learning method.
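A minimal sketch of predicting a missing modality at the embedding level, assuming frozen unimodal encoders have already produced paired embeddings; the predictor architecture, cosine loss, and sizes are illustrative rather than the paper's framework.

```python
# Hedged sketch: a small predictor fills in the missing modality's embedding
# from the available one. Sizes and losses are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

predictor = nn.Sequential(nn.Linear(512, 512), nn.GELU(), nn.Linear(512, 512))
opt = torch.optim.AdamW(predictor.parameters(), lr=1e-3)

# Paired embeddings from frozen unimodal encoders (random stand-ins here).
text_emb, image_emb = torch.randn(256, 512), torch.randn(256, 512)
for _ in range(5):
    pred = predictor(text_emb)
    loss = 1 - F.cosine_similarity(pred, image_emb, dim=-1).mean()   # align directions
    opt.zero_grad(); loss.backward(); opt.step()

# At test time, a sample that only has text gets a predicted image-side embedding.
only_text = torch.randn(1, 512)
pseudo_image = predictor(only_text)
```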
arXiv Detail & Related papers (2024-07-17T14:44:25Z)
- NativE: Multi-modal Knowledge Graph Completion in the Wild [51.80447197290866]
We propose a comprehensive framework, NativE, to achieve multi-modal knowledge graph completion (MMKGC) in the wild.
NativE proposes a relation-guided dual adaptive fusion module that enables adaptive fusion for arbitrary modalities.
We construct a new benchmark called WildKGC with five datasets to evaluate our method.
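A toy sketch of what "relation-guided adaptive fusion" can mean in multi-modal KG completion: the relation embedding gates the entity's per-modality features before scoring. Shapes and the gating rule are assumptions; this is not the NativE module.

```python
# Hedged sketch of relation-guided fusion for a multi-modal KG entity.
import torch
import torch.nn as nn

class RelationGuidedFusion(nn.Module):
    def __init__(self, d=200, n_modalities=3):
        super().__init__()
        self.gate = nn.Linear(d, n_modalities)        # relation -> modality weights

    def forward(self, modality_feats, rel_emb):
        """modality_feats: (B, M, D) per-modality entity features; rel_emb: (B, D)."""
        w = self.gate(rel_emb).softmax(dim=-1)         # (B, M), relation-dependent
        return (w.unsqueeze(-1) * modality_feats).sum(dim=1)

fusion = RelationGuidedFusion()
entity = torch.randn(8, 3, 200)                        # structure / image / text features
relation = torch.randn(8, 200)
print(fusion(entity, relation).shape)                  # torch.Size([8, 200])
```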
arXiv Detail & Related papers (2024-03-28T03:04:00Z)
- Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
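A minimal sketch of projecting the features of arbitrary modalities into one common space so that modality combinations unseen during training can still be fused; the dimensions and the mean-pooling fusion are illustrative choices, not the paper's module.

```python
# Hedged sketch: per-modality projections into a common space, fused by mean
# pooling so any subset of modalities can be combined at test time.
import torch
import torch.nn as nn

class CommonSpaceFusion(nn.Module):
    def __init__(self, dims=None, common=256):
        super().__init__()
        dims = dims or {"video": 1024, "audio": 128, "text": 768}   # illustrative sizes
        self.proj = nn.ModuleDict({m: nn.Linear(d, common) for m, d in dims.items()})

    def forward(self, feats):
        """feats: dict of modality name -> (B, D_m); any subset may be present."""
        projected = [self.proj[m](x) for m, x in feats.items()]
        return torch.stack(projected, dim=0).mean(dim=0)   # works for unseen combinations

model = CommonSpaceFusion()
# Trained on, say, video+audio; evaluated on an audio+text combination never seen jointly.
fused = model({"audio": torch.randn(4, 128), "text": torch.randn(4, 768)})
print(fused.shape)  # torch.Size([4, 256])
```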
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
- Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications [90.6849884683226]
We study the challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data.
Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds.
We show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
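The paper's bounds are not reproduced here, but a toy estimate of one classical interaction quantity helps make the object of study concrete: the interaction information I(X1;Y) + I(X2;Y) − I(X1,X2;Y), computed with plug-in entropies on discrete samples (negative values indicate synergy under this sign convention).

```python
# Hedged sketch: plug-in estimate of interaction information on discrete data.
# This illustrates the kind of quantity being bounded, not the paper's bounds.
import numpy as np
from collections import Counter

def entropy(samples):
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def mutual_info(a, b):
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))

def interaction_information(x1, x2, y):
    joint_x = list(zip(x1, x2))
    return mutual_info(x1, y) + mutual_info(x2, y) - mutual_info(joint_x, y)

rng = np.random.default_rng(0)
x1 = rng.integers(0, 2, 1000); x2 = rng.integers(0, 2, 1000)
y = x1 ^ x2                                      # XOR label: purely synergistic interaction
print(interaction_information(list(x1), list(x2), list(y)))  # negative => synergy dominates
```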
arXiv Detail & Related papers (2023-06-07T15:44:53Z)
- Align and Attend: Multimodal Summarization with Dual Contrastive Losses [57.83012574678091]
The goal of multimodal summarization is to extract the most important information from different modalities to form output summaries.
Existing methods fail to leverage the temporal correspondence between different modalities and ignore the intrinsic correlation between different samples.
We introduce Align and Attend Multimodal Summarization (A2Summ), a unified multimodal transformer-based model which can effectively align and attend to the multimodal input.
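A hedged sketch of a "dual" contrastive setup for paired video/text streams: one InfoNCE term contrasts whole samples across the batch and a second contrasts time steps within each pair. This illustrates the general pattern only and does not reproduce the A2Summ objective.

```python
# Hedged sketch of dual contrastive losses over paired video/text streams.
import torch
import torch.nn.functional as F

def nce(a, b, tau=0.1):
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / tau
    t = torch.arange(a.size(0))
    return 0.5 * (F.cross_entropy(logits, t) + F.cross_entropy(logits.t(), t))

video = torch.randn(8, 20, 256)   # (batch, time steps, dim) video segment features
text = torch.randn(8, 20, 256)    # temporally corresponding sentence features

inter_sample = nce(video.mean(dim=1), text.mean(dim=1))                        # across the batch
intra_sample = torch.stack([nce(v, t) for v, t in zip(video, text)]).mean()    # across time steps
loss = inter_sample + intra_sample
print(loss)
```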
arXiv Detail & Related papers (2023-03-13T17:01:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.