Efficient Generalization via Multimodal Co-Training under Data Scarcity and Distribution Shift
- URL: http://arxiv.org/abs/2510.07509v1
- Date: Wed, 08 Oct 2025 20:13:17 GMT
- Title: Efficient Generalization via Multimodal Co-Training under Data Scarcity and Distribution Shift
- Authors: Tianyu Bell Pan, Damon L. Woodard
- Abstract summary: Multimodal co-training is designed to enhance model generalization in situations where labeled data is limited. We examine the theoretical foundations of this framework, deriving conditions under which the use of unlabeled data leads to significant improvements in generalization. We establish a novel generalization bound that, for the first time in a multimodal co-training context, decomposes and quantifies the advantages gained from leveraging unlabeled multimodal data.
- Score: 0.6331016589903705
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper explores a multimodal co-training framework designed to enhance model generalization in situations where labeled data is limited and distribution shifts occur. We thoroughly examine the theoretical foundations of this framework, deriving conditions under which the use of unlabeled data and the promotion of agreement between classifiers for different modalities lead to significant improvements in generalization. We also present a convergence analysis that confirms the effectiveness of iterative co-training in reducing classification errors. In addition, we establish a novel generalization bound that, for the first time in a multimodal co-training context, decomposes and quantifies the distinct advantages gained from leveraging unlabeled multimodal data, promoting inter-view agreement, and maintaining conditional view independence. Our findings highlight the practical benefits of multimodal co-training as a structured approach to developing data-efficient and robust AI systems that can effectively generalize in dynamic, real-world environments. The theoretical foundations are examined in dialogue with, and in advance of, established co-training principles.
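The co-training loop the abstract analyzes — two per-modality classifiers that iteratively teach each other by pseudo-labeling a shared pool of unlabeled examples — can be sketched in miniature. The following toy uses nearest-centroid learners on scalar features; the names `co_train`, `train_centroids`, and `predict` are illustrative inventions, not the paper's algorithm, and the assumption is simply that each example carries two views:

```python
def train_centroids(data):
    # data: list of (feature, label) pairs with labels in {0, 1};
    # fit a nearest-centroid classifier by averaging per class.
    sums = {0: [0.0, 0], 1: [0.0, 0]}
    for x, y in data:
        sums[y][0] += x
        sums[y][1] += 1
    return {c: s / max(n, 1) for c, (s, n) in sums.items()}

def predict(centroids, x):
    # Return (label of nearest centroid, confidence = distance margin).
    d = {c: abs(x - m) for c, m in centroids.items()}
    label = min(d, key=d.get)
    return label, max(d.values()) - d[label]

def co_train(labeled_a, labeled_b, unlabeled, rounds=4):
    # labeled_a / labeled_b: labeled sets for each view's classifier;
    # unlabeled: list of (view_a_feature, view_b_feature) pairs.
    # Each round, each view pseudo-labels its most confident unlabeled
    # example for the OTHER view -- the core co-training exchange.
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        ca = train_centroids(labeled_a)
        best = max(pool, key=lambda p: predict(ca, p[0])[1])
        labeled_b.append((best[1], predict(ca, best[0])[0]))
        pool.remove(best)
        if not pool:
            break
        cb = train_centroids(labeled_b)
        best = max(pool, key=lambda p: predict(cb, p[1])[1])
        labeled_a.append((best[0], predict(cb, best[1])[0]))
        pool.remove(best)
    return train_centroids(labeled_a), train_centroids(labeled_b)
```

Note that `co_train` mutates the labeled sets it is given; the paper's contribution concerns when this pseudo-label exchange provably reduces error under view independence and agreement conditions, which the sketch does not model.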
Related papers
- CoRe-Fed: Bridging Collaborative and Representation Fairness via Federated Embedding Distillation [12.707158627881968]
Federated Learning (FL) has emerged as a key approach to enable collaborative intelligence through decentralized model training. We propose CoRe-Fed, a unified optimization framework that bridges collaborative and representation fairness. We show that CoRe-Fed improves both fairness and model performance over state-of-the-art baseline algorithms.
arXiv Detail & Related papers (2026-01-31T10:41:00Z)
- Federated Attention: A Distributed Paradigm for Collaborative LLM Inference over Edge Networks [63.541114376141735]
Large language models (LLMs) are proliferating rapidly at the edge, delivering intelligent capabilities across diverse application scenarios. However, their practical deployment in collaborative scenarios confronts fundamental challenges: privacy vulnerabilities, communication overhead, and computational bottlenecks. We propose Federated Attention (FedAttn), which integrates the federated paradigm into the self-attention mechanism.
arXiv Detail & Related papers (2025-11-04T15:14:58Z)
- Balanced Multimodal Learning via Mutual Information [1.9336815376402718]
We propose a novel unified framework designed to address modality imbalance by utilizing mutual information to quantify interactions between modalities. Our approach adopts a balanced multimodal learning strategy comprising two key stages: cross-modal knowledge distillation (KD) and a multitask-like training paradigm.
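For intuition, mutual information between two discrete variables — here a stand-in for the interaction between two modalities' labels — can be estimated directly from co-occurrence counts. A generic sketch (not the paper's estimator, which would operate on learned representations):

```python
import math
from collections import Counter

def mutual_information(pairs):
    # Empirical I(X; Y) in nats from a list of (x, y) samples:
    # I(X; Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) * p(y)) ).
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())
```

Perfectly coupled binary modalities give I = log 2 ≈ 0.693 nats, while independent ones give 0 — the kind of gap a modality-balancing strategy can exploit.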
arXiv Detail & Related papers (2025-11-02T15:58:05Z)
- SheafAlign: A Sheaf-theoretic Framework for Decentralized Multimodal Alignment [23.996765202358223]
SheafAlign is a sheaf-theoretic framework for decentralized multimodal alignment. SheafAlign overcomes the limitations of prior methods by not requiring mutual redundancy among all modalities. Experiments on multimodal sensing datasets show superior zero-shot generalization, cross-modal alignment, and robustness to missing modalities.
arXiv Detail & Related papers (2025-10-23T13:27:24Z)
- Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence [83.15764564701706]
We propose a novel framework that performs vision-language alignment by integrating Cauchy-Schwarz divergence with mutual information. We find that the CS divergence seamlessly addresses InfoNCE's alignment-uniformity conflict and serves a complementary role to InfoNCE. Experiments on text-to-image generation and cross-modality retrieval tasks demonstrate the effectiveness of our method on vision-language alignment.
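For discrete distributions, the Cauchy-Schwarz divergence has the closed form D_CS(p, q) = -log( ⟨p, q⟩² / (‖p‖² ‖q‖²) ), which is non-negative and zero exactly when p and q are proportional (by the Cauchy-Schwarz inequality). A minimal sketch of the quantity itself — the paper applies it to vision-language feature distributions, not raw probability vectors:

```python
import math

def cauchy_schwarz_divergence(p, q):
    # D_CS(p, q) = -log( <p,q>^2 / (||p||^2 * ||q||^2) ) for two
    # equal-length discrete distributions; symmetric in p and q.
    pq = sum(a * b for a, b in zip(p, q))
    pp = sum(a * a for a in p)
    qq = sum(b * b for b in q)
    return -math.log(pq * pq / (pp * qq))
```

Identical distributions score 0, and the score grows as the distributions concentrate on disjoint outcomes, which is what makes it usable as an alignment objective.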
arXiv Detail & Related papers (2025-02-24T10:29:15Z)
- Towards the Generalization of Multi-view Learning: An Information-theoretical Analysis [28.009990407017618]
We develop information-theoretic generalization bounds for multi-view learning. We derive novel data-dependent bounds under both leave-one-out and supersample settings. In the interpolating regime, we further establish a fast-rate bound for multi-view learning.
arXiv Detail & Related papers (2025-01-28T07:47:19Z)
- A Statistical Theory of Contrastive Pre-training and Multimodal Generative AI [18.974297347310287]
Multi-modal generative AI systems rely on contrastive pre-training to learn representations across different modalities. This paper develops a theoretical framework to explain the success of contrastive pre-training in downstream tasks.
arXiv Detail & Related papers (2025-01-08T17:47:06Z)
- Deriving Causal Order from Single-Variable Interventions: Guarantees & Algorithm [14.980926991441345]
We show that the causal order can be effectively extracted from datasets containing interventional data under realistic assumptions about the data distribution. We introduce a novel variant of interventional faithfulness, which relies on comparisons between the marginal distributions of each variable across observational and interventional settings. We also introduce Intersort, an algorithm designed to infer the causal order from datasets containing large numbers of single-variable interventions.
arXiv Detail & Related papers (2024-05-28T16:07:17Z)
- Unifying Perspectives: Plausible Counterfactual Explanations on Global, Group-wise, and Local Levels [2.494108084558292]
We propose a gradient-based optimization method for differentiable models that generates Counterfactual Explanations in a unified manner. In particular, we enhance group-wise counterfactual (GWCF) generation by combining instance grouping and counterfactual generation into a single efficient process. Our results demonstrate the method's effectiveness in balancing validity, proximity, and plausibility while optimizing group granularity.
arXiv Detail & Related papers (2024-05-27T20:32:09Z)
- Generalizable Heterogeneous Federated Cross-Correlation and Instance Similarity Learning [60.058083574671834]
This paper presents FCCL+, a novel federated correlation and similarity learning method with non-target distillation. For the heterogeneity issue, we leverage irrelevant unlabeled public data for communication. For catastrophic forgetting in the local updating stage, FCCL+ introduces Federated Non-Target Distillation.
arXiv Detail & Related papers (2023-09-28T09:32:27Z)
- Enhancing multimodal cooperation via sample-level modality valuation [10.677997431505815]
We introduce a sample-level modality valuation metric to evaluate the contribution of each modality for each sample.
Via modality valuation, we observe that the modality discrepancy can indeed differ at the sample level, beyond the global contribution discrepancy at the dataset level. Our method captures the fine-grained uni-modal contributions and achieves considerable improvement.
arXiv Detail & Related papers (2023-09-12T14:16:34Z)
- Networked Communication for Decentralised Agents in Mean-Field Games [59.01527054553122]
We introduce networked communication to the mean-field game framework. We prove that our architecture has sample guarantees bounded between those of the centralised- and independent-learning cases. We show that our networked approach has significant advantages over both alternatives in terms of robustness to update failures and to changes in population size.
arXiv Detail & Related papers (2023-06-05T10:45:39Z)
- Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations.
Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z)
- Modeling Multiple Views via Implicitly Preserving Global Consistency and Local Complementarity [61.05259660910437]
We propose a global consistency and complementarity network (CoCoNet) to learn representations from multiple views.
On the global stage, we posit that the crucial knowledge is implicitly shared among views, and that enhancing the encoder to capture such knowledge can improve the discriminability of the learned representations. On the local stage, we propose a complementarity factor that fuses cross-view discriminative knowledge and guides the encoders to learn not only view-wise discriminability but also cross-view complementary information.
arXiv Detail & Related papers (2022-09-16T09:24:00Z)
- Edge-assisted Democratized Learning Towards Federated Analytics [67.44078999945722]
We show the hierarchical learning structure of the proposed edge-assisted democratized learning mechanism, namely Edge-DemLearn.
We also validate Edge-DemLearn as a flexible model training mechanism to build a distributed control and aggregation methodology across regions.
arXiv Detail & Related papers (2020-12-01T11:46:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences of its use.