Characterizing and overcoming the greedy nature of learning in
multi-modal deep neural networks
- URL: http://arxiv.org/abs/2202.05306v1
- Date: Thu, 10 Feb 2022 20:11:21 GMT
- Title: Characterizing and overcoming the greedy nature of learning in
multi-modal deep neural networks
- Authors: Nan Wu, Stanisław Jastrzębski, Kyunghyun Cho, Krzysztof J. Geras
- Abstract summary: We show that due to the greedy nature of learning in deep neural networks, models tend to rely on just one modality while under-fitting the other modalities.
We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning.
- Score: 62.48782506095565
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We hypothesize that due to the greedy nature of learning in multi-modal deep
neural networks, these models tend to rely on just one modality while
under-fitting the other modalities. Such behavior is counter-intuitive and
hurts the models' generalization, as we observe empirically. To estimate the
model's dependence on each modality, we compute the gain in accuracy when
the model has access to it in addition to another modality. We refer to this
gain as the conditional utilization rate. In the experiments, we consistently
observe an imbalance in conditional utilization rates between modalities,
across multiple tasks and architectures. Since the conditional utilization rate
cannot be computed efficiently during training, we introduce a proxy for it
based on the pace at which the model learns from each modality, which we refer
to as the conditional learning speed. We propose an algorithm to balance the
conditional learning speeds between modalities during training and demonstrate
that it indeed addresses the issue of greedy learning. The proposed algorithm
improves the model's generalization on three datasets: Colored MNIST, Princeton
ModelNet40, and NVIDIA Dynamic Hand Gesture.
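To make the key quantity concrete, below is a minimal Python sketch of the conditional utilization rate as the abstract defines it; the `evaluate` interface and the modality names are hypothetical, not taken from the paper's code.

```python
# Sketch of the conditional utilization rate: the accuracy gained when
# the model sees modality A in addition to modality B. `evaluate` is a
# hypothetical callable returning validation accuracy given the subset
# of modalities the model may use (the others masked or zeroed out).

def conditional_utilization_rate(evaluate, modality_a, modality_b):
    """u(A | B) = acc({A, B}) - acc({B})."""
    return evaluate({modality_a, modality_b}) - evaluate({modality_b})

# Greedy learning shows up as an imbalance between the two directions,
# e.g. u("rgb" | "depth") far exceeding u("depth" | "rgb").
```

Because this quantity requires extra evaluation passes per modality subset, the paper tracks the cheaper conditional learning speed during training and balances that instead.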
Related papers
- Dynamic Post-Hoc Neural Ensemblers [55.15643209328513]
In this study, we explore employing neural networks as ensemble methods.
Motivated by the risk of learning low-diversity ensembles, we propose regularizing the model by randomly dropping base model predictions.
We demonstrate that this approach lower-bounds the diversity within the ensemble, reducing overfitting and improving generalization.
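The summary only names the mechanism, but one plausible reading is dropout applied to base-model predictions before aggregation; the NumPy sketch below illustrates that reading, with all shapes and defaults invented for illustration.

```python
import numpy as np

def aggregate_with_prediction_dropout(base_preds, drop_prob=0.3, rng=None):
    """Average base-model predictions, randomly dropping some members.

    base_preds: (n_models, n_samples, n_classes) array of predictions.
    Dropping members at random during training discourages the ensembler
    from collapsing onto a low-diversity subset of base models.
    """
    rng = rng or np.random.default_rng()
    keep = rng.random(base_preds.shape[0]) >= drop_prob
    if not keep.any():                     # always retain at least one model
        keep[rng.integers(base_preds.shape[0])] = True
    return base_preds[keep].mean(axis=0)
```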
arXiv Detail & Related papers (2024-10-06T15:25:39Z)
- Combating Missing Modalities in Egocentric Videos at Test Time [92.38662956154256]
Real-world applications often face challenges with incomplete modalities due to privacy concerns, efficiency needs, or hardware issues.
We propose a novel approach to address this issue at test time without requiring retraining.
MiDl represents the first self-supervised, online solution for handling missing modalities exclusively at test time.
arXiv Detail & Related papers (2024-04-23T16:01:33Z)
- Discrete Neural Algorithmic Reasoning [18.497863598167257]
We propose to force neural reasoners to maintain the execution trajectory as a combination of finite predefined states.
Trained with supervision on the algorithm's state transitions, such models are able to perfectly align with the original algorithm.
arXiv Detail & Related papers (2024-02-18T16:03:04Z)
- A Dynamical Model of Neural Scaling Laws [79.59705237659547]
We analyze a random feature model trained with gradient descent as a solvable model of network training and generalization.
Our theory shows how the gap between training and test loss can gradually build up over time due to repeated reuse of data.
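For a concrete reference point, a random feature model in this spirit freezes a random first layer and trains only the linear readout with gradient descent; the sketch below uses illustrative sizes and a toy target, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 200, 20, 500                    # samples, input dim, feature count

X = rng.normal(size=(n, d))
y = np.sign(X[:, 0])                      # toy target for illustration
W = rng.normal(size=(d, p)) / np.sqrt(d)  # frozen random projection
phi = np.maximum(X @ W, 0.0)              # fixed ReLU features

a = np.zeros(p)                           # trainable linear readout
for _ in range(1000):
    residual = phi @ a - y
    a -= 0.01 * phi.T @ residual / n      # gradient descent on the MSE
```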
arXiv Detail & Related papers (2024-02-02T01:41:38Z)
- Dynamic Tensor Decomposition via Neural Diffusion-Reaction Processes [24.723536390322582]
Tensor decomposition is an important tool for multiway data analysis.
We propose Dynamic EMbedIngs fOr dynamic Tensor dEcomposition (DEMOTE).
We show the advantage of our approach in both simulation study and real-world applications.
arXiv Detail & Related papers (2023-10-30T15:49:45Z)
- Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space while preserving rich information.
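A module like that might look roughly like the PyTorch sketch below; the per-modality linear heads, modality names, and dimensions are assumptions rather than the paper's actual architecture.

```python
import torch.nn as nn

class CommonSpaceProjector(nn.Module):
    """Project per-modality features into one shared space.

    Hypothetical sketch: each modality gets its own linear head mapping
    its native feature dimension into a common dimension, so features
    from different (even unseen) modality combinations can be fused.
    """
    def __init__(self, modality_dims, common_dim=256):
        super().__init__()
        self.heads = nn.ModuleDict(
            {name: nn.Linear(dim, common_dim) for name, dim in modality_dims.items()}
        )

    def forward(self, features):
        # features: dict of modality name -> (batch, native_dim) tensor
        return {name: self.heads[name](x) for name, x in features.items()}

# e.g. CommonSpaceProjector({"rgb": 512, "audio": 128})
```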
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
- The Underlying Correlated Dynamics in Neural Training [6.385006149689549]
Training of neural networks is a computationally intensive task.
We propose a model based on the correlation of the parameters' dynamics, which dramatically reduces the dimensionality.
This representation enhances the understanding of the underlying training dynamics and can pave the way for designing better acceleration techniques.
arXiv Detail & Related papers (2022-12-18T08:34:11Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- A Theory of Universal Learning [26.51949485387526]
We show that there are only three possible rates of universal learning.
We show that the learning curves of any given concept class decay at either an exponential, a linear, or an arbitrarily slow rate.
arXiv Detail & Related papers (2020-11-09T15:10:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.