Training Multimodal Systems for Classification with Multiple Objectives
- URL: http://arxiv.org/abs/2008.11450v1
- Date: Wed, 26 Aug 2020 09:05:40 GMT
- Title: Training Multimodal Systems for Classification with Multiple Objectives
- Authors: Jason Armitage, Shramana Thakur, Rishi Tripathi, Jens Lehmann, and Maria Maleshkova
- Abstract summary: Adapting architectures to learn from multiple modalities creates the potential to learn rich representations of the world.
Current multimodal systems deliver only marginal improvements over unimodal approaches.
This research introduces a second objective over the multimodal fusion process learned with variational inference.
- Score: 6.888664946634335
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We learn about the world from a diverse range of sensory information.
Automated systems lack this ability as investigation has centred on processing
information presented in a single form. Adapting architectures to learn from
multiple modalities creates the potential to learn rich representations of the
world - but current multimodal systems only deliver marginal improvements over
unimodal approaches. Neural networks learn sampling noise during training, with
the result that performance on unseen data is degraded. This research
introduces a second objective over the multimodal fusion process learned with
variational inference. Regularisation methods are implemented in the inner
training loop to control variance and the modular structure stabilises
performance as additional neurons are added to layers. This framework is
evaluated on a multilabel classification task with textual and visual inputs to
demonstrate the potential for multiple objectives and probabilistic methods to
lower variance and improve generalisation.
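The abstract's core idea (a classification loss plus a second, variational objective over the fused multimodal representation) can be sketched as follows. This is an illustrative forward pass only, not the paper's implementation: the fusion by projection-and-sum, the toy dimensions, and the objective weight `beta` are all assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse(text_feat, image_feat, W_t, W_i):
    # Project each modality into a shared space and sum (simple fusion).
    return text_feat @ W_t + image_feat @ W_i

def variational_head(h, W_mu, W_logvar):
    # Second objective: treat the fused representation as parameters of a
    # Gaussian posterior q(z|x) and regularise it towards N(0, I).
    mu, logvar = h @ W_mu, h @ W_logvar
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)  # reparameterisation
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1)
    return z, kl

def multilabel_loss(z, W_out, y):
    # Sigmoid + binary cross-entropy, one independent sigmoid per label.
    logits = z @ W_out
    p = 1.0 / (1.0 + np.exp(-logits))
    return -(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9)).sum(axis=1)

# Toy dimensions: 4 samples, 16-d text, 32-d image, 8-d shared space, 5 labels.
text = rng.standard_normal((4, 16))
image = rng.standard_normal((4, 32))
y = rng.integers(0, 2, size=(4, 5)).astype(float)
W_t = 0.1 * rng.standard_normal((16, 8))
W_i = 0.1 * rng.standard_normal((32, 8))
W_mu = 0.1 * rng.standard_normal((8, 8))
W_lv = 0.01 * rng.standard_normal((8, 8))
W_out = 0.1 * rng.standard_normal((8, 5))

h = fuse(text, image, W_t, W_i)
z, kl = variational_head(h, W_mu, W_lv)
beta = 0.1  # weight on the second (variational) objective; a free hyperparameter
loss = (multilabel_loss(z, W_out, y) + beta * kl).mean()
```

During training, the KL term penalises posteriors that drift far from the prior, which is one way the variance-control the abstract describes can be realised.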
Related papers
- A Classifier-Free Incremental Learning Framework for Scalable Medical Image Segmentation [6.591403935303867]
We introduce a novel segmentation paradigm enabling the segmentation of a variable number of classes within a single classifier-free network.
This network is trained using contrastive learning and produces discriminative feature representations that facilitate straightforward interpretation.
We demonstrate the flexibility of our method in handling varying class numbers within a unified network and its capacity for incremental learning.
arXiv Detail & Related papers (2024-05-25T19:05:07Z)
- Beyond Unimodal Learning: The Importance of Integrating Multiple Modalities for Lifelong Learning [23.035725779568587]
We study the role and interactions of multiple modalities in mitigating forgetting in deep neural networks (DNNs)
Our findings demonstrate that leveraging multiple views and complementary information from multiple modalities enables the model to learn more accurate and robust representations.
We propose a method for integrating and aligning the information from different modalities by utilizing the relational structural similarities between the data points in each modality.
arXiv Detail & Related papers (2024-05-04T22:02:58Z)
- Reinforcement Learning Based Multi-modal Feature Fusion Network for Novel Class Discovery [47.28191501836041]
In this paper, we employ a Reinforcement Learning framework to simulate the cognitive processes of humans.
We also deploy a Member-to-Leader Multi-Agent framework to extract and fuse features from multi-modal information.
We demonstrate the performance of our approach in both the 3D and 2D domains by employing the OS-MN40, OS-MN40-Miss, and Cifar10 datasets.
arXiv Detail & Related papers (2023-08-26T07:55:32Z)
- Learning Unseen Modality Interaction [54.23533023883659]
Multimodal learning assumes all modality combinations of interest are available during training to learn cross-modal correspondences.
We pose the problem of unseen modality interaction and introduce a first solution.
It exploits a module that projects the multidimensional features of different modalities into a common space with rich information preserved.
arXiv Detail & Related papers (2023-06-22T10:53:10Z)
- i-Code: An Integrative and Composable Multimodal Learning Framework [99.56065789066027]
i-Code is a self-supervised pretraining framework where users may flexibly combine the modalities of vision, speech, and language into unified and general-purpose vector representations.
The entire system is pretrained end-to-end with new objectives including masked modality unit modeling and cross-modality contrastive learning.
Experimental results demonstrate how i-Code can outperform state-of-the-art techniques on five video understanding tasks and the GLUE NLP benchmark, improving by as much as 11%.
arXiv Detail & Related papers (2022-05-03T23:38:50Z)
- Multi-view Information Bottleneck Without Variational Approximation [34.877573432746246]
We extend the information bottleneck principle to a supervised multi-view learning scenario.
We use the recently proposed matrix-based Rényi's $\alpha$-order entropy functional to optimize the resulting objective.
Empirical results in both synthetic and real-world datasets suggest that our method enjoys improved robustness to noise and redundant information in each view.
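The matrix-based Rényi entropy this entry relies on has a compact closed form: for a trace-normalised Gram matrix $A$ with eigenvalues $\lambda_i$, $S_\alpha(A) = \frac{1}{1-\alpha}\log_2 \sum_i \lambda_i^\alpha$. A minimal sketch of that functional follows; the RBF kernel and bandwidth `sigma` are illustrative choices, not fixed by the paper.

```python
import numpy as np

def matrix_renyi_entropy(X, alpha=2.0, sigma=1.0):
    """Matrix-based Renyi alpha-order entropy of a sample X (n x d):
    S_alpha(A) = 1/(1-alpha) * log2(sum_i lambda_i(A)^alpha),
    where A is the trace-normalised kernel Gram matrix of X."""
    n = X.shape[0]
    # Gaussian (RBF) kernel Gram matrix.
    sq = np.sum(X**2, axis=1)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / (2 * sigma**2))
    # Normalise so that tr(A) = 1: A_ij = K_ij / (n * sqrt(K_ii * K_jj)).
    d = np.sqrt(np.diag(K))
    A = K / (n * np.outer(d, d))
    lam = np.clip(np.linalg.eigvalsh(A), 0.0, None)  # eigenvalues sum to 1
    return np.log2(np.sum(lam**alpha)) / (1.0 - alpha)

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 5))
H = matrix_renyi_entropy(X, alpha=1.01)  # bounded by [0, log2(n)]
```

Because the quantity is computed directly from kernel eigenvalues, no variational approximation of a density is needed, which is the point the entry's title makes.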
arXiv Detail & Related papers (2022-04-22T06:48:04Z)
- Learning Prototype-oriented Set Representations for Meta-Learning [85.19407183975802]
Learning from set-structured data is a fundamental problem that has recently attracted increasing attention.
This paper provides a novel optimal transport based way to improve existing summary networks.
We further instantiate it to the cases of few-shot classification and implicit meta generative modeling.
arXiv Detail & Related papers (2021-10-18T09:49:05Z)
- Multimodal Clustering Networks for Self-supervised Learning from Unlabeled Videos [69.61522804742427]
This paper proposes a self-supervised training framework that learns a common multimodal embedding space.
We extend the concept of instance-level contrastive learning with a multimodal clustering step to capture semantic similarities across modalities.
The resulting embedding space enables retrieval of samples across all modalities, even from unseen datasets and different domains.
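The instance-level contrastive step this entry builds on can be sketched with a standard cross-modal InfoNCE loss: the i-th sample's embeddings from two modalities form the positive pair, and all other pairings in the batch act as negatives. This is a generic sketch of that objective, not this paper's code; the temperature `tau` and toy embedding sizes are assumptions.

```python
import numpy as np

def cross_modal_infonce(emb_a, emb_b, tau=0.07):
    # Instance-level contrastive loss between two modalities: row i of emb_a
    # and row i of emb_b are a positive pair; other rows are negatives.
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    logits = (a @ b.T) / tau                      # cosine similarities / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # NLL of each matching pair

rng = np.random.default_rng(0)
anchor = rng.standard_normal((8, 64))
# Embeddings close to their anchors should score a lower loss than random ones.
paired = anchor + 0.1 * rng.standard_normal((8, 64))
unrelated = rng.standard_normal((8, 64))
```

The clustering step the entry describes then groups these embeddings across modalities so that semantically similar instances, not just identical ones, attract each other.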
arXiv Detail & Related papers (2021-04-26T15:55:01Z)
- Meta-learning One-class Classifiers with Eigenvalue Solvers for Supervised Anomaly Detection [55.888835686183995]
We propose a neural network-based meta-learning method for supervised anomaly detection.
We experimentally demonstrate that the proposed method achieves better performance than existing anomaly detection and few-shot learning methods.
arXiv Detail & Related papers (2021-03-01T01:43:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.