Multimodal Fusion Interactions: A Study of Human and Automatic
Quantification
- URL: http://arxiv.org/abs/2306.04125v2
- Date: Mon, 30 Oct 2023 18:06:46 GMT
- Title: Multimodal Fusion Interactions: A Study of Human and Automatic
Quantification
- Authors: Paul Pu Liang, Yun Cheng, Ruslan Salakhutdinov, Louis-Philippe Morency
- Abstract summary: We study how humans annotate two categorizations of multimodal interactions.
We propose a method to automatically convert annotations of partial and counterfactual labels to information decomposition.
- Score: 116.55145773123132
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In order to perform multimodal fusion of heterogeneous signals, we need to
understand their interactions: how each modality individually provides
information useful for a task and how this information changes in the presence
of other modalities. In this paper, we perform a comparative study of how
humans annotate two categorizations of multimodal interactions: (1) partial
labels, where different annotators annotate the label given the first, second,
and both modalities, and (2) counterfactual labels, where the same annotator
annotates the label given the first modality and is then asked to explicitly
reason about how their answer changes when given the second. We further propose
an alternative taxonomy based on (3) information decomposition, where
annotators annotate the degrees of redundancy: the extent to which modalities
individually and together give the same predictions, uniqueness: the extent to
which one modality enables a prediction that the other does not, and synergy:
the extent to which both modalities enable one to make a prediction that one
would not otherwise make using individual modalities. Through experiments and
annotations, we highlight several opportunities and limitations of each
approach and propose a method to automatically convert annotations of partial
and counterfactual labels to information decomposition, yielding an accurate
and efficient method for quantifying multimodal interactions.
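To make the third taxonomy concrete, the sketch below shows one way annotations of partial labels (a label given modality 1 alone, modality 2 alone, and both together) could be mapped to redundancy, uniqueness, and synergy scores. The agreement-based rules and function names are illustrative assumptions, not the exact conversion method proposed in the paper.

```python
# Illustrative heuristic only: maps per-example partial-label annotations
# (y1, y2, y12) to redundancy / uniqueness / synergy fractions. The rules
# below are assumptions for exposition, not the paper's estimator.
def decompose(partial_annotations):
    """partial_annotations: iterable of (y1, y2, y12) tuples, where y1 is the
    label chosen from modality 1 alone, y2 from modality 2 alone, and y12 from
    both modalities together."""
    counts = {"redundancy": 0, "uniqueness_1": 0, "uniqueness_2": 0, "synergy": 0}
    for y1, y2, y12 in partial_annotations:
        if y1 == y2 == y12:
            counts["redundancy"] += 1      # both modalities alone give the fused answer
        elif y12 == y1 != y2:
            counts["uniqueness_1"] += 1    # modality 1 alone already predicts the fused label
        elif y12 == y2 != y1:
            counts["uniqueness_2"] += 1    # modality 2 alone already predicts the fused label
        else:
            counts["synergy"] += 1         # the fused label emerges only from both modalities
    total = max(sum(counts.values()), 1)
    return {k: v / total for k, v in counts.items()}

# Example over three annotated items
print(decompose([("pos", "pos", "pos"), ("pos", "neg", "pos"), ("neg", "pos", "neu")]))
```

Counterfactual annotations could be handled analogously by comparing the answer given before and after the second modality is revealed.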
Related papers
- Pseudo-Label Calibration Semi-supervised Multi-Modal Entity Alignment [7.147651976133246]
Multi-modal entity alignment (MMEA) aims to identify equivalent entities between two multi-modal knowledge graphs for integration.
We introduce Pseudo-label Calibration Multi-modal Entity Alignment (PCMEA), a semi-supervised approach.
We employ momentum-based contrastive learning to make full use of both labeled and unlabeled data, which improves the quality of the pseudo-labels and pulls aligned entities closer.
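As a point of reference for the contrastive component mentioned above, here is a generic, MoCo-style sketch of momentum-based contrastive learning; the momentum coefficient, temperature, and function names are assumptions, and this is not PCMEA's actual architecture.

```python
# Generic sketch: momentum encoder update plus an InfoNCE loss.
# Hyperparameters and interfaces are illustrative assumptions.
import torch
import torch.nn.functional as F

@torch.no_grad()
def momentum_update(query_encoder, key_encoder, m=0.999):
    # The key encoder tracks an exponential moving average of the query encoder.
    for q_param, k_param in zip(query_encoder.parameters(), key_encoder.parameters()):
        k_param.mul_(m).add_(q_param, alpha=1.0 - m)

def info_nce(queries, keys, temperature=0.07):
    # queries, keys: (batch, dim); row i of `keys` is the positive for row i of `queries`.
    queries = F.normalize(queries, dim=1)
    keys = F.normalize(keys, dim=1)
    logits = queries @ keys.t() / temperature                        # pairwise similarities
    labels = torch.arange(queries.size(0), device=queries.device)   # positives on the diagonal
    return F.cross_entropy(logits, labels)
```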
arXiv Detail & Related papers (2024-03-02T12:44:59Z)
- Multimodal Learning Without Labeled Multimodal Data: Guarantees and Applications [90.6849884683226]
We study the challenge of interaction quantification in a semi-supervised setting with only labeled unimodal data.
Using a precise information-theoretic definition of interactions, our key contribution is the derivation of lower and upper bounds that quantify the amount of multimodal interaction in this setting.
We show how these theoretical results can be used to estimate multimodal model performance, guide data collection, and select appropriate multimodal models for various tasks.
arXiv Detail & Related papers (2023-06-07T15:44:53Z)
- Multi-modal Differentiable Unsupervised Feature Selection [5.314466196448187]
In multi-modal measurements, many of the observed variables in both modalities are nuisance variables that do not carry information about the phenomenon of interest.
Here, we propose a multi-modal unsupervised feature selection framework that identifies informative variables based on coupled high-dimensional measurements.
We couple the scores with differentiable gates that mask nuisance features and enhance the accuracy of the structure captured by the graph Laplacian.
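For illustration, below is a generic sketch of differentiable gates that mask nuisance features (stochastic-gate style with a Gaussian relaxation); the relaxation and regularizer weight are assumptions and not necessarily the paper's exact formulation.

```python
# Generic sketch of differentiable feature gates: a relaxed 0/1 mask is
# learned per feature so that nuisance features can be closed off during
# training. The Gaussian relaxation below is an illustrative assumption.
import torch
import torch.nn as nn

class FeatureGates(nn.Module):
    def __init__(self, num_features, sigma=0.5):
        super().__init__()
        self.mu = nn.Parameter(0.5 * torch.ones(num_features))  # learned gate means
        self.sigma = sigma

    def forward(self, x):
        noise = torch.randn_like(self.mu) if self.training else 0.0
        z = torch.clamp(self.mu + self.sigma * noise, 0.0, 1.0)  # relaxed 0/1 gates
        return x * z                                             # mask nuisance features

    def regularizer(self):
        # Expected number of open gates under the relaxation; adding this to the
        # training loss encourages uninformative gates to close.
        normal = torch.distributions.Normal(0.0, 1.0)
        return normal.cdf(self.mu / self.sigma).sum()
```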
arXiv Detail & Related papers (2023-03-16T15:11:17Z)
- Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
Under a rigorous theoretical guarantee, our approach enables the information bottleneck (IB) to capture the intrinsic correlation between observations and semantic labels.
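For context, the sketch below shows a standard variational information bottleneck (VIB) objective of the kind such approaches build on; the beta weight and interfaces are assumptions, not the paper's design.

```python
# Minimal VIB objective sketch: a KL term limits how much information the
# latent code keeps about the input, while cross-entropy keeps what is
# relevant to the label. The beta weight is an illustrative assumption.
import torch
import torch.nn.functional as F

def vib_loss(mu, logvar, logits, labels, beta=1e-3):
    # mu, logvar parameterise q(z|x); `logits` come from a classifier fed with
    # a reparameterised sample of z.
    kl = 0.5 * torch.sum(mu.pow(2) + logvar.exp() - logvar - 1.0, dim=1).mean()
    ce = F.cross_entropy(logits, labels)
    return ce + beta * kl
```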
arXiv Detail & Related papers (2022-06-20T03:09:46Z)
- High-Modality Multimodal Transformer: Quantifying Modality & Interaction Heterogeneity for High-Modality Representation Learning [112.51498431119616]
This paper studies efficient representation learning for high-modality scenarios involving a large set of diverse modalities.
A single model, HighMMT, scales up to 10 modalities (text, image, audio, video, sensors, proprioception, speech, time-series, sets, and tables) and 15 tasks from 5 research areas.
arXiv Detail & Related papers (2022-03-02T18:56:20Z) - Single versus Multiple Annotation for Named Entity Recognition of
Mutations [4.213427823201119]
We assess the impact of using a single annotator versus two annotators in order to determine whether multiple annotators are required.
After evaluating the performance loss incurred when using a single annotator, we apply different methods to sample the training data for a second annotation.
We use held-out double-annotated data to build two scenarios with different types of rankings: similarity-based and confidence-based.
We evaluate both approaches on (i) their ability to identify erroneous training instances and (ii) Mutation NER performance with a state-of-the-art system.
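As an illustration of the confidence-based ranking idea, the sketch below orders singly-annotated sentences by average token-level model confidence, least confident first; the scoring function `predict_proba` is a hypothetical stand-in, not the paper's exact criterion.

```python
# Illustrative sketch: rank instances for a second annotation pass by model
# confidence, so the least certain (most error-prone) sentences come first.
def rank_for_second_annotation(instances, predict_proba):
    """instances: list of sentences; predict_proba: callable returning a list of
    per-token label probability distributions for a sentence (assumed interface)."""
    def confidence(sentence):
        token_probs = predict_proba(sentence)
        return sum(max(p) for p in token_probs) / len(token_probs)
    return sorted(instances, key=confidence)
```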
arXiv Detail & Related papers (2021-01-19T03:54:17Z)
- Interactive Fusion of Multi-level Features for Compositional Activity Recognition [100.75045558068874]
We present a novel framework that accomplishes compositional activity recognition by interactive fusion.
We implement the framework in three steps, namely, positional-to-appearance feature extraction, semantic feature interaction, and semantic-to-positional prediction.
We evaluate our approach on two action recognition datasets, Something-Something and Charades.
arXiv Detail & Related papers (2020-12-10T14:17:18Z)
- Self-Attention Attribution: Interpreting Information Interactions Inside Transformer [89.21584915290319]
We propose a self-attention attribution method to interpret the information interactions inside Transformer.
We show that the attribution results can be used as adversarial patterns to implement non-targeted attacks against BERT.
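The general idea can be sketched as integrated gradients over attention scores, as below; the model interface (`score_fn`) and the number of integration steps are assumptions, not the authors' released implementation.

```python
# Generic sketch of attention attribution: scale the attention weights from 0
# to their observed values, accumulate gradients of the model score along the
# way, and weight attention by the averaged gradient (integrated-gradients style).
import torch

def attention_attribution(score_fn, attention, steps=20):
    """score_fn: maps an attention tensor to a scalar model output (assumed
    interface, e.g. the logit of the predicted class); attention: (heads, seq, seq).
    Returns an attribution map of the same shape."""
    total_grad = torch.zeros_like(attention)
    for k in range(1, steps + 1):
        scaled = (k / steps) * attention.detach().requires_grad_(True)
        score = score_fn(scaled)
        total_grad += torch.autograd.grad(score, scaled)[0]
    return attention * total_grad / steps
```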
arXiv Detail & Related papers (2020-04-23T14:58:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.