Integrating Large Pre-trained Models into Multimodal Named Entity
Recognition with Evidential Fusion
- URL: http://arxiv.org/abs/2306.16991v1
- Date: Thu, 29 Jun 2023 14:50:23 GMT
- Title: Integrating Large Pre-trained Models into Multimodal Named Entity
Recognition with Evidential Fusion
- Authors: Weide Liu, Xiaoyang Zhong, Jingwen Hou, Shaohua Li, Haozhe Huang and
Yuming Fang
- Abstract summary: We propose incorporating uncertainty estimation into the MNER task, producing trustworthy predictions.
Our proposed algorithm models the distribution of each modality as a Normal-inverse Gamma distribution, and fuses them into a unified distribution.
Experiments on two datasets demonstrate that our proposed method outperforms the baselines and achieves new state-of-the-art performance.
- Score: 31.234455370113075
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal Named Entity Recognition (MNER) is a crucial task for information
extraction from social media platforms such as Twitter. Most current methods
rely on attention weights to extract information from both text and images but
are often unreliable and lack interpretability. To address this problem, we
propose incorporating uncertainty estimation into the MNER task, producing
trustworthy predictions. Our proposed algorithm models the distribution of each
modality as a Normal-inverse Gamma distribution, and fuses them into a unified
distribution with an evidential fusion mechanism, enabling hierarchical
characterization of uncertainties and promotion of prediction accuracy and
trustworthiness. Additionally, we explore the potential of pre-trained large
foundation models in MNER and propose an efficient fusion approach that
leverages their robust feature representations. Experiments on two datasets
demonstrate that our proposed method outperforms the baselines and achieves new
state-of-the-art performance.
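As a rough illustration of the evidential fusion described in the abstract, the sketch below combines two per-modality Normal-inverse Gamma (NIG) distributions using the NIG summation rule popularized by the MoNIG paper (listed under related papers) and reads off aleatoric and epistemic uncertainty. The toy parameter values are illustrative, and the paper's exact fusion mechanism may differ.

```python
def nig_fuse(g1, v1, a1, b1, g2, v2, a2, b2):
    """Fuse two Normal-inverse Gamma distributions NIG(gamma, nu, alpha, beta).

    Follows the NIG summation rule from the MoNIG paper (listed below);
    the evidential fusion used in the MNER paper may differ in detail.
    """
    g = (v1 * g1 + v2 * g2) / (v1 + v2)  # evidence-weighted fused mean
    v = v1 + v2                          # pooled pseudo-observations of the mean
    a = a1 + a2 + 0.5                    # pooled evidence about the variance
    b = b1 + b2 + 0.5 * (v1 * (g1 - g) ** 2 + v2 * (g2 - g) ** 2)
    return g, v, a, b


def nig_uncertainty(v, a, b):
    """Hierarchical uncertainties of a NIG distribution (requires alpha > 1)."""
    aleatoric = b / (a - 1.0)            # expected data noise, E[sigma^2]
    epistemic = b / (v * (a - 1.0))      # uncertainty about the mean, Var[mu]
    return aleatoric, epistemic


# Toy example: a confident text modality and a less certain image modality.
text_nig = (0.8, 5.0, 3.0, 1.0)   # gamma, nu, alpha, beta (illustrative values)
image_nig = (0.2, 0.5, 1.5, 2.0)
g, v, a, b = nig_fuse(*text_nig, *image_nig)
print("fused prediction:", g)
print("aleatoric, epistemic:", nig_uncertainty(v, a, b))
```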
Related papers
- Uncertainty Quantification via Hölder Divergence for Multi-View Representation Learning [18.419742575630217]
This paper introduces a novel algorithm based on Hölder Divergence (HD) to enhance the reliability of multi-view learning.
Through Dempster-Shafer theory, uncertainty from different modalities is integrated, thereby generating a comprehensive result.
Mathematically, HD proves to better measure the "distance" between the real data distribution and the predictive distribution of the model.
arXiv Detail & Related papers (2024-10-29T04:29:44Z)
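For the Dempster-Shafer combination step mentioned in the entry above, here is a minimal sketch using the reduced Dempster rule commonly used in evidential multi-view classification. The function name and the toy belief/uncertainty values are illustrative, and the HD-based method may weight the per-modality evidence differently.

```python
import numpy as np


def ds_combine(b1, u1, b2, u2):
    """Combine two subjective opinions with the reduced Dempster rule.

    b1, b2 are per-class belief masses and u1, u2 scalar uncertainty masses,
    with b.sum() + u == 1 for each modality.
    """
    conflict = float(np.outer(b1, b2).sum() - (b1 * b2).sum())  # mass on disagreeing classes
    scale = 1.0 - conflict
    b = (b1 * b2 + b1 * u2 + b2 * u1) / scale
    u = (u1 * u2) / scale
    return b, u


# Two modalities that lean toward class 0 with different confidence.
b_text, u_text = np.array([0.7, 0.1]), 0.2
b_image, u_image = np.array([0.4, 0.2]), 0.4
print(ds_combine(b_text, u_text, b_image, u_image))
```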
- Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax [73.03684002513218]
We enhance Deep InfoMax (DIM) to enable automatic matching of learned representations to a selected prior distribution.
We show that such a modification allows for learning uniformly and normally distributed representations.
The results indicate a moderate trade-off between performance on the downstream tasks and the quality of distribution matching (DM).
arXiv Detail & Related papers (2024-10-09T15:40:04Z)
- Confidence-aware multi-modality learning for eye disease screening [58.861421804458395]
We propose a novel multi-modality evidential fusion pipeline for eye disease screening.
It provides a measure of confidence for each modality and elegantly integrates the multi-modality information.
Experimental results on both public and internal datasets demonstrate that our model excels in robustness.
arXiv Detail & Related papers (2024-05-28T13:27:30Z)
- LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks.
By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections.
Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets.
arXiv Detail & Related papers (2024-05-23T11:10:32Z)
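A minimal sketch of the member-specific low-rank idea described in the LoRA-Ensemble entry above, assuming a PyTorch linear projection as the shared backbone; class and parameter names are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn


class LoRAEnsembleLinear(nn.Module):
    """One shared, frozen projection plus per-member low-rank updates.

    The pre-trained weight is shared by all ensemble members; only the
    rank-r matrices A_m, B_m are member-specific.
    """

    def __init__(self, base: nn.Linear, n_members: int = 4, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # backbone stays frozen and shared
            p.requires_grad_(False)
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(n_members, rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(n_members, d_out, rank))

    def forward(self, x: torch.Tensor, member: int) -> torch.Tensor:
        delta = self.B[member] @ self.A[member]   # rank-r member-specific update
        return self.base(x) + x @ delta.T


# Disagreement across member outputs gives an implicit ensemble estimate of
# uncertainty without copying the backbone per member.
proj = LoRAEnsembleLinear(nn.Linear(64, 64), n_members=4, rank=8)
x = torch.randn(2, 64)
outs = torch.stack([proj(x, m) for m in range(4)])
```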
- Ensemble Modeling for Multimodal Visual Action Recognition [50.38638300332429]
We propose an ensemble modeling approach for multimodal action recognition.
We independently train individual modality models using a variant of focal loss tailored to handle the long-tailed distribution of the MECCANO [21] dataset.
arXiv Detail & Related papers (2023-08-10T08:43:20Z)
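The focal-loss variant mentioned in the entry above is not spelled out in the summary; the sketch below shows the generic multi-class focal loss (Lin et al., 2017) with an optional per-class weight as an illustrative hook for long-tailed label distributions.

```python
import torch
import torch.nn.functional as F


def focal_loss(logits, targets, gamma=2.0, alpha=None):
    """Generic multi-class focal loss; `alpha` holds optional per-class weights."""
    log_p = F.log_softmax(logits, dim=-1)
    log_pt = log_p.gather(-1, targets.unsqueeze(-1)).squeeze(-1)  # log-prob of true class
    pt = log_pt.exp()
    loss = -((1.0 - pt) ** gamma) * log_pt       # down-weight easy, confident examples
    if alpha is not None:                        # rare classes can receive larger weights
        loss = loss * alpha.gather(0, targets)
    return loss.mean()


# Example: 3 samples, 5 action classes, tail classes up-weighted.
logits = torch.randn(3, 5)
targets = torch.tensor([0, 4, 4])
weights = torch.tensor([0.5, 0.8, 1.0, 1.5, 2.0])
print(focal_loss(logits, targets, alpha=weights))
```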
- Learning Against Distributional Uncertainty: On the Trade-off Between Robustness and Specificity [24.874664446700272]
This paper studies a new framework that unifies the three approaches and that addresses the two challenges mentioned above.
The asymptotic properties (e.g., consistency and asymptotic normality), non-asymptotic properties (e.g., unbiasedness and error bounds), and a Monte-Carlo-based solution method of the proposed model are studied.
arXiv Detail & Related papers (2023-01-31T11:33:18Z)
- Transformer Uncertainty Estimation with Hierarchical Stochastic Attention [8.95459272947319]
We propose a novel way to enable transformers to have the capability of uncertainty estimation.
This is achieved by learning a hierarchical self-attention that attends to values and a set of learnable centroids.
We empirically evaluate our model on two text classification tasks with both in-domain (ID) and out-of-domain (OOD) datasets.
arXiv Detail & Related papers (2021-12-27T16:43:31Z)
- Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma Distributions [91.63716984911278]
We introduce a novel Mixture of Normal-Inverse Gamma distributions (MoNIG) algorithm, which efficiently estimates uncertainty in principle for adaptive integration of different modalities and produces a trustworthy regression result.
Experimental results on both synthetic and different real-world data demonstrate the effectiveness and trustworthiness of our method on various multimodal regression tasks.
arXiv Detail & Related papers (2021-11-11T14:28:12Z)
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.