Related papers: Out-of-Distribution Detection with Attention Head Masking for Multimodal Document Classification

Out-of-Distribution Detection with Attention Head Masking for Multimodal Document Classification

URL: http://arxiv.org/abs/2408.11237v1
Date: Tue, 20 Aug 2024 23:30:00 GMT
Title: Out-of-Distribution Detection with Attention Head Masking for Multimodal Document Classification
Authors: Christos Constantinou, Georgios Ioannides, Aman Chadha, Aaron Elkins, Edwin Simpson,
Abstract summary: We propose a novel methodology termed as attention head masking (AHM) for multi-modal OOD tasks in document classification systems. Our empirical results demonstrate that the proposed AHM method outperforms all state-of-the-art approaches. To address the scarcity of high-quality publicly available document datasets, we introduce FinanceDocs, a new document AI dataset.
Score: 3.141006099594433
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Detecting out-of-distribution (OOD) data is crucial in machine learning applications to mitigate the risk of model overconfidence, thereby enhancing the reliability and safety of deployed systems. The majority of existing OOD detection methods predominantly address uni-modal inputs, such as images or texts. In the context of multi-modal documents, there is a notable lack of extensive research on the performance of these methods, which have primarily been developed with a focus on computer vision tasks. We propose a novel methodology termed as attention head masking (AHM) for multi-modal OOD tasks in document classification systems. Our empirical results demonstrate that the proposed AHM method outperforms all state-of-the-art approaches and significantly decreases the false positive rate (FPR) compared to existing solutions up to 7.5\%. This methodology generalizes well to multi-modal data, such as documents, where visual and textual information are modeled under the same Transformer architecture. To address the scarcity of high-quality publicly available document datasets and encourage further research on OOD detection for documents, we introduce FinanceDocs, a new document AI dataset. Our code and dataset are publicly available.

Related papers

DocVCE: Diffusion-based Visual Counterfactual Explanations for Document Image Classification [5.247930659596986]
We introduce generative document counterfactuals that provide meaningful insights into the model's decision-making through actionable explanations.<n>To the best of the authors' knowledge, this is the first work to explore generative counterfactual explanations in document image analysis.
arXiv Detail & Related papers (2025-08-06T09:15:32Z)
Document Attribution: Examining Citation Relationships using Large Language Models [62.46146670035751]
We propose a zero-shot approach that frames attribution as a straightforward textual entailment task.<n>We also explore the role of the attention mechanism in enhancing the attribution process.
arXiv Detail & Related papers (2025-05-09T04:40:11Z)
Benchmarking Vision Foundation Models for Input Monitoring in Autonomous Driving [7.064497253920508]
Vision Foundation Models (VFMs) as feature extractors and density modeling techniques are proposed. A comparison with state-of-the-art binary OOD classification methods reveals that VFM embeddings with density estimation outperform existing approaches in identifying OOD inputs. Our method detects high-risk inputs likely to cause errors in downstream tasks, thereby improving overall performance.
arXiv Detail & Related papers (2025-01-14T12:51:34Z)
What If the Input is Expanded in OOD Detection? [77.37433624869857]
Out-of-distribution (OOD) detection aims to identify OOD inputs from unknown classes. Various scoring functions are proposed to distinguish it from in-distribution (ID) data. We introduce a novel perspective, i.e., employing different common corruptions on the input space.
arXiv Detail & Related papers (2024-10-24T06:47:28Z)
MetaSumPerceiver: Multimodal Multi-Document Evidence Summarization for Fact-Checking [0.283600654802951]
We present a summarization model designed to generate claim-specific summaries useful for fact-checking from multimodal datasets. We introduce a dynamic perceiver-based model that can handle inputs from multiple modalities of arbitrary lengths. Our approach outperforms the SOTA approach by 4.6% in the claim verification task on the MOCHEG dataset.
arXiv Detail & Related papers (2024-07-18T01:33:20Z)
A Comprehensive Library for Benchmarking Multi-class Visual Anomaly Detection [52.228708947607636]
This paper introduces a comprehensive visual anomaly detection benchmark, ADer, which is a modular framework for new methods. The benchmark includes multiple datasets from industrial and medical domains, implementing fifteen state-of-the-art methods and nine comprehensive metrics. We objectively reveal the strengths and weaknesses of different methods and provide insights into the challenges and future directions of multi-class visual anomaly detection.
arXiv Detail & Related papers (2024-06-05T13:40:07Z)
Out-of-Distribution Detection Using Peer-Class Generated by Large Language Model [0.0]
Out-of-distribution (OOD) detection is a critical task to ensure the reliability and security of machine learning models. In this paper, a novel method called ODPC is proposed, in which specific prompts to generate OOD peer classes of ID semantics are designed by a large language model. Experiments on five benchmark datasets show that the method we propose can yield state-of-the-art results.
arXiv Detail & Related papers (2024-03-20T06:04:05Z)
Exploring Large Language Models for Multi-Modal Out-of-Distribution Detection [67.68030805755679]
Large language models (LLMs) encode a wealth of world knowledge and can be prompted to generate descriptive features for each class. In this paper, we propose to apply world knowledge to enhance OOD detection performance through selective generation from LLMs.
arXiv Detail & Related papers (2023-10-12T04:14:28Z)
Unleashing Mask: Explore the Intrinsic Out-of-Distribution Detection Capability [70.72426887518517]
Out-of-distribution (OOD) detection is an indispensable aspect of secure AI when deploying machine learning models in real-world applications. We propose a novel method, Unleashing Mask, which aims to restore the OOD discriminative capabilities of the well-trained model with ID data. Our method utilizes a mask to figure out the memorized atypical samples, and then finetune the model or prune it with the introduced mask to forget them.
arXiv Detail & Related papers (2023-06-06T14:23:34Z)
Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models [68.12229916000584]
We develop an out-of-distribution (OOD) benchmark termed Do-GOOD for the fine-Grained analysis on Document image-related tasks. We then evaluate the robustness and perform a fine-grained analysis of 5 latest VDU pre-trained models and 2 typical OOD generalization algorithms.
arXiv Detail & Related papers (2023-06-05T06:50:42Z)
Better Understanding Differences in Attribution Methods via Systematic Evaluations [57.35035463793008]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions. We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods. We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods over a wide range of models.
arXiv Detail & Related papers (2023-03-21T14:24:58Z)
Enhancing Out-of-Distribution Detection in Natural Language Understanding via Implicit Layer Ensemble [22.643719584452455]
Out-of-distribution (OOD) detection aims to discern outliers from the intended data distribution. We propose a novel framework based on contrastive learning that encourages intermediate features to learn layer-specialized representations. Our approach is significantly more effective than other works.
arXiv Detail & Related papers (2022-10-20T06:05:58Z)
Igeood: An Information Geometry Approach to Out-of-Distribution Detection [35.04325145919005]
We introduce Igeood, an effective method for detecting out-of-distribution (OOD) samples. Igeood applies to any pre-trained neural network, works under various degrees of access to the machine learning model. We show that Igeood outperforms competing state-of-the-art methods on a variety of network architectures and datasets.
arXiv Detail & Related papers (2022-03-15T11:26:35Z)
ProtoInfoMax: Prototypical Networks with Mutual Information Maximization for Out-of-Domain Detection [19.61846393392849]
ProtoInfoMax is a new architecture that extends Prototypical Networks to simultaneously process In-Domain (ID) and OOD sentences. We show that our proposed method can substantially improve performance up to 20% for OOD detection in low resource settings.
arXiv Detail & Related papers (2021-08-27T11:55:34Z)

This list is automatically generated from the titles and abstracts of the papers in this site.