Related papers: Explaining Neural Network Predictions on Sentence Pairs via Learning Word-Group Masks

Explaining Neural Network Predictions on Sentence Pairs via Learning Word-Group Masks

URL: http://arxiv.org/abs/2104.04488v2
Date: Tue, 13 Apr 2021 13:41:27 GMT
Title: Explaining Neural Network Predictions on Sentence Pairs via Learning Word-Group Masks
Authors: Hanjie Chen, Song Feng, Jatin Ganhotra, Hui Wan, Chulaka Gunasekara, Sachindra Joshi, Yangfeng Ji
Abstract summary: We propose the Group Mask (GMASK) method to implicitly detect word correlations by grouping correlated words from the input text pair together. The proposed method is evaluated with two different model architectures (decomposable attention model and BERT) across four datasets.
Score: 21.16662651409811
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Explaining neural network models is important for increasing their trustworthiness in real-world applications. Most existing methods generate post-hoc explanations for neural network models by identifying individual feature attributions or detecting interactions between adjacent features. However, for models with text pairs as inputs (e.g., paraphrase identification), existing methods are not sufficient to capture feature interactions between two texts and their simple extension of computing all word-pair interactions between two texts is computationally inefficient. In this work, we propose the Group Mask (GMASK) method to implicitly detect word correlations by grouping correlated words from the input text pair together and measure their contribution to the corresponding NLP tasks as a whole. The proposed method is evaluated with two different model architectures (decomposable attention model and BERT) across four datasets, including natural language inference and paraphrase identification tasks. Experiments show the effectiveness of GMASK in providing faithful explanations to these models.

Related papers

Using Shapley interactions to understand how models use structure [9.93791871599425]
We use Shapley Taylor interaction indices (STII) to examine how language and speech models internally relate and structure their inputs.<n>We relate the interaction patterns in models to three underlying linguistic structures: syntactic structure, non-compositional semantics, and phonetic coarticulation.
arXiv Detail & Related papers (2024-03-19T19:13:22Z)
FLIP: Fine-grained Alignment between ID-based Models and Pretrained Language Models for CTR Prediction [49.510163437116645]
Click-through rate (CTR) prediction plays as a core function module in personalized online services. Traditional ID-based models for CTR prediction take as inputs the one-hot encoded ID features of tabular modality. Pretrained Language Models(PLMs) has given rise to another paradigm, which takes as inputs the sentences of textual modality. We propose to conduct Fine-grained feature-level ALignment between ID-based Models and Pretrained Language Models(FLIP) for CTR prediction.
arXiv Detail & Related papers (2023-10-30T11:25:03Z)
Explaining Interactions Between Text Spans [50.70253702800355]
Reasoning over spans of tokens from different parts of the input is essential for natural language understanding. We introduce SpanEx, a dataset of human span interaction explanations for two NLU tasks: NLI and FC. We then investigate the decision-making processes of multiple fine-tuned large language models in terms of the employed connections between spans.
arXiv Detail & Related papers (2023-10-20T13:52:37Z)
KETM:A Knowledge-Enhanced Text Matching method [0.0]
We introduce a new model for text matching called the Knowledge Enhanced Text Matching model (KETM) We use Wiktionary to retrieve the text word definitions as our external knowledge. We fuse text and knowledge using a gating mechanism to learn the ratio of text and knowledge fusion.
arXiv Detail & Related papers (2023-08-11T17:08:14Z)
Co-Driven Recognition of Semantic Consistency via the Fusion of Transformer and HowNet Sememes Knowledge [6.184249194474601]
This paper proposes a co-driven semantic consistency recognition method based on the fusion of Transformer and HowNet sememes knowledge. BiLSTM is exploited to encode the conceptual semantic information and infer the semantic consistency.
arXiv Detail & Related papers (2023-02-21T09:53:19Z)
A Unified Understanding of Deep NLP Models for Text Classification [88.35418976241057]
We have developed a visual analysis tool, DeepNLPVis, to enable a unified understanding of NLP models for text classification. The key idea is a mutual information-based measure, which provides quantitative explanations on how each layer of a model maintains the information of input words in a sample. A multi-level visualization, which consists of a corpus-level, a sample-level, and a word-level visualization, supports the analysis from the overall training set to individual samples.
arXiv Detail & Related papers (2022-06-19T08:55:07Z)
Improving Multi-task Generalization Ability for Neural Text Matching via Prompt Learning [54.66399120084227]
Recent state-of-the-art neural text matching models (PLMs) are hard to generalize to different tasks. We adopt a specialization-generalization training strategy and refer to it as Match-Prompt. In specialization stage, descriptions of different matching tasks are mapped to only a few prompt tokens. In generalization stage, text matching model explores the essential matching signals by being trained on diverse multiple matching tasks.
arXiv Detail & Related papers (2022-04-06T11:01:08Z)
DEIM: An effective deep encoding and interaction model for sentence matching [0.0]
We propose a sentence matching method based on deep encoding and interaction to extract deep semantic information. In the encoder layer,we refer to the information of another sentence in the process of encoding a single sentence, and later use a algorithm to fuse the information. In the interaction layer, we use a bidirectional attention mechanism and a self-attention mechanism to obtain deep semantic information.
arXiv Detail & Related papers (2022-03-20T07:59:42Z)
Keyphrase Extraction with Dynamic Graph Convolutional Networks and Diversified Inference [50.768682650658384]
Keyphrase extraction (KE) aims to summarize a set of phrases that accurately express a concept or a topic covered in a given document. Recent Sequence-to-Sequence (Seq2Seq) based generative framework is widely used in KE task, and it has obtained competitive performance on various benchmarks. In this paper, we propose to adopt the Dynamic Graph Convolutional Networks (DGCN) to solve the above two problems simultaneously.
arXiv Detail & Related papers (2020-10-24T08:11:23Z)
A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs. We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet. Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
Generating Hierarchical Explanations on Text Classification via Feature Interaction Detection [21.02924712220406]
We build hierarchical explanations by detecting feature interactions. Such explanations visualize how words and phrases are combined at different levels of the hierarchy. Experiments show the effectiveness of the proposed method in providing explanations both faithful to models and interpretable to humans.
arXiv Detail & Related papers (2020-04-04T20:56:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.