Local Interpretations for Explainable Natural Language Processing: A Survey
- URL: http://arxiv.org/abs/2103.11072v3
- Date: Mon, 18 Mar 2024 08:29:49 GMT
- Title: Local Interpretations for Explainable Natural Language Processing: A Survey
- Authors: Siwen Luo, Hamish Ivison, Caren Han, Josiah Poon
- Abstract summary: This work investigates various methods to improve the interpretability of deep neural networks for Natural Language Processing (NLP) tasks.
We provide a comprehensive discussion on the definition of the term interpretability and its various aspects at the beginning of this work.
- Score: 5.717407321642629
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the use of deep learning techniques has grown across various fields over the past decade, complaints about the opaqueness of black-box models have increased, leading to a greater focus on transparency in deep learning models. This work investigates various methods to improve the interpretability of deep neural networks for Natural Language Processing (NLP) tasks, including machine translation and sentiment analysis. We provide a comprehensive discussion of the definition of the term interpretability and its various aspects at the beginning of this work. The methods collected and summarised in this survey are limited to local interpretation and are divided into three categories: 1) interpreting the model's predictions through related input features; 2) interpreting through natural language explanation; 3) probing the hidden states of models and word representations.
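As a concrete illustration of the first category (interpreting a prediction through related input features), the following is a minimal, hypothetical sketch of occlusion-based (leave-one-out) attribution. The toy lexicon scorer and all names are invented placeholders standing in for a real NLP model, not code from the survey.

```python
# Minimal sketch of occlusion-based (leave-one-out) input attribution,
# one simple instance of interpreting a prediction through input features.
# The toy sentiment scorer below is a hypothetical stand-in for a real model.
from typing import Callable, List

def toy_sentiment_score(tokens: List[str]) -> float:
    """Hypothetical model: sums lexicon weights of the tokens."""
    lexicon = {"great": 1.0, "good": 0.5, "boring": -1.0}
    return sum(lexicon.get(t.lower(), 0.0) for t in tokens)

def occlusion_attribution(tokens: List[str],
                          model: Callable[[List[str]], float]) -> List[float]:
    """Importance of each token = score drop when that token is removed."""
    full_score = model(tokens)
    return [full_score - model(tokens[:i] + tokens[i + 1:])
            for i in range(len(tokens))]

if __name__ == "__main__":
    tokens = "The plot was great but the ending felt boring".split()
    for tok, attr in zip(tokens, occlusion_attribution(tokens, toy_sentiment_score)):
        print(f"{tok:>8}: {attr:+.2f}")
```

Perturbation- and gradient-based attribution methods covered in the survey follow the same input-feature framing; they differ mainly in how each token's importance score is computed.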
Related papers
- Decoding Diffusion: A Scalable Framework for Unsupervised Analysis of Latent Space Biases and Representations Using Natural Language Prompts [68.48103545146127]
This paper proposes a novel framework for unsupervised exploration of diffusion latent spaces.
We directly leverage natural language prompts and image captions to map latent directions.
Our method provides a more scalable and interpretable understanding of the semantic knowledge encoded within diffusion models.
arXiv Detail & Related papers (2024-10-25T21:44:51Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI make it possible to mitigate the limited interpretability of Transformer-based similarity models by leveraging improved explanations.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbations such as typos and word-order shuffling, which resonate with human cognitive patterns and allow perturbations to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
arXiv Detail & Related papers (2024-02-13T02:46:45Z)
- FICNN: A Framework for the Interpretation of Deep Convolutional Neural Networks [0.0]
The aim of this paper is to propose a framework for the study of interpretation methods for CNN models trained on visual data.
Our framework highlights that only a small fraction of the suggested factors, and combinations thereof, have actually been studied.
arXiv Detail & Related papers (2023-05-17T10:59:55Z)
- Explainability of Text Processing and Retrieval Methods: A Critical Survey [1.5320737596132752]
This article provides a broad overview of research on the explainability and interpretability of natural language processing and information retrieval methods.
More specifically, we survey approaches that have been applied to explain word embeddings, sequence modeling, attention modules, transformers, BERT, and document ranking.
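As a hedged illustration of the probing-style analyses such surveys cover (and of the third category in the main abstract above), the sketch below trains a linear diagnostic probe on simulated frozen word representations. All data, labels, and names here are synthetic placeholders; a real study would probe embeddings extracted from an actual model.

```python
# Hypothetical sketch of a diagnostic "probing" classifier: train a small
# linear probe on frozen (here: randomly simulated) word representations to
# predict a linguistic property, e.g. part of speech. High probe accuracy is
# taken as evidence that the property is encoded in the representations.
import numpy as np

rng = np.random.default_rng(0)
dim, n = 16, 200

# Simulated frozen embeddings: nouns and verbs differ along a hidden direction.
labels = rng.integers(0, 2, size=n)               # 0 = noun, 1 = verb
direction = rng.normal(size=dim)
X = rng.normal(size=(n, dim)) + np.outer(labels * 2 - 1, direction)

# Logistic-regression probe trained with plain gradient descent.
w, b = np.zeros(dim), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))        # predicted P(verb)
    grad_w = X.T @ (p - labels) / n
    grad_b = float(np.mean(p - labels))
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

accuracy = float(np.mean(((X @ w + b) > 0) == labels))
print(f"probe accuracy: {accuracy:.2f}")          # high accuracy -> property is linearly decodable
```

A simple linear probe like this keeps the representations fixed, so its accuracy reflects what the representations already encode rather than what the probe itself can learn.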
arXiv Detail & Related papers (2022-12-14T09:25:49Z)
- Learnable Visual Words for Interpretable Image Recognition [70.85686267987744]
We propose the Learnable Visual Words (LVW) to interpret the model prediction behaviors with two novel modules.
The semantic visual words learning relaxes the category-specific constraint, enabling general visual words to be shared across different categories.
Our experiments on six visual benchmarks demonstrate the superior effectiveness of our proposed LVW in both accuracy and model interpretation.
arXiv Detail & Related papers (2022-05-22T03:24:45Z)
- Schrödinger's Tree -- On Syntax and Neural Language Models [10.296219074343785]
Language models have emerged as NLP's workhorse, displaying increasingly fluent generation capabilities.
We observe a lack of clarity across numerous dimensions, which influences the hypotheses that researchers form.
We outline the implications of the different types of research questions exhibited in studies on syntax.
arXiv Detail & Related papers (2021-10-17T18:25:23Z)
- On the Faithfulness Measurements for Model Interpretations [100.2730234575114]
Post-hoc interpretations aim to uncover how natural language processing (NLP) models make predictions.
To assess how faithfully such interpretations reflect the model's actual reasoning, we start with three criteria: the removal-based criterion, the sensitivity of interpretations, and the stability of interpretations.
Motivated by the desideratum of these faithfulness notions, we introduce a new class of interpretation methods that adopt techniques from the adversarial domain.
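As a rough, hypothetical sketch of what a removal-based faithfulness check can look like (not this paper's exact procedure): delete the k tokens an interpretation ranks as most important and compare the resulting change in the model's score against deleting k random tokens; a faithful interpretation should cause a larger change. The toy model and importance scores below are invented for illustration.

```python
# Hypothetical sketch of a removal-based faithfulness check: remove the
# top-k tokens ranked by an interpretation and compare the score change
# against removing k random tokens. Model and attributions are toy values.
import random
from typing import Callable, List

def toy_score(tokens: List[str]) -> float:
    """Hypothetical model: sums lexicon weights of the tokens."""
    lexicon = {"great": 1.0, "slow": -0.4}
    return sum(lexicon.get(t.lower(), 0.0) for t in tokens)

def change_after_removal(tokens: List[str], remove_idx: List[int],
                         model: Callable[[List[str]], float]) -> float:
    """Absolute change in the model score after deleting the given token indices."""
    kept = [t for i, t in enumerate(tokens) if i not in set(remove_idx)]
    return abs(model(tokens) - model(kept))

if __name__ == "__main__":
    tokens = "The acting was great but the pacing felt slow".split()
    attributions = [0.0, 0.1, 0.0, 1.0, 0.0, 0.0, 0.1, 0.0, 0.4]  # toy importances
    k = 2
    top_k = sorted(range(len(tokens)), key=lambda i: -attributions[i])[:k]
    rand_k = random.sample(range(len(tokens)), k)
    print("score change, top-k removed: ", change_after_removal(tokens, top_k, toy_score))
    print("score change, random removed:", change_after_removal(tokens, rand_k, toy_score))
```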
arXiv Detail & Related papers (2021-04-18T09:19:44Z)
- Interpretable Deep Learning: Interpretations, Interpretability, Trustworthiness, and Beyond [49.93153180169685]
We introduce and clarify two basic concepts, interpretations and interpretability, that are often confused.
We elaborate on the design of several recent interpretation algorithms from different perspectives by proposing a new taxonomy.
We summarize the existing work in evaluating models' interpretability using "trustworthy" interpretation algorithms.
arXiv Detail & Related papers (2021-03-19T08:40:30Z)
- A Survey on Neural Network Interpretability [25.27545364222555]
Interpretability is a desired property for deep networks to become powerful tools in other research fields.
We propose a novel taxonomy organized along three dimensions: the type of engagement (passive vs. active interpretation approaches), the type of explanation, and the focus.
arXiv Detail & Related papers (2020-12-28T15:09:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.