Thermodynamics-inspired Explanations of Artificial Intelligence
- URL: http://arxiv.org/abs/2206.13475v3
- Date: Mon, 8 Apr 2024 23:41:21 GMT
- Title: Thermodynamics-inspired Explanations of Artificial Intelligence
- Authors: Shams Mehdi, Pratyush Tiwary
- Abstract summary: We present Thermodynamics-inspired Explainable Representations of AI and other black-box Paradigms (TERP).
TERP is a method for generating accurate and human-interpretable explanations for black-box predictions in a model-agnostic manner.
To demonstrate the wide-ranging applicability of TERP, we successfully employ it to explain various black-box model architectures.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, predictive machine learning methods have gained prominence in various scientific domains. However, due to their black-box nature, it is essential to establish trust in these models before accepting them as accurate. One promising strategy for assigning trust involves employing explanation techniques that elucidate the rationale behind a black-box model's predictions in a manner that humans can understand. However, assessing the degree of human interpretability of the rationale generated by such methods is a nontrivial challenge. In this work, we introduce interpretation entropy as a universal solution for assessing the degree of human interpretability associated with any linear model. Using this concept and drawing inspiration from classical thermodynamics, we present Thermodynamics-inspired Explainable Representations of AI and other black-box Paradigms (TERP), a method for generating accurate and human-interpretable explanations for black-box predictions in a model-agnostic manner. To demonstrate the wide-ranging applicability of TERP, we successfully employ it to explain various black-box model architectures, including deep learning autoencoders, recurrent neural networks, and convolutional neural networks, across diverse domains such as molecular simulations, text, and image classification.
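The abstract names interpretation entropy but does not define it. As a rough illustration only, the sketch below assumes it is the Shannon entropy of the normalized absolute coefficients of a linear model, scaled to lie between 0 and 1. That definition, the function name `interpretation_entropy`, and the example weights are all assumptions made for this sketch and are not taken from the text above.

```python
import math


def interpretation_entropy(weights):
    """Assumed definition: normalized Shannon entropy of |weights|.

    Low values mean a few features dominate the linear model (easier to read);
    values near 1 mean the weight is spread evenly over all features.
    """
    magnitudes = [abs(w) for w in weights]
    total = sum(magnitudes)
    # Degenerate cases: no features carry weight, or there is only one feature.
    if total == 0 or len(magnitudes) < 2:
        return 0.0
    probs = [m / total for m in magnitudes]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    # Divide by log(N) so the result is comparable across feature counts.
    return max(0.0, entropy / math.log(len(probs)))


# Hypothetical linear-model coefficients, chosen only for illustration.
sparse_model = [2.0, 0.0, 0.0, 0.0]
dense_model = [0.5, -0.5, 0.5, -0.5]

print(interpretation_entropy(sparse_model))
print(interpretation_entropy(dense_model))
```

Under this assumed definition, a model that relies on a single feature scores lower than one that spreads its weight evenly, which matches the intuition that explanations with fewer dominant features are easier for a person to read.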
Related papers
- I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? [79.01538178959726]
Large language models (LLMs) have led many to conclude that they exhibit a form of intelligence.
We introduce a novel generative model that generates tokens on the basis of human interpretable concepts represented as latent discrete variables.
arXiv Detail & Related papers (2025-03-12T01:21:17Z) - Neural DNF-MT: A Neuro-symbolic Approach for Learning Interpretable and Editable Policies [51.03989561425833]
We propose a neuro-symbolic approach called neural DNF-MT for end-to-end policy learning.
The differentiable nature of the neural DNF-MT model enables the use of deep actor-critic algorithms for training.
We show how the bivalent representations of deterministic policies can be edited and incorporated back into a neural model.
arXiv Detail & Related papers (2025-01-07T15:51:49Z) - Learning Discrete Concepts in Latent Hierarchical Models [73.01229236386148]
Learning concepts from natural high-dimensional data holds potential in building human-aligned and interpretable machine learning models.
We formalize concepts as discrete latent causal variables that are related via a hierarchical causal model.
We substantiate our theoretical claims with synthetic data experiments.
arXiv Detail & Related papers (2024-06-01T18:01:03Z) - Interpretable Machine Learning for Weather and Climate Prediction: A Survey [24.028385794099435]
We review current interpretable machine learning approaches applied to meteorological predictions.
The approaches covered include designing inherently interpretable models from scratch using architectures like tree ensembles and explainable neural networks.
We discuss research challenges around achieving deeper mechanistic interpretations aligned with physical principles.
arXiv Detail & Related papers (2024-03-24T14:23:35Z) - ContPhy: Continuum Physical Concept Learning and Reasoning from Videos [86.63174804149216]
ContPhy is a novel benchmark for assessing machine physical commonsense.
We evaluated a range of AI models and found that they still struggle to achieve satisfactory performance on ContPhy.
We also introduce an oracle model (ContPRO) that marries the particle-based physical dynamic models with the recent large language models.
arXiv Detail & Related papers (2024-02-09T01:09:21Z) - Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z) - Interpreting Pretrained Language Models via Concept Bottlenecks [55.47515772358389]
Pretrained language models (PLMs) have made significant strides in various natural language processing tasks.
The lack of interpretability due to their "black-box" nature poses challenges for responsible implementation.
We propose a novel approach to interpreting PLMs by employing high-level, meaningful concepts that are easily understandable for humans.
arXiv Detail & Related papers (2023-11-08T20:41:18Z) - Concept backpropagation: An Explainable AI approach for visualising learned concepts in neural network models [0.0]
We present an extension to the method of concept detection, named concept backpropagation, which provides a way of analysing how the information representing a given concept is internalised in a given neural network model.
arXiv Detail & Related papers (2023-07-24T08:21:13Z) - On Modifying a Neural Network's Perception [3.42658286826597]
We propose a method which allows one to modify what an artificial neural network is perceiving regarding specific human-defined concepts.
We test the proposed method on different models, assessing whether the performed manipulations are well interpreted by the models, and analyzing how they react to them.
arXiv Detail & Related papers (2023-03-05T12:09:37Z) - Hessian-based toolbox for reliable and interpretable machine learning in physics [58.720142291102135]
We present a toolbox for interpretability and reliability, agnostic of the model architecture.
It provides a notion of the influence of the input data on the prediction at a given test point, an estimation of the uncertainty of the model predictions, and an extrapolation score for the model predictions.
Our work opens the road to the systematic use of interpretability and reliability methods in ML applied to physics and, more generally, science.
arXiv Detail & Related papers (2021-08-04T16:32:59Z) - Formalising Concepts as Grounded Abstractions [68.24080871981869]
This report shows how representation learning can be used to induce concepts from raw data.
The main technical goal of this report is to show how techniques from representation learning can be married with a lattice-theoretic formulation of conceptual spaces.
arXiv Detail & Related papers (2021-01-13T15:22:01Z) - Physics-Constrained Predictive Molecular Latent Space Discovery with Graph Scattering Variational Autoencoder [0.0]
We develop a molecular generative model based on variational inference and graph theory in the small data regime.
The model's performance is evaluated by generating molecules with desired target properties.
arXiv Detail & Related papers (2020-09-29T09:05:27Z) - Scientific Discovery by Generating Counterfactuals using Image Translation [5.820406793815877]
We propose a framework to convert predictions from explanation techniques to a mechanism of discovery.
We show how generative models in combination with black-box predictors can be used to generate hypotheses that can be critically examined.
We demonstrate that the proposed framework is able to explain the underlying scientific mechanism, thus bridging the gap between the model's performance and human understanding.
arXiv Detail & Related papers (2020-07-10T17:25:52Z)