Neural Interpretable Reasoning
- URL: http://arxiv.org/abs/2502.11639v1
- Date: Mon, 17 Feb 2025 10:33:24 GMT
- Title: Neural Interpretable Reasoning
- Authors: Pietro Barbiero, Giuseppe Marra, Gabriele Ciravegna, David Debot, Francesco De Santis, Michelangelo Diligenti, Mateo Espinosa Zarlenga, Francesco Giannini, et al.
- Abstract summary: We formalize a novel modeling framework for achieving interpretability in deep learning.
We show that this complexity can be mitigated by treating interpretability as a Markovian property.
We propose a new modeling paradigm -- neural generation and interpretable execution.
- Score: 12.106771300842945
- License:
- Abstract: We formalize a novel modeling framework for achieving interpretability in deep learning, anchored in the principle of inference equivariance. While the direct verification of interpretability scales exponentially with the number of variables of the system, we show that this complexity can be mitigated by treating interpretability as a Markovian property and employing neural re-parametrization techniques. Building on these insights, we propose a new modeling paradigm -- neural generation and interpretable execution -- that enables scalable verification of equivariance. This paradigm provides a general approach for designing Neural Interpretable Reasoners that are not only expressive but also transparent.
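As a reading aid, here is a minimal sketch of what an inference-equivariance condition could look like; this is an illustration inferred from the abstract, not the paper's exact formalization. Let $f$ be the model's inference map, $g$ a human-understandable inference map over concepts, and $\alpha$ an interpretation map from model inputs and states to concepts. Equivariance then asks that interpreting and inferring commute:

    \alpha(f(x)) = g(\alpha(x)) \quad \text{for every input } x.

Checking such a condition over all joint assignments of the system's variables scales exponentially; reading it as a Markovian property, as the abstract suggests, would allow verification to proceed locally, one inference step at a time, with the local checks composing along the chain.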
Related papers
- Enhancing Performance of Explainable AI Models with Constrained Concept Refinement [10.241134756773228]
The trade-off between accuracy and interpretability has long been a challenge in machine learning (ML).
In this paper, we investigate the impact of deviations in concept representations and propose a novel framework to mitigate these effects.
Compared to existing explainable methods, our approach not only improves prediction accuracy while preserving model interpretability across various large-scale benchmarks but also achieves this with significantly lower computational cost.
arXiv Detail & Related papers (2025-02-10T18:53:15Z)
- Toward Understanding In-context vs. In-weight Learning [50.24035812301655]
We identify simplified distributional properties that give rise to the emergence and disappearance of in-context learning.
We then extend the study to a full large language model, showing how fine-tuning on various collections of natural language prompts can elicit similar in-context and in-weight learning behaviour.
arXiv Detail & Related papers (2024-10-30T14:09:00Z)
- Flow Factorized Representation Learning [109.51947536586677]
We introduce a generative model which specifies a distinct set of latent probability paths that define different input transformations.
We show that our model achieves higher likelihoods on standard representation learning benchmarks while simultaneously being closer to approximately equivariant models.
arXiv Detail & Related papers (2023-09-22T20:15:37Z)
- Adversarial Attacks on the Interpretation of Neuron Activation Maximization [70.5472799454224]
Activation-maximization approaches are used to interpret and analyze trained deep-learning models.
In this work, we consider the concept of an adversary manipulating a model for the purpose of deceiving the interpretation.
arXiv Detail & Related papers (2023-06-12T19:54:33Z)
- Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance [72.50214227616728]
Interpretability methods are valuable only if their explanations faithfully describe the explained model.
We consider neural networks whose predictions are invariant under a specific symmetry group.
arXiv Detail & Related papers (2023-04-13T17:59:03Z)
- Learning Disentangled Representations for Natural Language Definitions [0.0]
We argue that recurrent syntactic and semantic regularities in textual data can be used to provide the models with both structural biases and generative factors.
We leverage the semantic structures present in a representative and semantically dense category of sentence types, definitional sentences, for training a Variational Autoencoder to learn disentangled representations.
arXiv Detail & Related papers (2022-09-22T14:31:55Z)
- Neuro-symbolic Natural Logic with Introspective Revision for Natural Language Inference [17.636872632724582]
We introduce a neuro-symbolic natural logic framework based on reinforcement learning with introspective revision.
The proposed model has built-in interpretability and shows superior capability in monotonicity inference, systematic generalization, and interpretability.
arXiv Detail & Related papers (2022-03-09T16:31:58Z)
- Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z)
- Instance-Based Neural Dependency Parsing [56.63500180843504]
We develop neural models that possess an interpretable inference process for dependency parsing.
Our models adopt instance-based inference, where dependency edges are extracted and labeled by comparing them to edges in a training set.
arXiv Detail & Related papers (2021-09-28T05:30:52Z)
- On the Lack of Robust Interpretability of Neural Text Classifiers [14.685352584216757]
We assess the robustness of interpretations of neural text classifiers based on pretrained Transformer encoders.
Our tests show surprising deviations from expected behavior, raising questions about the extent of insights that practitioners may draw from interpretations.
arXiv Detail & Related papers (2021-06-08T18:31:02Z)
- Making Sense of CNNs: Interpreting Deep Representations & Their Invariances with INNs [19.398202091883366]
We present an approach based on INNs that (i) recovers the task-specific, learned invariances by disentangling the remaining factor of variation in the data and that (ii) invertibly transforms these invariances combined with the model representation into an equally expressive one with accessible semantic concepts.
Our invertible approach significantly extends the abilities to understand black box models by enabling post-hoc interpretations of state-of-the-art networks without compromising their performance.
arXiv Detail & Related papers (2020-08-04T19:27:46Z)