SOInter: A Novel Deep Energy Based Interpretation Method for Explaining
Structured Output Models
- URL: http://arxiv.org/abs/2202.09914v1
- Date: Sun, 20 Feb 2022 21:57:07 GMT
- Title: SOInter: A Novel Deep Energy Based Interpretation Method for Explaining
Structured Output Models
- Authors: S. Fatemeh Seyyedsalehi, Mahdieh Soleymani, Hamid R. Rabiee
- Abstract summary: We propose a novel interpretation technique to explain the behavior of structured output models.
We focus on one of the outputs as the target and try to find the most important features utilized by the structured model to decide on the target in each locality of the input space.
- Score: 6.752231769293388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel interpretation technique to explain the behavior of
structured output models, which simultaneously learn mappings from an input
vector to a set of output variables. Because the computational paths of the
output variables in a structured model are intertwined, a feature can affect
the value of the target output through the other outputs. We focus on one of
the outputs as the target and try to find the most important features the
structured model uses to decide on the target in each locality of the input
space. In this paper, we assume an arbitrary structured output model is
available as a black box and show how considering the correlations between
output variables can improve the explanation performance. The goal is to train
a function over the input space that acts as an interpreter for the target
output variable. We introduce an energy-based training process for the
interpreter function, which effectively accounts for the structural information
incorporated into the model being explained. The effectiveness of the proposed
method is confirmed using a variety of simulated and real data sets.
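To make the idea concrete, here is a minimal sketch, not the paper's actual objective or architecture: an interpreter network proposes a feature subset for the target output of a black-box structured model, and a surrogate "energy" is low when that subset alone reproduces the black box's behaviour on the target and, with a smaller weight, on the correlated outputs. The toy black box, the REINFORCE-style gradient estimator, and all hyperparameters below are illustrative assumptions.
```python
# Hedged sketch only: a stand-in black box, a surrogate energy, and a
# REINFORCE-style estimator replace the paper's exact formulation.
import torch
import torch.nn as nn

torch.manual_seed(0)
D, K, TARGET = 10, 4, 0                      # input dim, #output variables, target index

W_bb = torch.randn(D, K)                     # toy "structured" black box:
C_bb = 0.3 * torch.randn(K, K)               # outputs influence each other via C_bb

def black_box(x):
    h = x @ W_bb
    return torch.sigmoid(h + torch.sigmoid(h) @ C_bb)   # (batch, K) soft outputs

# Interpreter: per-feature importance logits for the target output, local to x.
interpreter = nn.Sequential(nn.Linear(D, 64), nn.ReLU(), nn.Linear(64, D))
opt = torch.optim.Adam(interpreter.parameters(), lr=1e-3)

def energy(x, mask):
    """Low when the masked input reproduces the black box on the target output
    (and, more weakly, on the other outputs the target is correlated with)."""
    disagreement = (black_box(x) - black_box(x * mask)).pow(2)      # (batch, K)
    weights = torch.full((K,), 0.1); weights[TARGET] = 1.0
    sparsity = 0.05 * mask.sum(dim=-1) / D                          # prefer few features
    return (disagreement * weights).sum(dim=-1) + sparsity

for step in range(2000):
    x = torch.randn(128, D)
    probs = torch.sigmoid(interpreter(x))
    mask = torch.bernoulli(probs).detach()                          # sampled feature subset
    logp = (mask * (probs + 1e-8).log() + (1 - mask) * (1 - probs + 1e-8).log()).sum(-1)
    with torch.no_grad():
        e = energy(x, mask)
    loss = (logp * (e - e.mean())).mean()    # REINFORCE with a mean baseline
    opt.zero_grad(); loss.backward(); opt.step()

x_query = torch.randn(1, D)
print(torch.sigmoid(interpreter(x_query)))   # per-feature importance for the target at x_query
```
Querying the trained interpreter at a new point then yields a local importance vector for the target output, which is the kind of locality-aware explanation the abstract describes.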
Related papers
- Unveiling Transformer Perception by Exploring Input Manifolds [41.364418162255184]
This paper introduces a general method for the exploration of equivalence classes in the input space of Transformer models.
The proposed approach is based on sound mathematical theory which describes the internal layers of a Transformer architecture as sequential deformations of the input manifold.
arXiv Detail & Related papers (2024-10-08T13:20:31Z)
- Counting in Small Transformers: The Delicate Interplay between Attention and Feed-Forward Layers [16.26331213222281]
We investigate how architectural design choices influence the space of solutions that a transformer can implement and learn.
We characterize two different counting strategies that small transformers can implement theoretically.
Our findings highlight that even in simple settings, slight variations in model design can cause significant changes to the solutions a transformer learns.
arXiv Detail & Related papers (2024-07-16T09:48:10Z)
- Explaining Text Similarity in Transformer Models [52.571158418102584]
Recent advances in explainable AI have made it possible to mitigate limitations by leveraging improved explanations for Transformers.
We use BiLRP, an extension developed for computing second-order explanations in bilinear similarity models, to investigate which feature interactions drive similarity in NLP models.
Our findings contribute to a deeper understanding of different semantic similarity tasks and models, highlighting how novel explainable AI methods enable in-depth analyses and corpus-level insights.
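As a rough illustration of the second-order idea, the sketch below uses gradient×input in place of the LRP propagation actually used by BiLRP, and a toy encoder in place of a Transformer; both are assumptions.
```python
import torch
import torch.nn as nn

torch.manual_seed(0)
f = nn.Sequential(nn.Linear(8, 16), nn.Tanh(), nn.Linear(16, 4))   # toy sentence encoder

def per_dim_attribution(x):
    """Gradient×input attribution of each embedding dimension onto the input features."""
    x = x.clone().requires_grad_(True)
    rows = []
    for k in range(4):                            # one backward pass per embedding dimension
        (g,) = torch.autograd.grad(f(x)[k], x)
        rows.append((g * x).detach())
    return torch.stack(rows)                      # (embedding dim, input dim) = (4, 8)

x1, x2 = torch.randn(8), torch.randn(8)
# Second-order map: relevance of each feature pair (i from x1, j from x2)
# to the bilinear similarity score f(x1)·f(x2).
R = per_dim_attribution(x1).t() @ per_dim_attribution(x2)   # (8, 8)
print(R.sum().item(), torch.dot(f(x1), f(x2)).item())       # comparable; equal only for linear f
```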
arXiv Detail & Related papers (2024-05-10T17:11:31Z)
- Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z)
- A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis [128.0532113800092]
We present a mechanistic interpretation of Transformer-based LMs on arithmetic questions.
This provides insights into how information related to arithmetic is processed by LMs.
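The core intervention behind causal mediation analysis can be sketched on a toy network; the MLP, the random inputs, and the single-unit patching below are assumptions, whereas the paper works with Transformer LMs on arithmetic prompts.
```python
# Activation patching: swap one hidden activation between a "clean" and a
# "corrupted" input and measure how much of the output change it mediates.
import torch
import torch.nn as nn

torch.manual_seed(0)
layer1, layer2 = nn.Linear(4, 6), nn.Linear(6, 1)   # toy stand-in for an LM

def forward(x, patch=None):
    h = torch.tanh(layer1(x))
    if patch is not None:                            # intervention on one hidden unit
        idx, value = patch
        h = h.clone(); h[idx] = value
    return layer2(h)

x_clean, x_corrupt = torch.randn(4), torch.randn(4)
h_clean = torch.tanh(layer1(x_clean)).detach()
total_effect = (forward(x_clean) - forward(x_corrupt)).item()

# Indirect effect of each hidden unit: run the corrupted input but restore that
# unit's clean activation; a large recovery means the unit mediates the effect.
for i in range(6):
    recovered = (forward(x_corrupt, patch=(i, h_clean[i])) - forward(x_corrupt)).item()
    print(i, recovered, "of total", total_effect)
```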
arXiv Detail & Related papers (2023-05-24T11:43:47Z)
- Relational Local Explanations [11.679389861042]
We develop a novel model-agnostic and permutation-based feature attribution algorithm based on relational analysis between input variables.
We are able to gain a broader insight into machine learning model decisions and data.
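A generic, simplified stand-in for the idea (not the paper's algorithm): permute input columns singly and in pairs, and read the deviation from additivity of the resulting error increases as a sign that two input variables act on the prediction together. The synthetic data and the toy "black box" are assumptions.
```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X[:, 0] * X[:, 1] + X[:, 2]                       # ground truth with one interaction
model = lambda Z: Z[:, 0] * Z[:, 1] + Z[:, 2]         # black-box predictor to explain

def perm_error(cols):
    Z = X.copy()
    for c in cols:
        Z[:, c] = rng.permutation(Z[:, c])            # break the column's relation to y
    return np.mean((model(Z) - y) ** 2)

base = np.mean((model(X) - y) ** 2)
single = {c: perm_error([c]) - base for c in range(4)}
for i in range(4):
    for j in range(i + 1, 4):
        joint = perm_error([i, j]) - base
        interaction = joint - single[i] - single[j]   # far from 0 => i and j act together
        print(i, j, round(interaction, 3))
```
Only the pair (0, 1) deviates noticeably from additivity here, matching the multiplicative term in the toy data.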
arXiv Detail & Related papers (2022-12-23T14:46:23Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- Efficient Sub-structured Knowledge Distillation [52.5931565465661]
We propose an approach that is much simpler in its formulation and far more efficient for training than existing approaches.
We transfer the knowledge from a teacher model to its student model by locally matching their predictions on all sub-structures, instead of the whole output space.
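A minimal sketch of the local-matching idea for a toy sequence tagger follows; the per-token KL, the embedding-only taggers, and all sizes are assumptions (a structured model could instead match marginals of spans, edges, or other factors), and the teacher is left untrained purely for illustration.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
VOCAB, TAGS = 50, 5

class Tagger(nn.Module):
    def __init__(self, hid):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, hid)
        self.out = nn.Linear(hid, TAGS)
    def forward(self, tokens):                         # tokens: (batch, length)
        return self.out(torch.tanh(self.emb(tokens)))  # (batch, length, TAGS)

teacher, student = Tagger(64), Tagger(16)              # large teacher, small student
opt = torch.optim.Adam(student.parameters(), lr=1e-2)

for step in range(200):
    tokens = torch.randint(0, VOCAB, (8, 12))
    with torch.no_grad():
        t_logp = F.log_softmax(teacher(tokens), dim=-1)
    s_logp = F.log_softmax(student(tokens), dim=-1)
    # Local matching: one KL term per token position (a "sub-structure"),
    # rather than a KL over the full sequence-level output distribution.
    loss = F.kl_div(s_logp, t_logp, log_target=True, reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()
```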
arXiv Detail & Related papers (2022-03-09T15:56:49Z)
- Adversarial Infidelity Learning for Model Interpretation [43.37354056251584]
We propose a Model-agnostic Effective Efficient Direct (MEED) instance-wise feature selection (IFS) framework for model interpretation.
Our framework mitigates concerns about sanity, shortcuts, model identifiability, and information transmission.
Our AIL mechanism can help learn the desired conditional distribution between selected features and targets.
arXiv Detail & Related papers (2020-06-09T16:27:17Z)
- Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions [55.660255727031725]
Influence functions explain the decisions of a model by identifying influential training examples.
We conduct a comparison between influence functions and common word-saliency methods on representative tasks.
We develop a new measure based on influence functions that can reveal artifacts in training data.
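For a model small enough to invert the Hessian exactly, the classical influence-function score, roughly -∇L(z_test)ᵀ H⁻¹ ∇L(z_train), can be computed directly. The logistic-regression setup and damping term below are illustrative; large models need Hessian-vector-product approximations.
```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(100, 3)
y = (X[:, 0] + 0.5 * X[:, 1] > 0).float()
w = torch.zeros(3, requires_grad=True)

def total_loss(weights):
    return F.binary_cross_entropy_with_logits(X @ weights, y) + 1e-3 * weights.pow(2).sum()

opt = torch.optim.LBFGS([w], max_iter=100)
def closure():
    opt.zero_grad()
    l = total_loss(w)
    l.backward()
    return l
opt.step(closure)                                     # fit the model

def point_grad(xi, yi):
    l = F.binary_cross_entropy_with_logits(xi @ w, yi)
    (g,) = torch.autograd.grad(l, w)
    return g

H = torch.autograd.functional.hessian(total_loss, w.detach())
H_inv = torch.linalg.inv(H + 1e-4 * torch.eye(3))     # small damping for stability
g_test = point_grad(X[0], y[0])                       # explain one test prediction
scores = torch.stack([-g_test @ H_inv @ point_grad(X[i], y[i]) for i in range(100)])
print(scores.abs().topk(5).indices)                   # most influential training examples
```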
arXiv Detail & Related papers (2020-05-14T00:45:23Z)
- Learning Latent Causal Structures with a Redundant Input Neural Network [9.044150926401574]
It is assumed that the inputs cause the outputs and that these causal relationships are encoded by a causal network among a set of latent variables.
We develop a deep learning model, which we call a redundant input neural network (RINN), with a modified architecture and a regularized objective function.
A series of simulation experiments provide support that the RINN method can successfully recover latent causal structure between input and output variables.
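One way to read "redundant input" and "regularized objective" into code, as a sketch under assumptions rather than the paper's exact architecture: feed the raw input into every layer and L1-penalize the weights, so that the surviving input connections suggest candidate causal links. The layer sizes, penalty weight, and synthetic data are assumptions.
```python
import torch
import torch.nn as nn

torch.manual_seed(0)
D_IN, D_H, D_OUT = 6, 8, 3
fc1 = nn.Linear(D_IN, D_H)                 # input -> latent layer 1
fc2 = nn.Linear(D_H + D_IN, D_H)           # latent 1 + (redundant) input -> latent layer 2
out = nn.Linear(D_H + D_IN, D_OUT)         # latent 2 + (redundant) input -> outputs
params = list(fc1.parameters()) + list(fc2.parameters()) + list(out.parameters())
opt = torch.optim.Adam(params, lr=1e-2)

X = torch.randn(512, D_IN)
Y = torch.stack([X[:, 0] + X[:, 1], X[:, 2] ** 2, 0.5 * X[:, 3]], dim=1)  # toy causal data

for step in range(500):
    h1 = torch.relu(fc1(X))
    h2 = torch.relu(fc2(torch.cat([h1, X], dim=1)))
    pred = out(torch.cat([h2, X], dim=1))
    l1 = sum(p.abs().sum() for p in (fc1.weight, fc2.weight, out.weight))
    loss = (pred - Y).pow(2).mean() + 1e-3 * l1        # fit + sparsity penalty
    opt.zero_grad(); loss.backward(); opt.step()

# Inputs whose columns are near zero in fc1.weight (and in the redundant blocks
# of fc2/out) are unlikely parents of any latent or output variable.
print(fc1.weight.abs().sum(dim=0))
```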
arXiv Detail & Related papers (2020-03-29T20:52:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences of its use.