Learning from the Best: Rationalizing Prediction by Adversarial
Information Calibration
- URL: http://arxiv.org/abs/2012.08884v2
- Date: Fri, 18 Dec 2020 10:07:27 GMT
- Title: Learning from the Best: Rationalizing Prediction by Adversarial
Information Calibration
- Authors: Lei Sha, Oana-Maria Camburu, and Thomas Lukasiewicz
- Abstract summary: We train two models jointly: one is a typical neural model that solves the task at hand in an accurate but black-box manner, and the other is a selector-predictor model that additionally produces a rationale for its prediction.
We use an adversarial-based technique to calibrate the information extracted by the two models.
For natural language tasks, we propose to use a language-model-based regularizer to encourage the extraction of fluent rationales.
- Score: 39.685626118667074
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Explaining the predictions of AI models is paramount in safety-critical
applications, such as in legal or medical domains. One form of explanation for
a prediction is an extractive rationale, i.e., a subset of features of an
instance that lead the model to give its prediction on the instance. Previous
works on generating extractive rationales usually employ a two-phase model: a
selector that selects the most important features (i.e., the rationale)
followed by a predictor that makes the prediction based exclusively on the
selected features. One disadvantage of these works is that the main signal for
learning to select features comes from the comparison of the answers given by
the predictor and the ground-truth answers. In this work, we propose to squeeze
more information from the predictor via an information calibration method. More
precisely, we train two models jointly: one is a typical neural model that
solves the task at hand in an accurate but black-box manner, and the other is a
selector-predictor model that additionally produces a rationale for its
prediction. The first model is used as a guide to the second model. We use an
adversarial-based technique to calibrate the information extracted by the two
models such that the difference between them is an indicator of the missed or
over-selected features. In addition, for natural language tasks, we propose to
use a language-model-based regularizer to encourage the extraction of fluent
rationales. Experimental results on a sentiment analysis task as well as on
three tasks from the legal domain show the effectiveness of our approach to
rationale extraction.
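Below is a minimal PyTorch sketch of the joint training scheme described in the abstract. It is not the authors' implementation: the module names (Guide, SelectorPredictor, Discriminator), the layer sizes, the soft sigmoid relaxation of the selector, and the sparsity penalty are simplifying assumptions, and the language-model fluency regularizer is omitted. The discriminator's ability to tell the two models' features apart plays the role of the calibration signal.

```python
# Minimal, illustrative sketch of adversarial information calibration
# (not the authors' code). Module names, sizes, and the soft selector
# relaxation are assumptions; the LM-based fluency regularizer is omitted.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, EMB, HID, CLASSES, B, T = 1000, 64, 64, 2, 8, 20

class Guide(nn.Module):
    """Accurate black-box model that sees the full input."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.enc = nn.GRU(EMB, HID, batch_first=True)
        self.clf = nn.Linear(HID, CLASSES)

    def forward(self, x):
        _, h = self.enc(self.emb(x))
        return self.clf(h[-1]), h[-1]           # logits, feature vector

class SelectorPredictor(nn.Module):
    """Selects a (soft) rationale mask, then predicts from the masked input."""
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.sel = nn.GRU(EMB, HID, batch_first=True)
        self.gate = nn.Linear(HID, 1)
        self.enc = nn.GRU(EMB, HID, batch_first=True)
        self.clf = nn.Linear(HID, CLASSES)

    def forward(self, x):
        e = self.emb(x)
        s, _ = self.sel(e)
        mask = torch.sigmoid(self.gate(s))      # soft stand-in for hard selection
        _, h = self.enc(e * mask)
        return self.clf(h[-1]), h[-1], mask.squeeze(-1)

class Discriminator(nn.Module):
    """Tries to tell guide features (label 1) from selector-predictor features (label 0)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(HID, HID), nn.ReLU(), nn.Linear(HID, 1))

    def forward(self, f):
        return self.net(f)

guide, sp, disc = Guide(), SelectorPredictor(), Discriminator()
opt_main = torch.optim.Adam(list(guide.parameters()) + list(sp.parameters()), lr=1e-3)
opt_disc = torch.optim.Adam(disc.parameters(), lr=1e-3)

x = torch.randint(0, VOCAB, (B, T))             # toy batch of token ids
y = torch.randint(0, CLASSES, (B,))             # toy labels

# 1) Discriminator step: learn to distinguish the two models' features.
_, g_feat = guide(x)
_, s_feat, _ = sp(x)
d_loss = (F.binary_cross_entropy_with_logits(disc(g_feat.detach()), torch.ones(B, 1))
          + F.binary_cross_entropy_with_logits(disc(s_feat.detach()), torch.zeros(B, 1)))
opt_disc.zero_grad(); d_loss.backward(); opt_disc.step()

# 2) Main step: task losses + calibration (fool the discriminator) + sparsity.
g_logits, g_feat = guide(x)
s_logits, s_feat, mask = sp(x)
task = F.cross_entropy(g_logits, y) + F.cross_entropy(s_logits, y)
calib = F.binary_cross_entropy_with_logits(disc(s_feat), torch.ones(B, 1))
sparsity = mask.mean()                          # crude stand-in for rationale-length control
loss = task + calib + 0.1 * sparsity
opt_main.zero_grad(); loss.backward(); opt_main.step()
```

In the full method, the calibration acts on the information extracted by the two models so that the discriminator's confusion indicates missed or over-selected features, and the language-model regularizer would add a fluency term to the loss; here a simple sparsity penalty stands in for the selection constraints.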
Related papers
- You Only Forward Once: Prediction and Rationalization in A Single
Forward Pass [10.998983921416533]
Unsupervised rationale extraction aims to extract concise and contiguous text snippets to support model predictions without any rationale annotations.
Previous studies have used a two-phase framework known as the Rationalizing Neural Predictions (RNP) framework, which follows a generate-then-predict paradigm.
We propose a novel single-phase framework called You Only Forward Once (YOFO), derived from a relaxed formulation in which rationales aim to support model predictions rather than make predictions.
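As a rough illustration of the single-phase idea (a sketch under assumptions, not the YOFO architecture): one forward pass yields both the prediction and per-token importance scores, and the rationale is read off those scores rather than produced by a separate generator.

```python
# Illustrative single-forward-pass sketch (assumed architecture, not YOFO itself):
# one pass yields the prediction and per-token scores used as the rationale.
import torch
import torch.nn as nn

class OnePassRationalizer(nn.Module):
    def __init__(self, vocab=1000, emb=64, hid=64, classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.enc = nn.GRU(emb, hid, batch_first=True)
        self.score = nn.Linear(hid, 1)          # per-token importance
        self.clf = nn.Linear(hid, classes)

    def forward(self, x):
        h, _ = self.enc(self.emb(x))            # (B, T, hid)
        w = torch.softmax(self.score(h).squeeze(-1), dim=-1)  # token weights
        pooled = (w.unsqueeze(-1) * h).sum(dim=1)
        return self.clf(pooled), w              # prediction and rationale scores

model = OnePassRationalizer()
logits, scores = model(torch.randint(0, 1000, (4, 12)))
rationale = scores > scores.mean(dim=-1, keepdim=True)  # crude thresholding into a rationale
```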
arXiv Detail & Related papers (2023-11-04T08:04:28Z)
- Rationalizing Predictions by Adversarial Information Calibration [65.19407304154177]
We train two models jointly: one is a typical neural model that solves the task at hand in an accurate but black-box manner, and the other is a selector-predictor model that additionally produces a rationale for its prediction.
We use an adversarial technique to calibrate the information extracted by the two models such that the difference between them is an indicator of the missed or over-selected features.
arXiv Detail & Related papers (2023-01-15T03:13:09Z)
- An Additive Instance-Wise Approach to Multi-class Model Interpretation [53.87578024052922]
Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system.
Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches.
This work exploits the strengths of both methods and proposes a global framework for learning local explanations simultaneously for multiple target classes.
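One way to picture an additive, instance-wise, multi-class explanation (a toy formulation assumed for illustration, not the paper's model): an explainer network maps each instance to per-feature, per-class weights, the class scores are additive in the weighted features, and the weights double as class-specific explanations.

```python
# Toy additive instance-wise explainer (assumed formulation, not the paper's model):
# an explainer maps the instance to per-feature, per-class weights; class scores
# are additive in the weighted features, and the weights explain each class.
import torch
import torch.nn as nn

D, C = 10, 3                                    # features, classes

class AdditiveInstanceExplainer(nn.Module):
    def __init__(self):
        super().__init__()
        self.weights = nn.Sequential(nn.Linear(D, 64), nn.ReLU(), nn.Linear(64, D * C))

    def forward(self, x):                       # x: (B, D)
        w = self.weights(x).view(-1, D, C)      # instance-specific weights
        contrib = x.unsqueeze(-1) * w           # per-feature contribution to each class
        return contrib.sum(dim=1), contrib      # logits (B, C), explanations (B, D, C)

model = AdditiveInstanceExplainer()
logits, contrib = model(torch.randn(5, D))
top_feature_for_class0 = contrib[:, :, 0].abs().argmax(dim=1)  # most influential feature per instance
```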
arXiv Detail & Related papers (2022-07-07T06:50:27Z)
- A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks into the sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state of the art (based on BERT) in average performance by a large margin in few-shot and full-shot settings.
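To make the reformulation concrete, here is a minimal sketch in which extraction and polarity prediction are linearized into a single generation target and trained with a next-token loss under unidirectional (causal) attention. The linearization format and the tiny character-level model are assumptions standing in for the paper's pretrained generative LM; the aspect-category field is omitted for brevity.

```python
# Minimal sketch of casting ABSA extraction + prediction as sequence generation.
# The linearization format and the toy character-level causal model are
# assumptions standing in for the paper's pretrained generative LM.
import torch
import torch.nn as nn
import torch.nn.functional as F

review = "The battery life is great but the screen is dim"
target = "battery life | positive ; screen | negative"     # invented aspect/polarity format
text = review + " <extract> " + target

chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
ids = torch.tensor([stoi[c] for c in text]).unsqueeze(0)    # (1, length)

emb = nn.Embedding(len(chars), 32)
layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
decoder = nn.TransformerEncoder(layer, num_layers=2)        # used with a causal mask
head = nn.Linear(32, len(chars))

L = ids.size(1) - 1
causal = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)  # unidirectional attention
h = decoder(emb(ids[:, :-1]), mask=causal)
loss = F.cross_entropy(head(h).reshape(-1, len(chars)), ids[:, 1:].reshape(-1))
loss.backward()                                 # next-token loss over review + linearized target
```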
arXiv Detail & Related papers (2022-04-11T18:31:53Z)
- Understanding Interlocking Dynamics of Cooperative Rationalization [90.6863969334526]
Selective rationalization explains the prediction of complex neural networks by finding a small subset of the input that is sufficient to predict the neural model output.
We reveal a major problem with such cooperative rationalization paradigm -- model interlocking.
We propose a new rationalization framework, called A2R, which introduces a third component into the architecture, a predictor driven by soft attention as opposed to selection.
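A rough sketch of pairing a selection-based predictor with a soft-attention-driven predictor (assumed shapes and losses, not the A2R implementation): the two heads share an encoder and are encouraged to agree, which counteracts the interlocking between a selector and the predictor trained only on its selections.

```python
# Rough sketch (assumed, not the A2R implementation): a selection-based head and
# a soft-attention head share an encoder and are encouraged to agree.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, EMB, HID, CLASSES, B, T = 1000, 64, 64, 2, 8, 20

emb = nn.Embedding(VOCAB, EMB)
enc = nn.GRU(EMB, HID, batch_first=True)
gate = nn.Linear(HID, 1)                        # shared per-token relevance scores
clf_sel = nn.Linear(HID, CLASSES)               # predictor on selected tokens
clf_att = nn.Linear(HID, CLASSES)               # predictor driven by soft attention

x = torch.randint(0, VOCAB, (B, T))
y = torch.randint(0, CLASSES, (B,))

h, _ = enc(emb(x))                              # (B, T, HID)
scores = gate(h).squeeze(-1)                    # (B, T)

# Selection branch: soft mask standing in for hard selection.
mask = torch.sigmoid(scores).unsqueeze(-1)
logits_sel = clf_sel((mask * h).mean(dim=1))

# Soft-attention branch: attention-weighted pooling over all tokens.
attn = torch.softmax(scores, dim=-1).unsqueeze(-1)
logits_att = clf_att((attn * h).sum(dim=1))

# Task losses plus an agreement term between the two heads.
loss = (F.cross_entropy(logits_sel, y) + F.cross_entropy(logits_att, y)
        + F.kl_div(F.log_softmax(logits_sel, dim=-1),
                   F.softmax(logits_att, dim=-1), reduction="batchmean"))
loss.backward()
```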
arXiv Detail & Related papers (2021-10-26T17:39:18Z)
- Fairness-aware Summarization for Justified Decision-Making [16.47665757950391]
We focus on the problem of (un)fairness in the justification of the text-based neural models.
We propose a fairness-aware summarization mechanism to detect and counteract the bias in such models.
arXiv Detail & Related papers (2021-07-13T17:04:10Z)
- Better Model Selection with a new Definition of Feature Importance [8.914907178577476]
Feature importance aims at measuring how crucial each input feature is for model prediction.
In this paper, we propose a new tree-model explanation approach for model selection.
arXiv Detail & Related papers (2020-09-16T14:32:22Z)
- Are Visual Explanations Useful? A Case Study in Model-in-the-Loop Prediction [49.254162397086006]
We study explanations based on visual saliency in an image-based age prediction task.
We find that presenting model predictions improves human accuracy.
However, explanations of various kinds fail to significantly alter human accuracy or trust in the model.
arXiv Detail & Related papers (2020-07-23T20:39:40Z)
- Adversarial Infidelity Learning for Model Interpretation [43.37354056251584]
We propose a Model-agnostic Effective Efficient Direct (MEED) instance-wise feature selection (IFS) framework for model interpretation.
Our framework mitigates concerns about sanity, shortcuts, model identifiability, and information transmission.
Our Adversarial Infidelity Learning (AIL) mechanism can help learn the desired conditional distribution between selected features and targets.
arXiv Detail & Related papers (2020-06-09T16:27:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.