Rationalizing Predictions by Adversarial Information Calibration
- URL: http://arxiv.org/abs/2301.06009v1
- Date: Sun, 15 Jan 2023 03:13:09 GMT
- Title: Rationalizing Predictions by Adversarial Information Calibration
- Authors: Lei Sha, Oana-Maria Camburu, Thomas Lukasiewicz
- Abstract summary: We train two models jointly: one is a typical neural model that solves the task at hand in an accurate but black-box manner, and the other is a selector-predictor model that additionally produces a rationale for its prediction.
We use an adversarial technique to calibrate the information extracted by the two models such that the difference between them is an indicator of the missed or over-selected features.
- Score: 65.19407304154177
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Explaining the predictions of AI models is paramount in safety-critical
applications, such as in legal or medical domains. One form of explanation for
a prediction is an extractive rationale, i.e., a subset of features of an
instance that lead the model to give its prediction on that instance. For
example, the subphrase "he stole the mobile phone" can be an extractive
rationale for the prediction of "Theft". Previous works on generating
extractive rationales usually employ a two-phase model: a selector that selects
the most important features (i.e., the rationale) followed by a predictor that
makes the prediction based exclusively on the selected features. One
disadvantage of these works is that the main signal for learning to select
features comes from the comparison of the answers given by the predictor to the
ground-truth answers. In this work, we propose to squeeze more information from
the predictor via an information calibration method. More precisely, we train
two models jointly: one is a typical neural model that solves the task at hand
in an accurate but black-box manner, and the other is a selector-predictor
model that additionally produces a rationale for its prediction. The first
model is used as a guide for the second model. We use an adversarial technique
to calibrate the information extracted by the two models such that the
difference between them is an indicator of the missed or over-selected
features. In addition, for natural language tasks, we propose a
language-model-based regularizer to encourage the extraction of fluent
rationales. Experimental results on a sentiment analysis task, a hate speech
recognition task, and three tasks from the legal domain show the
effectiveness of our approach to rationale extraction.
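As a rough illustration of the selector-predictor structure described in the abstract, the following toy sketch picks a rationale, predicts from it alone, and compares the result with a "guide" that sees the full input. It is not the authors' implementation: the word weights, the function names, and the plain score difference used in place of the learned adversarial calibration are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical hand-set word weights standing in for a trained model.
WEIGHTS: Dict[str, float] = {"stole": 2.0, "phone": 1.0, "mobile": 0.5}


def select(tokens: List[str], k: int) -> List[int]:
    """Selector: indices of the k highest-weighted tokens, in text order."""
    ranked = sorted(range(len(tokens)),
                    key=lambda i: -abs(WEIGHTS.get(tokens[i], 0.0)))
    return sorted(ranked[:k])


def predict(tokens: List[str]) -> float:
    """Predictor: score computed only from the tokens it is given."""
    return sum(WEIGHTS.get(t, 0.0) for t in tokens)


def rationalize(tokens: List[str], k: int) -> Tuple[List[str], float]:
    """Selector-predictor: pick a rationale, then predict from it alone."""
    rationale = [tokens[i] for i in select(tokens, k)]
    return rationale, predict(rationale)


def information_gap(tokens: List[str], k: int) -> float:
    """Guide score (full input) minus selector-predictor score.

    Assumption: a simple difference replaces the adversarial calibration
    of the paper; a nonzero gap hints that a weighted token was missed.
    """
    _, selected_score = rationalize(tokens, k)
    return predict(tokens) - selected_score


tokens = "he stole the mobile phone".split()
rationale, score = rationalize(tokens, k=2)
gap = information_gap(tokens, k=2)
```

With `k=2` the toy selector drops "mobile", so the gap is nonzero; raising `k` to 3 closes it. In the paper, the corresponding signal is learned jointly rather than computed from fixed weights.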
Related papers
- You Only Forward Once: Prediction and Rationalization in A Single Forward Pass [10.998983921416533]
Unsupervised rationale extraction aims to extract concise and contiguous text snippets to support model predictions without any rationale.
Previous studies have used a two-phase framework known as the Rationalizing Neural Prediction (RNP) framework, which follows a generate-then-predict paradigm.
We propose a novel single-phase framework called You Only Forward Once (YOFO), derived from a relaxed version of rationale where rationales aim to support model predictions rather than make predictions.
arXiv Detail & Related papers (2023-11-04T08:04:28Z)
- Explaining Hate Speech Classification with Model Agnostic Methods [0.9990687944474738]
The research goal of this paper is to bridge the gap between hate speech prediction and the explanations generated by the system to support its decision.
This is achieved by first predicting the classification of a text and then applying a post hoc, model-agnostic, surrogate interpretability approach.
arXiv Detail & Related papers (2023-05-30T19:52:56Z)
- A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks as a sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state of the art (based on BERT) on average performance by a large margin in few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z)
- Understanding Interlocking Dynamics of Cooperative Rationalization [90.6863969334526]
Selective rationalization explains the prediction of complex neural networks by finding a small subset of the input that is sufficient to predict the neural model output.
We reveal a major problem with such a cooperative rationalization paradigm: model interlocking.
We propose a new rationalization framework, called A2R, which introduces a third component into the architecture, a predictor driven by soft attention as opposed to selection.
arXiv Detail & Related papers (2021-10-26T17:39:18Z)
- Fairness-aware Summarization for Justified Decision-Making [16.47665757950391]
We focus on the problem of (un)fairness in the justification of the text-based neural models.
We propose a fairness-aware summarization mechanism to detect and counteract the bias in such models.
arXiv Detail & Related papers (2021-07-13T17:04:10Z)
- Contrastive Explanations for Model Interpretability [77.92370750072831]
We propose a methodology to produce contrastive explanations for classification models.
Our method is based on projecting model representation to a latent space.
Our findings shed light on the ability of label-contrastive explanations to provide a more accurate and finer-grained interpretability of a model's decision.
arXiv Detail & Related papers (2021-03-02T00:36:45Z)
- Explain and Predict, and then Predict Again [6.865156063241553]
We propose ExPred, which uses multi-task learning in the explanation generation phase, effectively trading off explanation and prediction losses.
We conduct an extensive evaluation of our approach on three diverse language datasets.
arXiv Detail & Related papers (2021-01-11T19:36:52Z)
- Learning from the Best: Rationalizing Prediction by Adversarial Information Calibration [39.685626118667074]
We train two models jointly: one is a typical neural model that solves the task at hand in an accurate but black-box manner, and the other is a selector-predictor model that additionally produces a rationale for its prediction.
We use an adversarial-based technique to calibrate the information extracted by the two models.
For natural language tasks, we propose to use a language-model-based regularizer to encourage the extraction of fluent rationales.
arXiv Detail & Related papers (2020-12-16T11:54:15Z)
- Are Visual Explanations Useful? A Case Study in Model-in-the-Loop Prediction [49.254162397086006]
We study explanations based on visual saliency in an image-based age prediction task.
We find that presenting model predictions improves human accuracy.
However, explanations of various kinds fail to significantly alter human accuracy or trust in the model.
arXiv Detail & Related papers (2020-07-23T20:39:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.