An Information Bottleneck Approach for Controlling Conciseness in
Rationale Extraction
- URL: http://arxiv.org/abs/2005.00652v3
- Date: Tue, 3 Nov 2020 04:38:35 GMT
- Title: An Information Bottleneck Approach for Controlling Conciseness in
Rationale Extraction
- Authors: Bhargavi Paranjape, Mandar Joshi, John Thickstun, Hannaneh Hajishirzi,
Luke Zettlemoyer
- Abstract summary: We show that it is possible to better manage the trade-off between rationale conciseness and task performance by optimizing a bound on the Information Bottleneck (IB) objective.
Our fully unsupervised approach jointly learns an explainer that predicts sparse binary masks over sentences, and an end-task predictor that considers only the extracted rationale.
- Score: 84.49035467829819
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decisions of complex language understanding models can be rationalized by
limiting their inputs to a relevant subsequence of the original text. A
rationale should be as concise as possible without significantly degrading task
performance, but this balance can be difficult to achieve in practice. In this
paper, we show that it is possible to better manage this trade-off by
optimizing a bound on the Information Bottleneck (IB) objective. Our fully
unsupervised approach jointly learns an explainer that predicts sparse binary
masks over sentences, and an end-task predictor that considers only the
extracted rationale. Using IB, we derive a learning objective that allows
direct control of mask sparsity levels through a tunable sparse prior.
Experiments on ERASER benchmark tasks demonstrate significant gains over
norm-minimization techniques for both task performance and agreement with human
rationales. Furthermore, we find that in the semi-supervised setting, a modest
amount of gold rationales (25% of training examples) closes the gap with a
model that uses the full input.
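The abstract does not spell out the learning objective, so the following is only a minimal sketch of how a sparsity-controlling IB-style loss of this kind is commonly written: a task loss plus a weighted KL term that pulls each sentence's keep-probability toward a sparse Bernoulli prior. The function names, the prior value `prior=0.2`, and the weight `beta` are illustrative assumptions, not the paper's exact formulation.

```python
import math

def bernoulli_kl(p, prior):
    """KL(Bernoulli(p) || Bernoulli(prior)) in nats."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return p * math.log(p / prior) + (1.0 - p) * math.log((1.0 - p) / (1.0 - prior))

def ib_style_loss(task_nll, keep_probs, prior=0.2, beta=1.0):
    """Task loss plus a sparsity penalty on the sentence mask.

    task_nll   -- negative log-likelihood of the label given the masked input
    keep_probs -- per-sentence probabilities of keeping each sentence
    prior      -- assumed target sparsity (fraction of sentences kept)
    beta       -- assumed weight trading off conciseness against task loss
    """
    kl = sum(bernoulli_kl(p, prior) for p in keep_probs) / len(keep_probs)
    return task_nll + beta * kl

# A mask that matches the prior incurs no penalty; a dense mask is penalized.
sparse_loss = ib_style_loss(0.5, [0.2, 0.2, 0.2, 0.2])
dense_loss = ib_style_loss(0.5, [0.9, 0.9, 0.9, 0.9])
```

In this sketch, lowering `prior` asks for shorter rationales, which is one way to read the abstract's "tunable sparse prior"; the paper itself should be consulted for the actual bound.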
Related papers
- Plausible Extractive Rationalization through Semi-Supervised Entailment Signal [29.67884478799914]
We take a semi-supervised approach to optimize for the plausibility of extracted rationales.
We adopt a pre-trained natural language inference (NLI) model and further fine-tune it on a small set of supervised rationales.
We show that, by enforcing the alignment agreement between the explanation and answer in a question-answering task, the performance can be improved without access to ground truth labels.
arXiv Detail & Related papers (2024-02-13T14:12:32Z)
- One-bit Supervision for Image Classification: Problem, Solution, and Beyond [114.95815360508395]
This paper presents one-bit supervision, a novel setting of learning with fewer labels, for image classification.
We propose a multi-stage training paradigm and incorporate negative label suppression into an off-the-shelf semi-supervised learning algorithm.
On multiple benchmarks, the learning efficiency of the proposed approach surpasses that of full-bit, semi-supervised supervision.
arXiv Detail & Related papers (2023-11-26T07:39:00Z) - REFER: An End-to-end Rationale Extraction Framework for Explanation
Regularization [12.409398096527829]
We propose REFER, a framework that employs a differentiable rationale extractor that allows to back-propagate through the rationale extraction process.
We analyze the impact of using human highlights during training by jointly training the task model and the rationale extractor.
arXiv Detail & Related papers (2023-10-22T21:20:52Z)
- Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z)
- Boosted Control Functions [10.503777692702952]
This work aims to bridge the gap between causal effect estimation and prediction tasks.
We establish a novel connection between the field of distribution generalization from machine learning and simultaneous equation models and control functions from econometrics.
Within this framework, we propose a strong notion of invariance for a predictive model and compare it with existing (weaker) versions.
arXiv Detail & Related papers (2023-10-09T15:43:46Z)
- Self-Supervised Learning via Maximum Entropy Coding [57.56570417545023]
We propose Maximum Entropy Coding (MEC) as a principled objective that explicitly optimizes the structure of the representation.
MEC learns a more generalizable representation than previous methods based on specific pretext tasks.
It achieves state-of-the-art performance consistently on various downstream tasks, including not only ImageNet linear probe, but also semi-supervised classification, object detection, instance segmentation, and object tracking.
arXiv Detail & Related papers (2022-10-20T17:58:30Z)
- Counterfactual Maximum Likelihood Estimation for Training Deep Networks [83.44219640437657]
Deep learning models are prone to learning spurious correlations that should not be learned as predictive clues.
We propose a causality-based training framework to reduce the spurious correlations caused by observable confounders.
We conduct experiments on two real-world tasks: Natural Language Inference (NLI) and Image Captioning.
arXiv Detail & Related papers (2021-06-07T17:47:16Z)
- WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection [75.80075054706079]
We propose a weakly- and semi-supervised object detection framework (WSSOD).
An agent detector is first trained on a joint dataset and then used to predict pseudo bounding boxes on weakly-annotated images.
The proposed framework demonstrates remarkable performance on the PASCAL-VOC and MSCOCO benchmarks, comparable to that obtained in fully-supervised settings.
arXiv Detail & Related papers (2021-05-21T11:58:50Z)
- A Distributional Approach to Controlled Text Generation [3.279201607581627]
We propose a Distributional Approach to address Controlled Text Generation from pre-trained Language Models (LMs).
This view permits defining, in a single formal framework, "pointwise" and "distributional" constraints over the target LM.
We then perform experiments over distributional constraints, a unique feature of our approach, demonstrating its potential as a remedy to the problem of Bias in Language Models.
arXiv Detail & Related papers (2020-12-21T19:02:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.