Decoupled Rationalization with Asymmetric Learning Rates: A Flexible
Lipschitz Restraint
- URL: http://arxiv.org/abs/2305.13599v3
- Date: Sat, 24 Jun 2023 08:54:12 GMT
- Title: Decoupled Rationalization with Asymmetric Learning Rates: A Flexible
Lipschitz Restraint
- Authors: Wei Liu, Jun Wang, Haozhao Wang, Ruixuan Li, Yang Qiu, YuanKai Zhang,
Jie Han, Yixiong Zou
- Abstract summary: A self-explaining rationalization model is generally constructed by a cooperative game where a generator selects the most human-intelligible pieces from the input text as rationales, followed by a predictor that makes predictions based on the selected rationales.
Such a cooperative game may incur the degeneration problem, where the predictor overfits to the uninformative pieces generated by a not-yet-well-trained generator and, in turn, leads the generator to converge to a sub-optimal model that tends to select senseless pieces.
We empirically propose a simple but effective method named DR, which can naturally and flexibly restrain the Lipschitz constant of the predictor.
- Score: 16.54547887989801
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: A self-explaining rationalization model is generally constructed by a
cooperative game where a generator selects the most human-intelligible pieces
from the input text as rationales, followed by a predictor that makes
predictions based on the selected rationales. However, such a cooperative game
may incur the degeneration problem where the predictor overfits to the
uninformative pieces generated by a not-yet-well-trained generator and, in turn,
leads the generator to converge to a sub-optimal model that tends to select
senseless pieces. In this paper, we theoretically bridge degeneration with the
predictor's Lipschitz continuity. Then, we empirically propose a simple but
effective method named DR, which can naturally and flexibly restrain the
Lipschitz constant of the predictor, to address the problem of degeneration.
The main idea of DR is to decouple the generator and predictor to allocate them
with asymmetric learning rates. A series of experiments conducted on two widely
used benchmarks have verified the effectiveness of the proposed method. Code:
https://github.com/jugechengzi/Rationalization-DR
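The main idea, decoupling the generator and predictor and updating them with asymmetric learning rates, can be illustrated with a small toy sketch. This is not the authors' implementation: the one-parameter "models", the squared-error loss, and the specific 10:1 learning-rate ratio are all illustrative assumptions; only the decoupled asymmetric-update pattern itself comes from the abstract.

```python
# Toy sketch of DR's asymmetric-learning-rate idea. The generator and
# predictor share one loss but are decoupled into separate parameter sets,
# each updated with its own step size. Updating the predictor more
# conservatively than the generator is what, per the abstract, implicitly
# restrains the predictor's Lipschitz constant.

def train_step(gen_w, pred_w, x, y, lr_gen=0.1, lr_pred=0.01):
    """One decoupled update. gen_w gates the input (a stand-in for
    rationale selection); pred_w maps the gated input to a prediction."""
    rationale = gen_w * x            # generator output
    y_hat = pred_w * rationale       # predictor output
    err = y_hat - y                  # loss is err ** 2

    # Gradients of err**2 with respect to each parameter set.
    grad_gen = 2 * err * pred_w * x
    grad_pred = 2 * err * rationale

    # Asymmetric learning rates: the predictor moves more slowly, so it
    # cannot race ahead and overfit an immature generator's selections.
    return gen_w - lr_gen * grad_gen, pred_w - lr_pred * grad_pred

gen_w, pred_w = 0.5, 0.5
for _ in range(200):
    gen_w, pred_w = train_step(gen_w, pred_w, x=1.0, y=2.0)
print(round(gen_w * pred_w, 3))  # the joint prediction approaches the target 2.0
```

In a real framework this pattern corresponds to giving the generator and predictor separate optimizers (or parameter groups) with different learning rates, rather than training both players under one shared optimizer.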
Related papers
- Enhancing the Rationale-Input Alignment for Self-explaining
Rationalization [22.74436500022893]
We introduce a novel approach called DAR (Discriminatively Aligned Rationalization) to align the selected rationale with the original input.
Experiments on two widely used real-world benchmarks show that the proposed method significantly improves the explanation quality.
arXiv Detail & Related papers (2023-12-07T07:37:15Z)
- Provable Guarantees for Generative Behavior Cloning: Bridging Low-Level Stability and High-Level Behavior [51.60683890503293]
We propose a theoretical framework for studying behavior cloning of complex expert demonstrations using generative modeling.
We show that pure supervised cloning can generate trajectories matching the per-time step distribution of arbitrary expert trajectories.
arXiv Detail & Related papers (2023-07-27T04:27:26Z)
- Unsupervised Selective Rationalization with Noise Injection [7.17737088382948]
Unsupervised selective rationalization produces rationales alongside predictions by chaining two jointly trained components, a rationale generator and a predictor.
We introduce a novel training technique that effectively limits generation of implausible rationales by injecting noise between the generator and the predictor.
We achieve sizeable improvements in rationale plausibility and task accuracy over the state-of-the-art across a variety of tasks, including our new benchmark.
arXiv Detail & Related papers (2023-05-27T17:34:36Z)
- FR: Folded Rationalization with a Unified Encoder [14.899075910719189]
We propose Folded Rationalization (FR) that folds the two phases of the rationale model into one from the perspective of text semantic extraction.
We show that FR improves the F1 score by up to 10.3% as compared to state-of-the-art methods.
arXiv Detail & Related papers (2022-09-17T08:49:45Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Discovering Invariant Rationales for Graph Neural Networks [104.61908788639052]
Intrinsic interpretability of graph neural networks (GNNs) lies in finding a small subset of the input graph's features that drives the model's prediction.
We propose a new strategy of discovering invariant rationale (DIR) to construct intrinsically interpretable GNNs.
arXiv Detail & Related papers (2022-01-30T16:43:40Z)
- Understanding Interlocking Dynamics of Cooperative Rationalization [90.6863969334526]
Selective rationalization explains the prediction of complex neural networks by finding a small subset of the input that is sufficient to predict the neural model output.
We reveal a major problem with such a cooperative rationalization paradigm -- model interlocking.
We propose a new rationalization framework, called A2R, which introduces a third component into the architecture, a predictor driven by soft attention as opposed to selection.
arXiv Detail & Related papers (2021-10-26T17:39:18Z)
- Rationales for Sequential Predictions [117.93025782838123]
Sequence models are a critical component of modern NLP systems, but their predictions are difficult to explain.
We consider model explanations through rationales, subsets of context that can explain individual model predictions.
We propose an efficient greedy algorithm to approximate this objective.
arXiv Detail & Related papers (2021-09-14T01:25:15Z)
- Invariant Rationalization [84.1861516092232]
A typical rationalization criterion, i.e. maximum mutual information (MMI), finds the rationale that maximizes the prediction performance based only on the rationale.
We introduce a game-theoretic invariant rationalization criterion where the rationales are constrained to enable the same predictor to be optimal across different environments.
We show both theoretically and empirically that the proposed rationales can rule out spurious correlations, generalize better to different test scenarios, and align better with human judgments.
arXiv Detail & Related papers (2020-03-22T00:50:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.