FR: Folded Rationalization with a Unified Encoder
- URL: http://arxiv.org/abs/2209.08285v2
- Date: Tue, 20 Sep 2022 11:08:27 GMT
- Title: FR: Folded Rationalization with a Unified Encoder
- Authors: Wei Liu, Haozhao Wang, Jun Wang, Ruixuan Li, Chao Yue, Yuankai Zhang
- Abstract summary: We propose Folded Rationalization (FR) that folds the two phases of the rationale model into one from the perspective of text semantic extraction.
We show that FR improves the F1 score by up to 10.3% compared to state-of-the-art methods.
- Score: 14.899075910719189
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conventional works generally employ a two-phase model in which a generator selects the most important pieces of the input, followed by a predictor that makes predictions based on the selected pieces. However, such a two-phase model may incur the degeneration problem, where the predictor overfits to the noise produced by a not-yet-well-trained generator and, in turn, drives the generator to converge to a sub-optimal model that tends to select senseless pieces. To tackle this challenge, we propose Folded Rationalization (FR), which folds the two phases of the rationale model into one from the perspective of text semantic extraction. The key idea of FR is to employ a unified encoder shared between the generator and predictor, through which FR facilitates a better predictor by giving it access to valuable information that the generator blocks in the traditional two-phase model, and in turn yields a better generator. Empirically, we show that FR improves the F1 score by up to 10.3% compared to state-of-the-art methods.
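To make the unified-encoder idea concrete, below is a minimal PyTorch sketch, not the authors' implementation: the module choices, sizes, and the soft sigmoid selection are illustrative assumptions. The point is structural: one shared encoder feeds both the generator's token-selection head and the predictor's classification head, so the prediction loss also shapes the representations the generator selects from.

```python
import torch
import torch.nn as nn

class FoldedRationalizer(nn.Module):
    """Minimal sketch of unified-encoder rationalization.
    All hyperparameters and module choices are illustrative."""
    def __init__(self, vocab_size=30000, d_model=256, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # One shared encoder serves both phases, replacing the two
        # separate encoders of the conventional two-phase model.
        self.encoder = nn.GRU(d_model, d_model, batch_first=True,
                              bidirectional=True)
        self.select_head = nn.Linear(2 * d_model, 1)              # generator head
        self.predict_head = nn.Linear(2 * d_model, num_classes)   # predictor head

    def forward(self, tokens):
        h, _ = self.encoder(self.embed(tokens))                   # (B, T, 2d)
        # Generator: soft token selection (practical systems use hard
        # sampling, e.g. Gumbel-style; sigmoid keeps the sketch simple).
        mask = torch.sigmoid(self.select_head(h))                 # (B, T, 1)
        # Predictor: classify from the masked output of the same encoder.
        pooled = (mask * h).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-6)
        return self.predict_head(pooled), mask.squeeze(-1)

model = FoldedRationalizer()
logits, rationale = model(torch.randint(0, 30000, (4, 32)))
print(logits.shape, rationale.shape)  # torch.Size([4, 2]) torch.Size([4, 32])
```

Because the encoder is shared, gradients from the prediction loss reach the token representations directly rather than only through the generator's selections, which is the channel of valuable information the abstract refers to.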
Related papers
- Dual Student Networks for Data-Free Model Stealing (2023-09-18) [79.67498803845059]
Two main challenges in data-free model stealing are estimating gradients of the target model without access to its parameters and generating a diverse set of training samples.
We propose a Dual Student method in which two students are trained symmetrically, providing the generator with a criterion: generate samples on which the two students disagree.
We show that our new optimization framework provides more accurate gradient estimation of the target model and higher accuracy on benchmark classification datasets.
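A minimal PyTorch sketch of that disagreement signal; every architecture and size here is an assumption for illustration, not the paper's setup:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

z_dim, num_classes = 64, 10
generator = nn.Sequential(nn.Linear(z_dim, 256), nn.ReLU(), nn.Linear(256, 784))
student_a = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, num_classes))
student_b = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, num_classes))

def disagreement(x):
    """L1 distance between the two students' predicted distributions."""
    return (F.softmax(student_a(x), dim=-1)
            - F.softmax(student_b(x), dim=-1)).abs().sum(dim=-1).mean()

# Generator step: synthesize inputs on which the two students disagree most,
# steering exploration toward the black-box target's decision boundaries.
x_fake = generator(torch.randn(32, z_dim))
gen_loss = -disagreement(x_fake)  # maximize disagreement
# Student step (omitted): both students are then trained symmetrically to
# match the target model's outputs on x_fake, shrinking the disagreement.
```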
- Complexity Matters: Rethinking the Latent Space for Generative Modeling (2023-07-17) [65.64763873078114]
In generative modeling, numerous successful approaches leverage a low-dimensional latent space, e.g., Stable Diffusion.
In this study, we aim to shed light on this under-explored topic by rethinking the latent space from the perspective of model complexity.
- Unifying GANs and Score-Based Diffusion as Generative Particle Models (2023-05-25) [18.00326775812974]
We propose a novel framework that unifies particle and adversarial generative models.
Within this framework, a generator emerges as an optional addition to any such generative model.
We empirically test the viability of the resulting models as proofs of concept for potential applications of our framework.
- Decoupled Rationalization with Asymmetric Learning Rates: A Flexible Lipschitz Restraint (2023-05-23) [16.54547887989801]
A self-explaining rationalization model is generally constructed via a cooperative game in which a generator selects the most human-intelligible pieces of the input text as rationales, followed by a predictor that makes predictions based on the selected rationales.
Such a cooperative game may incur the degeneration problem, where the predictor overfits to the uninformative pieces produced by a not-yet-well-trained generator and, in turn, drives the generator to converge to a sub-optimal model that tends to select senseless pieces.
We propose a simple but effective method named DR, which can naturally and flexibly restrain the Lipschitz constant of the predictor.
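The asymmetric-rate idea can be pictured with a small sketch, assuming a standard generator/predictor pair; the modules and learning rates below are placeholders, not the paper's configuration:

```python
import torch
import torch.nn as nn

# Placeholder modules: the real generator selects rationale tokens and
# the predictor classifies from them.
generator = nn.Linear(128, 128)
predictor = nn.Linear(128, 2)

gen_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
# Asymmetric learning rate: training the predictor more slowly softly
# restrains how sharply it can fit (related to its Lipschitz constant)
# while the generator is still producing noisy rationales.
pred_opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

x = torch.randn(8, 128)
loss = nn.functional.cross_entropy(predictor(generator(x)),
                                   torch.randint(0, 2, (8,)))
gen_opt.zero_grad(); pred_opt.zero_grad()
loss.backward()
gen_opt.step(); pred_opt.step()
```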
- MGR: Multi-generator Based Rationalization (2023-05-08) [14.745836934156427]
Rationalization employs a generator and a predictor to construct a self-explaining NLP model.
In this paper, we propose a simple yet effective method named MGR to simultaneously solve the two problems.
We show that MGR improves the F1 score by up to 20.9% as compared to state-of-the-art methods.
- Conditional Denoising Diffusion for Sequential Recommendation (2023-04-22) [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), have notable drawbacks: GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
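A rough PyTorch sketch of the three described components; all dimensions, architectures, and the step count are assumptions. A sequence encoder summarizes the interaction history, and a cross-attentive decoder predicts the noise on a target item embedding at a given diffusion step:

```python
import torch
import torch.nn as nn

class CrossAttentiveDenoiser(nn.Module):
    def __init__(self, n_items=1000, d=64, n_steps=100):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, d)
        self.seq_encoder = nn.GRU(d, d, batch_first=True)   # sequence encoder
        self.cross_attn = nn.MultiheadAttention(d, num_heads=4,
                                                batch_first=True)
        self.step_emb = nn.Embedding(n_steps, d)            # step-wise diffuser
        self.out = nn.Linear(d, d)

    def forward(self, noisy_target, history, t):
        ctx, _ = self.seq_encoder(self.item_emb(history))   # (B, T, d)
        q = (noisy_target + self.step_emb(t)).unsqueeze(1)  # (B, 1, d)
        h, _ = self.cross_attn(q, ctx, ctx)                 # attend to history
        return self.out(h.squeeze(1))                       # predicted noise

model = CrossAttentiveDenoiser()
noise_hat = model(torch.randn(8, 64),               # noisy target embedding
                  torch.randint(0, 1000, (8, 20)),  # interaction history
                  torch.randint(0, 100, (8,)))      # diffusion step
print(noise_hat.shape)  # torch.Size([8, 64])
```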
- Learning Probabilistic Models from Generator Latent Spaces with Hat EBM (2022-10-29) [81.35199221254763]
This work proposes a method for using any generator network as the foundation of an Energy-Based Model (EBM).
Experiments show strong performance of the proposed method on (1) unconditional ImageNet synthesis at 128x128 resolution, (2) refining the output of existing generators, and (3) learning EBMs that incorporate non-probabilistic generators.
- An Energy-Based Prior for Generative Saliency (2022-04-19) [62.79775297611203]
We propose a novel generative saliency prediction framework that adopts an informative energy-based model as a prior distribution.
With the generative saliency model, we can obtain a pixel-wise uncertainty map from an image, indicating model confidence in the saliency prediction.
Experimental results show that our generative saliency model with an energy-based prior can achieve not only accurate saliency predictions but also reliable uncertainty maps consistent with human perception.
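The pixel-wise uncertainty map can be pictured with a generic stochastic-prediction sketch; the toy network below is an assumption, not the paper's energy-based sampler. Repeated latent draws yield multiple saliency samples, and their per-pixel variance serves as the uncertainty map:

```python
import torch
import torch.nn as nn

class StochasticSaliency(nn.Module):
    """Toy network: a latent sample z makes each forward pass a draw
    from p(saliency | image)."""
    def __init__(self, z_dim=8):
        super().__init__()
        self.z_dim = z_dim
        self.net = nn.Conv2d(3 + z_dim, 1, kernel_size=3, padding=1)

    def forward(self, image):
        b, _, h, w = image.shape
        z = torch.randn(b, self.z_dim, 1, 1).expand(b, self.z_dim, h, w)
        return torch.sigmoid(self.net(torch.cat([image, z], dim=1)))

model = StochasticSaliency()
image = torch.rand(1, 3, 64, 64)
with torch.no_grad():
    samples = torch.stack([model(image) for _ in range(20)])
saliency = samples.mean(dim=0)     # averaged saliency prediction
uncertainty = samples.var(dim=0)   # pixel-wise uncertainty map
```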
- Understanding Interlocking Dynamics of Cooperative Rationalization (2021-10-26) [90.6863969334526]
Selective rationalization explains the prediction of complex neural networks by finding a small subset of the input that is sufficient to predict the neural model output.
We reveal a major problem with such a cooperative rationalization paradigm: model interlocking.
We propose a new rationalization framework, called A2R, which introduces a third component into the architecture, a predictor driven by soft attention as opposed to selection.
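A minimal sketch of the soft-attention predictor A2R adds; sizes and pooling are illustrative assumptions:

```python
import torch
import torch.nn as nn

d, num_classes = 128, 2
attn_scorer = nn.Linear(d, 1)
soft_predictor = nn.Linear(d, num_classes)

h = torch.randn(4, 32, d)                    # token representations (B, T, d)
attn = torch.softmax(attn_scorer(h), dim=1)  # soft weights over all tokens
soft_logits = soft_predictor((attn * h).sum(dim=1))
# Every token keeps a nonzero weight, so this predictor still receives
# gradient signal for tokens a hard selector would drop, which is the
# leverage A2R uses to break generator-predictor interlocking.
```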
- MG-GAN: A Multi-Generator Model Preventing Out-of-Distribution Samples in Pedestrian Trajectory Prediction (2021-08-20) [0.6445605125467573]
We propose a multi-generator model for pedestrian trajectory prediction.
Each generator specializes in learning a distribution over trajectories routing towards one of the primary modes in the scene.
A second network learns a categorical distribution over these generators, conditioned on the dynamics and scene input.
This architecture allows us to effectively sample from specialized generators and to significantly reduce out-of-distribution samples compared to single-generator methods.
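A hedged sketch of this sampling scheme; all architectures and sizes are assumptions. A categorical network scores the specialized generators, one is sampled, and the trajectory is drawn from that single mode rather than a blend:

```python
import torch
import torch.nn as nn

n_gens, cond_dim, z_dim, traj_dim = 3, 32, 16, 24
generators = nn.ModuleList(
    [nn.Linear(cond_dim + z_dim, traj_dim) for _ in range(n_gens)])
selector = nn.Linear(cond_dim, n_gens)  # categorical head over generators

cond = torch.randn(1, cond_dim)         # encoded scene and dynamics
probs = torch.softmax(selector(cond), dim=-1)
k = torch.multinomial(probs, num_samples=1).item()  # pick one mode
z = torch.randn(1, z_dim)
trajectory = generators[k](torch.cat([cond, z], dim=-1))
# Sampling from a single specialist avoids blending modes, which is how
# the model suppresses out-of-distribution trajectories.
```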