Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent
Structure Learning
- URL: http://arxiv.org/abs/2010.02357v1
- Date: Mon, 5 Oct 2020 21:56:00 GMT
- Title: Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent
Structure Learning
- Authors: Tsvetomila Mihaylova, Vlad Niculae, Andr\'e F. T. Martins
- Abstract summary: Latent structure models are a powerful tool for modeling language data.
One challenge with end-to-end training of these models is the argmax operation, which has null gradient.
We explore latent structure learning through the angle of pulling back the downstream learning objective.
- Score: 20.506232306308977
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Latent structure models are a powerful tool for modeling language data: they
can mitigate the error propagation and annotation bottleneck in pipeline
systems, while simultaneously uncovering linguistic insights about the data.
One challenge with end-to-end training of these models is the argmax operation,
which has null gradient. In this paper, we focus on surrogate gradients, a
popular strategy to deal with this problem. We explore latent structure
learning through the angle of pulling back the downstream learning objective.
In this paradigm, we discover a principled motivation for both the
straight-through estimator (STE) as well as the recently-proposed SPIGOT - a
variant of STE for structured models. Our perspective leads to new algorithms
in the same family. We empirically compare the known and the novel pulled-back
estimators against the popular alternatives, yielding new insight for
practitioners and revealing intriguing failure cases.
Related papers
- Unified Explanations in Machine Learning Models: A Perturbation Approach [0.0]
Inconsistencies between XAI and modeling techniques can have the undesirable effect of casting doubt upon the efficacy of these explainability approaches.
We propose a systematic, perturbation-based analysis against a popular, model-agnostic method in XAI, SHapley Additive exPlanations (Shap)
We devise algorithms to generate relative feature importance in settings of dynamic inference amongst a suite of popular machine learning and deep learning methods, and metrics that allow us to quantify how well explanations generated under the static case hold.
arXiv Detail & Related papers (2024-05-30T16:04:35Z) - Towards Learning Stochastic Population Models by Gradient Descent [0.0]
We show that simultaneous estimation of parameters and structure poses major challenges for optimization procedures.
We demonstrate accurate estimation of models but find that enforcing the inference of parsimonious, interpretable models drastically increases the difficulty.
arXiv Detail & Related papers (2024-04-10T14:38:58Z) - The Edge-of-Reach Problem in Offline Model-Based Reinforcement Learning [37.387280102209274]
offline reinforcement learning aims to enable agents to be trained from pre-collected datasets, however, this comes with the added challenge of estimating the value of behavior not covered in the dataset.
Model-based methods offer a solution by allowing agents to collect additional synthetic data via rollouts in a learned dynamics model.
However, if the learned dynamics model is replaced by the true error-free dynamics, existing model-based methods completely fail.
We propose Reach-Aware Value Learning (RAVL), a simple and robust method that directly addresses the edge-of-reach problem.
arXiv Detail & Related papers (2024-02-19T20:38:00Z) - Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective [106.92016199403042]
We empirically investigate knowledge transfer from larger to smaller models through a parametric perspective.
We employ sensitivity-based techniques to extract and align knowledge-specific parameters between different large language models.
Our findings highlight the critical factors contributing to the process of parametric knowledge transfer.
arXiv Detail & Related papers (2023-10-17T17:58:34Z) - Autoregressive Structured Prediction with Language Models [73.11519625765301]
We describe an approach to model structures as sequences of actions in an autoregressive manner with PLMs.
Our approach achieves the new state-of-the-art on all the structured prediction tasks we looked at.
arXiv Detail & Related papers (2022-10-26T13:27:26Z) - FOSTER: Feature Boosting and Compression for Class-Incremental Learning [52.603520403933985]
Deep neural networks suffer from catastrophic forgetting when learning new categories.
We propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively.
arXiv Detail & Related papers (2022-04-10T11:38:33Z) - Towards Interpretable Deep Reinforcement Learning Models via Inverse
Reinforcement Learning [27.841725567976315]
We propose a novel framework utilizing Adversarial Inverse Reinforcement Learning.
This framework provides global explanations for decisions made by a Reinforcement Learning model.
We capture intuitive tendencies that the model follows by summarizing the model's decision-making process.
arXiv Detail & Related papers (2022-03-30T17:01:59Z) - Layer-wise Analysis of a Self-supervised Speech Representation Model [26.727775920272205]
Self-supervised learning approaches have been successful for pre-training speech representation models.
Not much has been studied about the type or extent of information encoded in the pre-trained representations themselves.
arXiv Detail & Related papers (2021-07-10T02:13:25Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical representation of tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z) - Towards Interpretable Deep Learning Models for Knowledge Tracing [62.75876617721375]
We propose to adopt the post-hoc method to tackle the interpretability issue for deep learning based knowledge tracing (DLKT) models.
Specifically, we focus on applying the layer-wise relevance propagation (LRP) method to interpret RNN-based DLKT model.
Experiment results show the feasibility using the LRP method for interpreting the DLKT model's predictions.
arXiv Detail & Related papers (2020-05-13T04:03:21Z) - Plausible Counterfactuals: Auditing Deep Learning Classifiers with
Realistic Adversarial Examples [84.8370546614042]
Black-box nature of Deep Learning models has posed unanswered questions about what they learn from data.
Generative Adversarial Network (GAN) and multi-objectives are used to furnish a plausible attack to the audited model.
Its utility is showcased within a human face classification task, unveiling the enormous potential of the proposed framework.
arXiv Detail & Related papers (2020-03-25T11:08:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.