Learning with Latent Structures in Natural Language Processing: A Survey
- URL: http://arxiv.org/abs/2201.00490v1
- Date: Mon, 3 Jan 2022 06:16:17 GMT
- Title: Learning with Latent Structures in Natural Language Processing: A Survey
- Authors: Zhaofeng Wu
- Abstract summary: Recent interest in learning with latent discrete structures to incorporate better inductive biases for improved end-task performance and better interpretability.
This work surveys three main families of methods to learn such models: surrogate gradients, continuous relaxation, and marginal likelihood via sampling.
We conclude with a review of applications of these methods and an inspection of the learned latent structure that they induce.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While end-to-end learning with fully differentiable models has enabled
tremendous success in natural language processing (NLP) and machine learning,
there has been significant recent interest in learning with latent discrete
structures to incorporate better inductive biases for improved end-task
performance and better interpretability. This paradigm, however, is not
straightforwardly amenable to the mainstream gradient-based optimization
methods. This work surveys three main families of methods to learn such models:
surrogate gradients, continuous relaxation, and marginal likelihood
maximization via sampling. We conclude with a review of applications of these
methods and an inspection of the learned latent structure that they induce.
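For concreteness, the sketch below illustrates two of these three families on a toy categorical latent variable: continuous relaxation via Gumbel-Softmax, and marginal likelihood maximization via the score-function (REINFORCE) estimator. It is an illustrative sketch assuming PyTorch, not code from the survey; the downstream loss and all shapes are made-up placeholders. A sketch of the third family, surrogate gradients, appears under the SPIGOT entry at the end of the related-papers list.

```python
# Toy sketch (not from the survey): two estimator families for a model with
# a single categorical latent variable. Assumes PyTorch; the downstream loss
# and all shapes are illustrative placeholders.
import torch
import torch.nn.functional as F

def downstream_loss(z, target):
    # Hypothetical per-example end-task loss on the (soft or hard) code z.
    return ((z - target) ** 2).mean(dim=-1)

logits = torch.randn(4, 10, requires_grad=True)   # scores over 10 latent choices
target = torch.randn(4, 10)

# 1) Continuous relaxation: Gumbel-Softmax replaces the discrete sample with
#    a differentiable soft sample, so gradients flow through the relaxation.
z_soft = F.gumbel_softmax(logits, tau=0.5, hard=False)
downstream_loss(z_soft, target).mean().backward()

# 2) Marginal likelihood maximization via sampling: the score-function
#    (REINFORCE) estimator uses exact discrete samples and log-prob gradients.
logits.grad = None
dist = torch.distributions.OneHotCategorical(logits=logits)
z_hard = dist.sample()                            # non-differentiable sample
reward = -downstream_loss(z_hard, target)         # per-example "return"
(-(dist.log_prob(z_hard) * reward.detach())).mean().backward()
```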
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
- Probing the Decision Boundaries of In-context Learning in Large Language Models [31.977886254197138]
We propose a new mechanism to probe and understand in-context learning from the lens of decision boundaries for in-context binary classification.
To our surprise, we find that the decision boundaries learned by current LLMs in simple binary classification tasks are often irregular and non-smooth.
arXiv Detail & Related papers (2024-06-17T06:00:24Z)
- Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language model (LLM) unlearning via gradient ascent (GA).
Despite their simplicity and efficiency, we suggest that GA-based methods are prone to excessive unlearning.
We propose several controlling methods that can regulate the extent of excessive unlearning.
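For intuition, here is a minimal sketch of the gradient-ascent step such methods build on, assuming a Hugging Face-style causal LM in PyTorch; the names are hypothetical placeholders, and this is not the paper's controlled variant.

```python
# Sketch of one gradient-ascent (GA) unlearning step; `model`, `optimizer`,
# and `forget_batch` are hypothetical placeholders.
def ga_unlearning_step(model, optimizer, forget_batch):
    out = model(**forget_batch, labels=forget_batch["input_ids"])
    loss = -out.loss          # negate the LM loss: ascend on the forget set
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()          # unconstrained GA; the paper adds controls to
    return out.loss.item()    # curb exactly this kind of excessive unlearning
```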
arXiv Detail & Related papers (2024-06-13T14:41:00Z)
- Reinforcement Learning Fine-tuning of Language Models is Biased Towards More Extractable Features [0.5937476291232802]
We investigate whether principles governing inductive biases in the supervised fine-tuning of large language models also apply when the fine-tuning process uses reinforcement learning.
We find statistically significant correlations which constitute strong evidence for these hypotheses.
arXiv Detail & Related papers (2023-11-07T15:00:39Z)
- Latent Traversals in Generative Models as Potential Flows [113.4232528843775]
We propose to model latent structures with a learned dynamic potential landscape.
Inspired by physics, optimal transport, and neuroscience, these potential landscapes are learned as physically realistic partial differential equations.
Our method achieves trajectories that are more disentangled, both qualitatively and quantitatively, than those of state-of-the-art baselines.
arXiv Detail & Related papers (2023-04-25T15:53:45Z)
- Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient [65.08966446962845]
Offline reinforcement learning, which aims at optimizing decision-making strategies from historical data, has been extensively applied in real-life applications.
We take a step forward by considering offline reinforcement learning with differentiable function class approximation (DFA).
Most importantly, we show offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
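To give a rough shape of that algorithm, the toy below runs pessimistic fitted Q-iteration with linear features and an elliptical uncertainty penalty; everything here (the features, the fixed greedy-next-action simplification, the penalty weight) is an illustrative assumption, not the paper's construction or analysis.

```python
# Toy pessimistic fitted Q-iteration on an offline batch, assuming linear
# Q-functions Q(s, a) = phi(s, a) @ w. All specifics here are illustrative.
import numpy as np

def pessimistic_fqi(phi, phi_next_max, rewards, gamma=0.99, iters=50, beta=1.0):
    # phi:          (n, d) features of observed (s, a) pairs
    # phi_next_max: (n, d) features of greedy next actions (fixed, for the toy)
    # beta:         weight of the pessimism penalty
    n, d = phi.shape
    cov = phi.T @ phi + np.eye(d)             # regularized empirical covariance
    cov_inv = np.linalg.inv(cov)
    w = np.zeros(d)
    for _ in range(iters):
        # Elliptical uncertainty bonus: large where the data covers (s, a) poorly.
        bonus = np.sqrt(np.einsum("nd,de,ne->n", phi_next_max, cov_inv, phi_next_max))
        # Pessimistic Bellman targets: subtract the uncertainty penalty.
        targets = rewards + gamma * (phi_next_max @ w - beta * bonus)
        w = np.linalg.solve(cov, phi.T @ targets)   # ridge-regression refit
    return w
```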
arXiv Detail & Related papers (2022-10-03T07:59:42Z)
- Latent Properties of Lifelong Learning Systems [59.50307752165016]
We introduce an algorithm-agnostic explainable surrogate-modeling approach to estimate latent properties of lifelong learning algorithms.
We validate the approach for estimating these properties via experiments on synthetic data.
arXiv Detail & Related papers (2022-07-28T20:58:13Z)
- Genetic Programming for Manifold Learning: Preserving Local Topology [5.226724669049025]
We propose a new approach to using genetic programming for manifold learning, which preserves local topology.
This is expected to significantly improve performance on tasks where local neighbourhood structure (topology) is paramount.
arXiv Detail & Related papers (2021-08-23T03:48:48Z)
- Active Learning for Sequence Tagging with Deep Pre-trained Models and Bayesian Uncertainty Estimates [52.164757178369804]
Recent advances in transfer learning for natural language processing in conjunction with active learning open the possibility to significantly reduce the necessary annotation budget.
We conduct an empirical study of various Bayesian uncertainty estimation methods and Monte Carlo dropout options for deep pre-trained models in the active learning framework.
We also demonstrate that to acquire instances during active learning, a full-size Transformer can be substituted with a distilled version, which yields better computational performance.
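As a concrete example of one such acquisition strategy, here is a small sketch of Monte Carlo dropout scoring by predictive entropy, assuming a PyTorch classifier with dropout layers; the names and shapes are illustrative, not the paper's exact protocol.

```python
# Sketch: MC-dropout acquisition for active learning, assuming a PyTorch
# classifier with dropout layers; all names and shapes are illustrative.
import torch

def mc_dropout_acquire(model, pool_inputs, n_samples=10, k=16):
    model.train()  # keep dropout active at inference time (the MC-dropout trick)
    with torch.no_grad():
        probs = torch.stack([
            torch.softmax(model(pool_inputs), dim=-1) for _ in range(n_samples)
        ]).mean(dim=0)                        # (pool, classes) mean prediction
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return entropy.topk(k).indices            # most uncertain instances to label
```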
arXiv Detail & Related papers (2021-01-20T13:59:25Z)
- Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning [20.506232306308977]
Latent structure models are a powerful tool for modeling language data.
One challenge with end-to-end training of these models is the argmax operation, which has a null gradient.
We explore latent structure learning through the lens of pulling back the downstream learning objective.
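This is the surrogate-gradient family promised in the sketch under the abstract above; a minimal straight-through version is shown below. The hard argmax runs in the forward pass, and the backward pass substitutes the identity; SPIGOT itself replaces that identity with a projection-based update, which this toy omits.

```python
# Straight-through surrogate gradient for argmax, in PyTorch; a toy sketch
# of the family SPIGOT belongs to (SPIGOT's projection step is omitted).
import torch

class STArgmax(torch.autograd.Function):
    @staticmethod
    def forward(ctx, logits):
        # Forward: hard one-hot argmax (the operation with null gradient).
        idx = logits.argmax(dim=-1, keepdim=True)
        return torch.zeros_like(logits).scatter_(-1, idx, 1.0)

    @staticmethod
    def backward(ctx, grad_output):
        # Backward: pretend the argmax was the identity (straight-through).
        return grad_output

logits = torch.randn(4, 10, requires_grad=True)
z = STArgmax.apply(logits)               # discrete structure in the forward pass
loss = ((z - torch.randn(4, 10)) ** 2).mean()
loss.backward()                          # logits.grad is nonzero despite argmax
```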
arXiv Detail & Related papers (2020-10-05T21:56:00Z)