Learning to Generalize for Sequential Decision Making
- URL: http://arxiv.org/abs/2010.02229v1
- Date: Mon, 5 Oct 2020 18:00:03 GMT
- Title: Learning to Generalize for Sequential Decision Making
- Authors: Xusen Yin, Ralph Weischedel, Jonathan May
- Abstract summary: We introduce a teacher-student imitation learning methodology and a means of converting a reinforcement learning model into a natural language understanding model.
We show that models can learn faster and generalize more, leveraging both the imitation learning and the reformulation.
- Score: 19.075378799280728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider problems of making sequences of decisions to accomplish tasks,
interacting via the medium of language. These problems are often tackled with
reinforcement learning approaches. We find that these models do not generalize
well when applied to novel task domains. However, the large amount of
computation necessary to adequately train and explore the search space of
sequential decision making, under a reinforcement learning paradigm, precludes
the inclusion of large contextualized language models, which might otherwise
enable the desired generalization ability. We introduce a teacher-student
imitation learning methodology and a means of converting a reinforcement
learning model into a natural language understanding model. Together, these
methodologies enable the introduction of contextualized language models into
the sequential decision making problem space. We show that models can learn
faster and generalize more, leveraging both the imitation learning and the
reformulation. Our models exceed teacher performance on various held-out
decision problems, by up to 7% on in-domain problems and 24% on out-of-domain
problems.
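The abstract does not spell out the training procedure. The sketch below is a rough illustration only, built on three assumptions. First, the teacher is an already-trained RL agent whose highest-valued action labels each state. Second, the reformulated student scores each (observation, candidate action) text pair the way a sentence-pair classifier would. Third, the student is trained with softmax cross-entropy toward the teacher's choice. Every name here is hypothetical (`TEACHER_Q`, `pair_features`, `Student`, `distill`, the toy game text). A bag-of-words linear scorer stands in for the contextualized language model the abstract mentions.

```python
import math
from collections import defaultdict

# Hypothetical toy observations (not from the paper).
KITCHEN = "you are in a kitchen. there is a closed fridge."
DOOR = "you see a locked door and a key."

# Hypothetical teacher: stands in for a trained RL agent that assigns a
# Q-value to each candidate text action in a given observation.
TEACHER_Q = {
    (KITCHEN, "open fridge"): 1.0, (KITCHEN, "go west"): 0.2, (KITCHEN, "eat fridge"): -0.5,
    (DOOR, "take key"): 0.9, (DOOR, "open door"): 0.1, (DOOR, "go north"): 0.0,
}

def teacher_action(obs, actions):
    """The teacher's label: the candidate action with the highest Q-value."""
    return max(actions, key=lambda a: TEACHER_Q[(obs, a)])

def pair_features(obs, action):
    """Toy text-pair features: action words plus (observation word, action word) crosses."""
    feats = defaultdict(float)
    for aw in action.split():
        feats["act:" + aw] += 1.0
        for ow in obs.replace(".", "").split():
            feats[ow + "|" + aw] += 1.0
    return feats

class Student:
    """Linear scorer over (observation, action) pairs, i.e. action choice as text-pair classification."""

    def __init__(self):
        self.w = defaultdict(float)

    def score(self, obs, action):
        return sum(self.w[k] * v for k, v in pair_features(obs, action).items())

    def probs(self, obs, actions):
        # Softmax over the candidate actions (shifted by the max for stability).
        scores = [self.score(obs, a) for a in actions]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return [e / z for e in exps]

    def act(self, obs, actions):
        p = self.probs(obs, actions)
        return actions[p.index(max(p))]

    def update(self, obs, actions, target, lr=0.5):
        """One SGD step on cross-entropy against the teacher's one-hot choice; returns the loss."""
        p = self.probs(obs, actions)
        for a, pa in zip(actions, p):
            grad = pa - (1.0 if a == target else 0.0)  # d(loss)/d(score of a)
            for k, v in pair_features(obs, a).items():
                self.w[k] -= lr * grad * v
        return -math.log(p[actions.index(target)])

def distill(student, episodes, epochs=20):
    """Imitation: the student repeatedly fits the teacher's action in each state."""
    for _ in range(epochs):
        for obs, actions in episodes:
            student.update(obs, actions, teacher_action(obs, actions))

EPISODES = [
    (KITCHEN, ["open fridge", "go west", "eat fridge"]),
    (DOOR, ["take key", "open door", "go north"]),
]

student = Student()
distill(student, EPISODES)
for obs, actions in EPISODES:
    print(obs, "->", student.act(obs, actions))
```

On this toy data, the student should end up choosing the teacher's action in both states. Because it is evaluated on the same states it was trained on, the sketch shows only the imitation mechanics, not the held-out or out-of-domain generalization the abstract reports.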
Related papers
- Neuro-symbolic Training for Reasoning over Spatial Language [17.901249830817882]
We propose training language models with neuro-symbolic techniques that can exploit the logical rules of reasoning as constraints.
We focus on a challenging problem of spatial reasoning over text.
(2024-06-19T20:47:36Z)
- Scalable Language Model with Generalized Continual Learning [58.700439919096155]
Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
(2024-04-11T04:22:15Z)
- Learning to Diversify Neural Text Generation via Degenerative Model [39.961572541752005]
We propose a new approach to prevent degeneration problems by training two models.
We first train a model that is designed to amplify undesirable patterns.
We then enhance the diversity of the second model by focusing on patterns that the first model fails to learn.
(2023-09-22T04:57:10Z)
- Foundation Models for Decision Making: Problems, Methods, and Opportunities [124.79381732197649]
Foundation models pretrained on diverse data at scale have demonstrated extraordinary capabilities in a wide range of vision and language tasks.
New paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning.
Research at the intersection of foundation models and decision making holds tremendous promise for creating powerful new systems.
(2023-03-07T18:44:07Z)
- Exploring Length Generalization in Large Language Models [46.417433724786854]
The ability to extrapolate from short problem instances to longer ones is an important form of out-of-distribution generalization in reasoning tasks.
We show that naively finetuning transformers on length generalization tasks reveals significant generalization deficiencies, independent of model scale.
We then show that combining pretrained large language models' in-context learning abilities with scratchpad prompting results in a dramatic improvement in length generalization.
(2022-07-11T14:24:38Z)
- Solving Quantitative Reasoning Problems with Language Models [53.53969870599973]
We introduce Minerva, a large language model pretrained on general natural language data and further trained on technical content.
The model achieves state-of-the-art performance on technical benchmarks without the use of external tools.
We also evaluate our model on over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences.
(2022-06-29T18:54:49Z)
- Few-shot Prompting Towards Controllable Response Generation [49.479958672988566]
We first explore the combination of prompting and reinforcement learning (RL) to steer models' generation without accessing any of the models' parameters.
We apply multi-task learning to make the model learn to generalize to new tasks better.
Experiment results show that our proposed method can successfully control several state-of-the-art (SOTA) dialogue models without accessing their parameters.
(2022-06-08T14:48:06Z)
- Internet-augmented language models through few-shot prompting for open-domain question answering [6.573232954655063]
We capitalize on the unique few-shot capabilities offered by large-scale language models to overcome some of their challenges.
We use few-shot prompting to learn to condition language models on information returned from the web using Google Search.
We find that language models conditioned on the web surpass the performance of closed-book models of similar, or even larger, size in open-domain question answering.
(2022-03-10T02:24:14Z)
- Exploring Bayesian Deep Learning for Urgent Instructor Intervention Need in MOOC Forums [58.221459787471254]
Massive Open Online Courses (MOOCs) have become a popular choice for e-learning thanks to their great flexibility.
Due to large numbers of learners and their diverse backgrounds, it is taxing to offer real-time support.
With the large volume of posts and high workloads for MOOC instructors, it is unlikely that the instructors can identify all learners requiring intervention.
This paper is the first to explore Bayesian deep learning on learner-written text posts, using two methods: Monte Carlo Dropout and Variational Inference.
(2021-04-26T15:12:13Z)
- Importance Weighted Policy Learning and Adaptation [89.46467771037054]
We study a complementary approach which is conceptually simple, general, modular and built on top of recent improvements in off-policy learning.
The framework is inspired by ideas from the probabilistic inference literature and combines robust off-policy learning with a behavior prior.
Our approach achieves competitive adaptation performance on hold-out tasks compared to meta reinforcement learning baselines and can scale to complex sparse-reward scenarios.
(2020-09-10T14:16:58Z)