Learning to Generalize for Sequential Decision Making
- URL: http://arxiv.org/abs/2010.02229v1
- Date: Mon, 5 Oct 2020 18:00:03 GMT
- Title: Learning to Generalize for Sequential Decision Making
- Authors: Xusen Yin, Ralph Weischedel, Jonathan May
- Abstract summary: We introduce a teacher-student imitation learning methodology and a means of converting a reinforcement learning model into a natural language understanding model.
We show that models can learn faster and generalize more, leveraging both the imitation learning and the reformulation.
- Score: 19.075378799280728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider problems of making sequences of decisions to accomplish tasks,
interacting via the medium of language. These problems are often tackled with
reinforcement learning approaches. We find that these models do not generalize
well when applied to novel task domains. However, the large amount of
computation necessary to adequately train and explore the search space of
sequential decision making, under a reinforcement learning paradigm, precludes
the inclusion of large contextualized language models, which might otherwise
enable the desired generalization ability. We introduce a teacher-student
imitation learning methodology and a means of converting a reinforcement
learning model into a natural language understanding model. Together, these
methodologies enable the introduction of contextualized language models into
the sequential decision making problem space. We show that models can learn
faster and generalize more, leveraging both the imitation learning and the
reformulation. Our models exceed teacher performance on various held-out
decision problems, by up to 7% on in-domain problems and 24% on out-of-domain
problems.
Related papers
- BloomWise: Enhancing Problem-Solving capabilities of Large Language Models using Bloom's-Taxonomy-Inspired Prompts [59.83547898874152]
We introduce BloomWise, a new prompting technique, inspired by Bloom's taxonomy, to improve the performance of Large Language Models (LLMs).
The decision regarding the need to employ more sophisticated cognitive skills is based on self-evaluation performed by the LLM.
In extensive experiments across 4 popular math reasoning datasets, we have demonstrated the effectiveness of our proposed approach.
arXiv Detail & Related papers (2024-10-05T09:27:52Z)
- Building Decision Making Models Through Language Model Regime [17.61892714225144]
We propose a novel approach for decision making problems leveraging the generalization capabilities of large language models (LLMs).
LLMs demonstrate remarkable success in generalizing across varied language tasks, inspiring a new strategy for training decision making models.
Experiments in e-commerce domains such as advertising and search optimization have shown that the LTU approach outperforms traditional supervised learning regimes.
arXiv Detail & Related papers (2024-08-12T12:04:14Z)
- Neuro-symbolic Training for Reasoning over Spatial Language [17.901249830817882]
We propose training language models with neuro-symbolic techniques that can exploit the logical rules of reasoning as constraints.
We focus on a challenging problem of spatial reasoning over text.
arXiv Detail & Related papers (2024-06-19T20:47:36Z)
- Scalable Language Model with Generalized Continual Learning [58.700439919096155]
Joint Adaptive Re-Parameterization (JARe) is integrated with Dynamic Task-related Knowledge Retrieval (DTKR) to enable adaptive adjustment of language models based on specific downstream tasks.
Our method demonstrates state-of-the-art performance on diverse backbones and benchmarks, achieving effective continual learning in both full-set and few-shot scenarios with minimal forgetting.
arXiv Detail & Related papers (2024-04-11T04:22:15Z)
- Learning to Diversify Neural Text Generation via Degenerative Model [39.961572541752005]
We propose a new approach to prevent degeneration problems by training two models.
We first train a model that is designed to amplify undesirable patterns.
We then enhance the diversity of the second model by focusing on patterns that the first model fails to learn.
arXiv Detail & Related papers (2023-09-22T04:57:10Z)
- Foundation Models for Decision Making: Problems, Methods, and Opportunities [124.79381732197649]
Foundation models pretrained on diverse data at scale have demonstrated extraordinary capabilities in a wide range of vision and language tasks.
New paradigms are emerging for training foundation models to interact with other agents and perform long-term reasoning.
Research at the intersection of foundation models and decision making holds tremendous promise for creating powerful new systems.
arXiv Detail & Related papers (2023-03-07T18:44:07Z)
- Exploring Length Generalization in Large Language Models [46.417433724786854]
The ability to extrapolate from short problem instances to longer ones is an important form of out-of-distribution generalization in reasoning tasks.
We show that naively finetuning transformers on length generalization tasks yields significant generalization deficiencies, independent of model scale.
We then show that combining pretrained large language models' in-context learning abilities with scratchpad prompting results in a dramatic improvement in length generalization.
arXiv Detail & Related papers (2022-07-11T14:24:38Z)
- Solving Quantitative Reasoning Problems with Language Models [53.53969870599973]
We introduce Minerva, a large language model pretrained on general natural language data and further trained on technical content.
The model achieves state-of-the-art performance on technical benchmarks without the use of external tools.
We also evaluate our model on over two hundred undergraduate-level problems in physics, biology, chemistry, economics, and other sciences.
arXiv Detail & Related papers (2022-06-29T18:54:49Z)
- Few-shot Prompting Towards Controllable Response Generation [49.479958672988566]
We first explored the combination of prompting and reinforcement learning (RL) to steer models' generation without accessing any of the models' parameters.
We apply multi-task learning to make the model learn to generalize to new tasks better.
Experiment results show that our proposed method can successfully control several state-of-the-art (SOTA) dialogue models without accessing their parameters.
arXiv Detail & Related papers (2022-06-08T14:48:06Z)
- Exploring Bayesian Deep Learning for Urgent Instructor Intervention Need in MOOC Forums [58.221459787471254]
Massive Open Online Courses (MOOCs) have become a popular choice for e-learning thanks to their great flexibility.
Due to large numbers of learners and their diverse backgrounds, it is taxing to offer real-time support.
With the large volume of posts and high workloads for MOOC instructors, it is unlikely that the instructors can identify all learners requiring intervention.
This paper explores for the first time Bayesian deep learning on learner-based text posts with two methods: Monte Carlo Dropout and Variational Inference.
arXiv Detail & Related papers (2021-04-26T15:12:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.