Remember what you did so you know what to do next
- URL: http://arxiv.org/abs/2311.01468v1
- Date: Mon, 30 Oct 2023 19:29:00 GMT
- Title: Remember what you did so you know what to do next
- Authors: Manuel R. Ciosici, Alex Hedges, Yash Kankanampati, Justin Martin,
Marjorie Freedman, Ralph Weischedel
- Abstract summary: We create a plan for a simulated robot to achieve 30 classes of goals in ScienceWorld, a text game simulator for elementary science experiments.
Our experiments show that performance varies widely across the 30 classes of actions, indicating that averaging over tasks can hide significant performance issues.
- Score: 10.526351131118096
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We explore using a moderately sized large language model (GPT-J,
6B parameters) to create a plan for a simulated robot to achieve 30 classes of
goals in ScienceWorld, a text game simulator for elementary science
experiments. Previously published empirical work claimed that large language
models (LLMs) are a poor fit (Wang et al., 2022) compared to reinforcement
learning. Using the Markov assumption (a single previous step), the LLM
outperforms the reinforcement learning-based approach by a factor of 1.4. When
we fill the LLM's input buffer with as many prior steps as possible,
improvement rises to 3.5x. Even when training on only 6.5% of the training
data, we observe a 2.2x improvement over the reinforcement-learning-based
approach. Our experiments show that performance varies widely across the 30
classes of actions, indicating that averaging over tasks can hide significant
performance issues. In work contemporaneous with ours, Lin et al. (2023)
demonstrated a two-part approach (SwiftSage) that uses a small LLM (T5-large)
complemented by OpenAI's massive LLMs to achieve outstanding results in
ScienceWorld. Our 6B-parameter, single-stage GPT-J matches the performance of
SwiftSage's two-stage architecture when the latter incorporates GPT-3.5 Turbo,
which has 29 times as many parameters as GPT-J.
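The abstract's central variable is how much of the action-observation history fills the model's input buffer. Below is a minimal sketch of that buffer-filling step, assuming a Hugging Face tokenizer, GPT-J's 2,048-token window, and hypothetical `goal` and `history` inputs; it illustrates the idea rather than reproducing the authors' implementation.

```python
from transformers import AutoTokenizer

def build_prompt(goal, history, tokenizer, max_tokens=2048):
    """Pack as many recent (action, observation) steps as fit in the
    context window, keeping the newest steps preferentially."""
    header = f"Task: {goal}\n"
    footer = "Next action:"
    budget = max_tokens - len(tokenizer.encode(header + footer))

    kept = []
    # Walk backwards so the most recent steps survive truncation.
    for action, observation in reversed(history):
        step = f"> {action}\n{observation}\n"
        cost = len(tokenizer.encode(step))
        if cost > budget:
            break
        kept.append(step)
        budget -= cost
    return header + "".join(reversed(kept)) + footer

# GPT-J's context window is 2,048 tokens.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
```

Calling `build_prompt(goal, history[-1:], tokenizer)` reproduces the Markov setting the abstract compares against; the jump from a 1.4x to a 3.5x improvement comes from keeping this buffer full.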
Related papers
- Efficient Continual Pre-training by Mitigating the Stability Gap [68.49269649759005]
We study the behavior of Large Language Models (LLMs) during continual pre-training.
We propose three effective strategies to enhance LLM performance within a fixed compute budget.
Our strategies improve the average medical task performance of the OpenLlama-3B model from 36.2% to 40.7% with only 40% of the original training budget.
arXiv Detail & Related papers (2024-06-21T02:28:37Z)
- LLMEmbed: Rethinking Lightweight LLM's Genuine Function in Text Classification [13.319594321038926]
We propose a simple and effective transfer learning strategy, namely LLMEmbed, to address this classical but challenging task.
We perform extensive experiments on publicly available datasets, and the results show that LLMEmbed achieves strong performance while enjoying low training overhead.
arXiv Detail & Related papers (2024-06-06T03:46:59Z)
- How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments [83.78240828340681]
This research investigates Large Language Models' decision-making capabilities through the lens of Game Theory.
We focus specifically on games that support the participation of more than two agents simultaneously.
We introduce our framework, GAMA-Bench, including eight classical multi-agent games.
arXiv Detail & Related papers (2024-03-18T14:04:47Z)
- LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error [54.954211216847135]
Existing large language models (LLMs) reach a tool-use correctness rate of only 30% to 60%.
We propose a biologically inspired method for tool-augmented LLMs, simulated trial and error (STE).
STE orchestrates three key mechanisms behind successful tool-use behavior in biological systems: trial and error, imagination, and memory.
arXiv Detail & Related papers (2024-03-07T18:50:51Z)
- EfficientVLM: Fast and Accurate Vision-Language Models via Knowledge Distillation and Modal-adaptive Pruning [19.354515754130592]
We introduce a distilling then pruning framework to compress large vision-language models into smaller, faster, and more accurate ones.
We apply our framework to train EfficientVLM, a fast and accurate vision-language model consisting of 6 vision layers, 3 text layers, and 3 cross-modal fusion layers.
EfficientVLM retains 98.4% performance of the teacher model and accelerates its inference speed by 2.2x.
arXiv Detail & Related papers (2022-10-14T13:26:41Z)
- GLaM: Efficient Scaling of Language Models with Mixture-of-Experts [84.33607245023049]
We propose and develop a family of language models named GLaM (Generalist Language Model).
GLaM uses a sparsely activated mixture-of-experts architecture to scale model capacity while incurring substantially less training cost than dense variants (a minimal sketch of such a layer follows this list).
It consumes only 1/3 of the energy used to train GPT-3 and requires half the FLOPs for inference, while still achieving better overall zero-shot and one-shot performance across 29 NLP tasks.
arXiv Detail & Related papers (2021-12-13T18:58:19Z)
- LiST: Lite Self-training Makes Efficient Few-shot Learners [91.28065455714018]
LiST improves over classic fine-tuning methods by 35% and over prompt-tuning by 6%, with a 96% reduction in the number of trainable parameters, when fine-tuned with no more than 30 labeled examples from each target domain.
arXiv Detail & Related papers (2021-10-12T18:47:18Z)
- CPM-2: Large-scale Cost-effective Pre-trained Language Models [71.59893315671997]
We present a suite of cost-effective techniques for the use of PLMs to deal with the efficiency issues of pre-training, fine-tuning, and inference.
We introduce knowledge inheritance to accelerate the pre-training process by exploiting existing PLMs instead of training models from scratch.
We implement a new inference toolkit, namely InfMoE, for using large-scale PLMs with limited computational resources.
arXiv Detail & Related papers (2021-06-20T15:43:54Z)
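The GLaM entry above names the one mechanism in this list that is compact enough to sketch: a sparsely activated mixture-of-experts layer routes each token to a small subset of expert feed-forward networks, so parameter count grows without a matching growth in per-token compute. The toy layer below assumes top-2 gating over plain MLP experts; it illustrates the routing idea only, not GLaM's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Toy mixture-of-experts layer: each token is processed only by its
    top-k experts, weighted by softmax-normalized router scores."""

    def __init__(self, d_model, num_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts))
        self.k = k

    def forward(self, x):                     # x: (tokens, d_model)
        scores = self.router(x)               # (tokens, num_experts)
        weights, picks = scores.topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = picks[:, slot] == e    # tokens whose slot-th pick is e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out
```

Only k of the num_experts feed-forward blocks run for any given token, which is where the reported training-cost savings come from in principle; real implementations add load-balancing losses and expert-capacity limits that this sketch omits.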
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.