STEPS: A Benchmark for Order Reasoning in Sequential Tasks
- URL: http://arxiv.org/abs/2306.04441v1
- Date: Wed, 7 Jun 2023 13:58:55 GMT
- Title: STEPS: A Benchmark for Order Reasoning in Sequential Tasks
- Authors: Weizhi Wang, Hong Wang, Xifeng Yan
- Abstract summary: We describe the data construction and task formulations, and benchmark most of the significant Large Language Models (LLMs).
The experimental results demonstrate that commonsense reasoning about action order in sequential tasks is challenging to resolve via zero-shot prompting or few-shot in-context learning.
- Score: 16.52934509949172
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Various human activities can be abstracted into a sequence of actions in
natural text, e.g., cooking, repairing, and manufacturing. Such action
sequences depend heavily on their execution order, and a disordered action
sequence leads to failure of further task execution by robots or AI agents.
Therefore, to verify the order reasoning capability of current neural models in
sequential tasks, we propose a challenging benchmark named STEPS. STEPS
involves two subtask settings, focusing on determining the rationality of a
given next step in a recipe and selecting the reasonable next step from a
multiple-choice question, respectively. We describe the data construction and
task formulations, and benchmark most of the significant Large Language Models
(LLMs). The experimental results demonstrate that 1) commonsense reasoning
about action order in sequential tasks is challenging for LLMs to resolve via
zero-shot prompting or few-shot in-context learning, and 2) prompting methods
still significantly lag behind tuning-based methods on STEPS.
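To make the two subtask settings concrete, below is a minimal sketch of how they could be posed to an LLM as zero-shot prompts. The templates, wording, and function names are assumptions for illustration; the paper's actual prompts are not reproduced here.

```python
# A minimal sketch of the two STEPS subtask formats as zero-shot prompts.
# All wording and function names are assumptions, not the released templates.

def rationality_prompt(steps_so_far: list, candidate_next: str) -> str:
    """Subtask 1: judge whether a given next step is reasonable."""
    history = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(steps_so_far))
    return (
        "Here are the steps of a recipe completed so far:\n"
        f"{history}\n"
        f"Proposed next step: {candidate_next}\n"
        "Is this a reasonable next step? Answer Yes or No."
    )

def multichoice_prompt(steps_so_far: list, options: list) -> str:
    """Subtask 2: select the reasonable next step among candidates."""
    history = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(steps_so_far))
    choices = "\n".join(f"{chr(65 + i)}. {o}" for i, o in enumerate(options))
    return (
        "Here are the steps of a recipe completed so far:\n"
        f"{history}\n"
        "Which of the following is the most reasonable next step?\n"
        f"{choices}\n"
        "Answer with a single letter."
    )

print(rationality_prompt(["Boil water.", "Add pasta."], "Serve the raw pasta."))
```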
Related papers
- The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models [48.455388608863785]
We introduce a benchmark designed to evaluate models' abilities to follow multiple instructions through sequential instruction following tasks.
Our benchmark evaluates instruction following using four tasks (text modification, question answering, mathematics, and security rule following), each assessing different aspects of sequential instruction following.
Our evaluation of popular LLMs, both closed-source and open-source, shows that more recent and larger models significantly outperform their older and smaller counterparts on the SIFo tasks.
arXiv Detail & Related papers (2024-06-28T15:34:26Z)
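What makes the SIFo tasks "sequential" is that later instructions operate on the result of earlier ones, so one early mistake invalidates the rest of the chain. The toy text-modification chain below illustrates that property; it is an assumption for illustration, not the SIFo data format.

```python
# Each instruction is applied to the output of the previous one, so a single
# early mistake propagates through the whole chain.

def apply_instructions(text: str, instructions: list) -> str:
    """Apply a chain of instructions, each a function of the current text."""
    for instr in instructions:
        text = instr(text)
    return text

steps = [
    lambda t: t.replace("cat", "dog"),  # step 1: substitution
    lambda t: t.upper(),                # step 2: depends on step 1's output
    lambda t: t + "!",                  # step 3: depends on step 2's output
]

reference = apply_instructions("the cat sat", steps)  # "THE DOG SAT!"
model_output = "THE CAT SAT!"  # a model that missed step 1
print(model_output == reference)  # False: any missed step breaks the chain
```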
- BiKC: Keypose-Conditioned Consistency Policy for Bimanual Robotic Manipulation [48.08416841005715]
We introduce a novel keypose-conditioned consistency policy tailored for bimanual manipulation.
It is a hierarchical imitation learning framework that consists of a high-level keypose predictor and a low-level trajectory generator.
Simulated and real-world experimental results demonstrate that the proposed approach surpasses baseline methods in terms of success rate and operational efficiency.
arXiv Detail & Related papers (2024-06-14T14:49:12Z)
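The BiKC summary above describes a two-level architecture. The skeleton below sketches only that decomposition, with placeholder models; the class names and interfaces are assumptions, not the authors' implementation.

```python
# High level proposes the next keypose; low level generates the trajectory
# segment that reaches it. Both bodies are placeholders for learned models.

class KeyposePredictor:
    def next_keypose(self, observation):
        # placeholder: a learned predictor would output the next keypose
        return [0.0, 0.0, 0.1]

class TrajectoryGenerator:
    def rollout(self, observation, keypose, horizon=10):
        # placeholder: a consistency policy would generate actions
        # conditioned on the target keypose
        return [keypose] * horizon

def hierarchical_step(obs, high, low):
    """One control cycle: pick a keypose, then generate actions toward it."""
    kp = high.next_keypose(obs)
    return low.rollout(obs, kp)

print(hierarchical_step({"qpos": [0.0]}, KeyposePredictor(), TrajectoryGenerator()))
```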
- Data-CUBE: Data Curriculum for Instruction-based Sentence Representation Learning [85.66907881270785]
We propose a data curriculum method, namely Data-CUBE, that arranges the order of all the multi-task data for training.
At the task level, we aim to find the optimal task order to minimize the total cross-task interference risk.
At the instance level, we measure the difficulty of all instances per task, then divide them into easy-to-difficult mini-batches for training.
arXiv Detail & Related papers (2024-01-07T18:12:20Z)
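The instance-level half of Data-CUBE's curriculum can be illustrated directly: score each instance's difficulty, sort ascending, and chunk into mini-batches. The difficulty measure below (text length) is a stand-in assumption; the paper defines its own measure.

```python
# A minimal sketch of easy-to-difficult mini-batch construction.

def curriculum_batches(instances, difficulty, batch_size):
    """Sort instances by a difficulty score and chunk into mini-batches."""
    ordered = sorted(instances, key=difficulty)  # easiest first
    return [ordered[i:i + batch_size]
            for i in range(0, len(ordered), batch_size)]

data = ["stir", "preheat the oven", "fold the egg whites into the batter"]
for batch in curriculum_batches(data, difficulty=len, batch_size=2):
    print(batch)  # earlier batches contain easier (here: shorter) instances
```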
- Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks [101.40633115037983]
Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a massive amount of diverse tasks with instructions.
How to select new tasks to improve the performance and generalizability of IT models remains an open question.
We propose active instruction tuning based on prompt uncertainty, a novel framework that identifies informative tasks and then actively tunes the model on the selected tasks.
arXiv Detail & Related papers (2023-11-01T04:40:05Z)
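A minimal way to realize the prompt-uncertainty idea is to paraphrase a task's instruction, query the model with each variant, and score the task by how much the outputs disagree. The metric and toy model below are assumptions; the paper's actual uncertainty measure differs in detail.

```python
from collections import Counter

def prompt_uncertainty(model, instruction_variants, task_input):
    """Score a task by output disagreement across paraphrased instructions."""
    outputs = [model(f"{v}\n{task_input}") for v in instruction_variants]
    counts = Counter(outputs)
    # fraction of outputs that disagree with the majority answer
    return 1.0 - counts.most_common(1)[0][1] / len(outputs)

# toy model whose answer brittlely depends on instruction wording
model = lambda p: "yes" if "reasonable" in p else "no"
variants = ["Is the next step reasonable?",
            "Is the following step plausible?",
            "Does this step make sense?"]
print(prompt_uncertainty(model, variants, "Boil water. Next: add pasta."))
# ~0.33: a third of the variants disagree, flagging the task as informative
```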
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by moving the task instruction to a position after the input sentence.
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
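The proposed change is purely positional and easy to sketch: place the instruction after the source text rather than before it. The template wording below is an assumption.

```python
# Conventional vs. proposed prompt layouts for conditional generation.

def pre_instruction(instruction: str, source: str) -> str:
    """Conventional order: instruction first, then the input sentence."""
    return f"{instruction}\n{source}"

def post_instruction(instruction: str, source: str) -> str:
    """Proposed order: input sentence first, instruction last."""
    return f"{source}\n{instruction}"

src = "Der schnelle braune Fuchs springt."
inst = "Translate the German sentence to English."
print(pre_instruction(inst, src))
print(post_instruction(inst, src))
```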
- Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization [101.72755769194677]
We formulate few-shot task generalization as a reinforcement learning problem in which each task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure, in the form of a subtask graph, from the training tasks.
Experimental results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z)
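A subtask graph of the kind MTSGI infers can be represented as a precondition DAG, where any topological order is a valid execution order. The toy graph below is an assumption for illustration, not the paper's environment.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# each subtask maps to the set of subtasks that must be completed first
subtask_graph = {
    "boil water": set(),
    "add pasta": {"boil water"},
    "make sauce": set(),
    "combine": {"add pasta", "make sauce"},
}

# any valid execution order respects the precondition edges
print(list(TopologicalSorter(subtask_graph).static_order()))
```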
- CINS: Comprehensive Instruction for Few-shot Learning in Task-oriented Dialog Systems [56.302581679816775]
This paper proposes Comprehensive Instruction (CINS), which exploits PLMs with task-specific instructions.
We design a schema (definition, constraint, prompt) of instructions and their customized realizations for three important downstream tasks in ToD.
Experiments are conducted on these ToD tasks in realistic few-shot learning scenarios with small validation data.
arXiv Detail & Related papers (2021-09-10T03:23:06Z)
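The (definition, constraint, prompt) schema described for CINS can be sketched as a small data structure rendered into a prompt. The example task and wording are assumptions, not the paper's released instructions.

```python
from dataclasses import dataclass

@dataclass
class Instruction:
    definition: str   # what the task is
    constraint: str   # what the output must look like
    prompt: str       # the query posed about the input

    def render(self, utterance: str) -> str:
        """Assemble the full instruction-plus-input prompt."""
        return f"{self.definition} {self.constraint}\n{utterance}\n{self.prompt}"

# a hypothetical intent-classification instruction for task-oriented dialog
intent = Instruction(
    definition="Classify the intent of the user utterance.",
    constraint="Answer with one label from: book_flight, cancel, other.",
    prompt="Intent:",
)
print(intent.render("I need to fly to Boston on Friday."))
```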