Related papers: Haste Makes Waste: Evaluating Planning Abilities of LLMs for Efficient and Feasible Multitasking with Time Constraints Between Actions

URL: http://arxiv.org/abs/2503.02238v1
Date: Tue, 04 Mar 2025 03:27:02 GMT
Title: Haste Makes Waste: Evaluating Planning Abilities of LLMs for Efficient and Feasible Multitasking with Time Constraints Between Actions
Authors: Zirui Wu, Xiao Liu, Jiayi Li, Lingpeng Kong, Yansong Feng
Abstract summary: We present Recipe2Plan, a novel benchmark framework based on real-world cooking scenarios. Unlike conventional benchmarks, Recipe2Plan challenges agents to optimize cooking time through parallel task execution.
Score: 56.88110850242265
License: http://creativecommons.org/licenses/by/4.0/
Abstract: While Large Language Model-based agents have demonstrated substantial progress in task completion, existing evaluation benchmarks tend to overemphasize single-task performance, with insufficient attention given to the crucial aspects of multitask planning and execution efficiency required in real-world scenarios. To bridge this gap, we present Recipe2Plan, a novel benchmark framework based on real-world cooking scenarios. Unlike conventional benchmarks, Recipe2Plan challenges agents to optimize cooking time through parallel task execution while respecting temporal constraints, i.e., specific actions need to be performed within a particular time interval following the preceding steps. Overly aggressive local parallelization may disrupt this constraint, potentially compromising the entire cooking process. This strict time constraint between actions raises a unique challenge for agents to balance between maximizing concurrent operations and adhering to critical timing constraints. Extensive experiments with state-of-the-art models reveal challenges in maintaining this balance between efficiency and feasibility. The results highlight the need for improved temporal awareness and global multitasking capabilities in large language models. We open-source our benchmark and code at https://github.com/WilliamZR/Recipe2Plan.
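The tension the abstract describes, between starting work as early as possible and keeping follow-up actions inside their allowed time window, can be illustrated with a toy feasibility check. This is only a sketch under stated assumptions: the names `Step`, `GapConstraint`, `violations`, and `makespan`, the two-dish scenario, and all durations are hypothetical and are not taken from the Recipe2Plan benchmark or its code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One scheduled action (hypothetical model, times in minutes)."""
    name: str
    start: int
    duration: int

    @property
    def end(self) -> int:
        return self.start + self.duration


@dataclass(frozen=True)
class GapConstraint:
    """`after` must start within `max_gap` minutes of `before` ending."""
    before: str
    after: str
    max_gap: int


def violations(schedule, constraints):
    """Return (before, after, gap) for every constraint the schedule breaks."""
    by_name = {step.name: step for step in schedule}
    broken = []
    for c in constraints:
        gap = by_name[c.after].start - by_name[c.before].end
        if gap < 0 or gap > c.max_gap:
            broken.append((c.before, c.after, gap))
    return broken


def makespan(schedule):
    """Total time until the last step finishes."""
    return max(step.end for step in schedule)


# Assumed toy rule: pasta must be drained within 1 minute after boiling ends.
constraints = [GapConstraint("boil", "drain", max_gap=1)]

# Hasty plan: start boiling immediately, but the cook is busy with the
# sauce until minute 15, so the pasta sits in the water for 5 extra minutes.
hasty = [Step("boil", 0, 10), Step("sauce", 0, 15), Step("drain", 15, 1)]

# Patient plan: delay boiling so that it ends exactly when the cook is free.
patient = [Step("boil", 5, 10), Step("sauce", 0, 15), Step("drain", 15, 1)]

print(violations(hasty, constraints), makespan(hasty))
print(violations(patient, constraints), makespan(patient))
```

In this toy setup both plans finish at the same time, but only the one that delays the boiling step satisfies the gap constraint, which is the kind of global trade-off the abstract says aggressive local parallelization can miss.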

Related papers

Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorfBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures. We also present WorfEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms. We observe that the generated workflows can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
arXiv Detail & Related papers (2024-10-10T12:41:19Z)
Multi-Step Time Series Inference Agent for Reasoning and Automated Task Execution [19.64976935450366]
We propose a novel task: multi-step time series inference that demands both compositional reasoning and precision of time series analysis. By integrating in-context learning, self-correction, and program-aided execution, our proposed approach ensures accurate and interpretable results.
arXiv Detail & Related papers (2024-10-05T06:04:19Z)
Unlocking Reasoning Potential in Large Langauge Models by Scaling Code-form Planning [94.76546523689113]
We introduce CodePlan, a framework that generates and follows code-form plans -- pseudocode that outlines high-level, structured reasoning processes. CodePlan effectively captures the rich semantics and control flows inherent to sophisticated reasoning tasks. It achieves a 25.1% relative improvement compared with directly generating responses.
arXiv Detail & Related papers (2024-09-19T04:13:58Z)
Enhancing Temporal Planning Domains by Sequential Macro-actions (Extended Version) [2.064612766965483]
Temporal planning is an extension of classical planning involving concurrent execution of actions and alignment with temporal constraints. Our work contributes a general concept of sequential temporal macro-actions that guarantees the applicability of obtained plans. Our experiments yield improvements in terms of obtained satisficing plans as well as plan quality for the majority of tested planners and domains.
arXiv Detail & Related papers (2023-07-22T13:50:34Z)
Optimal task and motion planning and execution for human-robot multi-agent systems in dynamic environments [54.39292848359306]
We propose a combined task and motion planning approach to optimize sequencing, assignment, and execution of tasks. The framework relies on decoupling tasks and actions, where an action is one possible geometric realization of a symbolic task. We demonstrate the approach's effectiveness in a collaborative manufacturing scenario, in which a robotic arm and a human worker shall assemble a mosaic.
arXiv Detail & Related papers (2023-03-27T01:50:45Z)
In Defense of the Unitary Scalarization for Deep Multi-Task Learning [121.76421174107463]
We present a theoretical analysis suggesting that many specialized multi-task optimizers can be interpreted as forms of regularization. We show that, when coupled with standard regularization and stabilization techniques, unitary scalarization matches or improves upon the performance of complex multi-task optimizers.
arXiv Detail & Related papers (2022-01-11T18:44:17Z)
Efficient Temporal Piecewise-Linear Numeric Planning with Lazy Consistency Checking [4.834203844100679]
We propose a set of techniques that allow the planner to compute LP consistency checks lazily where possible. We also propose an algorithm to perform duration-dependent goal checking more selectively. The resultant planner is not only more efficient, but outperforms most state-of-the-art temporal-numeric and hybrid planners.
arXiv Detail & Related papers (2021-05-21T07:36:54Z)
Multi-Task Time Series Forecasting With Shared Attention [15.294939035413217]
We propose two self-attention based sharing schemes for multi-task time series forecasting. Our proposed architectures can not only outperform the state-of-the-art single-task forecasting baselines but also outperform the RNN-based multi-task forecasting method.
arXiv Detail & Related papers (2021-01-24T04:25:08Z)
Dynamic Multi-Robot Task Allocation under Uncertainty and Temporal Constraints [52.58352707495122]
We present a multi-robot allocation algorithm that decouples the key computational challenges of sequential decision-making under uncertainty and multi-agent coordination. We validate our results over a wide range of simulations on two distinct domains: multi-arm conveyor belt pick-and-place and multi-drone delivery dispatch in a city.
arXiv Detail & Related papers (2020-05-27T01:10:41Z)
Distributed Primal-Dual Optimization for Online Multi-Task Learning [22.45069527817333]
We propose an adaptive primal-dual algorithm, which captures task-specific noise in adversarial learning and carries out a projection-free update with runtime efficiency. Our model is well-suited to decentralized periodic-connected tasks as it allows the energy-starved or bandwidth-constrained tasks to postpone the update. Empirical results confirm that the proposed model is highly effective on various real-world datasets.
arXiv Detail & Related papers (2020-04-02T23:36:07Z)

This list is automatically generated from the titles and abstracts of the papers on this site.