ALLOY: Generating Reusable Agent Workflows from User Demonstration
- URL: http://arxiv.org/abs/2510.10049v1
- Date: Sat, 11 Oct 2025 06:30:34 GMT
- Title: ALLOY: Generating Reusable Agent Workflows from User Demonstration
- Authors: Jiawen Li, Zheng Ning, Yuan Tian, Toby Jia-jun Li
- Abstract summary: Large language models (LLMs) enable end-users to delegate complex tasks to autonomous agents through natural language. Users often struggle to specify procedural requirements for tasks. A "successful" prompt for one task may not be reusable or generalizable across similar tasks.
- Score: 17.329536879065788
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) enable end-users to delegate complex tasks to autonomous agents through natural language. However, prompt-based interaction faces critical limitations: Users often struggle to specify procedural requirements for tasks, especially those that don't have a factually correct solution but instead rely on personal preferences, such as posting social media content or planning a trip. Additionally, a "successful" prompt for one task may not be reusable or generalizable across similar tasks. We present ALLOY, a system inspired by classical HCI theories on Programming by Demonstration (PBD), but extended to enhance adaptability in creating LLM-based web agents. ALLOY enables users to express procedural preferences through natural demonstrations rather than prompts, while making these procedures transparent and editable through visualized workflows that can be generalized across task variations. In a study with 12 participants, ALLOY's demonstration-based approach outperformed prompt-based agents and manual workflows in capturing user intent and procedural preferences in complex web tasks. Insights from the study also show how demonstration-based interaction complements the traditional prompt-based approach.
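The core idea of the abstract (a demonstration is distilled into an editable, parameterized workflow that can be replayed on task variations) can be sketched with a minimal data structure. All names below are illustrative assumptions for exposition, not ALLOY's actual API or representation.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: a recorded demonstration becomes a list of
# parameterized steps; generalizing swaps task-specific values for a
# new task while keeping the demonstrated procedure intact.

@dataclass
class Step:
    action: str                          # e.g. "click", "type", "navigate"
    target: str                          # element or URL the action applies to
    params: dict = field(default_factory=dict)

@dataclass
class Workflow:
    name: str
    steps: list

    def generalize(self, substitutions: dict) -> "Workflow":
        """Return a copy with task-specific values replaced for a new task."""
        new_steps = [
            Step(s.action,
                 substitutions.get(s.target, s.target),
                 {k: substitutions.get(v, v) for k, v in s.params.items()})
            for s in self.steps
        ]
        return Workflow(self.name, new_steps)

# A workflow captured from one demonstrated task...
demo = Workflow("post_update", [
    Step("navigate", "https://example.com/new_post"),
    Step("type", "body_field", {"text": "trip photos"}),
    Step("click", "publish_button"),
])
# ...reused on a variation by editing only the task-specific value.
variant = demo.generalize({"trip photos": "conference recap"})
print(variant.steps[1].params["text"])  # -> conference recap
```

The point of the sketch is the separation the paper emphasizes: the procedure (step order and actions) is fixed by the demonstration, while task-specific values remain visible and editable.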
Related papers
- SignalLLM: A General-Purpose LLM Agent Framework for Automated Signal Processing [36.22027224597969]
Large Language Models (LLMs) offer strong reasoning capabilities, broad general-purpose knowledge, in-context learning, and cross-modal transfer abilities. We introduce SignalLLM, the first general-purpose LLM-based agent framework for general signal processing tasks. We demonstrate the versatility and effectiveness of SignalLLM through five representative tasks in communication and sensing.
arXiv Detail & Related papers (2025-09-21T18:54:54Z)
- TORSO: Template-Oriented Reasoning Towards General Tasks [23.681707595200265]
We introduce Template-Oriented Reasoning (TORSO), which elicits the model to utilize its internal reasoning abilities to generate proper responses across various tasks without manually crafted few-shot examples. Our experimental results demonstrate that TORSO achieves strong performance on diverse LLM benchmarks with reasonable rationales.
arXiv Detail & Related papers (2025-09-11T13:31:35Z)
- DICE: Dynamic In-Context Example Selection in LLM Agents via Efficient Knowledge Transfer [50.64531021352504]
Large language model-based agents, empowered by in-context learning (ICL), have demonstrated strong capabilities in complex reasoning and tool-use tasks. Existing approaches typically rely on static example selection, including in agentic or multi-step settings. We propose DICE, a theoretically grounded ICL framework for agentic tasks that selects the most relevant demonstrations at each step of reasoning.
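The general mechanism behind per-step demonstration selection can be sketched as ranking a demonstration bank by similarity to the current reasoning step. This is only an illustrative embedding-similarity sketch, not DICE's actual algorithm; the demo bank and embeddings below are made up.

```python
import math

# Illustrative sketch: at each reasoning step, re-rank a bank of
# demonstrations by cosine similarity to the step's embedding and
# keep the top k as in-context examples.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def select_demos(step_embedding, demo_bank, k=2):
    """Pick the k demonstrations most similar to the current step."""
    ranked = sorted(demo_bank,
                    key=lambda d: cosine(step_embedding, d["emb"]),
                    reverse=True)
    return [d["text"] for d in ranked[:k]]

bank = [
    {"text": "demo: search flights", "emb": [1.0, 0.0]},
    {"text": "demo: book hotel",     "emb": [0.0, 1.0]},
    {"text": "demo: compare fares",  "emb": [0.9, 0.1]},
]
# A flight-related step retrieves the two flight-related demonstrations.
print(select_demos([1.0, 0.1], bank, k=2))
```

Because selection runs at every step rather than once per task, the in-context examples track the agent's current subgoal, which is the contrast the abstract draws with static selection.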
arXiv Detail & Related papers (2025-07-31T13:42:14Z)
- Thought-Augmented Planning for LLM-Powered Interactive Recommender Agent [56.61028117645315]
We propose a novel thought-augmented interactive recommender agent system (TAIRA) that addresses complex user intents through distilled thought patterns. Specifically, TAIRA is designed as an LLM-powered multi-agent system featuring a manager agent that orchestrates recommendation tasks by decomposing user needs and planning subtasks. Through comprehensive experiments conducted across multiple datasets, TAIRA exhibits significantly enhanced performance compared to existing methods.
arXiv Detail & Related papers (2025-06-30T03:15:50Z)
- PersonaAgent: When Large Language Model Agents Meet Personalization at Test Time [87.99027488664282]
PersonaAgent is a framework designed to address versatile personalization tasks. It integrates a personalized memory module and a personalized action module. A test-time user-preference alignment strategy keeps the agent aligned with the user's preferences in real time.
arXiv Detail & Related papers (2025-06-06T17:29:49Z)
- TemPrompt: Multi-Task Prompt Learning for Temporal Relation Extraction in RAG-based Crowdsourcing Systems [21.312052922118585]
Temporal relation extraction (TRE) aims to grasp the evolution of events or actions, and thus shape the workflow of associated tasks.
We propose a multi-task prompt learning framework for TRE (TemPrompt), incorporating prompt tuning and contrastive learning to tackle these issues.
arXiv Detail & Related papers (2024-06-21T01:52:37Z)
- Many Hands Make Light Work: Task-Oriented Dialogue System with Module-Based Mixture-of-Experts [9.129081545049992]
Task-oriented dialogue systems have greatly benefited from pre-trained language models (PLMs).
We propose the Soft Mixture-of-Expert Task-Oriented Dialogue system (SMETOD).
SMETOD leverages an ensemble of Mixture-of-Experts (MoEs) to excel at subproblems and generate specialized outputs for task-oriented dialogues.
We extensively evaluate our model on three benchmark functionalities: intent prediction, dialogue state tracking, and dialogue response generation.
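The soft mixture-of-experts mechanism the SMETOD summary refers to can be illustrated in a few lines: a gate produces softmax weights over experts, and the output is the weighted combination of the experts' outputs. This is a generic sketch of soft MoE routing, not SMETOD's actual architecture; the logits and expert outputs are arbitrary scalars for illustration.

```python
import math

# Generic soft mixture-of-experts routing: every expert contributes,
# weighted by a softmax over the gate's logits (no hard top-k routing).

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_output(gate_logits, expert_outputs):
    """Combine expert outputs with softmax gate weights."""
    weights = softmax(gate_logits)
    return sum(w * o for w, o in zip(weights, expert_outputs))

# Two experts; the gate strongly favors the first, so the combined
# output sits close to that expert's answer.
y = moe_output([2.0, 0.0], [1.0, -1.0])
print(round(y, 4))  # -> 0.7616
```

In a dialogue system, each expert would specialize in a subproblem (e.g. intent prediction vs. state tracking) and the gate would route each input softly among them.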
arXiv Detail & Related papers (2024-05-16T01:02:09Z)
- Towards Generalist Prompting for Large Language Models by Mental Models [105.03747314550591]
Large language models (LLMs) have demonstrated impressive performance on many tasks.
To achieve optimal performance, specially designed prompting methods are still needed.
We introduce the concept of generalist prompting, which operates on the design principle of achieving optimal or near-optimal performance across a wide range of tasks.
arXiv Detail & Related papers (2024-02-28T11:29:09Z)
- Eliciting Human Preferences with Language Models [56.68637202313052]
Language models (LMs) can be directed to perform target tasks by using labeled examples or natural language prompts.
We propose to use *LMs themselves* to guide the task specification process, an approach we call generative active task elicitation (GATE).
We study GATE in three domains: email validation, content recommendation, and moral reasoning.
arXiv Detail & Related papers (2023-10-17T21:11:21Z)
- Improving Generalization in Task-oriented Dialogues with Workflows and Action Plans [1.0499611180329804]
Task-oriented dialogue is difficult in part because it involves understanding user intent, collecting information from the user, executing API calls, and generating fluent responses.
We show that large pre-trained language models can be fine-tuned end-to-end to create multi-step task-oriented dialogue agents.
Our experiments confirm that this approach alone cannot reliably perform new multi-step tasks that are unseen during training.
arXiv Detail & Related papers (2023-06-02T17:54:36Z)
- Improving Task Generalization via Unified Schema Prompt [87.31158568180514]
Unified Schema Prompt is a flexible and extensible prompting method that automatically customizes the learnable prompts for each task according to the task's input schema.
It models the shared knowledge between tasks while preserving the characteristics of different task schemas.
The framework achieves strong zero-shot and few-shot performance on 16 unseen downstream tasks drawn from 8 task types.
arXiv Detail & Related papers (2022-08-05T15:26:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.