The Task-oriented Queries Benchmark (ToQB)
- URL: http://arxiv.org/abs/2406.02943v1
- Date: Wed, 5 Jun 2024 05:05:41 GMT
- Title: The Task-oriented Queries Benchmark (ToQB)
- Authors: Keun Soo Yim
- Abstract summary: A standard benchmark for task-oriented queries is not yet available. Existing benchmarks in the relevant NLP fields have primarily focused on task-oriented dialogues.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Task-oriented queries (e.g., one-shot queries to play videos, order food, or call a taxi) are crucial for assessing the quality of virtual assistants, chatbots, and other large language model (LLM)-based services. However, a standard benchmark for task-oriented queries is not yet available, as existing benchmarks in the relevant NLP (Natural Language Processing) fields have primarily focused on task-oriented dialogues. Thus, we present a new methodology for efficiently generating the Task-oriented Queries Benchmark (ToQB) using existing task-oriented dialogue datasets and an LLM service. Our methodology involves formulating the underlying NLP task to summarize the original intent of a speaker in each dialogue, detailing the key steps to perform the devised NLP task using an LLM service, and outlining a framework for automating a major part of the benchmark generation process. Through a case study encompassing three domains (i.e., two single-task domains and one multi-task domain), we demonstrate how to customize the LLM prompts (e.g., omitting system utterances or speaker labels) for those three domains and characterize the generated task-oriented queries. The generated ToQB dataset is made available to the public. We further discuss new domains that can be added to ToQB by community contributors and its practical applications.
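The abstract describes customizing the LLM prompt per domain, for example by omitting system utterances or speaker labels, so that the model summarizes the speaker's original intent as a one-shot query. The following is a minimal Python sketch of that prompt-building step under stated assumptions. `Turn`, `build_prompt`, `dialogue_to_query`, the instruction wording, and the `llm` callable are all hypothetical stand-ins. They are not the paper's actual prompts, framework, or LLM service.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    speaker: str  # hypothetical labels, e.g. "USER" or "SYSTEM"
    text: str


def build_prompt(
    dialogue: List[Turn],
    omit_system: bool = False,
    omit_labels: bool = False,
) -> str:
    """Build a prompt asking an LLM to restate the user's original intent
    as one task-oriented query (prompt wording is an assumption)."""
    lines = []
    for turn in dialogue:
        # Optionally drop system utterances (one customization named in the abstract).
        if omit_system and turn.speaker.upper() == "SYSTEM":
            continue
        # Optionally drop speaker labels, keeping only the utterance text.
        lines.append(turn.text if omit_labels else f"{turn.speaker}: {turn.text}")
    instruction = (
        "Summarize the original intent of the user in the dialogue below "
        "as one self-contained request to a virtual assistant.\n\n"
    )
    return instruction + "\n".join(lines)


def dialogue_to_query(
    dialogue: List[Turn],
    llm: Callable[[str], str],
    **prompt_options: bool,
) -> str:
    """Send the prompt to an LLM service, modeled here as any str -> str callable."""
    return llm(build_prompt(dialogue, **prompt_options)).strip()


if __name__ == "__main__":
    taxi = [
        Turn("USER", "I need a taxi."),
        Turn("SYSTEM", "Where should it pick you up?"),
        Turn("USER", "From the train station to the Hilton at 5 pm."),
    ]
    # Print the prompt variant for a single-task domain with system turns and labels removed.
    print(build_prompt(taxi, omit_system=True, omit_labels=True))
```

Keeping the LLM behind a plain callable lets the same sketch be pointed at any service, or at a stub during testing. Whether dropping system turns helps likely depends on the domain, which matches the abstract's point that prompts were customized per domain.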
Related papers
- LLM-based Frameworks for API Argument Filling in Task-Oriented Conversational Systems
We study the application of Large Language Models (LLMs) to the API argument filling task.
Our experimental results demonstrate that the argument filling performance of LLMs noticeably improves when they are paired with the proposed techniques.
Date: 2024-06-27T06:54:53Z
- HELPER-X: A Unified Instructable Embodied Agent to Tackle Four Interactive Vision-Language Domains with Memory-Augmented Language Models
We extend the capabilities of HELPER by expanding its memory with a wider array of examples and prompts.
This simple expansion of HELPER into a shared memory enables the agent to work across domains executing plans from dialogue, natural language instruction, active question asking, and common room reorganization.
We evaluate the agent on four diverse interactive visual-language embodied agent benchmarks: ALFRED, TEACh, DialFRED, and the Tidy Task.
Date: 2024-04-29T19:12:42Z
- TaskBench: Benchmarking Large Language Models for Task Automation
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
Specifically, task decomposition, tool selection, and parameter prediction are assessed.
Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
Date: 2023-11-30T18:02:44Z
- A Self-enhancement Approach for Domain-specific Chatbot Training via Knowledge Mining and Digest
Large Language Models (LLMs) often encounter challenges when dealing with intricate and knowledge-demanding queries in specific domains.
This paper introduces a novel approach to enhance LLMs by effectively extracting the relevant knowledge from domain-specific textual sources.
We train a knowledge miner, namely LLMiner, which autonomously extracts Question-Answer pairs from relevant documents.
Date: 2023-11-17T16:09:10Z
- Large Language Models can accomplish Business Process Management Tasks
We show how Large Language Models (LLMs) can accomplish text-related Business Process Management tasks.
LLMs can accomplish three such tasks: mining imperative process models from textual descriptions, mining declarative process models from textual descriptions, and assessing the suitability of process tasks described in text for robotic process automation.
Date: 2023-07-19T11:54:46Z
- AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
Date: 2023-03-29T17:03:21Z
- Automaton-Based Representations of Task Knowledge from Generative Language Models
Large-scale generative language models (GLMs) can automatically generate relevant task knowledge.
We propose a novel algorithm named GLM2FSA, which constructs a finite state automaton (FSA) encoding high-level task knowledge from a brief natural-language description of the task goal.
Date: 2022-12-04T22:34:16Z
- Recitation-Augmented Language Models
We show that RECITE is a powerful paradigm for knowledge-intensive NLP tasks.
Specifically, we show that by using recitation as an intermediate step, a recite-and-answer scheme can achieve new state-of-the-art performance on closed-book question answering tasks.
Date: 2022-10-04T00:49:20Z
- FETA: A Benchmark for Few-Sample Task Transfer in Open-Domain Dialogue
This work explores conversational task transfer by introducing FETA: a benchmark for few-sample task transfer in open-domain dialogue.
FETA contains two underlying sets of conversations, annotated with 10 and 7 tasks respectively, enabling the study of intra-dataset task transfer.
We utilize three popular language models and three learning algorithms to analyze the transferability between 132 source-target task pairs.
Date: 2022-05-12T17:59:00Z
- InstructionNER: A Multi-Task Instruction-Based Generative Framework for Few-shot NER
We propose a multi-task instruction-based generative framework, named InstructionNER, for low-resource named entity recognition.
Specifically, we reformulate the NER task as a generation problem: source sentences are enriched with task-specific instructions and answer options, and the model then infers the entities and their types in natural language.
Experimental results show that our method consistently outperforms other baselines on five datasets in few-shot settings.
Date: 2022-03-08T07:56:36Z
- Exploring Relational Context for Multi-Task Dense Prediction
We consider a multi-task environment for dense prediction tasks, represented by a common backbone and independent task-specific heads.
We explore various attention-based contexts, such as global and local, in the multi-task setting.
We propose an Adaptive Task-Relational Context module, which samples the pool of all available contexts for each task pair.
Date: 2021-04-28T16:45:56Z
This list is automatically generated from the titles and abstracts of the papers on this site.