Related papers: ChaI-TeA: A Benchmark for Evaluating Autocompletion of Interactions with LLM-based Chatbots

ChaI-TeA: A Benchmark for Evaluating Autocompletion of Interactions with LLM-based Chatbots

URL: http://arxiv.org/abs/2412.18377v2
Date: Wed, 25 Dec 2024 09:26:52 GMT
Title: ChaI-TeA: A Benchmark for Evaluating Autocompletion of Interactions with LLM-based Chatbots
Authors: Shani Goren, Oren Kalinsky, Tomer Stav, Yuri Rapoport, Yaron Fairstein, Ram Yazdi, Nachshon Cohen, Alexander Libov, Guy Kushilevitz,
Abstract summary: We present ChaI-TeA: CHat InTEraction Autocomplete; An autcomplete evaluation framework for LLM-based interactions.<n>The framework includes a formal definition of the task, coupled with suitable datasets and metrics.<n>We use the framework to evaluate After formally defining the task along with suitable datasets and metrics, we test 9 models on the defined auto completion task.
Score: 37.80024426010599
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The rise of LLMs has deflected a growing portion of human-computer interactions towards LLM-based chatbots. The remarkable abilities of these models allow users to interact using long, diverse natural language text covering a wide range of topics and styles. Phrasing these messages is a time and effort consuming task, calling for an autocomplete solution to assist users. We introduce the task of chatbot interaction autocomplete. We present ChaI-TeA: CHat InTEraction Autocomplete; An autcomplete evaluation framework for LLM-based chatbot interactions. The framework includes a formal definition of the task, coupled with suitable datasets and metrics. We use the framework to evaluate After formally defining the task along with suitable datasets and metrics, we test 9 models on the defined auto completion task, finding that while current off-the-shelf models perform fairly, there is still much room for improvement, mainly in ranking of the generated suggestions. We provide insights for practitioners working on this task and open new research directions for researchers in the field. We release our framework to serve as a foundation for future research.

Related papers

Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training [33.57497419019826]
Action-Based Contrastive Self-Training enables data-efficient dialogue policy learning in multi-turn conversation modeling.<n>We demonstrate ACT's efficacy under in data-efficient tuning scenarios, even when there is no action label available.<n>We also propose evaluating LLMs' ability to function as conversational agents by examining whether they can implicitly recognize and reason about ambiguity in conversation.
arXiv Detail & Related papers (2024-05-31T22:44:48Z)
Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books. Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z)
Characteristic AI Agents via Large Language Models [40.10858767752735]
This research focuses on investigating the performance of Large Language Models in constructing characteristic AI agents. A dataset called Character100'' is built for this benchmark, comprising the most-visited people on Wikipedia for language models to role-play. The experimental results underscore the potential directions for further improvement in the capabilities of LLMs in constructing characteristic AI agents.
arXiv Detail & Related papers (2024-03-19T02:25:29Z)
Multi-User Chat Assistant (MUCA): a Framework Using LLMs to Facilitate Group Conversations [3.6975712141698445]
Multi-User Chat Assistant (MUCA) is an LLM-based framework tailored for group discussions. MUCA consists of three main modules: Sub-topic Generator, Dialog Analyzer, and Conversational Strategies Arbitrator. This paper further proposes an LLM-based Multi-User Simulator (MUS) to ease MUCA's optimization.
arXiv Detail & Related papers (2024-01-10T02:22:21Z)
Learning-To-Rank Approach for Identifying Everyday Objects Using a Physical-World Search Engine [0.8749675983608172]
We focus on the task of retrieving target objects from open-vocabulary user instructions in a human-in-the-loop setting. We propose MultiRankIt, which is a novel approach for the learning-to-rank physical objects task.
arXiv Detail & Related papers (2023-12-26T01:40:31Z)
Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks. We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation. Specifically, task decomposition, tool selection, and parameter prediction are assessed. Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z)
AutoConv: Automatically Generating Information-seeking Conversations with Large Language Models [74.10293412011455]
We propose AutoConv for synthetic conversation generation. Specifically, we formulate the conversation generation problem as a language modeling task. We finetune an LLM with a few human conversations to capture the characteristics of the information-seeking process.
arXiv Detail & Related papers (2023-08-12T08:52:40Z)
ChatGPT as your Personal Data Scientist [0.9689893038619583]
This paper introduces a ChatGPT-based conversational data-science framework to act as a "personal data scientist" Our model pivots around four dialogue states: Data visualization, Task Formulation, Prediction Engineering, and Result Summary and Recommendation. In summary, we developed an end-to-end system that not only proves the viability of the novel concept of conversational data science but also underscores the potency of LLMs in solving complex tasks.
arXiv Detail & Related papers (2023-05-23T04:00:16Z)
KETOD: Knowledge-Enriched Task-Oriented Dialogue [77.59814785157877]
Existing studies in dialogue system research mostly treat task-oriented dialogue and chit-chat as separate domains. We investigate how task-oriented dialogue and knowledge-grounded chit-chat can be effectively integrated into a single model.
arXiv Detail & Related papers (2022-05-11T16:01:03Z)
Few-Shot Bot: Prompt-Based Learning for Dialogue Systems [58.27337673451943]
Learning to converse using only a few examples is a great challenge in conversational AI. The current best conversational models are either good chit-chatters (e.g., BlenderBot) or goal-oriented systems (e.g., MinTL) We propose prompt-based few-shot learning which does not require gradient-based fine-tuning but instead uses a few examples as the only source of learning.
arXiv Detail & Related papers (2021-10-15T14:36:45Z)

This list is automatically generated from the titles and abstracts of the papers in this site.