Related papers: Demonstrating CAT: Synthesizing Data-Aware Conversational Agents for Transactional Databases

URL: http://arxiv.org/abs/2203.14144v1
Date: Sat, 26 Mar 2022 19:46:43 GMT
Title: Demonstrating CAT: Synthesizing Data-Aware Conversational Agents for Transactional Databases
Authors: Marius Gassen, Benjamin Hättasch, Benjamin Hilprecht, Nadja Geisler, Alexander Fraser, Carsten Binnig
Abstract summary: We present CAT, which can be used to create conversational agents for transactional databases. The main idea is that, for a given OLTP database, CAT uses weak supervision to synthesize the required training data. CAT provides an out-of-the-box integration of the resulting agent with the database.
Score: 67.96827026450562
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Databases for OLTP are often the backbone for applications such as hotel room or cinema ticket booking applications. However, developing a conversational agent (i.e., a chatbot-like interface) to allow end-users to interact with an application using natural language requires both immense amounts of training data and NLP expertise. This motivates CAT, which can be used to easily create conversational agents for transactional databases. The main idea is that, for a given OLTP database, CAT uses weak supervision to synthesize the required training data to train a state-of-the-art conversational agent, allowing users to interact with the OLTP database. Furthermore, CAT provides an out-of-the-box integration of the resulting agent with the database. As a major difference to existing conversational agents, agents synthesized by CAT are data-aware. This means that the agent decides which information should be requested from the user based on the current data distributions in the database, which typically results in markedly more efficient dialogues compared with non-data-aware agents. We publish the code for CAT as open source.

Related papers

KathDB: Explainable Multimodal Database Management System with Human-AI Collaboration [4.7682930360459785]
KathDB is a new system that combines relational semantics with the reasoning power of foundation models over multimodal data. It includes human-AI interaction channels during query parsing, execution, and result explanation.
arXiv Detail & Related papers (2025-12-11T19:36:23Z)
Transduction is All You Need for Structured Data Workflows [8.178153196011028]
This paper introduces Agentics, a functional agentic AI framework for building structured data workflow pipelines. Designed for both research and practical applications, Agentics offers a new data-centric paradigm in which agents are embedded within data types. We present a range of structured data workflow tasks and empirical evidence demonstrating the effectiveness of this approach.
arXiv Detail & Related papers (2025-08-21T14:35:47Z)
Design and testing of an agent chatbot supporting decision making with public transport data [0.19791587637442667]
This paper presents a user-friendly tool to interact with datasets and support decision making. It is based on an agent architecture, which expands the capabilities of the core Large Language Model (LLM). This paper also tackles one of the main open problems of such Generative AI projects: collecting data to measure the system's performance.
arXiv Detail & Related papers (2025-05-28T14:31:14Z)
AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework [22.72266037804117]
Tabular Question Answering (TQA) allows users to quickly and efficiently extract meaningful insights from structured data. Many tables are derived from web sources or real-world scenarios, which require meticulous data preparation (or data prep) to ensure accurate responses. This question-aware data preparation involves specific tasks such as column augmentation and filtering tailored to particular questions. We propose AutoPrep, a large language model (LLM)-based multi-agent framework that leverages the strengths of multiple agents.
arXiv Detail & Related papers (2024-12-10T11:03:49Z)
Interactive Dialogue Agents via Reinforcement Learning on Hindsight Regenerations [58.65755268815283]
Many real dialogues are interactive, meaning an agent's utterances will influence their conversational partner, elicit information, or change their opinion. We use this fact to rewrite and augment existing suboptimal data, and train via offline reinforcement learning (RL) an agent that outperforms both prompting and learning from unaltered human demonstrations. Our results in a user study with real humans show that our approach greatly outperforms existing state-of-the-art dialogue agents.
arXiv Detail & Related papers (2024-11-07T21:37:51Z)
An Approach for Auto Generation of Labeling Functions for Software Engineering Chatbots [3.1911318265930944]
We propose an approach to automatically generate labeling functions (LFs) by extracting patterns from labeled user queries. We evaluate the effectiveness of our approach by applying it to the queries of four diverse SE datasets.
arXiv Detail & Related papers (2024-10-09T17:34:14Z)
CleanAgent: Automating Data Standardization with LLM-based Agents [6.677219861416146]
We propose a Python library with declarative, unified APIs for standardizing different column types. Dataprep.Clean significantly reduces the coding complexity by enabling the standardization of specific column types with a single line of code. We introduce the CleanAgent framework integrating Dataprep.Clean and LLM-based agents to automate the data standardization process.
arXiv Detail & Related papers (2024-03-13T06:54:15Z)
Effective and Efficient Conversation Retrieval for Dialogue State Tracking with Implicit Text Summaries [48.243879779374836]
Few-shot dialogue state tracking (DST) with Large Language Models (LLM) relies on an effective and efficient conversation retriever to find similar in-context examples for prompt learning. Previous works use raw dialogue context as search keys and queries, and a retriever is fine-tuned with annotated dialogues to achieve superior performance. We handle the task of conversation retrieval based on text summaries of the conversations. An LLM-based conversation summarizer is adopted for query and key generation, which enables effective maximum inner product search.
arXiv Detail & Related papers (2024-02-20T14:31:17Z)
Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations [53.76682562935373]
We introduce an efficient framework called InteRecAgent, which employs LLMs as the brain and recommender models as tools. InteRecAgent achieves satisfying performance as a conversational recommender system, outperforming general-purpose LLMs.
arXiv Detail & Related papers (2023-08-31T07:36:44Z)
Automatic Generation of Conversational Interfaces for Tabular Data Analysis [1.9744907811058787]
Tabular data is the most common format to publish and exchange structured data online. We propose the use of a conversational interface to exploit data sources published by public administrations.
arXiv Detail & Related papers (2023-05-18T22:23:40Z)
Data Driven Content Creation using Statistical and Natural Language Processing Techniques for Financial Domain [0.0]
We propose a two-part framework where the first part describes methods to combine the information from different interaction channels like call, search, and chat. The second part of the framework focuses on extracting customer questions by analyzing interaction data sources.
arXiv Detail & Related papers (2021-09-07T08:37:28Z)
Pchatbot: A Large-Scale Dataset for Personalized Chatbot [49.16746174238548]
We introduce Pchatbot, a large-scale dialogue dataset that contains two subsets collected from Weibo and Judicial forums respectively. To adapt the raw dataset to dialogue systems, we elaborately normalize the raw dataset via processes such as anonymization. The scale of Pchatbot is significantly larger than existing Chinese datasets, which might benefit the data-driven models.
arXiv Detail & Related papers (2020-09-28T12:49:07Z)
Efficient Deployment of Conversational Natural Language Interfaces over Databases [45.52672694140881]
We propose a novel method for accelerating the training dataset collection for developing the natural language-to-query-language machine learning models. Our system allows one to generate conversational multi-turn data, where multiple turns define a dialogue session.
arXiv Detail & Related papers (2020-05-31T19:16:27Z)
Conversations with Search Engines: SERP-based Conversational Response Generation [77.1381159789032]
We create a suitable dataset, the Search as a Conversation (SaaC) dataset, for the development of pipelines for conversations with search engines. We also develop a state-of-the-art pipeline for conversations with search engines, the Conversations with Search Engines (CaSE) using this dataset. CaSE enhances the state-of-the-art by introducing a supporting token identification module and a prior-aware pointer generator.
arXiv Detail & Related papers (2020-04-29T13:07:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.