Demonstrating CAT: Synthesizing Data-Aware Conversational Agents for
Transactional Databases
- URL: http://arxiv.org/abs/2203.14144v1
- Date: Sat, 26 Mar 2022 19:46:43 GMT
- Title: Demonstrating CAT: Synthesizing Data-Aware Conversational Agents for
Transactional Databases
- Authors: Marius Gassen, Benjamin Hättasch, Benjamin Hilprecht, Nadja Geisler,
Alexander Fraser, Carsten Binnig
- Abstract summary: We present CAT, which can be used to create conversational agents for transactional databases.
The main idea is that, for a given OLTP database, CAT uses weak supervision to synthesize the required training data.
CAT provides an out-of-the-box integration of the resulting agent with the database.
- Score: 67.96827026450562
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Databases for OLTP are often the backbone for applications such as hotel room
or cinema ticket booking applications. However, developing a conversational
agent (i.e., a chatbot-like interface) to allow end-users to interact with an
application using natural language requires both immense amounts of training
data and NLP expertise. This motivates CAT, which can be used to easily create
conversational agents for transactional databases. The main idea is that, for a
given OLTP database, CAT uses weak supervision to synthesize the required
training data to train a state-of-the-art conversational agent, allowing users
to interact with the OLTP database. Furthermore, CAT provides an out-of-the-box
integration of the resulting agent with the database. As a major difference to
existing conversational agents, agents synthesized by CAT are data-aware. This
means that the agent decides which information should be requested from the
user based on the current data distributions in the database, which typically
results in markedly more efficient dialogues compared with non-data-aware
agents. We publish the code for CAT as open source.
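The data-aware behaviour described in the abstract can be illustrated with a small sketch. This is not CAT's implementation: it assumes, purely for illustration, that the agent asks about the attribute whose values are most evenly spread over the remaining candidate rows (highest Shannon entropy), since an answer to that question narrows the candidates the most. The function names `entropy` and `next_attribute_to_ask` and the `screenings` sample table are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the value distribution in `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def next_attribute_to_ask(rows, candidate_attributes):
    """Pick the attribute whose current value distribution has the highest entropy.

    Assumption for this sketch: a high-entropy attribute splits the remaining
    candidate rows most evenly, so asking about it shortens the dialogue.
    """
    return max(candidate_attributes, key=lambda a: entropy([r[a] for r in rows]))


# Hypothetical candidate rows from a cinema booking database.
screenings = [
    {"movie": "Dune", "city": "Berlin", "time": "18:00"},
    {"movie": "Dune", "city": "Berlin", "time": "20:30"},
    {"movie": "Dune", "city": "Berlin", "time": "22:00"},
    {"movie": "Alien", "city": "Berlin", "time": "20:30"},
]

# All rows share the same city, so asking for it would not narrow anything down.
attribute = next_attribute_to_ask(screenings, ["movie", "city", "time"])
print(attribute)
```

A non-data-aware agent would ask for attributes in a fixed order, for example the city first, even though every remaining row has the same city. Re-running the selection after each user answer, on the rows that are still consistent with the dialogue, is what ties the questions to the current database contents.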
Related papers
- An Approach for Auto Generation of Labeling Functions for Software Engineering Chatbots [3.1911318265930944]
We propose an approach to automatically generate labeling functions (LFs) by extracting patterns from labeled user queries.
We evaluate the effectiveness of our approach by applying it to the queries of four diverse SE datasets.
arXiv Detail & Related papers (2024-10-09T17:34:14Z) - Effective and Efficient Conversation Retrieval for Dialogue State Tracking with Implicit Text Summaries [48.243879779374836]
Few-shot dialogue state tracking (DST) with Large Language Models (LLMs) relies on an effective and efficient conversation retriever to find similar in-context examples for prompt learning.
Previous works use raw dialogue context as search keys and queries, and a retriever is fine-tuned with annotated dialogues to achieve superior performance.
We handle the task of conversation retrieval based on text summaries of the conversations.
An LLM-based conversation summarizer is adopted for query and key generation, which enables effective maximum inner product search.
arXiv Detail & Related papers (2024-02-20T14:31:17Z) - Recommender AI Agent: Integrating Large Language Models for Interactive
Recommendations [53.76682562935373]
We introduce an efficient framework called InteRecAgent, which employs LLMs as the brain and recommender models as tools.
InteRecAgent achieves satisfying performance as a conversational recommender system, outperforming general-purpose LLMs.
arXiv Detail & Related papers (2023-08-31T07:36:44Z) - Automatic Generation of Conversational Interfaces for Tabular Data Analysis [1.9744907811058787]
Tabular data is the most common format to publish and exchange structured data online.
We propose the use of a conversational interface to exploit data sources published by public administrations.
arXiv Detail & Related papers (2023-05-18T22:23:40Z) - Open Domain Question Answering over Virtual Documents: A Unified
Approach for Data and Text [62.489652395307914]
We use the data-to-text method as a means for encoding structured knowledge for knowledge-intensive applications, i.e. open-domain question answering (QA).
Specifically, we propose a verbalizer-retriever-reader framework for open-domain QA over data and text where verbalized tables from Wikipedia and triples from Wikidata are used as augmented knowledge sources.
We show that our Unified Data and Text QA, UDT-QA, can effectively benefit from the expanded knowledge index, leading to large gains over text-only baselines.
arXiv Detail & Related papers (2021-10-16T00:11:21Z) - Data Driven Content Creation using Statistical and Natural Language
Processing Techniques for Financial Domain [0.0]
We propose a two-part framework where the first part describes methods to combine the information from different interaction channels like call, search, and chat.
The second part of the framework focuses on extracting customer questions by analyzing interaction data sources.
arXiv Detail & Related papers (2021-09-07T08:37:28Z) - Pchatbot: A Large-Scale Dataset for Personalized Chatbot [49.16746174238548]
We introduce Pchatbot, a large-scale dialogue dataset that contains two subsets collected from Weibo and Judicial forums respectively.
To adapt the raw dataset to dialogue systems, we elaborately normalize the raw dataset via processes such as anonymization.
The scale of Pchatbot is significantly larger than existing Chinese datasets, which might benefit the data-driven models.
arXiv Detail & Related papers (2020-09-28T12:49:07Z) - Efficient Deployment of Conversational Natural Language Interfaces over
Databases [45.52672694140881]
We propose a novel method for accelerating the training dataset collection for developing the natural language-to-query-language machine learning models.
Our system allows one to generate conversational multi-turn data, where multiple turns define a dialogue session.
arXiv Detail & Related papers (2020-05-31T19:16:27Z) - Conversations with Search Engines: SERP-based Conversational Response
Generation [77.1381159789032]
We create a suitable dataset, the Search as a Conversation (SaaC) dataset, for the development of pipelines for conversations with search engines.
We also develop a state-of-the-art pipeline for conversations with search engines, the Conversations with Search Engines (CaSE) using this dataset.
CaSE enhances the state-of-the-art by introducing a supporting token identification module and a prior-aware pointer generator.
arXiv Detail & Related papers (2020-04-29T13:07:53Z)