Expanding Chatbot Knowledge in Customer Service: Context-Aware Similar Question Generation Using Large Language Models
- URL: http://arxiv.org/abs/2410.12444v2
- Date: Wed, 19 Mar 2025 06:22:38 GMT
- Title: Expanding Chatbot Knowledge in Customer Service: Context-Aware Similar Question Generation Using Large Language Models
- Authors: Mengze Hong, Chen Jason Zhang, Di Jiang, Yuanfeng Song, Lu Wang, Yuanqin He, Zhiyang Su, Qing Li
- Abstract summary: Service chatbots play an important role in enhancing customer support by delivering timely responses to diverse queries. To effectively handle varied customer inquiries, augmenting the knowledge base with similar questions that maintain semantic consistency and linguistic variability is crucial. This paper presents methodologies for a novel approach that utilizes Large Language Models for generating similar questions and selecting an optimal subset of questions for knowledge base augmentation.
- Score: 19.131389732699365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Service chatbots play an important role in enhancing customer support by delivering timely responses to diverse queries. Traditionally, these chatbots rely on retrieval-based methods constrained by a predefined knowledge base of question-answer (QA) pairs to guarantee reliable responses. To effectively handle varied customer inquiries, augmenting the knowledge base with similar questions that maintain semantic consistency and linguistic variability is crucial. This paper presents methodologies for a novel approach that utilizes Large Language Models (LLMs) for generating similar questions and selecting an optimal subset of questions for knowledge base augmentation in industrial chatbots. Specifically, we define the similar question generation (SQG) task in the context of LLM training and propose a one-to-many objective that incorporates contextual information. We also introduce an optimization framework that selects a diverse subset of similar questions within predefined resource constraints. Experimental results demonstrate significant improvements over traditional methods, achieving greater semantic diversity while aligning with source QA pairs, with over 120% relative improvement in meeting business-specific requirements under human evaluation. Combined with several best practices, we provide a robust, application-driven solution for enhancing chatbot performance and improving customer service satisfaction.
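The abstract describes selecting a diverse subset of similar questions within a resource constraint but does not spell out the optimization itself. The sketch below is one plausible illustration, not the authors' method: a greedy max-min selection under a fixed budget. The names `select_diverse`, `budget`, and `min_similarity` are hypothetical, and token-overlap (Jaccard) similarity stands in for whatever semantic similarity model a real system would use.

```python
def tokens(question):
    """Lowercased word set; a crude stand-in for a semantic embedding (assumption)."""
    return set(question.lower().replace("?", "").split())


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over the word sets of two questions."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    if not union:
        return 0.0
    return 1.0 - len(ta & tb) / len(union)


def select_diverse(source, candidates, budget, min_similarity=0.2):
    """Greedily pick up to `budget` candidates (hypothetical helper).

    Candidates whose similarity to the source falls below `min_similarity`
    are dropped first, as a proxy for semantic consistency. Each greedy step
    then picks the candidate whose smallest distance to the source and to the
    already selected questions is largest, as a proxy for linguistic variety.
    """
    pool = [
        c for c in candidates
        if 1.0 - jaccard_distance(source, c) >= min_similarity
    ]
    selected = []
    while pool and len(selected) < budget:
        anchors = [source] + selected
        best = max(
            pool,
            key=lambda c: min(jaccard_distance(c, s) for s in anchors),
        )
        selected.append(best)
        pool.remove(best)
    return selected


source = "how do i reset my password"
candidates = [
    "how do i reset my password",   # exact duplicate, adds no variety
    "how can i reset my password",  # close paraphrase
    "how to change my password",    # more distant paraphrase
    "where is the nearest store",   # off-topic, filtered out
]
result = select_diverse(source, candidates, budget=2)
print(result)
```

With a budget of two, the sketch keeps the two paraphrases and skips both the exact duplicate and the off-topic question. Swapping `jaccard_distance` for an embedding-based distance would be the natural next step in a real pipeline.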
Related papers
- FactGuard: Leveraging Multi-Agent Systems to Generate Answerable and Unanswerable Questions for Enhanced Long-Context LLM Extraction [25.00896070082754]
Extractive reading comprehension systems are designed to locate the correct answer to a question within a given text.
A persistent challenge lies in ensuring these models maintain high accuracy in answering questions while reliably recognizing unanswerable queries.
We propose an innovative data augmentation methodology grounded in a multi-agent collaborative framework.
arXiv Detail & Related papers (2025-04-08T01:45:16Z)
- Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique [66.94905631175209]
We propose a novel inference-time scaling approach -- stepwise natural language self-critique (PANEL).
It employs self-generated natural language critiques as feedback to guide the step-level search process.
This approach bypasses the need for task-specific verifiers and the associated training overhead.
arXiv Detail & Related papers (2025-03-21T17:59:55Z)
- AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs [53.6200736559742]
AGENT-CQ consists of two stages: a generation stage and an evaluation stage.
CrowdLLM simulates human crowdsourcing judgments to assess generated questions and answers.
Experiments on the ClariQ dataset demonstrate CrowdLLM's effectiveness in evaluating question and answer quality.
arXiv Detail & Related papers (2024-10-25T17:06:27Z)
- Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [49.362750475706235]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.
We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.
Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z)
- PerkwE_COQA: Enhanced Persian Conversational Question Answering by combining contextual keyword extraction with Large Language Models [0.8057006406834466]
This paper presents a novel method to elevate the performance of Persian Conversational question-answering (CQA) systems.
It combines the strengths of Large Language Models (LLMs) with contextual keyword extraction.
The proposed method effectively handles implicit questions, delivers contextually relevant answers, and tackles complex questions that rely heavily on conversational context.
arXiv Detail & Related papers (2024-04-08T11:14:58Z)
- Interactive-KBQA: Multi-Turn Interactions for Knowledge Base Question Answering with Large Language Models [7.399563588835834]
Interactive-KBQA is a framework designed to generate logical forms through direct interaction with knowledge bases (KBs).
Our method achieves competitive results on the WebQuestionsSP, ComplexWebQuestions, KQA Pro, and MetaQA datasets.
arXiv Detail & Related papers (2024-02-23T06:32:18Z)
- SEMQA: Semi-Extractive Multi-Source Question Answering [94.04430035121136]
We introduce a new QA task for answering multi-answer questions by summarizing multiple diverse sources in a semi-extractive fashion.
We create the first dataset of this kind, QuoteSum, with human-written semi-extractive answers to natural and generated questions.
arXiv Detail & Related papers (2023-11-08T18:46:32Z)
- Diversify Question Generation with Retrieval-Augmented Style Transfer [68.00794669873196]
We propose RAST, a framework for Retrieval-Augmented Style Transfer.
The objective is to utilize the style of diverse templates for question generation.
We develop a novel Reinforcement Learning (RL) based approach that maximizes a weighted combination of diversity reward and consistency reward.
arXiv Detail & Related papers (2023-10-23T02:27:31Z)
- Improving Question Generation with Multi-level Content Planning [70.37285816596527]
This paper addresses the problem of generating questions from a given context and an answer, specifically focusing on questions that require multi-hop reasoning across an extended context.
We propose MultiFactor, a novel QG framework based on multi-level content planning. Specifically, MultiFactor includes two components: FA-model, which simultaneously selects key phrases and generates full answers, and Q-model which takes the generated full answer as an additional input to generate questions.
arXiv Detail & Related papers (2023-10-20T13:57:01Z)
- An Empirical Comparison of LM-based Question and Answer Generation Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z)
- DialogQAE: N-to-N Question Answer Pair Extraction from Customer Service Chatlog [34.69426306212259]
We propose an N-to-N QA extraction task in which the derived questions and corresponding answers might be separated across different utterances.
We introduce a suite of generative/discriminative tagging based methods with end-to-end and two-stage variants that perform well on 5 customer service datasets.
arXiv Detail & Related papers (2022-12-14T09:05:14Z)
- UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for the multi-hop KGQA task that unifies retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z)
- Suggesting Relevant Questions for a Query Using Statistical Natural Language Processing Technique [0.0]
Suggesting similar questions for a user query has many applications, ranging from reducing users' search time on e-commerce websites and training company employees to holistic learning for students.
The use of Natural Language Processing techniques for suggesting similar questions is prevalent over the existing architecture.
arXiv Detail & Related papers (2022-04-26T04:30:16Z)
- How to Build Robust FAQ Chatbot with Controllable Question Generator? [5.680871239968297]
We propose a high-quality, diverse, controllable method to generate adversarial samples with a semantic graph.
The fluent and semantically generated QA pairs fool our passage retrieval model successfully.
We find that the generated data set improves the generalizability of the QA model to the new target domain.
arXiv Detail & Related papers (2021-11-18T12:54:07Z)
- Effective FAQ Retrieval and Question Matching With Unsupervised Knowledge Injection [10.82418428209551]
We propose a contextual language model for retrieving appropriate answers to frequently asked questions.
We also explore capitalizing on domain-specific, topically relevant relations between words in an unsupervised manner.
We evaluate variants of our approach on a publicly-available Chinese FAQ dataset, and further apply and contextualize it to a large-scale question-matching task.
arXiv Detail & Related papers (2020-10-27T05:03:34Z)
- Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues [88.73739515457116]
We introduce four self-supervised tasks including next session prediction, utterance restoration, incoherence detection and consistency discrimination.
We jointly train the PLM-based response selection model with these auxiliary tasks in a multi-task manner.
Experiment results indicate that the proposed auxiliary self-supervised tasks bring significant improvement for multi-turn response selection.
arXiv Detail & Related papers (2020-09-14T08:44:46Z)
- Logic-Guided Data Augmentation and Regularization for Consistent Question Answering [55.05667583529711]
This paper addresses the problem of improving the accuracy and consistency of responses to comparison questions.
Our method leverages logical and linguistic knowledge to augment labeled training data and then uses a consistency-based regularizer to train the model.
arXiv Detail & Related papers (2020-04-21T17:03:08Z)
- Reinforced Multi-task Approach for Multi-hop Question Generation [47.15108724294234]
We take up Multi-hop question generation, which aims at generating relevant questions based on supporting facts in the context.
We employ multitask learning with the auxiliary task of answer-aware supporting fact prediction to guide the question generator.
We demonstrate the effectiveness of our approach through experiments on the multi-hop question answering dataset, HotPotQA.
arXiv Detail & Related papers (2020-04-05T10:16:59Z)
- ManyModalQA: Modality Disambiguation and QA over Diverse Inputs [73.93607719921945]
We present a new multimodal question answering challenge, ManyModalQA, in which an agent must answer a question by considering three distinct modalities.
We collect our data by scraping Wikipedia and then utilize crowdsourcing to collect question-answer pairs.
arXiv Detail & Related papers (2020-01-22T14:39:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.