Question Generation in Knowledge-Driven Dialog: Explainability and Evaluation
- URL: http://arxiv.org/abs/2404.07836v1
- Date: Thu, 11 Apr 2024 15:24:50 GMT
- Title: Question Generation in Knowledge-Driven Dialog: Explainability and Evaluation
- Authors: Juliette Faille, Quentin Brabant, Gwenole Lecorve, Lina M. Rojas-Barahona, Claire Gardent
- Abstract summary: We present a model which, instead of directly generating a question, sequentially predicts first a fact and then a question.
We evaluate our approach on 37k test dialogs adapted from the KGConv dataset.
- Score: 5.598219914856067
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We explore question generation in the context of knowledge-grounded dialogs focusing on explainability and evaluation. Inspired by previous work on planning-based summarisation, we present a model which instead of directly generating a question, sequentially predicts first a fact then a question. We evaluate our approach on 37k test dialogs adapted from the KGConv dataset and we show that, although more demanding in terms of inference, our approach performs on par with a standard model which solely generates a question while allowing for a detailed referenceless evaluation of the model behaviour in terms of relevance, factuality and pronominalisation.
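The two-step idea in the abstract (predict a fact, then generate a question from it) can be illustrated with a toy sketch. This is not the authors' model: the `Fact` type, the "first unasked fact" selection rule, and the question template are all hypothetical stand-ins for learned components. The point is only that exposing the intermediate fact lets one inspect the output without a reference question.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """A knowledge-graph triple (hypothetical representation)."""
    subject: str
    relation: str
    obj: str


def predict_fact(knowledge: list[Fact], asked: set[Fact]) -> Optional[Fact]:
    """Step 1 (stand-in for a learned predictor): pick the first fact not yet asked about."""
    for fact in knowledge:
        if fact not in asked:
            return fact
    return None


def generate_question(fact: Fact) -> str:
    """Step 2 (stand-in for a learned generator): verbalise the fact with a fixed template."""
    return f"What is the {fact.relation} of {fact.subject}?"


def next_turn(knowledge: list[Fact], asked: set[Fact]) -> Optional[tuple[Fact, str]]:
    """Return the predicted fact together with the question generated from it."""
    fact = predict_fact(knowledge, asked)
    if fact is None:
        return None
    return fact, generate_question(fact)


def fact_is_grounded(fact: Fact, knowledge: list[Fact]) -> bool:
    """Referenceless check: the predicted fact must exist in the knowledge source."""
    return fact in knowledge
```

Because `next_turn` returns the fact alongside the question, a check such as `fact_is_grounded` can be run on every turn without a gold question.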
Related papers
- Automated Speaking Assessment of Conversation Tests with Novel Graph-based Modeling on Spoken Response Coherence [11.217656140423207]
ASAC aims to evaluate the overall speaking proficiency of an L2 speaker in a setting where an interlocutor interacts with one or more candidates.
We propose a hierarchical graph model that aptly incorporates both broad inter-response interactions and nuanced semantic information.
Extensive experimental results on the NICT-JLE benchmark dataset suggest that our proposed modeling approach can yield considerable improvements in prediction accuracy.
arXiv Detail & Related papers (2024-09-11T07:24:07Z)
- "Is This It?": Towards Ecologically Valid Benchmarks for Situated Collaboration [16.25921668308458]
We develop benchmarks to assess the capabilities of large multimodal models for engaging in situated collaboration.
In contrast to existing benchmarks, in which question-answer pairs are generated post hoc over preexisting or synthetic datasets via templates, human annotators, or large language models, we propose and investigate an interactive system-driven approach.
We illustrate how the questions that arise are different in form and content from questions typically found in existing embodied question answering (EQA) benchmarks and discuss new real-world challenge problems brought to the fore.
arXiv Detail & Related papers (2024-08-30T12:41:23Z)
- Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy [27.454549324141087]
We propose a novel VQA benchmark based on well-known visual classification datasets.
We also suggest using the semantic hierarchy of the label space to ask automatically generated follow-up questions about the ground-truth category.
Our contributions aim to lay the foundation for more precise and meaningful assessments.
arXiv Detail & Related papers (2024-02-11T18:26:18Z)
- SWING: Balancing Coverage and Faithfulness for Dialogue Summarization [67.76393867114923]
We propose to utilize natural language inference (NLI) models to improve coverage while avoiding factual inconsistencies.
We use NLI to compute fine-grained training signals to encourage the model to generate content in the reference summaries that have not been covered.
Experiments on the DialogSum and SAMSum datasets confirm the effectiveness of the proposed approach.
arXiv Detail & Related papers (2023-01-25T09:33:11Z)
- GODEL: Large-Scale Pre-Training for Goal-Directed Dialog [119.1397031992088]
We introduce GODEL, a large pre-trained language model for dialog.
We show that GODEL outperforms state-of-the-art pre-trained dialog models in few-shot fine-tuning setups.
A novel feature of our evaluation methodology is the introduction of a notion of utility that assesses the usefulness of responses.
arXiv Detail & Related papers (2022-06-22T18:19:32Z)
- What should I Ask: A Knowledge-driven Approach for Follow-up Questions Generation in Conversational Surveys [63.51903260461746]
We propose a novel task for knowledge-driven follow-up question generation in conversational surveys.
We constructed a new human-annotated dataset of human-written follow-up questions with dialogue history and labeled knowledge.
We then propose a two-staged knowledge-driven model for the task, which generates informative and coherent follow-up questions.
arXiv Detail & Related papers (2022-05-23T00:57:33Z)
- On the Use of Linguistic Features for the Evaluation of Generative Dialogue Systems [17.749995931459136]
We propose that a metric based on linguistic features may be able to maintain good correlation with human judgment and be interpretable.
To support this proposition, we measure and analyze various linguistic features on dialogues produced by multiple dialogue models.
We find that the features' behaviour is consistent with the known properties of the models tested, and is similar across domains.
arXiv Detail & Related papers (2021-04-13T16:28:00Z)
- I like fish, especially dolphins: Addressing Contradictions in Dialogue Modeling [104.09033240889106]
We introduce the DialoguE COntradiction DEtection task (DECODE) and a new conversational dataset containing both human-human and human-bot contradictory dialogues.
We then compare a structured utterance-based approach of using pre-trained Transformer models for contradiction detection with the typical unstructured approach.
arXiv Detail & Related papers (2020-12-24T18:47:49Z)
- Knowledge-Grounded Dialogue Generation with Pre-trained Language Models [74.09352261943911]
We study knowledge-grounded dialogue generation with pre-trained language models.
We propose equipping response generation defined by a pre-trained language model with a knowledge selection module.
arXiv Detail & Related papers (2020-10-17T16:49:43Z)
- A Revised Generative Evaluation of Visual Dialogue [80.17353102854405]
We propose a revised evaluation scheme for the VisDial dataset.
We measure consensus between answers generated by the model and a set of relevant answers.
We release these sets and code for the revised evaluation scheme as DenseVisDial.
arXiv Detail & Related papers (2020-04-20T13:26:45Z)