Asking Multimodal Clarifying Questions in Mixed-Initiative
Conversational Search
- URL: http://arxiv.org/abs/2402.07742v1
- Date: Mon, 12 Feb 2024 16:04:01 GMT
- Title: Asking Multimodal Clarifying Questions in Mixed-Initiative
Conversational Search
- Authors: Yifei Yuan, Clemencia Siro, Mohammad Aliannejadi, Maarten de Rijke,
Wai Lam
- Abstract summary: In mixed-initiative conversational search systems, clarifying questions are used to help users who struggle to express their intentions in a single query.
We hypothesize that in scenarios where multimodal information is pertinent, the clarification process can be improved by using non-textual information.
We collect a dataset named Melon that contains over 4k multimodal clarifying questions, enriched with over 14k images.
Several analyses are conducted to understand the importance of multimodal content during the query clarification phase.
- Score: 89.1772985740272
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: In mixed-initiative conversational search systems, clarifying questions are
used to help users who struggle to express their intentions in a single query.
These questions aim to uncover the user's information needs and resolve query
ambiguities. We hypothesize that in scenarios where multimodal information is
pertinent, the clarification process can be improved by using non-textual
information. Therefore, we propose to add images to clarifying questions and
formulate the novel task of asking multimodal clarifying questions in
open-domain, mixed-initiative conversational search systems. To facilitate
research into this task, we collect a dataset named Melon that contains over 4k
multimodal clarifying questions, enriched with over 14k images. We also propose
a multimodal query clarification model named Marto and adopt a prompt-based,
generative fine-tuning strategy that trains the different stages with
different prompts. Several analyses are conducted to understand the
importance of multimodal content during the query clarification phase.
Experimental results indicate that the addition of images leads to significant
improvements of up to 90% in retrieval performance when the relevant images
are selected. Extensive analyses also demonstrate that Marto outperforms
discriminative baselines in both effectiveness and efficiency.
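
The abstract does not spell out Marto's stages; as a rough illustration, the sketch below shows what prompt-based, multi-stage generative fine-tuning can look like with a text-to-text model. The stage names, prompt templates, and the use of image captions as a stand-in for visual input are assumptions for illustration, not the paper's exact design.

```python
# Sketch: prompt-based generative fine-tuning across stages, in the spirit of
# Marto. Stage names, prompts, and the caption stand-in for images are
# illustrative assumptions; the paper's actual design may differ.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Each stage casts its subtask as text-to-text generation behind its own prompt.
PROMPTS = {
    "clarification_need": "does the query [{query}] need clarification? ",
    "question_generation": "given the query [{query}] and image caption "
                           "[{caption}], ask a clarifying question: ",
}

def training_step(stage: str, query: str, target: str, caption: str = "") -> float:
    """One fine-tuning step: stage prompt as input, stage label/question as target."""
    source = PROMPTS[stage].format(query=query, caption=caption)
    inputs = tokenizer(source, return_tensors="pt", truncation=True)
    labels = tokenizer(target, return_tensors="pt", truncation=True).input_ids
    loss = model(**inputs, labels=labels).loss  # standard seq2seq cross-entropy
    loss.backward()
    return loss.item()
```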
Related papers
- Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent [102.31558123570437]
Multimodal Retrieval Augmented Generation (mRAG) plays an important role in mitigating the "hallucination" issue inherent in multimodal large language models (MLLMs).
We propose the first self-adaptive planning agent for multimodal retrieval, OmniSearch.
arXiv Detail & Related papers (2024-11-05T09:27:21Z)
- Venn Diagram Prompting: Accelerating Comprehension with Scaffolding Effect [0.0]
We introduce Venn Diagram (VD) Prompting, a prompting technique that allows Large Language Models (LLMs) to combine and synthesize information across documents.
The technique also aims to eliminate the inherent position bias in LLMs, making answers more consistent by removing sensitivity to the order of the input information.
In experiments on four public benchmark question-answering datasets, VD prompting consistently matches or surpasses the performance of a meticulously crafted instruction prompt.
arXiv Detail & Related papers (2024-06-08T06:27:26Z)
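
As a rough sketch of the order-insensitive, synthesis-oriented prompting the VD entry above describes, one might build a prompt like the following. The template wording is an assumption, not the paper's exact prompt.

```python
# Sketch: a Venn-Diagram-style prompt builder. Snippets are presented as an
# unordered set and the model is asked to synthesize shared vs. unique
# information before answering. The wording is an illustrative assumption.
def vd_prompt(question: str, snippets: list[str]) -> str:
    labeled = "\n".join(f"[Snippet {chr(65 + i)}] {s}" for i, s in enumerate(snippets))
    return (
        "Treat the following snippets as an unordered set; their order carries no meaning.\n"
        f"{labeled}\n"
        "First identify the information the snippets share and the information unique to "
        "each, as in a Venn diagram, then synthesize both to answer.\n"
        f"Question: {question}\nAnswer:"
    )
```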
- CLARINET: Augmenting Language Models to Ask Clarification Questions for Retrieval [52.134133938779776]
We present CLARINET, a system that asks informative clarification questions by choosing questions whose answers would maximize certainty in the correct candidate.
Our approach works by augmenting a large language model (LLM) to condition on a retrieval distribution, finetuning end-to-end to generate the question that would have maximized the rank of the true candidate at each turn.
arXiv Detail & Related papers (2024-04-28T18:21:31Z)
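
A minimal sketch of the selection criterion the CLARINET entry describes: score each candidate question by the certainty (top-candidate probability) its answers are expected to induce in the retrieval distribution. The `answer_model` and `rerank` interfaces are assumed for illustration; CLARINET itself fine-tunes an LLM end-to-end rather than searching over a fixed pool.

```python
# Sketch: pick the clarifying question whose expected answer most concentrates
# the retrieval distribution on one candidate. `answer_model` and `rerank`
# are assumed interfaces, not CLARINET's actual components.
def expected_certainty(question, answer_model, rerank) -> float:
    """E[max_c P(c | question, answer)] under the answer distribution.

    answer_model(question) -> iterable of (answer, probability)
    rerank(question, answer) -> dict mapping candidate -> probability
    """
    return sum(
        p_answer * max(rerank(question, answer).values())
        for answer, p_answer in answer_model(question)
    )

def choose_question(questions, answer_model, rerank):
    return max(questions, key=lambda q: expected_certainty(q, answer_model, rerank))
```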
- End-to-end Knowledge Retrieval with Multi-modal Queries [50.01264794081951]
ReMuQ requires a system to retrieve knowledge from a large corpus by integrating contents from both text and image queries.
We introduce a retriever model, ReViz, that can directly process input text and images to retrieve relevant knowledge in an end-to-end fashion.
We demonstrate superior performance in retrieval on two datasets under zero-shot settings.
arXiv Detail & Related papers (2023-06-01T08:04:12Z)
- Zero-shot Clarifying Question Generation for Conversational Search [25.514678546942754]
We propose a constrained clarifying question generation system that uses both question templates and query facets to guide effective and precise question generation.
Experimental results show that our method outperforms existing state-of-the-art zero-shot baselines by a large margin.
arXiv Detail & Related papers (2023-01-30T04:43:02Z)
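
A toy sketch of the template-plus-facet constraint described in the entry above. The templates and facets below are invented for illustration; the paper uses them to guide a zero-shot generator rather than filling slots verbatim.

```python
# Sketch: combine question templates with query facets to constrain the
# space of clarifying questions. Templates and facets are toy examples.
TEMPLATES = [
    "Are you looking for {facet}?",
    'For "{query}", do you mean {facet}?',
]

def candidate_questions(query: str, facets: list[str]) -> list[str]:
    """Cross templates with facets to form a constrained candidate pool
    that a zero-shot generator can rank or refine."""
    return [t.format(facet=f, query=query) for t in TEMPLATES for f in facets]

print(candidate_questions("jaguar", ["the animal", "the car brand"]))
```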
- Exposing Query Identification for Search Transparency [69.06545074617685]
We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems.
We derive an evaluation metric to measure the quality of a ranking of exposing queries, and conduct an empirical analysis focusing on various practical aspects of approximate EQI.
arXiv Detail & Related papers (2021-10-14T20:19:27Z)
- Analysing the Effect of Clarifying Questions on Document Ranking in Conversational Search [10.335808358080289]
We investigate how different aspects of clarifying questions and user answers affect the quality of ranking.
We introduce a simple lexical baseline that significantly outperforms the existing naive baselines.
arXiv Detail & Related papers (2020-08-09T12:55:16Z)
- Guided Transformer: Leveraging Multiple External Sources for Representation Learning in Conversational Search [36.64582291809485]
Asking clarifying questions in response to ambiguous or faceted queries has been recognized as a useful technique for various information retrieval systems.
In this paper, we enrich the representations learned by Transformer networks using a novel attention mechanism from external information sources.
Our experiments use a public dataset for search clarification and demonstrate significant improvements compared to competitive baselines.
arXiv Detail & Related papers (2020-06-13T03:24:53Z)
- Multi-Stage Conversational Passage Retrieval: An Approach to Fusing Term Importance Estimation and Neural Query Rewriting [56.268862325167575]
We tackle conversational passage retrieval (ConvPR) with query reformulation integrated into a multi-stage ad-hoc IR system.
We propose two conversational query reformulation (CQR) methods: (1) term importance estimation and (2) neural query rewriting.
For the former, we expand conversational queries using important terms extracted from the conversational context with frequency-based signals.
For the latter, we reformulate conversational queries into natural, standalone, human-understandable queries with a pretrained sequence-to-sequence model.
arXiv Detail & Related papers (2020-05-05T14:30:20Z)
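
A minimal sketch of the first CQR method described above, frequency-based term importance: score terms from earlier turns by how often they recur and append the top ones to the current query. The stopword list, scoring, and cutoff here are simplifications of the paper's signals.

```python
# Sketch: frequency-based term importance for conversational query expansion.
# Context terms that recur across turns are treated as important and appended
# to the current query. Heuristics below are illustrative simplifications.
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "is", "what", "about", "it", "and", "do", "me"}

def expand_query(current_query: str, context_turns: list[str], k: int = 3) -> str:
    terms = Counter(
        tok for turn in context_turns
        for tok in turn.lower().split()
        if tok not in STOPWORDS
    )
    expansion = [t for t, _ in terms.most_common(k) if t not in current_query.lower()]
    return current_query + " " + " ".join(expansion)

print(expand_query("what about its habitat",
                   ["tell me about snow leopards", "where do snow leopards live"]))
```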
- Multi-View Attention Network for Visual Dialog [5.731758300670842]
It is necessary for an agent to 1) determine the semantic intent of the question and 2) align question-relevant textual and visual contents.
We propose Multi-View Attention Network (MVAN), which leverages multiple views about heterogeneous inputs.
MVAN effectively captures the question-relevant information from the dialog history with two complementary modules.
arXiv Detail & Related papers (2020-04-29T08:46:38Z)