Backtracing: Retrieving the Cause of the Query
- URL: http://arxiv.org/abs/2403.03956v1
- Date: Wed, 6 Mar 2024 18:59:02 GMT
- Title: Backtracing: Retrieving the Cause of the Query
- Authors: Rose E. Wang, Pawan Wirawarn, Omar Khattab, Noah Goodman, Dorottya Demszky
- Abstract summary: We introduce the task of backtracing, in which systems retrieve the text segment that most likely caused a user query.
We evaluate the zero-shot performance of popular information retrieval methods and language modeling methods.
Our results show that there is room for improvement on backtracing and it requires new retrieval approaches.
- Score: 7.715089044732362
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many online content portals allow users to ask questions to supplement their
understanding (e.g., of lectures). While information retrieval (IR) systems may
provide answers for such user queries, they do not directly help content
creators (such as lecturers who want to improve their content) identify
segments that _caused_ a user to ask those questions. We introduce the task of
backtracing, in which systems retrieve the text segment that most likely caused
a user query. We formalize three real-world domains for which backtracing is
important in improving content delivery and communication: understanding the
cause of (a) student confusion in the Lecture domain, (b) reader curiosity in
the News Article domain, and (c) user emotion in the Conversation domain. We
evaluate the zero-shot performance of popular information retrieval methods and
language modeling methods, including bi-encoder, re-ranking and
likelihood-based methods and ChatGPT. While traditional IR systems retrieve
semantically relevant information (e.g., details on "projection matrices" for a
query "does projecting multiple times still lead to the same point?"), they
often miss the causally relevant context (e.g., the lecturer states "projecting
twice gets me the same answer as one projection"). Our results show that there
is room for improvement on backtracing and it requires new retrieval
approaches. We hope our benchmark serves to improve future retrieval systems
for backtracing, spawning systems that refine content generation and identify
linguistic triggers influencing user queries. Our code and data are
open-sourced: https://github.com/rosewang2008/backtracing.
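The task definition above can be read as a ranking problem: score every segment of the source text against the query and return the top-scoring one. The following is a minimal sketch under stated assumptions, not the paper's implementation. The three-segment lecture is hypothetical (it paraphrases the abstract's example), and the two scorers are toy stand-ins. A bag-of-words cosine stands in for the bi-encoder family, and an add-one smoothed unigram query likelihood stands in for the likelihood-based family (the paper's likelihood methods use a language model, which is not reproduced here). All function names are illustrative.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    # Crude tokenizer: lowercase and keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_score(query: str, segment: str) -> float:
    # Bag-of-words cosine similarity; a toy stand-in for bi-encoder embeddings.
    q, s = Counter(tokenize(query)), Counter(tokenize(segment))
    dot = sum(q[t] * s[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in s.values())
    )
    return dot / norm if norm else 0.0


def make_likelihood_scorer(vocab_size: int, alpha: float = 1.0) -> Callable[[str, str], float]:
    # Returns a scorer computing log p(query | segment) under an add-alpha
    # smoothed unigram model estimated from the segment alone (hypothetical
    # stand-in for LM-based likelihood scoring).
    def score(query: str, segment: str) -> float:
        counts = Counter(tokenize(segment))
        denom = sum(counts.values()) + alpha * vocab_size
        return sum(math.log((counts[t] + alpha) / denom) for t in tokenize(query))

    return score


def backtrace(query: str, segments: List[str], scorer: Callable[[str, str], float]) -> int:
    # Backtracing as retrieval: index of the segment the scorer ranks highest.
    return max(range(len(segments)), key=lambda i: scorer(query, segments[i]))


# Hypothetical toy lecture, paraphrasing the abstract's example.
lecture = [
    "Today we study projection matrices and their properties.",
    "Projecting twice gets me the same answer as one projection.",
    "Next we move on to eigenvalues.",
]
query = "does projecting multiple times still lead to the same point?"

# Vocabulary over segments and query, so every query token has nonzero mass.
vocab = {t for text in lecture + [query] for t in tokenize(text)}
likelihood = make_likelihood_scorer(len(vocab))

print(backtrace(query, lecture, cosine_score))  # 1
print(backtrace(query, lecture, likelihood))    # 1
```

On this tiny example both scorers pick the segment that paraphrases the lecturer's statement, but only because of shared surface words ("projecting", "the", "same"). It says nothing about how real retrievers perform on the benchmark, where the abstract reports that semantically relevant and causally relevant segments often diverge.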
Related papers
- Open Domain Question Answering with Conflicting Contexts [55.739842087655774]
We find that as much as 25% of unambiguous, open domain questions can lead to conflicting contexts when retrieved using Google Search.
We ask our annotators to provide explanations for their selections of correct answers.
Date: 2024-10-16T07:24:28Z
- QueryBuilder: Human-in-the-Loop Query Development for Information Retrieval [12.543590253664492]
We present a novel, interactive system called QueryBuilder.
It allows a novice, English-speaking user to create queries with a small amount of effort.
It rapidly develops cross-lingual information retrieval queries corresponding to the user's information needs.
Date: 2024-09-07T00:46:58Z
- Redefining Information Retrieval of Structured Database via Large Language Models [10.117751707641416]
This paper introduces a novel retrieval augmentation framework called ChatLR.
It primarily employs the powerful semantic understanding ability of Large Language Models (LLMs) as retrievers to achieve precise and concise information retrieval.
Experimental results demonstrate the effectiveness of ChatLR in addressing user queries, achieving an overall information retrieval accuracy exceeding 98.8%.
Date: 2024-05-09T02:37:53Z
- Selecting Query-bag as Pseudo Relevance Feedback for Information-seeking Conversations [76.70349332096693]
Information-seeking dialogue systems are widely used in e-commerce systems.
We propose a Query-bag based Pseudo Relevance Feedback framework (QB-PRF).
It constructs a query-bag with related queries to serve as pseudo signals to guide information-seeking conversations.
Date: 2024-03-22T08:10:32Z
- Towards Self-Contained Answers: Entity-Based Answer Rewriting in Conversational Search [19.147174273221452]
This paper explores ways to rewrite answers in conversational information seeking (CIS), so that users can understand them without having to resort to external services or sources.
As our first contribution, we create a dataset of conversations annotated with entities for saliency.
As our second contribution, we propose two answer rewriting strategies aimed at improving the overall user experience in CIS.
Date: 2024-03-04T05:52:41Z
- DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
Date: 2023-10-31T04:37:57Z
- Social Commonsense-Guided Search Query Generation for Open-Domain Knowledge-Powered Conversations [66.16863141262506]
We present a novel approach that focuses on generating internet search queries guided by social commonsense.
Our proposed framework addresses passive user interactions by integrating topic tracking, commonsense response generation and instruction-driven query generation.
Date: 2023-10-22T16:14:56Z
- Multi-Grained Knowledge Retrieval for End-to-End Task-Oriented Dialog [42.088274728084265]
Retrieving proper domain knowledge from an external database lies at the heart of end-to-end task-oriented dialog systems.
Most existing systems blend knowledge retrieval with response generation and optimize them with direct supervision from reference responses.
We propose to decouple knowledge retrieval from response generation and introduce a multi-grained knowledge retriever.
Date: 2023-05-17T12:12:46Z
- Guided Transformer: Leveraging Multiple External Sources for Representation Learning in Conversational Search [36.64582291809485]
Asking clarifying questions in response to ambiguous or faceted queries has been recognized as a useful technique for various information retrieval systems.
In this paper, we enrich the representations learned by Transformer networks using a novel attention mechanism from external information sources.
Our experiments use a public dataset for search clarification and demonstrate significant improvements compared to competitive baselines.
Date: 2020-06-13T03:24:53Z
- Query Resolution for Conversational Search with Limited Supervision [63.131221660019776]
We propose QuReTeC (Query Resolution by Term Classification), a neural query resolution model based on bidirectional transformers.
We show that QuReTeC outperforms state-of-the-art models, and furthermore, that our distant supervision method can be used to substantially reduce the amount of human-curated data required to train QuReTeC.
Date: 2020-05-24T11:37:22Z
- IART: Intent-aware Response Ranking with Transformers in Information-seeking Conversation Systems [80.0781718687327]
We analyze user intent patterns in information-seeking conversations and propose an intent-aware neural response ranking model, IART.
IART is built on top of the integration of user intent modeling and language representation learning with the Transformer architecture.
Date: 2020-02-03T05:59:52Z