Query Understanding via Intent Description Generation
- URL: http://arxiv.org/abs/2008.10889v1
- Date: Tue, 25 Aug 2020 08:56:40 GMT
- Title: Query Understanding via Intent Description Generation
- Authors: Ruqing Zhang, Jiafeng Guo, Yixing Fan, Yanyan Lan, and Xueqi Cheng
- Abstract summary: We propose a novel Query-to-Intent-Description (Q2ID) task for query understanding.
Unlike existing ranking tasks which leverage the query and its description to compute the relevance of documents, Q2ID is a reverse task which aims to generate a natural language intent description.
We demonstrate the effectiveness of our model by comparing with several state-of-the-art generation models on the Q2ID task.
- Score: 75.64800976586771
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Query understanding is a fundamental problem in information retrieval (IR),
which has attracted continuous attention through the past decades. Many
different tasks have been proposed for understanding users' search queries,
e.g., query classification or query clustering. However, it is imprecise to
understand a search query only at the intent class/cluster level, because much
detailed information is lost. As we find in many benchmark datasets, e.g.,
TREC and SemEval, queries are often associated with a detailed description
provided by human annotators that clearly describes their intent and helps
evaluate the relevance of the documents. If a system could automatically
generate a detailed and precise intent description for a search query, like
human annotators, that would indicate much better query understanding has been
achieved. In this paper, therefore, we propose a novel
Query-to-Intent-Description (Q2ID) task for query understanding. Unlike
existing ranking tasks, which leverage the query and its description to compute
the relevance of documents, Q2ID is a reverse task which aims to generate a
natural language intent description based on both relevant and irrelevant
documents of a given query. To address this new task, we propose a novel
Contrastive Generation model, namely CtrsGen for short, to generate the intent
description by contrasting the relevant documents with the irrelevant documents
given a query. We demonstrate the effectiveness of our model by comparing with
several state-of-the-art generation models on the Q2ID task. We discuss the
potential usage of the Q2ID technique through an example application.
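Below is a schematic, hypothetical sketch of the contrastive generation idea described in the abstract: relevant documents are scored by how strongly they match the query while differing from the irrelevant documents, and a decoder conditions on that contrast-weighted evidence. Module names, dimensions, and the cosine-based contrast score are illustrative assumptions, not details of the CtrsGen implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveIntentGenerator(nn.Module):
    """Toy Q2ID-style generator: decode an intent description from relevant
    documents whose representations are weighted by how strongly they
    contrast with the irrelevant documents for the same query."""

    def __init__(self, vocab_size: int, emb_dim: int = 128, hid_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim * 2, vocab_size)

    def _encode(self, ids: torch.Tensor) -> torch.Tensor:
        # ids: (batch, seq_len) -> last hidden state: (batch, hid_dim)
        _, h = self.encoder(self.embed(ids))
        return h.squeeze(0)

    def forward(self, query, rel_docs, irr_docs, target):
        # query: (B, Lq); rel_docs, irr_docs: (B, N, Ld); target: (B, Lt)
        B, N, Ld = rel_docs.shape
        q = self._encode(query)                                        # (B, H)
        rel = self._encode(rel_docs.reshape(B * N, Ld)).reshape(B, N, -1)
        irr = self._encode(irr_docs.reshape(B * N, Ld)).reshape(B, N, -1).mean(dim=1)

        # Contrast scoring: reward similarity to the query, penalize
        # similarity to the centroid of the irrelevant documents.
        sim_q = F.cosine_similarity(rel, q.unsqueeze(1), dim=-1)       # (B, N)
        sim_irr = F.cosine_similarity(rel, irr.unsqueeze(1), dim=-1)   # (B, N)
        weights = torch.softmax(sim_q - sim_irr, dim=-1)               # (B, N)

        # Decode the description conditioned on the query state and a
        # contrast-weighted summary of the relevant documents.
        dec_out, _ = self.decoder(self.embed(target), q.unsqueeze(0).contiguous())
        context = torch.einsum("bn,bnh->bh", weights, rel)             # (B, H)
        context = context.unsqueeze(1).expand(-1, dec_out.size(1), -1)
        return self.out(torch.cat([dec_out, context], dim=-1))         # (B, Lt, V)


# Toy usage with random token ids: checks shapes only. Training would
# minimize cross-entropy against the annotator-written intent description.
model = ContrastiveIntentGenerator(vocab_size=5000)
logits = model(
    query=torch.randint(1, 5000, (2, 6)),
    rel_docs=torch.randint(1, 5000, (2, 3, 40)),
    irr_docs=torch.randint(1, 5000, (2, 3, 40)),
    target=torch.randint(1, 5000, (2, 12)),
)
print(logits.shape)  # torch.Size([2, 12, 5000])
```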
Related papers
- Disentangling Questions from Query Generation for Task-Adaptive Retrieval [22.86406485412172]
We propose EGG, a query generator that better adapts to wide search intents expressed in the BeIR benchmark.
Our method outperforms baselines and existing models on four tasks with underexplored intents, while utilizing a query generator 47 times smaller than the previous state-of-the-art.
arXiv Detail & Related papers (2024-09-25T02:53:27Z)
- QueryBuilder: Human-in-the-Loop Query Development for Information Retrieval [12.543590253664492]
We present a novel, interactive system called QueryBuilder.
It allows a novice, English-speaking user to create queries with a small amount of effort.
It rapidly develops cross-lingual information retrieval queries corresponding to the user's information needs.
arXiv Detail & Related papers (2024-09-07T00:46:58Z)
- Understanding the User: An Intent-Based Ranking Dataset [2.6145315573431214]
This paper proposes an approach to augmenting such ranking datasets with informative query descriptions.
Our methodology involves utilizing state-of-the-art LLMs to analyze and comprehend the implicit intent within individual queries.
By extracting key semantic elements, we construct detailed and contextually rich descriptions for these queries.
arXiv Detail & Related papers (2024-08-30T08:40:59Z)
- BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval [54.54576644403115]
Many complex real-world queries require in-depth reasoning to identify relevant documents.
We introduce BRIGHT, the first text retrieval benchmark that requires intensive reasoning to retrieve relevant documents.
Our dataset consists of 1,384 real-world queries spanning diverse domains, such as economics, psychology, mathematics, and coding.
arXiv Detail & Related papers (2024-07-16T17:58:27Z)
- Toward Conversational Agents with Context and Time Sensitive Long-term Memory [8.085414868117917]
Until recently, most work on RAG has focused on information retrieval from large databases of texts, like Wikipedia.
We argue that effective retrieval from long-form conversational data faces two unique problems compared to static database retrieval.
We generate a new dataset of ambiguous and time-based questions that build upon a recent dataset of long-form, simulated conversations.
arXiv Detail & Related papers (2024-05-29T18:19:46Z)
- Improving Topic Relevance Model by Mix-structured Summarization and LLM-based Data Augmentation [16.170841777591345]
In most social search scenarios such as Dianping, modeling search relevance always faces two challenges.
We first take the query concatenated with the query-based summary, together with the query-independent document summary, as the input of the topic relevance model.
Then, we utilize the language understanding and generation abilities of a large language model (LLM) to rewrite and generate queries from the queries and documents in existing training data.
arXiv Detail & Related papers (2024-04-03T10:05:47Z)
- CAPSTONE: Curriculum Sampling for Dense Retrieval with Document Expansion [68.19934563919192]
We propose a curriculum sampling strategy that utilizes pseudo queries during training and progressively enhances the relevance between the generated query and the real query.
Experimental results on both in-domain and out-of-domain datasets demonstrate that our approach outperforms previous dense retrieval models.
arXiv Detail & Related papers (2022-12-18T15:57:46Z)
- UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph [89.98762327725112]
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question.
We propose UniKGQA, a novel approach for the multi-hop KGQA task, which unifies retrieval and reasoning in both model architecture and parameter learning.
arXiv Detail & Related papers (2022-12-02T04:08:09Z)
- Graph Enhanced BERT for Query Understanding [55.90334539898102]
Query understanding plays a key role in exploring users' search intents and helping users locate the information they most desire.
In recent years, pre-trained language models (PLMs) have advanced various natural language processing tasks.
We propose a novel graph-enhanced pre-training framework, GE-BERT, which can leverage both query content and the query graph.
arXiv Detail & Related papers (2022-04-03T16:50:30Z)
- Query Resolution for Conversational Search with Limited Supervision [63.131221660019776]
We propose QuReTeC (Query Resolution by Term Classification), a neural query resolution model based on bidirectional transformers.
We show that QuReTeC outperforms state-of-the-art models, and furthermore, that our distant supervision method can be used to substantially reduce the amount of human-curated data required to train QuReTeC.
arXiv Detail & Related papers (2020-05-24T11:37:22Z)
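As a companion to the QuReTeC entry above, here is a minimal, hedged sketch of query resolution by term classification: a bidirectional-transformer token classifier marks terms from the conversation history that should be copied into the current-turn query. The checkpoint, the binary label convention, and the helper function below are illustrative assumptions, not the authors' code; the classification head here is untrained and would in practice be fine-tuned, e.g. with the distant supervision mentioned in the abstract.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Placeholder checkpoint: QuReTeC fine-tunes its own BERT-based term
# classifier; here the token-classification head is randomly initialized,
# so the output is only structurally meaningful until the model is trained.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForTokenClassification.from_pretrained("bert-base-uncased", num_labels=2)

def resolve_query(history: str, current_query: str) -> str:
    """Append history terms classified as relevant (label 1) to the query."""
    enc = tokenizer(history, current_query, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits                        # (1, seq_len, 2)
    labels = logits.argmax(dim=-1).squeeze(0).tolist()
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"].squeeze(0).tolist())
    segments = enc["token_type_ids"].squeeze(0).tolist()    # 0 = history segment

    # Keep history-segment tokens the classifier marked as relevant.
    picked = [tok for tok, seg, lab in zip(tokens, segments, labels)
              if seg == 0 and lab == 1 and tok not in ("[CLS]", "[SEP]")]
    expansion = tokenizer.convert_tokens_to_string(picked)
    return f"{current_query} {expansion}".strip()

print(resolve_query("tell me about the Q2ID task for query understanding",
                    "how is it evaluated"))
```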