QUEST: A Retrieval Dataset of Entity-Seeking Queries with Implicit Set
Operations
- URL: http://arxiv.org/abs/2305.11694v2
- Date: Wed, 31 May 2023 05:11:21 GMT
- Title: QUEST: A Retrieval Dataset of Entity-Seeking Queries with Implicit Set
Operations
- Authors: Chaitanya Malaviya, Peter Shaw, Ming-Wei Chang, Kenton Lee, Kristina
Toutanova
- Abstract summary: QUEST is a dataset of 3357 natural language queries with implicit set operations.
The dataset challenges models to match multiple constraints mentioned in queries with corresponding evidence in documents.
We analyze several modern retrieval systems, finding that they often struggle on such queries.
- Score: 36.70770411188946
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Formulating selective information needs results in queries that implicitly
specify set operations, such as intersection, union, and difference. For
instance, one might search for "shorebirds that are not sandpipers" or
"science-fiction films shot in England". To study the ability of retrieval
systems to meet such information needs, we construct QUEST, a dataset of 3357
natural language queries with implicit set operations, that map to a set of
entities corresponding to Wikipedia documents. The dataset challenges models to
match multiple constraints mentioned in queries with corresponding evidence in
documents and correctly perform various set operations. The dataset is
constructed semi-automatically using Wikipedia category names. Queries are
automatically composed from individual categories, then paraphrased and further
validated for naturalness and fluency by crowdworkers. Crowdworkers also assess
the relevance of entities based on their documents and highlight attribution of
query constraints to spans of document text. We analyze several modern
retrieval systems, finding that they often struggle on such queries. Queries
involving negation and conjunction are particularly challenging and systems are
further challenged with combinations of these operations.
Related papers
- RoundTable: Leveraging Dynamic Schema and Contextual Autocomplete for Enhanced Query Precision in Tabular Question Answering [11.214912072391108]
Real-world datasets often feature a vast array of attributes and complex values.
Traditional methods cannot fully relay the datasets size and complexity to the Large Language Models.
We propose a novel framework that leverages Full-Text Search (FTS) on the input table.
arXiv Detail & Related papers (2024-08-22T13:13:06Z) - Aligning Query Representation with Rewritten Query and Relevance Judgments in Conversational Search [32.35446999027349]
We leverage both rewritten queries and relevance judgments in the conversational search data to train a better query representation model.
The proposed model -- Query Representation Alignment Conversational Retriever, QRACDR, is tested on eight datasets.
arXiv Detail & Related papers (2024-07-29T17:14:36Z) - UQE: A Query Engine for Unstructured Databases [71.49289088592842]
We investigate the potential of Large Language Models to enable unstructured data analytics.
We propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections.
arXiv Detail & Related papers (2024-06-23T06:58:55Z) - PDFTriage: Question Answering over Long, Structured Documents [60.96667912964659]
Representing structured documents as plain text is incongruous with the user's mental model of these documents with rich structure.
We propose PDFTriage that enables models to retrieve the context based on either structure or content.
Our benchmark dataset consists of 900+ human-generated questions over 80 structured documents.
arXiv Detail & Related papers (2023-09-16T04:29:05Z) - Automated Query Generation for Evidence Collection from Web Search
Engines [2.642698101441705]
It is widely accepted that so-called facts can be checked by searching for information on the Internet.
This process requires a fact-checker to formulate a search query based on the fact and to present it to a search engine.
We ask the question as to whether it is possible to automate the first step, that of query generation.
arXiv Detail & Related papers (2023-03-15T14:32:00Z) - Searching for Better Database Queries in the Outputs of Semantic Parsers [16.221439565760058]
In this paper, we consider the case when, at the test time, the system has access to an external criterion that evaluates the generated queries.
The criterion can vary from checking that a query executes without errors to verifying the query on a set of tests.
We apply our approach to the state-of-the-art semantics and report that it allows us to find many queries passing all the tests on different datasets.
arXiv Detail & Related papers (2022-10-13T17:20:45Z) - Parallel Instance Query Network for Named Entity Recognition [73.30174490672647]
Named entity recognition (NER) is a fundamental task in natural language processing.
Recent works treat named entity recognition as a reading comprehension task, constructing type-specific queries manually to extract entities.
We propose Parallel Instance Query Network (PIQN), which sets up global and learnable instance queries to extract entities in a parallel manner.
arXiv Detail & Related papers (2022-03-20T13:01:25Z) - Text Summarization with Latent Queries [60.468323530248945]
We introduce LaQSum, the first unified text summarization system that learns Latent Queries from documents for abstractive summarization with any existing query forms.
Under a deep generative framework, our system jointly optimize a latent query model and a conditional language model, allowing users to plug-and-play queries of any type at test time.
Our system robustly outperforms strong comparison systems across summarization benchmarks with different query types, document settings, and target domains.
arXiv Detail & Related papers (2021-05-31T21:14:58Z) - CSFCube -- A Test Collection of Computer Science Research Articles for
Faceted Query by Example [43.01717754418893]
We introduce the task of faceted Query by Example.
Users can also specify a finer grained aspect in addition to the input query document.
We envision models which are able to retrieve scientific papers analogous to a query scientific paper.
arXiv Detail & Related papers (2021-03-24T01:02:12Z) - Query Understanding via Intent Description Generation [75.64800976586771]
We propose a novel Query-to-Intent-Description (Q2ID) task for query understanding.
Unlike existing ranking tasks which leverage the query and its description to compute the relevance of documents, Q2ID is a reverse task which aims to generate a natural language intent description.
We demonstrate the effectiveness of our model by comparing with several state-of-the-art generation models on the Q2ID task.
arXiv Detail & Related papers (2020-08-25T08:56:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.