Answering Compositional Queries with Set-Theoretic Embeddings
- URL: http://arxiv.org/abs/2306.04133v1
- Date: Wed, 7 Jun 2023 04:04:36 GMT
- Title: Answering Compositional Queries with Set-Theoretic Embeddings
- Authors: Shib Dasgupta, Andrew McCallum, Steffen Rendle, Li Zhang
- Abstract summary: Box embeddings are a region-based representation that can be thought of as learnable Venn diagrams.
We present experiments and analysis providing insights into the behavior of both representations.
We find that, while vector and box embeddings are equally suited to single attribute queries, for compositional queries box embeddings provide substantial advantages.
- Score: 43.926610595182126
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The need to compactly and robustly represent item-attribute relations arises
in many important tasks, such as faceted browsing and recommendation systems. A
popular machine learning approach for this task denotes that an item has an
attribute by a high dot-product between vectors for the item and attribute -- a
representation that is not only dense, but also tends to correct noisy and
incomplete data. While this method works well for queries retrieving items by a
single attribute (such as "movies that are comedies"), we find that vector
embeddings do not so accurately support compositional queries (such as "movies
that are comedies and British but not romances"). To address these set-theoretic
compositions, this paper proposes to replace vectors with box embeddings, a
region-based representation that can be thought of as learnable Venn diagrams.
We introduce a new benchmark dataset for compositional queries, and present
experiments and analysis providing insights into the behavior of both. We find
that, while vector and box embeddings are equally suited to single attribute
queries, for compositional queries box embeddings provide substantial
advantages over vectors, particularly at the moderate and larger retrieval set
sizes that are most useful for users' search and browsing.
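To make the contrast concrete, the following is a minimal, self-contained sketch of how axis-aligned box embeddings support set-theoretic composition: attribute boxes are intersected coordinate-wise, and a containment-style volume ratio scores an item against a query such as "comedies AND British AND NOT romances". The coordinates, the containment score, and the handling of negation below are illustrative assumptions only; the paper trains smoothed (probabilistic) boxes end to end rather than the hard boxes used here.

```python
import numpy as np

# A box is an axis-aligned hyper-rectangle: per-dimension (min, max) coordinates.
# This sketch uses hard boxes purely to illustrate how set-theoretic queries
# compose; all coordinates and scoring choices are illustrative, not the paper's.

def box_volume(lo, hi):
    """Volume of a box; zero if it is empty in any dimension."""
    side = np.clip(hi - lo, a_min=0.0, a_max=None)
    return float(np.prod(side))

def box_intersection(lo_a, hi_a, lo_b, hi_b):
    """Intersection of two boxes is again a box (possibly empty)."""
    return np.maximum(lo_a, lo_b), np.minimum(hi_a, hi_b)

def contains_score(item_lo, item_hi, query_lo, query_hi):
    """Fraction of the item box that falls inside the query box,
    a simple stand-in for P(query | item)."""
    v_item = box_volume(item_lo, item_hi)
    if v_item == 0.0:
        return 0.0
    ilo, ihi = box_intersection(item_lo, item_hi, query_lo, query_hi)
    return box_volume(ilo, ihi) / v_item

# Toy 2-d attribute and item boxes (hypothetical coordinates).
comedy  = (np.array([0.0, 0.0]), np.array([0.6, 0.8]))
british = (np.array([0.2, 0.1]), np.array([0.9, 0.7]))
romance = (np.array([0.5, 0.4]), np.array([1.0, 1.0]))
movie   = (np.array([0.25, 0.15]), np.array([0.45, 0.55]))  # one candidate item

# "comedy AND British": intersect the attribute boxes, then score the item.
q_lo, q_hi = box_intersection(*comedy, *british)
and_score = contains_score(*movie, q_lo, q_hi)

# "... AND NOT romance": one simple heuristic is to down-weight by the overlap
# with the negated attribute (the paper's treatment of negation may differ).
not_score = and_score * (1.0 - contains_score(*movie, *romance))

print(f"comedy AND British: {and_score:.3f}")
print(f"comedy AND British AND NOT romance: {not_score:.3f}")
```

Because the intersection of two boxes is again a box, conjunctive queries compose in closed form; this is the structural property the abstract credits for the advantage of box embeddings over dot-product vector scoring on compositional queries.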
Related papers
- Generative Retrieval as Multi-Vector Dense Retrieval [71.75503049199897]
Generative retrieval generates identifiers of relevant documents in an end-to-end manner.
Prior work has demonstrated that generative retrieval with atomic identifiers is equivalent to single-vector dense retrieval.
We show that generative retrieval and multi-vector dense retrieval share the same framework for measuring the relevance of a document to a query.
arXiv Detail & Related papers (2024-03-31T13:29:43Z) - Multi-Intent Attribute-Aware Text Matching in Searching [21.92265431319774]
We propose a multi-intent attribute-aware matching model (MIM), which consists of three main components: attribute-aware encoder, multi-intent modeling, and intent-aware matching.
In MIM, the text and attributes are weighted and processed through a scaled attention mechanism according to the attributes' importance.
In intent-aware matching, the intents are evaluated by a self-supervised masking task and then incorporated to produce the final matching result.
arXiv Detail & Related papers (2024-02-12T16:54:22Z) - Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g., document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z) - Improving Items and Contexts Understanding with Descriptive Graph for
Conversational Recommendation [4.640835690336652]
State-of-the-art methods on conversational recommender systems (CRS) leverage external knowledge to enhance both items' and contextual words' representations.
We propose a new CRS framework KLEVER, which jointly models items and their associated contextual words in the same semantic space.
Experiments on a benchmark CRS dataset demonstrate that KLEVER achieves superior performance, especially when information from users' responses is lacking.
arXiv Detail & Related papers (2023-04-11T21:21:46Z) - Same or Different? Diff-Vectors for Authorship Analysis [78.83284164605473]
In "classic" authorship analysis, a feature vector represents a document, the value of a feature represents (an increasing function of) the relative frequency of the feature in the document, and the class label represents the author of the document.
Our experiments tackle same-author verification, authorship verification, and closed-set authorship attribution; while DVs are naturally geared toward the first task, we also provide two novel methods for solving the second and third.
arXiv Detail & Related papers (2023-01-24T08:48:12Z) - What Are You Token About? Dense Retrieval as Distributions Over the
Vocabulary [68.77983831618685]
We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space.
We show that the resulting projections contain rich semantic information, and draw connection between them and sparse retrieval.
arXiv Detail & Related papers (2022-12-20T16:03:25Z) - ReSel: N-ary Relation Extraction from Scientific Text and Tables by
Learning to Retrieve and Select [53.071352033539526]
We study the problem of extracting N-ary relations from scientific articles.
Our proposed method ReSel decomposes this task into a two-stage procedure.
Our experiments on three scientific information extraction datasets show that ReSel outperforms state-of-the-art baselines significantly.
arXiv Detail & Related papers (2022-10-26T02:28:02Z) - Aspect-Oriented Summarization through Query-Focused Extraction [23.62412515574206]
Real users' needs often align more closely with aspects (broad topics in a dataset that the user is interested in) than with specific queries.
We benchmark extractive query-focused training schemes, and propose a contrastive augmentation approach to train the model.
We evaluate on two aspect-oriented datasets and find this approach yields focused summaries, better than those from a generic summarization system.
arXiv Detail & Related papers (2021-10-15T18:06:21Z) - CSFCube -- A Test Collection of Computer Science Research Articles for
Faceted Query by Example [43.01717754418893]
We introduce the task of faceted Query by Example.
Users can also specify a finer-grained aspect in addition to the input query document.
We envision models that are able to retrieve scientific papers analogous to a query scientific paper.
arXiv Detail & Related papers (2021-03-24T01:02:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.