Answering Compositional Queries with Set-Theoretic Embeddings
- URL: http://arxiv.org/abs/2306.04133v1
- Date: Wed, 7 Jun 2023 04:04:36 GMT
- Title: Answering Compositional Queries with Set-Theoretic Embeddings
- Authors: Shib Dasgupta, Andrew McCallum, Steffen Rendle, Li Zhang
- Abstract summary: Box embeddings are a region-based representation that can be thought of as learnable Venn diagrams.
We present experiments and analysis providing insights into the behavior of both representations.
We find that, while vector and box embeddings are equally suited to single attribute queries, for compositional queries box embeddings provide substantial advantages.
- Score: 43.926610595182126
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The need to compactly and robustly represent item-attribute relations arises
in many important tasks, such as faceted browsing and recommendation systems. A
popular machine learning approach for this task denotes that an item has an
attribute by a high dot-product between vectors for the item and attribute -- a
representation that is not only dense, but also tends to correct noisy and
incomplete data. While this method works well for queries retrieving items by a
single attribute (such as "movies that are comedies"), we find that vector
embeddings do not so accurately support compositional queries (such as "movies
that are comedies and British but not romances"). To address these set-theoretic
compositions, this paper proposes to replace vectors with box embeddings, a
region-based representation that can be thought of as learnable Venn diagrams.
We introduce a new benchmark dataset for compositional queries, and present
experiments and analysis providing insights into the behavior of both. We find
that, while vector and box embeddings are equally suited to single attribute
queries, for compositional queries box embeddings provide substantial
advantages over vectors, particularly at the moderate and larger retrieval set
sizes that are most useful for users' search and browsing.
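To make the contrast concrete, the following is a minimal, self-contained sketch of how axis-aligned box embeddings support set-theoretic composition: attribute boxes are intersected coordinate-wise, and a containment-style volume ratio scores an item against a query such as "comedies AND British AND NOT romances". The coordinates, the containment score, and the handling of negation below are illustrative assumptions only; the paper trains smoothed (probabilistic) boxes end to end rather than the hard boxes used here.

```python
import numpy as np

# A box is an axis-aligned hyper-rectangle: per-dimension (min, max) coordinates.
# This sketch uses hard boxes purely to illustrate how set-theoretic queries
# compose; all coordinates and scoring choices are illustrative, not the paper's.

def box_volume(lo, hi):
    """Volume of a box; zero if it is empty in any dimension."""
    side = np.clip(hi - lo, a_min=0.0, a_max=None)
    return float(np.prod(side))

def box_intersection(lo_a, hi_a, lo_b, hi_b):
    """Intersection of two boxes is again a box (possibly empty)."""
    return np.maximum(lo_a, lo_b), np.minimum(hi_a, hi_b)

def contains_score(item_lo, item_hi, query_lo, query_hi):
    """Fraction of the item box that falls inside the query box,
    a simple stand-in for P(query | item)."""
    v_item = box_volume(item_lo, item_hi)
    if v_item == 0.0:
        return 0.0
    ilo, ihi = box_intersection(item_lo, item_hi, query_lo, query_hi)
    return box_volume(ilo, ihi) / v_item

# Toy 2-d attribute and item boxes (hypothetical coordinates).
comedy  = (np.array([0.0, 0.0]), np.array([0.6, 0.8]))
british = (np.array([0.2, 0.1]), np.array([0.9, 0.7]))
romance = (np.array([0.5, 0.4]), np.array([1.0, 1.0]))
movie   = (np.array([0.25, 0.15]), np.array([0.45, 0.55]))  # one candidate item

# "comedy AND British": intersect the attribute boxes, then score the item.
q_lo, q_hi = box_intersection(*comedy, *british)
and_score = contains_score(*movie, q_lo, q_hi)

# "... AND NOT romance": one simple heuristic is to down-weight by the overlap
# with the negated attribute (the paper's treatment of negation may differ).
not_score = and_score * (1.0 - contains_score(*movie, *romance))

print(f"comedy AND British: {and_score:.3f}")
print(f"comedy AND British AND NOT romance: {not_score:.3f}")
```

Because the intersection of two boxes is again a box, conjunctive queries compose in closed form; this is the structural property the abstract credits for the advantage of box embeddings over dot-product vector scoring on compositional queries.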
Related papers
- Generative Retrieval as Multi-Vector Dense Retrieval [71.75503049199897]
Generative retrieval generates identifiers of relevant documents in an end-to-end manner.
Prior work has demonstrated that generative retrieval with atomic identifiers is equivalent to single-vector dense retrieval.
We show that generative retrieval and multi-vector dense retrieval share the same framework for measuring the relevance of a document to a query.
arXiv Detail & Related papers (2024-03-31T13:29:43Z) - Multi-Intent Attribute-Aware Text Matching in Searching [21.92265431319774]
We propose a multi-intent attribute-aware matching model (MIM), which consists of three main components: attribute-aware encoder, multi-intent modeling, and intent-aware matching.
In MIM, the text and attributes are weighted and processed through a scaled attention mechanism according to the attributes' importance.
In intent-aware matching, the intents are evaluated by a self-supervised masking task and then incorporated to produce the final matching result.
arXiv Detail & Related papers (2024-02-12T16:54:22Z) - Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g., document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
arXiv Detail & Related papers (2023-12-11T18:57:35Z) - Improving Items and Contexts Understanding with Descriptive Graph for
Conversational Recommendation [4.640835690336652]
State-of-the-art methods on conversational recommender systems (CRS) leverage external knowledge to enhance both items' and contextual words' representations.
We propose a new CRS framework KLEVER, which jointly models items and their associated contextual words in the same semantic space.
Experiments on a benchmark CRS dataset demonstrate that KLEVER achieves superior performance, especially when information from users' responses is lacking.
arXiv Detail & Related papers (2023-04-11T21:21:46Z) - Same or Different? Diff-Vectors for Authorship Analysis [78.83284164605473]
In "classic" authorship analysis, a feature vector represents a document, the value of a feature represents (an increasing function of) the relative frequency of the feature in the document, and the class label represents the author of the document.
Our experiments tackle same-author verification, authorship verification, and closed-set authorship attribution; while DVs are naturally geared toward the first task, we also provide two novel methods for solving the second and third.
arXiv Detail & Related papers (2023-01-24T08:48:12Z) - What Are You Token About? Dense Retrieval as Distributions Over the
Vocabulary [68.77983831618685]
We propose to interpret the vector representations produced by dual encoders by projecting them into the model's vocabulary space.
We show that the resulting projections contain rich semantic information, and draw connection between them and sparse retrieval.
arXiv Detail & Related papers (2022-12-20T16:03:25Z) - ReSel: N-ary Relation Extraction from Scientific Text and Tables by
Learning to Retrieve and Select [53.071352033539526]
We study the problem of extracting N-ary relations from scientific articles.
Our proposed method ReSel decomposes this task into a two-stage procedure.
Our experiments on three scientific information extraction datasets show that ReSel outperforms state-of-the-art baselines significantly.
arXiv Detail & Related papers (2022-10-26T02:28:02Z) - Aspect-Oriented Summarization through Query-Focused Extraction [23.62412515574206]
Real users' needs often align more closely with aspects (broad topics in a dataset that the user is interested in) than with specific queries.
We benchmark extractive query-focused training schemes, and propose a contrastive augmentation approach to train the model.
We evaluate on two aspect-oriented datasets and find this approach yields focused summaries, better than those from a generic summarization system.
arXiv Detail & Related papers (2021-10-15T18:06:21Z) - CSFCube -- A Test Collection of Computer Science Research Articles for
Faceted Query by Example [43.01717754418893]
We introduce the task of faceted Query by Example.
Users can also specify a finer-grained aspect in addition to the input query document.
We envision models that are able to retrieve scientific papers analogous to a query scientific paper.
arXiv Detail & Related papers (2021-03-24T01:02:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.