MICO: Selective Search with Mutual Information Co-training
- URL: http://arxiv.org/abs/2209.04378v1
- Date: Fri, 9 Sep 2022 16:26:52 GMT
- Title: MICO: Selective Search with Mutual Information Co-training
- Authors: Zhanyu Wang, Xiao Zhang, Hyokun Yun, Choon Hui Teo, Trishul Chilimbi
- Abstract summary: MICO is a Mutual Information CO-training framework for selective search.
After training, MICO not only clusters the documents but also routes unseen queries to the relevant clusters for efficient retrieval.
- Score: 14.456028769565386
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In contrast to traditional exhaustive search, selective search first clusters
documents into several groups before all the documents are searched
exhaustively by a query, to limit the search executed within one group or only
a few groups. Selective search is designed to reduce the latency and
computation in modern large-scale search systems. In this study, we propose
MICO, a Mutual Information CO-training framework for selective search with
minimal supervision using the search logs. After training, MICO does not only
cluster the documents, but also routes unseen queries to the relevant clusters
for efficient retrieval. In our empirical experiments, MICO significantly
improves the performance on multiple metrics of selective search and
outperforms a number of existing competitive baselines.
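The general selective-search pattern the abstract describes (cluster the documents, route a query to a few clusters, search only inside them) can be sketched as follows. This is a minimal illustration, not MICO itself: the routing here is plain nearest-centroid cosine similarity rather than the mutual-information co-training the paper proposes, and the toy vectors, cluster assignments, and the names `cosine`, `centroid`, and `selective_search` are all assumptions made for the example.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def selective_search(query, centroids, clusters, docs, n_clusters=1, top_k=2):
    """Route the query to the closest clusters, then rank only their documents."""
    # Routing step: rank clusters by similarity between query and centroid.
    ranked = sorted(
        range(len(centroids)),
        key=lambda c: cosine(query, centroids[c]),
        reverse=True,
    )
    chosen = ranked[:n_clusters]
    # Retrieval step: score only the documents in the chosen clusters.
    candidates = [doc_id for c in chosen for doc_id in clusters[c]]
    candidates.sort(key=lambda d: cosine(query, docs[d]), reverse=True)
    return chosen, candidates[:top_k]


# Toy corpus (hypothetical): four documents as 2-d vectors, pre-assigned to two clusters.
docs = {
    "d1": [1.0, 0.1],
    "d2": [0.9, 0.2],
    "d3": [0.1, 1.0],
    "d4": [0.2, 0.8],
}
clusters = [["d1", "d2"], ["d3", "d4"]]
centroids = [centroid([docs[d] for d in members]) for members in clusters]

chosen, hits = selective_search([1.0, 0.0], centroids, clusters, docs)
print(chosen, hits)
```

With `n_clusters=1`, only half of the toy corpus is scored for this query, which is the latency and computation saving that selective search aims for; raising `n_clusters` trades some of that saving for recall.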
Related papers
- ClusterTalk: Corpus Exploration Framework using Multi-Dimensional Exploratory Search [3.4123736336071864]
ClusterTalk is a framework for corpus exploration using multi-dimensional exploratory search.
Our system integrates document clustering with faceted search, allowing users to interactively refine their exploration and ask corpus and document-level queries.
arXiv Detail & Related papers (2024-12-19T05:11:16Z)
- Maybe you are looking for CroQS: Cross-modal Query Suggestion for Text-to-Image Retrieval [15.757140563856675]
This work introduces a novel task that focuses on suggesting minimal textual modifications needed to explore visually consistent subsets of the collection.
To facilitate the evaluation and development of methods, we present a tailored benchmark named CroQS.
Baseline methods from related fields, such as image captioning and content summarization, are adapted for this task to provide reference performance scores.
arXiv Detail & Related papers (2024-12-18T13:24:09Z)
- PseudoSeer: a Search Engine for Pseudocode [18.726136894285403]
A novel pseudocode search engine is designed to facilitate efficient retrieval and search of academic papers containing pseudocode.
By leveraging snippets, the system enables users to search across various facets of a paper, such as the title, abstract, author information, and code snippets.
A weighted BM25-based ranking algorithm is used by the search engine, and factors considered when prioritizing search results are described.
arXiv Detail & Related papers (2024-11-19T16:58:03Z)
- Query-oriented Data Augmentation for Session Search [71.84678750612754]
We propose query-oriented data augmentation to enrich search logs and empower the modeling.
We generate supplemental training pairs by altering the most important part of a search context.
We develop several strategies to alter the current query, resulting in new training data with varying degrees of difficulty.
arXiv Detail & Related papers (2024-07-04T08:08:33Z)
- ExcluIR: Exclusionary Neural Information Retrieval [74.08276741093317]
We present ExcluIR, a set of resources for exclusionary retrieval.
The evaluation benchmark includes 3,452 high-quality exclusionary queries.
The training set contains 70,293 exclusionary queries, each paired with a positive document and a negative document.
arXiv Detail & Related papers (2024-04-26T09:43:40Z)
- Generative Retrieval as Multi-Vector Dense Retrieval [71.75503049199897]
Generative retrieval generates identifiers of relevant documents in an end-to-end manner.
Prior work has demonstrated that generative retrieval with atomic identifiers is equivalent to single-vector dense retrieval.
We show that generative retrieval and multi-vector dense retrieval share the same framework for measuring the relevance to a query of a document.
arXiv Detail & Related papers (2024-03-31T13:29:43Z)
- Survival of the Most Influential Prompts: Efficient Black-Box Prompt Search via Clustering and Pruning [77.61565726647784]
We propose a simple black-box search method that first clusters and prunes the search space to focus exclusively on influential prompt tokens.
Our findings underscore the critical role of search space design and optimization in enhancing both the usefulness and the efficiency of black-box prompt-based learning.
arXiv Detail & Related papers (2023-10-19T14:25:06Z)
- Deep Reinforcement Agent for Efficient Instant Search [14.086339486783018]
We propose to address the load issue by identifying tokens that are semantically more salient towards retrieving relevant documents.
We train a reinforcement agent that interacts directly with the search engine and learns to predict the word's importance.
A novel evaluation framework is presented to study the trade-off between the number of triggered searches and the system's performance.
arXiv Detail & Related papers (2022-03-17T22:47:15Z)
- Exploring Complicated Search Spaces with Interleaving-Free Sampling [127.07551427957362]
In this paper, we build the search algorithm upon a complicated search space with long-distance connections.
We present a simple yet effective algorithm named IF-NAS, where we perform a periodic sampling strategy to construct different sub-networks.
In the proposed search space, IF-NAS outperforms both random sampling and previous weight-sharing search algorithms by a significant margin.
arXiv Detail & Related papers (2021-12-05T06:42:48Z)
- Exposing Query Identification for Search Transparency [69.06545074617685]
We explore the feasibility of approximate exposing query identification (EQI) as a retrieval task by reversing the role of queries and documents in two classes of search systems.
We derive an evaluation metric to measure the quality of a ranking of exposing queries, and conduct an empirical analysis focusing on various practical aspects of approximate EQI.
arXiv Detail & Related papers (2021-10-14T20:19:27Z)
- Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples [10.993342896547691]
We study the behavior of several black-box search algorithms used for generating adversarial examples for natural language processing (NLP) tasks.
We perform a fine-grained analysis of three elements relevant to search: search algorithm, search space, and search budget.
arXiv Detail & Related papers (2020-09-09T17:04:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.