Related papers: Semantic Search and Recommendation Algorithm

Related papers

Dense Passage Retrieval in Conversational Search [0.0]
We present a new method called dense retrieval, which uses a dual-encoder to create contextual embeddings that can be indexed and clustered efficiently at run-time. We propose an end-to-end conversational search system called GPT2QR+DPR, which incorporates various query reformulation strategies to improve retrieval accuracy. Our work contributes to the growing body of research on neural-based retrieval methods in conversational search, and highlights the potential of dense retrieval in improving retrieval accuracy in conversational search systems.
arXiv Detail & Related papers (2025-03-21T19:39:31Z)
Investigating the Scalability of Approximate Sparse Retrieval Algorithms to Massive Datasets [8.1990111961557]
We investigate the behavior of state-of-the-art retrieval algorithms on massive datasets. We compare and contrast the recently-proposed Seismic and graph-based solutions adapted from dense retrieval. We extensively evaluate Splade embeddings of 138M passages from MsMarco-v2 and report indexing time and other efficiency and effectiveness metrics.
arXiv Detail & Related papers (2025-01-20T17:59:21Z)
VectorSearch: Enhancing Document Retrieval with Semantic Embeddings and Optimized Search [1.0411820336052784]
We propose VectorSearch, which leverages advanced algorithms, embeddings, and indexing techniques for refined retrieval. By utilizing innovative multi-vector search operations and encoding searches with advanced language models, our approach significantly improves retrieval accuracy. Experiments on real-world datasets show that VectorSearch outperforms baseline metrics.
arXiv Detail & Related papers (2024-09-25T21:58:08Z)
Efficient Line Search Method Based on Regression and Uncertainty Quantification [7.724860428430271]
Unconstrained optimization problems are typically solved using iterative methods to determine optimal step lengths. This paper introduces a novel line search approach using Bayesian optimization. It demonstrates superior performance compared to existing state-of-the-art methods, solving more problems to optimality with equivalent resource usage.
arXiv Detail & Related papers (2024-05-17T16:35:20Z)
Semi-Parametric Retrieval via Binary Token Index [71.78109794895065]
Semi-parametric Vocabulary Disentangled Retrieval (SVDR) is a novel semi-parametric retrieval framework. It supports two types of indexes: an embedding-based index for high effectiveness, akin to existing neural retrieval methods; and a binary token index that allows for quick and cost-effective setup, resembling traditional term-based retrieval. It achieves a 3% higher top-1 retrieval accuracy compared to the dense retriever DPR when using an embedding-based index and a 9% higher top-1 accuracy compared to BM25 when using a binary token index.
arXiv Detail & Related papers (2024-05-03T08:34:13Z)
Efficient Architecture Search via Bi-level Data Pruning [70.29970746807882]
This work pioneers an exploration into the critical role of dataset characteristics for DARTS bi-level optimization. We introduce a new progressive data pruning strategy that utilizes supernet prediction dynamics as the metric. Comprehensive evaluations on the NAS-Bench-201 search space, DARTS search space, and MobileNet-like search space validate that BDP reduces search costs by over 50%.
arXiv Detail & Related papers (2023-12-21T02:48:44Z)
Lexically-Accelerated Dense Retrieval [29.327878974130055]
'LADR' (Lexically-Accelerated Dense Retrieval) is a simple-yet-effective approach that improves the efficiency of existing dense retrieval models. LADR consistently achieves both precision and recall that are on par with an exhaustive search on standard benchmarks.
arXiv Detail & Related papers (2023-07-31T15:44:26Z)
Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting. We introduce two regret metrics by minimizing the population loss that are more suitable in active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable, and general framework that can directly search on the tasks of interest. Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree. We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z)
Compactness Score: A Fast Filter Method for Unsupervised Feature Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named Compactness Score (CSUFS), to select desired features. Our proposed algorithm seems to be more accurate and efficient than existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z)
FDDH: Fast Discriminative Discrete Hashing for Large-Scale Cross-Modal Retrieval [41.125141897096874]
Cross-modal hashing is favored for its effectiveness and efficiency. Most existing methods do not sufficiently exploit the discriminative power of semantic information when learning the hash codes. We propose the Fast Discriminative Discrete Hashing (FDDH) approach for large-scale cross-modal retrieval.
arXiv Detail & Related papers (2021-05-15T03:53:48Z)
A Genetic Algorithm for Obtaining Memory Constrained Near-Perfect Hashing [0.0]
We present an approach based on hash tables that focuses on both minimizing the number of comparisons performed during the search and minimizing the total collection size. The paper's results show that near-perfect hashing is faster than binary search, yet uses less memory than perfect hashing.
arXiv Detail & Related papers (2020-07-16T12:57:15Z)
Progressively Pretrained Dense Corpus Index for Open-Domain Question Answering [87.32442219333046]
We propose a simple and resource-efficient method to pretrain the paragraph encoder. Our method outperforms an existing dense retrieval method that uses 7 times more computational resources for pretraining.
arXiv Detail & Related papers (2020-04-30T18:09:50Z)
GridMask Data Augmentation [76.79300104795966]
We propose a novel data augmentation method, GridMask, in this paper. It utilizes information removal to achieve state-of-the-art results in a variety of computer vision tasks.
arXiv Detail & Related papers (2020-01-13T07:27:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.