Semantic Search and Recommendation Algorithm
- URL: http://arxiv.org/abs/2412.06649v1
- Date: Mon, 09 Dec 2024 16:43:23 GMT
- Title: Semantic Search and Recommendation Algorithm
- Authors: Aryan Duhan, Aryan Singhal, Shourya Sharma, Neeraj, Arti MK,
- Abstract summary: This paper introduces a new semantic search algorithm that uses Word2Vec and Annoy Index to improve the efficiency of information retrieval from large datasets.
Testing on datasets up to 100GB demonstrates the method's effectiveness in processing vast amounts of data while maintaining high precision and performance.
- Score: 0.5242869847419834
- License:
- Abstract: This paper introduces a new semantic search algorithm that uses Word2Vec and Annoy Index to improve the efficiency of information retrieval from large datasets. The proposed approach addresses the limitations of traditional search methods by offering enhanced speed, accuracy, and scalability. Testing on datasets up to 100GB demonstrates the method's effectiveness in processing vast amounts of data while maintaining high precision and performance.
Related papers
- Investigating the Scalability of Approximate Sparse Retrieval Algorithms to Massive Datasets [8.1990111961557]
We investigate the behavior of state-of-the-art retrieval algorithms on massive datasets.
We compare and contrast the recently-proposed Seismic and graph-based solutions adapted from dense retrieval.
We extensively evaluate Splade embeddings of 138M passages from MsMarco-v2 and report indexing time and other efficiency and effectiveness metrics.
arXiv Detail & Related papers (2025-01-20T17:59:21Z) - VectorSearch: Enhancing Document Retrieval with Semantic Embeddings and
Optimized Search [1.0411820336052784]
We propose VectorSearch, which leverages advanced algorithms, embeddings, and indexing techniques for refined retrieval.
By utilizing innovative multi-vector search operations and encoding searches with advanced language models, our approach significantly improves retrieval accuracy.
Experiments on real-world datasets show that VectorSearch outperforms baseline metrics.
arXiv Detail & Related papers (2024-09-25T21:58:08Z) - Efficient Line Search Method Based on Regression and Uncertainty Quantification [7.724860428430271]
Unconstrained optimization problems are typically solved using iterative methods to determine optimal step lengths.
This paper introduces a novel line search approach using Bayesian optimization.
It demonstrates superior performance compared to existing state-of-the-art methods, solving more problems to optimality with equivalent resource usage.
arXiv Detail & Related papers (2024-05-17T16:35:20Z) - Semi-Parametric Retrieval via Binary Token Index [71.78109794895065]
Semi-parametric Vocabulary Disentangled Retrieval (SVDR) is a novel semi-parametric retrieval framework.
It supports two types of indexes: an embedding-based index for high effectiveness, akin to existing neural retrieval methods; and a binary token index that allows for quick and cost-effective setup, resembling traditional term-based retrieval.
It achieves a 3% higher top-1 retrieval accuracy compared to the dense retriever DPR when using an embedding-based index and a 9% higher top-1 accuracy compared to BM25 when using a binary token index.
arXiv Detail & Related papers (2024-05-03T08:34:13Z) - Efficient Architecture Search via Bi-level Data Pruning [70.29970746807882]
This work pioneers an exploration into the critical role of dataset characteristics for DARTS bi-level optimization.
We introduce a new progressive data pruning strategy that utilizes supernet prediction dynamics as the metric.
Comprehensive evaluations on the NAS-Bench-201 search space, DARTS search space, and MobileNet-like search space validate that BDP reduces search costs by over 50%.
arXiv Detail & Related papers (2023-12-21T02:48:44Z) - Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient scalable and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent- form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z) - Compactness Score: A Fast Filter Method for Unsupervised Feature
Selection [66.84571085643928]
We propose a fast unsupervised feature selection method, named as, Compactness Score (CSUFS) to select desired features.
Our proposed algorithm seems to be more accurate and efficient compared with existing algorithms.
arXiv Detail & Related papers (2022-01-31T13:01:37Z) - FDDH: Fast Discriminative Discrete Hashing for Large-Scale Cross-Modal
Retrieval [41.125141897096874]
Cross-modal hashing is favored for its effectiveness and efficiency.
Most existing methods do not sufficiently exploit the discriminative power of semantic information when learning the hash codes.
We propose Fast Discriminative Discrete Hashing (FDDH) approach for large-scale cross-modal retrieval.
arXiv Detail & Related papers (2021-05-15T03:53:48Z) - A Genetic Algorithm for Obtaining Memory Constrained Near-Perfect
Hashing [0.0]
We present an approach based on hash tables that focuses on both minimizing the number of comparisons performed during the search and minimizing the total collection size.
The paper results show that near-perfect hashing is faster than binary search, yet uses less memory than perfect hashing.
arXiv Detail & Related papers (2020-07-16T12:57:15Z) - Progressively Pretrained Dense Corpus Index for Open-Domain Question
Answering [87.32442219333046]
We propose a simple and resource-efficient method to pretrain the paragraph encoder.
Our method outperforms an existing dense retrieval method that uses 7 times more computational resources for pretraining.
arXiv Detail & Related papers (2020-04-30T18:09:50Z) - GridMask Data Augmentation [76.79300104795966]
We propose a novel data augmentation method GridMask' in this paper.
It utilizes information removal to achieve state-of-the-art results in a variety of computer vision tasks.
arXiv Detail & Related papers (2020-01-13T07:27:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.