Leveraging Cognitive Search Patterns to Enhance Automated Natural
Language Retrieval Performance
- URL: http://arxiv.org/abs/2004.10035v1
- Date: Tue, 21 Apr 2020 14:13:33 GMT
- Title: Leveraging Cognitive Search Patterns to Enhance Automated Natural
Language Retrieval Performance
- Authors: Bhawani Selvaretnam, Mohammed Belkhatir
- Abstract summary: Cognitive reformulation patterns that mimic user search behaviour are highlighted.
We formalize the application of these patterns by considering a query conceptual representation.
A genetic algorithm-based weighting process allows placing emphasis on terms according to their conceptual role-type.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The search of information in large text repositories has been plagued by the
so-called document-query vocabulary gap, i.e. the semantic discordance between
the contents in the stored document entities on the one hand and the human
query on the other hand. Over the past two decades, a significant body of work
has advanced technical retrieval capabilities, while several studies have shed light
on issues pertaining to human search behavior. We believe that these efforts
should be conjoined, in the sense that automated retrieval systems have to
fully emulate human search behavior and thus consider the procedure according
to which users incrementally enhance their initial query. To this end,
cognitive reformulation patterns that mimic user search behaviour are
highlighted, and enhancement terms that are statistically collocated with, or
lexical-semantically related to, the original terms are adopted in the
retrieval process. We formalize the application of these patterns by
considering a query conceptual representation and introducing a set of
operations for modifying the initial query. A genetic algorithm-based weighting
process allows placing emphasis on terms according to their conceptual
role-type. An experimental evaluation on real-world datasets against relevance,
language, conceptual and knowledge-based models is conducted. We also show
that a word embedding-based instantiation of our model achieves better mean
average precision than language and relevance models.
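The genetic-algorithm weighting step can be pictured with a short sketch. This is a minimal illustration, not the paper's actual procedure: the conceptual role-type names, the GA settings, and the fitness stub (standing in for a mean-average-precision evaluation of a retrieval run) are all assumptions.
```python
# Minimal GA sketch: evolve one weight per conceptual role-type and keep the
# weight assignment that maximizes a retrieval-quality fitness.
import random

ROLE_TYPES = ["action", "object", "modifier"]  # assumed role-type inventory

def fitness(weights: dict[str, float]) -> float:
    # Hypothetical stand-in: a real system would run retrieval with the
    # candidate weights and return mean average precision.
    target = {"action": 0.7, "object": 1.0, "modifier": 0.3}
    return -sum((weights[r] - target[r]) ** 2 for r in ROLE_TYPES)

def evolve(pop_size: int = 20, generations: int = 50) -> dict[str, float]:
    rng = random.Random(0)
    pop = [{r: rng.random() for r in ROLE_TYPES} for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]              # selection: keep top half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = {r: rng.choice((a[r], b[r])) for r in ROLE_TYPES}  # crossover
            if rng.random() < 0.2:                   # occasional mutation
                child[rng.choice(ROLE_TYPES)] = rng.random()
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

print(evolve())  # weights approaching the hypothetical optimum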
Related papers
- Likelihood as a Performance Gauge for Retrieval-Augmented Generation [78.28197013467157]
We show that likelihoods serve as an effective gauge for language model performance.
We propose two methods that use question likelihood as a gauge for selecting and constructing prompts that lead to better performance; a toy sketch follows the entry.
arXiv Detail & Related papers (2024-11-12T13:14:09Z)
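A minimal sketch of the likelihood-gauge idea, assuming a small causal LM from Hugging Face transformers; the model name, prompt templates, and the averaging choice are illustrative, not the paper's setup.
```python
# Rank candidate prompts by the average log-probability the model assigns to
# the question tokens; higher likelihood serves as a proxy for performance.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def question_log_likelihood(prompt: str, question: str) -> float:
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + question, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, :prompt_len] = -100  # ignore prompt positions in the loss
    with torch.no_grad():
        loss = model(full_ids, labels=labels).loss  # mean NLL of question tokens
    return -loss.item()

prompts = [
    "Context: Darwin published in 1859.\n",  # hypothetical retrieved context
    "You are a careful assistant.\nContext: Darwin published in 1859.\n",
]
question = "Question: Who wrote On the Origin of Species?\n"
best = max(prompts, key=lambda p: question_log_likelihood(p, question))
print(best)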
- VectorSearch: Enhancing Document Retrieval with Semantic Embeddings and Optimized Search [1.0411820336052784]
We propose VectorSearch, which leverages advanced algorithms, embeddings, and indexing techniques for refined retrieval.
By utilizing innovative multi-vector search operations and encoding searches with advanced language models, our approach significantly improves retrieval accuracy.
Experiments on real-world datasets show that VectorSearch outperforms baseline methods; a minimal scoring sketch follows the entry.
arXiv Detail & Related papers (2024-09-25T21:58:08Z)
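One plausible reading of the multi-vector search operations is MaxSim-style scoring, where each document keeps several embeddings and the best-matching one determines its score. That interpretation, and the dimensions used below, are assumptions for illustration, not the paper's design.
```python
# Score a query against documents that each carry several embeddings
# (e.g., one per passage); the document score is its best-matching vector.
import numpy as np

def multi_vector_score(query_vec: np.ndarray, doc_vecs: np.ndarray) -> float:
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return float((d @ q).max())  # max cosine similarity over the doc's vectors

rng = np.random.default_rng(0)
docs = {f"doc{i}": rng.normal(size=(4, 128)) for i in range(3)}  # 4 vectors each
query = rng.normal(size=128)
ranked = sorted(docs, key=lambda d: multi_vector_score(query, docs[d]), reverse=True)
print(ranked)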
- Bootstrapped Pre-training with Dynamic Identifier Prediction for Generative Retrieval [108.9772640854136]
Generative retrieval uses differentiable search indexes to directly generate relevant document identifiers in response to a query.
Recent studies have highlighted the potential of a strong generative retrieval model, trained with carefully crafted pre-training tasks, to enhance downstream retrieval tasks via fine-tuning.
We introduce BootRet, a bootstrapped pre-training method for generative retrieval that dynamically adjusts document identifiers during pre-training to accommodate the continuing memorization of the corpus (cf. the identifier sketch below).
arXiv Detail & Related papers (2024-07-16T08:42:36Z)
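A rough sketch of the dynamic-identifier idea: semantic identifiers derived by clustering document embeddings are simply recomputed when the corpus grows between bootstrap rounds. The clustering choice and identifier format are assumptions, not BootRet's actual construction.
```python
# Re-derive cluster-based document identifiers whenever the corpus changes,
# so identifiers stay consistent with the current corpus contents.
import numpy as np
from sklearn.cluster import KMeans

def semantic_ids(embeddings: np.ndarray, n_clusters: int = 2) -> list[str]:
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(embeddings)
    seen: dict[int, int] = {}
    ids = []
    for c in labels:
        seen[c] = seen.get(c, 0) + 1
        ids.append(f"{c}-{seen[c]}")  # cluster id + rank within the cluster
    return ids

rng = np.random.default_rng(0)
round1 = rng.normal(size=(6, 16))                       # initial corpus embeddings
ids_round1 = semantic_ids(round1)
round2 = np.vstack([round1, rng.normal(size=(2, 16))])  # corpus grows
ids_round2 = semantic_ids(round2)                       # identifiers recomputed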
- Query-oriented Data Augmentation for Session Search [71.84678750612754]
We propose query-oriented data augmentation to enrich search logs and strengthen session-search modeling.
We generate supplemental training pairs by altering the most important part of a search context.
We develop several strategies to alter the current query, resulting in new training data with varying degrees of difficulty, as sketched below.
arXiv Detail & Related papers (2024-07-04T08:08:33Z)
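The alteration strategies can be pictured with toy query perturbations; the three strategies below (term deletion, reordering, masking) are illustrative stand-ins, not the paper's actual strategies.
```python
# Build supplemental training variants of the current query at different
# difficulty levels by perturbing its terms.
import random

def augment_query(query: str, rng: random.Random) -> list[tuple[str, str]]:
    terms = query.split()
    variants = []
    if len(terms) > 1:  # drop one term: a harder, partially specified variant
        drop = rng.randrange(len(terms))
        variants.append(("deletion",
                         " ".join(t for i, t in enumerate(terms) if i != drop)))
    shuffled = terms[:]
    rng.shuffle(shuffled)  # reorder terms: a mild perturbation
    variants.append(("reordering", " ".join(shuffled)))
    variants.append(("masking", " ".join(["[MASK]"] + terms[1:])))  # hide lead term
    return variants

for strategy, variant in augment_query("query oriented data augmentation",
                                       random.Random(0)):
    print(strategy, "->", variant)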
- Enhancing Cloud-Based Large Language Model Processing with Elasticsearch and Transformer Models [17.09116903102371]
Large Language Models (LLMs) are a class of generative AI models built using the Transformer network.
LLMs are capable of leveraging vast datasets to identify, summarize, translate, predict, and generate language.
Semantic vector search within large language models is a potent technique that can significantly enhance search result accuracy and relevance; a minimal Elasticsearch sketch follows the entry.
arXiv Detail & Related papers (2024-02-24T12:31:22Z)
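A minimal sketch of pairing a Transformer encoder with Elasticsearch dense-vector search. It assumes a running Elasticsearch 8.x instance; the index name, field names, and encoder model are illustrative choices, not the paper's configuration.
```python
# Encode documents with a sentence-transformers model, store the vectors in a
# dense_vector field, and answer queries with an approximate kNN search.
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")        # assumes a local ES 8.x node
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dimensional embeddings

es.indices.create(index="docs", mappings={"properties": {
    "text": {"type": "text"},
    "embedding": {"type": "dense_vector", "dims": 384,
                  "index": True, "similarity": "cosine"},
}})

text = "Large Language Models can summarize, translate, and generate language."
es.index(index="docs", refresh=True,
         document={"text": text, "embedding": encoder.encode(text).tolist()})

hits = es.search(index="docs", knn={
    "field": "embedding",
    "query_vector": encoder.encode("what can LLMs do?").tolist(),
    "k": 5,
    "num_candidates": 50,
})["hits"]["hits"]
print([h["_source"]["text"] for h in hits])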
- Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g. document, passage, or sentence.
We introduce a novel retrieval unit, proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks (see the granularity sketch below).
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
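The granularity choice can be made concrete with a toy indexer; the naive sentence split below is only a stand-in for the paper's learned propositionizer.
```python
# Index the same corpus at two granularities: whole passages versus
# finer-grained, proposition-like units (approximated here by sentences).
import re

corpus = {"d1": "Marie Curie won two Nobel Prizes. She was born in Warsaw."}

def units(text: str, granularity: str) -> list[str]:
    if granularity == "passage":
        return [text]
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]  # sentence proxy

def build_index(granularity: str) -> dict[str, str]:
    return {f"{doc_id}#{i}": unit
            for doc_id, text in corpus.items()
            for i, unit in enumerate(units(text, granularity))}

print(build_index("passage"))      # 1 coarse retrieval unit
print(build_index("proposition"))  # 2 fine-grained retrieval units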
- How Does Generative Retrieval Scale to Millions of Passages? [68.98628807288972]
We conduct the first empirical study of generative retrieval techniques across various corpus scales.
We scale generative retrieval to a corpus of 8.8M passages, evaluating model sizes up to 11B parameters.
While generative retrieval is competitive with state-of-the-art dual encoders on small corpora, scaling to millions of passages remains an important and unsolved challenge.
arXiv Detail & Related papers (2023-05-19T17:33:38Z)
- Semantic Search for Large Scale Clinical Ontologies [63.71950996116403]
We present a deep learning approach to build a search system for large clinical vocabularies.
We propose a Triplet-BERT model and a method for generating semantic training data.
The model is evaluated on five real benchmark data sets, and the results show that our approach achieves strong results on both free-text-to-concept and concept-to-concept search; a toy training sketch follows the entry.
arXiv Detail & Related papers (2022-01-01T05:15:42Z)
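A toy sketch of triplet-style training for concept search using the sentence-transformers library; the (anchor, positive, negative) examples are invented stand-ins for pairs mined from a clinical ontology, and the base model is an illustrative choice.
```python
# Fine-tune a bi-encoder with a triplet loss so that a free-text mention lands
# closer to its matching concept than to a distractor concept.
from sentence_transformers import InputExample, SentenceTransformer, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("all-MiniLM-L6-v2")
train_examples = [  # (anchor, positive concept, negative concept)
    InputExample(texts=["heart attack", "myocardial infarction", "migraine"]),
    InputExample(texts=["high blood pressure", "hypertension", "hypotension"]),
]
loader = DataLoader(train_examples, shuffle=True, batch_size=2)
model.fit(train_objectives=[(loader, losses.TripletLoss(model))],
          epochs=1, warmup_steps=10)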
- A Proposed Conceptual Framework for a Representational Approach to Information Retrieval [42.67826268399347]
This paper outlines a conceptual framework for understanding recent developments in information retrieval and natural language processing.
I propose a representational approach that breaks the core text retrieval problem into a logical scoring model and a physical retrieval model, as illustrated in the sketch below.
arXiv Detail & Related papers (2021-10-04T15:57:02Z)
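The logical/physical split can be illustrated with a tiny interface; the scorer and the brute-force scan below are invented examples, not the paper's proposal.
```python
# A logical scoring model defines what relevance means; a physical retrieval
# model decides how top-k results are actually produced (exhaustive scan,
# inverted index, ANN search, ...). The two can vary independently.
from typing import Protocol

class LogicalScorer(Protocol):
    def score(self, query: str, doc: str) -> float: ...

class OverlapScorer:
    """Toy logical model: fraction of query terms present in the document."""
    def score(self, query: str, doc: str) -> float:
        q, d = set(query.lower().split()), set(doc.lower().split())
        return len(q & d) / (len(q) or 1)

def brute_force_retrieve(scorer: LogicalScorer, query: str,
                         docs: dict[str, str], k: int = 2) -> list[str]:
    # One physical model: score everything; an ANN index would be another.
    return sorted(docs, key=lambda i: scorer.score(query, docs[i]),
                  reverse=True)[:k]

docs = {"a": "neural ranking models", "b": "sparse lexical retrieval",
        "c": "dense retrieval models"}
print(brute_force_retrieve(OverlapScorer(), "dense retrieval models", docs))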
- Coupled intrinsic and extrinsic human language resource-based query expansion [0.0]
We present here a query expansion framework which capitalizes on linguistic characteristics for query constituent encoding, expansion concept extraction, and concept weighting.
A thorough empirical evaluation on real-world datasets validates our approach against a unigram language model, a relevance model, and a sequential dependence-based technique; a short expansion sketch follows the entry.
arXiv Detail & Related papers (2020-04-23T11:22:38Z)
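A small sketch of the expansion idea, using WordNet as one example of a human language resource; the sense cutoff and the fixed expansion weight are assumptions, not the framework's actual concept weighting.
```python
# Expand query terms with WordNet synonyms and give the added concepts a
# lower weight than the original query constituents.
from nltk.corpus import wordnet  # requires: nltk.download("wordnet")

def expand(query: str, expansion_weight: float = 0.4) -> dict[str, float]:
    weights = {term: 1.0 for term in query.lower().split()}
    for term in list(weights):
        for synset in wordnet.synsets(term)[:2]:        # top senses only
            for lemma in synset.lemma_names():
                word = lemma.replace("_", " ")
                if word not in weights:
                    weights[word] = expansion_weight    # weaker than originals
    return weights

print(expand("car insurance"))  # e.g. adds 'auto', 'automobile', 'policy', ...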