Leveraging Cognitive Search Patterns to Enhance Automated Natural
Language Retrieval Performance
- URL: http://arxiv.org/abs/2004.10035v1
- Date: Tue, 21 Apr 2020 14:13:33 GMT
- Title: Leveraging Cognitive Search Patterns to Enhance Automated Natural
Language Retrieval Performance
- Authors: Bhawani Selvaretnam, Mohammed Belkhatir
- Abstract summary: Cognitive reformulation patterns that mimic user search behaviour are highlighted.
We formalize the application of these patterns by considering a query conceptual representation.
A genetic algorithm-based weighting process allows placing emphasis on terms according to their conceptual role-type.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The search of information in large text repositories has been plagued by the
so-called document-query vocabulary gap, i.e. the semantic discordance between
the contents in the stored document entities on the one hand and the human
query on the other hand. Over the past two decades, a significant body of works
has advanced technical retrieval prowess while several studies have shed light
on issues pertaining to human search behavior. We believe that these efforts
should be conjoined, in the sense that automated retrieval systems have to
fully emulate human search behavior and thus consider the procedure according
to which users incrementally enhance their initial query. To this end,
cognitive reformulation patterns that mimic user search behaviour are
highlighted, and enhancement terms that are statistically collocated with or
lexical-semantically related to the original terms are adopted in the
retrieval process. We formalize the application of these patterns by
considering a query conceptual representation and introducing a set of
operations for modifying the initial query. A genetic algorithm-based weighting
process allows placing emphasis on terms according to their conceptual
role-type. An experimental evaluation on real-world datasets against relevance,
language, conceptual and knowledge-based models is conducted. When compared to
language and relevance models, we also show better performance in terms of
mean average precision than a word embedding-based model instantiation.
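
As a rough sketch of the genetic algorithm-based weighting step, the code below evolves one weight per conceptual role-type against a caller-supplied fitness function. The three role-types and the `evaluate_map` callback are illustrative assumptions, not the paper's actual taxonomy or implementation.

```python
import random

# Hypothetical conceptual role-types; the paper's own taxonomy may differ.
ROLES = ["subject", "action", "object"]

def evolve_role_weights(evaluate_map, pop_size=20, generations=50,
                        mutation_rate=0.2):
    """Evolve one weight per conceptual role-type.

    evaluate_map: callable mapping {role: weight} -> retrieval quality
    (e.g. mean average precision on held-out queries). Higher is better.
    """
    population = [{r: random.random() for r in ROLES} for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=evaluate_map, reverse=True)
        parents = scored[: pop_size // 2]              # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = {r: random.choice((a[r], b[r])) for r in ROLES}  # uniform crossover
            if random.random() < mutation_rate:        # small bounded mutation
                r = random.choice(ROLES)
                child[r] = min(1.0, max(0.0, child[r] + random.uniform(-0.1, 0.1)))
            children.append(child)
        population = parents + children
    return max(population, key=evaluate_map)
```

The best weight vector found would then scale each query term's contribution in the ranking function according to its conceptual role.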
Related papers
- VectorSearch: Enhancing Document Retrieval with Semantic Embeddings and
Optimized Search [1.0411820336052784]
We propose VectorSearch, which leverages advanced algorithms, embeddings, and indexing techniques for refined retrieval.
By utilizing innovative multi-vector search operations and encoding searches with advanced language models, our approach significantly improves retrieval accuracy.
Experiments on real-world datasets show that VectorSearch outperforms baseline methods.
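
To make the multi-vector idea concrete, here is a minimal sketch in which each document contributes one vector per passage and a query is scored against a document by its best-matching vector. The `encode` stub and all names are placeholders, not VectorSearch's actual API.

```python
import numpy as np

def encode(texts):
    """Placeholder encoder; a real system would call a language model here."""
    rng = np.random.default_rng(abs(hash(tuple(texts))) % 2**32)
    v = rng.standard_normal((len(texts), 8))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

class MultiVectorIndex:
    def __init__(self):
        self.doc_ids, self.vectors = [], []

    def add(self, doc_id, passages):
        for v in encode(passages):            # one vector per passage
            self.doc_ids.append(doc_id)
            self.vectors.append(v)

    def search(self, query, k=3):
        q = encode([query])[0]
        sims = np.asarray(self.vectors) @ q   # cosine: vectors are unit-norm
        best = {}                             # max-pool over each doc's vectors
        for doc_id, s in zip(self.doc_ids, sims):
            best[doc_id] = max(s, best.get(doc_id, -1.0))
        return sorted(best.items(), key=lambda x: -x[1])[:k]
```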
arXiv Detail & Related papers (2024-09-25T21:58:08Z)
- Query-oriented Data Augmentation for Session Search [71.84678750612754]
We propose query-oriented data augmentation to enrich search logs and empower the modeling.
We generate supplemental training pairs by altering the most important part of a search context.
We develop several strategies to alter the current query, resulting in new training data with varying degrees of difficulty.
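
A minimal sketch of this kind of query-oriented augmentation: perturb the current query while keeping the session history fixed, yielding supplemental training pairs of varying difficulty. The three alteration strategies shown are illustrative assumptions, not the paper's exact set.

```python
import random

def augment_session(history, current_query, n=3):
    """Build supplemental (history, query) training pairs by altering the
    current query. The strategies below are illustrative, not the paper's."""
    terms = current_query.split()
    variants, attempts = set(), 0
    while len(variants) < n and len(terms) > 1 and attempts < 50:
        attempts += 1
        t = terms[:]
        i = random.randrange(len(t))
        strategy = random.choice(["drop", "swap", "mask"])
        if strategy == "drop":
            del t[i]                          # harder: a key term disappears
        elif strategy == "swap":
            j = random.randrange(len(t))
            t[i], t[j] = t[j], t[i]           # easier: same terms, new order
        else:
            t[i] = "[MASK]"                   # medium: term hidden, slot kept
        candidate = " ".join(t)
        if candidate != current_query:
            variants.add(candidate)
    return [(history, v) for v in variants]

# augment_session(["best laptops 2024"], "cheap gaming laptop")
```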
arXiv Detail & Related papers (2024-07-04T08:08:33Z)
- Enhancing Retrieval-Augmented LMs with a Two-stage Consistency Learning Compressor [4.35807211471107]
This work proposes a novel two-stage consistency learning approach for retrieved information compression in retrieval-augmented language models.
The proposed method is empirically validated across multiple datasets, demonstrating notable enhancements in precision and efficiency for question-answering tasks.
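
The paper's compressor is learned; as a crude stand-in, the sketch below performs extractive compression by keeping only the sentences most similar to the question. It conveys the goal (a shorter retrieved context at minimal information loss) without the two-stage consistency training itself.

```python
import numpy as np

def compress(question_vec, sentence_vecs, sentences, budget=3):
    """Extractive stand-in for the paper's learned two-stage compressor:
    keep only the `budget` sentences most similar to the question, in
    their original order, and hand that shorter context to the LM."""
    sims = sentence_vecs @ question_vec / (
        np.linalg.norm(sentence_vecs, axis=1) * np.linalg.norm(question_vec) + 1e-9)
    keep = sorted(np.argsort(-sims)[:budget])
    return " ".join(sentences[i] for i in keep)
```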
arXiv Detail & Related papers (2024-06-04T12:43:23Z)
- Enhancing Cloud-Based Large Language Model Processing with Elasticsearch and Transformer Models [17.09116903102371]
Large Language Models (LLMs) are a class of generative AI models built using the Transformer network.
LLMs are capable of leveraging vast datasets to identify, summarize, translate, predict, and generate language.
Semantic vector search within large language models is a potent technique that can significantly enhance search result accuracy and relevance.
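
As one concrete way to combine the two, the sketch below encodes a query with a sentence-transformers model and runs an Elasticsearch 8.x kNN search. It assumes an index named `docs` whose documents already carry a 384-dimensional `dense_vector` field called `embedding` and a `text` field; these names are assumptions for illustration.

```python
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")
model = SentenceTransformer("all-MiniLM-L6-v2")   # 384-dim embeddings

def semantic_search(query, k=5):
    """kNN query against a pre-built 'docs' index (an assumption here)
    whose documents carry an `embedding` dense_vector field."""
    vec = model.encode(query).tolist()
    resp = es.search(index="docs", knn={
        "field": "embedding",
        "query_vector": vec,
        "k": k,
        "num_candidates": 10 * k,   # candidate pool per shard before top-k
    })
    return [(h["_score"], h["_source"].get("text", ""))
            for h in resp["hits"]["hits"]]
```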
arXiv Detail & Related papers (2024-02-24T12:31:22Z)
- Dense X Retrieval: What Retrieval Granularity Should We Use? [56.90827473115201]
An often-overlooked design choice is the retrieval unit in which the corpus is indexed, e.g. document, passage, or sentence.
We introduce a novel retrieval unit, the proposition, for dense retrieval.
Experiments reveal that indexing a corpus by fine-grained units such as propositions significantly outperforms passage-level units in retrieval tasks.
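
The change of indexing unit can be shown in a few lines. The paper derives propositions with a trained decomposition model; the naive sentence splitter below is only a stand-in to illustrate indexing the same corpus at different granularities.

```python
import re

def to_units(doc_id, text, granularity="proposition"):
    """Split a document into retrieval units. Real propositions come from
    a trained decomposition model; sentence splitting is a crude stand-in."""
    if granularity == "passage":
        units = [text]
    else:  # "proposition" (approximated here by sentences)
        units = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [(f"{doc_id}#{i}", u) for i, u in enumerate(units)]

# Each unit is embedded and indexed independently; at query time, retrieved
# units map back to their source documents via the id prefix.
```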
arXiv Detail & Related papers (2023-12-11T18:57:35Z)
- How Does Generative Retrieval Scale to Millions of Passages? [68.98628807288972]
We conduct the first empirical study of generative retrieval techniques across various corpus scales.
We scale generative retrieval to millions of passages, using a corpus of 8.8M passages and evaluating model sizes up to 11B parameters.
While generative retrieval is competitive with state-of-the-art dual encoders on small corpora, scaling to millions of passages remains an important and unsolved challenge.
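
The core mechanics of generative retrieval can be sketched as constrained decoding over a trie of document identifiers: at each step the model may only emit tokens that extend some valid docid. The `score_next` callback below stands in for a trained seq2seq model conditioned on the query, and greedy decoding replaces the beam search a real system would use.

```python
def build_trie(docid_token_seqs):
    """Prefix trie over docid token sequences, e.g. ("3", "7", "1") for 371."""
    root = {}
    for tokens in docid_token_seqs:
        node = root
        for t in tokens:
            node = node.setdefault(t, {})
        node["<end>"] = {}  # marks a complete identifier
    return root

def constrained_decode(score_next, trie, max_len=8):
    """Greedily decode one document identifier. Only tokens that keep the
    prefix inside the trie are allowed, so the model can never emit an
    identifier that does not exist in the corpus."""
    prefix, node = [], trie
    while len(prefix) < max_len:
        choices = [t for t in node if t != "<end>"]
        if not choices:          # only "<end>" remains: identifier complete
            break
        best = max(choices, key=lambda t: score_next(tuple(prefix), t))
        prefix.append(best)
        node = node[best]
    return tuple(prefix)

# Toy run: build_trie([("1","0"), ("1","1"), ("2","5")]) with
# score_next = lambda p, t: -int(t) decodes to ("1", "0").
```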
arXiv Detail & Related papers (2023-05-19T17:33:38Z)
- Recommender Systems with Generative Retrieval [58.454606442670034]
We propose a novel generative retrieval approach, where the retrieval model autoregressively decodes the identifiers of the target candidates.
To that end, we create semantically meaningful tuples of codewords to serve as a Semantic ID for each item.
We show that recommender systems trained with the proposed paradigm significantly outperform the current SOTA models on various datasets.
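
A simplified version of the Semantic ID construction: residual quantization assigns each item a tuple of codewords, one per level, so that similar items share ID prefixes. The paper uses an RQ-VAE; plain per-level k-means below is an illustrative approximation.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_semantic_ids(item_vecs, levels=3, codebook_size=8, seed=0):
    """Residual quantization: at each level, cluster the residuals and keep
    the assigned cluster index as one codeword. The resulting tuple of
    codewords is the item's Semantic ID (a simplification of the paper's
    RQ-VAE)."""
    residual = item_vecs.astype(float).copy()
    codebooks, codes = [], []
    for _ in range(levels):
        km = KMeans(n_clusters=codebook_size, n_init=10, random_state=seed)
        idx = km.fit_predict(residual)
        codebooks.append(km.cluster_centers_)
        codes.append(idx)
        residual = residual - km.cluster_centers_[idx]  # quantize what's left
    semantic_ids = list(zip(*codes))   # one (c1, c2, c3) tuple per item
    return semantic_ids, codebooks

# items = np.random.randn(100, 16)
# ids, _ = fit_semantic_ids(items)    # e.g. ids[0] == (5, 2, 7)
```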
arXiv Detail & Related papers (2023-05-08T21:48:17Z)
- UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query.
Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms.
We propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
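
UnifieR derives both views from a single PLM encoder; the sketch below only illustrates the hybrid scoring idea by interpolating a lexical BM25 score (via the `rank_bm25` package, an external stand-in) with a dense cosine score from separate components.

```python
import numpy as np
from rank_bm25 import BM25Okapi  # external lexical stand-in, not UnifieR's own

def hybrid_rank(query, docs, doc_vecs, query_vec, alpha=0.5, k=5):
    """Interpolate a lexical score with a dense cosine score. UnifieR learns
    both representations jointly; here they come from two separate
    components purely for illustration."""
    bm25 = BM25Okapi([d.lower().split() for d in docs])
    lex = bm25.get_scores(query.lower().split())
    lex = (lex - lex.min()) / (np.ptp(lex) + 1e-9)      # min-max normalize
    dense = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    dense = (dense - dense.min()) / (np.ptp(dense) + 1e-9)
    fused = alpha * dense + (1 - alpha) * lex
    return sorted(zip(docs, fused), key=lambda x: -x[1])[:k]
```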
arXiv Detail & Related papers (2022-05-23T11:01:59Z)
- Semantic Search for Large Scale Clinical Ontologies [63.71950996116403]
We present a deep learning approach to build a search system for large clinical vocabularies.
We propose a Triplet-BERT model together with a method for generating its semantic training data.
The model is evaluated using five real benchmark data sets, and the results show that our approach achieves strong performance on both free-text-to-concept and concept-to-concept search tasks.
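
A triplet setup of this kind can be sketched with the sentence-transformers library: each training example pairs an anchor query with a matching concept (positive) and a non-matching one (negative). The two triples below are invented for illustration; real ones would come from the ontology's synonym sets.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Illustrative triples: (query text, matching concept, non-matching concept).
triples = [
    InputExample(texts=["heart attack", "myocardial infarction", "bone fracture"]),
    InputExample(texts=["high blood pressure", "hypertension", "hypothermia"]),
]

model = SentenceTransformer("all-MiniLM-L6-v2")
loader = DataLoader(triples, shuffle=True, batch_size=2)
loss = losses.TripletLoss(model)               # pulls anchor toward positive,
model.fit(train_objectives=[(loader, loss)],   # pushes it away from negative
          epochs=1, warmup_steps=0)
```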
arXiv Detail & Related papers (2022-01-01T05:15:42Z)
- A Proposed Conceptual Framework for a Representational Approach to Information Retrieval [42.67826268399347]
This paper outlines a conceptual framework for understanding recent developments in information retrieval and natural language processing.
I propose a representational approach that breaks the core text retrieval problem into a logical scoring model and a physical retrieval model.
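
A small interface sketch makes the separation concrete: the logical scoring model defines what relevance means for a (query, document) pair, while the physical retrieval model decides how top-k results are actually produced. The class names here are illustrative, not the paper's.

```python
from typing import Protocol, List, Tuple

class LogicalScorer(Protocol):
    """What relevance means: a score for any (query, document) pair."""
    def score(self, query: str, doc: str) -> float: ...

class PhysicalRetriever(Protocol):
    """How top-k is executed: exhaustive scan, inverted index, ANN, etc."""
    def top_k(self, query: str, k: int) -> List[Tuple[str, float]]: ...

class ExhaustiveRetriever:
    """Simplest physical model: score every document with the logical model.
    Faster physical models (e.g. ANN search) approximate the same contract."""
    def __init__(self, scorer: LogicalScorer, docs: List[str]):
        self.scorer, self.docs = scorer, docs

    def top_k(self, query: str, k: int) -> List[Tuple[str, float]]:
        scored = [(d, self.scorer.score(query, d)) for d in self.docs]
        return sorted(scored, key=lambda x: -x[1])[:k]
```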
arXiv Detail & Related papers (2021-10-04T15:57:02Z)
- Coupled intrinsic and extrinsic human language resource-based query expansion [0.0]
We present here a query expansion framework which capitalizes on linguistic characteristics for query constituent encoding, expansion concept extraction, and concept weighting.
A thorough empirical evaluation on real-world datasets validates our approach against unigram language model, relevance model and a sequential dependence based technique.
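
One intrinsic (lexical-semantic) ingredient of such a framework can be sketched with WordNet: add a few synonyms per query term at a reduced weight. The weighting scheme below is illustrative only; the paper additionally weights concepts by their linguistic role.

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def expand_query(query, per_term=2, expansion_weight=0.4):
    """Intrinsic expansion: add up to `per_term` WordNet synonyms per query
    term, at a reduced weight (an illustrative scheme, not the paper's)."""
    weighted = {t: 1.0 for t in query.lower().split()}
    for term in list(weighted):
        synonyms = []
        for syn in wn.synsets(term):
            for lemma in syn.lemma_names():
                cand = lemma.replace("_", " ").lower()
                if cand != term and cand not in synonyms:
                    synonyms.append(cand)
        for cand in synonyms[:per_term]:
            weighted.setdefault(cand, expansion_weight)
    return weighted

# expand_query("car insurance") -> {'car': 1.0, 'insurance': 1.0,
#                                   'auto': 0.4, 'automobile': 0.4, ...}
```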
arXiv Detail & Related papers (2020-04-23T11:22:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences arising from its use.