Exploring new Approaches for Information Retrieval through Natural Language Processing
- URL: http://arxiv.org/abs/2505.02199v1
- Date: Sun, 04 May 2025 17:37:26 GMT
- Title: Exploring new Approaches for Information Retrieval through Natural Language Processing
- Authors: Manak Raj, Nidhi Mishra,
- Abstract summary: This review paper explores recent advancements and emerging approaches in Information Retrieval (IR) applied to Natural Language Processing (NLP)<n>We examine traditional IR models such as Boolean, vector space, probabilistic, and inference network models, and highlight modern techniques including deep learning, reinforcement learning, and pretrained transformer models like BERT.<n>A comparative analysis of sparse, dense, and hybrid retrieval methods is presented, along with applications in web search engines, cross-language IR, argument mining, private information retrieval, and hate speech detection.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This review paper explores recent advancements and emerging approaches in Information Retrieval (IR) applied to Natural Language Processing (NLP). We examine traditional IR models such as Boolean, vector space, probabilistic, and inference network models, and highlight modern techniques including deep learning, reinforcement learning, and pretrained transformer models like BERT. We discuss key tools and libraries - Lucene, Anserini, and Pyserini - for efficient text indexing and search. A comparative analysis of sparse, dense, and hybrid retrieval methods is presented, along with applications in web search engines, cross-language IR, argument mining, private information retrieval, and hate speech detection. Finally, we identify open challenges and future research directions to enhance retrieval accuracy, scalability, and ethical considerations.
Related papers
- From Web Search towards Agentic Deep Research: Incentivizing Search with Reasoning Agents [96.65646344634524]
Large Language Models (LLMs), endowed with reasoning and agentic capabilities, are ushering in a new paradigm termed Agentic Deep Research.<n>We trace the evolution from static web search to interactive, agent-based systems that plan, explore, and learn.<n>We demonstrate that Agentic Deep Research not only significantly outperforms existing approaches, but is also poised to become the dominant paradigm for future information seeking.
arXiv Detail & Related papers (2025-06-23T17:27:19Z) - Crafting Knowledge: Exploring the Creative Mechanisms of Chat-Based
Search Engines [3.5845457075304368]
This research aims to dissect the mechanisms through which an LLM-powered search engine, specifically Bing Chat, selects information sources for its responses.
Bing Chat exhibits a preference for content that is not only readable and formally structured, but also demonstrates lower perplexity levels.
Our investigation documents a greater similarity among websites cited by RAG technologies compared to those ranked highest by conventional search engines.
arXiv Detail & Related papers (2024-02-29T18:20:37Z) - Utilizing BERT for Information Retrieval: Survey, Applications,
Resources, and Challenges [4.588192657854766]
This survey focuses on approaches that apply pretrained transformer encoders like BERT to information retrieval (IR)
We group them into six high-level categories: (i) handling long documents, (ii) integrating semantic information, (iii) balancing effectiveness and efficiency, (iv) predicting the weights of terms, (v) query expansion, and (vi) document expansion.
We find that for specific tasks, finely tuned BERT encoders still outperform, and at a lower deployment cost.
arXiv Detail & Related papers (2024-02-18T23:22:40Z) - Navigating the Knowledge Sea: Planet-scale answer retrieval using LLMs [0.0]
Information retrieval is characterized by a continuous refinement of techniques and technologies.
This paper focuses on the role of Large Language Models (LLMs) in bridging the gap between traditional search methods and the emerging paradigm of answer retrieval.
arXiv Detail & Related papers (2024-02-07T23:39:40Z) - RegaVAE: A Retrieval-Augmented Gaussian Mixture Variational Auto-Encoder
for Language Modeling [79.56442336234221]
We introduce RegaVAE, a retrieval-augmented language model built upon the variational auto-encoder (VAE)
It encodes the text corpus into a latent space, capturing current and future information from both source and target text.
Experimental results on various datasets demonstrate significant improvements in text generation quality and hallucination removal.
arXiv Detail & Related papers (2023-10-16T16:42:01Z) - Large Language Models for Information Retrieval: A Survey [58.30439850203101]
Information retrieval has evolved from term-based methods to its integration with advanced neural models.
Recent research has sought to leverage large language models (LLMs) to improve IR systems.
We delve into the confluence of LLMs and IR systems, including crucial aspects such as query rewriters, retrievers, rerankers, and readers.
arXiv Detail & Related papers (2023-08-14T12:47:22Z) - Synergistic Interplay between Search and Large Language Models for
Information Retrieval [141.18083677333848]
InteR allows RMs to expand knowledge in queries using LLM-generated knowledge collections.
InteR achieves overall superior zero-shot retrieval performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-12T11:58:15Z) - Explainability of Text Processing and Retrieval Methods: A Critical
Survey [1.5320737596132752]
This article provides a broad overview of research on the explainability and interpretability of natural language processing and information retrieval methods.
More specifically, we survey approaches that have been applied to explain word embeddings, sequence modeling, attention modules, transformers, BERT, and document ranking.
arXiv Detail & Related papers (2022-12-14T09:25:49Z) - CorpusBrain: Pre-train a Generative Retrieval Model for
Knowledge-Intensive Language Tasks [62.22920673080208]
Single-step generative model can dramatically simplify the search process and be optimized in end-to-end manner.
We name the pre-trained generative retrieval model as CorpusBrain as all information about the corpus is encoded in its parameters without the need of constructing additional index.
arXiv Detail & Related papers (2022-08-16T10:22:49Z) - A New Neural Search and Insights Platform for Navigating and Organizing
AI Research [56.65232007953311]
We introduce a new platform, AI Research Navigator, that combines classical keyword search with neural retrieval to discover and organize relevant literature.
We give an overview of the overall architecture of the system and of the components for document analysis, question answering, search, analytics, expert search, and recommendations.
arXiv Detail & Related papers (2020-10-30T19:12:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.