Related papers: SPAR: Scholar Paper Retrieval with LLM-based Agents for Enhanced Academic Search

Related papers

UniPAR: A Unified Framework for Pedestrian Attribute Recognition [14.613498516126498]
We propose UniPAR, a unified Transformer-based framework for Pedestrian Attribute Recognition. By incorporating a unified data scheduling strategy and a dynamic classification head, UniPAR enables a single model to simultaneously process diverse datasets. Experimental results on the widely used benchmark datasets, including MSP60K, DukeMTMC, and EventPAR, demonstrate that UniPAR achieves performance comparable to specialized SOTA methods.
arXiv Detail & Related papers (2026-03-05T12:34:35Z)
SAGE: Benchmarking and Improving Retrieval for Deep Research Agents [60.53966065867568]
We introduce SAGE, a benchmark for scientific literature retrieval comprising 1,200 queries across four scientific domains, with a 200,000 paper retrieval corpus. We evaluate six deep research agents and find that all systems struggle with reasoning-intensive retrieval. BM25 significantly outperforms LLM-based retrievers by approximately 30%, as existing agents generate keyword-oriented sub-queries.
arXiv Detail & Related papers (2026-02-05T18:25:24Z)
SPAR: Session-based Pipeline for Adaptive Retrieval on Legacy File Systems [6.5637131627375505]
SPAR (Session-based Pipeline for Adaptive Retrieval) is a conceptual framework that integrates Large Language Models into a Retrieval-Augmented Generation (RAG) architecture specifically designed for legacy enterprise environments. Unlike conventional RAG pipelines, SPAR employs a lightweight two-stage process: a semantic Metadata Index is first created, after which session-specific vector databases are dynamically generated on demand. This design reduces computational overhead while improving transparency, controllability, and relevance in retrieval.
arXiv Detail & Related papers (2025-12-15T02:54:10Z)
MARAG-R1: Beyond Single Retriever via Reinforcement-Learned Multi-Tool Agentic Retrieval [50.30107119622642]
Large Language Models (LLMs) excel at reasoning and generation but are inherently limited by static pretraining data. Retrieval-Augmented Generation (RAG) addresses this issue by grounding LLMs in external knowledge. MARAG-R1 is a reinforcement-learned multi-tool RAG framework that enables LLMs to dynamically coordinate multiple retrieval mechanisms.
arXiv Detail & Related papers (2025-10-31T15:51:39Z)
LLMs as Sparse Retrievers: A Framework for First-Stage Product Search [103.70006474544364]
Product search is a crucial component of modern e-commerce platforms, with billions of user queries every day. Sparse retrieval methods suffer from severe vocabulary mismatch issues, leading to suboptimal performance in product search scenarios. With their potential for semantic analysis, large language models (LLMs) offer a promising avenue for mitigating vocabulary mismatch issues. We propose PROSPER, a framework for PROduct search leveraging LLMs as SParsE Retrievers.
arXiv Detail & Related papers (2025-10-21T11:13:21Z)
Rethinking On-policy Optimization for Query Augmentation [49.87723664806526]
We present the first systematic comparison of prompting-based and RL-based query augmentation across diverse benchmarks. We introduce a novel hybrid method, On-policy Pseudo-document Query Expansion (OPQE), which learns to generate a pseudo-document that maximizes retrieval performance.
arXiv Detail & Related papers (2025-10-20T04:16:28Z)
Reasoning-enhanced Query Understanding through Decomposition and Interpretation [87.56450566014625]
ReDI is a Reasoning-enhanced approach for query understanding through Decomposition and Interpretation. We compiled a large-scale dataset of real-world complex queries from a major search engine. Experiments on BRIGHT and BEIR demonstrate that ReDI consistently surpasses strong baselines in both sparse and dense retrieval paradigms.
arXiv Detail & Related papers (2025-09-08T10:58:42Z)
PRGB Benchmark: A Robust Placeholder-Assisted Algorithm for Benchmarking Retrieval-Augmented Generation [15.230902967865925]
Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by integrating external knowledge. Current benchmarks emphasize broad aspects such as noise robustness, but lack a systematic and granular evaluation framework on document utilization. Our benchmark provides a reproducible framework for developing more reliable and efficient RAG systems.
arXiv Detail & Related papers (2025-07-23T16:14:08Z)
Learning Efficient and Generalizable Graph Retriever for Knowledge-Graph Question Answering [75.12322966980003]
Large Language Models (LLMs) have shown strong inductive reasoning ability across various domains. Most existing RAG pipelines rely on unstructured text, limiting interpretability and structured reasoning. Recent studies have explored integrating knowledge graphs with LLMs for knowledge graph question answering. We propose RAPL, a novel framework for efficient and effective graph retrieval in KGQA.
arXiv Detail & Related papers (2025-06-11T12:03:52Z)
Exp4Fuse: A Rank Fusion Framework for Enhanced Sparse Retrieval using Large Language Model-based Query Expansion [0.0]
Large Language Models (LLMs) have shown potential in generating hypothetical documents for query expansion. We introduce a novel fusion ranking framework, Exp4Fuse, which enhances the performance of sparse retrievers.
arXiv Detail & Related papers (2025-06-05T08:44:34Z)
Iterative Self-Incentivization Empowers Large Language Models as Agentic Searchers [74.17516978246152]
Large language models (LLMs) have been widely integrated into information retrieval to advance traditional techniques. We propose EXSEARCH, an agentic search framework, where the LLM learns to retrieve useful information as the reasoning unfolds. Experiments on four knowledge-intensive benchmarks show that EXSEARCH substantially outperforms baselines.
arXiv Detail & Related papers (2025-05-26T15:27:55Z)
SeRL: Self-Play Reinforcement Learning for Large Language Models with Limited Data [65.56911325914582]
We propose Self-play Reinforcement Learning (SeRL) to bootstrap Large Language Models (LLMs) training with limited initial data. The proposed SeRL yields results superior to its counterparts and achieves performance on par with those obtained by high-quality data with verifiable rewards.
arXiv Detail & Related papers (2025-05-25T13:28:04Z)
Pseudo Relevance Feedback is Enough to Close the Gap Between Small and Large Dense Retrieval Models [29.934928091542375]
Scaling dense retrievers to larger large language model (LLM) backbones has been a dominant strategy for improving their retrieval effectiveness. We introduce PromptPRF, a feature-based pseudo-relevance feedback (PRF) framework that enables small LLM-based dense retrievers to achieve effectiveness comparable to much larger models.
arXiv Detail & Related papers (2025-03-19T04:30:20Z)
Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval [9.860751439256754]
Large language models (LLMs) are susceptible to hallucinations and out-of-distribution errors when producing KG elements. This has led to increased research aimed at detecting and mitigating such errors. In this paper, we introduce PGMR, a modular framework that incorporates a non-parametric memory module to retrieve KG elements.
arXiv Detail & Related papers (2025-02-19T02:08:13Z)
Learning More Effective Representations for Dense Retrieval through Deliberate Thinking Before Search [65.53881294642451]
The Deliberate Thinking based Dense Retriever (DEBATER) enhances recent dense retrievers by enabling them to learn more effective document representations through a step-by-step thinking process. Experimental results show that DEBATER significantly outperforms existing methods across several retrieval benchmarks.
arXiv Detail & Related papers (2025-02-18T15:56:34Z)
Large Language Model Can Be a Foundation for Hidden Rationale-Based Retrieval [12.83513794686623]
In this paper, we propose and study a more challenging type of retrieval task, called hidden rationale retrieval. To address such problems, an instruction-tuned large language model (LLM) with a cross-encoder architecture could be a reasonable choice. We name this retrieval framework RaHoRe and verify its superior zero-shot and fine-tuned performance on Emotional Support Conversation (ESC).
arXiv Detail & Related papers (2024-12-21T13:19:15Z)
A Thorough Performance Benchmarking on Lightweight Embedding-based Recommender Systems [67.52782366565658]
State-of-the-art recommender systems (RSs) depend on categorical features, which are encoded by embedding vectors, resulting in excessively large embedding tables. Despite the prosperity of lightweight embedding-based RSs, a wide diversity is seen in evaluation protocols. This study investigates various LERS' performance, efficiency, and cross-task transferability via a thorough benchmarking process.
arXiv Detail & Related papers (2024-06-25T07:45:00Z)
Synergistic Interplay between Search and Large Language Models for Information Retrieval [141.18083677333848]
InteR allows RMs to expand knowledge in queries using LLM-generated knowledge collections. InteR achieves overall superior zero-shot retrieval performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-12T11:58:15Z)
UnifieR: A Unified Retriever for Large-Scale Retrieval [84.61239936314597]
Large-scale retrieval aims to recall relevant documents from a huge collection given a query. Recent retrieval methods based on pre-trained language models (PLM) can be coarsely categorized into either dense-vector or lexicon-based paradigms. We propose a new learning framework, UnifieR, which unifies dense-vector and lexicon-based retrieval in one model with a dual-representing capability.
arXiv Detail & Related papers (2022-05-23T11:01:59Z)

This list is automatically generated from the titles and abstracts of the papers on this site.