Discourse-Aware Scientific Paper Recommendation via QA-Style Summarization and Multi-Level Contrastive Learning
- URL: http://arxiv.org/abs/2511.03330v1
- Date: Wed, 05 Nov 2025 09:55:12 GMT
- Title: Discourse-Aware Scientific Paper Recommendation via QA-Style Summarization and Multi-Level Contrastive Learning
- Authors: Shenghua Wang, Zhen Yin
- Abstract summary: OMRC-MR is a hierarchical framework that integrates QA-style OMRC summarization, multi-level contrastive learning, and structure-aware re-ranking for scholarly recommendation. Experiments on DBLP, S2ORC, and the newly constructed Sci-OMRC dataset demonstrate that OMRC-MR consistently surpasses state-of-the-art baselines.
- Score: 2.105564340986074
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid growth of open-access (OA) publications has intensified the challenge of identifying relevant scientific papers. Due to privacy constraints and limited access to user interaction data, recent efforts have shifted toward content-based recommendation, which relies solely on textual information. However, existing models typically treat papers as unstructured text, neglecting their discourse organization and thereby limiting semantic completeness and interpretability. To address these limitations, we propose OMRC-MR, a hierarchical framework that integrates QA-style OMRC (Objective, Method, Result, Conclusion) summarization, multi-level contrastive learning, and structure-aware re-ranking for scholarly recommendation. The QA-style summarization module converts raw papers into structured and discourse-consistent representations, while multi-level contrastive objectives align semantic representations across metadata, section, and document levels. The final re-ranking stage further refines retrieval precision through contextual similarity calibration. Experiments on DBLP, S2ORC, and the newly constructed Sci-OMRC dataset demonstrate that OMRC-MR consistently surpasses state-of-the-art baselines, achieving up to 7.2% and 3.8% improvements in Precision@10 and Recall@10, respectively. Additional evaluations confirm that QA-style summarization produces more coherent and factually complete representations. Overall, OMRC-MR provides a unified and interpretable content-based paradigm for scientific paper recommendation, advancing trustworthy and privacy-aware scholarly information retrieval.
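The multi-level contrastive objective described in the abstract can be sketched as an InfoNCE-style loss computed per level (metadata, section, document) and combined with level weights. The following is a minimal pure-Python sketch under stated assumptions: cosine similarity, a softmax temperature, and a weighted sum across levels. The function names, level keys, and weights are illustrative, not the paper's actual implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchor, positive, negatives, tau=0.07):
    """InfoNCE loss: pull the positive toward the anchor,
    push negatives away, with temperature tau."""
    sims = [cosine(anchor, positive)] + [cosine(anchor, n) for n in negatives]
    logits = [s / tau for s in sims]
    m = max(logits)  # subtract max for numerical stability
    denom = sum(math.exp(l - m) for l in logits)
    return -(logits[0] - m - math.log(denom))

def multi_level_loss(levels, weights):
    """Weighted sum of per-level contrastive losses.
    levels: dict mapping a level name (e.g. 'metadata', 'section',
    'document') to an (anchor, positive, negatives) triple."""
    return sum(weights[name] * info_nce(a, p, negs)
               for name, (a, p, negs) in levels.items())
```

In this sketch a paper and its matched positive (e.g. a relevant paper's representation at the same discourse level) form each anchor-positive pair; the loss decreases as the anchor becomes more similar to the positive than to the in-batch negatives.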
Related papers
- OmniReview: A Large-scale Benchmark and LLM-enhanced Framework for Realistic Reviewer Recommendation [22.223973340236594]
Profiling Scholars with Multi-gate Mixture-of-Experts (Pro-MMoE) is a novel framework that synergizes Large Language Models (LLMs) with Multi-task Learning. Pro-MMoE achieves state-of-the-art performance across six of seven metrics, establishing a new benchmark for realistic reviewer recommendation.
arXiv Detail & Related papers (2026-02-09T16:57:35Z) - RPC-Bench: A Fine-grained Benchmark for Research Paper Comprehension [65.81339691942757]
RPC-Bench is a large-scale question-answering benchmark built from review-rebuttal exchanges of high-quality computer science papers. We design a fine-grained taxonomy aligned with the scientific research flow to assess models' ability to understand and answer why, what, and how questions in scholarly contexts.
arXiv Detail & Related papers (2026-01-14T11:37:00Z) - RIGOURATE: Quantifying Scientific Exaggeration with Evidence-Aligned Claim Evaluation [29.44948404858214]
RIGOURATE retrieves supporting evidence from a paper's body and assigns each claim an overstatement score. The framework is supported by a dataset of over 10K claim-evidence sets from ICLR and NeurIPS papers.
arXiv Detail & Related papers (2026-01-07T19:36:08Z) - DeepSynth-Eval: Objectively Evaluating Information Consolidation in Deep Survey Writing [53.85037373860246]
We introduce DeepSynth-Eval, a benchmark designed to objectively evaluate information consolidation capabilities. We propose a fine-grained evaluation protocol using General Checklists (for factual coverage) and Constraint Checklists (for structural organization). Our results demonstrate that agentic plan-and-write approaches significantly outperform single-turn generation.
arXiv Detail & Related papers (2026-01-07T03:07:52Z) - Leveraging Hierarchical Organization for Medical Multi-document Summarization [9.907620975399185]
We investigate whether incorporating hierarchical structures in the inputs of medical multi-document summarization can improve a model's ability to organize and contextualize information across documents. Our results show that human experts prefer model-generated summaries over human-written summaries.
arXiv Detail & Related papers (2025-10-27T08:18:02Z) - CoT Referring: Improving Referring Expression Tasks with Grounded Reasoning [67.18702329644526]
CoT Referring enhances model reasoning across modalities through structured, chain-of-thought training data. We restructure the training data to enforce a new output form, providing new annotations for existing datasets. We also integrate detection and segmentation capabilities into a unified MLLM framework, training it with a novel adaptive weighted loss to optimize performance.
arXiv Detail & Related papers (2025-10-03T08:50:21Z) - Context-Aware Hierarchical Taxonomy Generation for Scientific Papers via LLM-Guided Multi-Aspect Clustering [59.54662810933882]
Existing taxonomy construction methods, leveraging unsupervised clustering or direct prompting of large language models, often lack coherence and granularity. We propose a novel context-aware hierarchical taxonomy generation framework that integrates LLM-guided multi-aspect encoding with dynamic clustering.
arXiv Detail & Related papers (2025-09-23T15:12:58Z) - Beyond Chunking: Discourse-Aware Hierarchical Retrieval for Long Document Question Answering [51.7493726399073]
We present a discourse-aware hierarchical framework to enhance long document question answering. The framework involves three key innovations: specialized discourse parsing for lengthy documents, LLM-based enhancement of discourse relation nodes, and structure-guided hierarchical retrieval.
arXiv Detail & Related papers (2025-05-26T14:45:12Z) - CoRank: LLM-Based Compact Reranking with Document Features for Scientific Retrieval [30.341167520613197]
First-stage retrieval is often suboptimal in the scientific domain, leaving relevant documents ranked low. We propose CoRank, a training-free, model-agnostic reranking framework for scientific retrieval.
arXiv Detail & Related papers (2025-05-19T22:10:27Z) - XtraGPT: Context-Aware and Controllable Academic Paper Revision [43.263488839387584]
We propose a human-AI collaboration framework for academic paper revision centered on criteria-guided intent alignment and context-aware modeling. We instantiate the framework in XtraGPT, the first suite of open-source LLMs for context-aware, instruction-guided writing assistance.
arXiv Detail & Related papers (2025-05-16T15:02:19Z) - Can LLMs Generate Tabular Summaries of Science Papers? Rethinking the Evaluation Protocol [83.90769864167301]
Literature review tables are essential for summarizing and comparing collections of scientific papers. We explore the task of generating tables that best fulfill a user's informational needs given a collection of scientific papers. Our contributions focus on three key challenges encountered in real-world use: (i) user prompts are often under-specified; (ii) retrieved candidate papers frequently contain irrelevant content; and (iii) task evaluation should move beyond shallow text similarity techniques.
arXiv Detail & Related papers (2025-04-14T14:52:28Z) - Beyond Factual Accuracy: Evaluating Coverage of Diverse Factual Information in Long-form Text Generation [56.82274763974443]
ICAT is an evaluation framework for measuring coverage of diverse factual information in long-form text generation. It computes the alignment between the atomic factual claims and various aspects expected to be presented in the output. Our framework offers interpretable and fine-grained analysis of diversity and coverage.
arXiv Detail & Related papers (2025-01-07T05:43:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.