Intelligent Scientific Literature Explorer using Machine Learning (ISLE)
- URL: http://arxiv.org/abs/2512.12760v1
- Date: Sun, 14 Dec 2025 16:54:24 GMT
- Title: Intelligent Scientific Literature Explorer using Machine Learning (ISLE)
- Authors: Sina Jani, Arman Heidari, Amirmohammad Anvari, Zahra Rahimi,
- Abstract summary: This paper presents an integrated system for scientific literature exploration that combines large-scale data acquisition, hybrid retrieval, semantic topic modeling, and heterogeneous knowledge graph construction.<n>The proposed framework contributes a foundation for AI-assisted scientific discovery.
- Score: 0.797970449705065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid acceleration of scientific publishing has created substantial challenges for researchers attempting to discover, contextualize, and interpret relevant literature. Traditional keyword-based search systems provide limited semantic understanding, while existing AI-driven tools typically focus on isolated tasks such as retrieval, clustering, or bibliometric visualization. This paper presents an integrated system for scientific literature exploration that combines large-scale data acquisition, hybrid retrieval, semantic topic modeling, and heterogeneous knowledge graph construction. The system builds a comprehensive corpus by merging full-text data from arXiv with structured metadata from OpenAlex. A hybrid retrieval architecture fuses BM25 lexical search with embedding-based semantic search using Reciprocal Rank Fusion. Topic modeling is performed on retrieved results using BERTopic or non-negative matrix factorization depending on computational resources. A knowledge graph unifies papers, authors, institutions, countries, and extracted topics into an interpretable structure. The system provides a multi-layered exploration environment that reveals not only relevant publications but also the conceptual and relational landscape surrounding a query. Evaluation across multiple queries demonstrates improvements in retrieval relevance, topic coherence, and interpretability. The proposed framework contributes an extensible foundation for AI-assisted scientific discovery.
Related papers
- ReSearch: A Multi-Stage Machine Learning Framework for Earth Science Data Discovery [6.780086370528623]
We introduce textbfReSearch, a multi-stage, reasoning-enhanced search framework that formulates Earth Science data discovery.<n>ReSearch integrates lexical search, semantic embeddings, abbreviation expansion, and large language model reranking within a unified architecture.<n>Experiments demonstrate that ReSearch consistently improves recall and ranking performance over baseline methods.
arXiv Detail & Related papers (2026-01-20T17:27:12Z) - Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding [61.36285696607487]
Document understanding is critical for applications from financial analysis to scientific discovery.<n>Current approaches, whether OCR-based pipelines feeding Large Language Models (LLMs) or native Multimodal LLMs (MLLMs) face key limitations.<n>Retrieval-Augmented Generation (RAG) helps ground models in external data, but documents' multimodal nature, combining text, tables, charts, and layout, demands a more advanced paradigm: Multimodal RAG.
arXiv Detail & Related papers (2025-10-17T02:33:16Z) - Cross-Granularity Hypergraph Retrieval-Augmented Generation for Multi-hop Question Answering [49.43814054718318]
Multi-hop question answering (MHQA) requires integrating knowledge scattered across multiple passages to derive the correct answer.<n>Traditional retrieval-augmented generation (RAG) methods primarily focus on coarse-grained textual semantic similarity.<n>We propose a novel RAG approach called HGRAG for MHQA that achieves cross-granularity integration of structural and semantic information via hypergraphs.
arXiv Detail & Related papers (2025-08-15T06:36:13Z) - HySemRAG: A Hybrid Semantic Retrieval-Augmented Generation Framework for Automated Literature Synthesis and Methodological Gap Analysis [55.2480439325792]
HySemRAG is a framework that combines Extract, Transform, Load (ETL) pipelines with Retrieval-Augmented Generation (RAG)<n>System addresses limitations in existing RAG architectures through a multi-layered approach.
arXiv Detail & Related papers (2025-08-01T20:30:42Z) - OnSET: Ontology and Semantic Exploration Toolkit [5.1293983340834055]
We propose a Semantic system, Ontology and Exploration Toolkit (OnSET)<n>OnSET allows non-expert users to easily build queries with visual user guidance provided by topic modelling and semantic search.<n>OnSET combines efficient and open platforms to deploy the system on commodity hardware.
arXiv Detail & Related papers (2025-04-11T09:18:06Z) - CS-PaperSum: A Large-Scale Dataset of AI-Generated Summaries for Scientific Papers [3.929864777332447]
CS-PaperSum is a large-scale dataset of 91,919 papers from 31 top-tier computer science conferences.<n>Our dataset enables automated literature analysis, research trend forecasting, and AI-driven scientific discovery.
arXiv Detail & Related papers (2025-02-27T22:48:35Z) - GeAR: Generation Augmented Retrieval [82.20696567697016]
This paper introduces a novel method, $textbfGe$neration.<n>It improves the global document-Query similarity through contrastive learning, but also integrates well-designed fusion and decoding modules.<n>When used as a retriever, GeAR does not incur any additional computational cost over bi-encoders.
arXiv Detail & Related papers (2025-01-06T05:29:00Z) - ClusterChat: Multi-Feature Search for Corpus Exploration [3.4123736336071864]
ClusterChat is an open-source system for corpus exploration that integrates cluster-based organization of documents.<n>We validate the system with two case studies on a four million abstract PubMed dataset.
arXiv Detail & Related papers (2024-12-19T05:11:16Z) - Conversational Exploratory Search of Scholarly Publications Using Knowledge Graphs [3.3916160303055567]
We develop a conversational search system for exploring scholarly publications using a knowledge graph.
To assess the system's effectiveness, we employed various performance metrics and conducted a human evaluation with 40 participants.
arXiv Detail & Related papers (2024-10-01T06:16:07Z) - pathfinder: A Semantic Framework for Literature Review and Knowledge Discovery in Astronomy [2.6952253149772996]
Pathfinder is a machine learning framework designed to enable literature review and knowledge discovery in astronomy.
Our framework couples advanced retrieval techniques with LLM-based synthesis to search astronomical literature by semantic context.
It addresses complexities of jargon, named entities, and temporal aspects through time-based and citation-based weighting schemes.
arXiv Detail & Related papers (2024-08-02T20:05:24Z) - Retrieval-Enhanced Machine Learning: Synthesis and Opportunities [60.34182805429511]
Retrieval-enhancement can be extended to a broader spectrum of machine learning (ML)
This work introduces a formal framework of this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature in various domains in ML with consistent notations which is missing from the current literature.
The goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.
arXiv Detail & Related papers (2024-07-17T20:01:21Z) - AceMap: Knowledge Discovery through Academic Graph [90.12694363549483]
AceMap is an academic system designed for knowledge discovery through academic graph.
We present advanced database construction techniques to build the comprehensive AceMap database.
AceMap provides advanced analysis capabilities, including tracing the evolution of academic ideas.
arXiv Detail & Related papers (2024-03-05T01:17:56Z) - A New Neural Search and Insights Platform for Navigating and Organizing
AI Research [56.65232007953311]
We introduce a new platform, AI Research Navigator, that combines classical keyword search with neural retrieval to discover and organize relevant literature.
We give an overview of the overall architecture of the system and of the components for document analysis, question answering, search, analytics, expert search, and recommendations.
arXiv Detail & Related papers (2020-10-30T19:12:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.