A Retrieval-Augmented Generation Framework for Academic Literature Navigation in Data Science
- URL: http://arxiv.org/abs/2412.15404v1
- Date: Thu, 19 Dec 2024 21:14:54 GMT
- Title: A Retrieval-Augmented Generation Framework for Academic Literature Navigation in Data Science
- Authors: Ahmet Yasin Aytar, Kemal Kilic, Kamer Kaya,
- Abstract summary: This paper presents an enhanced Retrieval-Augmented Generation application, an artificial intelligence (AI)-based system designed to assist data scientists in accessing precise and contextually relevant academic resources.
The AI-powered application integrates advanced techniques, including the GeneRation Of BIbliographic Data (GROBID) technique for extracting information.
A comprehensive evaluation using the Retrieval-Augmented Generation Assessment System (RAGAS) framework demonstrates substantial improvements in key metrics.
- Score: 2.5398014196797614
- License:
- Abstract: In the rapidly evolving field of data science, efficiently navigating the expansive body of academic literature is crucial for informed decision-making and innovation. This paper presents an enhanced Retrieval-Augmented Generation (RAG) application, an artificial intelligence (AI)-based system designed to assist data scientists in accessing precise and contextually relevant academic resources. The AI-powered application integrates advanced techniques, including the GeneRation Of BIbliographic Data (GROBID) technique for extracting bibliographic information, fine-tuned embedding models, semantic chunking, and an abstract-first retrieval method, to significantly improve the relevance and accuracy of the retrieved information. This implementation of AI specifically addresses the challenge of academic literature navigation. A comprehensive evaluation using the Retrieval-Augmented Generation Assessment System (RAGAS) framework demonstrates substantial improvements in key metrics, particularly Context Relevance, underscoring the system's effectiveness in reducing information overload and enhancing decision-making processes. Our findings highlight the potential of this enhanced Retrieval-Augmented Generation system to transform academic exploration within data science, ultimately advancing the workflow of research and innovation in the field.
Related papers
- Graph Foundation Models for Recommendation: A Comprehensive Survey [55.70529188101446]
Large language models (LLMs) are designed to process and comprehend natural language, making both approaches highly effective and widely adopted.
Recent research has focused on graph foundation models (GFMs)
GFMs integrate the strengths of GNNs and LLMs to model complex RS problems more efficiently by leveraging the graph-based structure of user-item relationships alongside textual understanding.
arXiv Detail & Related papers (2025-02-12T12:13:51Z) - A Comprehensive Survey on Integrating Large Language Models with Knowledge-Based Methods [4.686190098233778]
The paper highlights the benefits of integrating generative AI with knowledge bases, including improved data contextualization, enhanced model accuracy, and better utilization of knowledge resources.
The findings provide a detailed overview of the current state of research, identify key gaps, and offer actionable recommendations.
arXiv Detail & Related papers (2025-01-19T23:25:21Z) - Leveraging Retrieval-Augmented Generation for Persian University Knowledge Retrieval [2.749898166276854]
This paper introduces an innovative approach using Retrieval-Augmented Generation (RAG) pipelines with Large Language Models (LLMs)
By systematically extracting data from the university official webpage, we generate accurate, contextually relevant responses to user queries.
Our experimental results demonstrate significant improvements in the precision and relevance of generated responses.
arXiv Detail & Related papers (2024-11-09T17:38:01Z) - Machine Learning Innovations in CPR: A Comprehensive Survey on Enhanced Resuscitation Techniques [52.71395121577439]
This survey paper explores the transformative role of Machine Learning (ML) and Artificial Intelligence (AI) in Cardiopulmonary Resuscitation (CPR)
It highlights the impact of predictive modeling, AI-enhanced devices, and real-time data analysis in improving resuscitation outcomes.
The paper provides a comprehensive overview, classification, and critical analysis of current applications, challenges, and future directions in this emerging field.
arXiv Detail & Related papers (2024-11-03T18:01:50Z) - A Comprehensive Survey of Retrieval-Augmented Generation (RAG): Evolution, Current Landscape and Future Directions [0.0]
RAG combines retrieval mechanisms with generative language models to enhance the accuracy of outputs.
Recent research breakthroughs are discussed, highlighting novel methods for improving retrieval efficiency.
Future research directions are proposed, focusing on improving the robustness of RAG models.
arXiv Detail & Related papers (2024-10-03T22:29:47Z) - DNS-Rec: Data-aware Neural Architecture Search for Recommender Systems [79.76519917171261]
This paper addresses the computational overhead and resource inefficiency prevalent in Sequential Recommender Systems (SRSs)
We introduce an innovative approach combining pruning methods with advanced model designs.
Our principal contribution is the development of a Data-aware Neural Architecture Search for Recommender System (DNS-Rec)
arXiv Detail & Related papers (2024-02-01T07:22:52Z) - Harnessing Retrieval-Augmented Generation (RAG) for Uncovering Knowledge
Gaps [0.0]
The study demonstrates the effectiveness of the RAG system in generating relevant suggestions with a consistent accuracy of 93%.
The results highlight the value of identifying and understanding knowledge gaps to guide future endeavours.
arXiv Detail & Related papers (2023-12-12T23:22:57Z) - The Efficiency Spectrum of Large Language Models: An Algorithmic Survey [54.19942426544731]
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains.
This paper examines the multi-faceted dimensions of efficiency essential for the end-to-end algorithmic development of LLMs.
arXiv Detail & Related papers (2023-12-01T16:00:25Z) - A Study on the Implementation of Generative AI Services Using an
Enterprise Data-Based LLM Application Architecture [0.0]
This study presents a method for implementing generative AI services by utilizing the Large Language Models (LLM) application architecture.
The research delves into strategies for mitigating the issue of inadequate data, offering tailored solutions.
A significant contribution of this work is the development of a Retrieval-Augmented Generation (RAG) model.
arXiv Detail & Related papers (2023-09-03T07:03:17Z) - Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z) - Retrieval-Enhanced Machine Learning [110.5237983180089]
We describe a generic retrieval-enhanced machine learning framework, which includes a number of existing models as special cases.
REML challenges information retrieval conventions, presenting opportunities for novel advances in core areas, including optimization.
REML research agenda lays a foundation for a new style of information access research and paves a path towards advancing machine learning and artificial intelligence.
arXiv Detail & Related papers (2022-05-02T21:42:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.