A Systematic Review of FAIR-compliant Big Data Software Reference Architectures
- URL: http://arxiv.org/abs/2509.14370v1
- Date: Wed, 17 Sep 2025 19:10:39 GMT
- Title: A Systematic Review of FAIR-compliant Big Data Software Reference Architectures
- Authors: João Pedro de Carvalho Castro, Maria Júlia Soares De Grandi, Cristina Dutra de Aguiar
- Abstract summary: The FAIR Principles emphasize the importance of making scientific data Findable, Accessible, Interoperable, and Reusable. This article conducts a systematic review of research efforts focused on architectural solutions for such repositories.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To meet the standards of the Open Science movement, the FAIR Principles emphasize the importance of making scientific data Findable, Accessible, Interoperable, and Reusable. Yet, creating a repository that adheres to these principles presents significant challenges. Managing large volumes of diverse research data and metadata, often generated rapidly, requires a precise approach. This necessity has led to the development of Software Reference Architectures (SRAs) to guide the implementation process for FAIR-compliant repositories. This article conducts a systematic review of research efforts focused on architectural solutions for such repositories. We detail our methodology, covering all activities undertaken in the planning and execution phases of the review. We analyze 323 references from reputable sources and expert recommendations, identifying 7 studies on general-purpose big data SRAs, 13 pipelines implementing FAIR Principles in specific contexts, and 3 FAIR-compliant big data SRAs. We provide a thorough description of their key features and assess whether the research questions posed in the planning phase were adequately addressed. Additionally, we discuss the limitations of the retrieved studies and identify tendencies and opportunities for further research.
Related papers
- GISA: A Benchmark for General Information-Seeking Assistant [102.30831921333755]
GISA is a benchmark for General Information-Seeking Assistants comprising 373 human-crafted queries. It integrates both deep reasoning and broad information aggregation within unified tasks, and includes a live subset with periodically updated answers to resist memorization. Experiments on mainstream LLMs and commercial search products reveal that even the best-performing model achieves only 19.30% exact match score.
arXiv Detail & Related papers (2026-02-09T11:44:15Z)
- Let the Barbarians In: How AI Can Accelerate Systems Performance Research [80.43506848683633]
We term this iterative cycle of generation, evaluation, and refinement AI-Driven Research for Systems. We demonstrate that ADRS-generated solutions can match or even outperform human state-of-the-art designs.
arXiv Detail & Related papers (2025-12-16T18:51:23Z)
- Deep Research: A Systematic Survey [118.82795024422722]
Deep Research (DR) aims to combine the reasoning capabilities of large language models with external tools, such as search engines. This survey presents a comprehensive and systematic overview of deep research systems.
arXiv Detail & Related papers (2025-11-24T15:28:28Z)
- Deep Research: A Survey of Autonomous Research Agents [33.96146020332329]
The rapid advancement of large language models (LLMs) has driven the development of agentic systems capable of autonomously performing complex tasks. To overcome these limitations, the paradigm of deep research has been proposed, wherein agents actively engage in planning, retrieval, and synthesis to generate comprehensive and faithful analytical reports grounded in web-based evidence. We provide a systematic overview of the deep research pipeline, which comprises four core stages: planning, question developing, web exploration, and report generation.
arXiv Detail & Related papers (2025-08-18T09:26:14Z)
- Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs [69.10441885629787]
Retrieval-Augmented Generation (RAG) lifts the factuality of Large Language Models (LLMs) by injecting external knowledge. It falls short on problems that demand multi-step inference; conversely, purely reasoning-oriented approaches often hallucinate or mis-ground facts. This survey synthesizes both strands under a unified reasoning-retrieval perspective.
arXiv Detail & Related papers (2025-07-13T03:29:41Z)
- HiRA: A Hierarchical Reasoning Framework for Decoupled Planning and Execution in Deep Search [85.12447821237045]
HiRA is a hierarchical framework that separates strategic planning from specialized execution. Our approach decomposes complex search tasks into focused subtasks and assigns each subtask to domain-specific agents equipped with external tools and reasoning capabilities. Experiments on four complex, cross-modal deep search benchmarks demonstrate that HiRA significantly outperforms state-of-the-art RAG and agent-based systems.
arXiv Detail & Related papers (2025-07-03T14:18:08Z)
- Research Knowledge Graphs: the Shifting Paradigm of Scholarly Information Representation [2.967893090870586]
Research Knowledge Graphs (RKGs) aim at providing an easy-to-use and machine-actionable representation of research artifacts and their relations. This paper provides the first conceptualisation of the RKG vision, a categorisation of in-use RKGs together with a description of RKG building blocks and principles.
arXiv Detail & Related papers (2025-06-08T21:10:30Z)
- Insight-RAG: Enhancing LLMs with Insight-Driven Augmentation [4.390998479503661]
We propose Insight-RAG, a novel framework designed to retrieve documents based on insights. In the initial stage of Insight-RAG, instead of using traditional retrieval methods, we employ an LLM to analyze the input query and task. By integrating the original query with the retrieved insights, similar to conventional RAG approaches, we employ a final LLM to generate a contextually enriched and accurate response.
arXiv Detail & Related papers (2025-03-31T19:50:27Z)
- A Comprehensive Survey on Composed Image Retrieval [54.54527281731775]
Composed Image Retrieval (CIR) is an emerging yet challenging task that allows users to search for target images using a multimodal query. There is currently no comprehensive review of CIR to provide a timely overview of this field. We synthesize insights from over 120 publications in top conferences and journals, including ACM TOIS, SIGIR, and CVPR.
arXiv Detail & Related papers (2025-02-19T01:37:24Z)
- StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization [94.31508613367296]
Retrieval-augmented generation (RAG) is a key means to effectively enhance large language models (LLMs).
We propose StructRAG, which can identify the optimal structure type for the task at hand, reconstruct original documents into this structured format, and infer answers based on the resulting structure.
Experiments show that StructRAG achieves state-of-the-art performance, particularly excelling in challenging scenarios.
arXiv Detail & Related papers (2024-10-11T13:52:44Z)
- STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases [93.96463520716759]
We develop STaRK, a large-scale Semi-structured retrieval benchmark on Textual and Relational Knowledge Bases.
Our benchmark covers three domains: product search, academic paper search, and queries in precision medicine.
We design a novel pipeline to synthesize realistic user queries that integrate diverse relational information and complex textual properties.
arXiv Detail & Related papers (2024-04-19T22:54:54Z)
- FAIR Enough: How Can We Develop and Assess a FAIR-Compliant Dataset for Large Language Models' Training? [3.0406004578714008]
The rapid evolution of Large Language Models highlights the necessity for ethical considerations and data integrity in AI development.
While FAIR principles are crucial for ethical data stewardship, their specific application in the context of LLM training data remains an under-explored area.
We propose a novel framework designed to integrate FAIR principles into the LLM development lifecycle.
arXiv Detail & Related papers (2024-01-19T21:21:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.