Simplifying Data Integration: SLM-Driven Systems for Unified Semantic Queries Across Heterogeneous Databases
- URL: http://arxiv.org/abs/2504.05634v1
- Date: Tue, 08 Apr 2025 03:28:03 GMT
- Title: Simplifying Data Integration: SLM-Driven Systems for Unified Semantic Queries Across Heterogeneous Databases
- Authors: Teng Lin,
- Abstract summary: This paper presents a Small Language Model(SLM)-driven system that synergizes advancements in lightweight Retrieval-Augmented Generation (RAG) and semantic-aware data structuring.<n>By integrating MiniRAG's semantic-aware heterogeneous graph indexing and topology-enhanced retrieval with SLM-powered structured data extraction, our system addresses the limitations of traditional methods.<n> Experimental results demonstrate superior performance in accuracy and efficiency, while the introduction of semantic entropy as an unsupervised evaluation metric provides robust insights into model uncertainty.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The integration of heterogeneous databases into a unified querying framework remains a critical challenge, particularly in resource-constrained environments. This paper presents a novel Small Language Model(SLM)-driven system that synergizes advancements in lightweight Retrieval-Augmented Generation (RAG) and semantic-aware data structuring to enable efficient, accurate, and scalable query resolution across diverse data formats. By integrating MiniRAG's semantic-aware heterogeneous graph indexing and topology-enhanced retrieval with SLM-powered structured data extraction, our system addresses the limitations of traditional methods in handling Multi-Entity Question Answering (Multi-Entity QA) and complex semantic queries. Experimental results demonstrate superior performance in accuracy and efficiency, while the introduction of semantic entropy as an unsupervised evaluation metric provides robust insights into model uncertainty. This work pioneers a cost-effective, domain-agnostic solution for next-generation database systems.
Related papers
- HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation [11.53083922927901]
HM-RAG is a novel Hierarchical Multi-agent Multimodal RAG framework.
It pioneers collaborative intelligence for dynamic knowledge synthesis across structured, unstructured, and graph-based data.
arXiv Detail & Related papers (2025-04-13T06:55:33Z) - MultiConIR: Towards multi-condition Information Retrieval [57.6405602406446]
We introduce MultiConIR, the first benchmark designed to evaluate retrieval models in multi-condition scenarios.<n>We propose three tasks to assess retrieval and reranking models on multi-condition robustness, monotonic relevance ranking, and query format sensitivity.
arXiv Detail & Related papers (2025-03-11T05:02:03Z) - New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration [49.180693704510006]
Referring Expression (REC) is a cross-modal task that evaluates the interplay of language understanding, image comprehension, and language-to-image grounding.
We introduce a new REC dataset with two key features. First, it is designed with controllable difficulty levels, requiring fine-grained reasoning across object categories, attributes, and relationships.
Second, it incorporates negative text and images generated through fine-grained editing, explicitly testing a model's ability to reject non-existent targets.
arXiv Detail & Related papers (2025-02-27T13:58:44Z) - MiniRAG: Towards Extremely Simple Retrieval-Augmented Generation [22.512017529583332]
MiniRAG is a novel Retrieval-Augmented Generation (RAG) system designed for extreme simplicity and efficiency.<n>MiniRAG introduces two key technical innovations: (1) a semantic-aware heterogeneous graph indexing mechanism that combines text chunks and named entities in a unified structure, reducing reliance on complex semantic understanding, and (2) a lightweight topology-enhanced retrieval approach that leverages graph structures for efficient knowledge discovery without requiring advanced language capabilities.
arXiv Detail & Related papers (2025-01-12T04:44:06Z) - Dynamic Multi-Agent Orchestration and Retrieval for Multi-Source Question-Answer Systems using Large Language Models [0.0]
We propose a methodology that combines several advanced techniques in Large Language Model (LLM) retrieval to support the development of robust, multi-source question-answer systems.
This methodology is designed to integrate information from diverse data sources, through a coordinated multi-agent orchestration and dynamic retrieval approach.
Our results indicate that this approach enhances response accuracy and relevance, offering a versatile and scalable framework for developing question-answer systems.
arXiv Detail & Related papers (2024-12-23T20:28:20Z) - Adapting to Non-Stationary Environments: Multi-Armed Bandit Enhanced Retrieval-Augmented Generation on Knowledge Graphs [23.357843519762483]
Recent studies have demonstrated that leveraging the Retrieval-Augmented Generation framework, combined with Knowledge Graphs, robustly enhances the reasoning capabilities of Large language models.<n>We introduce a Multi-objective Multi-Armed Bandit enhanced RAG framework, supported by multiple retrieval methods with diverse capabilities.<n>Our method significantly outperforms baseline methods in non-stationary settings while achieving state-of-the-art performance in stationary environments.
arXiv Detail & Related papers (2024-12-10T15:56:03Z) - A Collaborative Multi-Agent Approach to Retrieval-Augmented Generation Across Diverse Data [0.0]
Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs)<n>Traditional RAG systems typically use a single-agent architecture to handle query generation, data retrieval, and response synthesis.<n>This paper proposes a multi-agent RAG system to address these limitations.
arXiv Detail & Related papers (2024-12-08T07:18:19Z) - Matchmaker: Self-Improving Large Language Model Programs for Schema Matching [60.23571456538149]
We propose a compositional language model program for schema matching, comprised of candidate generation, refinement and confidence scoring.
Matchmaker self-improves in a zero-shot manner without the need for labeled demonstrations.
Empirically, we demonstrate on real-world medical schema matching benchmarks that Matchmaker outperforms previous ML-based approaches.
arXiv Detail & Related papers (2024-10-31T16:34:03Z) - DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z) - ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling [53.97609687516371]
We propose a pioneering generAtive Cross-modal rEtrieval framework (ACE) for end-to-end cross-modal retrieval.
ACE achieves state-of-the-art performance in cross-modal retrieval and outperforms the strong baselines on Recall@1 by 15.27% on average.
arXiv Detail & Related papers (2024-06-25T12:47:04Z) - Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity [59.57065228857247]
Retrieval-augmented Large Language Models (LLMs) have emerged as a promising approach to enhancing response accuracy in several tasks, such as Question-Answering (QA)
We propose a novel adaptive QA framework, that can dynamically select the most suitable strategy for (retrieval-augmented) LLMs based on the query complexity.
We validate our model on a set of open-domain QA datasets, covering multiple query complexities, and show that ours enhances the overall efficiency and accuracy of QA systems.
arXiv Detail & Related papers (2024-03-21T13:52:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.