NeurDB: On the Design and Implementation of an AI-powered Autonomous Database
- URL: http://arxiv.org/abs/2408.03013v1
- Date: Tue, 6 Aug 2024 07:48:51 GMT
- Title: NeurDB: On the Design and Implementation of an AI-powered Autonomous Database
- Authors: Zhanhao Zhao, Shaofeng Cai, Haotian Gao, Hexiang Pan, Siqi Xiang, Naili Xing, Gang Chen, Beng Chin Ooi, Yanyan Shen, Yuncheng Wu, Meihui Zhang
- Abstract summary: This paper introduces NeurDB, an AI-powered autonomous database.
NeurDB deepens the fusion of AI and databases with adaptability to data and workload drift.
Empirical evaluations demonstrate that NeurDB substantially outperforms existing solutions in managing AI analytics tasks.
- Score: 27.13518136879994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Databases are increasingly embracing AI to provide autonomous system optimization and intelligent in-database analytics, aiming to relieve end-user burdens across various industry sectors. Nonetheless, most existing approaches fail to account for the dynamic nature of databases, which renders them ineffective for real-world applications characterized by evolving data and workloads. This paper introduces NeurDB, an AI-powered autonomous database that deepens the fusion of AI and databases with adaptability to data and workload drift. NeurDB establishes a new in-database AI ecosystem that seamlessly integrates AI workflows within the database. This integration enables efficient and effective in-database AI analytics and fast-adaptive learned system components. Empirical evaluations demonstrate that NeurDB substantially outperforms existing solutions in managing AI analytics tasks, with the proposed learned components more effectively handling environmental dynamism than state-of-the-art approaches.
Related papers
- LAMBDA: A Large Model Based Data Agent [7.240586338370509]
We introduce LArge Model Based Data Agent (LAMBDA), a novel open-source, code-free multi-agent data analysis system.
LAMBDA is designed to address data analysis challenges in complex data-driven applications.
It has the potential to enhance data analysis paradigms by seamlessly integrating human and artificial intelligence.
(2024-07-24T06:26:36Z)
- NeurDB: An AI-powered Autonomous Data System [44.14807794638682]
We present NeurDB, an AI-powered autonomous data system designed to fully embrace AI design in each major system component.
We outline the conceptual and architectural overview of NeurDB, discuss its design choices and key components, and report its current development and future plan.
(2024-05-07T00:51:48Z)
- Powering In-Database Dynamic Model Slicing for Structured Data Analytics [31.360239181279525]
We introduce LEADS, a novel dynamic model slicing technique that customizes models for specified SQL queries.
LEADS improves the predictive modeling of structured data via a mixture of experts (MoE) and maintains efficiency through a SQL-aware gating network.
Our experiments on real-world datasets demonstrate that LEADS consistently outperforms the baseline models.
(2024-05-01T15:18:12Z)
- Automated Fusion of Multimodal Electronic Health Records for Better Medical Predictions [48.0590120095748]
We propose a novel neural architecture search (NAS) framework named AutoFM, which can automatically search for the optimal model architectures for encoding diverse input modalities and fusion strategies.
We conduct thorough experiments on real-world multi-modal EHR data and prediction tasks, and the results demonstrate that our framework achieves significant performance improvement over existing state-of-the-art methods.
(2024-01-20T15:14:14Z)
- Efficient Architecture Search via Bi-level Data Pruning [70.29970746807882]
This work pioneers an exploration into the critical role of dataset characteristics for DARTS bi-level optimization.
We introduce a new progressive data pruning strategy, Bi-level Data Pruning (BDP), that uses supernet prediction dynamics as its pruning metric.
Comprehensive evaluations on the NAS-Bench-201 search space, DARTS search space, and MobileNet-like search space validate that BDP reduces search costs by over 50%.
(2023-12-21T02:48:44Z)
- A Scalable Space-efficient In-database Interpretability Framework for Embedding-based Semantic SQL Queries [3.0938904602244346]
We introduce a new co-occurrence based interpretability approach to capture relationships between relational entities.
Our approach provides both query-agnostic (global) and query-specific (local) interpretabilities.
(2023-02-23T17:18:40Z)
- Analytical Engines With Context-Rich Processing: Towards Efficient Next-Generation Analytics [12.317930859033149]
We envision an analytical engine co-optimized with components that enable context-rich analysis.
We aim for holistic, pipeline-wide cost- and rule-based optimization across relational and model-based operators.
(2022-12-14T21:46:33Z)
- Distributed intelligence on the Edge-to-Cloud Continuum: A systematic literature review [62.997667081978825]
This review aims to provide a comprehensive view of the main state-of-the-art libraries and frameworks for machine learning and data analytics available today.
The main simulation, emulation, deployment systems, and testbeds for experimental research on the Edge-to-Cloud Continuum available today are also surveyed.
(2022-04-29T08:06:05Z)
- Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming [77.38174112525168]
We present Nemo, an end-to-end interactive weak supervision (WS) system that improves the overall productivity of the WS learning pipeline by an average of 20% (and up to 47% in one task) compared to the prevailing WS approach.
(2022-03-02T19:57:32Z)
- Unsupervised Domain Adaptive Learning via Synthetic Data for Person Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained increasing attention due to its widespread applications in video surveillance.
Unfortunately, mainstream deep learning methods still require large quantities of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
(2021-09-12T15:51:41Z)