The Tensor Data Platform: Towards an AI-centric Database System
- URL: http://arxiv.org/abs/2211.02753v1
- Date: Fri, 4 Nov 2022 21:26:16 GMT
- Title: The Tensor Data Platform: Towards an AI-centric Database System
- Authors: Apurva Gandhi, Yuki Asada, Victor Fu, Advitya Gemawat, Lihao Zhang,
Rathijit Sen, Carlo Curino, Jes\'us Camacho-Rodr\'iguez, Matteo Interlandi
- Abstract summary: We make the case that it is time to do the same for AI -- but with a twist!
We claim that achieving a truly AI-centric database requires moving the engine, at its core, from a relational to a tensor abstraction.
This allows us to: (1) support multi-modal data processing such as images, videos, audio, text as well as relational; (2) leverage the wellspring of innovation in HW and runtimes for tensor computation; and (3) exploit automatic differentiation to enable a novel class of "trainable" queries that can learn to perform a task.
- Score: 6.519203713828565
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Database engines have historically absorbed many of the innovations in data
processing, adding features to process graph data, XML, object oriented, and
text among many others. In this paper, we make the case that it is time to do
the same for AI -- but with a twist! While existing approaches have tried to
achieve this by integrating databases with external ML tools, in this paper we
claim that achieving a truly AI-centric database requires moving the DBMS
engine, at its core, from a relational to a tensor abstraction. This allows us
to: (1) support multi-modal data processing such as images, videos, audio, text
as well as relational; (2) leverage the wellspring of innovation in HW and
runtimes for tensor computation; and (3) exploit automatic differentiation to
enable a novel class of "trainable" queries that can learn to perform a task.
To support the above scenarios, we introduce TDP: a system that builds upon
our prior work mapping relational queries to tensors. Thanks to a tighter
integration with the tensor runtime, TDP is able to provide a broader coverage
of new emerging scenarios requiring access to multi-modal data and automatic
differentiation.
Related papers
- Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering.
Spider2-V features real-world tasks in authentic computer environments and incorporating 20 enterprise-level professional applications.
These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z) - Accelerated Cloud for Artificial Intelligence (ACAI) [24.40451195277244]
We propose an end-to-end cloud-based machine learning platform, Accelerated Cloud for AI (ACAI)
ACAI enables cloud-based storage of indexed, labeled, and searchable data, as well as automatic resource provisioning, job scheduling, and experiment tracking.
We show that our auto-provisioner produces a 1.7x speed-up and 39% cost reduction, and our system reduces experiment time for ML scientists by 20% on typical ML use cases.
arXiv Detail & Related papers (2024-01-30T07:09:48Z) - Relational Deep Learning: Graph Representation Learning on Relational
Databases [69.7008152388055]
We introduce an end-to-end representation approach to learn on data laid out across multiple tables.
Message Passing Graph Neural Networks can then automatically learn across the graph to extract representations that leverage all data input.
arXiv Detail & Related papers (2023-12-07T18:51:41Z) - Saturn: An Optimized Data System for Large Model Deep Learning Workloads [6.377812618046872]
We tackle SPASE: Select a Parallelism, Allocate resources, and SchedulE.
We propose a new information system architecture to tackle the SPASE problem holistically.
We find that direct use of an MILP-solver is significantly more effective than several baselines.
arXiv Detail & Related papers (2023-09-03T17:19:11Z) - SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z) - Unsupervised Domain Adaptive Learning via Synthetic Data for Person
Re-identification [101.1886788396803]
Person re-identification (re-ID) has gained more and more attention due to its widespread applications in video surveillance.
Unfortunately, the mainstream deep learning methods still need a large quantity of labeled data to train models.
In this paper, we develop a data collector to automatically generate synthetic re-ID samples in a computer game, and construct a data labeler to simultaneously annotate them.
arXiv Detail & Related papers (2021-09-12T15:51:41Z) - Unsupervised Paraphrasing with Pretrained Language Models [85.03373221588707]
We propose a training pipeline that enables pre-trained language models to generate high-quality paraphrases in an unsupervised setting.
Our recipe consists of task-adaptation, self-supervision, and a novel decoding algorithm named Dynamic Blocking.
We show with automatic and human evaluations that our approach achieves state-of-the-art performance on both the Quora Question Pair and the ParaNMT datasets.
arXiv Detail & Related papers (2020-10-24T11:55:28Z) - Partially-Aligned Data-to-Text Generation with Distant Supervision [69.15410325679635]
We propose a new generation task called Partially-Aligned Data-to-Text Generation (PADTG)
It is more practical since it utilizes automatically annotated data for training and thus considerably expands the application domains.
Our framework outperforms all baseline models as well as verify the feasibility of utilizing partially-aligned data.
arXiv Detail & Related papers (2020-10-03T03:18:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.