DAT: Data Architecture Modeling Tool for Data-Driven Applications
- URL: http://arxiv.org/abs/2306.12182v2
- Date: Thu, 22 Jun 2023 17:49:59 GMT
- Title: DAT: Data Architecture Modeling Tool for Data-Driven Applications
- Authors: Moamin Abughazala, Henry Muccini, Mohammad Sharaf
- Abstract summary: Data Architecture (DA) focuses on describing, collecting, storing, processing, and analyzing the data to meet business needs.
We present the DAT, a model-driven engineering tool enabling data architects, data engineers, and other stakeholders to describe how data flows through the system.
- Score: 1.6037279419318131
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data is the key to success for any Data-Driven Organization, and managing it
is considered the most challenging task. Data Architecture (DA) focuses on
describing, collecting, storing, processing, and analyzing the data to meet
business needs. In this tool demo paper, we present the DAT, a model-driven
engineering tool enabling data architects, data engineers, and other
stakeholders to describe how data flows through the system, and providing a
blueprint for managing data that saves time and effort dedicated to Data
Architectures for IoT applications. We evaluated this work by modeling five
case studies and collecting feedback on expressiveness and ease of use from two
companies, more than six researchers, and eighteen undergraduate students from
the software architecture course.
Related papers
- DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? [58.330879414174476]
We introduce DSBench, a benchmark designed to evaluate data science agents with realistic tasks.
This benchmark includes 466 data analysis tasks and 74 data modeling tasks, sourced from Eloquence and Kaggle competitions.
Our evaluation of state-of-the-art LLMs, LVLMs, and agents shows that they struggle with most tasks, with the best agent solving only 34.12% of data analysis tasks and achieving a 34.74% Relative Performance Gap (RPG).
arXiv Detail & Related papers (2024-09-12T02:08:00Z)
- CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation [51.2289822267563]
We propose Corpus Retrieval and Augmentation for Fine-Tuning (CRAFT), a method for generating synthetic datasets.
We use large-scale public web-crawled corpora and similarity-based document retrieval to find other relevant human-written documents.
We demonstrate that CRAFT can efficiently generate large-scale task-specific training datasets for four diverse tasks.
arXiv Detail & Related papers (2024-09-03T17:54:40Z)
- DiscoveryBench: Towards Data-Driven Discovery with Large Language Models [50.36636396660163]
We present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery.
Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering.
Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.
arXiv Detail & Related papers (2024-07-01T18:58:22Z)
- An Integrated Data Processing Framework for Pretraining Foundation Models [57.47845148721817]
Researchers and practitioners often have to manually curate datasets from different sources.
We propose a data processing framework that integrates a Processing Module and an Analyzing Module.
The proposed framework is easy to use and highly flexible.
arXiv Detail & Related papers (2024-02-26T07:22:51Z)
- Architectural Design Decisions for Self-Serve Data Platforms in Data Meshes [3.627365672061558]
Data mesh is an emerging decentralized approach to managing and generating value from analytical enterprise data at scale.
It shifts the ownership of the data to the business domains closest to the data, promotes sharing and managing data as autonomous products, and uses a federated and automated data governance model.
The data mesh relies on a managed data platform that offers services to domain and governance teams to build, share, and manage data products efficiently.
arXiv Detail & Related papers (2024-02-07T09:13:26Z)
- Architecting Data-Intensive Applications: From Data Architecture Design to Its Quality Assurance [0.0]
Data Architecture is crucial in describing, collecting, storing, processing, and analyzing data to meet business needs.
We have evaluated the DAT on more than five cases within various industry domains, demonstrating its exceptional adaptability and effectiveness.
arXiv Detail & Related papers (2024-01-22T14:58:54Z)
- Serving Deep Learning Model in Relational Databases [70.53282490832189]
Serving deep learning (DL) models on relational data has become a critical requirement across diverse commercial and scientific domains.
We highlight three pivotal paradigms: The state-of-the-art DL-centric architecture offloads DL computations to dedicated DL frameworks.
The potential UDF-centric architecture encapsulates one or more tensor computations into User Defined Functions (UDFs) within the relational database management system (RDBMS).
arXiv Detail & Related papers (2023-10-07T06:01:35Z)
- Modeling Data Analytics Architecture for Smart Cities Data-Driven Applications using DAT [1.8945921149936187]
This article shares our experiences in developing a Data Analytics Architecture (DAA) using model-driven engineering for Data-Driven Smart Cities applications utilizing DAT.
arXiv Detail & Related papers (2023-07-17T21:52:57Z)
- Data Architecture for Digital Object Space Management Service (DOSM) using DAT [1.8945921149936187]
This work focuses on describing the movement of data, data formats, data location, data processing (batch or real-time), data storage technologies, and main operations on the data.
Data architecture is a complex task that involves describing the flow of data from its source to its destination.
arXiv Detail & Related papers (2023-06-22T14:22:56Z)
- CateCom: a practical data-centric approach to categorization of computational models [77.34726150561087]
We present an effort aimed at organizing the landscape of physics-based and data-driven computational models.
We apply object-oriented design concepts and outline the foundations of an open-source collaborative framework.
arXiv Detail & Related papers (2021-09-28T02:59:40Z)
- HPTMT Parallel Operators for High Performance Data Science & Data Engineering [0.0]
HPTMT architecture identifies a set of data structures, operators, and an execution model for creating rich data applications.
This paper elaborates and illustrates this architecture using an end-to-end application with deep learning and data engineering parts working together.
arXiv Detail & Related papers (2021-08-13T00:05:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.