Architecting Data-Intensive Applications : From Data Architecture Design
to Its Quality Assurance
- URL: http://arxiv.org/abs/2401.12011v3
- Date: Sat, 9 Mar 2024 14:16:39 GMT
- Title: Architecting Data-Intensive Applications : From Data Architecture Design
to Its Quality Assurance
- Authors: Moamin Abughazala
- Abstract summary: Data Architecture is crucial in describing, collecting, storing, processing, and analyzing data to meet business needs.
We have evaluated the DAT on more than five cases within various industry domains, demonstrating its exceptional adaptability and effectiveness.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Context - The exponential growth of data is becoming a significant concern.
Managing this data has become incredibly challenging, especially when dealing
with various sources in different formats and speeds. Moreover, ensuring data
quality has become increasingly crucial for effective decision-making and
operational processes. Data Architecture is crucial in describing, collecting,
storing, processing, and analyzing data to meet business needs. Providing an
abstract view of data-intensive applications is essential to ensure that the
data is transformed into valuable information. We must take these challenges
seriously to ensure we can effectively manage and use the data to our
advantage. Objective - To establish an architecture framework that enables a
comprehensive description of the data architecture and effectively streamlines
data quality monitoring. Method - The architecture framework utilizes Model
Driven Engineering (MDE) techniques. Its support for data-intensive architecture
descriptions enables the automated generation of data quality checks.
Result - The framework offers a comprehensive solution for data-intensive
applications to model their architecture efficiently and monitor the quality of
their data. It automates the entire process and ensures precision and
consistency in data. With DAT, architects and analysts gain access to a
powerful tool that simplifies their workflow and empowers them to make informed
decisions based on reliable data insights. Conclusion - We have evaluated the
DAT on more than five cases within various industry domains, demonstrating its
exceptional adaptability and effectiveness.
Related papers
- Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models [79.65071553905021]
We propose Data Advisor, a method for generating data that takes into account the characteristics of the desired dataset.
Data Advisor monitors the status of the generated data, identifies weaknesses in the current dataset, and advises the next iteration of data generation.
arXiv Detail & Related papers (2024-10-07T17:59:58Z)
- DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? [58.330879414174476]
We introduce DSBench, a benchmark designed to evaluate data science agents with realistic tasks.
This benchmark includes 466 data analysis tasks and 74 data modeling tasks, sourced from Eloquence and Kaggle competitions.
Our evaluation of state-of-the-art LLMs, LVLMs, and agents shows that they struggle with most tasks, with the best agent solving only 34.12% of data analysis tasks and achieving a 34.74% Relative Performance Gap (RPG).
arXiv Detail & Related papers (2024-09-12T02:08:00Z)
- Best Practices and Lessons Learned on Synthetic Data [83.63271573197026]
The success of AI models relies on the availability of large, diverse, and high-quality datasets.
Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns.
arXiv Detail & Related papers (2024-04-11T06:34:17Z)
- An Integrated Data Processing Framework for Pretraining Foundation Models [57.47845148721817]
Researchers and practitioners often have to manually curate datasets from different sources.
We propose a data processing framework that integrates a Processing Module and an Analyzing Module.
The proposed framework is easy to use and highly flexible.
arXiv Detail & Related papers (2024-02-26T07:22:51Z)
- Architectural Design Decisions for Self-Serve Data Platforms in Data Meshes [3.627365672061558]
Data mesh is an emerging decentralized approach to managing and generating value from analytical enterprise data at scale.
It shifts the ownership of the data to the business domains closest to the data, promotes sharing and managing data as autonomous products, and uses a federated and automated data governance model.
The data mesh relies on a managed data platform that offers services to domain and governance teams to build, share, and manage data products efficiently.
arXiv Detail & Related papers (2024-02-07T09:13:26Z)
- Modeling Data Analytics Architecture for Smart Cities Data-Driven Applications using DAT [1.8945921149936187]
This article shares our experiences in developing a Data Analytics Architecture (DAA) using model-driven engineering for Data-Driven Smart Cities applications utilizing DAT.
DAA uses model-driven engineering for Data-Driven Smart Cities applications utilizing DAT.
arXiv Detail & Related papers (2023-07-17T21:52:57Z)
- Data Architecture for Digital Object Space Management Service (DOSM) using DAT [1.8945921149936187]
This work focuses on describing the movement of data, data formats, data location, data processing (batch or real-time), data storage technologies, and main operations on the data.
Data architecture is a complex task that involves describing the flow of data from its source to its destination.
arXiv Detail & Related papers (2023-06-22T14:22:56Z)
- DAT: Data Architecture Modeling Tool for Data-Driven Applications [1.6037279419318131]
Data Architecture (DA) focuses on describing, collecting, storing, processing, and analyzing the data to meet business needs.
We present the DAT, a model-driven engineering tool enabling data architects, data engineers, and other stakeholders to describe how data flows through the system.
arXiv Detail & Related papers (2023-06-21T11:24:59Z)
- CateCom: a practical data-centric approach to categorization of computational models [77.34726150561087]
We present an effort aimed at organizing the landscape of physics-based and data-driven computational models.
We apply object-oriented design concepts and outline the foundations of an open-source collaborative framework.
arXiv Detail & Related papers (2021-09-28T02:59:40Z)
- AdaXpert: Adapting Neural Architecture for Growing Data [63.30393509048505]
In real-world applications, data often come in a growing manner, where the data volume and the number of classes may increase dynamically.
Given the increasing data volume or the number of classes, one has to instantaneously adjust the neural model capacity to obtain promising performance.
Existing methods either ignore the growing nature of data or seek to independently search an optimal architecture for a given dataset.
arXiv Detail & Related papers (2021-07-01T07:22:05Z)
- Towards an Integrated Platform for Big Data Analysis [4.5257812998381315]
This paper presents the vision of an integrated platform for big data analysis that combines all these aspects.
The main benefits of this approach are enhanced scalability of the whole platform, better parameterization of algorithms, and improved usability during the end-to-end data analysis process.
arXiv Detail & Related papers (2020-04-27T03:15:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.