A Novel Micro-service Based Platform for Composition, Deployment and Execution of BDA Applications
- URL: http://arxiv.org/abs/2202.02845v1
- Date: Sun, 6 Feb 2022 20:36:17 GMT
- Title: A Novel Micro-service Based Platform for Composition, Deployment and Execution of BDA Applications
- Authors: Davide Profeta, Nicola Masi, Domenico Messina, Davide Dalle Carbonare, Susanna Bonura, Vito Morreale
- Abstract summary: ALIDA aims to achieve a unified platform that allows both BDA application developers and data analysts to interact with it.
Developers will be able to register new BDA applications through the exposed API and/or through the web user interface.
Data analysts will be able to use the BDA applications provided to create batch/stream workflows through a dashboard user interface to manipulate and subsequently visualize results from one or more sources.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Big Data are growing at an exponential rate, and tools and
technologies are needed to manage, process and visualize them in order to
extract value. In this paper, a micro-service based platform is presented for
the composition, deployment and execution of Big Data Analytics (BDA)
application workflows in several domains and scenarios. ALIDA is a
result coming from previous research activities by ENGINEERING. It aims to
achieve a unified platform that allows both BDA application developers and data
analysts to interact with it. Developers will be able to register new BDA
applications through the exposed API and/or through the web user interface.
Data analysts will be able to use the BDA applications provided to create
batch/stream workflows through a dashboard user interface to manipulate and
subsequently visualize results from one or more sources. The platform also
supports auto-tuning of the deployment properties of Big Data frameworks to
improve metrics for analytics applications. ALIDA has been extended and
integrated into a software solution for the analysis of large amounts of data
from the avionic industries. A use case within this context is then presented.
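The abstract describes two roles: developers register BDA applications, and data analysts compose the registered applications into batch/stream workflows. The following is a minimal Python sketch of that idea, assuming a workflow can be modelled as a directed acyclic graph of registered applications. The names `BDAApp`, `Registry` and `execution_order` are hypothetical illustrations, not ALIDA's actual API, which the abstract does not describe.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class BDAApp:
    """A registered BDA application (hypothetical model, not ALIDA's API)."""
    name: str
    mode: str  # assumed to be either "batch" or "stream"


@dataclass
class Registry:
    """Hypothetical catalogue that developers register applications into."""
    apps: dict[str, BDAApp] = field(default_factory=dict)

    def register(self, app: BDAApp) -> None:
        if app.mode not in ("batch", "stream"):
            raise ValueError(f"unknown mode: {app.mode}")
        if app.name in self.apps:
            raise ValueError(f"already registered: {app.name}")
        self.apps[app.name] = app


def execution_order(registry: Registry, edges: list[tuple[str, str]]) -> list[str]:
    """Return a valid run order for a workflow of (upstream, downstream) edges."""
    graph: dict[str, set[str]] = {}
    for up, down in edges:
        # Only registered applications may appear in a workflow.
        for name in (up, down):
            if name not in registry.apps:
                raise KeyError(f"unregistered application: {name}")
        # TopologicalSorter expects each node mapped to its predecessors.
        graph.setdefault(down, set()).add(up)
        graph.setdefault(up, set())
    # Raises graphlib.CycleError (a ValueError subclass) if not a DAG.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    reg = Registry()
    for name in ("ingest", "clean", "train", "visualize"):
        reg.register(BDAApp(name, "batch"))
    order = execution_order(
        reg, [("ingest", "clean"), ("clean", "train"), ("train", "visualize")]
    )
    print(order)  # ['ingest', 'clean', 'train', 'visualize']
```

Rejecting cyclic workflows at composition time, rather than at execution time, is one plausible design choice; the paper may validate workflows differently.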
Related papers
- Towards an Integrated Performance Framework for Fire Science and Management Workflows [0.0]
This paper presents an artificial intelligence and machine learning (AI/ML) approach to performance assessment and optimization.
An associated early AI/ML framework spanning performance data collection, prediction and optimization is applied to wildfire science applications.
(2024-07-30T22:37:25Z)
- LAMBDA: A Large Model Based Data Agent [7.240586338370509]
We introduce LArge Model Based Data Agent (LAMBDA), a novel open-source, code-free multi-agent data analysis system.
LAMBDA is designed to address data analysis challenges in complex data-driven applications.
It has the potential to enhance data analysis paradigms by seamlessly integrating human and artificial intelligence.
(2024-07-24T06:26:36Z)
- Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering.
Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications.
These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
(2024-07-15T17:54:37Z)
- OpenDataLab: Empowering General Artificial Intelligence with Open Datasets [53.22840149601411]
This paper introduces OpenDataLab, a platform designed to bridge the gap between diverse data sources and the need for unified data processing.
OpenDataLab integrates a wide range of open-source AI datasets and enhances data acquisition efficiency through intelligent querying and high-speed downloading services.
We anticipate that OpenDataLab will significantly boost artificial general intelligence (AGI) research and facilitate advancements in related AI fields.
(2024-06-04T10:42:01Z)
- EDA Corpus: A Large Language Model Dataset for Enhanced Interaction with OpenROAD [0.2581187101462483]
We present an open-source dataset tailored for OpenROAD, a widely adopted open-source EDA toolchain.
The dataset features over 1000 data points and is structured in two formats: (i) a pairwise set of question prompts with prose answers, and (ii) a pairwise set of code prompts with their corresponding OpenROAD scripts.
(2024-05-04T21:29:37Z)
- Collaborative business intelligence virtual assistant [1.9953434933575993]
This study focuses on the applications of data mining within distributed virtual teams through the interaction of users and a CBI Virtual Assistant.
The proposed virtual assistant for CBI aims to make data exploration accessible to a wider range of users and to reduce the time and effort required for data analysis.
(2023-12-20T05:34:12Z)
- GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration [97.68234051078997]
We discuss how Pyserini can be integrated with the Hugging Face ecosystem of open-source AI libraries and artifacts.
We include a Jupyter Notebook-based walkthrough of the core interoperability features, available on GitHub.
We present GAIA Search - a search engine built following previously laid out principles, giving access to four popular large-scale text collections.
(2023-06-02T12:09:59Z)
- Demonstration of InsightPilot: An LLM-Empowered Automated Data Exploration System [48.62158108517576]
We introduce InsightPilot, an automated data exploration system designed to simplify the data exploration process.
InsightPilot automatically selects appropriate analysis intents, such as understanding, summarizing, and explaining.
In brief, an IQuery is an abstraction and automation of data analysis operations, which mimics the approach of data analysts.
(2023-04-02T07:27:49Z)
- Modular approach to data preprocessing in ALOHA and application to a smart industry use case [0.0]
The paper addresses a modular approach, integrated into the ALOHA tool flow, to support the data preprocessing and transformation pipeline.
To demonstrate the effectiveness of the approach, we present some experimental results related to a keyword spotting use case.
(2021-02-02T06:48:51Z)
- Bandit Data-Driven Optimization [62.01362535014316]
There are four major pain points that a machine learning pipeline must overcome in order to be useful in settings.
We introduce bandit data-driven optimization, the first iterative prediction-prescription framework to address these pain points.
We propose PROOF, a novel algorithm for this framework, and formally prove that it is no-regret.
(2020-08-26T17:50:49Z)
- PyODDS: An End-to-end Outlier Detection System with Automated Machine Learning [55.32009000204512]
We present PyODDS, an automated end-to-end Python system for Outlier Detection with Database Support.
Specifically, we define the search space in the outlier detection pipeline, and produce a search strategy within the given search space.
It also provides unified interfaces and visualizations for users with or without data science or machine learning background.
(2020-03-12T03:30:30Z)