What can Data-Centric AI Learn from Data and ML Engineering?
- URL: http://arxiv.org/abs/2112.06439v1
- Date: Mon, 13 Dec 2021 06:40:05 GMT
- Title: What can Data-Centric AI Learn from Data and ML Engineering?
- Authors: Neoklis Polyzotis and Matei Zaharia
- Abstract summary: Data-centric AI is a new and exciting research topic in the AI community.
Many organizations already build and maintain various "data-centric" applications.
We discuss several lessons from data and ML engineering that could be interesting to apply in data-centric AI.
- Score: 17.247372757533185
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data-centric AI is a new and exciting research topic in the AI community, but
many organizations already build and maintain various "data-centric"
applications whose goal is to produce high quality data. These range from
traditional business data processing applications (e.g., "how much should we
charge each of our customers this month?") to production ML systems such as
recommendation engines. The fields of data and ML engineering have arisen in
recent years to manage these applications, and both include many interesting
novel tools and processes. In this paper, we discuss several lessons from data
and ML engineering that could be interesting to apply in data-centric AI, based
on our experience building data and ML platforms that serve thousands of
applications at a range of organizations.
Related papers
- Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering.
Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications.
These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
Date: 2024-07-15T17:54:37Z
- A Systematic Literature Review on the Use of Machine Learning in Software Engineering [0.0]
Guided by its objective and research questions, the study explores the current state of the art in applying machine learning techniques to software engineering processes.
The review identifies the key areas within software engineering where ML has been applied, including software quality assurance, software maintenance, software comprehension, and software documentation.
Date: 2024-06-19T23:04:27Z
- What About the Data? A Mapping Study on Data Engineering for AI Systems [0.0]
There is a growing need for data engineers that know how to prepare data for AI systems.
We found 25 relevant papers, published between January 2019 and June 2023, that explain AI data engineering activities.
This paper creates an overview of the body of knowledge on data engineering for AI.
Date: 2024-02-07T16:31:58Z
- Data-centric Artificial Intelligence: A Survey [47.24049907785989]
Recently, the role of data in AI has been significantly magnified, giving rise to the emerging concept of data-centric AI.
In this survey, we discuss the necessity of data-centric AI, followed by a holistic view of three general data-centric goals.
We believe this is the first comprehensive survey that provides a global view of a spectrum of tasks across various stages of the data lifecycle.
Date: 2023-03-17T17:44:56Z
- OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge Collaborative AutoML System [85.8338446357469]
We introduce OmniForce, a human-centered AutoML system that yields both human-assisted ML and ML-assisted human techniques.
We show how OmniForce can put an AutoML system into practice and build adaptive AI in open-environment scenarios.
Date: 2023-03-01T13:35:22Z
- Privacy Adhering Machine Un-learning in NLP [66.17039929803933]
In the real world, industry uses machine learning to build models on user data.
Privacy mandates that require removing user data demand effort in terms of both data and model retraining.
Continuous removal of data and model retraining steps does not scale.
We propose Machine Unlearning to tackle this challenge.
Date: 2022-12-19T16:06:45Z
- Machine Learning for Software Engineering: A Tertiary Study [13.832268599253412]
Machine learning (ML) techniques increase the effectiveness of software engineering (SE) lifecycle activities.
We systematically collected, quality-assessed, summarized, and categorized 83 reviews in ML for SE published between 2009 and 2022, covering 6,117 primary studies.
The SE areas most tackled with ML are software quality and testing, while human-centered areas appear more challenging for ML.
Date: 2022-11-17T09:19:53Z
- A Survey of Machine Unlearning [56.017968863854186]
Recent regulations now require that, on request, private information about a user be removed from computer systems.
ML models often 'remember' the old data.
Recent works on machine unlearning have not been able to completely solve the problem.
Date: 2022-09-06T08:51:53Z
- Enabling Automated Machine Learning for Model-Driven AI Engineering [60.09869520679979]
We propose a novel approach to enable Model-Driven Software Engineering and Model-Driven AI Engineering.
In particular, we support Automated ML, thus assisting software engineers without deep AI knowledge in developing AI-intensive systems.
Date: 2022-03-06T10:12:56Z
- Towards Productizing AI/ML Models: An Industry Perspective from Data Scientists [10.27276267081559]
The transition from AI/ML models to production-ready AI-based systems is a challenge for both data scientists and software engineers.
In this paper, we report the results of a workshop conducted in a consulting company to understand how this transition is perceived by practitioners.
Date: 2021-03-18T22:25:44Z
- Data Engineering for Everyone [1.2585165426919136]
Data engineering is one of the fastest-growing fields within machine learning (ML).
ML requires more data than individual teams of data engineers can readily produce.
This article shows that open-source data sets are the rocket fuel for research and innovation at even some of the largest AI organizations.
Date: 2021-02-23T01:24:37Z