AutoDS: Towards Human-Centered Automation of Data Science
- URL: http://arxiv.org/abs/2101.05273v1
- Date: Wed, 13 Jan 2021 08:35:14 GMT
- Title: AutoDS: Towards Human-Centered Automation of Data Science
- Authors: Dakuo Wang, Josh Andres, Justin Weisz, Erick Oduor, Casey Dugan
- Abstract summary: This paper introduces AutoDS, an automated machine learning (AutoML) system to support data science projects.
As expected, AutoDS improves productivity; yet surprisingly, we find that the models produced by the AutoDS group have higher quality and fewer errors, but lower human confidence scores.
- Score: 20.859067294445985
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data science (DS) projects often follow a lifecycle that consists of
laborious tasks for data scientists and domain experts (e.g., data exploration,
model training, etc.). Only recently have machine learning (ML) researchers
developed promising automation techniques to aid data workers in these
tasks. This paper introduces AutoDS, an automated machine learning (AutoML)
system that aims to leverage the latest ML automation techniques to support
data science projects. Data workers only need to upload their dataset; the
system can then automatically suggest ML configurations, preprocess data, select
an algorithm, and train the model. These suggestions are presented to the user via
a web-based graphical user interface and a notebook-based programming user
interface.
We studied AutoDS with 30 professional data scientists, where one group used
AutoDS, and the other did not, to complete a data science project. As expected,
AutoDS improves productivity; yet surprisingly, we find that the models
produced by the AutoDS group have higher quality and fewer errors, but lower
human confidence scores. We reflect on the findings by presenting design
implications for incorporating automation techniques into human work in the
data science lifecycle.
Related papers
- Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? [73.81908518992161]
We introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering.
Spider2-V features real-world tasks in authentic computer environments and incorporates 20 enterprise-level professional applications.
These tasks evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems.
arXiv Detail & Related papers (2024-07-15T17:54:37Z)
- The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
- Assessing the Use of AutoML for Data-Driven Software Engineering [10.40771687966477]
AutoML promises to automate the building of end-to-end AI/ML pipelines.
Despite the growing interest and high expectations, there is a dearth of information about the extent to which AutoML is currently adopted.
arXiv Detail & Related papers (2023-07-20T11:14:24Z)
- AutoML-GPT: Automatic Machine Learning with GPT [74.30699827690596]
We propose developing task-oriented prompts and automatically utilizing large language models (LLMs) to automate the training pipeline.
We present AutoML-GPT, which employs GPT as the bridge to diverse AI models and dynamically trains models with optimized hyperparameters.
This approach achieves remarkable results in computer vision, natural language processing, and other challenging areas.
arXiv Detail & Related papers (2023-05-04T02:09:43Z)
- OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge Collaborative AutoML System [85.8338446357469]
We introduce OmniForce, a human-centered AutoML system that yields both human-assisted ML and ML-assisted human techniques.
We show how OmniForce can put an AutoML system into practice and build adaptive AI in open-environment scenarios.
arXiv Detail & Related papers (2023-03-01T13:35:22Z)
- A Survey of Machine Unlearning [56.017968863854186]
Recent regulations now require that, on request, private information about a user must be removed from computer systems.
ML models often 'remember' the old data.
Recent works on machine unlearning have not been able to completely solve the problem.
arXiv Detail & Related papers (2022-09-06T08:51:53Z)
- Automated Machine Learning Techniques for Data Streams [91.3755431537592]
This paper surveys the state-of-the-art open-source AutoML tools, applies them to data collected from streams, and measures how their performance changes over time.
The results show that off-the-shelf AutoML tools can provide satisfactory results, but in the presence of concept drift, detection or adaptation techniques have to be applied to maintain predictive accuracy over time.
arXiv Detail & Related papers (2021-06-14T11:42:46Z)
- Automating Data Science: Prospects and Challenges [30.4496620661692]
Automation in data science aims to facilitate and transform the work of data scientists, not to replace them.
Important parts of data science are already being automated, especially in the modeling stages.
Other aspects are harder to automate, not only because of technological challenges, but because open-ended and context-dependent tasks require human interaction.
arXiv Detail & Related papers (2021-05-12T14:34:35Z)
- Fits and Starts: Enterprise Use of AutoML and the Role of Humans in the Loop [4.468952886990851]
AutoML systems can speed up routine data science work and make machine learning available to those without expertise in statistics and computer science.
We conduct interviews with 29 individuals from organizations of different sizes to characterize how they currently use, or intend to use, AutoML systems.
Our findings have implications for the design and implementation of human-in-the-loop visual analytics approaches.
arXiv Detail & Related papers (2021-01-12T04:52:48Z)
- AutoML to Date and Beyond: Challenges and Opportunities [30.60364966752454]
AutoML tools aim to make machine learning accessible for non-machine learning experts.
We introduce a new classification system for AutoML systems.
We lay out a roadmap for the future, pinpointing the research required to further automate the end-to-end machine learning pipeline.
arXiv Detail & Related papers (2020-10-21T06:08:21Z)
- Trust in AutoML: Exploring Information Needs for Establishing Trust in Automated Machine Learning Systems [30.385703521998014]
We report results from three studies to understand the information needs of data scientists for establishing trust in AutoML systems.
We find that model performance metrics and visualizations are the most important information to data scientists when establishing their trust with an AutoML tool.
arXiv Detail & Related papers (2020-01-17T19:50:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.