Automating Data Science: Prospects and Challenges
- URL: http://arxiv.org/abs/2105.05699v1
- Date: Wed, 12 May 2021 14:34:35 GMT
- Title: Automating Data Science: Prospects and Challenges
- Authors: Tijl De Bie, Luc De Raedt, José Hernández-Orallo, Holger H. Hoos,
Padhraic Smyth, Christopher K. I. Williams
- Abstract summary: Automation in data science aims to facilitate and transform the work of data scientists, not to replace them.
Important parts of data science are already being automated, especially in the modeling stages.
Other aspects are harder to automate, not only because of technological challenges, but because open-ended and context-dependent tasks require human interaction.
- Score: 30.4496620661692
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given the complexity of typical data science projects and the associated
demand for human expertise, automation has the potential to transform the data
science process.
Key insights:
* Automation in data science aims to facilitate and transform the work of
data scientists, not to replace them.
* Important parts of data science are already being automated, especially in
the modeling stages, where techniques such as automated machine learning
(AutoML) are gaining traction.
* Other aspects are harder to automate, not only because of technological
challenges, but because open-ended and context-dependent tasks require human
interaction.
Related papers
- Automating the Practice of Science -- Opportunities, Challenges, and Implications [48.54225838534946]
This article evaluates the scope of automation within scientific practice and assesses recent approaches.
By discussing the motivations behind automated science, analyzing the hurdles encountered, and examining its implications, this article invites researchers, policymakers, and stakeholders to navigate the frontier of automated scientific practice.
(2024-08-27T15:51:31Z)
- Automated data processing and feature engineering for deep learning and big data applications: a survey [0.0]
The modern approach to artificial intelligence (AI) aims to design algorithms that learn directly from data.
Not all data processing tasks in conventional deep learning pipelines have been automated.
(2024-03-18T01:07:48Z)
- AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents [109.3804962220498]
AutoRT is a system to scale up the deployment of operational robots in completely unseen scenarios with minimal human supervision.
We demonstrate AutoRT proposing instructions to over 20 robots across multiple buildings and collecting 77k real robot episodes via both teleoperation and autonomous robot policies.
We experimentally show that such "in-the-wild" data collected by AutoRT is significantly more diverse, and that AutoRT's use of LLMs allows for instruction following data collection robots that can align to human preferences.
(2024-01-23T18:45:54Z)
- A Vision for Semantically Enriched Data Science [19.604667287258724]
Key areas such as utilizing domain knowledge and data semantics have seen little automation.
We envision how leveraging "semantic" understanding and reasoning on data in combination with novel tools for data science automation can help with consistent and explainable data augmentation and transformation.
(2023-03-02T16:03:12Z)
- RT-1: Robotics Transformer for Real-World Control at Scale [98.09428483862165]
We present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties.
We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks.
(2022-12-13T18:55:15Z)
- A Survey on Semantics in Automated Data Science [14.331183226753547]
Data scientists leverage common sense reasoning and domain knowledge to understand and enrich data for building predictive models.
We discuss how leveraging basic semantic reasoning on data in combination with novel tools for data science automation can help with consistent and explainable data augmentation and transformation.
(2022-05-16T23:16:09Z)
- Maximizing information from chemical engineering data sets: Applications to machine learning [61.442473332320176]
We identify four characteristics of data arising in chemical engineering applications that make applying classical artificial intelligence approaches difficult.
For each of these data characteristics, we discuss applications where it arises and show how current chemical engineering research is extending the fields of data science and machine learning to address these challenges.
(2022-01-25T01:25:45Z)
- Learning from learning machines: a new generation of AI technology to meet the needs of science [59.261050918992325]
We outline emerging opportunities and challenges to enhance the utility of AI for scientific discovery.
The distinct goals of AI for industry versus the goals of AI for science create tension between identifying patterns in data versus discovering patterns in the world from data.
(2021-11-27T00:55:21Z)
- AutoDS: Towards Human-Centered Automation of Data Science [20.859067294445985]
This paper introduces AutoDS, an automated machine learning (AutoML) system to support data science projects.
As expected, AutoDS improves productivity; yet surprisingly, we find that the models produced by the AutoDS group have higher quality and fewer errors, but lower human confidence scores.
(2021-01-13T08:35:14Z)
- AutoML to Date and Beyond: Challenges and Opportunities [30.60364966752454]
AutoML tools aim to make machine learning accessible for non-machine learning experts.
We introduce a new classification system for AutoML systems.
We lay out a roadmap for the future, pinpointing the research required to further automate the end-to-end machine learning pipeline.
(2020-10-21T06:08:21Z)
- Learning Predictive Models From Observation and Interaction [137.77887825854768]
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works.
However, learning a model that captures the dynamics of complex skills represents a major challenge.
We propose a method to augment the training set with observational data of other agents, such as humans.
(2019-12-30T01:10:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.