Human-Machine Collaboration for Democratizing Data Science
- URL: http://arxiv.org/abs/2004.11113v1
- Date: Thu, 23 Apr 2020 12:50:52 GMT
- Title: Human-Machine Collaboration for Democratizing Data Science
- Authors: Clément Gautrais, Yann Dauxais, Stefano Teso, Samuel Kolb, Gust Verbruggen, Luc De Raedt
- Abstract summary: VisualSynth relies on the user providing colored sketches, i.e., coloring parts of the spreadsheet, to partially specify data science tasks.
It performs various data analysis tasks, including data wrangling, data selection, clustering, constraint learning, predictive modeling and auto-completion.
- Score: 23.385646192087922
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Everybody wants to analyse their data, but only a few possess the data science expertise to do this. Motivated by this observation, we introduce a novel framework and system, VisualSynth, for human-machine collaboration in data science.
It aims to democratize data science by allowing users to interact with standard spreadsheet software in order to perform and automate various data analysis tasks, including data wrangling, data selection, clustering, constraint learning, predictive modeling and auto-completion.
VisualSynth relies on the user providing colored sketches, i.e., coloring parts of the spreadsheet, to partially specify data science tasks, which are then determined and executed using artificial intelligence techniques.
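The abstract leaves open how a coloring is turned into a concrete task. Purely as an illustration, and not as VisualSynth's actual algorithm, the sketch below assumes that colored rows act as seeds for clustering: rows the user painted keep their color as hard constraints, and each uncolored row repeatedly takes the color whose centroid is nearest (a constrained seeded k-means). The sample rows, the `SKETCH` mapping and the `complete_coloring` function are hypothetical.

```python
from math import dist

# Hypothetical spreadsheet: each row is a tuple of numeric cell values.
ROWS = [
    (1.0, 2.0),
    (1.2, 1.8),
    (8.0, 9.0),
    (8.5, 8.7),
    (0.9, 2.2),
    (7.9, 9.3),
]

# Hypothetical "colored sketch": the user painted only rows 0 and 2.
# Keys are row indices, values are the color the user chose.
SKETCH = {0: "red", 2: "blue"}


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def complete_coloring(rows, sketch, max_iter=10):
    """Constrained seeded k-means (an assumed interpretation, not VisualSynth's
    method): sketched rows keep their color, uncolored rows take the color of
    the nearest centroid until no row changes color."""
    labels = dict(sketch)  # start from the user's partial coloring
    colors = sorted(set(sketch.values()))
    for _ in range(max_iter):
        # One centroid per color; every color has at least its sketched rows.
        centers = {
            c: centroid([rows[i] for i, lab in labels.items() if lab == c])
            for c in colors
        }
        new_labels = dict(sketch)  # sketched rows are hard constraints
        for i, row in enumerate(rows):
            if i not in sketch:
                new_labels[i] = min(colors, key=lambda c: dist(row, centers[c]))
        if new_labels == labels:
            break  # converged: no row changed color
        labels = new_labels
    return [labels[i] for i in range(len(rows))]


print(complete_coloring(ROWS, SKETCH))
# → ['red', 'red', 'blue', 'blue', 'red', 'blue']
```

Keeping the sketched rows fixed mirrors the idea of a partial specification: the user's colors are never overridden, only extended to the rest of the sheet.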
Related papers
- SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents [49.54155332262579]
We release a new entity and relation extraction dataset for entities related to datasets, methods, and tasks in scientific articles.
Our dataset contains 106 manually annotated full-text scientific publications with over 24k entities and 12k relations.
(2024-10-28T15:56:49Z)
- Approaching Metaheuristic Deep Learning Combos for Automated Data Mining [0.5419570023862531]
This work proposes a means of combining meta-heuristic methods with conventional classifiers and neural networks in order to perform automated data mining.
Experiments on the MNIST dataset for handwritten digit recognition were performed.
It was empirically observed that using a ground truth labeled dataset's validation accuracy is inadequate for correcting labels of other previously unseen data instances.
(2024-10-16T10:28:22Z)
- Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative for training machine learning models.
However, ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
(2023-10-25T20:32:02Z)
- Privacy-Preserving Graph Machine Learning from Data to Computation: A Survey [67.7834898542701]
We focus on reviewing privacy-preserving techniques of graph machine learning.
We first review methods for generating privacy-preserving graph data.
Then we describe methods for transmitting privacy-preserved information.
(2023-07-10T04:30:23Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
(2023-05-19T10:11:21Z)
- A Vision for Semantically Enriched Data Science [19.604667287258724]
Key areas such as utilizing domain knowledge and data semantics have seen little automation.
We envision how leveraging "semantic" understanding and reasoning on data in combination with novel tools for data science automation can help with consistent and explainable data augmentation and transformation.
(2023-03-02T16:03:12Z)
- Privacy-Preserving Machine Learning for Collaborative Data Sharing via Auto-encoder Latent Space Embeddings [57.45332961252628]
Privacy-preserving machine learning in data-sharing processes is an ever-critical task.
This paper presents an innovative framework that uses representation learning via autoencoders to generate privacy-preserving embedded data.
(2022-11-10T17:36:58Z)
- Data+Shift: Supporting visual investigation of data distribution shifts by data scientists [1.6311150636417262]
Data+Shift is a visual analytics tool to support data scientists in the task of investigating the underlying factors of shift in data features.
We validated our approach with a think-aloud experiment where a data scientist used the tool for a fraud detection use case.
(2022-04-29T11:50:25Z)
- AutoDS: Towards Human-Centered Automation of Data Science [20.859067294445985]
This paper introduces AutoDS, an automated machine learning (AutoML) system to support data science projects.
As expected, AutoDS improves productivity; yet, surprisingly, we find that the models produced by the AutoDS group have higher quality and fewer errors, but lower human confidence scores.
(2021-01-13T08:35:14Z)
- Learning Predictive Models From Observation and Interaction [137.77887825854768]
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works.
However, learning a model that captures the dynamics of complex skills represents a major challenge.
We propose a method to augment the training set with observational data of other agents, such as humans.
(2019-12-30T01:10:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.