A Survey on Semantics in Automated Data Science
- URL: http://arxiv.org/abs/2205.08018v1
- Date: Mon, 16 May 2022 23:16:09 GMT
- Title: A Survey on Semantics in Automated Data Science
- Authors: Udayan Khurana and Kavitha Srinivas and Horst Samulowitz
- Abstract summary: Data Scientists leverage common sense reasoning and domain knowledge to understand and enrich data for building predictive models.
We discuss how leveraging basic semantic reasoning on data in combination with novel tools for data science automation can help with consistent and explainable data augmentation and transformation.
- Score: 14.331183226753547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data Scientists leverage common sense reasoning and domain knowledge to
understand and enrich data for building predictive models. In recent years, we
have witnessed a surge in tools and techniques for automated machine
learning. While data scientists can employ various such tools to help with
model building, many other aspects, such as feature engineering, that
require semantic understanding of concepts remain manual to a large extent. In
this paper we discuss important shortcomings of current automated data science
solutions and machine learning. We discuss how leveraging basic semantic
reasoning on data in combination with novel tools for data science automation
can help with consistent and explainable data augmentation and transformation.
Moreover, semantics can assist data scientists in a new manner by helping with
challenges related to trust, bias, and explainability.
Related papers
- Physical Consistency Bridges Heterogeneous Data in Molecular Multi-Task Learning [79.75718786477638]
We exploit the specialty of molecular tasks that there are physical laws connecting them, and design consistency training approaches.
We demonstrate that the more accurate energy data can improve the accuracy of structure prediction.
We also find that consistency training can directly leverage force and off-equilibrium structure data to improve structure prediction.
arXiv Detail & Related papers (2024-10-14T03:11:33Z) - The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z) - Semantically Aligned Question and Code Generation for Automated Insight Generation [20.795381712667034]
We leverage the semantic knowledge of large language models to generate targeted and insightful questions about data.
We show that embeddings can be effectively used for filtering out semantically unaligned pairs of question and code.
arXiv Detail & Related papers (2024-03-21T10:01:05Z) - On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z) - Interpretable Machine Learning for Discovery: Statistical Challenges & Opportunities [1.2891210250935146]
We discuss and review the field of interpretable machine learning.
We outline the types of discoveries that can be made using Interpretable Machine Learning.
We focus on the grand challenge of how to validate these discoveries in a data-driven manner.
arXiv Detail & Related papers (2023-08-02T23:57:31Z) - A Vision for Semantically Enriched Data Science [19.604667287258724]
Key areas such as utilizing domain knowledge and data semantics are areas where we have seen little automation.
We envision how leveraging "semantic" understanding and reasoning on data in combination with novel tools for data science automation can help with consistent and explainable data augmentation and transformation.
arXiv Detail & Related papers (2023-03-02T16:03:12Z) - Constructing Effective Machine Learning Models for the Sciences: A Multidisciplinary Perspective [77.53142165205281]
We show how flexible non-linear solutions will not always improve upon manually adding transforms and interactions between variables to linear regression models.
We discuss how to recognize this before constructing a data-driven model and how such analysis can help us move to intrinsically interpretable regression models.
arXiv Detail & Related papers (2022-11-21T17:48:44Z) - Automating Data Science: Prospects and Challenges [30.4496620661692]
Automation in data science aims to facilitate and transform the work of data scientists, not to replace them.
Important parts of data science are already being automated, especially in the modeling stages.
Other aspects are harder to automate, not only because of technological challenges, but because open-ended and context-dependent tasks require human interaction.
arXiv Detail & Related papers (2021-05-12T14:34:35Z) - Synthetic Data: Opening the data floodgates to enable faster, more directed development of machine learning methods [96.92041573661407]
Many ground-breaking advancements in machine learning can be attributed to the availability of a large volume of rich data.
Many large-scale datasets are highly sensitive, such as healthcare data, and are not widely available to the machine learning community.
Generating synthetic data with privacy guarantees provides one such solution.
arXiv Detail & Related papers (2020-12-08T17:26:10Z) - Principles and Practice of Explainable Machine Learning [12.47276164048813]
This report focuses on data-driven methods -- machine learning (ML) and pattern recognition models in particular.
With the increasing prevalence and complexity of methods, business stakeholders at the very least have a growing number of concerns about the drawbacks of models.
We have undertaken a survey to help industry practitioners understand the field of explainable machine learning better.
arXiv Detail & Related papers (2020-09-18T14:50:27Z) - Learning Predictive Models From Observation and Interaction [137.77887825854768]
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works.
However, learning a model that captures the dynamics of complex skills represents a major challenge.
We propose a method to augment the training set with observational data of other agents, such as humans.
arXiv Detail & Related papers (2019-12-30T01:10:41Z)