The Future of Data Science Education
- URL: http://arxiv.org/abs/2407.11824v1
- Date: Tue, 16 Jul 2024 15:11:54 GMT
- Title: The Future of Data Science Education
- Authors: Brian Wright, Peter Alonzi, Ali Riveria,
- Abstract summary: The School of Data Science at the University of Virginia has developed a novel model for the definition of Data Science.
This paper will present the core features of the model and explain how it unifies various concepts going far beyond the analytics component of AI.
- Score: 0.11566458078238004
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The definition of Data Science is a hotly debated topic. For many, the definition is a simple shortcut to Artificial Intelligence or Machine Learning. However, there is far more depth and nuance to the field of Data Science than a simple shortcut can provide. The School of Data Science at the University of Virginia has developed a novel model for the definition of Data Science. This model is based on identifying a unified understanding of the data work done across all areas of Data Science. It represents a generational leap forward in how we understand and teach Data Science. In this paper we will present the core features of the model and explain how it unifies various concepts going far beyond the analytics component of AI. From this foundation we will present our Undergraduate Major curriculum in Data Science and demonstrate how it prepares students to be well-rounded Data Science team members and leaders. The paper will conclude with an in-depth overview of the Foundations of Data Science course designed to introduce students to the field while also implementing proven STEM oriented pedagogical methods. These include, for example, specifications grading, active learning lectures, guest lectures from industry experts and weekly gamification labs.
Related papers
- DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? [58.330879414174476]
We introduce DSBench, a benchmark designed to evaluate data science agents with realistic tasks.
This benchmark includes 466 data analysis tasks and 74 data modeling tasks, sourced from Eloquence and Kaggle competitions.
Our evaluation of state-of-the-art LLMs, LVLMs, and agents shows that they struggle with most tasks, with the best agent solving only 34.12% of data analysis tasks and achieving a 34.74% Relative Performance Gap (RPG)
arXiv Detail & Related papers (2024-09-12T02:08:00Z) - On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z) - Privacy-Preserving Graph Machine Learning from Data to Computation: A
Survey [67.7834898542701]
We focus on reviewing privacy-preserving techniques of graph machine learning.
We first review methods for generating privacy-preserving graph data.
Then we describe methods for transmitting privacy-preserved information.
arXiv Detail & Related papers (2023-07-10T04:30:23Z) - Defining data science: a new field of inquiry [0.0]
Modern data science is in its infancy. Emerging slowly since 1962 and rapidly since 2000, it is one of the most active, powerful, and rapidly evolving 21st century innovations.
Due to its value, power, and applicability, it is emerging in over 40 disciplines, hundreds of research areas, and thousands of applications.
This research addresses this data science multiple definitions challenge by proposing the development of coherent, unified definition based on a data science reference framework.
arXiv Detail & Related papers (2023-06-28T12:58:42Z) - Position Paper on Dataset Engineering to Accelerate Science [1.952708415083428]
In this work, we will use the token ittextdataset to designate a structured set of data built to perform a well-defined task.
Specifically, in science, each area has unique forms to organize, gather and handle its datasets.
We advocate that science and engineering discovery processes are extreme instances of the need for such organization on datasets.
arXiv Detail & Related papers (2023-03-09T19:07:40Z) - Modeling Information Change in Science Communication with Semantically
Matched Paraphrases [50.67030449927206]
SPICED is the first paraphrase dataset of scientific findings annotated for degree of information change.
SPICED contains 6,000 scientific finding pairs extracted from news stories, social media discussions, and full texts of original papers.
Models trained on SPICED improve downstream performance on evidence retrieval for fact checking of real-world scientific claims.
arXiv Detail & Related papers (2022-10-24T07:44:38Z) - Opinionated practices for teaching reproducibility: motivation, guided
instruction and practice [0.0]
Predictive modelling is often one of the most interesting topics to novices in data science.
Students are not as intrinsically motivated to learn this topic, and it is not an easy one for them to learn.
Providing extra motivation, guided instruction and lots of practice are key to effectively teaching this topic.
arXiv Detail & Related papers (2021-09-17T19:15:41Z) - REGRAD: A Large-Scale Relational Grasp Dataset for Safe and
Object-Specific Robotic Grasping in Clutter [52.117388513480435]
We present a new dataset named regrad to sustain the modeling of relationships among objects and grasps.
Our dataset is collected in both forms of 2D images and 3D point clouds.
Users are free to import their own object models for the generation of as many data as they want.
arXiv Detail & Related papers (2021-04-29T05:31:21Z) - Computational Skills by Stealth in Secondary School Data Science [16.960800464621993]
We discuss a proposal for the stealth development of computational skills in students' first exposure to data science.
The intent of this approach is to support students, regardless of interest and self-efficacy in coding, in becoming data-driven learners.
arXiv Detail & Related papers (2020-10-08T09:11:51Z) - Data Science: A Comprehensive Overview [42.98602883069444]
The twenty-first century has ushered in the age of big data and data economy, in which data DNA has become an intrinsic constituent of all data-based organisms.
An appropriate understanding of data DNA and its organisms relies on the new field of data science and its keystone, analytics.
This article is the first in the field to draw a comprehensive big picture, in addition to offering rich observations, lessons and thinking about data science and analytics.
arXiv Detail & Related papers (2020-07-01T02:33:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.