Related papers: The Data Science Fire Next Time: Innovative strategies for mentoring in data science

URL: http://arxiv.org/abs/2003.07681v1
Date: Sun, 1 Mar 2020 03:40:38 GMT
Title: The Data Science Fire Next Time: Innovative strategies for mentoring in data science
Authors: Latifa Jackson and Heriberto Acosta Maestre
Abstract summary: The Broadening Participation in Data Mining (BPDM) workshop was created more than 7 years ago. BPDM aims to foster mentorship, guidance, and connections for minority and underrepresented groups in the data science and machine learning community. To date it has impacted the lives of more than 330 underrepresented trainees in data science.
Score: 14.213973379473655
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As data mining research and applications continue to expand into a variety of fields such as medicine, finance, security, etc., the need for talented and diverse individuals is clearly felt. This is particularly the case as Big Data initiatives have taken off in the federal, private and academic sectors, providing a wealth of opportunities, nationally and internationally. The Broadening Participation in Data Mining (BPDM) workshop was created more than 7 years ago with the goal of fostering mentorship, guidance, and connections for minority and underrepresented groups in the data science and machine learning community, while also enriching technical aptitude and exposure for a group of talented students. To date it has impacted the lives of more than 330 underrepresented trainees in data science. We provide a venue to connect talented students with innovative researchers in industry, academia, professional societies, and government. Our mission is to facilitate meaningful, lasting relationships between BPDM participants to ultimately increase diversity in data mining. The most recent workshop took place at Howard University in Washington, DC in February 2019. Here we report on the mentoring strategies that we undertook at the 2019 BPDM and how those were received.

Related papers

Foundation Models for Spatio-Temporal Data Science: A Tutorial and Survey [69.0648659029394]
Spatio-Temporal (ST) data science is fundamental to understanding complex systems in domains such as urban computing, climate science, and intelligent transportation. Researchers have begun exploring the concept of Spatio-Temporal Foundation Models (STFMs) to enhance adaptability and generalization across diverse ST tasks. STFMs empower the entire workflow of ST data science, ranging from data sensing, management, to mining, thereby offering a more holistic and scalable approach.
arXiv Detail & Related papers (2025-03-12T09:42:18Z)
High School Summer Camps Help Democratize Coding, Data Science, and Deep Learning [0.0]
This study documents the impact of a summer camp series that introduces high school students to coding, data science, and deep learning. The camps provide an immersive university experience, fostering technical skills, collaboration, and inspiration. Survey data reveals increased confidence in coding, with 68.6% expressing interest in AI and data science careers.
arXiv Detail & Related papers (2024-09-17T19:59:39Z)
Pennsieve: A Collaborative Platform for Translational Neuroscience and Beyond [0.5130659559809153]
Pennsieve is an open-source, cloud-based scientific data management platform. It supports complex multimodal datasets and provides tools for data visualization and analyses. Pennsieve stores over 125 TB of scientific data, with 35 TB of data publicly available across more than 350 high-impact datasets.
arXiv Detail & Related papers (2024-09-16T17:55:58Z)
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? [58.330879414174476]
We introduce DSBench, a benchmark designed to evaluate data science agents with realistic tasks. This benchmark includes 466 data analysis tasks and 74 data modeling tasks, sourced from Eloquence and Kaggle competitions. Our evaluation of state-of-the-art LLMs, LVLMs, and agents shows that they struggle with most tasks, with the best agent solving only 34.12% of data analysis tasks and achieving a 34.74% Relative Performance Gap (RPG).
arXiv Detail & Related papers (2024-09-12T02:08:00Z)
DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture [69.58440626023541]
Diffusion models (DMs) have demonstrated exceptional generative capabilities across various domains. DMs now consume increasingly large amounts of data. We propose a novel scenario: using existing DMs as data sources to train new DMs with any architecture.
arXiv Detail & Related papers (2024-09-05T14:12:22Z)
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery [68.48094108571432]
Large language models (LLMs) have revolutionized the way text and other modalities of data are handled. We aim to provide a more holistic view of the research landscape by unveiling cross-field and cross-modal connections between scientific LLMs.
arXiv Detail & Related papers (2024-06-16T08:03:24Z)
Simpson's Paradox and Lagging Progress in Completion Trends of Underrepresented Students in Computer Science [0.09831489366502298]
It is imperative for the Computer Science (CS) community to ensure active participation and success of students from diverse backgrounds. This work compares CS to other areas of study with respect to success of students from three underrepresented groups: Women, Black and Hispanic or Latino. Using a data-driven approach, we show that trends of success over the years for underrepresented groups in CS are lagging behind other disciplines.
arXiv Detail & Related papers (2023-11-25T01:00:23Z)
Evaluating and Incentivizing Diverse Data Contributions in Collaborative Learning [89.21177894013225]
For a federated learning model to perform well, it is crucial to have a diverse and representative dataset. We show that the statistical criterion used to quantify the diversity of the data, as well as the choice of the federated learning algorithm used, has a significant effect on the resulting equilibrium. We leverage this to design simple optimal federated learning mechanisms that encourage data collectors to contribute data representative of the global population.
arXiv Detail & Related papers (2023-06-08T23:38:25Z)
Assessing Scientific Contributions in Data Sharing Spaces [64.16762375635842]
This paper introduces the SCIENCE-index, a blockchain-based metric measuring a researcher's scientific contributions. To incentivize researchers to share their data, the SCIENCE-index is augmented to include a data-sharing parameter. Our model is evaluated by comparing the distribution of its output for geographically diverse researchers to that of the h-index.
arXiv Detail & Related papers (2023-03-18T19:17:47Z)
Data-centric AI: Perspectives and Challenges [51.70828802140165]
Data-centric AI (DCAI) advocates a fundamental shift from model advancements to ensuring data quality and reliability. We bring together three general missions: training data development, inference data development, and data maintenance.
arXiv Detail & Related papers (2023-01-12T05:28:59Z)
Diversifying the Genomic Data Science Research Community [22.633385577446617]
We have formed the Genomic Data Science Community Network to identify opportunities and support broadening access to cloud-enabled genomic data science. Here, we provide a summary of the priorities for faculty members at UIs, as well as administrators, funders, and R1 researchers to consider as we create a more diverse genomic data science community.
arXiv Detail & Related papers (2022-01-20T20:36:18Z)
COVID-19 Datathon Based on Deidentified Governmental Data as an Approach for Solving Policy Challenges, Increasing Trust, and Building a Community: Case Study [4.643473310978546]
Israel's Ministry of Health (MoH) held a virtual Datathon based on deidentified governmental data. The Datathon was designed to develop operationalizable data-driven models to address COVID-19 health-policy challenges. The most positive results were increased trust in the MoH and greater readiness to work with the government.
arXiv Detail & Related papers (2021-08-30T08:58:44Z)
Retiring Adult: New Datasets for Fair Machine Learning [47.27417042497261]
UCI Adult has served as the basis for the development and comparison of many algorithmic fairness interventions. We reconstruct a superset of the UCI Adult data from available US Census sources and reveal idiosyncrasies of the UCI Adult dataset that limit its external validity. Our primary contribution is a suite of new datasets that extend the existing data ecosystem for research on fair machine learning.
arXiv Detail & Related papers (2021-08-10T19:19:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.