The Data Science Fire Next Time: Innovative strategies for mentoring in
data science
- URL: http://arxiv.org/abs/2003.07681v1
- Date: Sun, 1 Mar 2020 03:40:38 GMT
- Title: The Data Science Fire Next Time: Innovative strategies for mentoring in
data science
- Authors: Latifa Jackson and Heriberto Acosta Maestre
- Abstract summary: The Broadening Participation in Data Mining (BPDM) workshop was created more than 7 years ago.
BPDM aims to foster mentorship, guidance, and connections for minority and underrepresented groups in the data science and machine learning community.
To date it has impacted the lives of more than 330 underrepresented trainees in data science.
- Score: 14.213973379473655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As data mining research and applications continue to expand into a variety
of fields such as medicine, finance, and security, the need for talented and
diverse individuals is clearly felt. This is particularly the case as Big Data
initiatives have taken off in the federal, private and academic sectors,
providing a wealth of opportunities, nationally and internationally. The
Broadening Participation in Data Mining (BPDM) workshop was created more than 7
years ago with the goal of fostering mentorship, guidance, and connections for
minority and underrepresented groups in the data science and machine learning
community, while also enriching technical aptitude and exposure for a group of
talented students. To date it has impacted the lives of more than 330
underrepresented trainees in data science. We provide a venue to connect
talented students with innovative researchers in industry, academia,
professional societies, and government. Our mission is to facilitate
meaningful, lasting relationships between BPDM participants to ultimately
increase diversity in data mining. The most recent workshop took place at
Howard University in Washington, DC in February 2019. Here we report on the
mentoring strategies that we undertook at the 2019 BPDM and how those were
received.
Related papers
- High School Summer Camps Help Democratize Coding, Data Science, and Deep Learning [0.0]
This study documents the impact of a summer camp series that introduces high school students to coding, data science, and deep learning.
The camps provide an immersive university experience, fostering technical skills, collaboration, and inspiration.
Survey data reveals increased confidence in coding, with 68.6% expressing interest in AI and data science careers.
arXiv Detail & Related papers (2024-09-17T19:59:39Z)
- Pennsieve: A Collaborative Platform for Translational Neuroscience and Beyond [0.5130659559809153]
Pennsieve is an open-source, cloud-based scientific data management platform.
It supports complex multimodal datasets and provides tools for data visualization and analyses.
Pennsieve stores over 125 TB of scientific data, with 35 TB of data publicly available across more than 350 high-impact datasets.
arXiv Detail & Related papers (2024-09-16T17:55:58Z)
- DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? [58.330879414174476]
We introduce DSBench, a benchmark designed to evaluate data science agents with realistic tasks.
This benchmark includes 466 data analysis tasks and 74 data modeling tasks, sourced from Eloquence and Kaggle competitions.
Our evaluation of state-of-the-art LLMs, LVLMs, and agents shows that they struggle with most tasks, with the best agent solving only 34.12% of data analysis tasks and achieving a 34.74% Relative Performance Gap (RPG).
arXiv Detail & Related papers (2024-09-12T02:08:00Z)
- A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery [68.48094108571432]
Large language models (LLMs) have revolutionized the way text and other modalities of data are handled.
We aim to provide a more holistic view of the research landscape by unveiling cross-field and cross-modal connections between scientific LLMs.
arXiv Detail & Related papers (2024-06-16T08:03:24Z)
- Simpson's Paradox and Lagging Progress in Completion Trends of Underrepresented Students in Computer Science [0.09831489366502298]
It is imperative for the Computer Science (CS) community to ensure active participation and success of students from diverse backgrounds.
This work compares CS to other areas of study with respect to success of students from three underrepresented groups: Women, Black and Hispanic or Latino.
Using a data-driven approach, we show that trends of success over the years for underrepresented groups in CS are lagging behind other disciplines.
arXiv Detail & Related papers (2023-11-25T01:00:23Z)
- Evaluating and Incentivizing Diverse Data Contributions in Collaborative Learning [89.21177894013225]
For a federated learning model to perform well, it is crucial to have a diverse and representative dataset.
We show that the statistical criterion used to quantify the diversity of the data, as well as the choice of the federated learning algorithm used, has a significant effect on the resulting equilibrium.
We leverage this to design simple optimal federated learning mechanisms that encourage data collectors to contribute data representative of the global population.
arXiv Detail & Related papers (2023-06-08T23:38:25Z)
- Assessing Scientific Contributions in Data Sharing Spaces [64.16762375635842]
This paper introduces the SCIENCE-index, a blockchain-based metric measuring a researcher's scientific contributions.
To incentivize researchers to share their data, the SCIENCE-index is augmented to include a data-sharing parameter.
Our model is evaluated by comparing the distribution of its output for geographically diverse researchers to that of the h-index.
arXiv Detail & Related papers (2023-03-18T19:17:47Z)
- Data-centric AI: Perspectives and Challenges [51.70828802140165]
Data-centric AI (DCAI) advocates a fundamental shift from model advancements to ensuring data quality and reliability.
We bring together three general missions: training data development, inference data development, and data maintenance.
arXiv Detail & Related papers (2023-01-12T05:28:59Z)
- Diversifying the Genomic Data Science Research Community [22.633385577446617]
We have formed the Genomic Data Science Community Network to identify opportunities and support broadening access to cloud-enabled genomic data science.
Here, we provide a summary of the priorities for faculty members at UIs, as well as administrators, funders, and R1 researchers to consider as we create a more diverse genomic data science community.
arXiv Detail & Related papers (2022-01-20T20:36:18Z)
- COVID-19 Datathon Based on Deidentified Governmental Data as an Approach for Solving Policy Challenges, Increasing Trust, and Building a Community: Case Study [4.643473310978546]
Israel's Ministry of Health (MoH) held a virtual Datathon based on deidentified governmental data.
The Datathon was designed to develop operationalizable data-driven models to address COVID-19 health-policy challenges.
The most positive results were increased trust in the MoH and greater readiness to work with the government.
arXiv Detail & Related papers (2021-08-30T08:58:44Z)
- Retiring Adult: New Datasets for Fair Machine Learning [47.27417042497261]
UCI Adult has served as the basis for the development and comparison of many algorithmic fairness interventions.
We reconstruct a superset of the UCI Adult data from available US Census sources and reveal idiosyncrasies of the UCI Adult dataset that limit its external validity.
Our primary contribution is a suite of new datasets that extend the existing data ecosystem for research on fair machine learning.
arXiv Detail & Related papers (2021-08-10T19:19:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.