The Data Science Fire Next Time: Innovative strategies for mentoring in
data science
- URL: http://arxiv.org/abs/2003.07681v1
- Date: Sun, 1 Mar 2020 03:40:38 GMT
- Authors: Latifa Jackson and Heriberto Acosta Maestre
- Abstract summary: The Broadening Participation in Data Mining (BPDM) workshop was created more than 7 years ago.
BPDM aims to foster mentorship, guidance, and connections for minority and underrepresented groups in the data science and machine learning community.
To date it has impacted the lives of more than 330 underrepresented trainees in data science.
- Score: 14.213973379473655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As data mining research and applications continue to expand into a variety
of fields such as medicine, finance, security, etc., the need for talented and
diverse individuals is clearly felt. This is particularly the case as Big Data
initiatives have taken off in the federal, private and academic sectors,
providing a wealth of opportunities, nationally and internationally. The
Broadening Participation in Data Mining (BPDM) workshop was created more than 7
years ago with the goal of fostering mentorship, guidance, and connections for
minority and underrepresented groups in the data science and machine learning
community, while also enriching technical aptitude and exposure for a group of
talented students. To date it has impacted the lives of more than 330
underrepresented trainees in data science. We provide a venue to connect
talented students with innovative researchers in industry, academia,
professional societies, and government. Our mission is to facilitate
meaningful, lasting relationships between BPDM participants to ultimately
increase diversity in data mining. The most recent workshop took place at
Howard University in Washington, DC in February 2019. Here we report on the
mentoring strategies that we undertook at the 2019 BPDM and how those were
received.
Related papers
- MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension [59.41495657570397]
We collected a multimodal, multidisciplinary dataset from open-access scientific articles published in Nature Communications journals.
This dataset spans 72 scientific disciplines, ensuring both diversity and quality.
We created benchmarks with various tasks and settings to comprehensively evaluate LMMs' capabilities in understanding scientific figures and content.
(2024-07-06T00:40:53Z)
- A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery [68.48094108571432]
We aim to provide a more holistic view of the research landscape by unveiling cross-field and cross-modal connections between scientific LLMs.
We comprehensively survey over 250 scientific LLMs, discuss their commonalities and differences, as well as summarize pre-training datasets and evaluation tasks for each field and modality.
(2024-06-16T08:03:24Z)
- Simpson's Paradox and Lagging Progress in Completion Trends of Underrepresented Students in Computer Science [0.09831489366502298]
It is imperative for the Computer Science (CS) community to ensure active participation and success of students from diverse backgrounds.
This work compares CS to other areas of study with respect to success of students from three underrepresented groups: Women, Black and Hispanic or Latino.
Using a data-driven approach, we show that trends of success over the years for underrepresented groups in CS are lagging behind other disciplines.
(2023-11-25T01:00:23Z)
- Evaluating and Incentivizing Diverse Data Contributions in Collaborative Learning [89.21177894013225]
For a federated learning model to perform well, it is crucial to have a diverse and representative dataset.
We show that the statistical criterion used to quantify the diversity of the data, as well as the choice of the federated learning algorithm used, has a significant effect on the resulting equilibrium.
We leverage this to design simple optimal federated learning mechanisms that encourage data collectors to contribute data representative of the global population.
(2023-06-08T23:38:25Z)
- Assessing Scientific Contributions in Data Sharing Spaces [64.16762375635842]
This paper introduces the SCIENCE-index, a blockchain-based metric measuring a researcher's scientific contributions.
To incentivize researchers to share their data, the SCIENCE-index is augmented to include a data-sharing parameter.
Our model is evaluated by comparing the distribution of its output for geographically diverse researchers to that of the h-index.
(2023-03-18T19:17:47Z)
- Data-centric AI: Perspectives and Challenges [51.70828802140165]
Data-centric AI (DCAI) advocates a fundamental shift from model advancements to ensuring data quality and reliability.
We bring together three general missions: training data development, inference data development, and data maintenance.
(2023-01-12T05:28:59Z)
- BigScience: A Case Study in the Social Construction of a Multilingual Large Language Model [11.366450629112459]
The BigScience Workshop was a value-driven initiative that spanned one and a half years of interdisciplinary research.
This paper focuses on the collaborative research aspects of BigScience and takes a step back to look at the challenges of large-scale participatory research.
(2022-12-09T16:15:35Z)
- Diversifying the Genomic Data Science Research Community [22.633385577446617]
We have formed the Genomic Data Science Community Network to identify opportunities and support broadening access to cloud-enabled genomic data science.
Here, we provide a summary of the priorities for faculty members at UIs, as well as administrators, funders, and R1 researchers to consider as we create a more diverse genomic data science community.
(2022-01-20T20:36:18Z)
- COVID-19 Datathon Based on Deidentified Governmental Data as an Approach for Solving Policy Challenges, Increasing Trust, and Building a Community: Case Study [4.643473310978546]
Israel's Ministry of Health (MoH) held a virtual Datathon based on deidentified governmental data.
The Datathon was designed to develop operationalizable data-driven models to address COVID-19 health-policy challenges.
The most positive results were increased trust in the MoH and greater readiness to work with the government.
(2021-08-30T08:58:44Z)
- Retiring Adult: New Datasets for Fair Machine Learning [47.27417042497261]
UCI Adult has served as the basis for the development and comparison of many algorithmic fairness interventions.
We reconstruct a superset of the UCI Adult data from available US Census sources and reveal idiosyncrasies of the UCI Adult dataset that limit its external validity.
Our primary contribution is a suite of new datasets that extend the existing data ecosystem for research on fair machine learning.
(2021-08-10T19:19:41Z)
- Toward Knowledge Discovery Framework for Data Science Job Market in the United States [1.7205106391379024]
This paper introduces a framework to analyze the job market for data science-related jobs within the US.
The proposed framework includes three sub-modules allowing continuous data collection, information extraction, and a web-based visualization dashboard.
The current version of this application is deployed on the web and allows individuals and institutes to investigate skills required for data science positions.
(2021-06-14T21:23:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.