Data Science Methodologies: Current Challenges and Future Approaches
- URL: http://arxiv.org/abs/2106.07287v1
- Date: Mon, 14 Jun 2021 10:34:50 GMT
- Title: Data Science Methodologies: Current Challenges and Future Approaches
- Authors: Iñigo Martinez, Elisabeth Viles, Igor G. Olaizola
- Abstract summary: Lack of vision and clear objectives, a biased emphasis on technical issues, and a low level of maturity for ad-hoc projects are among the organizational and socio-technical challenges of data science projects.
Few methodologies offer a complete guideline across team, project and data & information management.
We propose a conceptual framework containing general characteristics that a methodology for managing data science projects with a holistic point of view should have.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data science has devoted great research effort to developing advanced
analytics, improving data models and cultivating new algorithms. However, not
many authors have addressed the organizational and socio-technical challenges
that arise when executing a data science project: lack of vision and clear
objectives, a biased emphasis on technical issues, a low level of maturity for
ad-hoc projects and the ambiguity of roles in data science are among these
challenges. Few methodologies have been proposed in the literature that tackle
these types of challenges, and some of them date back to the mid-1990s;
consequently, they are not updated to the current paradigm and the latest
developments in big data and machine learning technologies. In addition, even fewer
methodologies offer a complete guideline across team, project and data &
information management. In this article we would like to explore the necessity
of developing a more holistic approach for carrying out data science projects.
We first review methodologies that have been presented in the literature to
work on data science projects and classify them according to their focus:
project, team, data and information management. Finally, we propose a
conceptual framework containing general characteristics that a methodology for
managing data science projects with a holistic point of view should have. This
framework can be used by other researchers as a roadmap for the design of new
data science methodologies or the updating of existing ones.
Related papers
- MASSW: A New Dataset and Benchmark Tasks for AI-Assisted Scientific Workflows [58.56005277371235]
We introduce MASSW, a comprehensive text dataset on Multi-Aspect Summarization of Scientific Workflows.
MASSW includes more than 152,000 peer-reviewed publications from 17 leading computer science conferences spanning the past 50 years.
We demonstrate the utility of MASSW through multiple novel machine-learning tasks that can be benchmarked using this new dataset.
arXiv Detail & Related papers (2024-06-10T15:19:09Z)
- Data Augmentation in Human-Centric Vision [54.97327269866757]
This survey presents a comprehensive analysis of data augmentation techniques in human-centric vision tasks.
It delves into a wide range of research areas including person ReID, human parsing, human pose estimation, and pedestrian detection.
Our work categorizes data augmentation methods into two main types: data generation and data perturbation.
arXiv Detail & Related papers (2024-03-13T16:05:18Z)
- Large Language Models(LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey [17.19337964440007]
There is currently no comprehensive review that summarizes and compares the key techniques, metrics, datasets, models, and optimization approaches in this research domain.
This survey aims to address this gap by consolidating recent progress in these areas, offering a thorough survey and taxonomy of the datasets, metrics, and methodologies utilized.
It identifies strengths, limitations, unexplored territories, and gaps in the existing literature, while providing some insights for future research directions in this vital and rapidly evolving field.
arXiv Detail & Related papers (2024-02-27T23:59:01Z)
- A Survey on Data Selection for Language Models [148.300726396877]
Data selection methods aim to determine which data points to include in a training dataset.
Deep learning is mostly driven by empirical evidence, and experimentation on large-scale data is expensive.
Few organizations have the resources for extensive data selection research.
arXiv Detail & Related papers (2024-02-26T18:54:35Z)
- A Comprehensive Review of Machine Learning Advances on Data Change: A Cross-Field Perspective [16.904588676267526]
We identify two major related research fields, domain shift and concept drift.
In this review, we regroup domain shift and concept drift into a single research problem, namely the data change problem.
We propose a three-phase problem categorization scheme to link the key ideas in the two technical fields.
arXiv Detail & Related papers (2024-02-20T01:16:01Z)
- On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z)
- How Data Scientists Review the Scholarly Literature [4.406926847270567]
We examine the literature review practices of data scientists.
Data science represents a field seeing an exponential rise in papers.
No prior work has examined the specific practices and challenges faced by these scientists.
arXiv Detail & Related papers (2023-01-10T03:53:05Z)
- Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature.
We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
- A survey study of success factors in data science projects [0.0]
The agile data science lifecycle is the most widely used framework, but only 25% of the survey participants state that they follow a data science project methodology.
Professionals who adhere to a project methodology place greater emphasis on the project's potential risks and pitfalls.
arXiv Detail & Related papers (2022-01-17T09:50:46Z)
- Deep Learning Schema-based Event Extraction: Literature Review and Current Trends [60.29289298349322]
Event extraction technology based on deep learning has become a research hotspot.
This paper fills the gap by reviewing the state-of-the-art approaches, focusing on deep learning-based models.
arXiv Detail & Related papers (2021-07-05T16:32:45Z)
- Data Science: Challenges and Directions [42.98602883069444]
We review hundreds of pieces of literature which include data science in their titles.
We find that the majority of the discussions essentially concern statistics, data mining, machine learning, big data, or broadly data analytics.
We focus on the research and innovation challenges inspired by the nature of data science problems as complex systems.
arXiv Detail & Related papers (2020-06-28T01:49:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.