Defining data science: a new field of inquiry
- URL: http://arxiv.org/abs/2306.16177v3
- Date: Mon, 24 Jul 2023 12:32:58 GMT
- Title: Defining data science: a new field of inquiry
- Authors: Michael L Brodie
- Abstract summary: Modern data science is in its infancy. Emerging slowly since 1962 and rapidly since 2000, it is one of the most active, powerful, and rapidly evolving 21st century innovations.
Due to its value, power, and applicability, it is emerging in over 40 disciplines, hundreds of research areas, and thousands of applications.
This research addresses this challenge of multiple data science definitions by proposing the development of a coherent, unified definition based on a data science reference framework.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data science is not a science. It is a research paradigm. Its power, scope,
and scale will surpass science, our most powerful research paradigm, to enable
knowledge discovery and change our world. We have yet to understand and define
it, which is vital to realizing its potential and managing its risks. Modern data
science is in its infancy. Emerging slowly since 1962 and rapidly since 2000,
it is a fundamentally new field of inquiry, one of the most active, powerful,
and rapidly evolving 21st century innovations. Due to its value, power, and
applicability, it is emerging in over 40 disciplines, hundreds of research
areas, and thousands of applications. Millions of data science publications
contain myriad definitions of data science and data science problem solving.
Due to its infancy, many definitions are independent, application-specific,
mutually incomplete, redundant, or inconsistent; hence, so is data science. This
research addresses this challenge of multiple data science definitions by
proposing the development of a coherent, unified definition based on a data
science reference framework, using a data science journal for the data science
community to achieve such a definition. This paper provides candidate
definitions for essential data science artifacts that are required to discuss
such a definition. They are based on the classical research paradigm concept
consisting of a philosophy of data science, the data science problem solving
paradigm, and the six-component data science reference framework (axiology,
ontology, epistemology, methodology, methods, technology), a frequently
called-for unifying framework with which to define, unify, and evolve data
science. It presents challenges for defining data science, solution approaches,
i.e., means for defining data science, and their requirements and benefits as
the basis of a comprehensive solution.
Related papers
- Causal Representation Learning in Temporal Data via Single-Parent Decoding (2024-10-09)
Scientific research often seeks to understand the causal structure underlying high-level variables in a system.
Scientists typically collect low-level measurements, such as geographically distributed temperature readings.
We propose a differentiable method, Causal Discovery with Single-parent Decoding, that simultaneously learns the underlying latents and a causal graph over them.
- The Future of Data Science Education (2024-07-16)
The School of Data Science at the University of Virginia has developed a novel model for the definition of Data Science.
This paper presents the core features of the model and explains how it unifies various concepts that go far beyond the analytics component of AI.
- A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery (2024-06-16)
Large language models (LLMs) have revolutionized the way text and other modalities of data are handled.
We aim to provide a more holistic view of the research landscape by unveiling cross-field and cross-modal connections between scientific LLMs.
- SciInstruct: a Self-Reflective Instruction Annotated Dataset for Training Scientific Language Models (2024-01-15)
We introduce SciInstruct, a suite of scientific instructions for training scientific language models capable of college-level scientific reasoning.
We curated a diverse and high-quality dataset encompassing physics, chemistry, math, and formal proofs.
To verify the effectiveness of SciInstruct, we fine-tuned different language models with SciInstruct, i.e., ChatGLM3 (6B and 32B), Llama3-8B-Instruct, and Mistral-7B: MetaMath.
- A data science axiology: the nature, value, and risks of data science (2023-07-19)
Data science is a research paradigm with an unfathomed scope, scale, complexity, and power for knowledge discovery.
This paper presents an axiology of data science, its purpose, nature, importance, risks, and value for problem solving.
- Modeling Information Change in Science Communication with Semantically Matched Paraphrases (2022-10-24)
SPICED is the first paraphrase dataset of scientific findings annotated for degree of information change.
SPICED contains 6,000 scientific finding pairs extracted from news stories, social media discussions, and full texts of original papers.
Models trained on SPICED improve downstream performance on evidence retrieval for fact checking of real-world scientific claims.
- A Review into Data Science and Its Approaches in Mechanical Engineering (2020-12-30)
This article briefly introduces data science and reviews its methods.
In the introduction, different definitions of data science and its technological background are reviewed.
Some studies in mechanical engineering that used data science methods are reviewed.
- Data Science: Challenges and Directions (2020-06-28)
We review hundreds of pieces of literature that include "data science" in their titles.
We find that the majority of the discussions essentially concern statistics, data mining, machine learning, big data, or broadly data analytics.
We focus on the research and innovation challenges inspired by the nature of data science problems as complex systems.
- Fact or Fiction: Verifying Scientific Claims (2020-04-30)
We introduce scientific claim verification, a new task to select abstracts from the research literature containing evidence that SUPPORTS or REFUTES a given scientific claim.
We construct SciFact, a dataset of 1.4K expert-written scientific claims paired with evidence-containing abstracts annotated with labels and rationales.
We show that our system is able to verify claims related to COVID-19 by identifying evidence from the CORD-19 corpus.
- Ten Research Challenge Areas in Data Science (2020-01-27)
Data science builds on knowledge from computer science, mathematics, statistics, and other disciplines.
This article starts with meta-questions about data science as a discipline and then elaborates on ten ideas for the basis of a research agenda for data science.
This list is automatically generated from the titles and abstracts of the papers on this site.