Knowledge Scientists: Unlocking the data-driven organization
- URL: http://arxiv.org/abs/2004.07917v1
- Date: Thu, 16 Apr 2020 20:14:20 GMT
- Title: Knowledge Scientists: Unlocking the data-driven organization
- Authors: George Fletcher, Paul Groth, Juan Sequeda
- Abstract summary: We argue that the technologies for reliable data are driven by distinct concerns and expertise.
Those organizations which identify the central importance of meaningful, explainable, reproducible, and maintainable data will be at the forefront of the democratization of reliable data.
- Score: 5.05432938384774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Organizations across all sectors are increasingly undergoing deep
transformation and restructuring towards data-driven operations. The central
role of data highlights the need for reliable and clean data. Unreliable,
erroneous, and incomplete data lead to critical bottlenecks in processing
pipelines and, ultimately, service failures, which are disastrous for the
competitive performance of the organization. Given its central importance,
those organizations which recognize and react to the need for reliable data
will have the advantage in the coming decade. We argue that the technologies
for reliable data are driven by distinct concerns and expertise which
complement those of the data scientist and the data engineer. Those
organizations which identify the central importance of meaningful, explainable,
reproducible, and maintainable data will be at the forefront of the
democratization of reliable data. We call the new role which must be developed
to fill this critical need the Knowledge Scientist. The organizational
structures, tools, methodologies and techniques to support and make possible
the work of knowledge scientists are still in their infancy. As organizations
not only use data but increasingly rely on data, it is time to empower the
people who are central to this transformation.
Related papers
- Open-sourced Data Ecosystem in Autonomous Driving: the Present and Future [130.87142103774752]
This review systematically assesses over seventy open-source autonomous driving datasets.
It offers insights into various aspects, such as the principles underlying the creation of high-quality datasets.
It also delves into the scientific and technical challenges that warrant resolution.
arXiv Detail & Related papers (2023-12-06T10:46:53Z)
- On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms [56.119374302685934]
There have been severe concerns over the trustworthiness of AI technologies.
Machine and deep learning algorithms depend heavily on the data used during their development.
We propose a framework to evaluate the datasets through a responsible rubric.
arXiv Detail & Related papers (2023-10-24T14:01:53Z)
- Towards Avoiding the Data Mess: Industry Insights from Data Mesh Implementations [1.5029560229270191]
Data mesh is a socio-technical, decentralized, distributed concept for enterprise data management.
We conduct 15 semi-structured interviews with industry experts.
Our findings synthesize insights from industry experts and provide researchers and professionals with preliminary guidelines for the successful adoption of data mesh.
arXiv Detail & Related papers (2023-02-03T13:09:57Z)
- Exploring the Critical Success Factors for Data Democratization [0.0]
Data democratization is an ongoing process of broadening data access to employees.
This paper aims to identify the critical success factors for data democratization through an in-depth review of the literature.
arXiv Detail & Related papers (2022-12-04T22:07:45Z)
- Invisible Data Curation Practices: A Case Study from Facility Management [0.0]
In the facility management sector, companies seek to extract value from data about their buildings.
Janitors are becoming involved in data curation.
This paper investigates how janitors' data curation practices shape the data being produced in three facility management organizations.
arXiv Detail & Related papers (2021-11-05T09:09:42Z)
- Understanding the World Through Action [91.3755431537592]
I will argue that a general, principled, and powerful framework for utilizing unlabeled data can be derived from reinforcement learning.
I will discuss how such a procedure is more closely aligned with potential downstream tasks.
arXiv Detail & Related papers (2021-10-24T22:33:52Z)
- Trustworthy Transparency by Design [57.67333075002697]
We propose a transparency framework for software design, incorporating research on user trust and experience.
Our framework enables developing software that incorporates transparency in its design.
arXiv Detail & Related papers (2021-03-19T12:34:01Z)
- Explainable Patterns: Going from Findings to Insights to Support Data Analytics Democratization [60.18814584837969]
We present Explainable Patterns (ExPatt), a new framework to support lay users in exploring and creating data storytellings.
ExPatt automatically generates plausible explanations for observed or selected findings using an external (textual) source of information.
arXiv Detail & Related papers (2021-01-19T16:13:44Z)
- Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses [150.64470864162556]
This work systematically categorizes and discusses a wide range of dataset vulnerabilities and exploits.
In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop their unified taxonomy.
arXiv Detail & Related papers (2020-12-18T22:38:47Z)
- Towards Accountability for Machine Learning Datasets: Practices from Software Engineering and Infrastructure [9.825840279544465]
Datasets which empower machine learning are often used, shared, and re-used with little visibility into the processes of deliberation which led to their creation.
This paper introduces a rigorous framework for dataset development transparency which supports decision-making and accountability.
arXiv Detail & Related papers (2020-10-23T01:57:42Z)
- From Data to Knowledge to Action: A Global Enabler for the 21st Century [26.32590947516587]
A confluence of advances in the computer and mathematical sciences has unleashed unprecedented capabilities for enabling true evidence-based decision making.
These capabilities are making possible the large-scale capture of data and the transformation of that data into insights and recommendations.
The shift of commerce, science, education, art, and entertainment to the web makes available unprecedented quantities of structured and unstructured databases about human activities.
arXiv Detail & Related papers (2020-07-31T19:19:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.