Invisible Data Curation Practices: A Case Study from Facility Management
- URL: http://arxiv.org/abs/2112.01225v1
- Date: Fri, 5 Nov 2021 09:09:42 GMT
- Title: Invisible Data Curation Practices: A Case Study from Facility Management
- Authors: Tor Sporsem, Morten Hatling and Marius Mikalsen
- Abstract summary: In facility management sector, companies seek to ex-tract value from data about their buildings.
Janitors are becoming involved in data curation.
This paper investigates how janitors' data curation practices shape the data being produced in three facility management organizations.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Facility management, which concerns the administration, operations, and
maintenance of buildings, is a sector undergoing significant changes as it becomes
digitalized and data driven. In the facility management sector, companies seek to
extract value from data about their buildings. As a consequence, craftsmen, such as
janitors, are becoming involved in data curation. Data curation refers to activities
related to cleaning, assembling, setting up, and stewarding data to make them fit
existing templates. Craftsmen in facility management, despite holding a pivotal role
in successful data curation in the domain, are understudied and disregarded. To
remedy this, our holistic case study investigates how janitors' data curation
practices shape the data being produced in three facility management organizations.
Our findings illustrate the unfortunate fact that janitors are treated more like
sensors than human data curators. This treatment leaves them less engaged in data
curation, and hence they do not carry out the much-needed correction of essential
facility data. We apply the conceptual lens of invisible work - work that blends
into the background and is taken for granted - to explain why this happens and how
data comes to be. The findings also confirm the usefulness of a previously proposed
analytical framework by using it to interpret data curation practices within
facility management. The paper contributes to practitioners by proposing training
and education in data curation.
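As an aside, the paper's definition of data curation - cleaning, assembling, and stewarding data so that they fit existing templates - can be made concrete with a small sketch. The template fields, status codes, and example records below are hypothetical illustrations chosen for this sketch; they are not data, field names, or methods from the paper.

```python
# Hypothetical sketch: cleaning janitor-entered facility records so they
# fit a fixed reporting template. All field names and rules are invented.

REQUIRED_FIELDS = ("building_id", "room", "task", "status")
ALLOWED_STATUS = {"open", "in_progress", "done"}


def curate(record):
    """Clean one raw record; return None if it cannot be made to fit the template."""
    cleaned = {}
    for field in REQUIRED_FIELDS:
        value = str(record.get(field, "")).strip()
        if not value:
            return None  # missing required field: route back for human correction
        cleaned[field] = value
    cleaned["status"] = cleaned["status"].lower()
    if cleaned["status"] not in ALLOWED_STATUS:
        return None  # unrecognised status code: flag it rather than guess
    return cleaned


raw_records = [
    {"building_id": "B-12", "room": " 301 ", "task": "Filter change", "status": "Done"},
    {"building_id": "B-12", "room": "302", "task": "Leak repair", "status": "???"},
]

curated = [curate(r) for r in raw_records]
print(curated)  # the second record comes back as None and needs human attention
```

In the paper's terms, the records such a sketch rejects are the ones that would call for a human curator's judgement rather than automatic processing; when janitors are treated as sensors, that correction work tends not to happen.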
Related papers
- Machine Learning Data Practices through a Data Curation Lens: An Evaluation Framework [1.5993707490601146]
We evaluate data practices in machine learning as data curation practices.
We find that researchers in machine learning, which often emphasizes model development, struggle to apply standard data curation principles.
arXiv Detail & Related papers (2024-05-04T16:21:05Z)
- Retrieval Augmented Thought Process for Private Data Handling in Healthcare [53.89406286212502]
We introduce the Retrieval-Augmented Thought Process (RATP)
RATP formulates the thought generation of Large Language Models (LLMs)
On a private dataset of electronic medical records, RATP achieves 35% additional accuracy compared to in-context retrieval-augmented generation for the question-answering task.
arXiv Detail & Related papers (2024-02-12T17:17:50Z)
- Data Management For Training Large Language Models: A Survey [64.18200694790787]
Data plays a fundamental role in training Large Language Models (LLMs)
This survey aims to provide a comprehensive overview of current research in data management within both the pretraining and supervised fine-tuning stages of LLMs.
arXiv Detail & Related papers (2023-12-04T07:42:16Z)
- A Data-Based Perspective on Transfer Learning [76.30206800557411]
We take a closer look at the role of the source dataset's composition in transfer learning.
Our framework gives rise to new capabilities such as pinpointing transfer learning brittleness.
arXiv Detail & Related papers (2022-07-12T17:58:28Z)
- Leveraging Machine Learning to Detect Data Curation Activities [1.9949261242626626]
This paper describes a machine learning approach for annotating and analyzing data curation work logs at ICPSR.
Repository staff use systems to organize, prioritize, and document curation work done on datasets.
A key challenge is classifying similar activities so that they can be measured and associated with impact metrics.
arXiv Detail & Related papers (2021-04-30T18:17:18Z)
- Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses [150.64470864162556]
This work systematically categorizes and discusses a wide range of dataset vulnerabilities and exploits.
In addition to describing various poisoning and backdoor threat models and the relationships among them, we develop their unified taxonomy.
arXiv Detail & Related papers (2020-12-18T22:38:47Z)
- On the application of transfer learning in prognostics and health management [0.0]
Data availability has encouraged researchers and industry practitioners to rely on data-based machine learning and deep learning models for fault diagnostics and prognostics more than ever.
These models provide unique advantages; however, their performance is heavily dependent on the training data and how well that data represents the test data.
Transfer learning is an approach that can remedy this issue by keeping portions of what is learned from previous training and transferring them to the new application.
arXiv Detail & Related papers (2020-07-03T23:35:18Z)
- Provably Efficient Causal Reinforcement Learning with Confounded Observational Data [135.64775986546505]
We study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting.
We propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner.
arXiv Detail & Related papers (2020-06-22T14:49:33Z)
- Knowledge Scientists: Unlocking the data-driven organization [5.05432938384774]
We argue that the technologies for reliable data are driven by distinct concerns and expertise.
Those organizations which identify the central importance of meaningful, explainable, reproducible, and maintainable data will be at the forefront of the democratization of reliable data.
arXiv Detail & Related papers (2020-04-16T20:14:20Z)
- How Useful is Self-Supervised Pretraining for Visual Tasks? [133.1984299177874]
We evaluate various self-supervised algorithms across a comprehensive array of synthetic datasets and downstream tasks.
Our experiments offer insights into how the utility of self-supervision changes as the number of available labels grows.
arXiv Detail & Related papers (2020-03-31T16:03:22Z)