Data Collection of Real-Life Knowledge Work in Context: The RLKWiC Dataset
- URL: http://arxiv.org/abs/2404.10505v1
- Date: Tue, 16 Apr 2024 12:23:59 GMT
- Title: Data Collection of Real-Life Knowledge Work in Context: The RLKWiC Dataset
- Authors: Mahta Bakhshizadeh, Christian Jilek, Markus Schröder, Heiko Maus, Andreas Dengel
- Abstract summary: This paper presents RLKWiC, a novel dataset of Real-Life Knowledge Work in Context.
RLKWiC is the first publicly available dataset offering a wealth of essential information dimensions.
- Score: 4.388282062290401
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Over the years, various approaches have been employed to enhance the productivity of knowledge workers, from addressing psychological well-being to developing personal knowledge assistants. A significant challenge in this research area has been the absence of a comprehensive, publicly accessible dataset that mirrors real-world knowledge work. Although a handful of datasets exist, many are restricted in access or lack vital information dimensions, complicating meaningful comparison and benchmarking in the domain. This paper presents RLKWiC, a novel dataset of Real-Life Knowledge Work in Context, derived from monitoring the computer interactions of eight participants over a span of two months. As the first publicly available dataset offering a wealth of essential information dimensions (such as explicated contexts, textual contents, and semantics), RLKWiC seeks to address the research gap in the personal information management domain, providing valuable insights for modeling user behavior.
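The abstract does not specify RLKWiC's file format, so the following is a minimal exploratory sketch under assumed conventions: a JSON export named rlkwic_events.json in which each monitored interaction event carries participant, context, and app fields. All of these names are illustrative assumptions, not the published schema.

```python
import json
from collections import Counter

# Hypothetical path and schema; RLKWiC's actual export format may differ.
with open("rlkwic_events.json", encoding="utf-8") as f:
    events = json.load(f)  # assumed: a list of interaction-event dicts

# Count how often each explicated context appears per participant
# (the fields "participant" and "context" are illustrative assumptions).
per_participant = {}
for event in events:
    counts = per_participant.setdefault(event["participant"], Counter())
    counts[event["context"]] += 1

for participant, counts in sorted(per_participant.items()):
    top_context, n = counts.most_common(1)[0]
    print(f"{participant}: most frequent context = {top_context!r} ({n} events)")
```

Grouping events by their explicated context in this way is the kind of user-behavior analysis the dataset is intended to support.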
Related papers
- Using Large Language Models to Generate Authentic Multi-agent Knowledge Work Datasets [5.465422605475246]
Current publicly available knowledge work data collections lack diversity, extensive annotations, and contextual information about the users and their documents.
This paper introduces our approach's design and vision and focuses on generating authentic knowledge work documents using Large Language Models.
In a study with human raters, 53% of the generated documents and 74% of the real documents were judged realistic, demonstrating the potential of our approach.
arXiv Detail & Related papers (2024-09-06T13:53:28Z)
- LLM-PBE: Assessing Data Privacy in Large Language Models [111.58198436835036]
Large Language Models (LLMs) have become integral to numerous domains, significantly advancing applications in data management, mining, and analysis.
Despite the critical nature of this issue, no existing literature offers a comprehensive assessment of data privacy risks in LLMs.
Our paper introduces LLM-PBE, a toolkit crafted specifically for the systematic evaluation of data privacy risks in LLMs.
arXiv Detail & Related papers (2024-08-23T01:37:29Z)
- Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy.
Within our study, we conducted expert interviews to gain insights into practices in the field.
We survey the privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy (a reference sketch of that standard follows).
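The summary does not name the methods the interviewed organizations actually use; purely as a reference point for the differential-privacy standard it invokes, here is a minimal sketch of the classic Laplace mechanism for releasing a count. The function name and parameters are illustrative, not drawn from the paper.

```python
import numpy as np

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count with epsilon-differential privacy via Laplace noise.

    For a counting query, adding or removing one individual changes the
    result by at most 1, so the sensitivity is 1 and the noise scale is
    sensitivity / epsilon.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# Example: release the number of visits to some location with epsilon = 0.5.
print(dp_count(true_count=1_234, epsilon=0.5))
```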
arXiv Detail & Related papers (2024-07-04T08:29:27Z)
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs).
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
- Query of CC: Unearthing Large Scale Domain-Specific Knowledge from Public Corpora [104.16648246740543]
We propose an efficient data collection method based on large language models.
The method bootstraps seed information through a large language model and retrieves related data from public corpora.
It not only collects knowledge-related data for specific domains but also unearths data with potential reasoning procedures (a simplified sketch follows).
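A minimal sketch of that bootstrap-and-retrieve loop, under assumptions: generate_queries stands in for the LLM call and search_corpus for a retrieval index over a public corpus (e.g., BM25 over Common Crawl). Neither is the paper's actual implementation.

```python
from typing import Callable

def collect_domain_data(
    seed_topic: str,
    generate_queries: Callable[[str], list[str]],    # stub for an LLM call
    search_corpus: Callable[[str, int], list[str]],  # stub for a corpus index
    rounds: int = 2,
    per_query: int = 100,
) -> list[str]:
    """Bootstrap queries from a seed topic with an LLM, then pull matching
    documents from a public corpus, expanding the query set each round."""
    collected: list[str] = []
    queries = generate_queries(seed_topic)
    for _ in range(rounds):
        next_queries: list[str] = []
        for q in queries:
            collected.extend(search_corpus(q, per_query))
            next_queries.extend(generate_queries(q))  # expand each query further
        queries = next_queries
    # Deduplicate while preserving order.
    return list(dict.fromkeys(collected))
```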
arXiv Detail & Related papers (2024-01-26T03:38:23Z)
- Capture the Flag: Uncovering Data Insights with Large Language Models [90.47038584812925]
This study explores the potential of using Large Language Models (LLMs) to automate the discovery of insights in data.
We propose a new evaluation methodology based on a "capture the flag" principle, measuring the ability of such models to recognize meaningful and pertinent information (flags) in a dataset (a toy scoring sketch follows).
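The summary does not spell out the scoring protocol; this is a toy sketch of the underlying idea, assuming each dataset ships with a list of ground-truth flags and using naive substring matching where the authors may well use a stricter criterion (e.g., human judgment or semantic similarity).

```python
def flag_recall(model_insights: list[str], flags: list[str]) -> float:
    """Fraction of planted flags that appear in at least one model insight.

    Substring matching is a stand-in for the paper's actual matching
    criterion, which this summary does not describe.
    """
    found = sum(
        any(flag.lower() in insight.lower() for insight in model_insights)
        for flag in flags
    )
    return found / len(flags) if flags else 0.0

# Example: two of three flags recovered -> recall of about 0.67.
insights = ["Sales dropped 30% in Q3.", "Region A outperforms Region B."]
flags = ["sales dropped 30%", "region a outperforms", "missing values in 2019"]
print(flag_recall(insights, flags))
```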
arXiv Detail & Related papers (2023-12-21T14:20:06Z)
- Interpreting Deep Knowledge Tracing Model on EdNet Dataset [67.81797777936868]
In this work, we perform similar tasks on the large, newly available EdNet dataset.
Preliminary experimental results show the effectiveness of the interpretation techniques.
arXiv Detail & Related papers (2021-10-31T07:18:59Z)
- Data and its (dis)contents: A survey of dataset development and use in machine learning research [11.042648980854487]
We survey the many concerns raised about the way we collect and use data in machine learning.
We advocate that a more cautious and thorough understanding of data is necessary to address several of the practical and ethical issues of the field.
arXiv Detail & Related papers (2020-12-09T22:13:13Z)
- Bringing the People Back In: Contesting Benchmark Machine Learning Datasets [11.00769651520502]
We outline a research program - a genealogy of machine learning data - for investigating how and why these datasets have been created.
We describe the ways in which benchmark datasets in machine learning operate as infrastructure and pose four research questions for these datasets.
arXiv Detail & Related papers (2020-07-14T23:22:13Z)
- Ontologies in CLARIAH: Towards Interoperability in History, Language and Media [0.05277024349608833]
One of the most important goals of digital humanities is to provide researchers with data and tools for new research questions.
The FAIR principles provide a framework, stating that data needs to be: Findable, as data are often scattered among various sources; Accessible, since some might be offline or behind paywalls; Interoperable, using standard knowledge representation formats and shared vocabularies; and Reusable.
We describe the tools developed and integrated in the Dutch national project CLARIAH to address these issues.
arXiv Detail & Related papers (2020-04-06T17:38:47Z)