PASTA-4-PHT: A Pipeline for Automated Security and Technical Audits for the Personal Health Train
- URL: http://arxiv.org/abs/2412.01275v1
- Date: Mon, 02 Dec 2024 08:43:40 GMT
- Title: PASTA-4-PHT: A Pipeline for Automated Security and Technical Audits for the Personal Health Train
- Authors: Sascha Welten, Karl Kindermann, Ahmet Polat, Martin Görz, Maximilian Jugl, Laurenz Neumann, Alexander Neumann, Johannes Lohmöller, Jan Pennekamp, Stefan Decker
- Abstract summary: This work discusses a PHT-aligned security and audit pipeline inspired by DevSecOps principles.
We introduce vulnerabilities into a PHT and apply our pipeline to five real-world PHTs, which have been utilised in real-world studies.
Ultimately, our work contributes to an increased security and overall transparency of data processing activities within the PHT framework.
- Score: 34.203290179252555
- License:
- Abstract: With the introduction of data protection regulations, the need for innovative privacy-preserving approaches to process and analyse sensitive data has become apparent. One approach is the Personal Health Train (PHT) that brings analysis code to the data and conducts the data processing at the data premises. However, despite its demonstrated success in various studies, the execution of external code in sensitive environments, such as hospitals, introduces new research challenges because the interactions of the code with sensitive data are often incomprehensible and lack transparency. These interactions raise concerns about potential effects on the data and increase the risk of data breaches. To address this issue, this work discusses a PHT-aligned security and audit pipeline inspired by DevSecOps principles. The automated pipeline incorporates multiple phases that detect vulnerabilities. To thoroughly study its versatility, we evaluate this pipeline in two ways. First, we deliberately introduce vulnerabilities into a PHT. Second, we apply our pipeline to five real-world PHTs, which have been utilised in real-world studies, to audit them for potential vulnerabilities. Our evaluation demonstrates that our designed pipeline successfully identifies potential vulnerabilities and can be applied to real-world studies. In compliance with the requirements of the GDPR for data management, documentation, and protection, our automated approach supports researchers in their data-intensive work and reduces manual overhead. It can be used as a decision-making tool to assess and document potential vulnerabilities in code for data processing. Ultimately, our work contributes to increased security and overall transparency of data processing activities within the PHT framework.
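The abstract describes the pipeline only at a high level. As a rough illustration of what a DevSecOps-style, multi-phase audit of train code could look like, the following Python sketch chains a static-analysis phase and a dependency-scan phase into a single JSON audit report. The phase set, the report layout, and the use of the `bandit` and `pip-audit` scanners are assumptions for illustration, not the paper's actual implementation.

```python
"""Minimal sketch of a multi-phase security-audit pipeline for PHT analysis code.

Assumptions (not from the paper): the phase names, the choice of `bandit`
(static analysis) and `pip-audit` (dependency scan) as concrete scanners,
and the JSON report layout are illustrative stand-ins.
"""
import json
import subprocess
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class PhaseResult:
    phase: str
    passed: bool
    findings: list = field(default_factory=list)

def run_static_analysis(code_dir: Path) -> PhaseResult:
    # Scan the train's analysis code for insecure constructs.
    proc = subprocess.run(
        ["bandit", "-r", str(code_dir), "-f", "json"],
        capture_output=True, text=True,
    )
    issues = json.loads(proc.stdout).get("results", []) if proc.stdout else []
    return PhaseResult("static-analysis", passed=not issues, findings=issues)

def run_dependency_scan(requirements: Path) -> PhaseResult:
    # Check declared dependencies against known-vulnerability databases.
    proc = subprocess.run(
        ["pip-audit", "-r", str(requirements), "-f", "json"],
        capture_output=True, text=True,
    )
    deps = json.loads(proc.stdout).get("dependencies", []) if proc.stdout else []
    flagged = [d for d in deps if d.get("vulns")]
    return PhaseResult("dependency-scan", passed=not flagged, findings=flagged)

def audit(code_dir: Path) -> dict:
    """Run all phases and assemble an audit report for documentation."""
    results = [
        run_static_analysis(code_dir),
        run_dependency_scan(code_dir / "requirements.txt"),
    ]
    return {
        "audited_path": str(code_dir),
        "passed": all(r.passed for r in results),
        "phases": [r.__dict__ for r in results],
    }

if __name__ == "__main__":
    print(json.dumps(audit(Path("./train-code")), indent=2))
```

A real pipeline would add further phases (e.g. container and secret scanning) and sign the resulting report so stations can verify it before executing the train.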
Related papers
- Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset [4.522849055040843]
This study audited the Helpful and Harmless dataset by Anthropic.
Our findings highlight the need for more nuanced, context-sensitive approaches to safety mitigation in large language models.
arXiv Detail & Related papers (2024-11-12T23:43:20Z)
- AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs.
Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
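The summary names the core idea, an LLM driving a finite state machine (PSM), without detail. Below is a minimal, hypothetical sketch of such a loop: a transition table constrains which next states the LLM may pick, and a stub stands in for the actual LLM call. The states and transitions are invented for illustration and are not AutoPT's design.

```python
"""Illustrative state-machine-driven agent loop; states, transitions,
and the query_llm stub are hypothetical, not AutoPT's actual design."""
from enum import Enum, auto

class State(Enum):
    RECON = auto()
    EXPLOIT = auto()
    REPORT = auto()
    DONE = auto()

# Legal transitions constrain what the LLM may propose next.
TRANSITIONS = {
    State.RECON: {State.EXPLOIT},
    State.EXPLOIT: {State.RECON, State.REPORT},
    State.REPORT: {State.DONE},
}

def query_llm(state: State, history: list[str]) -> State:
    # Stand-in for an LLM call that picks the next state; a real agent
    # would prompt a model and validate its answer against TRANSITIONS.
    if state is State.RECON:
        return State.EXPLOIT
    if state is State.EXPLOIT:
        return State.REPORT if len(history) >= 2 else State.RECON
    return State.DONE

def run_agent(max_steps: int = 10) -> list[str]:
    state, history = State.RECON, []
    while state is not State.DONE and len(history) < max_steps:
        nxt = query_llm(state, history)
        assert nxt in TRANSITIONS[state], "illegal transition proposed"
        history.append(f"{state.name} -> {nxt.name}")
        state = nxt
    return history

print(run_agent())
```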
arXiv Detail & Related papers (2024-11-02T13:24:30Z)
- Data Quality Issues in Vulnerability Detection Datasets [1.6114012813668932]
Vulnerability detection is a crucial yet challenging task to identify potential weaknesses in software for cyber security.
Deep learning (DL) has made great progress in automating the detection process.
Many datasets have been created to train DL models for this purpose.
However, these datasets suffer from several issues that lead to low detection accuracy in DL models.
arXiv Detail & Related papers (2024-10-08T13:31:29Z)
- EARBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [53.717918131568936]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction.
Foundation models, serving as the "brain" of EAI agents for high-level task planning, have shown promising results.
However, the deployment of these agents in physical environments presents significant safety challenges.
This study introduces EARBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z)
- Generative AI for Secure and Privacy-Preserving Mobile Crowdsensing [74.58071278710896]
Generative AI has attracted much attention from both academia and industry.
Secure and privacy-preserving mobile crowdsensing (SPPMCS) has been widely applied in data collection and acquisition.
arXiv Detail & Related papers (2024-05-17T04:00:58Z)
- Secure and Verifiable Data Collaboration with Low-Cost Zero-Knowledge Proofs [30.260427020479536]
In this paper, we propose RiseFL, a novel and highly efficient solution for secure and verifiable data collaboration.
First, we devise a probabilistic integrity check method that significantly reduces the cost of ZKP generation and verification.
We also design a hybrid commitment scheme to satisfy Byzantine robustness with improved performance.
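RiseFL's actual check is built on zero-knowledge proofs; the toy sketch below only illustrates the sampling intuition behind a probabilistic integrity check: verify a random subset of commitments instead of all of them, trading a bounded miss probability for a large cost reduction. The hash commitments and parameters are illustrative, not the paper's construction.

```python
"""Toy probabilistic integrity check (not RiseFL's ZKP construction)."""
import hashlib
import random

def commit(chunk: bytes) -> str:
    # Hash commitment stand-in for the real cryptographic commitments.
    return hashlib.sha256(chunk).hexdigest()

def probabilistic_check(chunks, commitments, k, rng=random):
    """Verify k randomly sampled chunks against their commitments.
    A single corrupted chunk is caught with probability k/n; corrupting
    more chunks drives the detection probability toward 1."""
    indices = rng.sample(range(len(chunks)), k)
    return all(commit(chunks[i]) == commitments[i] for i in indices)

# Honest commitments over 1000 chunks, then one chunk is tampered with.
data = [bytes([i % 256]) * 32 for i in range(1000)]
comms = [commit(c) for c in data]
data[7] = b"tampered".ljust(32, b"\x00")

# Checking only k=200 of 1000 chunks costs 20% of full verification and
# still flags this single corruption with probability 200/1000 = 20%.
print(probabilistic_check(data, comms, k=200))
```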
arXiv Detail & Related papers (2023-11-26T14:19:46Z)
- Towards Generalizable Data Protection With Transferable Unlearnable Examples [50.628011208660645]
We present a novel, generalizable data protection method by generating transferable unlearnable examples.
To the best of our knowledge, this is the first solution that examines data privacy from the perspective of data distribution.
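The transferable, distribution-level construction is the paper's contribution; the sketch below shows only the classic per-sample error-minimizing-noise recipe that unlearnable examples build on, with a toy model, to make the "minimise the training loss with respect to the noise" step concrete. All shapes and hyperparameters are illustrative.

```python
"""Classic error-minimizing noise (not the paper's transferable variant):
delta is optimised to *minimise* training loss, so poisoned samples look
"already learned" and carry little training signal."""
import torch
import torch.nn.functional as F

def make_unlearnable(model, x, y, eps=8 / 255, steps=20, lr=0.1):
    """Per-sample noise, clipped to an L-inf ball of radius eps."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()  # descend: minimise the loss
            delta.clamp_(-eps, eps)
        delta.grad.zero_()
    return (x + delta).detach()

# Toy usage with a tiny frozen linear "model" on fake image batches.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
model.requires_grad_(False)
x, y = torch.rand(16, 3, 32, 32), torch.randint(0, 10, (16,))
x_unlearnable = make_unlearnable(model, x, y)
print((x_unlearnable - x).abs().max())  # bounded by eps
```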
arXiv Detail & Related papers (2023-05-18T04:17:01Z)
- Bringing the Algorithms to the Data -- Secure Distributed Medical Analytics using the Personal Health Train (PHT-meDIC) [1.451998131020241]
The Personal Health Train (PHT) implements an 'algorithm to the data' paradigm.
We present PHT-meDIC, a productively deployed open-source implementation of the PHT concept.
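As a concrete picture of the 'algorithm to the data' step, the sketch below uses the Docker SDK to run a hypothetical train image against locally mounted, read-only data with networking disabled. The image name, mount paths, and results layout are invented; PHT-meDIC's production setup adds encryption, signing, and registry orchestration on top.

```python
"""Minimal sketch of executing a train at a station: only results,
never raw records, leave the premises. All names and paths are
hypothetical stand-ins, not PHT-meDIC's actual configuration."""
import docker  # pip install docker

client = docker.from_env()

# Pull the (hypothetical) train image holding the analysis code.
client.images.pull("registry.example.org/trains/cohort-analysis:1.0")

# Run it with the station's data mounted read-only; results go to a
# dedicated volume that is later shipped back to the researcher.
logs = client.containers.run(
    "registry.example.org/trains/cohort-analysis:1.0",
    volumes={
        "/opt/station/data": {"bind": "/data", "mode": "ro"},
        "/opt/station/results": {"bind": "/results", "mode": "rw"},
    },
    network_disabled=True,  # the analysis code gets no network access
    remove=True,
)
print(logs.decode())
```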
arXiv Detail & Related papers (2022-12-07T06:29:15Z)
- VELVET: a noVel Ensemble Learning approach to automatically locate VulnErable sTatements [62.93814803258067]
This paper presents VELVET, a novel ensemble learning approach to locate vulnerable statements in source code.
Our model combines graph-based and sequence-based neural networks to successfully capture the local and global context of a program graph.
VELVET achieves 99.6% and 43.6% top-1 accuracy over synthetic data and real-world data, respectively.
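Those numbers come from trained graph and sequence models; the toy sketch below keeps only the ensemble-and-rank logic, with random scorers standing in for the paper's GNN and transformer.

```python
"""Toy sketch of VELVET's core idea: fuse a graph-based and a
sequence-based scorer and rank statements by the combined score.
Both scorers are random stand-ins for the paper's trained models."""
import torch

def graph_model(num_statements: int) -> torch.Tensor:
    # Stand-in for a GNN scoring each statement from the program graph.
    return torch.rand(num_statements)

def sequence_model(num_statements: int) -> torch.Tensor:
    # Stand-in for a transformer scoring each statement from the tokens.
    return torch.rand(num_statements)

def locate_vulnerable(num_statements: int, top_k: int = 1) -> list[int]:
    # Average the two views, then return the top-k statement indices.
    scores = 0.5 * graph_model(num_statements) + 0.5 * sequence_model(num_statements)
    return scores.topk(top_k).indices.tolist()

print(locate_vulnerable(num_statements=40, top_k=3))
```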
arXiv Detail & Related papers (2021-12-20T22:45:27Z)
- Privacy Preservation in Federated Learning: An insightful survey from the GDPR Perspective [10.901568085406753]
This article surveys state-of-the-art privacy-preservation techniques that can be employed in federated learning (FL).
Recent research has demonstrated that keeping data and computation local in FL is not enough to guarantee privacy.
This is because the ML model parameters exchanged between parties in an FL system can be exploited in some privacy attacks.
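To make that attack surface concrete: in the minimal federated-averaging round below, raw data never leaves the clients, yet every client's updated parameters cross the wire, and those updates are what attacks such as gradient inversion target. The linear model and data are toy placeholders, not from the survey.

```python
"""Minimal FedAvg round in NumPy: the server sees client updates,
never raw data, and those updates are the privacy attack surface."""
import numpy as np

rng = np.random.default_rng(0)

def local_update(weights, X, y, lr=0.1):
    # One gradient step of linear regression on the client's private data.
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

global_w = np.zeros(5)
clients = [(rng.normal(size=(20, 5)), rng.normal(size=20)) for _ in range(3)]

for _ in range(5):
    # Each client trains locally and ships its updated weights upstream.
    updates = [local_update(global_w, X, y) for X, y in clients]
    # These exchanged parameters are what privacy attacks exploit.
    global_w = np.mean(updates, axis=0)

print(global_w.round(3))
```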
arXiv Detail & Related papers (2020-11-10T21:41:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.