Prink: $k_s$-Anonymization for Streaming Data in Apache Flink
- URL: http://arxiv.org/abs/2505.13153v1
- Date: Mon, 19 May 2025 14:13:30 GMT
- Title: Prink: $k_s$-Anonymization for Streaming Data in Apache Flink
- Authors: Philip Groneberg, Saskia Nuñez von Voigt, Thomas Janke, Louis Loechel, Karl Wolf, Elias Grünewald, Frank Pallas,
- Abstract summary: We present Prink, a novel concept and fully implemented prototype for ks-anonymizing data streams in real-world application architectures.<n>Prink for the first time introduces semantics-aware ks-anonymization of non-numerical (such as categorical or hierarchically generalizable) streaming data in a information loss-optimized manner.
- Score: 0.6282171844772422
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present Prink, a novel and practically applicable concept and fully implemented prototype for ks-anonymizing data streams in real-world application architectures. Building upon the pre-existing, yet rudimentary CASTLE scheme, Prink for the first time introduces semantics-aware ks-anonymization of non-numerical (such as categorical or hierarchically generalizable) streaming data in a information loss-optimized manner. In addition, it provides native integration into Apache Flink, one of the prevailing frameworks for enterprise-grade stream data processing in numerous application domains. Our contributions excel the previously established state of the art for the privacy guarantee-providing anonymization of streaming data in that they 1) allow to include non-numerical data in the anonymization process, 2) provide discrete datapoints instead of aggregates, thereby facilitating flexible data use, 3) are applicable in real-world system contexts with minimal integration efforts, and 4) are experimentally proven to raise acceptable performance overheads and information loss in realistic settings. With these characteristics, Prink provides an anonymization approach which is practically feasible for a broad variety of real-world, enterprise-grade stream processing applications and environments.
Related papers
- T2UE: Generating Unlearnable Examples from Text Descriptions [60.111026156038264]
Unlearnable Examples (UEs) have emerged as a promising countermeasure against unauthorized model training.<n>We introduce textbfText-to-Unlearnable Example (T2UE), a novel framework that enables users to generate UEs using only text descriptions.
arXiv Detail & Related papers (2025-08-05T05:10:14Z) - From Past to Present: A Survey of Malicious URL Detection Techniques, Datasets and Code Repositories [3.323388021979584]
Malicious URLs persistently threaten the cybersecurity ecosystem, by either deceiving users into divulging private data or distributing harmful payloads to infiltrate host systems.<n>This review systematically analyzes methods from traditional blacklisting to advanced deep learning approaches.<n>Unlike prior surveys, we propose a novel modality-based taxonomy that categorizes existing works according to their primary data modalities.
arXiv Detail & Related papers (2025-04-23T06:23:18Z) - Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face the emerging challenges of re-identification attack ability of Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z) - Causally Inspired Regularization Enables Domain General Representations [14.036422506623383]
Given a causal graph representing the data-generating process shared across different domains/distributions, enforcing sufficient graph-implied conditional independencies can identify domain-general (non-spurious) feature representations.
We propose a novel framework with regularizations, which we demonstrate are sufficient for identifying domain-general feature representations without a priori knowledge (or proxies) of the spurious features.
Our proposed method is effective for both (semi) synthetic and real-world data, outperforming other state-of-the-art methods in average and worst-domain transfer accuracy.
arXiv Detail & Related papers (2024-04-25T01:33:55Z) - Fake It Till Make It: Federated Learning with Consensus-Oriented
Generation [52.82176415223988]
We propose federated learning with consensus-oriented generation (FedCOG)
FedCOG consists of two key components at the client side: complementary data generation and knowledge-distillation-based model training.
Experiments on classical and real-world FL datasets show that FedCOG consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-12-10T18:49:59Z) - FedSIS: Federated Split Learning with Intermediate Representation
Sampling for Privacy-preserving Generalized Face Presentation Attack
Detection [4.1897081000881045]
Lack of generalization to unseen domains/attacks is the Achilles heel of most face presentation attack detection (FacePAD) algorithms.
In this work, a novel framework called Federated Split learning with Intermediate representation Sampling (FedSIS) is introduced for privacy-preserving domain generalization.
arXiv Detail & Related papers (2023-08-20T11:49:12Z) - PS-FedGAN: An Efficient Federated Learning Framework Based on Partially
Shared Generative Adversarial Networks For Data Privacy [56.347786940414935]
Federated Learning (FL) has emerged as an effective learning paradigm for distributed computation.
This work proposes a novel FL framework that requires only partial GAN model sharing.
Named as PS-FedGAN, this new framework enhances the GAN releasing and training mechanism to address heterogeneous data distributions.
arXiv Detail & Related papers (2023-05-19T05:39:40Z) - Modeling Entities as Semantic Points for Visual Information Extraction
in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z) - FedLAP-DP: Federated Learning by Sharing Differentially Private Loss Approximations [53.268801169075836]
We propose FedLAP-DP, a novel privacy-preserving approach for federated learning.
A formal privacy analysis demonstrates that FedLAP-DP incurs the same privacy costs as typical gradient-sharing schemes.
Our approach presents a faster convergence speed compared to typical gradient-sharing methods.
arXiv Detail & Related papers (2023-02-02T12:56:46Z) - FV-UPatches: Enhancing Universality in Finger Vein Recognition [0.6299766708197883]
We propose a universal learning-based framework, which achieves generalization while training with limited data.
The proposed framework shows application potential in other vein-based biometric recognition as well.
arXiv Detail & Related papers (2022-06-02T14:20:22Z) - FedAvg with Fine Tuning: Local Updates Lead to Representation Learning [54.65133770989836]
Federated Averaging (FedAvg) algorithm consists of alternating between a few local gradient updates at client nodes, followed by a model averaging update at the server.
We show that the reason behind generalizability of the FedAvg's output is its power in learning the common data representation among the clients' tasks.
We also provide empirical evidence demonstrating FedAvg's representation learning ability in federated image classification with heterogeneous data.
arXiv Detail & Related papers (2022-05-27T00:55:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.