Privacy-preserving Fuzzy Name Matching for Sharing Financial Intelligence
- URL: http://arxiv.org/abs/2407.19979v2
- Date: Fri, 08 Nov 2024 12:24:53 GMT
- Title: Privacy-preserving Fuzzy Name Matching for Sharing Financial Intelligence
- Authors: Harsh Kasyap, Ugur Ilker Atmaca, Carsten Maple, Graham Cormode, Jiancong He,
- Abstract summary: We introduce a novel privacy-preserving scheme for fuzzy name matching across institutions.
It takes around 100 and 1000 seconds to search 1000 names from 10k and 100k names, respectively.
It reduces communication overhead by a factor of 30 to 300.
- Score: 13.323602505055245
- License:
- Abstract: Financial institutions rely on data for many operations, including a need to drive efficiency, enhance services and prevent financial crime. Data sharing across an organisation or between institutions can facilitate rapid, evidence-based decision-making, including identifying money laundering and fraud. However, modern data privacy regulations impose restrictions on data sharing. For this reason, privacy-enhancing technologies are being increasingly employed to allow organisations to derive shared intelligence while ensuring regulatory compliance. This paper examines the case in which regulatory restrictions mean a party cannot share data on accounts of interest with another (internal or external) party to determine individuals that hold accounts in both datasets. The names of account holders may be recorded differently in each dataset. We introduce a novel privacy-preserving scheme for fuzzy name matching across institutions, employing fully homomorphic encryption over MinHash signatures. The efficiency of the proposed scheme is enhanced using a clustering mechanism. Our scheme ensures privacy by only revealing the possibility of a potential match to the querying party. The practicality and effectiveness are evaluated using different datasets, and compared against state-of-the-art schemes. It takes around 100 and 1000 seconds to search 1000 names from 10k and 100k names, respectively, meeting the requirements of financial institutions. Furthermore, it exhibits significant performance improvement in reducing communication overhead by 30-300 times.
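For intuition, the sketch below shows the MinHash building block the scheme relies on: each name is shingled into character n-grams and summarised by a MinHash signature, and the fraction of matching signature slots estimates the Jaccard similarity of the two name shingle sets. This is a minimal plaintext illustration only; the paper's clustering step and the fully homomorphic encryption layer that keeps signatures private are omitted, and the n-gram size, signature length, and hash choice here are illustrative assumptions rather than the paper's parameters.

```python
import hashlib

def ngrams(name: str, n: int = 3) -> set:
    """Character n-grams of a normalised name (illustrative shingling choice)."""
    s = f"  {name.lower().strip()}  "  # pad so short names still yield shingles
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def minhash_signature(shingles: set, num_hashes: int = 128) -> list:
    """MinHash signature: for each seeded hash function, keep the minimum
    hash value over all shingles."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int.from_bytes(hashlib.sha256(f"{seed}:{sh}".encode()).digest()[:8], "big")
            for sh in shingles
        ))
    return sig

def estimate_jaccard(sig_a: list, sig_b: list) -> float:
    """The fraction of matching slots estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Example: two spellings of the same account holder.
sig1 = minhash_signature(ngrams("Jonathan Smith"))
sig2 = minhash_signature(ngrams("Jonathon Smyth"))
print(f"estimated similarity: {estimate_jaccard(sig1, sig2):.2f}")
```

Because two signatures of equal length are compared slot by slot, the comparison reduces to element-wise equality tests, which is what makes it amenable to evaluation over homomorphically encrypted signatures.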
Related papers
- Privacy-Preserving Dataset Combination [1.9168342959190845]
We present SecureKL, a privacy-preserving framework that enables organizations to identify beneficial data partnerships without exposing sensitive information.
In experiments with real-world hospital data, SecureKL successfully identifies beneficial data partnerships that improve model performance.
These results demonstrate the potential for privacy-preserving data collaboration to advance machine learning applications in high-stakes domains.
arXiv Detail & Related papers (2025-02-09T03:54:17Z)
- Wasserstein Markets for Differentially-Private Data [1.4266656344673316]
Data markets provide a means to enable wider access as well as determine the appropriate privacy-utility trade-off.
Existing data market frameworks either require a trusted third party to perform expensive valuations or are unable to capture the nature of data value.
This paper proposes a valuation mechanism based on the Wasserstein distance for differentially-private data, and corresponding procurement mechanisms.
arXiv Detail & Related papers (2024-12-03T17:40:26Z)
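As a toy illustration of the idea in the entry above (not the paper's valuation or procurement mechanism), the snippet below measures the Wasserstein distance between a raw one-dimensional dataset and a Laplace-noised, differentially private copy; the data, epsilon, and sensitivity are invented values.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

# Toy 1-D dataset and a differentially private version (Laplace mechanism;
# epsilon and sensitivity chosen purely for illustration).
data = rng.normal(loc=50.0, scale=10.0, size=1_000)
epsilon, sensitivity = 1.0, 1.0
private = data + rng.laplace(scale=sensitivity / epsilon, size=data.shape)

# Wasserstein distance between the empirical distributions: one candidate
# signal for distance-based valuation of private data.
print(f"W1(raw, private) = {wasserstein_distance(data, private):.3f}")
```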
- Breaking the Communication-Privacy-Accuracy Tradeoff with $f$-Differential Privacy [51.11280118806893]
We consider a federated data analytics problem in which a server coordinates the collaborative data analysis of multiple users with privacy concerns and limited communication capability.
We study the local differential privacy guarantees of discrete-valued mechanisms with finite output space through the lens of $f$-differential privacy (DP).
More specifically, we advance the existing literature by deriving tight $f$-DP guarantees for a variety of discrete-valued mechanisms.
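As a standalone refresher on the $f$-DP formalism used in this entry (an illustration, not a result from the paper): a mechanism is $f$-DP if no hypothesis test distinguishing neighbouring datasets can achieve a type-II error below the trade-off curve $f$; for a pure $(\varepsilon, 0)$-DP mechanism that curve has the closed form $\max(0,\ 1 - e^{\varepsilon}\alpha,\ e^{-\varepsilon}(1 - \alpha))$, computed below.

```python
import numpy as np

def tradeoff_pure_dp(alpha: np.ndarray, eps: float) -> np.ndarray:
    """Trade-off function f(alpha) of an (eps, 0)-DP mechanism: the smallest
    achievable type-II error at type-I error alpha."""
    return np.maximum.reduce([
        np.zeros_like(alpha),
        1.0 - np.exp(eps) * alpha,
        np.exp(-eps) * (1.0 - alpha),
    ])

alpha = np.linspace(0.0, 1.0, 6)
for eps in (0.5, 1.0, 2.0):
    print(eps, np.round(tradeoff_pure_dp(alpha, eps), 3))
```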
arXiv Detail & Related papers (2023-02-19T16:58:53Z)
- Tight Auditing of Differentially Private Machine Learning [77.38590306275877]
For private machine learning, existing auditing mechanisms are not tight: they only give tight privacy estimates under implausible worst-case assumptions.
We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets.
arXiv Detail & Related papers (2023-02-15T21:40:33Z)
- A Privacy-Preserving Hybrid Federated Learning Framework for Financial Crime Detection [27.284477227066972]
We propose a hybrid federated learning system that offers secure and privacy-aware learning and inference for financial crime detection.
We conduct extensive empirical studies to evaluate the proposed framework's detection performance and privacy-protection capability.
arXiv Detail & Related papers (2023-02-07T18:12:48Z)
- Collective Privacy Recovery: Data-sharing Coordination via Decentralized Artificial Intelligence [2.309914459672557]
We show how to automate and scale up complex collective arrangements for privacy recovery.
We compare, for the first time, attitudinal, intrinsic, rewarded, and coordinated data sharing.
Strikingly, data-sharing coordination proves to be a win-win for all.
arXiv Detail & Related papers (2023-01-15T01:36:46Z)
- Smooth Anonymity for Sparse Graphs [69.1048938123063]
Differential privacy has emerged as the gold standard of privacy; however, it offers limited utility when it comes to sharing sparse datasets.
In this work, we consider a variation of $k$-anonymity, which we call smooth-$k$-anonymity, and design simple large-scale algorithms that efficiently provide smooth-$k$-anonymity.
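For reference, the snippet below checks plain $k$-anonymity over a set of quasi-identifiers; the paper's smooth-$k$-anonymity variant and its large-scale algorithms are not reproduced here, and the table and column names are hypothetical.

```python
import pandas as pd

def is_k_anonymous(df: pd.DataFrame, quasi_identifiers: list, k: int) -> bool:
    """Plain k-anonymity check: every combination of quasi-identifier values
    must occur in at least k rows."""
    return int(df.groupby(quasi_identifiers).size().min()) >= k

# Hypothetical records with quasi-identifiers (zip, age band).
df = pd.DataFrame({
    "zip":       ["1010", "1010", "1010", "2020", "2020"],
    "age_band":  ["30-39", "30-39", "30-39", "40-49", "40-49"],
    "diagnosis": ["A", "B", "A", "C", "B"],
})
print(is_k_anonymous(df, ["zip", "age_band"], k=2))   # True
print(is_k_anonymous(df, ["zip", "age_band"], k=3))   # False
```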
arXiv Detail & Related papers (2022-07-13T17:09:25Z)
- Data Sharing Markets [95.13209326119153]
We study a setup where each agent can be both buyer and seller of data.
We consider two cases: bilateral data exchange (trading data with data) and unilateral data exchange (trading data with money).
arXiv Detail & Related papers (2021-07-19T06:00:34Z)
- Second layer data governance for permissioned blockchains: the privacy management challenge [58.720142291102135]
In pandemic situations, such as the COVID-19 and Ebola outbreaks, sharing health data is crucial to contain the spread of infection and reduce the number of deaths.
In this context, permissioned blockchain technology emerges to empower users to exercise their data rights, providing data ownership, transparency, and security through an immutable, unified, and distributed database ruled by smart contracts.
arXiv Detail & Related papers (2020-10-22T13:19:38Z)
- Differential Privacy of Hierarchical Census Data: An Optimization Approach [53.29035917495491]
Census Bureaus are interested in releasing aggregate socio-economic data about a large population without revealing sensitive information about any individual.
Recent events have identified some of the privacy challenges faced by these organizations.
This paper presents a novel differential-privacy mechanism for releasing hierarchical counts of individuals.
arXiv Detail & Related papers (2020-06-28T18:19:55Z)
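As background for the entry above, the sketch below applies the standard Laplace mechanism to individual counts; the paper's actual contribution, an optimization-based mechanism that additionally enforces consistency across the census hierarchy, is not shown, and the counts and epsilon are invented.

```python
import numpy as np

def laplace_count(true_count: int, epsilon: float, rng: np.random.Generator) -> float:
    """Standard Laplace mechanism for a count query (sensitivity 1).
    Hierarchical consistency, as in the paper, is not enforced here."""
    return true_count + rng.laplace(scale=1.0 / epsilon)

rng = np.random.default_rng(42)
counts = {"state": 120_000, "county": 8_500, "tract": 640}
noisy = {level: laplace_count(c, epsilon=0.5, rng=rng) for level, c in counts.items()}
print(noisy)
```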