Automatic Anonymization of Swiss Federal Supreme Court Rulings
- URL: http://arxiv.org/abs/2310.04632v2
- Date: Tue, 31 Oct 2023 22:53:02 GMT
- Title: Automatic Anonymization of Swiss Federal Supreme Court Rulings
- Authors: Joel Niklaus, Robin Mamié, Matthias Stürmer, Daniel Brunner, Marcel Gygli
- Abstract summary: We enhance the existing anonymization software using a large dataset annotated with entities to be anonymized.
Our results show that using in-domain data to pre-train the models further improves the F1-score by more than 5% compared to existing models.
- Score: 2.1963472367016426
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Releasing court decisions to the public relies on proper anonymization to
protect all involved parties, where necessary. The Swiss Federal Supreme Court
relies on an existing system that combines different traditional computational
methods with human experts. In this work, we enhance the existing anonymization
software using a large dataset annotated with entities to be anonymized. We
compared BERT-based models with models pre-trained on in-domain data. Our
results show that using in-domain data to pre-train the models further improves
the F1-score by more than 5% compared to existing models. Our work
demonstrates that combining existing anonymization methods, such as regular
expressions, with machine learning can further reduce manual labor and enhance
automatic suggestions.
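The combination of regular expressions with machine learning mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the court's software or the authors' pipeline: the date pattern, the `toy_model` stand-in for a fine-tuned token classifier, the labels, and the rule that regex matches win over overlapping model predictions are all assumptions made for illustration.

```python
import re
from typing import List, Tuple

# A span is (start offset, end offset, entity label).
Span = Tuple[int, int, str]

# Hypothetical rule: dates written as DD.MM.YYYY. Illustrative only.
DATE_PATTERN = re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b")


def regex_spans(text: str) -> List[Span]:
    """Rule-based suggestions: every match of the date pattern."""
    return [(m.start(), m.end(), "DATE") for m in DATE_PATTERN.finditer(text)]


def toy_model(text: str) -> List[Span]:
    """Stand-in for a learned model (assumption): finds one hard-coded name."""
    name = "Anna Muster"
    i = text.find(name)
    return [(i, i + len(name), "PERSON")] if i >= 0 else []


def merge_spans(rule_spans: List[Span], model_spans: List[Span]) -> List[Span]:
    """Union of both sources; a model span overlapping a rule span is dropped.

    Giving rules priority is an assumed policy, not one taken from the paper.
    """
    merged = list(rule_spans)
    for start, end, label in model_spans:
        overlaps = any(start < r_end and r_start < end
                       for r_start, r_end, _ in rule_spans)
        if not overlaps:
            merged.append((start, end, label))
    return sorted(merged)


def anonymize(text: str, spans: List[Span]) -> str:
    """Replace each span with a bracketed label; spans must not overlap."""
    parts = []
    cursor = 0
    for start, end, label in sorted(spans):
        parts.append(text[cursor:start])
        parts.append(f"[{label}]")
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts)


text = "Anna Muster filed the appeal on 12.03.2021."
suggestions = merge_spans(regex_spans(text), toy_model(text))
print(anonymize(text, suggestions))
```

In a setting like the one the abstract describes, such merged spans would serve as suggestions for human experts to review rather than as final redactions.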
Related papers
- Anonymization of Documents for Law Enforcement with Machine Learning [1.237454174824584]
We present a system for automatically anonymizing images of scanned documents.
Our method considers the viability of further forensic processing after anonymization.
We show that our approach outperforms both a purely automatic redaction system and a naive copy-paste scheme of the reference anonymization.
arXiv Detail & Related papers (2025-01-13T13:47:00Z)
- Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face emerging challenges from the re-identification capabilities of Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z)
- SEBA: Strong Evaluation of Biometric Anonymizations [3.18294468240512]
We introduce SEBA, a framework for strong evaluation of biometric anonymizations.
It combines and implements the state-of-the-art methodology in an easy-to-use and easy-to-expand software framework.
As part of this discourse, we introduce and discuss new metrics that allow for a more straightforward evaluation of the privacy-utility trade-off.
arXiv Detail & Related papers (2024-07-09T08:20:03Z)
- Federated Face Forgery Detection Learning with Personalized Representation [63.90408023506508]
Deep generator technology can produce high-quality fake videos that are indistinguishable from real ones, posing a serious social threat.
Traditional forgery detection methods train directly on centralized data.
The paper proposes a novel federated learning method for face forgery detection with personalized representation.
arXiv Detail & Related papers (2024-06-17T02:20:30Z)
- Can Public Large Language Models Help Private Cross-device Federated Learning? [58.05449579773249]
We study (differentially) private federated learning (FL) of language models.
Public data has been used to improve privacy-utility trade-offs for both large and small language models.
We propose a novel distribution matching algorithm with theoretical grounding to sample public data close to private data distribution.
arXiv Detail & Related papers (2023-05-20T07:55:58Z)
- LDFA: Latent Diffusion Face Anonymization for Self-driving Applications [3.501026362812183]
We introduce a novel deep learning-based pipeline for face anonymization in the context of ITS.
We propose a two-stage method, which contains a face detection model followed by a latent diffusion model to generate realistic face in-paintings.
Our experiments reveal that our pipeline is better suited to anonymize data for segmentation than naive methods and performs comparably with recent GAN-based methods.
arXiv Detail & Related papers (2023-02-17T15:14:00Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Anonymizing Machine Learning Models [0.0]
Anonymized data is exempt from obligations set out in regulations such as the EU General Data Protection Regulation.
We propose a method that is able to achieve better model accuracy by using the knowledge encoded within the trained model.
We also demonstrate that our approach's ability to prevent membership attacks is similar to, and sometimes even better than, that of approaches based on differential privacy.
arXiv Detail & Related papers (2020-07-26T09:29:03Z)
- Sensitive Data Detection and Classification in Spanish Clinical Text: Experiments with BERT [0.8379286663107844]
In this paper, we use a BERT-based sequence labelling model to conduct anonymisation experiments in Spanish.
Experiments show that a simple BERT-based model with general-domain pre-training obtains highly competitive results without any domain specific feature engineering.
arXiv Detail & Related papers (2020-03-06T09:46:51Z)
- Intra-Camera Supervised Person Re-Identification [87.88852321309433]
We propose a novel person re-identification paradigm based on an idea of independent per-camera identity annotation.
This eliminates the most time-consuming and tedious inter-camera identity labelling process.
We formulate a Multi-tAsk mulTi-labEl (MATE) deep learning method for Intra-Camera Supervised (ICS) person re-id.
arXiv Detail & Related papers (2020-02-12T15:26:33Z)