NeurIPS 2023 Competition: Privacy Preserving Federated Learning Document VQA
- URL: http://arxiv.org/abs/2411.03730v1
- Date: Wed, 06 Nov 2024 07:51:19 GMT
- Title: NeurIPS 2023 Competition: Privacy Preserving Federated Learning Document VQA
- Authors: Marlon Tobaben, Mohamed Ali Souibgui, Rubèn Tito, Khanh Nguyen, Raouf Kerkouche, Kangsoo Jung, Joonas Jälkö, Lei Kang, Andrey Barsky, Vincent Poulain d'Andecy, Aurélie Joseph, Aashiq Muhamed, Kevin Kuo, Virginia Smith, Yusuke Yamasaki, Takumi Fukami, Kenta Niwa, Iifan Tyou, Hiro Ishii, Rio Yokota, Ragul N, Rintu Kutum, Josep Llados, Ernest Valveny, Antti Honkela, Mario Fritz, Dimosthenis Karatzas
- Abstract summary: The competition introduced a dataset of real invoice documents, along with associated questions and answers.
The base model is a multi-modal generative language model, and sensitive information could be exposed through either the visual or textual input modality.
Participants proposed elegant solutions to reduce communication costs while maintaining a minimum utility threshold.
- Score: 49.74911193222192
- Abstract: The Privacy Preserving Federated Learning Document VQA (PFL-DocVQA) competition challenged the community to develop provably private and communication-efficient solutions in a federated setting for a real-life use case: invoice processing. The competition introduced a dataset of real invoice documents, along with associated questions and answers requiring information extraction and reasoning over the document images. Thereby, it brings together researchers and expertise from the document analysis, privacy, and federated learning communities. Participants fine-tuned a pre-trained, state-of-the-art Document Visual Question Answering model provided by the organizers for this new domain, mimicking a typical federated invoice processing setup. The base model is a multi-modal generative language model, and sensitive information could be exposed through either the visual or textual input modality. Participants proposed elegant solutions to reduce communication costs while maintaining a minimum utility threshold in track 1 and to protect all information from each document provider using differential privacy in track 2. The competition served as a new testbed for developing and testing private federated learning methods, simultaneously raising awareness about privacy within the document image analysis and recognition community. Ultimately, the competition analysis provides best practices and recommendations for successfully running privacy-focused federated learning challenges in the future.
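The track 2 setting described above — protecting every document provider's contribution with differential privacy during federated fine-tuning — can be illustrated with a minimal sketch of one DP federated averaging round. This is an illustrative, generic DP-FedAvg-style outline, not the competition's actual implementation; the function name and parameters are assumptions for the example.

```python
import math
import random

def dp_fedavg_round(global_weights, client_updates, clip_norm=1.0,
                    noise_mult=1.0, seed=0):
    """One round of federated averaging with per-client clipping and
    Gaussian noise (DP-FedAvg-style sketch; names are illustrative)."""
    rng = random.Random(seed)
    n = len(client_updates)
    dim = len(global_weights)
    # Clip each client's update so a single provider's influence
    # on the aggregate is bounded by clip_norm (the sensitivity).
    clipped = []
    for update in client_updates:
        norm = math.sqrt(sum(u * u for u in update))
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        clipped.append([u * scale for u in update])
    # Average the clipped updates and add Gaussian noise calibrated
    # to the clipping norm and the cohort size.
    sigma = noise_mult * clip_norm / n
    new_weights = []
    for j in range(dim):
        mean_j = sum(c[j] for c in clipped) / n
        new_weights.append(global_weights[j] + mean_j + rng.gauss(0.0, sigma))
    return new_weights
```

Clipping bounds each provider's sensitivity, and the noise scale then determines the privacy budget spent per round; a real system would additionally track the cumulative budget with a privacy accountant.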
Related papers
- Towards Split Learning-based Privacy-Preserving Record Linkage [49.1574468325115]
Split Learning has been introduced to facilitate applications where user data privacy is a requirement.
In this paper, we investigate the potentials of Split Learning for Privacy-Preserving Record Matching.
arXiv Detail & Related papers (2024-09-02T09:17:05Z)
- A Multivocal Literature Review on Privacy and Fairness in Federated Learning [1.6124402884077915]
Federated learning presents a way to revolutionize AI applications by eliminating the necessity for data sharing.
Recent research has demonstrated an inherent tension between privacy and fairness.
We argue that the relationship between privacy and fairness has been neglected, posing a critical risk for real-world applications.
arXiv Detail & Related papers (2024-08-16T11:15:52Z)
- Pencil: Private and Extensible Collaborative Learning without the Non-Colluding Assumption [24.339382371386876]
Pencil is the first private training framework for collaborative learning that simultaneously offers data privacy, model privacy, and extensibility to multiple data providers.
We introduce several novel cryptographic protocols to realize this design principle and conduct a rigorous security and privacy analysis.
Pencil achieves 10x to 260x higher throughput and 2 orders of magnitude less communication than prior art.
arXiv Detail & Related papers (2024-03-17T10:26:41Z)
- Privacy-Aware Document Visual Question Answering [44.82362488593259]
This work highlights privacy issues in state-of-the-art multi-modal LLMs used for DocVQA.
We propose a large scale DocVQA dataset comprising invoice documents and associated questions and answers.
We demonstrate that non-private models tend to memorise, a behaviour that can lead to exposing private information.
arXiv Detail & Related papers (2023-12-15T06:30:55Z)
- SynFacePAD 2023: Competition on Face Presentation Attack Detection Based on Privacy-aware Synthetic Training Data [51.42380508231581]
The paper presents a summary of the Competition on Face Presentation Attack Detection Based on Privacy-aware Synthetic Training Data (SynFacePAD 2023) held at the 2023 International Joint Conference on Biometrics (IJCB 2023).
The competition aimed to motivate and attract solutions that target detecting face presentation attacks while considering synthetic-based training data motivated by privacy, legal and ethical concerns associated with personal data.
The submitted solutions presented innovations and novel approaches that led to outperforming the considered baseline in the investigated benchmarks.
arXiv Detail & Related papers (2023-11-09T13:02:04Z)
- Practical Vertical Federated Learning with Unsupervised Representation Learning [47.77625754666018]
Federated learning enables multiple parties to collaboratively train a machine learning model without sharing their raw data.
We propose a novel communication-efficient vertical federated learning algorithm named FedOnce, which requires only one-shot communication among parties.
Our privacy-preserving technique significantly outperforms the state-of-the-art approaches under the same privacy budget.
arXiv Detail & Related papers (2022-08-13T08:41:32Z)
- Algorithmic Fairness Datasets: the Story so Far [68.45921483094705]
Data-driven algorithms are studied in diverse domains to support critical decisions, directly impacting people's well-being.
A growing community of researchers has been investigating the equity of existing algorithms and proposing novel ones, advancing the understanding of risks and opportunities of automated decision-making for historically disadvantaged populations.
Progress in fair Machine Learning hinges on data, which can be appropriately used only if adequately documented.
Unfortunately, the algorithmic fairness community suffers from a collective data documentation debt caused by a lack of information on specific resources (opacity) and scatteredness of available information (sparsity).
arXiv Detail & Related papers (2022-02-03T17:25:46Z)
- Privacy and Trust Redefined in Federated Machine Learning [5.4475482673944455]
We present a privacy-preserving decentralised workflow that facilitates trusted federated learning among participants.
Only entities in possession of Verifiable Credentials issued from the appropriate authorities are able to establish secure, authenticated communication channels.
arXiv Detail & Related papers (2021-03-29T16:47:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the accuracy of this information and is not responsible for any consequences arising from its use.