IDNet: A Novel Dataset for Identity Document Analysis and Fraud Detection
- URL: http://arxiv.org/abs/2408.01690v2
- Date: Tue, 3 Sep 2024 22:30:34 GMT
- Title: IDNet: A Novel Dataset for Identity Document Analysis and Fraud Detection
- Authors: Hong Guan, Yancheng Wang, Lulu Xie, Soham Nag, Rajeev Goel, Niranjan Erappa Narayana Swamy, Yingzhen Yang, Chaowei Xiao, Jonathan Prisby, Ross Maciejewski, Jia Zou
- Abstract summary: IDNet is a benchmark dataset designed to advance privacy-preserving fraud detection efforts.
It comprises 837,060 images of synthetically generated identity documents, totaling approximately 490 gigabytes.
We evaluate the utility and present use cases of the dataset, illustrating how it can aid in training privacy-preserving fraud detection methods.
- Score: 25.980165854663145
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Effective fraud detection and analysis of government-issued identity documents, such as passports, driver's licenses, and identity cards, are essential in thwarting identity theft and bolstering security on online platforms. The training of accurate fraud detection and analysis tools depends on the availability of extensive identity document datasets. However, current publicly available benchmark datasets for identity document analysis, including MIDV-500, MIDV-2020, and FMIDV, fall short in several respects: they offer a limited number of samples, cover insufficient varieties of fraud patterns, and seldom include alterations in critical personal identifying fields like portrait images, limiting their utility in training models capable of detecting realistic frauds while preserving privacy. In response to these shortcomings, our research introduces a new benchmark dataset, IDNet, designed to advance privacy-preserving fraud detection efforts. The IDNet dataset comprises 837,060 images of synthetically generated identity documents, totaling approximately 490 gigabytes, categorized into 20 types from 10 U.S. states and 10 European countries. We evaluate the utility and present use cases of the dataset, illustrating how it can aid in training privacy-preserving fraud detection methods, facilitating the generation of camera and video capturing of identity documents, and testing schema unification and other identity document management functionalities.
Related papers
- LLM for Barcodes: Generating Diverse Synthetic Data for Identity Documents [2.697503433221448]
We introduce a new approach to synthetic data generation that uses LLMs to create contextually rich and realistic data without relying on predefined fields.
Our approach simplifies the process of dataset creation, eliminating the need for extensive domain knowledge.
This scalable, privacy-first solution is a big step forward in advancing machine learning for automated document processing and identity verification.
arXiv Detail & Related papers (2024-11-22T14:21:18Z)
- DocXPand-25k: a large and diverse benchmark dataset for identity documents analysis [0.0]
Identity document (ID) image analysis has become essential for many online services, like bank account opening or insurance subscription.
Only a few datasets are available to benchmark ID analysis methods, mainly because of privacy restrictions, security requirements, and legal reasons.
We present the DocXPand-25k dataset, which consists of 24,994 richly labeled ID images.
arXiv Detail & Related papers (2024-07-30T08:55:27Z)
- Synthetic dataset of ID and Travel Document [1.9296797946506603]
This paper presents a new synthetic dataset of ID and travel documents, called SIDTD.
The SIDTD dataset was created to help train and evaluate forged ID document detection systems.
arXiv Detail & Related papers (2024-01-03T18:06:28Z)
- Diff-Privacy: Diffusion-based Face Privacy Protection [58.1021066224765]
In this paper, we propose a novel face privacy protection method based on diffusion models, dubbed Diff-Privacy.
Specifically, we train our proposed multi-scale image inversion module (MSI) to obtain a set of SDM format conditional embeddings of the original image.
Based on the conditional embeddings, we design corresponding embedding scheduling strategies and construct different energy functions during the denoising process to achieve anonymization and visual identity information hiding.
arXiv Detail & Related papers (2023-09-11T09:26:07Z)
- Synthetic ID Card Image Generation for Improving Presentation Attack Detection [12.232059909207578]
This work explores three methods for synthetically generating ID card images to increase the amount of data while training fraud-detection networks.
Our results indicate that databases can be supplemented with synthetic images without any loss in performance for the print/scan Presentation Attack Instrument Species (PAIS) and a loss in performance of 1% for the screen capture PAIS.
arXiv Detail & Related papers (2022-10-31T19:07:30Z)
- Unsupervised Text Deidentification [101.2219634341714]
We propose an unsupervised deidentification method that masks words that leak personally-identifying information.
Motivated by K-anonymity based privacy, we generate redactions that ensure a minimum reidentification rank.
arXiv Detail & Related papers (2022-10-20T18:54:39Z)
- Protecting Celebrities with Identity Consistency Transformer [119.67996461810304]
Identity Consistency Transformer focuses on high-level semantics, specifically identity information, and detects a suspect face by finding identity inconsistency between the inner and outer face regions.
We show that Identity Consistency Transformer exhibits superior generalization ability not only across different datasets but also across various types of image degradation forms found in real-world applications including deepfake videos.
arXiv Detail & Related papers (2022-03-02T18:59:58Z)
- MIDV-2020: A Comprehensive Benchmark Dataset for Identity Document Analysis [48.35030471041193]
MIDV-2020 consists of 1000 video clips, 2000 scanned images, and 1000 photos of 1000 unique mock identity documents.
With 72409 annotated images in total, to the date of publication the proposed dataset is the largest publicly available identity documents dataset.
arXiv Detail & Related papers (2021-07-01T12:14:17Z)
- Towards Face Encryption by Generating Adversarial Identity Masks [53.82211571716117]
We propose a targeted identity-protection iterative method (TIP-IM) to generate adversarial identity masks.
TIP-IM provides 95%+ protection success rate against various state-of-the-art face recognition models.
arXiv Detail & Related papers (2020-03-15T12:45:10Z)
- Intra-Camera Supervised Person Re-Identification [87.88852321309433]
We propose a novel person re-identification paradigm based on an idea of independent per-camera identity annotation.
This eliminates the most time-consuming and tedious inter-camera identity labelling process.
We formulate a Multi-tAsk mulTi-labEl (MATE) deep learning method for Intra-Camera Supervised (ICS) person re-id.
arXiv Detail & Related papers (2020-02-12T15:26:33Z)