Towards an Efficient Semantic Segmentation Method of ID Cards for
Verification Systems
- URL: http://arxiv.org/abs/2111.12764v1
- Date: Wed, 24 Nov 2021 19:54:17 GMT
- Title: Towards an Efficient Semantic Segmentation Method of ID Cards for
Verification Systems
- Authors: Rodrigo Lara, Andres Valenzuela, Daniel Schulz, Juan Tapia, and
Christoph Busch
- Abstract summary: This work proposes a method for removing the background using semantic segmentation of ID Cards.
Two Deep Learning approaches were explored, based on MobileUNet and DenseNet10.
The proposed methods are lightweight enough to be used in real-time operation on mobile devices.
- Score: 8.820032281861227
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Removing the background in ID Card images is a real challenge for remote
verification systems because many of the re-digitalised images present
cluttered backgrounds, poor illumination conditions, distortion and occlusions.
The background in ID Card images confuses the classifiers and the text
extraction. Due to the lack of available images for research, this field
represents an open problem in computer vision today. This work proposes a
method for removing the background using semantic segmentation of ID Cards. The
method was evaluated on images captured in the wild from real operation: a
manually labelled dataset of 45,007 images covering five types of ID Cards from
three countries (Chile, Argentina and Mexico), including typical presentation
attack scenarios. This method can help to improve the following
stages in a regular identity verification or document tampering detection
system. Two Deep Learning approaches were explored, based on MobileUNet and
DenseNet10. The best results were obtained using MobileUNet, with 6.5 million
parameters. A Chilean ID Card's mean Intersection Over Union (IoU) was 0.9926
on a private test dataset of 4,988 images. The best results for the fused
multi-country dataset of ID Card images from Chile, Argentina and Mexico
reached an IoU of 0.9911. The proposed methods are lightweight enough to be
used in real-time operation on mobile devices.
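The headline results above are mean Intersection Over Union (IoU) scores. As a reminder of what is being measured, here is a minimal sketch of per-image IoU for a binary card/background mask; the paper averages such scores over a test set, and the masks and function name here are illustrative, not the authors' code:

```python
import numpy as np

def iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over Union between two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # both masks empty: perfect agreement
    return float(np.logical_and(pred, target).sum() / union)

# Toy 4x4 example: predicted card region vs. ground truth.
pred = np.zeros((4, 4), dtype=np.uint8)
target = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1:4] = 1    # 6 predicted foreground pixels
target[1:3, 1:3] = 1  # 4 ground-truth foreground pixels
print(round(iou(pred, target), 4))  # → 0.6667 (4 shared pixels / 6 in union)
```

An IoU of 0.9926, as reported for the Chilean ID Card test set, means predicted and ground-truth card regions are nearly pixel-identical.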
Related papers
- FantasyID: A dataset for detecting digital manipulations of ID-documents [23.7548607375651]
We propose a novel dataset, FantasyID, which mimics real-world IDs but without tampering with legal documents.
FantasyID contains ID cards with diverse design styles, languages, and faces of real people.
We have emulated digital forgery/injection attacks that could be performed by a malicious actor to tamper with the IDs using existing generative tools.
arXiv Detail & Related papers (2025-07-28T13:20:18Z)
- Image Demoiréing Using Dual Camera Fusion on Mobile Phones [58.39212652291496]
We propose to utilize Dual Camera fusion for Image Demoiréing (DCID), i.e., using the ultra-wide-angle (UW) image to assist the moiré removal of the wide-angle (W) image.
In particular, we propose an efficient DCID method, where a lightweight UW image encoder is integrated into an existing demoiréing network.
arXiv Detail & Related papers (2025-06-10T02:20:37Z)
- AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization [57.34659640776723]
We propose an end-to-end framework named AddressCLIP to solve the problem with more semantics.
We have built three datasets from Pittsburgh and San Francisco on different scales specifically for the IAL problem.
arXiv Detail & Related papers (2024-07-11T03:18:53Z)
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
- CLIPC8: Face liveness detection algorithm based on image-text pairs and contrastive learning [3.90443799528247]
We propose a face liveness detection method based on image-text pairs and contrastive learning.
The proposed method is capable of effectively detecting specific liveness attack behaviors in certain scenarios.
It is also effective in detecting traditional liveness attack methods, such as printing photo attacks and screen remake attacks.
arXiv Detail & Related papers (2023-11-29T12:21:42Z)
- T-MARS: Improving Visual Representations by Circumventing Text Feature Learning [99.3682210827572]
We propose a new data filtering approach motivated by our observation that nearly 40% of LAION's images contain text that overlaps significantly with the caption.
Our simple and scalable approach, T-MARS, filters out only those pairs where the text dominates the remaining visual features.
arXiv Detail & Related papers (2023-07-06T16:59:52Z)
- Zero-Shot In-Distribution Detection in Multi-Object Settings Using Vision-Language Foundation Models [37.36999826208225]
In this paper, we propose a novel problem setting called zero-shot in-distribution (ID) detection.
We identify images containing ID objects as ID images (even if they contain OOD objects) and images lacking ID objects as OOD images without any training.
We present a simple and effective approach, Global-Local Concept Matching, based on both global and local visual-text alignments of CLIP features.
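The Global-Local Concept Matching idea can be sketched generically: an image counts as in-distribution if either its global embedding or its best-matching local patch embedding aligns with an ID-class text embedding. The toy 2-D vectors and the averaging fusion below are illustrative stand-ins, not the paper's actual CLIP features or decision rule:

```python
import numpy as np

def cosine(a, b) -> float:
    """Cosine similarity between two feature vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def global_local_score(global_feat, patch_feats, text_feat) -> float:
    g = cosine(global_feat, text_feat)                   # global image-text alignment
    l = max(cosine(p, text_feat) for p in patch_feats)   # best local patch alignment
    return 0.5 * (g + l)                                 # hypothetical fusion rule

text = [1.0, 0.0]                 # stand-in for an ID-class text embedding
img_global = [1.0, 0.0]           # stand-in for the global image embedding
patches = [[0.0, 1.0], [1.0, 1.0]]  # stand-ins for local patch embeddings
print(round(global_local_score(img_global, patches, text), 4))  # → 0.8536
```

Thresholding such a score separates images containing ID objects from OOD images without any training, which is the zero-shot setting the paper targets.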
arXiv Detail & Related papers (2023-04-10T11:35:42Z)
- Improving Presentation Attack Detection for ID Cards on Remote Verification Systems [2.0305676256390934]
This paper presents an updated two-stage, end-to-end Presentation Attack Detection method for remote biometric verification systems of ID cards.
The proposal was developed using a database of 190,000 real-case Chilean ID card images with the support of a third-party company.
Our method trains two convolutional neural networks separately, reaching BPCER100 scores on ID card attacks of 1.69% and 2.36%, respectively.
arXiv Detail & Related papers (2023-01-23T16:59:26Z)
- Synthetic ID Card Image Generation for Improving Presentation Attack Detection [12.232059909207578]
This work explores three methods for synthetically generating ID card images to increase the amount of data while training fraud-detection networks.
Our results indicate that databases can be supplemented with synthetic images without any loss in performance for the print/scan Presentation Attack Instrument Species (PAIS) and a loss in performance of 1% for the screen capture PAIS.
arXiv Detail & Related papers (2022-10-31T19:07:30Z)
- ObjectFormer for Image Manipulation Detection and Localization [118.89882740099137]
We propose ObjectFormer to detect and localize image manipulations.
We extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings.
We conduct extensive experiments on various datasets and the results verify the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-03-28T12:27:34Z)
- SISL: Self-Supervised Image Signature Learning for Splicing Detection and Localization [11.437760125881049]
We propose a self-supervised approach for training splicing detection/localization models from frequency transforms of images.
Our proposed model can yield similar or better performances on standard datasets without relying on labels or metadata.
arXiv Detail & Related papers (2022-03-15T12:26:29Z)
- Camera-aware Proxies for Unsupervised Person Re-Identification [60.26031011794513]
This paper tackles the purely unsupervised person re-identification (Re-ID) problem that requires no annotations.
We propose to split each single cluster into multiple proxies and each proxy represents the instances coming from the same camera.
Based on the camera-aware proxies, we design both intra- and inter-camera contrastive learning components for our Re-ID model.
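The proxy-splitting step can be sketched as a simple grouping: each pseudo-label cluster is partitioned by camera so that every (cluster, camera) pair becomes one proxy. The data layout and function name below are illustrative, not the authors' pipeline:

```python
from collections import defaultdict

def camera_aware_proxies(instances):
    """Split each cluster into per-camera proxies.

    instances: iterable of (instance_id, cluster_id, camera_id) triples,
    e.g. produced by an earlier unsupervised clustering step.
    """
    proxies = defaultdict(list)
    for inst_id, cluster_id, camera_id in instances:
        proxies[(cluster_id, camera_id)].append(inst_id)
    return dict(proxies)

# Cluster 0 spans two cameras, so it yields two proxies; cluster 1 yields one.
data = [(0, 0, "camA"), (1, 0, "camA"), (2, 0, "camB"), (3, 1, "camB")]
print(camera_aware_proxies(data))
```

Intra-camera contrast would then pull an instance toward its own proxy, while inter-camera contrast links proxies of the same cluster across cameras.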
arXiv Detail & Related papers (2020-12-19T12:37:04Z)
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z)
- A Fast Fully Octave Convolutional Neural Network for Document Image Segmentation [1.8426817621478804]
We investigate a method based on U-Net to detect the document edges and text regions in ID images.
We propose a model optimization based on Octave Convolutions to qualify the method to situations where storage, processing, and time resources are limited.
Our results showed that the proposed models are efficient for document segmentation tasks and portable.
arXiv Detail & Related papers (2020-04-03T00:57:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.