DiffVein: A Unified Diffusion Network for Finger Vein Segmentation and Authentication
- URL: http://arxiv.org/abs/2402.02060v1
- Date: Sat, 3 Feb 2024 06:49:42 GMT
- Title: DiffVein: A Unified Diffusion Network for Finger Vein Segmentation and Authentication
- Authors: Yanjun Liu, Wenming Yang and Qingmin Liao
- Abstract summary: We introduce DiffVein, a unified diffusion model-based framework which simultaneously addresses vein segmentation and authentication tasks.
For better feature interaction between these two branches, we introduce two specialized modules.
In this way, our framework allows for a dynamic interplay between diffusion and segmentation embeddings.
- Score: 50.017055360261665
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Finger vein authentication, recognized for its high security and specificity,
has become a focal point in biometric research. Traditional methods
predominantly concentrate on vein feature extraction for discriminative
modeling, with limited exploration of generative approaches. Existing methods
also suffer from verification failures and often fail to obtain authentic vein
patterns through segmentation. To fill this gap, we introduce DiffVein, a unified
diffusion model-based framework that simultaneously addresses vein
segmentation and authentication. DiffVein is composed of two dedicated
branches: one for segmentation and the other for denoising. To improve feature
interaction between the two branches, and thereby their collective performance,
we introduce two specialized modules. The first, a mask condition module,
incorporates semantic information about vein patterns from the segmentation
branch into the denoising process. The second, a Semantic Difference
Transformer (SD-Former), employs Fourier-space self-attention and
cross-attention modules to extract category embeddings before feeding them to
the segmentation task. In this way, our framework allows for a dynamic
interplay between diffusion and segmentation embeddings, so that the vein
segmentation and authentication tasks can inform and enhance each other during
joint training. To further optimize our model, we introduce a Fourier-space
Structural Similarity (FourierSIM) loss function, tailored to improve
the denoising network's learning efficacy. Extensive experiments on the USM and
THU-MVFV3V datasets substantiate DiffVein's superior performance, setting new
benchmarks in both vein segmentation and authentication.
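The abstract names a Fourier-space Structural Similarity (FourierSIM) loss but gives no formula. The sketch below is not the paper's definition. It rests on these assumptions:

- it computes one global (non-windowed) SSIM between the log-amplitude spectra of the predicted and target images;
- it returns 1 - SSIM as the loss;
- the function names, the `log1p` range compression, and the constants `c1 = 1e-4` and `c2 = 9e-4` (the usual SSIM choices for intensities in [0, 1]) are all hypothetical.

A naive DFT keeps the example dependency-free. A practical version would use an FFT library.

```python
import cmath
import math


def dft2(img):
    """Naive O(N^4) 2D discrete Fourier transform of a real image (list of rows)."""
    h, w = len(img), len(img[0])
    spec = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2.0 * math.pi * (u * y / h + v * x / w)
                    acc += img[y][x] * cmath.exp(1j * angle)
            spec[u][v] = acc
    return spec


def log_amplitude(spec):
    """Flattened log(1 + |F|) amplitude spectrum (hypothetical range compression)."""
    return [math.log1p(abs(z)) for row in spec for z in row]


def global_ssim(a, b, c1=1e-4, c2=9e-4):
    """SSIM computed once over whole vectors (no sliding window)."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) * (x - mu_a) for x in a) / n
    var_b = sum((y - mu_b) * (y - mu_b) for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    num = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    den = (mu_a * mu_a + mu_b * mu_b + c1) * (var_a + var_b + c2)
    return num / den


def fourier_ssim_loss(pred, target):
    """Hypothetical FourierSIM-style loss: 1 - SSIM between log-amplitude spectra."""
    return 1.0 - global_ssim(log_amplitude(dft2(pred)), log_amplitude(dft2(target)))


# Toy 8x8 "vein": a vertical line at column 2.
target = [[1.0 if x == 2 else 0.0 for x in range(8)] for _ in range(8)]
shifted = [row[-1:] + row[:-1] for row in target]  # circular shift right by one pixel
blank = [[0.0] * 8 for _ in range(8)]

loss_same = fourier_ssim_loss(target, target)      # identical inputs: close to 0
loss_shifted = fourier_ssim_loss(shifted, target)  # amplitude ignores translation
loss_blank = fourier_ssim_loss(blank, target)      # missing structure: large loss
print(loss_same, loss_shifted, loss_blank)
```

Because this sketch compares amplitudes only, it is blind to circular shifts (the `shifted` case). The abstract does not say whether FourierSIM also uses phase information or a spatial-domain SSIM term.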
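The abstract says a mask condition module injects semantic information from the segmentation branch into the denoising process, but it does not say how. A common way to condition a diffusion denoiser is to concatenate the conditioning map with the noisy input as an extra channel. The sketch below assumes that approach, together with:

- a standard DDPM forward process, `x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps`;
- a linear beta schedule from 1e-4 to 0.02 over 1000 steps.

What DiffVein actually diffuses, and how its module fuses the mask, may differ. All function names here are hypothetical.

```python
import math
import random


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t of a linear DDPM beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Forward noising on a flat pixel list: x_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps."""
    ab = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, eps)]
    return xt, eps


def mask_conditioned_input(xt, mask):
    """Hypothetical mask condition: stack the segmentation mask as an extra channel."""
    if len(xt) != len(mask):
        raise ValueError("noisy image and mask must have the same number of pixels")
    return [xt, mask]  # layout: (channels=2, pixels)


rng = random.Random(0)
alpha_bars = linear_alpha_bars()
x0 = [0.0, 0.5, 1.0, 0.5]    # toy 2x2 vein image, flattened
mask = [0.0, 1.0, 1.0, 0.0]  # toy mask from the segmentation branch
t = 500
xt, eps = q_sample(x0, t, alpha_bars, rng)
net_input = mask_conditioned_input(xt, mask)
# A denoiser would be trained to predict eps from (net_input, t).
```

Channel concatenation is the simplest conditioning choice. Cross-attention or feature-wise modulation are common alternatives, and the abstract's description fits any of them.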
Related papers
- Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization [60.899082019130766]
We introduce a frame-level detection network (FDN) and a proposal refinement network (PRN) for audio temporal forgery detection and localization.
FDN aims to mine informative inconsistency cues between real and fake frames to obtain discriminative features that help roughly indicate forgery regions.
PRN is responsible for predicting confidence scores and regression offsets to refine the coarse-grained proposals derived from the FDN.
(2024-07-23T15:07:52Z)
- HiDiff: Hybrid Diffusion Framework for Medical Image Segmentation [16.906987804797975]
HiDiff is a hybrid diffusion framework for medical image segmentation.
It can synergize the strengths of existing discriminative segmentation models and new generative diffusion models.
It excels at segmenting small objects and generalizing to new datasets.
(2024-07-03T23:59:09Z)
- Denoising Diffusion Semantic Segmentation with Mask Prior Modeling [61.73352242029671]
We propose to improve the semantic segmentation quality of existing discriminative approaches with a mask prior modeled by a denoising diffusion generative model.
We evaluate the proposed prior modeling with several off-the-shelf segmentors, and our experimental results on ADE20K and Cityscapes demonstrate that our approach can achieve competitive quantitative performance.
(2023-06-02T17:47:01Z)
- Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach that mines cross-modal semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) a coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
(2023-05-17T14:30:11Z)
- DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging step toward high-level video understanding.
Existing methods are mostly two-stage, with one stage for person bounding box generation and the other for action recognition.
We present a decoupled one-stage network, dubbed DOAD, to improve the efficiency of spatio-temporal action detection.
(2023-04-01T08:06:43Z)
- Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework that learns discriminative part features and explores their correlations with a feature transformation module.
Our approach does not rely on additional part branches and reaches state-of-the-art performance on three object recognition benchmarks.
(2022-12-28T03:45:56Z)
- GaitStrip: Gait Recognition via Effective Strip-based Feature Representations and Multi-Level Framework [34.397404430838286]
We present a strip-based multi-level gait recognition network, named GaitStrip, to extract comprehensive gait information at different levels.
Specifically, our high-level branch explores the context of gait sequences, and our low-level branch focuses on detailed posture changes.
GaitStrip achieves state-of-the-art performance under both normal walking and complex conditions.
(2022-03-08T09:49:48Z)
- Label-Efficient Semantic Segmentation with Diffusion Models [27.01899943738203]
We demonstrate that diffusion models can also serve as an instrument for semantic segmentation.
In particular, for several pretrained diffusion models, we investigate the intermediate activations of the networks that perform the Markov step of the reverse diffusion process.
We show that these activations effectively capture the semantic information of an input image and appear to be excellent pixel-level representations for the segmentation problem.
(2021-12-06T15:55:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.