Related papers: PAL-Net: A Point-Wise CNN with Patch-Attention for 3D Facial Landmark Localization

URL: http://arxiv.org/abs/2510.00910v1
Date: Wed, 01 Oct 2025 13:52:35 GMT
Title: PAL-Net: A Point-Wise CNN with Patch-Attention for 3D Facial Landmark Localization
Authors: Ali Shadman Yazdi, Annalisa Cappella, Benedetta Baldini, Riccardo Solazzo, Gianluca Tartaglia, Chiarella Sforza, Giuseppe Baselli
Abstract summary: Manual annotation of anatomical landmarks on 3D facial scans is a time-consuming and expertise-dependent task. This study presents a fully automated deep learning pipeline (PAL-Net) for localizing 50 anatomical landmarks on stereophotogrammetry facial models.
Score: 0.4637385034504733
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Manual annotation of anatomical landmarks on 3D facial scans is a time-consuming and expertise-dependent task, yet it remains critical for clinical assessments, morphometric analysis, and craniofacial research. While several deep learning methods have been proposed for facial landmark localization, most focus on pseudo-landmarks or require complex input representations, limiting their clinical applicability. This study presents a fully automated deep learning pipeline (PAL-Net) for localizing 50 anatomical landmarks on stereo-photogrammetry facial models. The method combines coarse alignment, region-of-interest filtering, and an initial approximation of landmarks with a patch-based pointwise CNN enhanced by attention mechanisms. Trained and evaluated on 214 annotated scans from healthy adults, PAL-Net achieved a mean localization error of 3.686 mm and preserved relevant anatomical distances with a 2.822 mm average error, comparable to intra-observer variability. To assess generalization, the model was further evaluated on 700 subjects from the FaceScape dataset, achieving a point-wise error of 0.41 mm and a distance-wise error of 0.38 mm. Compared to existing methods, PAL-Net offers a favorable trade-off between accuracy and computational cost. While performance degrades in regions with poor mesh quality (e.g., ears, hairline), the method demonstrates consistent accuracy across most anatomical regions. PAL-Net generalizes effectively across datasets and facial regions, outperforming existing methods in both point-wise and structural evaluations. It provides a lightweight, scalable solution for high-throughput 3D anthropometric analysis, with potential to support clinical workflows and reduce reliance on manual annotation. Source code can be found at https://github.com/Ali5hadman/PAL-Net-A-Point-Wise-CNN-with-Patch-Attention
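
Illustrative note: the abstract reports two kinds of error, a point-wise localization error and a distance-wise error on anatomical distances. The sketch below shows one plausible way such metrics could be computed. It is not taken from the PAL-Net code. The function names, the toy coordinates, and the choice to compare all landmark pairs are assumptions made for illustration, since the abstract does not say which anatomical distances are evaluated.

```python
import math
from itertools import combinations


def pointwise_error(pred, truth):
    """Mean Euclidean distance between matched predicted and reference landmarks."""
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(truth)


def distancewise_error(pred, truth, pairs=None):
    """Mean absolute difference between inter-landmark distances.

    Assumption: `pairs` holds the landmark index pairs of interest. The set of
    anatomical distances used in the paper is not given in the abstract, so
    every pair is compared by default.
    """
    if pairs is None:
        pairs = list(combinations(range(len(truth)), 2))
    diffs = [
        abs(math.dist(pred[i], pred[j]) - math.dist(truth[i], truth[j]))
        for i, j in pairs
    ]
    return sum(diffs) / len(diffs)


# Hypothetical toy landmarks (coordinates in mm), not data from the paper.
truth = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
pred = [(0.0, 0.0, 1.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]

# A rigidly shifted copy of the reference: every point is off, no distance changes.
shifted = [(x + 1.0, y + 1.0, z + 1.0) for x, y, z in truth]

print(pointwise_error(pred, truth))
print(distancewise_error(pred, truth))
print(pointwise_error(shifted, truth), distancewise_error(shifted, truth))
```

Under these assumptions the two metrics capture different things: a uniform shift of all landmarks raises the point-wise error but leaves the distance-wise error near zero, which is one reason a paper might report both.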

Related papers

3DViT-GAT: A Unified Atlas-Based 3D Vision Transformer and Graph Learning Framework for Major Depressive Disorder Detection Using Structural MRI Data [0.0]
Major depressive disorder (MDD) is a prevalent mental health condition that negatively impacts both individual well-being and global public health. This paper develops a unified pipeline that utilizes Vision Transformers (ViTs) for extracting 3D region embeddings from sMRI data and a Graph Neural Network (GNN) for classification.
arXiv Detail & Related papers (2025-09-15T17:10:39Z)
Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [68.42215385041114]
This paper introduces a novel lightweight multi-level adaptation and comparison framework to repurpose the CLIP model for medical anomaly detection. Our approach integrates multiple residual adapters into the pre-trained visual encoder, enabling a stepwise enhancement of visual features across different levels. Our experiments on medical anomaly detection benchmarks demonstrate that our method significantly surpasses current state-of-the-art models.
arXiv Detail & Related papers (2024-03-19T09:28:19Z)
White Matter Tracts are Point Clouds: Neuropsychological Score Prediction and Critical Region Localization via Geometric Deep Learning [68.5548609642999]
We propose a deep-learning-based framework for neuropsychological score prediction using white matter tract data. We represent the arcuate fasciculus (AF) as a point cloud with microstructure measurements at each point. We improve prediction performance with the proposed Paired-Siamese Loss that utilizes information about differences between continuous neuropsychological scores.
arXiv Detail & Related papers (2022-07-06T02:03:28Z)
KTN: Knowledge Transfer Network for Learning Multi-person 2D-3D Correspondences [77.56222946832237]
We present a novel framework to detect the densepose of multiple people in an image. The proposed method, which we refer to as Knowledge Transfer Network (KTN), tackles two main problems. It simultaneously maintains feature resolution and suppresses background pixels, and this strategy results in a substantial increase in accuracy.
arXiv Detail & Related papers (2022-06-21T03:11:37Z)
3D unsupervised anomaly detection and localization through virtual multi-view projection and reconstruction: Clinical validation on low-dose chest computed tomography [2.2302915692528367]
We propose a method based on a deep neural network for computer-aided diagnosis called virtual multi-view projection and reconstruction. The proposed method improves the patient-level anomaly detection by 10% compared with a gold standard based on supervised learning. It localizes the anomaly region with 93% accuracy, demonstrating its high performance.
arXiv Detail & Related papers (2022-06-18T13:22:00Z)
Localized Perturbations For Weakly-Supervised Segmentation of Glioma Brain Tumours [0.5801621787540266]
This work proposes the use of localized perturbations as a weakly-supervised solution to extract segmentation masks of brain tumours from a pretrained 3D classification model. We also propose a novel optimal perturbation method that exploits 3D superpixels to find the most relevant area for a given classification using a U-net architecture.
arXiv Detail & Related papers (2021-11-29T21:01:20Z)
Structure-Aware Long Short-Term Memory Network for 3D Cephalometric Landmark Detection [37.031819721889676]
We propose a novel Structure-Aware Long Short-Term Memory framework (SA-LSTM) for efficient and accurate 3D landmark detection. SA-LSTM first locates the coarse landmarks via heatmap regression on a down-sampled CBCT volume. It then progressively refines landmarks by attentive offset regression using high-resolution cropped patches. Experiments show that our method significantly outperforms state-of-the-art methods in terms of efficiency and accuracy.
arXiv Detail & Related papers (2021-07-21T06:35:52Z)
Prediction of progressive lens performance from neural network simulations [62.997667081978825]
The purpose of this study is to present a framework to predict visual acuity (VA) based on a convolutional neural network (CNN). The proposed holistic simulation tool was shown to act as an accurate model for subjective visual performance.
arXiv Detail & Related papers (2021-03-19T14:51:02Z)
Siamese Network Features for Endoscopy Image and Video Localization [0.0]
Localizing frames provide valuable information about anomaly location. In this study, we present a combination of meta-learning and deep learning for localizing both endoscopy images and video.
arXiv Detail & Related papers (2021-03-15T16:24:30Z)
Automated 3D cephalometric landmark identification using computerized tomography [1.4349468613117398]
Identification of 3D cephalometric landmarks that serve as a proxy for the shape of the human skull is the fundamental step in cephalometric analysis. Recently, automatic landmarking of 2D cephalograms using deep learning (DL) has achieved great success, but 3D landmarking for more than 80 landmarks has not yet reached a satisfactory level. This paper presents a semi-supervised DL method for 3D landmarking that takes advantage of an anonymized landmark dataset from which the paired CT data have been removed.
arXiv Detail & Related papers (2020-12-16T07:29:32Z)
Collaborative Boundary-aware Context Encoding Networks for Error Map Prediction [65.44752447868626]
We propose collaborative boundary-aware context encoding networks called AEP-Net for the error prediction task. Specifically, we propose a collaborative feature transformation branch for better feature fusion between images and masks, and precise localization of error regions. The AEP-Net achieves an average DSC of 0.8358, 0.8164 for the error prediction task, and shows a high Pearson correlation coefficient of 0.9873.
arXiv Detail & Related papers (2020-06-25T12:42:01Z)
Structured Landmark Detection via Topology-Adapting Deep Graph Learning [75.20602712947016]
We present a new topology-adapting deep graph learning approach for accurate anatomical facial and medical landmark detection. The proposed method constructs graph signals leveraging both local image features and global shape features. Experiments are conducted on three public facial image datasets (WFLW, 300W, and COFW-68) as well as three real-world X-ray medical datasets (Cephalometric (public), Hand and Pelvis).
arXiv Detail & Related papers (2020-04-17T11:55:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.