Training Frozen Feature Pyramid DINOv2 for Eyelid Measurements with Infinite Encoding and Orthogonal Regularization
- URL: http://arxiv.org/abs/2504.00515v1
- Date: Tue, 01 Apr 2025 08:06:08 GMT
- Title: Training Frozen Feature Pyramid DINOv2 for Eyelid Measurements with Infinite Encoding and Orthogonal Regularization
- Authors: Chun-Hung Chen
- Abstract summary: Accurate measurement of eyelid parameters is critical in oculoplastic diagnostics but remains limited by manual, inconsistent methods. This study evaluates deep learning models: SE-ResNet, EfficientNet, and the vision transformer-based DINOv2 for automating these measurements using smartphone-acquired images. DINOv2 demonstrates superior scalability and robustness, especially under frozen conditions ideal for mobile deployment.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate measurement of eyelid parameters such as Margin Reflex Distances (MRD1, MRD2) and Levator Function (LF) is critical in oculoplastic diagnostics but remains limited by manual, inconsistent methods. This study evaluates deep learning models (SE-ResNet, EfficientNet, and the vision-transformer-based DINOv2) for automating these measurements using smartphone-acquired images. We assess performance across frozen and fine-tuned settings, using MSE, MAE, and R² metrics. DINOv2, pretrained through self-supervised learning, demonstrates superior scalability and robustness, especially under frozen conditions ideal for mobile deployment. Lightweight regressors such as MLP and Deep Ensemble offer high precision with minimal computational overhead. To address class imbalance and improve generalization, we integrate focal loss, orthogonal regularization, and binary encoding strategies. Our results show that DINOv2 combined with these enhancements delivers consistent, accurate predictions across all tasks, making it a strong candidate for real-world, mobile-friendly clinical applications. This work highlights the potential of foundation models in advancing AI-powered ophthalmic care.
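The frozen-backbone recipe described in the abstract (fixed self-supervised features, a lightweight regressor trained on top, and an orthogonal-regularization penalty) can be sketched from scratch in NumPy. This is a minimal illustration, not the paper's implementation: the random stand-in features, 384-dim embedding size, penalty weight, learning rate, and dummy MRD1 targets are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen DINOv2 features: with a frozen backbone, each image is
# embedded once into a fixed vector (384-dim here, an assumption) and only a
# lightweight regressor is trained on top of those features.
feats = rng.standard_normal((64, 384))     # precomputed, frozen features
y = rng.uniform(0.0, 5.0, size=64)         # dummy MRD1 targets in mm

# Linear regressor trained by gradient descent on MSE plus an orthogonal
# regularization penalty ||W W^T - I||_F^2 (the weight lam is illustrative).
W = rng.standard_normal((1, 384)) * 0.01
lam, lr = 1e-4, 0.01
for _ in range(200):
    residual = feats @ W[0] - y                    # (64,) prediction errors
    grad_mse = (2.0 / len(y)) * residual @ feats   # gradient of the MSE term
    grad_orth = 4.0 * (W @ W.T - np.eye(1)) @ W    # gradient of the penalty
    W -= lr * (grad_mse[None, :] + lam * grad_orth)

mse = np.mean((feats @ W[0] - y) ** 2)
```

Because the backbone never updates, features can be computed once and cached, which is what makes the frozen setting attractive for mobile deployment.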
Related papers
- Nemotron-Flash: Towards Latency-Optimal Hybrid Small Language Models [97.55009021098554]
This work aims to identify the key determinants of SLMs' real-device latency and offer generalizable principles and methodologies for SLM design and training. We introduce a new family of hybrid SLMs, called Nemotron-Flash, which significantly advances the accuracy-efficiency frontier of state-of-the-art SLMs.
arXiv Detail & Related papers (2025-11-24T08:46:36Z) - Improving Predictive Confidence in Medical Imaging via Online Label Smoothing [0.18472148461613158]
Online Label Smoothing (OLS) is a dynamic approach that adjusts soft labels throughout training based on the model's own prediction patterns. OLS consistently improves both Top-1 and Top-5 classification accuracy compared to standard training methods.
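A minimal NumPy sketch of how such online label smoothing might work, under the common formulation where per-class soft labels are accumulated from the model's own correct predictions and blended with the hard label. The momentum update, blend weight `alpha`, and three-class setup are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

num_classes = 3
# Per-class soft-label table, initialized to the uniform distribution.
soft_labels = np.full((num_classes, num_classes), 1.0 / num_classes)

def update_soft_labels(probs, targets, momentum=0.9):
    """Accumulate predicted distributions from correctly classified samples."""
    for p, t in zip(probs, targets):
        if p.argmax() == t:  # only trust predictions that were correct
            soft_labels[t] = momentum * soft_labels[t] + (1 - momentum) * p

def ols_target(t, alpha=0.5):
    """Blend the hard one-hot label with the class's accumulated soft label."""
    hard = np.eye(num_classes)[t]
    return (1 - alpha) * hard + alpha * soft_labels[t]

probs = np.array([[0.7, 0.2, 0.1],   # correct prediction for class 0
                  [0.1, 0.8, 0.1]])  # correct prediction for class 1
update_soft_labels(probs, targets=[0, 1])
target = ols_target(0)  # training target for a class-0 sample
```

The blended target stays a valid probability distribution, and the true class keeps the largest weight, so the usual cross-entropy training loop is unchanged.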
arXiv Detail & Related papers (2025-10-22T20:25:14Z) - PointMAC: Meta-Learned Adaptation for Robust Test-Time Point Cloud Completion [16.852116353523257]
Point cloud completion is essential for robust 3D perception in safety-critical applications such as robotics and augmented reality. Existing models perform static inference and rely heavily on inductive biases learned during training. We propose PointMAC, a meta-learned framework for robust test-time adaptation in point cloud completion.
arXiv Detail & Related papers (2025-10-11T23:13:17Z) - Estimating 2D Keypoints of Surgical Tools Using Vision-Language Models with Low-Rank Adaptation [7.606609210844433]
This paper presents a novel pipeline for 2D keypoint estimation of surgical tools by leveraging Vision Language Models (VLMs) fine-tuned with low-rank adaptation (LoRA). We carefully design prompts to create an instruction-tuning dataset and use them to align visual features with semantic keypoint descriptions. Experimental results show that with only two epochs of fine-tuning, the adapted VLM outperforms the baseline models.
arXiv Detail & Related papers (2025-08-28T14:25:32Z) - NM-Hebb: Coupling Local Hebbian Plasticity with Metric Learning for More Accurate and Interpretable CNNs [0.0]
NM-Hebb integrates neuro-inspired local plasticity with distance-aware supervision. Phase 1 extends standard supervised training by jointly optimising a cross-entropy objective. Phase 2 fine-tunes the backbone with a pairwise metric-learning loss.
arXiv Detail & Related papers (2025-08-27T13:53:04Z) - LSM-2: Learning from Incomplete Wearable Sensor Data [65.58595667477505]
This paper introduces the second generation of Large Sensor Model (LSM-2) with Adaptive and Inherited Masking (AIM). AIM learns robust representations directly from incomplete data without requiring explicit imputation. Our LSM-2 with AIM achieves the best performance across a diverse range of tasks, including classification, regression, and generative modeling.
arXiv Detail & Related papers (2025-06-05T17:57:11Z) - BRIGHT-VO: Brightness-Guided Hybrid Transformer for Visual Odometry with Multi-modality Refinement Module [11.898515581215708]
Visual odometry (VO) plays a crucial role in autonomous driving, robotic navigation, and other related tasks. We introduce BrightVO, a novel VO model based on a Transformer architecture, which performs front-end visual feature extraction. Using pose graph optimization, this module iteratively refines pose estimates to reduce errors and improve both accuracy and robustness.
arXiv Detail & Related papers (2025-01-15T08:50:52Z) - Intent Detection in the Age of LLMs [3.755082744150185]
Intent detection is a critical component of task-oriented dialogue systems (TODS).
Traditional approaches relied on computationally efficient supervised sentence transformer encoder models.
The emergence of generative large language models (LLMs) with intrinsic world knowledge presents new opportunities to address these challenges.
arXiv Detail & Related papers (2024-10-02T15:01:55Z) - Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z) - RLEEGNet: Integrating Brain-Computer Interfaces with Adaptive AI for Intuitive Responsiveness and High-Accuracy Motor Imagery Classification [0.0]
We introduce a framework that leverages Reinforcement Learning with Deep Q-Networks (DQN) for classification tasks.
We present a preprocessing technique for multiclass motor imagery (MI) classification in a One-Versus-The-Rest (OVR) manner.
The integration of DQN with a 1D-CNN-LSTM architecture optimizes the decision-making process in real time.
arXiv Detail & Related papers (2024-02-09T02:03:13Z) - Meta-Learning Adversarial Bandit Algorithms [55.72892209124227]
We study online meta-learning with bandit feedback.
We learn to tune online mirror descent (OMD) with self-concordant barrier regularizers.
arXiv Detail & Related papers (2023-07-05T13:52:10Z) - Class Anchor Margin Loss for Content-Based Image Retrieval [97.81742911657497]
We propose a novel repeller-attractor loss that falls within the metric learning paradigm, yet directly optimizes the L2 metric without the need to generate pairs.
We evaluate the proposed objective in the context of few-shot and full-set training on the CBIR task, by using both convolutional and transformer architectures.
arXiv Detail & Related papers (2023-06-01T12:53:10Z) - Self-learning locally-optimal hypertuning using maximum entropy, and comparison of machine learning approaches for estimating fatigue life in composite materials [0.0]
We develop an ML nearest-neighbors-alike algorithm based on the principle of maximum entropy to predict fatigue damage.
The predictions achieve a good level of accuracy, similar to other ML algorithms.
arXiv Detail & Related papers (2022-10-19T12:20:07Z) - Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics [74.1720528573331]
Unsupervised monocular depth and ego-motion estimation has drawn extensive research attention in recent years.
We propose DynaDepth, a novel scale-aware framework that integrates information from vision and IMU motion dynamics.
We validate the effectiveness of DynaDepth by conducting extensive experiments and simulations on the KITTI and Make3D datasets.
arXiv Detail & Related papers (2022-07-11T07:50:22Z) - Model of the Weak Reset Process in HfOx Resistive Memory for Deep Learning Frameworks [0.6745502291821955]
We present a model of the weak RESET process in hafnium oxide RRAM.
We integrate this model within the PyTorch deep learning framework.
We use this tool to train Binarized Neural Networks for the MNIST handwritten digit recognition task.
arXiv Detail & Related papers (2021-07-02T08:50:35Z) - Meta-Learning with Neural Tangent Kernels [58.06951624702086]
We propose the first meta-learning paradigm in the Reproducing Kernel Hilbert Space (RKHS) induced by the meta-model's Neural Tangent Kernel (NTK).
Within this paradigm, we introduce two meta-learning algorithms, which no longer need a sub-optimal iterative inner-loop adaptation as in the MAML framework.
We achieve this goal by 1) replacing the adaptation with a fast-adaptive regularizer in the RKHS; and 2) solving the adaptation analytically based on the NTK theory.
arXiv Detail & Related papers (2021-02-07T20:53:23Z) - Calibrating Deep Neural Networks using Focal Loss [77.92765139898906]
Miscalibration is a mismatch between a model's confidence and its correctness.
We show that focal loss allows us to learn models that are already very well calibrated.
We show that our approach achieves state-of-the-art calibration without compromising on accuracy in almost all cases.
arXiv Detail & Related papers (2020-02-21T17:35:50Z)
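The focal loss that appears both in the main paper and in the calibration work above has the standard form FL(p_t) = -(1 - p_t)^γ log(p_t), which down-weights easy, confident examples. A minimal NumPy sketch for the binary case (γ = 2 is the commonly used default, assumed here; this is an illustration of the standard formulation, not either paper's code):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, eps=1e-12):
    """FL(p_t) = -(1 - p_t)^gamma * log(p_t) for binary labels y in {0, 1};
    p is the predicted probability of class 1."""
    p_t = np.where(y == 1, p, 1.0 - p)   # probability assigned to the true class
    return -((1.0 - p_t) ** gamma) * np.log(p_t + eps)

p = np.array([0.9, 0.6, 0.9])
y = np.array([1, 1, 0])
losses = focal_loss(p, y)
# The confident, correct example (p_t = 0.9) contributes far less than the
# uncertain one (p_t = 0.6), and far less again than the confident, wrong
# one (p_t = 0.1): the (1 - p_t)^gamma factor focuses training on hard cases.
```

Setting gamma = 0 recovers ordinary cross-entropy, which makes the down-weighting effect easy to ablate.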
This list is automatically generated from the titles and abstracts of the papers in this site.