Related papers: Lightweight Facial Landmark Detection in Thermal Images via Multi-Level Cross-Modal Knowledge Transfer

URL: http://arxiv.org/abs/2510.11128v2
Date: Fri, 24 Oct 2025 17:14:46 GMT
Title: Lightweight Facial Landmark Detection in Thermal Images via Multi-Level Cross-Modal Knowledge Transfer
Authors: Qiyi Tong, Olivia Nocentini, Marta Lagomarsino, Kuanqi Cai, Marta Lorenzini, Arash Ajoudani
Abstract summary: Facial Landmark Detection in thermal imagery is critical for applications in challenging lighting conditions. We propose a novel framework that decouples high-fidelity RGB-to-thermal knowledge transfer from model compression. Experiments show that our approach sets a new state-of-the-art on public thermal FLD benchmarks, notably outperforming previous methods.
Score: 13.887803692033073
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Facial Landmark Detection (FLD) in thermal imagery is critical for applications in challenging lighting conditions, but it is hampered by the lack of rich visual cues. Conventional cross-modal solutions, like feature fusion or image translation from RGB data, are often computationally expensive or introduce structural artifacts, limiting their practical deployment. To address this, we propose Multi-Level Cross-Modal Knowledge Distillation (MLCM-KD), a novel framework that decouples high-fidelity RGB-to-thermal knowledge transfer from model compression to create both accurate and efficient thermal FLD models. A central challenge during knowledge transfer is the profound modality gap between RGB and thermal data, where traditional unidirectional distillation fails to enforce semantic consistency across disparate feature spaces. To overcome this, we introduce Dual-Injected Knowledge Distillation (DIKD), a bidirectional mechanism designed specifically for this task. DIKD establishes a connection between modalities: it not only guides the thermal student with rich RGB features but also validates the student's learned representations by feeding them back into the frozen teacher's prediction head. This closed-loop supervision forces the student to learn modality-invariant features that are semantically aligned with the teacher, ensuring a robust and profound knowledge transfer. Experiments show that our approach sets a new state-of-the-art on public thermal FLD benchmarks, notably outperforming previous methods while drastically reducing computational overhead.
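The closed-loop supervision described in the abstract can be illustrated with a minimal, forward-only sketch. It is not the authors' implementation: the linear prediction head, the use of mean squared error for both terms, the weights `alpha` and `beta`, and every function name below are assumptions made for illustration. The abstract states only that the thermal student is guided by RGB teacher features and that the student's features are fed back into the frozen teacher's prediction head.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))

def teacher_head(features, weight, bias):
    """Hypothetical frozen linear head: features (N, D) -> landmarks (N, K, 2)."""
    out = features @ weight + bias
    return out.reshape(features.shape[0], -1, 2)

def dikd_loss(student_feat, teacher_feat, landmarks, head_w, head_b,
              alpha=1.0, beta=1.0):
    """Two-term loss sketch (assumed form, not taken from the paper)."""
    # Direction 1: RGB teacher features guide the thermal student.
    feat_loss = mse(student_feat, teacher_feat)
    # Direction 2: student features are re-injected into the frozen
    # teacher head, and the resulting landmarks are compared to the target.
    reinjected = teacher_head(student_feat, head_w, head_b)
    inject_loss = mse(reinjected, landmarks)
    return alpha * feat_loss + beta * inject_loss, feat_loss, inject_loss

# Toy data: 4 samples, 16-dim features, 5 landmarks (all sizes arbitrary).
rng = np.random.default_rng(0)
N, D, K = 4, 16, 5
head_w = rng.normal(size=(D, K * 2))
head_b = np.zeros(K * 2)
teacher_feat = rng.normal(size=(N, D))
landmarks = teacher_head(teacher_feat, head_w, head_b)
student_feat = teacher_feat + 0.1 * rng.normal(size=(N, D))

total, feat_term, inject_term = dikd_loss(
    student_feat, teacher_feat, landmarks, head_w, head_b)
print(total, feat_term, inject_term)
```

In this sketch the second term penalizes student features that the teacher's head cannot decode into the correct landmarks, which is one plausible reading of "semantically aligned with the teacher". In actual training the head's parameters would be excluded from gradient updates.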

Related papers

Generalizable Knowledge Distillation from Vision Foundation Models for Semantic Segmentation [73.32435804067883]
Generalizable Knowledge Distillation (GKD) is a multi-stage framework that explicitly enhances generalization. Experiments on five domain generalization benchmarks demonstrate that GKD consistently outperforms existing KD methods.
arXiv Detail & Related papers (2026-03-03T03:18:12Z)
Contrast-Guided Cross-Modal Distillation for Thermal Object Detection [1.8477401359673709]
Low contrast and weak high-frequency cues lead to duplicate, overlapping boxes, missed small objects, and class confusion. We introduce training-only objectives that sharpen instance-level decision boundaries by pulling together features of the same class. In experiments, our method outperformed prior approaches and achieved state-of-the-art performance.
arXiv Detail & Related papers (2025-11-03T10:38:01Z)
Boosting Cross-spectral Unsupervised Domain Adaptation for Thermal Semantic Segmentation [2.034732821736745]
In autonomous driving, thermal image semantic segmentation has emerged as a critical research area. In this paper, we present a comprehensive study on cross-spectral UDA for thermal image semantic segmentation. We introduce a novel self-supervised loss designed to enhance the performance of the thermal segmentation model in nighttime scenarios.
arXiv Detail & Related papers (2025-05-11T11:45:44Z)
Human Activity Recognition using RGB-Event based Sensors: A Multi-modal Heat Conduction Model and A Benchmark Dataset [65.76480665062363]
Human Activity Recognition has primarily relied on traditional RGB cameras to achieve high performance. Challenges in real-world scenarios, such as insufficient lighting and rapid movements, inevitably degrade the performance of RGB cameras. In this work, we rethink human activity recognition by combining RGB and event cameras.
arXiv Detail & Related papers (2025-04-08T09:14:24Z)
Breaking Modality Gap in RGBT Tracking: Coupled Knowledge Distillation [21.161244379091833]
Modality gap between RGB and thermal infrared (TIR) images is a crucial issue but often overlooked in existing RGBT tracking methods. We propose a novel Coupled Knowledge Distillation framework called CKD, which pursues common styles of different modalities to break modality gap. In particular, we introduce two student networks and employ the style distillation loss to make their style features consistent.
arXiv Detail & Related papers (2024-10-15T13:22:58Z)
From Two-Stream to One-Stream: Efficient RGB-T Tracking via Mutual Prompt Learning and Knowledge Distillation [9.423279246172923]
Inspired by visual prompt learning, we designed a novel two-stream RGB-T tracking architecture based on cross-modal mutual prompt learning. Our designed teacher model achieved the highest precision rate, while the student model, with comparable precision rate to the teacher model, realized an inference speed more than three times faster than the teacher model.
arXiv Detail & Related papers (2024-03-25T14:57:29Z)
Residual Spatial Fusion Network for RGB-Thermal Semantic Segmentation [19.41334573257174]
Traditional methods mostly use RGB images, which are heavily affected by lighting conditions, e.g., darkness. Recent studies show thermal images are robust to the night scenario as a compensating modality for segmentation. This work proposes a Residual Spatial Fusion Network (RSFNet) for RGB-T semantic segmentation.
arXiv Detail & Related papers (2023-06-17T14:28:08Z)
EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR). We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model. We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
Does Thermal Really Always Matter for RGB-T Salient Object Detection? [153.17156598262656]
This paper proposes a network named TNet to solve the RGB-T salient object detection (SOD) task. In this paper, we introduce a global illumination estimation module to predict the global illuminance score of the image. On the other hand, we introduce a two-stage localization and complementation module in the decoding phase to transfer object localization cue and internal integrity cue in thermal features to the RGB modality.
arXiv Detail & Related papers (2022-10-09T13:50:12Z)
Unsupervised Spike Depth Estimation via Cross-modality Cross-domain Knowledge Transfer [53.413305467674434]
We introduce open-source RGB data to support spike depth estimation, leveraging its annotations and spatial information. We propose a cross-modality cross-domain (BiCross) framework to realize unsupervised spike depth estimation. Our method achieves state-of-the-art (SOTA) performances, compared with RGB-oriented unsupervised depth estimation methods.
arXiv Detail & Related papers (2022-08-26T09:35:20Z)
Wasserstein Contrastive Representation Distillation [114.24609306495456]
We propose Wasserstein Contrastive Representation Distillation (WCoRD), which leverages both primal and dual forms of Wasserstein distance for knowledge distillation. The dual form is used for global knowledge transfer, yielding a contrastive learning objective that maximizes the lower bound of mutual information between the teacher and the student networks. Experiments demonstrate that the proposed WCoRD method outperforms state-of-the-art approaches on privileged information distillation, model compression and cross-modal transfer.
arXiv Detail & Related papers (2020-12-15T23:43:28Z)
Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection [145.4919781325014]
How to effectively fuse cross-modal information is the key problem for RGB-D salient object detection. Many models use the feature fusion strategy but are limited by the low-order point-to-point fusion methods. We propose a novel mutual attention model by fusing attention and contexts from different modalities.
arXiv Detail & Related papers (2020-10-12T08:50:10Z)

This list is automatically generated from the titles and abstracts of the papers on this site.