AnyThermal: Towards Learning Universal Representations for Thermal Perception
- URL: http://arxiv.org/abs/2602.06203v1
- Date: Thu, 05 Feb 2026 21:27:26 GMT
- Title: AnyThermal: Towards Learning Universal Representations for Thermal Perception
- Authors: Parv Maheshwari, Jay Karhade, Yogesh Chawla, Isaiah Adu, Florian Heisen, Andrew Porco, Andrew Jong, Yifei Liu, Santosh Pitla, Sebastian Scherer, Wenshan Wang
- Abstract summary: We present AnyThermal, a thermal backbone that captures robust task-agnostic thermal features suitable for a variety of tasks. Our key insight is to distill the feature representations from visual foundation models into a thermal encoder using thermal data from multiple environments. We demonstrate the efficacy of AnyThermal and TartanRGBT, achieving state-of-the-art results with improvements of up to 36% across diverse environments.
- Score: 12.226040201382231
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present AnyThermal, a thermal backbone that captures robust task-agnostic thermal features suitable for a variety of tasks such as cross-modal place recognition, thermal segmentation, and monocular depth estimation using thermal images. Existing thermal backbones that follow task-specific training from small-scale data result in utility limited to a specific environment and task. Unlike prior methods, AnyThermal can be used for a wide range of environments (indoor, aerial, off-road, urban) and tasks, all without task-specific training. Our key insight is to distill the feature representations from visual foundation models such as DINOv2 into a thermal encoder using thermal data from these multiple environments. To bridge the diversity gap of the existing RGB-Thermal datasets, we introduce the TartanRGBT platform, the first open-source data collection platform with synced RGB-Thermal image acquisition. We use this payload to collect the TartanRGBT dataset - a diverse and balanced dataset collected in 4 environments. We demonstrate the efficacy of AnyThermal and TartanRGBT, achieving state-of-the-art results with improvements of up to 36% across diverse environments and downstream tasks on existing datasets.
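The distillation idea in the abstract can be illustrated with a toy objective: a student (thermal) encoder is trained so that its features match those of a frozen teacher (an RGB foundation model such as DINOv2) on paired images. The abstract does not state which loss the authors use, so the cosine-distance loss below is an assumption, and the function name `cosine_distill_loss` and the hand-written feature vectors are hypothetical stand-ins for real encoder outputs.

```python
import math


def cosine_distill_loss(student_feats, teacher_feats):
    """Mean (1 - cosine similarity) over paired feature vectors.

    Assumption: cosine distance is one common feature-distillation
    objective; it is not confirmed as the loss used by AnyThermal.
    """
    if not student_feats or len(student_feats) != len(teacher_feats):
        raise ValueError("need a non-empty, equal-length pair of feature lists")
    total = 0.0
    for s, t in zip(student_feats, teacher_feats):
        dot = sum(a * b for a, b in zip(s, t))
        norm_s = math.sqrt(sum(a * a for a in s))
        norm_t = math.sqrt(sum(b * b for b in t))
        # Small epsilon guards against division by zero for all-zero vectors.
        total += 1.0 - dot / (norm_s * norm_t + 1e-12)
    return total / len(student_feats)


# Hypothetical features: the teacher sees the RGB image, the student the thermal one.
teacher = [[1.0, 0.0], [0.0, 2.0]]
aligned = [[2.0, 0.0], [0.0, 1.0]]     # same directions, different magnitudes
orthogonal = [[0.0, 1.0], [1.0, 0.0]]  # directions rotated by 90 degrees

loss_aligned = cosine_distill_loss(aligned, teacher)
loss_orthogonal = cosine_distill_loss(orthogonal, teacher)
print(loss_aligned, loss_orthogonal)
```

In this sketch the loss is close to zero when student and teacher features point the same way regardless of scale, and close to one when they are orthogonal, which is why a direction-based objective is a plausible choice when the two encoders produce features of different magnitudes.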
Related papers
- ThermalGen: Style-Disentangled Flow-Based Generative Models for RGB-to-Thermal Image Translation [14.108149959967095]
Paired RGB-thermal data is crucial for visual-thermal sensor fusion and cross-modality tasks. To overcome this challenge, RGB-to-Thermal (RGB-T) image translation has emerged as a promising solution. We propose ThermalGen, an adaptive flow-based generative model for RGB-T image translation.
arXiv Detail & Related papers (2025-09-29T14:55:51Z) - Taming Diffusion for Dataset Distillation with High Representativeness [49.3818035378669]
D3HR is a novel diffusion-based framework to generate distilled datasets with high representativeness. Our experiments demonstrate that D3HR can achieve higher accuracy across different model architectures.
arXiv Detail & Related papers (2025-05-23T22:05:59Z) - EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision [72.84868704100595]
This paper presents a dataset specifically designed for self-supervision on remote sensing data, intended to enhance deep learning applications on Earth monitoring tasks. The dataset spans 15 tera pixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1m spatial resolution data from Satellogic. Accompanying the dataset is EarthMAE, a tailored Masked Autoencoder developed to tackle the distinct challenges of remote sensing data.
arXiv Detail & Related papers (2025-01-14T13:42:22Z) - T-FAKE: Synthesizing Thermal Images for Facial Landmarking [8.20594611891252]
We introduce the T-FAKE dataset, a large-scale synthetic thermal dataset with sparse and dense landmarks. We propose a novel RGB2Thermal loss function, which enables the domain-adaptive transfer of RGB faces to thermal style. Our models show excellent performance with both sparse 70-point landmarks and dense 478-point landmark annotations.
arXiv Detail & Related papers (2024-08-27T15:07:58Z) - LWIRPOSE: A novel LWIR Thermal Image Dataset and Benchmark [9.679771580702258]
This dataset comprises over 2,400 high-quality LWIR (thermal) images.
Each image is meticulously annotated with 2D human poses, offering a valuable resource for researchers and practitioners.
We benchmark state-of-the-art pose estimation methods on the dataset to showcase its potential.
arXiv Detail & Related papers (2024-04-16T01:49:35Z) - Caltech Aerial RGB-Thermal Dataset in the Wild [14.699908177967181]
We present the first publicly-available RGB-thermal dataset designed for aerial robotics operating in natural environments.
Our dataset captures a variety of terrain across the United States, including rivers, lakes, coastlines, deserts, and forests.
We provide semantic segmentation annotations for 10 classes commonly encountered in natural settings.
arXiv Detail & Related papers (2024-03-13T23:31:04Z) - Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery [78.43828998065071]
Recent advances in unsupervised learning have demonstrated the ability of large vision models to achieve promising results on downstream tasks.
Such pre-training techniques have also been explored recently in the remote sensing domain due to the availability of large amounts of unlabelled data.
In this paper, we re-visit transformers pre-training and leverage multi-scale information that is effectively utilized with multiple modalities.
arXiv Detail & Related papers (2024-03-08T16:18:04Z) - Unlocking the Use of Raw Multispectral Earth Observation Imagery for Onboard Artificial Intelligence [3.3810628880631226]
This work presents a novel methodology to automate the creation of datasets for the detection of target events.
The presented approach first processes the raw data by applying a pipeline consisting of spatial band registration and georeferencing.
It detects the target events by leveraging event-specific state-of-the-art algorithms on the Level-1C products.
We apply the proposed methodology to realize THRawS (Thermal Hotspots in Raw Sentinel-2 data), the first dataset of Sentinel-2 raw data containing warm thermal hotspots.
arXiv Detail & Related papers (2023-05-12T09:54:21Z) - Does Thermal Really Always Matter for RGB-T Salient Object Detection? [153.17156598262656]
This paper proposes a network named TNet to solve the RGB-T salient object detection (SOD) task.
In this paper, we introduce a global illumination estimation module to predict the global illuminance score of the image.
On the other hand, we introduce a two-stage localization and complementation module in the decoding phase to transfer object localization cue and internal integrity cue in thermal features to the RGB modality.
arXiv Detail & Related papers (2022-10-09T13:50:12Z) - Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline [80.13652104204691]
In this paper, we construct a large-scale benchmark with high diversity for visible-thermal UAV tracking (VTUAV).
We provide a coarse-to-fine attribute annotation, where frame-level attributes are provided to exploit the potential of challenge-specific trackers.
In addition, we design a new RGB-T baseline, named Hierarchical Multi-modal Fusion Tracker (HMFT), which fuses RGB-T data in various levels.
arXiv Detail & Related papers (2022-04-08T15:22:33Z) - Hyperspectral Image Super-Resolution with Spectral Mixup and Heterogeneous Datasets [99.92564298432387]
This work studies hyperspectral image (HSI) super-resolution (SR).
HSI SR is characterized by high-dimensional data and a limited amount of training examples.
This exacerbates the undesirable behaviors of neural networks such as memorization and sensitivity to out-of-distribution samples.
arXiv Detail & Related papers (2021-01-19T12:19:53Z)