Related papers: FoMo-Bench: a multi-modal, multi-scale and multi-task Forest Monitoring Benchmark for remote sensing foundation models

URL: http://arxiv.org/abs/2312.10114v2
Date: Wed, 27 Mar 2024 09:00:54 GMT
Title: FoMo-Bench: a multi-modal, multi-scale and multi-task Forest Monitoring Benchmark for remote sensing foundation models
Authors: Nikolaos Ioannis Bountos, Arthur Ouaknine, David Rolnick
Abstract summary: We present the first unified Forest Monitoring Benchmark (FoMo-Bench). FoMo-Bench consists of 15 diverse datasets encompassing satellite, aerial, and inventory data. To further enhance the diversity of tasks and geographies represented in FoMo-Bench, we introduce a novel global dataset, TalloS.
Score: 24.141443217910986
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Forests are an essential part of Earth's ecosystems and natural systems, as well as providing services on which humanity depends, yet they are rapidly changing as a result of land use decisions and climate change. Understanding and mitigating negative effects requires parsing data on forests at global scale from a broad array of sensory modalities, and recently many such problems have been approached using machine learning algorithms for remote sensing. To date, forest-monitoring problems have largely been addressed in isolation. Inspired by the rise of foundation models for computer vision and remote sensing, we here present the first unified Forest Monitoring Benchmark (FoMo-Bench). FoMo-Bench consists of 15 diverse datasets encompassing satellite, aerial, and inventory data, covering a variety of geographical regions, and including multispectral, red-green-blue, synthetic aperture radar (SAR) and LiDAR data with various temporal, spatial and spectral resolutions. FoMo-Bench includes multiple types of forest-monitoring tasks, spanning classification, segmentation, and object detection. To further enhance the diversity of tasks and geographies represented in FoMo-Bench, we introduce a novel global dataset, TalloS, combining satellite imagery with ground-based annotations for tree species classification, encompassing 1,000+ categories across multiple hierarchical taxonomic levels (species, genus, family). Finally, we propose FoMo-Net, a baseline foundation model with the capacity to process any combination of commonly used spectral bands in remote sensing, across diverse ground sampling distances and geographical locations worldwide. This work aims to inspire research collaborations between machine learning and forest biology researchers in exploring scalable multi-modal and multi-task models for forest monitoring. All code and data will be made publicly available.

Related papers

TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation [65.74990259650984]
We introduce TerraFM, a scalable self-supervised learning model that leverages globally distributed Sentinel-1 and Sentinel-2 imagery. Our training strategy integrates local-global contrastive learning and introduces a dual-centering mechanism. TerraFM achieves strong generalization on both classification and segmentation tasks, outperforming prior models on GEO-Bench and Copernicus-Bench.
arXiv Detail & Related papers (2025-06-06T17:59:50Z)
EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models [96.18182289276649]
We present EarthMind, a novel vision-language framework for multi-granular and multi-sensor Earth Observation (EO) data understanding. EarthMind features two core components: (1) Spatial Attention Prompting (SAP), which reallocates attention within the LLM to enhance pixel-level understanding; and (2) Cross-modal Fusion, which aligns heterogeneous modalities into a shared space. To facilitate multi-sensor fusion evaluation, we propose EarthMind-Bench, a comprehensive benchmark with over 2,000 human-annotated multi-sensor image-question pairs.
arXiv Detail & Related papers (2025-06-02T13:36:05Z)
Data Augmentation and Resolution Enhancement using GANs and Diffusion Models for Tree Segmentation [49.13393683126712]
Urban forests play a key role in enhancing environmental quality and supporting biodiversity in cities. Accurately detecting trees is challenging due to complex landscapes and the variability in image resolution caused by different satellite sensors or UAV flight altitudes. We propose a novel pipeline that integrates domain adaptation with GANs and Diffusion models to enhance the quality of low-resolution aerial images.
arXiv Detail & Related papers (2025-05-21T03:57:10Z)
Not Every Tree Is a Forest: Benchmarking Forest Types from Satellite Remote Sensing [1.2266381182650026]
This work introduces ForTy, a benchmark for global-scale FORest TYpes mapping using multi-temporal satellite data. The benchmark comprises 200,000 time series of image patches, each consisting of Sentinel-2, Sentinel-1, climate, and elevation data. We evaluate the forest types dataset using several baseline models, including convolutional neural networks and transformer-based models.
arXiv Detail & Related papers (2025-05-03T12:20:50Z)
SatelliteCalculator: A Multi-Task Vision Foundation Model for Quantitative Remote Sensing Inversion [4.824120664293887]
We introduce SatelliteCalculator, the first vision foundation model for quantitative remote sensing inversion. By leveraging physically defined index adapters, we automatically construct a large-scale dataset of over one million paired samples. Experiments demonstrate that SatelliteCalculator achieves competitive accuracy across all tasks while significantly reducing inference cost.
arXiv Detail & Related papers (2025-04-18T03:48:04Z)
PointsToWood: A deep learning framework for complete canopy leaf-wood segmentation of TLS data across diverse European forests [0.0]
We show a new framework that uses a deep learning architecture newly developed from PointNet and point clouds for processing 3D point clouds. We evaluate its performance across open datasets from boreal, temperate, Mediterranean and tropical regions. Results show consistent outperformance against the most widely used PointNet based approach for leaf/wood segmentation.
arXiv Detail & Related papers (2025-03-06T13:23:03Z)
EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision [72.84868704100595]
This paper presents a dataset specifically designed for self-supervision on remote sensing data, intended to enhance deep learning applications on Earth monitoring tasks. The dataset spans 15 tera pixels of global remote-sensing data, combining imagery from a diverse range of sources, including NEON, Sentinel, and a novel release of 1m spatial resolution data from Satellogic. Accompanying the dataset is EarthMAE, a tailored Masked Autoencoder developed to tackle the distinct challenges of remote sensing data.
arXiv Detail & Related papers (2025-01-14T13:42:22Z)
AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities [5.767156832161819]
We propose AnySat, a multimodal model based on joint embedding predictive architecture (JEPA) and scale-adaptive spatial encoders. To demonstrate the advantages of this unified approach, we compile GeoPlex, a collection of 5 multimodal datasets with varying characteristics. We then train a single powerful model on these diverse datasets simultaneously.
arXiv Detail & Related papers (2024-12-18T18:11:53Z)
Foundation Models for Remote Sensing and Earth Observation: A Survey [101.77425018347557]
This survey systematically reviews the emerging field of Remote Sensing Foundation Models (RSFMs). It begins with an outline of their motivation and background, followed by an introduction of their foundational concepts. We benchmark these models against publicly available datasets, discuss existing challenges, and propose future research directions.
arXiv Detail & Related papers (2024-10-22T01:08:21Z)
Soil nitrogen forecasting from environmental variables provided by multisensor remote sensing images [0.0]
This study introduces a framework for forecasting soil nitrogen content, leveraging multi-modal data, including remote sensing images and machine learning methods. We integrate the Land Use/Land Cover Area Frame Survey (LUCAS) database, which covers European and UK territory, with environmental variables from satellite sensors to create a dataset of novel features. We test the proposed methods with a variety of land cover classes, including croplands and grasslands to ensure the robustness of this approach.
arXiv Detail & Related papers (2024-06-14T08:10:44Z)
Towards Vision-Language Geo-Foundation Model: A Survey [65.70547895998541]
Vision-Language Foundation Models (VLFMs) have made remarkable progress on various multimodal tasks. This paper thoroughly reviews Vision-Language Geo-Foundation Models (VLGFMs), summarizing and analyzing recent developments in the field.
arXiv Detail & Related papers (2024-06-13T17:57:30Z)
Planted: a dataset for planted forest identification from multi-satellite time series [23.822292894884427]
We present a dataset consisting of data from five public satellites for recognizing forest plantations and planted tree species across the globe. The dataset, named PlantD, includes over 2M examples of 64 tree label classes (46 genera and 40 species), distributed among 41 countries.
arXiv Detail & Related papers (2024-05-24T15:49:00Z)
MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark [63.878793340338035]
Multi-target multi-camera tracking is a crucial task that involves identifying and tracking individuals over time using video streams from multiple cameras. Existing datasets for this task are either synthetically generated or artificially constructed within a controlled camera network setting. We present MTMMC, a real-world, large-scale dataset that includes long video sequences captured by 16 multi-modal cameras in two different environments.
arXiv Detail & Related papers (2024-03-29T15:08:37Z)
Neural Plasticity-Inspired Multimodal Foundation Model for Earth Observation [48.66623377464203]
Our novel approach introduces the Dynamic One-For-All (DOFA) model, leveraging the concept of neural plasticity in brain science. This dynamic hypernetwork, adjusting to different wavelengths, enables a single versatile Transformer jointly trained on data from five sensors to excel across 12 distinct Earth observation tasks.
arXiv Detail & Related papers (2024-03-22T17:11:47Z)
Forest Inspection Dataset for Aerial Semantic Segmentation and Depth Estimation [6.635604919499181]
We introduce a new large aerial dataset for forest inspection. It contains both real-world and virtual recordings of natural environments. We develop a framework to assess the deforestation degree of an area.
arXiv Detail & Related papers (2024-03-11T11:26:44Z)
LiveHPS: LiDAR-based Scene-level Human Pose and Shape Estimation in Free Environment [59.320414108383055]
We present LiveHPS, a novel single-LiDAR-based approach for scene-level human pose and shape estimation. We propose a huge human motion dataset, named FreeMotion, which is collected in various scenarios with diverse human poses.
arXiv Detail & Related papers (2024-02-27T03:08:44Z)
SatBird: Bird Species Distribution Modeling with Remote Sensing and Citizen Science Data [68.2366021016172]
We present SatBird, a satellite dataset of locations in the USA with labels derived from presence-absence observation data from the citizen science database eBird. We also provide a dataset in Kenya representing low-data regimes. We benchmark a set of baselines on our dataset, including SOTA models for remote sensing tasks.
arXiv Detail & Related papers (2023-11-02T02:00:27Z)
OpenForest: A data catalogue for machine learning in forest monitoring [21.005174521192675]
Advancing forest monitoring offers advantages in mitigating human impacts and enhancing our comprehension of forest composition. We provide a comprehensive overview of 86 open access forest datasets across spatial scales. These datasets are grouped in OpenForest, a dynamic catalogue open to contributions that strives to reference all available open access forest datasets.
arXiv Detail & Related papers (2023-11-01T03:59:20Z)
Rapid Deforestation and Burned Area Detection using Deep Multimodal Learning on Satellite Imagery [3.8073142980733]
Deforestation estimation and fire detection in the Amazon forest pose a significant challenge due to the vast size of the area. Multimodal satellite imagery and remote sensing offer a promising solution for estimating deforestation and detecting wildfire in the Amazonia region. This research paper introduces a new curated dataset and a deep learning-based approach to solve these problems using convolutional neural networks (CNNs) and comprehensive data processing techniques.
arXiv Detail & Related papers (2023-07-10T21:49:30Z)
An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently. Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z)
Bird Distribution Modelling using Remote Sensing and Citizen Science data [31.375576105932442]
Climate change is a major driver of biodiversity loss. There are significant knowledge gaps about the distribution of species. We propose an approach leveraging computer vision to improve species distribution modelling.
arXiv Detail & Related papers (2023-05-01T20:27:11Z)
A General Purpose Neural Architecture for Geospatial Systems [142.43454584836812]
We present a roadmap towards the construction of a general-purpose neural architecture (GPNA) with a geospatial inductive bias. We envision how such a model may facilitate cooperation between members of the community.
arXiv Detail & Related papers (2022-11-04T09:58:57Z)
Multiple-environment Self-adaptive Network for Aerial-view Geo-localization [85.52750931345287]
Aerial-view geo-localization aims to determine an unknown position by matching the drone-view image with the geo-tagged satellite-view image. We propose a Multiple-environment Self-adaptive Network (MuSe-Net) to adjust for the domain shift caused by environmental changes. In particular, MuSe-Net employs a two-branch neural network containing one multiple-environment style extraction network and one self-adaptive feature extraction network.
arXiv Detail & Related papers (2022-04-18T16:04:29Z)
Meta-Learning for Few-Shot Land Cover Classification [3.8529010979482123]
We evaluate the model-agnostic meta-learning (MAML) algorithm on classification and segmentation tasks. We find that few-shot model adaptation outperforms pre-training with regular gradient descent. This indicates that model optimization with meta-learning may benefit tasks in the Earth sciences.
arXiv Detail & Related papers (2020-04-28T09:42:41Z)

This list is automatically generated from the titles and abstracts of the papers on this site.