CerraData-4MM: A multimodal benchmark dataset on Cerrado for land use and land cover classification
- URL: http://arxiv.org/abs/2502.00083v1
- Date: Fri, 31 Jan 2025 15:57:17 GMT
- Title: CerraData-4MM: A multimodal benchmark dataset on Cerrado for land use and land cover classification
- Authors: Mateus de Souza Miranda, Ronny Hänsch, Valdivino Alexandre de Santiago Júnior, Thales Sehn Körting, Erison Carlos dos Santos Monteiro
- Abstract summary: CerraData-4MM is a dataset combining Sentinel-1 Synthetic Aperture Radar (SAR) and Sentinel-2 MultiSpectral Imagery (MSI) with 10m spatial resolution. The dataset includes two hierarchical classification levels with 7 and 14 classes, respectively, focusing on the diverse Bico do Papagaio ecoregion. We highlight CerraData-4MM's capacity to benchmark advanced semantic segmentation techniques by evaluating a standard U-Net and a more sophisticated Vision Transformer (ViT) model.
- Score: 5.503948543987285
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The Cerrado faces increasing environmental pressures, necessitating accurate land use and land cover (LULC) mapping despite challenges such as class imbalance and visually similar categories. To address this, we present CerraData-4MM, a multimodal dataset combining Sentinel-1 Synthetic Aperture Radar (SAR) and Sentinel-2 MultiSpectral Imagery (MSI) with 10m spatial resolution. The dataset includes two hierarchical classification levels with 7 and 14 classes, respectively, focusing on the diverse Bico do Papagaio ecoregion. We highlight CerraData-4MM's capacity to benchmark advanced semantic segmentation techniques by evaluating a standard U-Net and a more sophisticated Vision Transformer (ViT) model. The ViT achieves superior performance in multimodal scenarios, with the highest macro F1-score of 57.60% and a mean Intersection over Union (mIoU) of 49.05% at the first hierarchical level. Both models struggle with minority classes, particularly at the second hierarchical level, where U-Net's performance drops to an F1-score of 18.16%. Class balancing improves representation for underrepresented classes but reduces overall accuracy, underscoring the trade-off in weighted training. CerraData-4MM offers a challenging benchmark for advancing deep learning models to handle class imbalance and multimodal data fusion. Code, trained models, and data are publicly available at https://github.com/ai4luc/CerraData-4MM.
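Since the benchmark is reported in terms of macro F1-score and mean IoU over the hierarchical class sets (7 classes at level 1, 14 at level 2), the following minimal sketch shows how both metrics can be computed from a confusion matrix. It is an illustrative re-implementation for this summary, not the authors' released evaluation code; the function name and the random example matrix are assumptions.

```python
import numpy as np

def macro_f1_and_miou(conf: np.ndarray):
    """Macro F1 and mean IoU from a square confusion matrix
    (rows = reference classes, columns = predicted classes)."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp          # predicted as class c but wrong
    fn = conf.sum(axis=1) - tp          # true class c but missed
    precision = tp / np.maximum(tp + fp, 1e-12)
    recall = tp / np.maximum(tp + fn, 1e-12)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    iou = tp / np.maximum(tp + fp + fn, 1e-12)
    return f1.mean(), iou.mean()        # unweighted mean over all classes

# Example with a random 7x7 matrix standing in for level-1 predictions.
conf = np.random.randint(0, 100, size=(7, 7))
macro_f1, miou = macro_f1_and_miou(conf)
print(f"macro F1 = {macro_f1:.4f}, mIoU = {miou:.4f}")
```

Because both metrics average per-class scores without weighting by class frequency, they penalize exactly the minority-class failures the abstract describes.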
Related papers
- AHDMIL: Asymmetric Hierarchical Distillation Multi-Instance Learning for Fast and Accurate Whole-Slide Image Classification [51.525891360380285]
AHDMIL is an Asymmetric Hierarchical Distillation Multi-Instance Learning framework. It eliminates irrelevant patches through a two-step training process. It consistently outperforms previous state-of-the-art methods in both classification performance and inference speed.
arXiv Detail & Related papers (2025-08-07T07:47:16Z) - TFOC-Net: A Short-time Fourier Transform-based Deep Learning Approach for Enhancing Cross-Subject Motor Imagery Classification [0.47498241053872914]
Cross-subject motor imagery (CS-MI) classification in brain-computer interfaces (BCIs) is a challenging task due to the significant variability in Electroencephalography (EEG) patterns across different individuals. This variability often results in lower classification accuracy compared to subject-specific models. We introduce a novel approach that significantly enhances cross-subject MI classification performance through optimized preprocessing and deep learning techniques.
arXiv Detail & Related papers (2025-07-03T10:17:39Z) - University of Indonesia at SemEval-2025 Task 11: Evaluating State-of-the-Art Encoders for Multi-Label Emotion Detection [1.2564343689544841]
This paper focuses on multilabel emotion classification across 28 languages. We explore two main strategies: fully fine-tuning transformer models and classifier-only training. Our findings suggest that training a classifier on top of prompt-based encoders such as mE5 and BGE yields significantly better results than fully fine-tuning XLMR and mBERT.
arXiv Detail & Related papers (2025-05-22T09:42:11Z) - Benchmarking Large Language Models for Image Classification of Marine Mammals [4.274291455715579]
We build a benchmark dataset with 1,423 images of 65 kinds of marine mammals.
Each animal is uniquely classified into different levels of class, ranging from species-level to medium-level to group-level.
We evaluate several approaches for classifying these marine mammals.
arXiv Detail & Related papers (2024-10-22T01:49:49Z) - Evaluating Large Language Models on Spatial Tasks: A Multi-Task Benchmarking Study [4.80612909282198]
This study introduces a new multi-task spatial evaluation dataset designed to explore and compare the performance of several advanced models on spatial tasks. The dataset includes twelve distinct task types, such as spatial understanding and simple route planning, each with verified and accurate answers.
arXiv Detail & Related papers (2024-08-26T17:25:16Z) - RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs [60.38044044203333]
Large language models (LLMs) typically utilize the top-k contexts from a retriever in retrieval-augmented generation (RAG).
We propose a novel instruction fine-tuning framework RankRAG, which instruction-tunes a single LLM for the dual purpose of context ranking and answer generation in RAG.
For generation, we compare our model with many strong baselines, including GPT-4-0613, GPT-4-turbo-2024-0409, and ChatQA-1.5, an open-sourced model with the state-of-the-art performance on RAG benchmarks.
arXiv Detail & Related papers (2024-07-02T17:59:17Z) - ShareGPT4V: Improving Large Multi-Modal Models with Better Captions [81.95879920888716]
We introduce ShareGPT4V, a dataset featuring 1.2 million descriptive captions.
This dataset surpasses existing datasets in diversity and information content, covering world knowledge, object properties, spatial relationships, and aesthetic evaluations.
We further incorporate ShareGPT4V data into both the pre-training and SFT phases, obtaining ShareGPT4V-7B, a superior LMM.
arXiv Detail & Related papers (2023-11-21T18:58:11Z) - CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders [2.7624021966289605]
Remote sensing offers vast yet sparsely labeled, spatially aligned multimodal data.
We present CROMA, a framework that combines contrastive and reconstruction self-supervised objectives to learn rich unimodal and multimodal representations.
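As a rough illustration of combining contrastive and reconstruction objectives for radar-optical pretraining (a generic sketch written for this listing, with the tensor names, temperature, and weighting factor alpha assumed; it is not CROMA's actual formulation):

```python
import torch
import torch.nn.functional as F

def combined_ssl_loss(radar_emb, optical_emb, recon, target,
                      temperature=0.07, alpha=0.5):
    """Contrastive term aligns radar/optical embeddings of the same location;
    reconstruction term rebuilds masked pixels (masked-autoencoder style)."""
    r = F.normalize(radar_emb, dim=-1)           # (N, D) radar embeddings
    o = F.normalize(optical_emb, dim=-1)         # (N, D) optical embeddings
    logits = r @ o.t() / temperature             # (N, N) similarity matrix
    labels = torch.arange(r.size(0), device=r.device)
    contrastive = 0.5 * (F.cross_entropy(logits, labels) +
                         F.cross_entropy(logits.t(), labels))
    reconstruction = F.mse_loss(recon, target)   # pixel-level loss on masked patches
    return alpha * contrastive + (1 - alpha) * reconstruction
```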
arXiv Detail & Related papers (2023-11-01T15:07:27Z) - DataComp: In search of the next generation of multimodal datasets [179.79323076587255]
DataComp is a testbed for dataset experiments centered around a new candidate pool of 12.8 billion image-text pairs from Common Crawl.
Our benchmark consists of multiple compute scales spanning four orders of magnitude.
In particular, our best baseline, DataComp-1B, enables training a CLIP ViT-L/14 from scratch to 79.2% zero-shot accuracy on ImageNet.
arXiv Detail & Related papers (2023-04-27T11:37:18Z) - Efficient deep learning models for land cover image classification [0.29748898344267777]
This work experiments with the BigEarthNet dataset for land use land cover (LULC) image classification.
We benchmark different state-of-the-art models, including Convolutional Neural Networks, Multi-Layer Perceptrons, Visual Transformers, EfficientNets, and Wide Residual Networks (WRN).
Our proposed lightweight model has an order of magnitude fewer trainable parameters, achieves a 4.5% higher averaged F-score across all 19 LULC classes, and trains two times faster than the state-of-the-art ResNet50 model we use as a baseline.
arXiv Detail & Related papers (2021-11-18T00:03:14Z) - No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in real-world federated systems is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
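A minimal sketch of the calibration idea described above, assuming per-class feature means and covariances have already been aggregated; the helper fit_classifier and all names here are hypothetical, and this is not the authors' implementation:

```python
import numpy as np

def calibrate_with_virtual_representations(class_means, class_covs,
                                           n_virtual, fit_classifier):
    """Sample the same number of virtual feature vectors per class from
    estimated Gaussians, then refit only the classifier on the balanced
    virtual set instead of raw, non-IID client data."""
    feats, labels = [], []
    for c, (mu, cov) in enumerate(zip(class_means, class_covs)):
        feats.append(np.random.multivariate_normal(mu, cov, size=n_virtual))
        labels.append(np.full(n_virtual, c))
    X, y = np.concatenate(feats), np.concatenate(labels)
    return fit_classifier(X, y)  # e.g. refit the final linear layer only
```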
arXiv Detail & Related papers (2021-06-09T12:02:29Z) - Learning to Fairly Classify the Quality of Wireless Links [0.5352699766206808]
We propose a new tree-based link quality classifier that achieves high performance and fairly classifies the minority class.
We compare the tree-based model to a multilayer perceptron (MLP) non-linear model and two linear models, namely logistic regression (LR) and SVM, on a selected imbalanced dataset.
Our study shows that 1) non-linear models perform slightly better than linear models in general, 2) the proposed non-linear tree-based model yields the best performance trade-off considering F1, training time, and fairness, and 3) single-metric aggregated evaluations based only on accuracy can hide poor performance on the minority class.
arXiv Detail & Related papers (2021-02-23T12:23:27Z) - Deep F-measure Maximization for End-to-End Speech Understanding [52.36496114728355]
We propose a differentiable approximation to the F-measure and train the network with this objective using standard backpropagation.
We perform experiments on two standard fairness datasets, Adult and Communities and Crime, and also on speech-to-intent detection on the ATIS dataset and speech-to-image concept classification on the Speech-COCO dataset.
In all four of these tasks, the F-measure objective yields improved micro-F1 scores, with absolute improvements of up to 8%, compared to models trained with the cross-entropy loss function.
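A minimal sketch of a differentiable F-measure surrogate of the kind described above, using expected counts built from predicted probabilities; this is an illustrative loss written for this listing, not the paper's exact formulation:

```python
import torch

def soft_f1_loss(probs: torch.Tensor, targets: torch.Tensor, eps: float = 1e-8):
    """probs: (N, C) softmax outputs; targets: (N, C) one-hot labels.
    Hard counts are replaced by their expectations so the loss stays
    differentiable and can be minimized with standard backpropagation."""
    tp = (probs * targets).sum(dim=0)          # expected true positives
    fp = (probs * (1 - targets)).sum(dim=0)    # expected false positives
    fn = ((1 - probs) * targets).sum(dim=0)    # expected false negatives
    f1 = 2 * tp / (2 * tp + fp + fn + eps)     # per-class soft F1
    return 1.0 - f1.mean()                     # maximize F1 = minimize loss
```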
arXiv Detail & Related papers (2020-08-08T03:02:27Z) - Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation [63.98724740606457]
We present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge.
Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten different fine-grained classes.
Task 1b concerns the classification of data into three higher-level classes using low-complexity solutions.
arXiv Detail & Related papers (2020-07-16T15:07:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.