Bridging Classical and Modern Computer Vision: PerceptiveNet for Tree Crown Semantic Segmentation

URL: http://arxiv.org/abs/2505.23597v1
Date: Thu, 29 May 2025 16:11:08 GMT
Title: Bridging Classical and Modern Computer Vision: PerceptiveNet for Tree Crown Semantic Segmentation
Authors: Georgios Voulgaris
Abstract summary: PerceptiveNet is a novel model incorporating a Logarithmic Gabor-parameterised convolutional layer with trainable filter parameters. We investigate the impact of Log-Gabor, Gabor, and standard convolutional layers on semantic segmentation performance. Our results outperform state-of-the-art models, demonstrating significant performance improvements on a tree crown dataset.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The accurate semantic segmentation of tree crowns within remotely sensed data is crucial for scientific endeavours such as forest management, biodiversity studies, and carbon sequestration quantification. However, precise segmentation remains challenging due to complexities in the forest canopy, including shadows, intricate backgrounds, scale variations, and subtle spectral differences among tree species. Compared to the traditional methods, Deep Learning models improve accuracy by extracting informative and discriminative features, but often fall short in capturing the aforementioned complexities. To address these challenges, we propose PerceptiveNet, a novel model incorporating a Logarithmic Gabor-parameterised convolutional layer with trainable filter parameters, alongside a backbone that extracts salient features while capturing extensive context and spatial information through a wider receptive field. We investigate the impact of Log-Gabor, Gabor, and standard convolutional layers on semantic segmentation performance through extensive experimentation. Additionally, we conduct an ablation study to assess the contributions of individual layers and their combinations to overall model performance, and we evaluate PerceptiveNet as a backbone within a novel hybrid CNN-Transformer model. Our results outperform state-of-the-art models, demonstrating significant performance improvements on a tree crown dataset while generalising across domains, including two benchmark aerial scene semantic segmentation datasets with varying complexities.
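For readers unfamiliar with the filter family named in the abstract, the following is a minimal sketch of the standard log-Gabor transfer function, defined in the frequency domain. It is not the paper's layer: the function names, the grid layout, and the parameters `f0`, `theta0`, `sigma_ratio`, and `sigma_theta` are illustrative assumptions. In a trainable layer these parameters would presumably be learned rather than fixed.

```python
import math


def log_gabor_radial(f, f0, sigma_ratio):
    """Radial log-Gabor response at frequency f (cycles/pixel).

    f0 is the centre frequency; sigma_ratio (assumed in (0, 1)) sets the
    bandwidth. Both names are illustrative, not taken from the paper.
    """
    if f <= 0.0:
        return 0.0  # the log-Gabor function has no DC component
    return math.exp(-(math.log(f / f0) ** 2) / (2.0 * math.log(sigma_ratio) ** 2))


def log_gabor_2d(size, f0, theta0, sigma_ratio, sigma_theta):
    """Frequency-domain 2D log-Gabor filter on a size x size grid.

    Returns a list of rows with the zero frequency at index size // 2.
    The filter is the product of a radial term and a Gaussian angular term.
    """
    rows = []
    for v in range(size):
        fy = (v - size // 2) / size
        row = []
        for u in range(size):
            fx = (u - size // 2) / size
            radius = math.hypot(fx, fy)
            theta = math.atan2(fy, fx)
            # Wrap the angular difference into [-pi, pi].
            d = math.atan2(math.sin(theta - theta0), math.cos(theta - theta0))
            angular = math.exp(-(d ** 2) / (2.0 * sigma_theta ** 2))
            row.append(log_gabor_radial(radius, f0, sigma_ratio) * angular)
        rows.append(row)
    return rows


# Example: an 8x8 filter tuned to 0.25 cycles/pixel along the horizontal axis.
filt = log_gabor_2d(size=8, f0=0.25, theta0=0.0, sigma_ratio=0.55, sigma_theta=0.5)
```

A spatial convolution kernel can be obtained from such a frequency-domain filter by an inverse Fourier transform; whether PerceptiveNet does this is not stated in the abstract.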

Related papers

It Takes Two: Accurate Gait Recognition in the Wild via Cross-granularity Alignment [72.75844404617959]
This paper proposes a novel cross-granularity alignment gait recognition method, named XGait. To achieve this goal, XGait first uses two backbone encoder branches to map the silhouette sequences and the parsing sequences into two latent spaces. Comprehensive experiments on two large-scale gait datasets show that XGait achieves Rank-1 accuracy of 80.5% on Gait3D and 88.3% on CCPG.
arXiv Detail & Related papers (2024-11-16T08:54:27Z)
Persistent Topological Features in Large Language Models [0.6597195879147556]
We introduce topological descriptors that measure how topological features, $p$-dimensional holes, persist and evolve throughout the layers. This offers a statistical perspective on how prompts are rearranged and their relative positions changed in the representation space. As a showcase application, we use zigzag persistence to establish a criterion for layer pruning, achieving results comparable to state-of-the-art methods.
arXiv Detail & Related papers (2024-10-14T19:46:23Z)
A Lightweight Clustering Framework for Unsupervised Semantic Segmentation [28.907274978550493]
Unsupervised semantic segmentation aims to categorize each pixel in an image into a corresponding class without the use of annotated data. We propose a lightweight clustering framework for unsupervised semantic segmentation. Our framework achieves state-of-the-art results on PASCAL VOC and MS COCO datasets.
arXiv Detail & Related papers (2023-11-30T15:33:42Z)
Benchmarking Individual Tree Mapping with Sub-meter Imagery [6.907098367807166]
We introduce an evaluation framework suited for individual tree mapping in any physical environment. We review and compare different approaches and deep architectures, and introduce a new method that we experimentally prove to be a good compromise between segmentation and detection.
arXiv Detail & Related papers (2023-11-14T08:21:36Z)
On Characterizing the Evolution of Embedding Space of Neural Networks using Algebraic Topology [9.537910170141467]
We use Betti numbers to study how the topology of the feature embedding space changes as it passes through the layers of a well-trained deep neural network (DNN). We demonstrate that as depth increases, a topologically complicated dataset is transformed into a simple one, resulting in Betti numbers attaining their lowest possible value.
arXiv Detail & Related papers (2023-11-08T10:45:12Z)
RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching). To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth. We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z)
CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance. In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
arXiv Detail & Related papers (2022-05-30T13:34:46Z)
DepthFormer: Exploiting Long-Range Correlation and Local Information for Accurate Monocular Depth Estimation [50.08080424613603]
Long-range correlation is essential for accurate monocular depth estimation. We propose to leverage the Transformer to model this global context with an effective attention mechanism. Our proposed model, termed DepthFormer, surpasses state-of-the-art monocular depth estimation methods with prominent margins.
arXiv Detail & Related papers (2022-03-27T05:03:56Z)
Instance segmentation of fallen trees in aerial color infrared imagery using active multi-contour evolution with fully convolutional network-based intensity priors [0.5276232626689566]
We introduce a framework for segmenting instances of a common object class by multiple active contour evolution over semantic segmentation maps of images. We instantiate the proposed framework in the context of segmenting individual fallen stems from high-resolution aerial multispectral imagery.
arXiv Detail & Related papers (2021-05-05T11:54:05Z)
Polynomial Networks in Deep Classifiers [55.90321402256631]
We cast the study of deep neural networks under a unifying framework. Our framework provides insights on the inductive biases of each model. The efficacy of the proposed models is evaluated on standard image and audio classification benchmarks.
arXiv Detail & Related papers (2021-04-16T06:41:20Z)
Adversarial Feature Augmentation and Normalization for Visual Recognition [109.6834687220478]
Recent advances in computer vision take advantage of adversarial data augmentation to ameliorate the generalization ability of classification models. Here, we present an effective and efficient alternative that advocates adversarial augmentation on intermediate feature embeddings. We validate the proposed approach across diverse visual recognition tasks with representative backbone networks.
arXiv Detail & Related papers (2021-03-22T20:36:34Z)
Context Decoupling Augmentation for Weakly Supervised Semantic Segmentation [53.49821324597837]
Weakly supervised semantic segmentation is a challenging problem that has been deeply studied in recent years. We present a Context Decoupling Augmentation (CDA) method to change the inherent context in which the objects appear. Extensive experiments on the PASCAL VOC 2012 dataset with several alternative network architectures demonstrate that CDA can boost various popular WSSS methods to the new state-of-the-art by a large margin.
arXiv Detail & Related papers (2021-03-02T15:05:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.