Automated Marine Biofouling Assessment: Benchmarking Computer Vision and Multimodal LLMs on the Level of Fouling Scale
- URL: http://arxiv.org/abs/2601.20196v1
- Date: Wed, 28 Jan 2026 02:46:21 GMT
- Title: Automated Marine Biofouling Assessment: Benchmarking Computer Vision and Multimodal LLMs on the Level of Fouling Scale
- Authors: Brayden Hamilton, Tim Cashmore, Peter Driscoll, Trevor Gee, Henry Williams
- Abstract summary: Biofouling on vessel hulls poses major ecological, economic, and biosecurity risks. This work investigates automated classification of biofouling severity on the Level of Fouling scale using both custom computer vision models and large multimodal language models.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Marine biofouling on vessel hulls poses major ecological, economic, and biosecurity risks. Traditional survey methods rely on diver inspections, which are hazardous and limited in scalability. This work investigates automated classification of biofouling severity on the Level of Fouling (LoF) scale using both custom computer vision models and large multimodal language models (LLMs). Convolutional neural networks, transformer-based segmentation, and zero-shot LLMs were evaluated on an expert-labelled dataset from the New Zealand Ministry for Primary Industries. Computer vision models showed high accuracy at extreme LoF categories but struggled with intermediate levels due to dataset imbalance and image framing. LLMs, guided by structured prompts and retrieval, achieved competitive performance without training and provided interpretable outputs. The results demonstrate complementary strengths across approaches and suggest that hybrid methods integrating segmentation coverage with LLM reasoning offer a promising pathway toward scalable and interpretable biofouling assessment.
Related papers
- BioVessel-Net and RetinaMix: Unsupervised Retinal Vessel Segmentation from OCTA Images [16.61148063147746]
BioVessel-Net is an unsupervised generative framework that integrates vessel biostatistics with adversarial refinement and a radius-guided segmentation strategy. We introduce RetinaMix, a new benchmark dataset of 2D and 3D OCTA images with high-resolution vessel details from diverse populations. BioVessel-Net achieves near-perfect segmentation accuracy across RetinaMix and existing datasets, substantially outperforming state-of-the-art supervised and semi-supervised methods.
arXiv Detail & Related papers (2025-09-28T03:46:20Z)
- WLFM: A Well-Logs Foundation Model for Multi-Task and Cross-Well Geological Interpretation [12.858491655938026]
We propose WLFM, a foundation model pretrained on multi-curve logs from 1200 wells. WLFM consistently outperforms state-of-the-art baselines, achieving 0.0041 MSE in porosity estimation and 74.13% accuracy in lithology classification. These results establish WLFM as a scalable, interpretable, and transferable backbone for geological AI, with implications for multi-modal integration of logs, seismic, and textual data.
arXiv Detail & Related papers (2025-09-16T14:59:45Z)
- YH-MINER: Multimodal Intelligent System for Natural Ecological Reef Metric Extraction [23.4289262373633]
Coral reefs, crucial for sustaining marine biodiversity and ecological processes, face escalating threats. This study develops the YH-MINER system, establishing an intelligent framework for "object detection-semantic segmentation-prior input". The system achieves genus-level classification accuracy of 88% while simultaneously extracting core ecological metrics.
arXiv Detail & Related papers (2025-05-28T11:36:18Z)
- Detecting LLM Hallucination Through Layer-wise Information Deficiency: Analysis of Ambiguous Prompts and Unanswerable Questions [60.31496362993982]
Large language models (LLMs) frequently generate confident yet inaccurate responses. We present a novel test-time approach to detecting model hallucination through systematic analysis of information flow.
arXiv Detail & Related papers (2024-12-13T16:14:49Z)
- Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM [53.05486269607166]
Multimodal large language models (MLLMs) have significantly enhanced performance across benchmarks. Existing detection methods for unimodal large language models (LLMs) are inadequate for MLLMs due to multimodal data complexity and multi-phase training. We analyze multimodal data contamination using our analytical framework, MM-Detect, which defines two contamination categories: unimodal and cross-modal.
arXiv Detail & Related papers (2024-11-06T10:44:15Z)
- The Misclassification Likelihood Matrix: Some Classes Are More Likely To Be Misclassified Than Others [1.654278807602897]
This study introduces the Misclassification Likelihood Matrix (MLM) as a novel tool for quantifying the reliability of neural network predictions under distribution shifts.
The implications of this work extend beyond image classification, with ongoing applications in autonomous systems, such as self-driving cars.
arXiv Detail & Related papers (2024-07-10T16:43:14Z)
- Assessing biomedical knowledge robustness in large language models by query-efficient sampling attacks [0.6282171844772422]
An increasing depth of parametric domain knowledge in large language models (LLMs) is fueling their rapid deployment in real-world applications. The recent discovery of named entities as adversarial examples in natural language processing tasks raises questions about their potential impact on the knowledge robustness of pre-trained and finetuned LLMs. We developed an embedding-space attack based on power-scaled distance-weighted sampling to assess the robustness of their biomedical knowledge.
arXiv Detail & Related papers (2024-02-16T09:29:38Z)
- Improving Biomedical Entity Linking with Retrieval-enhanced Learning [53.24726622142558]
$k$NN-BioEL provides a BioEL model with the ability to reference similar instances from the entire training corpus as clues for prediction.
We show that $k$NN-BioEL outperforms state-of-the-art baselines on several datasets.
arXiv Detail & Related papers (2023-12-15T14:04:23Z)
- Learning in Imperfect Environment: Multi-Label Classification with Long-Tailed Distribution and Partial Labels [53.68653940062605]
We introduce a novel task, Partial labeling and Long-Tailed Multi-Label Classification (PLT-MLC).
We find that most LT-MLC and PL-MLC approaches fail to solve the degradation-MLC.
We propose an end-to-end learning framework: COrrection → ModificatIon → balanCe.
arXiv Detail & Related papers (2023-04-20T20:05:08Z)
- Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
State-of-the-art neural language models (LMs) represented by Transformers are highly complex.
This paper proposes a full Bayesian learning framework for Transformer LM estimation.
arXiv Detail & Related papers (2021-02-09T10:55:27Z)
- Revisiting LSTM Networks for Semi-Supervised Text Classification via Mixed Objective Function [106.69643619725652]
We develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results.
We report state-of-the-art results for the text classification task on several benchmark datasets.
arXiv Detail & Related papers (2020-09-08T21:55:22Z)