FishNet++: Analyzing the capabilities of Multimodal Large Language Models in marine biology
- URL: http://arxiv.org/abs/2509.25564v1
- Date: Mon, 29 Sep 2025 22:39:58 GMT
- Title: FishNet++: Analyzing the capabilities of Multimodal Large Language Models in marine biology
- Authors: Faizan Farooq Khan, Yousef Radwan, Eslam Abdelrahman, Abdulwahab Felemban, Aymen Mir, Nico K. Michiels, Andrew J. Temple, Michael L. Berumen, Mohamed Elhoseiny
- Abstract summary: Multimodal large language models (MLLMs) have demonstrated impressive cross-domain capabilities, yet their proficiency in specialized scientific fields like marine biology remains underexplored. In this work, we systematically evaluate state-of-the-art MLLMs and reveal significant limitations in their ability to perform fine-grained recognition of fish species. We introduce FishNet++, a large-scale, multimodal benchmark. FishNet++ significantly extends existing resources with 35,133 textual descriptions for multimodal learning, 706,426 key-point annotations for morphological studies, and 119,399 bounding boxes for detection.
- Score: 28.683426892594458
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal large language models (MLLMs) have demonstrated impressive cross-domain capabilities, yet their proficiency in specialized scientific fields like marine biology remains underexplored. In this work, we systematically evaluate state-of-the-art MLLMs and reveal significant limitations in their ability to perform fine-grained recognition of fish species, with the best open-source models achieving less than 10\% accuracy. This task is critical for monitoring marine ecosystems under anthropogenic pressure. To address this gap and investigate whether these failures stem from a lack of domain knowledge, we introduce FishNet++, a large-scale, multimodal benchmark. FishNet++ significantly extends existing resources with 35,133 textual descriptions for multimodal learning, 706,426 key-point annotations for morphological studies, and 119,399 bounding boxes for detection. By providing this comprehensive suite of annotations, our work facilitates the development and evaluation of specialized vision-language models capable of advancing aquatic science.
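The abstract names three annotation types per image: textual descriptions, morphological key points, and detection bounding boxes. The paper does not specify FishNet++'s file format, so the following is only a hypothetical sketch of how such a multi-annotation record might be modeled; the field names and the (x, y, w, h) box convention are assumptions, not the benchmark's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical record mirroring the three FishNet++ annotation types
# described in the abstract; field names and conventions are illustrative.
@dataclass
class FishAnnotation:
    species: str                                      # fine-grained species label
    description: str                                  # textual description for multimodal learning
    keypoints: List[Tuple[float, float]] = field(default_factory=list)  # morphological (x, y) key points
    bbox: Tuple[float, float, float, float] = (0.0, 0.0, 0.0, 0.0)      # detection box, assumed (x, y, w, h)

    def bbox_area(self) -> float:
        """Area of the detection box, e.g. for filtering tiny instances."""
        _, _, w, h = self.bbox
        return w * h

ann = FishAnnotation(
    species="Amphiprion bicinctus",
    description="Orange body with two white vertical bars; anemone-associated.",
    keypoints=[(12.0, 34.0), (56.0, 78.0)],
    bbox=(10.0, 20.0, 120.0, 80.0),
)
print(ann.bbox_area())  # 9600.0
```

A dataclass like this keeps the three annotation streams attached to a single instance, which is the property the benchmark's "comprehensive suite of annotations" implies for training vision-language models.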
Related papers
- HiSciBench: A Hierarchical Multi-disciplinary Benchmark for Scientific Intelligence from Reading to Discovery [50.8841471967624]
HiSciBench is a hierarchical benchmark designed to evaluate foundation models across five levels that mirror the complete scientific workflow. HiSciBench contains 8,735 carefully curated instances spanning six major scientific disciplines.
arXiv Detail & Related papers (2025-12-28T12:08:05Z) - RoBiologyDataChoiceQA: A Romanian Dataset for improving Biology understanding of Large Language Models [0.15293427903448023]
Large language models (LLMs) have demonstrated significant potential across various natural language processing (NLP) tasks. This study introduces a novel Romanian-language dataset for multiple-choice biology questions.
arXiv Detail & Related papers (2025-09-30T05:41:50Z) - Jellyfish Species Identification: A CNN Based Artificial Neural Network Approach [0.0]
Jellyfish play a crucial role in maintaining marine ecosystems but pose significant challenges for biodiversity and conservation. In this study, we proposed a deep learning framework for jellyfish species detection and classification using an underwater image dataset.
arXiv Detail & Related papers (2025-07-15T09:10:36Z) - A Framework for Multi-View Multiple Object Tracking using Single-View Multi-Object Trackers on Fish Data [0.559239450391449]
This thesis adapts state-of-the-art single-view MOT models, FairMOT and YOLOv8, for underwater fish detection and tracking in ecological studies. The proposed framework detects fish entities with a relative accuracy of 47% and employs stereo-matching techniques to produce a novel 3D output.
arXiv Detail & Related papers (2025-05-22T18:12:08Z) - VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models [121.03333569013148]
We introduce VisuLogic: a benchmark of 1,000 human-verified problems across six categories. These types of questions can be evaluated to assess the visual reasoning capabilities of MLLMs from multiple perspectives. Most models score below 30% accuracy, only slightly above the 25% random baseline and far below the 51.4% achieved by humans.
arXiv Detail & Related papers (2025-04-21T17:59:53Z) - Biological Sequence with Language Model Prompting: A Survey [14.270959261105968]
Large language models (LLMs) have emerged as powerful tools for addressing challenges across diverse domains. This paper systematically investigates the application of prompt-based methods with LLMs to biological sequences.
arXiv Detail & Related papers (2025-03-06T06:28:36Z) - Biology-Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models [55.74944165932666]
We introduce Biology-Instructions, the first large-scale instruction-tuning dataset for multi-omics biological sequences. This dataset bridges large language models (LLMs) and complex biological sequence-related tasks, enhancing their versatility and reasoning. We also highlight significant limitations of current state-of-the-art LLMs on multi-omics tasks without specialized training.
arXiv Detail & Related papers (2024-12-26T12:12:23Z) - Data-Efficient Inference of Neural Fluid Fields via SciML Foundation Model [49.06911227670408]
We show that a SciML foundation model can significantly improve the data efficiency of inferring real-world 3D fluid dynamics with improved generalization. We equip neural fluid fields with a novel collaborative training approach that utilizes augmented views and fluid features extracted by our foundation model.
arXiv Detail & Related papers (2024-12-18T14:39:43Z) - An Evaluation of Large Language Models in Bioinformatics Research [52.100233156012756]
We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks.
These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems.
Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
arXiv Detail & Related papers (2024-02-21T11:27:31Z) - A quantitative analysis of knowledge-learning preferences in large language models in molecular science [24.80165173525286]
Large language models (LLMs) introduce a fresh research paradigm to tackle scientific problems from a natural language processing (NLP) perspective. LLMs significantly enhance our understanding and generation of molecules, often surpassing existing methods with their capabilities to decode and synthesize complex molecular patterns. We propose a multi-modal benchmark, named ChEBI-20-MM, and perform 1263 experiments to assess the model's compatibility with data modalities and knowledge acquisition.
arXiv Detail & Related papers (2024-02-06T16:12:36Z) - From Training-Free to Adaptive: Empirical Insights into MLLMs' Understanding of Detection Information [32.57246173437492]
Vision detection models excel at recognizing fine-grained image details. One effective strategy is to infuse detection information in text format, which has proven simple and effective. This paper addresses the question: How does training impact MLLMs' understanding of infused textual detection information?
arXiv Detail & Related papers (2024-01-31T16:38:32Z) - Advancing bioinformatics with large language models: components, applications and perspectives [12.728981464533918]
Large language models (LLMs) are a class of artificial intelligence models based on deep learning. We will provide a comprehensive overview of the essential components of large language models (LLMs) in bioinformatics. Key aspects covered include tokenization methods for diverse data types, the architecture of transformer models, and the core attention mechanism.
arXiv Detail & Related papers (2024-01-08T17:26:59Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.