FishNet++: Analyzing the capabilities of Multimodal Large Language Models in marine biology
- URL: http://arxiv.org/abs/2509.25564v1
- Date: Mon, 29 Sep 2025 22:39:58 GMT
- Title: FishNet++: Analyzing the capabilities of Multimodal Large Language Models in marine biology
- Authors: Faizan Farooq Khan, Yousef Radwan, Eslam Abdelrahman, Abdulwahab Felemban, Aymen Mir, Nico K. Michiels, Andrew J. Temple, Michael L. Berumen, Mohamed Elhoseiny
- Abstract summary: Multimodal large language models (MLLMs) have demonstrated impressive cross-domain capabilities, yet their proficiency in specialized scientific fields like marine biology remains underexplored. In this work, we systematically evaluate state-of-the-art MLLMs and reveal significant limitations in their ability to perform fine-grained recognition of fish species. We introduce FishNet++, a large-scale, multimodal benchmark. FishNet++ significantly extends existing resources with 35,133 textual descriptions for multimodal learning, 706,426 key-point annotations for morphological studies, and 119,399 bounding boxes for detection.
- Score: 28.683426892594458
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal large language models (MLLMs) have demonstrated impressive cross-domain capabilities, yet their proficiency in specialized scientific fields like marine biology remains underexplored. In this work, we systematically evaluate state-of-the-art MLLMs and reveal significant limitations in their ability to perform fine-grained recognition of fish species, with the best open-source models achieving less than 10\% accuracy. This task is critical for monitoring marine ecosystems under anthropogenic pressure. To address this gap and investigate whether these failures stem from a lack of domain knowledge, we introduce FishNet++, a large-scale, multimodal benchmark. FishNet++ significantly extends existing resources with 35,133 textual descriptions for multimodal learning, 706,426 key-point annotations for morphological studies, and 119,399 bounding boxes for detection. By providing this comprehensive suite of annotations, our work facilitates the development and evaluation of specialized vision-language models capable of advancing aquatic science.
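The abstract names three annotation types per image: textual descriptions, morphological key points, and detection bounding boxes. The paper does not specify FishNet++'s file format, so the following is only a hypothetical sketch of how such a multi-annotation record might be modeled; the field names and the (x, y, w, h) box convention are assumptions, not the benchmark's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical record mirroring the three FishNet++ annotation types
# described in the abstract; field names and conventions are illustrative.
@dataclass
class FishAnnotation:
    species: str                                      # fine-grained species label
    description: str                                  # textual description for multimodal learning
    keypoints: List[Tuple[float, float]] = field(default_factory=list)  # morphological (x, y) key points
    bbox: Tuple[float, float, float, float] = (0.0, 0.0, 0.0, 0.0)      # detection box, assumed (x, y, w, h)

    def bbox_area(self) -> float:
        """Area of the detection box, e.g. for filtering tiny instances."""
        _, _, w, h = self.bbox
        return w * h

ann = FishAnnotation(
    species="Amphiprion bicinctus",
    description="Orange body with two white vertical bars; anemone-associated.",
    keypoints=[(12.0, 34.0), (56.0, 78.0)],
    bbox=(10.0, 20.0, 120.0, 80.0),
)
print(ann.bbox_area())  # 9600.0
```

A dataclass like this keeps the three annotation streams attached to a single instance, which is the property the benchmark's "comprehensive suite of annotations" implies for training vision-language models.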
Related papers
- HiSciBench: A Hierarchical Multi-disciplinary Benchmark for Scientific Intelligence from Reading to Discovery [50.8841471967624]
HiSciBench is a hierarchical benchmark designed to evaluate foundation models across five levels that mirror the complete scientific workflow. HiSciBench contains 8,735 carefully curated instances spanning six major scientific disciplines.
arXiv Detail & Related papers (2025-12-28T12:08:05Z) - RoBiologyDataChoiceQA: A Romanian Dataset for improving Biology understanding of Large Language Models [0.15293427903448023]
Large language models (LLMs) have demonstrated significant potential across various natural language processing (NLP) tasks. This study introduces a novel Romanian-language dataset for multiple-choice biology questions.
arXiv Detail & Related papers (2025-09-30T05:41:50Z) - Jellyfish Species Identification: A CNN Based Artificial Neural Network Approach [0.0]
Jellyfish play a crucial role in maintaining marine ecosystems but pose significant challenges for biodiversity and conservation. In this study, we proposed a deep learning framework for jellyfish species detection and classification using an underwater image dataset.
arXiv Detail & Related papers (2025-07-15T09:10:36Z) - A Framework for Multi-View Multiple Object Tracking using Single-View Multi-Object Trackers on Fish Data [0.559239450391449]
This thesis adapts state-of-the-art single-view MOT models, FairMOT and YOLOv8, for underwater fish detection and tracking in ecological studies. The proposed framework detects fish entities with a relative accuracy of 47% and employs stereo-matching techniques to produce a novel 3D output.
arXiv Detail & Related papers (2025-05-22T18:12:08Z) - VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models [121.03333569013148]
We introduce VisuLogic: a benchmark of 1,000 human-verified problems across six categories. These types of questions can be evaluated to assess the visual reasoning capabilities of MLLMs from multiple perspectives. Most models score below 30% accuracy, only slightly above the 25% random baseline and far below the 51.4% achieved by humans.
arXiv Detail & Related papers (2025-04-21T17:59:53Z) - Biological Sequence with Language Model Prompting: A Survey [14.270959261105968]
Large language models (LLMs) have emerged as powerful tools for addressing challenges across diverse domains. This paper systematically investigates the application of prompt-based methods with LLMs to biological sequences.
arXiv Detail & Related papers (2025-03-06T06:28:36Z) - Biology-Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models [55.74944165932666]
We introduce Biology-Instructions, the first large-scale instruction-tuning dataset for multi-omics biological sequences. This dataset bridges large language models (LLMs) and complex biological sequence-related tasks, enhancing their versatility and reasoning. We also highlight significant limitations of current state-of-the-art LLMs on multi-omics tasks without specialized training.
arXiv Detail & Related papers (2024-12-26T12:12:23Z) - Data-Efficient Inference of Neural Fluid Fields via SciML Foundation Model [49.06911227670408]
We show that a SciML foundation model can significantly improve the data efficiency of inferring real-world 3D fluid dynamics with improved generalization. We equip neural fluid fields with a novel collaborative training approach that utilizes augmented views and fluid features extracted by our foundation model.
arXiv Detail & Related papers (2024-12-18T14:39:43Z) - An Evaluation of Large Language Models in Bioinformatics Research [52.100233156012756]
We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks.
These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems.
Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
arXiv Detail & Related papers (2024-02-21T11:27:31Z) - A quantitative analysis of knowledge-learning preferences in large language models in molecular science [24.80165173525286]
Large language models (LLMs) introduce a fresh research paradigm to tackle scientific problems from a natural language processing (NLP) perspective. LLMs significantly enhance our understanding and generation of molecules, often surpassing existing methods with their capabilities to decode and synthesize complex molecular patterns. We propose a multi-modal benchmark, named ChEBI-20-MM, and perform 1263 experiments to assess the model's compatibility with data modalities and knowledge acquisition.
arXiv Detail & Related papers (2024-02-06T16:12:36Z) - From Training-Free to Adaptive: Empirical Insights into MLLMs' Understanding of Detection Information [32.57246173437492]
Vision detection models excel at recognizing fine-grained image details. One effective strategy is to infuse detection information in text format, which has proven simple and effective. This paper addresses the question: How does training impact MLLMs' understanding of infused textual detection information?
arXiv Detail & Related papers (2024-01-31T16:38:32Z) - Advancing bioinformatics with large language models: components, applications and perspectives [12.728981464533918]
Large language models (LLMs) are a class of artificial intelligence models based on deep learning. We will provide a comprehensive overview of the essential components of large language models (LLMs) in bioinformatics. Key aspects covered include tokenization methods for diverse data types, the architecture of transformer models, and the core attention mechanism.
arXiv Detail & Related papers (2024-01-08T17:26:59Z) - Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.