VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images
- URL: http://arxiv.org/abs/2408.16176v1
- Date: Wed, 28 Aug 2024 23:53:57 GMT
- Title: VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images
- Authors: M. Maruf, Arka Daw, Kazi Sajeed Mehrab, Harish Babu Manogaran, Abhilash Neog, Medha Sawhney, Mridul Khurana, James P. Balhoff, Yasin Bakis, Bahadir Altintas, Matthew J. Thompson, Elizabeth G. Campolongo, Josef C. Uyeda, Hilmar Lapp, Henry L. Bart, Paula M. Mabee, Yu Su, Wei-Lun Chao, Charles Stewart, Tanya Berger-Wolf, Wasila Dahdul, Anuj Karpatne
- Abstract summary: We evaluate the effectiveness of 12 state-of-the-art (SOTA) VLMs in the field of organismal biology using a novel dataset, VLM4Bio.
We also explore the effects of applying prompting techniques and tests for reasoning hallucination on the performance of VLMs.
- Score: 21.497452524517783
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Images are increasingly becoming the currency for documenting biodiversity on the planet, providing novel opportunities for accelerating scientific discoveries in the field of organismal biology, especially with the advent of large vision-language models (VLMs). We ask if pre-trained VLMs can aid scientists in answering a range of biologically relevant questions without any additional fine-tuning. In this paper, we evaluate the effectiveness of 12 state-of-the-art (SOTA) VLMs in the field of organismal biology using a novel dataset, VLM4Bio, consisting of 469K question-answer pairs involving 30K images from three groups of organisms: fishes, birds, and butterflies, covering five biologically relevant tasks. We also explore the effects of applying prompting techniques and tests for reasoning hallucination on the performance of VLMs, shedding new light on the capabilities of current SOTA VLMs in answering biologically relevant questions using images. The code and datasets for running all the analyses reported in this paper can be found at https://github.com/sammarfy/VLM4Bio.
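The abstract describes a zero-shot evaluation of pretrained VLMs on image-grounded question-answer pairs. Below is a minimal sketch of what such an evaluation loop can look like; the CSV file name, column names, and the `query_vlm` stub are illustrative assumptions, not the actual layout of the released VLM4Bio dataset or codebase, which are documented in the linked repository.

```python
# Minimal sketch of a zero-shot VLM evaluation loop in the spirit of VLM4Bio.
# File name, column names, and the query_vlm stub are illustrative assumptions.
import csv
from typing import Callable

def evaluate(qa_csv: str, query_vlm: Callable[[str, str], str]) -> float:
    """Ask a pretrained VLM each question about an image and report exact-match accuracy."""
    correct, total = 0, 0
    with open(qa_csv, newline="") as f:
        # assumed columns: image_path, question, options, answer
        for row in csv.DictReader(f):
            prompt = (f"{row['question']}\nOptions: {row['options']}\n"
                      "Answer with the option text only.")
            prediction = query_vlm(row["image_path"], prompt)
            correct += int(prediction.strip().lower() == row["answer"].strip().lower())
            total += 1
    return correct / max(total, 1)

if __name__ == "__main__":
    # Plug in any pretrained VLM here (e.g., an open-source or API-backed model).
    dummy_vlm = lambda image_path, prompt: "placeholder answer"
    print(f"accuracy: {evaluate('vlm4bio_fish_qa.csv', dummy_vlm):.3f}")
```

The VLM itself is passed in as a callable so the same loop can compare different pretrained models, which mirrors the benchmark's goal of evaluating VLMs without additional fine-tuning.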
Related papers
- μ-Bench: A Vision-Language Benchmark for Microscopy Understanding [43.27182445778988]
Vision-language models (VLMs) offer a promising solution for large-scale biological image analysis.
There is a lack of standardized, diverse, and large-scale vision-language benchmarks to evaluate VLMs.
μ-Bench is an expert-curated benchmark encompassing 22 biomedical tasks.
arXiv Detail & Related papers (2024-07-01T20:30:26Z)
- BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity [19.003642885871546]
BIOSCAN-5M is a comprehensive dataset containing multi-modal information for over 5 million insect specimens.
We propose three benchmark experiments to demonstrate the impact of the multi-modal data types on the classification and clustering accuracy.
arXiv Detail & Related papers (2024-06-18T15:45:21Z)
- BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning [77.90250740041411]
This paper introduces BioT5+, an extension of the BioT5 framework, tailored to enhance biological research and drug discovery.
BioT5+ incorporates several novel features: integration of IUPAC names for molecular understanding, inclusion of extensive bio-text and molecule data from sources like bioRxiv and PubChem, multi-task instruction tuning for generality across tasks, and a numerical tokenization technique for improved processing of numerical data.
arXiv Detail & Related papers (2024-02-27T12:43:09Z)
- An Evaluation of Large Language Models in Bioinformatics Research [52.100233156012756]
We study the performance of large language models (LLMs) on a wide spectrum of crucial bioinformatics tasks.
These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems.
Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks.
arXiv Detail & Related papers (2024-02-21T11:27:31Z)
- BioCLIP: A Vision Foundation Model for the Tree of Life [34.187429586642146]
We release TreeOfLife-10M, the largest and most diverse ML-ready dataset of biology images.
We then develop BioCLIP, a foundation model for the tree of life.
We rigorously benchmark our approach on diverse fine-grained biology classification tasks.
arXiv Detail & Related papers (2023-11-30T18:49:43Z)
- Evaluating the Potential of Leading Large Language Models in Reasoning Biology Questions [33.81650223615028]
This study evaluated the capabilities of leading Large Language Models (LLMs) in answering conceptual biology questions.
The models were tested on a 108-question multiple-choice exam covering biology topics in molecular biology, biological techniques, metabolic engineering, and synthetic biology.
The results indicated GPT-4's proficiency in logical reasoning and its potential to aid biology research through capabilities like data analysis, hypothesis generation, and knowledge integration.
arXiv Detail & Related papers (2023-11-05T03:34:17Z)
- ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab [67.24684071577211]
The challenge of replicating research results has posed a significant impediment to the field of molecular biology.
We first curate a comprehensive multimodal dataset, named ProBio, as an initial step towards this objective.
Next, we devise two challenging benchmarks, transparent solution tracking and multimodal action recognition, to emphasize the unique characteristics and difficulties associated with activity understanding in BioLab settings.
arXiv Detail & Related papers (2023-11-01T14:44:01Z)
- BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations [54.97423244799579]
$\mathbf{BioT5}$ is a pre-training framework that enriches cross-modal integration in biology with chemical knowledge and natural language associations.
$\mathbf{BioT5}$ distinguishes between structured and unstructured knowledge, leading to more effective utilization of information.
arXiv Detail & Related papers (2023-10-11T07:57:08Z)
- LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day [85.19963303642427]
We propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images.
The model first learns to align biomedical vocabulary using the figure-caption pairs as is, then learns to master open-ended conversational semantics.
This enables us to train a Large Language and Vision Assistant for BioMedicine in less than 15 hours (with eight A100s).
arXiv Detail & Related papers (2023-06-01T16:50:07Z)
- Automatic image-based identification and biomass estimation of invertebrates [70.08255822611812]
Time-consuming sorting and identification of taxa pose strong limitations on how many insect samples can be processed.
We propose to replace the standard manual approach of human expert-based sorting and identification with an automatic image-based technology.
We use state-of-the-art ResNet-50 and InceptionV3 CNNs for the classification task; a minimal fine-tuning sketch follows this list.
arXiv Detail & Related papers (2020-02-05T21:38:57Z)
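As referenced in the last entry, the invertebrate-sorting paper relies on standard pretrained CNN classifiers such as ResNet-50. The sketch below shows one common way to fine-tune an ImageNet-pretrained ResNet-50 on a folder of labelled specimen images; the directory layout (one subfolder per taxon), batch size, learning rate, and epoch count are illustrative assumptions, not the paper's reported training configuration.

```python
# Minimal sketch: fine-tune an ImageNet-pretrained ResNet-50 on labelled images.
# Directory name and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision.models import resnet50, ResNet50_Weights

def train(data_dir: str = "invertebrates/train", epochs: int = 3) -> nn.Module:
    weights = ResNet50_Weights.DEFAULT
    # assumed layout: one subdirectory per taxon, each holding its specimen images
    dataset = datasets.ImageFolder(data_dir, transform=weights.transforms())
    loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=4)

    model = resnet50(weights=weights)
    model.fc = nn.Linear(model.fc.in_features, len(dataset.classes))  # replace the ImageNet head
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)

    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for epoch in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch + 1}: last batch loss {loss.item():.4f}")
    return model

if __name__ == "__main__":
    train()
```

Swapping `resnet50` for another torchvision backbone (e.g., an Inception variant) only changes the model construction lines; the data loading and training loop stay the same.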