BioCLIP: A Vision Foundation Model for the Tree of Life
- URL: http://arxiv.org/abs/2311.18803v3
- Date: Tue, 14 May 2024 19:53:18 GMT
- Title: BioCLIP: A Vision Foundation Model for the Tree of Life
- Authors: Samuel Stevens, Jiaman Wu, Matthew J Thompson, Elizabeth G Campolongo, Chan Hee Song, David Edward Carlyn, Li Dong, Wasila M Dahdul, Charles Stewart, Tanya Berger-Wolf, Wei-Lun Chao, Yu Su
- Abstract summary: We release TreeOfLife-10M, the largest and most diverse ML-ready dataset of biology images.
We then develop BioCLIP, a foundation model for the tree of life.
We rigorously benchmark our approach on diverse fine-grained biology classification tasks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Images of the natural world, collected by a variety of cameras, from drones to individual phones, are increasingly abundant sources of biological information. There is an explosion of computational methods and tools, particularly computer vision, for extracting biologically relevant information from images for science and conservation. Yet most of these are bespoke approaches designed for a specific task and are not easily adaptable or extendable to new questions, contexts, and datasets. A vision model for general organismal biology questions on images is of timely need. To approach this, we curate and release TreeOfLife-10M, the largest and most diverse ML-ready dataset of biology images. We then develop BioCLIP, a foundation model for the tree of life, leveraging the unique properties of biology captured by TreeOfLife-10M, namely the abundance and variety of images of plants, animals, and fungi, together with the availability of rich structured biological knowledge. We rigorously benchmark our approach on diverse fine-grained biology classification tasks and find that BioCLIP consistently and substantially outperforms existing baselines (by 16% to 17% absolute). Intrinsic evaluation reveals that BioCLIP has learned a hierarchical representation conforming to the tree of life, shedding light on its strong generalizability. https://imageomics.github.io/bioclip has models, data and code.
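The zero-shot classification the abstract describes follows the standard CLIP recipe: embed the image and each candidate taxonomic label, then pick the label whose text embedding is most similar. A minimal sketch of that matching step, using toy hand-made embeddings (in practice they would come from the released BioCLIP image and text encoders):

```python
# Sketch of CLIP-style zero-shot classification over taxonomic labels.
# The embeddings below are illustrative toy vectors, not BioCLIP outputs.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def zero_shot_classify(image_emb, label_embs):
    """Return the label whose text embedding is most similar to the image."""
    return max(label_embs, key=lambda name: cosine(image_emb, label_embs[name]))

# Toy example: hypothetical 3-d embeddings for two taxonomic label strings.
labels = {
    "Animalia Chordata Aves": [0.9, 0.1, 0.0],
    "Plantae Tracheophyta Magnoliopsida": [0.0, 0.2, 0.9],
}
print(zero_shot_classify([0.8, 0.2, 0.1], labels))
```

Because BioCLIP's text side is trained on full taxonomic names, the label strings here mimic a kingdom-phylum-class prefix; the matching logic itself is ordinary cosine similarity.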
Related papers
- μ-Bench: A Vision-Language Benchmark for Microscopy Understanding [43.27182445778988]
Vision-language models (VLMs) offer a promising solution for large-scale biological image analysis.
There is a lack of standardized, diverse, and large-scale vision-language benchmarks to evaluate VLMs.
mu-Bench is an expert-curated benchmark encompassing 22 biomedical tasks.
arXiv Detail & Related papers (2024-07-01T20:30:26Z) - Arboretum: A Large Multimodal Dataset Enabling AI for Biodiversity [14.949271003068107]
This dataset includes 134.6 million images, surpassing existing datasets in scale by an order of magnitude.
The dataset encompasses image-language paired data for a diverse set of species from birds (Aves), spiders/ticks/mites (Arachnida), insects (Insecta), plants (Plantae), fungi/mushrooms (Fungi), snails (Mollusca), and snakes/lizards (Reptilia)
arXiv Detail & Related papers (2024-06-25T17:09:54Z) - BIOSCAN-CLIP: Bridging Vision and Genomics for Biodiversity Monitoring at Scale [22.548901362741628]
We introduce a multimodal approach combining both, using CLIP-style contrastive learning to align images, DNA barcodes, and textual data in a unified embedding space.
Our method surpasses previous single-modality approaches in accuracy by over 11% on zero-shot learning tasks.
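The CLIP-style contrastive objective mentioned above pulls matched pairs (e.g. an image and its DNA barcode) together and pushes mismatched in-batch pairs apart via a softmax over similarities (InfoNCE). A rough sketch with toy vectors; the function and batch below are illustrative, not from the paper:

```python
# InfoNCE-style contrastive loss: for each anchor, a softmax over in-batch
# similarities should assign high probability to its matched positive.
import math

def info_nce(anchors, positives, temperature=0.07):
    """Mean cross-entropy of matching anchor i to positive i within the batch."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    loss = 0.0
    for i, a in enumerate(anchors):
        logits = [dot(a, p) / temperature for p in positives]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_z - logits[i]  # -log softmax at the matched index
    return loss / len(anchors)

# Toy batch: image embeddings and their paired DNA-barcode embeddings.
img = [[1.0, 0.0], [0.0, 1.0]]
dna = [[0.9, 0.1], [0.1, 0.9]]
print(info_nce(img, dna))  # loss is small when matched pairs are most similar
```

Aligning a third modality (text) amounts to summing such pairwise losses over modality pairs within the shared embedding space.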
arXiv Detail & Related papers (2024-05-27T17:57:48Z) - BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning [77.90250740041411]
This paper introduces BioT5+, an extension of the BioT5 framework, tailored to enhance biological research and drug discovery.
BioT5+ incorporates several novel features: integration of IUPAC names for molecular understanding, inclusion of extensive bio-text and molecule data from sources like bioRxiv and PubChem, multi-task instruction tuning for generality across tasks, and a numerical tokenization technique for improved processing of numerical data.
arXiv Detail & Related papers (2024-02-27T12:43:09Z) - BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations [54.97423244799579]
BioT5 is a pre-training framework that enriches cross-modal integration in biology with chemical knowledge and natural language associations.
BioT5 distinguishes between structured and unstructured knowledge, leading to more effective utilization of information.
arXiv Detail & Related papers (2023-10-11T07:57:08Z) - A Step Towards Worldwide Biodiversity Assessment: The BIOSCAN-1M Insect Dataset [18.211840156134784]
This paper presents a curated million-image dataset, primarily to train computer-vision models capable of providing image-based taxonomic assessment.
The dataset also presents compelling characteristics, the study of which would be of interest to the broader machine learning community.
arXiv Detail & Related papers (2023-07-19T20:54:08Z) - BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs [48.376109878173956]
We present PMC-15M, a novel dataset that is two orders of magnitude larger than existing biomedical multimodal datasets.
PMC-15M contains 15 million biomedical image-text pairs collected from 4.4 million scientific articles.
Based on PMC-15M, we have pretrained BiomedCLIP, a multimodal foundation model, with domain-specific adaptations tailored to biomedical vision-language processing.
arXiv Detail & Related papers (2023-03-02T02:20:04Z) - Taxonomy and evolution predicting using deep learning in images [9.98733710208427]
This study creates a novel recognition framework by systematically studying the mushroom image recognition problem.
We present the first method to map images to DNA: an encoder maps an image to genetic distances, and a pre-trained decoder then decodes the DNA.
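One way to realize the image-to-genetic-distance idea described above is a distance-matching loss: train the encoder so that pairwise distances between image embeddings approximate known genetic distances. The sketch below shows only that loss on toy data; the encoder, decoder, and matrices are placeholders, not the paper's actual architecture:

```python
# Distance-matching loss: mean squared error between pairwise embedding
# distances and a given symmetric genetic-distance matrix (toy data).
import math

def embedding_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_matching_loss(embeddings, genetic_dist):
    """MSE over all unordered pairs (i, j), i < j."""
    n, total, count = len(embeddings), 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = embedding_distance(embeddings[i], embeddings[j])
            total += (d - genetic_dist[i][j]) ** 2
            count += 1
    return total / count

# Toy: three species embeddings and their genetic-distance matrix.
embs = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
gdist = [[0.0, 1.0, 2.0],
         [1.0, 0.0, 2.2],
         [2.0, 2.2, 0.0]]
print(distance_matching_loss(embs, gdist))
```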
arXiv Detail & Related papers (2022-06-28T13:54:14Z) - Learning multi-scale functional representations of proteins from single-cell microscopy data [77.34726150561087]
We show that simple convolutional networks trained on localization classification can learn protein representations that encapsulate diverse functional information.
We also propose a robust evaluation strategy to assess the quality of protein representations across different scales of biological function.
arXiv Detail & Related papers (2022-05-24T00:00:07Z) - Automatic image-based identification and biomass estimation of invertebrates [70.08255822611812]
Time-consuming sorting and identification of taxa pose strong limitations on how many insect samples can be processed.
We propose to replace the standard manual approach of human expert-based sorting and identification with an automatic image-based technology.
We use state-of-the-art ResNet-50 and InceptionV3 CNNs for the classification task.
arXiv Detail & Related papers (2020-02-05T21:38:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.