BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning
- URL: http://arxiv.org/abs/2505.23883v1
- Date: Thu, 29 May 2025 17:48:20 GMT
- Title: BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning
- Authors: Jianyang Gu, Samuel Stevens, Elizabeth G Campolongo, Matthew J Thompson, Net Zhang, Jiaman Wu, Andrei Kopanev, Zheda Mai, Alexander E. White, James Balhoff, Wasila Dahdul, Daniel Rubenstein, Hilmar Lapp, Tanya Berger-Wolf, Wei-Lun Chao, Yu Su
- Abstract summary: We find emergent behaviors in biological vision models via large-scale contrastive vision-language training. We train BioCLIP 2 on TreeOfLife-200M to distinguish different species. We identify emergent properties in the learned embedding space of BioCLIP 2.
- Score: 51.341003735575335
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models trained at scale exhibit remarkable emergent behaviors, learning new capabilities beyond their initial training objectives. We find such emergent behaviors in biological vision models via large-scale contrastive vision-language training. To achieve this, we first curate TreeOfLife-200M, comprising 214 million images of living organisms, the largest and most diverse biological organism image dataset to date. We then train BioCLIP 2 on TreeOfLife-200M to distinguish different species. Despite the narrow training objective, BioCLIP 2 yields extraordinary accuracy when applied to various biological visual tasks such as habitat classification and trait prediction. We identify emergent properties in the learned embedding space of BioCLIP 2. At the inter-species level, the embedding distribution of different species aligns closely with functional and ecological meanings (e.g., beak sizes and habitats). At the intra-species level, instead of being diminished, the intra-species variations (e.g., life stages and sexes) are preserved and better separated in subspaces orthogonal to inter-species distinctions. We provide formal proof and analyses to explain why hierarchical supervision and contrastive objectives encourage these emergent properties. Crucially, our results reveal that these properties become increasingly significant with larger-scale training data, leading to a biologically meaningful embedding space.
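For a concrete picture of the training signal, the sketch below shows a generic CLIP-style symmetric contrastive loss in which each image is paired with a caption built from its taxonomic label (kingdom down to species), the way the BioCLIP line of models injects hierarchical supervision. This is a minimal illustration under assumptions, not the authors' released training code; `taxonomy_to_text`, the toy batch, and the placeholder encoder outputs are hypothetical.

```python
# Minimal sketch (not the authors' code): a CLIP-style symmetric contrastive
# loss where the text side is a taxonomic string, so species sharing higher
# ranks share caption tokens. Assumes PyTorch; encoders are stubbed out.
import torch
import torch.nn.functional as F


def taxonomy_to_text(ranks):
    """Hypothetical helper: join taxonomic ranks into a caption,
    e.g. ["Animalia", "Chordata", "Aves", ..., "Haliaeetus leucocephalus"]."""
    return "a photo of " + " ".join(ranks)


def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over the in-batch image-text similarity matrix."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature         # (B, B) similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)             # image -> its caption
    loss_t2i = F.cross_entropy(logits.t(), targets)         # caption -> its image
    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    # Random tensors stand in for image_encoder(images) and
    # text_encoder(taxonomy_to_text(ranks)): a batch of 8 with 512-d embeddings.
    image_emb = torch.randn(8, 512)
    text_emb = torch.randn(8, 512)
    print(clip_contrastive_loss(image_emb, text_emb).item())
```

Because captions for related species overlap at the higher taxonomic ranks, pulling each image toward its own caption implicitly pulls related species toward one another, which is consistent with the hierarchically structured embedding space the abstract attributes to this objective.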
Related papers
- AnimalClue: Recognizing Animals by their Traces [43.09184077724619]
AnimalClue is the first large-scale dataset for species identification from images of indirect evidence.<n>It covers 968 species, 200 families, and 65 orders.<n>Unlike existing datasets, AnimalClue presents unique challenges for classification, detection, and instance segmentation tasks.
arXiv Detail & Related papers (2025-07-27T11:48:03Z)
- From Images to Insights: Explainable Biodiversity Monitoring with Plain Language Habitat Explanations [4.12825661607328]
We propose an end-to-end visual-to-causal framework that transforms a species image into interpretable causal insights about its habitat preference. The system integrates species recognition, global occurrence retrieval, pseudo-absence sampling, and climate data extraction. We generate statistically grounded, human-readable causal explanations from structured templates and large language models.
arXiv Detail & Related papers (2025-06-12T10:33:30Z)
- CrypticBio: A Large Multimodal Dataset for Visually Confusing Biodiversity [3.73232466691291]
We present CrypticBio, the largest publicly available dataset of visually confusing species. Curated from real-world trends in species misidentification among community annotators of iNaturalist, CrypticBio contains 52K unique cryptic groups spanning 67K species.
arXiv Detail & Related papers (2025-05-16T14:35:56Z)
- BeetleVerse: A study on taxonomic classification of ground beetles [0.310688583550805]
Ground beetles are a highly sensitive and speciose biological indicator, making them vital for monitoring biodiversity. In this paper, we evaluate 12 vision models on taxonomic classification across four diverse, long-tailed datasets.
arXiv Detail & Related papers (2025-04-18T01:06:37Z)
- G2PDiffusion: Cross-Species Genotype-to-Phenotype Prediction via Evolutionary Diffusion [108.94237816552024]
We propose the first genotype-to-phenotype diffusion model (G2PDiffusion) that generates morphological images from DNA. The model contains three novel components: 1) an MSA retrieval engine that identifies conserved and co-evolutionary patterns; 2) an environment-aware MSA conditional encoder that effectively models complex genotype-environment interactions; and 3) an adaptive phenomic alignment module to improve genotype-phenotype consistency.
arXiv Detail & Related papers (2025-02-07T06:16:31Z)
- Biology Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models [51.316001071698224]
We introduce Biology-Instructions, the first large-scale multi-omics instruction-tuning dataset for biological sequences. This dataset can bridge the gap between large language models (LLMs) and complex biological sequence-related tasks. We also develop a strong baseline called ChatMultiOmics with a novel three-stage training pipeline.
arXiv Detail & Related papers (2024-12-26T12:12:23Z)
- DivShift: Exploring Domain-Specific Distribution Shifts in Large-Scale, Volunteer-Collected Biodiversity Datasets [0.0]
Large-scale, volunteer-collected datasets of community-identified natural-world imagery such as iNaturalist have enabled marked performance gains for fine-grained visual classification of species using machine learning methods. Here we introduce Diversity Shift (DivShift), a framework for quantifying the effects of domain-specific distribution shifts on machine learning model performance. To diagnose the performance effects of biases specific to volunteer-collected biodiversity data, we also introduce DivShift - North American West Coast (DivShift-NAWC), a curated dataset of almost 7.5 million iNaturalist images across the western coast of North America, partitioned across five types of expert-verified bias.
arXiv Detail & Related papers (2024-10-17T23:56:30Z)
- BioCLIP: A Vision Foundation Model for the Tree of Life [34.187429586642146]
We release TreeOfLife-10M, the largest and most diverse ML-ready dataset of biology images.
We then develop BioCLIP, a foundation model for the tree of life.
We rigorously benchmark our approach on diverse fine-grained biology classification tasks.
arXiv Detail & Related papers (2023-11-30T18:49:43Z)
- Discovering Novel Biological Traits From Images Using Phylogeny-Guided Neural Networks [10.372001949268636]
We present a novel approach for discovering evolutionary traits directly from images without relying on trait labels.
Our proposed approach, Phylo-NN, encodes the image of an organism into a sequence of quantized feature vectors.
We demonstrate the effectiveness of our approach in producing biologically meaningful results in a number of downstream tasks.
arXiv Detail & Related papers (2023-06-05T20:22:05Z)
- Deep Low-Shot Learning for Biological Image Classification and Visualization from Limited Training Samples [52.549928980694695]
In situ hybridization (ISH) gene expression pattern images from the same developmental stage are compared.
However, labeling training data with precise stages is very time-consuming, even for biologists.
We propose a deep two-step low-shot learning framework to accurately classify ISH images using limited training images.
arXiv Detail & Related papers (2020-10-20T06:06:06Z)
- Transferring Dense Pose to Proximal Animal Classes [83.84439508978126]
We show that it is possible to transfer the knowledge existing in dense pose recognition for humans, as well as in more general object detectors and segmenters, to the problem of dense pose recognition in other classes.
We do this by establishing a DensePose model for the new animal which is also geometrically aligned to humans.
We also introduce two benchmark datasets labelled in the manner of DensePose for the class chimpanzee and use them to evaluate our approach.
arXiv Detail & Related papers (2020-02-28T21:43:53Z)
- Automatic image-based identification and biomass estimation of invertebrates [70.08255822611812]
Time-consuming sorting and identification of taxa pose strong limitations on how many insect samples can be processed.
We propose to replace the standard manual approach of human expert-based sorting and identification with an automatic image-based technology.
We use state-of-the-art ResNet-50 and InceptionV3 CNNs for the classification task.
arXiv Detail & Related papers (2020-02-05T21:38:57Z)