SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications
- URL: http://arxiv.org/abs/2507.03578v1
- Date: Fri, 04 Jul 2025 13:48:12 GMT
- Title: SciVid: Cross-Domain Evaluation of Video Models in Scientific Applications
- Authors: Yana Hasson, Pauline Luc, Liliane Momeni, Maks Ovsjanikov, Guillaume Le Moing, Alina Kuznetsova, Ira Ktena, Jennifer J. Sun, Skanda Koppula, Dilara Gokay, Joseph Heyward, Etienne Pot, Andrew Zisserman
- Abstract summary: Video foundation models (FMs) hold considerable promise as general-purpose domain-agnostic approaches. We introduce SciVid, a benchmark comprising five tasks across medical computer vision, animal behavior, and weather forecasting. We adapt six leading ViFMs to SciVid using simple trainable readout modules, establishing strong baselines and demonstrating potential for effective transfer learning.
- Score: 63.92604046592333
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent years, there has been a proliferation of spatiotemporal foundation models in different scientific disciplines. While promising, these models are often domain-specific and are only assessed within the particular applications for which they are designed. Given that many tasks can be represented as video modeling problems, video foundation models (ViFMs) hold considerable promise as general-purpose domain-agnostic approaches. However, it is not known whether the knowledge acquired on large-scale but potentially out-of-domain data can be effectively transferred across diverse scientific disciplines, and if a single, pretrained ViFM can be competitive with domain-specific baselines. To address this, we introduce SciVid, a comprehensive benchmark comprising five *Sci*entific *Vid*eo tasks across medical computer vision, animal behavior, and weather forecasting. We adapt six leading ViFMs to SciVid using simple trainable readout modules, establishing strong baselines and demonstrating the potential for effective transfer learning. Specifically, we show that state-of-the-art results can be obtained in several applications by leveraging the general-purpose representations from ViFM backbones. Furthermore, our results reveal the limitations of existing ViFMs, and highlight opportunities for the development of generalizable models for high-impact scientific applications. We release our code at https://github.com/google-deepmind/scivid to facilitate further research in the development of ViFMs.
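The adaptation recipe the abstract describes (a frozen ViFM backbone feeding a small trainable readout) is a standard probing pattern. Below is a minimal sketch of that pattern, assuming a backbone that emits spatiotemporal feature tokens; this is illustrative, not the released SciVid code, and the module name, shapes, and the commented backbone call are hypothetical.

```python
import torch
import torch.nn as nn


class AttentionReadout(nn.Module):
    """Trainable cross-attention readout over frozen video-backbone tokens.

    Hypothetical sketch: the released SciVid repository defines the actual
    readout modules; names and shapes here are assumptions.
    """

    def __init__(self, feat_dim: int, num_outputs: int, num_heads: int = 8):
        super().__init__()
        # A single learned query pools the backbone's spatiotemporal tokens.
        self.query = nn.Parameter(torch.randn(1, 1, feat_dim))
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.head = nn.Linear(feat_dim, num_outputs)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, num_tokens, feat_dim) from a frozen ViFM, no gradients.
        q = self.query.expand(feats.shape[0], -1, -1)
        pooled, _ = self.attn(q, feats, feats)  # (batch, 1, feat_dim)
        return self.head(pooled.squeeze(1))     # task-specific prediction


# Only the readout's parameters are trained; the backbone stays frozen.
# feats = backbone(video).detach()            # hypothetical ViFM forward pass
readout = AttentionReadout(feat_dim=1024, num_outputs=5)
feats = torch.randn(2, 16 * 14 * 14, 1024)    # stand-in for real backbone tokens
print(readout(feats).shape)                   # torch.Size([2, 5])
```

Training only the readout keeps cross-backbone comparisons cheap and fair, since every ViFM contributes its features through the same lightweight interface.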
Related papers
- Designing a reliable lateral movement detector using a graph foundation model [0.0]
Foundation models have recently emerged as a new paradigm in machine learning (ML). These models are pre-trained on large and diverse datasets and can subsequently be applied to various downstream tasks with little or no retraining. We study the usability of graph foundation models (GFMs) in cybersecurity through the lens of one specific use case, namely lateral movement detection.
arXiv Detail & Related papers (2025-04-18T07:39:21Z) - Biomedical Foundation Model: A Survey [84.26268124754792]
Foundation models are large-scale pre-trained models that learn from extensive unlabeled datasets. These models can be adapted to various applications such as question answering and visual understanding. This survey explores the potential of foundation models across diverse domains within biomedical fields.
arXiv Detail & Related papers (2025-03-03T22:42:00Z) - AI Foundation Model for Heliophysics: Applications, Design, and Implementation [1.2851259989174175]
Foundation models (FMs) are pre-trained on large-scale datasets.
This paper provides our perspective on the criteria for designing an FM for heliophysics.
We believe that this is the first study to design an FM in the domain of heliophysics.
arXiv Detail & Related papers (2024-09-30T15:48:28Z) - SciDFM: A Large Language Model with Mixture-of-Experts for Science [18.748699390397363]
We introduce SciDFM, a mixture-of-experts LLM that is trained from scratch and is able to conduct college-level scientific reasoning.
We collect a large-scale training corpus containing numerous scientific papers and books from different disciplines as well as data from domain-specific databases.
We show that SciDFM achieves strong performance on general scientific benchmarks such as SciEval and SciQ, and that it reaches SOTA performance on domain-specific benchmarks among models of similar size.
arXiv Detail & Related papers (2024-09-27T03:00:29Z) - Probing Fine-Grained Action Understanding and Cross-View Generalization of Foundation Models [13.972809192907931]
Foundation models (FMs) are large neural networks trained on broad datasets.
Human activity recognition in video has advanced with FMs, driven by competition among different architectures.
This paper empirically evaluates how perspective changes affect different FMs in fine-grained human activity recognition.
arXiv Detail & Related papers (2024-07-22T12:59:57Z) - MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
We present a comprehensive dataset compiled from Nature Communications articles covering 72 scientific fields. We evaluated 19 proprietary and open-source models on two benchmark tasks, figure captioning and multiple-choice, and conducted human expert annotation. Fine-tuning Qwen2-VL-7B with our task-specific data achieved better performance than GPT-4o and even human experts in multiple-choice evaluations.
arXiv Detail & Related papers (2024-07-06T00:40:53Z) - A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery [68.48094108571432]
Large language models (LLMs) have revolutionized the way text and other modalities of data are handled.
We aim to provide a more holistic view of the research landscape by unveiling cross-field and cross-modal connections between scientific LLMs.
arXiv Detail & Related papers (2024-06-16T08:03:24Z) - Towards Vision-Language Geo-Foundation Model: A Survey [65.70547895998541]
Vision-Language Foundation Models (VLFMs) have made remarkable progress on various multimodal tasks.
This paper thoroughly reviews VLGFMs, summarizing and analyzing recent developments in the field.
arXiv Detail & Related papers (2024-06-13T17:57:30Z) - INDUS: Effective and Efficient Language Models for Scientific Applications [8.653859684720231]
Large language models (LLMs) trained on general-domain corpora have shown remarkable results on natural language processing (NLP) tasks.
We developed INDUS, a comprehensive suite of LLMs tailored for the closely-related domains of Earth science, biology, physics, heliophysics, planetary sciences and astrophysics.
We show that our models outperform both general-purpose (RoBERTa) and domain-specific (SCIBERT) encoders on new tasks as well as existing tasks in the domains of interest.
arXiv Detail & Related papers (2024-05-17T12:15:07Z) - Learning from models beyond fine-tuning [78.20895343699658]
Learn From Model (LFM) focuses on the research, modification, and design of foundation models (FMs) based on the model interface. The study of LFM techniques can be broadly categorized into five major areas: model tuning, model distillation, model reuse, meta-learning, and model editing. This paper gives a comprehensive review of current methods based on FMs from the perspective of LFM.
arXiv Detail & Related papers (2023-10-12T10:20:36Z) - StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data [129.92449761766025]
We propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning.
This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models.
Our research includes comprehensive experiments conducted on various datasets.
arXiv Detail & Related papers (2023-08-20T12:43:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.