CoralVQA: A Large-Scale Visual Question Answering Dataset for Coral Reef Image Understanding
- URL: http://arxiv.org/abs/2507.10449v1
- Date: Mon, 14 Jul 2025 16:29:10 GMT
- Title: CoralVQA: A Large-Scale Visual Question Answering Dataset for Coral Reef Image Understanding
- Authors: Hongyong Han, Wei Wang, Gaowei Zhang, Mingjie Li, Yi Wang
- Abstract summary: CoralVQA is the first large-scale VQA dataset for coral reef analysis. It contains 12,805 real-world coral images from 67 coral genera collected from 3 oceans, along with 277,653 question-answer pairs. It provides a benchmark for studying vision-language reasoning in the context of coral reef images.
- Score: 11.245091683779615
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Coral reefs are vital yet vulnerable ecosystems that require continuous monitoring to support conservation. While coral reef images provide essential information in coral monitoring, interpreting such images remains challenging due to the need for domain expertise. Visual Question Answering (VQA), powered by Large Vision-Language Models (LVLMs), has great potential in user-friendly interaction with coral reef images. However, applying VQA to coral imagery demands a dedicated dataset that addresses two key challenges: domain-specific annotations and multidimensional questions. In this work, we introduce CoralVQA, the first large-scale VQA dataset for coral reef analysis. It contains 12,805 real-world coral images from 67 coral genera collected from 3 oceans, along with 277,653 question-answer pairs that comprehensively assess ecological and health-related conditions. To construct this dataset, we develop a semi-automatic data construction pipeline in collaboration with marine biologists to ensure both scalability and professional-grade data quality. CoralVQA presents novel challenges and provides a comprehensive benchmark for studying vision-language reasoning in the context of coral reef images. By evaluating several state-of-the-art LVLMs, we reveal key limitations and opportunities. These insights form a foundation for future LVLM development, with a particular emphasis on supporting coral conservation efforts.
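A minimal sketch of how a VQA benchmark like this is typically scored by exact match; the record fields and the question-dimension grouping below are assumptions for illustration, not CoralVQA's released format.

```python
# Sketch of scoring a VQA benchmark by exact match, grouped by a
# hypothetical question dimension. Field names are assumptions,
# not CoralVQA's released format.
from collections import defaultdict

def exact_match_accuracy(records):
    """Return per-dimension exact-match accuracy."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        dim = r.get("dimension", "all")
        totals[dim] += 1
        if r["prediction"].strip().lower() == r["answer"].strip().lower():
            hits[dim] += 1
    return {d: hits[d] / totals[d] for d in totals}

# Toy usage with hypothetical question dimensions.
records = [
    {"dimension": "genus", "question": "Which coral genus is shown?",
     "answer": "Acropora", "prediction": "acropora"},
    {"dimension": "health", "question": "Is the colony bleached?",
     "answer": "yes", "prediction": "no"},
]
print(exact_match_accuracy(records))  # {'genus': 1.0, 'health': 0.0}
```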
Related papers
- Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation [67.23953699167274]
Self-supervised learning (SSL) has enabled the development of vision foundation models for Earth Observation (EO), where curating balanced, diverse pre-training data is a challenge. In EO, this challenge is amplified by the redundancy and heavy-tailed distributions common in satellite imagery. We propose a dynamic dataset pruning strategy designed to improve SSL pre-training by maximizing dataset diversity and balance.
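As a rough illustration of diversity-driven pruning (a generic stand-in, not the paper's exact criterion), farthest-point sampling over image embeddings keeps the most mutually distinct examples:

```python
# Sketch of diversity-driven dataset pruning via farthest-point
# sampling over image embeddings. A generic stand-in for the paper's
# dynamic pruning strategy, not its exact criterion.
import numpy as np

def farthest_point_subset(embeddings, k, seed=0):
    rng = np.random.default_rng(seed)
    n = len(embeddings)
    chosen = [rng.integers(n)]
    # Distance from every point to its nearest chosen point.
    dist = np.linalg.norm(embeddings - embeddings[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())  # most distant = most novel
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return np.array(chosen)

emb = np.random.default_rng(1).normal(size=(1000, 128))  # toy embeddings
keep = farthest_point_subset(emb, k=100)
print(keep.shape)  # (100,)
```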
arXiv Detail & Related papers (2025-04-09T15:13:26Z)
- The Coralscapes Dataset: Semantic Scene Understanding in Coral Reefs [4.096374910845255]
We release the first general-purpose dense semantic segmentation dataset for coral reefs, covering 2075 images, 39 benthic classes, and 174k segmentation masks annotated by experts. We benchmark a wide range of semantic segmentation models, and find that transfer learning from Coralscapes to existing smaller datasets consistently leads to state-of-the-art performance. Coralscapes will catalyze research on efficient, scalable, and standardized coral reef surveying methods based on computer vision, and holds the potential to streamline the development of underwater ecological robotics.
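A minimal sketch of the transfer-learning setup the benchmark suggests: start from an ImageNet-pretrained segmentation model and re-head it for the 39 benthic classes. The DeepLabv3 choice here is an assumption, not the paper's model zoo.

```python
# Sketch of transfer learning for benthic segmentation: an
# ImageNet-pretrained DeepLabv3 with its head replaced for 39 classes.
# The architecture is an assumed example, not the paper's model zoo.
import torch
from torch import nn
from torchvision.models.segmentation import deeplabv3_resnet50

model = deeplabv3_resnet50(weights_backbone="IMAGENET1K_V1", aux_loss=False)
model.classifier[4] = nn.Conv2d(256, 39, kernel_size=1)  # 39 benthic classes
model.eval()

x = torch.randn(1, 3, 512, 512)        # toy reef image batch
with torch.no_grad():
    out = model(x)["out"]              # (1, 39, 512, 512) per-pixel logits
print(out.shape)
```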
arXiv Detail & Related papers (2025-03-25T18:33:59Z)
- Image-Based Relocalization and Alignment for Long-Term Monitoring of Dynamic Underwater Environments [57.59857784298534]
We propose an integrated pipeline that combines Visual Place Recognition (VPR), feature matching, and image segmentation on video-derived images. This method enables robust identification of revisited areas, estimation of rigid transformations, and downstream analysis of ecosystem changes.
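The rigid-transformation step can be sketched independently of the rest of the pipeline: given matched point pairs from feature matching, a Kabsch/Procrustes fit recovers rotation and translation. This is a generic building block, not the authors' exact implementation.

```python
# Sketch of rigid-transform estimation from matched point pairs
# (Kabsch/Procrustes): recover R, t with dst ~= src @ R.T + t.
import numpy as np

def rigid_transform(src, dst):
    c_src, c_dst = src.mean(0), dst.mean(0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    D = np.diag([1.0] * (src.shape[1] - 1) + [d])
    R = Vt.T @ D @ U.T
    t = c_dst - R @ c_src
    return R, t

# Toy check: rotate and shift a point set, then recover the motion.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
dst = src @ R_true.T + np.array([0.5, -1.0, 2.0])
R, t = rigid_transform(src, dst)
print(np.allclose(R, R_true), np.round(t, 2))  # True [ 0.5 -1.  2. ]
```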
arXiv Detail & Related papers (2025-03-06T05:13:19Z)
- CoralSCOP-LAT: Labeling and Analyzing Tool for Coral Reef Images with Dense Mask [14.092526875441221]
We propose CoralSCOP-LAT, an automatic and semi-automatic coral reef labeling and analysis tool.
The proposed CoralSCOP-LAT surpasses existing tools by a large margin in analysis efficiency, accuracy, and flexibility.
Our CoralSCOP-LAT, as the first dense coral reef analysis tool on the market, facilitates repeated large-scale coral reef monitoring analysis.
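The kind of analysis dense masks enable is easy to sketch: per-class percent cover from a label mask, the statistic repeated surveys typically track. Class IDs and names here are hypothetical.

```python
# Sketch of the analysis dense masks enable: percent cover per
# benthic class from a label mask. Class IDs are hypothetical.
import numpy as np

def percent_cover(mask, class_names):
    """mask: (H, W) int array of class IDs -> {name: % of pixels}."""
    counts = np.bincount(mask.ravel(), minlength=len(class_names))
    return {name: 100.0 * c / mask.size
            for name, c in zip(class_names, counts)}

mask = np.random.default_rng(0).integers(0, 3, size=(480, 640))
print(percent_cover(mask, ["sand", "live coral", "bleached coral"]))
```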
arXiv Detail & Related papers (2024-10-27T13:26:44Z)
- Combining Observational Data and Language for Species Range Estimation [63.65684199946094]
We propose a novel approach combining millions of citizen science species observations with textual descriptions from Wikipedia. Our framework maps locations, species, and text descriptions into a common space, enabling zero-shot range estimation from textual descriptions. Our approach also acts as a strong prior when combined with observational data, resulting in more accurate range estimation with less data.
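A toy sketch of the shared-embedding-space idea: locations whose embeddings align with a species' text embedding receive high presence scores. The encoders below are stubbed with random projections, so this only illustrates the scoring geometry, not the paper's trained model.

```python
# Sketch of zero-shot range scoring in a shared embedding space.
# Both encoders are stubbed with random projections for illustration.
import numpy as np

rng = np.random.default_rng(0)
D = 64
W_loc = rng.normal(size=(2, D))                  # stub location encoder
locations = rng.uniform(-1, 1, size=(1000, 2))   # toy (lon, lat), scaled

def embed(x, W):
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

loc_emb = embed(locations, W_loc)
text_emb = embed(rng.normal(size=(1, D)), np.eye(D))  # stub text encoder

scores = loc_emb @ text_emb.T            # cosine similarity per location
presence = scores.ravel() > 0.2          # threshold into a range map
print(presence.mean())                   # fraction of cells marked present
```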
arXiv Detail & Related papers (2024-10-14T17:22:55Z)
- Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training [79.27663870280038]
We introduce Contrastive Region Guidance (CRG), a training-free guidance method that enables open-source vision-language models to respond to visual prompts.
When region annotations are provided, CRG increases absolute accuracy by up to 11.1% on ViP-Bench.
We also show CRG's applicability to spatial reasoning, with 10% improvement on What'sUp.
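The contrastive-guidance idea reduces to simple logit arithmetic: compare next-token logits for the full image against a copy with the region of interest masked out, and amplify the difference. The weighting below is illustrative; see the paper for the exact formulation.

```python
# Sketch of contrastive guidance over next-token logits. Numbers are
# illustrative; consult the paper for the exact formulation.
import numpy as np

def contrastive_guidance(logits_full, logits_masked, alpha=1.0):
    # Tokens that depend on the region stand out in the difference.
    return logits_full + alpha * (logits_full - logits_masked)

logits_full = np.array([2.0, 1.0, 0.5])     # model sees the region
logits_masked = np.array([2.0, 0.2, 0.5])   # region blacked out
print(contrastive_guidance(logits_full, logits_masked))
# -> [2.  1.8 0.5]: token 1, which relied on the region, is boosted
```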
arXiv Detail & Related papers (2024-03-04T18:55:30Z)
- CoralVOS: Dataset and Benchmark for Coral Video Segmentation [12.434773034255455]
We propose a large-scale coral video segmentation dataset, CoralVOS, as demonstrated in Fig. 1.
We benchmark 6 recent state-of-the-art video object segmentation (VOS) algorithms on our CoralVOS dataset.
The results show there is still substantial room to improve segmentation accuracy.
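For reference, the standard VOS region metric (J, i.e. mean per-frame IoU) that such benchmarks report can be computed as follows; this is the generic metric, not code from the paper.

```python
# Mean per-frame IoU (the J region metric) for binary video masks.
import numpy as np

def mean_iou(pred_frames, gt_frames):
    """Binary masks of shape (T, H, W) -> mean per-frame IoU."""
    ious = []
    for p, g in zip(pred_frames, gt_frames):
        inter = np.logical_and(p, g).sum()
        union = np.logical_or(p, g).sum()
        ious.append(inter / union if union else 1.0)
    return float(np.mean(ious))

rng = np.random.default_rng(0)
gt = rng.random((10, 64, 64)) > 0.5
pred = gt.copy()
pred[:, :8] ^= True                      # corrupt a strip of pixels
print(round(mean_iou(pred, gt), 3))
```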
arXiv Detail & Related papers (2023-10-03T10:45:37Z)
- Development of a Model to Detect Coral Reef Damage with Image Classification [3.254879465902239]
This study utilizes a specialized dataset consisting of 923 images collected from Flickr using the Flickr API.
The method employed involves machine learning models, particularly convolutional neural networks (CNNs).
It was found that a from-scratch ResNet model can outperform pretrained models in terms of precision and accuracy.
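The comparison described can be sketched with torchvision: the same ResNet-18 re-headed for coral damage classes, once from scratch and once ImageNet-pretrained. The two-class setup is an assumption for illustration.

```python
# Sketch of the from-scratch vs. pretrained comparison: identical
# ResNet-18 architectures, differing only in initialization. The
# two-class head is an assumption for illustration.
import torch
from torch import nn
from torchvision.models import resnet18

def make_model(pretrained: bool, num_classes: int = 2):
    weights = "IMAGENET1K_V1" if pretrained else None
    model = resnet18(weights=weights)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

scratch = make_model(pretrained=False)
transfer = make_model(pretrained=True)
x = torch.randn(4, 3, 224, 224)
print(scratch(x).shape, transfer(x).shape)  # both (4, 2)
```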
arXiv Detail & Related papers (2023-08-08T15:30:08Z)
- Robot Goes Fishing: Rapid, High-Resolution Biological Hotspot Mapping in Coral Reefs with Vision-Guided Autonomous Underwater Vehicles [6.658103076536836]
Biological hotspot detection can help coral reef managers prioritize limited resources for monitoring and intervention tasks.
Here, we explore the use of autonomous underwater vehicles (AUVs) with cameras, coupled with visual detectors and photogrammetry, to map and identify these hotspots.
To the best of our knowledge, we present one of the first attempts at using an AUV to gather visually observed, fine-grained biological hotspot maps.
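One simple way to turn georeferenced detections into a hotspot map, sketched below, is to bin them onto a grid and flag high-density cells; the grid size and threshold are illustrative choices, not the authors' method.

```python
# Sketch of hotspot mapping from georeferenced detections: bin onto a
# grid and flag high-density cells. Grid and cutoff are illustrative.
import numpy as np

rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(500, 2))          # detection positions (m)
heat, xedges, yedges = np.histogram2d(xy[:, 0], xy[:, 1], bins=20)

threshold = heat.mean() + 2 * heat.std()         # "hotspot" cutoff
hot_cells = np.argwhere(heat > threshold)
print(len(hot_cells), "hotspot cells out of", heat.size)
```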
arXiv Detail & Related papers (2023-05-03T16:12:47Z)
- Towards Generating Large Synthetic Phytoplankton Datasets for Efficient Monitoring of Harmful Algal Blooms [77.25251419910205]
Harmful algal blooms (HABs) cause significant fish deaths in aquaculture farms.
Currently, the standard method to enumerate harmful algae and other phytoplankton is to manually observe and count them under a microscope.
We employ Generative Adversarial Networks (GANs) to generate synthetic images.
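A minimal DCGAN-style generator of the kind such work builds on; the layer sizes below are illustrative, not the paper's architecture.

```python
# Sketch of a DCGAN-style generator for synthetic plankton images;
# layer sizes are illustrative, not the paper's architecture.
import torch
from torch import nn

generator = nn.Sequential(
    # latent z: (N, 100, 1, 1) -> image: (N, 3, 64, 64)
    nn.ConvTranspose2d(100, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),
    nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(),
    nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Tanh(),
)

z = torch.randn(8, 100, 1, 1)
fake = generator(z)
print(fake.shape)  # torch.Size([8, 3, 64, 64])
```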
arXiv Detail & Related papers (2022-08-03T20:15:55Z)
- Underwater Image Restoration via Contrastive Learning and a Real-world Dataset [59.35766392100753]
We present a novel method for underwater image restoration based on unsupervised image-to-image translation framework.
Our proposed method leverages contrastive learning and generative adversarial networks to maximize the mutual information between raw and restored images.
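The mutual-information objective is commonly instantiated as an InfoNCE-style contrastive loss; the sketch below is that generic formulation, not the paper's exact loss.

```python
# Generic InfoNCE-style contrastive loss: each restored feature should
# match its own raw counterpart, not other images in the batch.
import torch
import torch.nn.functional as F

def info_nce(raw_feats, restored_feats, tau=0.07):
    raw = F.normalize(raw_feats, dim=1)
    res = F.normalize(restored_feats, dim=1)
    logits = res @ raw.T / tau               # (N, N) similarity matrix
    labels = torch.arange(len(logits))       # positives on the diagonal
    return F.cross_entropy(logits, labels)

raw = torch.randn(16, 128)
restored = raw + 0.1 * torch.randn(16, 128)  # toy paired features
print(info_nce(raw, restored).item())
```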
arXiv Detail & Related papers (2021-06-20T16:06:26Z)