MarineGPT: Unlocking Secrets of Ocean to the Public
- URL: http://arxiv.org/abs/2310.13596v1
- Date: Fri, 20 Oct 2023 15:45:39 GMT
- Title: MarineGPT: Unlocking Secrets of Ocean to the Public
- Authors: Ziqiang Zheng, Jipeng Zhang, Tuan-Anh Vu, Shizhe Diao, Yue Him Wong Tim, Sai-Kit Yeung
- Abstract summary: Large language models (LLMs) have proven to be powerful tools in promoting the user experience as an AI assistant.
We propose MarineGPT, the first vision-language model specially designed for the marine domain, unlocking the secrets of the ocean to the public.
- Score: 32.17362940242431
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Large language models (LLMs), such as ChatGPT/GPT-4, have proven to be
powerful tools for improving the user experience of AI assistants. A continuing
line of work extends LLMs into multi-modal large language models (MLLMs), giving
them the ability to perceive inputs from multiple modalities by constructing a
joint semantic space (e.g., a visual-text space). Despite the significant success
of LLMs and MLLMs, their use in domain-specific applications that require
specialized knowledge and expertise has received little attention, especially in
the marine domain. Unlike general-purpose MLLMs, a marine-specific MLLM must
yield far more sensitive, informative, and scientific responses. In this work, we
demonstrate that existing MLLMs, optimized on huge amounts of readily available
general-purpose training data, show only a minimal ability to understand
domain-specific intents and to generate informative and satisfactory responses.
To address these issues, we propose MarineGPT, the first vision-language model
specifically designed for the marine domain, unlocking the secrets of the ocean
to the public. We present our Marine-5M dataset of more than 5 million marine
image-text pairs, used to inject domain-specific marine knowledge into our model
and achieve better alignment between marine vision and language. MarineGPT not
only pushes the boundaries of marine understanding for the general public but
also offers a standard protocol for adapting a general-purpose assistant into a
downstream domain-specific expert. We pave the way for a wide range of marine
applications while providing valuable data and pre-trained models for future
research in both the academic and industrial communities.
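The "joint semantic space" mentioned in the abstract is typically built with contrastive image-text alignment. The sketch below shows that generic recipe under assumed encoder outputs and batch layout; it is background for the idea, not MarineGPT's actual training code.

```python
# Minimal sketch of contrastive image-text alignment, the standard way to
# build a joint visual-text semantic space. Encoders and shapes are
# hypothetical; MarineGPT's real architecture and losses may differ.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(
    image_emb: torch.Tensor,  # (batch, dim) image embeddings
    text_emb: torch.Tensor,   # (batch, dim) matching text embeddings
    temperature: float = 0.07,
) -> torch.Tensor:
    """Symmetric InfoNCE loss: matched pairs attract, mismatched pairs repel."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (batch, batch) similarities
    targets = torch.arange(logits.size(0), device=logits.device)  # diagonal = matches
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
```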
Related papers
- MarineEval: Assessing the Marine Intelligence of Vision-Language Models [35.08637645476385]
We construct the first large-scale marine VLM dataset and benchmark, called MarineEval, with 2,000 image-based question-answering pairs. We benchmark 17 existing VLMs on MarineEval and investigate the limitations of existing models in answering marine research questions.
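As a rough illustration of how such an image-QA benchmark is consumed, the loop below scores a model by exact match; the dataset schema and `model.answer` interface are hypothetical, not MarineEval's actual API.

```python
# Hedged sketch of scoring a VLM on image-based QA pairs.
# The JSON format, `model.answer`, and exact-match scoring are
# illustrative assumptions; MarineEval's real interface may differ.
import json

def evaluate(model, qa_path: str) -> float:
    """Return exact-match accuracy over (image, question, answer) triples."""
    with open(qa_path) as f:
        qa_pairs = json.load(f)  # e.g. [{"image": ..., "question": ..., "answer": ...}]
    correct = 0
    for item in qa_pairs:
        prediction = model.answer(item["image"], item["question"])
        correct += prediction.strip().lower() == item["answer"].strip().lower()
    return correct / len(qa_pairs)
```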
arXiv Detail & Related papers (2025-12-24T11:57:50Z)
- Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey [39.82566660592583]
Large Language Models (LLMs) have demonstrated remarkable success in various tasks such as natural language understanding, text summarization, and machine translation.
Their general-purpose nature often limits their effectiveness in domain-specific applications that require specialized knowledge, such as healthcare, chemistry, or legal analysis.
To address this, researchers have explored diverse methods to enhance LLMs by integrating domain-specific knowledge.
arXiv Detail & Related papers (2025-02-15T07:43:43Z)
- On Domain-Specific Post-Training for Multimodal Large Language Models [72.67107077850939]
This paper systematically investigates domain adaptation of MLLMs through post-training.
We focus on data synthesis, training pipelines, and task evaluation.
We conduct experiments in high-impact domains such as biomedicine, food, and remote sensing.
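As one simplified reading of the data-synthesis step, the sketch below expands domain image-caption pairs into instruction-style records. The template and record schema are assumptions, and real pipelines typically use an LLM to derive more diverse tasks from each caption.

```python
# Hedged sketch: expand domain image-caption pairs into instruction-style
# records, one simple form of post-training data synthesis. The template
# and record schema are illustrative assumptions, not the paper's pipeline.
def synthesize_records(pairs: list[tuple[str, str]]) -> list[dict[str, str]]:
    """Turn (image_path, caption) pairs into instruction-tuning records."""
    records = []
    for image_path, caption in pairs:
        records.append({
            "image": image_path,
            "instruction": "Describe this image in detail.",
            "response": caption,
        })
    return records
```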
arXiv Detail & Related papers (2024-11-29T18:42:28Z)
- MLLM-LLaVA-FL: Multimodal Large Language Model Assisted Federated Learning [25.45278447786954]
We introduce a novel federated learning framework, named Multimodal Large Language Model Assisted Federated Learning (MLLM-LLaVA-FL).
Our framework is adept at harnessing the extensive, yet previously underexploited, open-source data accessible from websites and powerful server-side computational resources.
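The summary does not spell out the framework's mechanics; as background, the core of most federated pipelines is weight averaging across clients, sketched below. This is plain FedAvg, not MLLM-LLaVA-FL's specific algorithm.

```python
# Minimal FedAvg sketch: average client model weights into a global model.
# Generic background, not MLLM-LLaVA-FL's actual method; assumes all
# state-dict entries are float tensors of matching shapes.
import torch

def federated_average(client_states: list[dict[str, torch.Tensor]]) -> dict[str, torch.Tensor]:
    """Uniformly average parameter tensors from several client checkpoints."""
    global_state = {}
    for name in client_states[0]:
        global_state[name] = torch.stack([s[name] for s in client_states]).mean(dim=0)
    return global_state
```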
arXiv Detail & Related papers (2024-09-09T21:04:16Z)
- Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge [76.45868419402265]
Multimodal large language models (MLLMs) have made significant strides by training on vast, high-quality image-text datasets.
However, the inherent difficulty in explicitly conveying fine-grained or spatially dense information in text, such as masks, poses a challenge for MLLMs.
This paper proposes a new visual prompt approach to integrate fine-grained external knowledge, gleaned from specialized vision models, into MLLMs.
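One simple way to realize such a visual prompt is to blend a mask from a specialist vision model into the image before it reaches the MLLM; the blending scheme below is illustrative only and may differ from the paper's actual method.

```python
# Hedged sketch: paint a segmentation mask onto an image as a visual prompt.
# The mask source and red-highlight blending are illustrative assumptions.
import numpy as np

def overlay_mask(image: np.ndarray, mask: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend a red highlight over masked pixels; image is HxWx3 uint8, mask is HxW bool."""
    prompted = image.astype(np.float32)
    prompted[mask] = (1 - alpha) * prompted[mask] + alpha * np.array([255.0, 0.0, 0.0])
    return prompted.astype(np.uint8)
```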
arXiv Detail & Related papers (2024-07-05T17:43:30Z)
- BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models [56.89958793648104]
Large Language Models (LLMs) are versatile and capable of addressing a diverse range of tasks.
Previous approaches either conduct continuous pre-training with domain-specific data or employ retrieval augmentation to support general LLMs.
We present a novel framework named BLADE, which enhances Black-box LArge language models with small Domain-spEcific models.
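A common instantiation of this idea, assumed here rather than taken from BLADE's exact design, has the small domain model draft background knowledge that is prepended to the black-box LLM's prompt.

```python
# Hedged sketch: a small domain model drafts background knowledge, which is
# prepended to the prompt of a black-box LLM. `small_model.generate` and
# `blackbox_llm.complete` are hypothetical interfaces for illustration.
def answer_with_domain_hint(question: str, small_model, blackbox_llm) -> str:
    hint = small_model.generate(f"Relevant domain facts for: {question}")
    prompt = f"Background knowledge:\n{hint}\n\nQuestion: {question}\nAnswer:"
    return blackbox_llm.complete(prompt)
```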
arXiv Detail & Related papers (2024-03-27T08:57:21Z)
- Fine-tuning Large Language Models for Domain-specific Machine Translation [8.439661191792897]
Large language models (LLMs) have made significant progress in machine translation (MT).
However, their potential in domain-specific MT remains under-explored.
This paper proposes a prompt-oriented fine-tuning method, denoted as LlamaIT, to effectively and efficiently fine-tune a general-purpose LLM for domain-specific MT tasks.
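As a rough illustration of prompt-oriented fine-tuning data for domain MT, the helper below wraps parallel sentences in an instruction template; the template text and language pair are assumptions, not LlamaIT's actual format.

```python
# Hedged sketch: wrap parallel domain sentences in an instruction template
# to build fine-tuning examples for domain-specific MT. The template and
# the English-German pair are illustrative assumptions.
def make_mt_example(source: str, target: str, domain: str) -> dict[str, str]:
    prompt = (
        f"Translate the following {domain} sentence from English to German.\n"
        f"English: {source}\nGerman:"
    )
    return {"prompt": prompt, "completion": f" {target}"}
```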
arXiv Detail & Related papers (2024-02-23T02:24:15Z)
- PANDA: Preference Adaptation for Enhancing Domain-Specific Abilities of LLMs [49.32067576992511]
Large language models often fall short of the performance achieved by domain-specific state-of-the-art models.
One potential approach to enhance domain-specific capabilities of LLMs involves fine-tuning them using corresponding datasets.
We propose Preference Adaptation for Enhancing Domain-specific Abilities of LLMs (PANDA).
Our experimental results reveal that PANDA significantly enhances the domain-specific ability of LLMs on text classification and interactive decision tasks.
arXiv Detail & Related papers (2024-02-20T09:02:55Z)
- A Self-enhancement Approach for Domain-specific Chatbot Training via Knowledge Mining and Digest [62.63606958140248]
Large Language Models (LLMs) often encounter challenges when dealing with intricate and knowledge-demanding queries in specific domains.
This paper introduces a novel approach to enhance LLMs by effectively extracting the relevant knowledge from domain-specific textual sources.
We train a knowledge miner, namely LLMiner, which autonomously extracts Question-Answer pairs from relevant documents.
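A minimal sketch of the QA-mining step follows; note that LLMiner is a trained miner, while the prompted version below, with its assumed prompt and output format, only illustrates the extraction idea.

```python
# Hedged sketch of QA mining: ask a model to emit question-answer pairs
# from a document chunk. The prompt, Q:/A: line format, and `llm`
# interface are illustrative assumptions (LLMiner itself is fine-tuned).
def mine_qa_pairs(document: str, llm) -> list[tuple[str, str]]:
    prompt = (
        "Extract question-answer pairs from the text below.\n"
        "Format each pair as 'Q: ...' on one line and 'A: ...' on the next.\n\n"
        + document
    )
    lines = [l.strip() for l in llm.complete(prompt).splitlines() if l.strip()]
    pairs = []
    for q_line, a_line in zip(lines[::2], lines[1::2]):
        if q_line.startswith("Q:") and a_line.startswith("A:"):
            pairs.append((q_line[2:].strip(), a_line[2:].strip()))
    return pairs
```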
arXiv Detail & Related papers (2023-11-17T16:09:10Z)
- Knowledge Plugins: Enhancing Large Language Models for Domain-Specific Recommendations [50.81844184210381]
We propose a general paradigm that augments large language models with DOmain-specific KnowledgE to enhance their performance on practical applications, namely DOKE.
This paradigm relies on a domain knowledge extractor, working in three steps: 1) preparing effective knowledge for the task; 2) selecting the knowledge for each specific sample; and 3) expressing the knowledge in an LLM-understandable way.
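The three steps map naturally onto a small pipeline, sketched below; the keyword-overlap selection and the verbalization template are illustrative assumptions, not DOKE's concrete implementation.

```python
# Hedged sketch of the three steps: prepare task knowledge, select entries
# per sample, and verbalize them into the prompt. Selection and template
# are illustrative assumptions.
def prepare_knowledge(corpus: list[str]) -> list[str]:
    """Step 1: collect candidate knowledge snippets for the task."""
    return [doc for doc in corpus if doc.strip()]

def select_knowledge(sample: str, knowledge: list[str], k: int = 3) -> list[str]:
    """Step 2: pick the snippets sharing the most words with the sample."""
    sample_words = set(sample.lower().split())
    scored = sorted(knowledge, key=lambda s: -len(sample_words & set(s.lower().split())))
    return scored[:k]

def express_knowledge(sample: str, snippets: list[str]) -> str:
    """Step 3: phrase the knowledge so the LLM can condition on it."""
    facts = "\n".join(f"- {s}" for s in snippets)
    return f"Known facts:\n{facts}\n\nTask input: {sample}"
```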
arXiv Detail & Related papers (2023-11-16T07:09:38Z)
- Enhancing the Spatial Awareness Capability of Multi-Modal Large Language Model [25.86351431223383]
The Multi-Modal Large Language Model (MLLM) is an extension of the Large Language Model (LLM) equipped with the capability to receive and infer multi-modal data.
This paper proposes using more precise spatial position information between objects to guide MLLM in providing more accurate responses to user-related inquiries.
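One way to supply such positional guidance is to turn detected bounding boxes into relation text for the prompt, as sketched below; the box format and left/right wording are assumptions, and the paper's exact encoding may differ.

```python
# Hedged sketch: serialize pairwise object positions from bounding boxes
# into text an MLLM can condition on. Box format (x1, y1, x2, y2) and the
# left/right relation wording are illustrative assumptions.
def describe_relations(objects: dict[str, tuple[float, float, float, float]]) -> str:
    names = list(objects)
    lines = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            ax = (objects[a][0] + objects[a][2]) / 2  # center x of object a
            bx = (objects[b][0] + objects[b][2]) / 2  # center x of object b
            side = "left of" if ax < bx else "right of"
            lines.append(f"{a} is to the {side} {b}")
    return "; ".join(lines)
```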
arXiv Detail & Related papers (2023-10-31T10:57:35Z)
- Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey [100.24095818099522]
Large language models (LLMs) have significantly advanced the field of natural language processing (NLP).
They provide a highly useful, task-agnostic foundation for a wide range of applications.
However, directly applying LLMs to solve sophisticated problems in specific domains meets many hurdles.
arXiv Detail & Related papers (2023-05-30T03:00:30Z)
- Augmented Large Language Models with Parametric Knowledge Guiding [72.71468058502228]
Large Language Models (LLMs) have significantly advanced natural language processing (NLP) with their impressive language understanding and generation capabilities.
Their performance may be suboptimal for domain-specific tasks that require specialized knowledge due to limited exposure to the related data.
We propose the novel Parametric Knowledge Guiding (PKG) framework, which equips LLMs with a knowledge-guiding module to access relevant knowledge.
arXiv Detail & Related papers (2023-05-08T15:05:16Z)