Towards Artificial Intelligence Research Assistant for Expert-Involved Learning
- URL: http://arxiv.org/abs/2505.04638v1
- Date: Sat, 03 May 2025 14:21:48 GMT
- Title: Towards Artificial Intelligence Research Assistant for Expert-Involved Learning
- Authors: Tianyu Liu, Simeng Han, Xiao Luo, Hanchen Wang, Pan Lu, Biqing Zhu, Yuge Wang, Keyi Li, Jiapeng Chen, Rihao Qu, Yufeng Liu, Xinyue Cui, Aviv Yaish, Yuhang Chen, Minsheng Hao, Chuhan Li, Kexing Li, Arman Cohan, Hua Xu, Mark Gerstein, James Zou, Hongyu Zhao
- Abstract summary: Large Language Models (LLMs) and Large Multi-Modal Models (LMMs) have emerged as transformative tools in scientific research. We present ARtificial Intelligence research assistant for Expert-involved Learning (ARIEL).
- Score: 64.7438151207189
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Models (LLMs) and Large Multi-Modal Models (LMMs) have emerged as transformative tools in scientific research, yet their reliability and specific contributions to biomedical applications remain insufficiently characterized. In this study, we present ARtificial Intelligence research assistant for Expert-involved Learning (ARIEL), a multimodal dataset designed to benchmark and enhance two critical capabilities of LLMs and LMMs in biomedical research: summarizing extensive scientific texts and interpreting complex biomedical figures. To facilitate rigorous assessment, we create two open-source sets comprising biomedical articles and figures with designed questions. We systematically benchmark both open- and closed-source foundation models, incorporating expert-driven human evaluations conducted by doctoral-level experts. Furthermore, we improve model performance through targeted prompt engineering and fine-tuning strategies for summarizing research papers, and apply test-time computational scaling to enhance the reasoning capabilities of LMMs, achieving superior accuracy compared to human-expert corrections. We also explore the potential of using LMM Agents to generate scientific hypotheses from diverse multimodal inputs. Overall, our results delineate clear strengths and highlight significant limitations of current foundation models, providing actionable insights and guiding future advancements in deploying large-scale language and multi-modal models within biomedical research.
Related papers
- Dynamic Knowledge Exchange and Dual-diversity Review: Concisely Unleashing the Potential of a Multi-Agent Research Team [53.38438460574943]
IDVSCI is a multi-agent framework built on large language models (LLMs).
It incorporates two key innovations: a Dynamic Knowledge Exchange mechanism and a Dual-Diversity Review paradigm.
Results show that IDVSCI consistently achieves the best performance across two datasets.
arXiv Detail & Related papers (2025-06-23T07:12:08Z)
- m-KAILIN: Knowledge-Driven Agentic Scientific Corpus Distillation Framework for Biomedical Large Language Models Training [8.238980609871042]
We propose a knowledge-driven, multi-agent framework for scientific corpus distillation tailored for biomedical training.
Our approach is a collaborative multi-agent architecture, where specialized agents, each guided by the Medical Subject Headings (MeSH) hierarchy, work in concert to autonomously extract, synthesize, and self-evaluate high-quality data.
arXiv Detail & Related papers (2025-04-28T08:18:24Z)
- Towards Scientific Intelligence: A Survey of LLM-based Scientific Agents [11.74019905854637]
Large language models (LLMs) are evolving into scientific agents that automate critical tasks.
Unlike general-purpose LLMs, specialized agents integrate domain-specific knowledge, advanced tool sets, and robust validation mechanisms.
We highlight why they differ from general agents and the ways in which they advance research across various scientific fields.
arXiv Detail & Related papers (2025-03-31T13:11:28Z)
- Biomedical Foundation Model: A Survey [84.26268124754792]
Foundation models are large-scale pre-trained models that learn from extensive unlabeled datasets.
These models can be adapted to various applications such as question answering and visual understanding.
This survey explores the potential of foundation models across diverse domains within biomedical fields.
arXiv Detail & Related papers (2025-03-03T22:42:00Z)
- Personalized Multimodal Large Language Models: A Survey [127.9521218125761]
Multimodal Large Language Models (MLLMs) have become increasingly important due to their state-of-the-art performance and ability to integrate multiple data modalities.
This paper presents a comprehensive survey on personalized multimodal large language models, focusing on their architecture, training methods, and applications.
arXiv Detail & Related papers (2024-12-03T03:59:03Z)
- Retrieval-Enhanced Machine Learning: Synthesis and Opportunities [60.34182805429511]
Retrieval-enhancement can be extended to a broader spectrum of machine learning (ML).
This work introduces a formal framework for this paradigm, Retrieval-Enhanced Machine Learning (REML), by synthesizing the literature across various domains in ML with consistent notation, which is missing from the current literature.
The goal of this work is to equip researchers across various disciplines with a comprehensive, formally structured framework of retrieval-enhanced models, thereby fostering interdisciplinary future research.
arXiv Detail & Related papers (2024-07-17T20:01:21Z)
- LLMs-in-the-loop Part-1: Expert Small AI Models for Bio-Medical Text Translation [0.0]
This study introduces a novel "LLMs-in-the-loop" approach to develop supervised neural machine translation models optimized for medical texts.
Custom parallel corpora in six languages were compiled from scientific articles, synthetically generated clinical documents, and medical texts.
Our MarianMT-based models outperform Google Translate, DeepL, and GPT-4-Turbo.
arXiv Detail & Related papers (2024-07-16T19:32:23Z)
- Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation [15.495976478018264]
Large language models (LLMs) have emerged as a promising tool to revolutionize knowledge interaction.
We construct a dataset of background-hypothesis pairs from biomedical literature, partitioned into training, seen, and unseen test sets.
We assess the hypothesis generation capabilities of top-tier instructed models in zero-shot, few-shot, and fine-tuning settings.
arXiv Detail & Related papers (2024-07-12T02:55:13Z)
- MMSci: A Dataset for Graduate-Level Multi-Discipline Multimodal Scientific Understanding [59.41495657570397]
We present a comprehensive dataset compiled from Nature Communications articles covering 72 scientific fields.
We evaluated 19 proprietary and open-source models on two benchmark tasks, figure captioning and multiple-choice, and conducted human expert annotation.
Fine-tuning Qwen2-VL-7B with our task-specific data achieved better performance than GPT-4o and even human experts in multiple-choice evaluations.
arXiv Detail & Related papers (2024-07-06T00:40:53Z)
- M-QALM: A Benchmark to Assess Clinical Reading Comprehension and Knowledge Recall in Large Language Models via Question Answering [14.198330378235632]
We use Multiple Choice and Abstractive Question Answering to conduct a large-scale empirical study on 22 datasets in three generalist and three specialist biomedical sub-domains.
Our multifaceted analysis of the performance of 15 LLMs uncovers success factors such as instruction tuning that lead to improved recall and comprehension.
We show that while recently proposed domain-adapted models may lack adequate knowledge, directly fine-tuning on our collected medical knowledge datasets shows encouraging results.
We complement the quantitative results with a skill-oriented manual error analysis, which reveals a significant gap between the models' capabilities to simply recall necessary knowledge and to integrate it with the presented
arXiv Detail & Related papers (2024-06-06T02:43:21Z)
- ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is an AI-based system for ideation and operationalization of novel work.
ResearchAgent automatically defines novel problems, proposes methods and designs experiments, while iteratively refining them.
We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.