Related papers: Understanding Large Language Models' Ability on Interdisciplinary Research

Understanding Large Language Models' Ability on Interdisciplinary Research

URL: http://arxiv.org/abs/2507.15736v1
Date: Mon, 21 Jul 2025 15:43:05 GMT
Title: Understanding Large Language Models' Ability on Interdisciplinary Research
Authors: Yuanhao Shen, Daniel Xavier de Sousa, Ricardo Marçal, Ali Asad, Hongyu Guo, Xiaodan Zhu,
Abstract summary: Large Language Models (LLMs) are powerful tools and collaborators in scientific discovery.<n>The lack of a dedicated benchmark that evaluates LLMs' ability to develop ideas in Interdisciplinary Research poses a critical barrier to fully understanding their strengths and limitations.<n>We introduce IDRBench -- a pioneering benchmark featuring an expert annotated dataset and a suite of tasks tailored to evaluate LLMs' capabilities.
Score: 27.539601507270575
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advancements in Large Language Models (LLMs) have revealed their impressive ability to perform multi-step, logic-driven reasoning across complex domains, positioning them as powerful tools and collaborators in scientific discovery while challenging the long-held view that inspiration-driven ideation is uniquely human. However, the lack of a dedicated benchmark that evaluates LLMs' ability to develop ideas in Interdisciplinary Research (IDR) settings poses a critical barrier to fully understanding their strengths and limitations. To address this gap, we introduce IDRBench -- a pioneering benchmark featuring an expert annotated dataset and a suite of tasks tailored to evaluate LLMs' capabilities in proposing valuable research ideas from different scientific domains for interdisciplinary research. This benchmark aims to provide a systematic framework for assessing LLM performance in complex, cross-domain scientific research. Our dataset consists of scientific publications sourced from the ArXiv platform covering six distinct disciplines, and is annotated by domain experts with diverse academic backgrounds. To ensure high-quality annotations, we emphasize clearly defined dimensions that characterize authentic interdisciplinary research. The design of evaluation tasks in IDRBench follows a progressive, real-world perspective, reflecting the natural stages of interdisciplinary research development, including 1) IDR Paper Identification, 2) IDR Idea Integration, and 3) IDR Idea Recommendation. Using IDRBench, we construct baselines across 10 LLMs and observe that despite fostering some level of IDR awareness, LLMs still struggle to produce quality IDR ideas. These findings could not only spark new research directions, but also help to develop next-generation LLMs that excel in interdisciplinary research.

Related papers

Dr.Mi-Bench: A Modular-integrated Benchmark for Scientific Deep Research Agent [52.876617746453995]
Dr.Mi-Bench is a Modular-integrated benchmark for scientific deep research (DR) agents.<n>Dr.Mi-Eval is a novel modular-integrated evaluation paradigm.
arXiv Detail & Related papers (2025-11-30T17:16:47Z)
Advancing Cognitive Science with LLMs [2.882998793707955]
This review examines how large language models can support areas where the field has historically struggled.<n>We outline the current capabilities and limitations of LLMs in these domains, including potential pitfalls.<n>We conclude that LLMs can serve as tools for a more integrative and cumulative cognitive science when used judiciously.
arXiv Detail & Related papers (2025-10-31T19:08:48Z)
Idea2Plan: Exploring AI-Powered Research Planning [9.792815240248476]
Large language models (LLMs) have demonstrated significant potential to accelerate scientific discovery.<n>We investigate how LLMs can handle the transition from conceptual research ideas to well-structured research plans.<n>Our study provides new insights into LLMs' capability for research planning and lay the groundwork for future progress.
arXiv Detail & Related papers (2025-10-28T18:54:51Z)
Demystifying Scientific Problem-Solving in LLMs by Probing Knowledge and Reasoning [53.82037883518254]
We introduce SciReas, a diverse suite of existing benchmarks for scientific reasoning tasks.<n>We then propose KRUX, a probing framework for studying the distinct roles of reasoning and knowledge in scientific tasks.
arXiv Detail & Related papers (2025-08-26T17:04:23Z)
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities [62.05713042908654]
This paper provides a review of advances in Large Language Models (LLMs) alignment through the lens of inverse reinforcement learning (IRL)<n>We highlight the necessity of constructing neural reward models from human data and discuss the formal and practical implications of this paradigm shift.
arXiv Detail & Related papers (2025-07-17T14:22:24Z)
A Survey of Large Language Models in Discipline-specific Research: Challenges, Methods and Opportunities [33.66845016584256]
Large Language Models (LLMs) have demonstrated their transformative potential across numerous disciplinary studies.<n>This survey paper provides a comprehensive overview of the application of LLMs in interdisciplinary studies.
arXiv Detail & Related papers (2025-07-11T09:11:18Z)
Dynamic Knowledge Exchange and Dual-diversity Review: Concisely Unleashing the Potential of a Multi-Agent Research Team [53.38438460574943]
IDVSCI is a multi-agent framework built on large language models (LLMs)<n>It incorporates two key innovations: a Dynamic Knowledge Exchange mechanism and a Dual-Diversity Review paradigm.<n>Results show that IDVSCI consistently achieves the best performance across two datasets.
arXiv Detail & Related papers (2025-06-23T07:12:08Z)
Scaling and Beyond: Advancing Spatial Reasoning in MLLMs Requires New Recipes [84.1059652774853]
Multimodal Large Language Models (MLLMs) have demonstrated impressive performance in general vision-language tasks.<n>Recent studies have exposed critical limitations in their spatial reasoning capabilities.<n>This deficiency in spatial reasoning significantly constrains MLLMs' ability to interact effectively with the physical world.
arXiv Detail & Related papers (2025-04-21T11:48:39Z)
Large Language Models Post-training: Surveying Techniques from Alignment to Reasoning [185.51013463503946]
Large Language Models (LLMs) have fundamentally transformed natural language processing, making them indispensable across domains ranging from conversational systems to scientific exploration.<n>These challenges necessitate advanced post-training language models (PoLMs) to address shortcomings, such as restricted reasoning capacities, ethical uncertainties, and suboptimal domain-specific performance.<n>This paper presents the first comprehensive survey of PoLMs, systematically tracing their evolution across five core paradigms: Fine-tuning, which enhances task-specific accuracy; Alignment, which ensures ethical coherence and alignment with human preferences; Reasoning, which advances multi-step inference despite challenges in reward design; Integration and Adaptation, which
arXiv Detail & Related papers (2025-03-08T05:41:42Z)
IdeaBench: Benchmarking Large Language Models for Research Idea Generation [19.66218274796796]
Large Language Models (LLMs) have transformed how people interact with artificial intelligence (AI) systems. We propose IdeaBench, a benchmark system that includes a comprehensive dataset and an evaluation framework. Our dataset comprises titles and abstracts from a diverse range of influential papers, along with their referenced works. Our evaluation framework is a two-stage process: first, using GPT-4o to rank ideas based on user-specified quality indicators such as novelty and feasibility, enabling scalable personalization.
arXiv Detail & Related papers (2024-10-31T17:04:59Z)
AAAR-1.0: Assessing AI's Potential to Assist Research [34.88341605349765]
We introduce AAAR-1.0, a benchmark dataset designed to evaluate large language models (LLMs) performance in three fundamental, expertise-intensive research tasks.<n> AAAR-1.0 differs from prior benchmarks in two key ways: first, it is explicitly research-oriented, with tasks requiring deep domain expertise; second, it is researcher-oriented, mirroring the primary activities that researchers engage in on a daily basis.
arXiv Detail & Related papers (2024-10-29T17:58:29Z)
What is the Role of Large Language Models in the Evolution of Astronomy Research? [0.0]
ChatGPT and other state-of-the-art large language models (LLMs) are rapidly transforming multiple fields. These models, commonly trained on vast datasets, exhibit human-like text generation capabilities.
arXiv Detail & Related papers (2024-09-30T12:42:25Z)
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers [90.26363107905344]
Large language models (LLMs) have sparked optimism about their potential to accelerate scientific discovery. No evaluations have shown that LLM systems can take the very first step of producing novel, expert-level ideas.
arXiv Detail & Related papers (2024-09-06T08:25:03Z)
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is an AI-based system for ideation and operationalization of novel work.<n>ResearchAgent automatically defines novel problems, proposes methods and designs experiments, while iteratively refining them.<n>We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z)
A Comprehensive Overview of Large Language Models [68.22178313875618]
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks. This article provides an overview of the existing literature on a broad range of LLM-related concepts.
arXiv Detail & Related papers (2023-07-12T20:01:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.