Related papers: Tree-of-Debate: Multi-Persona Debate Trees Elicit Critical Thinking for Scientific Comparative Analysis

Related papers

Deep Ideation: Designing LLM Agents to Generate Novel Research Ideas on Scientific Concept Network [9.317340414316446]
We propose a framework to integrate a scientific network that captures keyword co-occurrence and contextual relationships.<n>A critic engine, trained on real-world reviewer feedback, guides the process by providing continuous feedback on the novelty and feasibility of ideas.<n>Our approach improves the quality of generated ideas by 10.67% compared to other methods, with ideas surpassing top conference acceptance levels.
arXiv Detail & Related papers (2025-11-04T04:00:20Z)
Autonomous Agents for Scientific Discovery: Orchestrating Scientists, Language, Code, and Physics [82.55776608452017]
Large language models (LLMs) provide a flexible and versatile framework that orchestrates interactions with human scientists, natural language, computer language and code, and physics.<n>This paper presents our view and vision of LLM-based scientific agents and their growing role in transforming the scientific discovery lifecycle.<n>We identify open research challenges and outline promising directions for building more robust, generalizable, and adaptive scientific agents.
arXiv Detail & Related papers (2025-10-10T22:26:26Z)
Roundtable Policy: Improving Scientific Reasoning and Narratives through Confidence-Weighted Consensus of LLMs [44.65081087151887]
We introduce Roundtable Policy, a complementary inference-time reasoning framework that performs inference through the weighted consensus of multiple large language models (LLMs)<n>Our findings indicate that this approach significantly enhances reasoning in complex heterogeneous scientific tasks and improves scientific narratives in terms of creativity, rigor, and logical coherence.<n>Our approach emphasizes structured and interpretable consensus rather than opaque convergence, while requiring only black-box access and uniform procedures.
arXiv Detail & Related papers (2025-09-20T23:31:53Z)
Demystifying Scientific Problem-Solving in LLMs by Probing Knowledge and Reasoning [53.82037883518254]
We introduce SciReas, a diverse suite of existing benchmarks for scientific reasoning tasks.<n>We then propose KRUX, a probing framework for studying the distinct roles of reasoning and knowledge in scientific tasks.
arXiv Detail & Related papers (2025-08-26T17:04:23Z)
Understanding Large Language Models' Ability on Interdisciplinary Research [27.539601507270575]
Large Language Models (LLMs) are powerful tools and collaborators in scientific discovery.<n>The lack of a dedicated benchmark that evaluates LLMs' ability to develop ideas in Interdisciplinary Research poses a critical barrier to fully understanding their strengths and limitations.<n>We introduce IDRBench -- a pioneering benchmark featuring an expert annotated dataset and a suite of tasks tailored to evaluate LLMs' capabilities.
arXiv Detail & Related papers (2025-07-21T15:43:05Z)
Dispute Resolution in Peer Review with Abstract Argumentation and OWL DL [0.12289361708127876]
This research addresses challenges by applying formal methods from argumentation theory to support transparent and unbiased dispute resolution in peer review.<n>We conceptualize scientific peer review as a single mixed argumentative dispute between manuscript authors and reviewers and formalize it using abstract argumentation frameworks.<n>We validate our approach by annotating a corpus of scientific peer reviews with abstract argumentation frameworks and applying a proof of concept to resolve the disputes.
arXiv Detail & Related papers (2025-07-18T12:21:25Z)
Dynamic Knowledge Exchange and Dual-diversity Review: Concisely Unleashing the Potential of a Multi-Agent Research Team [53.38438460574943]
IDVSCI is a multi-agent framework built on large language models (LLMs)<n>It incorporates two key innovations: a Dynamic Knowledge Exchange mechanism and a Dual-Diversity Review paradigm.<n>Results show that IDVSCI consistently achieves the best performance across two datasets.
arXiv Detail & Related papers (2025-06-23T07:12:08Z)
The Budget AI Researcher and the Power of RAG Chains [4.797627592793464]
Current approaches to supporting research idea generation often rely on generic large language models (LLMs)<n>Our framework, The Budget AI Researcher, uses retrieval-augmented generation chains, vector databases, and topic-guided pairing to recombine concepts from hundreds of machine learning papers.<n>The system ingests papers from nine major AI conferences, which collectively span the vast subfields of machine learning, and organizes them into a hierarchical topic tree.
arXiv Detail & Related papers (2025-06-14T02:40:35Z)
What Makes a Good Natural Language Prompt? [72.3282960118995]
We conduct a meta-analysis surveying more than 150 prompting-related papers from leading NLP and AI conferences from 2022 to 2025.<n>We propose a property- and human-centric framework for evaluating prompt quality, encompassing 21 properties categorized into six dimensions.<n>We then empirically explore multi-property prompt enhancements in reasoning tasks, observing that single-property enhancements often have the greatest impact.
arXiv Detail & Related papers (2025-06-07T23:19:27Z)
Debating for Better Reasoning: An Unsupervised Multimodal Approach [56.74157117060815]
We extend the debate paradigm to a multimodal setting, exploring its potential for weaker models to supervise and enhance the performance of stronger models.<n>We focus on visual question answering (VQA), where two "sighted" expert vision-language models debate an answer, while a "blind" (text-only) judge adjudicates based solely on the quality of the arguments.<n>In our framework, the experts defend only answers aligned with their beliefs, thereby obviating the need for explicit role-playing and concentrating the debate on instances of expert disagreement.
arXiv Detail & Related papers (2025-05-20T17:18:17Z)
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning [51.11965014462375]
Multimodal Large Language Models (MLLMs) integrate text, images, and other modalities.<n>This paper argues that MLLMs can significantly advance scientific reasoning across disciplines such as mathematics, physics, chemistry, and biology.
arXiv Detail & Related papers (2025-02-05T04:05:27Z)
Many Heads Are Better Than One: Improved Scientific Idea Generation by A LLM-Based Multi-Agent System [62.832818186789545]
Virtual Scientists (VirSci) is a multi-agent system designed to mimic the teamwork inherent in scientific research. VirSci organizes a team of agents to collaboratively generate, evaluate, and refine research ideas. We show that this multi-agent approach outperforms the state-of-the-art method in producing novel scientific ideas.
arXiv Detail & Related papers (2024-10-12T07:16:22Z)
LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery [141.39722070734737]
We propose to enhance the knowledge-driven, abstract reasoning abilities of Large Language Models with the computational strength of simulations. We introduce Scientific Generative Agent (SGA), a bilevel optimization framework. We conduct experiments to demonstrate our framework's efficacy in law discovery and molecular design.
arXiv Detail & Related papers (2024-05-16T03:04:10Z)
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is an AI-based system for ideation and operationalization of novel work.<n>ResearchAgent automatically defines novel problems, proposes methods and designs experiments, while iteratively refining them.<n>We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z)
Uni-SMART: Universal Science Multimodal Analysis and Research Transformer [22.90687836544612]
We present bfUni-text, an innovative model designed for in-depth understanding of scientific literature. Uni-text demonstrates superior performance over other text-focused LLMs. Our exploration extends to practical applications, including patent infringement detection and nuanced analysis of charts.
arXiv Detail & Related papers (2024-03-15T13:43:47Z)
Automated Fact-Checking of Climate Change Claims with Large Language Models [3.1080484250243425]
This paper presents Climinator, a novel AI-based tool designed to automate the fact-checking of climate change claims. Climinator employs an innovative Mediator-Advocate framework to synthesize varying scientific perspectives. Our model demonstrates remarkable accuracy when testing claims collected from Climate Feedback and Skeptical Science.
arXiv Detail & Related papers (2024-01-23T08:49:23Z)
An Interdisciplinary Outlook on Large Language Models for Scientific Research [3.4108358650013573]
We describe the capabilities and constraints of Large Language Models (LLMs) within disparate academic disciplines, aiming to delineate their strengths and limitations with precision. We examine how LLMs augment scientific inquiry, offering concrete examples such as accelerating literature review by summarizing vast numbers of publications. We articulate the challenges LLMs face, including their reliance on extensive and sometimes biased datasets, and the potential ethical dilemmas stemming from their use.
arXiv Detail & Related papers (2023-11-03T19:41:09Z)
Scientific Opinion Summarization: Paper Meta-review Generation Dataset, Methods, and Evaluation [55.00687185394986]
We propose the task of scientific opinion summarization, where research paper reviews are synthesized into meta-reviews. We introduce the ORSUM dataset covering 15,062 paper meta-reviews and 57,536 paper reviews from 47 conferences. Our experiments show that (1) human-written summaries do not always satisfy all necessary criteria such as depth of discussion, and identifying consensus and controversy for the specific domain, and (2) the combination of task decomposition and iterative self-refinement shows strong potential for enhancing the opinions.
arXiv Detail & Related papers (2023-05-24T02:33:35Z)
A Diachronic Analysis of Paradigm Shifts in NLP Research: When, How, and Why? [84.46288849132634]
We propose a systematic framework for analyzing the evolution of research topics in a scientific field using causal discovery and inference techniques. We define three variables to encompass diverse facets of the evolution of research topics within NLP. We utilize a causal discovery algorithm to unveil the causal connections among these variables using observational data.
arXiv Detail & Related papers (2023-05-22T11:08:00Z)
What Changed Your Mind: The Roles of Dynamic Topics and Discourse in Argumentation Process [78.4766663287415]
This paper presents a study that automatically analyzes the key factors in argument persuasiveness. We propose a novel neural model that is able to track the changes of latent topics and discourse in argumentative conversations.
arXiv Detail & Related papers (2020-02-10T04:27:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.