Related papers: Sci-Reasoning: A Dataset Decoding AI Innovation Patterns

Sci-Reasoning: A Dataset Decoding AI Innovation Patterns

URL: http://arxiv.org/abs/2601.04577v1
Date: Thu, 08 Jan 2026 04:12:47 GMT
Title: Sci-Reasoning: A Dataset Decoding AI Innovation Patterns
Authors: Jiachen Liu, Maestro Harmon, Zechen Zhang,
Abstract summary: Sci-Reasoning is the first dataset capturing the intellectual synthesis behind high-quality AI research.<n>Our analysis identifies 15 distinct thinking patterns, with three dominant strategies accounting for 52.7%.<n>This dataset enables quantitative studies of scientific progress and provides structured reasoning trajectories for training the next generation AI research agents.
Score: 14.720475159371361
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: While AI innovation accelerates rapidly, the intellectual process behind breakthroughs -- how researchers identify gaps, synthesize prior work, and generate insights -- remains poorly understood. The lack of structured data on scientific reasoning hinders systematic analysis and development of AI research agents. We introduce Sci-Reasoning, the first dataset capturing the intellectual synthesis behind high-quality AI research. Using community-validated quality signals and an LLM-accelerated, human-verified pipeline, we trace Oral and Spotlight papers across NeurIPS, ICML, and ICLR (2023-2025) to its key predecessors, articulating specific reasoning links in a structured format. Our analysis identifies 15 distinct thinking patterns, with three dominant strategies accounting for 52.7%: Gap-Driven Reframing (24.2%), Cross-Domain Synthesis (18.0%), and Representation Shift (10.5%). The most powerful innovation recipes combine multiple patterns: Gap-Driven Reframing + Representation Shift, Cross-Domain Synthesis + Representation Shift, and Gap-Driven Reframing + Cross-Domain Synthesis. This dataset enables quantitative studies of scientific progress and provides structured reasoning trajectories for training the next generation AI research agents.

Related papers

Accelerating Scientific Research with Gemini: Case Studies and Common Techniques [105.15622072347811]
Large language models (LLMs) have opened new avenues for accelerating scientific research.<n>We present a collection of case studies demonstrating how researchers have successfully collaborated with advanced AI models.
arXiv Detail & Related papers (2026-02-03T18:56:17Z)
Cross-Disciplinary Knowledge Retrieval and Synthesis: A Compound AI Architecture for Scientific Discovery [1.5143261755366868]
BioSage is a novel compound AI architecture that integrates LLMs with RAG, orchestrated specialized agents and tools to enable discoveries across AI, data science, biomedical, and biosecurity domains.<n>Our system features several specialized agents including the retrieval agent with query planning and response synthesis that enable knowledge retrieval across domains with citation-backed responses.<n>Our ongoing work focuses on multimodal retrieval and reasoning over charts, tables, and structured scientific data, along with developing comprehensive multimodal benchmarks for cross-disciplinary discovery.
arXiv Detail & Related papers (2025-11-23T05:33:11Z)
Neo-Grounded Theory: A Methodological Innovation Integrating High-Dimensional Vector Clustering and Multi-Agent Collaboration for Qualitative Research [5.848041907318412]
Neo Grounded Theory (NGT) integrates vector clustering with multi agent systems to resolve qualitative research's scale depth paradox.<n>NGT achieved 168-fold speed improvement (3 hours vs 3 weeks), superior quality (0.904 vs 0.883), and 96% cost reduction.
arXiv Detail & Related papers (2025-09-26T16:26:33Z)
A Survey of Scientific Large Language Models: From Data Foundations to Agent Frontiers [251.23085679210206]
Scientific Large Language Models (Sci-LLMs) are transforming how knowledge is represented, integrated, and applied in scientific research.<n>This survey reframes the development of Sci-LLMs as a co-evolution between models and their underlying data substrate.<n>We formulate a unified taxonomy of scientific data and a hierarchical model of scientific knowledge.
arXiv Detail & Related papers (2025-08-28T18:30:52Z)
From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery [108.1082357960201]
Agentic AI shows capabilities in hypothesis generation, experimental design, execution, analysis, and iterative refinement.<n>This survey provides a domain-oriented review of autonomous scientific discovery across life sciences, chemistry, materials science, and physics.
arXiv Detail & Related papers (2025-08-18T05:25:54Z)
Dynamic Knowledge Exchange and Dual-diversity Review: Concisely Unleashing the Potential of a Multi-Agent Research Team [53.38438460574943]
IDVSCI is a multi-agent framework built on large language models (LLMs)<n>It incorporates two key innovations: a Dynamic Knowledge Exchange mechanism and a Dual-Diversity Review paradigm.<n>Results show that IDVSCI consistently achieves the best performance across two datasets.
arXiv Detail & Related papers (2025-06-23T07:12:08Z)
AI-Driven Automation Can Become the Foundation of Next-Era Science of Science Research [58.944125758758936]
The Science of Science (SoS) explores the mechanisms underlying scientific discovery.<n>The advent of artificial intelligence (AI) presents a transformative opportunity for the next generation of SoS.<n>We outline the advantages of AI over traditional methods, discuss potential limitations, and propose pathways to overcome them.
arXiv Detail & Related papers (2025-05-17T15:01:33Z)
IRIS: Interactive Research Ideation System for Accelerating Scientific Discovery [27.218896203253987]
IRIS is an open-source platform designed for researchers to leverage large language models (LLMs)-assisted scientific ideation.<n>IRIS incorporates innovative features to enhance ideation, including adaptive test-time compute expansion via Monte Carlo Tree Search (MCTS), fine-grained feedback mechanism, and query-based literature synthesis.<n>We conduct a user study with researchers across diverse disciplines, validating the effectiveness of our system in enhancing ideation.
arXiv Detail & Related papers (2025-04-23T14:01:36Z)
MindGYM: What Matters in Question Synthesis for Thinking-Centric Fine-Tuning? [51.85759493254735]
MindGYM is a structured and scalable framework for question synthesis.<n>It infuses high-level reasoning objectives to shape the model's synthesis behavior.<n>It composes more complex multi-hop questions based on QA seeds for deeper reasoning.
arXiv Detail & Related papers (2025-03-12T16:03:03Z)
CS-PaperSum: A Large-Scale Dataset of AI-Generated Summaries for Scientific Papers [3.929864777332447]
CS-PaperSum is a large-scale dataset of 91,919 papers from 31 top-tier computer science conferences.<n>Our dataset enables automated literature analysis, research trend forecasting, and AI-driven scientific discovery.
arXiv Detail & Related papers (2025-02-27T22:48:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.