Related papers: Large Language Models for Automated Open-domain Scientific Hypotheses Discovery

URL: http://arxiv.org/abs/2309.02726v3
Date: Wed, 12 Jun 2024 08:40:15 GMT
Title: Large Language Models for Automated Open-domain Scientific Hypotheses Discovery
Authors: Zonglin Yang, Xinya Du, Junxian Li, Jie Zheng, Soujanya Poria, Erik Cambria
Abstract summary: This work proposes the first dataset for social science academic hypotheses discovery. Unlike previous settings, the new dataset requires (1) using open-domain data (raw web corpus) as observations; and (2) proposing hypotheses even new to humanity. A multi-module framework is developed for the task, including three different feedback mechanisms to boost performance.
Score: 50.40483334131271
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Hypothetical induction is recognized as the main reasoning type when scientists make observations about the world and try to propose hypotheses to explain those observations. Past research on hypothetical induction is under a constrained setting: (1) the observation annotations in the dataset are carefully manually handpicked sentences (resulting in a close-domain setting); and (2) the ground truth hypotheses are mostly commonsense knowledge, making the task less challenging. In this work, we tackle these problems by proposing the first dataset for social science academic hypotheses discovery, with the final goal to create systems that automatically generate valid, novel, and helpful scientific hypotheses, given only a pile of raw web corpus. Unlike previous settings, the new dataset requires (1) using open-domain data (raw web corpus) as observations; and (2) proposing hypotheses even new to humanity. A multi-module framework is developed for the task, including three different feedback mechanisms to boost performance, which exhibits superior performance in terms of both GPT-4 based and expert-based evaluation. To the best of our knowledge, this is the first work showing that LLMs are able to generate novel ("not existing in literature") and valid ("reflecting reality") scientific hypotheses.

Related papers

Open-ended Scientific Discovery via Bayesian Surprise [63.26412847240136]
AutoDS is a method for open-ended scientific discovery that drives scientific exploration using Bayesian surprise. We evaluate AutoDS in the setting of data-driven discovery across 21 real-world datasets spanning domains such as biology, economics, finance, and behavioral science.
arXiv Detail & Related papers (2025-06-30T22:53:59Z)
MOOSE-Chem2: Exploring LLM Limits in Fine-Grained Scientific Hypothesis Discovery via Hierarchical Search [93.64235254640967]
Large language models (LLMs) have shown promise in automating scientific hypothesis generation. We define the novel task of fine-grained scientific hypothesis discovery. We propose a hierarchical search method that incrementally proposes and integrates details into the hypothesis.
arXiv Detail & Related papers (2025-05-25T16:13:46Z)
MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback [128.2992631982687]
We introduce the task of experiment-guided ranking, which aims to prioritize candidate hypotheses based on the results of previously tested ones. We propose a simulator grounded in three domain-informed assumptions, modeling hypothesis performance as a function of similarity to a known ground truth hypothesis. We curate a dataset of 124 chemistry hypotheses with experimentally reported outcomes to validate the simulator.
arXiv Detail & Related papers (2025-05-23T13:24:50Z)
Sparks of Science: Hypothesis Generation Using Structured Paper Data [1.250723303641055]
We introduce HypoGen, the first dataset of approximately 5500 structured problem-hypothesis pairs extracted from top-tier computer science conferences. We frame hypothesis generation as conditional language modelling, with the model fine-tuned on Bit-Flip-Spark and the Chain-of-Reasoning. We show that by fine-tuning on our HypoGen dataset we improve the novelty, feasibility, and overall quality of the generated hypotheses.
arXiv Detail & Related papers (2025-04-17T14:29:18Z)
HypoBench: Towards Systematic and Principled Benchmarking for Hypothesis Generation [24.656083479331645]
We introduce HypoBench, a novel benchmark designed to evaluate hypothesis generation methods across multiple aspects. We evaluate four state-of-the-art LLMs combined with six existing hypothesis-generation methods. Results indicate that there is still significant room for improvement, as current hypothesis generation methods do not fully uncover all relevant or meaningful patterns.
arXiv Detail & Related papers (2025-04-15T18:00:00Z)
ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition [67.26124739345332]
Large language models (LLMs) have demonstrated potential in assisting scientific research, yet their ability to discover high-quality research hypotheses remains unexamined. We introduce the first large-scale benchmark for evaluating LLMs with a near-sufficient set of sub-tasks of scientific discovery. We develop an automated framework that extracts critical components - research questions, background surveys, inspirations, and hypotheses - from scientific papers.
arXiv Detail & Related papers (2025-03-27T08:09:15Z)
Improving Scientific Hypothesis Generation with Knowledge Grounded Large Language Models [20.648157071328807]
Large language models (LLMs) can identify novel research directions by analyzing existing knowledge. LLMs are prone to generating "hallucinations", outputs that are plausible-sounding but factually incorrect. We propose KG-CoI, a system that enhances LLM hypothesis generation by integrating external, structured knowledge from knowledge graphs.
arXiv Detail & Related papers (2024-11-04T18:50:00Z)
Graph Stochastic Neural Process for Inductive Few-shot Knowledge Graph Completion [63.68647582680998]
We focus on a task called inductive few-shot knowledge graph completion (I-FKGC). Inspired by the idea of inductive reasoning, we cast I-FKGC as an inductive reasoning problem. We present a neural process-based hypothesis extractor that models the joint distribution of hypotheses, from which we can sample a hypothesis for predictions. In the second module, based on the hypothesis, we propose a graph attention-based predictor to test if the triple in the query set aligns with the extracted hypothesis.
arXiv Detail & Related papers (2024-08-03T13:37:40Z)
LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery [141.39722070734737]
We propose to enhance the knowledge-driven, abstract reasoning abilities of Large Language Models with the computational strength of simulations. We introduce Scientific Generative Agent (SGA), a bilevel optimization framework. We conduct experiments to demonstrate our framework's efficacy in law discovery and molecular design.
arXiv Detail & Related papers (2024-05-16T03:04:10Z)
Hypothesis Generation with Large Language Models [28.73562677221476]
We focus on hypothesis generation based on data (i.e., labeled examples). Inspired by multi-armed bandits, we design a reward function to inform the exploitation-exploration tradeoff in the update process. Our algorithm is able to generate hypotheses that enable much better predictive performance than few-shot prompting in classification tasks.
arXiv Detail & Related papers (2024-04-05T18:00:07Z)
Large Language Models are Zero Shot Hypothesis Proposers [17.612235393984744]
Large Language Models (LLMs) hold a wealth of global and interdisciplinary knowledge that promises to break down information barriers. We construct a dataset consisting of background knowledge and hypothesis pairs from biomedical literature. We evaluate the hypothesis generation capabilities of various top-tier instructed models in zero-shot, few-shot, and fine-tuning settings.
arXiv Detail & Related papers (2023-11-10T10:03:49Z)
Can Large Language Models Discern Evidence for Scientific Hypotheses? Case Studies in the Social Sciences [3.9985385067438344]
A strong hypothesis is a best guess based on existing evidence and informed by a comprehensive view of relevant literature. With the exponential increase in the number of scientific articles published annually, manual aggregation and synthesis of evidence related to a given hypothesis is a challenge. We share a novel dataset for the task of scientific hypothesis evidencing using community-driven annotations of studies in the social sciences.
arXiv Detail & Related papers (2023-09-07T04:15:17Z)
SciMON: Scientific Inspiration Machines Optimized for Novelty [68.46036589035539]
We explore and enhance the ability of neural language models to generate novel scientific directions grounded in literature. We take a dramatic departure with a novel setting in which models use as input background contexts. We present SciMON, a modeling framework that uses retrieval of "inspirations" from past scientific papers.
arXiv Detail & Related papers (2023-05-23T17:12:08Z)
The role of prior information and computational power in Machine Learning [0.0]
We discuss how prior information and computational power can be employed to solve a learning problem. We argue that employing high computational power has the advantage of a higher performance.
arXiv Detail & Related papers (2022-10-31T20:39:53Z)
SciFact-Open: Towards open-domain scientific claim verification [61.288725621156864]
We present SciFact-Open, a new test collection designed to evaluate the performance of scientific claim verification systems. We collect evidence for scientific claims by pooling and annotating the top predictions of four state-of-the-art scientific claim verification models. We find that systems developed on smaller corpora struggle to generalize to SciFact-Open, exhibiting performance drops of at least 15 F1.
arXiv Detail & Related papers (2022-10-25T05:45:00Z)
L2R2: Leveraging Ranking for Abductive Reasoning [65.40375542988416]
The abductive natural language inference task ($\alpha$NLI) is proposed to evaluate the abductive reasoning ability of a learning system. A novel $L2R2$ approach is proposed under the learning-to-rank framework. Experiments on the ART dataset reach the state-of-the-art in the public leaderboard.
arXiv Detail & Related papers (2020-05-22T15:01:23Z)
