Accelerating Social Science Research via Agentic Hypothesization and Experimentation
- URL: http://arxiv.org/abs/2602.07983v1
- Date: Sun, 08 Feb 2026 14:20:56 GMT
- Title: Accelerating Social Science Research via Agentic Hypothesization and Experimentation
- Authors: Jishu Sen Gupta, Harini SI, Somesh Kumar Singh, Syed Mohamad Tawseeq, Yaman Kumar Singla, David Doermann, Rajiv Ratn Shah, Balaji Krishnamurthy
- Abstract summary: EXPERIGEN is a framework that operationalizes end-to-end discovery through a Bayesian optimization inspired two-phase search. It consistently discovers 2-4x more statistically significant hypotheses that are 7-17 percent more predictive than prior approaches. We conduct the first A/B test of LLM-generated hypotheses, observing statistically significant results with p less than 1e-6 and a large effect size of 344 percent.
- Score: 33.55093074029515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data-driven social science research is inherently slow, relying on iterative cycles of observation, hypothesis generation, and experimental validation. While recent data-driven methods promise to accelerate parts of this process, they largely fail to support end-to-end scientific discovery. To address this gap, we introduce EXPERIGEN, an agentic framework that operationalizes end-to-end discovery through a Bayesian optimization inspired two-phase search, in which a Generator proposes candidate hypotheses and an Experimenter evaluates them empirically. Across multiple domains, EXPERIGEN consistently discovers 2-4x more statistically significant hypotheses that are 7-17 percent more predictive than prior approaches, and naturally extends to complex data regimes including multimodal and relational datasets. Beyond statistical performance, hypotheses must be novel, empirically grounded, and actionable to drive real scientific progress. To evaluate these qualities, we conduct an expert review of machine-generated hypotheses, collecting feedback from senior faculty. Among 25 reviewed hypotheses, 88 percent were rated moderately or strongly novel, 70 percent were deemed impactful and worth pursuing, and most demonstrated rigor comparable to senior graduate-level research. Finally, recognizing that ultimate validation requires real-world evidence, we conduct the first A/B test of LLM-generated hypotheses, observing statistically significant results with p less than 1e-6 and a large effect size of 344 percent.
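The abstract reports a p-value and a relative effect size for the A/B test but does not say which test or outcome metric was used. As a minimal sketch of how such figures are commonly obtained, assuming a binary outcome (for example, a click) and a pooled two-proportion z-test, the following computes a relative lift and a two-sided p-value. The function name and the counts are hypothetical and chosen only for illustration; they are not the paper's data or method.

```python
import math


def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Compare control A with treatment B on a binary outcome.

    Returns (relative_lift, z, two_sided_p), using the pooled
    two-proportion z-test under the normal approximation.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled success rate under the null hypothesis of no difference.
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Relative lift of B over A; 1.25 means a 125 percent increase.
    relative_lift = (p_b - p_a) / p_a
    return relative_lift, z, p_value


# Hypothetical counts, not taken from the paper.
lift, z, p = two_proportion_ztest(success_a=40, n_a=2000, success_b=90, n_b=2000)
print(f"relative lift = {lift:.0%}, z = {z:.2f}, p = {p:.2e}")
```

Under this reading, an effect size of 344 percent would correspond to a relative lift of 3.44, meaning the treatment rate is 4.44 times the control rate.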
Related papers
- Principle-Evolvable Scientific Discovery via Uncertainty Minimization [9.216546947535244]
We present PiEvo, a principle-evolvable framework that treats scientific discovery as Bayesian optimization over an expanding principle space. PiEvo achieves an average solution quality of up to 90.81%-93.15%, representing a 29.7%-31.1% improvement over the state-of-the-art.
arXiv Detail & Related papers (2026-02-06T07:19:27Z)
- FIRE-Bench: Evaluating Agents on the Rediscovery of Scientific Insights [63.32178443510396]
We introduce FIRE-Bench (Full-cycle Insight Rediscovery Evaluation), a benchmark that evaluates agents through the rediscovery of established findings. Even the strongest agents achieve limited rediscovery success (50 F1), exhibit high variance across runs, and display recurring failure modes in experimental design, execution, and evidence-based reasoning.
arXiv Detail & Related papers (2026-02-02T23:21:13Z)
- HARPA: A Testability-Driven, Literature-Grounded Framework for Research Ideation [29.9491787481972]
HARPA is a tool to generate hypotheses that are both testable and grounded in the scientific literature. Our evaluations show that HARPA-generated hypothesis-driven research proposals perform comparably to a strong baseline AI-researcher. When tested with the ASD agent (CodeScientist), HARPA produced more successful executions (20 vs. 11 out of 40) and fewer failures (16 vs. 21 out of 40).
arXiv Detail & Related papers (2025-10-01T07:52:19Z)
- Bayes-Entropy Collaborative Driven Agents for Research Hypotheses Generation and Optimization [4.469102316542763]
This paper proposes a multi-agent collaborative framework called HypoAgents. It generates hypotheses through diversity sampling and establishes prior beliefs. It then employs retrieval-augmented generation (RAG) to gather external literature evidence. It identifies high-uncertainty hypotheses using information entropy $H = -\sum_i p_i \log p_i$ and actively refines them.
arXiv Detail & Related papers (2025-08-03T13:05:32Z)
- Open-ended Scientific Discovery via Bayesian Surprise [63.26412847240136]
AutoDS is a method for open-ended scientific discovery that drives scientific exploration using Bayesian surprise. We evaluate AutoDS in the setting of data-driven discovery across 21 real-world datasets spanning domains such as biology, economics, finance, and behavioral science.
arXiv Detail & Related papers (2025-06-30T22:53:59Z)
- MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback [136.27567671480156]
We introduce experiment-guided ranking, which prioritizes hypotheses based on feedback from prior tests. We frame experiment-guided ranking as a sequential decision-making problem. Our approach significantly outperforms pre-experiment baselines and strong ablations.
arXiv Detail & Related papers (2025-05-23T13:24:50Z)
- Prediction-Powered Causal Inferences [59.98498488132307]
We focus on Prediction-Powered Causal Inferences (PPCI). We first show that conditional calibration guarantees valid PPCI at population level. We then introduce a sufficient representation constraint transferring validity across experiments.
arXiv Detail & Related papers (2025-02-10T10:52:17Z)
- Literature Meets Data: A Synergistic Approach to Hypothesis Generation [24.98928229927995]
We develop the first method that combines literature-based insights with data to perform hypothesis generation. We also conduct the first human evaluation to assess the utility of LLM-generated hypotheses in assisting human decision-making.
arXiv Detail & Related papers (2024-10-22T18:00:00Z)
- Large Language Models for Automated Open-domain Scientific Hypotheses Discovery [50.40483334131271]
This work proposes the first dataset for social science academic hypotheses discovery.
Unlike previous settings, the new dataset requires (1) using open-domain data (raw web corpus) as observations; and (2) proposing hypotheses even new to humanity.
A multi-module framework is developed for the task, including three different feedback mechanisms to boost performance.
arXiv Detail & Related papers (2023-09-06T05:19:41Z)
- A Double Machine Learning Approach to Combining Experimental and Observational Data [58.05402364136958]
We propose a double machine learning approach to combine experimental and observational studies. Our framework proposes a falsification test for external validity and ignorability under milder assumptions.
arXiv Detail & Related papers (2023-07-04T02:53:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
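The HypoAgents entry in the list above mentions selecting high-uncertainty hypotheses by information entropy, H = -sum_i p_i log p_i. A minimal sketch of that quantity follows, assuming each hypothesis carries a single belief probability of being true, so the entropy is taken over the two outcomes (true, false). The hypothesis labels and belief values are hypothetical, and this is not HypoAgents' actual implementation.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy H = -sum_i p_i * log(p_i), in nats.

    Inputs are normalized to sum to 1; zero-probability outcomes
    contribute nothing, following the convention 0 * log(0) = 0.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    return -sum((p / total) * math.log(p / total) for p in probs if p > 0)


# Hypothetical belief that each hypothesis is true.
beliefs = {"h1": 0.95, "h2": 0.55, "h3": 0.10}

# Entropy of the (true, false) distribution for each hypothesis.
uncertainty = {name: shannon_entropy([p, 1 - p]) for name, p in beliefs.items()}

# The hypothesis whose belief is closest to 0.5 has the highest entropy.
most_uncertain = max(uncertainty, key=uncertainty.get)
print(most_uncertain, round(uncertainty[most_uncertain], 3))
```

Entropy peaks when the belief is 0.5 and falls to zero as the belief approaches 0 or 1, which is why it can serve as a score for deciding which hypotheses to refine next.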