Improving Research Idea Generation Through Data: An Empirical Investigation in Social Science
- URL: http://arxiv.org/abs/2505.21396v1
- Date: Tue, 27 May 2025 16:23:42 GMT
- Title: Improving Research Idea Generation Through Data: An Empirical Investigation in Social Science
- Authors: Xiao Liu, Xinyi Dong, Xinyang Gao, Yansong Feng, Xun Pang
- Abstract summary: This paper explores how augmenting large language models with relevant data during the idea generation process can enhance the quality of generated ideas. We conduct experiments in the social science domain, specifically with climate negotiation topics, and find that metadata improves the feasibility of generated ideas by 20%. A human study shows that LLM-generated ideas, along with their related data and validation processes, inspire researchers to propose research ideas with higher quality.
- Score: 25.857554476782827
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent advancements in large language models (LLMs) have shown promise in generating novel research ideas. However, these ideas often face challenges related to feasibility and expected effectiveness. This paper explores how augmenting LLMs with relevant data during the idea generation process can enhance the quality of generated ideas. We introduce two ways of incorporating data: (1) providing metadata during the idea generation stage to guide LLMs toward feasible directions, and (2) adding automatic validation during the idea selection stage to assess the empirical plausibility of hypotheses within ideas. We conduct experiments in the social science domain, specifically with climate negotiation topics, and find that metadata improves the feasibility of generated ideas by 20%, while automatic validation improves the overall quality of selected ideas by 7%. A human study shows that LLM-generated ideas, along with their related data and validation processes, inspire researchers to propose research ideas with higher quality. Our work highlights the potential of data-driven research idea generation, and underscores the practical utility of LLM-assisted ideation in real-world academic settings.
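The abstract describes two data-incorporation points: metadata injected at the generation stage and automatic validation at the selection stage. The following is a minimal hypothetical sketch of that pipeline shape, not the authors' code; `call_llm` and `validate_on_data` are stand-in stubs for an LLM call and an empirical hypothesis check.

```python
# Hypothetical sketch of the paper's two data-incorporation points:
# (1) dataset metadata in the generation prompt, (2) automatic
# validation of hypotheses at the selection stage. All names are
# illustrative stubs, not the authors' implementation.

def call_llm(prompt: str) -> list[str]:
    # Stub: a real system would query an LLM here.
    return [f"idea grounded in: {prompt[:40]}"]

def validate_on_data(idea: str) -> float:
    # Stub: the paper runs empirical checks of an idea's hypothesis
    # against the data; here we return a placeholder plausibility score.
    return 0.8 if "grounded" in idea else 0.2

def generate_and_select(topic: str, metadata: dict) -> str:
    # Stage 1: include dataset metadata so generated ideas stay feasible.
    prompt = f"Topic: {topic}. Available data: {metadata}"
    ideas = call_llm(prompt)
    # Stage 2: keep the idea whose hypothesis best survives validation.
    return max(ideas, key=validate_on_data)

best = generate_and_select("climate negotiation", {"variables": ["GDP", "emissions"]})
```

The point of the structure is that data enters twice: once as context that constrains generation, and once as an empirical filter over the candidates.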
Related papers
- Harnessing Large Language Models for Scientific Novelty Detection [49.10608128661251]
We propose to harness large language models (LLMs) for scientific novelty detection (ND). To capture idea conception, we propose to train a lightweight retriever by distilling the idea-level knowledge from LLMs. Experiments show our method consistently outperforms others on the proposed benchmark datasets for idea retrieval and ND tasks.
arXiv Detail & Related papers (2025-05-30T14:08:13Z)
- IdeaBench: Benchmarking Large Language Models for Research Idea Generation [19.66218274796796]
Large Language Models (LLMs) have transformed how people interact with artificial intelligence (AI) systems.
We propose IdeaBench, a benchmark system that includes a comprehensive dataset and an evaluation framework.
Our dataset comprises titles and abstracts from a diverse range of influential papers, along with their referenced works.
Our evaluation framework is a two-stage process: first, using GPT-4o to rank ideas based on user-specified quality indicators such as novelty and feasibility, enabling scalable personalization.
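The first stage described above, ranking ideas with an LLM judge against user-specified quality indicators, can be sketched as follows. This is an assumed shape only; `score_idea` is a placeholder for a GPT-4o judging call, not IdeaBench's actual scorer.

```python
# Hedged sketch of ranking candidate ideas by a user-specified quality
# indicator, in the spirit of IdeaBench's first evaluation stage.
# `score_idea` is a stand-in for an LLM judge, not the benchmark's code.

def score_idea(idea: str, indicator: str) -> float:
    # Stub scorer: a real pipeline would prompt an LLM judge with the
    # idea and the indicator (e.g. "novelty", "feasibility").
    return float(len(idea)) if indicator == "novelty" else 1.0

def rank_ideas(ideas: list[str], indicator: str) -> list[str]:
    # Best-first ordering under the chosen indicator.
    return sorted(ideas, key=lambda i: score_idea(i, indicator), reverse=True)

ranked = rank_ideas(["short idea", "a much more elaborated idea"], "novelty")
```

Making the indicator a parameter is what enables the "scalable personalization" the summary mentions: the same ranking loop serves any user-chosen criterion.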
arXiv Detail & Related papers (2024-10-31T17:04:59Z)
- Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents [64.64280477958283]
An exponential increase in scientific literature makes it challenging for researchers to stay current with recent advances and identify meaningful research directions.
Recent developments in large language models(LLMs) suggest a promising avenue for automating the generation of novel research ideas.
We propose a Chain-of-Ideas(CoI) agent, an LLM-based agent that organizes relevant literature in a chain structure to effectively mirror the progressive development in a research domain.
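A chain-structured organization of literature, as the CoI summary describes, can be pictured as papers linked in order of a domain's progression. The sketch below is a guess at the data-structure idea only, with illustrative paper titles; it is not the CoI agent's implementation.

```python
# Minimal sketch of a chain-structured literature organization, in the
# spirit of the Chain-of-Ideas agent: papers linked chronologically so
# an LLM can read the progressive development of a research line.
# The structure and titles are illustrative, not the agent's code.

from dataclasses import dataclass

@dataclass
class PaperNode:
    title: str
    year: int
    next: "PaperNode | None" = None

def build_chain(papers: list[tuple[str, int]]) -> PaperNode:
    # Order papers chronologically and link each to its successor.
    ordered = sorted(papers, key=lambda p: p[1])
    head = PaperNode(*ordered[0])
    node = head
    for title, year in ordered[1:]:
        node.next = PaperNode(title, year)
        node = node.next
    return head

def chain_titles(head: PaperNode) -> list[str]:
    # Walk the chain from its oldest paper to its newest.
    titles, node = [], head
    while node is not None:
        titles.append(node.title)
        node = node.next
    return titles

order = chain_titles(build_chain([("Transformer", 2017), ("GPT-3", 2020), ("BERT", 2018)]))
```

The chain mirrors how a human surveys a field: each node's successor is the next step in the development, which is the structure the agent reportedly exploits when proposing the following idea.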
arXiv Detail & Related papers (2024-10-17T03:26:37Z)
- Good Idea or Not, Representation of LLM Could Tell [86.36317971482755]
We focus on idea assessment, which aims to leverage the knowledge of large language models to assess the merit of scientific ideas.
We release a benchmark dataset from nearly four thousand manuscript papers with full texts, meticulously designed to train and evaluate the performance of different approaches to this task.
Our findings suggest that the representations of large language models hold more potential in quantifying the value of ideas than their generative outputs.
arXiv Detail & Related papers (2024-09-07T02:07:22Z)
- Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers [90.26363107905344]
Large language models (LLMs) have sparked optimism about their potential to accelerate scientific discovery.
No evaluations have shown that LLM systems can take the very first step of producing novel, expert-level ideas.
arXiv Detail & Related papers (2024-09-06T08:25:03Z)
- Data-Centric AI in the Age of Large Language Models [51.20451986068925]
This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs)
We make the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs.
We identify four specific scenarios centered around data, covering data-centric benchmarks and data curation, data attribution, knowledge transfer, and inference contextualization.
arXiv Detail & Related papers (2024-06-20T16:34:07Z)
- Towards Data-Centric Automatic R&D [17.158255487686997]
Researchers often seek potential research directions by reading papers and then verify them through experiments.
The data-driven black-box deep learning method has demonstrated its effectiveness in a wide range of real-world scenarios.
We propose a Real-world Data-centric automatic R&D Benchmark, namely RD2Bench.
arXiv Detail & Related papers (2024-04-17T11:33:21Z)
- ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is an AI-based system for ideation and operationalization of novel work. ResearchAgent automatically defines novel problems, proposes methods and designs experiments, while iteratively refining them. We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.