Learning to Generate Novel Scientific Directions with Contextualized Literature-based Discovery
- URL: http://arxiv.org/abs/2305.14259v3
- Date: Thu, 12 Oct 2023 16:10:51 GMT
- Title: Learning to Generate Novel Scientific Directions with Contextualized Literature-based Discovery
- Authors: Qingyun Wang, Doug Downey, Heng Ji, Tom Hope
- Abstract summary: Literature-Based Discovery aims to discover new scientific knowledge by mining papers and generating hypotheses.
We present a novel formulation of contextualized-LBD: generating scientific hypotheses in natural language, while grounding them in a context that controls the hypothesis search space.
Our evaluations reveal that GPT-4 tends to generate ideas with overall low technical depth and novelty, while our inspiration prompting approaches partially mitigate this issue.
- Score: 74.78803157606083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Literature-Based Discovery (LBD) aims to discover new scientific knowledge by
mining papers and generating hypotheses. Standard LBD is limited to predicting
pairwise relations between discrete concepts (e.g., drug-disease links), and
ignores critical contexts like experimental settings (e.g., a specific patient
population where a drug is evaluated) and background motivations (e.g., to find
drugs without specific side effects). We address these limitations with a novel
formulation of contextualized-LBD (C-LBD): generating scientific hypotheses in
natural language, while grounding them in a context that controls the
hypothesis search space. We present a modeling framework using retrieval of
"inspirations" from past scientific papers. Our evaluations reveal that GPT-4
tends to generate ideas with overall low technical depth and novelty, while our
inspiration prompting approaches partially mitigate this issue. Our work
represents a first step toward building language models that generate new ideas
derived from scientific literature.
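As a concrete illustration of the recipe the abstract describes, here is a minimal sketch of inspiration-augmented prompting: retrieve context-similar snippets from past papers, then condition generation on them. The bag-of-words retriever, toy corpus, and `generate` hook are hypothetical placeholders, not the authors' released code.

```python
# Sketch of inspiration-augmented hypothesis prompting (an illustration of
# the general recipe, not the paper's implementation).
import math
from collections import Counter

def bow(text: str) -> Counter:
    """Lower-cased bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_inspirations(context: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank past-paper snippets by similarity to the problem context."""
    q = bow(context)
    return sorted(corpus, key=lambda s: cosine(q, bow(s)), reverse=True)[:k]

def build_prompt(context: str, inspirations: list[str]) -> str:
    lines = [f"Background: {context}", "Inspirations from prior papers:"]
    lines += [f"- {s}" for s in inspirations]
    lines.append("Propose a new research direction grounded in the background.")
    return "\n".join(lines)

def generate(prompt: str) -> str:
    """Hypothetical hook; swap in any chat-completion client."""
    return "<hypothesis text>"

corpus = [
    "Contrastive pretraining improves low-resource NER.",
    "Retrieval augmentation reduces hallucination in QA.",
    "Graph neural networks model citation contexts.",
]
context = "Reduce hallucination when generating biomedical summaries."
print(generate(build_prompt(context, retrieve_inspirations(context, corpus))))
```

Grounding generation in the background context is what distinguishes C-LBD from classic pairwise link prediction: the context narrows the hypothesis search space before the model writes anything.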
Related papers
- Self-reflecting Large Language Models: A Hegelian Dialectical Approach [13.910371970437708]
Investigating NLP through a philosophical lens has recently caught researchers' attention, as it connects computational methods with classical schools of philosophy.
This paper introduces a philosophical approach inspired by the Hegelian dialectic for LLM self-reflection: a self-dialectical process that emulates internal critiques and then synthesizes new ideas by resolving the contradicting points.
Our experiments show promise in generating new ideas and provide a stepping stone for future research.
arXiv Detail & Related papers (2025-01-24T20:54:29Z)
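The self-dialectical loop this entry describes can be pictured with a short sketch: the model drafts a thesis, argues against it, and synthesizes a revision. The prompts and the `llm` stub are assumptions for illustration, not the paper's implementation.

```python
# Minimal thesis -> antithesis -> synthesis loop, assuming a generic
# text-generation hook; not the paper's actual prompts or agents.
def llm(prompt: str) -> str:
    """Hypothetical stand-in for any text-generation API."""
    return "<model output>"

def dialectical_refine(idea: str, rounds: int = 2) -> str:
    thesis = idea
    for _ in range(rounds):
        # Antithesis: the model critiques its own proposal.
        antithesis = llm(f"List the strongest objections to this idea:\n{thesis}")
        # Synthesis: resolve the contradiction into a revised idea.
        thesis = llm(
            "Revise the idea so that it answers the objections.\n"
            f"Idea: {thesis}\nObjections: {antithesis}"
        )
    return thesis

print(dialectical_refine("Use dialogue agents to critique scientific claims."))
```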
- Good Idea or Not, Representation of LLM Could Tell [86.36317971482755]
We focus on idea assessment, which aims to leverage the knowledge of large language models to assess the merit of scientific ideas.
We release a benchmark dataset drawn from nearly four thousand full-text manuscripts, meticulously designed for training and evaluating different approaches to this task.
Our findings suggest that the representations of large language models hold more potential in quantifying the value of ideas than their generative outputs.
arXiv Detail & Related papers (2024-09-07T02:07:22Z)
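A minimal sketch of the finding this entry points at: score ideas with a linear probe over frozen model embeddings rather than by judging sampled generations. The `embed` stand-in and the merit labels below are invented for illustration.

```python
# Linear probe over (stand-in) LLM embeddings for idea scoring; the
# embedding function and the labels are hypothetical.
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Toy stand-in for a pooled LLM hidden state."""
    v = np.zeros(dim)
    for w in text.lower().split():
        v[hash(w) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

ideas = ["ground hypotheses in experimental context",
         "use bigger batches",
         "retrieve inspirations from prior work",
         "rename the variables"]
merit = np.array([1.0, 0.2, 0.9, 0.1])           # hypothetical labels

X = np.stack([embed(t) for t in ideas])
w, *_ = np.linalg.lstsq(X, merit, rcond=None)    # fit the probe

print(float(embed("condition generation on context") @ w))  # predicted merit
```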
- A Survey on Natural Language Counterfactual Generation [7.022371235308068]
Natural language counterfactual generation aims to minimally modify a given text such that the modified text will be classified into a different class.
We propose a new taxonomy that systematically categorizes the generation methods into four groups and summarizes the metrics for evaluating the generation quality.
arXiv Detail & Related papers (2024-07-04T15:13:59Z)
- ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models [56.08917291606421]
ResearchAgent is an AI-based system for ideation and operationalization of novel work.
ResearchAgent automatically defines novel problems, proposes methods and designs experiments, while iteratively refining them.
We experimentally validate our ResearchAgent on scientific publications across multiple disciplines.
arXiv Detail & Related papers (2024-04-11T13:36:29Z)
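In the spirit of the iterative refinement described in the ResearchAgent entry above, a rough propose-review-refine loop might look like the sketch below; the prompts and the `llm` hook are assumptions, not the system's actual agents.

```python
# Rough propose-review-refine loop; prompts and the llm hook are assumed.
def llm(prompt: str) -> str:
    """Hypothetical text-generation hook."""
    return "<model output>"

def research_agent(abstracts: list[str], iterations: int = 3) -> dict:
    ctx = "\n".join(abstracts)
    plan = {"problem": llm(f"Define a novel problem given:\n{ctx}")}
    plan["method"] = llm(f"Propose a method for: {plan['problem']}")
    plan["experiment"] = llm(f"Design an experiment for: {plan['method']}")
    for _ in range(iterations):
        # A reviewing pass critiques the plan, then each part is revised.
        review = llm(f"Critique this research plan:\n{plan}")
        for part in plan:
            plan[part] = llm(f"Revise the {part} given this review:\n{review}")
    return plan

print(research_agent(["Abstract A ...", "Abstract B ..."]))
```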
- Grounded Intuition of GPT-Vision's Abilities with Scientific Images [44.44139684561664]
We formalize a process that many researchers have already been attempting instinctively in order to develop a "grounded intuition" of GPT-Vision.
We use our technique to examine alt text generation for scientific figures, finding that GPT-Vision is particularly sensitive to prompting.
Our method and analysis aim to help researchers ramp up their own grounded intuitions of new models while exposing how GPT-Vision can be applied to make information more accessible.
arXiv Detail & Related papers (2023-11-03T17:53:43Z)
- Large Language Models for Automated Open-domain Scientific Hypotheses Discovery [50.40483334131271]
This work proposes the first dataset for social science academic hypotheses discovery.
Unlike previous settings, the new dataset requires (1) using open-domain data (a raw web corpus) as observations; and (2) proposing hypotheses that are new even to humanity.
A multi-module framework is developed for the task, including three different feedback mechanisms to boost performance.
arXiv Detail & Related papers (2023-09-06T05:19:41Z)
- Exploring and Verbalizing Academic Ideas by Concept Co-occurrence [42.16213986603552]
This study devises a framework based on concept co-occurrence for academic idea inspiration.
We construct evolving concept graphs according to the co-occurrence relationship of concepts from 20 disciplines or topics.
We generate a description of an idea based on a new data structure called co-occurrence citation quintuple.
arXiv Detail & Related papers (2023-06-04T07:01:30Z)
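The "co-occurrence citation quintuple" in the entry above invites a small data-structure sketch: records tying two concepts to the paper where they co-occur, aggregated into a per-year concept graph. The exact five fields are a guess for illustration, not the paper's schema.

```python
# Hypothetical quintuple record plus an evolving co-occurrence graph.
from dataclasses import dataclass
from collections import defaultdict

@dataclass(frozen=True)
class CoOccurrenceQuintuple:
    concept_a: str
    concept_b: str
    paper_id: str
    year: int
    citation_text: str  # sentence linking the two concepts

# graph[year][(a, b)] -> number of papers pairing a and b that year
graph: dict = defaultdict(lambda: defaultdict(int))

records = [
    CoOccurrenceQuintuple("graph neural network", "drug repurposing",
                          "P1", 2021, "GNNs rank candidate drugs ..."),
    CoOccurrenceQuintuple("graph neural network", "molecule generation",
                          "P2", 2022, "GNN decoders emit molecules ..."),
]
for r in records:
    pair = tuple(sorted((r.concept_a, r.concept_b)))
    graph[r.year][pair] += 1

for year in sorted(graph):
    print(year, dict(graph[year]))
```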
- A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond [145.43029264191543]
Non-autoregressive (NAR) generation was first proposed in neural machine translation (NMT) to speed up inference.
While NAR generation can significantly accelerate machine translation inference, this speedup comes at the cost of translation accuracy relative to autoregressive (AR) generation.
Many new models and algorithms have been proposed to bridge the accuracy gap between NAR and AR generation.
arXiv Detail & Related papers (2022-04-20T07:25:22Z)
- Improving Adversarial Text Generation by Modeling the Distant Future [155.83051741029732]
We consider a text planning scheme and present a model-based imitation-learning approach to alleviate the short-sightedness of conventional next-word generation.
We propose a novel guider network to focus on the generative process over a longer horizon, which can assist next-word prediction and provide intermediate rewards for generator optimization.
arXiv Detail & Related papers (2020-05-04T05:45:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.