Causal Knowledge Extraction from Scholarly Papers in Social Sciences
- URL: http://arxiv.org/abs/2006.08904v1
- Date: Tue, 16 Jun 2020 03:37:40 GMT
- Title: Causal Knowledge Extraction from Scholarly Papers in Social Sciences
- Authors: Victor Zitian Chen, Felipe Montano-Campos and Wlodek Zadrozny
- Abstract summary: We develop models to classify sentences in scholarly documents in business and management.
We identify hypotheses from these papers, and extract the cause-and-effect entities.
Our approach may be generalizable to scholarly documents in a wide range of social sciences.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The scale and scope of scholarly articles today are overwhelming human
researchers who seek to digest and synthesize knowledge in a timely manner. In this paper,
we seek to develop natural language processing (NLP) models to accelerate the
speed of extraction of relationships from scholarly papers in social sciences,
identify hypotheses from these papers, and extract the cause-and-effect
entities. Specifically, we develop models to 1) classify sentences in scholarly
documents in business and management as hypotheses (hypothesis classification),
2) classify these hypotheses as causal relationships or not (causality
classification), and, if they are causal, 3) extract the cause and effect
entities from these hypotheses (entity extraction). We achieved high
performance on all three tasks using different modeling techniques. Our
approach may be generalizable to scholarly documents in a wide range of social
sciences, as well as other types of textual materials.
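The abstract describes a three-stage pipeline: hypothesis classification, causality classification, and cause/effect entity extraction. The following is a minimal sketch of that pipeline's control flow; the rule-based stand-ins for each stage (cue-phrase regexes and the example sentence) are illustrative assumptions, not the authors' actual trained models.

```python
# Sketch of the three-stage pipeline: (1) hypothesis classification,
# (2) causality classification, (3) cause/effect entity extraction.
# Each stage below is a crude rule-based stand-in for the paper's models.
import re

CAUSAL_CUES = r"\b(leads to|causes|increases|decreases|results in)\b"

def classify_hypothesis(sentence: str) -> bool:
    """Stage 1 (stand-in): flag sentences phrased as hypotheses."""
    return bool(re.search(
        r"\b(hypothesi[sz]e|we propose that|is associated with|leads to)\b",
        sentence, re.IGNORECASE))

def classify_causal(sentence: str) -> bool:
    """Stage 2 (stand-in): flag hypotheses that assert a causal link."""
    return bool(re.search(CAUSAL_CUES, sentence, re.IGNORECASE))

def extract_entities(sentence: str):
    """Stage 3 (stand-in): split on the causal cue to get (cause, effect)."""
    m = re.search(r"^(.*?)" + CAUSAL_CUES + r"(.*)$", sentence, re.IGNORECASE)
    if not m:
        return None
    cause = re.sub(r"^(we (propose|hypothesi[sz]e) that)\s*", "",
                   m.group(1).strip(), flags=re.IGNORECASE)
    return cause, m.group(3).strip()

def pipeline(sentence: str):
    """Run the three stages in sequence; return (cause, effect) or None."""
    if not classify_hypothesis(sentence):
        return None
    if not classify_causal(sentence):
        return None
    return extract_entities(sentence)

print(pipeline("We propose that CEO tenure leads to firm performance."))
# → ('CEO tenure', 'firm performance.')
```

In the paper, each stage would be a learned classifier or extractor rather than a regex, but the cascade structure — only causal hypotheses reach entity extraction — is the same.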
Related papers
- Language agents achieve superhuman synthesis of scientific knowledge [0.7635132958167216]
PaperQA2 is a frontier language-model agent optimized for improved factuality, and it matches or exceeds subject-matter expert performance.
PaperQA2 writes cited, Wikipedia-style summaries of scientific topics that are significantly more accurate than existing, human-written Wikipedia articles.
We apply PaperQA2 to identify contradictions within the scientific literature, an important scientific task that is challenging for humans.
arXiv Detail & Related papers (2024-09-10T16:37:58Z)
- Hypothesizing Missing Causal Variables with LLMs [55.28678224020973]
We formulate a novel task where the input is a partial causal graph with missing variables, and the output is a hypothesis about the missing variables to complete the partial graph.
We show the strong ability of LLMs to hypothesize the mediation variables between a cause and its effect.
We also observe surprising results where some of the open-source models outperform the closed GPT-4 model.
arXiv Detail & Related papers (2024-09-04T10:37:44Z)
- Understanding Fine-grained Distortions in Reports of Scientific Findings [46.96512578511154]
Distorted science communication harms individuals and society as it can lead to unhealthy behavior change and decrease trust in scientific institutions.
Given the rapidly increasing volume of science communication in recent years, a fine-grained understanding of how findings from scientific publications are reported to the general public is crucial.
arXiv Detail & Related papers (2024-02-19T19:00:01Z)
- Neural Causal Abstractions [63.21695740637627]
We develop a new family of causal abstractions by clustering variables and their domains.
We show that such abstractions are learnable in practical settings through Neural Causal Models.
Our experiments support the theory and illustrate how to scale causal inferences to high-dimensional settings involving image data.
arXiv Detail & Related papers (2024-01-05T02:00:27Z)
- Can Large Language Models Discern Evidence for Scientific Hypotheses? Case Studies in the Social Sciences [3.9985385067438344]
A strong hypothesis is a best guess based on existing evidence and informed by a comprehensive view of relevant literature.
With the exponential increase in the number of scientific articles published annually, manually aggregating and synthesizing the evidence related to a given hypothesis is a challenge.
We share a novel dataset for the task of scientific hypothesis evidencing using community-driven annotations of studies in the social sciences.
arXiv Detail & Related papers (2023-09-07T04:15:17Z)
- Large Language Models for Automated Open-domain Scientific Hypotheses Discovery [50.40483334131271]
This work proposes the first dataset for social science academic hypotheses discovery.
Unlike previous settings, the new dataset requires (1) using open-domain data (raw web corpus) as observations; and (2) proposing hypotheses even new to humanity.
A multi-module framework is developed for the task, including three different feedback mechanisms to boost performance.
arXiv Detail & Related papers (2023-09-06T05:19:41Z)
- Improving Primary Healthcare Workflow Using Extreme Summarization of Scientific Literature Based on Generative AI [8.901148687545103]
Our objective is to investigate the potential of generative artificial intelligence in diminishing the cognitive load experienced by practitioners.
Our research demonstrates that the use of generative AI for literature review is efficient and effective.
arXiv Detail & Related papers (2023-07-24T21:42:27Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
- Low Resource Recognition and Linking of Biomedical Concepts from a Large Ontology [30.324906836652367]
PubMed, the most well-known database of biomedical papers, relies on human curators to add concept annotations.
Our approach achieves new state-of-the-art results for the UMLS in both traditional recognition/linking and semantic indexing-based evaluation.
arXiv Detail & Related papers (2021-01-26T06:41:12Z)
- Explaining Relationships Between Scientific Documents [55.23390424044378]
We address the task of explaining relationships between two scientific documents using natural language text.
In this paper we establish a dataset of 622K examples from 154K documents.
arXiv Detail & Related papers (2020-02-02T03:54:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.