PST-Bench: Tracing and Benchmarking the Source of Publications
- URL: http://arxiv.org/abs/2402.16009v1
- Date: Sun, 25 Feb 2024 06:56:43 GMT
- Title: PST-Bench: Tracing and Benchmarking the Source of Publications
- Authors: Fanjin Zhang, Kun Cao, Yukuo Cen, Jifan Yu, Da Yin, Jie Tang
- Abstract summary: We study the problem of paper source tracing (PST) and construct a high-quality and ever-increasing dataset PST-Bench in computer science.
Based on PST-Bench, we reveal several intriguing discoveries, such as the differing evolution patterns across various topics.
- Score: 39.250042251037144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tracing the source of research papers is a fundamental yet challenging task
for researchers. The billion-scale citation relations between papers hinder
researchers from understanding the evolution of science efficiently. To date,
there is still no accurate and scalable dataset, constructed by professional
researchers, that identifies the direct sources of the papers they study and on
which automatic algorithms could be built to expand our knowledge of how science
evolves. In this paper, we study the problem of paper source tracing (PST) and
construct a high-quality and ever-increasing dataset, PST-Bench, in computer
science. Based on PST-Bench, we reveal several intriguing discoveries, such as
the differing evolution patterns across various topics. An exploration of
various methods underscores the difficulty of PST-Bench and pinpoints potential
directions on this topic. The dataset and code are available at
https://github.com/THUDM/paper-source-trace.
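To make the task concrete, here is a minimal Python sketch on a toy citation graph: each paper's references are filtered down to those annotated as direct sources, and following only those edges yields a paper's evolutionary lineage instead of its full citation neighborhood. The paper IDs, the `REFERENCES` and `SOURCES` dictionaries, and both helper functions are hypothetical illustrations, not the PST-Bench data format or the authors' code.

```python
from collections import deque

# Hypothetical toy data (not from PST-Bench): paper -> papers it cites.
REFERENCES = {
    "P4": ["P1", "P2", "P3"],
    "P3": ["P1", "P2"],
    "P2": ["P1"],
    "P1": [],
}

# Hypothetical annotations: the subset of each paper's references
# marked as a direct source, i.e. the label PST aims to predict.
SOURCES = {
    "P4": {"P3"},
    "P3": {"P2"},
    "P2": {"P1"},
    "P1": set(),
}


def source_edges(references, sources):
    """Keep only the citation edges labeled as direct sources."""
    return {
        paper: [ref for ref in refs if ref in sources.get(paper, set())]
        for paper, refs in references.items()
    }


def trace_lineage(paper, edges):
    """Breadth-first walk over source edges; returns ancestors in visit order."""
    seen = {paper}
    order = []
    queue = deque([paper])
    while queue:
        current = queue.popleft()
        for src in edges.get(current, []):
            if src not in seen:
                seen.add(src)
                order.append(src)
                queue.append(src)
    return order


edges = source_edges(REFERENCES, SOURCES)
print(edges["P4"])                 # ['P3']
print(trace_lineage("P4", edges))  # ['P3', 'P2', 'P1']
```

In this toy graph P4 cites three papers but has a single annotated source, so its lineage is a chain of three source edges drawn from six citation edges. An automatic PST method would have to predict the `SOURCES` labels itself rather than read them from annotations.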
Related papers
- SciDQA: A Deep Reading Comprehension Dataset over Scientific Papers [20.273439120429025]
SciDQA is a new dataset for reading comprehension that challenges LLMs for a deep understanding of scientific articles.
Unlike other scientific QA datasets, SciDQA sources questions from peer reviews by domain experts and answers by paper authors.
Questions in SciDQA necessitate reasoning across figures, tables, equations, appendices, and supplementary materials.
(2024-11-08T05:28:22Z)
- Why Tabular Foundation Models Should Be a Research Priority [65.75744962286538]
Tabular data is the dominant modality in many fields, yet it is given hardly any research attention and significantly lags behind in terms of scale and power.
We believe the time is now to start developing tabular foundation models, or what we coin a Large Tabular Model (LTM).
(2024-05-02T10:05:16Z)
- Autonomous LLM-driven research from data to human-verifiable research papers [0.0]
We build data-to-paper, an automation platform that guides interacting LLM agents through a complete, stepwise research process.
Provided with annotated data alone, data-to-paper raised hypotheses, designed research plans, wrote and interpreted analysis code, and generated and interpreted results.
We demonstrate the potential for AI-driven acceleration of scientific discovery while enhancing traceability, transparency, and verifiability.
arXiv Detail & Related papers (2024-04-24T23:15:49Z) - A Diachronic Analysis of Paradigm Shifts in NLP Research: When, How, and
Why? [84.46288849132634]
We propose a systematic framework for analyzing the evolution of research topics in a scientific field using causal discovery and inference techniques.
We define three variables to encompass diverse facets of the evolution of research topics within NLP.
We utilize a causal discovery algorithm to unveil the causal connections among these variables using observational data.
arXiv Detail & Related papers (2023-05-22T11:08:00Z) - CiteBench: A benchmark for Scientific Citation Text Generation [69.37571393032026]
CiteBench is a benchmark for citation text generation.
We make the code for CiteBench publicly available at https://github.com/UKPLab/citebench.
arXiv Detail & Related papers (2022-12-19T16:10:56Z) - Navigating causal deep learning [78.572170629379]
Causal deep learning (CDL) is a new and important research area in the larger field of machine learning.
This paper categorises methods in causal deep learning beyond Pearl's ladder of causation.
Our paradigm is a tool that helps researchers find benchmarks, compare methods, and, most importantly, identify research gaps.
arXiv Detail & Related papers (2022-12-01T23:44:23Z) - Tell Me How to Survey: Literature Review Made Simple with Automatic
Reading Path Generation [16.07200776251764]
Gleaning papers worth reading from the massive literature, whether for a quick survey or to keep up with the latest advances on a specific research topic, has become a challenging task.
Existing academic search engines such as Google Scholar return relevant papers by computing the relevance between the query and each paper individually.
We introduce Reading Path Generation (RPG) which aims at automatically producing a path of papers to read for a given query.
arXiv Detail & Related papers (2021-10-12T20:58:46Z) - Paperswithtopic: Topic Identification from Paper Title Only [5.025654873456756]
We present a dataset of papers paired by title and sub-field from the field of artificial intelligence (AI).
We also present results on predicting a paper's AI sub-field from its title alone.
For the transformer models, we also present gradient-based, attention visualizations to further explain the model's classification process.
arXiv Detail & Related papers (2021-10-09T06:32:09Z) - Semi-Supervised Exaggeration Detection of Health Science Press Releases [23.930041685595775]
Recent studies have demonstrated a tendency of news media to misrepresent scientific papers by exaggerating their findings.
We formalize and study the problem of exaggeration detection in science communication.
We introduce MT-PET, a multi-task version of Pattern Exploiting Training (PET), which leverages knowledge from complementary cloze-style QA tasks to improve few-shot learning.
arXiv Detail & Related papers (2021-08-30T19:32:20Z) - CitationIE: Leveraging the Citation Graph for Scientific Information
Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state of the art.
(2021-06-03T03:00:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.