Change Summarization of Diachronic Scholarly Paper Collections by
Semantic Evolution Analysis
- URL: http://arxiv.org/abs/2112.03634v1
- Date: Tue, 7 Dec 2021 11:15:19 GMT
- Title: Change Summarization of Diachronic Scholarly Paper Collections by
Semantic Evolution Analysis
- Authors: Naman Paharia, Muhammad Syafiq Mohd Pozi, Adam Jatowt
- Abstract summary: We demonstrate a novel approach to analyze the collections of research papers published over longer time periods.
Our approach is based on comparing word semantic representations over time and aims to support users in a better understanding of large domain-focused archives of scholarly publications.
- Score: 10.554831859741851
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The amount of scholarly data has been increasing dramatically over the last
years. For newcomers to a particular science domain (e.g., IR, physics, NLP) it
is often difficult to spot larger trends and to position the latest research in
the context of prior scientific achievements and breakthroughs. Similarly,
researchers in the history of science are interested in tools that allow them
to analyze and visualize changes in particular scientific domains. Temporal
summarization and related methods should then be useful for making sense of
large volumes of scientific discourse data aggregated over time. We demonstrate
a novel approach to analyze the collections of research papers published over
longer time periods to provide a high-level overview of important semantic
changes that occurred over time. Our approach is based on
comparing word semantic representations over time and aims to support users in
a better understanding of large domain-focused archives of scholarly
publications. As an example dataset we use the ACL Anthology Reference Corpus
that spans from 1979 to 2015 and contains 22,878 scholarly articles.
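The abstract does not spell out the method, but "comparing word semantic representations over time" is commonly done by training separate word embeddings per time slice, aligning the slices with orthogonal Procrustes, and ranking words by the cosine distance between their aligned vectors. The sketch below illustrates that general recipe with random toy data; the vocabulary, dimensions, and drift are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

def align_embeddings(base, other):
    # Orthogonal Procrustes: find rotation R minimizing ||other @ R - base||_F.
    # Rows are assumed to correspond to the same vocabulary in both periods.
    u, _, vt = np.linalg.svd(other.T @ base)
    return other @ (u @ vt)

def semantic_shift(base, other, vocab):
    # Cosine distance between a word's vector in the two periods after
    # alignment; a larger distance suggests greater semantic change.
    aligned = align_embeddings(base, other)
    shifts = {}
    for i, word in enumerate(vocab):
        a, b = base[i], aligned[i]
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        shifts[word] = 1.0 - cos
    return shifts

# Toy data: 6 words, 8-dim embeddings from two "time periods".
vocab = ["parsing", "corpus", "neural", "grammar", "lexicon", "syntax"]
rng = np.random.default_rng(0)
early = rng.normal(size=(6, 8))
late = early @ np.linalg.qr(rng.normal(size=(8, 8)))[0]  # pure rotation
late[2] += 2.0  # simulate semantic drift for "neural"
print(semantic_shift(early, late, vocab))
```

With most words stable, the Procrustes step recovers the rotation between the two spaces, so the deliberately drifted word surfaces with the largest shift score.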
Related papers
- A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery [68.48094108571432]
Large language models (LLMs) have revolutionized the way text and other modalities of data are handled.
We aim to provide a more holistic view of the research landscape by unveiling cross-field and cross-modal connections between scientific LLMs.
arXiv Detail & Related papers (2024-06-16T08:03:24Z)
- Mapping the Increasing Use of LLMs in Scientific Papers [99.67983375899719]
We conduct the first systematic, large-scale analysis across 950,965 papers published between January 2020 and February 2024 on the arXiv, bioRxiv, and Nature portfolio journals.
Our findings reveal a steady increase in LLM usage, with the largest and fastest growth observed in Computer Science papers.
arXiv Detail & Related papers (2024-04-01T17:45:15Z)
- SciMMIR: Benchmarking Scientific Multi-modal Information Retrieval [64.03631654052445]
Current benchmarks for evaluating MMIR performance in image-text pairing within the scientific domain show a notable gap.
We develop a specialised scientific MMIR benchmark by leveraging open-access paper collections.
This benchmark comprises 530K meticulously curated image-text pairs, extracted from figures and tables with detailed captions in scientific documents.
arXiv Detail & Related papers (2024-01-24T14:23:12Z)
- A Comprehensive Study of Groundbreaking Machine Learning Research: Analyzing highly cited and impactful publications across six decades [1.6442870218029522]
Machine learning (ML) has emerged as a prominent field of research in computer science and other related fields.
It is crucial to understand the landscape of highly cited publications to identify key trends, influential authors, and significant contributions made thus far.
arXiv Detail & Related papers (2023-08-01T21:43:22Z)
- A Diachronic Analysis of Paradigm Shifts in NLP Research: When, How, and Why? [84.46288849132634]
We propose a systematic framework for analyzing the evolution of research topics in a scientific field using causal discovery and inference techniques.
We define three variables to encompass diverse facets of the evolution of research topics within NLP.
We utilize a causal discovery algorithm to unveil the causal connections among these variables using observational data.
arXiv Detail & Related papers (2023-05-22T11:08:00Z)
- Modeling Information Change in Science Communication with Semantically Matched Paraphrases [50.67030449927206]
SPICED is the first paraphrase dataset of scientific findings annotated for degree of information change.
SPICED contains 6,000 scientific finding pairs extracted from news stories, social media discussions, and full texts of original papers.
Models trained on SPICED improve downstream performance on evidence retrieval for fact checking of real-world scientific claims.
arXiv Detail & Related papers (2022-10-24T07:44:38Z)
- CitationIE: Leveraging the Citation Graph for Scientific Information Extraction [89.33938657493765]
We use the citation graph of referential links between citing and cited papers.
We observe a sizable improvement in end-to-end information extraction over the state-of-the-art.
arXiv Detail & Related papers (2021-06-03T03:00:12Z)
- Early Indicators of Scientific Impact: Predicting Citations with Altmetrics [0.0]
We use altmetrics to predict the short-term and long-term citations that a scholarly publication could receive.
We build various classification and regression models and evaluate their performance, finding neural networks and ensemble models to perform best for these tasks.
arXiv Detail & Related papers (2020-12-25T16:25:07Z)
- Semantic and Relational Spaces in Science of Science: Deep Learning Models for Article Vectorisation [4.178929174617172]
We focus on document-level embeddings based on the semantic and relational aspects of articles, using Natural Language Processing (NLP) and Graph Neural Networks (GNNs)
Our results show that using NLP we can encode a semantic space of articles, while with GNN we are able to build a relational space where the social practices of a research community are also encoded.
arXiv Detail & Related papers (2020-11-05T14:57:41Z)
- Topic Space Trajectories: A case study on machine learning literature [0.0]
We present topic space trajectories, a structure that allows for the comprehensible tracking of research topics.
We show the applicability of our approach on a publication corpus spanning 50 years of machine learning research from 32 publication venues.
Our novel analysis method may be employed for paper classification, for the prediction of future research topics, and for the recommendation of fitting conferences and journals for submitting unpublished work.
arXiv Detail & Related papers (2020-10-23T10:53:42Z)
- Attention: to Better Stand on the Shoulders of Giants [34.5017808610466]
This paper develops an attention mechanism for the long-term scientific impact prediction.
It validates the method based on a real large-scale citation data set.
arXiv Detail & Related papers (2020-05-27T00:25:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.