Related papers: DRIFT: A Toolkit for Diachronic Analysis of Scientific Literature

URL: http://arxiv.org/abs/2107.01198v1
Date: Fri, 2 Jul 2021 17:33:25 GMT
Title: DRIFT: A Toolkit for Diachronic Analysis of Scientific Literature
Authors: Abheesht Sharma, Gunjan Chhablani, Harshit Pandey, Rajaswa Patil
Abstract summary: We open source DRIFT, which allows researchers to track research trends and development over the years. The analysis methods are collated from well-cited research works, with a few of our own methods added for good measure. To demonstrate the utility and efficacy of our tool, we perform a case study on the cs.CL corpus of the arXiv repository and draw inferences from the analysis methods.
Score: 0.7349727826230862
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this work, we present to the NLP community, and to the wider research community as a whole, an application for the diachronic analysis of research corpora. We open source an easy-to-use tool coined: DRIFT, which allows researchers to track research trends and development over the years. The analysis methods are collated from well-cited research works, with a few of our own methods added for good measure. Succinctly put, some of the analysis methods are: keyword extraction, word clouds, predicting declining/stagnant/growing trends using Productivity, tracking bi-grams using Acceleration plots, finding the Semantic Drift of words, tracking trends using similarity, etc. To demonstrate the utility and efficacy of our tool, we perform a case study on the cs.CL corpus of the arXiv repository and draw inferences from the analysis methods. The toolkit and the associated code are available here: https://github.com/rajaswa/DRIFT.

Related papers

Exploring the Garden of Forking Paths in Empirical Software Engineering Research: A Multiverse Analysis [3.6324565773746147]
We conduct a so-called multiverse analysis on a published empirical SE paper. We identify nine pivotal analytical decisions with at least one equally defensible alternative. The overwhelming majority produced qualitatively different, and sometimes even opposite, findings.
arXiv Detail & Related papers (2025-12-09T18:47:00Z)
MOLE: Metadata Extraction and Validation in Scientific Papers Using LLMs [54.5729817345543]
MOLE is a framework that automatically extracts metadata attributes from scientific papers covering datasets of languages other than Arabic. Our methodology processes entire documents across multiple input formats and incorporates robust validation mechanisms for consistent output.
arXiv Detail & Related papers (2025-05-26T10:31:26Z)
Manalyzer: End-to-end Automated Meta-analysis with Multi-agent System [48.093356587573666]
Meta-analysis is a systematic research methodology that synthesizes data from multiple existing studies to derive comprehensive conclusions. Traditional meta-analysis involves a complex multi-stage pipeline including literature retrieval, paper screening, and data extraction. We propose a multi-agent system, Manalyzer, which achieves end-to-end automated meta-analysis through tool calls.
arXiv Detail & Related papers (2025-05-22T07:25:31Z)
Automating Bibliometric Analysis with Sentence Transformers and Retrieval-Augmented Generation (RAG): A Pilot Study in Semantic and Contextual Search for Customized Literature Characterization for High-Impact Urban Research [2.1728621449144763]
Bibliometric analysis is essential for understanding research trends, scope, and impact in urban science. Traditional methods, relying on keyword searches, often fail to uncover valuable insights not explicitly stated in article titles or keywords. We leverage Generative AI models, specifically transformers and Retrieval-Augmented Generation (RAG), to automate and enhance bibliometric analysis.
arXiv Detail & Related papers (2024-10-08T05:13:27Z)
Automated Extraction and Maturity Analysis of Open Source Clinical Informatics Repositories from Scientific Literature [0.0]
This study introduces an automated methodology to bridge the gap by systematically extracting GitHub repository URLs from academic papers indexed in arXiv. Our approach encompasses querying the arXiv API for relevant papers, cleaning extracted GitHub URLs, fetching comprehensive repository information via the GitHub API, and analyzing repository maturity based on defined metrics such as stars, forks, open issues, and contributors.
arXiv Detail & Related papers (2024-03-20T17:06:51Z)
LLMs for Science: Usage for Code Generation and Data Analysis [0.07499722271664144]
Large language models (LLMs) have been touted to enable increased productivity in many areas of today's work life. It is still unclear how the potential of LLMs will materialise in research practice.
arXiv Detail & Related papers (2023-11-28T12:29:33Z)
Source-Free Collaborative Domain Adaptation via Multi-Perspective Feature Enrichment for Functional MRI Analysis [55.03872260158717]
Resting-state functional MRI (rs-fMRI) is increasingly employed in multi-site research to aid neurological disorder analysis. Many methods have been proposed to reduce fMRI heterogeneity between source and target domains, but acquiring source data is challenging due to privacy concerns and/or data storage burdens in multi-site studies. We design a source-free collaborative domain adaptation framework for fMRI analysis, where only a pretrained source model and unlabeled target data are accessible.
arXiv Detail & Related papers (2023-08-24T01:30:18Z)
PyRCA: A Library for Metric-based Root Cause Analysis [66.72542200701807]
PyRCA is an open-source machine learning library for Root Cause Analysis (RCA) in Artificial Intelligence for IT Operations (AIOps). It provides a holistic framework to uncover the complicated metric causal dependencies and automatically locate root causes of incidents.
arXiv Detail & Related papers (2023-06-20T09:55:10Z)
GAIA Search: Hugging Face and Pyserini Interoperability for NLP Training Data Exploration [97.68234051078997]
We discuss how Pyserini can be integrated with the Hugging Face ecosystem of open-source AI libraries and artifacts. We include a Jupyter Notebook-based walkthrough of the core interoperability features, available on GitHub. We present GAIA Search, a search engine built following previously laid out principles, giving access to four popular large-scale text collections.
arXiv Detail & Related papers (2023-06-02T12:09:59Z)
Deep Learning for Survival Analysis: A Review [7.016568778869699]
The influx of deep learning (DL) techniques into the field of survival analysis has led to substantial methodological progress. We conduct a systematic review of DL-based methods for time-to-event analysis, characterizing them according to both survival- and DL-related attributes.
arXiv Detail & Related papers (2023-05-24T09:56:20Z)
A Diachronic Analysis of Paradigm Shifts in NLP Research: When, How, and Why? [84.46288849132634]
We propose a systematic framework for analyzing the evolution of research topics in a scientific field using causal discovery and inference techniques. We define three variables to encompass diverse facets of the evolution of research topics within NLP. We utilize a causal discovery algorithm to unveil the causal connections among these variables using observational data.
arXiv Detail & Related papers (2023-05-22T11:08:00Z)
Research Trends and Applications of Data Augmentation Algorithms [77.34726150561087]
We identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in data augmentation literature. We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.
arXiv Detail & Related papers (2022-07-18T11:38:32Z)
Combining Feature and Instance Attribution to Detect Artifacts [62.63504976810927]
We propose methods to facilitate identification of training data artifacts. We show that this proposed training-feature attribution approach can be used to uncover artifacts in training data. We execute a small user study to evaluate whether these methods are useful to NLP researchers in practice.
arXiv Detail & Related papers (2021-07-01T09:26:13Z)
Topic Space Trajectories: A case study on machine learning literature [0.0]
We present topic space trajectories, a structure that allows for the comprehensible tracking of research topics. We show the applicability of our approach on a publication corpus spanning 50 years of machine learning research from 32 publication venues. Our novel analysis method may be employed for paper classification, for the prediction of future research topics, and for the recommendation of fitting conferences and journals for submitting unpublished work.
arXiv Detail & Related papers (2020-10-23T10:53:42Z)
