Can citations tell us about a paper's reproducibility? A case study of machine learning papers
- URL: http://arxiv.org/abs/2405.03977v1
- Date: Tue, 7 May 2024 03:29:11 GMT
- Title: Can citations tell us about a paper's reproducibility? A case study of machine learning papers
- Authors: Rochana R. Obadage, Sarah M. Rajtmajer, Jian Wu
- Abstract summary: Resource constraints and inadequate documentation can make running replications particularly challenging.
We introduce a sentiment analysis framework applied to citation contexts from papers involved in Machine Learning Reproducibility Challenges.
- Score: 3.5120846057971065
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The iterative character of work in machine learning (ML) and artificial intelligence (AI) and reliance on comparisons against benchmark datasets emphasize the importance of reproducibility in that literature. Yet, resource constraints and inadequate documentation can make running replications particularly challenging. Our work explores the potential of using downstream citation contexts as a signal of reproducibility. We introduce a sentiment analysis framework applied to citation contexts from papers involved in Machine Learning Reproducibility Challenges in order to interpret the positive or negative outcomes of reproduction attempts. Our contributions include training classifiers for reproducibility-related contexts and sentiment analysis, and exploring correlations between citation context sentiment and reproducibility scores. Study data, software, and an artifact appendix are publicly available at https://github.com/lamps-lab/ccair-ai-reproducibility .
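The pipeline the abstract describes can be pictured with a minimal sketch: classify the sentiment of each downstream citation context, aggregate per cited paper, and correlate the aggregate with a reproducibility score. The sentiment model, field names, and toy data below are illustrative assumptions, not the authors' trained classifiers (those are fine-tuned for reproducibility-related contexts; see the linked repository).

```python
# Minimal sketch: correlate citation-context sentiment with reproducibility scores.
# The off-the-shelf sentiment model and toy data stand in for the paper's classifiers.
from statistics import mean

from scipy.stats import pearsonr
from transformers import pipeline

# Hypothetical inputs: citation contexts grouped by cited paper, plus a
# reproducibility score per paper (e.g., from an ML Reproducibility Challenge report).
contexts = {
    "paper_a": ["We easily reproduced their reported accuracy.",
                "Their released code ran without modification."],
    "paper_b": ["We could not replicate the claimed gains.",
                "Key hyperparameters are missing from the paper."],
}
repro_scores = {"paper_a": 0.9, "paper_b": 0.2}

clf = pipeline("sentiment-analysis")  # generic default model, placeholder only

def signed_score(result):
    """Map a {label, score} prediction to a signed sentiment in [-1, 1]."""
    return result["score"] if result["label"] == "POSITIVE" else -result["score"]

papers = sorted(contexts)
avg_sentiment = [mean(signed_score(r) for r in clf(contexts[p])) for p in papers]
scores = [repro_scores[p] for p in papers]

r, p_value = pearsonr(avg_sentiment, scores)  # needs more than 2 papers to be meaningful
print(f"Pearson r = {r:.3f} (p = {p_value:.3f})")
```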
Related papers
- Automatic Classification of User Requirements from Online Feedback -- A Replication Study [0.0]
We replicate a previous NLP4RE study (baseline), which evaluated different deep learning models for requirement classification from user reviews.
We reproduced the original results using publicly released source code, thereby helping to strengthen the external validity of the baseline study.
Our findings revealed that baseline deep learning models, BERT and ELMo, exhibited good capabilities on an external dataset, and GPT-4o showed performance comparable to traditional baseline machine learning models.
arXiv Detail & Related papers (2025-07-29T06:52:27Z)
- Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models [53.17363502535395]
Trustworthy language models should provide both correct and verifiable answers.
Current systems insert citations by querying an external retriever at inference time.
We propose Active Indexing, which continually pretrains on synthetic QA pairs.
arXiv Detail & Related papers (2025-06-21T04:48:05Z)
- AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage [62.049868205196425]
AutoReproduce is a framework capable of automatically reproducing experiments described in research papers in an end-to-end manner.
Results show that AutoReproduce achieves an average performance gap of 22.1% on 89.74% of the executable experiment runs.
arXiv Detail & Related papers (2025-05-27T03:15:21Z)
- Interpreting Multilingual and Document-Length Sensitive Relevance Computations in Neural Retrieval Models through Axiomatic Causal Interventions [0.0]
This study analyzes and extends the paper "Axiomatic Causal Interventions for Reverse Engineering Relevance in Neural Retrieval Models".
We reproduce key experiments from the original paper, confirming that information on query terms is captured in the model encoding.
We extend this work by applying activation patching to Spanish and Chinese datasets and by exploring whether document-length information is encoded in the model as well.
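Activation patching, the interpretability technique this entry builds on, can be illustrated with a self-contained sketch: cache an intermediate activation from one input, substitute it into the forward pass of another input via a hook, and observe how the output shifts. The tiny model below is a stand-in, not the neural retrieval models studied in the paper.

```python
# Minimal sketch of activation patching with PyTorch forward hooks.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
layer = model[0]  # the site where we cache and patch activations

clean, corrupted = torch.randn(1, 8), torch.randn(1, 8)
cache = {}

def save_activation(module, inputs, output):
    cache["act"] = output.detach()  # returning None keeps the output as-is

def patch_activation(module, inputs, output):
    return cache["act"]  # returning a tensor replaces this layer's output

# 1) Run the clean input and cache the activation at the chosen layer.
handle = layer.register_forward_hook(save_activation)
model(clean)
handle.remove()

# 2) Re-run on the corrupted input, overwriting the layer's output with the cache.
handle = layer.register_forward_hook(patch_activation)
patched_out = model(corrupted)
handle.remove()

baseline_out = model(corrupted)
# A large shift suggests the patched layer carries information responsible
# for the behavioural difference between the two inputs.
print("patch effect:", torch.norm(patched_out - baseline_out).item())
```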
arXiv Detail & Related papers (2025-05-04T15:30:45Z)
- On the Capacity of Citation Generation by Large Language Models [38.47160164251295]
Retrieval-augmented generation (RAG) appears as a promising method to alleviate the "hallucination" problem in large language models (LLMs).
arXiv Detail & Related papers (2024-10-15T03:04:26Z)
- Analysis of Plan-based Retrieval for Grounded Text Generation [78.89478272104739]
Hallucinations occur when a language model is given a generation task outside its parametric knowledge.
A common strategy to address this limitation is to infuse the language models with retrieval mechanisms.
We analyze how planning can be used to guide retrieval to further reduce the frequency of hallucinations.
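The idea of letting a plan guide retrieval can be sketched in a few lines: decompose the task into sub-questions first, retrieve evidence for each sub-question, and only then generate from the assembled evidence. The keyword retriever and hard-coded plan below are toy stand-ins for the paper's planning and retrieval components.

```python
# Toy sketch of plan-guided retrieval: plan -> per-step retrieval -> grounded prompt.
corpus = [
    "Marie Curie won the Nobel Prize in Physics in 1903.",
    "Marie Curie won the Nobel Prize in Chemistry in 1911.",
    "The Nobel Prizes were established by Alfred Nobel's will.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents by word overlap with the query (stand-in for a real retriever)."""
    q = set(query.lower().split())
    ranked = sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return ranked[:k]

task = "Write a short bio of Marie Curie's Nobel Prizes."
# The plan decomposes the task into answerable sub-questions *before* generation,
# so every claim in the eventual text can be grounded in a retrieved passage.
plan = [
    "Which Nobel Prize did Marie Curie win in 1903?",
    "Which Nobel Prize did Marie Curie win in 1911?",
]

evidence = [passage for step in plan for passage in retrieve(step)]
prompt = f"Task: {task}\nEvidence:\n" + "\n".join(f"- {e}" for e in evidence)
print(prompt)  # this grounded prompt would then be handed to the language model
```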
arXiv Detail & Related papers (2024-08-20T02:19:35Z)
- C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
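The core trick, pairing correct with incorrect demonstrations in the prompt, is easy to picture. The sketch below builds a contrastive few-shot prompt for a toy entity-extraction task; the prompt format and examples are assumptions, not the paper's templates.

```python
# Sketch: building a contrastive in-context prompt from correct AND incorrect samples.
demos = [
    # (sentence, extraction, is_correct)
    ("Paris is the capital of France.", "LOC: Paris; LOC: France", True),
    ("Paris is the capital of France.", "PER: Paris", False),  # deliberate error
    ("Apple hired Tim Cook in 1998.", "ORG: Apple; PER: Tim Cook", True),
]

def build_prompt(query: str) -> str:
    lines = ["Extract entities. Learn from both correct and incorrect examples."]
    for sent, extraction, ok in demos:
        tag = "Correct" if ok else "Incorrect"
        lines.append(f"Sentence: {sent}\n{tag} extraction: {extraction}")
    lines.append(f"Sentence: {query}\nCorrect extraction:")
    return "\n\n".join(lines)

print(build_prompt("Berlin is in Germany."))  # send to an LLM of your choice
```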
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection [74.51523859064802]
We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG).
Self-RAG enhances an LM's quality and factuality through retrieval and self-reflection.
It significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks.
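A heavily simplified sketch of the retrieve-generate-critique loop: retrieve only when needed, then have a critic check whether the draft is supported by the retrieved passage before accepting it. All three components below are toy functions; Self-RAG itself trains a single LM to make these decisions via learned reflection tokens.

```python
# Toy sketch of a Self-RAG-style loop: retrieve on demand, generate, self-critique.

def needs_retrieval(query: str) -> bool:
    return "?" in query  # stand-in for the model's own retrieval decision

def retrieve(query: str) -> str:
    kb = {"curie": "Marie Curie won Nobel Prizes in Physics (1903) and Chemistry (1911)."}
    return next((v for k, v in kb.items() if k in query.lower()), "")

def generate(query: str, passage: str) -> str:
    return passage or "I don't know."  # stand-in for the LM's generation step

def supported(draft: str, passage: str) -> bool:
    """Critique step: is the draft grounded in the evidence? (toy word-overlap check)"""
    return bool(set(draft.lower().split()) & set(passage.lower().split()))

query = "Which prizes did Curie win?"
passage = retrieve(query) if needs_retrieval(query) else ""
draft = generate(query, passage)
answer = draft if supported(draft, passage) else "No supported answer found."
print(answer)
```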
arXiv Detail & Related papers (2023-10-17T18:18:32Z)
- Lessons in Reproducibility: Insights from NLP Studies in Materials Science [4.205692673448206]
We aim to comprehend these studies from a reproducibility perspective, acknowledging their significant influence on the field of materials informatics, rather than critiquing them.
Our study indicates that both papers offered thorough, tidy, and well-documented code, along with clear guidance for model evaluation.
We highlight areas for improvement, such as providing access to training data where copyright restrictions permit, greater transparency on model architecture and the training process, and specification of software dependency versions.
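The last recommendation, recording exact software dependency versions, is cheap to automate. A small sketch (the package list is illustrative, not drawn from either paper):

```python
# Sketch: record exact dependency versions alongside experiment outputs,
# so a replication can reconstruct the original environment.
import sys
from importlib.metadata import version, PackageNotFoundError

DEPENDENCIES = ["numpy", "torch", "transformers"]  # illustrative list

def environment_report() -> str:
    lines = [f"python=={sys.version.split()[0]}"]
    for pkg in DEPENDENCIES:
        try:
            lines.append(f"{pkg}=={version(pkg)}")
        except PackageNotFoundError:
            lines.append(f"# {pkg}: not installed")
    return "\n".join(lines)

print(environment_report())  # save next to your results, e.g. as versions.txt
```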
arXiv Detail & Related papers (2023-07-28T18:36:42Z)
- Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback [57.816210168909286]
We leverage recent progress on textual entailment models to address the problem of factual inconsistency in abstractive summarization systems.
We use reinforcement learning with reference-free, textual entailment rewards to optimize for factual consistency.
Our results, according to both automatic metrics and human evaluation, show that our method considerably improves the faithfulness, salience, and conciseness of the generated summaries.
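The reward signal is the key mechanism here: an NLI model scores whether the source document entails the generated summary, and that probability serves as a reference-free RL reward. A sketch, assuming an MNLI-style classifier from the Hugging Face hub (the model name is an assumption, not necessarily the authors' choice):

```python
# Sketch: reference-free factual-consistency reward from a textual entailment model.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli", top_k=None)

def entailment_reward(source: str, summary: str) -> float:
    """Reward = probability that the source entails the summary."""
    scores = nli({"text": source, "text_pair": summary})
    return next(s["score"] for s in scores if s["label"] == "ENTAILMENT")

src = "The company reported a 10% rise in quarterly revenue."
print(entailment_reward(src, "Revenue grew last quarter."))          # high reward
print(entailment_reward(src, "Revenue fell sharply last quarter."))  # low reward
# In RL fine-tuning, this score would serve as the per-sample reward.
```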
arXiv Detail & Related papers (2023-05-31T21:04:04Z)
- CiteBench: A benchmark for Scientific Citation Text Generation [69.37571393032026]
CiteBench is a benchmark for citation text generation.
We make the code for CiteBench publicly available at https://github.com/UKPLab/citebench.
arXiv Detail & Related papers (2022-12-19T16:10:56Z)
- Investigating Fairness Disparities in Peer Review: A Language Model Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protected attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z)
- No Pattern, No Recognition: a Survey about Reproducibility and Distortion Issues of Text Clustering and Topic Modeling [0.0]
Machine learning algorithms can be used to extract knowledge from unlabeled texts.
Unsupervised learning can produce variable results depending on the machine learning algorithm.
The presence of outliers and anomalies can be a determining factor.
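The variability this survey worries about is easy to demonstrate: run the same clustering algorithm with different random seeds and measure how much the resulting partitions disagree. A minimal sketch with scikit-learn:

```python
# Sketch: quantify run-to-run variability of an unsupervised algorithm.
from itertools import combinations

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, _ = make_blobs(n_samples=300, centers=5, cluster_std=3.0, random_state=0)

# Same data, same algorithm, different seeds: only the initialization changes.
labelings = [
    KMeans(n_clusters=5, n_init=1, random_state=seed).fit_predict(X)
    for seed in range(5)
]

# ARI = 1.0 means two runs produced identical partitions.
aris = [adjusted_rand_score(a, b) for a, b in combinations(labelings, 2)]
print(f"mean pairwise ARI across seeds: {sum(aris) / len(aris):.3f}")
```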
arXiv Detail & Related papers (2022-08-02T19:51:43Z)
- Predicting the Reproducibility of Social and Behavioral Science Papers Using Supervised Learning Models [21.69933721765681]
We propose a framework that extracts five types of features from scholarly work that can be used to support assessments of published research claims.
We analyze pairwise correlations between individual features and their importance for predicting a set of human-assessed ground truth labels.
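Both analyses in this summary, pairwise correlation with the ground truth and predictive feature importance, fit in a few lines. The feature names and synthetic data below are placeholders, not the paper's five feature types.

```python
# Sketch: correlate individual features with human-assessed reproducibility labels
# and rank them by importance in a supervised model. All data here is synthetic.
import numpy as np
from scipy.stats import pointbiserialr
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 200
features = {  # placeholder feature names, not the paper's actual feature types
    "n_authors": rng.integers(1, 10, n).astype(float),
    "n_citations": rng.integers(0, 500, n).astype(float),
    "has_code_link": rng.integers(0, 2, n).astype(float),
}
X = np.column_stack(list(features.values()))
y = (X[:, 2] + rng.normal(0, 0.5, n) > 0.5).astype(int)  # synthetic ground truth

for name, col in features.items():
    r, p = pointbiserialr(y, col)  # correlation of each feature with the binary label
    print(f"{name}: r={r:+.2f} (p={p:.3f})")

model = RandomForestClassifier(random_state=0).fit(X, y)
for name, imp in zip(features, model.feature_importances_):
    print(f"{name}: importance={imp:.2f}")
```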
arXiv Detail & Related papers (2021-04-08T00:45:20Z)
- "Let's Eat Grandma": When Punctuation Matters in Sentence Representation for Sentiment Analysis [13.873803872380229]
We argue that punctuation could play a significant role in sentiment analysis and propose a novel representation model to improve syntactic and contextual performance.
We conduct experiments on publicly available datasets and verify that our model can identify sentiment more accurately than other state-of-the-art baseline methods.
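The paper's premise, that preprocessing which strips punctuation can change predicted sentiment, is quick to probe on any off-the-shelf classifier (the classifier here is a generic default, not the paper's proposed representation model):

```python
# Sketch: does stripping punctuation change an off-the-shelf model's sentiment?
import string

from transformers import pipeline

clf = pipeline("sentiment-analysis")  # generic default model, not the paper's

def strip_punct(text: str) -> str:
    return text.translate(str.maketrans("", "", string.punctuation))

for sent in ["Let's eat, Grandma!", "Great... just great.", "Wow!!!"]:
    with_p, without_p = clf(sent)[0], clf(strip_punct(sent))[0]
    print(f"{sent!r}: {with_p['label']} ({with_p['score']:.2f}) vs "
          f"stripped: {without_p['label']} ({without_p['score']:.2f})")
# Disagreement between the two columns shows punctuation carrying sentiment signal.
```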
arXiv Detail & Related papers (2020-12-10T19:07:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.