Can citations tell us about a paper's reproducibility? A case study of machine learning papers
- URL: http://arxiv.org/abs/2405.03977v1
- Date: Tue, 7 May 2024 03:29:11 GMT
- Title: Can citations tell us about a paper's reproducibility? A case study of machine learning papers
- Authors: Rochana R. Obadage, Sarah M. Rajtmajer, Jian Wu
- Abstract summary: Resource constraints and inadequate documentation can make running replications particularly challenging.
We introduce a sentiment analysis framework applied to citation contexts from papers involved in Machine Learning Reproducibility Challenges.
- Score: 3.5120846057971065
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The iterative character of work in machine learning (ML) and artificial intelligence (AI) and reliance on comparisons against benchmark datasets emphasize the importance of reproducibility in that literature. Yet, resource constraints and inadequate documentation can make running replications particularly challenging. Our work explores the potential of using downstream citation contexts as a signal of reproducibility. We introduce a sentiment analysis framework applied to citation contexts from papers involved in Machine Learning Reproducibility Challenges in order to interpret the positive or negative outcomes of reproduction attempts. Our contributions include training classifiers for reproducibility-related contexts and sentiment analysis, and exploring correlations between citation context sentiment and reproducibility scores. Study data, software, and an artifact appendix are publicly available at https://github.com/lamps-lab/ccair-ai-reproducibility .
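The pipeline the abstract describes, scoring the sentiment of downstream citation contexts and correlating it with reproducibility outcomes, can be sketched roughly as follows. This is a minimal, illustrative toy, not the paper's actual classifiers: the sentiment lexicon, the citation contexts, and the reproducibility scores are all hypothetical placeholders.

```python
# Toy sketch: lexicon-based sentiment over citation contexts, then a
# Pearson correlation against per-paper reproducibility scores.
# The lexicon and all data below are illustrative, not from the study.
POSITIVE = {"reproduce", "confirm", "consistent", "matches", "verified"}
NEGATIVE = {"fail", "failed", "unable", "discrepancy", "inconsistent"}

def context_sentiment(context: str) -> int:
    """Return +1, -1, or 0 for one citation context (toy lexicon vote)."""
    words = set(context.lower().split())
    pos, neg = len(words & POSITIVE), len(words & NEGATIVE)
    return (pos > neg) - (neg > pos)

def mean_sentiment(contexts):
    """Average sentiment across all citation contexts for one paper."""
    return sum(context_sentiment(c) for c in contexts) / len(contexts)

def pearson(xs, ys):
    """Pearson correlation coefficient, computed directly."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical papers: (citation contexts, reproducibility score).
papers = {
    "paper_a": (["we reproduce their results", "findings are consistent"], 0.9),
    "paper_b": (["we were unable to replicate this", "a large discrepancy"], 0.2),
}
sentiments = [mean_sentiment(ctxs) for ctxs, _ in papers.values()]
scores = [score for _, score in papers.values()]
print(pearson(sentiments, scores))
```

In the paper itself the lexicon lookup would be replaced by trained classifiers for reproducibility-related contexts and sentiment; this sketch only shows the shape of the correlation analysis.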
Related papers
- On the Capacity of Citation Generation by Large Language Models [38.47160164251295]
Retrieval-augmented generation (RAG) is a promising method for alleviating the "hallucination" problem in large language models (LLMs).
arXiv Detail & Related papers (2024-10-15T03:04:26Z)
- Analysis of Plan-based Retrieval for Grounded Text Generation [78.89478272104739]
Hallucinations occur when a language model is given a generation task outside its parametric knowledge.
A common strategy to address this limitation is to infuse the language models with retrieval mechanisms.
We analyze how planning can be used to guide retrieval to further reduce the frequency of hallucinations.
arXiv Detail & Related papers (2024-08-20T02:19:35Z)
- C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z)
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection [74.51523859064802]
We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG)
Self-RAG enhances an LM's quality and factuality through retrieval and self-reflection.
It significantly outperforms state-of-the-art LLMs and retrieval-augmented models on a diverse set of tasks.
arXiv Detail & Related papers (2023-10-17T18:18:32Z)
- Lessons in Reproducibility: Insights from NLP Studies in Materials Science [4.205692673448206]
We aim to understand these studies from a reproducibility perspective, acknowledging their significant influence on the field of materials informatics, rather than critiquing them.
Our study indicates that both papers offered thorough, tidy, and well-documented work, along with clear guidance for model evaluation.
We highlight areas for improvement, such as providing access to training data where copyright restrictions permit, greater transparency on model architecture and the training process, and specification of software dependency versions.
arXiv Detail & Related papers (2023-07-28T18:36:42Z) - Factually Consistent Summarization via Reinforcement Learning with
Textual Entailment Feedback [57.816210168909286]
We leverage recent progress on textual entailment models to address this problem for abstractive summarization systems.
We use reinforcement learning with reference-free, textual entailment rewards to optimize for factual consistency.
Our results, according to both automatic metrics and human evaluation, show that our method considerably improves the faithfulness, salience, and conciseness of the generated summaries.
arXiv Detail & Related papers (2023-05-31T21:04:04Z) - CiteBench: A benchmark for Scientific Citation Text Generation [69.37571393032026]
CiteBench is a benchmark for citation text generation.
We make the code for CiteBench publicly available at https://github.com/UKPLab/citebench.
arXiv Detail & Related papers (2022-12-19T16:10:56Z) - Investigating Fairness Disparities in Peer Review: A Language Model
Enhanced Approach [77.61131357420201]
We conduct a thorough and rigorous study on fairness disparities in peer review with the help of large language models (LMs).
We collect, assemble, and maintain a comprehensive relational database for the International Conference on Learning Representations (ICLR) conference from 2017 to date.
We postulate and study fairness disparities on multiple protective attributes of interest, including author gender, geography, and author and institutional prestige.
arXiv Detail & Related papers (2022-11-07T16:19:42Z) - No Pattern, No Recognition: a Survey about Reproducibility and
Distortion Issues of Text Clustering and Topic Modeling [0.0]
Machine learning algorithms can be used to extract knowledge from unlabeled texts.
Unsupervised learning can lead to variability depending on the machine learning algorithm.
The presence of outliers and anomalies can be a determining factor.
arXiv Detail & Related papers (2022-08-02T19:51:43Z) - Predicting the Reproducibility of Social and Behavioral Science Papers
Using Supervised Learning Models [21.69933721765681]
We propose a framework that extracts five types of features from scholarly work that can be used to support assessments of published research claims.
We analyze pairwise correlations between individual features and their importance for predicting a set of human-assessed ground truth labels.
arXiv Detail & Related papers (2021-04-08T00:45:20Z) - "Let's Eat Grandma": When Punctuation Matters in Sentence Representation
for Sentiment Analysis [13.873803872380229]
We argue that punctuation could play a significant role in sentiment analysis and propose a novel representation model to improve syntactic and contextual performance.
We conduct experiments on publicly available datasets and verify that our model can identify the sentiments more accurately over other state-of-the-art baseline methods.
arXiv Detail & Related papers (2020-12-10T19:07:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.