Related papers: CC30k: A Citation Contexts Dataset for Reproducibility-Oriented Sentiment Analysis

URL: http://arxiv.org/abs/2511.07790v1
Date: Wed, 12 Nov 2025 01:18:21 GMT
Title: CC30k: A Citation Contexts Dataset for Reproducibility-Oriented Sentiment Analysis
Authors: Rochana R. Obadage, Sarah M. Rajtmajer, Jian Wu
Abstract summary: We introduce the CC30k dataset, comprising a total of 30,734 citation contexts in machine learning papers. The resulting dataset achieves a labeling accuracy of 94%. The dataset lays the foundation for large-scale assessments of machine learning papers.
Score: 3.4246771373930187
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Sentiments about the reproducibility of cited papers in downstream literature offer community perspectives and have shown promise as a signal of the actual reproducibility of published findings. To train models that effectively predict reproducibility-oriented sentiments and to study their correlation with reproducibility systematically, we introduce the CC30k dataset, comprising a total of 30,734 citation contexts in machine learning papers. Each citation context is labeled with one of three reproducibility-oriented sentiment labels: Positive, Negative, or Neutral, reflecting the cited paper's perceived reproducibility or replicability. Of these, 25,829 are labeled through crowdsourcing, supplemented with negatives generated through a controlled pipeline to counter the scarcity of negative labels. Unlike traditional sentiment analysis datasets, CC30k focuses on reproducibility-oriented sentiments, addressing a research gap in resources for computational reproducibility studies. The dataset was created through a pipeline that includes robust data cleansing, careful crowd selection, and thorough validation. The resulting dataset achieves a labeling accuracy of 94%. We then demonstrate that the performance of three large language models improves significantly on reproducibility-oriented sentiment classification after fine-tuning on our dataset. The dataset lays the foundation for large-scale assessments of the reproducibility of machine learning papers. The CC30k dataset and the Jupyter notebooks used to produce and analyze the dataset are publicly available at https://github.com/lamps-lab/CC30k.

Related papers

Assessing Reproducibility in Evolutionary Computation: A Case Study using Human- and LLM-based Assessment [2.0365636651755263]
We study the practices in papers published in the Combinatorial Optimization and Metaheuristics track of the Evolutionary Computation Conference over a ten-year period. We introduce a structured checklist and apply it through a systematic manual assessment of the selected corpus. In addition, we propose RECAP (REproducibility Checklist Automation Pipeline), an automated system that evaluates signals from paper text and associated code.
arXiv Detail & Related papers (2026-02-05T08:32:29Z)
Beyond Quantity: Distribution-Aware Labeling for Visual Grounding [72.43984105242177]
Visual grounding requires large and diverse region-text pairs. Existing pseudo-labeling pipelines often overfit to biased distributions. We propose DAL, a distribution-aware labeling framework for visual grounding.
arXiv Detail & Related papers (2025-05-30T09:04:47Z)
When in Doubt, Cascade: Towards Building Efficient and Capable Guardrails [19.80434777786657]
We develop a synthetic pipeline to generate targeted and labeled data. We show that our method achieves competitive performance with a fraction of the cost in compute.
arXiv Detail & Related papers (2024-07-08T18:39:06Z)
Towards Enhancing Coherence in Extractive Summarization: Dataset and Experiments with LLMs [70.15262704746378]
We propose a systematically created human-annotated dataset consisting of coherent summaries for five publicly available datasets and natural language user feedback. Preliminary experiments with Falcon-40B and Llama-2-13B show significant performance improvements (10% Rouge-L) in terms of producing coherent summaries.
arXiv Detail & Related papers (2024-07-05T20:25:04Z)
Can citations tell us about a paper's reproducibility? A case study of machine learning papers [3.5120846057971065]
Resource constraints and inadequate documentation can make running replications particularly challenging. We introduce a sentiment analysis framework applied to citation contexts from papers involved in Machine Learning Reproducibility Challenges.
arXiv Detail & Related papers (2024-05-07T03:29:11Z)
Predicting Scientific Impact Through Diffusion, Conformity, and Contribution Disentanglement [11.684776349325887]
Existing models typically rely on static graphs for citation count estimation. We introduce a novel model, DPPDCC, which Disentangles the Potential impacts of Papers into Diffusion, Conformity, and Contribution values.
arXiv Detail & Related papers (2023-11-15T07:21:11Z)
TRIAGE: Characterizing and auditing training data for improved regression [80.11415390605215]
We introduce TRIAGE, a novel data characterization framework tailored to regression tasks and compatible with a broad class of regressors. TRIAGE utilizes conformal predictive distributions to provide a model-agnostic scoring method, the TRIAGE score. We show that TRIAGE's characterization is consistent and highlight its utility to improve performance via data sculpting/filtering, in multiple regression settings.
arXiv Detail & Related papers (2023-10-29T10:31:59Z)
Challenging the Myth of Graph Collaborative Filtering: a Reasoned and Reproducibility-driven Analysis [50.972595036856035]
We present code that successfully replicates results from six popular and recent graph recommendation models. We compare these graph models with traditional collaborative filtering models that historically performed well in offline evaluations. By investigating the information flow from users' neighborhoods, we aim to identify which models are influenced by intrinsic features in the dataset structure.
arXiv Detail & Related papers (2023-08-01T09:31:44Z)
Boosting Differentiable Causal Discovery via Adaptive Sample Reweighting [62.23057729112182]
Differentiable score-based causal discovery methods learn a directed acyclic graph from observational data. We propose a model-agnostic framework to boost causal discovery performance by dynamically learning the adaptive weights for the Reweighted Score function, ReScore.
arXiv Detail & Related papers (2023-03-06T14:49:59Z)
Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding. We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
Heavy-tailed Representations, Text Polarity Classification & Data Augmentation [11.624944730002298]
We develop a novel method to learn a heavy-tailed embedding with desirable regularity properties. We obtain a classifier dedicated to the tails of the proposed embedding, whose performance exceeds the baseline. Numerical experiments on synthetic and real text data demonstrate the relevance of the proposed framework.
arXiv Detail & Related papers (2020-03-25T19:24:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.