RevMine: An LLM-Assisted Tool for Code Review Mining and Analysis Across Git Platforms
- URL: http://arxiv.org/abs/2510.04796v1
- Date: Mon, 06 Oct 2025 13:22:10 GMT
- Authors: Samah Kansab, Francis Bordeleau, Ali Tizghadam
- Abstract summary: RevMine streamlines the entire code review mining pipeline using large language models (LLMs). It guides users through authentication, endpoint discovery, and natural language-driven data collection. It supports both quantitative and qualitative analysis based on user-defined filters or LLM-inferred patterns.
- Score: 1.2744523252873348
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Empirical research on code review processes is increasingly central to understanding software quality and collaboration. However, collecting and analyzing review data remains a time-consuming and technically intensive task. Most researchers follow similar workflows - writing ad hoc scripts to extract, filter, and analyze review data from platforms like GitHub and GitLab. This paper introduces RevMine, a conceptual tool that streamlines the entire code review mining pipeline using large language models (LLMs). RevMine guides users through authentication, endpoint discovery, and natural language-driven data collection, significantly reducing the need for manual scripting. After retrieving review data, it supports both quantitative and qualitative analysis based on user-defined filters or LLM-inferred patterns. This poster outlines the tool's architecture, use cases, and research potential. By lowering the barrier to entry, RevMine aims to democratize code review mining and enable a broader range of empirical software engineering studies.
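To make the scripting burden concrete, below is a minimal sketch of the kind of ad hoc mining script RevMine aims to replace, written against the GitHub REST API. The repository name, token handling, and selected fields are illustrative assumptions, not part of RevMine itself.

```python
# A minimal ad hoc review-mining script of the kind RevMine aims to replace.
# Assumes a GitHub personal access token in the GITHUB_TOKEN environment
# variable; the repository name below is an illustrative placeholder.
import os
import requests

API = "https://api.github.com"
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}

def fetch_reviews(owner: str, repo: str, max_prs: int = 20) -> list[dict]:
    """Collect review records for the most recently updated pull requests."""
    prs = requests.get(
        f"{API}/repos/{owner}/{repo}/pulls",
        headers=HEADERS,
        params={"state": "all", "sort": "updated", "per_page": max_prs},
        timeout=30,
    )
    prs.raise_for_status()
    reviews = []
    for pr in prs.json():
        resp = requests.get(
            f"{API}/repos/{owner}/{repo}/pulls/{pr['number']}/reviews",
            headers=HEADERS,
            timeout=30,
        )
        resp.raise_for_status()
        for review in resp.json():
            reviews.append({
                "pr": pr["number"],
                "reviewer": (review.get("user") or {}).get("login"),
                "state": review["state"],  # e.g. APPROVED, CHANGES_REQUESTED
                "body": review.get("body", ""),
            })
    return reviews

if __name__ == "__main__":
    for r in fetch_reviews("octocat", "hello-world"):
        print(r["pr"], r["reviewer"], r["state"])
```

Filtering, deduplication, and pagination would still have to be bolted on by hand; RevMine's premise is that an LLM-guided pipeline replaces exactly this kind of boilerplate.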
Related papers
- LongDA: Benchmarking LLM Agents for Long-Document Data Analysis [55.32211515932351]
LongDA targets real-world settings in which navigating long documentation and complex data is the primary bottleneck. LongTA is a tool-augmented agent framework that enables document access, retrieval, and code execution. Our experiments reveal substantial performance gaps even among state-of-the-art models.
arXiv Detail & Related papers (2026-01-05T23:23:16Z)
- AILINKPREVIEWER: Enhancing Code Reviews with LLM-Powered Link Previews [4.664062055146575]
Code review is a key practice in software engineering, where developers evaluate code changes to ensure quality and maintainability. Links to issues and external resources are often included in Pull Requests (PRs) to provide additional context. We present AILINKPREVIEWER, a tool that generates previews of links in PRs using PR metadata, including titles, descriptions, comments, and link body content.
arXiv Detail & Related papers (2025-11-12T11:36:12Z)
- Accelerating Discovery: Rapid Literature Screening with LLMs [1.2586771241101986]
Researchers must review and filter a large number of unstructured sources, which frequently contain sparse information. We developed a Large Language Model (LLM) assistant to support the search and filtering of documents.
arXiv Detail & Related papers (2025-09-16T14:01:44Z)
- Mic-hackathon 2024: Hackathon on Machine Learning for Electron and Scanning Probe Microscopy [54.24356756795849]
Microscopy is a primary source of information on materials structure and functionality at nanometer and atomic scales. The adoption of Data Management Plans (DMPs) by major funding agencies promotes preservation and access. However, deriving insights remains difficult due to the lack of standardized code ecosystems, benchmarks, and integration strategies.
arXiv Detail & Related papers (2025-06-10T03:54:36Z)
- Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning [70.04746094652653]
We introduce PaperCoder, a framework that transforms machine learning papers into functional code repositories. PaperCoder operates in three stages: planning, where it designs the system architecture with diagrams, identifies file dependencies, and generates configuration files; analysis; and code generation. We evaluate PaperCoder on generating code implementations from machine learning papers using both model-based and human evaluations.
arXiv Detail & Related papers (2025-04-24T01:57:01Z)
- LazyReview: A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews [74.87393214734114]
This work introduces LazyReview, a dataset of peer-review sentences annotated with fine-grained lazy-thinking categories. Large Language Models (LLMs) struggle to detect these instances in a zero-shot setting. Instruction-based fine-tuning on our dataset significantly boosts performance by 10-20 points.
arXiv Detail & Related papers (2025-04-15T10:07:33Z)
- Automating Code Review: A Systematic Literature Review [15.416725497289697]
Code review consists of assessing the code written by teammates with the goal of increasing code quality. Empirical studies have documented the benefits of this practice, which, however, comes at a cost in terms of developers' time. Researchers have proposed techniques and tools to automate code review tasks.
arXiv Detail & Related papers (2025-03-12T16:19:10Z)
- AIRepr: An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science [5.064778712920176]
Large language models (LLMs) are increasingly used to automate data analysis through executable code generation. We present AIRepr, an Analyst-Inspector framework for automatically evaluating and improving the reproducibility of LLM-generated data analyses.
arXiv Detail & Related papers (2025-02-23T01:15:50Z)
- SnipGen: A Mining Repository Framework for Evaluating LLMs for Code [51.07471575337676]
Large Language Models (LLMs) are trained on extensive datasets that include code repositories. Evaluating their effectiveness poses significant challenges due to the potential overlap between the datasets used for training and those employed for evaluation. We introduce SnipGen, a comprehensive repository-mining framework designed to leverage prompt engineering across various downstream tasks for code generation.
arXiv Detail & Related papers (2025-02-10T21:28:15Z)
- LatteReview: A Multi-Agent Framework for Systematic Review Automation Using Large Language Models [0.0]
LatteReview is a Python-based framework that leverages large language models (LLMs) and multi-agent systems to automate key elements of the systematic review process. The framework supports features such as Retrieval-Augmented Generation (RAG) for incorporating external context, multimodal reviews, Pydantic-based validation for structured inputs and outputs, and asynchronous programming for handling large-scale datasets.
arXiv Detail & Related papers (2025-01-05T17:53:00Z)
- Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings [77.20838441870151]
We use an online metric - the number of edits users introduce before committing the generated messages to the VCS - to select metrics for offline experiments. We collect a dataset with 57 pairs consisting of commit messages generated by GPT-4 and their counterparts edited by human experts. Our results indicate that edit distance exhibits the highest correlation with the online metric, whereas commonly used similarity metrics such as BLEU and METEOR demonstrate low correlation.
arXiv Detail & Related papers (2024-10-15T20:32:07Z)
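As a concrete illustration of the offline metric the study above found most predictive, here is a minimal sketch computing Levenshtein edit distance between a generated commit message and its human-edited counterpart. The example strings are hypothetical, not drawn from the paper's dataset.

```python
# Character-level Levenshtein edit distance via classic dynamic programming:
# fewer edits means the generated message is closer to what users commit.
def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]

generated = "Fix bug in parser"           # hypothetical model output
edited = "Fix off-by-one bug in the config parser"  # hypothetical human edit
print(edit_distance(generated, edited))
```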
- Automating Patch Set Generation from Code Review Comments Using Large Language Models [2.045040820541428]
We provide code contexts to five popular Large Language Models (LLMs) and obtain the suggested code changes (patch sets) derived from real-world code-review comments. The performance of each model is assessed by comparing its generated patch sets against historical, human-generated patch sets.
arXiv Detail & Related papers (2024-04-10T02:46:08Z)
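To illustrate one plausible way such a comparison against historical patch sets could be scored, here is a small sketch using Python's difflib. The patches and the similarity-ratio choice are illustrative assumptions, not the paper's exact evaluation procedure.

```python
# Score a model-generated patch against the historical human-written patch
# with difflib's similarity ratio; both patches below are placeholders.
import difflib

human_patch = """\
-    if user == None:
+    if user is None:
"""
llm_patch = """\
-    if user == None:
+    if user is None:
+        return None
"""

ratio = difflib.SequenceMatcher(None, human_patch, llm_patch).ratio()
print(f"similarity to human patch: {ratio:.2f}")

# A unified diff of the disagreement supports qualitative inspection.
for line in difflib.unified_diff(
    human_patch.splitlines(), llm_patch.splitlines(), lineterm=""
):
    print(line)
```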
- CodecLM: Aligning Language Models with Tailored Synthetic Data [51.59223474427153]
We introduce CodecLM, a framework for adaptively generating high-quality synthetic data for instruction following. We first encode seed instructions into metadata: concise keywords generated on the fly to capture the target instruction distribution. We also introduce Self-Rubrics and Contrastive Filtering during decoding to tailor data-efficient samples.
arXiv Detail & Related papers (2024-04-08T21:15:36Z)
- CORL: Research-oriented Deep Offline Reinforcement Learning Library [48.47248460865739]
CORL is an open-source library that provides thoroughly benchmarked single-file implementations of reinforcement learning algorithms. It emphasizes a simple development experience with a straightforward codebase and a modern analysis-tracking tool.
arXiv Detail & Related papers (2022-10-13T15:40:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.