RevMine: An LLM-Assisted Tool for Code Review Mining and Analysis Across Git Platforms
- URL: http://arxiv.org/abs/2510.04796v1
- Date: Mon, 06 Oct 2025 13:22:10 GMT
- Authors: Samah Kansab, Francis Bordeleau, Ali Tizghadam
- Abstract summary: RevMine streamlines the entire code review mining pipeline using large language models (LLMs). It guides users through authentication, endpoint discovery, and natural language-driven data collection. It supports both quantitative and qualitative analysis based on user-defined filters or LLM-inferred patterns.
- Score: 1.2744523252873348
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Empirical research on code review processes is increasingly central to understanding software quality and collaboration. However, collecting and analyzing review data remains a time-consuming and technically intensive task. Most researchers follow similar workflows - writing ad hoc scripts to extract, filter, and analyze review data from platforms like GitHub and GitLab. This paper introduces RevMine, a conceptual tool that streamlines the entire code review mining pipeline using large language models (LLMs). RevMine guides users through authentication, endpoint discovery, and natural language-driven data collection, significantly reducing the need for manual scripting. After retrieving review data, it supports both quantitative and qualitative analysis based on user-defined filters or LLM-inferred patterns. This poster outlines the tool's architecture, use cases, and research potential. By lowering the barrier to entry, RevMine aims to democratize code review mining and enable a broader range of empirical software engineering studies.
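To make the scripting burden concrete, below is a minimal sketch of the kind of ad hoc mining script RevMine aims to replace, written against the GitHub REST API. The repository name, token handling, and selected fields are illustrative assumptions, not part of RevMine itself.

```python
# A minimal ad hoc review-mining script of the kind RevMine aims to replace.
# Assumes a GitHub personal access token in the GITHUB_TOKEN environment
# variable; the repository name below is an illustrative placeholder.
import os
import requests

API = "https://api.github.com"
HEADERS = {
    "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    "Accept": "application/vnd.github+json",
}

def fetch_reviews(owner: str, repo: str, max_prs: int = 20) -> list[dict]:
    """Collect review records for the most recently updated pull requests."""
    prs = requests.get(
        f"{API}/repos/{owner}/{repo}/pulls",
        headers=HEADERS,
        params={"state": "all", "sort": "updated", "per_page": max_prs},
        timeout=30,
    )
    prs.raise_for_status()
    reviews = []
    for pr in prs.json():
        resp = requests.get(
            f"{API}/repos/{owner}/{repo}/pulls/{pr['number']}/reviews",
            headers=HEADERS,
            timeout=30,
        )
        resp.raise_for_status()
        for review in resp.json():
            reviews.append({
                "pr": pr["number"],
                "reviewer": (review.get("user") or {}).get("login"),
                "state": review["state"],  # e.g. APPROVED, CHANGES_REQUESTED
                "body": review.get("body", ""),
            })
    return reviews

if __name__ == "__main__":
    for r in fetch_reviews("octocat", "hello-world"):
        print(r["pr"], r["reviewer"], r["state"])
```

Filtering, deduplication, and pagination would still have to be bolted on by hand; RevMine's premise is that an LLM-guided pipeline replaces exactly this kind of boilerplate.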
Related papers
- LongDA: Benchmarking LLM Agents for Long-Document Data Analysis [55.32211515932351]
LongDA targets real-world settings in which navigating long documentation and complex data is the primary bottleneck. LongTA is a tool-augmented agent framework that enables document access, retrieval, and code execution. Our experiments reveal substantial performance gaps even among state-of-the-art models.
arXiv Detail & Related papers (2026-01-05T23:23:16Z)
- AILINKPREVIEWER: Enhancing Code Reviews with LLM-Powered Link Previews [4.664062055146575]
Code review is a key practice in software engineering, where developers evaluate code changes to ensure quality and maintainability. Links to issues and external resources are often included in Pull Requests (PRs) to provide additional context. We present AILINKPREVIEWER, a tool that generates previews of links in PRs using PR metadata, including titles, descriptions, comments, and link body content.
arXiv Detail & Related papers (2025-11-12T11:36:12Z)
- Accelerating Discovery: Rapid Literature Screening with LLMs [1.2586771241101986]
Researchers must review and filter a large number of unstructured sources, which frequently contain sparse information. We developed a Large Language Model (LLM) assistant to support the search and filtering of documents.
arXiv Detail & Related papers (2025-09-16T14:01:44Z)
- Mic-hackathon 2024: Hackathon on Machine Learning for Electron and Scanning Probe Microscopy [54.24356756795849]
Microscopy is a primary source of information on materials structure and functionality at nanometer and atomic scales. The adoption of Data Management Plans (DMPs) by major funding agencies promotes preservation and access. However, deriving insights remains difficult due to the lack of standardized code ecosystems, benchmarks, and integration strategies.
arXiv Detail & Related papers (2025-06-10T03:54:36Z)
- Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning [70.04746094652653]
We introduce PaperCoder, a framework that transforms machine learning papers into functional code repositories. PaperCoder operates in three stages: planning, where it designs the system architecture with diagrams, identifies file dependencies, and generates configuration files; analysis; and code generation. We evaluate PaperCoder on generating code implementations from machine learning papers using both model-based and human evaluations.
arXiv Detail & Related papers (2025-04-24T01:57:01Z)
- LazyReview: A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews [74.87393214734114]
This work introduces LazyReview, a dataset of peer-review sentences annotated with fine-grained lazy-thinking categories. Large Language Models (LLMs) struggle to detect these instances in a zero-shot setting. Instruction-based fine-tuning on our dataset significantly boosts performance by 10-20 points.
arXiv Detail & Related papers (2025-04-15T10:07:33Z)
- Automating Code Review: A Systematic Literature Review [15.416725497289697]
Code review consists of assessing the code written by teammates with the goal of increasing code quality. Empirical studies have documented the benefits of this practice, which, however, comes at a cost in terms of developers' time. Researchers have proposed techniques and tools to automate code review tasks.
arXiv Detail & Related papers (2025-03-12T16:19:10Z)
- AIRepr: An Analyst-Inspector Framework for Evaluating Reproducibility of LLMs in Data Science [5.064778712920176]
Large language models (LLMs) are increasingly used to automate data analysis through executable code generation. We present AIRepr, an Analyst-Inspector framework for automatically evaluating and improving the reproducibility of LLM-generated data analyses.
arXiv Detail & Related papers (2025-02-23T01:15:50Z)
- SnipGen: A Mining Repository Framework for Evaluating LLMs for Code [51.07471575337676]
Large Language Models (LLMs) are trained on extensive datasets that include code repositories. Evaluating their effectiveness poses significant challenges due to the potential overlap between the datasets used for training and those employed for evaluation. We introduce SnipGen, a comprehensive repository-mining framework designed to leverage prompt engineering across various downstream tasks for code generation.
arXiv Detail & Related papers (2025-02-10T21:28:15Z)
- LatteReview: A Multi-Agent Framework for Systematic Review Automation Using Large Language Models [0.0]
LatteReview is a Python-based framework that leverages large language models (LLMs) and multi-agent systems to automate key elements of the systematic review process. The framework supports features such as Retrieval-Augmented Generation (RAG) for incorporating external context, multimodal reviews, Pydantic-based validation for structured inputs and outputs, and asynchronous programming for handling large-scale datasets.
arXiv Detail & Related papers (2025-01-05T17:53:00Z)
- Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings [77.20838441870151]
We use an online metric - the number of edits users introduce before committing the generated messages to the VCS - to select metrics for offline experiments. We collect a dataset with 57 pairs consisting of commit messages generated by GPT-4 and their counterparts edited by human experts. Our results indicate that edit distance exhibits the highest correlation with the online metric, whereas commonly used similarity metrics such as BLEU and METEOR demonstrate low correlation.
arXiv Detail & Related papers (2024-10-15T20:32:07Z)
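As a concrete illustration of the offline metric the study above found most predictive, here is a minimal sketch computing Levenshtein edit distance between a generated commit message and its human-edited counterpart. The example strings are hypothetical, not drawn from the paper's dataset.

```python
# Character-level Levenshtein edit distance via classic dynamic programming:
# fewer edits means the generated message is closer to what users commit.
def edit_distance(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]

generated = "Fix bug in parser"           # hypothetical model output
edited = "Fix off-by-one bug in the config parser"  # hypothetical human edit
print(edit_distance(generated, edited))
```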
- Automating Patch Set Generation from Code Review Comments Using Large Language Models [2.045040820541428]
We provide code contexts to five popular Large Language Models (LLMs) and obtain the suggested code changes (patch sets) derived from real-world code-review comments. The performance of each model is assessed by comparing its generated patch sets against historical, human-generated patch sets.
arXiv Detail & Related papers (2024-04-10T02:46:08Z)
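To illustrate one plausible way such a comparison against historical patch sets could be scored, here is a small sketch using Python's difflib. The patches and the similarity-ratio choice are illustrative assumptions, not the paper's exact evaluation procedure.

```python
# Score a model-generated patch against the historical human-written patch
# with difflib's similarity ratio; both patches below are placeholders.
import difflib

human_patch = """\
-    if user == None:
+    if user is None:
"""
llm_patch = """\
-    if user == None:
+    if user is None:
+        return None
"""

ratio = difflib.SequenceMatcher(None, human_patch, llm_patch).ratio()
print(f"similarity to human patch: {ratio:.2f}")

# A unified diff of the disagreement supports qualitative inspection.
for line in difflib.unified_diff(
    human_patch.splitlines(), llm_patch.splitlines(), lineterm=""
):
    print(line)
```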
- CodecLM: Aligning Language Models with Tailored Synthetic Data [51.59223474427153]
We introduce CodecLM, a framework for adaptively generating high-quality synthetic data for instruction following. We first encode seed instructions into metadata: concise keywords generated on the fly to capture the target instruction distribution. We also introduce Self-Rubrics and Contrastive Filtering during decoding to tailor data-efficient samples.
arXiv Detail & Related papers (2024-04-08T21:15:36Z)
- CORL: Research-oriented Deep Offline Reinforcement Learning Library [48.47248460865739]
CORL is an open-source library that provides thoroughly benchmarked single-file implementations of reinforcement learning algorithms. It emphasizes a simple development experience with a straightforward codebase and a modern analysis-tracking tool.
arXiv Detail & Related papers (2022-10-13T15:40:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.