Identifying non-natural language artifacts in bug reports
- URL: http://arxiv.org/abs/2110.01336v1
- Date: Mon, 4 Oct 2021 11:33:51 GMT
- Title: Identifying non-natural language artifacts in bug reports
- Authors: Thomas Hirsch, Birgit Hofer
- Abstract summary: We present a machine learning based approach, implemented in Python, to classify bug report content into natural language and artifacts at line level.
We show how data from GitHub issue trackers can be used for automated training set generation.
Our model scores at 0.95 ROC-AUC and 0.93 F1 against our manually annotated validation set, and classifies 10k lines in 0.72 seconds.
- Score: 1.464410818828473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bug reports are a popular target for natural language processing (NLP).
However, bug reports often contain artifacts such as code snippets, log outputs
and stack traces. These artifacts not only inflate the bug reports with noise,
but often constitute a real problem for the NLP approach at hand and have to be
removed. In this paper, we present a machine learning based approach to
classify content into natural language and artifacts at line level implemented
in Python. We show how data from GitHub issue trackers can be used for
automated training set generation, and present a custom preprocessing approach
for bug reports. Our model scores at 0.95 ROC-AUC and 0.93 F1 against our
manually annotated validation set, and classifies 10k lines in 0.72 seconds. We
cross-evaluated our model against a foreign dataset and a foreign R model for
the same task. The Python implementation of our model and our datasets are made
publicly available under an open source license.
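To make the idea concrete, the snippet below is a minimal, illustrative sketch of line-level classification of bug report content into natural language and artifacts. It is not the authors' pipeline: the fence-based labelling heuristic (treating lines inside Markdown code fences of GitHub issues as artifacts), the character n-gram features, and all names in the snippet are assumptions made for demonstration, and the paper's custom preprocessing is not reproduced here.

```python
# Illustrative sketch only, not the authors' implementation: label bug report
# lines via Markdown code fences as a weak signal, then train a line-level
# classifier separating natural language (0) from artifacts (1).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

FENCE = "`" * 3  # Markdown code-fence marker

def label_issue_lines(issue_lines):
    """Assumption: lines inside Markdown code fences of a GitHub issue are artifacts."""
    labelled, in_fence = [], False
    for line in issue_lines:
        if line.strip().startswith(FENCE):
            in_fence = not in_fence
            continue  # the fence markers themselves are dropped
        labelled.append((line, 1 if in_fence else 0))
    return labelled

# Toy issue; a real training set would be harvested from many GitHub issues.
issue = [
    "The parser crashes on empty input.",
    FENCE,
    "Traceback (most recent call last):",
    '  File "parser.py", line 12, in parse',
    "ValueError: empty input",
    FENCE,
    "Could you add a check for this case?",
]

pairs = label_issue_lines(issue)
texts = [text for text, _ in pairs]
labels = [label for _, label in pairs]

# Character n-grams are robust to identifiers, file paths and stack-trace syntax.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

print(model.predict([
    "NullPointerException at Foo.java:42",         # an artifact-like line
    "Thanks for reporting, I will look into it.",  # a natural-language line
]))
```

In this sketch the labels come for free from the issue markup, which is what makes training set generation automatic; in practice they would be collected from many GitHub issues before training.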
Related papers
- Improving Automatic Text Recognition with Language Models in the PyLaia Open-Source Library [3.3484434195495605]
We focus on the incorporation of reliable confidence scores and the integration of statistical language modeling during decoding.
Our implementation provides an easy way to combine PyLaia with n-grams language models at different levels.
We evaluate PyLaia's performance on twelve datasets, both with and without language modelling.
arXiv Detail & Related papers (2024-04-29T14:11:16Z)
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is their ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- PyTy: Repairing Static Type Errors in Python [19.74043303068795]
This paper presents PyTy, an automated program repair approach targeting static type errors in Python.
We create a dataset of 2,766 error-fix pairs from 176 GitHub repositories, named PyTyDefects.
Our evaluation shows that PyTy offers fixes for ten frequent categories of type errors, successfully addressing 85.4% of 281 real-world errors.
arXiv Detail & Related papers (2024-01-12T15:08:56Z)
- An Open Dataset and Model for Language Identification [84.15194457400253]
We present a LID model which achieves a macro-average F1 score of 0.93 and a false positive rate of 0.033 across 201 languages.
We make both the model and the dataset available to the research community.
arXiv Detail & Related papers (2023-05-23T08:43:42Z)
- Automatic Classification of Bug Reports Based on Multiple Text Information and Reports' Intention [37.67372105858311]
This paper proposes a new automatic classification method for bug reports.
Its novelty is that, when categorizing bug reports, the report's intention is considered in addition to its text.
The proposed method achieves better performance, with an F-Measure ranging from 87.3% to 95.5%.
arXiv Detail & Related papers (2022-08-02T06:44:51Z)
- Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization [63.21819285337555]
We show that NLI models can be effective for this task when the training data is augmented with high-quality task-oriented examples.
We introduce Falsesum, a data generation pipeline leveraging a controllable text generation model to perturb human-annotated summaries.
We show that models trained on a Falsesum-augmented NLI dataset improve the state-of-the-art performance across four benchmarks for detecting factual inconsistency in summarization.
arXiv Detail & Related papers (2022-05-12T10:43:42Z)
- Automatic Language Identification for Celtic Texts [0.0]
This work addresses the identification of closely related low-resource languages, using the Celtic language family as an example.
We collected a new dataset including Irish, Scottish, Welsh and English records.
We tested supervised models such as SVM and neural networks with traditional statistical features alongside the output of clustering, autoencoder, and topic modelling methods.
arXiv Detail & Related papers (2022-03-09T16:04:13Z)
- MINIMAL: Mining Models for Data Free Universal Adversarial Triggers [57.14359126600029]
We present a novel data-free approach, MINIMAL, to mine input-agnostic adversarial triggers from NLP models.
We reduce the accuracy of Stanford Sentiment Treebank's positive class from 93.6% to 9.6%.
For the Stanford Natural Language Inference (SNLI), our single-word trigger reduces the accuracy of the entailment class from 90.95% to less than 0.6%.
arXiv Detail & Related papers (2021-09-25T17:24:48Z)
- MOROCCO: Model Resource Comparison Framework [61.444083353087294]
We present MOROCCO, a framework to compare language models compatible with the jiant environment, which supports over 50 NLU tasks.
We demonstrate its applicability for two GLUE-like suites in different languages.
arXiv Detail & Related papers (2021-04-29T13:01:27Z)
- ManyTypes4Py: A Benchmark Python Dataset for Machine Learning-based Type Inference [9.384801062680786]
ManyTypes4Py is a large Python dataset for machine learning (ML)-based type inference.
The dataset contains a total of 5,382 Python projects with more than 869K type annotations.
arXiv Detail & Related papers (2021-04-10T08:10:06Z)
- A ground-truth dataset and classification model for detecting bots in GitHub issue and PR comments [70.1864008701113]
Bots are used in GitHub repositories to automate repetitive activities that are part of the distributed software development process.
This paper proposes a ground-truth dataset, based on a manual analysis with high inter-rater agreement, of pull request and issue comments in 5,000 distinct GitHub accounts.
We propose an automated classification model to detect bots, taking as main features the number of empty and non-empty comments of each account, the number of comment patterns, and the inequality between comments within comment patterns; a sketch of computing such features follows after this list.
arXiv Detail & Related papers (2020-10-07T09:30:52Z)
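Because the bot-detection entry above names its main features explicitly, a small sketch of how such account-level features could be computed is given below. The comment-pattern grouping (a crude normalisation of digits and whitespace) and the use of a Gini coefficient as the inequality measure are assumptions made for illustration; the cited paper may define these quantities differently.

```python
# Illustrative sketch: account-level features in the spirit of the bot-detection
# summary above. Grouping comments into "patterns" by crude normalisation and
# measuring inequality with a Gini coefficient are assumptions, not the paper's
# exact definitions.
import re
from collections import Counter

def normalise(comment: str) -> str:
    """Map a comment to a crude pattern: lowercase, digits and whitespace collapsed."""
    text = comment.lower()
    text = re.sub(r"\d+", "<num>", text)
    return re.sub(r"\s+", " ", text).strip()

def gini(counts):
    """Gini coefficient of pattern frequencies (0 = all patterns equally frequent)."""
    values = sorted(counts)
    n = len(values)
    if n == 0 or sum(values) == 0:
        return 0.0
    weighted = sum(i * v for i, v in enumerate(values, start=1))
    return (2 * weighted) / (n * sum(values)) - (n + 1) / n

def account_features(comments):
    """Compute the feature dictionary for one account's comments."""
    non_empty = [c for c in comments if c.strip()]
    patterns = Counter(normalise(c) for c in non_empty)
    return {
        "empty_comments": len(comments) - len(non_empty),
        "non_empty_comments": len(non_empty),
        "comment_patterns": len(patterns),
        "pattern_inequality": gini(patterns.values()),
    }

# A repetitive, bot-like account versus a varied, human-like one.
print(account_features(["Build passed for commit 123.",
                        "Build passed for commit 456.",
                        "Build failed for commit 789.", ""]))
print(account_features(["Looks good to me!",
                        "Can you add a test for the empty case?",
                        "I think this breaks the Windows build."]))
```

Such feature dictionaries could then be fed to any standard classifier to separate bot accounts from human accounts.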