Identifying non-natural language artifacts in bug reports
- URL: http://arxiv.org/abs/2110.01336v1
- Date: Mon, 4 Oct 2021 11:33:51 GMT
- Title: Identifying non-natural language artifacts in bug reports
- Authors: Thomas Hirsch, Birgit Hofer
- Abstract summary: We present a machine learning based approach, implemented in Python, to classify bug report content into natural language and artifacts at line level.
We show how data from GitHub issue trackers can be used for automated training set generation.
Our model scores at 0.95 ROC-AUC and 0.93 F1 against our manually annotated validation set, and classifies 10k lines in 0.72 seconds.
- Score: 1.464410818828473
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bug reports are a popular target for natural language processing (NLP).
However, bug reports often contain artifacts such as code snippets, log outputs
and stack traces. These artifacts not only inflate the bug reports with noise,
but often constitute a real problem for the NLP approach at hand and have to be
removed. In this paper, we present a machine learning based approach to
classify content into natural language and artifacts at line level implemented
in Python. We show how data from GitHub issue trackers can be used for
automated training set generation, and present a custom preprocessing approach
for bug reports. Our model scores at 0.95 ROC-AUC and 0.93 F1 against our
manually annotated validation set, and classifies 10k lines in 0.72 seconds. We
cross-evaluated our model against a foreign dataset and a foreign R model for
the same task. The Python implementation of our model and our datasets are made
publicly available under an open source license.
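To make the idea concrete, the snippet below is a minimal, illustrative sketch of line-level classification of bug report content into natural language and artifacts. It is not the authors' pipeline: the fence-based labelling heuristic (treating lines inside Markdown code fences of GitHub issues as artifacts), the character n-gram features, and all names in the snippet are assumptions made for demonstration, and the paper's custom preprocessing is not reproduced here.

```python
# Illustrative sketch only, not the authors' implementation: label bug report
# lines via Markdown code fences as a weak signal, then train a line-level
# classifier separating natural language (0) from artifacts (1).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

FENCE = "`" * 3  # Markdown code-fence marker

def label_issue_lines(issue_lines):
    """Assumption: lines inside Markdown code fences of a GitHub issue are artifacts."""
    labelled, in_fence = [], False
    for line in issue_lines:
        if line.strip().startswith(FENCE):
            in_fence = not in_fence
            continue  # the fence markers themselves are dropped
        labelled.append((line, 1 if in_fence else 0))
    return labelled

# Toy issue; a real training set would be harvested from many GitHub issues.
issue = [
    "The parser crashes on empty input.",
    FENCE,
    "Traceback (most recent call last):",
    '  File "parser.py", line 12, in parse',
    "ValueError: empty input",
    FENCE,
    "Could you add a check for this case?",
]

pairs = label_issue_lines(issue)
texts = [text for text, _ in pairs]
labels = [label for _, label in pairs]

# Character n-grams are robust to identifiers, file paths and stack-trace syntax.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)

print(model.predict([
    "NullPointerException at Foo.java:42",         # an artifact-like line
    "Thanks for reporting, I will look into it.",  # a natural-language line
]))
```

In this sketch the labels come for free from the issue markup, which is what makes training set generation automatic; in practice they would be collected from many GitHub issues before training.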
Related papers
- Improving Automatic Text Recognition with Language Models in the PyLaia Open-Source Library [3.3484434195495605]
We focus on the incorporation of reliable confidence scores and the integration of statistical language modeling during decoding.
Our implementation provides an easy way to combine PyLaia with n-grams language models at different levels.
We evaluate PyLaia's performance on twelve datasets, both with and without language modelling.
arXiv Detail & Related papers (2024-04-29T14:11:16Z)
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is their ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- PyTy: Repairing Static Type Errors in Python [19.74043303068795]
This paper presents PyTy, an automated program repair approach targeting static type errors in Python.
We create a dataset of 2,766 error-fix pairs from 176 GitHub repositories, named PyTyDefects.
Our evaluation shows that PyTy offers fixes for ten frequent categories of type errors, successfully addressing 85.4% of 281 real-world errors.
arXiv Detail & Related papers (2024-01-12T15:08:56Z)
- An Open Dataset and Model for Language Identification [84.15194457400253]
We present a LID model which achieves a macro-average F1 score of 0.93 and a false positive rate of 0.033 across 201 languages.
We make both the model and the dataset available to the research community.
arXiv Detail & Related papers (2023-05-23T08:43:42Z)
- Automatic Classification of Bug Reports Based on Multiple Text Information and Reports' Intention [37.67372105858311]
This paper proposes a new automatic classification method for bug reports.
Its novelty is that, when categorizing bug reports, the report's intention is considered in addition to its text.
The proposed method achieves better performance, with an F-Measure ranging from 87.3% to 95.5%.
arXiv Detail & Related papers (2022-08-02T06:44:51Z)
- Falsesum: Generating Document-level NLI Examples for Recognizing Factual Inconsistency in Summarization [63.21819285337555]
We show that NLI models can be effective for this task when the training data is augmented with high-quality task-oriented examples.
We introduce Falsesum, a data generation pipeline leveraging a controllable text generation model to perturb human-annotated summaries.
We show that models trained on a Falsesum-augmented NLI dataset improve the state-of-the-art performance across four benchmarks for detecting factual inconsistency in summarization.
arXiv Detail & Related papers (2022-05-12T10:43:42Z)
- Automatic Language Identification for Celtic Texts [0.0]
This work addresses the identification of closely related low-resource languages, using the Celtic language family as an example.
We collected a new dataset including Irish, Scottish, Welsh and English records.
We tested supervised models such as SVM and neural networks with traditional statistical features alongside the output of clustering, autoencoder, and topic modelling methods.
arXiv Detail & Related papers (2022-03-09T16:04:13Z)
- MINIMAL: Mining Models for Data Free Universal Adversarial Triggers [57.14359126600029]
We present a novel data-free approach, MINIMAL, to mine input-agnostic adversarial triggers from NLP models.
We reduce the accuracy of Stanford Sentiment Treebank's positive class from 93.6% to 9.6%.
For the Stanford Natural Language Inference (SNLI), our single-word trigger reduces the accuracy of the entailment class from 90.95% to less than 0.6%.
arXiv Detail & Related papers (2021-09-25T17:24:48Z)
- MOROCCO: Model Resource Comparison Framework [61.444083353087294]
We present MOROCCO, a framework to compare language models compatible with the jiant environment, which supports over 50 NLU tasks.
We demonstrate its applicability for two GLUE-like suites in different languages.
arXiv Detail & Related papers (2021-04-29T13:01:27Z)
- ManyTypes4Py: A Benchmark Python Dataset for Machine Learning-based Type Inference [9.384801062680786]
ManyTypes4Py is a large Python dataset for machine learning (ML)-based type inference.
The dataset contains a total of 5,382 Python projects with more than 869K type annotations.
arXiv Detail & Related papers (2021-04-10T08:10:06Z)
- A ground-truth dataset and classification model for detecting bots in GitHub issue and PR comments [70.1864008701113]
Bots are used in GitHub repositories to automate repetitive activities that are part of the distributed software development process.
This paper proposes a ground-truth dataset, based on a manual analysis with high inter-rater agreement, of pull request and issue comments in 5,000 distinct GitHub accounts.
We propose an automated classification model to detect bots, taking as main features the number of empty and non-empty comments of each account, the number of comment patterns, and the inequality between comments within comment patterns; a sketch of computing such features follows after this list.
arXiv Detail & Related papers (2020-10-07T09:30:52Z)
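Because the bot-detection entry above names its main features explicitly, a small sketch of how such account-level features could be computed is given below. The comment-pattern grouping (a crude normalisation of digits and whitespace) and the use of a Gini coefficient as the inequality measure are assumptions made for illustration; the cited paper may define these quantities differently.

```python
# Illustrative sketch: account-level features in the spirit of the bot-detection
# summary above. Grouping comments into "patterns" by crude normalisation and
# measuring inequality with a Gini coefficient are assumptions, not the paper's
# exact definitions.
import re
from collections import Counter

def normalise(comment: str) -> str:
    """Map a comment to a crude pattern: lowercase, digits and whitespace collapsed."""
    text = comment.lower()
    text = re.sub(r"\d+", "<num>", text)
    return re.sub(r"\s+", " ", text).strip()

def gini(counts):
    """Gini coefficient of pattern frequencies (0 = all patterns equally frequent)."""
    values = sorted(counts)
    n = len(values)
    if n == 0 or sum(values) == 0:
        return 0.0
    weighted = sum(i * v for i, v in enumerate(values, start=1))
    return (2 * weighted) / (n * sum(values)) - (n + 1) / n

def account_features(comments):
    """Compute the feature dictionary for one account's comments."""
    non_empty = [c for c in comments if c.strip()]
    patterns = Counter(normalise(c) for c in non_empty)
    return {
        "empty_comments": len(comments) - len(non_empty),
        "non_empty_comments": len(non_empty),
        "comment_patterns": len(patterns),
        "pattern_inequality": gini(patterns.values()),
    }

# A repetitive, bot-like account versus a varied, human-like one.
print(account_features(["Build passed for commit 123.",
                        "Build passed for commit 456.",
                        "Build failed for commit 789.", ""]))
print(account_features(["Looks good to me!",
                        "Can you add a test for the empty case?",
                        "I think this breaks the Windows build."]))
```

Such feature dictionaries could then be fed to any standard classifier to separate bot accounts from human accounts.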