S3M: Siamese Stack (Trace) Similarity Measure
- URL: http://arxiv.org/abs/2103.10526v1
- Date: Thu, 18 Mar 2021 21:10:41 GMT
- Title: S3M: Siamese Stack (Trace) Similarity Measure
- Authors: Aleksandr Khvorov, Roman Vasiliev, George Chernishev, Irving Muller Rodrigues, Dmitrij Koznov, Nikita Povarov
- Abstract summary: We present S3M -- the first approach to computing stack trace similarity based on deep learning.
It is based on a biLSTM encoder and a fully-connected classifier to compute similarity.
Our experiments demonstrate the superiority of our approach over the state-of-the-art on both open-sourced data and a private JetBrains dataset.
- Score: 55.58269472099399
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic crash reporting systems have become a de-facto standard in software
development. These systems monitor target software, and if a crash occurs they
send details to a backend application. Later on, these reports are aggregated
and used in the development process to 1) understand whether it is a new or an
existing issue, 2) assign these bugs to appropriate developers, and 3) gain a
general overview of the application's bug landscape. The efficiency of report
aggregation and subsequent operations heavily depends on the quality of the
report similarity metric. However, a distinctive feature of this kind of report
is that no textual input from the user (i.e., bug description) is available: it
contains only stack trace information.
In this paper, we present S3M ("extreme") -- the first approach to computing
stack trace similarity based on deep learning. It is based on a siamese
architecture that uses a biLSTM encoder and a fully-connected classifier to
compute similarity. Our experiments demonstrate the superiority of our approach
over the state-of-the-art on both open-sourced data and a private JetBrains
dataset. Additionally, we review the impact of stack trace trimming on the
quality of the results.
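
The architecture named in the abstract (a shared biLSTM encoder followed by a fully-connected classifier) can be sketched roughly as below. This is a minimal illustration rather than the authors' implementation. The weights are random and untrained, biases are omitted, the pair features (absolute difference and element-wise product) are an assumption, and the `trim_trace` helper reflects only one possible reading of "trimming" (keeping the top `k` frames). All class, function, and frame names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTM:
    """One-direction LSTM with random, untrained weights (illustration only)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        width = input_dim + hidden_dim
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (c).
        self.w = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(width)]
                   for _ in range(hidden_dim)]
            for gate in "ifoc"
        }

    def _linear(self, gate, xh):
        return [sum(w * v for w, v in zip(row, xh)) for row in self.w[gate]]

    def run(self, seq):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in seq:
            xh = x + h  # concatenate the input vector and the previous hidden state
            i = [sigmoid(v) for v in self._linear("i", xh)]
            f = [sigmoid(v) for v in self._linear("f", xh)]
            o = [sigmoid(v) for v in self._linear("o", xh)]
            g = [math.tanh(v) for v in self._linear("c", xh)]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h  # final hidden state


class SiameseScorer:
    """Hypothetical siamese model: both traces pass through the same encoder."""

    def __init__(self, vocab, embed_dim=4, hidden_dim=3, seed=0):
        rng = random.Random(seed)
        # sorted() keeps the embedding assignment reproducible across runs.
        self.embed = {frame: [rng.uniform(-1.0, 1.0) for _ in range(embed_dim)]
                      for frame in sorted(vocab)}
        self.unknown = [0.0] * embed_dim
        self.forward = LSTM(embed_dim, hidden_dim, rng)
        self.backward = LSTM(embed_dim, hidden_dim, rng)
        # Single-layer classifier over the assumed pair features.
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(4 * hidden_dim)]

    def encode(self, trace):
        seq = [self.embed.get(frame, self.unknown) for frame in trace]
        # biLSTM: read the frames in both directions and concatenate the results.
        return self.forward.run(seq) + self.backward.run(seq[::-1])

    def similarity(self, trace_a, trace_b):
        u, v = self.encode(trace_a), self.encode(trace_b)
        feats = [abs(x - y) for x, y in zip(u, v)] + [x * y for x, y in zip(u, v)]
        return sigmoid(sum(w * f for w, f in zip(self.w_out, feats)))


def trim_trace(trace, k):
    """Keep the first k frames (assumes the top of the stack comes first)."""
    return trace[:k]


# Made-up frames, used only to exercise the sketch.
trace_a = ["java.util.HashMap.get", "com.example.Cache.lookup", "com.example.Main.run"]
trace_b = ["java.util.HashMap.get", "com.example.Cache.lookup", "com.example.Job.start"]
model = SiameseScorer(vocab=set(trace_a) | set(trace_b))
score = model.similarity(trim_trace(trace_a, 2), trim_trace(trace_b, 2))
```

Because both traces go through the same `encode` method and the pair features are symmetric, the score does not depend on argument order. A trained model would learn the embeddings, LSTM weights, and classifier weights from labeled pairs of reports.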
Related papers
- Towards Realistic Evaluation of Commit Message Generation by Matching Online and Offline Settings [77.20838441870151]
Commit message generation is a crucial task in software engineering that is challenging to evaluate correctly.
We use an online metric - the number of edits users introduce before committing the generated messages to the VCS - to select metrics for offline experiments.
Our results indicate that edit distance exhibits the highest correlation, whereas commonly used similarity metrics such as BLEU and METEOR demonstrate low correlation.
arXiv Detail & Related papers (2024-10-15T20:32:07Z)
- Multi-View Adaptive Contrastive Learning for Information Retrieval Based Fault Localization [5.1987901165589]
We propose a novel approach named Multi-View Adaptive Contrastive Learning for Information Retrieval Fault Localization (MACL-IRFL).
We first generate data augmentations from the report-code interaction view, the report-report similarity view, and the code-code co-citation view separately, then adopt a graph neural network to aggregate the information of bug reports or source code files from the three views in the embedding process.
Our contrastive learning task forces the bug report representations to encode information shared by the report-report and report-code views, and the source code file representations to encode information shared by the code-code and report-code views.
arXiv Detail & Related papers (2024-09-19T07:20:10Z)
- SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding [56.079013202051094]
We present SegVG, a novel method that transfers box-level annotations into signals that provide additional pixel-level supervision for Visual Grounding.
This approach allows us to iteratively exploit the annotation as signals for both box-level regression and pixel-level segmentation.
arXiv Detail & Related papers (2024-07-03T15:30:45Z)
- Fact Checking Beyond Training Set [64.88575826304024]
We show that the retriever-reader suffers from performance deterioration when it is trained on labeled data from one domain and used in another domain.
We propose an adversarial algorithm to make the retriever component robust against distribution shift.
We then construct eight fact checking scenarios from these datasets, and compare our model to a set of strong baseline models.
arXiv Detail & Related papers (2024-03-27T15:15:14Z)
- EMBERSim: A Large-Scale Databank for Boosting Similarity Search in Malware Analysis [48.5877840394508]
In recent years there has been a shift from heuristics-based malware detection towards machine learning.
We propose to address the deficiencies in the space of similarity research on binary files, starting from EMBER.
We enhance EMBER with similarity information as well as malware class tags, to enable further research in the similarity space.
arXiv Detail & Related papers (2023-10-03T06:58:45Z)
- MaintainoMATE: A GitHub App for Intelligent Automation of Maintenance Activities [3.2228025627337864]
Software development projects rely on issue tracking systems to track maintenance tasks such as bug reports and enhancement requests.
Handling issue reports is critical and requires thoroughly scanning the text entered in each report, which makes it a labor-intensive task.
We present a unified framework called MaintainoMATE, which automatically sorts issue reports into their respective categories and assigns each report to a developer with relevant expertise.
arXiv Detail & Related papers (2023-08-31T05:15:42Z)
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., center points of entities are enriched with semantic information describing the attributes and relationships of different entities.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
- Auto-labelling of Bug Report using Natural Language Processing [0.0]
Rule and Query-based solutions recommend a long list of potential similar bug reports with no clear ranking.
In this paper, we have proposed a solution using a combination of NLP techniques.
It uses a custom data transformer, a deep neural network, and a non-generalizing machine learning method to retrieve existing identical bug reports.
arXiv Detail & Related papers (2022-12-13T02:32:42Z)
- Automatic Classification of Bug Reports Based on Multiple Text Information and Reports' Intention [37.67372105858311]
This paper proposes a new automatic classification method for bug reports.
The innovation is that, when categorizing bug reports, the method considers the intention of the report in addition to its text.
Our proposed method achieves better performance, with an F-measure ranging from 87.3% to 95.5%.
arXiv Detail & Related papers (2022-08-02T06:44:51Z)
- Leveraging Structural Properties of Source Code Graphs for Just-In-Time Bug Prediction [6.467090475885797]
A graph is one of the most commonly used representations for understanding relational data.
In this study, we propose a methodology to utilize the relational properties of source code in the form of a graph.
arXiv Detail & Related papers (2022-01-25T07:20:47Z)