Mining Java Memory Errors using Subjective Interesting Subgroups with
Hierarchical Targets
- URL: http://arxiv.org/abs/2310.00781v1
- Date: Sun, 1 Oct 2023 20:24:59 GMT
- Title: Mining Java Memory Errors using Subjective Interesting Subgroups with
Hierarchical Targets
- Authors: Youcef Remil and Anes Bendimerad and Mathieu Chambard and Romain
Mathonat and Marc Plantevit and Mehdi Kaytoue
- Abstract summary: Subgroup Discovery (SD) is a data mining method that can automatically mine incident datasets and extract discriminant patterns to identify the root causes of issues.
We propose a novel SD approach that can handle complex target concepts with hierarchies.
We apply this framework to investigate out-of-memory errors and demonstrate its usefulness in incident diagnosis.
- Score: 1.188383832081829
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Software applications, especially Enterprise Resource Planning (ERP) systems,
are crucial to the day-to-day operations of many industries. Therefore, it is
essential to maintain these systems effectively using tools that can identify,
diagnose, and mitigate their incidents. One promising data-driven approach is
the Subgroup Discovery (SD) technique, a data mining method that can
automatically mine incident datasets and extract discriminant patterns to
identify the root causes of issues. However, current SD solutions have
limitations in handling complex target concepts with multiple attributes
organized hierarchically. To illustrate this scenario, we examine the case of
Java out-of-memory incidents among several possible applications. We have a
dataset that describes these incidents, including their context and the types
of Java objects occupying memory when it reaches saturation, with these types
arranged hierarchically. This scenario inspires us to propose a novel Subgroup
Discovery approach that can handle complex target concepts with hierarchies. To
achieve this, we design a pattern syntax and a quality measure that ensure the
identified subgroups are relevant, non-redundant, and resilient to noise. To
achieve the desired quality measure, we use the Subjective Interestingness
model that incorporates prior knowledge about the data and promotes patterns
that are both informative and surprising relative to that knowledge. We apply
this framework to investigate out-of-memory errors and demonstrate its
usefulness in incident diagnosis. To validate the effectiveness of our approach
and the quality of the identified patterns, we present an empirical study. The
source code and data used in the evaluation are publicly accessible, ensuring
transparency and reproducibility.
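To make the idea of a subgroup with a hierarchical target concrete, here is a minimal sketch. It is not the authors' method: it scores candidates with Weighted Relative Accuracy (WRAcc), a standard Subgroup Discovery quality measure, instead of the Subjective Interestingness model described in the abstract. The toy incidents, the attribute names (`module`, `env`, `top_type`), and the use of Java package prefixes as the type hierarchy are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy incidents: context attributes plus the Java type
# that dominates the heap when memory saturates.
INCIDENTS = [
    {"module": "billing", "env": "prod", "top_type": "java.util.HashMap"},
    {"module": "billing", "env": "prod", "top_type": "java.util.ArrayList"},
    {"module": "billing", "env": "test", "top_type": "java.util.HashMap"},
    {"module": "reports", "env": "prod", "top_type": "java.lang.String"},
    {"module": "reports", "env": "test", "top_type": "java.lang.String"},
    {"module": "reports", "env": "prod", "top_type": "java.util.HashMap"},
]


def ancestors(type_name):
    """Return the type itself and every enclosing package prefix.

    Assumption: the target hierarchy is the package tree, so
    java.util.HashMap rolls up to java.util and then to java.
    """
    parts = type_name.split(".")
    return {".".join(parts[:i]) for i in range(1, len(parts) + 1)}


def wracc(rows, condition, target_node):
    """WRAcc = coverage * (target rate in subgroup - target rate overall)."""
    attr, value = condition
    covered = [r for r in rows if r[attr] == value]
    if not covered:
        return 0.0
    p_all = sum(target_node in ancestors(r["top_type"]) for r in rows) / len(rows)
    p_sub = sum(target_node in ancestors(r["top_type"]) for r in covered) / len(covered)
    return (len(covered) / len(rows)) * (p_sub - p_all)


def rank_subgroups(rows):
    """Score every (single-condition description, hierarchy node) pair."""
    conditions = sorted({(a, r[a]) for r in rows for a in r if a != "top_type"})
    targets = sorted(set().union(*(ancestors(r["top_type"]) for r in rows)))
    scored = [(wracc(rows, c, t), c, t) for c, t in product(conditions, targets)]
    return sorted(scored, key=lambda s: s[0], reverse=True)


if __name__ == "__main__":
    for score, condition, target in rank_subgroups(INCIDENTS)[:3]:
        print(f"{condition[0]}={condition[1]} -> {target}: {score:.3f}")
```

In this toy data, the description `module=billing` scores higher against the inner node `java.util` than against the leaf `java.util.HashMap`, which illustrates why a target defined at several levels of a hierarchy can surface patterns that a leaf-only target would rank lower. The root node `java` always scores zero because every incident falls under it.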
Related papers
- Resilience to the Flowing Unknown: an Open Set Recognition Framework for Data Streams [6.7236795813629]
This work investigates the application of an Open Set Recognition framework that combines classification and clustering to address the over-occupied space problem in streaming scenarios.
arXiv Detail & Related papers (2024-10-31T11:06:54Z)
- Web-Scale Visual Entity Recognition: An LLM-Driven Data Approach [56.55633052479446]
Web-scale visual entity recognition presents significant challenges due to the lack of clean, large-scale training data.
We propose a novel methodology to curate such a dataset, leveraging a multimodal large language model (LLM) for label verification, metadata generation, and rationale explanation.
Experiments demonstrate that models trained on this automatically curated data achieve state-of-the-art performance on web-scale visual entity recognition tasks.
arXiv Detail & Related papers (2024-10-31T06:55:24Z)
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- infoVerse: A Universal Framework for Dataset Characterization with Multidimensional Meta-information [68.76707843019886]
infoVerse is a universal framework for dataset characterization.
infoVerse captures multidimensional characteristics of datasets by incorporating various model-driven meta-information.
In three real-world applications (data pruning, active learning, and data annotation), the samples chosen on infoVerse space consistently outperform strong baselines.
arXiv Detail & Related papers (2023-05-30T18:12:48Z)
- Data AUDIT: Identifying Attribute Utility- and Detectability-Induced Bias in Task Models [8.420252576694583]
We present a first technique for the rigorous, quantitative screening of medical image datasets.
Our method decomposes the risks associated with dataset attributes in terms of their detectability and utility.
We show that our screening method reliably identifies nearly imperceptible bias-inducing artifacts.
arXiv Detail & Related papers (2023-04-06T16:50:15Z)
- INoD: Injected Noise Discriminator for Self-Supervised Representation Learning in Agricultural Fields [6.891600948991265]
We propose an Injected Noise Discriminator (INoD) which exploits principles of feature replacement and dataset discrimination for self-supervised representation learning.
INoD interleaves feature maps from two disjoint datasets during their convolutional encoding and predicts the dataset affiliation of the resultant feature map as a pretext task.
Our approach enables the network to learn unequivocal representations of objects seen in one dataset while observing them in conjunction with similar features from the disjoint dataset.
arXiv Detail & Related papers (2023-03-31T14:46:31Z)
- A Framework for Verifiable and Auditable Federated Anomaly Detection [3.639790324866155]
Federated Learning is an emerging approach to manage cooperation between a group of agents for the solution of Machine Learning tasks.
We present a novel algorithmic architecture that tackles this problem in the particular case of Anomaly Detection.
arXiv Detail & Related papers (2022-03-15T11:34:02Z)
- Learning to Detect Instance-level Salient Objects Using Complementary Image Labels [55.049347205603304]
We present the first weakly-supervised approach to the salient instance detection problem.
We propose a novel weakly-supervised network with three branches: a Saliency Detection Branch leveraging class consistency information to locate candidate objects; a Boundary Detection Branch exploiting class discrepancy information to delineate object boundaries; and a Centroid Detection Branch using subitizing information to detect salient instance centroids.
arXiv Detail & Related papers (2021-11-19T10:15:22Z)
- Class Introspection: A Novel Technique for Detecting Unlabeled Subclasses by Leveraging Classifier Explainability Methods [0.0]
Detecting latent structure is a crucial step in performing analysis of a dataset.
By leveraging instance explanation methods, an existing classifier can be extended to detect latent classes.
This paper also contains a pipeline for analyzing classifiers automatically, and a web application for interactively exploring the results from this technique.
arXiv Detail & Related papers (2021-07-04T14:58:29Z)
- Predicting Themes within Complex Unstructured Texts: A Case Study on Safeguarding Reports [66.39150945184683]
We focus on the problem of automatically identifying the main themes in a safeguarding report using supervised classification approaches.
Our results show the potential of deep learning models to simulate subject-expert behaviour even for complex tasks with limited labelled data.
arXiv Detail & Related papers (2020-10-27T19:48:23Z)
- A Few-Shot Sequential Approach for Object Counting [63.82757025821265]
We introduce a class attention mechanism that sequentially attends to objects in the image and extracts their relevant features.
The proposed technique is trained on point-level annotations and uses a novel loss function that disentangles class-dependent and class-agnostic aspects of the model.
We present our results on a variety of object-counting/detection datasets, including FSOD and MS COCO.
arXiv Detail & Related papers (2020-07-03T18:23:39Z)