Discovering Frequent Gradual Itemsets with Imprecise Data
- URL: http://arxiv.org/abs/2005.11045v1
- Date: Fri, 22 May 2020 08:02:15 GMT
- Title: Discovering Frequent Gradual Itemsets with Imprecise Data
- Authors: Michaël Chirmeni Boujike, Jerry Lonlac, Norbert Tsopze, Engelbert
Mephu Nguifo
- Abstract summary: Gradual patterns, which model complex co-variations of attributes of the form "The more/less X, the more/less Y", play a crucial role in many real-world applications.
This paper proposes introducing gradualness thresholds beyond which an increase or a decrease is considered.
- Score: 0.4874780144224056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradual patterns, which model complex co-variations of attributes of
the form "The more/less X, the more/less Y", play a crucial role in many
real-world applications where large amounts of numerical data must be managed,
as is the case with biological data. Recently, these patterns have caught the
attention of the data mining community, and several methods have been defined
to automatically extract and manage them from different data models.
However, these methods often face the problem of managing the quantity of
mined patterns: in many practical applications, computing all of these
patterns can prove intractable for the user-defined frequency threshold, and
the lack of focus leads to huge collections of patterns. Another problem with
traditional approaches is that the concept of gradualness is defined simply as
an increase or a decrease; gradualness is considered as soon as the values of
an attribute differ between two objects. As a result, traditional algorithms
can present the user with many patterns whose gradualness is only a noise
effect in the data. To address this issue, this paper proposes introducing
gradualness thresholds beyond which an increase or a decrease is considered.
In contrast to existing approaches, the proposed approach takes into account
the distribution of attribute values as well as the user's preferences on the
gradualness threshold, making it possible to extract gradual patterns from
certain databases where existing approaches fail due to an overly large search
space. Moreover, results from an experimental evaluation on real databases
show that the proposed algorithm is scalable and efficient, and can eliminate
numerous patterns that do not satisfy specific gradualness requirements,
presenting a small set of patterns to the user.
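To make the idea of a gradualness threshold concrete, here is a minimal sketch of how pairwise support for a pattern "the more X, the more Y" could be computed with per-attribute thresholds. The function name, the data, and the pairwise support definition are illustrative assumptions for this sketch, not the paper's exact algorithm.

```python
# Illustrative sketch (not the paper's algorithm): support of the gradual
# pattern "the more X, the more Y" under per-attribute gradualness thresholds.
from itertools import combinations

def gradual_support(rows, x, y, tx, ty):
    """Fraction of object pairs whose X and Y values both vary in the
    same direction by at least the thresholds tx and ty."""
    n = len(rows)
    total = n * (n - 1) // 2
    concordant = 0
    for a, b in combinations(rows, 2):
        dx, dy = b[x] - a[x], b[y] - a[y]
        # Without thresholds, any nonzero difference counts as gradualness;
        # the thresholds tx and ty filter out noise-level variations.
        if (dx >= tx and dy >= ty) or (-dx >= tx and -dy >= ty):
            concordant += 1
    return concordant / total if total else 0.0

# Hypothetical data: objects described by two numerical attributes.
data = [
    {"age": 20, "salary": 1200},
    {"age": 25, "salary": 1250},
    {"age": 30, "salary": 2000},
    {"age": 35, "salary": 2100},
]
print(gradual_support(data, "age", "salary", 1, 100))  # 5 of 6 pairs qualify
```

With the thresholds set to 1 and 100, the pair (20, 1200)/(25, 1250) is discarded because its salary difference of 50 falls below the 100 threshold, whereas a traditional definition would count it as gradual.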
Related papers
- USD: Unsupervised Soft Contrastive Learning for Fault Detection in Multivariate Time Series [6.055410677780381]
We introduce a combination of data augmentation and soft contrastive learning, specifically designed to capture the multifaceted nature of state behaviors more accurately.
This dual strategy significantly boosts the model's ability to distinguish between normal and abnormal states, leading to a marked improvement in fault detection performance across multiple datasets and settings.
arXiv Detail & Related papers (2024-05-25T14:48:04Z) - Latent variable model for high-dimensional point process with structured missingness [4.451479907610764]
Longitudinal data are important in numerous fields, such as healthcare, sociology and seismology.
Real-world datasets can be high-dimensional, contain structured missingness patterns, and measurement time points can be governed by an unknown process.
We propose a flexible and efficient latent-variable model that is capable of addressing all these limitations.
arXiv Detail & Related papers (2024-02-08T15:41:48Z) - Stochastic Amortization: A Unified Approach to Accelerate Feature and
Data Attribution [67.28273187033693]
We show that training a network that directly predicts the desired output, known as amortization, is inexpensive and surprisingly effective.
This approach significantly accelerates several feature attribution and data valuation methods, often yielding an order of magnitude speedup over existing approaches.
arXiv Detail & Related papers (2024-01-29T03:42:37Z) - Graph Spatiotemporal Process for Multivariate Time Series Anomaly
Detection with Missing Values [67.76168547245237]
We introduce a novel framework called GST-Pro, which utilizes a graph spatiotemporal process and an anomaly scorer to detect anomalies.
Our experimental results show that the GST-Pro method can effectively detect anomalies in time series data and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-01-11T10:10:16Z) - ChiroDiff: Modelling chirographic data with Diffusion Models [132.5223191478268]
We introduce a powerful model-class namely "Denoising Diffusion Probabilistic Models" or DDPMs for chirographic data.
Our model named "ChiroDiff", being non-autoregressive, learns to capture holistic concepts and therefore remains resilient to higher temporal sampling rate.
arXiv Detail & Related papers (2023-04-07T15:17:48Z) - Learning with Noisy labels via Self-supervised Adversarial Noisy Masking [33.87292143223425]
We propose a novel training approach termed adversarial noisy masking.
It adaptively modulates the input data and labels simultaneously, preventing the model from overfitting noisy samples.
It is tested on both synthetic and real-world noisy datasets.
arXiv Detail & Related papers (2023-02-14T03:13:26Z) - Semi-supervised Deep Learning for Image Classification with Distribution
Mismatch: A Survey [1.5469452301122175]
Deep learning models rely on an abundance of labelled observations to train a prospective model.
Gathering labelled observations is expensive, which makes deep learning models impractical in many settings.
In many situations different unlabelled data sources might be available.
This raises the risk of a significant distribution mismatch between the labelled and unlabelled datasets.
arXiv Detail & Related papers (2022-03-01T02:46:00Z) - Perceptual Score: What Data Modalities Does Your Model Perceive? [73.75255606437808]
We introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features.
We find that recent, more accurate multi-modal models for visual question-answering tend to perceive the visual data less than their predecessors.
Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions.
arXiv Detail & Related papers (2021-10-27T12:19:56Z) - Goal-directed Generation of Discrete Structures with Conditional
Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z) - Variable Skipping for Autoregressive Range Density Estimation [84.60428050170687]
We show a technique, variable skipping, for accelerating range density estimation over deep autoregressive models.
We show that variable skipping provides 10-100x efficiency improvements when targeting challenging high-quantile error metrics.
arXiv Detail & Related papers (2020-07-10T19:01:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.