Discovering Frequent Gradual Itemsets with Imprecise Data
- URL: http://arxiv.org/abs/2005.11045v1
- Date: Fri, 22 May 2020 08:02:15 GMT
- Title: Discovering Frequent Gradual Itemsets with Imprecise Data
- Authors: Michaël Chirmeni Boujike, Jerry Lonlac, Norbert Tsopze, Engelbert Mephu Nguifo
- Abstract summary: Gradual patterns, which model complex co-variations of attributes of the form "the more/less X, the more/less Y", play a crucial role in many real-world applications.
This paper proposes introducing gradualness thresholds above which a change is considered an increase or a decrease.
- Score: 0.4874780144224056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradual patterns, which model complex co-variations of attributes of
the form "the more/less X, the more/less Y", play a crucial role in many
real-world applications where the amount of numerical data to manage is large,
such as biological data. Recently, these types of patterns have caught the
attention of the data mining community, where several methods have been defined
to automatically extract and manage these patterns from different data models.
However, these methods often face the problem of managing the quantity of
mined patterns. In many practical applications, computing all these patterns
can prove intractable for the user-defined frequency threshold, and the lack
of focus leads to huge collections of patterns. Another problem with
traditional approaches is that gradualness is defined simply as an increase or
a decrease: a pair of objects is considered gradual as soon as the values of
the attribute on the two objects differ. As a result, many patterns extracted
by traditional algorithms may be presented to the user although their
gradualness is only a noise effect in the data. To address this issue, this
paper proposes introducing gradualness thresholds above which a change is
considered an increase or a decrease. In contrast to approaches in the
literature, the proposed approach takes into account the distribution of
attribute values as well as the user's preferences on the gradualness
threshold, and makes it possible to extract gradual patterns from certain
databases where existing approaches fail because the search space is too
large. Moreover, results from an experimental evaluation on real databases
show that the proposed algorithm is scalable and efficient, and can eliminate
numerous patterns that do not satisfy specific gradualness requirements, so
that only a small set of patterns is shown to the user.
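The difference between the traditional definition and a thresholded one can be illustrated with a small sketch. This is not the paper's algorithm: the function name `gradual_support`, the per-attribute thresholds `tx` and `ty`, and the pair-counting notion of support are all assumptions chosen for illustration, and the sample values are made up. With thresholds of 0, any difference between two objects counts as a variation; with positive thresholds, small differences are treated as noise and ignored.

```python
from itertools import combinations


def gradual_support(xs, ys, tx=0.0, ty=0.0):
    """Hypothetical support of the pattern "the more X, the more Y".

    A pair of objects supports the pattern when both attributes change in
    the same direction by strictly more than their thresholds (tx, ty).
    Setting both thresholds to 0 gives the traditional behaviour, where
    any difference in values counts as an increase or a decrease.
    """
    n = len(xs)
    if n < 2:
        return 0.0
    total_pairs = n * (n - 1) // 2
    supporting = 0
    for i, j in combinations(range(n), 2):
        dx = xs[j] - xs[i]
        dy = ys[j] - ys[i]
        # Either orientation of the pair may satisfy "more X, more Y".
        if (dx > tx and dy > ty) or (-dx > tx and -dy > ty):
            supporting += 1
    return supporting / total_pairs


# Illustrative (made-up) values: the first two objects differ only slightly.
xs = [1.0, 1.1, 2.0, 3.0]
ys = [10.0, 10.05, 20.0, 30.0]

traditional = gradual_support(xs, ys)                   # every pair counts
thresholded = gradual_support(xs, ys, tx=0.5, ty=1.0)   # near-ties are dropped
```

In this toy data, the thresholded variant discards the one pair whose values are nearly equal, which is the kind of noise-driven gradualness the abstract describes.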
Related papers
- Enhancing Fine-Grained Visual Recognition in the Low-Data Regime Through Feature Magnitude Regularization [23.78498670529746]
We introduce a regularization technique to ensure that the magnitudes of the extracted features are evenly distributed.
Despite its apparent simplicity, our approach has demonstrated significant performance improvements across various fine-grained visual recognition datasets.
arXiv Detail & Related papers (2024-09-03T07:32:46Z)
- Pattern based learning and optimisation through pricing for bin packing problem [50.83768979636913]
We argue that when problem conditions such as the distributions of random variables change, the patterns that performed well in previous circumstances may become less effective.
We propose a novel scheme to efficiently identify patterns and dynamically quantify their values for each specific condition.
Our method quantifies the value of patterns based on their ability to satisfy constraints and their effects on the objective value.
arXiv Detail & Related papers (2024-08-27T17:03:48Z)
- USD: Unsupervised Soft Contrastive Learning for Fault Detection in Multivariate Time Series [6.055410677780381]
We introduce a combination of data augmentation and soft contrastive learning, specifically designed to capture the multifaceted nature of state behaviors more accurately.
This dual strategy significantly boosts the model's ability to distinguish between normal and abnormal states, leading to a marked improvement in fault detection performance across multiple datasets and settings.
arXiv Detail & Related papers (2024-05-25T14:48:04Z)
- Latent variable model for high-dimensional point process with structured missingness [4.451479907610764]
Longitudinal data are important in numerous fields, such as healthcare, sociology and seismology.
Real-world datasets can be high-dimensional, contain structured missingness patterns, and measurement time points can be governed by an unknown process.
We propose a flexible and efficient latent-variable model that is capable of addressing all these limitations.
arXiv Detail & Related papers (2024-02-08T15:41:48Z)
- Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution [62.71425232332837]
We show that training amortized models with noisy labels is inexpensive and surprisingly effective.
This approach significantly accelerates several feature attribution and data valuation methods, often yielding an order of magnitude speedup over existing approaches.
arXiv Detail & Related papers (2024-01-29T03:42:37Z)
- ChiroDiff: Modelling chirographic data with Diffusion Models [132.5223191478268]
We introduce a powerful model-class namely "Denoising Diffusion Probabilistic Models" or DDPMs for chirographic data.
Our model named "ChiroDiff", being non-autoregressive, learns to capture holistic concepts and therefore remains resilient to higher temporal sampling rate.
arXiv Detail & Related papers (2023-04-07T15:17:48Z)
- Learning with Noisy labels via Self-supervised Adversarial Noisy Masking [33.87292143223425]
We propose a novel training approach termed adversarial noisy masking.
It adaptively modulates the input data and label simultaneously, preventing the model from overfitting noisy samples.
It is tested on both synthetic and real-world noisy datasets.
arXiv Detail & Related papers (2023-02-14T03:13:26Z)
- Perceptual Score: What Data Modalities Does Your Model Perceive? [73.75255606437808]
We introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features.
We find that recent, more accurate multi-modal models for visual question-answering tend to perceive the visual data less than their predecessors.
Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions.
arXiv Detail & Related papers (2021-10-27T12:19:56Z)
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
- Variable Skipping for Autoregressive Range Density Estimation [84.60428050170687]
We show a technique, variable skipping, for accelerating range density estimation over deep autoregressive models.
We show that variable skipping provides 10-100$\times$ efficiency improvements when targeting challenging high-quantile error metrics.
arXiv Detail & Related papers (2020-07-10T19:01:40Z)