MCRapper: Monte-Carlo Rademacher Averages for Poset Families and
Approximate Pattern Mining
- URL: http://arxiv.org/abs/2006.09085v1
- Date: Tue, 16 Jun 2020 11:42:56 GMT
- Title: MCRapper: Monte-Carlo Rademacher Averages for Poset Families and
Approximate Pattern Mining
- Authors: Leonardo Pellegrina, Cyrus Cousins, Fabio Vandin, Matteo Riondato
- Abstract summary: We present MCRapper, an algorithm for efficient computation of Monte-Carlo Empirical Rademacher Averages (MCERA)
MCRapper computes both statistically-significant functions (i.e., patterns) when the available data is seen as a sample from an unknown distribution, and approximations of collections of high-expectation functions (e.g., frequent patterns) when the available data is a small sample from a large dataset.
- Score: 22.88915237311897
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present MCRapper, an algorithm for efficient computation of Monte-Carlo
Empirical Rademacher Averages (MCERA) for families of functions exhibiting
poset (e.g., lattice) structure, such as those that arise in many pattern
mining tasks. The MCERA allows us to compute upper bounds to the maximum
deviation of sample means from their expectations, thus it can be used to find
both statistically-significant functions (i.e., patterns) when the available
data is seen as a sample from an unknown distribution, and approximations of
collections of high-expectation functions (e.g., frequent patterns) when the
available data is a small sample from a large dataset. This feature is a strong
improvement over previously proposed solutions that could only achieve one of
the two. MCRapper uses upper bounds to the discrepancy of the functions to
efficiently explore and prune the search space, a technique borrowed from
pattern mining itself. To show the practical use of MCRapper, we employ it to
develop an algorithm TFP-R for the task of True Frequent Pattern (TFP) mining.
TFP-R gives guarantees on the probability of including any false positives
(precision) and exhibits higher statistical power (recall) than existing
methods offering the same guarantees. We evaluate MCRapper and TFP-R and show
that they outperform the state-of-the-art for their respective tasks.
Related papers
- A distribution-guided Mapper algorithm [0.3683202928838613]
We introduce a distribution guided Mapper algorithm named D-Mapper.
Our proposed algorithm is a probabilistic model-based approach, which could serve as an alternative to non-prababilistic ones.
Our numerical experiments indicate that the D-Mapper outperforms the classical Mapper algorithm in various scenarios.
arXiv Detail & Related papers (2024-01-19T17:07:05Z) - Provably Efficient Representation Learning with Tractable Planning in
Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov Decision Processes (POMDPs)
We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU)
We then show how to adapt this algorithm to also work in the broader class of $gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z) - Nearly Optimal Latent State Decoding in Block MDPs [74.51224067640717]
In episodic Block MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states.
We are first interested in estimating the latent state decoding function based on data generated under a fixed behavior policy.
We then study the problem of learning near-optimal policies in the reward-free framework.
arXiv Detail & Related papers (2022-08-17T18:49:53Z) - Low-variance estimation in the Plackett-Luce model via quasi-Monte Carlo
sampling [58.14878401145309]
We develop a novel approach to producing more sample-efficient estimators of expectations in the PL model.
We illustrate our findings both theoretically and empirically using real-world recommendation data from Amazon Music and the Yahoo learning-to-rank challenge.
arXiv Detail & Related papers (2022-05-12T11:15:47Z) - Flexible Pattern Discovery and Analysis [2.075126998649103]
We introduce an algorithm for the mining of flexible high utility-occupancy patterns.
The proposed algorithm can effectively control the length of the derived patterns, for both real-world and synthetic datasets.
arXiv Detail & Related papers (2021-11-24T01:25:15Z) - GFlowNet Foundations [66.69854262276391]
Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context.
We show a number of additional theoretical properties of GFlowNets.
arXiv Detail & Related papers (2021-11-17T17:59:54Z) - Leverage Score Sampling for Complete Mode Coverage in Generative
Adversarial Networks [11.595070613477548]
A generative model may overlook underrepresented modes that are less frequent in the empirical data distribution.
We propose a sampling procedure based on ridge leverage scores which significantly improves mode coverage when compared to standard methods.
arXiv Detail & Related papers (2021-04-06T09:00:38Z) - Efficient semidefinite-programming-based inference for binary and
multi-class MRFs [83.09715052229782]
We propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF.
We extend semidefinite relaxations from the typical binary MRF to the full multi-class setting, and develop a compact semidefinite relaxation that can again be solved efficiently using the solver.
arXiv Detail & Related papers (2020-12-04T15:36:29Z) - Particle-Gibbs Sampling For Bayesian Feature Allocation Models [77.57285768500225]
Most widely used MCMC strategies rely on an element wise Gibbs update of the feature allocation matrix.
We have developed a Gibbs sampler that can update an entire row of the feature allocation matrix in a single move.
This sampler is impractical for models with a large number of features as the computational complexity scales exponentially in the number of features.
We develop a Particle Gibbs sampler that targets the same distribution as the row wise Gibbs updates, but has computational complexity that only grows linearly in the number of features.
arXiv Detail & Related papers (2020-01-25T22:11:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.