Related papers: MCRapper: Monte-Carlo Rademacher Averages for Poset Families and Approximate Pattern Mining

MCRapper: Monte-Carlo Rademacher Averages for Poset Families and Approximate Pattern Mining

URL: http://arxiv.org/abs/2006.09085v1
Date: Tue, 16 Jun 2020 11:42:56 GMT
Title: MCRapper: Monte-Carlo Rademacher Averages for Poset Families and Approximate Pattern Mining
Authors: Leonardo Pellegrina, Cyrus Cousins, Fabio Vandin, Matteo Riondato
Abstract summary: We present MCRapper, an algorithm for efficient computation of Monte-Carlo Empirical Rademacher Averages (MCERA) MCRapper computes both statistically-significant functions (i.e., patterns) when the available data is seen as a sample from an unknown distribution, and approximations of collections of high-expectation functions (e.g., frequent patterns) when the available data is a small sample from a large dataset.
Score: 22.88915237311897
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present MCRapper, an algorithm for efficient computation of Monte-Carlo Empirical Rademacher Averages (MCERA) for families of functions exhibiting poset (e.g., lattice) structure, such as those that arise in many pattern mining tasks. The MCERA allows us to compute upper bounds to the maximum deviation of sample means from their expectations, thus it can be used to find both statistically-significant functions (i.e., patterns) when the available data is seen as a sample from an unknown distribution, and approximations of collections of high-expectation functions (e.g., frequent patterns) when the available data is a small sample from a large dataset. This feature is a strong improvement over previously proposed solutions that could only achieve one of the two. MCRapper uses upper bounds to the discrepancy of the functions to efficiently explore and prune the search space, a technique borrowed from pattern mining itself. To show the practical use of MCRapper, we employ it to develop an algorithm TFP-R for the task of True Frequent Pattern (TFP) mining. TFP-R gives guarantees on the probability of including any false positives (precision) and exhibits higher statistical power (recall) than existing methods offering the same guarantees. We evaluate MCRapper and TFP-R and show that they outperform the state-of-the-art for their respective tasks.

Related papers

seMCD: Sequentially implemented Monte Carlo depth computation with statistical guarantees [0.0]
Statistical depth functions provide center-outward orderings in spaces of dimension larger than one.<n>The numerical evaluation of such depth functions can be computationally prohibitive, even for relatively low dimensions.<n>We present a novel sequentially implemented Monte Carlo methodology for the computation of, theoretical and empirical, depth functions and related quantities.
arXiv Detail & Related papers (2025-07-08T17:59:07Z)
Efficient MCMC Sampling with Expensive-to-Compute and Irregular Likelihoods [2.299872239734834]
We explore several sampling algorithms that make use of subset evaluations to reduce computational overhead.<n>We introduce data-driven proxies in place of Taylor expansions and define a novel-cost aware adaptive controller.<n>We find our improved version of Importance with Nested Training Samples (HINTS), with adaptive proposals and a data-driven proxy, obtains the best sampling error in a fixed computational budget.
arXiv Detail & Related papers (2025-05-15T16:06:44Z)
A distribution-guided Mapper algorithm [0.3683202928838613]
We introduce a distribution guided Mapper algorithm named D-Mapper. Our proposed algorithm is a probabilistic model-based approach, which could serve as an alternative to non-prababilistic ones. Our numerical experiments indicate that the D-Mapper outperforms the classical Mapper algorithm in various scenarios.
arXiv Detail & Related papers (2024-01-19T17:07:05Z)
Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP [81.00800920928621]
We study representation learning in partially observable Markov Decision Processes (POMDPs) We first present an algorithm for decodable POMDPs that combines maximum likelihood estimation (MLE) and optimism in the face of uncertainty (OFU) We then show how to adapt this algorithm to also work in the broader class of $gamma$-observable POMDPs.
arXiv Detail & Related papers (2023-06-21T16:04:03Z)
Nearly Optimal Latent State Decoding in Block MDPs [74.51224067640717]
In episodic Block MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states. We are first interested in estimating the latent state decoding function based on data generated under a fixed behavior policy. We then study the problem of learning near-optimal policies in the reward-free framework.
arXiv Detail & Related papers (2022-08-17T18:49:53Z)
Low-variance estimation in the Plackett-Luce model via quasi-Monte Carlo sampling [58.14878401145309]
We develop a novel approach to producing more sample-efficient estimators of expectations in the PL model. We illustrate our findings both theoretically and empirically using real-world recommendation data from Amazon Music and the Yahoo learning-to-rank challenge.
arXiv Detail & Related papers (2022-05-12T11:15:47Z)
Flexible Pattern Discovery and Analysis [2.075126998649103]
We introduce an algorithm for the mining of flexible high utility-occupancy patterns. The proposed algorithm can effectively control the length of the derived patterns, for both real-world and synthetic datasets.
arXiv Detail & Related papers (2021-11-24T01:25:15Z)
GFlowNet Foundations [66.69854262276391]
Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context. We show a number of additional theoretical properties of GFlowNets.
arXiv Detail & Related papers (2021-11-17T17:59:54Z)
Leverage Score Sampling for Complete Mode Coverage in Generative Adversarial Networks [11.595070613477548]
A generative model may overlook underrepresented modes that are less frequent in the empirical data distribution. We propose a sampling procedure based on ridge leverage scores which significantly improves mode coverage when compared to standard methods.
arXiv Detail & Related papers (2021-04-06T09:00:38Z)
Efficient semidefinite-programming-based inference for binary and multi-class MRFs [83.09715052229782]
We propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF. We extend semidefinite relaxations from the typical binary MRF to the full multi-class setting, and develop a compact semidefinite relaxation that can again be solved efficiently using the solver.
arXiv Detail & Related papers (2020-12-04T15:36:29Z)
Particle-Gibbs Sampling For Bayesian Feature Allocation Models [77.57285768500225]
Most widely used MCMC strategies rely on an element wise Gibbs update of the feature allocation matrix. We have developed a Gibbs sampler that can update an entire row of the feature allocation matrix in a single move. This sampler is impractical for models with a large number of features as the computational complexity scales exponentially in the number of features. We develop a Particle Gibbs sampler that targets the same distribution as the row wise Gibbs updates, but has computational complexity that only grows linearly in the number of features.
arXiv Detail & Related papers (2020-01-25T22:11:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.