Random Intersection Chains
- URL: http://arxiv.org/abs/2104.04714v1
- Date: Sat, 10 Apr 2021 08:41:15 GMT
- Title: Random Intersection Chains
- Authors: Qiuqiang Lin, Chuanhou Gao
- Abstract summary: We propose a method that selects interactions of categorical features, called Random Intersection Chains.
It uses random intersections to detect frequent patterns, then selects the most meaningful ones among them.
We show that if the number and length of chains are appropriately chosen, the patterns in the tail nodes are indeed the most frequent ones in the data set.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Interactions between several features sometimes play an important role in
prediction tasks. But taking all the interactions into consideration will lead
to an extremely heavy computational burden. For categorical features, the
situation is more complicated since the input will be extremely
high-dimensional and sparse if one-hot encoding is applied. Inspired by
association rule mining, we propose a method that selects interactions of
categorical features, called Random Intersection Chains. It uses random
intersections to detect frequent patterns, then selects the most meaningful
ones among them. At first a number of chains are generated, in which each node
is the intersection of the previous node and a randomly chosen observation. The
frequency of patterns in the tail nodes is estimated by maximum likelihood
estimation, then the patterns with the largest estimated frequencies are selected.
After that, their confidence is calculated by Bayes' theorem. The most
confident patterns are finally returned by Random Intersection Chains. We show
that if the number and length of chains are appropriately chosen, the patterns
in the tail nodes are indeed the most frequent ones in the data set. We analyze
the computation complexity of the proposed algorithm and prove the convergence
of the estimators. The results of a series of experiments verify the efficiency
and effectiveness of the algorithm.
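The chain-generation and tail-counting steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: observations are assumed to be encoded as sets of (feature, value) pairs, the function name and parameters are hypothetical, and the maximum-likelihood frequency estimation and Bayes'-theorem confidence steps are omitted.

```python
import random
from collections import Counter

def random_intersection_chains(observations, n_chains=50, chain_length=5,
                               top_k=3, seed=0):
    """Sketch of the chain-generation step of Random Intersection Chains.

    Each observation is a collection of (feature, value) pairs. A chain starts
    from a randomly chosen observation; each subsequent node is the
    intersection of the previous node with another randomly chosen
    observation, so only patterns frequent in the data tend to survive to
    the tail node.
    """
    rng = random.Random(seed)
    tails = []
    for _ in range(n_chains):
        node = set(rng.choice(observations))
        for _ in range(chain_length - 1):
            node &= set(rng.choice(observations))
        if node:
            tails.append(frozenset(node))
    # Patterns appearing most often in tail nodes are, with high probability,
    # among the most frequent patterns in the data set.
    return Counter(tails).most_common(top_k)
```

With enough chains of sufficient length, rare feature combinations are intersected away, and the tail nodes concentrate on the frequent patterns, matching the paper's claim about appropriately chosen chain number and length.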
Related papers
- Random Fourier Signature Features [8.766411351797885]
Tensor algebras give rise to one of the most powerful measures of similarity for sequences of arbitrary length, called the signature kernel.
Previous algorithms to compute the signature kernel scale quadratically in both the length and the number of the sequences.
We develop a random Fourier feature-based acceleration of the signature kernel acting on the inherently non-Euclidean domain of sequences.
arXiv Detail & Related papers (2023-11-20T22:08:17Z) - Learning Temporal Point Processes for Efficient Retrieval of Continuous Time Event Sequences [24.963828650935913]
We propose NEUROSEQRET which learns to retrieve and rank a relevant set of continuous-time event sequences for a given query sequence.
We develop two variants of the relevance model which offer a tradeoff between accuracy and efficiency.
Our experiments with several datasets show the significant accuracy boost of NEUROSEQRET beyond several baselines.
arXiv Detail & Related papers (2022-02-17T11:16:31Z) - Optimal Clustering with Bandit Feedback [57.672609011609886]
This paper considers the problem of online clustering with bandit feedback.
It includes a novel stopping rule for sequential testing that circumvents the need to solve any NP-hard weighted clustering problem as a subroutine.
We show through extensive simulations on synthetic and real-world datasets that BOC's performance matches the lower bound asymptotically, and significantly outperforms a non-adaptive baseline algorithm.
arXiv Detail & Related papers (2022-02-09T06:05:05Z) - Complex Event Forecasting with Prediction Suffix Trees: Extended Technical Report [70.7321040534471]
Complex Event Recognition (CER) systems have become popular in the past two decades due to their ability to "instantly" detect patterns on real-time streams of events.
There is a lack of methods for forecasting when a pattern might occur before such an occurrence is actually detected by a CER engine.
We present a formal framework that attempts to address the issue of Complex Event Forecasting.
arXiv Detail & Related papers (2021-09-01T09:52:31Z) - Determinantal Beam Search [75.84501052642361]
Beam search is a go-to strategy for decoding neural sequence models.
In use-cases that call for multiple solutions, a diverse or representative set is often desired.
By posing iterations in beam search as a series of subdeterminant problems, we can turn the algorithm into a diverse subset selection process.
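As a rough illustration of the idea of casting selection as a series of subdeterminant problems (a hedged sketch of the general principle, not the paper's decoding algorithm), a greedy routine can repeatedly add the item whose inclusion maximizes the determinant of the chosen items' similarity-kernel submatrix, which rewards diversity:

```python
def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    n = len(m)
    a = [row[:] for row in m]
    d = 1.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(a[r][i]))
        if abs(a[p][i]) < 1e-12:
            return 0.0
        if p != i:
            a[i], a[p] = a[p], a[i]
            d = -d
        d *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return d

def greedy_diverse_subset(kernel, k):
    """Greedily pick k items maximizing the subdeterminant of their
    kernel submatrix; near-duplicate items shrink the determinant,
    so the selection favors a diverse subset."""
    chosen, remaining = [], list(range(len(kernel)))
    for _ in range(k):
        best = max(remaining, key=lambda j: det(
            [[kernel[r][c] for c in chosen + [j]] for r in chosen + [j]]))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

For instance, given two nearly identical items and one dissimilar item, the greedy subdeterminant objective selects one of the near-duplicates plus the dissimilar item rather than both duplicates.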
arXiv Detail & Related papers (2021-06-14T13:01:46Z) - Modeling Sequences as Distributions with Uncertainty for Sequential Recommendation [63.77513071533095]
Most existing sequential methods assume users are deterministic.
Item-to-item transitions can fluctuate significantly across several item aspects and exhibit the randomness of user interests.
We propose a Distribution-based Transformer Sequential Recommendation (DT4SR) which injects uncertainties into sequential modeling.
arXiv Detail & Related papers (2021-06-11T04:35:21Z) - A Preference Random Walk Algorithm for Link Prediction through Mutual Influence Nodes in Complex Networks [1.345669927504424]
Local random walk is considered to be one of the most well-known algorithms in the category of quasi-local methods.
In most datasets, this method is not able to perform accurately in scoring remarkably similar nodes.
An efficient method is proposed for improving local random walk by encouraging random walk to move towards the node which has a stronger influence.
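The general idea of biasing a local random walk toward more influential nodes can be sketched as follows. This is a hypothetical illustration of the principle only, with node degree standing in as a simple influence proxy; it is not the paper's algorithm, and the function name and parameters are assumptions:

```python
import random

def biased_walk_scores(adj, start, steps=4, walks=2000, seed=0):
    """Score nodes as link candidates for `start` by short random walks
    that prefer stepping toward higher-degree (more 'influential')
    neighbors. Returns a visit-count dict; higher counts suggest
    likelier links."""
    rng = random.Random(seed)
    visits = {}
    for _ in range(walks):
        node = start
        for _ in range(steps):
            nbrs = adj[node]
            # Preference: weight each neighbor by its own degree.
            weights = [len(adj[n]) for n in nbrs]
            node = rng.choices(nbrs, weights=weights, k=1)[0]
            if node != start:
                visits[node] = visits.get(node, 0) + 1
    return visits
```

Compared with a uniform local random walk, the degree weighting concentrates visits on hub-like nodes, which is one simple way of "encouraging the walk to move towards the node which has a stronger influence."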
arXiv Detail & Related papers (2021-05-20T03:35:38Z) - Alleviate Exposure Bias in Sequence Prediction \\ with Recurrent Neural
Networks [47.52214243454995]
A popular strategy to train recurrent neural networks (RNNs) is to take the ground truth as input at each time step.
We propose a fully differentiable training algorithm for RNNs to better capture long-term dependencies.
arXiv Detail & Related papers (2021-03-22T06:15:22Z) - Consensus-Guided Correspondence Denoising [67.35345850146393]
We propose to denoise correspondences with a local-to-global consensus learning framework to robustly identify correspondence.
A novel "pruning" block is introduced to distill reliable candidates from initial matches according to their consensus scores estimated by dynamic graphs from local to global regions.
Our method outperforms state-of-the-arts on robust line fitting, wide-baseline image matching and image localization benchmarks by noticeable margins.
arXiv Detail & Related papers (2021-01-03T09:10:00Z) - Consistency of a Recurrent Language Model With Respect to Incomplete Decoding [67.54760086239514]
We study the issue of receiving infinite-length sequences from a recurrent language model.
We propose two remedies which address inconsistency: consistent variants of top-k and nucleus sampling, and a self-terminating recurrent language model.
arXiv Detail & Related papers (2020-02-06T19:56:15Z) - Scalable Influence Estimation Without Sampling [9.873635079670091]
In a diffusion process on a network, how many nodes are expected to be influenced by a set of initial spreaders?
Here, we suggest an algorithm for estimating the influence function in the popular independent cascade model, based on a scalable dynamic message-passing approach.
We also provide dynamic message-passing equations for a version of the linear threshold model.
arXiv Detail & Related papers (2019-12-29T22:15:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.