Block Model Guided Unsupervised Feature Selection
- URL: http://arxiv.org/abs/2007.02376v1
- Date: Sun, 5 Jul 2020 16:19:47 GMT
- Title: Block Model Guided Unsupervised Feature Selection
- Authors: Zilong Bai, Hoa Nguyen, Ian Davidson
- Abstract summary: We present a novel approach for graph-driven unsupervised feature selection for linked data.
We take the novel approach of first building a block model on the graph and then using the block model for feature selection.
Experimental results show that our method outperforms the state of the art on several real-world public datasets.
- Score: 32.21728295212875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Feature selection is a core area of data mining with a recent innovation of
graph-driven unsupervised feature selection for linked data. In this setting we
have a dataset $\mathbf{Y}$ consisting of $n$ instances each with $m$ features
and a corresponding $n$ node graph (whose adjacency matrix is $\mathbf{A}$)
with an edge indicating that the two instances are similar. Existing efforts
for unsupervised feature selection on attributed networks have explored either
directly regenerating the links by solving for $f$ such that
$f(\mathbf{y}_i,\mathbf{y}_j) \approx \mathbf{A}_{i,j}$ or finding community
structure in $\mathbf{A}$ and using the features in $\mathbf{Y}$ to predict
these communities. However, graph-driven unsupervised feature selection remains
an understudied area with respect to exploring more complex guidance. Here we
take the novel approach of first building a block model on the graph and then
using the block model for feature selection. That is, we discover
$\mathbf{F}\mathbf{M}\mathbf{F}^T \approx \mathbf{A}$ and then find a subset of
features $\mathcal{S}$ that induces another graph to preserve both $\mathbf{F}$
and $\mathbf{M}$. We call our approach Block Model Guided Unsupervised Feature
Selection (BMGUFS). Experimental results show that our method outperforms the
state of the art on several real-world public datasets in finding high-quality
features for clustering.
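To make the two-stage recipe concrete, here is a minimal NumPy sketch under explicit assumptions: the block model is fit by a plain projected-gradient symmetric factorization, the graph induced by a feature subset uses an RBF similarity on the selected columns, and the subset is grown greedily. Every function and criterion below is illustrative; the paper's actual objective and optimization procedure differ.

```python
import numpy as np

def fit_block_model(A, k, iters=200, lr=1e-3, seed=0):
    """Fit A ~ F M F^T by projected gradient descent on the squared
    Frobenius error, keeping F and M nonnegative and M symmetric.
    (A stand-in for whatever fitting procedure the paper uses.)"""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    F = 0.1 * np.abs(rng.standard_normal((n, k)))
    M = 0.1 * np.abs(rng.standard_normal((k, k)))
    M = (M + M.T) / 2
    for _ in range(iters):
        R = F @ M @ F.T - A                  # residual
        grad_F = 2 * (R + R.T) @ F @ M       # d/dF ||R||_F^2 (M symmetric)
        grad_M = 2 * F.T @ R @ F             # d/dM ||R||_F^2
        F = np.maximum(F - lr * grad_F, 0.0)
        M = np.maximum(M - lr * (grad_M + grad_M.T) / 2, 0.0)
    return F, M

def preservation_score(Y, S, F, M, sigma=1.0):
    """Score how well the graph induced by feature subset S matches
    F M F^T. The induced graph is an RBF similarity on the selected
    columns of Y -- an assumption, not the paper's construction."""
    Ys = Y[:, sorted(S)]
    d2 = ((Ys[:, None, :] - Ys[None, :, :]) ** 2).sum(axis=-1)
    A_S = np.exp(-d2 / (2 * sigma ** 2))
    return -np.linalg.norm(A_S - F @ M @ F.T, "fro")

def greedy_select(Y, F, M, budget):
    """Greedy forward selection of features preserving F and M."""
    S, remaining = set(), set(range(Y.shape[1]))
    for _ in range(budget):
        best = max(remaining,
                   key=lambda j: preservation_score(Y, S | {j}, F, M))
        S.add(best)
        remaining.remove(best)
    return sorted(S)
```

Usage under these assumptions: with $\mathbf{A}$ an $n \times n$ adjacency matrix and $\mathbf{Y}$ the $n \times m$ feature matrix, `F, M = fit_block_model(A, k)` followed by `greedy_select(Y, F, M, budget)` returns the indices of a chosen subset $\mathcal{S}$.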
Related papers
- Guarantees for Nonlinear Representation Learning: Non-identical Covariates, Dependent Data, Fewer Samples [24.45016514352055]
We study the sample complexity of learning $T+1$ functions $f_\star^{(t)} \circ g_\star$ from a function class $\mathcal{F} \times \mathcal{G}$.
We show that as the number of tasks $T$ increases, both the sample requirement and risk bound converge to that of $r$-dimensional regression.
arXiv Detail & Related papers (2024-10-15T03:20:19Z)
- Online Learning of Halfspaces with Massart Noise [47.71073318490341]
We study the task of online learning in the presence of Massart noise.
We present a computationally efficient algorithm that achieves mistake bound $\eta T + o(T)$.
We use our Massart online learner to design an efficient bandit algorithm that obtains expected reward at least $(1-1/k)\Delta T - o(T)$ greater than choosing a random action at every round.
arXiv Detail & Related papers (2024-05-21T17:31:10Z)
- Provably learning a multi-head attention layer [55.2904547651831]
The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models.
In this work, we initiate the study of provably learning a multi-head attention layer from random examples.
We prove computational lower bounds showing that in the worst case, exponential dependence on $m$ is unavoidable.
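For orientation, the object being learned is (up to parameterization details) a map of the form $F(X) = \sum_{i=1}^{m} \mathrm{softmax}(X W_i X^\top) X V_i$; the NumPy sketch below implements that forward pass. The shapes and the precise head parameterization are assumptions for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def softmax_rows(Z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    Z = Z - Z.max(axis=-1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=-1, keepdims=True)

def multi_head_attention(X, Ws, Vs):
    """Simplified m-head attention F(X) = sum_i softmax(X W_i X^T) X V_i.
    Assumed shapes: X is (n, d); each W_i is (d, d); each V_i is (d, p)."""
    out = np.zeros((X.shape[0], Vs[0].shape[1]))
    for W, V in zip(Ws, Vs):
        attn = softmax_rows(X @ W @ X.T)  # (n, n) attention weights
        out += attn @ X @ V               # per-head value aggregation
    return out
```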
arXiv Detail & Related papers (2024-02-06T15:39:09Z)
- SQ Lower Bounds for Learning Mixtures of Linear Classifiers [43.63696593768504]
We show that known algorithms for this problem are essentially best possible, even for the special case of uniform mixtures.
The key technical ingredient is a new construction of spherical designs that may be of independent interest.
arXiv Detail & Related papers (2023-10-18T10:56:57Z)
- Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ achieving $F(\mathbf{w}) = C \cdot \mathrm{OPT} + \epsilon$ with high probability.
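As a toy illustration of this setup, the sketch below runs plain gradient descent on the empirical square loss for $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$. It shows the algorithm being analyzed, not the paper's adversarial noise model or its population-loss guarantee.

```python
import numpy as np

def learn_single_neuron(X, y, sigma, dsigma, lr=0.01, steps=1000):
    """Gradient descent on (1/2n) sum_i (sigma(w.x_i) - y_i)^2 for a
    monotone activation sigma with (sub)derivative dsigma."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        z = X @ w
        grad = X.T @ ((sigma(z) - y) * dsigma(z)) / n
        w -= lr * grad
    return w

# Example activation: ReLU, with subgradient 1{t > 0}.
relu = lambda t: np.maximum(t, 0.0)
drelu = lambda t: (t > 0).astype(float)
```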
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
- Overparametrized linear dimensionality reductions: From projection pursuit to two-layer neural networks [10.368585938419619]
Given a cloud of $n$ data points in $\mathbb{R}^d$, consider all projections onto $m$-dimensional subspaces of $\mathbb{R}^d$.
What does this collection of probability distributions look like when $n,d$ grow large?
Denoting by $\mathscr{F}_{m,\alpha}$ the set of probability distributions in $\mathbb{R}^m$ that arise as low-dimensional projections in this limit, we establish new inner and outer bounds on $\mathscr{F}_{m,\alpha}$.
arXiv Detail & Related papers (2022-06-14T00:07:33Z)
- Fast Graph Sampling for Short Video Summarization using Gershgorin Disc Alignment [52.577757919003844]
We study the problem of efficiently summarizing a short video into several keyframes, leveraging recent progress in fast graph sampling.
Experimental results show that our algorithm achieves comparable video summarization as state-of-the-art methods, at a substantially reduced complexity.
arXiv Detail & Related papers (2021-10-21T18:43:00Z)
- Random matrices in service of ML footprint: ternary random features with no performance loss [55.30329197651178]
We show that the eigenspectrum of $\mathbf{K}$ is independent of the distribution of the i.i.d. entries of $\mathbf{w}$.
We propose a novel random features technique called Ternary Random Feature (TRF).
The computation of the proposed random features requires no multiplication and a factor of $b$ fewer bits for storage compared to classical random features.
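One plausible reading of such a multiplication-free feature map is sketched below: projection weights drawn from $\{-1, 0, +1\}$, so $X W$ needs only signed additions and each weight stores in two bits. The sparsity level and the sign nonlinearity are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def ternary_random_features(X, n_features, p_zero=0.5, seed=0):
    """Random feature map with ternary weights in {-1, 0, +1}.
    X @ W then involves no multiplications (only additions and
    subtractions), and W needs 2 bits per entry instead of a float.
    p_zero and the sign nonlinearity are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.choice([-1, 0, 1], size=(d, n_features),
                   p=[(1 - p_zero) / 2, p_zero, (1 - p_zero) / 2])
    return np.sign(X @ W)
```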
arXiv Detail & Related papers (2021-10-05T09:33:49Z)
- Model Selection with Near Optimal Rates for Reinforcement Learning with General Model Classes [27.361399036211694]
We address the problem of model selection for the finite horizon episodic Reinforcement Learning (RL) problem.
In the model selection framework, instead of $\mathcal{P}^*$, we are given $M$ nested families of transition kernels.
We show that \texttt{ARL-GEN} obtains a regret of $\tilde{\mathcal{O}}(d_{\mathcal{E}}^* H^2 + \sqrt{d_{\mathcal{E}}^* \mathbb{M}^* H^2 T})$.
arXiv Detail & Related papers (2021-07-13T05:00:38Z)
- Agnostic Learning of a Single Neuron with Gradient Descent [92.7662890047311]
We consider the problem of learning the best-fitting single neuron as measured by the expected square loss.
For the ReLU activation, our population risk guarantee is $O(\mathsf{OPT}^{1/2})+\epsilon$.
arXiv Detail & Related papers (2020-05-29T07:20:35Z)
- High-Dimensional Feature Selection for Genomic Datasets [3.9499155245102275]
A central problem in machine learning and pattern recognition is identifying the most important features.
In this paper, we provide a new feature selection method (DRPT) that consists of first removing the irrelevant features and then detecting correlations between the remaining features.
The effectiveness of DRPT has been verified by performing a series of comparisons with seven state-of-the-art feature selection methods over ten genetic datasets.
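In the spirit of that two-stage recipe (discard irrelevant features, then prune correlated ones), here is a generic filter sketch; the variance and correlation criteria and both thresholds are placeholders, not DRPT's actual perturbation-theory machinery.

```python
import numpy as np

def two_stage_filter(Y, var_thresh=1e-3, corr_thresh=0.95):
    """Stage 1: drop near-constant features (a crude irrelevance test).
    Stage 2: greedily keep a feature only if it is not highly correlated
    with any feature already kept. Criteria/thresholds are placeholders."""
    keep = [j for j in range(Y.shape[1]) if Y[:, j].var() > var_thresh]
    C = np.abs(np.corrcoef(Y[:, keep], rowvar=False))
    selected = []
    for i in range(len(keep)):
        if all(C[i, j] < corr_thresh for j in selected):
            selected.append(i)
    return [keep[i] for i in selected]
```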
arXiv Detail & Related papers (2020-02-27T14:17:39Z)