A Self-Penalizing Objective Function for Scalable Interaction Detection
- URL: http://arxiv.org/abs/2011.12215v2
- Date: Sun, 13 Dec 2020 00:14:34 GMT
- Title: A Self-Penalizing Objective Function for Scalable Interaction Detection
- Authors: Keli Liu and Feng Ruan
- Abstract summary: We tackle the problem of nonparametric variable selection with a focus on discovering interactions between variables.
The trick is to maximize a class of parametrized nonparametric dependence measures which we call metric learning objectives.
- Score: 2.208242292882514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We tackle the problem of nonparametric variable selection with a focus on
discovering interactions between variables. With $p$ variables there are
$O(p^s)$ possible order-$s$ interactions, making exhaustive search infeasible.
It is nonetheless possible to identify the variables involved in interactions
with only linear computation cost, $O(p)$. The trick is to maximize a class of
parametrized nonparametric dependence measures which we call metric learning
objectives; the landscape of these nonconvex objective functions is sensitive
to interactions but the objectives themselves do not explicitly model
interactions. Three properties make metric learning objectives highly
attractive:
(a) The stationary points of the objective are automatically sparse (i.e.,
they perform selection) -- no explicit $\ell_1$ penalization is needed.
(b) All stationary points of the objective exclude noise variables with high
probability.
(c) Guaranteed recovery of all signal variables without needing to reach the
objective's global maxima or special stationary points.
The second and third properties mean that all our theoretical results apply
in the practical case where one uses gradient ascent to maximize the metric
learning objective. While not all metric learning objectives enjoy good
statistical power, we design an objective based on $\ell_1$ kernels that does
exhibit favorable power: it recovers (i) main effects with $n \sim \log p$
samples, (ii) hierarchical interactions with $n \sim \log p$ samples and (iii)
order-$s$ pure interactions with $n \sim p^{2(s-1)}\log p$ samples.
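As a rough, hand-made illustration of the mechanism described above (not the paper's exact estimator), the sketch below runs projected gradient ascent on an HSIC-style dependence measure whose $\ell_1$ kernel carries one nonnegative weight per variable. The objective, data, step size, and all names are illustrative assumptions; the point is only that weights on noise variables tend to shrink to zero with no explicit $\ell_1$ penalty, while weights on the interacting signal variables survive.

```python
import numpy as np

def objective_and_grad(D, Kyc, beta):
    """HSIC-style dependence between beta-weighted features and the response,
    with its gradient in beta. Kernel: Kx[i,j] = exp(-sum_k beta_k |x_ik - x_jk|)."""
    n = D.shape[0]
    Kx = np.exp(-(D @ beta))                         # parametrized l1 kernel matrix
    val = np.sum(Kx * Kyc) / n**2                    # tr(H Kx H Ky) / n^2
    # d Kx[i,j] / d beta_k = -D[i,j,k] * Kx[i,j]
    grad = -np.tensordot(Kx * Kyc, D, axes=([0, 1], [0, 1])) / n**2
    return val, grad

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
y = X[:, 0] * X[:, 1] + 0.1 * rng.standard_normal(n)  # pure order-2 interaction in x_0, x_1

D = np.abs(X[:, None, :] - X[None, :, :])             # pairwise |x_ik - x_jk|, shape (n, n, p)
H = np.eye(n) - np.ones((n, n)) / n                   # centering matrix
Kyc = H @ np.exp(-np.abs(y[:, None] - y[None, :])) @ H

beta = np.full(p, 0.5)                                # start with every variable active
for _ in range(500):
    _, g = objective_and_grad(D, Kyc, beta)
    beta = np.maximum(beta + 0.02 * g / (np.linalg.norm(g) + 1e-12), 0.0)

print(np.round(beta, 3))                              # noise coordinates collapse toward 0
```

In runs of this sketch the noise weights are typically driven to zero on their own, loosely mirroring property (a); the paper's actual objective and guarantees differ in the details.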
Related papers
- $f$-MICL: Understanding and Generalizing InfoNCE-based Contrastive Learning [37.45319637345343]
In contrastive learning, a widely adopted objective function is InfoNCE, which uses cosine similarity to compare representations.
In this paper, we aim to answer intriguing questions such as: can we go beyond the KL-based objective?
We provide answers by generalizing the KL-based mutual information to the $f$-Mutual Information in Contrastive Learning ($f$-MICL) using $f$-divergences.
arXiv Detail & Related papers (2024-02-15T17:57:54Z)
- Efficient Conditionally Invariant Representation Learning [41.320360597120604]
The paper introduces the Conditional Independence Regression CovariancE (CIRCE), a measure of conditional feature dependence; unlike prior measures, it does not require multiple regressions at each step of feature learning.
In experiments, it shows superior performance to previous methods on challenging benchmarks.
arXiv Detail & Related papers (2022-12-16T18:39:32Z)
- Multi-Task Imitation Learning for Linear Dynamical Systems [50.124394757116605]
We study representation learning for efficient imitation learning over linear systems.
We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left(\frac{k n_x H}{N_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}}\right)$; a symbol glossary is sketched after this entry.
arXiv Detail & Related papers (2022-12-01T00:14:35Z)
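For reference, the bound above written out in LaTeX, with a plausible reading of the symbols; the glossary is my assumption from the standard multi-task linear-systems setup, not taken from the abstract:

```latex
% Imitation gap bound from the abstract; symbol meanings below are assumed.
\[
  \tilde{O}\!\left(
      \frac{k\, n_x\, H}{N_{\mathrm{shared}}}
    + \frac{k\, n_u}{N_{\mathrm{target}}}
  \right)
\]
% k                  -- dimension of the learned shared representation
% n_x, n_u           -- state and input dimensions of the linear system
% H                  -- trajectory horizon
% N_shared, N_target -- trajectories for shared vs. target-task training
```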
- Sharper Rates and Flexible Framework for Nonconvex SGD with Client and Data Sampling [64.31011847952006]
We revisit the problem of finding an approximately stationary point of the average of $n$ smooth and possibly nonconvex functions.
We generalize the PAGE algorithm so that it can provably work with virtually any sampling mechanism.
We provide the most general and most accurate analysis of the optimal bound in the smooth nonconvex regime; a sketch of the PAGE-style estimator follows the entry.
arXiv Detail & Related papers (2022-06-05T21:32:33Z)
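The garbled token in the source abstract likely named the algorithm being generalized; assuming it is PAGE (a standard optimal nonconvex SGD method), a minimal sketch of a PAGE-style gradient estimator looks like this. The toy problem and hyperparameters are made up.

```python
import numpy as np

def page_sgd(grad_i, x0, n, steps=500, lr=0.05, batch=8, p=0.1, rng=None):
    """PAGE-style loop: with probability p take a fresh minibatch gradient,
    otherwise update the running estimate with a minibatch gradient difference."""
    rng = rng or np.random.default_rng()
    x = x0.copy()
    g = np.mean([grad_i(i, x) for i in range(n)], axis=0)  # full gradient to start
    for _ in range(steps):
        x_new = x - lr * g
        idx = rng.integers(0, n, size=batch)
        if rng.random() < p:
            g = np.mean([grad_i(i, x_new) for i in idx], axis=0)
        else:
            g = g + np.mean([grad_i(i, x_new) - grad_i(i, x) for i in idx], axis=0)
        x = x_new
    return x

# Toy nonconvex finite sum: f_i(x) = 0.5 ||x - a_i||^2 + cos(a_i . x)
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
grad_i = lambda i, x: (x - A[i]) - np.sin(A[i] @ x) * A[i]
x_star = page_sgd(grad_i, x0=np.zeros(5), n=100)
print(np.linalg.norm(np.mean([grad_i(i, x_star) for i in range(100)], axis=0)))  # near-stationary
```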
- Approximate Function Evaluation via Multi-Armed Bandits [51.146684847667125]
We study the problem of estimating the value of a known smooth function $f$ at an unknown point $\boldsymbol{\mu} \in \mathbb{R}^n$, where each component $\mu_i$ can be sampled via a noisy oracle.
We design an instance-adaptive algorithm that learns to sample according to the importance of each coordinate, and with probability at least $1-\delta$ returns an $\epsilon$-accurate estimate of $f(\boldsymbol{\mu})$; a toy sketch of this idea follows the entry.
arXiv Detail & Related papers (2022-03-18T18:50:52Z)
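The following toy sketch shows one plausible reading of "sample according to the importance of each coordinate": after a uniform warm-up, remaining oracle queries go to coordinate $i$ in proportion to $|\partial f/\partial \mu_i|$ at the current estimate, since to first order the estimation error is $\sum_i \partial_i f \cdot (\hat{\mu}_i - \mu_i)$. The two-phase rule, budget, and example function are all assumptions, not the paper's algorithm.

```python
import numpy as np

def estimate_f(f, grad_f, oracle, n_dims, budget, warmup=20, rng=None):
    """Two-phase sketch: uniform warm-up, then spend the remaining budget on
    coordinates in proportion to |df/dmu_i| at the warm-up estimate."""
    rng = rng or np.random.default_rng()
    counts = np.full(n_dims, warmup, dtype=float)
    sums = np.array([oracle(i, warmup, rng).sum() for i in range(n_dims)])
    mu_hat = sums / counts

    w = np.abs(grad_f(mu_hat))
    w = w / w.sum() if w.sum() > 0 else np.full(n_dims, 1.0 / n_dims)
    extra = np.floor(w * (budget - warmup * n_dims)).astype(int)

    for i in range(n_dims):                       # adaptive phase
        if extra[i] > 0:
            sums[i] += oracle(i, extra[i], rng).sum()
            counts[i] += extra[i]
    return f(sums / counts)

# Toy use: f is far more sensitive to mu_0 than mu_1, so mu_0 gets most queries.
mu_true = np.array([3.0, 1.0])
f = lambda mu: mu[0]**2 + 0.01 * mu[1]
grad_f = lambda mu: np.array([2.0 * mu[0], 0.01])
oracle = lambda i, m, rng: mu_true[i] + rng.standard_normal(m)  # unit-noise oracle
print(estimate_f(f, grad_f, oracle, n_dims=2, budget=2000), "vs true", f(mu_true))
```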
- Agnostic learning with unknown utilities [70.14742836006042]
In many real-world problems, the utility of a decision depends on the underlying context $x$ and decision $y$.
We study this as agnostic learning with unknown utilities.
We show that estimating the utilities of only the sampled points $S$ suffices to learn a decision function which generalizes well.
arXiv Detail & Related papers (2021-04-17T08:22:04Z)
- $PredDiff$: Explanations and Interactions from Conditional Expectations [0.3655021726150368]
$PredDiff$ is a model-agnostic, local attribution method rooted in probability theory.
In this work, we clarify properties of $PredDiff$ and put forward several extensions of the original formalism; a toy attribution sketch follows the entry.
arXiv Detail & Related papers (2021-02-26T14:46:47Z)
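As I read it, $PredDiff$'s core quantity is the difference between the model's prediction at $x$ and its conditional expectation with one feature marginalized out, roughly $m_i(x) = f(x) - \mathbb{E}[f(x) \mid x_{\setminus i}]$. The sketch below approximates that expectation by imputing the feature from background data, a marginal stand-in for the conditional distribution; the imputation scheme, names, and toy model are my assumptions.

```python
import numpy as np

def preddiff_relevance(model, X_background, x, feature, n_samples=200, rng=None):
    """Attribution of `feature` at point x: prediction minus the average
    prediction when the feature is imputed from background data.
    Marginal imputation stands in for the conditional expectation here."""
    rng = rng or np.random.default_rng()
    imputed = np.tile(x, (n_samples, 1))
    imputed[:, feature] = rng.choice(X_background[:, feature], size=n_samples)
    return model(x[None, :])[0] - model(imputed).mean()

# Illustrative use on a toy model with an interaction term.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))
model = lambda Z: Z[:, 0] * Z[:, 1] + 0.5 * Z[:, 2]   # hypothetical fitted model
x = np.array([1.0, 2.0, -1.0])
for j in range(3):
    print(f"feature {j}: relevance = {preddiff_relevance(model, X, x, j, rng=rng):.3f}")
```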
- Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function Approximation [92.3161051419884]
We study reinforcement learning for finite-horizon episodic Markov decision processes with adversarial rewards and full-information feedback.
We show that it can achieve $\tilde{O}(dH\sqrt{T})$ regret, where $H$ is the length of the episode.
We also prove a matching lower bound of $\tilde{\Omega}(dH\sqrt{T})$ up to logarithmic factors.
arXiv Detail & Related papers (2021-02-17T18:54:08Z)
- Intervention Efficient Algorithms for Approximate Learning of Causal Graphs [22.401163479802094]
We study the problem of learning the causal relationships between a set of observed variables in the presence of latents.
Our goal is to recover the directions of all causal or ancestral relations in the underlying causal graph $G$, via a minimum-cost set of interventions.
Our algorithms combine work on efficient intervention design and the design of low-cost separating set systems.
arXiv Detail & Related papers (2020-12-27T17:08:46Z)
- Learning to extrapolate using continued fractions: Predicting the critical temperature of superconductor materials [5.905364646955811]
In the field of Artificial Intelligence (AI) and Machine Learning (ML), the approximation of unknown target functions $y=f(\mathbf{x})$ is a common objective.
Given a set $S$ of observed input-output pairs, we refer to $S$ as the training set and aim to identify a low-complexity mathematical model that can effectively approximate this target function for new instances $\mathbf{x}$; a sketch of the continued-fraction model form follows the entry.
arXiv Detail & Related papers (2020-11-27T04:57:40Z)
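For intuition about the model class the title refers to, here is a minimal evaluator for a truncated continued fraction whose terms are affine in the features, $f(\mathbf{x}) = t_0(\mathbf{x}) + 1/(t_1(\mathbf{x}) + 1/(t_2(\mathbf{x}) + \cdots))$. The affine parameterization and all coefficients are illustrative assumptions, not the paper's exact representation.

```python
import numpy as np

def continued_fraction_model(x, terms):
    """Evaluate a truncated continued fraction with affine terms:
    f(x) = t_0(x) + 1 / (t_1(x) + 1 / (t_2(x) + ...)),  t_i(x) = w_i . x + c_i."""
    value = 0.0
    for w, c in reversed(terms[1:]):        # collapse from the innermost term out
        value = 1.0 / (w @ x + c + value)
    w0, c0 = terms[0]
    return w0 @ x + c0 + value

# Illustrative use: three affine terms over two features (coefficients made up).
terms = [(np.array([0.5, -1.0]), 2.0),
         (np.array([1.0, 0.2]), 1.5),
         (np.array([-0.3, 0.8]), 3.0)]
print(continued_fraction_model(np.array([1.0, 2.0]), terms))
```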
- Neural Bayes: A Generic Parameterization Method for Unsupervised Representation Learning [175.34232468746245]
We introduce a parameterization method called Neural Bayes.
It allows computing statistical quantities that are in general difficult to compute.
We show two independent use cases for this parameterization.
arXiv Detail & Related papers (2020-02-20T22:28:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.