MINTY: Rule-based Models that Minimize the Need for Imputing Features
with Missing Values
- URL: http://arxiv.org/abs/2311.14108v1
- Date: Thu, 23 Nov 2023 17:09:12 GMT
- Title: MINTY: Rule-based Models that Minimize the Need for Imputing Features
with Missing Values
- Authors: Lena Stempfle and Fredrik D. Johansson
- Abstract summary: MINTY is a method that learns rules in the form of disjunctions between variables that act as replacements for each other when one or more is missing.
We demonstrate the value of MINTY in experiments using synthetic and real-world data sets and find its predictive performance comparable or favorable to baselines.
- Score: 10.591844776850857
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Rule models are often preferred in prediction tasks with tabular inputs as
they can be easily interpreted using natural language and provide predictive
performance on par with more complex models. However, most rule models'
predictions are undefined or ambiguous when some inputs are missing, forcing
users to rely on statistical imputation models or heuristics like zero
imputation, undermining the interpretability of the models. In this work, we
propose fitting concise yet precise rule models that learn to avoid relying on
features with missing values and, therefore, limit their reliance on imputation
at test time. We develop MINTY, a method that learns rules in the form of
disjunctions between variables that act as replacements for each other when one
or more is missing. This results in a sparse linear rule model, regularized to
have small dependence on features with missing values, that allows a trade-off
between goodness of fit, interpretability, and robustness to missing values at
test time. We demonstrate the value of MINTY in experiments using synthetic and
real-world data sets and find its predictive performance comparable or
favorable to baselines, with smaller reliance on features with missing values.
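The abstract's idea of disjunctions whose variables stand in for each other when one is missing can be illustrated with a small sketch. This is not the authors' implementation. It assumes a three-valued reading of an OR rule: the rule is true if any observed literal is true, false if every literal is observed and false, and undetermined otherwise, in which case imputation would be needed. The names `eval_disjunction` and `reliance_on_missing`, the use of `None` for a missing value, and the toy rows are all hypothetical.

```python
from typing import Optional, Sequence


def eval_disjunction(literals: Sequence[Optional[bool]]) -> Optional[bool]:
    """Evaluate an OR over binary literals, where None marks a missing value.

    Assumed semantics (illustrative, not taken from the paper):
    - True  if at least one observed literal is true,
    - False if every literal is observed and false,
    - None  if the result cannot be determined without imputation.
    """
    if any(bool(v) for v in literals if v is not None):
        return True
    if all(v is not None for v in literals):
        return False
    return None


def reliance_on_missing(
    rows: Sequence[Sequence[Optional[bool]]], rule_columns: Sequence[int]
) -> float:
    """Fraction of rows where the rule over `rule_columns` is undetermined."""
    undetermined = sum(
        eval_disjunction([row[j] for j in rule_columns]) is None for row in rows
    )
    return undetermined / len(rows)


# Toy data: three binary features per row, None where a value is missing.
rows = [
    (True, None, False),
    (None, False, True),
    (False, False, None),
    (None, None, False),
]

# Compare how often two candidate rules would need imputation.
for columns in ([0, 1], [1, 2]):
    print(columns, reliance_on_missing(rows, columns))
```

Under these assumptions, a learner could prefer the candidate rule with the lower undetermined fraction when both fit the data about equally well, which is the kind of trade-off between fit and robustness to missing values that the abstract describes.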
Related papers
- COME: Test-time adaption by Conservatively Minimizing Entropy [45.689829178140634]
Conservatively Minimize the Entropy (COME) is a drop-in replacement for traditional entropy minimization (EM).
COME explicitly models the uncertainty by characterizing a Dirichlet prior distribution over model predictions.
We show that COME achieves state-of-the-art performance on commonly used benchmarks.
arXiv Detail & Related papers (2024-10-12T09:20:06Z)
- Efficient Shapley Values Estimation by Amortization for Text Classification [66.7725354593271]
We develop an amortized model that directly predicts each input feature's Shapley Value without additional model evaluations.
Experimental results on two text classification datasets demonstrate that our amortized model estimates Shapley Values accurately with up to 60 times speedup.
arXiv Detail & Related papers (2023-05-31T16:19:13Z)
- Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z)
- Predicting is not Understanding: Recognizing and Addressing Underspecification in Machine Learning [47.651130958272155]
Underspecification refers to the existence of multiple models that are indistinguishable in their in-domain accuracy.
We formalize the concept of underspecification and propose a method to identify and partially address it.
arXiv Detail & Related papers (2022-07-06T11:20:40Z)
- Sharing pattern submodels for prediction with missing values [12.981974894538668]
Missing values are unavoidable in many applications of machine learning and present challenges both during training and at test time.
We propose an alternative approach, called sharing pattern submodels, which i) makes predictions robust to missing values at test time, ii) maintains or improves the predictive power of pattern submodels, and iii) has a short description, enabling improved interpretability.
arXiv Detail & Related papers (2022-06-22T15:09:40Z)
- Minimax rate of consistency for linear models with missing values [0.0]
Missing values arise in most real-world data sets due to the aggregation of multiple sources and intrinsically missing information (sensor failure, unanswered questions in surveys...).
In this paper, we focus on the extensively studied linear models, but in the presence of missing values, which turns out to be quite a challenging task.
This eventually requires solving a number of learning tasks that is exponential in the number of input features, which makes predictions impossible for current real-world datasets.
arXiv Detail & Related papers (2022-02-03T08:45:34Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train models to perform inference from inputs containing missing values without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions that evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
- Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z)
- On the consistency of supervised learning with missing values [15.666860186278782]
In many application settings, the data have missing entries which make analysis challenging.
Here, we consider supervised-learning settings: predicting a target when missing values appear in both training and testing data.
We show that the widely used method of imputing with a constant, such as the mean, prior to learning is consistent when missing values are not informative.
arXiv Detail & Related papers (2019-02-19T07:27:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.