Monotonicity Regularization: Improved Penalties and Novel Applications
to Disentangled Representation Learning and Robust Classification
- URL: http://arxiv.org/abs/2205.08247v1
- Date: Tue, 17 May 2022 11:42:45 GMT
- Title: Monotonicity Regularization: Improved Penalties and Novel Applications
to Disentangled Representation Learning and Robust Classification
- Authors: Joao Monteiro, Mohamed Osama Ahmed, Hossein Hajimirsadeghi, Greg Mori
- Abstract summary: We study settings where gradient penalties are used alongside risk minimization.
We show that different choices of penalties define the regions of the input space where the property is observed.
We propose an approach that uses mixtures of training instances and random points to populate the space and enforce the penalty in a much larger region.
- Score: 27.827211361104222
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We study settings where gradient penalties are used alongside risk
minimization with the goal of obtaining predictors satisfying different notions
of monotonicity. Specifically, we present two sets of contributions. In the
first part of the paper, we show that different choices of penalties define the
regions of the input space where the property is observed. As such, previous
methods result in models that are monotonic only in a small volume of the input
space. We thus propose an approach that uses mixtures of training instances and
random points to populate the space and enforce the penalty in a much larger
region. As a second set of contributions, we introduce regularization
strategies that enforce other notions of monotonicity in different settings. In
this case, we consider applications, such as image classification and
generative modeling, where monotonicity is not a hard constraint but can help
improve some aspects of the model. Namely, we show that inducing monotonicity
can be beneficial in applications such as: (1) allowing for controllable data
generation, (2) defining strategies to detect anomalous data, and (3)
generating explanations for predictions. Our proposed approaches do not
introduce relevant computational overhead while leading to efficient procedures
that provide extra benefits over baseline models.
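The mixture-based penalty can be illustrated with a short sketch. Below is a minimal PyTorch illustration, not the authors' implementation: the [0, 1]-normalized input domain, the per-example mixing weights, and the function and argument names (e.g. monotone_dims) are assumptions.

```python
import torch

def monotonicity_penalty(model, x_batch, monotone_dims):
    # Random points assumed to cover a [0, 1]-normalized input domain.
    x_rand = torch.rand_like(x_batch)
    # Mixup-style convex combinations of training instances and random points,
    # so the penalty is enforced well beyond the training data itself.
    alpha = torch.rand(x_batch.size(0), 1, device=x_batch.device)
    x_mix = (alpha * x_batch + (1 - alpha) * x_rand).detach().requires_grad_(True)
    grads = torch.autograd.grad(model(x_mix).sum(), x_mix, create_graph=True)[0]
    # Penalize negative partial derivatives along features required to be non-decreasing.
    return torch.relu(-grads[:, monotone_dims]).sum(dim=1).mean()

# Typical use: loss = task_loss + lam * monotonicity_penalty(model, x_batch, monotone_dims=[0, 3])
```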
Related papers
- Exploring Data Augmentations on Self-/Semi-/Fully- Supervised
Pre-trained Models [24.376036129920948]
We investigate how data augmentation affects the performance of vision pre-trained models.
We apply four types of data augmentation: Random Erasing, CutOut, CutMix, and MixUp.
We report performance on vision tasks such as image classification, object detection, instance segmentation, and semantic segmentation.
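For reference, MixUp, the last of the four augmentations listed above, reduces to a few lines. This is a generic sketch, not the paper's pipeline, assuming one-hot (or soft) label tensors and an assumed Beta(alpha, alpha) mixing coefficient:

```python
import torch

def mixup(x, y, alpha=0.2):
    # Sample a mixing coefficient and a random pairing of the batch.
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.size(0))
    # Convex combinations of inputs and of their one-hot/soft labels.
    return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]
```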
arXiv Detail & Related papers (2023-10-28T23:46:31Z) - On Regularization and Inference with Label Constraints [62.60903248392479]
We compare two strategies for encoding label constraints in a machine learning pipeline, regularization with constraints and constrained inference.
For regularization, we show that it narrows the generalization gap by precluding models that are inconsistent with the constraints.
For constrained inference, we show that it reduces the population risk by correcting a model's violation, and hence turns the violation into an advantage.
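A toy sketch of the two strategies, using a hypothetical implication constraint ("label A implies label B") that is not taken from the paper; the threshold and the penalty weight mu are assumptions:

```python
import torch

def violation(p_a, p_b):
    # Soft measure of how much probability mass satisfies A but not B.
    return torch.relu(p_a - p_b).mean()

# Strategy 1 (regularization): add the violation to the training objective,
#   loss = task_loss + mu * violation(p_a, p_b)
# Strategy 2 (constrained inference): correct violations at prediction time,
#   e.g. force label B on whenever label A is predicted.
def constrained_decode(p_a, p_b, threshold=0.5):
    a = p_a > threshold
    b = (p_b > threshold) | a
    return a, b
```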
arXiv Detail & Related papers (2023-07-08T03:39:22Z) - Just One Byte (per gradient): A Note on Low-Bandwidth Decentralized
Language Model Finetuning Using Shared Randomness [86.61582747039053]
Language model training in distributed settings is limited by the communication cost of gradient exchanges.
We extend recent work using shared randomness to perform distributed fine-tuning with low bandwidth.
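A rough sketch of the shared-randomness idea, written as a generic two-point zeroth-order estimator rather than the paper's exact procedure; the step sizes and function names are assumptions. Because every worker regenerates the same random direction from a shared seed, only one scalar per step has to be communicated (and it can be quantized, e.g. to one byte):

```python
import numpy as np

def worker_scalar(params, loss_fn, seed, eps=1e-3):
    # Every worker regenerates the same random direction from the shared seed,
    # so only the scalar directional-derivative estimate must be sent.
    z = np.random.default_rng(seed).standard_normal(params.shape)
    return (loss_fn(params + eps * z) - loss_fn(params - eps * z)) / (2 * eps)

def apply_update(params, scalars, seed, lr=1e-2):
    z = np.random.default_rng(seed).standard_normal(params.shape)
    return params - lr * np.mean(scalars) * z   # average the workers' scalars locally
```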
arXiv Detail & Related papers (2023-06-16T17:59:51Z) - Domain-Specific Risk Minimization for Out-of-Distribution Generalization [104.17683265084757]
We first establish a generalization bound that explicitly considers the adaptivity gap.
We then propose two methods: one estimates the gap to guide the selection of a better hypothesis for the target domain,
and the other minimizes the gap directly by adapting model parameters using online target samples.
arXiv Detail & Related papers (2022-08-18T06:42:49Z) - Two-level monotonic multistage recommender systems [5.983189537988243]
We introduce a two-level monotonic property that characterizes a monotonic chain of events for personalized prediction.
We propose a regularized cost function to learn user-specific behaviors at different stages.
We develop an algorithm based on blockwise coordinate descent.
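As a generic illustration of blockwise coordinate descent (not the paper's recommender-specific algorithm), the toy ridge-regression sketch below alternates exact minimization over two assumed parameter blocks:

```python
import numpy as np

def block_cd_ridge(X, y, d1, lam=0.1, n_iters=100):
    # Split the weights into two blocks (e.g. stage-1 vs. stage-2 parameters) and
    # solve each block exactly while the other block is held fixed.
    w = np.zeros(X.shape[1])
    blocks = [np.arange(d1), np.arange(d1, X.shape[1])]
    for _ in range(n_iters):
        for idx in blocks:
            Xb = X[:, idx]
            r = y - X @ w + Xb @ w[idx]   # residual with this block's contribution removed
            w[idx] = np.linalg.solve(Xb.T @ Xb + lam * np.eye(len(idx)), Xb.T @ r)
    return w
```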
arXiv Detail & Related papers (2021-10-06T08:50:32Z) - On the Role of Optimization in Double Descent: A Least Squares Study [30.44215064390409]
We derive an excess risk bound for the gradient descent solution of the least squares objective.
We find that, in the case of noiseless regression, double descent is explained solely by optimization-related quantities.
We empirically explore if our predictions hold for neural networks.
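A minimal toy experiment in that spirit, not the paper's setup: noiseless linear regression fit by gradient descent from zero initialization, whose test error can be swept across the interpolation threshold. Dimensions, step count, and seed are assumptions.

```python
import numpy as np

def excess_risk(n=50, d=100, steps=20000, seed=0):
    # Noiseless least squares fit by gradient descent from zero initialization
    # (which converges to the minimum-norm interpolator when d > n).
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(d) / np.sqrt(d)
    X, X_test = rng.standard_normal((n, d)), rng.standard_normal((2000, d))
    y, y_test = X @ w_star, X_test @ w_star
    w, lr = np.zeros(d), n / np.linalg.norm(X, 2) ** 2
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / n
    return np.mean((X_test @ w - y_test) ** 2)

# Sweeping d across the interpolation threshold (d ~ n) traces out the double-descent curve.
```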
arXiv Detail & Related papers (2021-07-27T09:13:11Z) - Causality-based Counterfactual Explanation for Classification Models [11.108866104714627]
We propose a prototype-based counterfactual explanation framework (ProCE).
ProCE is capable of preserving the causal relationship underlying the features of the counterfactual data.
In addition, we design a novel gradient-free optimization based on the multi-objective genetic algorithm that generates the counterfactual explanations.
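A simplified, single-objective sketch of a gradient-free genetic search for counterfactuals (ProCE itself is multi-objective and causality-aware); the population size, mutation scale, penalty weight, and the predict interface are assumptions:

```python
import numpy as np

def ga_counterfactual(x, predict, target, pop=64, gens=200, sigma=0.1, seed=0):
    # Evolve perturbations of x until the classifier assigns the target class,
    # trading off proximity to x against a penalty for the wrong prediction.
    rng = np.random.default_rng(seed)
    def fitness(c):
        return -np.linalg.norm(c - x) - 10.0 * (predict(c) != target)
    cand = x + sigma * rng.standard_normal((pop, x.size))
    for _ in range(gens):
        scores = np.array([fitness(c) for c in cand])
        elite = cand[np.argsort(scores)[-pop // 4:]]               # keep the best quarter
        children = elite[rng.integers(0, len(elite), pop - len(elite))]
        cand = np.vstack([elite, children + sigma * rng.standard_normal(children.shape)])
    return max(cand, key=fitness)
```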
arXiv Detail & Related papers (2021-05-03T09:25:59Z) - Goal-directed Generation of Discrete Structures with Conditional
Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions which evaluate to a given target value.
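The expected-reward objective is typically optimized with a score-function (REINFORCE-style) estimator; the sketch below is a generic version with an assumed mean-reward baseline, not necessarily the paper's exact estimator. The reward could be, for example, a property score of a generated molecule or an indicator that an expression evaluates to the target value.

```python
import torch

def reinforce_loss(logits, tokens, rewards):
    # Score-function surrogate: grad E[R] is estimated by E[(R - baseline) * grad log p(sequence)].
    log_probs = torch.log_softmax(logits, dim=-1)                      # (batch, seq_len, vocab)
    seq_log_prob = log_probs.gather(-1, tokens.unsqueeze(-1)).squeeze(-1).sum(dim=1)
    baseline = rewards.mean()                                          # simple variance-reduction baseline
    return -((rewards - baseline) * seq_log_prob).mean()
```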
arXiv Detail & Related papers (2020-10-05T20:03:13Z) - A One-step Approach to Covariate Shift Adaptation [82.01909503235385]
A default assumption in many machine learning scenarios is that the training and test samples are drawn from the same probability distribution.
We propose a novel one-step approach that jointly learns the predictive model and the associated weights in one optimization.
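For context, the classical two-step recipe reweights the training loss by an estimated density ratio. The sketch below shows that importance-weighted risk with the weights assumed given, whereas the cited paper learns the weights jointly with the predictive model in a single optimization:

```python
import torch
import torch.nn.functional as F

def importance_weighted_risk(model, x, y, weights):
    # Per-example losses reweighted by w(x) ~ p_test(x) / p_train(x).
    per_example = F.cross_entropy(model(x), y, reduction="none")
    return (weights * per_example).mean()
```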
arXiv Detail & Related papers (2020-07-08T11:35:47Z) - Adaptive Correlated Monte Carlo for Contextual Categorical Sequence
Generation [77.7420231319632]
We adapt contextual generation of categorical sequences to a policy gradient estimator, which evaluates a set of correlated Monte Carlo (MC) rollouts for variance control.
We also demonstrate the use of correlated MC rollouts for binary-tree softmax models, which reduce the high generation cost in large vocabulary scenarios.
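A generic multi-rollout policy-gradient sketch in the same spirit (a leave-one-out baseline over rollouts that share the same context), not necessarily the paper's exact estimator; sample_fn and reward_fn are assumed interfaces:

```python
import torch

def multi_rollout_pg_loss(sample_fn, reward_fn, context, n_rollouts=4):
    # Several rollouts share the same context, and each uses the average reward of
    # the *other* rollouts as its baseline, keeping the estimator unbiased while
    # lowering variance.
    seqs, log_probs = zip(*[sample_fn(context) for _ in range(n_rollouts)])
    rewards = torch.stack([reward_fn(s) for s in seqs])
    log_probs = torch.stack(list(log_probs))
    baseline = (rewards.sum() - rewards) / (n_rollouts - 1)
    return -((rewards - baseline).detach() * log_probs).mean()
```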
arXiv Detail & Related papers (2019-12-31T03:01:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.