On Regularization and Inference with Label Constraints
- URL: http://arxiv.org/abs/2307.03886v1
- Date: Sat, 8 Jul 2023 03:39:22 GMT
- Title: On Regularization and Inference with Label Constraints
- Authors: Kaifu Wang, Hangfeng He, Tin D. Nguyen, Piyush Kumar, Dan Roth
- Abstract summary: We compare two strategies for encoding label constraints in a machine learning pipeline, regularization with constraints and constrained inference.
For regularization, we show that it narrows the generalization gap by precluding models that are inconsistent with the constraints.
For constrained inference, we show that it reduces the population risk by correcting a model's violation, and hence turns the violation into an advantage.
- Score: 62.60903248392479
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prior knowledge and symbolic rules in machine learning are often expressed in
the form of label constraints, especially in structured prediction problems. In
this work, we compare two common strategies for encoding label constraints in a
machine learning pipeline, regularization with constraints and constrained
inference, by quantifying their impact on model performance. For
regularization, we show that it narrows the generalization gap by precluding
models that are inconsistent with the constraints. However, its preference for
small violations introduces a bias toward a suboptimal model. For constrained
inference, we show that it reduces the population risk by correcting a model's
violation, and hence turns the violation into an advantage. Given these
differences, we further explore the use of the two approaches together and propose
conditions under which constrained inference compensates for the bias introduced by
regularization, aiming to improve both the model complexity and the optimal risk.
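As a concrete illustration of the two strategies compared above, the sketch below uses a toy two-label task with the constraint "label A implies label B". The toy data, the independent logistic heads, the particular penalty (the probability mass placed on the infeasible assignment), and the feasible-set argmax are illustrative assumptions for exposition, not the paper's construction.

```python
# Minimal sketch (not the paper's implementation): regularization with constraints
# during training vs. constrained inference at prediction time, on a toy task with
# two binary labels (A, B) and the label constraint "A = 1 implies B = 1".
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: x -> (y_A, y_B), generated so that y_A = 1 implies y_B = 1.
X = rng.normal(size=(200, 3))
y_B = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(float)
y_A = ((X[:, 1] > 0.5) & (y_B == 1)).astype(float)   # constraint holds in the data
Y = np.stack([y_A, y_B], axis=1)

W = rng.normal(scale=0.1, size=(3, 2))                # two independent logistic heads

def grad(W, lam):
    """Gradient of mean cross-entropy plus lam * expected violation P(A=1, B=0)."""
    P = sigmoid(X @ W)                                # shape (n, 2): marginals for A and B
    ce_grad = X.T @ (P - Y) / len(X)                  # gradient of mean cross-entropy
    # Regularization with constraints: penalize the probability mass the model
    # places on the infeasible assignment (A=1, B=0), assuming independent heads.
    dA = P[:, 0] * (1.0 - P[:, 0]) * (1.0 - P[:, 1])  # d violation / d logit_A
    dB = -P[:, 0] * P[:, 1] * (1.0 - P[:, 1])         # d violation / d logit_B
    reg_grad = X.T @ np.stack([dA, dB], axis=1) / len(X)
    return ce_grad + lam * reg_grad

def constrained_inference(P):
    """Pick the feasible joint assignment with the highest factorized probability."""
    feasible = [(0, 0), (0, 1), (1, 1)]               # (1, 0) violates "A implies B"
    def score(a, b, p):
        return (p[0] if a else 1 - p[0]) * (p[1] if b else 1 - p[1])
    return np.array([max(feasible, key=lambda ab: score(*ab, p)) for p in P])

for step in range(500):                               # plain gradient descent
    W -= 0.5 * grad(W, lam=1.0)

P = sigmoid(X @ W)
unconstrained = (P > 0.5).astype(int)
constrained = constrained_inference(P)
print("violations without inference-time constraint:",
      int(((unconstrained[:, 0] == 1) & (unconstrained[:, 1] == 0)).sum()))
print("violations with constrained inference:       ",
      int(((constrained[:, 0] == 1) & (constrained[:, 1] == 0)).sum()))
```

In this sketch, the `lam` term plays the role of regularization with constraints during training, while `constrained_inference` corrects any remaining violation at prediction time; the paper's analysis concerns how these two mechanisms affect the generalization gap and the population risk, respectively.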
Related papers
- On the Geometry of Regularization in Adversarial Training: High-Dimensional Asymptotics and Generalization Bounds [11.30047438005394]
This work investigates the question of how to choose the regularization norm $\lVert \cdot \rVert$ in the context of high-dimensional adversarial training for binary classification.
We quantitatively characterize the relationship between perturbation size and the optimal choice of $\lVert \cdot \rVert$, confirming the intuition that, in the data-scarce regime, the type of regularization becomes increasingly important for adversarial training as perturbations grow in size.
arXiv Detail & Related papers (2024-10-21T14:53:12Z) - Inference-Time Rule Eraser: Fair Recognition via Distilling and Removing Biased Rules [16.85221824455542]
Machine learning models often make predictions based on biased features such as gender, race, and other social attributes.
Traditional approaches to addressing this issue involve retraining or fine-tuning neural networks with fairness-aware optimization objectives.
We introduce the Inference-Time Rule Eraser (Eraser), a novel method designed to address fairness concerns.
arXiv Detail & Related papers (2024-04-07T05:47:41Z) - ConstraintMatch for Semi-constrained Clustering [32.92933231199262]
Constrained clustering allows the training of classification models using pairwise constraints only, which are weak and relatively easy to mine.
We consider a semi-supervised setting in which a large amount of unconstrained data is available alongside a smaller set of constraints, and propose ConstraintMatch to leverage such unconstrained data.
arXiv Detail & Related papers (2023-11-26T19:31:52Z) - Representation Disentaglement via Regularization by Causal
Identification [3.9160947065896803]
We propose the use of a causal collider structured model to describe the underlying data generative process assumptions in disentangled representation learning.
For this, we propose regularization by identification (ReI), a modular regularization engine designed to align the behavior of large scale generative models with the disentanglement constraints imposed by causal identification.
arXiv Detail & Related papers (2023-02-28T23:18:54Z) - Improving Adaptive Conformal Prediction Using Self-Supervised Learning [72.2614468437919]
We train an auxiliary model with a self-supervised pretext task on top of an existing predictive model and use the self-supervised error as an additional feature to estimate nonconformity scores.
We empirically demonstrate the benefit of the additional information using both synthetic and real data on the efficiency (width), deficit, and excess of conformal prediction intervals.
arXiv Detail & Related papers (2023-02-23T18:57:14Z) - On the Importance of Gradient Norm in PAC-Bayesian Bounds [92.82627080794491]
We propose a new generalization bound that exploits the contractivity of the log-Sobolev inequalities.
We empirically analyze the effect of this new loss-gradient norm term on different neural architectures.
arXiv Detail & Related papers (2022-10-12T12:49:20Z) - Causally-motivated Shortcut Removal Using Auxiliary Labels [63.686580185674195]
A key challenge in learning such risk-invariant predictors is shortcut learning.
We propose a flexible, causally-motivated approach to address this challenge.
We show both theoretically and empirically that this causally-motivated regularization scheme yields robust predictors.
arXiv Detail & Related papers (2021-05-13T16:58:45Z) - Squared $\ell_2$ Norm as Consistency Loss for Leveraging Augmented Data
to Learn Robust and Invariant Representations [76.85274970052762]
Regularizing the distance between embeddings/representations of original samples and their augmented counterparts is a popular technique for improving the robustness of neural networks.
In this paper, we explore these regularization choices, seeking to provide a general understanding of how we should regularize the embeddings.
We show that the generic approach we identified (squared $\ell_2$ regularized augmentation) outperforms several recent methods, each of which is specially designed for one task. (A minimal sketch of this consistency regularizer appears after this list.)
arXiv Detail & Related papers (2020-11-25T22:40:09Z) - Accounting for Unobserved Confounding in Domain Generalization [107.0464488046289]
This paper investigates the problem of learning robust, generalizable prediction models from a combination of datasets.
Part of the challenge of learning robust models lies in the influence of unobserved confounders.
We demonstrate the empirical performance of our approach on healthcare data from different modalities.
arXiv Detail & Related papers (2020-07-21T08:18:06Z)