Related papers: A Unified Empirical Risk Minimization Framework for Flexible N-Tuples Weak Supervision

A Unified Empirical Risk Minimization Framework for Flexible N-Tuples Weak Supervision

URL: http://arxiv.org/abs/2507.07771v1
Date: Thu, 10 Jul 2025 13:54:59 GMT
Title: A Unified Empirical Risk Minimization Framework for Flexible N-Tuples Weak Supervision
Authors: Shuying Huang, Junpeng Li, Changchun Hua, Yana Yang,
Abstract summary: We propose a general N-tuples learning framework based on empirical risk minimization.<n>We show that our framework consistently improves generalization across various N-tuples learning tasks.
Score: 13.39950379203994
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: To alleviate the annotation burden in supervised learning, N-tuples learning has recently emerged as a powerful weakly-supervised method. While existing N-tuples learning approaches extend pairwise learning to higher-order comparisons and accommodate various real-world scenarios, they often rely on task-specific designs and lack a unified theoretical foundation. In this paper, we propose a general N-tuples learning framework based on empirical risk minimization, which systematically integrates pointwise unlabeled data to enhance learning performance. This paper first unifies the data generation processes of N-tuples and pointwise unlabeled data under a shared probabilistic formulation. Based on this unified view, we derive an unbiased empirical risk estimator that generalizes a broad class of existing N-tuples models. We further establish a generalization error bound for theoretical support. To demonstrate the flexibility of the framework, we instantiate it in four representative weakly supervised scenarios, each recoverable as a special case of our general model. Additionally, to address overfitting issues arising from negative risk terms, we adopt correction functions to adjust the empirical risk. Extensive experiments on benchmark datasets validate the effectiveness of the proposed framework and demonstrate that leveraging pointwise unlabeled data consistently improves generalization across various N-tuples learning tasks.

Related papers

Learning from M-Tuple Dominant Positive and Unlabeled Data [9.568395664931504]
This paper proposes a generalized learning framework emphMDPU to better align with real-world application scenarios.<n>We derive an unbiased risk estimator that satisfies risk consistency based on the empirical risk minimization (ERM) method.<n>To mitigate the inevitable overfitting issue during training, a risk correction method is introduced, leading to the development of a corrected risk estimator.
arXiv Detail & Related papers (2025-05-25T13:20:11Z)
Regularized Neural Ensemblers [55.15643209328513]
In this study, we explore employing regularized neural networks as ensemble methods.<n>Motivated by the risk of learning low-diversity ensembles, we propose regularizing the ensembling model by randomly dropping base model predictions.<n>We demonstrate this approach provides lower bounds for the diversity within the ensemble, reducing overfitting and improving generalization capabilities.
arXiv Detail & Related papers (2024-10-06T15:25:39Z)
Bounds on the Generalization Error in Active Learning [0.0]
We establish empirical risk principles for active learning by deriving a family of upper bounds on the generalization error. We systematically link diverse active learning scenarios, characterized by their loss functions and hypothesis classes to their corresponding upper bounds. Our results show that regularization techniques used to constraint the complexity of various hypothesis classes are sufficient conditions to ensure the validity of the bounds.
arXiv Detail & Related papers (2024-09-10T08:08:09Z)
Error Bounds of Supervised Classification from Information-Theoretic Perspective [0.0]
We explore bounds on the expected risk when using deep neural networks for supervised classification from an information theoretic perspective. We introduce model risk and fitting error, which are derived from further decomposing the empirical risk.
arXiv Detail & Related papers (2024-06-07T01:07:35Z)
Learning with Complementary Labels Revisited: The Selected-Completely-at-Random Setting Is More Practical [66.57396042747706]
Complementary-label learning is a weakly supervised learning problem. We propose a consistent approach that does not rely on the uniform distribution assumption. We find that complementary-label learning can be expressed as a set of negative-unlabeled binary classification problems.
arXiv Detail & Related papers (2023-11-27T02:59:17Z)
A Generalized Unbiased Risk Estimator for Learning with Augmented Classes [70.20752731393938]
Given unlabeled data, an unbiased risk estimator (URE) can be derived, which can be minimized for LAC with theoretical guarantees. We propose a generalized URE that can be equipped with arbitrary loss functions while maintaining the theoretical guarantees.
arXiv Detail & Related papers (2023-06-12T06:52:04Z)
Synergies between Disentanglement and Sparsity: Generalization and Identifiability in Multi-Task Learning [79.83792914684985]
We prove a new identifiability result that provides conditions under which maximally sparse base-predictors yield disentangled representations. Motivated by this theoretical result, we propose a practical approach to learn disentangled representations based on a sparsity-promoting bi-level optimization problem.
arXiv Detail & Related papers (2022-11-26T21:02:09Z)
FedGen: Generalizable Federated Learning for Sequential Data [8.784435748969806]
In many real-world distributed settings, spurious correlations exist due to biases and data sampling issues. We present a generalizable federated learning framework called FedGen, which allows clients to identify and distinguish between spurious and invariant features. We show that FedGen results in models that achieve significantly better generalization and can outperform the accuracy of current federated learning approaches by over 24%.
arXiv Detail & Related papers (2022-11-03T15:48:14Z)
Supervised Multivariate Learning with Simultaneous Feature Auto-grouping and Dimension Reduction [7.093830786026851]
This paper proposes a novel clustered reduced-rank learning framework. It imposes two joint matrix regularizations to automatically group the features in constructing predictive factors. It is more interpretable than low-rank modeling and relaxes the stringent sparsity assumption in variable selection.
arXiv Detail & Related papers (2021-12-17T20:11:20Z)
TsmoBN: Interventional Generalization for Unseen Clients in Federated Learning [23.519212374186232]
We form a training structural causal model (SCM) to explain the challenges of model generalization in a distributed learning paradigm. We present a simple yet effective method using test-specific and momentum tracked batch normalization (TsmoBN) to generalize FL models to testing clients.
arXiv Detail & Related papers (2021-10-19T13:46:37Z)
Unsupervised Learning of Debiased Representations with Pseudo-Attributes [85.5691102676175]
We propose a simple but effective debiasing technique in an unsupervised manner. We perform clustering on the feature embedding space and identify pseudoattributes by taking advantage of the clustering results. We then employ a novel cluster-based reweighting scheme for learning debiased representation.
arXiv Detail & Related papers (2021-08-06T05:20:46Z)
RATT: Leveraging Unlabeled Data to Guarantee Generalization [96.08979093738024]
We introduce a method that leverages unlabeled data to produce generalization bounds. We prove that our bound is valid for 0-1 empirical risk minimization. This work provides practitioners with an option for certifying the generalization of deep nets even when unseen labeled data is unavailable.
arXiv Detail & Related papers (2021-05-01T17:05:29Z)
Learning Diverse Representations for Fast Adaptation to Distribution Shift [78.83747601814669]
We present a method for learning multiple models, incorporating an objective that pressures each to learn a distinct way to solve the task. We demonstrate our framework's ability to facilitate rapid adaptation to distribution shift.
arXiv Detail & Related papers (2020-06-12T12:23:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.