On Training-Test (Mis)alignment in Unsupervised Combinatorial Optimization: Observation, Empirical Exploration, and Analysis
- URL: http://arxiv.org/abs/2506.16732v1
- Date: Fri, 20 Jun 2025 04:05:09 GMT
- Title: On Training-Test (Mis)alignment in Unsupervised Combinatorial Optimization: Observation, Empirical Exploration, and Analysis
- Authors: Fanchen Bu, Kijung Shin
- Abstract summary: In unsupervised combinatorial optimization (UCO), during training, one aims to have continuous decisions that are promising in a probabilistic sense for each training instance. We explore a preliminary idea to better align training and testing in UCO by including a differentiable version of derandomization into training. Our empirical exploration shows that such an idea indeed improves training-test alignment, but also introduces nontrivial challenges into training.
- Score: 25.69187509653635
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In unsupervised combinatorial optimization (UCO), during training, one aims to have continuous decisions that are promising in a probabilistic sense for each training instance, which enables end-to-end training on initially discrete and non-differentiable problems. At test time, for each test instance, starting from continuous decisions, derandomization is typically applied to obtain the final deterministic decisions. Researchers have developed increasingly powerful test-time derandomization schemes to enhance the empirical performance and the theoretical guarantees of UCO methods. However, we notice a misalignment between training and testing in existing UCO methods. Consequently, lower training losses do not necessarily entail better post-derandomization performance, even for the training instances, where there is no data distribution shift. Empirically, we indeed observe such undesirable cases. We explore a preliminary idea to better align training and testing in UCO by including a differentiable version of derandomization into training. Our empirical exploration shows that such an idea indeed improves training-test alignment, but also introduces nontrivial challenges into training.
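The gap between the training objective and the test-time procedure described in the abstract can be illustrated with a minimal sketch. It assumes Max-Cut as the problem, a toy graph, and a hand-picked probability vector standing in for a model's output; the function names `expected_cut` and `derandomize` are hypothetical, and the derandomization shown is the classical method of conditional expectations, not necessarily the scheme used in the paper.

```python
import numpy as np

def expected_cut(p, edges):
    """Expected cut size when node i joins side 1 independently with probability p[i]."""
    return sum(p[i] * (1 - p[j]) + p[j] * (1 - p[i]) for i, j in edges)

def derandomize(p, edges):
    """Method of conditional expectations: fix nodes one by one to the better of 0 or 1."""
    x = np.array(p, dtype=float)
    for i in range(len(x)):
        x[i] = 0.0
        cut_if_zero = expected_cut(x, edges)
        x[i] = 1.0
        cut_if_one = expected_cut(x, edges)
        x[i] = 1.0 if cut_if_one >= cut_if_zero else 0.0
    return x

# Hypothetical toy instance: a 4-cycle with one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
# Hypothetical continuous decisions, standing in for a trained model's output.
p = np.array([0.6, 0.4, 0.55, 0.45])

train_loss = -expected_cut(p, edges)   # what training minimizes
x = derandomize(p, edges)              # what test time actually uses
test_value = expected_cut(x, edges)    # cut size of the deterministic decisions

print("training loss:", train_loss)
print("derandomized decisions:", x)
print("post-derandomization cut:", test_value)
```

In this sketch the training loss is computed from `p` alone, while the reported quality depends on `x`, which is produced by a separate procedure. Because the expected cut is linear in each coordinate, this particular derandomization never does worse than the expectation, but two probability vectors with the same loss can still round to different final cuts, which is the kind of misalignment the abstract points at.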
Related papers
- Limitations of Using Identical Distributions for Training and Testing When Learning Boolean Functions [1.3537117504260623]
We study whether it is always optimal for the training distribution to be identical to the test distribution when the learner is allowed to be optimally adapted to the training distribution.
We also show that when certain regularities are imposed on the target functions, the standard conclusion is recovered in the case of the uniform distribution.
arXiv Detail & Related papers (2025-11-30T09:06:07Z)
- Uncertainty-aware Test-Time Training (UT$^3$) for Efficient On-the-fly Domain Adaptive Dense Regression [3.316593788543852]
Deep neural networks (DNNs) are increasingly being used in autonomous systems.
DNNs do not generalize well to domain shift.
Recent work on test-time training proposes methods that adapt to a new test distribution on the fly.
arXiv Detail & Related papers (2025-09-03T04:41:43Z)
- Generalizing to any diverse distribution: uniformity, gentle finetuning and rebalancing [55.791818510796645]
We aim to develop models that generalize well to any diverse test distribution, even if the latter deviates significantly from the training data.
Various approaches like domain adaptation, domain generalization, and robust optimization attempt to address the out-of-distribution challenge.
We adopt a more conservative perspective by accounting for the worst-case error across all sufficiently diverse test distributions within a known domain.
arXiv Detail & Related papers (2024-10-08T12:26:48Z)
- A Comparative Study of Pre-training and Self-training [0.40964539027092917]
We propose an ensemble method to empirically study all feasible training paradigms combining pre-training, self-training, and fine-tuning.
We conduct experiments on six datasets, four data augmentation techniques, and imbalanced data for sentiment analysis and natural language inference tasks.
Our findings confirm that the pre-training and fine-tuning paradigm yields the best overall performances.
arXiv Detail & Related papers (2024-09-04T14:30:13Z)
- Stability and Generalization in Free Adversarial Training [9.831489366502302]
We analyze the interconnections between generalization and optimization in adversarial training using the algorithmic stability framework.
We compare the generalization gap of neural networks trained using the vanilla adversarial training method and the free adversarial training method.
Our empirical findings suggest that the free adversarial training method could lead to a smaller generalization gap over a similar number of training iterations.
arXiv Detail & Related papers (2024-04-13T12:07:20Z)
- Revisiting Long-tailed Image Classification: Survey and Benchmarks with New Evaluation Metrics [88.39382177059747]
A corpus of metrics is designed for measuring the accuracy, robustness, and bounds of algorithms for learning with long-tailed distribution.
Based on our benchmarks, we re-evaluate the performance of existing methods on CIFAR10 and CIFAR100 datasets.
arXiv Detail & Related papers (2023-02-03T02:40:54Z)
- DELTA: degradation-free fully test-time adaptation [59.74287982885375]
We find that two unfavorable defects are concealed in the prevalent adaptation methodologies like test-time batch normalization (BN) and self-learning.
First, we reveal that the normalization statistics in test-time BN are completely affected by the currently received test samples, resulting in inaccurate estimates.
Second, we show that during test-time adaptation, the parameter update is biased towards some dominant classes.
arXiv Detail & Related papers (2023-01-30T15:54:00Z)
- Agree to Disagree: Diversity through Disagreement for Better Transferability [54.308327969778155]
We propose D-BAT (Diversity-By-disAgreement Training), which enforces agreement among the models on the training data, but disagreement on out-of-distribution data.
We show how D-BAT naturally emerges from the notion of generalized discrepancy.
arXiv Detail & Related papers (2022-02-09T12:03:02Z)
- Unified Regularity Measures for Sample-wise Learning and Generalization [18.10522585996242]
We propose a pair of sample regularity measures for both processes with a formulation-consistent representation.
Experiments validated the effectiveness and robustness of the proposed approaches for mini-batch SGD optimization.
arXiv Detail & Related papers (2021-08-09T10:11:14Z)
- A Novel DNN Training Framework via Data Sampling and Multi-Task Optimization [7.001799696806368]
We propose a novel framework to train DNN models.
It generates multiple pairs of training and validation sets from the gross training set via random splitting.
It outputs the trained model with the best overall performance across the validation sets from all pairs.
arXiv Detail & Related papers (2020-07-02T10:58:57Z)
- Robust Sampling in Deep Learning [62.997667081978825]
Deep learning requires regularization mechanisms to reduce overfitting and improve generalization.
We address this problem with a new regularization method based on distributionally robust optimization.
During training, samples are selected according to their accuracy so that the worst-performing samples contribute the most to the optimization.
arXiv Detail & Related papers (2020-06-04T09:46:52Z)
- Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping [62.78338049381917]
Fine-tuning pretrained contextual word embedding models to supervised downstream tasks has become commonplace in natural language processing.
We experiment with four datasets from the GLUE benchmark, fine-tuning BERT hundreds of times on each while varying only the random seeds.
We find substantial performance increases compared to previously reported results, and we quantify how the performance of the best-found model varies as a function of the number of fine-tuning trials.
arXiv Detail & Related papers (2020-02-15T02:40:10Z)