Related papers: Does Weak-to-strong Generalization Happen under Spurious Correlations?

URL: http://arxiv.org/abs/2509.24005v1
Date: Sun, 28 Sep 2025 17:57:49 GMT
Title: Does Weak-to-strong Generalization Happen under Spurious Correlations?
Authors: Chenruo Liu, Yijun Dong, Qi Lei
Abstract summary: A key problem in weak-to-strong (W2S) generalization: when fine-tuning a strong pre-trained student with pseudolabels from a weaker teacher on a downstream task with spurious correlations, does W2S happen, and how can it be improved when it fails? We consider two sources of spurious correlations caused by group imbalance: (i) a weak teacher fine-tuned on group-imbalanced labeled data with a minority group of fraction $\eta_\ell$, and (ii) a group-imbalanced unlabeled set pseudolabeled by the teacher with a minority group of fraction $\eta_u$.
Score: 17.02943058643617
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We initiate a unified theoretical and algorithmic study of a key problem in weak-to-strong (W2S) generalization: when fine-tuning a strong pre-trained student with pseudolabels from a weaker teacher on a downstream task with spurious correlations, does W2S happen, and how to improve it upon failures? We consider two sources of spurious correlations caused by group imbalance: (i) a weak teacher fine-tuned on group-imbalanced labeled data with a minority group of fraction $\eta_\ell$, and (ii) a group-imbalanced unlabeled set pseudolabeled by the teacher with a minority group of fraction $\eta_u$. Theoretically, a precise characterization of W2S gain at the proportional asymptotic limit shows that W2S always happens with sufficient pseudolabels when $\eta_u = \eta_\ell$ but may fail when $\eta_u \ne \eta_\ell$, where W2S gain diminishes as $(\eta_u - \eta_\ell)^2$ increases. Our theory is corroborated by extensive experiments on various spurious correlation benchmarks and teacher-student pairs. To boost W2S performance upon failures, we further propose a simple, effective algorithmic remedy that retrains the strong student on its high-confidence data subset after W2S fine-tuning. Our algorithm is group-label-free and achieves consistent, substantial improvements over vanilla W2S fine-tuning.
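
The remedy described at the end of the abstract (retraining the strong student on its own high-confidence subset after W2S fine-tuning) can be illustrated with a minimal selection sketch. The abstract does not say how confidence is measured, how large the subset is, or which labels are used for retraining, so everything below is an assumption made for illustration: the hypothetical helper `select_high_confidence`, the use of the maximum class probability as the confidence score, the `keep_fraction` parameter, and the choice to return the student's own predicted labels. Note that the selection uses no group labels, in line with the group-label-free claim.

```python
from typing import List, Sequence, Tuple


def select_high_confidence(
    probs: Sequence[Sequence[float]],
    keep_fraction: float = 0.5,
) -> Tuple[List[int], List[int]]:
    """Return (indices, predicted_labels) for the most confident samples.

    probs[i][k] is the student's predicted probability of class k for
    sample i after W2S fine-tuning. Confidence is taken to be the maximum
    class probability; this is an assumption, not the paper's definition.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    confidences = [max(p) for p in probs]
    n_keep = max(1, round(keep_fraction * len(probs)))
    # Rank sample indices by confidence, most confident first.
    # sorted() is stable, so ties keep their input order.
    ranked = sorted(range(len(probs)), key=lambda i: -confidences[i])
    kept = sorted(ranked[:n_keep])
    # Assumed retraining target: the student's own argmax prediction.
    labels = [max(range(len(probs[i])), key=lambda k: probs[i][k]) for i in kept]
    return kept, labels


if __name__ == "__main__":
    student_probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.5, 0.5]]
    indices, labels = select_high_confidence(student_probs, keep_fraction=0.5)
    print(indices, labels)
```

The returned indices and labels would then feed a second fine-tuning pass of the student; that training loop is omitted because it depends on the model and framework in use.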

Related papers

Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers [90.50039419576807]
Reinforcement Learning with Verifiable Rewards (RLVR) trains policies against automated verifiers to avoid costly human labeling. To reduce vulnerability to verifier hacking, many RLVR systems collapse rewards to binary $\{0,1\}$ during training. This choice carries a cost: it introduces false negatives (FNs, rejecting correct answers) and false positives (FPs, accepting incorrect ones).
arXiv Detail & Related papers (2025-10-01T13:56:44Z)
Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension [48.431551146556714]
Weak-to-strong (W2S) generalization is a type of finetuning (FT) where a strong student model is trained on pseudo-labels generated by a weak teacher. We analyze W2S in the ridgeless regression setting from a variance reduction perspective.
arXiv Detail & Related papers (2025-02-07T16:46:43Z)
Provable Weak-to-Strong Generalization via Benign Overfitting [3.4652800888823294]
We consider the inverted situation, where a weak teacher supervises a strong student with imperfect pseudolabels. We theoretically investigate weak-to-strong generalization for binary and multilabel classification. Our techniques should eventually extend to weak-to-strong multiclass classification.
arXiv Detail & Related papers (2024-10-06T22:10:50Z)
On Characterizing and Mitigating Imbalances in Multi-Instance Partial Label Learning [57.18649648182171]
We make contributions towards addressing a problem that hasn't been studied so far in the context of MI-PLL. We derive class-specific risk bounds for MI-PLL, while making minimal assumptions. Our theory reveals a unique phenomenon: that $\sigma$ can greatly impact learning imbalances.
arXiv Detail & Related papers (2024-07-13T20:56:34Z)
Sharp Rates in Dependent Learning Theory: Avoiding Sample Size Deflation for the Square Loss [33.18537822803389]
We show that whenever the topologies of $L^2$ and $\Psi_p$ are comparable on our hypothesis class $\mathscr{F}$, $\mathscr{F}$ is a weakly sub-Gaussian class. Our result holds whether the problem is realizable or not, and we refer to this as a near mixing-free rate, since direct dependence on mixing is relegated to an additive higher-order term.
arXiv Detail & Related papers (2024-02-08T18:57:42Z)
Out-Of-Domain Unlabeled Data Improves Generalization [0.7589678255312519]
We propose a novel framework for incorporating unlabeled data into semi-supervised classification problems. We show that unlabeled samples can be harnessed to narrow the generalization gap. We validate our claims through experiments conducted on a variety of synthetic and real-world datasets.
arXiv Detail & Related papers (2023-09-29T02:00:03Z)
WR-ONE2SET: Towards Well-Calibrated Keyphrase Generation [57.11538133231843]
Keyphrase generation aims to automatically generate short phrases summarizing an input document. The recently emerged ONE2SET paradigm generates keyphrases as a set and has achieved competitive performance. We propose WR-ONE2SET which extends ONE2SET with an adaptive instance-level cost Weighting strategy and a target Re-assignment mechanism.
arXiv Detail & Related papers (2022-11-13T09:56:24Z)
Characterizing Datapoints via Second-Split Forgetting [93.99363547536392]
We propose second-split forgetting time (SSFT), a complementary metric that tracks the epoch (if any) after which an original training example is forgotten. We demonstrate that mislabeled examples are forgotten quickly, and seemingly rare examples are forgotten comparatively slowly. SSFT can (i) help to identify mislabeled samples, the removal of which improves generalization; and (ii) provide insights about failure modes.
arXiv Detail & Related papers (2022-10-26T21:03:46Z)
Cycle Self-Training for Semi-Supervised Object Detection with Distribution Consistency Reweighting [16.63549313217797]
We propose a Cycle Self-Training (CST) framework for semi-supervised object detection (SSOD). CST consists of two teachers, T1 and T2, and two students, S1 and S2; a cycle self-training mechanism is built on these networks. Experiments demonstrate the superiority of CST, which consistently improves AP over the baseline and outperforms state-of-the-art methods by 2.1% absolute AP with scarce labeled data.
arXiv Detail & Related papers (2022-07-12T06:16:48Z)
Rebalanced Siamese Contrastive Mining for Long-Tailed Recognition [120.80038161330623]
We show that supervised contrastive learning suffers a dual class-imbalance problem at both the original batch and Siamese batch levels. We propose supervised hard positive and negative pairs mining to pick up informative pairs for contrastive computation and improve representation learning.
arXiv Detail & Related papers (2022-03-22T07:30:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.