Related papers: TestNUC: Enhancing Test-Time Computing Approaches through Neighboring Unlabeled Data Consistency

TestNUC: Enhancing Test-Time Computing Approaches through Neighboring Unlabeled Data Consistency

URL: http://arxiv.org/abs/2502.19163v1
Date: Wed, 26 Feb 2025 14:17:56 GMT
Title: TestNUC: Enhancing Test-Time Computing Approaches through Neighboring Unlabeled Data Consistency
Authors: Henry Peng Zou, Zhengyao Gu, Yue Zhou, Yankai Chen, Weizhi Zhang, Liancheng Fang, Yibo Wang, Yangning Li, Kay Liu, Philip S. Yu,
Abstract summary: TestNUC improves test-time predictions by leveraging the local consistency of neighboring unlabeled data.<n>TestNUC can be seamlessly integrated with existing test-time computing approaches.
Score: 42.81348222668079
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Test-time computing approaches, which leverage additional computational resources during inference, have been proven effective in enhancing large language model performance. This work introduces a novel, linearly scaling approach, TestNUC, that improves test-time predictions by leveraging the local consistency of neighboring unlabeled data-it classifies an input instance by considering not only the model's prediction on that instance but also on neighboring unlabeled instances. We evaluate TestNUC across eight diverse datasets, spanning intent classification, topic mining, domain discovery, and emotion detection, demonstrating its consistent superiority over baseline methods such as standard prompting and self-consistency. Furthermore, TestNUC can be seamlessly integrated with existing test-time computing approaches, substantially boosting their performance. Our analysis reveals that TestNUC scales effectively with increasing amounts of unlabeled data and performs robustly across different embedding models, making it practical for real-world applications. Our code is available at https://github.com/HenryPengZou/TestNUC.

Related papers

Sample, Don't Search: Rethinking Test-Time Alignment for Language Models [55.2480439325792]
We introduce QAlign, a new test-time alignment approach. As we scale test-time compute, QAlign converges to sampling from the optimal aligned distribution for each individual prompt. By adopting recent advances in Markov chain Monte Carlo for text generation, our method enables better-aligned outputs without modifying the underlying model or even requiring logit access.
arXiv Detail & Related papers (2025-04-04T00:41:40Z)
Shapley-Guided Utility Learning for Effective Graph Inference Data Valuation [6.542796128290513]
We propose Shapley-Guided Utility Learning (SGUL), a novel framework for graph inference data valuation. SGUL combines transferable data-specific and modelspecific features to approximate test accuracy without relying on ground truth labels. We show that SGUL consistently outperforms existing baselines in both inductive and transductive settings.
arXiv Detail & Related papers (2025-03-23T20:35:03Z)
CALICO: Confident Active Learning with Integrated Calibration [11.978551396144532]
We propose an AL framework that self-calibrates the confidence used for sample selection during the training process. We show improved classification performance compared to a softmax-based classifier with fewer labeled samples.
arXiv Detail & Related papers (2024-07-02T15:05:19Z)
Efficient Open Set Single Image Test Time Adaptation of Vision Language Models [15.621092104244003]
Adapting models to dynamic, real-world environments is a critical challenge in deep learning.<n>We propose ROSITA, a novel framework that leverages dynamically updated feature banks to identify reliable test samples.<n>Our approach effectively adapts models to domain shifts for known classes while rejecting unfamiliar samples.
arXiv Detail & Related papers (2024-06-01T16:21:42Z)
Test Case Recommendations with Distributed Representation of Code Syntactic Features [2.225268436173329]
We propose an automated approach which exploits both structural and semantic properties of source code methods and test cases. The proposed approach initially trains a neural network to transform method-level source code, as well as unit tests, into distributed representations. The model computes cosine similarity between the method's embedding and the previously-embedded training instances.
arXiv Detail & Related papers (2023-10-04T21:42:01Z)
pSTarC: Pseudo Source Guided Target Clustering for Fully Test-Time Adaptation [15.621092104244003]
Test Time Adaptation (TTA) is a pivotal concept in machine learning, enabling models to perform well in real-world scenarios. We propose a novel approach called pseudo Source guided Target Clustering (pSTarC) addressing the relatively unexplored area of TTA under real-world domain shifts.
arXiv Detail & Related papers (2023-09-02T07:13:47Z)
Provable Robustness for Streaming Models with a Sliding Window [51.85182389861261]
In deep learning applications such as online content recommendation and stock market analysis, models use historical data to make predictions. We derive robustness certificates for models that use a fixed-size sliding window over the input stream. Our guarantees hold for the average model performance across the entire stream and are independent of stream size, making them suitable for large data streams.
arXiv Detail & Related papers (2023-03-28T21:02:35Z)
Rethinking Precision of Pseudo Label: Test-Time Adaptation via Complementary Learning [10.396596055773012]
We propose a novel complementary learning approach to enhance test-time adaptation. In test-time adaptation tasks, information from the source domain is typically unavailable. We highlight that the risk function of complementary labels agrees with their Vanilla loss formula.
arXiv Detail & Related papers (2023-01-15T03:36:33Z)
TeST: Test-time Self-Training under Distribution Shift [99.68465267994783]
Test-Time Self-Training (TeST) is a technique that takes as input a model trained on some source data and a novel data distribution at test time. We find that models adapted using TeST significantly improve over baseline test-time adaptation algorithms.
arXiv Detail & Related papers (2022-09-23T07:47:33Z)
Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions. In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data. We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data. The idea is to estimate the metrics of interest for a model-under-test using Bayesian neural network (BNN)
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches, is to update the prototype of each class with the mean of the most confident query examples. We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries. We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.