SynBench: Task-Agnostic Benchmarking of Pretrained Representations using
Synthetic Data
- URL: http://arxiv.org/abs/2210.02989v2
- Date: Fri, 7 Oct 2022 04:07:50 GMT
- Title: SynBench: Task-Agnostic Benchmarking of Pretrained Representations using
Synthetic Data
- Authors: Ching-Yun Ko, Pin-Yu Chen, Jeet Mohapatra, Payel Das, Luca Daniel
- Abstract summary: Recent success in fine-tuning large models, which are pretrained on broad data at scale, on downstream tasks has led to a significant paradigm shift in deep learning.
This paper proposes a new task-agnostic framework, SynBench, to measure the quality of pretrained representations using synthetic data.
- Score: 78.21197488065177
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent success in fine-tuning large models, which are pretrained on broad
data at scale, on downstream tasks has led to a significant paradigm shift in deep
learning, from task-centric model design to task-agnostic representation
learning and task-specific fine-tuning. As the representations of pretrained
models are used as a foundation for different downstream tasks, this paper
proposes a new task-agnostic framework, \textit{SynBench}, to measure the
quality of pretrained representations using synthetic data. We establish a
reference using the theoretically derived robustness-accuracy tradeoff of a
class-conditional Gaussian mixture. Given a pretrained model, the representations
of data synthesized from the Gaussian mixture are compared against this
reference to infer the quality. By comparing the area-under-curve ratio
between the raw data and their representations, SynBench offers a quantifiable
score for robustness-accuracy performance benchmarking. Our framework applies
to a wide range of pretrained models taking continuous data inputs and is
independent of the downstream tasks and datasets. Evaluated with several
pretrained vision transformer models, the experimental results show that our
SynBench score agrees well with the actual linear probing performance of the
pretrained model when fine-tuned on downstream tasks. Moreover, our framework
can be used to inform the design of robust linear probing on pretrained
representations to mitigate the robustness-accuracy tradeoff in downstream
tasks.
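To make the scoring procedure concrete, the following is a minimal, illustrative sketch of a SynBench-style computation, not the authors' implementation. Assumptions: two classes drawn from N(+mu, I) and N(-mu, I); the idealized reference robust accuracy at l2 budget eps is taken to be Phi(||mu|| - eps), the closed form for this isotropic mixture; and a random nonlinear map stands in for a real pretrained encoder. All function and variable names are illustrative.

```python
"""Minimal sketch of a SynBench-style robustness-accuracy score (illustrative only)."""
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# --- synthesize class-conditional Gaussian mixture data ---
d, n = 64, 4000
mu = np.full(d, 0.3)                      # class mean; ||mu|| controls task difficulty
y = rng.choice([-1.0, 1.0], size=n)
x = y[:, None] * mu + rng.standard_normal((n, d))

# --- stand-in "pretrained encoder" (replace with a real model's features) ---
W = rng.standard_normal((d, 32)) / np.sqrt(d)
z = np.tanh(x @ W)

def robust_accuracy_curve(feats, labels, eps_grid):
    """Empirical eps-robust accuracy of a mean-difference linear probe."""
    mu_pos = feats[labels > 0].mean(axis=0)
    mu_neg = feats[labels < 0].mean(axis=0)
    w = mu_pos - mu_neg
    b = -0.5 * (mu_pos + mu_neg) @ w
    margins = labels * (feats @ w + b) / np.linalg.norm(w)
    # a point stays correctly classified under any l2 perturbation of size eps iff margin >= eps
    return np.array([(margins >= e).mean() for e in eps_grid])

eps_grid = np.linspace(0.0, 3.0, 61)

# reference: idealized robust accuracy on the raw Gaussian mixture
ref_curve = norm.cdf(np.linalg.norm(mu) - eps_grid)
rep_curve = robust_accuracy_curve(z, y, eps_grid)

# SynBench-style score: ratio of areas under the robustness-accuracy curves
score = np.trapz(rep_curve, eps_grid) / np.trapz(ref_curve, eps_grid)
print(f"area ratio (representation vs. reference): {score:.3f}")
```

In practice, z would be the representations produced by the pretrained model under evaluation, and the epsilon grid and probing protocol would follow the paper.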
Related papers
- ImageNet-RIB Benchmark: Large Pre-Training Datasets Don't Guarantee Robustness after Fine-Tuning [30.422932548359952]
We introduce a new robust fine-tuning benchmark, ImageNet-RIB (Robustness Inheritance Benchmark)
The benchmark consists of related but distinct specialized (downstream) tasks.
We find that the continual learning methods EWC and LwF maintain robustness after fine-tuning.
arXiv Detail & Related papers (2024-10-28T22:33:22Z)
- DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation [57.11544252399801]
We propose DaWin, a training-free dynamic weight interpolation method that leverages the entropy of individual models over each unlabeled test sample (see the sketch after this entry).
Unlike previous works that typically rely on additional training to learn such coefficients, our approach requires no training.
Results demonstrate that DaWin achieves significant performance gains in the considered settings, with minimal computational overhead.
arXiv Detail & Related papers (2024-10-03T16:25:35Z)
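As a rough illustration of the entropy-based weighting idea in the entry above, the sketch below mixes two models' predicted distributions per test sample, giving more weight to the lower-entropy (more confident) model. This is a simplification that operates on outputs; DaWin itself interpolates model weights, and its exact coefficient rule may differ. All names here are illustrative.

```python
"""Illustrative entropy-based per-sample mixing of two models' predictions."""
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def entropy(p, axis=-1, eps=1e-12):
    return -(p * np.log(p + eps)).sum(axis=axis)

def dynamic_combine(logits_zeroshot, logits_finetuned):
    """Per-sample mixing: the lower-entropy (more confident) model gets more weight."""
    p0, p1 = softmax(logits_zeroshot), softmax(logits_finetuned)
    h0, h1 = entropy(p0), entropy(p1)
    lam = h0 / (h0 + h1 + 1e-12)          # weight on the fine-tuned model
    return (1 - lam)[:, None] * p0 + lam[:, None] * p1

# toy usage with random logits for 5 samples and 3 classes
rng = np.random.default_rng(0)
mixed = dynamic_combine(rng.normal(size=(5, 3)), rng.normal(size=(5, 3)))
print(mixed.argmax(axis=1))
```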
- Learning Defect Prediction from Unrealistic Data [57.53586547895278]
Pretrained models of code have become popular choices for code understanding and generation tasks.
Such models tend to be large and require commensurate volumes of training data.
It has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs.
Models trained on such data tend to only perform well on similar data, while underperforming on real world programs.
arXiv Detail & Related papers (2023-11-02T01:51:43Z)
- Feedback-guided Data Synthesis for Imbalanced Classification [10.836265321046561]
We introduce a framework for augmenting static datasets with useful synthetic samples.
We find that the samples must be close to the support of the real data of the task at hand, and be sufficiently diverse.
On ImageNet-LT, we achieve state-of-the-art results, with over 4 percent improvement on underrepresented classes.
arXiv Detail & Related papers (2023-09-29T21:47:57Z)
- Too Fine or Too Coarse? The Goldilocks Composition of Data Complexity for Robust Left-Right Eye-Tracking Classifiers [0.0]
We train machine learning models utilizing a mixed dataset composed of both fine- and coarse-grain data.
For our purposes, finer-grain data refers to data collected using more complex methods, whereas coarser-grain data refers to data collected using simpler methods.
arXiv Detail & Related papers (2022-08-24T23:18:08Z)
- Task2Sim: Towards Effective Pre-training and Transfer from Synthetic Data [74.66568380558172]
We study the transferability of pre-trained models based on synthetic data generated by graphics simulators to downstream tasks.
We introduce Task2Sim, a unified model mapping downstream task representations to optimal simulation parameters.
It learns this mapping by training to find the best simulation parameters for a set of "seen" tasks.
Once trained, it can then be used to predict best simulation parameters for novel "unseen" tasks in one shot.
arXiv Detail & Related papers (2021-11-30T19:25:27Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models (a standard item-response formulation is sketched after this entry).
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
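For context on the entry above, one common item-response formulation (not necessarily the exact variant used in that paper) is the two-parameter logistic model, which scores each (model, example) pair as

\[
P(\text{model } i \text{ answers example } j \text{ correctly}) = \frac{1}{1 + e^{-a_j(\theta_i - b_j)}},
\]

where \theta_i is the latent ability of model i, b_j the difficulty of example j, and a_j its discrimination; examples with high discrimination are the ones that best separate strong models from weak ones.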
- Deep Ensembles for Low-Data Transfer Learning [21.578470914935938]
We study different ways of creating ensembles from pre-trained models.
We show that the nature of pre-training itself is an effective source of diversity.
We propose a practical algorithm that efficiently identifies a subset of pre-trained models for any downstream dataset.
arXiv Detail & Related papers (2020-10-14T07:59:00Z)
- Do Adversarially Robust ImageNet Models Transfer Better? [102.09335596483695]
Adversarially robust models often perform better than their standard-trained counterparts when used for transfer learning.
Our results are consistent with (and in fact, add to) recent hypotheses stating that robustness leads to improved feature representations.
arXiv Detail & Related papers (2020-07-16T17:42:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.