A Theory of Dynamic Benchmarks
- URL: http://arxiv.org/abs/2210.03165v1
- Date: Thu, 6 Oct 2022 18:56:46 GMT
- Title: A Theory of Dynamic Benchmarks
- Authors: Ali Shirali, Rediet Abebe, Moritz Hardt
- Abstract summary: We study the benefits and practical limitations of dynamic benchmarking.
These results provide a theoretical foundation and a causal explanation for observed bottlenecks in empirical work.
- Score: 24.170405353348592
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dynamic benchmarks interweave model fitting and data collection in an attempt
to mitigate the limitations of static benchmarks. In contrast to an extensive
theoretical and empirical study of the static setting, the dynamic counterpart
lags behind due to limited empirical studies and no apparent theoretical
foundation to date. Responding to this deficit, we initiate a theoretical study
of dynamic benchmarking. We examine two realizations, one capturing current
practice and the other modeling more complex settings. In the first model,
where data collection and model fitting alternate sequentially, we prove that
model performance improves initially but can stall after only three rounds.
Label noise arising from, for instance, annotator disagreement leads to even
stronger negative results. Our second model generalizes the first to the case
where data collection and model fitting have a hierarchical dependency
structure. We show that this design guarantees strictly more progress than the
first, albeit at a significant increase in complexity. We support our
theoretical analysis by simulating dynamic benchmarks on two popular datasets.
These results illuminate the benefits and practical limitations of dynamic
benchmarking, providing both a theoretical foundation and a causal explanation
for observed bottlenecks in empirical work.
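The sequential realization described in the abstract, where data collection and model fitting alternate round by round, can be illustrated with a toy simulation. The sketch below is an assumption-laden illustration, not the paper's actual construction: it uses a 1-D threshold task with noiseless labels, and all names and parameters are hypothetical.

```python
import random

# Toy sketch of sequential dynamic benchmarking (illustrative only, not the
# paper's model). Ground truth is a 1-D threshold function; the "model" is a
# threshold fit to minimize training error on all data collected so far.
# Each round, annotators add examples the current model misclassifies.

random.seed(0)
TRUE_T = 0.6  # ground-truth decision threshold (hypothetical)

def label(x):
    return int(x >= TRUE_T)

def fit_threshold(data):
    """Pick the candidate threshold with the lowest training error."""
    candidates = [0.0] + sorted(x for x, _ in data)
    best_t, best_err = 0.0, float("inf")
    for t in candidates:
        err = sum((x >= t) != bool(y) for x, y in data)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def accuracy(t, points):
    return sum((x >= t) == bool(label(x)) for x in points) / len(points)

grid = [i / 1000 for i in range(1000)]
data = [(x, label(x)) for x in random.sample(grid, 5)]  # initial static round
model_t = fit_threshold(data)

for rnd in range(1, 6):
    # Adversarial data collection: sample points the current model gets wrong.
    hard = [x for x in grid if (x >= model_t) != bool(label(x))]
    if hard:
        data += [(x, label(x)) for x in random.sample(hard, min(5, len(hard)))]
    model_t = fit_threshold(data)
    print(f"round {rnd}: threshold={model_t:.3f}, "
          f"accuracy={accuracy(model_t, grid):.3f}")
```

In this noiseless toy, accuracy improves in early rounds and the loop stalls once no misclassified examples remain; the paper's stalling result concerns the subtler setting where progress halts even though the model is still imperfect, e.g. under label noise.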
Related papers
- Is Smoothness the Key to Robustness? A Comparison of Attention and Convolution Models Using a Novel Metric [0.0]
Existing robustness evaluation approaches often lack theoretical generality or rely heavily on empirical assessments.
We propose TopoLip, a metric based on layer-wise analysis that bridges topological data analysis and Lipschitz continuity for robustness evaluation.
arXiv Detail & Related papers (2024-10-23T07:44:14Z)
- Neural Persistence Dynamics [8.197801260302642]
We consider the problem of learning the dynamics in the topology of time-evolving point clouds.
Our proposed model, *Neural Persistence Dynamics*, substantially outperforms the state-of-the-art across a diverse set of parameter regression tasks.
arXiv Detail & Related papers (2024-05-24T17:20:18Z)
- When predict can also explain: few-shot prediction to select better neural latents [3.6218162133579703]
We present a novel prediction metric designed to yield latent variables that more accurately reflect the ground truth.
In the absence of ground truth, we suggest a proxy measure to quantify extraneous dynamics.
arXiv Detail & Related papers (2024-05-23T10:48:30Z)
- Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction [75.25114727856861]
Large language models (LLMs) tend to suffer from performance deterioration in the later stages of the supervised fine-tuning process.
We introduce a simple disperse-then-merge framework to address the issue.
Our framework outperforms various sophisticated methods such as data curation and training regularization on a series of standard knowledge and reasoning benchmarks.
arXiv Detail & Related papers (2024-05-22T08:18:19Z)
- Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study [61.64685376882383]
Counterfactual learning to rank (CLTR) has attracted extensive attention in the IR community for its ability to leverage massive logged user interaction data to train ranking models.
This paper investigates the robustness of existing CLTR models in complex and diverse situations.
We find that the DLA models and IPS-DCM show better robustness under various simulation settings than IPS-PBM and PRS with offline propensity estimation.
arXiv Detail & Related papers (2024-04-04T10:54:38Z)
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
- Towards Causal Foundation Model: on Duality between Causal Inference and Attention [18.046388712804042]
We take a first step towards building causally-aware foundation models for treatment effect estimations.
We propose a novel, theoretically justified method called Causal Inference with Attention (CInA).
arXiv Detail & Related papers (2023-10-01T22:28:34Z)
- When to Update Your Model: Constrained Model-based Reinforcement Learning [50.74369835934703]
We propose a novel and general theoretical scheme for a non-decreasing performance guarantee of model-based RL (MBRL).
Our follow-up derived bounds reveal the relationship between model shifts and performance improvement.
A further example demonstrates that learning models from a dynamically-varying number of explorations benefits the eventual returns.
arXiv Detail & Related papers (2022-10-15T17:57:43Z)
- Causal Dynamics Learning for Task-Independent State Abstraction [61.707048209272884]
We introduce Causal Dynamics Learning for Task-Independent State Abstraction (CDL)
CDL learns a provably causal dynamics model that removes unnecessary dependencies between state variables and the action.
A state abstraction can then be derived from the learned dynamics.
arXiv Detail & Related papers (2022-06-27T17:02:53Z)
- Counterfactual Analysis in Dynamic Latent State Models [2.766648389933265]
We provide an optimization-based framework to perform counterfactual analysis in a dynamic model with hidden states.
We are the first to compute lower and upper bounds on a counterfactual query in a dynamic latent-state model.
arXiv Detail & Related papers (2022-05-27T08:51:07Z)
- Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization [60.73540999409032]
We show that expressive autoregressive dynamics models generate the dimensions of the next state and reward sequentially, each conditioned on the previously generated dimensions.
We also show that autoregressive dynamics models are useful for offline policy optimization by serving as a way to enrich the replay buffer.
arXiv Detail & Related papers (2021-04-28T16:48:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.