Effective Benchmarks for Optical Turbulence Modeling
- URL: http://arxiv.org/abs/2401.03573v1
- Date: Sun, 7 Jan 2024 20:00:35 GMT
- Title: Effective Benchmarks for Optical Turbulence Modeling
- Authors: Christopher Jellen and Charles Nelson and Cody Brownell and John
Burkhardt
- Abstract summary: We introduce the \texttt{otbench} package, a Python package for rigorous development and evaluation of optical turbulence strength prediction models.
The package provides a consistent interface for evaluating optical turbulence models on a variety of benchmark tasks and data sets.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Optical turbulence presents a significant challenge for communication,
directed energy, and imaging systems, especially in the atmospheric boundary
layer. Effective modeling of optical turbulence strength is critical for the
development and deployment of these systems. The lack of standard evaluation
tools, especially long-term data sets, modeling tasks, metrics, and baseline
models, prevents effective comparisons between approaches and models. This
reduces the ease of reproducing results and contributes to over-fitting on
local micro-climates. Performance characterized using evaluation metrics
provides some insight into the applicability of a model for predicting the
strength of optical turbulence. However, these metrics are not sufficient for
understanding the relative quality of a model. We introduce the
\texttt{otbench} package, a Python package for rigorous development and
evaluation of optical turbulence strength prediction models. The package
provides a consistent interface for evaluating optical turbulence models on a
variety of benchmark tasks and data sets. The \texttt{otbench} package includes
a range of baseline models, including statistical, data-driven, and deep
learning models, to provide a sense of relative model quality. \texttt{otbench}
also provides support for adding new data sets, tasks, and evaluation metrics.
The package is available at \url{https://github.com/cdjellen/otbench}.
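The abstract describes a consistent interface for developing and evaluating optical turbulence strength (Cn2) prediction models against shared tasks, data sets, metrics, and baseline models. The sketch below illustrates the general shape of such a benchmark evaluation loop; it is a minimal illustration only and does not reproduce the actual \texttt{otbench} API. All names in it (PersistenceBaseline, evaluate, rmse_log10, and the synthetic Cn2 series) are assumptions introduced for this example.

```python
# Illustrative sketch of a benchmark-style evaluation for Cn2 prediction.
# This is NOT the otbench API; the class and function names are assumptions
# chosen to mirror the workflow described in the abstract.
import numpy as np


def rmse_log10(y_true, y_pred):
    """Root-mean-square error in log10 space, a natural choice for Cn2,
    which spans several orders of magnitude (roughly 1e-17 to 1e-13 m^(-2/3))."""
    return float(np.sqrt(np.mean((np.log10(y_true) - np.log10(y_pred)) ** 2)))


class PersistenceBaseline:
    """Trivial baseline: predict the last observed Cn2 value for every test point."""

    def fit(self, X_train, y_train):
        self.last_ = y_train[-1]
        return self

    def predict(self, X_test):
        return np.full(len(X_test), self.last_)


def evaluate(model, X_train, y_train, X_test, y_test, metrics):
    """Fit a model on the training split and score it on the held-out split."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return {name: fn(y_test, y_pred) for name, fn in metrics.items()}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a boundary-layer Cn2 time series (units: m^(-2/3)).
    cn2 = 10.0 ** rng.uniform(-16.0, -14.0, size=200)
    X = np.arange(len(cn2)).reshape(-1, 1)  # placeholder features
    split = 150
    scores = evaluate(
        PersistenceBaseline(),
        X[:split], cn2[:split], X[split:], cn2[split:],
        metrics={"rmse_log10": rmse_log10},
    )
    print(scores)
```

Running statistical, data-driven, and deep learning models through one evaluation path like this, alongside simple baselines, is what enables the relative-quality comparisons that the abstract argues raw metrics alone cannot provide.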
Related papers
- Explanatory Model Monitoring to Understand the Effects of Feature Shifts on Performance [61.06245197347139]
We propose a novel approach to explain the behavior of a black-box model under feature shifts.
We refer to our method that combines concepts from Optimal Transport and Shapley Values as Explanatory Performance Estimation.
arXiv Detail & Related papers (2024-08-24T18:28:19Z) - PerturBench: Benchmarking Machine Learning Models for Cellular Perturbation Analysis [14.526536510805755]
We present a comprehensive framework for predicting the effects of perturbations in single cells, designed to standardize benchmarking in this rapidly evolving field.
Our framework, PerturBench, includes a user-friendly platform, diverse datasets, metrics for fair model comparison, and detailed performance analysis.
arXiv Detail & Related papers (2024-08-20T07:40:20Z) - SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction.
SMILE allows for the upscaling of source models into an MoE model without extra data or further training.
We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z) - Meta-learning and Data Augmentation for Stress Testing Forecasting Models [0.33554367023486936]
A model is considered to be under stress if it shows a negative behaviour, such as higher-than-usual errors or increased uncertainty.
This paper contributes a novel framework called MAST (Meta-learning and data Augmentation for Stress Testing).
arXiv Detail & Related papers (2024-06-24T17:59:33Z) - Data-efficient Large Vision Models through Sequential Autoregression [58.26179273091461]
We develop an efficient, autoregression-based vision model on a limited dataset.
We demonstrate how this model achieves proficiency in a spectrum of visual tasks spanning both high-level and low-level semantic understanding.
Our empirical evaluations underscore the model's agility in adapting to various tasks, heralding a significant reduction in the parameter footprint.
arXiv Detail & Related papers (2024-02-07T13:41:53Z) - QualEval: Qualitative Evaluation for Model Improvement [82.73561470966658]
We propose QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement.
QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights.
We demonstrate that leveraging its insights improves the performance of the Llama 2 model by up to 15 percentage points.
arXiv Detail & Related papers (2023-11-06T00:21:44Z) - Robustness and Generalization Performance of Deep Learning Models on
Cyber-Physical Systems: A Comparative Study [71.84852429039881]
The investigation focuses on the models' ability to handle a range of perturbations, such as sensor faults and noise.
We test the generalization and transfer learning capabilities of these models by exposing them to out-of-distribution (OOD) samples.
arXiv Detail & Related papers (2023-06-13T12:43:59Z) - Studying How to Efficiently and Effectively Guide Models with Explanations [52.498055901649025]
'Model guidance' is the idea of regularizing the models' explanations to ensure that they are "right for the right reasons".
We conduct an in-depth evaluation across various loss functions, attribution methods, models, and 'guidance depths' on the PASCAL VOC 2007 and MS COCO 2014 datasets.
Specifically, we guide the models via bounding box annotations, which are much cheaper to obtain than the commonly used segmentation masks.
arXiv Detail & Related papers (2023-03-21T15:34:50Z) - How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating
and Auditing Generative Models [95.8037674226622]
We introduce a 3-dimensional evaluation metric that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion.
Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity.
arXiv Detail & Related papers (2021-02-17T18:25:30Z)