Related papers: Employing Continuous Integration inspired workflows for benchmarking of scientific software -- a use case on numerical cut cell quadrature

URL: http://arxiv.org/abs/2503.17192v3
Date: Wed, 21 May 2025 08:45:21 GMT
Title: Employing Continuous Integration inspired workflows for benchmarking of scientific software -- a use case on numerical cut cell quadrature
Authors: Teoman Toprak, Michael Loibl, Guilherme H. Teixeira, Irina Shiskina, Chen Miao, Josef Kiendl, Benjamin Marussig, Florian Kummer
Abstract summary: This paper presents a proven approach that utilizes established Continuous Integration tools and practices to achieve high automation of benchmark execution and reporting. Our use case is the numerical integration (quadrature) on arbitrary domains, which are bounded by implicitly or parametrically defined curves or surfaces in 2D or 3D.
Score: 0.3387808070669509
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: In the field of scientific computing, one often finds several alternative software packages (with open or closed source code) for solving a specific problem. These packages sometimes even use alternative methodological approaches, e.g., different numerical discretizations. If one decides to use one of these packages, it is often not clear which one is the best choice. To make an informed decision, it is necessary to measure the performance of the alternative software packages for a suitable set of test problems, i.e. to set up a benchmark. However, setting up benchmarks ad-hoc can become overwhelming as the parameter space expands rapidly. Very often, the design of the benchmark is also not fully set at the start of some project. For instance, adding new libraries, adapting metrics, or introducing new benchmark cases during the project can significantly increase complexity and necessitate laborious re-evaluation of previous results. This paper presents a proven approach that utilizes established Continuous Integration tools and practices to achieve high automation of benchmark execution and reporting. Our use case is the numerical integration (quadrature) on arbitrary domains, which are bounded by implicitly or parametrically defined curves or surfaces in 2D or 3D.
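The abstract's point about a rapidly expanding parameter space can be illustrated with a minimal sketch. This is not the paper's tooling or any of the benchmarked libraries: the function names (`midpoint_area`, `run_benchmark`), the test cases, and the resolutions are all hypothetical, and a naive point-counting midpoint rule stands in for a real cut cell quadrature method. The sketch only shows how a fixed grid of cases and settings can be swept automatically, so that adding a case or metric reruns everything consistently.

```python
import itertools
import math
import time


def midpoint_area(level_set, n):
    """Approximate the area of {phi < 0} inside [-1, 1]^2.

    Uses an n x n grid of cell midpoints and counts the cells whose
    midpoint lies inside the implicitly defined domain (a deliberately
    naive stand-in for a cut cell quadrature rule).
    """
    h = 2.0 / n
    inside = 0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h
        for j in range(n):
            y = -1.0 + (j + 0.5) * h
            if level_set(x, y) < 0.0:
                inside += 1
    return inside * h * h


def circle(radius):
    """Level-set function of a circle centred at the origin."""
    return lambda x, y: x * x + y * y - radius * radius


# Hypothetical benchmark definition: test cases with known exact areas,
# crossed with a list of grid resolutions.
CASES = {
    "circle_r0.5": (circle(0.5), math.pi * 0.5 ** 2),
    "circle_r0.8": (circle(0.8), math.pi * 0.8 ** 2),
}
RESOLUTIONS = [8, 32, 128]


def run_benchmark():
    """Run every (case, resolution) combination and collect metrics."""
    rows = []
    for (name, (phi, exact)), n in itertools.product(CASES.items(), RESOLUTIONS):
        start = time.perf_counter()
        approx = midpoint_area(phi, n)
        elapsed = time.perf_counter() - start
        rows.append({
            "case": name,
            "n": n,
            "rel_error": abs(approx - exact) / exact,
            "seconds": elapsed,
        })
    return rows


if __name__ == "__main__":
    for row in run_benchmark():
        print(f"{row['case']:>12}  n={row['n']:<4} "
              f"rel_error={row['rel_error']:.3e}  t={row['seconds']:.4f}s")
```

In a Continuous Integration setting, a script of this kind would typically be invoked by a pipeline job on each change, with the collected rows written to a report; that wiring is left out here because it depends on the CI system in use.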

Related papers

Guided Search Strategies in Non-Serializable Environments with Applications to Software Engineering Agents [31.651748374218446]
Large language models (LLMs) have recently achieved remarkable results in complex multi-step tasks. They often struggle to maintain consistent performance across multiple solution attempts.
arXiv Detail & Related papers (2025-05-19T18:50:15Z)
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models [55.2480439325792]
We introduce QAlign, a new test-time alignment approach. As we scale test-time compute, QAlign converges to sampling from the optimal aligned distribution for each individual prompt. By adopting recent advances in Markov chain Monte Carlo for text generation, our method enables better-aligned outputs without modifying the underlying model or even requiring logit access.
arXiv Detail & Related papers (2025-04-04T00:41:40Z)
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute (TTC) scaling framework that leverages increased inference-time computation instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion? [60.84912551069379]
We present the Code-Development Benchmark (Codev-Bench), a fine-grained, real-world, repository-level, and developer-centric evaluation framework. Codev-Agent is an agent-based system that automates repository crawling, constructs execution environments, extracts dynamic calling chains from existing unit tests, and generates new test samples to avoid data leakage.
arXiv Detail & Related papers (2024-10-02T09:11:10Z)
Benchmarking Predictive Coding Networks -- Made Simple [48.652114040426625]
We tackle the problems of efficiency and scalability for predictive coding networks (PCNs) in machine learning. We propose a library, called PCX, that focuses on performance and simplicity, and use it to implement a large set of standard benchmarks. We perform extensive tests on such benchmarks using both existing algorithms for PCNs, as well as adaptations of other methods popular in the bio-plausible deep learning community.
arXiv Detail & Related papers (2024-07-01T10:33:44Z)
Automatic benchmarking of large multimodal models via iterative experiment programming [71.78089106671581]
We present APEx, the first framework for automatic benchmarking of LMMs. Given a research question expressed in natural language, APEx leverages a large language model (LLM) and a library of pre-specified tools to generate a set of experiments for the model at hand. The report drives the testing procedure: based on the current status of the investigation, APEx chooses which experiments to perform and whether the results are sufficient to draw conclusions.
arXiv Detail & Related papers (2024-06-18T06:43:46Z)
ComplexityMeasures.jl: scalable software to unify and accelerate entropy and complexity timeseries analysis [0.0]
ComplexityMeasures.jl is an easily extendable and highly performant open-source software that implements a vast selection of complexity measures. The software provides 1,638 measures with 3,841 lines of source code, averaging only 2.3 lines of code per exported quantity.
arXiv Detail & Related papers (2024-06-07T15:22:45Z)
LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit [55.73370804397226]
Quantization, a key compression technique, can effectively mitigate these demands by compressing and accelerating large language models. We present LLMC, a plug-and-play compression toolkit, to fairly and systematically explore the impact of quantization. Powered by this versatile toolkit, our benchmark covers three key aspects: calibration data, algorithms (three strategies), and data formats.
arXiv Detail & Related papers (2024-05-09T11:49:05Z)
High-dimensional mixed-categorical Gaussian processes with application to multidisciplinary design optimization for a green aircraft [0.6749750044497732]
This paper introduces an innovative dimension reduction algorithm that relies on partial least squares regression. Our goal is to generalize classical dimension reduction techniques to handle mixed-categorical inputs. The good potential of the proposed method is demonstrated in both structural and multidisciplinary application contexts.
arXiv Detail & Related papers (2023-11-10T15:48:51Z)
FuzzyFlow: Leveraging Dataflow To Find and Squash Program Optimization Bugs [92.47146416628965]
FuzzyFlow is a fault localization and test case extraction framework designed to test program optimizations. We leverage dataflow program representations to capture a fully reproducible system state and area-of-effect for optimizations. To reduce testing time, we design an algorithm for minimizing test inputs, trading off memory for recomputation.
arXiv Detail & Related papers (2023-06-28T13:00:17Z)
Efficiently Controlling Multiple Risks with Pareto Testing [34.83506056862348]
We propose a two-stage process which combines multi-objective optimization with multiple hypothesis testing. We demonstrate the effectiveness of our approach to reliably accelerate the execution of large-scale Transformer models in natural language processing (NLP) applications.
arXiv Detail & Related papers (2022-10-14T15:54:39Z)
PDEBENCH: An Extensive Benchmark for Scientific Machine Learning [20.036987098901644]
We introduce PDEBench, a benchmark suite of time-dependent simulation tasks based on Partial Differential Equations (PDEs). PDEBench comprises both code and data to benchmark the performance of novel machine learning models against both classical numerical simulations and machine learning baselines.
arXiv Detail & Related papers (2022-10-13T17:03:36Z)
FaDIn: Fast Discretized Inference for Hawkes Processes with General Parametric Kernels [82.53569355337586]
This work offers an efficient solution to temporal point process inference using general parametric kernels with finite support. The method's effectiveness is evaluated by modeling the occurrence of stimuli-induced patterns from brain signals recorded with magnetoencephalography (MEG). Results show that the proposed approach yields a better estimation of pattern latency than the state of the art.
arXiv Detail & Related papers (2022-10-10T12:35:02Z)
Theseus: A Library for Differentiable Nonlinear Optimization [21.993680737841476]
Theseus is an efficient application-agnostic library for differentiable nonlinear least squares (DNLS) optimization built on PyTorch. Theseus provides a common framework for end-to-end structured learning in robotics and vision.
arXiv Detail & Related papers (2022-07-19T17:57:40Z)
HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models. We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
Geometric Optimisation on Manifolds with Applications to Deep Learning [6.85316573653194]
We design and implement a Python library to help non-experts use all these powerful tools. The algorithms implemented in this library have been designed with usability and GPU efficiency in mind.
arXiv Detail & Related papers (2022-03-09T15:20:07Z)
MQBench: Towards Reproducible and Deployable Model Quantization Benchmark [53.12623958951738]
MQBench is a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability of model quantization algorithms. We choose multiple platforms for real-world deployments, including CPU, GPU, ASIC, and DSP, and evaluate extensive state-of-the-art quantization algorithms. We conduct a comprehensive analysis and find considerable intuitive or counter-intuitive insights.
arXiv Detail & Related papers (2021-11-05T23:38:44Z)
Finding Geometric Models by Clustering in the Consensus Space [61.65661010039768]
We propose a new algorithm for finding an unknown number of geometric models, e.g., homographies. We present a number of applications where the use of multiple geometric models improves accuracy. These include pose estimation from multiple generalized homographies and trajectory estimation of fast-moving objects.
arXiv Detail & Related papers (2021-03-25T14:35:07Z)
Adaptive Local Bayesian Optimization Over Multiple Discrete Variables [9.860437640748113]
This paper describes the approach of team KAIST OSI in a step-wise manner, which outperforms the baseline algorithms by up to +20.39%. In a similar vein, we combine Bayesian and multi-armed bandit (MAB) approaches to select values with consideration of the variable types. Empirical evaluations demonstrate that our method outperforms the existing methods across different tasks.
arXiv Detail & Related papers (2020-12-07T07:51:23Z)
Information-Theoretic Multi-Objective Bayesian Optimization with Continuous Approximations [44.25245545568633]
We propose Information-Theoretic Multi-Objective Bayesian Optimization with Continuous Approximations (iMOCA) to solve this problem. Our experiments on diverse synthetic and real-world benchmarks show that iMOCA significantly improves over existing single-fidelity methods.
arXiv Detail & Related papers (2020-09-12T01:46:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.