A portfolio-based analysis method for competition results
- URL: http://arxiv.org/abs/2205.15414v1
- Date: Mon, 30 May 2022 20:20:45 GMT
- Title: A portfolio-based analysis method for competition results
- Authors: Nguyen Dang
- Abstract summary: I will describe a portfolio-based analysis method which can give complementary insights into the performance of participating solvers in a competition.
The method is demonstrated on the results of the MiniZinc Challenges and new insights gained from the portfolio viewpoint are presented.
- Score: 0.8680676599607126
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Competitions such as the MiniZinc Challenges or the SAT competitions have
been very useful sources for comparing the performance of different solving
approaches and for advancing the state of the art in their fields. The traditional
competition setting often focuses on producing a ranking between solvers based
on their average performance across a wide range of benchmark problems and
instances. While this is a sensible way to assess the relative performance of
solvers, such a ranking does not necessarily reflect the full potential of a
solver, especially when we want to utilise a portfolio of solvers instead of a
single one for solving a new problem. In this paper, I will describe a
portfolio-based analysis method which can give complementary insights into the
performance of participating solvers in a competition. The method is
demonstrated on the results of the MiniZinc Challenges, and new insights gained
from the portfolio viewpoint are presented.
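To make the portfolio viewpoint concrete, the following is a minimal sketch of one computation it involves: comparing the single best solver (SBS) on average with the virtual best solver (VBS, the per-instance best of a solver set) and measuring each solver's marginal contribution to the portfolio. The runtimes, solver names, and instances below are hypothetical illustrations, not the paper's exact method or the MiniZinc Challenge data.

```python
# A minimal SBS/VBS sketch over hypothetical per-instance runtimes
# (seconds; lower is better). Not the paper's actual analysis.

# runtimes[solver][instance] -> runtime in seconds
runtimes = {
    "solver_a": {"inst1": 12.0, "inst2": 300.0, "inst3": 5.0},
    "solver_b": {"inst1": 80.0, "inst2": 7.0, "inst3": 600.0},
    "solver_c": {"inst1": 15.0, "inst2": 9.0, "inst3": 4.0},
}
instances = ["inst1", "inst2", "inst3"]

def vbs_runtime(solvers, instance):
    """Virtual best solver: per instance, take the best member's runtime."""
    return min(runtimes[s][instance] for s in solvers)

def total_runtime(solvers):
    return sum(vbs_runtime(solvers, i) for i in instances)

# Single best solver by total performance (the traditional ranking view).
sbs = min(runtimes, key=lambda s: total_runtime([s]))
print(f"SBS: {sbs}, VBS total: {total_runtime(list(runtimes)):.1f}s")

# Marginal contribution: how much the VBS degrades when a solver is dropped.
for s in runtimes:
    rest = [t for t in runtimes if t != s]
    print(f"{s}: marginal contribution "
          f"{total_runtime(rest) - total_runtime(list(runtimes)):.1f}s")
```

A solver that ranks poorly on average can still contribute strongly to the VBS, which is precisely the kind of complementary insight the portfolio viewpoint is meant to surface.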
Related papers
- LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning [56.273799410256075]
The framework combines Monte Carlo Tree Search (MCTS) with iterative Self-Refine to optimize the reasoning path; a generic sketch of the MCTS selection step follows this entry.
It has been tested on general and advanced benchmarks, showing superior performance in terms of search efficiency and problem-solving capability.
arXiv Detail & Related papers (2024-10-03T18:12:29Z)
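As a rough illustration of the selection step inside an MCTS loop of the kind LLaMA-Berry builds on, here is a generic UCB1 sketch; the node structure and numbers are made up, and this is not the paper's implementation.

```python
import math

def ucb1(parent_visits, child, c=1.41):
    """Generic UCB1 score: exploitation plus an exploration bonus."""
    if child["visits"] == 0:
        return float("inf")  # explore unvisited candidates first
    exploit = child["value"] / child["visits"]
    explore = c * math.sqrt(math.log(parent_visits) / child["visits"])
    return exploit + explore

def select(parent_visits, children):
    """Pick the child (candidate reasoning path) with the highest UCB1 score."""
    return max(children, key=lambda ch: ucb1(parent_visits, ch))

children = [
    {"path": "draft 1", "visits": 3, "value": 2.0},
    {"path": "draft 2", "visits": 1, "value": 0.5},
    {"path": "draft 3", "visits": 0, "value": 0.0},
]
print(select(4, children)["path"])  # the unvisited draft wins the selection
```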
- Are we making progress in unlearning? Findings from the first NeurIPS unlearning competition [70.60872754129832]
The first NeurIPS competition on unlearning sought to stimulate the development of novel algorithms.
Nearly 1,200 teams from across the world participated.
We analyze top solutions and delve into discussions on benchmarking unlearning.
arXiv Detail & Related papers (2024-06-13T12:58:00Z)
- Analysis of Systems' Performance in Natural Language Processing Competitions [6.197993866688085]
This manuscript describes an evaluation methodology for statistically analyzing competition results.
The proposed methodology offers several advantages, including off-the-shelf comparisons with correction mechanisms and the inclusion of confidence intervals.
Our analysis shows the potential usefulness of our methodology for effectively evaluating competition results.
arXiv Detail & Related papers (2024-03-07T17:42:40Z)
- CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition [52.2034494666179]
Sparse mixture of experts (SMoE) offers an appealing solution for scaling up model complexity beyond simply increasing the network's depth or width.
We propose a competition mechanism to address the fundamental challenge of representation collapse; a minimal routing sketch follows this entry.
By routing inputs only to experts with the highest neural response, we show that, under mild assumptions, competition enjoys the same convergence rate as the optimal estimator.
arXiv Detail & Related papers (2024-02-04T15:17:09Z)
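The routing idea can be sketched as follows, assuming that "neural response" is taken to be the norm of each expert's output; the dimensions, class name, and top-k choice are illustrative assumptions, not the CompeteSMoE reference implementation.

```python
import torch
import torch.nn as nn

class CompetitionMoE(nn.Module):
    """Toy competition-style routing: only the strongest experts respond."""

    def __init__(self, dim=16, n_experts=4, k=1):
        super().__init__()
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])
        self.k = k

    def forward(self, x):  # x: (batch, dim)
        # Every expert computes a response; the competition picks winners
        # by response strength rather than by a separate learned router alone.
        responses = torch.stack([e(x) for e in self.experts], dim=1)  # (B, E, D)
        strength = responses.norm(dim=-1)                             # (B, E)
        top = strength.topk(self.k, dim=1).indices                    # (B, k)
        mask = torch.zeros_like(strength).scatter(1, top, 1.0)        # winners only
        return (responses * mask.unsqueeze(-1)).sum(dim=1)            # (B, D)

x = torch.randn(2, 16)
print(CompetitionMoE()(x).shape)  # torch.Size([2, 16])
```

Note that computing every expert's response defeats the sparsity SMoE is meant to buy; practical variants avoid paying this cost at every step, a detail this sketch omits.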
- Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO [50.58083807719749]
We present the results of the second Neural MMO challenge, hosted at IJCAI 2022, which received 1600+ submissions.
This competition targets robustness and generalization in multi-agent systems.
We will open-source our benchmark including the environment wrapper, baselines, a visualization tool, and selected policies for further research.
arXiv Detail & Related papers (2023-08-30T07:16:11Z)
- Competitions in AI -- Robustly Ranking Solvers Using Statistical Resampling [9.02080113915613]
We show that rankings resulting from the standard interpretation of competition results can be very sensitive to even minor changes in the benchmark instance set used as the basis for assessment.
We introduce a novel approach to statistically meaningful analysis of competition results based on resampling performance data.
Our approach produces confidence intervals of competition scores as well as statistically robust solver rankings with bounded error; a toy bootstrap illustration follows this entry.
arXiv Detail & Related papers (2023-08-09T16:47:04Z)
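A toy version of the resampling idea, assuming a plain bootstrap over benchmark instances: resample the instance set with replacement and recompute each solver's mean score to obtain percentile confidence intervals. The score table is invented, and the paper's exact protocol may differ.

```python
import random
import statistics

# scores[solver] -> per-instance scores on a fixed benchmark set (made up).
scores = {
    "solver_a": [10, 0, 8, 7, 9],
    "solver_b": [9, 8, 0, 7, 10],
}
n_instances = 5
random.seed(0)

def bootstrap_means(solver, n_rounds=10_000):
    """Resample the instance set with replacement; collect mean scores."""
    means = []
    for _ in range(n_rounds):
        sample = [random.randrange(n_instances) for _ in range(n_instances)]
        means.append(statistics.mean(scores[solver][i] for i in sample))
    return sorted(means)

for solver in scores:
    means = bootstrap_means(solver)
    lo, hi = means[249], means[9749]  # ~95% percentile interval
    print(f"{solver}: mean {statistics.mean(scores[solver]):.2f}, "
          f"95% CI [{lo:.2f}, {hi:.2f}]")
```

Overlapping intervals are exactly the situation where a single-number ranking overstates how settled a comparison is.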
- EFaR 2023: Efficient Face Recognition Competition [51.77649060180531]
The paper presents a summary of the Efficient Face Recognition Competition (EFaR) held at the 2023 International Joint Conference on Biometrics (IJCB 2023).
The competition received 17 submissions from 6 different teams.
The submitted solutions are ranked based on a weighted score combining the verification accuracies achieved on a diverse set of benchmarks with deployability, measured by the number of floating-point operations and the model size; a toy version of such a score follows this entry.
arXiv Detail & Related papers (2023-08-08T09:58:22Z)
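A toy weighted score in this spirit, combining accuracy with inverted efficiency terms so that smaller, cheaper models score higher; the weights, normalisation, and team data are illustrative assumptions, not the official EFaR scoring.

```python
# Hypothetical submissions: accuracy, compute cost (GFLOPs), model size (M params).
submissions = {
    "team_a": {"accuracy": 0.92, "gflops": 1.2, "params_m": 5.0},
    "team_b": {"accuracy": 0.95, "gflops": 8.0, "params_m": 40.0},
}
max_flops = max(s["gflops"] for s in submissions.values())
max_size = max(s["params_m"] for s in submissions.values())

def weighted_score(s, w_acc=0.7, w_flops=0.15, w_size=0.15):
    """Higher is better; efficiency terms are inverted relative to the max."""
    return (w_acc * s["accuracy"]
            + w_flops * (1 - s["gflops"] / max_flops)
            + w_size * (1 - s["params_m"] / max_size))

for team, sub in sorted(submissions.items(),
                        key=lambda kv: weighted_score(kv[1]), reverse=True):
    print(f"{team}: {weighted_score(sub):.3f}")
```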
- Towards robust and domain agnostic reinforcement learning competitions [12.731614722371376]
Reinforcement learning competitions have formed the basis for standard research benchmarks.
Despite this, a majority of challenges suffer from the same fundamental problems.
We present a new framework of competition design that promotes the development of algorithms that overcome these barriers.
arXiv Detail & Related papers (2021-06-07T16:15:46Z)
- Weight-Sharing Neural Architecture Search: A Battle to Shrink the Optimization Gap [90.93522795555724]
Neural architecture search (NAS) has attracted increasing attention in both academia and industry.
Weight-sharing methods were proposed in which exponentially many architectures share weights in the same super-network; a minimal illustration follows this entry.
This paper provides a literature review on NAS, in particular on the weight-sharing methods.
arXiv Detail & Related papers (2020-08-04T11:57:03Z)
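As a minimal illustration of weight sharing, the sketch below builds a tiny super-network in which every sampled architecture (one operation choice per layer) reuses the same shared parameters; the structure is an assumption for illustration, and real NAS supernets are far richer.

```python
import random
import torch
import torch.nn as nn

class SuperLayer(nn.Module):
    """One layer of the supernet: candidate ops whose weights are shared."""

    def __init__(self, dim=8):
        super().__init__()
        self.ops = nn.ModuleList([nn.Linear(dim, dim), nn.Identity()])

    def forward(self, x, choice):
        return torch.relu(self.ops[choice](x))

class SuperNet(nn.Module):
    def __init__(self, n_layers=3, dim=8):
        super().__init__()
        self.layers = nn.ModuleList([SuperLayer(dim) for _ in range(n_layers)])

    def forward(self, x, arch):  # arch: one op choice per layer
        for layer, choice in zip(self.layers, arch):
            x = layer(x, choice)
        return x

net = SuperNet()
x = torch.randn(4, 8)
# 2**3 = 8 architectures, all evaluated with the same shared weights.
arch = [random.randrange(2) for _ in range(3)]
print(arch, net(x, arch).shape)  # e.g. [1, 0, 1] torch.Size([4, 8])
```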
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.