Comparison of classifiers in challenge scheme
- URL: http://arxiv.org/abs/2305.10452v1
- Date: Tue, 16 May 2023 23:38:34 GMT
- Title: Comparison of classifiers in challenge scheme
- Authors: Sergio Nava-Muñoz and Mario Graff Guerrero and Hugo Jair Escalante
- Abstract summary: This paper analyzes the results of the MeOffendEs@IberLEF 2021 competition.
It proposes making inferences through resampling techniques (bootstrap) to support challenge organizers' decision-making.
- Score: 12.030094148004176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent decades, challenges have become very popular in scientific research
because they act as crowdsourcing schemes. In particular, challenges are essential for
developing machine learning algorithms. When setting up a challenge, it is vital to
establish the scientific question, the dataset (with adequate quality, quantity,
diversity, and complexity), the performance metrics, and a way to authenticate the
participants' results (Gold Standard). This paper addresses the problem of evaluating
the performance of different competitors (algorithms) under the restrictions imposed by
the challenge scheme, such as the comparison of multiple competitors on a single dataset
of fixed size, a minimal number of submissions, and a set of metrics chosen to assess
performance. The algorithms are ranked according to the performance metric. Still, it is
common to observe performance differences among competitors as small as hundredths or
even thousandths, so the question is whether those differences are significant. This
paper analyzes the results of the MeOffendEs@IberLEF 2021 competition and proposes
making inferences through resampling techniques (bootstrap) to support challenge
organizers' decision-making.
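The following is a minimal, hypothetical sketch of the bootstrap idea described in the abstract: resample the fixed test set with replacement, recompute each competitor's metric on every resample, and inspect a confidence interval for the score difference. The labels, predictions, and macro-F1 metric below are illustrative placeholders, not the MeOffendEs@IberLEF 2021 data or the paper's exact procedure.

```python
# Illustrative paired bootstrap over a fixed test set: resample example indices,
# recompute each competitor's macro-F1, and summarize the score difference.
# Labels and predictions are synthetic placeholders, not MeOffendEs data.
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n = 1000
y_true = rng.integers(0, 2, size=n)                           # hypothetical gold standard
pred_a = np.where(rng.random(n) < 0.85, y_true, 1 - y_true)   # competitor A (~85% agreement)
pred_b = np.where(rng.random(n) < 0.84, y_true, 1 - y_true)   # competitor B (~84% agreement)

B = 2000
diffs = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)                          # resample test examples with replacement
    diffs[b] = (f1_score(y_true[idx], pred_a[idx], average="macro")
                - f1_score(y_true[idx], pred_b[idx], average="macro"))

lo, hi = np.percentile(diffs, [2.5, 97.5])                    # 95% percentile interval for the difference
print(f"macro-F1 difference (A - B): 95% CI [{lo:.4f}, {hi:.4f}]")
# If the interval contains 0, the observed gap does not justify ranking A above B.
```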
Related papers
- Analysis of Systems' Performance in Natural Language Processing Competitions [6.197993866688085]
This manuscript describes an evaluation methodology for statistically analyzing competition results.
The proposed methodology offers several advantages, including off-the-shelf comparisons with correction mechanisms and the inclusion of confidence intervals.
Our analysis shows the potential usefulness of our methodology for effectively evaluating competition results.
arXiv Detail & Related papers (2024-03-07T17:42:40Z)
- Benchmarking Robustness and Generalization in Multi-Agent Systems: A Case Study on Neural MMO [50.58083807719749]
We present the results of the second Neural MMO challenge, hosted at IJCAI 2022, which received 1600+ submissions.
This competition targets robustness and generalization in multi-agent systems.
We will open-source our benchmark including the environment wrapper, baselines, a visualization tool, and selected policies for further research.
arXiv Detail & Related papers (2023-08-30T07:16:11Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes (a generic mixing sketch follows this entry).
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
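Below is a generic, hypothetical illustration of the mixing idea summarized in the entry above: synthetic samples are built as convex combinations of a minority point and a majority point, biased toward the minority class. It is not necessarily the paper's exact algorithm, and all arrays and parameters are invented for the example.

```python
# Generic mixing-based oversampling sketch (not necessarily the paper's procedure):
# create synthetic minority samples as convex combinations of a minority sample
# and a majority sample, with the mixing weight biased toward the minority point.
import numpy as np

def mix_oversample(X_min, X_maj, n_new, rng=None):
    # Return n_new synthetic samples mixed from minority/majority features.
    if rng is None:
        rng = np.random.default_rng()
    i = rng.integers(0, len(X_min), size=n_new)
    j = rng.integers(0, len(X_maj), size=n_new)
    lam = rng.uniform(0.6, 1.0, size=(n_new, 1))   # keep synthetic points close to the minority class
    return lam * X_min[i] + (1.0 - lam) * X_maj[j]

# Hypothetical 2-D data: 20 minority points vs. 200 majority points.
rng = np.random.default_rng(1)
X_min = rng.normal(loc=2.0, size=(20, 2))
X_maj = rng.normal(loc=0.0, size=(200, 2))
X_syn = mix_oversample(X_min, X_maj, n_new=180, rng=rng)
print(X_syn.shape)   # (180, 2): synthetic minority-leaning samples
```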
- Competitions in AI -- Robustly Ranking Solvers Using Statistical Resampling [9.02080113915613]
We show that rankings resulting from the standard interpretation of competition results can be very sensitive to even minor changes in the benchmark instance set used as the basis for assessment.
We introduce a novel approach to statistically meaningful analysis of competition results based on resampling performance data.
Our approach produces confidence intervals of competition scores as well as statistically robust solver rankings with bounded error (an illustrative resampling sketch follows this entry).
arXiv Detail & Related papers (2023-08-09T16:47:04Z)
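The sketch below illustrates, under assumed data, the instance-level resampling mentioned in the previous entry: bootstrap the benchmark instance set and tally how often each solver attains each rank. The score matrix, solver count, and ranking rule are hypothetical and do not reproduce the paper's actual protocol.

```python
# Illustrative instance-level resampling for ranking stability:
# bootstrap the benchmark instances and record how often each solver
# attains each rank. The score matrix is synthetic, not real competition data.
import numpy as np

rng = np.random.default_rng(2)
n_solvers, n_instances = 4, 300
scores = rng.random((n_solvers, n_instances))       # hypothetical per-instance scores (higher is better)

B = 5000
rank_counts = np.zeros((n_solvers, n_solvers), dtype=int)
for _ in range(B):
    idx = rng.integers(0, n_instances, size=n_instances)    # resample instances with replacement
    totals = scores[:, idx].mean(axis=1)
    order = np.argsort(-totals)                              # solvers sorted by resampled mean score
    for rank, solver in enumerate(order):
        rank_counts[solver, rank] += 1

print(rank_counts / B)   # rank_counts[s, r] / B: fraction of resamples where solver s is ranked r-th
```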
- EFaR 2023: Efficient Face Recognition Competition [51.77649060180531]
The paper presents a summary of the Efficient Face Recognition Competition (EFaR) held at the 2023 International Joint Conference on Biometrics (IJCB 2023).
The competition received 17 submissions from 6 different teams.
The submitted solutions are ranked based on a weighted score of the achieved verification accuracies on a diverse set of benchmarks, as well as the deployability given by the number of floating-point operations and model size.
arXiv Detail & Related papers (2023-08-08T09:58:22Z) - A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z) - A portfolio-based analysis method for competition results [0.8680676599607126]
I will describe a portfolio-based analysis method which can give complementary insights into the performance of participating solvers in a competition.
The method is demonstrated on the results of the MiniZinc Challenges and new insights gained from the portfolio viewpoint are presented.
arXiv Detail & Related papers (2022-05-30T20:20:45Z) - Team Cogitat at NeurIPS 2021: Benchmarks for EEG Transfer Learning
Competition [55.34407717373643]
Building subject-independent deep learning models for EEG decoding faces the challenge of strong covariate shift.
Our approach is to explicitly align feature distributions at various layers of the deep learning model.
The methodology won first place in the 2021 Benchmarks in EEG Transfer Learning competition, hosted at the NeurIPS conference.
arXiv Detail & Related papers (2022-02-01T11:11:08Z) - Investigating Class-level Difficulty Factors in Multi-label
Classification Problems [23.51529285126783]
This work investigates the use of class-level difficulty factors in multi-label classification problems for the first time.
Four difficulty factors are proposed: frequency, visual variation, semantic abstraction, and class co-occurrence.
These difficulty factors are shown to have several potential applications including the prediction of class-level performance across datasets.
arXiv Detail & Related papers (2020-05-01T15:06:53Z) - Long-Tailed Recognition Using Class-Balanced Experts [128.73438243408393]
We propose an ensemble of class-balanced experts that combines the strength of diverse classifiers.
Our ensemble of class-balanced experts reaches results close to state-of-the-art and an extended ensemble establishes a new state-of-the-art on two benchmarks for long-tailed recognition.
arXiv Detail & Related papers (2020-04-07T20:57:44Z)