Structured Prediction Problem Archive
- URL: http://arxiv.org/abs/2202.03574v5
- Date: Fri, 17 Nov 2023 10:01:56 GMT
- Title: Structured Prediction Problem Archive
- Authors: Paul Swoboda, Bjoern Andres, Andrea Hornakova, Florian Bernard, Jannik Irmai, Paul Roetzer, Bogdan Savchynskyy, David Stein, Ahmed Abbas
- Abstract summary: Structured prediction problems are one of the fundamental tools in machine learning.
We collect in one place a large number of datasets in easy to read formats for a diverse set of problem classes.
For reference we also give a non-exhaustive selection of algorithms proposed in the literature for their solution.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Structured prediction problems are one of the fundamental tools in machine
learning. In order to facilitate algorithm development for their numerical
solution, we collect in one place a large number of datasets in easy to read
formats for a diverse set of problem classes. We provide archival links to
datasets, description of the considered problems and problem formats, and a
short summary of problem characteristics including size, number of instances
etc. For reference we also give a non-exhaustive selection of algorithms
proposed in the literature for their solution. We hope that this central
repository will make benchmarking and comparison to established works easier.
We welcome submission of interesting new datasets and algorithms for inclusion
in our archive.
Related papers
- RV-Syn: Rational and Verifiable Mathematical Reasoning Data Synthesis based on Structured Function Library
RV-Syn is a novel approach to mathematical data synthesis.
It generates solution graphs by combining Python-formatted functions from a structured function library.
Based on the constructed graph, we achieve solution-guided logic-aware problem generation.
Date: 2025-04-29T04:42:02Z
- A Survey on Small Sample Imbalance Problem: Metrics, Feature Analysis, and Solutions
The small sample imbalance (S&I) problem is a major challenge in machine learning and data analysis.
Existing methods often focus on algorithm design without sufficiently analyzing the underlying data characteristics.
We argue that a detailed analysis from the data perspective is essential before developing an appropriate solution.
Date: 2025-04-21T01:58:29Z
- Discovering Data Structures: Nearest Neighbor Search and Beyond
We propose a general framework for end-to-end learning of data structures.
Our framework adapts to the underlying data distribution and provides fine-grained control over query and space complexity.
We first apply this framework to the problem of nearest neighbor search.
Date: 2024-11-05T16:50:54Z
- A Survey on Computational Solutions for Reconstructing Complete Objects by Reassembling Their Fractured Parts
Reconstructing a complete object from its parts is a fundamental problem in many scientific domains.
We review existing algorithms in this context and emphasize their similarities to and differences from general-purpose approaches.
In addition to algorithms, this survey will also describe existing datasets, open-source software packages, and applications.
Date: 2024-10-18T17:53:07Z
- Deep Learning-Driven Approach for Handwritten Chinese Character Classification
Handwritten character recognition (HCR) is a challenging problem for machine learning researchers.
Scripts with a very large number of unique character classes, such as logographic scripts or Sino-Korean character sequences, introduce additional complications to the HCR problem.
This paper proposes a highly scalable approach for detailed character image classification by introducing the model architecture, data preprocessing steps, and testing design instructions.
Date: 2024-01-30T15:29:32Z
- Problem-Solving Guide: Predicting the Algorithm Tags and Difficulty for Competitive Programming Problems
Most tech companies, including Google, Meta, and Amazon, require the ability to solve algorithm problems.
Our study addresses the task of predicting the algorithm tag as a useful tool for engineers and developers.
We also consider predicting the difficulty level of algorithm problems, which can serve as guidance for estimating the time required to solve a problem.
Date: 2023-10-09T15:26:07Z
- Optimal and Efficient Binary Questioning for Human-in-the-Loop Annotation
This paper studies the neglected complementary problem of getting annotated data given a predictor.
For the simple binary classification setting, we present the spectrum ranging from optimal general solutions to practical efficient methods.
Date: 2023-07-04T09:11:33Z
- A Gold Standard Dataset for the Reviewer Assignment Problem
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
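The pairwise ordering task above can be illustrated with a small error-rate computation: for every pair of papers, check whether a similarity score orders them the same way as the reviewer's self-reported expertise. This is a minimal sketch under stated assumptions, not the dataset authors' evaluation code; the function name `pairwise_ordering_error`, the dict-based inputs, skipping pairs with tied expertise, and counting a tied prediction as an error are all hypothetical choices.

```python
from itertools import combinations


def pairwise_ordering_error(expertise, similarity):
    """Fraction of paper pairs that `similarity` orders differently from `expertise`.

    Hypothetical inputs: dicts mapping a paper id to a number.
    Assumptions: pairs with equal expertise have no ground-truth order and are
    skipped; a tie in the similarity score is counted as an error.
    """
    errors = 0
    total = 0
    for a, b in combinations(sorted(expertise), 2):
        diff_true = expertise[a] - expertise[b]
        if diff_true == 0:
            continue  # no ground-truth ordering for this pair
        diff_pred = similarity[a] - similarity[b]
        total += 1
        if diff_true * diff_pred <= 0:
            errors += 1  # wrong order, or the score cannot separate the pair
    return errors / total if total else 0.0


expertise = {"p1": 5, "p2": 3, "p3": 1}
similarity = {"p1": 0.9, "p2": 0.2, "p3": 0.4}
print(pairwise_ordering_error(expertise, similarity))  # 1 of 3 pairs misordered
```

Reporting separate rates for "easy" and "hard" pairs, as in the summary, would amount to running the same function on two subsets of pairs.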
Date: 2023-03-23T16:15:03Z
- Minimalistic Predictions to Schedule Jobs with Online Precedence Constraints
We consider non-clairvoyant scheduling with online precedence constraints.
An algorithm is oblivious to any job dependencies and learns about a job only if all of its predecessors have been completed.
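The information model described above, in which a job only becomes known once all of its predecessors have completed, can be simulated with a reference count of unfinished predecessors per job. This is a minimal sketch and not the paper's scheduling algorithm: the name `run_oblivious` is hypothetical, the input is assumed to be a DAG given as a dict of predecessor sets, and the sketch ignores processing times and machines entirely, finishing visible jobs one at a time in FIFO order.

```python
from collections import deque


def run_oblivious(predecessors):
    """Complete jobs one at a time, seeing a job only when all predecessors are done.

    predecessors: dict mapping each job to the set of jobs it depends on (a DAG).
    Returns the order in which jobs were completed under a FIFO policy.
    """
    # Number of unfinished predecessors per job, plus the reverse edges.
    remaining = {job: len(preds) for job, preds in predecessors.items()}
    successors = {job: [] for job in predecessors}
    for job, preds in predecessors.items():
        for p in preds:
            successors[p].append(job)

    # Initially, only jobs without predecessors are visible to the scheduler.
    visible = deque(sorted(job for job, r in remaining.items() if r == 0))
    completed = []
    while visible:
        job = visible.popleft()  # the policy never sees jobs outside `visible`
        completed.append(job)
        for s in successors[job]:
            remaining[s] -= 1
            if remaining[s] == 0:
                visible.append(s)  # revealed only once every predecessor is done
    return completed


deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
print(run_oblivious(deps))  # ['a', 'b', 'c', 'd']
```

A real non-clairvoyant scheduler would replace the FIFO choice with a rule that decides how to share machines among visible jobs without knowing their processing times.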
Date: 2023-01-30T13:17:15Z
- When is Memorization of Irrelevant Training Data Necessary for High-Accuracy Learning?
We describe natural prediction problems in which every sufficiently accurate training algorithm must encode, in the prediction model, essentially all the information about a large subset of its training examples.
Our results do not depend on the training algorithm or the class of models used for learning.
Date: 2020-12-11T15:25:14Z
- An Integer Linear Programming Framework for Mining Constraints from Data
We present a general framework for mining constraints from data.
In particular, we consider the inference in structured output prediction as an integer linear programming (ILP) problem.
We show that our approach can learn to solve 9x9 Sudoku puzzles and minimum spanning tree problems from examples without being given the underlying rules.
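Casting structured inference as an integer linear program means choosing a binary output vector y that maximizes a linear score w·y subject to linear constraints Ay ≤ b. The sketch below solves such a 0-1 ILP by exhaustive enumeration, which is only feasible for a handful of variables; real systems use an ILP solver instead. The function name `ilp_inference` and the `(coefficients, bound)` constraint encoding are hypothetical, and the paper's constraint-mining step is not shown.

```python
from itertools import product


def ilp_inference(weights, constraints):
    """Brute-force inference posed as a 0-1 integer linear program:

        maximize   sum_i w_i * y_i
        subject to sum_i A_ki * y_i <= b_k  for every constraint k,
                   y_i in {0, 1}.

    constraints: list of (coefficients, bound) pairs, one per row of A.
    Runtime is exponential in len(weights); illustrative use only.
    """
    best_y, best_score = None, float("-inf")
    for y in product((0, 1), repeat=len(weights)):
        feasible = all(
            sum(a * yi for a, yi in zip(coeffs, y)) <= bound
            for coeffs, bound in constraints
        )
        if not feasible:
            continue
        score = sum(w * yi for w, yi in zip(weights, y))
        if score > best_score:  # keep the first maximizer on ties
            best_y, best_score = y, score
    return best_y, best_score


# Pick at most two of three labels, and never labels 0 and 2 together.
print(ilp_inference([3, 2, 4], [([1, 1, 1], 2), ([1, 0, 1], 1)]))  # ((0, 1, 1), 6)
```

In a constraint-mining setting, the rows of A and b would be learned from example input/output pairs rather than written by hand as above.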
Date: 2020-06-18T20:09:53Z
- Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning
We propose a dataset, Machine Number Sense (MNS), consisting of visual arithmetic problems automatically generated using a grammar model, the And-Or Graph (AOG).
These visual arithmetic problems are in the form of geometric figures.
We benchmark the MNS dataset using four predominant neural network models as baselines in this visual reasoning task.
Date: 2020-04-25T17:14:58Z
- Optimal Clustering from Noisy Binary Feedback
We study the problem of clustering a set of items from binary user feedback.
We devise an algorithm with a minimal cluster recovery error rate.
For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
Date: 2019-10-14T09:18:26Z
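To make the clustering-from-binary-feedback setting concrete, the sketch below aggregates repeated noisy "same cluster?" answers per pair by majority vote and merges pairs with a positive majority using union-find. This is a naive baseline under stated assumptions and is explicitly not the error-optimal or adaptive algorithm summarized above; the name `cluster_from_feedback` and the `(i, j, same)` answer format are hypothetical.

```python
from collections import defaultdict


def cluster_from_feedback(items, answers):
    """Naive baseline: majority vote per pair, then union-find merging.

    answers: list of (i, j, same) with same in {0, 1}; a pair may be asked
    several times and individual answers may be wrong.
    Returns the clusters as sorted lists, themselves sorted.
    """
    votes = defaultdict(int)
    for i, j, same in answers:
        key = (min(i, j), max(i, j))  # treat (i, j) and (j, i) as one pair
        votes[key] += 1 if same else -1

    parent = {x: x for x in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), v in votes.items():
        if v > 0:  # strict majority of answers says "same cluster"
            parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for x in items:
        clusters[find(x)].append(x)
    return sorted(sorted(c) for c in clusters.values())


answers = [(1, 2, 1), (1, 2, 1), (2, 1, 0), (3, 4, 1), (1, 3, 0), (1, 3, 1), (1, 3, 0)]
print(cluster_from_feedback([1, 2, 3, 4], answers))  # [[1, 2], [3, 4]]
```

Because merging is transitive, a single wrong majority can join two true clusters; an adaptive method would instead spend more queries on the pairs whose votes are least certain.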