Related papers: Structured Prediction Problem Archive

URL: http://arxiv.org/abs/2202.03574v5
Date: Fri, 17 Nov 2023 10:01:56 GMT
Title: Structured Prediction Problem Archive
Authors: Paul Swoboda, Bjoern Andres, Andrea Hornakova, Florian Bernard, Jannik Irmai, Paul Roetzer, Bogdan Savchynskyy, David Stein, Ahmed Abbas
Abstract summary: Structured prediction problems are one of the fundamental tools in machine learning. We collect in one place a large number of datasets in easy to read formats for a diverse set of problem classes. For reference we also give a non-exhaustive selection of algorithms proposed in the literature for their solution.
Score: 30.27508546519084
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Structured prediction problems are one of the fundamental tools in machine learning. In order to facilitate algorithm development for their numerical solution, we collect in one place a large number of datasets in easy to read formats for a diverse set of problem classes. We provide archival links to datasets, description of the considered problems and problem formats, and a short summary of problem characteristics including size, number of instances etc. For reference we also give a non-exhaustive selection of algorithms proposed in the literature for their solution. We hope that this central repository will make benchmarking and comparison to established works easier. We welcome submission of interesting new datasets and algorithms for inclusion in our archive.

Related papers

Large Language Models: A Mathematical Formulation [9.837462698662947]
Large language models (LLMs) process and predict sequences containing text to answer questions. We provide a mathematical framework for LLMs by describing the encoding of text sequences into sequences of tokens. We explain how these models are learned from data, and demonstrate how they are deployed to address a variety of tasks.
arXiv Detail & Related papers (2026-01-21T21:22:49Z)
CPRet: A Dataset, Benchmark, and Model for Retrieval in Competitive Programming [56.17331530444765]
CPRet is a retrieval-oriented benchmark suite for competitive programming. It covers four retrieval tasks: two code-centric (i.e., Text-to-Code and Code-to-Code) and two newly proposed problem-centric tasks (i.e., Problem-to-Duplicate and Simplified-to-Full). Our contribution includes both high-quality training data and temporally separated test sets for reliable evaluation.
arXiv Detail & Related papers (2025-05-19T10:07:51Z)
RV-Syn: Rational and Verifiable Mathematical Reasoning Data Synthesis based on Structured Function Library [58.404895570822184]
RV-Syn is a novel mathematical synthesis approach. It generates graphs as solutions by combining Python-formatted functions from a structured function library. Based on the constructed graph, we achieve solution-guided logic-aware problem generation.
arXiv Detail & Related papers (2025-04-29T04:42:02Z)
A Survey on Small Sample Imbalance Problem: Metrics, Feature Analysis, and Solutions [41.77642958758829]
The small sample imbalance (S&I) problem is a major challenge in machine learning and data analysis. Existing methods often rely on algorithmics without sufficiently analyzing the underlying data characteristics. We argue that a detailed analysis from the data perspective is essential before developing an appropriate solution.
arXiv Detail & Related papers (2025-04-21T01:58:29Z)
Discovering Data Structures: Nearest Neighbor Search and Beyond [18.774836778996544]
We propose a general framework for end-to-end learning of data structures. Our framework adapts to the underlying data distribution and provides fine-grained control over query and space complexity. We first apply this framework to the problem of nearest neighbor search.
arXiv Detail & Related papers (2024-11-05T16:50:54Z)
A Survey on Computational Solutions for Reconstructing Complete Objects by Reassembling Their Fractured Parts [25.59032022422813]
Reconstructing a complete object from its parts is a fundamental problem in many scientific domains. We provide existing algorithms in this context and emphasize their similarities and differences to general-purpose approaches. In addition to algorithms, this survey will also describe existing datasets, open-source software packages, and applications.
arXiv Detail & Related papers (2024-10-18T17:53:07Z)
Deep Learning-Driven Approach for Handwritten Chinese Character Classification [0.0]
Handwritten character recognition is a challenging problem for machine learning researchers. With numerous unique character classes present, some data, such as Logographic Scripts or Sino-Korean character sequences, bring new complications to the HCR problem. This paper proposes a highly scalable approach for detailed character image classification by introducing the model architecture, data preprocessing steps, and testing design instructions.
arXiv Detail & Related papers (2024-01-30T15:29:32Z)
Problem-Solving Guide: Predicting the Algorithm Tags and Difficulty for Competitive Programming Problems [7.955313479061445]
Most tech companies, including Google, Meta, and Amazon, require the ability to solve algorithm problems. Our study addresses the task of predicting the algorithm tag as a useful tool for engineers and developers. We also consider predicting the difficulty levels of algorithm problems, which can serve as guidance for estimating the time required to solve a problem.
arXiv Detail & Related papers (2023-10-09T15:26:07Z)
Optimal and Efficient Binary Questioning for Human-in-the-Loop Annotation [11.4375764457726]
This paper studies the neglected complementary problem of getting annotated data given a predictor. For the simple binary classification setting, we present the spectrum ranging from optimal general solutions to practical efficient methods.
arXiv Detail & Related papers (2023-07-04T09:11:33Z)
A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper. Our dataset consists of 477 self-reported expertise scores provided by 58 researchers. For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
Minimalistic Predictions to Schedule Jobs with Online Precedence Constraints [117.8317521974783]
We consider non-clairvoyant scheduling with online precedence constraints. An algorithm is oblivious to any job dependencies and learns about a job only if all of its predecessors have been completed.
arXiv Detail & Related papers (2023-01-30T13:17:15Z)
When is Memorization of Irrelevant Training Data Necessary for High-Accuracy Learning? [53.523017945443115]
We describe natural prediction problems in which every sufficiently accurate training algorithm must encode, in the prediction model, essentially all the information about a large subset of its training examples. Our results do not depend on the training algorithm or the class of models used for learning.
arXiv Detail & Related papers (2020-12-11T15:25:14Z)
An Integer Linear Programming Framework for Mining Constraints from Data [81.60135973848125]
We present a general framework for mining constraints from data. In particular, we consider the inference in structured output prediction as an integer linear programming (ILP) problem. We show that our approach can learn to solve 9x9 Sudoku puzzles and minimal spanning tree problems from examples without providing the underlying rules.
arXiv Detail & Related papers (2020-06-18T20:09:53Z)
Machine Number Sense: A Dataset of Visual Arithmetic Problems for Abstract and Relational Reasoning [95.18337034090648]
We propose a dataset, Machine Number Sense (MNS), consisting of visual arithmetic problems automatically generated using a grammar model, the And-Or Graph (AOG). These visual arithmetic problems are in the form of geometric figures. We benchmark the MNS dataset using four predominant neural network models as baselines in this visual reasoning task.
arXiv Detail & Related papers (2020-04-25T17:14:58Z)
Optimal Clustering from Noisy Binary Feedback [75.17453757892152]
We study the problem of clustering a set of items from binary user feedback. We devise an algorithm with a minimal cluster recovery error rate. For adaptive selection, we develop an algorithm inspired by the derivation of the information-theoretical error lower bounds.
arXiv Detail & Related papers (2019-10-14T09:18:26Z)

This list is automatically generated from the titles and abstracts of the papers on this site.