A Configurable Library for Generating and Manipulating Maze Datasets
- URL: http://arxiv.org/abs/2309.10498v2
- Date: Tue, 24 Oct 2023 21:42:01 GMT
- Title: A Configurable Library for Generating and Manipulating Maze Datasets
- Authors: Michael Igorevich Ivanitskiy, Rusheb Shah, Alex F. Spies, Tilman
Räuker, Dan Valentine, Can Rager, Lucia Quirke, Chris Mathwin, Guillaume
Corlouer, Cecilia Diniz Behn, Samy Wu Fung
- Abstract summary: Mazes serve as an excellent testbed due to their varied generation algorithms.
We present $\texttt{maze-dataset}$, a comprehensive library for generating, processing, and visualizing datasets consisting of maze-solving tasks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding how machine learning models respond to distributional shifts is
a key research challenge. Mazes serve as an excellent testbed due to varied
generation algorithms offering a nuanced platform to simulate both subtle and
pronounced distributional shifts. To enable systematic investigations of model
behavior on out-of-distribution data, we present $\texttt{maze-dataset}$, a
comprehensive library for generating, processing, and visualizing datasets
consisting of maze-solving tasks. With this library, researchers can easily
create datasets with extensive control over the generation algorithm used,
the parameters fed to the algorithm of choice, and the filters that generated
mazes must satisfy. Furthermore, it supports multiple output formats, including
rasterized and text-based, catering to convolutional neural networks and
autoregressive transformer models. These formats, along with tools for
visualizing and converting between them, ensure versatility and adaptability in
research applications.
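The generation algorithms and text-based output formats the abstract describes can be illustrated with a minimal, self-contained sketch. Note that this is not the actual $\texttt{maze-dataset}$ API; the function names and the serialization format below are illustrative assumptions, showing only the general idea of randomized-DFS lattice-maze generation and a token-friendly text encoding.

```python
# Minimal sketch: lattice-maze generation via randomized depth-first search,
# one of the algorithm families such libraries typically offer. Names and
# formats here are illustrative, not the maze-dataset API.
import random

def gen_dfs_maze(n, seed=0):
    """Return the set of open edges of an n x n lattice maze.

    Randomized DFS carves a spanning tree of the grid, so every cell is
    reachable and there is exactly one path between any two cells.
    """
    rng = random.Random(seed)
    visited = {(0, 0)}
    edges = set()
    stack = [(0, 0)]
    while stack:
        x, y = stack[-1]
        # Unvisited orthogonal neighbors still inside the grid.
        neighbors = [
            (x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < n and 0 <= y + dy < n
            and (x + dx, y + dy) not in visited
        ]
        if neighbors:
            nxt = rng.choice(neighbors)
            visited.add(nxt)
            edges.add(frozenset(((x, y), nxt)))  # open the wall between cells
            stack.append(nxt)
        else:
            stack.pop()  # dead end: backtrack
    return edges

def to_text(edges):
    """Serialize the maze as a sorted adjacency list -- a simple text format
    in the spirit of the transformer-oriented outputs the paper mentions."""
    return ";".join(
        "({},{})<->({},{})".format(*a, *b)
        for a, b in sorted(tuple(sorted(e)) for e in edges)
    )

maze = gen_dfs_maze(5, seed=42)
# A DFS spanning tree of an n x n grid always has n*n - 1 open edges.
assert len(maze) == 5 * 5 - 1
```

Because the generator is seeded, the same configuration reproduces the same maze, which is the property that makes systematic distributional-shift experiments (e.g. training on one algorithm and evaluating on another) tractable.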
Related papers
- Generating Realistic Tabular Data with Large Language Models [49.03536886067729]
Large language models (LLMs) have been used for diverse tasks, but do not capture the correct correlation between the features and the target variable.
We propose an LLM-based method with three important improvements to correctly capture the ground-truth feature-class correlation in the real data.
Our experiments show that our method significantly outperforms 10 SOTA baselines on 20 datasets in downstream tasks.
arXiv Detail & Related papers (2024-10-29T04:14:32Z)
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and insufficiency of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- Dataset Factory: A Toolchain For Generative Computer Vision Datasets [0.9013233848500058]
We propose a "dataset factory" that separates the storage and processing of samples from metadata.
This enables data-centric operations at scale for machine learning teams and individual researchers.
arXiv Detail & Related papers (2023-09-20T19:43:37Z)
- Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
- Creating Synthetic Datasets for Collaborative Filtering Recommender Systems using Generative Adversarial Networks [1.290382979353427]
Research and education in machine learning need diverse, representative, and open datasets to handle the necessary training, validation, and testing tasks.
To support this variety of research, it is useful to supplement the existing datasets with synthetic ones.
This paper proposes a Generative Adversarial Network (GAN)-based method to generate collaborative filtering datasets.
arXiv Detail & Related papers (2023-03-02T14:23:27Z)
- Variational Autoencoding Neural Operators [17.812064311297117]
Unsupervised learning with functional data is an emerging paradigm of machine learning research with applications to computer vision, climate modeling and physical systems.
We present Variational Autoencoding Neural Operators (VANO), a general strategy for making a large class of operator learning architectures act as variational autoencoders.
arXiv Detail & Related papers (2023-02-20T22:34:43Z)
- Dataset Interfaces: Diagnosing Model Failures Using Controllable Counterfactual Generation [85.13934713535527]
Distribution shift is a major source of failure for machine learning models.
We introduce the notion of a dataset interface: a framework that, given an input dataset and a user-specified shift, returns instances that exhibit the desired shift.
We demonstrate how applying this dataset interface to the ImageNet dataset enables studying model behavior across a diverse array of distribution shifts.
arXiv Detail & Related papers (2023-02-15T18:56:26Z)
- Merlion: A Machine Learning Library for Time Series [73.46386700728577]
Merlion is an open-source machine learning library for time series.
It features a unified interface for models and datasets for anomaly detection and forecasting.
Merlion also provides a unique evaluation framework that simulates the live deployment and re-training of a model in production.
arXiv Detail & Related papers (2021-09-20T02:03:43Z)
- Benchpress: A Scalable and Versatile Workflow for Benchmarking Structure Learning Algorithms [1.7188280334580197]
Probabilistic graphical models are one common approach to modelling the data generating mechanism.
We present a novel Snakemake workflow called Benchpress for producing scalable, reproducible, and platform-independent benchmarks.
We demonstrate the applicability of this workflow for learning Bayesian networks in five typical data scenarios.
arXiv Detail & Related papers (2021-07-08T14:19:28Z)
- Out-of-distribution Detection and Generation using Soft Brownian Offset Sampling and Autoencoders [1.313418334200599]
Deep neural networks often suffer from overconfidence which can be partly remedied by improved out-of-distribution detection.
We propose a novel approach that allows for the generation of out-of-distribution datasets based on a given in-distribution dataset.
This new dataset can then be used to improve out-of-distribution detection for the given dataset and machine learning task at hand.
arXiv Detail & Related papers (2021-05-04T06:59:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.