A Configurable Library for Generating and Manipulating Maze Datasets
- URL: http://arxiv.org/abs/2309.10498v2
- Date: Tue, 24 Oct 2023 21:42:01 GMT
- Title: A Configurable Library for Generating and Manipulating Maze Datasets
- Authors: Michael Igorevich Ivanitskiy, Rusheb Shah, Alex F. Spies, Tilman
Räuker, Dan Valentine, Can Rager, Lucia Quirke, Chris Mathwin, Guillaume
Corlouer, Cecilia Diniz Behn, Samy Wu Fung
- Abstract summary: Mazes serve as an excellent testbed due to their varied generation algorithms.
We present $\texttt{maze-dataset}$, a comprehensive library for generating, processing, and visualizing datasets consisting of maze-solving tasks.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding how machine learning models respond to distributional shifts is
a key research challenge. Mazes serve as an excellent testbed due to varied
generation algorithms offering a nuanced platform to simulate both subtle and
pronounced distributional shifts. To enable systematic investigations of model
behavior on out-of-distribution data, we present $\texttt{maze-dataset}$, a
comprehensive library for generating, processing, and visualizing datasets
consisting of maze-solving tasks. With this library, researchers can easily
create datasets with extensive control over the generation algorithm used,
the parameters fed to the algorithm of choice, and the filters that generated
mazes must satisfy. Furthermore, it supports multiple output formats, including
rasterized and text-based, catering to convolutional neural networks and
autoregressive transformer models. These formats, along with tools for
visualizing and converting between them, ensure versatility and adaptability in
research applications.
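The generation algorithms and text-based output formats the abstract describes can be illustrated with a minimal, self-contained sketch. Note that this is not the actual $\texttt{maze-dataset}$ API; the function names and the serialization format below are illustrative assumptions, showing only the general idea of randomized-DFS lattice-maze generation and a token-friendly text encoding.

```python
# Minimal sketch: lattice-maze generation via randomized depth-first search,
# one of the algorithm families such libraries typically offer. Names and
# formats here are illustrative, not the maze-dataset API.
import random

def gen_dfs_maze(n, seed=0):
    """Return the set of open edges of an n x n lattice maze.

    Randomized DFS carves a spanning tree of the grid, so every cell is
    reachable and there is exactly one path between any two cells.
    """
    rng = random.Random(seed)
    visited = {(0, 0)}
    edges = set()
    stack = [(0, 0)]
    while stack:
        x, y = stack[-1]
        # Unvisited orthogonal neighbors still inside the grid.
        neighbors = [
            (x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < n and 0 <= y + dy < n
            and (x + dx, y + dy) not in visited
        ]
        if neighbors:
            nxt = rng.choice(neighbors)
            visited.add(nxt)
            edges.add(frozenset(((x, y), nxt)))  # open the wall between cells
            stack.append(nxt)
        else:
            stack.pop()  # dead end: backtrack
    return edges

def to_text(edges):
    """Serialize the maze as a sorted adjacency list -- a simple text format
    in the spirit of the transformer-oriented outputs the paper mentions."""
    return ";".join(
        "({},{})<->({},{})".format(*a, *b)
        for a, b in sorted(tuple(sorted(e)) for e in edges)
    )

maze = gen_dfs_maze(5, seed=42)
# A DFS spanning tree of an n x n grid always has n*n - 1 open edges.
assert len(maze) == 5 * 5 - 1
```

Because the generator is seeded, the same configuration reproduces the same maze, which is the property that makes systematic distributional-shift experiments (e.g. training on one algorithm and evaluating on another) tractable.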
Related papers
- Generating Realistic Tabular Data with Large Language Models [49.03536886067729]
Large language models (LLMs) have been used for diverse tasks, but do not capture the correct correlation between the features and the target variable.
We propose an LLM-based method with three important improvements to correctly capture the ground-truth feature-class correlation in the real data.
Our experiments show that our method significantly outperforms 10 SOTA baselines on 20 datasets in downstream tasks.
arXiv Detail & Related papers (2024-10-29T04:14:32Z)
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and insufficiency of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- Dataset Factory: A Toolchain For Generative Computer Vision Datasets [0.9013233848500058]
We propose a "dataset factory" that separates the storage and processing of samples from metadata.
This enables data-centric operations at scale for machine learning teams and individual researchers.
arXiv Detail & Related papers (2023-09-20T19:43:37Z)
- Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z)
- TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series [61.436361263605114]
Time series data are often scarce or highly sensitive, which precludes the sharing of data between researchers and industrial organizations.
We introduce Time Series Generative Modeling (TSGM), an open-source framework for the generative modeling of synthetic time series.
arXiv Detail & Related papers (2023-05-19T10:11:21Z)
- Creating Synthetic Datasets for Collaborative Filtering Recommender Systems using Generative Adversarial Networks [1.290382979353427]
Research and education in machine learning need diverse, representative, and open datasets to handle the necessary training, validation, and testing tasks.
To support this variety of research, it is useful to supplement the existing datasets with synthetic ones.
This paper proposes a Generative Adversarial Network (GAN)-based method to generate collaborative filtering datasets.
arXiv Detail & Related papers (2023-03-02T14:23:27Z)
- Variational Autoencoding Neural Operators [17.812064311297117]
Unsupervised learning with functional data is an emerging paradigm of machine learning research with applications to computer vision, climate modeling and physical systems.
We present Variational Autoencoding Neural Operators (VANO), a general strategy for making a large class of operator learning architectures act as variational autoencoders.
arXiv Detail & Related papers (2023-02-20T22:34:43Z)
- Dataset Interfaces: Diagnosing Model Failures Using Controllable Counterfactual Generation [85.13934713535527]
Distribution shift is a major source of failure for machine learning models.
We introduce the notion of a dataset interface: a framework that, given an input dataset and a user-specified shift, returns instances that exhibit the desired shift.
We demonstrate how applying this dataset interface to the ImageNet dataset enables studying model behavior across a diverse array of distribution shifts.
arXiv Detail & Related papers (2023-02-15T18:56:26Z)
- Merlion: A Machine Learning Library for Time Series [73.46386700728577]
Merlion is an open-source machine learning library for time series.
It features a unified interface for models and datasets for anomaly detection and forecasting.
Merlion also provides a unique evaluation framework that simulates the live deployment and re-training of a model in production.
arXiv Detail & Related papers (2021-09-20T02:03:43Z)
- Benchpress: A Scalable and Versatile Workflow for Benchmarking Structure Learning Algorithms [1.7188280334580197]
Probabilistic graphical models are one common approach to modelling the data generating mechanism.
We present a novel Snakemake workflow called Benchpress for producing scalable, reproducible, and platform-independent benchmarks.
We demonstrate the applicability of this workflow for learning Bayesian networks in five typical data scenarios.
arXiv Detail & Related papers (2021-07-08T14:19:28Z)
- Out-of-distribution Detection and Generation using Soft Brownian Offset Sampling and Autoencoders [1.313418334200599]
Deep neural networks often suffer from overconfidence which can be partly remedied by improved out-of-distribution detection.
We propose a novel approach that allows for the generation of out-of-distribution datasets based on a given in-distribution dataset.
This new dataset can then be used to improve out-of-distribution detection for the given dataset and machine learning task at hand.
arXiv Detail & Related papers (2021-05-04T06:59:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.