Related papers: Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks

Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks

URL: http://arxiv.org/abs/2204.01906v1
Date: Tue, 5 Apr 2022 00:32:04 GMT
Title: Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks
Authors: Tristan Thrush, Kushal Tirumala, Anmol Gupta, Max Bartolo, Pedro Rodriguez, Tariq Kane, William Gaviria Rojas, Peter Mattson, Adina Williams, Douwe Kiela
Abstract summary: Dynatask is an open source system for setting up custom NLP tasks. It is integrated with Dynabench, a research platform for rethinking benchmarking in AI.
Score: 31.460091555017197
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We introduce Dynatask: an open source system for setting up custom NLP tasks that aims to greatly lower the technical knowledge and effort required for hosting and evaluating state-of-the-art NLP models, as well as for conducting model in the loop data collection with crowdworkers. Dynatask is integrated with Dynabench, a research platform for rethinking benchmarking in AI that facilitates human and model in the loop data collection and evaluation. To create a task, users only need to write a short task configuration file from which the relevant web interfaces and model hosting infrastructure are automatically generated. The system is available at https://dynabench.org/ and the full library can be found at https://github.com/facebookresearch/dynabench.

Related papers

TURA: Tool-Augmented Unified Retrieval Agent for AI Search [18.427511565701394]
Traditional RAG approaches struggle with real-time needs and structured queries.<n>We introduce TURA, a novel three-stage framework that combines RAG with agentic tool-use to access both static content and dynamic, real-time information.
arXiv Detail & Related papers (2025-08-06T16:24:17Z)
Chain-of-Skills: A Configurable Model for Open-domain Question Answering [79.8644260578301]
The retrieval model is an indispensable component for real-world knowledge-intensive tasks. Recent work focuses on customized methods, limiting the model transferability and scalability. We propose a modular retriever where individual modules correspond to key skills that can be reused across datasets.
arXiv Detail & Related papers (2023-05-04T20:19:39Z)
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs [71.7495056818522]
We introduce TaskMatrix.AI as a new AI ecosystem that connects foundation models with millions of APIs for task completion. We will present our vision of how to build such an ecosystem, explain each key component, and use study cases to illustrate both the feasibility of this vision and the main challenges we need to address next.
arXiv Detail & Related papers (2023-03-29T03:30:38Z)
Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de-facto standard for generating answers. Previous work has explored ways to partition the search space into hierarchical structures. In this work we propose an alternative that doesn't force any structure in the search space: using all ngrams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
Dynabench: Rethinking Benchmarking in NLP [82.26699038776812]
We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the-loop dataset creation. We report on four initial NLP tasks, illustrating these concepts and highlighting the promise of the platform.
arXiv Detail & Related papers (2021-04-07T17:49:17Z)
KILT: a Benchmark for Knowledge Intensive Language Tasks [102.33046195554886]
We present a benchmark for knowledge-intensive language tasks (KILT) All tasks in KILT are grounded in the same snapshot of Wikipedia. We find that a shared dense vector index coupled with a seq2seq model is a strong baseline.
arXiv Detail & Related papers (2020-09-04T15:32:19Z)
MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning [71.90902837008278]
We propose to incorporate neural architecture search (NAS) into general-purpose multi-task learning (GP-MTL) In order to adapt to different task combinations, we disentangle the GP-MTL networks into single-task backbones. We also propose a novel single-shot gradient-based search algorithm that closes the performance gap between the searched architectures.
arXiv Detail & Related papers (2020-03-31T09:49:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.