SCENIC: A JAX Library for Computer Vision Research and Beyond
- URL: http://arxiv.org/abs/2110.11403v1
- Date: Mon, 18 Oct 2021 08:41:17 GMT
- Title: SCENIC: A JAX Library for Computer Vision Research and Beyond
- Authors: Mostafa Dehghani and Alexey Gritsenko and Anurag Arnab and Matthias
Minderer and Yi Tay
- Abstract summary: Scenic is an open-source JAX library with a focus on Transformer-based models for computer vision research and beyond.
The goal of this toolkit is to facilitate rapid experimentation, prototyping, and research of new vision architectures and models.
- Score: 44.21002948898551
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scenic is an open-source JAX library with a focus on Transformer-based models
for computer vision research and beyond. The goal of this toolkit is to
facilitate rapid experimentation, prototyping, and research of new vision
architectures and models. Scenic supports a diverse range of vision tasks
(e.g., classification, segmentation, detection) and facilitates working on
multi-modal problems, along with GPU/TPU support for multi-host, multi-device
large-scale training. Scenic also offers optimized implementations of
state-of-the-art research models spanning a wide range of modalities. Scenic
has been successfully used for numerous projects and published papers and
continues serving as the library of choice for quick prototyping and
publication of new research ideas.
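To make the abstract concrete, the following is a minimal sketch of the kind of JAX training step a library like Scenic builds on: a jitted, gradient-based update for a toy classification model. The model, parameter names, and shapes are illustrative assumptions for this sketch, not Scenic's actual API; in Scenic, analogous steps are compiled and sharded across hosts and devices for large-scale training.

```python
import jax
import jax.numpy as jnp

def init_params(key, num_features=8, num_classes=3):
    # Toy linear classifier standing in for a vision model.
    w_key, _ = jax.random.split(key)
    return {
        "w": jax.random.normal(w_key, (num_features, num_classes)) * 0.01,
        "b": jnp.zeros((num_classes,)),
    }

def loss_fn(params, x, y):
    # Softmax cross-entropy over flattened toy "images".
    logits = x @ params["w"] + params["b"]
    log_probs = jax.nn.log_softmax(logits)
    return -jnp.mean(jnp.take_along_axis(log_probs, y[:, None], axis=1))

@jax.jit  # compiled once; multi-device variants replicate this step
def train_step(params, x, y, lr=0.1):
    loss, grads = jax.value_and_grad(loss_fn)(params, x, y)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params, loss

key = jax.random.PRNGKey(0)
params = init_params(key)
x = jax.random.normal(key, (16, 8))       # toy batch of flattened inputs
y = jax.random.randint(key, (16,), 0, 3)  # toy class labels

losses = []
for _ in range(20):
    params, loss = train_step(params, x, y)
    losses.append(float(loss))
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

The functional style here (pure functions over explicit parameter pytrees) is what lets JAX transformations such as `jax.jit` and `jax.pmap` compile and parallelize the same step across devices.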
Related papers
- ZenSVI: An Open-Source Software for the Integrated Acquisition, Processing and Analysis of Street View Imagery Towards Scalable Urban Science [1.5494074223643037]
Street view imagery (SVI) has been instrumental in many studies in the past decade to understand and characterize street features and the built environment.
We develop ZenSVI, a free and open-source Python package that integrates and implements the entire process of SVI analysis.
arXiv Detail & Related papers (2024-12-24T07:13:17Z)
- Collage: Decomposable Rapid Prototyping for Information Extraction on Scientific PDFs [15.610004991273005]
We present Collage, a tool designed for rapid prototyping, visualization, and evaluation of different information extraction models on scientific PDFs.
We enable both developers and users of NLP-based tools to inspect, debug, and better understand modeling pipelines by providing granular views of intermediate states of processing.
arXiv Detail & Related papers (2024-10-30T22:00:34Z)
- A Survey of Small Language Models [104.80308007044634]
Small Language Models (SLMs) have become increasingly important due to their efficiency and their ability to perform various language tasks with minimal computational resources.
We present a comprehensive survey on SLMs, focusing on their architectures, training techniques, and model compression techniques.
arXiv Detail & Related papers (2024-10-25T23:52:28Z)
- Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs [61.143381152739046]
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach.
Our study uses LLMs and visual instruction tuning as an interface to evaluate various visual representations.
We provide model weights, code, supporting tools, datasets, and detailed instruction-tuning and evaluation recipes.
arXiv Detail & Related papers (2024-06-24T17:59:42Z)
- Enhancing Text Corpus Exploration with Post Hoc Explanations and Comparative Design [6.8863648800930655]
Text corpus exploration (TCE) spans the range of exploratory search tasks.
Current systems lack the flexibility to support the range of tasks encountered in practice.
We provide methods that enhance TCE tools with post hoc explanations and multiscale, comparative designs.
arXiv Detail & Related papers (2024-06-14T03:13:58Z)
- ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models [51.35570730554632]
ESPnet-SPK is a toolkit for training speaker embedding extractors.
We provide several models, ranging from x-vector to the recent SKA-TDNN.
We also aspire to bridge developed models with other domains.
arXiv Detail & Related papers (2024-01-30T18:18:27Z)
- torchdistill Meets Hugging Face Libraries for Reproducible, Coding-Free Deep Learning Studies: A Case Study on NLP [3.0875505950565856]
We present a significantly upgraded version of torchdistill, a modular, coding-free deep learning framework.
We reproduce the GLUE benchmark results of BERT models using a script based on the upgraded torchdistill.
All 27 fine-tuned BERT models and the configurations needed to reproduce the results are published on Hugging Face.
arXiv Detail & Related papers (2023-10-26T17:57:15Z)
- Automatic Image Content Extraction: Operationalizing Machine Learning in Humanistic Photographic Studies of Large Visual Archives [81.88384269259706]
We introduce Automatic Image Content Extraction framework for machine learning-based search and analysis of large image archives.
The proposed framework can be applied in several domains in humanities and social sciences.
arXiv Detail & Related papers (2022-04-05T12:19:24Z)
- X-modaler: A Versatile and High-performance Codebase for Cross-modal Analytics [99.03895740754402]
X-modaler encapsulates the state-of-the-art cross-modal analytics into several general-purpose stages.
X-modaler is Apache-licensed, and its source code, sample projects, and pre-trained models are available online.
arXiv Detail & Related papers (2021-08-18T16:05:30Z)
- LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis [3.4253416336476246]
This paper introduces layoutparser, an open-source library for streamlining the usage of deep learning (DL) models in document image analysis (DIA) research and applications.
layoutparser comes with a set of simple and intuitive interfaces for applying and customizing DL models for layout detection, character recognition, and many other document processing tasks.
We demonstrate that layoutparser is helpful for both lightweight and large-scale pipelines in real-world use cases.
arXiv Detail & Related papers (2021-03-29T05:55:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.