ZR-2021VG: Zero-Resource Speech Challenge, Visually-Grounded Language
Modelling track, 2021 edition
- URL: http://arxiv.org/abs/2107.06546v1
- Date: Wed, 14 Jul 2021 08:29:07 GMT
- Title: ZR-2021VG: Zero-Resource Speech Challenge, Visually-Grounded Language
Modelling track, 2021 edition
- Authors: Afra Alishahi, Grzegorz Chrupała, Alejandrina Cristia, Emmanuel
Dupoux, Bertrand Higy, Marvin Lavechin, Okko Räsänen and Chen Yu
- Abstract summary: This track was introduced in the Zero-Resource Speech challenge, 2021 edition, 2nd round.
We motivate the new track and discuss participation rules in detail.
We also present the two baseline systems that were developed for this track.
- Score: 96.87241233266448
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the visually-grounded language modelling track that was introduced
in the Zero-Resource Speech challenge, 2021 edition, 2nd round. We motivate the
new track and discuss participation rules in detail. We also present the two
baseline systems that were developed for this track.
Related papers
- Overview of AI-Debater 2023: The Challenges of Argument Generation Tasks [62.443665295250035]
We present the results of the AI-Debater 2023 Challenge held by the Chinese Conference on Affective Computing (CCAC 2023).
In total, 32 competing teams registered for the challenge, from which we received 11 successful submissions.
arXiv Detail & Related papers (2024-07-20T10:13:54Z)
- Dialect Adaptation and Data Augmentation for Low-Resource ASR: TalTech Systems for the MADASR 2023 Challenge [2.018088271426157]
This paper describes Tallinn University of Technology (TalTech) systems developed for the ASRU MADASR 2023 Challenge.
The challenge focuses on automatic speech recognition of dialect-rich Indian languages with limited training audio and text data.
TalTech participated in two tracks of the challenge: Track 1 that allowed using only the provided training data and Track 3 which allowed using additional audio data.
arXiv Detail & Related papers (2023-10-26T14:57:08Z)
- Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond [89.54151859266202]
The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework.
The challenge garnered 12 model submissions and 54 language corpora, resulting in a comprehensive benchmark encompassing 154 languages.
The findings indicate that merely scaling models is not the definitive solution for multilingual speech tasks.
arXiv Detail & Related papers (2023-10-09T08:30:01Z)
- NICE: CVPR 2023 Challenge on Zero-shot Image Captioning [149.28330263581012]
The NICE project is designed to challenge the computer vision community to develop robust image captioning models.
The report includes information on the newly proposed NICE dataset, evaluation methods, challenge results, and technical details of top-ranking entries.
arXiv Detail & Related papers (2023-09-05T05:32:19Z)
- GroundNLQ @ Ego4D Natural Language Queries Challenge 2023 [73.12670280220992]
To accurately ground natural language queries in a video, an effective egocentric feature extractor and a powerful grounding model are required.
We leverage a two-stage pre-training strategy to train egocentric feature extractors and the grounding model on video narrations.
In addition, we introduce a novel grounding model GroundNLQ, which employs a multi-modal multi-scale grounding module.
arXiv Detail & Related papers (2023-06-27T07:27:52Z)
- A Study on the Integration of Pipeline and E2E SLU Systems for Spoken Semantic Parsing toward STOP Quality Challenge [33.89616011003973]
We describe our proposed spoken semantic parsing system for the quality track (Track 1) of the Spoken Language Understanding Grand Challenge.
Strong automatic speech recognition (ASR) models such as Whisper and pretrained language models (LMs) such as BART are used inside our SLU framework to boost performance.
We also investigate output-level combination of various models, reaching an exact-match accuracy of 80.8, which won 1st place in the challenge.
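Output-level combination of this kind can be approximated by a simple majority vote over each model's predicted parse. A minimal sketch follows; the function name, tie-breaking rule, and example parses are illustrative assumptions, not the authors' actual method:

```python
from collections import Counter

def combine_outputs(predictions):
    """Majority vote over candidate outputs from several models.

    predictions: list of output strings, one per model.
    Ties are broken by the order in which models appear in the
    list (an illustrative choice, not the paper's actual rule).
    """
    counts = Counter(predictions)
    best_count = max(counts.values())
    for pred in predictions:  # earliest model wins ties
        if counts[pred] == best_count:
            return pred

# Example: three SLU models emit candidate semantic parses.
outputs = [
    "[IN:GET_WEATHER [SL:LOCATION Boston]]",
    "[IN:GET_WEATHER [SL:LOCATION Boston]]",
    "[IN:GET_WEATHER [SL:LOCATION boston]]",
]
print(combine_outputs(outputs))
```

In practice, systems combining ASR and SLU outputs often use more sophisticated schemes (confidence-weighted voting, reranking), but majority voting is the simplest instance of output-level combination.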
arXiv Detail & Related papers (2023-05-02T17:25:19Z)
- Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling [13.956691231452336]
FaST-VGS is a Transformer-based model that learns to associate raw speech waveforms with semantically related images.
FaST-VGS+ is learned in a multi-task fashion with a masked language modeling objective.
We show that our models perform competitively on the ABX task, outperform all other concurrent submissions on the Syntactic and Semantic tasks, and nearly match the best system on the Lexical task.
arXiv Detail & Related papers (2022-02-07T22:09:54Z)
- Two-Stream Consensus Network: Submission to HACS Challenge 2021 Weakly-Supervised Learning Track [78.64815984927425]
The goal of weakly-supervised temporal action localization is to temporally locate and classify actions of interest in untrimmed videos.
We adopt the two-stream consensus network (TSCN) as the main framework in this challenge.
Our solution ranked 2nd in this challenge, and we hope our method can serve as a baseline for future academic research.
arXiv Detail & Related papers (2021-06-21T03:36:36Z)
- The Zero Resource Speech Challenge 2020: Discovering discrete subword and word units [40.41406551797358]
The Zero Resource Speech Challenge 2020 aims at learning speech representations from raw audio signals without any labels.
We present the results of the twenty submitted models and discuss the implications of the main findings for unsupervised speech learning.
arXiv Detail & Related papers (2020-10-12T18:56:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.