ZR-2021VG: Zero-Resource Speech Challenge, Visually-Grounded Language
Modelling track, 2021 edition
- URL: http://arxiv.org/abs/2107.06546v1
- Date: Wed, 14 Jul 2021 08:29:07 GMT
- Title: ZR-2021VG: Zero-Resource Speech Challenge, Visually-Grounded Language
Modelling track, 2021 edition
- Authors: Afra Alishahi, Grzegorz Chrupała, Alejandrina Cristia, Emmanuel
Dupoux, Bertrand Higy, Marvin Lavechin, Okko Räsänen and Chen Yu
- Abstract summary: This track was introduced in the Zero-Resource Speech challenge, 2021 edition, 2nd round.
We motivate the new track and discuss participation rules in detail.
We also present the two baseline systems that were developed for this track.
- Score: 96.87241233266448
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present the visually-grounded language modelling track that was introduced
in the Zero-Resource Speech challenge, 2021 edition, 2nd round. We motivate the
new track and discuss participation rules in detail. We also present the two
baseline systems that were developed for this track.
Related papers
- The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge [12.862628838633396]
This paper presents the NPU-HWC system submitted to the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge (ICAGC 2024).
Our system consists of two modules: a speech generator for Track 1 and a background audio generator for Track 2.
arXiv Detail & Related papers (2024-10-31T10:58:59Z)
- TCG CREST System Description for the Second DISPLACE Challenge [19.387615374726444]
We describe the speaker diarization (SD) and language diarization (LD) systems developed by our team for the Second DISPLACE Challenge, 2024.
Our contributions were dedicated to Track 1 for SD and Track 2 for LD in multilingual and multi-speaker scenarios.
arXiv Detail & Related papers (2024-09-16T05:13:34Z)
- Overview of AI-Debater 2023: The Challenges of Argument Generation Tasks [62.443665295250035]
We present the results of the AI-Debater 2023 Challenge held by the Chinese Conference on Affect Computing (CCAC 2023).
In total, 32 competing teams registered for the challenge, from which we received 11 successful submissions.
arXiv Detail & Related papers (2024-07-20T10:13:54Z)
- Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond [89.54151859266202]
The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework.
The challenge garnered 12 model submissions and 54 language corpora, resulting in a comprehensive benchmark encompassing 154 languages.
The findings indicate that merely scaling models is not the definitive solution for multilingual speech tasks.
arXiv Detail & Related papers (2023-10-09T08:30:01Z)
- NICE: CVPR 2023 Challenge on Zero-shot Image Captioning [149.28330263581012]
The NICE project is designed to challenge the computer vision community to develop robust image captioning models.
The report includes information on the newly proposed NICE dataset, evaluation methods, challenge results, and technical details of the top-ranking entries.
arXiv Detail & Related papers (2023-09-05T05:32:19Z)
- GroundNLQ @ Ego4D Natural Language Queries Challenge 2023 [73.12670280220992]
To accurately ground natural language queries in a video, an effective egocentric feature extractor and a powerful grounding model are required.
We leverage a two-stage pre-training strategy to train egocentric feature extractors and the grounding model on video narrations.
In addition, we introduce a novel grounding model GroundNLQ, which employs a multi-modal multi-scale grounding module.
arXiv Detail & Related papers (2023-06-27T07:27:52Z)
- A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing toward STOP Quality Challenge [33.89616011003973]
We describe our proposed spoken semantic parsing system for the quality track (Track 1) of the Spoken Language Understanding Grand Challenge.
Strong automatic speech recognition (ASR) models like Whisper and pretrained language models (LMs) like BART are used within our SLU framework to boost performance.
We also investigate output-level combination of various models, reaching an exact-match accuracy of 80.8, which won 1st place in the challenge.
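As a rough illustration of this kind of two-stage pipeline SLU (a sketch, not the authors' implementation), the snippet below chains an off-the-shelf Whisper ASR model with a BART-style seq2seq parser via Hugging Face transformers; the semantic-parser checkpoint name is a placeholder, and the output-level model combination is not shown.

```python
# Minimal sketch of a pipeline SLU system: Whisper ASR followed by a
# BART-style seq2seq semantic parser. Illustrative only;
# "my-org/bart-semantic-parser" is a hypothetical fine-tuned checkpoint.
from transformers import pipeline, BartForConditionalGeneration, BartTokenizer

# Stage 1: transcribe the utterance with an off-the-shelf Whisper model.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
transcript = asr("utterance.wav")["text"]

# Stage 2: map the transcript to a semantic parse with a seq2seq model.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
parser = BartForConditionalGeneration.from_pretrained("my-org/bart-semantic-parser")

inputs = tokenizer(transcript, return_tensors="pt")
parse_ids = parser.generate(**inputs, max_length=128, num_beams=4)
print(tokenizer.decode(parse_ids[0], skip_special_tokens=True))
```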
arXiv Detail & Related papers (2023-05-02T17:25:19Z)
- Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling [13.956691231452336]
FaST-VGS is a Transformer-based model that learns to associate raw speech waveforms with semantically related images.
FaST-VGS+ is learned in a multi-task fashion with a masked language modeling objective.
We show that our models perform competitively on the ABX task, outperform all other concurrent submissions on the Syntactic and Semantic tasks, and nearly match the best system on the Lexical task.
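For intuition about how such a visually-grounded multi-task objective can be set up (an illustrative sketch, not the published FaST-VGS+ code), the following PyTorch snippet combines an InfoNCE-style speech-image contrastive loss with a masked language modeling loss; the loss weighting and tensor shapes are assumptions.

```python
# Sketch of a FaST-VGS+-style multi-task objective: speech-image contrastive
# grounding plus masked language modeling. Assumed shapes: speech_emb and
# image_emb are (B, D); mlm_logits is (B, T, V); mlm_labels is (B, T).
import torch
import torch.nn.functional as F

def contrastive_loss(speech_emb, image_emb, temperature=0.07):
    """InfoNCE-style loss over a batch of paired speech/image embeddings."""
    speech_emb = F.normalize(speech_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = speech_emb @ image_emb.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy: match speech->image and image->speech.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

def multitask_loss(speech_emb, image_emb, mlm_logits, mlm_labels, mlm_weight=1.0):
    """Combine the grounding loss with a masked-prediction loss."""
    grounding = contrastive_loss(speech_emb, image_emb)
    # mlm_labels uses -100 at unmasked positions, as in standard MLM training.
    mlm = F.cross_entropy(mlm_logits.view(-1, mlm_logits.size(-1)),
                          mlm_labels.view(-1), ignore_index=-100)
    return grounding + mlm_weight * mlm
```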
arXiv Detail & Related papers (2022-02-07T22:09:54Z)
- Two-Stream Consensus Network: Submission to HACS Challenge 2021 Weakly-Supervised Learning Track [78.64815984927425]
The goal of weakly-supervised temporal action localization is to temporally locate and classify actions of interest in untrimmed videos.
We adopt the two-stream consensus network (TSCN) as the main framework in this challenge.
Our solution ranked 2nd in this challenge, and we hope our method can serve as a baseline for future academic research.
arXiv Detail & Related papers (2021-06-21T03:36:36Z)
- The Zero Resource Speech Challenge 2020: Discovering discrete subword and word units [40.41406551797358]
The Zero Resource Speech Challenge 2020 aims at learning speech representations from raw audio signals without any labels.
We present the results of the twenty submitted models and discuss the implications of the main findings for unsupervised speech learning.
arXiv Detail & Related papers (2020-10-12T18:56:48Z)