Many Heads but One Brain: an Overview of Fusion Brain Challenge on AI Journey 2021
- URL: http://arxiv.org/abs/2111.10974v1
- Date: Mon, 22 Nov 2021 03:46:52 GMT
- Title: Many Heads but One Brain: an Overview of Fusion Brain Challenge on AI Journey 2021
- Authors: Daria Bakshandaeva, Denis Dimitrov, Alex Shonenkov, Mark Potanin,
Vladimir Arkhipkin, Denis Karachev, Vera Davydova, Anton Voronov, Mikhail
Martynov, Natalia Semenova, Mikhail Stepnov, Elena Tutubalina, Andrey
Chertok, Aleksandr Petiushko
- Abstract summary: The Fusion Brain Challenge aims to make a universal architecture process different modalities.
We have created datasets for each task to test the participants' submissions on them.
The Russian part of the dataset is the largest Russian handwritten dataset in the world.
- Score: 46.56884693120608
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Supporting the current trend in the AI community, we propose the AI Journey
2021 Challenge called Fusion Brain, which targets a universal architecture that
processes different modalities (namely, images, texts, and code) and solves
multiple tasks in vision and language. The Fusion Brain Challenge
(https://github.com/sberbank-ai/fusion_brain_aij2021) combines the following
specific tasks: Code2code Translation, Handwritten Text Recognition, Zero-shot
Object Detection, and Visual Question Answering. We have created datasets for
each task to test the participants' submissions on them. Moreover, we have
released a new handwritten dataset in both Russian and English, which consists
of 94,130 pairs of images and texts. The Russian part of the dataset is the
largest Russian handwritten dataset in the world. We also propose a baseline
solution and corresponding task-specific solutions, as well as overall metrics.
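The abstract pairs task-specific solutions with overall metrics, i.e., a single leaderboard number aggregated over the four sub-tasks. Below is a minimal Python sketch of such an aggregation, assuming each task yields a score in [0, 1] and that the overall score is their plain unweighted sum; the task names, score ranges, and weighting here are illustrative assumptions, not the challenge's official definition (see the GitHub repo for that).

```python
# Illustrative sketch only: aggregating per-task scores into a single
# challenge score. Task names, [0, 1] score ranges, and the unweighted
# sum are assumptions; the official metrics live in the challenge repo.

TASKS = (
    "code2code_translation",         # e.g., a BLEU-like score in [0, 1]
    "handwritten_text_recognition",  # e.g., string accuracy in [0, 1]
    "zero_shot_object_detection",    # e.g., mAP in [0, 1]
    "visual_question_answering",     # e.g., answer accuracy in [0, 1]
)

def overall_score(per_task: dict) -> float:
    """Sum per-task scores; a missing task counts as 0, so a submission
    must address all four tasks to reach the maximum of 4.0."""
    return sum(float(per_task.get(task, 0.0)) for task in TASKS)

if __name__ == "__main__":
    submission = {
        "code2code_translation": 0.42,
        "handwritten_text_recognition": 0.71,
        "zero_shot_object_detection": 0.35,
        "visual_question_answering": 0.48,
    }
    print(f"overall score: {overall_score(submission):.2f}")  # 1.96
```

Summing (rather than averaging) rewards breadth: under this scheme a strong single-task model cannot outrank a weaker but genuinely multi-task one.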
Related papers
- ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images [1.2529442734851663]
We introduce a novel dataset, ViOCRVQA (Vietnamese Optical Character Recognition - Visual Question Answering dataset), consisting of 28,000+ images and 120,000+ question-answer pairs.
In this dataset, all the images contain text, and the questions ask about information relevant to that text.
We adapt ideas from state-of-the-art methods proposed for English to conduct experiments on our dataset, revealing the challenges and difficulties inherent in a Vietnamese dataset.
arXiv Detail & Related papers (2024-04-29T03:17:47Z)
- The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World [71.52132776748628]
We present the All-Seeing (AS) project: a large-scale dataset and model for recognizing and understanding everything in the open world.
We create a new dataset (AS-1B) with over 1 billion regions annotated with semantic tags, question-answering pairs, and detailed captions.
We develop the All-Seeing model (ASM), a unified framework for panoptic visual recognition and understanding.
arXiv Detail & Related papers (2023-08-03T17:59:47Z)
- The Algonauts Project 2023 Challenge: UARK-UAlbany Team Solution [21.714597774964194]
This work presents our solutions to the Algonauts Project 2023 Challenge.
The primary objective of the challenge is to use computational models to predict brain responses.
We constructed an image-based brain encoder through a two-step training process to tackle this challenge.
arXiv Detail & Related papers (2023-08-01T03:46:59Z)
- RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot [56.130215236125224]
A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots.
Recent research in one-shot imitation learning has shown promise in transferring trained policies to new tasks based on demonstrations.
This paper aims to unlock the potential for an agent to generalize to hundreds of real-world skills with multi-modal perception.
arXiv Detail & Related papers (2023-07-02T15:33:31Z)
- Are Deep Neural Networks SMARTer than Second Graders? [85.60342335636341]
We evaluate the abstraction, deduction, and generalization abilities of neural networks in solving visuo-linguistic puzzles designed for children in the 6-8 age group.
Our dataset consists of 101 unique puzzles; each puzzle comprises a picture-based question whose solution requires a mix of several elementary skills, including arithmetic, algebra, and spatial reasoning.
Experiments reveal that while powerful deep models achieve reasonable performance on puzzles in a supervised setting, they perform no better than random accuracy when analyzed for generalization.
arXiv Detail & Related papers (2022-12-20T04:33:32Z)
- ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering [70.6359636116848]
We propose a new large-scale dataset, ConvFinQA, to study the chain of numerical reasoning in conversational question answering.
Our dataset poses a great challenge in modeling long-range, complex numerical reasoning paths in real-world conversations.
arXiv Detail & Related papers (2022-10-07T23:48:50Z)
- Multilingual Event Linking to Wikidata [5.726712522440283]
We propose two variants of the event linking task: 1) multilingual, where event descriptions are in the same language as the mention, and 2) crosslingual, where all event descriptions are in English.
We automatically compile a large-scale dataset for this task, comprising 1.8M mentions across 44 languages referring to over 10.9K events from Wikidata.
arXiv Detail & Related papers (2022-04-13T17:28:23Z)
- Handshakes AI Research at CASE 2021 Task 1: Exploring different approaches for multilingual tasks [0.22940141855172036]
The aim of CASE 2021 Shared Task 1 was to detect and classify socio-political and crisis event information in a multilingual setting.
Our submission contained entries for all of the subtasks, and the scores obtained validated our research findings.
arXiv Detail & Related papers (2021-10-29T07:58:49Z)
- Visuo-Linguistic Question Answering (VLQA) Challenge [47.54738740910987]
We propose a novel task to derive joint inference about a given image-text pair.
We compile the Visuo-Linguistic Question Answering (VLQA) challenge corpus in a question answering setting.
arXiv Detail & Related papers (2020-05-01T12:18:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.