Understanding attention-based encoder-decoder networks: a case study with chess scoresheet recognition
- URL: http://arxiv.org/abs/2406.06538v1
- Date: Tue, 23 Apr 2024 16:23:18 GMT
- Title: Understanding attention-based encoder-decoder networks: a case study with chess scoresheet recognition
- Authors: Sergio Y. Hayashi, Nina S. T. Hirata,
- Abstract summary: We study encoder-decoder recurrent neural networks with attention mechanisms for the task of reading handwritten chess scoresheets.
We characterize the task in terms of three subtasks, namely input-output alignment, sequential pattern recognition, and handwriting recognition.
We argue that such knowledge might help one to better balance factors to properly train a network.
- Score: 0.36832029288386137
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks are largely used for complex prediction tasks. There is plenty of empirical evidence of their successful end-to-end training for a diversity of tasks. Success is often measured based solely on the final performance of the trained network, and explanations on when, why and how they work are less emphasized. In this paper we study encoder-decoder recurrent neural networks with attention mechanisms for the task of reading handwritten chess scoresheets. Rather than prediction performance, our concern is to better understand how learning occurs in these type of networks. We characterize the task in terms of three subtasks, namely input-output alignment, sequential pattern recognition, and handwriting recognition, and experimentally investigate which factors affect their learning. We identify competition, collaboration and dependence relations between the subtasks, and argue that such knowledge might help one to better balance factors to properly train a network.
Related papers
- Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z) - Evaluating the structure of cognitive tasks with transfer learning [67.22168759751541]
This study investigates the transferability of deep learning representations between different EEG decoding tasks.
We conduct extensive experiments using state-of-the-art decoding models on two recently released EEG datasets.
arXiv Detail & Related papers (2023-07-28T14:51:09Z) - Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural
Networks [49.808194368781095]
We show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks.
This work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.
arXiv Detail & Related papers (2023-05-11T17:19:30Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
Rank of neural networks measures information flowing across layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains vague and unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - Quasi-orthogonality and intrinsic dimensions as measures of learning and
generalisation [55.80128181112308]
We show that dimensionality and quasi-orthogonality of neural networks' feature space may jointly serve as network's performance discriminants.
Our findings suggest important relationships between the networks' final performance and properties of their randomly initialised feature spaces.
arXiv Detail & Related papers (2022-03-30T21:47:32Z) - Predictive Coding: Towards a Future of Deep Learning beyond
Backpropagation? [41.58529335439799]
The backpropagation of error algorithm used to train deep neural networks has been fundamental to the successes of deep learning.
Recent work has developed the idea into a general-purpose algorithm able to train neural networks using only local computations.
We show the substantially greater flexibility of predictive coding networks against equivalent deep neural networks.
arXiv Detail & Related papers (2022-02-18T22:57:03Z) - Pointer Value Retrieval: A new benchmark for understanding the limits of
neural network generalization [40.21297628440919]
We introduce a novel benchmark, Pointer Value Retrieval (PVR) tasks, that explore the limits of neural network generalization.
PVR tasks can consist of visual as well as symbolic inputs, each with varying levels of difficulty.
We demonstrate that this task structure provides a rich testbed for understanding generalization.
arXiv Detail & Related papers (2021-07-27T03:50:31Z) - Statistical Mechanical Analysis of Catastrophic Forgetting in Continual
Learning with Teacher and Student Networks [5.209145866174911]
When a computational system continuously learns from an ever-changing environment, it rapidly forgets its past experiences.
We provide the theoretical framework for analyzing catastrophic forgetting by using teacher-student learning.
We find that the network can avoid catastrophic forgetting when the similarity among input distributions is small and that of the input-output relationship of the target functions is large.
arXiv Detail & Related papers (2021-05-16T09:02:48Z) - A neural anisotropic view of underspecification in deep learning [60.119023683371736]
We show that the way neural networks handle the underspecification of problems is highly dependent on the data representation.
Our results highlight that understanding the architectural inductive bias in deep learning is fundamental to address the fairness, robustness, and generalization of these systems.
arXiv Detail & Related papers (2021-04-29T14:31:09Z) - Untangling in Invariant Speech Recognition [17.996356271398295]
We study how information is untangled within neural networks trained to recognize speech.
We observe speaker-specific nuisance variations are discarded by the network's hierarchy, whereas task-relevant properties are untangled in later layers.
We find that the deep representations carry out significant temporal untangling by efficiently extracting task-relevant features at each time step of the computation.
arXiv Detail & Related papers (2020-03-03T20:48:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.