Convolutional Neural Networks Trained to Identify Words Provide a Good
Account of Visual Form Priming Effects
- URL: http://arxiv.org/abs/2302.03992v1
- Date: Wed, 8 Feb 2023 11:01:19 GMT
- Title: Convolutional Neural Networks Trained to Identify Words Provide a Good
Account of Visual Form Priming Effects
- Authors: Dong Yin and Valerio Biscione and Jeffrey Bowers
- Abstract summary: We find that deep convolutional networks perform as well as or better than the coding schemes and word recognition models.
The findings add to recent work suggesting that convolutional networks may capture key aspects of visual word identification.
- Score: 14.202583960390394
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A wide variety of orthographic coding schemes and models of visual word
identification have been developed to account for masked priming data that
provide a measure of orthographic similarity between letter strings. These
models tend to include hand-coded orthographic representations with single unit
coding for specific forms of knowledge (e.g., units coding for a letter in a
given position or a letter sequence). Here we assess how well a range of these
coding schemes and models account for the pattern of form priming effects taken
from the Form Priming Project and compare these findings to results observed
with 11 standard deep neural network models (DNNs) developed in computer
science. We find that deep convolutional networks perform as well as or better
than the coding schemes and word recognition models, whereas transformer
networks did less well. The success of convolutional networks is remarkable as
their architectures were not developed to support word recognition (they were
designed to perform well on object recognition) and they classify pixel images
of words (rather than artificial encodings of letter strings). The findings add to
recent work by Hannagan et al. (2021) suggesting that convolutional
networks may capture key aspects of visual word identification.
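One way to make the model-behavior comparison concrete: a hypothetical scoring pipeline renders the prime and target as pixel images, passes both through a CNN, and takes the cosine similarity of a late layer's activations as the predicted orthographic similarity. The rendering settings, network, and layer choice below are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch: predict form-priming strength as the cosine similarity
# between CNN features of rendered prime and target images. The rendering,
# the network (a torchvision ResNet-18), and the layer are assumptions.
import torch
from PIL import Image, ImageDraw
from torchvision import models, transforms

def render(word: str) -> Image.Image:
    """Render a letter string as an image (hypothetical settings)."""
    img = Image.new("RGB", (224, 224), "white")
    ImageDraw.Draw(img).text((20, 100), word, fill="black")
    return img

to_tensor = transforms.ToTensor()

model = models.resnet18(weights=None)  # swap in pretrained weights as needed
model.fc = torch.nn.Identity()         # expose penultimate-layer features
model.eval()

@torch.no_grad()
def features(word: str) -> torch.Tensor:
    return model(to_tensor(render(word)).unsqueeze(0)).squeeze(0)

def priming_score(prime: str, target: str) -> float:
    """Higher cosine similarity ~ stronger predicted form priming."""
    return torch.nn.functional.cosine_similarity(
        features(prime), features(target), dim=0).item()

print(priming_score("judge", "jugde"))  # transposed-letter prime
print(priming_score("judge", "junpe"))  # double-substitution prime
```

A network that captures the human pattern should, for example, score transposed-letter primes (jugde-JUDGE) above double-substitution primes (junpe-JUDGE).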
Related papers
- Towards Retrieval-Augmented Architectures for Image Captioning [81.11529834508424]
This work presents a novel approach towards developing image captioning models that utilize an external kNN memory to improve the generation process.
Specifically, we propose two model variants that incorporate a knowledge retriever component that is based on visual similarities.
We experimentally validate our approach on COCO and nocaps datasets and demonstrate that incorporating an explicit external memory can significantly enhance the quality of captions.
arXiv Detail & Related papers (2024-05-21T18:02:07Z)
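A minimal sketch of that retrieval component, assuming captions are fetched for the k training images whose precomputed features are most cosine-similar to the query; the feature extractor and the caption generator are omitted, and all names are hypothetical.

```python
# Sketch: index image features, then fetch the captions of the k visually
# closest training images to condition the caption generator on.
import numpy as np

class CaptionMemory:
    def __init__(self, feats: np.ndarray, captions: list[str]):
        # L2-normalize so a dot product equals cosine similarity.
        self.feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        self.captions = captions

    def retrieve(self, query: np.ndarray, k: int = 5) -> list[str]:
        q = query / np.linalg.norm(query)
        idx = np.argsort(self.feats @ q)[::-1][:k]   # top-k by similarity
        return [self.captions[i] for i in idx]

# Toy usage with random features standing in for a visual encoder's output.
memory = CaptionMemory(np.random.randn(100, 512),
                       [f"caption {i}" for i in range(100)])
print(memory.retrieve(np.random.randn(512), k=3))
```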
- Cracking the neural code for word recognition in convolutional neural networks [1.0991358618541507]
We show how a small subset of units becomes specialized for word recognition in the learned script.
We show that these units are sensitive to specific letter identities and their distance from the blank space at the left or right of a word.
The proposed neural code provides mechanistic insight into how information about letter identity and position is extracted and allows for invariant word recognition.
arXiv Detail & Related papers (2024-03-10T10:12:32Z)
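One way to operationalize such a selectivity analysis, assuming per-word unit activations are available, is a d'-style contrast between words that do and do not contain a given letter at a given position. The probe below is illustrative, not the paper's exact measure.

```python
# Sketch of a unit-selectivity probe for letter identity x position.
import numpy as np

def selectivity(acts, words, unit, letter, pos):
    """Contrast a unit's activation on words with `letter` at `pos`
    against all other words (d'-style, illustrative)."""
    mask = np.array([len(w) > pos and w[pos] == letter for w in words])
    a, b = acts[mask, unit], acts[~mask, unit]
    return (a.mean() - b.mean()) / np.sqrt(0.5 * (a.var() + b.var()) + 1e-8)

words = ["judge", "jugde", "house", "juice", "mouse"]
acts = np.random.randn(len(words), 256)  # words x units; placeholder for CNN activations
print(selectivity(acts, words, unit=0, letter="j", pos=0))
```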
- Seeing in Words: Learning to Classify through Language Bottlenecks [59.97827889540685]
Humans can explain their predictions using succinct and intuitive descriptions.
We show that a vision model whose feature representations are text can effectively classify ImageNet images.
arXiv Detail & Related papers (2023-06-29T00:24:42Z)
- Transforming Visual Scene Graphs to Image Captions [69.13204024990672]
We propose Transforming Scene Graphs (TSG) into more descriptive captions.
In TSG, we apply multi-head attention (MHA) to design the Graph Neural Network (GNN) for embedding scene graphs.
In TSG, each expert is built on MHA and discriminates among the graph embeddings to generate different kinds of words.
arXiv Detail & Related papers (2023-05-03T15:18:37Z)
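One plausible reading of "applying MHA to design the GNN" is a multi-head attention layer whose attention is masked by the scene graph's adjacency structure. The sketch below follows that assumption; the paper's actual TSG layer may differ.

```python
# Sketch: multi-head attention restricted to graph neighbors via attn_mask.
import torch
import torch.nn as nn

class MHAGraphLayer(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, nodes: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # nodes: (batch, N, dim); adj: (N, N), nonzero where an edge exists.
        mask = adj == 0            # True blocks attention between non-neighbors
        out, _ = self.attn(nodes, nodes, nodes, attn_mask=mask)
        return out                 # updated node embeddings

nodes = torch.randn(1, 5, 64)      # five scene-graph node embeddings
adj = torch.eye(5)                 # self-loops keep every row attendable
adj[0, 1] = adj[1, 0] = 1.0        # one toy relation edge
print(MHAGraphLayer()(nodes, adj).shape)  # torch.Size([1, 5, 64])
```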
- A Transformer Architecture for Online Gesture Recognition of Mathematical Expressions [0.0]
A Transformer architecture is shown to provide an end-to-end model for building expression trees from online handwritten gestures corresponding to glyph strokes.
The attention mechanism was successfully used to encode, learn and enforce the underlying syntax of expressions.
For the first time, the encoder is fed with unseen online-temporal data tokens potentially forming an infinitely large vocabulary.
arXiv Detail & Related papers (2022-11-04T17:55:55Z)
- Retrieval-Augmented Transformer for Image Captioning [51.79146669195357]
We develop an image captioning approach with a kNN memory, with which knowledge can be retrieved from an external corpus to aid the generation process.
Our architecture combines a knowledge retriever based on visual similarities, a differentiable encoder, and a kNN-augmented attention layer to predict tokens.
Experimental results, conducted on the COCO dataset, demonstrate that employing an explicit external memory can aid the generation process and increase caption quality.
arXiv Detail & Related papers (2022-07-26T19:35:49Z)
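A minimal sketch of the kNN-augmented attention idea, assuming the retrieved memory vectors are simply concatenated with the visual tokens as extra keys and values; shapes and the fusion scheme are illustrative, not the paper's exact layer.

```python
# Sketch: decoder states attend jointly over visual tokens and kNN-retrieved vectors.
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)

decoder_states = torch.randn(1, 7, 64)   # current caption prefix states
visual_tokens = torch.randn(1, 49, 64)   # encoded image regions
retrieved = torch.randn(1, 5, 64)        # k=5 vectors from the kNN memory

context = torch.cat([visual_tokens, retrieved], dim=1)  # extra keys/values
out, weights = attn(decoder_states, context, context)
print(out.shape, weights.shape)  # (1, 7, 64), (1, 7, 54)
```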
- MaskOCR: Text Recognition with Masked Encoder-Decoder Pretraining [68.05105411320842]
We propose MaskOCR, a novel approach that unifies vision and language pre-training in the classical encoder-decoder recognition framework.
We adopt the masked image modeling approach to pre-train the feature encoder using a large set of unlabeled real text images.
We transform text data into synthesized text images to unify the data modalities of vision and language, and enhance the language modeling capability of the sequence decoder.
arXiv Detail & Related papers (2022-06-01T08:27:19Z)
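The masked-image-modeling step can be sketched as replacing a random subset of patch tokens with a learned mask token and reconstructing only those positions; the tiny encoder and loss below are stand-ins, not MaskOCR's actual components.

```python
# Sketch: mask ~60% of patch tokens and train to reconstruct them.
import torch
import torch.nn as nn

patches = torch.randn(8, 49, 128)             # batch x patches x dim
mask = torch.rand(8, 49) < 0.6                # which patches to hide
mask_token = nn.Parameter(torch.zeros(128))   # learned [MASK] embedding

inp = torch.where(mask.unsqueeze(-1), mask_token.expand_as(patches), patches)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True),
    num_layers=2)
head = nn.Linear(128, 128)                    # predicts the original tokens

recon = head(encoder(inp))
loss = ((recon - patches) ** 2)[mask].mean()  # loss only on masked positions
loss.backward()
print(float(loss))
```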
- Deep learning based dictionary learning and tomographic image reconstruction [0.0]
This work presents an approach for image reconstruction in clinical low-dose tomography that combines principles from sparse signal processing with ideas from deep learning.
First, we describe sparse signal representation in terms of dictionaries from a statistical perspective and interpret dictionary learning as a process of aligning the distribution that arises from a generative model with the empirical distribution of true signals.
As a result we can see that sparse coding with learned dictionaries resembles a specific variational autoencoder, where the decoder is a linear function and the encoder is a sparse coding algorithm.
arXiv Detail & Related papers (2021-08-26T12:10:17Z)
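That sparse-coding view admits a compact sketch: the "encoder" runs an iterative algorithm such as ISTA to solve min_z 0.5*||x - Dz||^2 + lam*||z||_1, and the "decoder" is the linear map z -> Dz. The dictionary below is random rather than learned, purely for illustration.

```python
# Sketch: ISTA sparse coding as the encoder; the decoder is D @ z.
import numpy as np

def ista(x, D, lam=0.1, steps=200):
    L = np.linalg.norm(D, 2) ** 2           # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(steps):
        z = z - (D.T @ (D @ z - x)) / L     # gradient step on the data term
        z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0)  # soft-threshold
    return z

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))          # overcomplete dictionary (random here)
x = rng.standard_normal(64)
z = ista(x, D)                              # sparse code (the "encoding")
print((np.abs(z) > 1e-8).sum(), np.linalg.norm(x - D @ z))
```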
- Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is a Unet-like pure Transformer for medical image segmentation.
Tokenized image patches are fed into a Transformer-based U-shaped Encoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z)
- Characterization and recognition of handwritten digits using Julia [0.0]
We implement a hybrid neural network model that is capable of recognizing the digits of the MNIST dataset.
The proposed neural network model can extract features from the image and recognize them layer by layer.
It also implements an auto-encoding and a variational auto-encoding system for the MNIST dataset.
arXiv Detail & Related papers (2021-02-24T00:30:41Z)
- Comparative evaluation of CNN architectures for Image Caption Generation [1.2183405753834562]
We have evaluated 17 different Convolutional Neural Networks on two popular Image Caption Generation frameworks.
We observe that the model complexity of a Convolutional Neural Network, as measured by its number of parameters, and its accuracy on the Object Recognition task do not necessarily correlate with its efficacy at feature extraction for the Image Caption Generation task.
arXiv Detail & Related papers (2021-02-23T05:43:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.