Related papers: Vision-Based Layout Detection from Scientific Literature using Recurrent Convolutional Neural Networks

URL: http://arxiv.org/abs/2010.11727v1
Date: Sun, 18 Oct 2020 23:50:28 GMT
Title: Vision-Based Layout Detection from Scientific Literature using Recurrent Convolutional Neural Networks
Authors: Huichen Yang, William H. Hsu
Abstract summary: We present an approach for adapting convolutional neural networks for object recognition and classification to scientific literature layout detection (SLLD), a shared subtask of several information extraction problems. Our results show good improvement with fine-tuning of a pre-trained base network.
Score: 12.221478896815292
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present an approach for adapting convolutional neural networks for object recognition and classification to scientific literature layout detection (SLLD), a shared subtask of several information extraction problems. Scientific publications contain multiple types of information sought by researchers in various disciplines, organized into an abstract, bibliography, and sections documenting related work, experimental methods, and results; however, there is no effective way to extract this information due to their diverse layout. In this paper, we present a novel approach to developing an end-to-end learning framework to segment and classify major regions of a scientific document. We consider scientific document layout analysis as an object detection task over digital images, without any additional text features that need to be added into the network during the training process. Our technical objective is to implement transfer learning via fine-tuning of pre-trained networks and thereby demonstrate that this deep learning architecture is suitable for tasks that lack very large document corpora for training ab initio. As part of the experimental test bed for empirical evaluation of this approach, we created a merged multi-corpus data set for scientific publication layout detection tasks. Our results show good improvement with fine-tuning of a pre-trained base network using this merged data set, compared to the baseline convolutional neural network architecture.

Related papers

Evolving CNN Architectures: From Custom Designs to Deep Residual Models for Diverse Image Classification and Detection Tasks [0.9023847175654603]
This paper presents a comparative study of a custom convolutional neural network (CNN) architecture against widely used pretrained and transfer learning CNN models. The datasets span binary classification, fine-grained multiclass recognition, and object detection scenarios. We analyze how architectural factors, such as network depth, residual connections, and feature extraction strategies, influence classification and localization performance.
arXiv Detail & Related papers (2026-01-03T07:45:08Z)
A Sentiment Analysis of Medical Text Based on Deep Learning [1.8130068086063336]
This paper focuses on the medical domain, using bidirectional encoder representations from transformers (BERT) as the basic pre-trained model. Experiments and analyses were conducted on the METS-CoV dataset to explore the training performance after integrating different deep learning networks. CNN models outperform other networks when trained on smaller medical text datasets in combination with pre-trained models like BERT.
arXiv Detail & Related papers (2024-04-16T12:20:49Z)
Data Augmentations in Deep Weight Spaces [89.45272760013928]
We introduce a novel augmentation scheme based on the Mixup method. We evaluate the performance of these techniques on existing benchmarks as well as new benchmarks we generate.
arXiv Detail & Related papers (2023-11-15T10:43:13Z)
Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding [52.723297744257536]
Pre-trained language models (LMs) have shown effectiveness in scientific literature understanding tasks. We propose a multi-task contrastive learning framework, SciMult, to facilitate common knowledge sharing across different literature understanding tasks.
arXiv Detail & Related papers (2023-05-23T16:47:22Z)
Human-in-the-Loop Disinformation Detection: Stance, Sentiment, or Something Else? [93.91375268580806]
Both politics and pandemics have recently provided ample motivation for the development of machine learning-enabled disinformation (a.k.a. fake news) detection algorithms. Existing literature has focused primarily on the fully-automated case, but the resulting techniques cannot reliably detect disinformation on the varied topics, sources, and time scales required for military applications. By leveraging an already-available analyst as a human-in-the-loop, canonical machine learning techniques of sentiment analysis, aspect-based sentiment analysis, and stance detection become plausible methods to use for a partially-automated disinformation detection system.
arXiv Detail & Related papers (2021-11-09T13:30:34Z)
Exploiting the relationship between visual and textual features in social networks for image classification with zero-shot deep learning [0.0]
In this work, we propose a classifier ensemble based on the transferable learning capabilities of the CLIP neural network architecture. Our experiments, based on image classification tasks according to the labels of the Places dataset, are performed by first considering only the visual part. Considering the texts associated with the images can help improve accuracy, depending on the goal.
arXiv Detail & Related papers (2021-07-08T10:54:59Z)
Anomaly Detection on Attributed Networks via Contrastive Self-Supervised Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks. Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair. A graph neural network-based contrastive learning model is proposed to learn informative embedding from high-dimensional attributes and local structure.
arXiv Detail & Related papers (2021-02-27T03:17:20Z)
Learning to Segment Human Body Parts with Synthetically Trained Deep Convolutional Networks [58.0240970093372]
This paper presents a new framework for human body part segmentation based on Deep Convolutional Neural Networks trained using only synthetic data. The proposed approach achieves cutting-edge results without the need to train the models on real annotated data of human body parts.
arXiv Detail & Related papers (2021-02-02T12:26:50Z)
NAS-Navigator: Visual Steering for Explainable One-Shot Deep Neural Network Synthesis [53.106414896248246]
We present a framework that allows analysts to effectively build the solution sub-graph space and guide the network search by injecting their domain knowledge. Applying this technique in an iterative manner allows analysts to converge to the best performing neural network architecture for a given application.
arXiv Detail & Related papers (2020-09-28T01:48:45Z)
Multi-Subspace Neural Network for Image Recognition [33.61205842747625]
In image classification tasks, feature extraction is always a big issue. Intra-class variability increases the difficulty of designing the extractors. Recently, deep learning has drawn a lot of attention for automatically learning features from data. In this study, we propose a multi-subspace neural network (MSNN), which integrates a key component of the convolutional neural network (CNN), the receptive field, with the subspace concept.
arXiv Detail & Related papers (2020-06-17T02:55:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.