Evaluation of Neural Network Classification Systems on Document Stream
- URL: http://arxiv.org/abs/2007.07547v1
- Date: Wed, 15 Jul 2020 08:52:39 GMT
- Title: Evaluation of Neural Network Classification Systems on Document Stream
- Authors: Joris Voerman, Aurelie Joseph, Mickael Coustaty, Vincent Poulain d'Andecy and Jean-Marc Ogier
- Abstract summary: We analyse the efficiency of NN-based document classification systems in a sub-optimal training case.
The evaluation was divided into four parts: a reference case, to assess the performance of the system in the lab; two cases that each simulate a specific difficulty linked to document stream processing; and a realistic case that combined all of these difficulties.
- Score: 0.5068448669777386
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One major drawback of state of the art Neural Networks (NN)-based approaches
for document classification purposes is the large number of training samples
required to obtain an efficient classification. The minimum required number is
around one thousand annotated documents for each class. In many cases it is
very difficult, if not impossible, to gather this number of samples in real
industrial processes. In this paper, we analyse the efficiency of NN-based
document classification systems in a sub-optimal training case, based on the
situation of a company document stream. We evaluated three different
approaches, one based on image content and two on textual content. The
evaluation was divided into four parts: a reference case, to assess the
performance of the system in the lab; two cases that each simulate a specific
difficulty linked to document stream processing; and a realistic case that
combined all of these difficulties. The realistic case highlighted the fact
that there is a significant drop in the efficiency of NN-based document
classification systems. Although they remain efficient for well-represented
classes (with an over-fitting of the system for those classes), it is
impossible for them to handle less well-represented classes appropriately.
NN-based document classification systems need to be adapted to resolve these
two problems before they can be considered for use in a company document
stream.
Related papers
- Beyond Document Page Classification: Design, Datasets, and Challenges [32.94494070330065]
This paper highlights the need to bring document classification benchmarking closer to real-world applications.
We identify the lack of public multi-page document classification datasets, formalize different classification tasks arising in application scenarios, and motivate the value of targeting efficient multi-page document representations.
arXiv Detail & Related papers (2023-08-24T16:16:47Z)
- An Upper Bound for the Distribution Overlap Index and Its Applications [18.481370450591317]
This paper proposes an easy-to-compute upper bound for the overlap index between two probability distributions.
The proposed bound shows its value in one-class classification and domain shift analysis.
Our work shows significant promise toward broadening the applications of overlap-based metrics.
arXiv Detail & Related papers (2022-12-16T20:02:03Z)
- Domain Agnostic Few-Shot Learning For Document Intelligence [4.243926243206826]
Few-shot learning aims to generalize to novel classes with only a few samples with class labels.
In this work, we address the problem of few-shot document image classification under domain shift.
arXiv Detail & Related papers (2021-10-29T03:19:31Z)
- No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z)
- Automating Document Classification with Distant Supervision to Increase the Efficiency of Systematic Reviews [18.33687903724145]
Well-done systematic reviews are expensive, time-demanding, and labor-intensive.
We propose an automatic document classification approach to significantly reduce the effort in reviewing documents.
arXiv Detail & Related papers (2020-12-09T22:45:40Z)
- Legal Document Classification: An Application to Law Area Prediction of Petitions to Public Prosecution Service [6.696983725360808]
This paper proposes the use of NLP techniques for textual classification.
Our main goal is to automate the process of assigning petitions to their respective areas of law.
The best results were obtained with a combination of Word2Vec trained on a domain-specific corpus and a Recurrent Neural Network architecture.
arXiv Detail & Related papers (2020-10-13T18:05:37Z)
- Region Comparison Network for Interpretable Few-shot Image Classification [97.97902360117368]
Few-shot image classification has been proposed to effectively use only a limited number of labeled examples to train models for new classes.
We propose a metric learning based method named Region Comparison Network (RCN), which is able to reveal how few-shot learning works.
We also present a new way to generalize the interpretability from the level of tasks to categories.
arXiv Detail & Related papers (2020-09-08T07:29:05Z)
- ReMarNet: Conjoint Relation and Margin Learning for Small-Sample Image Classification [49.87503122462432]
We introduce a novel neural network termed Relation-and-Margin learning Network (ReMarNet).
Our method assembles two networks of different backbones so as to learn the features that can perform excellently in both of the aforementioned two classification mechanisms.
Experiments on four image datasets demonstrate that our approach is effective in learning discriminative features from a small set of labeled samples.
arXiv Detail & Related papers (2020-06-27T13:50:20Z)
- Graph Prototypical Networks for Few-shot Learning on Attributed Networks [72.31180045017835]
We propose a graph meta-learning framework, Graph Prototypical Networks (GPN).
GPN is able to perform meta-learning on an attributed network and derive a highly generalizable model for handling the target classification task.
arXiv Detail & Related papers (2020-06-23T04:13:23Z)
- Fine-Grained Visual Classification with Efficient End-to-end Localization [49.9887676289364]
We present an efficient localization module that can be fused with a classification network in an end-to-end setup.
We evaluate the new model on the three benchmark datasets CUB200-2011, Stanford Cars and FGVC-Aircraft.
arXiv Detail & Related papers (2020-05-11T14:07:06Z)
- One-Shot Object Detection without Fine-Tuning [62.39210447209698]
We introduce a two-stage model consisting of a first stage Matching-FCOS network and a second stage Structure-Aware Relation Module.
We also propose novel training strategies that effectively improve detection performance.
Our method exceeds the state-of-the-art one-shot performance consistently on multiple datasets.
arXiv Detail & Related papers (2020-05-08T01:59:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.